Specify dtype when Reading pandas DataFrame from CSV File in Python (Example)

In this tutorial you’ll learn how to set the data types of the columns when reading a CSV file into a pandas DataFrame in Python.

So now the part you have been waiting for – the example:

Example Data & Software Libraries

We first need to import the pandas library, to be able to use the corresponding functions:

import pandas as pd                         # Import pandas library

We use the following data as a basis for this Python programming tutorial:

data = pd.DataFrame({'x1':range(11, 17),    # Create pandas DataFrame
                     'x2':['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3':range(17, 11, -1),
                     'x4':['a', 'b', 'c', 'd', 'e', 'f']})
print(data)                                 # Print pandas DataFrame

Table 1: Example pandas DataFrame

Table 1 shows the structure of our example data – it comprises six rows and four columns.
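As an optional side check (not part of the original tutorial), you can inspect the data types pandas inferred for this DataFrame before it is written to disk. This gives a baseline for later: a CSV file stores only text, so the types have to be inferred again, or specified via dtype, when the file is read back. The sketch below simply recreates the example data and prints data.dtypes.

```python
import pandas as pd

# Recreate the example DataFrame from above
data = pd.DataFrame({'x1': range(11, 17),
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})

# Show the data type pandas inferred for each column
print(data.dtypes)
```

The integer columns come out as an integer dtype, while the letter columns are typically stored with the generic object dtype holding Python strings.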

Let’s create a CSV file containing our pandas DataFrame:

data.to_csv('data.csv', index = False)      # Export pandas DataFrame to CSV

After executing the previous code, a new CSV file should appear in your current working directory. We’ll use this file as a basis for the following example.
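To see what the file actually contains, a minimal sketch: when to_csv is called without a path, it returns the CSV text as a string instead of writing a file. This assumes the same example data as above and is only meant to show that the file holds plain text, with no type information attached.

```python
import pandas as pd

data = pd.DataFrame({'x1': range(11, 17),
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})

# Without a path argument, to_csv returns the CSV content as a string
csv_text = data.to_csv(index=False)
print(csv_text)
# x1,x2,x3,x4
# 11,x,17,a
# 12,y,16,b
# 13,z,15,c
# 14,z,14,d
# 15,y,13,e
# 16,x,12,f
```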

Example: Set Data Type of Columns when Reading pandas DataFrame from CSV File

This example explains how to specify the data types of the columns of a pandas DataFrame when reading a CSV file into Python.

To accomplish this, we use the dtype argument of the read_csv function, as shown in the following Python code. The argument takes a dictionary that maps each column name to a data type; here we specify a type for every column in our data set:

data_import = pd.read_csv('data.csv',       # Import CSV file
                          dtype = {'x1': int, 'x2': str, 'x3': int, 'x4': str})

The previous Python syntax has imported our CSV file with manually specified column data types.
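Besides a dictionary, the dtype argument also accepts a single type, which pandas then applies to every column. A common use is dtype=str, which keeps all values as text (for instance to preserve leading zeros in ID codes). The sketch below assumes an in-memory copy of data.csv via io.StringIO so that it runs without the file; with the real file, pass 'data.csv' instead.

```python
import io
import pandas as pd

# In-memory stand-in for data.csv (same content as the exported file)
csv_text = (
    "x1,x2,x3,x4\n"
    "11,x,17,a\n"
    "12,y,16,b\n"
    "13,z,15,c\n"
    "14,z,14,d\n"
    "15,y,13,e\n"
    "16,x,12,f\n"
)

# A single type (instead of a dict) is applied to every column
data_str = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(data_str.dtypes)

# The numbers are now kept as strings rather than integers
print(data_str['x1'].tolist())  # ['11', '12', '13', '14', '15', '16']
```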

Let’s check the data types of all the columns in our imported DataFrame data_import:

print(data_import.dtypes)                   # Check column classes of imported data
# x1     int32
# x2    object
# x3     int32
# x4    object
# dtype: object

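As you can see, the variables x1 and x3 are stored as integers, and the variables x2 and x4 are stored as object columns holding strings. Note that the integer width depends on your setup: int maps to NumPy’s default integer type, which is 32-bit (int32) on Windows with NumPy versions before 2.0 and 64-bit (int64) on most other platforms. If you need a fixed width, specify 'int64' instead of int in the dtype dictionary.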
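One practical limit of the approach above: a NumPy integer type such as int or 'int64' cannot represent missing values, so read_csv raises a ValueError if a column declared as integer contains an empty cell. pandas offers the nullable 'Int64' extension type (capital I) and the 'string' type for this case. The small CSV below is a hypothetical example with one missing value in x1; it is not part of the tutorial’s data.

```python
import io
import pandas as pd

# Hypothetical CSV with a missing value in column x1 (second data row)
csv_text = "x1,x2\n11,x\n,y\n13,z\n"

# A NumPy integer dtype cannot hold missing values, so this raises ValueError
try:
    pd.read_csv(io.StringIO(csv_text), dtype={'x1': 'int64'})
    int64_failed = False
except ValueError as err:
    int64_failed = True
    print('int64 failed:', err)

# The nullable 'Int64' extension dtype stores the gap as <NA>
data_na = pd.read_csv(io.StringIO(csv_text),
                      dtype={'x1': 'Int64', 'x2': 'string'})
print(data_na.dtypes)
print(data_na)
```

The nullable types keep the integer and string semantics while still allowing missing entries, which makes them a reasonable choice when you are not sure every cell is filled.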

Video, Further Resources & Summary

Would you like to learn more about the specification of the data type for variables in a CSV file? Then you could have a look at the following video on my YouTube channel. In the video, I’m explaining the examples of this tutorial.

The YouTube video will be added soon.

In addition, you may want to have a look at the related Python tutorials on this website.
To summarize: In this Python tutorial you have learned how to specify the data types of columns when reading a CSV file into a pandas DataFrame. Please let me know in the comments section below in case you have any additional questions and/or comments on the pandas library or any other statistical topic.