Specify dtype when Reading pandas DataFrame from CSV File in Python (Example)
In this tutorial you’ll learn how to set the data types of columns when reading a CSV file into a pandas DataFrame in Python.
So now the part you have been waiting for – the example:
Example Data & Software Libraries
We first need to import the pandas library, to be able to use the corresponding functions:
import pandas as pd # Import pandas library
We use the following data as a basis for this Python programming tutorial:
data = pd.DataFrame({'x1':range(11, 17),                   # Create pandas DataFrame
                     'x2':['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3':range(17, 11, - 1),
                     'x4':['a', 'b', 'c', 'd', 'e', 'f']})
print(data)                                                # Print pandas DataFrame
Table 1 shows the structure of our example data: it comprises six rows and four columns.
Let’s create a CSV file containing our pandas DataFrame:
data.to_csv('data.csv', index = False) # Export pandas DataFrame to CSV
After executing the previous code, a new CSV file should appear in your current working directory. We’ll use this file as a basis for the following example.
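Before setting any types by hand, it can help to see what pandas infers on its own. The following is a minimal sketch that recreates the example data and CSV file so that it runs by itself; the variable name data_default is chosen here for illustration only:

```python
import pandas as pd

# Recreate the example data and CSV file so this block runs on its own
data = pd.DataFrame({'x1': range(11, 17),
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})
data.to_csv('data.csv', index=False)

# Read the file without the dtype argument: pandas infers each column's type
data_default = pd.read_csv('data.csv')
print(data_default.dtypes)
```

The numeric columns x1 and x3 should be inferred as integers and the text columns as strings. The exact dtype names printed can differ between platforms and pandas versions.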
Example: Set Data Type of Columns when Reading pandas DataFrame from CSV File
This example explains how to specify the data types of the columns of a pandas DataFrame when reading a CSV file into Python.
To accomplish this, we have to use the dtype argument within the read_csv function as shown in the following Python code. As you can see, we are passing a dictionary that maps each column name in our data set to its data type:
data_import = pd.read_csv('data.csv',                      # Import CSV file
                          dtype = {'x1': int, 'x2': str, 'x3': int, 'x4': str})
The previous Python syntax has imported our CSV file with manually specified column data types.
Let’s check the data types of all the columns in our new pandas DataFrame:
print(data_import.dtypes)                                  # Check column classes of imported data
# x1     int32
# x2    object
# x3     int32
# x4    object
# dtype: object
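As you can see, the variables x1 and x3 are integers, and the variables x2 and x4 are stored with the object dtype, which pandas uses here for strings. Note that the exact integer width (int32 or int64) depends on your platform and library versions.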
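The dtype argument does not have to list every column. As a hedged sketch (the variable names data_partial and data_all_str are illustrative and not part of the example above), you can pass a dictionary for only some columns, or a single type that is applied to all columns:

```python
import pandas as pd

# Recreate the example data and CSV file so this block runs on its own
data = pd.DataFrame({'x1': range(11, 17),
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})
data.to_csv('data.csv', index=False)

# Specify a type for x1 only; the remaining columns are inferred as usual
data_partial = pd.read_csv('data.csv', dtype={'x1': float})
print(data_partial.dtypes)

# Pass a single type to apply it to every column
data_all_str = pd.read_csv('data.csv', dtype=str)
print(data_all_str['x1'].tolist())  # ['11', '12', '13', '14', '15', '16']
```

Reading everything as str can be handy when values such as IDs or ZIP codes must keep their exact text form, for instance leading zeros.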
Video, Further Resources & Summary
Would you like to learn more about the specification of the data type for variables in a CSV file? Then you could have a look at the following video on my YouTube channel. In the video, I’m explaining the examples of this tutorial.
In addition, you may want to have a look at the related Python tutorials on this website. I have published numerous tutorials already:
- Read Only Certain Columns of CSV File as pandas DataFrame
- Set Column Names when Reading CSV as pandas DataFrame
- Load CSV File as pandas DataFrame in Python
- Set Index of pandas DataFrame in Python
- Insert Row at Specific Position of pandas DataFrame in Python
- Reverse pandas DataFrame in Python
- Check Data Type of Columns in pandas DataFrame in Python
- pandas Library Tutorial in Python
- Python Programming Examples
To summarize: In this Python tutorial you have learned how to specify the data type for columns in a CSV file. Please let me know in the comments section below, in case you have any additional questions and/or comments on the pandas library or any other statistical topic.
2 Comments
Thanks for posting this. How do I specify the data type for only 1 column in a df using column number, not field name?
Hello Keith,
If you’re reading the DataFrame from a CSV file, and you want to specify the datatype for only one column using its index, you have to do it in two steps because pandas’ read_csv function allows type specification only by column names, not by column indices. See the following please:
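A minimal sketch of such a two-step approach, assuming the data.csv file from the tutorial. The names col_position, col_names, target_name, and data_by_position are illustrative and not taken from the original reply:

```python
import pandas as pd

# Recreate the example data and CSV file so this block runs on its own
data = pd.DataFrame({'x1': range(11, 17),
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})
data.to_csv('data.csv', index=False)

col_position = 0  # Hypothetical: zero-based position of the column to type

# Step 1: read only the header row to look up the column name at that position
col_names = pd.read_csv('data.csv', nrows=0).columns
target_name = col_names[col_position]

# Step 2: read the full file, passing the looked-up name to dtype
data_by_position = pd.read_csv('data.csv', dtype={target_name: str})
print(data_by_position.dtypes)
```

Another two-step route would be to read the file with default types first and then convert the column selected by position with astype.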
Best,
Cansu