Specify dtype when Reading pandas DataFrame from CSV File in Python (Example)
In this tutorial, you’ll learn how to set the data type of columns when reading a CSV file into a pandas DataFrame in Python.
So now the part you have been waiting for – the example:
Example Data & Software Libraries
We first need to import the pandas library, to be able to use the corresponding functions:
import pandas as pd # Import pandas library
We use the following data as a basis for this Python programming tutorial:
data = pd.DataFrame({'x1': range(11, 17),    # Create pandas DataFrame
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})
print(data)                                  # Print pandas DataFrame
Table 1 shows the structure of our example data: it comprises six rows and four columns.
Let’s create a CSV file containing our pandas DataFrame:
data.to_csv('data.csv', index = False) # Export pandas DataFrame to CSV
After executing the previous code, a new CSV file should appear in your current working directory. We’ll use this file as a basis for the following example.
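To see what pandas actually wrote, it can help to look at the raw CSV text. The following is a minimal sketch, assuming the same example data as above. Calling to_csv without a file path returns the CSV content as a string instead of writing a file; the names csv_text and csv_lines are only used for illustration.

```python
import pandas as pd

# Same example data as in the tutorial
data = pd.DataFrame({'x1': range(11, 17),
                     'x2': ['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3': range(17, 11, -1),
                     'x4': ['a', 'b', 'c', 'd', 'e', 'f']})

# No path given: to_csv returns the CSV content as a string
csv_text = data.to_csv(index=False)

# Split into lines so the header and the first data row can be inspected
csv_lines = csv_text.splitlines()
print(csv_lines[0])  # header row with the column names
print(csv_lines[1])  # first data row
```

A CSV file stores only text and carries no type information, which is why the data types have to be inferred or specified when the file is read back in.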
Example: Set Data Type of Columns when Reading pandas DataFrame from CSV File
This example explains how to specify the data type of the columns of a pandas DataFrame when reading a CSV file into Python.
To accomplish this, we have to use the dtype argument within the read_csv function as shown in the following Python code. The argument takes a dictionary that maps column names to data types. As you can see, we are specifying a data type for each of the columns in our data set:
data_import = pd.read_csv('data.csv',        # Import CSV file
                          dtype={'x1': int, 'x2': str, 'x3': int, 'x4': str})
The previous Python syntax has imported our CSV file with manually specified data types.
Let’s check the data types of all the columns in our new pandas DataFrame:
print(data_import.dtypes)                    # Check data types of imported data
# x1     int32
# x2    object
# x3     int32
# x4    object
# dtype: object
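As you can see, the variables x1 and x3 are integers, and the variables x2 and x4 are strings, which pandas reports as the object data type here. Note that the exact integer width (int32 or int64) depends on your platform and NumPy version, so your output may show int64 instead.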
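For comparison, the sketch below shows what happens when the dtype argument is left out, when it covers only some columns, and when a single type is applied to all columns. It is a minimal illustration rather than part of the original example: it reads a shortened copy of the data from an in-memory string (via io.StringIO) instead of data.csv, and the names inferred, partial, and all_text are only used here.

```python
import io
import pandas as pd

# Shortened copy of the example data, kept in memory instead of data.csv
csv_text = "x1,x2,x3,x4\n11,x,17,a\n12,y,16,b\n13,z,15,c\n"

# No dtype argument: pandas infers a data type for each column
inferred = pd.read_csv(io.StringIO(csv_text))

# Partial dictionary: only x1 is forced to float, the other columns are inferred
partial = pd.read_csv(io.StringIO(csv_text), dtype={'x1': float})

# Single type for every column: all values are kept as text
all_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred.dtypes)
print(partial.dtypes)
print(all_text.dtypes)
```

Reading everything as str can be useful when values such as ID codes with leading zeros should not be converted to numbers.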
Video, Further Resources & Summary
Would you like to learn more about the specification of the data type for variables in a CSV file? Then you could have a look at the following video on my YouTube channel. In the video, I’m explaining the examples of this tutorial.
The YouTube video will be added soon.
In addition, you may want to have a look at the related Python tutorials on this website. I have published numerous tutorials already:
- Read Only Certain Columns of CSV File as pandas DataFrame
- Set Column Names when Reading CSV as pandas DataFrame
- Load CSV File as pandas DataFrame in Python
- Set Index of pandas DataFrame in Python
- Insert Row at Specific Position of pandas DataFrame in Python
- Reverse pandas DataFrame in Python
- Check Data Type of Columns in pandas DataFrame in Python
- pandas Library Tutorial in Python
- Python Programming Examples
To summarize: In this Python tutorial, you have learned how to specify the data type of columns when reading a CSV file into a pandas DataFrame. Please let me know in the comments section below if you have any additional questions or comments on the pandas library or any other statistical topic.