Specify dtype when Reading pandas DataFrame from CSV File in Python (Example)

 

In this tutorial you’ll learn how to set the data types of columns when reading a CSV file into a pandas DataFrame in Python.


So now the part you have been waiting for – the example:

 

Example Data & Software Libraries

We first need to import the pandas library, to be able to use the corresponding functions:

import pandas as pd                         # Import pandas library

We use the following data as a basis for this Python programming tutorial:

data = pd.DataFrame({'x1':range(11, 17),    # Create pandas DataFrame
                     'x2':['x', 'y', 'z', 'z', 'y', 'x'],
                     'x3':range(17, 11, -1),
                     'x4':['a', 'b', 'c', 'd', 'e', 'f']})
print(data)                                 # Print pandas DataFrame

 

Table 1: Example pandas DataFrame

 

Table 1 shows the structure of our example data – it comprises six rows and four columns.

Let’s create a CSV file containing our pandas DataFrame:

data.to_csv('data.csv', index = False)      # Export pandas DataFrame to CSV

After executing the previous code, a new CSV file should appear in your current working directory. We’ll use this file as a basis for the following example.
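Before setting any types manually, it can help to look at what read_csv infers on its own. The following is a minimal sketch that reads the same six rows from an in-memory string via io.StringIO instead of data.csv, so it runs without touching the disk; the names csv_text and inferred are illustrative only.

```python
import io

import pandas as pd

# Same content that data.to_csv('data.csv', index = False) writes to disk
csv_text = """x1,x2,x3,x4
11,x,17,a
12,y,16,b
13,z,15,c
14,z,14,d
15,y,13,e
16,x,12,f
"""

# Without the dtype argument, pandas infers a type for every column itself
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)
```

On this small data set the inferred types are already sensible (integer columns are read as int64). The dtype argument matters most when you need something other than the inferred type, for instance keeping codes such as 007 as text, which would otherwise be read as the integer 7.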

 

Example: Set Data Type of Columns when Reading pandas DataFrame from CSV File

This example explains how to specify the data types of the columns of a pandas DataFrame when reading a CSV file into Python.

To accomplish this, we have to use the dtype argument of the read_csv function, as shown in the following Python code. The dtype argument takes a dictionary that maps each column name to a data type, and here we specify a type for every column in our data set:

data_import = pd.read_csv('data.csv',       # Import CSV file
                          dtype = {'x1': int, 'x2': str, 'x3': int, 'x4': str})

The previous Python syntax has imported our CSV file with manually specified column data types.

Let’s check the data types of all the columns in our new pandas DataFrame:

print(data_import.dtypes)                   # Check column classes of imported data
# x1     int32
# x2    object
# x3     int32
# x4    object
# dtype: object

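As you can see, the variables x1 and x3 are integers and the variables x2 and x4 are stored as strings with the object dtype. Note that Python’s int is translated into a NumPy integer type whose width depends on your platform and NumPy version: the int32 shown above is typical of Windows with older NumPy releases, while int64 is common elsewhere. Newer pandas versions may also display a dedicated string dtype instead of object for x2 and x4.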
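The dtype dictionary does not have to list every column: columns you leave out simply keep the type pandas infers. The sketch below, again using an in-memory copy of data.csv, fixes only two columns and uses pandas’ own dtype names instead of Python’s int and str. The choice of 'Int64' (a nullable integer) for x1 and 'category' for x2 is an illustrative assumption, not something the example above requires.

```python
import io

import pandas as pd

# In-memory copy of data.csv, so the sketch is self-contained
csv_text = """x1,x2,x3,x4
11,x,17,a
12,y,16,b
13,z,15,c
14,z,14,d
15,y,13,e
16,x,12,f
"""

# Only x1 and x2 get an explicit type; x3 and x4 are inferred as usual
data_partial = pd.read_csv(io.StringIO(csv_text),
                           dtype = {'x1': 'Int64',       # nullable integer
                                    'x2': 'category'})   # categorical labels
print(data_partial.dtypes)
print(data_partial['x2'].cat.categories.tolist())
```

A practical reason to prefer 'Int64' over int is missing data: a column read with dtype int cannot hold empty cells and read_csv raises a ValueError, whereas 'Int64' stores them as pd.NA.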

 

Video, Further Resources & Summary

Would you like to learn more about specifying the data types of variables in a CSV file? Then you could have a look at the following video on my YouTube channel, in which I explain the example of this tutorial.

 

 


 

To summarize: In this Python tutorial you have learned how to specify the data types of columns when reading a CSV file into a pandas DataFrame. Please let me know in the comments section below if you have any additional questions or comments on the pandas library or any other statistical topic.

 



2 Comments

  • Thanks for posting this. How do I specify the data type for only 1 column in a df using column number, not field name?

    • Hello Keith,

      If you’re reading the DataFrame from a CSV file, and you want to specify the datatype for only one column using its index, you have to do it in two steps because pandas’ read_csv function allows type specification only by column names, not by column indices. See the following please:

      import pandas as pd
       
      # Load the DataFrame
      df = pd.read_csv('Pizza.csv')
      df.head()
      #   brand     id   mois   prot    fat   ash  sodium  carb   cal
      # 0     A  14069  27.82  21.43  44.87  5.11    1.77  0.77  4.93
      # 1     A  14053  28.49  21.26  43.89  5.34    1.79  1.02  4.84
      # 2     A  14025  28.35  19.99  45.78  5.08    1.63  0.80  4.95
      # 3     A  14016  30.55  20.15  43.13  4.79    1.61  1.38  4.74
      # 4     A  14005  30.49  21.28  41.65  4.82    1.64  1.76  4.67
       
      # Get the column name using the column index, e.g., for the third column (index 2)
      column_name = df.columns[2]
       
      # Change the data type of the selected column, e.g., to 'int' (decimals are truncated, not rounded)
      df[column_name] = df[column_name].astype('int')
      df.head()
      #   brand     id  mois   prot    fat   ash  sodium  carb   cal
      # 0     A  14069    27  21.43  44.87  5.11    1.77  0.77  4.93
      # 1     A  14053    28  21.26  43.89  5.34    1.79  1.02  4.84
      # 2     A  14025    28  19.99  45.78  5.08    1.63  0.80  4.95
      # 3     A  14016    30  20.15  43.13  4.79    1.61  1.38  4.74
      # 4     A  14005    30  21.28  41.65  4.82    1.64  1.76  4.67

      Best,
      Cansu
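If you would rather set the type during the import itself, one possible workaround is to read only the header row first (nrows=0), look up the column name at the desired position, and pass that name to dtype. This is a minimal sketch: the in-memory CSV below reuses the first three columns of two rows from the output above instead of the actual Pizza.csv file, and column_position is a hypothetical variable name.

```python
import io

import pandas as pd

# Small stand-in for Pizza.csv (first three columns of two rows shown above)
csv_text = """brand,id,mois
A,14069,27.82
A,14053,28.49
"""

column_position = 1  # zero-based index of the column to type, here 'id'

# Step 1: read only the header to map the position to a column name
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
column_name = header.columns[column_position]

# Step 2: read the full data and set the dtype of that one column by name
df = pd.read_csv(io.StringIO(csv_text), dtype={column_name: str})
print(df.dtypes)
```

With a real file you would pass the same file path to both read_csv calls. Because columns not named in the dtype dictionary are inferred as usual, only the selected column changes.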
