Select Multiple Columns of Pandas DataFrame in Python (4 Examples)


In this Python article you’ll learn how to extract certain columns of a pandas DataFrame.

The article consists of four examples for the selection of DataFrame variables, following a short section that creates the example data.

Here’s how to do it:

pandas Library & Creation of Example Data

As the first step, we have to import the pandas library to Python:

import pandas as pd # Load pandas

Next, we can create an example pandas DataFrame by running the following Python syntax:

data = pd.DataFrame({'x1':range(1, 6),                        # Create example data
                     'x2':["a", "c", "e", "g", "i"],
                     'x3':range(10, 5, -1),
                     'x4':["a", "a", "b", "b", "a"],
                     'x5':range(10, 15)})
print(data)                                                   # Print example data
#    x1 x2  x3 x4  x5
# 0   1  a  10  a  10
# 1   2  c   9  a  11
# 2   3  e   8  b  12
# 3   4  g   7  b  13
# 4   5  i   6  a  14

As you can see based on the previous output, we have created a pandas DataFrame with five rows and five variables called x1, x2, x3, x4, and x5.
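To double-check the structure described above, the shape and the column labels can be inspected directly. The following is a minimal sketch that rebuilds the example data so that it runs on its own; the helper names n_rows, n_cols, and column_names are chosen for illustration only:

```python
import pandas as pd

# Rebuild the example data from this tutorial
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

n_rows, n_cols = data.shape          # Tuple of (number of rows, number of columns)
column_names = list(data.columns)    # Column labels as a plain Python list

print(n_rows, n_cols)
# 5 5
print(column_names)
# ['x1', 'x2', 'x3', 'x4', 'x5']
```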

In the following examples, I’ll explain how to select some of these variables and how to store them in a new data set.

Keep on reading!


Example 1: Extract DataFrame Columns Using Column Names & Square Brackets

This example shows how to use the names of our variables and square brackets to subset our pandas DataFrame.

Have a look at the following Python code:

data_new1 = data[['x1', 'x3', 'x5']]                          # Subset data
print(data_new1)                                              # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14

As you can see, we have created a new pandas DataFrame called data_new1 that contains only the variables x1, x3, and x5. The columns x2 and x4 have been dropped.
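Note the double square brackets in the code above: the inner pair creates a Python list of column names, and the outer pair does the subsetting. The sketch below rebuilds the example data and illustrates the difference between passing a single name and passing a list. The object names single and double are made up for this illustration:

```python
import pandas as pd

# Rebuild the example data from this tutorial
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

single = data['x1']      # One column name: returns a pandas Series
double = data[['x1']]    # List with one column name: returns a pandas DataFrame

print(type(single).__name__)
# Series
print(type(double).__name__)
# DataFrame
```

So whenever the result should stay a DataFrame, even with only one column, a list of names is the safer choice.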

Looks good!

However, the pandas library provides many alternative ways to select and remove DataFrame columns. In the following examples, I’ll show some of these alternatives!


Example 2: Extract DataFrame Columns Using Column Names & DataFrame Function

In this example, I’ll illustrate how to use the column names and the DataFrame() constructor function of the pandas library to get a new DataFrame with specific variables.

Check out the following syntax and its output:

data_new2 = pd.DataFrame(data, columns = ['x1', 'x3', 'x5'])  # Subset data
print(data_new2)                                              # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14

We have created another pandas DataFrame called data_new2, which contains exactly the same variables and values as the DataFrame that we have created in Example 1. However, this time we have used the DataFrame() constructor function.
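One practical difference between Examples 1 and 2 shows up when a requested column does not exist. The following sketch uses a column name 'x9', which is hypothetical and not part of our example data. In recent pandas versions, the square brackets are expected to raise a KeyError, whereas DataFrame() adds the missing column filled with NaN values:

```python
import pandas as pd

# Rebuild the example data from this tutorial
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

# Square brackets: a missing column name ('x9' is hypothetical) raises a KeyError
try:
    data[['x1', 'x9']]
    bracket_raised = False
except KeyError:
    bracket_raised = True

# DataFrame(): the missing column is created and filled with NaN
data_nan = pd.DataFrame(data, columns=['x1', 'x9'])

print(bracket_raised)
# True
print(data_nan['x9'].isna().all())
# True
```

If typos in column names should be caught early, the square bracket approach of Example 1 is therefore the stricter option.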


Example 3: Extract DataFrame Columns Using Indices & iloc Attribute

So far, we have subsetted our DataFrame using the names of our columns. However, it is also possible to use the column indices to select certain variables from a DataFrame.

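The following Python syntax demonstrates how to use the iloc indexer in combination with the column indices to retain only some variables of our input DataFrame. Note that the indices start at 0, and that the copy() method at the end ensures that the new DataFrame is independent of the original data: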

data_new3 = data.iloc[:, [0, 2, 4]].copy()                    # Subset data
print(data_new3)                                              # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14

Again, we have created the same output as in the previous examples.
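Because the selected positions 0, 2, and 4 are evenly spaced, the same columns can also be picked with a slice instead of a list. The sketch below rebuilds the example data and compares both variants; the names data_list and data_slice are used for illustration only. It also shows that changing the copy leaves the original data untouched:

```python
import pandas as pd

# Rebuild the example data from this tutorial
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

data_list = data.iloc[:, [0, 2, 4]].copy()   # Positions given as a list
data_slice = data.iloc[:, 0:5:2].copy()      # Slice: start 0, stop 5 (exclusive), step 2

print(data_slice.equals(data_list))
# True

data_slice.loc[0, 'x1'] = 100                # Modify the copy only
print(data.loc[0, 'x1'])                     # Original value is unchanged
# 1
```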


Example 4: Extract DataFrame Columns Using Indices & columns Attribute

In Example 4, I’ll illustrate another way to use column indices to keep only particular columns.

More precisely, we are using the columns attribute to convert the indices to column names, and then we subset the DataFrame by these names:

data_new4 = data[data.columns[[0, 2, 4]]]                     # Subset data
print(data_new4)                                              # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14

Even though we have used different code, the output is again the same as in the previous examples. As you have seen, there are many alternatives when we want to remove unnecessary variables from a DataFrame.
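To make the intermediate step of Example 4 visible, and to confirm that all four approaches really return the same result, the following sketch rebuilds the example data and compares the outputs with the equals() method. The names selected and all_equal are made up for this illustration:

```python
import pandas as pd

# Rebuild the example data from this tutorial
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

# Intermediate step of Example 4: positions are translated to column names
selected = data.columns[[0, 2, 4]]
print(list(selected))
# ['x1', 'x3', 'x5']

# The four approaches of this tutorial
data_new1 = data[['x1', 'x3', 'x5']]
data_new2 = pd.DataFrame(data, columns=['x1', 'x3', 'x5'])
data_new3 = data.iloc[:, [0, 2, 4]].copy()
data_new4 = data[selected]

# Compare every alternative with the result of Example 1
all_equal = all(data_new1.equals(other)
                for other in (data_new2, data_new3, data_new4))
print(all_equal)
# True
```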


In this Python tutorial you have learned how to subset a DataFrame. In case you have any further questions, let me know in the comments.

