Select Multiple Columns of Pandas DataFrame in Python (4 Examples)
In this Python article you’ll learn how to extract certain columns of a pandas DataFrame.
The article consists of four examples for the selection of DataFrame variables, preceded by the creation of some example data and followed by further resources on the topic.
Here’s how to do it:
pandas Library & Creation of Example Data
As the first step, we have to import the pandas library to Python:
import pandas as pd # Load pandas
Next, we can create an example pandas DataFrame by running the following Python syntax:
data = pd.DataFrame({'x1':range(1, 6),                 # Create example data
                     'x2':["a", "c", "e", "g", "i"],
                     'x3':range(10, 5, - 1),
                     'x4':["a", "a", "b", "b", "a"],
                     'x5':range(10, 15)})
print(data)                                            # Print example data
#    x1 x2  x3 x4  x5
# 0   1  a  10  a  10
# 1   2  c   9  a  11
# 2   3  e   8  b  12
# 3   4  g   7  b  13
# 4   5  i   6  a  14
As you can see based on the previous output, we have created a pandas DataFrame with five rows and five variables called x1, x2, x3, x4, and x5.
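If you want to confirm this structure programmatically, a quick check along the following lines may help. This is only a minimal sketch: it rebuilds the example data so that it runs on its own, and the helper names n_rows, n_cols, and col_names are introduced here purely for illustration.

```python
import pandas as pd

# Rebuild the example data from above so the snippet is self-contained
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

n_rows, n_cols = data.shape        # shape is (number of rows, number of columns)
col_names = list(data.columns)     # Column names in their original order

print(n_rows, n_cols)              # 5 5
print(col_names)                   # ['x1', 'x2', 'x3', 'x4', 'x5']
```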
In the following examples, I’ll explain how to select some of these variables and how to store them in a new data set.
Keep on reading!
Example 1: Extract DataFrame Columns Using Column Names & Square Brackets
This example shows how to use the names of our variables and square brackets to subset our pandas DataFrame.
Have a look at the following Python code:
data_new1 = data[['x1', 'x3', 'x5']]                   # Subset data
print(data_new1)                                       # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14
As you can see, we have created a new pandas DataFrame called data_new1 that contains only the variables x1, x3, and x5. The columns x2 and x4 have been dropped.
Looks good!
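One detail of the square bracket syntax is worth a small side note: the inner brackets in Example 1 form a list of column names. The following sketch (the object names one_col_series and one_col_frame are made up for illustration) shows that a single name in single brackets returns a pandas Series, while a list with one name still returns a DataFrame.

```python
import pandas as pd

# Reduced version of the example data, so the snippet runs on its own
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1)})

one_col_series = data['x1']            # Single brackets: pandas Series
one_col_frame = data[['x1']]           # List of names: one-column DataFrame

print(type(one_col_series).__name__)   # Series
print(type(one_col_frame).__name__)    # DataFrame
print(one_col_frame.shape)             # (5, 1)
```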
However, the pandas library provides many alternative ways to select and remove DataFrame columns. In the following examples, I’ll show some of these alternatives!
Example 2: Extract DataFrame Columns Using Column Names & DataFrame Function
In this example, I’ll illustrate how to use the column names and the DataFrame() function of the pandas library to get a new DataFrame with specific variables.
Check out the following syntax and its output:
data_new2 = pd.DataFrame(data, columns = ['x1', 'x3', 'x5'])  # Subset data
print(data_new2)                                       # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14
We have created another pandas DataFrame called data_new2, which contains exactly the same variables and values as the DataFrame that we have created in Example 1. However, this time we have used the DataFrame() function.
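The two approaches are not identical in every situation, though. The sketch below uses 'x9' as a hypothetical column name that does not exist in our data. As far as I know, recent pandas versions raise a KeyError when such a name is used in square brackets, whereas the DataFrame() function returns a column filled with missing values instead. Behavior may differ in older pandas versions, so please treat this as an illustration rather than a guarantee.

```python
import pandas as pd

# Reduced version of the example data, so the snippet runs on its own
data = pd.DataFrame({'x1': range(1, 6),
                     'x3': range(10, 5, -1),
                     'x5': range(10, 15)})

# 'x9' is a hypothetical name that is not a column of data
data_missing = pd.DataFrame(data, columns=['x1', 'x9'])
all_missing = bool(data_missing['x9'].isna().all())    # x9 contains only NaN

try:
    data[['x1', 'x9']]                                 # Same names, square brackets
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(all_missing)                                     # True
print(raised_key_error)                                # True
```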
Example 3: Extract DataFrame Columns Using Indices & iloc Attribute
So far, we have subsetted our DataFrame using the names of our columns. However, it is also possible to use the column indices to select certain variables from a DataFrame.
The following Python syntax demonstrates how to use the iloc indexer in combination with the column indices to retain only some variables of our input DataFrame. Note that Python starts counting at zero, so the indices 0, 2, and 4 refer to the first, third, and fifth column:
data_new3 = data.iloc[:, [0, 2, 4]].copy()             # Subset data
print(data_new3)                                       # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14
Again, we have created the same output as in the previous examples.
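Besides a list of positions, iloc also accepts slices. The following sketch is an addition to the original examples (the names data_first3 and data_step are made up for illustration): it selects a contiguous range of columns and, with a step of 2, every second column.

```python
import pandas as pd

# Rebuild the example data so the snippet is self-contained
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

data_first3 = data.iloc[:, 0:3].copy()   # Columns at positions 0, 1, 2 (end is excluded)
data_step = data.iloc[:, ::2].copy()     # Every second column: positions 0, 2, 4

print(list(data_first3.columns))         # ['x1', 'x2', 'x3']
print(list(data_step.columns))           # ['x1', 'x3', 'x5']
```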
Example 4: Extract DataFrame Columns Using Indices & columns Attribute
In Example 4, I’ll illustrate another way to use column indices to keep only particular columns.
More precisely, we are using the columns attribute to look up the names of the columns at certain positions, and then we are passing these names to the square brackets:
data_new4 = data[data.columns[[0, 2, 4]]]              # Subset data
print(data_new4)                                       # Print new data
#    x1  x3  x5
# 0   1  10  10
# 1   2   9  11
# 2   3   8  12
# 3   4   7  13
# 4   5   6  14
Even though we have used different code, the output is again the same as in the previous examples. So as you have seen, we have many alternatives when we want to remove unnecessary variables from a pandas DataFrame.
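To make the intermediate step of Example 4 visible, and to show one way of removing columns directly, consider the following sketch. It is not part of the four examples above: the names selected_names and data_new5 are introduced here for illustration only, and the drop() method is used as one possible alternative.

```python
import pandas as pd

# Rebuild the example data so the snippet is self-contained
data = pd.DataFrame({'x1': range(1, 6),
                     'x2': ["a", "c", "e", "g", "i"],
                     'x3': range(10, 5, -1),
                     'x4': ["a", "a", "b", "b", "a"],
                     'x5': range(10, 15)})

selected_names = data.columns[[0, 2, 4]]        # Column names at positions 0, 2, 4
print(list(selected_names))                     # ['x1', 'x3', 'x5']

data_new5 = data.drop(columns=['x2', 'x4'])     # Remove columns instead of selecting
print(data_new5.equals(data[selected_names]))   # True
```

Dropping can be more convenient when only a few columns should be removed from a wide DataFrame.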
Video & Further Resources on the Topic
Any questions left? I have recently released a video on my YouTube channel, which shows the Python syntax of this article. You can find the video below:
Also have a look at the following video, which was published by Corey Schafer on his YouTube channel. In the video, he illustrates some examples of how to select rows and columns of a pandas DataFrame.
In addition to the videos, you may want to read some of the related articles on my website:
- Select Rows of pandas DataFrame by Index in Python
- Extract Top & Bottom N Rows from pandas DataFrame
- pandas DataFrames Operations in Python
- Modify & Edit pandas DataFrames in Python
- Python Programming Overview
In this Python tutorial you have learned how to subset a DataFrame. In case you have any further questions, let me know in the comments.