Drop Rows with Blank Values from pandas DataFrame in Python (3 Examples)

 

In this tutorial, I’ll illustrate how to remove rows with empty cells from a pandas DataFrame in Python.

Let’s just jump right in!

 

Example Data & Add-On Libraries

First, we have to load the pandas library.

import pandas as pd                                   # Load pandas

Next, we’ll also need to construct some data that we can use in the following example code.

data = pd.DataFrame({'x1':[1, '', '   ', 2, 3, 4],    # Create example DataFrame
                     'x2':['', '', 'a', 'b', 'c', 'd'],
                     'x3':['    ', 'a', 'b', 'c', 'd', '']})
print(data)                                           # Print example DataFrame
#     x1 x2    x3
# 0    1         
# 1             a
# 2       a     b
# 3    2  b     c
# 4    3  c     d
# 5    4  d

As you can see based on the previous Python console output, our example data has six rows and three columns. Note that some of the cells are empty strings, while others contain only whitespace characters. In the printed output, both kinds of cells look blank.
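Because empty and whitespace-only cells are hard to tell apart in the printed output, it can help to count them explicitly. The following is a minimal sketch of one possible check; the name is_blank is only chosen for illustration and is not part of the tutorial code.

```python
import pandas as pd

# Same example data as above
data = pd.DataFrame({'x1': [1, '', '   ', 2, 3, 4],
                     'x2': ['', '', 'a', 'b', 'c', 'd'],
                     'x3': ['    ', 'a', 'b', 'c', 'd', '']})

# Convert each column to strings, strip surrounding whitespace,
# and flag the cells that end up as empty strings
is_blank = data.apply(lambda col: col.astype(str).str.strip() == '')

print(is_blank.sum())                                 # Count blank cells per column
# x1    2
# x2    2
# x3    2
# dtype: int64
```

Each column contains two blank cells, counting both empty strings and whitespace-only strings.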

 

Example 1: Replace Blank Cells by NaN in pandas DataFrame Using replace() Function

In Example 1, I’ll show how to replace blank values by NaN in a pandas DataFrame.

This step is needed in preparation for the removal of rows with blank values.

Have a look at the following Python syntax:

data_new1 = data.copy()                               # Create duplicate of data
data_new1 = data_new1.replace(r'^\s*$', float('NaN'), regex = True)  # Replace blanks by NaN
print(data_new1)                                      # Print updated data
#     x1   x2   x3
# 0  1.0  NaN  NaN
# 1  NaN  NaN    a
# 2  NaN    a    b
# 3  2.0    b    c
# 4  3.0    c    d
# 5  4.0    d  NaN

As you can see, we have replaced both the empty strings and the whitespace-only strings in our data with NaN.
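To verify the replacement, we can count the NaN values per column. The sketch below assumes the regular expression ^\s*$, which matches strings that are empty or consist only of whitespace. The names data_nan and x1_numeric are only used for illustration. Depending on your pandas version, the x1 column may stay an object column after the replacement, so the last step converts it to a numeric column explicitly.

```python
import pandas as pd

# Same example data as above
data = pd.DataFrame({'x1': [1, '', '   ', 2, 3, 4],
                     'x2': ['', '', 'a', 'b', 'c', 'd'],
                     'x3': ['    ', 'a', 'b', 'c', 'd', '']})

# ^ and $ anchor the pattern to the whole string, \s* allows
# zero or more whitespace characters in between
data_nan = data.replace(r'^\s*$', float('NaN'), regex=True)

print(data_nan.isna().sum())                          # Count NaN values per column

# Optional: make sure that x1 is stored as a numeric column
x1_numeric = pd.to_numeric(data_nan['x1'])
print(x1_numeric.dtype)
```

The NaN counts should correspond to the number of blank cells in each column of the original data.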

Let’s continue!

 

Example 2: Remove Rows with Blank / NaN Values in Any Column of pandas DataFrame

In Example 2, I’ll explain how to drop all rows with an NaN (originally blank) value in any of our DataFrame variables.

For this, we can apply the dropna function to the DataFrame in which we have converted the blank values to NaN, as shown in the following Python code:

data_new2 = data_new1.copy()                          # Create duplicate of data
data_new2.dropna(inplace = True)                      # Remove rows with NaN
print(data_new2)                                      # Print updated data
#     x1 x2 x3
# 3  2.0  b  c
# 4  3.0  c  d

The previous output of the Python console shows that we have created a DataFrame subset of those rows that are complete in all columns.
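As an alternative to inplace = True, dropna can also return a new DataFrame. The following sketch chains the replacement and the removal in a single expression and resets the row index afterwards. This is just one possible variant, and the name data_complete is only chosen for illustration.

```python
import pandas as pd

# Same example data as above
data = pd.DataFrame({'x1': [1, '', '   ', 2, 3, 4],
                     'x2': ['', '', 'a', 'b', 'c', 'd'],
                     'x3': ['    ', 'a', 'b', 'c', 'd', '']})

data_complete = (
    data.replace(r'^\s*$', float('NaN'), regex=True)  # Blanks to NaN
        .dropna()                                     # Drop rows with any NaN
        .reset_index(drop=True)                       # Renumber rows from 0
)

print(data_complete.shape)                            # Print number of rows and columns
# (2, 3)
```

Only the two complete rows remain. After resetting the index, they are labeled 0 and 1 instead of 3 and 4.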

 

Example 3: Remove Rows with Blank / NaN Value in One Particular Column of pandas DataFrame

Example 3 demonstrates how to delete rows that have an NaN (originally blank) value in only one specific column of our data set.

For this task, we have to use the subset argument within the dropna function:

data_new3 = data_new1.copy()                          # Create duplicate of data
data_new3.dropna(subset = ['x1'], inplace = True)     # Remove rows with NaN in x1
print(data_new3)                                      # Print updated data
#     x1   x2   x3
# 0  1.0  NaN  NaN
# 3  2.0    b    c
# 4  3.0    c    d
# 5  4.0    d  NaN

As you can see, the previous Python code has removed only those rows where the variable x1 contained an NaN value.
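The subset argument also accepts several column names, and it can be combined with the how argument of dropna. The following sketch illustrates both options. The names data_nan, any_missing, and all_missing are only used for illustration.

```python
import pandas as pd

# Same example data as above, with blanks converted to NaN
data = pd.DataFrame({'x1': [1, '', '   ', 2, 3, 4],
                     'x2': ['', '', 'a', 'b', 'c', 'd'],
                     'x3': ['    ', 'a', 'b', 'c', 'd', '']})
data_nan = data.replace(r'^\s*$', float('NaN'), regex=True)

# Drop rows where x1 or x2 is NaN (how = 'any' is the default)
any_missing = data_nan.dropna(subset=['x1', 'x2'])
print(any_missing.index.tolist())
# [3, 4, 5]

# Drop rows only if x1 and x2 are both NaN
all_missing = data_nan.dropna(subset=['x1', 'x2'], how='all')
print(all_missing.index.tolist())
# [0, 2, 3, 4, 5]
```

With how = 'all', only row 1 is removed, because it is the only row in which both x1 and x2 were blank.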

 

Further Resources & Summary

You can find further tutorials on https://statisticsglobe.com/.

 

Summary: In this article, you have learned how to drop rows with blank character strings from a pandas DataFrame in Python.

 
