Replace NaN Values by Column Mean in Python (Example)

 

In this tutorial, I’ll explain how to replace NaN values with the mean of a pandas DataFrame column in the Python programming language.

Let’s just jump right in…

 

Example Data & Libraries

First, we need to load the pandas library:

import pandas as pd                                     # Load pandas library

In addition, consider the following example data.

data = pd.DataFrame({'x1':[1, 2, float('NaN'), 3, 4],  # Create example DataFrame
                     'x2':[2, float('NaN'), 5, float('NaN'), 3],
                     'x3':[float('NaN'), float('NaN'), 3, 2, 1]})
print(data)                                            # Print example DataFrame

 

Table 1: Example pandas DataFrame with the columns x1, x2, and x3.

 

As you can see in Table 1, our example data is a DataFrame consisting of five rows and three columns.

All the variables in our data contain at least one missing value.
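To check this before imputing anything, the number of missing values per column can be counted. The following is a minimal sketch that rebuilds the example data from above; the name na_counts is only chosen for illustration and is not part of the original tutorial.

```python
import pandas as pd                                     # Load pandas library

data = pd.DataFrame({'x1':[1, 2, float('NaN'), 3, 4],  # Recreate example DataFrame
                     'x2':[2, float('NaN'), 5, float('NaN'), 3],
                     'x3':[float('NaN'), float('NaN'), 3, 2, 1]})

print(data.shape)                                      # Number of rows and columns

na_counts = data.isna().sum()                          # Count NaN values per column
print(na_counts)                                       # Print NaN counts
```

For this data, x1 contains one NaN value, while x2 and x3 contain two NaN values each.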

 

Example: Impute Missing Values by Column Mean Using fillna() & mean() Functions

In this example, I’ll explain how to replace the NaN values in each column of a pandas DataFrame with the mean of that column.

Have a look at the following Python code:

data_new = data.copy()                                 # Create copy of DataFrame
data_new = data_new.fillna(data_new.mean())            # Mean imputation
print(data_new)                                        # Print updated DataFrame

 

Table 2: Example DataFrame after replacing the NaN values by the column means.

 

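As shown in Table 2, the previous Python syntax has created a new pandas DataFrame where the missing values have been replaced by the mean of the corresponding column (2.5 for x1, approximately 3.33 for x2, and 2.0 for x3). Note that the mean() function skips NaN values by default, so each mean is computed from the observed values only.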
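The same idea can be applied to a single column, or to a DataFrame that also contains non-numeric columns. The code below is a sketch under some assumptions: the names col_means, data_x1, and data_mixed as well as the text column group are hypothetical and only added for illustration. The numeric_only argument of mean() restricts the calculation to numeric columns, so the text column is left untouched by fillna().

```python
import pandas as pd                                     # Load pandas library

data = pd.DataFrame({'x1':[1, 2, float('NaN'), 3, 4],  # Recreate example DataFrame
                     'x2':[2, float('NaN'), 5, float('NaN'), 3],
                     'x3':[float('NaN'), float('NaN'), 3, 2, 1]})

col_means = data.mean()                                # Column means (NaN values are skipped)
print(col_means)                                       # Print column means

data_x1 = data.copy()                                  # Create copy of DataFrame
data_x1['x1'] = data_x1['x1'].fillna(data_x1['x1'].mean())  # Impute only column x1
print(data_x1)                                         # x2 and x3 still contain NaN

data_mixed = data.copy()                               # Create another copy
data_mixed['group'] = ['a', 'b', 'a', 'b', 'a']        # Hypothetical text column
data_mixed = data_mixed.fillna(data_mixed.mean(numeric_only = True))  # Impute numeric columns only
print(data_mixed)                                      # Print updated DataFrame
```

Imputing one column at a time can be useful when only some variables should be filled by their mean, for instance because other columns call for a different imputation method.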

 

Summary


In summary: In this Python tutorial, you have learned how to replace NaN values with the mean of a pandas DataFrame column using the fillna() and mean() functions. In case you have any further comments or questions on mean imputation of missing data, let me know in the comments.

 
