Draw Autoplot of PCA in Python (Example)

 

On this page you’ll learn how to create an autoplot of a Principal Component Analysis (PCA) in the Python programming language.

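The table of contents is structured as follows:

1) Example Data and Add-On Libraries
2) Scale the Data and Perform the PCA
3) Create the Autoplot of the PCA
4) Video, Further Resources & Summary

Let’s get started!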

 

Example Data and Add-On Libraries

To draw an autoplot of a PCA in Python, we need a few libraries for data handling, scaling, fitting the PCA, and visualization. Please load them before we start:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
from sklearn.decomposition import PCA
from sklearn.datasets import load_wine

Now, it’s time to prepare our data. For this tutorial, we will use the Wine data set that ships with the scikit-learn library. To import it, we will use the load_wine() function from sklearn.datasets:

wine = load_wine()
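
Before converting anything, it can help to look at what load_wine() returns. The following is a minimal sketch, assuming a standard scikit-learn installation; the variable names n_samples and n_features are our own and not part of the library:

```python
from sklearn.datasets import load_wine

wine = load_wine()

# The returned object behaves like a dictionary with attribute access.
n_samples, n_features = wine.data.shape
print(n_samples, n_features)      # number of rows and columns
print(list(wine.target_names))    # names of the wine classes
print(wine.feature_names[:3])     # first three feature names
```

The data, target and feature_names attributes printed here are the ones used in the rest of this tutorial.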

After this, we will convert it into a pandas DataFrame, so we can run our PCA. We will also store the targets, which hold the class label of each observation:

DF_data = pd.DataFrame(wine.data,
                       columns = wine.feature_names)
 
targets = pd.Series(wine.target, 
                       name = "Class")
 
DF_data.iloc[:, 0:3].head(6)

Wine DataFrame

Our data set has 178 rows and 13 columns. Above we can see the first 6 rows of the first 3 columns, which we selected with the .iloc[] indexer and the .head() method.
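
The dimensions mentioned above can be checked directly. Here is a small sketch that rebuilds the DataFrame and also counts the observations per class; the name class_counts is our own:

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
DF_data = pd.DataFrame(wine.data, columns=wine.feature_names)
targets = pd.Series(wine.target, name="Class")

print(DF_data.shape)    # (rows, columns)

# Number of observations in each class, ordered by class label.
class_counts = targets.value_counts().sort_index()
print(class_counts)
```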

Now, we can work on the PCA.

 

Scale the Data and Perform the PCA

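Before performing the PCA, we should scale our data, because the features are measured on different scales and PCA is sensitive to the variance of each column. For this, we will use the StandardScaler class: its fit_transform() method centers every column at mean 0 and rescales it to unit variance. We wrap the result in a DataFrame again to keep the index and column names: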

DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data), 
                           index = DF_data.index,
                           columns = DF_data.columns)
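
To confirm that the scaling did what we expect, we can check the column means and standard deviations. This is only a sanity-check sketch; the names col_means and col_stds are our own:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
DF_data = pd.DataFrame(wine.data, columns=wine.feature_names)
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data),
                           index=DF_data.index,
                           columns=DF_data.columns)

# StandardScaler divides by the population standard deviation (ddof=0),
# whereas pandas uses ddof=1 by default, so we set ddof=0 explicitly.
col_means = DF_standard.mean()
col_stds = DF_standard.std(ddof=0)

print(col_means.abs().max())            # largest deviation of a mean from 0
print(col_stds.min(), col_stds.max())   # range of the standard deviations
```

The means should be zero and the standard deviations one, up to floating-point rounding.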

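Now, we can define the number of components. We store the number of features in m, so that the PCA computes all components, and we set K to the number of components we want to keep. In our case, two components: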

m = DF_standard.shape[1]
K = 2

Then we use them to perform the PCA on our DataFrame. The fit_transform() method fits the model and returns the component scores in one step; afterwards, we keep only the first K columns:

Mod_PCA = decomposition.PCA(n_components=m)
wine_PCA = pd.DataFrame(Mod_PCA.fit_transform(DF_standard), 
                      columns=["PC%d" % k for k in range(1,m + 1)]).iloc[:,:K]
 
wine_PCA.head(6)

Wine PCA

Above we can see the first 6 rows of our PCA DataFrame. Now, let’s see what it looks like in an autoplot.
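
As a side note, fitting all m components and slicing off the first K should give the same scores as asking for only K components. The sketch below checks this and also prints how much variance the retained components explain. It is an illustration under our own assumptions: we fix svd_solver="full" so that both fits use the same deterministic solver, and the names pca_all, pca_two, same and ratio are hypothetical:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)
m, K = X.shape[1], 2

# Same idea as above: fit all m components, then keep the first K columns.
pca_all = PCA(n_components=m, svd_solver="full")
scores_all = pca_all.fit_transform(X)[:, :K]

# Alternative: ask scikit-learn for only K components.
pca_two = PCA(n_components=K, svd_solver="full")
scores_two = pca_two.fit_transform(X)

# Compare both score matrices up to floating-point tolerance.
same = np.allclose(scores_all, scores_two)
print(same)

# Share of the total variance captured by each component,
# and the cumulative share of the first K components.
ratio = pca_all.explained_variance_ratio_
print(ratio[:K], np.cumsum(ratio)[K - 1])
```

Fitting all components has the advantage that the explained variance of every component is available, which is useful when deciding how many components to keep.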

 

Create the Autoplot of the PCA

Earlier, we defined our targets. To plot each class in a different color, we can create a list that holds one color per observation:

color_list = [{0:"b",
               1:"purple",
               2:"violet"}[x] for x in targets]
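
The same color list can also be built with the pandas Series.map() method, which looks up every class label in a dictionary. This is just an alternative sketch; class_colors and color_series are names we introduce here:

```python
import pandas as pd
from sklearn.datasets import load_wine

targets = pd.Series(load_wine().target, name="Class")

# Dictionary from class label to Matplotlib color name (same colors as above).
class_colors = {0: "b", 1: "purple", 2: "violet"}

# Series.map replaces every label by its dictionary value.
color_series = targets.map(class_colors)
color_list = color_series.tolist()

print(color_series.value_counts())   # how often each color is used
```

Keeping the dictionary in its own variable makes it easy to reuse the colors later, for example in a legend.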

Now, we can draw our autoplot as a scatterplot, with the first principal component on the x-axis and the second on the y-axis:

fig, ax = plt.subplots()
ax.scatter(x=wine_PCA["PC1"], 
           y=wine_PCA["PC2"], 
           color=color_list)
 
ax.set_xlabel('Principal Component 1', 
              fontsize = 15)
ax.set_ylabel('Principal Component 2', 
              fontsize = 15)
ax.set_title("PCA Plot", 
             fontsize=16)

Autoplot of PCA in Python

And that’s how we get an autoplot of our PCA using Python.
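
If you would like a legend and the explained variance in the axis labels, one possible variation is sketched below. It is not the only way to do this: we draw one scatter call per class so that each class gets its own legend entry, and the output file name pca_autoplot.png as well as the names scores, var_pct and class_colors are our own assumptions:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X = StandardScaler().fit_transform(wine.data)
targets = pd.Series(wine.target, name="Class")

pca = PCA(n_components=2)
scores = pd.DataFrame(pca.fit_transform(X), columns=["PC1", "PC2"])
var_pct = pca.explained_variance_ratio_ * 100   # percent of variance per component

class_colors = {0: "b", 1: "purple", 2: "violet"}

fig, ax = plt.subplots()
# One scatter call per class, so each class appears in the legend.
for label, color in class_colors.items():
    mask = (targets == label).to_numpy()
    ax.scatter(scores.loc[mask, "PC1"],
               scores.loc[mask, "PC2"],
               color=color,
               label=wine.target_names[label])

ax.set_xlabel(f"PC1 ({var_pct[0]:.1f}%)", fontsize=15)
ax.set_ylabel(f"PC2 ({var_pct[1]:.1f}%)", fontsize=15)
ax.set_title("PCA Plot", fontsize=16)
ax.legend(title="Class")
fig.savefig("pca_autoplot.png")   # hypothetical output file name
```

In an interactive session you could call plt.show() instead of saving the figure.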

 

Video, Further Resources & Summary


In this post, you learned how to create an autoplot of a PCA in Python. In case you have further questions, you may leave a comment.

 

Paula Villasante Soriano Statistician & R Programmer

This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 
