Draw Biplot of PCA in Python (Example)

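In this tutorial, you’ll learn how to create a biplot of a Principal Component Analysis (PCA) using Python. A biplot displays the observations (as points) and the original variables (as arrows) in the space of the first two principal components.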

Let’s get started.

Example Data & Libraries

For this tutorial, we will use the diabetes dataset from scikit-learn. To load the data, perform the PCA, and visualize the result in a biplot, we first need to import a few libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

We will use the load_diabetes() function from scikit-learn to load our dataset. Then, we will convert it into a DataFrame using the pd.DataFrame() function. After doing this, we can see what the first rows of our data look like:

diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data,
                  columns=diabetes.feature_names)

df.head(6)
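As an alternative to building the DataFrame by hand, load_diabetes() accepts an as_frame=True argument in scikit-learn 0.23 and later, which returns the features directly as a pandas DataFrame. The sketch below assumes such a version and also confirms the size of the data: 442 observations of 10 baseline variables.

```python
from sklearn.datasets import load_diabetes

# as_frame=True stores the features as a pandas DataFrame in .data
# (assumes scikit-learn >= 0.23)
diabetes = load_diabetes(as_frame=True)
df = diabetes.data

# Number of rows (patients) and columns (baseline variables)
print(df.shape)
print(list(df.columns))
```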

Now, let’s perform our PCA.

Scale your Data and Perform the PCA

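Before performing the PCA, it’s important to scale our data: PCA looks for the directions of maximal variance, so variables with large numeric ranges would otherwise dominate the components. For this, we will use the StandardScaler() class, which standardizes each column to zero mean and unit variance. We create a scaler object, fit it to our data and then transform the data: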

scaler = StandardScaler()

scaler.fit(df)

Diabetes_scaled = scaler.transform(df)
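To see why scaling matters, here is a small sketch on synthetic data (not the diabetes set): two independent features, the second one on a scale 1000 times larger. Without scaling, the first principal component essentially points along the large-scale feature; after standardization, both features receive the same weight in absolute value.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: two independent features, the second 1000x larger in scale
X = np.column_stack([rng.normal(size=500), 1000 * rng.normal(size=500)])

# Loadings of the first principal component without and with scaling
pc1_raw = PCA(n_components=1).fit(X).components_[0]
pc1_scaled = PCA(n_components=1).fit(StandardScaler().fit_transform(X)).components_[0]

# Unscaled: almost all weight on the second feature
print(np.round(np.abs(pc1_raw), 3))
# Scaled: both features weighted equally (about 0.707 each)
print(np.round(np.abs(pc1_scaled), 3))
```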

The transform() method returns a two-dimensional NumPy array with the same dimensions as our original data. We can turn this array into a DataFrame to inspect it:

dataframe_scaled = pd.DataFrame(data=Diabetes_scaled,
                                columns=diabetes.feature_names)

dataframe_scaled.head(6)
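The standardization performed by StandardScaler can be reproduced by hand, which makes clear what the scaled values mean. The sketch below subtracts each column mean and divides by the column standard deviation (the population version, ddof=0, which is what StandardScaler uses) and compares the result with the scaler's output.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

X = load_diabetes().data

# Scaler output versus manual standardization: (x - mean) / std per column
X_scaled = StandardScaler().fit_transform(X)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))        # same values
print(np.allclose(X_scaled.mean(axis=0), 0))  # every column has mean 0
print(np.allclose(X_scaled.std(axis=0), 1))   # every column has std 1
```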

Now, we can perform our PCA using the PCA class from sklearn.decomposition. We keep the first four principal components (n_components=4) and project our scaled data onto them with fit_transform():

pca = PCA(n_components=4)
PC = pca.fit_transform(Diabetes_scaled)
pca_diabetes = pd.DataFrame(data=PC,
                            columns=['PC 1', 'PC 2', 'PC 3', 'PC 4'])

pca_diabetes.head(6)
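To make the PCA output less of a black box, the sketch below checks two relationships numerically: the scores returned by fit_transform() are the centered data multiplied by the transposed components_ matrix, and the rows of components_ are eigenvectors of the covariance matrix of the scaled data (up to sign), with explained_variance_ holding the matching eigenvalues. It repeats the tutorial's scaling and PCA steps so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(load_diabetes().data)
pca = PCA(n_components=4)
PC = pca.fit_transform(X)

# 1) Scores = centered data projected onto the principal axes
scores_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(PC, scores_manual))

# 2) Principal axes = eigenvectors of the covariance matrix.
#    eigh returns eigenvalues in ascending order, so take the 4 largest.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1][:4]
print(np.allclose(eigvals[order], pca.explained_variance_))
# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(eigvecs[:, order].T), np.abs(pca.components_)))

# 3) Share of the total variance captured by each of the four components
print(pca.explained_variance_ratio_)
```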

Now, we’re ready to create a biplot of our PCA.

Visualize the PCA in a Biplot

Let’s visualize our PCA in a biplot. To do this, we will write a function that takes three inputs: the principal component scores of our observations, the eigenvectors (loadings) of the PCA, and the labels of the original variables.

First, the function plots the scores on the first two principal components as a scatterplot. Then, a for loop draws one arrow per variable for its loadings and writes the variable name next to it. Together, these elements form the biplot.

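def biplot(score, coef, labels=None):
    # Scores of the observations on the first two principal components
    xs = score[:, 0]
    ys = score[:, 1]
    n = coef.shape[0]  # number of original variables

    # Rescale the scores by their range so that the points and the
    # loading arrows fit on comparable axes
    scalex = 1.0 / (xs.max() - xs.min())
    scaley = 1.0 / (ys.max() - ys.min())
    plt.scatter(xs * scalex, ys * scaley,
                s=5,
                color='orange')

    for i in range(n):
        # One arrow per variable, pointing to its loadings on PC1 and PC2
        plt.arrow(0, 0, coef[i, 0],
                  coef[i, 1], color='purple',
                  alpha=0.5)
        # Fall back to generic names if no labels were passed
        label = labels[i] if labels is not None else "Var{}".format(i + 1)
        plt.text(coef[i, 0] * 1.15,
                 coef[i, 1] * 1.15,
                 label,
                 color='darkblue',
                 ha='center',
                 va='center')

    plt.xlabel("PC1")
    plt.ylabel("PC2")

After defining the function, we call it with the PCA scores, the transposed components matrix (one row per variable, one column per principal component) and the feature names, and then display the plot:

plt.title('Biplot of PCA')

biplot(PC,
       np.transpose(pca.components_),
       list(diabetes.feature_names))

plt.show()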
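If you prefer matplotlib’s object-oriented interface (for example, to place the biplot in a subplot or save it to a file), the function can be adapted to draw on an Axes object. The helper name biplot_ax, the figure size and the file name biplot_pca.png are hypothetical choices for this sketch, and the Agg backend is selected only so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


def biplot_ax(ax, score, coef, labels):
    """Hypothetical helper: draw a PC1/PC2 biplot on `ax` and return it."""
    xs, ys = score[:, 0], score[:, 1]
    # Same range-based rescaling of the scores as in the tutorial's function
    ax.scatter(xs / (xs.max() - xs.min()), ys / (ys.max() - ys.min()),
               s=5, color="orange")
    # One arrow and one label per variable, using its loadings on PC1 and PC2
    for (x, y), label in zip(coef[:, :2], labels):
        ax.arrow(0, 0, x, y, color="purple", alpha=0.5)
        ax.text(x * 1.15, y * 1.15, label, color="darkblue",
                ha="center", va="center")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_title("Biplot of PCA")
    return ax


diabetes = load_diabetes()
X = StandardScaler().fit_transform(diabetes.data)
pca = PCA(n_components=4)
PC = pca.fit_transform(X)

fig, ax = plt.subplots(figsize=(7, 6))
biplot_ax(ax, PC, pca.components_.T, diabetes.feature_names)
fig.savefig("biplot_pca.png", dpi=150)  # hypothetical output file name
```

Returning the Axes keeps the function free of global pyplot state, so several biplots (for example PC1/PC2 and PC1/PC3) could be drawn side by side with plt.subplots(1, 2).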

And that’s how we can visualize our PCA in a biplot using Python.

Video, Further Resources & Summary

Do you need more explanations on how to create a biplot of a PCA in Python? Then you should have a look at the following YouTube video of the Statistics Globe YouTube channel.

The YouTube video will be added soon.

This post has shown how to create a biplot of a PCA in Python. In case you have further questions, don’t hesitate to leave a comment.

Paula Villasante Soriano Statistician & R Programmer

This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get further information about her academic background and the other articles she has written for Statistics Globe.