Draw Biplot of PCA in Python (Example)
In this tutorial, you’ll learn how to create a biplot of a Principal Component Analysis (PCA) using the Python language.
Let’s get started.
Example Data & Libraries
For this tutorial, we will use the diabetes dataset from scikit-learn. To load the data, run the PCA, and draw the biplot in Python, we first need to import some libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
We will use the load_diabetes() function from the scikit-learn library to load our dataset. Then, we will convert it into a DataFrame using the pd.DataFrame() function. After doing this, we can see what the first rows of our data look like:
diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)
df.head(6)
Now, let’s perform our PCA.
Scale your Data and Perform the PCA
Before performing the PCA, it’s important to standardize our data, because PCA is sensitive to the scale of the variables. For this, we will create a StandardScaler() object, fit it to our data, and use it to transform the data:
scaler = StandardScaler()
scaler.fit(df)
Diabetes_scaled = scaler.transform(df)
After the transformation, we obtain a two-dimensional NumPy array with the same dimensions as our original data. We can convert this array into a DataFrame to inspect it:
dataframe_scaled = pd.DataFrame(data=Diabetes_scaled, columns=diabetes.feature_names)
dataframe_scaled.head(6)
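As a quick sanity check on the scaling step, the following sketch confirms that every standardized column has a mean of about 0 and a standard deviation of about 1. It is self-contained and uses fit_transform() as a shortcut for the separate fit() and transform() calls; the names scaled, col_means and col_stds are only illustrative and do not appear elsewhere in this tutorial.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)

# fit_transform() combines the fit() and transform() steps
scaled = StandardScaler().fit_transform(df)

# StandardScaler divides by the population standard deviation (ddof=0),
# which is also the default of NumPy's std()
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

print(np.allclose(col_means, 0.0))  # True
print(np.allclose(col_stds, 1.0))   # True
```

Note that pandas' DataFrame.std() uses ddof=1 by default, so it would report values slightly different from 1 for the same data.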
Now, we can perform our PCA by using the PCA class from sklearn.decomposition. We will keep four components and then transform our data:
pca = PCA(n_components=4)
PC = pca.fit_transform(Diabetes_scaled)
pca_diabetes = pd.DataFrame(data=PC, columns=['PC 1', 'PC 2', 'PC 3', 'PC 4'])
pca_diabetes.head(6)
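Since a biplot only shows the first two components, it can help to check how much of the total variance they capture. The sketch below is one way to do this with the explained_variance_ratio_ attribute of a fitted PCA object; it repeats the loading and scaling steps so that it runs on its own, and the names X, ratios and cumulative are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the data and fit a PCA with four components
X = StandardScaler().fit_transform(load_diabetes().data)
pca = PCA(n_components=4).fit(X)

# Share of the total variance captured by each component
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC {i}: {r:.1%} of the variance, {c:.1%} cumulative")
```

If the first two components capture only a small share of the variance, the biplot should be read with some caution, because much of the structure in the data lies outside the plotted plane.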
Now, we’re ready to create a biplot of our PCA.
Visualize the PCA in a Biplot
Let’s visualize our PCA in a biplot. To achieve this, we will create a function for the biplot. This function will contain three main elements: the principal components of our dataset, the eigenvectors and the features or labels from our data.
First, we will plot our data in a scatterplot and, then, we will use a for loop to plot the eigenvectors and the features. Altogether, we will get a biplot.
def biplot(score, coef, labels=None):
    # Scores of the observations on the first two principal components
    xs = score[:, 0]
    ys = score[:, 1]
    n = coef.shape[0]
    # Rescale the scores to a unit range so that they fit on the same axes as the arrows
    scalex = 1.0 / (xs.max() - xs.min())
    scaley = 1.0 / (ys.max() - ys.min())
    plt.scatter(xs * scalex, ys * scaley, s=5, color='orange')
    for i in range(n):
        # One arrow per original feature, given by its coefficients on PC1 and PC2
        plt.arrow(0, 0, coef[i, 0], coef[i, 1], color='purple', alpha=0.5)
        # Fall back to generic names if no labels are passed
        label = labels[i] if labels is not None else "Var" + str(i + 1)
        plt.text(coef[i, 0] * 1.15, coef[i, 1] * 1.15, label, color='darkblue', ha='center', va='center')
    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))
After defining our function, we just have to call it with the principal component scores, the transposed component matrix, and the feature names:
plt.title('Biplot of PCA')
biplot(PC, np.transpose(pca.components_), list(diabetes.feature_names))
plt.show()  # needed when running as a script; notebooks display the plot automatically
And that’s how we can visualize our PCA in a biplot using Python.
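To read the arrows numerically rather than visually, the eigenvector coefficients can be put into a table with one row per feature. The following is a minimal, self-contained sketch; the name eigvecs is illustrative, and the column names simply mirror the ones used for the component scores in this tutorial.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

diabetes = load_diabetes()
X = StandardScaler().fit_transform(diabetes.data)
pca = PCA(n_components=4).fit(X)

# components_ has shape (n_components, n_features);
# transposing it gives one row per original feature
eigvecs = pd.DataFrame(
    pca.components_.T,
    index=diabetes.feature_names,
    columns=["PC 1", "PC 2", "PC 3", "PC 4"],
)

# The first two columns are the arrow end points drawn in the biplot
print(eigvecs[["PC 1", "PC 2"]].round(2))
```

Features whose rows are similar in the first two columns have arrows pointing in similar directions, and a larger absolute value means the feature contributes more strongly to that component.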
Video, Further Resources & Summary
Do you need more explanations on how to create a biplot of a PCA in Python? Then you should have a look at the following YouTube video on the Statistics Globe YouTube channel.
The YouTube video will be added soon.
There are other tutorials on Statistics Globe you could have a look at:
- Principal Component Analysis in Python
- Append Rows to pandas DataFrame in Loop in Python
- Draw 3D Plot of PCA in Python
- Change plotly Axis Labels in Python
- Combine pandas DataFrames with Same Column Names in Python
- Learn Python
This post has shown how to create a biplot of a PCA in Python. In case you have further questions, don’t hesitate to leave a comment.
This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get further information about her academic background and the other articles she has written for Statistics Globe.