# Draw Biplot of PCA in Python (Example)

In this tutorial, you’ll learn how to create a biplot of a Principal Component Analysis (PCA) using the Python language.

Let’s get started.

## Example Data & Libraries

For this tutorial, we will use the diabetes dataset from scikit-learn. To load it, perform a PCA on it, and draw the biplot in Python, we first need to import some libraries:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
```

We will use the load_diabetes() function from the scikit-learn library to load our dataset. Then, we will convert it into a DataFrame using the pd.DataFrame() function. After doing this, we can see what the first rows of our data look like:

```
diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)

df.head(6)
```

Now, let’s perform our PCA.
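Before moving on, a quick sanity check of the loaded data can be useful. The following is a minimal sketch that repeats the loading step from above and prints the dimensions, the column names, and the number of missing values:

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Same loading step as in the tutorial
diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)

n_rows, n_cols = df.shape
n_missing = int(df.isna().sum().sum())

print(n_rows, n_cols)      # number of observations and features
print(list(df.columns))    # feature names used later as biplot labels
print(n_missing)           # total count of missing values
```

PCA in scikit-learn does not accept missing values, so a count of zero here means the data can be passed on without imputation.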

## Scale your Data and Perform the PCA

Before performing the PCA, it’s important to scale our data, because PCA is sensitive to the variances of the input features. For this, we will create a StandardScaler() object, fit it to our data, and then use it to transform the data:

```
scaler = StandardScaler()
scaler.fit(df)

Diabetes_scaled = scaler.transform(df)
```
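As a quick check of the scaling step, each standardized column should have a mean of about 0 and a standard deviation of about 1. The sketch below is self-contained and uses fit_transform(), which combines the fit() and transform() calls; the names X, X_scaled, col_means, and col_stds are only illustrative:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

X = load_diabetes().data
X_scaled = StandardScaler().fit_transform(X)  # fit() and transform() in one call

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)  # population std (ddof=0), as used by StandardScaler

print(np.allclose(col_means, 0.0))  # True if every column is centered
print(np.allclose(col_stds, 1.0))   # True if every column has unit variance
```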

The transform() method returns a two-dimensional NumPy array with the same dimensions as our original data. We can wrap this array in a DataFrame to inspect it:

```
dataframe_scaled = pd.DataFrame(data=Diabetes_scaled, columns=diabetes.feature_names)

dataframe_scaled.head(6)
```

Now, we can perform our PCA by using the PCA class from sklearn.decomposition. We will keep four components for our PCA and then transform our data:

```
pca = PCA(n_components=4)
PC = pca.fit_transform(Diabetes_scaled)
pca_diabetes = pd.DataFrame(data=PC, columns=['PC 1', 'PC 2', 'PC 3', 'PC 4'])

pca_diabetes.head(6)
```

Now, we’re ready to create a biplot of our PCA.
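Since a biplot only shows the first two components, it can help to check how much of the total variance they capture. The following sketch refits the same four-component PCA and reads the explained_variance_ratio_ attribute; the names X_scaled and ratios are only illustrative:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_diabetes().data)
pca = PCA(n_components=4).fit(X_scaled)

# Share of the total variance explained by each retained component
ratios = pca.explained_variance_ratio_
first_two = float(ratios[:2].sum())

print(np.round(ratios, 3))
print(round(first_two, 3))  # share of variance visible in a PC1 vs. PC2 biplot
```

If the first two components explain only a small share of the variance, the biplot should be read with some caution, because much of the structure lies in components that are not shown.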

## Visualize the PCA in a Biplot

Let’s visualize our PCA in a biplot. To achieve this, we will create a function for the biplot. This function takes three main elements: the principal component scores of our dataset, the eigenvectors, and the feature names of our data, which serve as labels.

First, we will plot our data in a scatterplot. Then, we will use a for loop to draw the eigenvectors as arrows and label them with the feature names. Altogether, we will get a biplot.
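To see what the eigenvector input looks like, the sketch below inspects pca.components_. In scikit-learn this array has one row per component and one column per feature, so transposing it gives one row per feature, which is the layout an arrow-per-feature loop needs. The names X_scaled, coef, and row_norms are only illustrative:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_diabetes().data)
pca = PCA(n_components=4).fit(X_scaled)

print(pca.components_.shape)  # (n_components, n_features)

# One row per feature: columns 0 and 1 are the arrow coordinates on PC1 and PC2
coef = np.transpose(pca.components_)
print(coef.shape)             # (n_features, n_components)

# Each eigenvector (row of components_) has unit length
row_norms = np.linalg.norm(pca.components_, axis=1)
print(np.allclose(row_norms, 1.0))
```

Because every eigenvector has length 1, the arrow coordinates always lie between -1 and 1, which is why the scores are rescaled before they are plotted next to the arrows.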

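```
def biplot(score, coef, labels=None):
    # Scores on the first two principal components
    xs = score[:, 0]
    ys = score[:, 1]
    n = coef.shape[0]  # number of features (one arrow per feature)

    # Rescale the scores so that each axis spans a range of 1
    scalex = 1.0 / (xs.max() - xs.min())
    scaley = 1.0 / (ys.max() - ys.min())
    plt.scatter(xs * scalex, ys * scaley, s=5, color='orange')

    # Draw one arrow per feature and label it with the feature name
    for i in range(n):
        plt.arrow(0, 0, coef[i, 0], coef[i, 1], color='purple', alpha=0.5)
        plt.text(coef[i, 0] * 1.15, coef[i, 1] * 1.15, labels[i],
                 color='darkblue', ha='center', va='center')

    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))

    # Open a new, empty figure so that later plots don't draw on the biplot
    plt.figure()
```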

After defining our function, we just have to call it with our data:

```
plt.title('Biplot of PCA')

biplot(PC, np.transpose(pca.components_), list(diabetes.feature_names))
```

And that’s how we can visualize our PCA in a biplot using Python.
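Because biplot() ends with plt.figure(), a plt.savefig() call placed after it would act on the new, empty figure. If you want to save the plot from a script, one option is an Axes-based variant. The sketch below is only an illustration: the helper biplot_ax() and the file name biplot.png are hypothetical, and the Agg backend line is an assumption for running without a display (it can be dropped in a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


def biplot_ax(score, coef, labels, ax):
    """Hypothetical Axes-based variant of biplot(): draws PC1 vs. PC2 on ax."""
    xs, ys = score[:, 0], score[:, 1]
    # Rescale the scores so that each axis spans a range of 1
    ax.scatter(xs / (xs.max() - xs.min()), ys / (ys.max() - ys.min()),
               s=5, color='orange')
    # One arrow and one text label per feature
    for i in range(coef.shape[0]):
        ax.arrow(0, 0, coef[i, 0], coef[i, 1], color='purple', alpha=0.5)
        ax.text(coef[i, 0] * 1.15, coef[i, 1] * 1.15, labels[i],
                color='darkblue', ha='center', va='center')
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return ax


diabetes = load_diabetes()
X_scaled = StandardScaler().fit_transform(diabetes.data)
pca = PCA(n_components=4)
PC = pca.fit_transform(X_scaled)

fig, ax = plt.subplots()
biplot_ax(PC, pca.components_.T, list(diabetes.feature_names), ax)
ax.set_title("Biplot of PCA")
fig.savefig("biplot.png", dpi=150)  # hypothetical output file name
```

Drawing on an explicit Axes object keeps the figure handle available, so the title, the saving step, and any further styling all apply to the same plot.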

## Video, Further Resources & Summary

Do you need more explanations on how to create a biplot of a PCA in Python? Then you should have a look at the accompanying video on the Statistics Globe YouTube channel.

This post has shown how to create a biplot of a PCA in Python. In case you have further questions, don’t hesitate to leave a comment.

This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get further information about her academic background and the other articles she has written for Statistics Globe.
