Draw Autoplot of PCA in Python (2 Examples)
On this page you’ll learn how to create an autoplot of a Principal Component Analysis (PCA) in the Python programming language.
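The tutorial is structured as follows:
- Example Data and Add-On Libraries
- Scale the Data and Perform the PCA
- Example 1: Basic Autoplot of PCA
- Example 2: Autoplot of PCA as Biplot
- Video, Further Resources & Summary
Let’s get started!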
Example Data and Add-On Libraries
To draw an autoplot of a PCA in Python, we need a few libraries for data handling, scaling, the PCA itself, and visualization. Please load them before we start.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
from sklearn.decomposition import PCA
from sklearn.datasets import load_wine
Now, it’s time to prepare our data. For this tutorial, we will use the wine dataset from the scikit-learn library. To import it, we will use the load_wine() function.
wine = load_wine()
After this, we will convert it into a pandas DataFrame, so we can run our PCA. We will also define our target, which corresponds to the type of wine in our data.
df = pd.DataFrame(wine.data, columns=wine.feature_names)
target = pd.Series(wine.target, name="Class")
df.iloc[:, 0:3].head(6)
Our data set has 178 rows and 13 columns. Above, we can see the first 6 rows of the first 3 columns, which we selected with the .iloc[] indexer and the head() method.
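As a quick sanity check before the PCA, the following sketch confirms the dimensions mentioned above and counts the wines per class. The variable names class_counts and n_missing are only used for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Rebuild the DataFrame and target exactly as in the tutorial
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
target = pd.Series(wine.target, name="Class")

# Dimensions of the feature table as (rows, columns)
print(df.shape)

# Number of wines per class, ordered by class label
class_counts = target.value_counts().sort_index()
print(class_counts)

# PCA in scikit-learn does not accept missing values, so count them
n_missing = int(df.isna().sum().sum())
print(n_missing)
```

If the number of missing values were not zero, the data would need to be cleaned or imputed before continuing.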
Now, we can work on the PCA.
Scale the Data and Perform the PCA
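Before performing the PCA, our data should be standardized, because the wine features are measured on very different scales and PCA is sensitive to the variance of each variable. For this, we will create a StandardScaler object, fit it to our DataFrame, and then transform the data.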
scaler = StandardScaler()
scaler.fit(df)
wine_scaled = scaler.transform(df)
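To make the scaling step more concrete, here is a minimal sketch that compares the StandardScaler output with manually computed z-scores. It assumes the population standard deviation (ddof=0), which is what StandardScaler uses. The names manual, same_result, col_means, and col_stds are introduced only for this check.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)

# fit_transform() combines the fit() and transform() calls used above
wine_scaled = StandardScaler().fit_transform(df)

# Manual z-scores: subtract the column mean, divide by the population std
manual = (df - df.mean()) / df.std(ddof=0)

# Both approaches should agree up to floating point error
same_result = np.allclose(wine_scaled, manual.to_numpy())
print(same_result)

# After scaling, each column has mean 0 and standard deviation 1
col_means = wine_scaled.mean(axis=0)
col_stds = wine_scaled.std(axis=0)
print(col_means.round(6))
print(col_stds.round(6))
```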
Next, we define the number of components we want to keep and run the PCA on the scaled data using the fit_transform() method. We store the resulting component scores in a new DataFrame.
pca = PCA(n_components=2)
PC = pca.fit_transform(wine_scaled)
pca_wine = pd.DataFrame(data=PC, columns=['PC1', 'PC2'])
pca_wine.head(6)
Above, we can see the first 6 rows of our PCA DataFrame. Let’s see what the autoplot looks like!
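Before plotting, it can be useful to check how much of the total variance the two components capture. The sketch below reads this from the explained_variance_ratio_ attribute of the fitted PCA object. The names ratio and axis_labels are assumptions made for this illustration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
wine_scaled = StandardScaler().fit_transform(wine.data)

pca = PCA(n_components=2)
pca.fit(wine_scaled)

# Share of the total variance explained by each retained component
ratio = pca.explained_variance_ratio_
print(ratio)

# Axis labels that include the explained variance in percent
axis_labels = [f"PC{i + 1} ({r:.1%})" for i, r in enumerate(ratio)]
print(axis_labels)
```

Labels of this kind can be passed to set_xlabel() and set_ylabel() so the plot shows how informative each axis is.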
Example 1: Basic Autoplot of PCA
Now, we will draw an autoplot by plotting one principal component on each axis with the scatter() function.
fig, ax = plt.subplots(figsize=(14, 9))
# One point per wine, colored by its class
ax.scatter(x=pca_wine['PC1'], y=pca_wine['PC2'], c=target, s=50, cmap='cool')
ax.set_xlabel('PC1', fontsize=20)
ax.set_ylabel('PC2', fontsize=20)
ax.set_title('Figure 1', fontsize=20)
plt.show()
Note that we have colored our data by its target using the c= argument and used the “cool” colormap, although there are plenty of colormaps you can choose to color your plot. Also, the s= argument helped us to change the size of the points in our plot.
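As a possible variation of Example 1, the following sketch draws one scatter layer per wine class so that a legend with class names can be added, and it puts the explained variance into the axis labels. This is one way to do it, not the only one. The "Class" column and the legend title are choices made for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
wine_scaled = StandardScaler().fit_transform(wine.data)

pca = PCA(n_components=2)
pca_wine = pd.DataFrame(pca.fit_transform(wine_scaled), columns=["PC1", "PC2"])

# Map the numeric targets to the class names shipped with the dataset
pca_wine["Class"] = wine.target_names[wine.target]

fig, ax = plt.subplots(figsize=(14, 9))

# One scatter call per class gives each class its own legend entry
for class_name, group in pca_wine.groupby("Class"):
    ax.scatter(group["PC1"], group["PC2"], s=50, label=class_name)

# Explained variance in percent for the axis labels
pc1_pct, pc2_pct = pca.explained_variance_ratio_ * 100
ax.set_xlabel(f"PC1 ({pc1_pct:.1f}%)", fontsize=20)
ax.set_ylabel(f"PC2 ({pc2_pct:.1f}%)", fontsize=20)
ax.legend(title="Wine class")
```

Calling plt.show() afterwards displays the figure in an interactive session.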
You can also add the feature vectors to this plot. Take a look at the following example.
Example 2: Autoplot of PCA as Biplot
A biplot overlays the feature loadings as arrows on top of the component scores. In the code below, the scores are divided by the range of each component so that the points and the arrows fit on a comparable scale.
xs = PC[:, 0]
ys = PC[:, 1]
# Rescale the scores by the range of each component
scalex = 1.0 / (xs.max() - xs.min())
scaley = 1.0 / (ys.max() - ys.min())

fig, ax = plt.subplots(figsize=(14, 9))

# Draw one arrow and one label per feature
for i, feature in enumerate(wine.feature_names):
    ax.arrow(0, 0, pca.components_[0, i], pca.components_[1, i],
             head_width=0.03, head_length=0.03)
    ax.text(pca.components_[0, i] * 1.15, pca.components_[1, i] * 1.15,
            feature, fontsize=18)

scatter = ax.scatter(xs * scalex, ys * scaley, c=target, s=50, cmap='cool')
ax.set_xlabel('PC1', fontsize=20)
ax.set_ylabel('PC2', fontsize=20)
ax.set_title('Figure 2', fontsize=20)

legend1 = ax.legend(*scatter.legend_elements(), loc="lower left", title="Wine Target")
ax.add_artist(legend1)
plt.show()
As shown in the previous Python output, we have represented the variables as vectors by using a for loop together with the arrow() and text() functions. Moreover, it’s possible to modify the width and length of the arrowheads using the head_width= and head_length= arguments. Finally, we’ve also added a legend via the matplotlib legend() function. If you want to see other examples of a biplot of a PCA in Python, you can check our tutorial: Draw Biplot of PCA in Python.
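The arrows in the biplot are taken from pca.components_, so the same values can also be inspected as a table. The sketch below is one way to rank the features by the length of their arrows. The loadings DataFrame and its "length" column are names chosen for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
wine_scaled = StandardScaler().fit_transform(wine.data)
pca = PCA(n_components=2).fit(wine_scaled)

# components_ has shape (n_components, n_features), so transpose it
# to get one row per feature and one column per component
loadings = pd.DataFrame(pca.components_.T,
                        index=wine.feature_names,
                        columns=["PC1", "PC2"])

# Length of each arrow in the PC1/PC2 plane
loadings["length"] = np.sqrt(loadings["PC1"] ** 2 + loadings["PC2"] ** 2)

# Features with the longest arrows contribute most to the two components
top_features = loadings.sort_values("length", ascending=False)
print(top_features.head(5))
```

Features with long arrows pointing in a similar direction tend to be positively correlated in the plane of the first two components.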
Video, Further Resources & Summary
Do you need more explanations on how to create an autoplot of PCA in Python? Then you should have a look at the following YouTube video of the Statistics Globe YouTube channel.
The YouTube video will be added soon.
If you want to learn more, you could take a look at some other tutorials available on Statistics Globe:
- Draw 3D Plot of PCA in Python
- Append Values to pandas DataFrame in Python
- Change datetime Format in pandas DataFrame in Python
- Create New pandas DataFrame from Existing Data in Python
In this post, you learned how to create an autoplot of a PCA in Python. In case you have further questions, you may leave a comment.
This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.