Scree Plot for PCA Explained

When we perform a Principal Component Analysis (PCA), the main idea is to capture most of the variance in our data using a lower-dimensional space. To accomplish this, we use newly defined components, which are fewer in number than the original variables.

But how much lower? One way of deciding on this is to use a scree plot. In this tutorial, you’ll learn how to interpret a scree plot in a PCA.

This is the content you will find on this page:

1) What is a Scree Plot?

2) Number of Components to Keep

3) Video, Further Resources & Summary


This page was created in collaboration with Paula Villasante Soriano (Statistician & R Programmer) and Rana Cansu Kebabci (Statistician & Data Scientist). Please have a look at Paula’s and Cansu’s author pages to get further information about their academic backgrounds and the other articles they have written for Statistics Globe.

Let’s dive right in.

What is a Scree Plot?

A scree plot is a graph that shows the explained variance per newly defined component (principal component). The plot can show either the percentage of explained variance or its absolute value (the eigenvalue) for each component. In practice, the first few principal components commonly explain most of the variance.
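To make the two measures concrete, here is a minimal sketch of how the eigenvalues and the percentages of explained variance behind a scree plot could be computed. It assumes NumPy and uses synthetic data generated on the spot (not the mtcars data used later on this page); the variable names are chosen for illustration only.

```python
import numpy as np

# Synthetic data for illustration only (NOT the mtcars data):
# 200 observations of 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(200, 6))

# A PCA on standardized variables amounts to an eigendecomposition
# of the correlation matrix.
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted from largest to smallest

# Percentage of explained variance per principal component.
explained_pct = 100 * eigenvalues / eigenvalues.sum()

for i, (ev, pct) in enumerate(zip(eigenvalues, explained_pct), start=1):
    print(f"PC{i}: eigenvalue = {ev:.3f}, explained variance = {pct:.1f}%")
```

Plotting either `eigenvalues` or `explained_pct` against the component number gives the scree plot. Because the variables are standardized here, the eigenvalues sum to the number of variables.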

Number of Components to Keep

In order to decide how many components to keep after creating our scree plot, there are two different methods we can use:

The Elbow Method

One way to interpret a scree plot is the elbow rule. This method looks for the “elbow” in the curve and retains all components before the point where the curve flattens out. For the demonstration, we used the built-in mtcars data and plotted the percentage of explained variance. See Figure 1 below.

Figure 1: Screeplot of PCA

In Figure 1, the x-axis shows the principal components (dimensions), of which there are 10 in this case. The y-axis shows the percentage of the explained variance per principal component. The elbow appears to occur at the third principal component, where the curve starts to flatten out. This means that the first two components should be kept for the analysis.
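The elbow is usually read off the plot by eye. If you want a rough numerical cross-check, one possible heuristic (an assumption of this sketch, not the method used to produce Figure 1) is to take the point that lies farthest below the straight line connecting the first and the last point of the curve. The percentages below are hypothetical values chosen for illustration, not the mtcars results.

```python
import numpy as np

# Hypothetical percentages of explained variance (illustrative values,
# NOT the mtcars results shown in Figure 1).
explained_pct = np.array([60.0, 24.0, 6.0, 4.0, 3.0, 2.0, 1.0])
components = np.arange(1, len(explained_pct) + 1)

# Straight line (chord) from the first to the last point of the curve.
slope = (explained_pct[-1] - explained_pct[0]) / (components[-1] - components[0])
chord = explained_pct[0] + slope * (components - components[0])

# Heuristic: the elbow is the point lying farthest below the chord.
gap = chord - explained_pct
elbow = int(components[np.argmax(gap)])

# Retain the components before the elbow.
n_keep = elbow - 1
print(f"Elbow at PC{elbow}, keep the first {n_keep} component(s)")
```

For these values the heuristic places the elbow at the third component, so two components are retained. Different heuristics can disagree on less clear-cut curves, so it is worth checking the result against the plot itself.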

Kaiser’s Rule

Kaiser’s rule is a commonly used method to select the number of components in a PCA. It’s based on keeping the components with eigenvalues greater than 1. If we create the scree plot by showing the eigenvalues this time, the graph looks like Figure 2.

Figure 2: Scree plot of PCA (eigenvalues)

We can see that the first two principal components have eigenvalues greater than 1. Thus, this method also leads us to keep two components.
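Kaiser’s rule is simple to apply in code once the eigenvalues are available. The sketch below assumes NumPy and uses hypothetical eigenvalues for six standardized variables (illustrative values, not the mtcars eigenvalues from Figure 2). With standardized variables each original variable has a variance of 1, so an eigenvalue above 1 means the component explains more variance than a single original variable does.

```python
import numpy as np

# Hypothetical eigenvalues of a correlation matrix with 6 variables
# (illustrative values, NOT the mtcars eigenvalues shown in Figure 2).
# For standardized data they sum to the number of variables.
eigenvalues = np.array([3.1, 1.6, 0.6, 0.4, 0.2, 0.1])

# Kaiser's rule: keep the components whose eigenvalue is greater than 1.
keep = eigenvalues > 1
n_keep = int(keep.sum())

print(f"Components kept by Kaiser's rule: {n_keep}")
```

Here only the first two eigenvalues exceed 1, so the rule retains two components.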

As shown, the elbow method and Kaiser’s rule help us to decide on the number of components to keep. You can also check the Scree Plot in R and Scree Plot in Python tutorials if you want to learn how to draw them in R and Python.

Video, Further Resources & Summary

Do you need more explanations and background on how to use a Principal Component Analysis? Then you should have a look at the following YouTube video on the Statistics Globe YouTube channel.