Draw Ellipse Plot for Groups in PCA in R (2 Examples)

 

In this tutorial, you’ll learn how to draw ellipses for each group in a Principal Component Analysis (PCA) using the R programming language.

Let’s take a look at the code.

 

Sample Data, Add-on Libraries & PCA

Before we start, you may need to install the factoextra and ggplot2 packages, which we will use throughout this tutorial:

install.packages("factoextra")
install.packages("ggplot2")

Now, let’s load the packages:

library(factoextra)
library(ggplot2)

For this tutorial, we will use the iris dataset. We can see what the first rows of the data frame look like using the head() function:

head(iris)

[Figure: output of head(iris), showing the first six rows of the iris data frame]

Now, we will perform a PCA for all the columns except for Species and see what the output looks like using the summary() function:

iris.pca <- prcomp(iris[, -5],         # drop the Species column
                   scale. = TRUE)      # standardize the variables
 
summary(iris.pca)
 
# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.7084 0.9560 0.38309 0.14393
# Proportion of Variance 0.7296 0.2285 0.03669 0.00518
# Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
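
The proportions in this summary can also be reproduced by hand, which may help to see where they come from. The following is a minimal sketch, assuming the same iris PCA as above; the object names pc_var, prop_var and cum_var are made up for illustration:

```r
# Recompute the PCA so that the snippet runs on its own
iris.pca <- prcomp(iris[, -5], scale. = TRUE)

# The variance of each component is its squared standard deviation
pc_var <- iris.pca$sdev^2

# Share of the total variance explained by each component
prop_var <- pc_var / sum(pc_var)

# Running total across the components
cum_var <- cumsum(prop_var)

# Compare with the last two rows of summary(iris.pca)
round(rbind(prop_var, cum_var), 4)
```

The first two components together account for roughly 96% of the variance, which is why the plots below only use PC1 and PC2.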

 

Example 1: Visualize the PCA with Ellipses Using the factoextra Package

We can plot the ellipses using the fviz_pca_ind() function of the factoextra package. The habillage argument takes a factor variable, here iris$Species, and colors the observations by its groups. To add an ellipse for each group, we also set addEllipses = TRUE inside the function:

fviz_pca_ind(iris.pca, 
             habillage=iris$Species,
             addEllipses=TRUE)

[Figure: PCA individuals plot with one ellipse per species, drawn with factoextra]
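
The appearance of the ellipses can be adjusted with further arguments of fviz_pca_ind(). The following is a sketch of one possible variation, not part of the original example: it draws points only, switches to confidence ellipses around the group means, and sets a legend title. The object name p_conf is made up for illustration:

```r
library(factoextra)

# Recompute the PCA so that the snippet runs on its own
iris.pca <- prcomp(iris[, -5], scale. = TRUE)

p_conf <- fviz_pca_ind(iris.pca,
                       geom = "point",               # points only, no row labels
                       habillage = iris$Species,     # color by species
                       addEllipses = TRUE,
                       ellipse.type = "confidence",  # ellipse around each group mean
                       ellipse.level = 0.95,         # confidence level of the ellipse
                       legend.title = "Species")

p_conf
```

Confidence ellipses describe the uncertainty of the group means, so they are usually much smaller than ellipses that are meant to cover the observations of a group.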

 

Example 2: Visualize the PCA with Ellipses Using the ggplot2 Package

We can also use the ggplot2 package to plot ellipses around the data. To achieve this, we will first specify our first two principal components from the PCA:

PC1 <- iris.pca$x[, 1]
PC2 <- iris.pca$x[, 2]

Now, we can plot them using the ggplot() function, drawing the individuals as points with geom_point() and the ellipses around the data in each group with stat_ellipse(). Since PC1 and PC2 are not columns of iris, ggplot2 takes them from the two vectors we have just created:

ggplot(iris, 
       aes(x = PC1, 
           y = PC2, 
           color = Species)) +
  geom_point() +
  stat_ellipse()

[Figure: PCA individuals plot with one ellipse per species, drawn with ggplot2]
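
By default, stat_ellipse() draws a 95% ellipse based on a multivariate t-distribution. As a hedged variation on the example above, the sketch below stores the scores in a data frame first, uses a normal-based ellipse at the 90% level, and adds a second version with filled ellipses. The names scores, p_norm and p_filled are made up for illustration:

```r
library(ggplot2)

# Recompute the PCA so that the snippet runs on its own
iris.pca <- prcomp(iris[, -5], scale. = TRUE)

# Keep the first two score columns together with the grouping variable
scores <- data.frame(iris.pca$x[, 1:2],
                     Species = iris$Species)

# Normal-based ellipses at the 90% level
p_norm <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.90)

# Filled, semi-transparent ellipses drawn as polygons
p_filled <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  stat_ellipse(aes(fill = Species), geom = "polygon", alpha = 0.2)

p_norm
p_filled
```

Putting the scores into their own data frame avoids relying on free-standing PC1 and PC2 vectors in the workspace.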

Please note that the axes are labeled differently depending on the package used. With the factoextra package, the axes are labeled Dim1 and Dim2 and include the percentage of explained variance. With the ggplot2 package, the axis labels simply show the names of the variables mapped to the axes, in this case PC1 and PC2.
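
If the ggplot2 version should also show the explained variance on the axes, the labels can be built by hand. This is only a sketch of one way to do it; the names scores, pct, x_lab, y_lab and p_lab, as well as the label format, are assumptions made for illustration:

```r
library(ggplot2)

# Recompute the PCA so that the snippet runs on its own
iris.pca <- prcomp(iris[, -5], scale. = TRUE)

# Scores of the first two components plus the grouping variable
scores <- data.frame(iris.pca$x[, 1:2],
                     Species = iris$Species)

# Percentage of variance explained by each component
pct <- 100 * iris.pca$sdev^2 / sum(iris.pca$sdev^2)

# Axis labels that include the explained variance
x_lab <- sprintf("PC1 (%.1f%%)", pct[1])
y_lab <- sprintf("PC2 (%.1f%%)", pct[2])

p_lab <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  stat_ellipse() +
  labs(x = x_lab, y = y_lab)

p_lab
```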

 

Video, Further Resources & Summary

 

This post has shown how to draw an ellipse plot for groups in a PCA in R. In case you have further questions, you may leave a comment below.

 

Paula Villasante Soriano Statistician & R Programmer

This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get further information about her academic background and the other articles she has written for Statistics Globe.

 
