Draw Ellipse Plot for Groups in PCA in R (2 Examples)
In this tutorial, you’ll learn how to draw ellipses for each group in a Principal Component Analysis (PCA) using the R programming language.
Let’s take a look at the code.
Sample Data, Add-on Libraries & PCA
Before we start, you may need to install the factoextra and ggplot2 packages, which we will use throughout this tutorial:
install.packages("factoextra")
install.packages("ggplot2")
Now, let’s load the packages:
library(factoextra)
library(ggplot2)
For this tutorial, we will use the iris dataset. We can see what the first rows of the data frame look like using the head() function:
head(iris)
Now, we will perform a PCA for all the columns except for Species and see what the output looks like using the summary() function:
iris.pca <- prcomp(iris[, -5], scale. = TRUE)
summary(iris.pca)
# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.7084 0.9560 0.38309 0.14393
# Proportion of Variance 0.7296 0.2285 0.03669 0.00518
# Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
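The proportions reported by summary() can be recomputed from the standard deviations stored in the prcomp object, which is a handy sanity check. The following is a minimal sketch using base R only; the variable names explained and cumulative are our own choice and not part of the original code:

```r
# Recompute the explained variance by hand (base R only)
iris.pca <- prcomp(iris[, -5], scale. = TRUE)

# The variance of each component is its squared standard deviation
explained <- iris.pca$sdev^2 / sum(iris.pca$sdev^2)

# Running total of the explained variance
cumulative <- cumsum(explained)

round(explained, 4)
round(cumulative, 4)
```

The first two components should together account for roughly 96% of the variance, which is why plotting only PC1 and PC2 is reasonable for this data set.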
Example 1: Visualize the PCA with Ellipses Using the factoextra Package
We can plot the ellipses using the fviz_pca_ind() function of the factoextra package. The habillage argument takes a factor variable, here iris$Species, and colors the observations by group. To add the ellipses to the plot, we also set addEllipses=TRUE inside the function:
fviz_pca_ind(iris.pca, habillage=iris$Species, addEllipses=TRUE)
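The look of the ellipses can be adjusted through further arguments that fviz_pca_ind() passes on to the underlying plotting functions. The sketch below assumes the documented arguments label, ellipse.type and ellipse.level; the chosen values are only an illustration and are not part of the original example:

```r
library(factoextra)

iris.pca <- prcomp(iris[, -5], scale. = TRUE)

# Hide the point labels and request 95% confidence ellipses
# around the group means instead of the default ellipse type
p <- fviz_pca_ind(iris.pca,
                  label = "none",
                  habillage = iris$Species,
                  addEllipses = TRUE,
                  ellipse.type = "confidence",
                  ellipse.level = 0.95)
p
```

Confidence ellipses describe the uncertainty of the group centers, so they are usually much smaller than ellipses that cover the observations themselves.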
Example 2: Visualize the PCA with Ellipses Using the ggplot2 Package
We can also use the ggplot2 package to plot ellipses around the data. To achieve this, we will first specify our first two principal components from the PCA:
PC1 <- iris.pca$x[, 1]
PC2 <- iris.pca$x[, 2]
Now, we can plot them using the ggplot() function, which will draw the individuals as points using geom_point() and the ellipses around the data in each group using stat_ellipse():
ggplot(iris, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  stat_ellipse()
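As a variation, the scores and the grouping variable can be stored in a single data frame, so that the plot does not rely on separate PC1 and PC2 vectors in the workspace. This is a sketch under our own assumptions: the data frame name scores is hypothetical, and the type and level values of stat_ellipse() are set explicitly only for illustration:

```r
library(ggplot2)

iris.pca <- prcomp(iris[, -5], scale. = TRUE)

# Keep the first two score columns together with the species labels
scores <- data.frame(iris.pca$x[, 1:2], Species = iris$Species)

p <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  # type = "norm" assumes a multivariate normal distribution;
  # the stat_ellipse() default is type = "t"
  stat_ellipse(type = "norm", level = 0.95)
p
```

Lowering level shrinks the ellipses, which can help when the groups overlap strongly.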
Please note that the axes are labeled differently depending on the package used. With the factoextra package, the axes are labeled Dim1 and Dim2 and include the percentage of explained variance. With the ggplot2 package, the axis labels simply contain the names of the principal components.
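If the percentage of explained variance should also appear on the ggplot2 axes, it can be computed from the PCA object and added with labs(). This is only a sketch; the names scores and pct and the label format are our own assumptions:

```r
library(ggplot2)

iris.pca <- prcomp(iris[, -5], scale. = TRUE)
scores <- data.frame(iris.pca$x[, 1:2], Species = iris$Species)

# Percentage of variance explained by each component, rounded to one decimal
pct <- round(100 * iris.pca$sdev^2 / sum(iris.pca$sdev^2), 1)

p <- ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
  geom_point() +
  stat_ellipse() +
  # Build axis labels that include the explained variance
  labs(x = paste0("PC1 (", pct[1], "%)"),
       y = paste0("PC2 (", pct[2], "%)"))
p
```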
Video, Further Resources & Summary
In case you need more explanations on how to draw an ellipse plot for groups of a PCA in R, you should have a look at the following YouTube video of the Statistics Globe YouTube channel.
The YouTube video will be added soon.
You may also want to check out these other articles on Statistics Globe:
- Biplot of PCA in R
- Mode Imputation (How to Impute Categorical Variables Using R)
- 3D Plot of PCA in R
- Can PCA be Used for Categorical Variables?
- Introduction to R
This post has shown how to draw an ellipse plot for groups in a PCA in R. In case you have further questions, you may leave a comment below.
This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get further information about her academic background and the other articles she has written for Statistics Globe.