Scatterplot of PCA in R (2 Examples)
In this tutorial, you’ll learn how to create a scatterplot of a Principal Component Analysis (PCA) in the R programming language.
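The tutorial covers the following content:
- Example Data & Add-On Libraries
- Principal Component Analysis
- Example 1: Scatterplot of PCA Using ggfortify
- Example 2: Scatterplot of PCA Using ggplot2
- Video, Further Resources & Summary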
Let’s go straight to the code.
Example Data & Add-On Libraries
The first step in this tutorial is to install the needed packages: ggfortify and ggplot2.
You can install them using the following code:
```r
install.packages("ggfortify")
install.packages("ggplot2")
```
Next, you can load the libraries before we continue with the implementation:
```r
library(ggfortify)
library(ggplot2)
```
Now we need to create some data to be used in the following examples.
```r
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)
head(data)
#           x         y          z
# 1 1.7647252 1.5626330 0.20379365
# 2 0.9950984 0.1961558 0.07370991
# 3 2.1333172 4.6483993 1.11783870
# 4 0.1789901 0.9066008 0.35732718
# 5 0.4838006 1.1706268 1.25779152
# 6 0.6438660 1.3055564 1.73172039
```
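Because y is generated from x, and z from y, the three variables are linked by construction, and this shared variance is what PCA summarizes. As an optional check that is not part of the original workflow, we can look at the correlation matrix. The block recreates the example data so that it runs on its own; the name cor_matrix is just an illustrative choice.

```r
# Recreate the example data so this block runs on its own
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)

# Pairwise Pearson correlations between the three variables
cor_matrix <- cor(data)
round(cor_matrix, 2)
```

Off-diagonal values clearly different from zero indicate that fewer than three components may capture most of the variation.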
After a quick view of the data, it is time to perform the PCA!
Principal Component Analysis
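To perform the PCA, we use the prcomp() function. We specify scale = TRUE so that each variable is standardized to unit variance before the components are computed; otherwise, variables with larger variances would dominate the result. If you wonder why this matters, check out our tutorial Principal Component Analysis (PCA) in R.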
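To illustrate what scale = TRUE does, the following sketch (an addition for illustration, not part of the original tutorial) standardizes the columns manually with base R's scale() and checks that prcomp() returns the same component standard deviations either way. The object names pca_scaled, data_std, and pca_manual are hypothetical.

```r
# Recreate the example data so this block runs on its own
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)

# Option 1: let prcomp() standardize the columns internally
pca_scaled <- prcomp(data, scale = TRUE)

# Option 2: standardize manually (subtract column means, divide by column SDs)
data_std <- scale(data, center = TRUE, scale = TRUE)
apply(data_std, 2, sd)  # every column now has standard deviation 1
pca_manual <- prcomp(data_std)

# Both routes yield the same component standard deviations
all.equal(pca_scaled$sdev, pca_manual$sdev)
```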
```r
df_pca <- prcomp(data, scale = TRUE)
summary(df_pca)
# Importance of components:
#                          PC1    PC2    PC3
# Standard deviation     1.320 0.9387 0.6130
# Proportion of Variance 0.581 0.2937 0.1253
# Cumulative Proportion  0.581 0.8747 1.0000
```
As shown in the output above, the first two principal components together explain 87.5% of the total variance (58.1% + 29.37%), which is enough to represent the data well in two dimensions. Next, we visualize in a scatterplot how the observations are represented by the first two principal components.
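The percentages in the summary can be reproduced by hand: the variance of each component is its squared standard deviation, and with standardized data these variances are the eigenvalues of the correlation matrix, which sum to the number of variables. The sketch below (names such as pc_var and cum_var are illustrative) recomputes the proportions and their running total.

```r
# Recreate the example data and the PCA so this block runs on its own
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)
df_pca <- prcomp(data, scale = TRUE)

# Variance of each component = squared standard deviation
pc_var <- df_pca$sdev^2

# With standardized data, these are the eigenvalues of the correlation matrix
all.equal(pc_var, eigen(cor(data))$values)

# Proportion of the total variance per component, and the running total
prop_var <- pc_var / sum(pc_var)
cum_var <- cumsum(prop_var)
round(100 * prop_var, 2)
round(100 * cum_var, 2)
```

Since the three standardized variables each have variance 1, sum(pc_var) equals 3 here.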
Example 1: Scatterplot of PCA Using ggfortify
To create a scatterplot using the ggfortify package, we need to use the autoplot() function as follows.
```r
autoplot(df_pca, colour = "blue")
```
The autoplot() function shows the observations in a reduced, two-dimensional space spanned by the first two principal components, and it labels each axis with the percentage of variance explained by that component.
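As a hedged sketch, assuming the autoplot() method that ggfortify provides for prcomp objects accepts the arguments x, y (which components to plot) and scale (rescaling of the scores, disabled by 0), as in recent ggfortify versions, the plot can be adapted as follows. Please check the ggfortify documentation for your installed version; the object names p_13 and p_raw are hypothetical.

```r
library(ggfortify)  # autoplot() methods for prcomp objects
library(ggplot2)

# Recreate the example data and the PCA so this block runs on its own
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)
df_pca <- prcomp(data, scale = TRUE)

# Assumption: x and y choose the components on the axes (here PC1 vs. PC3)
p_13 <- autoplot(df_pca, x = 1, y = 3, colour = "blue")

# Assumption: scale = 0 turns off autoplot()'s default rescaling of the scores,
# so the axes show the raw values stored in df_pca$x
p_raw <- autoplot(df_pca, scale = 0, colour = "blue")

print(p_13)
print(p_raw)
```

With scale = 0, the axis ranges should match those of the ggplot2 version in Example 2, which plots the raw scores.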
Example 2: Scatterplot of PCA Using ggplot2
Alternatively, we can visualize the PCA results with the ggplot2 package. To do so, we first extract the scores of the first and second principal components from the x element of the prcomp object:
```r
PC1 <- df_pca$x[, 1]
PC2 <- df_pca$x[, 2]
```
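The scores stored in df_pca$x are simply the standardized data projected onto the loadings in df_pca$rotation. The sketch below (scores_manual is an illustrative name) rebuilds them from the center and scale values that prcomp() returns and confirms that they match.

```r
# Recreate the example data and the PCA so this block runs on its own
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)
df_pca <- prcomp(data, scale = TRUE)

# Standardize the data with the same centers and scales prcomp() used
data_std <- scale(data, center = df_pca$center, scale = df_pca$scale)

# Project onto the loadings (rotation matrix) to obtain the scores
scores_manual <- data_std %*% df_pca$rotation

# Matches df_pca$x, whose first two columns are the PC1 and PC2 scores
all.equal(unname(scores_manual), unname(df_pca$x))
```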
Now we can use the ggplot2 package to visualize the observations in 2D space. To draw the scatterplot, we combine ggplot() with geom_point(). Since ggplot2, unlike ggfortify, is not specialized for PCA, we set the axis labels explicitly to show the percentage of variance explained by each component, as in the ggfortify plot.
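```r
# Combine the extracted scores in a data frame and plot PC2 against PC1
ggplot(data = data.frame(PC1, PC2), aes(x = PC1, y = PC2)) +
  geom_point(colour = "magenta") +
  xlab("PC1 (58.1%)") +
  ylab("PC2 (29.37%)")
```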
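Instead of typing the percentages by hand, the axis labels can be built from the proportion-of-variance row of summary(df_pca)$importance, so they stay correct if the data change. This is a sketch; the names scores, prop_var, x_lab, and y_lab are illustrative, and rounding to one decimal is a formatting choice (PC2 then reads 29.4% rather than 29.37%).

```r
library(ggplot2)

# Recreate the example data and the PCA so this block runs on its own
set.seed(999991)
x <- abs(rnorm(100))
y <- abs(rnorm(100) + 0.3 * x^3)
z <- abs(rnorm(100) + 0.3 * y)
data <- data.frame(x, y, z)
df_pca <- prcomp(data, scale = TRUE)

# All scores in one data frame with columns PC1, PC2, PC3
scores <- as.data.frame(df_pca$x)

# Row 2 of the importance matrix holds the proportion of variance per component
prop_var <- summary(df_pca)$importance[2, ]
x_lab <- sprintf("PC1 (%.1f%%)", 100 * prop_var[1])
y_lab <- sprintf("PC2 (%.1f%%)", 100 * prop_var[2])

p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point(colour = "magenta") +
  xlab(x_lab) +
  ylab(y_lab)
print(p)
```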
Video, Further Resources & Summary
Do you need more explanations on how to create a scatterplot of a PCA in R? Then you should have a look at the following YouTube video of the Statistics Globe YouTube channel.
The YouTube video will be added soon.
Don’t forget to have a look at some other tutorials on Statistics Globe:
- PCA Using Correlation & Covariance Matrix
- Point Cloud of PCA in R
- Choose Optimal Number of Components for PCA
- Autoplot of PCA in R
- Introduction to R
This post has shown how to draw a scatterplot of a PCA in R using the ggfortify and ggplot2 packages. In case you have further questions, you may leave a comment below.