sample_n & sample_frac R Functions | Sample Data with dplyr Package
This article shows how to take a sample of a data set with the sample_n and sample_frac functions of the dplyr package in the R programming language.
The post is structured as follows:
- Creating Example Data
- Example 1: Sampling N Cases with sample_n Function
- Example 2: Sampling Fraction of Data with sample_frac Function
- Video & Further Resources
Let’s start right away:
Creating Example Data
In the examples of this R tutorial, we’ll use the following data frame as a basis:
data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Create example data
                   x2 = c("A", "A", "C", "A", "B", "C"))
data                                            # Print example data
#   x1 x2
# 1  1  A
# 2  2  A
# 3  1  C
# 4  3  A
# 5  2  B
# 6  3  C
Our data contains six rows and two columns. Note that we could also use a tibble instead of a data frame.
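The tibble variant mentioned above could look roughly like this. It is a minimal sketch that assumes dplyr is already installed; the object names data_tbl and tbl_sample are made up for illustration:

```r
library("dplyr")                       # as_tibble() is re-exported by dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),
                   x2 = c("A", "A", "C", "A", "B", "C"))

data_tbl <- as_tibble(data)            # Convert data frame to tibble
class(data_tbl)                        # "tbl_df" "tbl" "data.frame"

tbl_sample <- sample_n(data_tbl, 3)    # Sampling works the same way on a tibble
nrow(tbl_sample)                       # 3
```

The sampled result is again a tibble, so the rest of this tutorial applies to both data structures.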
In order to make the sample_n and sample_frac functions of the dplyr package available, we need to install and load the package in R:
install.packages("dplyr")    # Install dplyr
library("dplyr")             # Load dplyr
Since we are going to randomly sample data, it also makes sense to set a seed for reproducibility:
set.seed(15151) # Set seed
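To see what the seed does in practice, here is a small sketch: drawing the same sample twice after setting the same seed should give identical results. The object names first_draw and second_draw are only illustrative.

```r
library("dplyr")

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                        # Set seed
first_draw <- sample_n(data, 3)        # First random draw

set.seed(15151)                        # Set the same seed again
second_draw <- sample_n(data, 3)       # Second random draw

identical(first_draw, second_draw)     # TRUE

set.seed(15151)                        # Reset seed before continuing
```

The final set.seed call resets the random number generator, so that code run afterwards starts from the same state as in the rest of this tutorial.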
Now we are set up and can move on to the application of the sample_n and sample_frac functions…
Example 1: Sampling N Cases with sample_n Function
Example 1 shows how to apply the sample_n function. The sample_n function returns a random sample of a specified size from our original data frame. By default, the rows are drawn without replacement.
Let’s assume that we want to extract a subsample of three cases. Then, we can apply the sample_n command as follows:
sample_n(data, 3)    # Apply sample_n
#   x1 x2
# 1  3  C
# 2  2  A
# 3  1  C
The previous RStudio console output shows the result: a subset of our data frame with three rows.
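In case you need more rows than the data frame contains, the replace argument of sample_n can be set to TRUE. The following is a minimal sketch with illustrative object names (too_large, big_sample); without replacement, a request for ten rows from six fails with an error:

```r
library("dplyr")

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                     # Set seed

too_large <- try(sample_n(data, 10),                # More rows than available
                 silent = TRUE)
inherits(too_large, "try-error")                    # TRUE

big_sample <- sample_n(data, 10, replace = TRUE)    # Sample with replacement
nrow(big_sample)                                    # 10
```

With replace = TRUE, the same row of the original data may appear several times in the output.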
Example 2: Sampling Fraction of Data with sample_frac Function
In contrast to sample_n, the sample_frac function samples a fraction (i.e. a percentage of the rows) of the input data frame. For instance, we can sample a fraction of 33% with the following R code:
sample_frac(data, 0.33)    # Apply sample_frac
#   x1 x2
# 1  2  A
# 2  3  C
Since 33% of six rows is 1.98, which is rounded to two, the sample_frac function retains two rows of our original data.
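The row count can also be checked in code. The sketch below assumes that sample_frac rounds the fraction times the number of rows to the nearest whole number, which matches the output shown above. It also shows slice_sample, which recent dplyr versions (1.0.0 and later) recommend in place of sample_n and sample_frac; the object names are illustrative.

```r
library("dplyr")

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                               # Set seed

n_expected <- round(0.33 * nrow(data))        # 0.33 * 6 = 1.98, rounded to 2
frac_sample <- sample_frac(data, 0.33)        # Apply sample_frac
nrow(frac_sample) == n_expected               # TRUE

n_sample <- slice_sample(data, n = 3)         # Counterpart to sample_n
prop_sample <- slice_sample(data, prop = 0.5) # Counterpart to sample_frac
nrow(n_sample)                                # 3
nrow(prop_sample)                             # 3
```

A fraction of 0.5 is used for slice_sample here, because half of six rows is exactly three and no rounding is involved.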
Video & Further Resources
Do you need more information on the R code of this article? Then you may want to watch the accompanying video on my YouTube channel, in which I illustrate the R code of this article.
Furthermore, I recommend having a look at the other R tutorials on this website. A selection of tutorials about the dplyr package and the sampling of data in R can be found here:
- sample Function in R
- Sample Random Rows of Data Frame
- dplyr Package in R
- R Functions List (+ Examples)
- The R Programming Language
This article explained how to select random rows of a data frame or tibble with the dplyr package in R programming. Please let me know in the comments if you have any additional questions. In addition, please subscribe to my email newsletter for updates on the newest tutorials.