sample_n & sample_frac R Functions | Sample Data with dplyr Package

 

This article shows how to take a sample of a data set with the sample_n and sample_frac functions of the dplyr package in the R programming language.


Let’s start right away:

 

Creating Example Data

In the examples of this R tutorial, we’ll use the following data frame as a basis:

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Create example data
                   x2 = c("A", "A", "C", "A", "B", "C"))
data                                                      # Print example data
#   x1 x2
# 1  1  A
# 2  2  A
# 3  1  C
# 4  3  A
# 5  2  B
# 6  3  C

Our data contains six rows and two columns. Note that we could also use a tibble instead of a data frame.
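To illustrate the tibble remark, here is a minimal sketch that builds the same example data as a tibble and samples from it. It assumes that dplyr is already installed (installation is covered in the next step); the name `data_tbl` is just an illustrative choice and does not appear elsewhere in this article.

```r
library("dplyr")                                          # tibble() is re-exported by dplyr

data_tbl <- tibble(x1 = c(1, 2, 1, 3, 2, 3),              # Same example data as a tibble
                   x2 = c("A", "A", "C", "A", "B", "C"))
class(data_tbl)                                           # "tbl_df" "tbl" "data.frame"
tbl_sample <- sample_n(data_tbl, 3)                       # Three random rows
tbl_sample                                                # The result is again a tibble
```

The sampling functions behave the same way for both input types; the only visible difference is that the output keeps the class of the input.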

In order to make the sample_n and sample_frac functions of the dplyr package available, we need to install and load the package:

install.packages("dplyr")                                 # Install dplyr
library("dplyr")                                          # Load dplyr

Since we are going to randomly sample data, it also makes sense to set a seed for reproducibility:

set.seed(15151)                                           # Set seed
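To see what the seed buys us, the following sketch resets the seed before two identical sample_n calls and checks that both draws match. The names `sample_a` and `sample_b` are illustrative only. Keep in mind that the specific rows you get may differ from the outputs printed in this article, because they depend on the random number generator of your R version.

```r
library("dplyr")

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Example data from above
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                           # Set seed
sample_a <- sample_n(data, 3)                             # First draw
set.seed(15151)                                           # Reset to the same seed
sample_b <- sample_n(data, 3)                             # Second draw
identical(sample_a, sample_b)                             # TRUE: same seed, same sample
```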

Now we are set up and can move on to applying the sample_n and sample_frac functions.

 

Example 1: Sampling N Cases with sample_n Function

Example 1 shows how to apply the sample_n function. The sample_n function returns a fixed number of randomly selected rows of our original data frame.

Let’s assume that we want to extract a subsample of three cases. Then, we can apply the sample_n command as follows:

sample_n(data, 3)                                         # Apply sample_n
#   x1 x2
# 1  3  C
# 2  2  A
# 3  1  C

The previous RStudio console output shows the result: a subset of our data frame with three randomly selected rows.
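By default, sample_n draws rows without replacement, so the sample size cannot exceed the number of rows in the data. The sketch below shows two variations: the `replace = TRUE` argument, which allows drawing more rows than the data contains, and sampling on grouped data, where sample_n draws the requested number of rows within each group. The names `boot` and `per_group` are illustrative only.

```r
library("dplyr")

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Example data from above
                   x2 = c("A", "A", "C", "A", "B", "C"))
set.seed(15151)                                           # Set seed

# sample_n(data, 10) would throw an error, since data has only six rows;
# with replace = TRUE, the same row can be drawn more than once
boot <- sample_n(data, 10, replace = TRUE)
nrow(boot)                                                # 10

# On grouped data, sample_n draws the given number of rows per group
per_group <- data %>%
  group_by(x2) %>%
  sample_n(1)
per_group                                                 # One row for each of A, B and C
```

Sampling with replacement is what you would use, for instance, to draw bootstrap samples of the same size as the original data.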

 

Example 2: Sampling Fraction of Data with sample_frac Function

In contrast to sample_n, the sample_frac function samples a fraction (i.e. a proportion) of the rows of the input data frame. For instance, we can sample 33% of the rows with the following R code:

sample_frac(data, 0.33)                                   # Apply sample_frac
#   x1 x2
# 1  2  A
# 2  3  C

Since 33% of six rows is 1.98, which is rounded to two, the sample_frac function retains two rows of our original data.
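The arithmetic behind this row count can be checked directly, as in the sketch below. It also shows slice_sample(), which supersedes sample_n and sample_frac in dplyr 1.0.0 and later through its `n` and `prop` arguments. Note that, according to the dplyr documentation, `prop` in slice_sample is rounded towards zero rather than to the nearest integer, so this sketch uses `prop = 0.5` to keep the row count unambiguous. The object names are illustrative only.

```r
library("dplyr")

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Example data from above
                   x2 = c("A", "A", "C", "A", "B", "C"))

0.33 * nrow(data)                                         # 1.98
round(0.33 * nrow(data))                                  # 2

set.seed(15151)                                           # Set seed
frac_sample <- sample_frac(data, 0.33)                    # Sample 33% of the rows
nrow(frac_sample)                                         # 2

# Newer alternative (dplyr >= 1.0.0): slice_sample()
n_sample    <- slice_sample(data, n = 3)                  # Like sample_n(data, 3)
prop_sample <- slice_sample(data, prop = 0.5)             # Like sample_frac(data, 0.5)
```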

 

Video & Further Resources

Do you need more information on the R code of this article? Then you may watch the following video on my YouTube channel, in which I illustrate the R programming code of this article:

 

The YouTube video will be added soon.

 

Furthermore, I recommend having a look at the other R tutorials on this website, including tutorials about the dplyr package and about sampling data in R.

 

This article explained how to select random rows of a data frame or tibble with the dplyr package in R programming. Please let me know in the comments if you have any additional questions. In addition, please subscribe to my email newsletter for updates on the newest tutorials.

 


