sample_n & sample_frac R Functions | Sample Data with dplyr Package

 

This article shows how to take a sample of a data set with the sample_n and sample_frac functions of the dplyr package in the R programming language.

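The post is structured as follows: creating example data, sampling N cases with sample_n, sampling a fraction of the data with sample_frac, and further resources.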

Let’s start right away:

 

Creating Example Data

In the examples of this R tutorial, we’ll use the following data frame as a basis:

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Create example data
                   x2 = c("A", "A", "C", "A", "B", "C"))
data                                                      # Print example data
#   x1 x2
# 1  1  A
# 2  2  A
# 3  1  C
# 4  3  A
# 5  2  B
# 6  3  C

Our data contains six rows and two columns. Note that we could also use a tibble instead of a data frame.
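
As a minimal sketch of the tibble alternative mentioned above, the data frame could be converted with as_tibble, which dplyr re-exports from the tibble package. The object name data_tbl is only an illustrative choice, and the code assumes that dplyr is already installed (installation is covered in the next step):

```r
library("dplyr")                                          # Provides as_tibble (re-exported from tibble)

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

data_tbl <- as_tibble(data)                               # Convert to tibble (data_tbl is a hypothetical name)
class(data_tbl)                                           # Check class of converted object
dim(data_tbl)                                             # Number of rows and columns is unchanged
```

The sampling functions shown in the examples below should accept data_tbl in the same way as data.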

To make the sample_n and sample_frac functions available, we first need to install and load the dplyr package:

install.packages("dplyr")                                 # Install dplyr
library("dplyr")                                          # Load dplyr

Since we are going to randomly sample data, it also makes sense to set a seed for reproducibility:

set.seed(15151)                                           # Set seed
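
To illustrate what the seed does, the following sketch draws the same sample twice, resetting the seed before each draw. It uses the sample_n function that is introduced in Example 1; the object names sample_a and sample_b are only illustrative:

```r
library("dplyr")                                          # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                           # Set seed
sample_a <- sample_n(data, 3)                             # First draw (hypothetical object name)

set.seed(15151)                                           # Reset the same seed
sample_b <- sample_n(data, 3)                             # Second draw (hypothetical object name)

identical(sample_a, sample_b)                             # Check whether both draws are the same
```

Without the second set.seed call, the two draws would usually differ. Note that the rows selected for a given seed can depend on the R version.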

Now we are set up and can move on to the application of the sample_n and sample_frac functions…

 

Example 1: Sampling N Cases with sample_n Function

Example 1 shows how to apply the sample_n function. The sample_n function returns a random sample of a specified number of rows from our original data frame.

Let’s assume that we want to extract a subsample of three cases. Then, we can apply the sample_n command as follows:

sample_n(data, 3)                                         # Apply sample_n
#   x1 x2
# 1  3  C
# 2  2  A
# 3  1  C

The previous RStudio console output shows the result: a subset of our data frame with three rows.
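
As a hedged addition to this example, the sketch below shows two variations that are not part of the original tutorial. The replace argument of sample_n allows sampling with replacement, so the sample can contain more rows than the input data. In dplyr 1.0.0 and later, sample_n has also been superseded by slice_sample, which should return a sample of the same size. The object names big_sample and small_sample are only illustrative:

```r
library("dplyr")                                          # Load dplyr (slice_sample needs version 1.0.0 or later)

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                           # Set seed

big_sample <- sample_n(data, 10, replace = TRUE)          # Ten rows, drawn with replacement
big_sample                                                # Rows of data can appear several times

small_sample <- slice_sample(data, n = 3)                 # Three rows with the newer slice_sample function
small_sample
```

Without replace = TRUE, asking sample_n for more rows than the data contains results in an error.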

 

Example 2: Sampling Fraction of Data with sample_frac Function

In contrast to sample_n, the sample_frac function samples a fraction (i.e. a percentage) of the rows of the input data frame. For instance, we can sample a fraction of 33% with the following R code:

sample_frac(data, 0.33)                                   # Apply sample_frac
#   x1 x2
# 1  2  A
# 2  3  C

Since 33% of six rows is approximately two (0.33 × 6 = 1.98), the sample_frac function retains two rows of our original data.
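
The row count can be checked with a short sketch. It assumes that sample_frac rounds the product of the fraction and the number of rows to the nearest whole number, which is consistent with the two rows shown above. The object names expected_rows, frac_sample, and half_sample are only illustrative:

```r
library("dplyr")                                          # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),              # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

expected_rows <- round(0.33 * nrow(data))                 # 0.33 * 6 = 1.98, rounded to 2
expected_rows

set.seed(15151)                                           # Set seed
frac_sample <- sample_frac(data, 0.33)                    # Sample 33% of the rows
nrow(frac_sample)                                         # Compare with expected_rows

half_sample <- sample_frac(data, 0.5)                     # Sample 50% of the rows
nrow(half_sample)                                         # 0.5 * 6 = 3 rows
```

For fractions that do not yield a whole number of rows, it is worth checking nrow on the result rather than relying on the nominal percentage.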

 

Video & Further Resources

Do you need more information on the R code of this article? Then you may want to watch the following video on my YouTube channel, in which I illustrate the R code of this article:

 

 

Furthermore, I recommend having a look at the other RStudio tutorials on this homepage. A selection of tutorials about the dplyr package and the sampling of data in R can be found here.

 

This article explained how to select random rows of a data frame or tibble with the dplyr package in R programming. Please let me know in the comments if you have any additional questions. In addition, please subscribe to my email newsletter for updates on the newest tutorials.

 
