sample_n & sample_frac R Functions | Sample Data with dplyr Package
This article shows how to take a sample of a data set with the sample_n and sample_frac functions of the dplyr package in the R programming language.
The post is structured as follows:
- Creating Example Data
- Example 1: Sampling N Cases with sample_n Function
- Example 2: Sampling Fraction of Data with sample_frac Function
- Video & Further Resources
Let’s start right away:
Creating Example Data
In the examples of this R tutorial, we’ll use the following data frame as a basis:
data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Create example data
                   x2 = c("A", "A", "C", "A", "B", "C"))
data                                            # Print example data
#   x1 x2
# 1  1  A
# 2  2  A
# 3  1  C
# 4  3  A
# 5  2  B
# 6  3  C
Our data contains six rows and two columns. Note that we could also use a tibble instead of a data frame.
To make the sample_n and sample_frac functions available, we need to install and load the dplyr package:
install.packages("dplyr")                       # Install dplyr
library("dplyr")                                # Load dplyr
Since we are going to randomly sample data, it also makes sense to set a seed for reproducibility:
set.seed(15151) # Set seed
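To illustrate what the seed does, here is a minimal sketch that draws the same sample twice, resetting the seed before each draw. It assumes that dplyr is installed; the object names sample_a and sample_b are only illustrative and do not appear elsewhere in this tutorial.

```r
library("dplyr")                                # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                 # Set seed
sample_a <- sample_n(data, 3)                   # First draw (illustrative name)

set.seed(15151)                                 # Reset to the same seed
sample_b <- sample_n(data, 3)                   # Second draw (illustrative name)

identical(sample_a, sample_b)                   # Compare the two draws
```

Both draws start from the same random number generator state, so they select the same rows. Note that running this snippet advances the generator, so call set.seed(15151) once more before moving on if you want to continue from the same starting state.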
Now we are set up and can move on to the application of the sample_n and sample_frac functions…
Example 1: Sampling N Cases with sample_n Function
Example 1 shows how to apply the sample_n function, which returns a random sample of a fixed number of rows from our original data frame.
Let’s assume that we want to extract a subsample of three cases. Then, we can apply the sample_n command as follows:
sample_n(data, 3)                               # Apply sample_n
#   x1 x2
# 1  3  C
# 2  2  A
# 3  1  C
The previous RStudio console output shows the result: a subset of our data frame with three rows.
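As mentioned when we created the example data, a tibble can be used instead of a data frame. The following sketch converts the example data with as_tibble (which dplyr re-exports from the tibble package) and samples from it in the same way. The names data_tbl and tbl_sample are only illustrative.

```r
library("dplyr")                                # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

data_tbl <- as_tibble(data)                     # Convert data frame to tibble

set.seed(15151)                                 # Set seed
tbl_sample <- sample_n(data_tbl, 3)             # Sample three rows of tibble

nrow(tbl_sample)                                # Number of sampled rows
class(tbl_sample)                               # Class of the result
```

The result should again contain three rows, and it keeps the tibble class of the input.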
Example 2: Sampling Fraction of Data with sample_frac Function
In contrast to sample_n, the sample_frac function samples a fraction (i.e. a proportion) of the rows of the input data frame. For instance, we can sample a fraction of 33% with the following R code:
sample_frac(data, 0.33)                         # Apply sample_frac
#   x1 x2
# 1  2  A
# 2  3  C
Since 33% of six rows is 1.98, which is rounded to two, the sample_frac function retains two rows of our original data.
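The row count can be checked by hand. The sketch below computes the expected number of rows and compares it with what sample_frac actually returns. It assumes that sample_frac rounds the product of fraction and row count to the nearest whole number, which is consistent with the output above; the name frac_sample is only illustrative.

```r
library("dplyr")                                # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Recreate example data
                   x2 = c("A", "A", "C", "A", "B", "C"))

0.33 * nrow(data)                               # Exact fraction of six rows
round(0.33 * nrow(data))                        # Rounded to whole rows

set.seed(15151)                                 # Set seed
frac_sample <- sample_frac(data, 0.33)          # Apply sample_frac
nrow(frac_sample)                               # Rows actually drawn
```

If a fraction does not map to a whole number of rows, it is worth checking nrow of the result in this way rather than relying on the percentage alone.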
Video & Further Resources
Do you need more information on the R code of this article? Then you may watch the following video on my YouTube channel, in which I illustrate the R programming code of this article:
Furthermore, I recommend having a look at the other RStudio tutorials on this homepage. A selection of tutorials about the dplyr package and the sampling of data in R can be found below:
- sample Function in R
- Sample Random Rows of Data Frame
- dplyr Package in R
- R Functions List (+ Examples)
- The R Programming Language
This article explained how to select random rows of a data frame or tibble with the dplyr package in R programming. If you have any additional questions, please let me know in the comments. In addition, please subscribe to my email newsletter for updates on the newest tutorials.