# sample_n & sample_frac R Functions | Sample Data with dplyr Package

This article shows how to **take a sample of a data set with the sample_n and sample_frac functions** of the dplyr package in the R programming language.

The post is structured as follows:

- Creating Example Data
- Example 1: Sampling N Cases with sample_n Function
- Example 2: Sampling Fraction of Data with sample_frac Function
- Video & Further Resources

Let’s start right away:

## Creating Example Data

In the examples of this R tutorial, we’ll use the following data frame as a basis:

```r
data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Create example data
                   x2 = c("A", "A", "C", "A", "B", "C"))
data                                            # Print example data
#   x1 x2
# 1  1  A
# 2  2  A
# 3  1  C
# 4  3  A
# 5  2  B
# 6  3  C
```

Our data contains six rows and two columns. Note that we could also use a tibble instead of a data frame.

To make the sample_n and sample_frac functions available, we need to install and load the dplyr package:

```r
install.packages("dplyr")                       # Install dplyr
library("dplyr")                                # Load dplyr
```
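As noted above, a tibble could be used instead of a data frame. The following is a minimal sketch of that conversion, assuming dplyr is already installed. The object name `data_tbl` is introduced here for illustration only; `as_tibble()` is re-exported by dplyr from the tibble package.

```r
library("dplyr")                                # Load dplyr (re-exports as_tibble)

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Same example data as before
                   x2 = c("A", "A", "C", "A", "B", "C"))

data_tbl <- as_tibble(data)                     # Convert to tibble (illustrative name)
class(data_tbl)                                 # Class vector includes "tbl_df"
dim(data_tbl)                                   # Still six rows and two columns
```

The sampling calls shown in the examples of this tutorial should accept `data_tbl` in the same way as `data`.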

Since we are going to randomly sample data, it also makes sense to set a seed for reproducibility:

```r
set.seed(15151)                                 # Set seed
```
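To illustrate what the seed does, the following sketch draws the same sample twice, resetting the seed before each draw. The object names `draw_1` and `draw_2` are assumptions made for this illustration, and dplyr is assumed to be installed.

```r
library("dplyr")                                # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Same example data as before
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                 # Set seed before first draw
draw_1 <- sample_n(data, 3)                     # First random sample

set.seed(15151)                                 # Reset the same seed
draw_2 <- sample_n(data, 3)                     # Second random sample

identical(draw_1, draw_2)                       # Both draws contain the same rows
# [1] TRUE
```

Without resetting the seed, the second draw would generally contain different rows.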

Now we are set up and can move on to the application of the sample_n and sample_frac functions…

## Example 1: Sampling N Cases with sample_n Function

Example 1 shows how to apply the sample_n function. The sample_n function returns a random sample of a specified size from our original data frame.

Let’s assume that we want to extract a subsample of three cases. Then, we can apply the sample_n command as follows:

```r
sample_n(data, 3)                               # Apply sample_n
#   x1 x2
# 1  3  C
# 2  2  A
# 3  1  C
```

The previous RStudio console output shows the result: a subset of our data frame with three rows.
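By default, sample_n draws rows without replacement, so the requested sample size cannot exceed the number of rows in the data. The sketch below shows both cases under that assumption; the object names `too_many` and `with_repl` are introduced for illustration only.

```r
library("dplyr")                                # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Same example data as before
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                 # Set seed

too_many <- tryCatch(sample_n(data, 10),        # 10 > 6 rows without replacement
                     error = function(e) e)     # Capture the error instead of stopping
inherits(too_many, "error")                     # An error was raised
# [1] TRUE

with_repl <- sample_n(data, 10, replace = TRUE) # Allow rows to be drawn repeatedly
nrow(with_repl)                                 # Ten rows, some of them duplicates
# [1] 10
```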

## Example 2: Sampling Fraction of Data with sample_frac Function

In contrast to sample_n, the sample_frac function samples a fraction (i.e. a proportion) of the rows of the input data frame. For instance, we can sample a fraction of 33% with the following R code:

```r
sample_frac(data, 0.33)                         # Apply sample_frac
#   x1 x2
# 1  2  A
# 2  3  C
```

Since 33% of six rows is approximately two (0.33 × 6 = 1.98, which is rounded to 2), the sample_frac function retains two rows of our original data.
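The arithmetic behind this row count can be checked directly. The sketch below assumes that sample_frac rounds the product of the fraction and the number of rows to the nearest integer, which is consistent with the output shown above; the object name `n_rows` is introduced for illustration only.

```r
library("dplyr")                                # Load dplyr

data <- data.frame(x1 = c(1, 2, 1, 3, 2, 3),    # Same example data as before
                   x2 = c("A", "A", "C", "A", "B", "C"))

set.seed(15151)                                 # Set seed

n_rows <- nrow(data)                            # Number of rows in the data
0.33 * n_rows                                   # Exact product is not an integer
# [1] 1.98
round(0.33 * n_rows)                            # Rounded to the nearest integer
# [1] 2
nrow(sample_frac(data, 0.33))                   # Rows returned by sample_frac
# [1] 2
nrow(sample_frac(data, 0.5))                    # Half of six rows
# [1] 3
```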

## Video & Further Resources

Do you need more information on the R code of this article? Then you may want to watch the accompanying video on my YouTube channel, in which I illustrate the R code of this article.

Furthermore, I recommend having a look at the other R tutorials on this website. A selection of tutorials about the dplyr package and the sampling of data in R can be found below:

- sample Function in R
- Sample Random Rows of Data Frame
- dplyr Package in R
- R Functions List (+ Examples)
- The R Programming Language

This article explained how to **select random rows of a data frame or tibble** with the dplyr package in R programming. If you have any additional questions, please let me know in the comments. In addition, please subscribe to my email newsletter for updates on the newest tutorials.
