sample Function in R (6 Examples)

On this page you’ll learn how to take a random sample using the sample function in the R programming language.

Let’s get started…

Definition & Basic R Syntax of sample Function

Definition: The sample R function takes a random sample or permutation of a data object.

Basic R Syntax: In the following, you can find the basic R programming syntax of the sample function.

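sample(x, size, replace = FALSE, prob = NULL)                          # Basic syntax of sample

Here, x is the data object to sample from, size is the number of elements to draw (if size is omitted, all elements are returned in random order), replace specifies whether an element may be drawn more than once, and prob is an optional vector of selection weights.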

In the following, I’ll illustrate in six examples how to use the sample function in R programming.

Example Data

First, let’s construct some example data:

my_vec <- 1:5                                                          # Create example vector
my_vec                                                                 # Print example vector
# 1 2 3 4 5

As you can see based on the previous output of the RStudio console, our example data is a simple numeric vector ranging from 1 to 5.

Generally speaking, whenever we introduce randomness, we should also set a random seed to make our R code reproducible:

set.seed(873465)                                                       # Seed for reproducibility
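
To check what the seed actually does, the following minimal sketch draws the same permutation twice, resetting the seed before each draw, and compares the two results. The names first_draw and second_draw are only for illustration, and the last line restores the seed so that the examples below start from the same state. Note that R 3.6.0 changed the default algorithm behind sample() (see ?RNGkind and its sample.kind argument), so the random numbers shown in this tutorial may differ depending on your R version, even with the same seed.

```r
my_vec <- 1:5                          # Example vector from above

set.seed(873465)                       # Fix the state of the random number generator
first_draw <- sample(my_vec)           # First random permutation

set.seed(873465)                       # Reset to the same state
second_draw <- sample(my_vec)          # Reproduces the first permutation

identical(first_draw, second_draw)     # TRUE: same seed, same result

set.seed(873465)                       # Restore the seed state the examples start from
```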

Now, we are set up to move on to the application of the sample function. So keep on reading!

Example 1: Random Reordering of Data Using sample Function

The following syntax shows how to permute (i.e. randomly reorder) a data object using the sample function in R.

sample(my_vec)                                                         # Random reordering
# 1 3 4 2 5

Our vector ranging from 1 to 5 was permuted so that the output is 1 3 4 2 5.
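
One pitfall when permuting data: if x has length one, is numeric, and is at least 1, sample(x) does not return x but a permutation of 1 to x. The sketch below shows this and defines a small helper, resample(), modeled on the one in the examples section of ?sample, that always draws from the elements of x. The names single_value, resample, and perm are illustrative, not part of base R.

```r
set.seed(873465)

single_value <- 5
sample(single_value)                   # Permutation of 1:5, not the value 5

# Hypothetical helper (modeled on the examples in ?sample):
# always samples from the elements of x, whatever its length
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(single_value)                 # Always returns 5
resample(c(2, 9, 4))                   # Random order of 2, 9, and 4

# A permutation contains every element exactly once
perm <- sample(1:5)
identical(sort(perm), 1:5)             # TRUE
```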

Example 2: Random Sampling without Replacement Using sample Function

The most common use of the sample function is the random subsampling of data. This example shows how to extract three random values from our vector. For this task, we have to specify the size argument of the sample function as shown below:

sample(my_vec, size = 3)                                               # Take subsample
# 2 4 3

The previous R code randomly selected the numbers 2, 4, and 3.
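
Since sampling without replacement never picks the same element twice, we can check the properties of such a subsample directly. The sketch below (subsample and full are illustrative names) also shows that setting size to the length of the vector gives a full permutation, just like Example 1.

```r
my_vec <- 1:5
set.seed(873465)

subsample <- sample(my_vec, size = 3)    # Three values without replacement
anyDuplicated(subsample) == 0            # TRUE: no value appears twice
all(subsample %in% my_vec)               # TRUE: all values come from my_vec

# Drawing all elements without replacement is a permutation
full <- sample(my_vec, size = length(my_vec))
identical(sort(full), my_vec)            # TRUE
```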

Example 3: Random Sampling with Replacement Using sample Function

Have a look at the following error message:

sample(my_vec, size = 10)                                              # Error
# Error in sample.int(length(x), size, replace, prob) : 
#   cannot take a sample larger than the population when 'replace = FALSE'

R is telling us that the requested sample is larger than the population, i.e. the size argument is larger than the number of elements in our data. We tried to draw ten numbers from a vector of length five.

One solution to this problem is sampling with replacement, i.e. each element of our data can be selected multiple times. In the following R code, we set the replace argument to TRUE:

sample(my_vec, size = 10, replace = TRUE)                              # Subsample with replacement
# 3 5 3 2 1 4 1 5 5 4

The RStudio console returns a numeric vector containing ten elements. Note that some of the elements are repeatedly included in the vector (e.g. 3 and 5).
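
To see how often each value was picked, we can tabulate the draws. Converting them to a factor with all values of my_vec as levels makes the table list values that were never drawn as well. Because ten draws are taken from only five values, at least one value must be repeated. A sketch with the illustrative names draws and counts:

```r
my_vec <- 1:5
set.seed(873465)

draws <- sample(my_vec, size = 10, replace = TRUE)  # Ten draws from five values
counts <- table(factor(draws, levels = my_vec))     # Frequency of each value, including zeros
counts
anyDuplicated(draws) > 0                            # TRUE: some value occurs more than once
```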

Example 4: Sampling with Uneven Probabilities Using sample Function

So far, every element of our data had the same probability of being selected. However, it is also possible to select some elements with higher probabilities than others.

The following R code shows how to specify the prob argument of the sample function to modify the selection probabilities, so that element 1 is six times as likely to be drawn as each of the other elements:

sample(my_vec, size = 10, replace = TRUE, prob = c(0.6, rep(0.1, 4)))  # Adjust probabilities
# 3 1 1 1 1 1 1 5 1 1

As you can see based on the previous output of the RStudio console, the value 1 was selected eight out of ten times.
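
With only ten draws, the observed share of 1s can deviate quite a bit from the specified 60 percent. Drawing many values shows the long-run frequency settling near 0.6. According to ?sample, the prob weights need not sum to one, so c(6, 1, 1, 1, 1) describes the same probabilities. A sketch (n_draws, draws, draws_w, and prop_one are illustrative names):

```r
my_vec <- 1:5
set.seed(873465)

n_draws <- 10000
draws <- sample(my_vec, size = n_draws, replace = TRUE,
                prob = c(0.6, rep(0.1, 4)))
prop_one <- mean(draws == 1)           # Share of draws equal to 1, close to 0.6
prop_one

# Weights need not sum to 1 (see ?sample), so these are the same probabilities
draws_w <- sample(my_vec, size = n_draws, replace = TRUE,
                  prob = c(6, 1, 1, 1, 1))
mean(draws_w == 1)                     # Also close to 0.6
```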

Example 5: Random Sampling of Data Frame Rows Using sample Function

We can also use the sample function to extract a random subset of rows from a data frame. The following R programming syntax creates some example data:

my_data <- data.frame(x1 = 1:10,                                       # Create example data
                      x2 = letters[1:10])
my_data                                                                # Print example data
#    x1 x2
# 1   1  a
# 2   2  b
# 3   3  c
# 4   4  d
# 5   5  e
# 6   6  f
# 7   7  g
# 8   8  h
# 9   9  i
# 10 10  j

Our example data frame consists of ten rows and two columns. The variable x1 contains the numbers 1 to 10 and the variable x2 contains the letters a to j.

Now, we can apply the sample command to take a random subset of rows:

my_data_samp <- my_data[sample(1:nrow(my_data), size = 3), ]           # Subsample of data frame rows
my_data_samp                                                           # Print subsampled data
#   x1 x2
# 9  9  i
# 3  3  c
# 7  7  g

The previous code randomly selected the three rows 9, 3, and 7. Note that the ordering of these rows was also randomly chosen.
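
Two variations may be handy here. Omitting size returns all row indices in random order, which shuffles the whole data frame. Writing seq_len(nrow(my_data)) instead of 1:nrow(my_data) is a common defensive habit, since 1:0 evaluates to c(1, 0) for a data frame without rows. The sampled rows also keep their original row names, which is why the output above shows 9, 3, and 7 on the left. A sketch with the illustrative names row_idx and my_data_shuffled:

```r
my_data <- data.frame(x1 = 1:10, x2 = letters[1:10])
set.seed(873465)

row_idx <- sample(seq_len(nrow(my_data)), size = 3)  # Three random row indices
my_data_samp <- my_data[row_idx, ]                   # Subset of three rows

# Omitting size shuffles all rows of the data frame
my_data_shuffled <- my_data[sample(seq_len(nrow(my_data))), ]

rownames(my_data_samp)                               # Original row names are kept
```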

Example 6: Random Sampling of List Elements Using sample Function

Another option provided by the sample function is the subsampling of list elements. First, let’s construct an example list:

my_list <- list(1:3,                                                   # Create example list
                753,
                c("A", "XXX", "Hello"),
                "YYY",
                5)
my_list                                                                # Print example list
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 753
# 
# [[3]]
# [1] "A"     "XXX"   "Hello"
# 
# [[4]]
# [1] "YYY"
# 
# [[5]]
# [1] 5

Our example list consists of five list elements. Now, we can use the following R syntax to randomly select some of the list elements:

my_list_samp <- my_list[sample(1:length(my_list), size = 3)]           # Take subsample of list
my_list_samp                                                           # Print subsampled list
# [[1]]
# [1] 5
# 
# [[2]]
# [1] "YYY"
# 
# [[3]]
# [1] 753

In this example, we have selected three list elements of our input list.
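
Because sample() subsets its input with single brackets, it can also be applied to the list itself without building an index vector first; the result is again a list. A sketch, where the check that every sampled element comes from my_list uses base R's vapply() and identical():

```r
my_list <- list(1:3, 753, c("A", "XXX", "Hello"), "YYY", 5)
set.seed(873465)

my_list_samp <- sample(my_list, size = 3)   # Sample list elements directly
is.list(my_list_samp)                       # TRUE
length(my_list_samp)                        # 3

# Each sampled element is one of the original list elements
all(vapply(my_list_samp,
           function(el) any(vapply(my_list, identical, logical(1), el)),
           logical(1)))                     # TRUE
```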

Summary

In summary: In this R tutorial you learned how to use the sample function to randomly reorder data and to take random samples with and without replacement, with unequal probabilities, and from data frame rows and list elements. If you have additional questions and/or comments, let me know in the comments.
