sample Function in R (6 Examples)

On this page you’ll learn how to take a random sample using the sample function in the R programming language.

Let’s get started…

Definition & Basic R Syntax of sample Function

Definition: The sample R function takes a random sample or permutation of a data object.

Basic R Syntax: In the following, you can find the basic R programming syntax of the sample function.

sample(x, size, replace = FALSE, prob = NULL)                          # Basic syntax of sample
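One detail of this syntax is easy to miss: as described in the R documentation (?sample), if the object you sample from is a single number of at least 1, sample() draws from the sequence 1 to that number instead of returning the number itself. The following minimal sketch shows this behaviour and a small wrapper similar to the one in the examples of ?sample; the name resample is only a label chosen here, not a base R function.

```r
x_long  <- c(10, 20)                                # Two values: sampled as expected
x_short <- 10                                       # One value: treated as 1:10!

length(sample(x_long))                              # 2: a permutation of c(10, 20)
length(sample(x_short))                             # 10: a permutation of 1:10, not just 10

# Hypothetical helper (name chosen for this sketch):
# always samples from the elements of x, whatever its length
resample <- function(x, ...) x[sample.int(length(x), ...)]

length(resample(x_short))                           # 1: only the value 10 itself
resample(x_short, size = 3, replace = TRUE)         # 10 10 10
```

With vectors of length two or more, resample() and sample() behave the same way; the wrapper only matters when the input might have length one.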


In the following, I’ll illustrate in six examples how to use the sample function in R programming.

Example Data

First, let’s construct some example data:

my_vec <- 1:5                                                          # Create example vector
my_vec                                                                 # Print example vector
# 1 2 3 4 5

As you can see based on the previous output of the RStudio console, our example data is a simple numeric vector ranging from 1 to 5.

Generally speaking, whenever we introduce randomness, we should also set a random seed to make our R code reproducible:

set.seed(873465)                                                       # Seed for reproducibility
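To see what the seed does, the following short check draws a permutation twice and resets the seed in between, so both draws are identical. The last line sets the seed again, so the following examples start from the same state. Keep in mind that the exact numbers may still differ across R versions; for instance, R 3.6.0 changed the default method that sample() uses.

```r
set.seed(873465)                                    # Fix the state of the random number generator
first_draw <- sample(1:5)

set.seed(873465)                                    # Reset to exactly the same state ...
second_draw <- sample(1:5)                          # ... so the same permutation comes out again

identical(first_draw, second_draw)                  # TRUE

set.seed(873465)                                    # Restore the seed before continuing
```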

Now, we are set up to move on to the application of the sample function. So keep on reading!

Example 1: Random Reordering of Data Using sample Function

The following syntax shows how to permute (i.e. randomly reorder) a data object using the sample function in R.

sample(my_vec)                                                         # Random reordering
# 1 3 4 2 5

Our vector ranging from 1 to 5 was permuted: the output 1 3 4 2 5 contains the same five values in a random order.
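A permutation never adds or drops values; it only changes their order. The following sketch checks this for a freshly drawn permutation of my_vec. It uses its own, arbitrarily chosen seed, so it changes the random numbers drawn afterwards.

```r
my_vec <- 1:5                                       # Same example vector as above

set.seed(2)                                         # Own, arbitrary seed for this sketch
perm <- sample(my_vec)                              # Random permutation

length(perm) == length(my_vec)                      # TRUE: nothing added or dropped
identical(sort(perm), my_vec)                       # TRUE: same values, only the order changed
```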


Example 2: Random Sampling without Replacement Using sample Function

The most common usage of the sample function is the random subsampling of data. This example explains how to extract three random values from our vector. For this task, we have to specify the size argument of the sample function as shown below:

sample(my_vec, size = 3)                                               # Take subsample
# 2 4 3

The previous R code randomly selected the numbers 2, 4, and 3.
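Because sample() draws without replacement by default (replace = FALSE), each element can appear at most once in the subsample. A minimal check, using an arbitrary seed of its own:

```r
my_vec <- 1:5                                       # Same example vector as above

set.seed(3)                                         # Own, arbitrary seed for this sketch
subsample <- sample(my_vec, size = 3)               # replace = FALSE is the default

anyDuplicated(subsample) == 0                       # TRUE: each element is drawn at most once
all(subsample %in% my_vec)                          # TRUE: all values come from my_vec
```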


Example 3: Random Sampling with Replacement Using sample Function

Have a look at the following error message:

sample(my_vec, size = 10)                                              # Error
# Error in sample.int(length(x), size, replace, prob) : 
#   cannot take a sample larger than the population when 'replace = FALSE'

R tells us that, without replacement, a sample cannot be larger than the population. We set the size argument to ten, but our vector contains only five elements.

One solution to this problem is sampling with replacement, i.e. each element of our data can be selected multiple times. In the following R code, we set the replace argument to TRUE:

sample(my_vec, size = 10, replace = TRUE)                              # Subsample with replacement
# 3 5 3 2 1 4 1 5 5 4

The RStudio console returns a numeric vector containing ten elements. Note that some of the elements are repeatedly included in the vector (e.g. 3 and 5).
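To see which values were drawn more than once, we can count them with table(). In the following sketch (own, arbitrary seed), wrapping the draws in factor() with levels = my_vec also lists values that were not drawn at all, with a count of 0.

```r
my_vec <- 1:5                                       # Same example vector as above

set.seed(4)                                         # Own, arbitrary seed for this sketch
draws  <- sample(my_vec, size = 10, replace = TRUE) # Ten draws with replacement
counts <- table(factor(draws, levels = my_vec))     # Draws per value, zeros included
counts
sum(counts)                                         # 10: one count per draw
```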


Example 4: Sampling with Uneven Probabilities Using sample Function

So far, we have selected the elements of our data with equal probabilities. However, it is also possible to choose some elements with higher probabilities than others.

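The following R programming code shows how to specify the prob argument of the sample function to modify the probabilities of our random selection, so that element 1 is selected with probability 0.6 and each of the other four elements with probability 0.1, i.e. element 1 is six times as likely to be drawn as any other single element: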

sample(my_vec, size = 10, replace = TRUE, prob = c(0.6, rep(0.1, 4)))  # Adjust probabilities
# 3 1 1 1 1 1 1 5 1 1

As you can see based on the previous output of the RStudio console, the value 1 was selected eight out of ten times.
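Ten draws are too few to judge whether the probabilities work as intended. The sketch below draws 10,000 values instead and compares the observed shares with the specified probabilities. It also relies on the fact, stated in ?sample, that prob takes weights that need not sum to one, so c(6, 1, 1, 1, 1) expresses the same ratios as c(0.6, rep(0.1, 4)). The seed is arbitrary, so the exact shares will vary slightly with other seeds.

```r
my_vec <- 1:5                                       # Same example vector as above

set.seed(5)                                         # Own, arbitrary seed for this sketch
draws <- sample(my_vec, size = 10000, replace = TRUE,
                prob = c(6, 1, 1, 1, 1))            # Weights in the ratio 6:1:1:1:1

mean(draws == 1)                                    # Share of 1s, close to 0.6
mean(draws == 2)                                    # Share of 2s, close to 0.1
```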


Example 5: Random Sampling of Data Frame Rows Using sample Function

We can also use the sample function to extract a random subset of rows from a data frame. The following R programming syntax creates some example data:

my_data <- data.frame(x1 = 1:10,                                       # Create example data
                      x2 = letters[1:10])
my_data                                                                # Print example data
#    x1 x2
# 1   1  a
# 2   2  b
# 3   3  c
# 4   4  d
# 5   5  e
# 6   6  f
# 7   7  g
# 8   8  h
# 9   9  i
# 10 10  j

Our example data frame consists of ten rows and two columns. The variable x1 ranges from 1 to 10 and the variable x2 ranges from a to j.

Now, we can apply the sample command to take a random subset of rows:

my_data_samp <- my_data[sample(1:nrow(my_data), size = 3), ]           # Subsample of data frame rows
my_data_samp                                                           # Print subsampled data
#   x1 x2
# 9  9  i
# 3  3  c
# 7  7  g

The previous code randomly selected the three rows 9, 3, and 7. Note that the ordering of these rows was also randomly chosen.
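Two variations may be useful: sorting the sampled indices keeps the rows in their original order, and sampling all row indices shuffles the whole data frame. The sketch below uses seq_len(nrow(my_data)) instead of 1:nrow(my_data), which also behaves sensibly for a data frame with zero rows (1:0 would give c(1, 0)). The seed and the object names are chosen for this illustration only.

```r
my_data <- data.frame(x1 = 1:10,                    # Same example data as above
                      x2 = letters[1:10])

set.seed(6)                                         # Own, arbitrary seed for this sketch
rows <- sample(seq_len(nrow(my_data)), size = 3)    # Three random row indices

my_data_samp_sorted <- my_data[sort(rows), ]        # Random rows, original order kept
my_data_shuffled    <- my_data[sample(seq_len(nrow(my_data))), ]  # All rows, random order

my_data_samp_sorted
nrow(my_data_shuffled)                              # 10
```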


Example 6: Random Sampling of List Elements Using sample Function

Another option provided by the sample function is the subsampling of list elements. First, let’s construct an example list:

my_list <- list(1:3,                                                   # Create example list
                753,
                c("A", "XXX", "Hello"),
                "YYY",
                5)
my_list                                                                # Print example list
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 753
# 
# [[3]]
# [1] "A"     "XXX"   "Hello"
# 
# [[4]]
# [1] "YYY"
# 
# [[5]]
# [1] 5

Our example list consists of five list elements. Now, we can use the following R syntax to randomly select some of the list elements:

my_list_samp <- my_list[sample(1:length(my_list), size = 3)]           # Take subsample of list
my_list_samp                                                           # Print subsampled list
# [[1]]
# [1] 5
# 
# [[2]]
# [1] "YYY"
# 
# [[3]]
# [1] 753

In this example, we have randomly selected three of the five list elements (the fifth, the fourth, and the second element of our input list).
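Since a list is a vector in R, sample() can also be applied to the list directly. In current versions of R it then indexes the list internally, so with the same seed both approaches in the following sketch return the same three elements (checked with identical(); the seed is arbitrary).

```r
my_list <- list(1:3, 753, c("A", "XXX", "Hello"), "YYY", 5)  # Same example list as above

set.seed(7)                                         # Own, arbitrary seed for this sketch
via_index <- my_list[sample(1:length(my_list), size = 3)]    # Approach used above

set.seed(7)                                         # Same seed again ...
direct <- sample(my_list, size = 3)                 # ... sampling the list itself

identical(via_index, direct)                        # TRUE
```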


Video, Further Resources & Summary

Have a look at the following video that I have published on my YouTube channel. I show the R programming syntax of this tutorial in the video:

The YouTube video will be added soon.


In summary: In this R tutorial you learned how to take a simple random sample. If you have additional questions and/or comments, let me know in the comments.


4 Comments

  • SCOTT PROST-DOMASKY
    May 4, 2021 5:16 pm

    Your webpages have been very helpful. Perhaps when you show the output, you could put someplace on the page what version of R you are using? Right away, First Example, I get a difference–my (supposedly) random sample of the 5 elements in my_vec is “5 4 3 2 1”, not “1 3 4 2 5”. I am using R4.0.5 with Rstudio 1.4.1116. I get “5 4 3 2 1” when I use RGui(64-bit), so I don’t think input syntax is my problem. Of course since my first example output is different than yours, I don’t get the same results in the other Examples. Is it possible the ‘sample’ function doesn’t work right? The output doesn’t appear random—sample(my_vec) gives me “5 4 3 2 1” while sample(my_vec, size=3) gives me “1 2 3”. Doesn’t ‘look’ random to me! (yes I know it’s possible a random sample of 3 of pop. of 5 can give me the first 3 of the 5).

    • Hey Scott,

      Thank you very much for the very kind words!

      Have you set the same random seed as I did in the beginning of the tutorial? It is important that you set this seed directly before executing the sample function.

      Regards

      Joachim

  • On Day 1, Basket A contains 10 red balls and Basket B contains 10 blue balls. Each morning you pick a ball at random from Basket A and put it in Basket B. Each evening, you pick a ball at random from Basket B and put it in Basket A. Every time you transfer a blue ball you drop a penny in a piggy bank, every time you transfer a red ball you drop a nickel in the piggy bank. On Day 366 morning, how many balls of each color are in the two baskets? On Day 366 morning how much money is in the piggy bank? The answers are not numbers, but random variables. Give their probability mass functions.

    Can you help me to solve this problem?

    • Hi Sultana,

      I’m sorry for the delayed response. I was on a long vacation, so unfortunately I wasn’t able to get back to you earlier. Do you still need help with your syntax?

      Regards,
      Joachim
