Create Data Frame & Matrix Using Rcpp Package in R (4 Examples)

 

With Rcpp, we can integrate C++ code into the R programming language. In this post, we show you how to create data frames and matrices in C++ using the Rcpp and RcppArmadillo packages.

Let us create some objects!

 

Example 1: Standard Rcpp, Create a Data Frame from Input Vectors

We load the Rcpp package to be able to use C++ code in R.

if (!require('Rcpp', quietly = TRUE)) { install.packages('Rcpp') } 
library('Rcpp') # Load package 'Rcpp'

Next, we define a C++ function called “f_crea_d_frame” that takes three vectors of different types as input and combines them into a data frame. The vectors must all have the same length.

cppFunction(' 
Rcpp::DataFrame f_crea_d_frame( Rcpp::NumericVector Input_Vec1, 
                                Rcpp::StringVector  Input_Vec2, 
                                Rcpp::LogicalVector Input_Vec3 
) {
 
  // Combine the vectors into a data frame
  // With "Named("V1") = ", we can assign a name to a column
  Rcpp::DataFrame d_frame = DataFrame::create( Named("V1") = Input_Vec1, 
                                               Named("V2") = Input_Vec2, 
                                               Named("V3") = Input_Vec3
                                             ); 
 
  return d_frame;
}
')

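As the function signature shows, the first input vector is numeric, the second is a string vector, and the third is a logical vector. We create some random values for these vectors in R and test the function. Note that rbinom() returns the integers 0 and 1, which are converted to FALSE and TRUE when they are passed to the logical vector argument.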

n = 5 # Desired number of rows of the data frame
set.seed(54)
d_frame_out <- f_crea_d_frame( rnorm(n),                       # Standard normal distribution
                               sample(LETTERS[1:26], n, TRUE), # Random letters
                               rbinom(n, 1, 0.5)               # Bernoulli distribution
)
 
# Check the format of the output of function "f_crea_d_frame"
str(d_frame_out)
# 'data.frame':	5 obs. of  3 variables:
# $ V1: num  1.884 0.495 -0.365 1.621 1.166
# $ V2: chr  "Y" "L" "D" "K" ...
# $ V3: logi  FALSE TRUE FALSE TRUE FALSE
 
d_frame_out
#           V1 V2    V3
# 1  1.8837919  Y FALSE
# 2  0.4945845  L  TRUE
# 3 -0.3646517  D FALSE
# 4  1.6206231  K  TRUE
# 5  1.1663249  T FALSE
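
The equal-length requirement mentioned above can also be enforced inside the C++ function. The following is a minimal sketch, not part of the original example: the function name “f_crea_d_frame_checked” and the error message are hypothetical, and the sketch assumes that failing early with Rcpp::stop() is the desired behavior.

```r
library('Rcpp') # Load package 'Rcpp'

cppFunction('
Rcpp::DataFrame f_crea_d_frame_checked( Rcpp::NumericVector Input_Vec1,
                                        Rcpp::StringVector  Input_Vec2,
                                        Rcpp::LogicalVector Input_Vec3
) {

  // Hypothetical guard: raise an R error if the lengths differ
  if ( Input_Vec1.size() != Input_Vec2.size() ||
       Input_Vec1.size() != Input_Vec3.size() ) {
    Rcpp::stop("All input vectors must have the same length.");
  }

  return Rcpp::DataFrame::create( Rcpp::Named("V1") = Input_Vec1,
                                  Rcpp::Named("V2") = Input_Vec2,
                                  Rcpp::Named("V3") = Input_Vec3 );
}
')

# Equal lengths: returns a data frame with three rows
d_frame_ok <- f_crea_d_frame_checked( c(1.5, 2.5, 3.5),
                                      c("A", "B", "C"),
                                      c(TRUE, FALSE, TRUE) )

# Unequal lengths: the guard raises an error, which we catch here
err_msg <- tryCatch( f_crea_d_frame_checked( c(1.5, 2.5, 3.5),
                                             c("A", "B"),
                                             c(TRUE, FALSE, TRUE) ),
                     error = function(e) conditionMessage(e) )
err_msg
```

With such a guard, a length mismatch is reported with a message of our own choosing instead of depending on how the data frame constructor handles it.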

 

Example 2: Standard Rcpp, Create a Matrix from Input Vectors

Next, we create a matrix from three input vectors. Note that, like in R, all elements of a matrix have the same data type. In function “f_crea_matrix” we again use the data types from before: numeric, string, and logical. Because the function returns a string matrix, the numeric and logical values are converted to strings.

cppFunction(' 
Rcpp::StringMatrix f_crea_matrix( Rcpp::NumericVector Input_Vec1, 
                                  Rcpp::StringVector  Input_Vec2, 
                                  Rcpp::LogicalVector Input_Vec3 
) {
 
  // Create an empty matrix with the desired dimensions
  Rcpp::StringMatrix Output_Mat( Input_Vec1.length(), 3);
 
  // Replace the columns of the matrix by the input vectors
  Output_Mat(_ , 0) = Input_Vec1;
  Output_Mat(_ , 1) = Input_Vec2;
  Output_Mat(_ , 2) = Input_Vec3;
 
  return Output_Mat;
}
')

Test the function:

n = 5 # Number of rows of the matrix
set.seed(54)
matrix_out <- f_crea_matrix( rnorm(n), sample(LETTERS[1:26], n, TRUE), rbinom(n, 1, 0.5) )
 
str(matrix_out)
# chr [1:5, 1:3] "1.883792" "0.494584" "-0.364652" "1.620623" "1.166325" "Y" ...
 
matrix_out
#      [,1]        [,2] [,3]
# [1,] "1.883792"  "Y"  "0" 
# [2,] "0.494584"  "L"  "1" 
# [3,] "-0.364652" "D"  "0" 
# [4,] "1.620623"  "K"  "1" 
# [5,] "1.166325"  "T"  "0"
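
Since all entries of the matrix are now strings, the numeric and logical columns have to be converted back before they can be used in calculations. The following base R lines are a small sketch of that step; the matrix “m” is a hypothetical stand-in that copies the first two rows of the output above. Note that the strings carry only the printed digits, so the recovered numbers are rounded versions of the original values.

```r
# Hypothetical stand-in for the first two rows of the string matrix above
m <- cbind( c("1.883792", "0.494584"), c("Y", "L"), c("0", "1") )

typeof(m)                       # All entries are of type character

num_col <- as.numeric(m[, 1])   # Convert column 1 back to numeric
lgl_col <- m[, 3] == "1"        # Convert column 3 back to logical

num_col
lgl_col
```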

 

Example 3: Standard Rcpp, Create a Matrix with Random Values

Instead of creating a data frame or matrix from input vectors, we can also create them from scratch and fill them with random values. Below, we define function “f_crea_matrix_2”, which takes the dimensions of the desired matrix as input, that is, the number of rows “mat_rows” and columns “mat_cols”. Within the function, we first create a matrix “Output_Mat” whose entries are all initialized to 0. Then, we fill each row of “Output_Mat” with uniformly distributed random values in the interval from 2 to 4 using Rcpp::runif(mat_cols, 2, 4).

cppFunction(' 
Rcpp::NumericMatrix f_crea_matrix_2( int mat_rows, int mat_cols ) {
 
  // Create a matrix with the desired dimensions
  // By default, all values are set to 0
  Rcpp::NumericMatrix Output_Mat( mat_rows, mat_cols );
  Rcout << Output_Mat << std::endl;
 
  // Fill the matrix row by row with uniformly distributed random values
  for ( int i = 0; i < mat_rows; i++ ) {
    Output_Mat( i, _ ) = Rcpp::runif(mat_cols, 2, 4);
    Rcout << Output_Mat << std::endl;
  }
 
  return Output_Mat;
}
')

Within the function, we use the Rcout stream to print intermediate results. When we test the function, as shown below, we therefore see that the matrix starts with all entries set to zero and is then filled row by row with uniformly distributed random values.

set.seed(86)
f_crea_matrix_2(2,6)
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
# 
# 3.52155 2.98134 3.80470 2.43838 2.92906 3.76059
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
# 
# 3.52155 2.98134 3.80470 2.43838 2.92906 3.76059
# 2.66752 2.43232 3.47414 2.81947 2.44146 2.56754
# 
#          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
# [1,] 3.521549 2.981339 3.804701 2.438383 2.929056 3.760588
# [2,] 2.667521 2.432323 3.474142 2.819472 2.441460 2.567537

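
The call to set.seed() in the test above matters because Rcpp::runif draws from R's random number generator, so the seed set in R also controls the values generated in C++. The following sketch illustrates this with a hypothetical variant “f_crea_matrix_3” (a name introduced here for illustration) that fills the matrix in the same way but without printing intermediate results.

```r
library('Rcpp') # Load package 'Rcpp'

cppFunction('
Rcpp::NumericMatrix f_crea_matrix_3( int mat_rows, int mat_cols ) {

  Rcpp::NumericMatrix Output_Mat( mat_rows, mat_cols );

  // Fill the matrix row by row, without printing intermediate results
  for ( int i = 0; i < mat_rows; i++ ) {
    Output_Mat( i, _ ) = Rcpp::runif(mat_cols, 2, 4);
  }

  return Output_Mat;
}
')

set.seed(86)
mat_a <- f_crea_matrix_3(2, 6)
set.seed(86)
mat_b <- f_crea_matrix_3(2, 6)

identical(mat_a, mat_b)        # Same seed, same matrix
all(mat_a >= 2 & mat_a <= 4)   # All values lie in the interval from 2 to 4
```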
 

Example 4: RcppArmadillo, Create a Matrix from Input Vectors

Now, let us turn to RcppArmadillo and see how to implement the above functions there; see the Armadillo documentation for details. The RcppArmadillo package has to be installed for the code below to compile. Armadillo is a library for linear algebra, so it does not have a specific data frame format. For data frames in C++, see Example 1. Below, we define function “f_crea_arma_matrix” to create an Armadillo matrix from three input vectors. In contrast to Example 2, we use only numeric input.

cppFunction(depends = "RcppArmadillo", ' 
arma::mat f_crea_arma_matrix( arma::vec Input_Vec1, 
                              arma::vec Input_Vec2, 
                              arma::vec Input_Vec3 
) {
 
  arma::mat Output_Mat( Input_Vec1.n_elem, 3);
  Output_Mat.col(0) = Input_Vec1;
  Output_Mat.col(1) = Input_Vec2;
  Output_Mat.col(2) = Input_Vec3;
 
  return Output_Mat;
}
')

Let us test the function with three vectors of random values drawn from a standard normal, a Poisson (\(\lambda=4\)), and a Bernoulli (\(p=0.5\)) distribution.

n = 5 # Number of rows of the matrix
set.seed(54)
matrix_out <- f_crea_arma_matrix( rnorm(n), rpois(n, 4), rbinom(n, 1, 0.5) )
 
str(matrix_out)
# num [1:5, 1:3] 1.884 0.495 -0.365 1.621 1.166 ...
 
matrix_out
#            [,1] [,2] [,3]
# [1,]  1.8837919    2    0
# [2,]  0.4945845    8    1
# [3,] -0.3646517    4    0
# [4,]  1.6206231    4    1
# [5,]  1.1663249    2    0
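
As an alternative to filling the columns one by one, Armadillo can also concatenate the vectors directly. The following is a minimal sketch under the same assumptions as above (numeric input, RcppArmadillo installed); the function name “f_crea_arma_matrix_2” is hypothetical and not part of the original example. It uses arma::join_rows, which joins its arguments horizontally, similar to cbind() in R.

```r
library('Rcpp') # Load package 'Rcpp'

cppFunction(depends = "RcppArmadillo", '
arma::mat f_crea_arma_matrix_2( arma::vec Input_Vec1,
                                arma::vec Input_Vec2,
                                arma::vec Input_Vec3
) {

  // Bind the vectors column by column, similar to cbind() in R
  arma::mat Output_Mat = arma::join_rows( arma::join_rows( Input_Vec1, Input_Vec2 ),
                                          Input_Vec3 );

  return Output_Mat;
}
')

matrix_out_2 <- f_crea_arma_matrix_2( c(1, 2, 3), c(4, 5, 6), c(7, 8, 9) )
matrix_out_2

# Compare with the matrix that cbind() creates in R
all.equal( matrix_out_2, cbind( c(1, 2, 3), c(4, 5, 6), c(7, 8, 9) ) )
```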

 

Video & Further Resources

In the matrices that we created, we used random values from different distributions. Have a look at the Statistics Globe YouTube video on plotting values drawn from a normal distribution to get a better feel for randomly generated values.

 

 

 

In this post, we showed you how to generate data frames and matrices in standard Rcpp and RcppArmadillo, both from input vectors and with values generated from certain distributions.

 

Anna-Lena Wölwer Survey Statistician & R Programmer

This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 
