Create Data Frame & Matrix Using Rcpp Package in R (4 Examples)
With Rcpp, we can integrate C++ code into the R programming language. In this post, we show you how to create C++ data frames and matrices using the Rcpp and RcppArmadillo packages.
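We cover the following blocks:
- Example 1: Standard Rcpp, Create a Data Frame from Input Vectors
- Example 2: Standard Rcpp, Create a Matrix from Input Vectors
- Example 3: Standard Rcpp, Create a Matrix with Random Values
- Example 4: RcppArmadillo, Create a Matrix from Input Vectors
- Video & Further Resources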
Let us create some objects!
Example 1: Standard Rcpp, Create a Data Frame from Input Vectors
We load the Rcpp package to be able to use C++ code in R.
```r
if (!require('Rcpp', quietly = TRUE)) {
  install.packages('Rcpp')
}
library('Rcpp') # Load package 'Rcpp'
```
Now, in C++, let us define a function called “f_crea_d_frame”, which takes three vectors of different types as input and combines them into a data frame. The vectors should all have the same length.
```r
cppFunction('
Rcpp::DataFrame f_crea_d_frame(
    Rcpp::NumericVector Input_Vec1,
    Rcpp::StringVector  Input_Vec2,
    Rcpp::LogicalVector Input_Vec3
) {
  // Combine the vectors into a data frame
  // With "Named("V1") = ", we can assign a name to a column
  Rcpp::DataFrame d_frame = DataFrame::create(
    Named("V1") = Input_Vec1,
    Named("V2") = Input_Vec2,
    Named("V3") = Input_Vec3
  );
  return d_frame;
}
')
```
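The code above shows that the first input vector is numeric, the second is a string vector, and the third is a logical vector. We now create random values for these vectors in R and test the function. Since rbinom() returns the integers 0 and 1, Rcpp converts them to FALSE and TRUE for the LogicalVector argument.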
```r
n = 5 # Desired number of rows of the data frame
set.seed(54)
d_frame_out <- f_crea_d_frame(
  rnorm(n),                        # Standard normal distribution
  sample(LETTERS[1:26], n, TRUE),  # Random letters
  rbinom(n, 1, 0.5)                # Bernoulli distribution
)

# Check the format of the output of function "f_crea_d_frame"
str(d_frame_out)
# 'data.frame': 5 obs. of  3 variables:
#  $ V1: num  1.884 0.495 -0.365 1.621 1.166
#  $ V2: chr  "Y" "L" "D" "K" ...
#  $ V3: logi  FALSE TRUE FALSE TRUE FALSE

d_frame_out
#           V1 V2    V3
# 1  1.8837919  Y FALSE
# 2  0.4945845  L  TRUE
# 3 -0.3646517  D FALSE
# 4  1.6206231  K  TRUE
# 5  1.1663249  T FALSE
```
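Because f_crea_d_frame relies on the input vectors having the same length, it can be convenient to check this explicitly on the C++ side. The following is a minimal sketch using a hypothetical variant, f_crea_d_frame_checked (the function name and the error message are our own choices), which calls Rcpp::stop() to raise an R error when the lengths differ.

```r
library(Rcpp)

cppFunction('
Rcpp::DataFrame f_crea_d_frame_checked(
    Rcpp::NumericVector Input_Vec1,
    Rcpp::StringVector  Input_Vec2,
    Rcpp::LogicalVector Input_Vec3
) {
  // Stop with an informative R error if the lengths differ
  if (Input_Vec1.size() != Input_Vec2.size() ||
      Input_Vec1.size() != Input_Vec3.size()) {
    Rcpp::stop("All input vectors must have the same length.");
  }
  return Rcpp::DataFrame::create(
    Rcpp::Named("V1") = Input_Vec1,
    Rcpp::Named("V2") = Input_Vec2,
    Rcpp::Named("V3") = Input_Vec3
  );
}
')

# Equal lengths: a data frame with two rows is returned
ok <- f_crea_d_frame_checked(c(1.5, 2.5), c("a", "b"), c(TRUE, FALSE))
nrow(ok)  # 2

# Different lengths: the C++ error arrives in R as a regular error
res <- tryCatch(
  f_crea_d_frame_checked(1:3, c("a", "b"), TRUE),
  error = function(e) conditionMessage(e)
)
res
```

Rcpp::stop() throws a C++ exception that Rcpp translates into an ordinary R error, so tryCatch() can handle it like any other error raised in R.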
Example 2: Standard Rcpp, Create a Matrix from Input Vectors
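Next, we create a matrix from three input vectors. Note that, as in R, all elements of a matrix have the same data type. In function “f_crea_matrix”, we again use the three data types from before: numeric, string, and logical. Since a matrix can hold only one type, “f_crea_matrix” returns a string matrix (Rcpp::StringMatrix) of these vectors: the numeric values are converted to strings, and the logical values become "0" and "1", as the output of the test below shows.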
```r
cppFunction('
Rcpp::StringMatrix f_crea_matrix(
    Rcpp::NumericVector Input_Vec1,
    Rcpp::StringVector  Input_Vec2,
    Rcpp::LogicalVector Input_Vec3
) {
  // Create an empty matrix with the desired dimensions
  Rcpp::StringMatrix Output_Mat(Input_Vec1.length(), 3);

  // Replace the columns of the matrix with the input vectors
  Output_Mat(_ , 0) = Input_Vec1;
  Output_Mat(_ , 1) = Input_Vec2;
  Output_Mat(_ , 2) = Input_Vec3;

  return Output_Mat;
}
')
```
Test the function:
```r
n = 5 # Number of rows of the matrix
set.seed(54)
matrix_out <- f_crea_matrix(
  rnorm(n),
  sample(LETTERS[1:26], n, TRUE),
  rbinom(n, 1, 0.5)
)

str(matrix_out)
#  chr [1:5, 1:3] "1.883792" "0.494584" "-0.364652" "1.620623" "1.166325" "Y" ...

matrix_out
#      [,1]        [,2] [,3]
# [1,] "1.883792"  "Y"  "0"
# [2,] "0.494584"  "L"  "1"
# [3,] "-0.364652" "D"  "0"
# [4,] "1.620623"  "K"  "1"
# [5,] "1.166325"  "T"  "0"
```
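If all inputs can be represented as numbers, a NumericMatrix avoids the conversion to strings. The sketch below uses a hypothetical function, f_crea_num_matrix, with two numeric inputs; a logical vector passed to a NumericVector argument is converted to 1 and 0. It also labels the columns with Rcpp::colnames(), similar to the column names of the data frame in Example 1.

```r
library(Rcpp)

cppFunction('
Rcpp::NumericMatrix f_crea_num_matrix(
    Rcpp::NumericVector Input_Vec1,
    Rcpp::NumericVector Input_Vec2
) {
  // Create a numeric matrix with one row per vector element
  Rcpp::NumericMatrix Output_Mat(Input_Vec1.length(), 2);

  // Fill the columns with the input vectors
  Output_Mat(Rcpp::_, 0) = Input_Vec1;
  Output_Mat(Rcpp::_, 1) = Input_Vec2;

  // Label the columns, as colnames() does in R
  Rcpp::colnames(Output_Mat) = Rcpp::CharacterVector::create("V1", "V2");
  return Output_Mat;
}
')

# The logical input is converted to 1 and 0
m <- f_crea_num_matrix(c(1.5, -2), c(TRUE, FALSE))
m
str(m)
```

The trade-off is the same as in R: a numeric matrix keeps the values as numbers, but it cannot hold strings such as the random letters used above.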
Example 3: Standard Rcpp, Create a Matrix with Random Values
Instead of creating a data frame or matrix from input vectors, we can also create it from scratch and fill it with random values. Below, we define function “f_crea_matrix_2”, which takes the dimensions of the desired matrix as input: the number of rows “mat_rows” and the number of columns “mat_cols”. Within the function, we first create a matrix “Output_Mat” in which all entries are zero. Then, we fill the rows of “Output_Mat” one by one with uniformly distributed random values between 2 and 4, using Rcpp::runif(mat_cols, 2, 4). Since Rcpp::runif() draws from R's random number generator, set.seed() makes the results reproducible.
```r
cppFunction('
Rcpp::NumericMatrix f_crea_matrix_2(
    int mat_rows,
    int mat_cols
) {
  // Create a matrix with the desired dimensions
  // By default, all values are set to 0
  Rcpp::NumericMatrix Output_Mat(mat_rows, mat_cols);
  Rcout << Output_Mat << std::endl;

  // Fill the matrix row by row with uniformly distributed random values
  for (int i = 0; i < mat_rows; i++) {
    Output_Mat(i, _) = Rcpp::runif(mat_cols, 2, 4);
    Rcout << Output_Mat << std::endl;
  }

  return Output_Mat;
}
')
```
Within the function, we use Rcout to print intermediate results. When we test the function below, the printed output shows that we first create a matrix with all entries set to zero and then fill it row by row with uniformly distributed random values.
```r
set.seed(86)
f_crea_matrix_2(2, 6)
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
#
# 3.52155 2.98134 3.80470 2.43838 2.92906 3.76059
# 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
#
# 3.52155 2.98134 3.80470 2.43838 2.92906 3.76059
# 2.66752 2.43232 3.47414 2.81947 2.44146 2.56754
#
#          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
# [1,] 3.521549 2.981339 3.804701 2.438383 2.929056 3.760588
# [2,] 2.667521 2.432323 3.474142 2.819472 2.441460 2.567537
```
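Filling the matrix row by row is convenient for printing intermediate steps, but it is not the only option. The following sketch, using a hypothetical function f_crea_matrix_3, draws all values with a single Rcpp::runif() call and copies them into the matrix with the NumericMatrix constructor that takes a start iterator. Like R's matrix(), this fills the matrix column by column, so it can be compared directly with plain R code that uses the same seed.

```r
library(Rcpp)

cppFunction('
Rcpp::NumericMatrix f_crea_matrix_3(int mat_rows, int mat_cols) {
  // Draw all mat_rows * mat_cols values at once from the R random number generator
  Rcpp::NumericVector vals = Rcpp::runif(mat_rows * mat_cols, 2, 4);

  // Copy the values into the matrix; like R, Rcpp fills matrices column by column
  Rcpp::NumericMatrix Output_Mat(mat_rows, mat_cols, vals.begin());
  return Output_Mat;
}
')

set.seed(86)
m3 <- f_crea_matrix_3(2, 6)

# Same seed in plain R: matrix() also fills column by column
set.seed(86)
m_r <- matrix(runif(12, 2, 4), nrow = 2)
all.equal(m3, m_r)  # TRUE
```

With the same seed, f_crea_matrix_2 and f_crea_matrix_3 use the same random draws, but in a different order: f_crea_matrix_2 places them row by row, f_crea_matrix_3 column by column.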
Example 4: RcppArmadillo, Create a Matrix from Input Vectors
Now, let us turn to RcppArmadillo and see how to create a matrix from input vectors with it; the documentation of Armadillo is available on the Armadillo website. Armadillo is a C++ library for linear algebra, so it does not provide a data frame type; for data frames in C++, see Example 1. Below, we define function “f_crea_arma_matrix” to create an Armadillo matrix from three input vectors. Unlike in Example 2, we use only numeric input, because an arma::mat holds numeric (double) values. Note that the RcppArmadillo package must be installed for depends = "RcppArmadillo" to work.
```r
cppFunction(depends = "RcppArmadillo", '
arma::mat f_crea_arma_matrix(
    arma::vec Input_Vec1,
    arma::vec Input_Vec2,
    arma::vec Input_Vec3
) {
  // Create a matrix with one row per vector element and three columns
  arma::mat Output_Mat(Input_Vec1.n_elem, 3);

  // Fill the columns with the input vectors
  Output_Mat.col(0) = Input_Vec1;
  Output_Mat.col(1) = Input_Vec2;
  Output_Mat.col(2) = Input_Vec3;

  return Output_Mat;
}
')
```
Let us test the function by plugging in vectors of random values drawn from a standard normal, a Poisson (\(\lambda=4\)), and a Bernoulli (\(p=0.5\)) distribution.
```r
n = 5 # Number of rows of the matrix
set.seed(54)
matrix_out <- f_crea_arma_matrix(
  rnorm(n),          # Standard normal distribution
  rpois(n, 4),       # Poisson distribution with lambda = 4
  rbinom(n, 1, 0.5)  # Bernoulli distribution with p = 0.5
)

str(matrix_out)
#  num [1:5, 1:3] 1.884 0.495 -0.365 1.621 1.166 ...

matrix_out
#            [,1] [,2] [,3]
# [1,]  1.8837919    2    0
# [2,]  0.4945845    8    1
# [3,] -0.3646517    4    0
# [4,]  1.6206231    4    1
# [5,]  1.1663249    2    0
```
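Example 3 can also be carried over to RcppArmadillo. The following is a minimal sketch with a hypothetical function, f_crea_arma_matrix_2, that uses Armadillo's randu() to create a matrix of uniform values on [0, 1] in one step and then shifts and scales them to the interval from 2 to 4. We assume that, as far as we know, RcppArmadillo configures Armadillo to draw from R's random number generator, so set.seed() should make the result reproducible; please double-check this in your own setup.

```r
library(Rcpp)  # RcppArmadillo must be installed as well

cppFunction(depends = "RcppArmadillo", '
arma::mat f_crea_arma_matrix_2(int mat_rows, int mat_cols) {
  // randu() returns uniform values in [0, 1]; shift and scale them to [2, 4]
  arma::mat Output_Mat = 2.0 + 2.0 * arma::randu<arma::mat>(mat_rows, mat_cols);
  return Output_Mat;
}
')

set.seed(86)
arma_out <- f_crea_arma_matrix_2(2, 6)
dim(arma_out)    # 2 6
range(arma_out)  # both values lie between 2 and 4
```

Since the whole matrix is created by a single expression, no loop over the rows is needed, which is typical of Armadillo's vectorized style.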
Video & Further Resources
In the matrices we created, we used random values from different distributions. Have a look at our Statistics Globe YouTube video below for some examples of how to plot values drawn from a normal distribution, which helps to get a better feeling for randomly generated values.
On https://statisticsglobe.com/, we have many more R and Python posts:
- Speed Up Loop Using Rcpp Package in R (Example)
- Loop Over Rows & Columns of Rcpp Matrix in R (2 Examples)
- Introduction to the Rcpp Package in R (Examples)
- Learn R Programming (Tutorial & Examples)
In this post, we showed you how to generate data frames and matrices with standard Rcpp and RcppArmadillo, both from input vectors and from random values drawn from given distributions. For questions and comments, please use the following options.
This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.