RcppArmadillo Package in R (Example)

 

As a data scientist, operations that fall under linear algebra are an everyday part of the job. With package RcppArmadillo, the functionality of the rich C++ Armadillo library for linear algebra becomes available in the R programming language. We show you some examples of the functionality of RcppArmadillo in the following.

This post contains the following structure:

Let’s get right to it.

 

RcppArmadillo

R package Rcpp allows you to integrate C++ code into R, see our blog post. For C++, there exist various libraries. One particularly useful library for data scientists is the Armadillo library (Sanderson 2010) for linear algebra and scientific computing. Luckily, R package RcppArmadillo (Eddelbuettel, Sanderson 2014) allows us to use the rich functionality of Armadillo in R.

A further advantage of RcppArmadillo is that the syntax is pretty straightforward and similar to R. On the Armadillo webpage, you can find a documentation of the different Armadillo classes and functions. In the following, we make some examples to show you the functionality of the package.

 

Example 1: Generalized Least Squares

We load the Rcpp and RcppArmadillo package.

if (!require('Rcpp', quietly = TRUE)) { install.packages('Rcpp') } 
library('Rcpp') # Load package 'Rcpp'
 
if (!require('RcppArmadillo', quietly = TRUE)) { install.packages('RcppArmadillo') } 
library('RcppArmadillo') # Load package 'RcppArmadillo'

Let’s program the standard generalized least squares (GLS) solution of the fixed effects, given as \(\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top} \boldsymbol{V}^{-1} \boldsymbol{X})^{-1} \boldsymbol{X}^{\top} \boldsymbol{V}^{-1} \boldsymbol{y} \), where \(\hat{\boldsymbol{\beta}}\) is a \(p\)-vector of fixed effects, \(\boldsymbol{X}\) is a \(n\times p\)-dimensional model matrix of full rank, \(\boldsymbol{V}\) is a \(n\times n\)-covariance matrix, and \(\boldsymbol{y}\) is a \(n\)-vector of observations.

In R, the solution can be written as follows.

GLS_r <- function ( X, V, y) {
  V_inv <- solve(V)
  solve(t(X) %*% V_inv %*% X) %*% t(X) %*% V_inv %*% y
}

We can write a C++ function for it in R as follows.

cppFunction(depends = "RcppArmadillo",
' 
  arma::vec GLS_cpp( arma::mat X, arma::mat V, arma::vec y ) {
    arma::mat V_inv = inv(V); 
    arma::vec res   = inv(X.t() * V_inv * X) * X.t() * V_inv * y;
    return res;
  }
'
)

In the first row, you see that we explicitly include the RcppArmadillo dependency in the code. The Armadillo vectors and matrices are defined with the arma::mat or arma::vec expression. With these arma expressions, we can conduct the matrix calculations visible in function GLS_cpp(). When comparing the R and C++ code, we see that the code looks very similar. That’s a big advantage of RcppArmadillo, programming the matrix calculations is pretty straightforward and similar to R.

To illustrate the functions, we use the example application presented in the book Faraway (2004) Linear Models with R, pages 97-98. You can take a look at the book to learn more about the data. In brief, we use the Longley regression data and regress the number of employed persons in 1947-1962 (dependent variable) on the gross national product (GNP) and the number of persons 14 years and older (independent variables). Our model input is defined as follows.

X <- model.matrix(lm(Employed ~ GNP + Population, data = longley))
V <- diag(16)
V <- 0.31041^abs(row(V)-col(V))
y <- as.matrix(longley$Empl, ncol = 1)

We apply functions GLS_r() and GLS_cpp() to the data.

GLS_r  (X, V, y)
#                   [,1]
# (Intercept) 94.8988949
# GNP          0.0673895
# Population  -0.4742741
 
GLS_cpp(X, V, y)
#            [,1]
# [1,] 94.8988949
# [2,]  0.0673895
# [3,] -0.4742741

The functions return the GLS solution of \(\hat{\boldsymbol{\beta}}\). Note that usually, vectors like \(\hat{\boldsymbol{\beta}}\) are defined as column-vectors (like a matrix with only one column).

As always, when integrating C++ with R, we are interested in whether this will save us any computation time.

library(microbenchmark)
microbenchmark::microbenchmark("R"   = GLS_r  (X, V, y),
                               "Cpp" = GLS_cpp(X, V, y),
                               times = 10^{5})
 
# Unit: microseconds
# expr  min     lq     mean median     uq        max neval
#    R 53.2 56.500 73.56015 58.201 63.702 116693.400 1e+05
#  Cpp  7.9  8.601 12.16419 10.201 11.001   8617.502 1e+05

Take a look at the difference in the mean/median time of the 10^5 repetitions. It’s huge!

 

Example 2: Build a Block-Diagonal Matrix From List/Array

We make some more practical examples with RcppArmadillo.

As an exercise, assume we have a list or an array of covariance matrices and we want to transform these into a block-diagonal matrix. We generate the following example 2-by-2 matrices for a list of dimension \(D=3\).

V_list <- list()
D = 3
for (d in 1:D) {
  V_list[[d]] <- matrix(c(3, rep(0.6 * sqrt(3 * 4), 2), 4), 
                   ncol = 2, 
                   byrow = TRUE)
}
V_array <- array(unlist(V_list), dim = c(2, 2, D))

The last line of code transforms this list into an array of same dimension.

Now, in R we can use function bdiag() from the Matrix package which returns a sparse block-diagonal matrix of V_list.

Matrix::bdiag(V_list)
# 6 x 6 sparse Matrix of class "dgCMatrix"
# 
# [1,] 3.000000 2.078461 .        .        .        .       
# [2,] 2.078461 4.000000 .        .        .        .       
# [3,] .        .        3.000000 2.078461 .        .       
# [4,] .        .        2.078461 4.000000 .        .       
# [5,] .        .        .        .        3.000000 2.078461
# [6,] .        .        .        .        2.078461 4.000000

As an exercise, we want to define a C++ function which returns the same output.

cppFunction(depends = "RcppArmadillo",
            ' 
arma::mat cpp_bdiag(arma::cube V) {
 
  // From cube V, we get the number of slices, rows and columns
  int n_slices = V.n_slices;
  int n_rows   = V.n_rows;
  int n_cols   = V.n_cols;
  Rcout << "n_slices: " << n_slices << "" << std::endl;
  Rcout << "n_rows: "   << n_rows   << "" << std::endl;
  Rcout << "n_cols: "   << n_cols   << "" << std::endl;
 
  // Initialize the block-diagonal matrix V_bdiag
  arma::mat A = arma::mat(n_rows, n_cols);
  arma::mat V_bdiag = arma::repmat(A, n_slices, n_slices);
  Rcout << "A:\\n"       << A        << "" << std::endl;
  Rcout << "V_bdiag:\\n" << V_bdiag  << "" << std::endl;
 
  // Fill in the values of cube V into V_bdiag
  for (int i = 0; i < n_slices; ++i)
  {
    int a = 2 * i;
    int b = a + 1;
    V_bdiag.submat(a, a, b, b) = V.slice(i);
  }
  return V_bdiag;
}
'
)

In this exercise, we learn several things. Function cpp_bdiag() takes a arma::cube object as input. This corresponds to a 3-dimensional array in R. Command V.n_slices gives us the number of slices of cube V. Remember that in C++, we have to initialize all object before any loop. With arma::mat(n_rows, n_cols), we initialize an empty matrix with specified number of rows and columns.

With V_bdiag.submat(a, a, b, b), we index a sub-matrix of V_bdiag. The Rcout commands can be used to print intermediate results or complete objects in the console when conducting the function. It is very useful to make sure you know what is happening between the lines. The output of cpp_bdiag() is:

# n_slices: 3
# n_rows: 2
# n_cols: 2
# A:
#   0        0
#   0        0
# 
# V_bdiag:
#   0        0        0        0        0        0
#   0        0        0        0        0        0
#   0        0        0        0        0        0
#   0        0        0        0        0        0
#   0        0        0        0        0        0
#   0        0        0        0        0        0
# 
#          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
# [1,] 3.000000 2.078461 0.000000 0.000000 0.000000 0.000000
# [2,] 2.078461 4.000000 0.000000 0.000000 0.000000 0.000000
# [3,] 0.000000 0.000000 3.000000 2.078461 0.000000 0.000000
# [4,] 0.000000 0.000000 2.078461 4.000000 0.000000 0.000000
# [5,] 0.000000 0.000000 0.000000 0.000000 3.000000 2.078461
# [6,] 0.000000 0.000000 0.000000 0.000000 2.078461 4.000000

Function cpp_bdiag() returns a dense matrix. If you work with large matrices, using arma::sp_mat instead of arma::mat can be useful, it is for sparse matrices. Try it!

 

Example 3: Some More Function Examples for Matrices and Vectors

We want to make some additional examples to see how the functions and classes in RcppArmadillo work.

cppFunction(depends = "RcppArmadillo",
' 
int cpp_example3() {
 
  // Form a 2-by-2 matrix of 0s and fill some values 
  arma::mat Mat1 = arma::mat(2, 2);
  Mat1(0, 1) = 2;
  Mat1(1, 1) = 4;
  Rcout << "Mat1:\\n" << Mat1  << "" << std::endl;
  Rcout << "The diagonal of Mat1:\\n" << Mat1.diag()  << "" << std::endl;
  Rcout << "First column of Mat1:\\n" << Mat1.col(0)  << "" << std::endl;
 
  // Form a 2-vector of 0s
  arma::vec x = arma::vec(2);
  Rcout << "x:\\n" << x  << "" << std::endl;
 
  // Form a 2-vector of 1s
  arma::vec y = arma::ones<arma::vec>(2);
  Rcout << "y:\\n" << y << "" << std::endl;
  Rcout << "Return the unique elements of y: " << arma::unique(y) << "" << std::endl;
 
  // Form a 2-vector of uniform random values
  arma::vec z = arma::vec(2, arma::fill::randu);
  Rcout << "z:\\n" << z  << "" << std::endl;
 
  return 0;
}
'
)

Function cpp_example3() returns the results.

set.seed(3)
cpp_example3()
# Mat1:
#   0   2.0000
#   0   4.0000
# 
# The diagonal of Mat1:
#   0
#   4.0000
# 
# First column of Mat1:
#   0
#   0
# 
# x:
#   0
#   0
# 
# y:
#   1.0000
#   1.0000
# 
# Return the unique elements of y:    1.0000
# 
# z:
#   0.1680
#   0.8075
# 
# [1] 0

We see different versions of initializing matrices or vectors, e.g. with random numbers. Again, the Rcout command comes in handy to show us what the return of the single functions is.

 

Video & Further Resources

A list of the RcppArmadillo functions and classes is given here. For general information on Rcpp, we also recommend you to take a look at the official webpage.

Furthermore, take a look at the following tutorial by Dirk Eddelbuettel held at the virtual useR! 2020 conference, which was published on the R Consortium Youtube channel.

 

Please accept YouTube cookies to play this video. By accepting you will be accessing content from YouTube, a service provided by an external third party.

YouTube Content Consent Button Thumbnail

YouTube privacy policy

If you accept this notice, your choice will be saved and the page will refresh.

 

We have some more blog posts which could be of interest to you.

 

We presented you some examples of R package RcppArmadillo which integrates C++ library Armadillo into R. If you have questions or want to comment on the port, let us know below.

 

Anna-Lena Wölwer Survey Statistician & R Programmer

This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
Top