Matrix Multiplication Using Rcpp Package in R (Example)

 

In this post, we show you how to do matrix multiplication using the Rcpp package in R. Rcpp speeds up R code by integrating C++ for specific code parts. For matrix multiplication, we show you how to write fast and easily understandable C++ code using RcppArmadillo; Armadillo is a C++ library for linear algebra and scientific computing.

We cover the following blocks:

RcppArmadillo Function for Matrix Multiplication
Example
Runtime Comparison
Use RcppArmadillo Operations with Rcpp Input
Video & Further Resources

Let’s multiply some matrices!

 

RcppArmadillo Function for Matrix Multiplication

We load the Rcpp package to be able to use C++ code in R.

if (!require('Rcpp', quietly = TRUE)) { install.packages('Rcpp') } 
library('Rcpp') # Load package 'Rcpp'

As an example, we calculate the linear least squares solution of the fixed effects. That is, we have an \(n\times p\)-dimensional model matrix \(\boldsymbol{X}\) of full rank and an \(n\)-vector of observations \(\boldsymbol{y}\). The linear least squares estimator of the \(p\)-vector of fixed effects \({\boldsymbol{\beta}}\) is given by \(\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top} \boldsymbol{X})^{-1} \boldsymbol{X}^{\top} \boldsymbol{y} \).

In C++, we can simply write the following function, called “f_LS_beta”, to calculate \(\hat{\boldsymbol{\beta}}\). Note that we explicitly declare the dependency on RcppArmadillo via the depends argument.

cppFunction(depends = "RcppArmadillo", ' 
arma::vec f_LS_beta( arma::mat X, 
                     arma::vec y 
) {
  arma::vec beta = inv(X.t() * X) * X.t() * y;
  return beta;
}
')

You see that it is pretty straightforward to write code using RcppArmadillo because it is close to the way in which we write down the formulas for linear algebra.

A general overview of RcppArmadillo is given here; a detailed documentation of the different object types and functions available from Armadillo can be found here. Take a look at the documentation to see the different commands we use in the function above, like X.t() to get the transpose of X.
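As a design note (our own addition, not part of the original function): explicitly inverting \(\boldsymbol{X}^{\top}\boldsymbol{X}\) with inv() is numerically less stable than solving the normal equations directly. Armadillo’s solve() function does the latter; a sketch of an alternative version, here called “f_LS_beta_solve”:

```r
library('Rcpp')                                       # Load package 'Rcpp'

cppFunction(depends = "RcppArmadillo", '
arma::vec f_LS_beta_solve( arma::mat X,
                           arma::vec y
) {
  // Solve (X^T X) beta = X^T y without forming the inverse explicitly
  arma::vec beta = arma::solve(X.t() * X, X.t() * y);
  return beta;
}
')
```

For well-conditioned problems, both versions give practically identical results, but solve() avoids the extra rounding error (and cost) of computing a full matrix inverse.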

 

Example

Now, let us create some example data to see whether the code does what we expect it to do. We generate some data according to a simple linear model: \(\boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \) with \(\boldsymbol{\epsilon} \sim N_n(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n) \) and \(\sigma = 3\), where we choose \(n=1000\) and \(\boldsymbol{\beta}=(3,3,3,3,3,3)^{\top}\) (the first element being the intercept).

set.seed(90)                                         # Set a seed for reproducible results
n          <- 1000                                   # n=1000 observations
X          <- sapply(1:5, function (x) { rnorm(n) }) # Generate X from scratch
X_plus_int <- cbind(1, X)                            # Add an extra intercept column
beta_true  <- rep(3, ncol(X_plus_int))               # Choose some "true" beta values
y          <- apply(X_plus_int, 1,                   # Generate y from the linear model
                    function (x_k) { 
                      sum(x_k * beta_true) + rnorm(1, mean = 0, sd = 3) 
                    })
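As a side note, the row-wise apply() call above can be replaced by a single matrix-vector product, which is the more idiomatic (and faster) way to generate y in R. A self-contained sketch; note that the random numbers are drawn in a different order here, so y_vec is not numerically identical to the y generated above even with the same seed:

```r
set.seed(90)                                         # Set a seed for reproducible results
n          <- 1000                                   # n=1000 observations
X          <- sapply(1:5, function (x) { rnorm(n) }) # Generate X from scratch
X_plus_int <- cbind(1, X)                            # Add an extra intercept column
beta_true  <- rep(3, ncol(X_plus_int))               # Choose some "true" beta values

# Vectorized alternative: one matrix-vector product plus a vector of errors
y_vec <- as.numeric(X_plus_int %*% beta_true + rnorm(n, mean = 0, sd = 3))
```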

Let us test the function “f_LS_beta” and, for comparison, also use the base R function “lm”.

f_LS_beta(X_plus_int, y)
#          [,1]
# [1,] 3.015985
# [2,] 2.719975
# [3,] 2.933818
# [4,] 2.789520
# [5,] 2.955876
# [6,] 2.930802
 
lm(y ~ X)
# Call:
# lm(formula = y ~ X)
# 
# Coefficients:
# (Intercept)           X1           X2           X3           X4           X5  
#       3.016        2.720        2.934        2.790        2.956        2.931

Not only do both functions give the same results, we can also see that the resulting estimated vector of fixed effects \(\hat{\boldsymbol{\beta}}\) is close to the true vector \({\boldsymbol{\beta}}\).

As a task for you: Try changing the number of observations and conduct a small simulation study to see whether the p-values reported by summary(lm(y ~ X)) behave as they should.
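A minimal sketch of such a simulation study (our own suggestion, not a prescribed solution): for a coefficient whose true value is zero, the p-value should be uniformly distributed on [0, 1], so it should fall below 0.05 in roughly 5% of the replications.

```r
set.seed(90)
n_sim <- 200                                          # Number of simulation runs
p_vals <- replicate(n_sim, {
  X        <- sapply(1:5, function (x) { rnorm(100) })
  beta_sim <- c(3, 3, 3, 3, 3, 0)                     # Intercept first; last slope is truly zero
  y        <- as.numeric(cbind(1, X) %*% beta_sim + rnorm(100, sd = 3))
  summary(lm(y ~ X))$coefficients["X5", "Pr(>|t|)"]   # p-value of the zero coefficient
})
mean(p_vals < 0.05)                                   # Should be close to the nominal 5% level
```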

 

Runtime Comparison

You might ask yourself: If R functions for least squares already exist, why even bother computing tasks like matrix multiplication in C++? Runtime is the answer! With the microbenchmark package, we can easily compare the runtime of the different least squares solutions. Function “lm” additionally computes various other quantities. To have a fair comparison, we therefore also include plain R code for calculating \(\hat{\boldsymbol{\beta}}\).

library(microbenchmark)
microbenchmark::microbenchmark("lm"  = lm(y ~ X),
                               "R"   = solve(t(X_plus_int) %*% X_plus_int) %*% t(X_plus_int) %*% y,
                               "Cpp" = f_LS_beta(X_plus_int, y),
                               times = 10^3)
# Unit: microseconds
# expr   min     lq      mean median     uq    max neval
# lm   766.0 834.95 1040.6225 903.15 1086.9 8981.6  1000
# R    124.5 135.65  187.7120 151.90  169.4 6541.2  1000
# Cpp   44.4  46.50   58.4433  55.65   64.7  294.3  1000

Well, that is a good reason to consider C++ in R, isn’t it? The C++ code needs only about a third of the runtime of the plain R solution, and less than a tenth of the runtime of “lm”.

 

Use RcppArmadillo Operations with Rcpp Input

It may happen that within some Rcpp code where you use Rcpp objects, like Rcpp::NumericMatrix or Rcpp::NumericVector, you want to include some operations which are written for Armadillo objects like arma::mat and arma::vec.

Below, we give you an example where we first transform the Rcpp input into RcppArmadillo objects and, at the end, transform the RcppArmadillo output back into an Rcpp object.

cppFunction(depends = "RcppArmadillo", ' 
Rcpp::NumericVector f_LS_beta_2( Rcpp::NumericMatrix X, 
                                 Rcpp::NumericVector y 
) {
 
  // Transform Rcpp to RcppArmadillo objects
  arma::mat X_arma = as<arma::mat>(X);
  arma::vec y_arma = as<arma::vec>(y);
 
  // Conduct RcppArmadillo operations
  arma::vec beta = inv(X_arma.t() * X_arma) * X_arma.t() * y_arma;
 
  // Transform output
  Rcpp::NumericVector beta_rcpp = as<Rcpp::NumericVector>(wrap(beta));
  return beta_rcpp;
}
')

Let us test the function with the same example data.

f_LS_beta_2(X_plus_int, y)
#          [,1]
# [1,] 3.015985
# [2,] 2.719975
# [3,] 2.933818
# [4,] 2.789520
# [5,] 2.955876
# [6,] 2.930802

Great, we get the same results.

 

Video & Further Resources

At first, it can be a bit difficult to get into coding in C++ via Rcpp and RcppArmadillo. We recommend having a look at our introductory blog posts on Rcpp here and on RcppArmadillo here. Furthermore, take a look at this site for Rcpp and this site for Armadillo.

In our example above, we generated data from a standard linear model and estimated the fixed effects of the model. For additional information on linear models, we recommend the following YouTube video from the Stanford Online channel.

 


 

We have many more practical posts on R and Python on https://statisticsglobe.com/.

 

In this post, we showed you how to do matrix multiplication with C++ code in R using Rcpp and RcppArmadillo. For questions and comments, please use the comments section below.

 

Anna-Lena Wölwer, Survey Statistician & R Programmer

This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 
