Matrix Multiplication Using Rcpp Package in R (Example)
In this post, we show you how to do matrix multiplication using the Rcpp package in R. Rcpp helps to speed up R code by implementing specific parts of it in C++. For matrix multiplication, we show you how to write fast and easily understandable C++ code using RcppArmadillo; Armadillo is a C++ library for linear algebra and scientific computing.
We cover the following blocks: an RcppArmadillo function for matrix multiplication, an example, a runtime comparison, and how to use RcppArmadillo operations with Rcpp input.
Let’s multiply some matrices!
RcppArmadillo Function for Matrix Multiplication
We load the Rcpp package to be able to use C++ code in R.
if (!require('Rcpp', quietly = TRUE)) {
  install.packages('Rcpp')
}
library('Rcpp')   # Load package 'Rcpp'
As an example, we calculate the linear least squares solution of the fixed effects. That is, we have an \(n\times p\)-dimensional model matrix \(\boldsymbol{X}\) of full rank and an \(n\)-vector of observations \(\boldsymbol{y}\). The linear least squares estimator of the \(p\)-vector of fixed effects \({\boldsymbol{\beta}}\) is given by \(\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top} \boldsymbol{X})^{-1} \boldsymbol{X}^{\top} \boldsymbol{y} \).
In C++, we can simply write the following function, called "f_LS_beta", to calculate \(\hat{\boldsymbol{\beta}}\). Note that we explicitly declare the dependency on the RcppArmadillo library.
cppFunction(depends = "RcppArmadillo", '
  arma::vec f_LS_beta( arma::mat X, arma::vec y ) {
    arma::vec beta = inv(X.t() * X) * X.t() * y;
    return beta;
  }
')
You see that it is pretty straightforward to write code using RcppArmadillo because it is close to the way in which we write down the formulas for linear algebra.
A general overview of RcppArmadillo is given here; a great documentation of the different object types and functions available from Armadillo can be found here. Take a look at the documentation to see the different commands we used in the function above, like X.t() to get the transpose of X.
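As a side note on the design choice in "f_LS_beta": instead of explicitly forming the inverse of \(\boldsymbol{X}^{\top}\boldsymbol{X}\), Armadillo also offers the solve() function for solving the normal equations directly, which is usually the numerically preferred route. The following variant is our own sketch (the name "f_LS_beta_solve" is purely illustrative):

cppFunction(depends = "RcppArmadillo", '
  arma::vec f_LS_beta_solve( arma::mat X, arma::vec y ) {
    // Our own sketch: solve the normal equations (X^T X) beta = X^T y
    // instead of computing the inverse explicitly
    arma::vec beta = solve(X.t() * X, X.t() * y);
    return beta;
  }
')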
Example
Now, let us create some example data to see whether the code does what we expect it to do. We generate some data according to a simple linear model: \(\boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}\) with \(\boldsymbol{\epsilon} \sim N_n(\boldsymbol{0}, 3^2 \boldsymbol{I}_n)\), where we choose \(n=1000\) and \(\boldsymbol{\beta}=(3,3,3,3,3,3)^{\top}\), that is, an intercept and five slope coefficients which are all equal to 3.
set.seed(90)                                  # Set a seed for reproducible results
n <- 1000                                     # n = 1000 observations
X <- sapply(1:5, function (x) { rnorm(n) })   # Generate X from scratch
X_plus_int <- cbind(1, X)                     # Add an extra intercept column
beta_true <- rep(3, ncol(X_plus_int))         # Choose some "true" beta values
y <- apply(X_plus_int, 1,                     # Generate y from the linear model
           function (x_k) { sum(x_k * beta_true) + rnorm(1, mean = 0, sd = 3) })
Let us test the function "f_LS_beta" and, for comparison, also use the R function "lm".
f_LS_beta(X_plus_int, y)
#          [,1]
# [1,] 3.015985
# [2,] 2.719975
# [3,] 2.933818
# [4,] 2.789520
# [5,] 2.955876
# [6,] 2.930802

lm(y ~ X)
# Call:
# lm(formula = y ~ X)
#
# Coefficients:
# (Intercept)           X1           X2           X3           X4           X5
#       3.016        2.720        2.934        2.790        2.956        2.931
Not only do both functions give the same results, but we can also see that the estimated vector of fixed effects \(\hat{\boldsymbol{\beta}}\) is close to the true vector \(\boldsymbol{\beta}\).
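As a quick numerical check (our own addition), we can also look at the largest absolute deviation of the estimates from the true coefficients:

beta_hat <- f_LS_beta(X_plus_int, y)   # Estimated coefficients from our C++ function
max(abs(beta_hat - beta_true))         # Largest absolute deviation from the true value 3
# About 0.28 for the data generated above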
As a task for you: Try changing the number of observations and conducting a small simulation study to see whether the p-values that you get as output from summary(lm(y ~ X)) look plausible; a possible starting point is sketched below.
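A minimal sketch of how such a simulation study could look (our own suggestion; the object names n_sim, p_values, X_sim, and y_sim are purely illustrative):

n_sim <- 100                                       # Number of simulation runs
p_values <- replicate(n_sim, {
  X_sim <- sapply(1:5, function (x) { rnorm(n) })  # New model matrix
  y_sim <- apply(cbind(1, X_sim), 1,               # New response from the linear model
                 function (x_k) { sum(x_k * beta_true) + rnorm(1, mean = 0, sd = 3) })
  summary(lm(y_sim ~ X_sim))$coefficients["X_sim1", "Pr(>|t|)"]  # p-value of the first slope
})
summary(p_values)   # With a true coefficient of 3, the p-values should be very small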
Runtime Comparison
You might ask yourself: If R functions for least squares already exist, why even bother to compute tasks like matrix multiplication in C++? Runtime is the answer! With the microbenchmark package, we can easily compare the runtime of the different least squares solutions. The function "lm" additionally computes various other quantities. For a fair comparison, we therefore also include plain R code for calculating \(\hat{\boldsymbol{\beta}}\).
library(microbenchmark)
microbenchmark::microbenchmark(
  "lm"  = lm(y ~ X),
  "R"   = solve(t(X_plus_int) %*% X_plus_int) %*% t(X_plus_int) %*% y,
  "Cpp" = f_LS_beta(X_plus_int, y),
  times = 10^3)
# Unit: microseconds
#  expr   min     lq      mean median     uq    max neval
#    lm 766.0 834.95 1040.6225 903.15 1086.9 8981.6  1000
#     R 124.5 135.65  187.7120 151.90  169.4 6541.2  1000
#   Cpp  44.4  46.50   58.4433  55.65   64.7  294.3  1000
Well, that is a good reason to consider C++ in R, isn't it? The C++ code needs only about a third of the runtime of the plain R solution.
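Before drawing conclusions from the timings, it is also worth confirming that the benchmarked expressions return the same coefficients. A small sanity check (our own addition):

beta_R   <- solve(t(X_plus_int) %*% X_plus_int) %*% t(X_plus_int) %*% y  # Plain R solution
beta_Cpp <- f_LS_beta(X_plus_int, y)                                     # C++ solution
all.equal(c(beta_R), c(beta_Cpp))   # Should return TRUE (up to numerical tolerance)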
Use RcppArmadillo Operations with Rcpp Input
It may happen that, within Rcpp code that uses Rcpp objects such as Rcpp::NumericMatrix or Rcpp::NumericVector, you want to include operations that are written for Armadillo objects such as arma::mat and arma::vec.
Below, we give you an example in which we first transform the Rcpp input into RcppArmadillo objects and, at the end, transform the RcppArmadillo output back into an Rcpp object.
cppFunction(depends = "RcppArmadillo", '
  Rcpp::NumericVector f_LS_beta_2( Rcpp::NumericMatrix X, Rcpp::NumericVector y ) {
    // Transform Rcpp to RcppArmadillo objects
    arma::mat X_arma = as<arma::mat>(X);
    arma::vec y_arma = as<arma::vec>(y);
    // Conduct RcppArmadillo operations
    arma::vec beta = inv(X_arma.t() * X_arma) * X_arma.t() * y_arma;
    // Transform output back to an Rcpp object
    Rcpp::NumericVector beta_rcpp = as<Rcpp::NumericVector>(wrap(beta));
    return beta_rcpp;
  }
')
Let us test the function with the same example data.
f_LS_beta_2(X_plus_int, y)
#          [,1]
# [1,] 3.015985
# [2,] 2.719975
# [3,] 2.933818
# [4,] 2.789520
# [5,] 2.955876
# [6,] 2.930802
Great, we get the same results.
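As a usage note beyond the example above: instead of cppFunction(), longer C++ code is often kept in a separate source file and compiled with Rcpp::sourceCpp(). A minimal sketch of how "f_LS_beta_2" could look in that setup (our own addition; the file name and the function name "f_LS_beta_2_src" are illustrative):

// Save, e.g., as f_LS_beta_2.cpp and compile with Rcpp::sourceCpp("f_LS_beta_2.cpp")
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericVector f_LS_beta_2_src( Rcpp::NumericMatrix X, Rcpp::NumericVector y ) {
  // Transform Rcpp to RcppArmadillo objects
  arma::mat X_arma = as<arma::mat>(X);
  arma::vec y_arma = as<arma::vec>(y);
  // Conduct RcppArmadillo operations
  arma::vec beta = inv(X_arma.t() * X_arma) * X_arma.t() * y_arma;
  // Transform output back to an Rcpp object
  return as<Rcpp::NumericVector>(wrap(beta));
}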
Video & Further Resources
At first, it can be a bit difficult to get into coding in C++ via Rcpp and RcppArmadillo. We recommend having a look at our introductory blog posts on Rcpp here and on RcppArmadillo here. Furthermore, take a look at this site for Rcpp and at this site for Armadillo.
In our example above, we generated data from a standard linear model and estimated the fixed effects of the model. For additional information on linear models, we recommend the following YouTube video from the Stanford Online channel.
We have many more practical posts on R and Python on https://statisticsglobe.com/:
- sugar Functions in Rcpp Package in R (Example)
- Loop Over Rows & Columns of Rcpp Matrix in R (2 Examples)
- Subset Rcpp Matrix by Logical Condition in R (4 Examples)
- Learn R Programming (Tutorial & Examples)
In this post, we showed you how to do matrix multiplication with C++ code in R using Rcpp and RcppArmadillo. For questions and comments, please use the following options.
This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.