2. RcppArmadillo
Introduction
Many standard statistical models/algorithms (e.g., linear regression, principal component analysis, …) require using numerical linear algebra routines, hence this section explains how to perform such computations efficiently using the RcppArmadillo
R
package. RcppArmadillo
provides an interface to the Armadillo
C++
numerical linear algebra library, which is highly useful in practice because it provides an interesting balance between performance and ease of use. The syntax of Armadillo
resembles that of Matlab
, which is not too different from R
’s syntax. Part of the performance speed-ups of Armadillo
, relative to base R
, are obtained via automatic pooling of several linear algebra operations into one at compile-time.
Motivating example 1: matrix-matrix-vector product
To demostrate the capabilities of Armadillo
, we start by considering a simple example on matrix multiplication. In particular, consider computing the product \({\bf A} {\bf B} {\bf y}\) where \({\bf A}\) and \({\bf B}\) are \(d \times d\) matrices and \(y\) is a \(d\)-dimensional vector. In R
we would compute the product simply by doing A%*%B%*%y
. Could we get some speed-ups by using Rcpp
? To verify this, we try to create a function that computes the matrix-matrix-vector product in Rcpp
:
library(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export(name = "MMv_wrong")]]
NumericVector MMv_I(NumericMatrix A, NumericMatrix B, NumericVector y) {
return A * B * y;
}'
)
The problem with this function is that it is not equivalent to A%*%B%*%y
. In fact, it does not computes A%*%B%*%y
, but something quite different:
d <- 3
A <- matrix(1, d, d)
B <- matrix(1, d, d)
y <- 1:d
MMv_wrong(A, B, y)
## [1] 1.000000e+00 2.000000e+00 3.000000e+00 0.000000e+00 2.917744e-314
## [6] 4.672873e-310 4.672877e-310 4.672877e-310 1.482197e-323
In particular, A * B
in MMv_I
is equivalent to as.vector(A * B)
is R
, that is it computes the element-wise product of two matrices and then converts the output to a vector. Then one might expect that A * B * y
in MMv_I
should be equivalent to:
as.vector(A * B) * y
## [1] 1 2 3 1 2 3 1 2 3
in R
, but this is also wrong because, when multiplying vectors of different length, Rcpp
does not perform the recycling as R
does. For example:
addDiffLengh <- cppFunction("
NumericVector addDiffLengh(NumericVector x1, NumericVector x2) {
return x1 + x2;
}")
addDiffLengh(1:3, 1)
## [1] 2 2 3
addDiffLengh(1, 1:3)
## [1] 2
does not behave in the way we would expect in R
. In the first case we are adding 1
to the vector 1:3
, and the output shows that no recycling in going on (compare this with \(1:3 + 1\) in R
). In the second case, we are adding 1:3
to a vector of length 1
and Rcpp
does not do any recycling either.
The bottom line is that we should not expect Rcpp
code to behave like R
code when it comes to recycling, and that Rcpp
does not provide a direct equivalent of the %*%
operator for matrix-matrix and matrix-vector multiplication. Hence, when doing linear algebra in Rcpp
, we are better off using the methods offered by the RcppArmadillo
package, as done in the following function:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export(name = "MMv_arma")]]
arma::vec MMv_arma_I(arma::mat& A, arma::mat& B, arma::vec& y) {
return A * B * y;
}', verbose = TRUE)
##
## Generated extern "C" functions
## --------------------------------------------------------
##
##
## #include <Rcpp.h>
## #ifdef RCPP_USE_GLOBAL_ROSTREAM
## Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
## Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
## #endif
##
## // MMv_arma_I
## arma::vec MMv_arma_I(arma::mat& A, arma::mat& B, arma::vec& y);
## RcppExport SEXP sourceCpp_5_MMv_arma_I(SEXP ASEXP, SEXP BSEXP, SEXP ySEXP) {
## BEGIN_RCPP
## Rcpp::RObject rcpp_result_gen;
## Rcpp::RNGScope rcpp_rngScope_gen;
## Rcpp::traits::input_parameter< arma::mat& >::type A(ASEXP);
## Rcpp::traits::input_parameter< arma::mat& >::type B(BSEXP);
## Rcpp::traits::input_parameter< arma::vec& >::type y(ySEXP);
## rcpp_result_gen = Rcpp::wrap(MMv_arma_I(A, B, y));
## return rcpp_result_gen;
## END_RCPP
## }
##
## Generated R functions
## -------------------------------------------------------
##
## `.sourceCpp_5_DLLInfo` <- dyn.load('/tmp/RtmpywOfWB/sourceCpp-x86_64-pc-linux-gnu-1.0.10/sourcecpp_b65e236439e7/sourceCpp_6.so')
##
## MMv_arma <- Rcpp:::sourceCppFunction(function(A, B, y) {}, FALSE, `.sourceCpp_5_DLLInfo`, 'sourceCpp_5_MMv_arma_I')
##
## rm(`.sourceCpp_5_DLLInfo`)
##
## Building shared library
## --------------------------------------------------------
##
## DIR: /tmp/RtmpywOfWB/sourceCpp-x86_64-pc-linux-gnu-1.0.10/sourcecpp_b65e236439e7
##
## /opt/R/4.2.2/lib/R/bin/R CMD SHLIB -o 'sourceCpp_6.so' 'fileb65e728181f3.cpp'
Before illustrating the performance of this function, let’s explain what is going on in the code above:
// [[Rcpp::depends(RcppArmadillo)]]
is anRcpp
attribute that states that our code depends on theRcppArmadillo
package. The attribute makes so thatsourceCpp
will configure the build environment to compile and link against theRcppArmadillo
package.#include <RcppArmadillo.h>
includes the code fromRcppArmadillo
, note that we don’t need to includeRcpp.h
as well.arma::mat
andarma::vec
indicate matrices and vectors of doubles defined in thearma
namespace of the theArmadillo
library. Here these are passed by reference usingarma::mat&
andarma::vec&
, hence no copy is produced. From the text output ofsourceCpp
we see that the originalSEXP
objects are converted toarma::mat
objects via theRcpp::traits::input_parameter
mechanism. The latter is beyond the scope of this course, but the point is that the conversion fromR
toarma
objects is handled automatically byRcppArmadillo
. Conversions in the opposite direction are handled viaRcpp::wrap
.
Let us now compare our Armadillo
implementation with the plain R
version:
d <- 1e3
A <- matrix(rnorm(d^2), d, d)
B <- matrix(rnorm(d^2), d, d)
y <- rnorm(d)
max(abs(A %*% B %*% y - MMv_arma(A, B, y)))
## [1] 4.547474e-12
library(microbenchmark)
microbenchmark(R = A %*% B %*% y,
Arma = MMv_arma(A, B, y))
## Unit: microseconds
## expr min lq mean median uq max neval
## R 20495.2 20913.95 21430.818 21103.85 21466.55 25317.7 100
## Arma 329.8 394.30 484.931 489.90 542.30 1835.2 100
The performance gains are substantial! It might be tempting to conclude that the Armadillo
library is just so much better than R
default linear algebra routines, but we need to understand how this performance gain was achieved. To do so, consider the following comparison:
microbenchmark(R = A %*% (B %*% y),
Arma = MMv_arma(A, B, y))
## Unit: microseconds
## expr min lq mean median uq max neval
## R 1196.2 1213.45 1231.914 1219.30 1228.1 2045.4 100
## Arma 318.9 328.55 335.958 332.25 335.6 638.1 100
It seems that simply adding a pair of brackets made our R
code much more efficient, and closer to Armadillo
’s performance! The reason is simply that the code A %*% B %*% y
computes the matrix product first, which is a \(O(d^3)\) operation, and then multiplies that resulting matrix by the vector y
, which is a \(O(d^2)\) operation. Instead, A %*% (B %*% y)
computes two matrix-vector multiplications, thus avoiding the \(O(d^3)\) computation. The question is whether the speed-up we are getting from Armadillo
is due to the fact that this library is automatically computing the matrix-matrix-vector product using this more efficient order of operations. To verify this, we write the following:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export(name = "MMv_arma_slow")]]
arma::vec MMv_arma_I(arma::mat& A, arma::mat& B, arma::vec& y) {
arma::mat C = A * B;
return C * y;
}')
And we compare it with the inefficient R
implementation:
microbenchmark(R = A %*% B %*% y,
Arma_slow = MMv_arma_slow(A, B, y))
## Unit: milliseconds
## expr min lq mean median uq max neval
## R 18.8147 19.00590 20.74272 20.96715 21.2787 25.9407 100
## Arma_slow 16.7753 16.90205 18.10158 18.79905 19.0159 20.4416 100
Now the performance is comparable, hence the efficiency of the Armadillo
implementation is mostly due to the order in which the computations are performed. The fact that we can achieve similar performance using base R
and a pair of (well-placed) brackets might be disappointing, but note that it is remarkable that Armadillo
is able to automatically determine the order of operations so as to optimize efficiency. In reality, Armadillo
does more than just reordering operation, but it can combine several operations into one and reduce the need for temporary objects, via delayed evaluation. This can result in faster computation and lower memory usage, relative to naive evaluation. Delayed evaluation is achieved via template meta-programming, which is beyond the scope of this course. From a user point of view, what matters is that template meta-programming allows Armadillo
to reason about linear algebra expression at compile-time, with the aim of producing code that is tailored to each mathematical expression. See Armadillo
’s website for more details.
Before describing some of the main functions and structures provide by Armadillo
and RcppArmadillo
, we cover another basic example in the next section.
Motivating example 2: sums of matrices
In the previous section we considered the R
code A %*% B %*% y
and we showed how it can be made much more efficient simply by writing it as A %*% (B %*% y)
. Here we consider a simple linear algebra example, where improving the efficiency of a base R
implementation is not straightforward. In particular, consider calculating \({\bf A} + {\bf B} + {\bf C} + {\bf D} + {\bf E}\), where \({\bf A}, \dots, {\bf E}\) are \(d \times d\) matrices (the shape of the matrices is not important). A simple Armadillo
function for computing such a sum is:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export(name = "mSum_arma")]]
mat mSum_i(mat& A, mat& B, mat& C, mat& D, mat& E) {
return A + B + C + D + E;
}')
Let us compare it with with plain R
code:
A <- B <- C <- D <- E <- matrix(rnorm(1e6), 1e3, 1e3)
microbenchmark(R = Z <- A + B + C + D + E,
arma = Z <- mSum_arma(A, B, C, D, E))
## Unit: milliseconds
## expr min lq mean median uq max neval
## R 5.2717 7.40915 13.692592 8.93920 10.76755 58.4846 100
## arma 2.2029 2.55870 5.912126 4.49205 6.34125 52.4831 100
As we might have expected, the Armadillo
version of the sum is faster. How was this better performance achieved? Note that when evaluating the A + B + C + D + E
, R
will compute A + B
and store the result in a temporary object, which we call ApB
. Then, it will compute ApB + C
and store the temporary result in another matrix, and so on. Hence, the four matrix sums will result in the allocation of four matrices, three of these are temporary and will be discarded, the last one will be stored as Z
. An efficient implementation would avoid allocating memory for three temporary matrices, but would simply allocate memory for Z
and then do:
for(i in 1:d)
for(j in 1:d)
Z[i, j] <- A[i, j] + B[i, j] + C[i, j] + D[i, j] + E[i, j]
but doing few floating point operations within a nested loop is of course hopelessly inefficient in R
. The question is whether Armadillo
has been able to work out that this is the best way of framing of computation. To investigate whether the performance of Armadillo
could be attributed to such compile-time optimization, we write a version of mSum_arma
which explicitly creates temporary matrices:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export(name = "mSum_arma_slow")]]
mat mSum_i(mat& A, mat& B, mat& C, mat& D, mat& E) {
mat T1 = A + B;
mat T2 = T1 + C;
mat T3 = T2 + D;
return T3 + E;
}')
We then add this version to the comparison:
microbenchmark(R = Z <- A + B + C + D + E,
arma = Z <- mSum_arma(A, B, C, D, E),
arma_slow = Z <- mSum_arma_slow(A, B, C, D, E))
## Unit: milliseconds
## expr min lq mean median uq max neval
## R 5.0734 6.50125 10.415961 7.21800 8.45595 55.6120 100
## arma 1.8495 2.31500 6.775004 2.61480 4.19805 51.6637 100
## arma_slow 5.2380 6.61635 11.047487 8.88785 12.71660 61.1350 100
The performance of the second Armadillo
version is close to that of the R
code, hence we can conclude that the efficiency of the first Armadillo
version is attributable to the fact that it avoids creating temporary matrices.
To conclude this example, consider its extension to the problem of computing \({\bf A} + {\bf B} + \cdots\), where the number of matrices involved in the sum is arbitrary (that is, not fixed to five as above). If the matrices are contained in a list
, for example:
nmat <- 10
mats <- lapply(1:nmat, function(.nouse) matrix(rnorm(1e6), 1e3, 1e3))
then one compact way of calculating their sum in R
is Reduce("+", M)
. Can we modify our RcppArmadillo
code to handle any number of matrices? One way of doing this so that the compile-time efficiency tricks of Armadillo
are exploited is based on some text editing. In particular, consider the following function:
getMatSumFun <- function(n){
args <- paste0("mat& ", paste0("A", 1:n), ",", collapse = ' ')
args <- substr(args, 1, nchar(args)-1)
ret <- paste0(paste0("A", 1:n), " +", collapse = ' ')
ret <- substr(ret, 1, nchar(ret)-2)
funBody <- paste0("
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export(name = \"mSum_arma_list\")]]
mat mSum_i(", args, "){
return ", ret, ";
}")
return(funBody)
}
Given n
, this function creates the text for an RcppArmadillo
function which sums up n
matrices, in particular:
armaCode <- getMatSumFun(nmat)
cat(armaCode)
##
## // [[Rcpp::depends(RcppArmadillo)]]
## #include <RcppArmadillo.h>
## using namespace arma;
##
## // [[Rcpp::export(name = "mSum_arma_list")]]
## mat mSum_i(mat& A1, mat& A2, mat& A3, mat& A4, mat& A5, mat& A6, mat& A7, mat& A8, mat& A9, mat& A10){
## return A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10;
## }
Writing down the sum explicily guarantees that Armadillo
will avoid creating temporary matrices. We can now compare the R
and Armadillo
solutions:
sourceCpp(code = armaCode)
microbenchmark(R = Reduce("+", mats), arma = do.call("mSum_arma_list", mats))
## Unit: milliseconds
## expr min lq mean median uq max neval
## R 15.7547 16.34720 20.58238 17.55995 19.65135 69.2895 100
## arma 5.9786 6.42275 7.65936 6.77025 8.78680 11.4526 100
Basic RcppArmadillo
usage
Here we describe some commonly used features of RcppArmadillo
and Armadillo
. The Armadillo
library is quite extensive, hence we refer to its documentation for more details. The most commonly used object classes are:
Mat<type>
which represents dense matrices, with element types defined via the template argumenttype
. In the examples above we usedmat
which is an alias or typedef forMat<double>
.Col<type>
andRow<type>
which represent column and row vectors. Shortcuts forCol<double>
andRow<double>
arecolvec
(or justvec
) androwvec
.
Armadillo
provides also 3D matrices, sparse matrices and fields (matrices whose elements can be arbitrary objects, such as matrices), which we will not use here. In order to effectively use an Armadillo
object class, we need to know (or look in the documentation for):
- the constructor types available for such class. For example, a matrix of doubles
A
could be construted usingmat A(10, 5)
ormat A(10, 5, fill::zeros)
, for example. The difference being that in the first case the entries of the \(10 \times 5\) matrixA
are arbitrary, while in the second they are set to 0. - The member functions and variables of that class. These are accessed using
object.member_function()
orobject.variable
. For example, ify
is aCol<float>
vector, they.n_elem
gives its lengths whiley.ones()
sets all its elements to 1. The docs clearly explain which member functions are available for each object class. - How the elements of the objects can be accessed. This is documented as part of the “Member Functions & Variables”
section the
Armadillo
docs. An element \(ij\) of a matrixA
of classMat
can be accessed viaA(i-1,j-1)
, as indexing starts at zero. It can also be accessed viaA.at(i-1,j-1)
which is faster but more dangerous as no bounds check is performed (hence we could read/write outside the memory allocated toA
). - The available operators and functions. Operators are overloaded so if, for example,
A
andB
are matrices whiles
is a scalar, thenA * B
will calculate the standard matrix product whileA * s
will multiply each element ofA
bys
. Several element-wise functions are provided (e.g.,exp(A)
,sqrt(A)
, …), as well as standard decomposition (e.g, Cholesky, LU, …) and solvers for linear systems.
Some guidance on how standard linear algebra operations in R
can be translated to RcppArmadillo
is provided here. Before moving to a more involved example, we clarify the relation between Armadillo
objects and Rcpp
matrices and vectors. Conversion from Armadillo
object to Rcpp
object is performed using Rcpp::wrap
, for example:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export(name = "getIntMat")]]
Mat<int> tmpFun_i(int nr, int nc) {
Mat<int> MA(nr, nc);
Rcpp::IntegerMatrix MR = Rcpp::wrap(MA);
MR[0, 0] = 1;
return MA;
}')
getIntMat(3, 3)
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
Here we are creating a \(3 \times 3\) un-initialized integer matrix in Armadillo
and we are converting it to an Rcpp::IntegerMatrix
using wrap
. It is interesting to note that here Rcpp::wrap
is copying MA
, which is demostrated by the fact that changing the top-left element of MR
does not entail changing MA
. The following version avoids any copy:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export(name = "getIntMat_noCopy")]]
Rcpp::IntegerMatrix tmpFun_i(int nr, int nc) {
Rcpp::IntegerMatrix MR(nr, nc);
Mat<int> MA(MR.begin(), MR.nrow(), MR.ncol(), false);
MA(0, 0) = 1;
return MR;
}')
getIntMat_noCopy(3, 3)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 0 0
## [3,] 0 0 0
Let’s see how this worked step by step:
Rcpp::IntegerMatrix MR(nr, nc)
here we allocated memory in R for the matrixMR
;Mat<int> MA(MR.begin(), MR.nrow(), MR.ncol(), false)
here we used the advancedArmadillo
constructor, which constructs aMat<int>
matrixMA
by reusing the memory allocated toMR
. Hence no copy is made and changes inMA
leads to changes inMR
.- We return the
IntegerMatrix
MR
. Given that its memory is allocated inR
, no copy is made when the object is returned. In particular, recall thatsourceCpp
will create a wrapper around ourC++
function, and the wrapper will contain the lineRcpp::wrap(tmpFun_i(nr, nc))
, as you can verify by callingsourceCpp
withverbose = TRUE
. IngetIntMat
,Rcpp::wrap
was taking a copy, because the output oftmpFun_i
does not useR
memory, while this is avoided ingetIntMat_noCopy
.
In summary getIntMat
allocates memory outside R
via Mat<int> MA(nr, nc);
then takes one copy when we do Rcpp::wrap(MA)
and another copy when Rcpp::wrap
is called within the wrapper written by sourceCpp
. In contrast,
getIntMat_noCopy
allocates R
memory via Rcpp::IntegerMatrix MR(nr, nc);
and does no take any copy. This makes the second version much more efficient, in fact:
d <- 1e3
microbenchmark(copy = getIntMat(d, d), noCopy = getIntMat_noCopy(d, d))
## Unit: microseconds
## expr min lq mean median uq max neval
## copy 1314.8 1977.30 3622.988 3599.15 4959.35 7240.9 100
## noCopy 170.6 194.35 965.220 359.50 1505.05 3680.7 100
The bottom line is that, when we want to manipulate large matrices or vectors in RcppArmadillo
, it is often more efficient to:
- allocate memory in
R
and then pass the objects by reference (e.g., viamat&
,vec&
, …) to theRcpp
function or (b) to allocateR
memory directly inRcpp
via one of its data structure (e.g.,NumericMatrix
,NumericVector
, …);
- allocate memory in
- wrap the
R
object intoArmadillo
objects using an advanced constructor, such as the one we have see above; - manipulate the
Armadillo
objects using the operators and functions provided byArmadillo
; - return an
Rcpp
data structure to avoid copies being made byRcpp::wrap
.
The advanced constructor provides a way of converting Rcpp
objects into Armadillo
objects, without taking a copy. Often taking a copy is acceptable, in which case it is more convenient to perform the conversion using the Rcpp::as<type>()
function, where the template argument type
can be an Armadillo
object class such as Mat<int>
, Row<double>
and so on. An example is:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export(name = "matVettArma")]]
Col<double> tmpFun_i(Rcpp::NumericMatrix X, Rcpp::NumericVector y) {
Mat<double> Xa = Rcpp::as<Mat<double>>(X);
Col<double> ya = Rcpp::as<Col<double>>(y);
return Xa*ya;
}')
matVettArma(matrix(1, 3, 3), 1:3)
## [,1]
## [1,] 6
## [2,] 6
## [3,] 6
which performs matrix-vector multiplication in Armadillo
. Note that Rcpp::as
generally takes a copy of the object being converted.
References
Armadillo project: http://arma.sourceforge.net/
Conrad Sanderson and Ryan Curtin. Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software, Vol. 1, pp. 26, 2016.
Dirk Eddelbuettel and Conrad Sanderson, “RcppArmadillo: Accelerating R with high-performance C++ linear algebra”, Computational Statistics and Data Analysis, 2014, 71, March, pages 1054- 1063, http://dx.doi.org/10.1016/j.csda.2013.02.005. )