SC2

Statistical Computing 2

1. Building larger Rcpp programs


Spreading Rcpp code over multiple C++ files

So far in these notes, we have put all our Rcpp code in a single .cpp file or we have written the Rcpp code directly in R. An example of the latter approach is:

library(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

double mySquare(double x){
  return x * x;
}

// [[Rcpp::export(name = "squareVect")]]
NumericVector squareVect_I(NumericVector v) {
  
  NumericVector out( v.length() );
  for(int ii = 0; ii < v.length(); ii++){
    out[ii] = mySquare(v[ii]);
  }
  return out;
  
}')

where the squareVect_I C++ function simply applies the mySquare function to each element of the vector v. When building larger programs, it is often convenient to spread the code over several source files, hence here we explain how to do it using header file.

Suppose that we have a source code file, squareVect.cpp, containing the function that we want to export to R:

#include <Rcpp.h>
#include "utilities.h"

using namespace Rcpp;

// [[Rcpp::export(name = "squareVect")]]
NumericVector squareVect_I(NumericVector v) {
  
  NumericVector out( v.length() );
  for(int ii = 0; ii < v.length(); ii++){
    out[ii] = mySquare(v[ii]);
  }
  return out;
  
}

Notice that here we are including the utilities.h header file, which contains:

#ifndef __UTILITIES__
#define __UTILITIES__

double mySquare(double x);

#endif // __UTILITIES__

Here #ifndef, #define and #endif are directives that will be used by the preprocessor to do some code editing before compilation. In particular, we define __UTILITIES__ and then we add the #ifndef __UTILITIES__ directive to avoid including the same header file twice by mistake. The line double mySquare(double x); is a declaration of the mySquare function, simply stating its name, and the type of its inputs and output. The definition of the function is contained in utilities.cpp:

#include "utilities.h"

double mySquare(double x){
  return x * x;
}

Now we compile the code using sourceCpp:

library(Rcpp)
sourceCpp(file = "squareVect.cpp")

squareVect(1:5) 
## [1]  1  4  9 16 25

which seems to work fine.

Using Rcpp extensions with sourceCpp

We have seen in a previous chapter that sourceCpp and cppFunction take care of the compiler flags and options needed to compile and link Rcpp programs, while such flags need to be handled manually when using R CMD SHLIB. For example, if we compile the following Rcpp function:

sourceCpp(code = '
 #include <Rcpp.h>
 using namespace Rcpp;

 // [[Rcpp::export(name = "exampleCpp")]]
 NumericVector exampleCpp_I() {
   NumericVector out(10);
   return out;
}')

then R CMD SHLIB will be use the option -I"some_directory/Rcpp/include" when compiling, which allows the compiler to find Rcpp.h. Similarly, when we use RcppArmadillo in our Rcpp code, as in this example:

sourceCpp(code = '
 // [[Rcpp::depends(RcppArmadillo)]]
 #include <RcppArmadillo.h>
 using namespace Rcpp;

 // [[Rcpp::export(name = "MMv_arma")]]
 arma::vec Mv_I(arma::mat& A, arma::vec& y) {
   return A * y;
}')

we include the // [[Rcpp::depends(RcppArmadillo)]] attribute, to make sure that the compiler options are set correctly to work with Armadillo. In particular, in this case R CMD SHLIB will need (among others) the option -I"some_directory/RcppArmadillo/include" to find RcppArmadillo.h as well as -llapack and -lblas to include the LAPACK and BLAS linear algebra libraries, which are used by Armadillo. You can see this by setting verbose = TRUE when calling sourceCpp.

The point is that the Rcpp::depends mechanism gives you access to lots of extensions that can be used within your Rcpp code, without having to worry about the compilation process. In particular, in addition to RcppArmadillo, you can use:

For example, the following code use the RNG provided by the Boost C++ library:

library(Rcpp)
sourceCpp(code = '
 // [[Rcpp::depends(BH)]]
 #include <Rcpp.h>
 #include <random>
 #include <boost/random/normal_distribution.hpp>
 using namespace Rcpp;

 // [[Rcpp::export(name = "rnormBoost")]]
 NumericVector rnorm_I(int n, int seed) {
 
  // Set up Marsenned Twister RNG from the C++ Standard Library
  // and initialize is at a given seed
  std::mt19937 eng;
  eng.seed(seed);
  
  // Create a standard normal distribution using Boost
  boost::normal_distribution<> normal(0.0, 1.0);
  
  NumericVector out(n);
  
  for(int ii = 0; ii < n; ii++){
   // Pass the RNG to the normal distribution to sample
   out[ii] = normal(eng); 
  }
  
  return out;

}')

rnormBoost(5, 3)
## [1] -0.9541617  0.3435606 -1.1171672 -2.0588714  0.1801640
rnormBoost(6, 3)
## [1] -0.9541617  0.3435606 -1.1171672 -2.0588714  0.1801640 -0.1522290
rnormBoost(6, 4)
## [1] -0.3226455  0.4446485  1.2954095  1.6020953 -1.6903968 -0.2742874

The packages listed above are some of the packages on which your Rcpp code can depend via the Rcpp::depends attribute. Understanding exactly how the latter works is beyond the scope of this course, but it is useful to be aware of the fact that sourceCpp will look for the information on the package(s) mentioned in Rcpp::depends, which is needed to set up the compilation enviroment. For example, it will look for the path to the header file to be included, which can be found by doing:

system.file("include", package = "RcppArmadillo")
## [1] "/home/runner/work/_temp/Library/RcppArmadillo/include"

Further information can retrieved via the getPlugin function, for example:

get("inlineCxxPlugin", asNamespace("RcppArmadillo"))()$env
## $PKG_LIBS
## [1] "$(SHLIB_OPENMP_CFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)"
## 
## $PKG_CPPFLAGS
## [1] "-I../inst/include $(SHLIB_OPENMP_CFLAGS)"
## 
## $USE_CXX11
## [1] "yes"

gives the additional RcppArmadillo environment variables required to successfully compile code depending on Rcpp. Again, we don’t need to know exactly how this works, but it is good to be aware of the fact that sourceCpp is doing some work to set up the compilation environment. Other Rcpp functions, such as cppFunction, work similarly.

In this section we have seen how to you can spread your Rcpp code over multiple files to build larger applications. However, a better approach to manage and distribute your Rcpp code is building an R package. The next section will explain how to do it.