1. Building larger Rcpp programs
Spreading Rcpp
code over multiple C++
files
So far in these notes, we have put all our Rcpp
code in a single .cpp
file or we have written the Rcpp
code directly in R
. An example of the latter approach is:
library(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;
double mySquare(double x){
return x * x;
}
// [[Rcpp::export(name = "squareVect")]]
NumericVector squareVect_I(NumericVector v) {
NumericVector out( v.length() );
for(int ii = 0; ii < v.length(); ii++){
out[ii] = mySquare(v[ii]);
}
return out;
}')
where the squareVect_I
C++
function simply applies the mySquare
function to each element of the vector v
. When building larger programs, it is often convenient to spread the code over several source files, hence here we explain how to do it using header file.
Suppose that we have a source code file, squareVect.cpp
, containing the function that we want to export to R
:
#include <Rcpp.h>
#include "utilities.h"
using namespace Rcpp;
// [[Rcpp::export(name = "squareVect")]]
NumericVector squareVect_I(NumericVector v) {
NumericVector out( v.length() );
for(int ii = 0; ii < v.length(); ii++){
out[ii] = mySquare(v[ii]);
}
return out;
}
Notice that here we are including the utilities.h
header file, which contains:
#ifndef __UTILITIES__
#define __UTILITIES__
double mySquare(double x);
#endif // __UTILITIES__
Here #ifndef
, #define
and #endif
are directives that will be used by the preprocessor to do some code editing before compilation. In particular, we define __UTILITIES__
and then we add the #ifndef __UTILITIES__
directive to avoid including the same header file twice by mistake. The line double mySquare(double x);
is a declaration of the mySquare
function, simply stating its name, and the type of its inputs and output. The definition of the function is contained in utilities.cpp
:
#include "utilities.h"
double mySquare(double x){
return x * x;
}
Now we compile the code using sourceCpp
:
library(Rcpp)
sourceCpp(file = "squareVect.cpp")
squareVect(1:5)
## [1] 1 4 9 16 25
which seems to work fine.
Using Rcpp
extensions with sourceCpp
We have seen in a previous chapter that sourceCpp
and cppFunction
take care of the compiler flags and options needed to compile and link Rcpp
programs, while such flags need to be handled manually when using R CMD SHLIB
. For example, if we compile the following Rcpp
function:
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export(name = "exampleCpp")]]
NumericVector exampleCpp_I() {
NumericVector out(10);
return out;
}')
then R CMD SHLIB
will be use the option -I"some_directory/Rcpp/include"
when compiling, which allows the compiler to find Rcpp.h
. Similarly, when we use RcppArmadillo
in our Rcpp
code, as in this example:
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export(name = "MMv_arma")]]
arma::vec Mv_I(arma::mat& A, arma::vec& y) {
return A * y;
}')
we include the // [[Rcpp::depends(RcppArmadillo)]]
attribute, to make sure that the compiler options are set correctly to work with Armadillo
. In particular, in this case R CMD SHLIB
will need (among others) the option -I"some_directory/RcppArmadillo/include"
to find RcppArmadillo.h
as well as -llapack
and -lblas
to include the LAPACK
and BLAS
linear algebra libraries, which are used by Armadillo
. You can see this by setting verbose = TRUE
when calling sourceCpp
.
The point is that the Rcpp::depends
mechanism gives you access to lots of extensions that can be used within your Rcpp
code, without having to worry about the compilation process. In particular, in addition to RcppArmadillo
, you can use:
RcppEigen
which provides access to theEigen
linear algebra library;BH
which provides access to theBoost
C++
library;RcppGLS
for integration withGNU Scientific Library
(orGSL
) library of numerical routines;RcppMLPACK
for integration with theMLPACK
C++
machine learning library;RcppParallel
for parallel computation inRcpp
;- and several other packages…
For example, the following code use the RNG provided by the Boost
C++
library:
library(Rcpp)
sourceCpp(code = '
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <random>
#include <boost/random/normal_distribution.hpp>
using namespace Rcpp;
// [[Rcpp::export(name = "rnormBoost")]]
NumericVector rnorm_I(int n, int seed) {
// Set up Marsenned Twister RNG from the C++ Standard Library
// and initialize is at a given seed
std::mt19937 eng;
eng.seed(seed);
// Create a standard normal distribution using Boost
boost::normal_distribution<> normal(0.0, 1.0);
NumericVector out(n);
for(int ii = 0; ii < n; ii++){
// Pass the RNG to the normal distribution to sample
out[ii] = normal(eng);
}
return out;
}')
rnormBoost(5, 3)
## [1] -0.9541617 0.3435606 -1.1171672 -2.0588714 0.1801640
rnormBoost(6, 3)
## [1] -0.9541617 0.3435606 -1.1171672 -2.0588714 0.1801640 -0.1522290
rnormBoost(6, 4)
## [1] -0.3226455 0.4446485 1.2954095 1.6020953 -1.6903968 -0.2742874
The packages listed above are some of the packages on which your Rcpp
code can depend via the Rcpp::depends
attribute. Understanding exactly how the latter works is beyond the scope of this course, but it is useful to be aware of the fact that sourceCpp
will look for the information on the package(s) mentioned in Rcpp::depends
, which is needed to set up the compilation enviroment. For example, it will look for the path to the header file to be included, which can be found by doing:
system.file("include", package = "RcppArmadillo")
## [1] "/home/runner/work/_temp/Library/RcppArmadillo/include"
Further information can retrieved via the getPlugin
function, for example:
get("inlineCxxPlugin", asNamespace("RcppArmadillo"))()$env
## $PKG_LIBS
## [1] "$(SHLIB_OPENMP_CFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)"
##
## $PKG_CPPFLAGS
## [1] "-I../inst/include $(SHLIB_OPENMP_CFLAGS)"
##
## $USE_CXX11
## [1] "yes"
gives the additional RcppArmadillo
environment variables required to successfully compile code depending on Rcpp
. Again, we don’t need to know exactly how this works, but it is good to be aware of the fact that sourceCpp
is doing some work to set up the compilation environment. Other Rcpp
functions, such as cppFunction
, work similarly.
In this section we have seen how to you can spread your Rcpp
code over multiple files to build larger applications. However, a better approach to manage and distribute your Rcpp
code is building an R
package. The next section will explain how to do it.