1. Rcpp sugar
In a previous chapter we explained how Rcpp
allows users to interface R
with C++
in a convenient way. Here we cover Rcpp sugar
, which consists of a set of functions and operators which make the C++
code written using Rcpp
behave similarly to R
code. In other words, Rcpp sugar
allows us to write C++
code which looks similar to its R
equivalent, but often more efficient.
This document follows closely the official Rcpp sugar vignette.
Vectorized operators and sugar versions of basic R
functions
Consider the following Rcpp
code for summing two vectors:
library(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export(name = "vsum")]]
NumericVector vsum_I(const NumericVector x1, const NumericVector x2)
{
int ni = x1.size();
NumericVector out(ni);
for(int ii = 0; ii < ni; ii++){
out[ii] = x1[ii] + x2[ii];
}
return out;
}')
First of all, let’s check whether it works:
d <- 1e5
x1 <- rnorm(d)
x2 <- rnorm(d)
y <- vsum(x1, x2)
max( abs(y - (x1+x2)) )
## [1] 0
It seems so. The Rcpp
code above is quite lengthy: do we really need to write a for
loop to simply add two vectors together? Thanks to Rcpp sugar
, we don’t have to. In particular, Rcpp sugar
uses operator overloading to vectorize many operations involving vectors represented using the Rcpp
vector classes (NumericVector
, IntegerVector
, …). Hence, we can write vector addition in a much more compact form:
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export(name = "vsumVett")]]
NumericVector vsum_I(const NumericVector x1, const NumericVector x2)
{
return x1 + x2;
}')
Let’s see whether this works:
y <- vsumVett(x1, x2)
max( abs(y - (x1+x2)) )
## [1] 0
It does! All four basic arithmetic operations +
, -
, *
and /
are vectorized, thanks to Rcpp sugar
. Vector-scalar operations are also vectorized, for instance we can do \(x1 * 3.0 + 1.0\), which gives the same result as in R
.
However, note that writing such simple arithmetic expression in Rcpp
will not give you a big performance gain in most cases, for instance:
library(microbenchmark)
microbenchmark(R = x1 + x2, Rcpp = vsum(x1, x2), RcppSugar = vsumVett(x1, x2))
## Unit: microseconds
## expr min lq mean median uq max neval
## R 127.5 426.75 521.981 431.75 440.45 3932.0 100
## Rcpp 159.5 513.10 636.968 519.30 528.20 13796.9 100
## RcppSugar 129.6 431.75 482.644 436.80 446.30 5428.8 100
This is because basic arithmetic operations in R
are computed using internal code, which is written in C
or C++
. Calling C++
from R
has also some overheads, so the C++
code can actually become much slower than the R
version, for instance:
d <- 1e2
x1 <- rnorm(d)
x2 <- rnorm(d)
microbenchmark(R = x1 + x2, Rcpp = vsum(x1, x2), RcppSugar = vsumVett(x1, x2))
## Unit: nanoseconds
## expr min lq mean median uq max neval
## R 100 300 427 400 400 4000 100
## Rcpp 2300 2600 2879 2800 3000 6100 100
## RcppSugar 2200 2500 2840 2700 2900 12100 100
Here Rcpp
is much slower than R
, because the computational effort is limited (the vectors x1
and x2
are short) and it is dominated by the cost of calling C++
from R
. Closing this brief digression on the dangers of using Rcpp
without a clear idea of where the performance gains will come from, we go back to Rcpp sugar
.
Rcpp sugar
provides vectorized versions of the main logical operators >
, <
, >=
, <=
, ==
, &
, |
and !
. For instance, we can define the function:
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export(name = "inRange")]]
LogicalVector inRange_I(const NumericVector x, const NumericVector u, const NumericVector l)
{
return x > l & x < u;
}')
which checks which elements of x
falls in the range defined by l
and u
. Example usage:
set.seed(5151)
n <- 5
u <- rep(0.8, n)
l <- rep(0.2, n)
x <- runif(n)
data.frame(x = x, inside = inRange(x, u, l))
## x inside
## 1 0.78699386 TRUE
## 2 0.07425964 FALSE
## 3 0.93492781 FALSE
## 4 0.22155905 TRUE
## 5 0.47941927 TRUE
As in base R
, some operators can be used also in a “unary” sense, for example !x
will negate a LogicalVector
and -x
will change the sign of a NumericVector
or IntegerVector
.
Rcpp sugar
also provides Rcpp
versions of many basic R
functions, including:
- logical operations such as
ifelse
,all
,any
, …; - arithmetic operation such as
sign
,sqrt
,diff
,cumsum
, …; - special functions such as
gamma
,beta
,choose
, …; - stat summaries such as
mean
,median
,range
, …; - matrix operations such as
colSums
,diag
, …; - statistical distribution such as
dnorm
,pgamma
, …; - …
See here for a full list. This makes working with Rcpp
very easy, if you are already familiar with R
. For example, we can do:
cppFunction('
List exampleSugar(NumericVector x) {
NumericVector x1 = cumsum(x);
LogicalVector x2 = any(is_na(x));
NumericMatrix x3 = cbind(x1, x1);
List out = List::create(x1, x2, x3);
return out;
}')
exampleSugar( c(1, 0, -2, NA) )
## [[1]]
## [1] 1 1 -1 NA
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [,1] [,2]
## [1,] 1 1
## [2,] 1 1
## [3,] -1 -1
## [4,] NA NA
which gives us the same results we would get in base R
.
Random number generation
Rcpp sugar
provides C++
level access to the random number generator (RNG) available in R
. For example, we can do:
cppFunction('
NumericVector rbeta_rcpp(const int n, const int a, const int b) {
return rbeta(n, a, b);
}')
hist(rbeta_rcpp(1e3, 1, 3))
Given that Rcpp
is using R
’s RNG, setting the seed within the R
session will make the results reproducible and identical to what we would get with base R
:
set.seed(321)
rbeta_rcpp(5, 1, 3)
## [1] 0.5159567 0.3422122 0.2875003 0.2889146 0.1782904
set.seed(321)
rbeta(5, 1, 3)
## [1] 0.5159567 0.3422122 0.2875003 0.2889146 0.1782904
cppFunction
adopts the Rcpp
namespace by default hence, within rbeta_rcpp
, the line rbeta(n, a, b)
is equivalent to Rcpp::rbeta(n, a, b)
. Rcpp::rbeta
outputs a NumericVector
of length n
. If, instead, we want to simulate random variables one at the time, we can use functions defined in the R
namespace. For example:
cppFunction('
NumericVector rbeta_R(const int n, const int a, const int b) {
NumericVector out(n);
for(int ii = 0; ii < n; ii++) out[ii] = R::rbeta(a, b);
return out;
}')
set.seed(321)
rbeta_R(5, 1, 3)
## [1] 0.5159567 0.3422122 0.2875003 0.2889146 0.1782904
Note that R::rbeta
returns a double
, not a NumericVector
. See here for a complete list of the statistical distribution available via Rcpp sugar
.
As explained here and in the Writing R Extensions documentation, to use R
’s internal RNG in C/C++
we need to call GetRNGState
before using the RNG and PutRNGState
before returning to R
. The first call reads the state of the RNG (see ?.Random.seed
), the second updates it. When we use R
’s RNG within functions exported via the // [[Rcpp::export]]
attribute we don’t have to call GetRNGState
or PutRNGState
, because Rcpp
does it for us in the automatically generated C++
wrapper function. The // [[Rcpp::export]]
attribute is used automatically by cppFunction
and, in fact, if we recompile a previous example:
cppFunction('
NumericVector rbeta_rcpp(const int n, const int a, const int b) {
return rbeta(n, a, b);
}', rebuild = TRUE, verbose = TRUE)
##
## Generated code for function definition:
## --------------------------------------------------------
##
## #include <Rcpp.h>
##
## using namespace Rcpp;
##
## // [[Rcpp::export]]
##
## NumericVector rbeta_rcpp(const int n, const int a, const int b) {
## return rbeta(n, a, b);
## }
##
## Generated extern "C" functions
## --------------------------------------------------------
##
##
## #include <Rcpp.h>
## #ifdef RCPP_USE_GLOBAL_ROSTREAM
## Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
## Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
## #endif
##
## // rbeta_rcpp
## NumericVector rbeta_rcpp(const int n, const int a, const int b);
## RcppExport SEXP sourceCpp_9_rbeta_rcpp(SEXP nSEXP, SEXP aSEXP, SEXP bSEXP) {
## BEGIN_RCPP
## Rcpp::RObject rcpp_result_gen;
## Rcpp::RNGScope rcpp_rngScope_gen;
## Rcpp::traits::input_parameter< const int >::type n(nSEXP);
## Rcpp::traits::input_parameter< const int >::type a(aSEXP);
## Rcpp::traits::input_parameter< const int >::type b(bSEXP);
## rcpp_result_gen = Rcpp::wrap(rbeta_rcpp(n, a, b));
## return rcpp_result_gen;
## END_RCPP
## }
##
## Generated R functions
## -------------------------------------------------------
##
## `.sourceCpp_9_DLLInfo` <- dyn.load('/tmp/Rtmp358fok/sourceCpp-x86_64-pc-linux-gnu-1.0.10/sourcecpp_b56220a9763/sourceCpp_13.so')
##
## rbeta_rcpp <- Rcpp:::sourceCppFunction(function(n, a, b) {}, FALSE, `.sourceCpp_9_DLLInfo`, 'sourceCpp_9_rbeta_rcpp')
##
## rm(`.sourceCpp_9_DLLInfo`)
##
## Building shared library
## --------------------------------------------------------
##
## DIR: /tmp/Rtmp358fok/sourceCpp-x86_64-pc-linux-gnu-1.0.10/sourcecpp_b56220a9763
##
## /opt/R/4.2.2/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_13.so' 'fileb5623898f60b.cpp'
we see that the C++
wrapper contains the line Rcpp::RNGScope rcpp_rngScope_gen;
which internally calls GetRNGState
. PutRNGState
will be called automatically when exiting the C++
wrapper function.
One case when we need to worry about the reading and writing the R
’s RNG state, is when we want to do things manually, that is without using the // [[Rcpp::export]]
attribute. To provide an example, define the function:
#include <Rcpp.h>
using namespace Rcpp;
RcppExport SEXP rnorm_manual(SEXP n) {
NumericVector out( as<int>(n) );
RNGScope rngScope;
out = rnorm(as<int>(n), 0.0, 1.0);
return out;
}
in the “rnorm_manual.cpp” file, load it and test it:
system("export PKG_CXXFLAGS=`Rscript -e \"Rcpp:::CxxFlags()\"` &&
R CMD SHLIB rnorm_manual.cpp")
dyn.load("rnorm_manual.so")
set.seed(31)
.Call("rnorm_manual", 3)
## [1] 0.05557024 -0.18423859 1.59576183
set.seed(31)
rnorm(3)
## [1] 0.05557024 -0.18423859 1.59576183
Here we needed to create an RNGScope
object, before using the RNG via rnorm
. IMPORTANTLY the RNGScope
must be declared AFTER declaring any of the outputs of the function (the vector out
in this case). Failing to do so, for example declaring an output after the RNGScope
object, will lead to very annoying intermittent crashes. Hence, it is generally preferable to use the Rcpp::export
attribute, which deals with the RNG state automatically.