# Create a test matrix
<- matrix(1:4, nrow = 2)) (test_mat
[,1] [,2]
[1,] 1 3
[2,] 2 4
# Compute the result using base R implementation
<- rowSums(test_mat)) (r_base
[1] 4 6
TheCoatlessProfessor
December 18, 2016
In the olden days of Rcpp, circa 2012 or before Rcpp 0.10.0, there was a bees knees approach to inlining C++ code with R using a package called inline
. Fast forward to when Rcpp 0.10.0 was released on November 12, 2012 and the world of Rcpp changed forever. Now, you may be thinking that I’m being a little overdramatic here. However, that is not the case at all. The pinnacle contribution of Rcpp Attributes made by J.J. Allaire changed everything. Everyday statisticians could harness the power of C++ from within R with very little knowledge of C++ as illustrated by his Rcpp Bay Area talk. While the end result between the two methods is the same, the journey to the end is remarkably different. The differences will be shown are considerable in magnitude. Therefore, we are brought to the pinnacle question of this post:
If a great algorithm exists and no one uses it, does it really exist?
From experience, the answer is resounding no and, thus, the origins of this post to hopefully discourage the use of inline
in favor of more modern techniques offered by Rcpp for inlining C++ code.
For many, the formal introduction to Rcpp happens through Dirk Eddelbuettel’s Seamless R and C++ Integration with Rcpp (free for University students with a membership to Springer, otherwise $54.99), which unfortunately makes heavy use of the inline
package. Indeed, the introductory fibonacci example to show how to embed C++ code within R makes direct use of inline
. The use of inline
would not be very problematic but it increases the barrier of entry for those just beginning their journey marrying R code with C++. In particular, to inline code with the inline
package one must:
cxxfunction()
or cfunction()
to a variableAs one can imagine, the approach yields several areas that draw away attention from the main task: improving the speed of computational intensive code. In essence, to accomplish the inlining task, one is forced to have knowledge of a smorgasbord of programming techniques.
Now, the contribution of Rcpp Attributes made by J.J. Allaire alleivated all of the above points in one swoop. The process of inlining code became remarkably straightforward to the point that you can now press a button in the RStudio IDE and voila the code is inlined! Simply put, the focus is now on the act of translating or writing computationally intensive code and not on managing it. In essence, Rcpp Attributes kept the syntactical features associated with functions and automated the previously manual process by introducing cppFunction()
and sourceCpp()
. The prior maps more closely to the intent of inline
while the later provides a more familiar route for seasoned programmers.
To investigate the properties of inline code, we’ll opt to recreate the rowSums()
function within R that takes a matrix of \(m \times n\) dimensions and reduces it to \(m\) by summing over the \(n\) elements within a row. Mathematically, the operation can be represented by the following series:
\[\sum_{j = 1}^n x_{ij}\]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[1] 4 6
Before talking about C++ code, we begin by displaying the implementation for row summation within R. Please note, the implementation is not optimized to take advantage of vectorization simply to ease the translation effort to C++. So, the resulting R implementation is given as:
Running the necessary test checks, yields:
inline
use via cxxfunction()
Before we get to the use of Rcpp attributes, we need to take a trip down memory lane or more pressingly using apriori knowledge of the inline
package process to compile code. We’ll follow the points outlined in the previous section that compared both approaches. However, this does not mean that there is any level of endorsement toward writing code with this approach. In fact, the take away message from this section should resoundingly be:
DO NOT USE inline
FOR C++ CODE!
With this being said, we begin. We start the process off by translating the code from R to C++ using Rcpp syntax:
Moving along, the above code must be morphed so that x
can receive the input value from R and be able to send the computed result back into R. In essence, a faux functional body declaration must be crafted. The results of which are a bit weird to say the least.
# Create a faux functional C++ body within R
src <- '
// Convert SEXP to NumericMatrix
Rcpp::NumericMatrix x(x_in); // x_in comes from R
// Proceed with algorithm like normal
Rcpp::NumericVector row_sums(x.nrow());
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
return row_sums; // returns the value to R
'
Notice that the last code snippet was done entirely as a string in R.
Once the function body is made, we must create the necessary signature input alignment for what type should be passed into the function. In this particular case, the function is expecting to receive a numeric
matrix and, thus, we must specify the signature of the object as numeric
.
Having all of the above components completed, we can now proceed to actually compiling the code. The compliation step is where the name of the function is derived. So, in terms of the case study, the name of row_sum_inline
is what will be used to call the function later on.
Once the compilation is done, usually takes about 4-5 seconds, we now have a C++ function within R.
Unlike inline
, Rcpp equivalents are far more modern and straight forward to use. Most notably, once a function has been written, it is able to be immediately put inline using either cppFunction()
or sourceCpp()
. The prior option more closely matches what is available within inline
with respect to defining a single operable function while the later option enables users more familiar with C++ to breakaway from embedding code as text within an R file but instead treat it as a standalone C++ file. Regardless of the choice, the end result is cleaner code that is focused principally on the computation rather than on the setup and management of the C++ function.
Unlike the prior option, if you blink, then you may miss the magic of attributes.
To begin, write a proper C++ functional form of rowSums()
.
Using cppFunction()
, enclose the above function in strings and run it!
That’s it. Ta-da! The function name given in C++ is exactly the same name one should call within R.
[1] 4 6
[1] TRUE
The addition of attributes also allowed for a true separation of concerns model to take place within marrying R code to C++. That is, there could be a distinct area for exploration (R) and for high performance computing (C++) as files would not need to be marred with a mixture of both. By opting for this approach, the gain also is in terms of appropriate syntax highlighting as the code is no longer embedded within a string.
As hinted at earlier, to achieve such a feat one would use sourceCpp()
. There are only four major differences between the aforementioned approach with cppFunction()
.
Rcpp.h
header file.// [[Rcpp::export]]
placed immediately preceeding the function.Rcpp::
as the namespace for Rcpp is not automatically loaded.
using namespace Rcpp;
, however, this is not ideal as it introduces ambiguity into function calls.Returning to the case study, create a file called row_sum_ex.cpp and embed the row sum formulation previously concocted:
#include <Rcpp.h> // include R header library
// Rcpp attribute tag require to create interface to R.
// [[Rcpp::export]]
Rcpp::NumericVector row_sum_rcpp_ext(Rcpp::NumericMatrix x){
Rcpp::NumericVector row_sums(x.nrow());
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
return row_sums;
}
Next, lets compile the row_sum_ex.cpp by using sourceCpp()
function.
If we skip this step, then when go to call row_sum_rcpp_ext
in R the function will not be available as the code has yet to be: 1. compiled and 2. made available to the R session.
To make sure everything still works as expect, let’s do one last verification check:
In essence, many things have changed since the introduction of Rcpp Attributes. One crucial turning point is related to the focus afforded by Rcpp Attributes being placed more on the actual calculation than keeping track of inputs or casting objects. While inline
is not ideal, the above guide gently provided some transitionary remarks to use more appropriate inlining techniques provided by Rcpp.