Woes of the rlang enabled tidyverse

programming
r
Author

TheCoatlessProfessor

Published

February 12, 2018

This essay was written in the StackOverflow chat for R and addressed to Hadley Wickham, who was visiting the chatroom to discuss critiques that had arisen.

The later essay can be read here: Design hiccups of the tidyverse with rlang.


I respect the contributions you have made to the field of data science, statistical programming, and the R language. That said, tidyeval and its syntax, as applied to interfaces directly promoted by RStudio, end up making life harder for data analysts. As an example of this kind of push, consider the version of RStudio whose import data wizard offered only readr functions and not base R, which caused tibbles to be injected into the analysis.

But the more important part is how the DSLs for data manipulation are evolving under tidyeval. To this end, let’s look at a tidyeval function written by Romain at the start of February (cf. the tweet). Within this function, the goal is to create multiple lags of the same variable. Can this be easily inferred from the contents of the function? How can a student, much less an analyst, read the code?
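To give a sense of the style in question, here is a minimal sketch of a multiple-lag helper written with tidyeval. It is not the code from the tweet; the names lags, df, and x are hypothetical, and it assumes dplyr, purrr, and rlang are installed.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical helper (not the function from the tweet): build a named list
# of quosures, one per lag of `var`.
lags <- function(var, n = 3) {
  var <- enquo(var)        # capture the column name the caller typed
  indices <- seq_len(n)    # lag sizes 1, 2, ..., n
  # For each lag size, quote a call to dplyr::lag(), unquoting both the
  # captured column (!!var) and the lambda argument (!!.x).
  quos <- map(indices, ~ quo(dplyr::lag(!!var, !!.x)))
  # Name each quosure so the names become column names once spliced.
  set_names(quos, sprintf("lag_%s_%02d", as_label(var), indices))
}

df <- data.frame(x = 1:5)

# `!!!` splices the named list of quosures into mutate() as separate arguments.
out <- mutate(df, !!!lags(x, 2))
```

Here out gains the columns lag_x_01 and lag_x_02. Even in this short sketch, a reader has to juggle enquo, quo, !!, !!!, ~, and .x to follow what is ultimately "lag this column a few times".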

I think the major sticking point of this interface is that the introduction of !!/!!!, quo/enquo, .x, an overridden ~, and more has unintentionally raised the barrier to entry for generalizing a routine when compared to the “old” NSE _ approach. To work within this new framework, one needs a much deeper understanding of R’s underlying system in order to know the contexts in which these symbols must be used.

On top of this, there are inherent problems with overriding the existing negation operator !, cf. Advanced R: Chapter 19 - Quasiquotation/Slicing an array. This is a side effect introduced within the tidyverse to solve a self-inflicted problem.
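The clash can be seen in a few lines. This is only an illustrative sketch, assuming rlang is installed; the variable names are made up for the example.

```r
library(rlang)

x <- c(0, 1, 2)

# In ordinary R code, `!!` is simply `!` applied twice, which coerces
# the vector to logical.
double_negated <- !!x

# Inside an rlang quoting function, the same symbols mean "unquote":
# the value of x is inlined into the captured expression.
unquoted <- expr(!!x)

# `!` binds more loosely than arithmetic, so base R parses `!!a + b`
# as two negations wrapped around `a + b`, not as `(!!a) + b`.
parsed <- quote(!!a + b)
inner <- parsed[[2]][[2]]   # the operand of the innermost `!`
```

The same two characters therefore carry different meanings depending on whether the surrounding function quotes its arguments, which is exactly the kind of context a newcomer has to learn before the syntax can be read reliably.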

Under the present iteration of tidyeval, I believe a step was taken backwards as the tidyverse team became far more fluent in R. In essence, the rungs of a ‘ladder of abstraction’ were shifted from concrete systems to working with abstracted artifacts.

This led to the new design iteration lacking a vision for what the simplest possible design could be, which caused a massive paradigm shift away from casual users and toward sophisticated package developers. Essentially, it is not clear what intuition guided the decisions in the design of this interface.

In short, the argument against tidyeval is:

“You cannot run before you learn to walk. Thus, we’ll first show you how to walk and, then, how to run before potentially teaching you how to fly.”

I say this because, instead of trying to show users how to walk, tidyeval immediately skips to trying to teach users how to fly.