

Various people have asked me to add some additional benchmarks to my data reshaping post from earlier this week. I’ve been hesitant to add these as an update, since I didn’t want to distract from the major point I was trying to make in that previous post.¹ However, I’m happy to put these additional benchmarks in a new blog post here.

So, alongside the main methods from last time…

- R: data.table::melt and tidyr::pivot_longer.
- Stata: reshape, sreshape (shreshape), and greshape (gtools).

… the additional benchmarks that we’ll be considering today are:

- Stata: greshape with the “dropmiss” and “nochecks” arguments added.

I’ll divide the results into two sections.

Our first task will be to reshape the same (sparse) 1,000 by 1,002 dataset from wide to long. Here are the results and I’ll remind you that the x-axis has been log-transformed to handle scaling.

Once more, we see that data.table rules the roost. However, the newly-added DataFrames (Julia) and pandas (Python) implementations certainly put in a good shout, coming in second and third, respectively. Interestingly enough, my two tidyr benchmarks seem to have shuffled slightly this time around, but that’s only to be expected for very quick operations like this. (We’ll test again in a moment on a larger dataset.) Adding options to gtools yields a fairly modest if noticeable difference, while the base R reshape() command doesn’t totally disgrace itself. It’s certainly much faster than the Stata equivalent.

Large(ish) data

Another thing to ponder is whether the results are sensitive to the relatively small size of the test data. The long-form dataset is “only” 1 million rows deep and the fastest methods complete in only a few milliseconds. So, for this next set of benchmarks, I’ve scaled up the data by two orders of magnitude: Now we want to reshape a 100,000 by 1,002 dataset from wide to long. In other words, the resulting long-form dataset is 100 million rows deep.

Without further ado, here are the results. Note that I’m dropping the slowest methods (because I’m not a masochist) and this also means that I won’t need to log-transform the x-axis anymore.

Reassuringly, everything stays pretty much the same from a rankings perspective. The ratios between the different methods are very close to the small data benchmarks. The most notable thing is that gtools manages to claw back time (suggesting some initial overhead penalty), although it still lags the other methods.
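For readers who want to see what the wide-to-long reshape in these benchmarks amounts to, here is a minimal pure-Python sketch. It is not the benchmark code: the `melt` helper, the column names (`id`, `grp`, `x1`, ...) and the assumption of two identifier columns are all hypothetical, chosen only because two identifier columns are consistent with the row counts quoted in this post (1,000 by 1,002 wide becoming 1 million rows long). The real implementations, such as `data.table::melt` or pandas' `melt`, do the same job far more efficiently.

```python
def melt(rows, id_cols):
    """Reshape wide records (a list of dicts) to long form.

    Produces one output row per (input row, value column) pair,
    repeating the identifier columns on every output row.
    """
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_cols}
        for col, val in row.items():
            if col in id_cols:
                continue  # identifier columns are carried along, not stacked
            long_rows.append({**ids, "variable": col, "value": val})
    return long_rows


def long_row_count(n_rows, n_cols, n_id_cols=2):
    """Rows in the long result when no rows are dropped.

    n_id_cols=2 is an assumption made for this sketch.
    """
    return n_rows * (n_cols - n_id_cols)


# Toy wide data: 2 identifier columns plus 3 sparse value columns.
wide = [
    {"id": 1, "grp": "a", "x1": 0.5, "x2": None, "x3": 1.5},
    {"id": 2, "grp": "b", "x1": None, "x2": 2.0, "x3": None},
]
long_data = melt(wide, id_cols=("id", "grp"))

print(len(long_data))                  # 2 rows x 3 value columns = 6
print(long_row_count(1_000, 1_002))    # 1000000
print(long_row_count(100_000, 1_002))  # 100000000
```

The sketch keeps missing values in the long output. An option that drops them, like the “dropmiss” argument mentioned for greshape, would shrink the result on sparse data.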
