I've been programming in R for a while now. Whenever I've had performance problems, it has almost always been due to
data.frame usage. To check what's slowing down your R code, just use the `Rprof` command like so:

I wrote this code to be particularly slow due to its
data.frame usage. You can view the results of the profiling by running `R CMD Rprof summ.prof`:

```
Each sample represents 0.02 seconds.
Total run time: 2.98 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %       total       %       self
 total    seconds     self    seconds    name
  81.2      2.42       1.3      0.04     "[<-"
  79.9      2.38      65.8      1.96     "[<-.data.frame"
  18.1      0.54      10.1      0.30     "[.data.frame"
  18.1      0.54       0.0      0.00     "["
  12.1      0.36       1.3      0.04     "%in%"
  11.4      0.34       9.4      0.28     "match"
   4.7      0.14       4.0      0.12     "anyDuplicated"
   2.0      0.06       2.0      0.06     "names"
   2.0      0.06       2.0      0.06     "sys.call"
   1.3      0.04       1.3      0.04     "=="
   0.7      0.02       0.7      0.02     ".row_names_info"
   0.7      0.02       0.7      0.02     "NROW"
   0.7      0.02       0.7      0.02     "anyDuplicated.default"
   0.7      0.02       0.7      0.02     "cos"

   %       self        %       total
 self     seconds     total   seconds    name
  65.8      1.96      79.9      2.38     "[<-.data.frame"
  10.1      0.30      18.1      0.54     "[.data.frame"
   9.4      0.28      11.4      0.34     "match"
   4.0      0.12       4.7      0.14     "anyDuplicated"
   2.0      0.06       2.0      0.06     "names"
   2.0      0.06       2.0      0.06     "sys.call"
   1.3      0.04      81.2      2.42     "[<-"
   1.3      0.04      12.1      0.36     "%in%"
   1.3      0.04       1.3      0.04     "=="
   0.7      0.02       0.7      0.02     ".row_names_info"
   0.7      0.02       0.7      0.02     "NROW"
   0.7      0.02       0.7      0.02     "anyDuplicated.default"
   0.7      0.02       0.7      0.02     "cos"
```

As you can see, it's very slow. After profiling your own code, if you find that the top calls are
`[.data.frame` or `[<-.data.frame`, then you have a data.frame problem. Here's how I solve it, listing the fixes in the order I try them:

- Avoid loops and use vectorized code (no `for` loops, no `apply`, no `sapply`). In the example, use `d[, 1] <-` to assign an entire column at once.
- Use numeric indices when possible. In our example, that means using `d[i, 1]` instead of `d[i, "x"]`.
- Get rid of the data.frame for heavy calculations by using the `data.matrix` command. In the example above, just use `d.matrix <- data.matrix(d)`.
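The three fixes can be illustrated side by side. This is a minimal sketch, not the post's original example, which is not reproduced here. The data.frame `d`, its columns `x` and `y`, the helper `make_df()`, and the `cos()` calculation are all hypothetical, chosen only to mirror the calls that appear in the profile above. The sketch checks that each rewrite produces the same values as the slow loop; it does not measure speed.

```r
# Hypothetical example data: a data.frame with two numeric columns.
# Column names, size, and the cos() calculation are illustrative
# assumptions, not the original post's code.
make_df <- function(n) data.frame(x = numeric(n), y = seq_len(n) / 10)

# Slow version: loop over rows and index columns by *name*, so every
# iteration goes through `[.data.frame` and `[<-.data.frame`.
slow <- function(n) {
  d <- make_df(n)
  for (i in seq_len(n)) {
    d[i, "x"] <- cos(d[i, "y"])
  }
  d
}

# Fix 1: vectorized code, assigning an entire column in one call.
vectorized <- function(n) {
  d <- make_df(n)
  d[, 1] <- cos(d[, 2])
  d
}

# Fix 2: keep the loop, but index columns by number instead of by name.
numeric_index <- function(n) {
  d <- make_df(n)
  for (i in seq_len(n)) {
    d[i, 1] <- cos(d[i, 2])
  }
  d
}

# Fix 3: do the heavy work on a numeric matrix instead of the
# data.frame, converting back to a data.frame only at the end.
via_matrix <- function(n) {
  d <- make_df(n)
  d.matrix <- data.matrix(d)
  for (i in seq_len(n)) {
    d.matrix[i, 1] <- cos(d.matrix[i, 2])
  }
  as.data.frame(d.matrix)
}

# Sanity check: every rewrite should give the same values as the slow loop.
n <- 1000
ref <- slow(n)
stopifnot(
  isTRUE(all.equal(ref, vectorized(n))),
  isTRUE(all.equal(ref, numeric_index(n))),
  isTRUE(all.equal(ref, via_matrix(n)))
)
```

Wrapping each call in `system.time()` is one way to compare them on your own data. The relative gains will likely depend on the number of rows and on how much work happens inside the loop.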
Rewriting the silly example above,
This time, it runs too quickly for the profiler to record any samples. The execution time is 0.192 seconds, compared with 3.48 seconds for the first version: a pretty good speed-up of roughly 18 times.
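The profile-then-time workflow described in this post can also be run entirely from within R. This is a minimal sketch under stated assumptions. `slow_fill()` and `fast_fill()` are hypothetical stand-ins for the original slow and rewritten examples, which are not reproduced here. `Rprof()`, `summaryRprof()`, `tempfile()` and `system.time()` are standard R functions. `summaryRprof()` returns `by.self` and `by.total` tables that correspond to the two tables printed by `R CMD Rprof`.

```r
# Hypothetical stand-in for the slow example: row-by-row, name-indexed
# assignment into a data.frame (not the original post's code).
slow_fill <- function(n) {
  d <- data.frame(x = numeric(n), y = seq_len(n) / 10)
  for (i in seq_len(n)) {
    d[i, "x"] <- cos(d[i, "y"])
  }
  d
}

# Hypothetical vectorized rewrite: assign the whole column at once.
fast_fill <- function(n) {
  d <- data.frame(x = numeric(n), y = seq_len(n) / 10)
  d[, 1] <- cos(d[, 2])
  d
}

n <- 10000
prof_file <- tempfile(fileext = ".prof")

# Start the sampling profiler, run the slow code, then stop profiling.
Rprof(prof_file)
result <- slow_fill(n)
Rprof(NULL)

# Summarize the recorded samples; by.self is sorted by self time,
# by.total by total (self + callees) time.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
print(head(prof$by.total))

# Compare elapsed time for the two versions.
print(system.time(slow_fill(n)))
print(system.time(fast_fill(n)))

# Both versions should compute the same values.
stopifnot(isTRUE(all.equal(result, fast_fill(n))))
```

Profiling `fast_fill()` the same way may record no samples at all, which is what the post observes for its rewritten example.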