Carlos Javier Gil Bellosta
2004-Sep-12 22:17 UTC
[R] write.table performance: an alternative?
Dear R's,

I have lately been using R to perform some statistical analyses and, based on them, simulations that are exported as flat text files to other programs. These text files are currently about 30MB in size, but they could eventually reach 300MB. Writing these files with either write.table or write.matrix was desperately slow and was the bottleneck of the whole process. Besides, it took too much memory and I sometimes experienced heavy paging.

So I decided to find a better way to export my R tables. Since they contained floating point numbers only, in order to avoid the internal conversion into character values (both write.table and write.matrix seem to do it), compiling

////////////////////////// Program Start //////////////////////////
#include <stdio.h>
#include <stdlib.h>

/* Called from R through .C(), so every argument arrives as a pointer. */
void salidaOptimizada(int* l_fila, int* n_columnas, double* vector_resultados){
    int i;
    int j;
    FILE* f = fopen("datosPorPeriodo.dat", "w");

    if(f == NULL) return;   /* the file could not be opened */

    /* Each output line holds *l_fila consecutive values of the vector. */
    for(i = 0; i < *n_columnas; i++){
        for(j = 0; j < *l_fila; j++){
            fprintf(f, " %3f", *vector_resultados);
            vector_resultados++;
        }
        fprintf(f, "\n");
    }
    fclose(f);
}
////////////////////// Program End ////////////////////

as a shared library, linking it to my code and invoking it with the .C function would do the trick for me. The performance gains were enormous compared to write.table().

So I decided to look for a greater degree of generality and wrote a simple C function (enclosed at the end of the message) that accepts character, integer and floating point values.
It can be tested, for instance, by running both

/////////////// Program Start /////////////////
a1 <- rnorm(1000000)
a2 <- floor(a1)
a3 <- as.character(1:1000000)
a <- data.frame(a1, a2, a3)

# Profile the standard export
Rprof()
write.table(a, "salidaNoOptimizada.dat")
Rprof(NULL)
summaryRprof()
////////////////////// Program End /////////////////

and

///////////////////// Program Start /////////////////
dyn.load("liboptio.so")

a1 <- rnorm(1000000)
a2 <- floor(a1)
a3 <- as.character(1:1000000)
a <- data.frame(a1, a2, a3)

# Profile the export through the C function.
# "dic" gives the type of each column: double, integer, character.
Rprof()
borrar <- .C("escribir", as.integer(1000000), as.character("dic"),
             as.integer(3), as.double(a[,1]), as.integer(a[,2]),
             as.character(a[,3]))
Rprof(NULL)
rm(borrar)
summaryRprof()
//////////////////// Program End ///////////////////

to compare the performance (given that the program below is compiled as a shared library under the name liboptio.so).

Now, my question: Is this interesting/useful at all for anybody other than myself? Have I done something silly (I know too little about both C and R) and wasted an afternoon? Or would it be worth trying to improve the code to make it more general, and wrapping it in some R code so as to make the function invocation a bit more transparent and automatic?

Sincerely,

Carlos J. Gil Bellosta

///////////////////// Program Start /////////////////
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

/* n_lin: number of lines; tipo: one type letter per column
   ('d' double, 'i' integer, 'c' character); n: number of columns;
   the columns themselves follow as the variable arguments. */
void escribir(int* n_lin, char** tipo, int* n, ...){
    int i, j;
    va_list lista;
    FILE* f = fopen("salidaPrueba", "w");

    if(f == NULL) return;   /* the file could not be opened */

    for(i = 0; i < *n_lin; i++){
        char *pAchar = *tipo;
        /* va_start takes the last named parameter itself, not *n */
        va_start(lista, n);
        for(j = 0; j < *n; j++){
            if(*pAchar == 'd'){
                fprintf(f, " %f", *(va_arg(lista, double*) + i));
            }
            else if(*pAchar == 'i'){
                fprintf(f, " %d", *(va_arg(lista, int*) + i));
            }
            else if(*pAchar == 'c'){
                fprintf(f, " %s", *(va_arg(lista, char**) + i));
            }
            else
                /* unknown type letter: the argument is not consumed */
                fprintf(f, "unknown type %c", *pAchar);
            pAchar++;
        }
        fprintf(f, "\n");
        va_end(lista);
    }
    fclose(f);
}
///////////////////// Program End /////////////////
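
One possible direction for the generalization asked about above, given only as a minimal sketch and not as a tested replacement for write.table: take the output file name as an argument instead of hard-coding it, and report failure back to R. The function name write_matrix_rows and the status argument are hypothetical, not part of the code above. The sketch assumes the data is a numeric matrix passed with as.double(), which R hands over in column-major order, and it writes one matrix row per output line.

```c
#include <stdio.h>

/* Hypothetical sketch: write an nrow x ncol matrix of doubles, stored
 * column-major as R stores it, one matrix row per output line.
 * Every argument is a pointer because that is how .C() passes them;
 * a character vector arrives as char**.
 * status is set to 0 on success, 1 if the file cannot be opened,
 * 2 if closing (and so flushing) the file fails. */
void write_matrix_rows(char **path, int *nrow, int *ncol,
                       double *m, int *status)
{
    size_t nr = (size_t) *nrow;
    size_t nc = (size_t) *ncol;
    size_t i, j;
    FILE *f = fopen(path[0], "w");

    if (f == NULL) {
        *status = 1;
        return;
    }

    for (i = 0; i < nr; i++) {
        for (j = 0; j < nc; j++) {
            if (j > 0)
                fputc(' ', f);              /* separator between columns */
            /* element (i, j) of a column-major matrix */
            fprintf(f, "%f", m[i + j * nr]);
        }
        fputc('\n', f);
    }

    *status = (fclose(f) == 0) ? 0 : 2;
}
```

From R, and assuming the shared library has been loaded with dyn.load, a call could look like .C("write_matrix_rows", as.character("out.dat"), as.integer(nrow(x)), as.integer(ncol(x)), as.double(x), status = integer(1)), where x and out.dat are placeholder names.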