Hi,

First off, I want to ask: how do I declare a data.frame with 0 rows and n columns?

Coming to my problem: I have a data.frame of 22 columns and a dynamic number of rows, which I insert using rbind(). The total number of rows could go up to 200,000. The problem is that after about 800 or 900 rows have been inserted, rbind() starts overwriting the data.frame, and I end up with a total of only 800-900 rows. What is going on? The 22 columns are all strings, each about 10 characters long.

--
Rajesh.J
Hi again!

I'm trying to follow your general goal from the questions you asked today, but it's not easy.

First, declaring a data.frame with 0 rows is a bad idea. It is much faster to define the number of columns and rows from the beginning and then fill it.

Second, I don't know how to do it! To my knowledge (maybe I overlooked some posts in the archive), there is no easy way to do it for data.frames, as there is for lists or vectors. The easiest approach might be to create a list of the correct length (e.g. with vector("list", n)), fill it with your data, and then convert it to a data.frame with as.data.frame() when it's finished.

Third, for your problem, maybe do.call() can help you. I don't know what you have done so far, but it sounds like you tried to do it iteratively (in a loop) instead of vectorizing it (though I'm not sure do.call() can really be called vectorized). There was a post on do.call() yesterday or today; you'll surely find it if you search with RSiteSearch().

Last, I don't know if it is relevant for you, but I've read on the list many times that matrices are faster to work with. If all your columns have the same type (all strings, in your case), you can use a matrix.

There are surely people on the list who know more about this, but I hope this gets you started.

Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra@uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
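A minimal sketch of the list-based approach suggested above. It assumes 22 character columns, keeps the row count small for illustration, and uses a hypothetical make_row() helper (not from the thread) that returns one record as a character vector. Each record goes into a pre-allocated list, and do.call(rbind, ...) binds all of them in a single call instead of calling rbind() once per row.

```r
# Assumption: 22 character columns; n_rows kept small for illustration
n_rows <- 1000
n_cols <- 22

# Hypothetical helper (not from the thread): returns one record as
# 22 strings of 10 characters each, e.g. "r00001_c01"
make_row <- function(i) {
  sprintf("r%05d_c%02d", i, seq_len(n_cols))
}

# Pre-allocate a list with one slot per row so it never grows in the loop
rows <- vector("list", n_rows)
for (i in seq_len(n_rows)) {
  rows[[i]] <- make_row(i)
}

# Bind all records in a single call; the result is a character matrix
m <- do.call(rbind, rows)
colnames(m) <- paste0("V", seq_len(n_cols))

# Convert to a data.frame once, at the end, keeping strings as character
res_df <- as.data.frame(m, stringsAsFactors = FALSE)
```

Because all columns are strings, the intermediate m is already a character matrix. As Ivan notes, it may be simpler to keep working with the matrix and only convert to a data.frame if one is actually needed.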
This will give a data.frame with 0 rows and 22 columns:

data.frame(matrix(nrow = 0, ncol = 22, dimnames = list(NULL, LETTERS[1:22])))

But you should avoid growing data.frames if the final data.frame is going to be large: you are very likely to run into memory problems. It is much better to create a data.frame that is large enough from the start and then overwrite its rows. It is faster, too:

> nrows <- 2000
> ncols <- 22
> system.time({
+   tmp <- data.frame(matrix(nrow = 0, ncol = ncols))
+   for(i in seq_len(nrows)){
+     tmp <- rbind(tmp, rnorm(ncols))
+   }
+ })
   user  system elapsed
   7.83    0.02    7.86
> system.time({
+   tmp <- data.frame(matrix(nrow = nrows, ncol = ncols))
+   for(i in seq_len(nrows)){
+     tmp[i, ] <- rnorm(ncols)
+   }
+ })
   user  system elapsed
   3.75    0.00    3.76

In this case an sapply() construction was even faster (note that it returns a numeric matrix rather than a data.frame):

> system.time({
+   tmp <- t(sapply(seq_len(nrows), function(i){
+     rnorm(ncols)
+   }))
+ })
   user  system elapsed
   0.02    0.00    0.02

--
ir. Thierry Onkelinx
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of rajesh j
Sent: Monday 6 September 2010 16:57
To: r-help at r-project.org
Subject: [R] rbind() overwriting data.frame()
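A sketch of Thierry's two points adapted to the original question, where all 22 columns are strings. The column names V1 to V22, the row count and the generated strings are assumptions for illustration only. Note that data.frame(matrix(nrow = 0, ...)) produces logical columns; building the 0-row data.frame from character(0) vectors makes the columns character from the start. For the pre-allocated version, a character matrix of the final size is created once, its rows are overwritten in the loop, and it is converted to a data.frame at the end, so rbind() is never called repeatedly.

```r
n_cols    <- 22
n_rows    <- 2000   # assumption: small compared with the 200,000 in the question
col_names <- paste0("V", seq_len(n_cols))

# 1. A 0-row data.frame whose 22 columns are character rather than logical
empty <- as.data.frame(
  setNames(replicate(n_cols, character(0), simplify = FALSE), col_names),
  stringsAsFactors = FALSE
)

# 2. Pre-allocate the full size once, then overwrite rows in place
m <- matrix(NA_character_, nrow = n_rows, ncol = n_cols,
            dimnames = list(NULL, col_names))
for (i in seq_len(n_rows)) {
  # hypothetical record: 22 strings of 10 characters each
  m[i, ] <- sprintf("r%05d_c%02d", i, seq_len(n_cols))
}

# Convert to a data.frame once, at the end
filled <- as.data.frame(m, stringsAsFactors = FALSE)
```

Filling a matrix row by row is used here instead of tmp[i, ] <- ... on a data.frame because, as the timings above suggest, row assignment on a data.frame is comparatively slow; this is a design choice for the sketch, not something measured in the thread.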