Hi,

Here in R, I need to load a huge file (.csv); its size is 200 MB, and it may sometimes be more than 1 GB. When I try to load it into a variable it takes too much time, and after that, when I do cbind by groups, I get an error like this:

" Error: cannot allocate vector of size 82.4 Mb "

My requirement is to split the data from the huge .csv file into a number of small .csv files. I will give the number of lines to 'split by' as input.

Below I give my code:
-------------------------------
SplitLargeCSVToMany <- function(DataMatrix, Destination, NoOfLineToGroup)
{
    test <- data.frame(read.csv(DataMatrix))

    # create groups of NoOfLineToGroup rows
    group <- rep(1:NROW(test), each=NoOfLineToGroup)
    new.test <- cbind(test, group=group)
    new.test2 <- new.test
    new.test2[, ncol(new.test2)] <- NULL

    # now get indices to write out
    indices <- split(seq(nrow(test)), new.test[, 'group'])

    # now write out the files
    for (i in names(indices))
    {
        write.csv(new.test2[indices[[i]], ],
                  file=paste(Destination, "data.", i, ".csv", sep=""),
                  row.names=FALSE)
    }
}
-----------------------------------------------------
My system configuration is:
Intel Core 2 Duo
Speed: 3 GHz
2 GB RAM
OS: Windows XP [Service Pack 3]
---------------------------------------------------

Any hope to solve this issue?

Thanks in advance,
Antony.

--
View this message in context: http://r.789695.n4.nabble.com/ERROR-cannot-allocate-vector-of-size-in-MB-GB-tp4637597.html
Sent from the R help mailing list archive at Nabble.com.
Sarah Goslee
2012-Jul-24 17:59 UTC
[R] ERROR : cannot allocate vector of size (in MB & GB)
Sure, get more RAM. 2 GB is a tiny amount if you need to load files of 1 GB into R, and as you've discovered it won't work.

You can try a few simpler things, like making sure there's nothing loaded into R except what you absolutely need. It looks like there's no reason to read the entire file into R at once for what you want to do, so you could also load a chunk, process that, then move on to the next one.

Sarah

On Tue, Jul 24, 2012 at 9:45 AM, Rantony <antony.akkara at ge.com> wrote:
> Hi,
>
> Here in R, I need to load a huge file(.csv) , its size is 200MB. [may come
> more than 1GB sometimes].
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
try this:

input <- file("yourLargeCSV", "r")
fileNo <- 1
repeat{
    myLines <- readLines(input, n=100000)  # 100K lines / file
    if (length(myLines) == 0) break
    writeLines(myLines, sprintf("output%03d.csv", fileNo))
    fileNo <- fileNo + 1
}
close(input)

On Tue, Jul 24, 2012 at 9:45 AM, Rantony <antony.akkara at ge.com> wrote:
> Hi,
>
> Here in R, I need to load a huge file(.csv) , its size is 200MB. [may come
> more than 1GB sometimes].
> [...]
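[Editor's note: one caveat with the readLines() approach above is that only the first output file will contain the CSV header row. A small variant (a sketch; "yourLargeCSV" and the output names are placeholders, as in the original) carries the header into every chunk:]

```r
input <- file("yourLargeCSV", "r")
header <- readLines(input, n = 1)          # keep the header row aside
fileNo <- 1
repeat {
    myLines <- readLines(input, n = 100000)  # 100K data lines / file
    if (length(myLines) == 0) break
    # prepend the header so each chunk is a valid standalone CSV
    writeLines(c(header, myLines), sprintf("output%03d.csv", fileNo))
    fileNo <- fileNo + 1
}
close(input)
```

Because readLines() is called on an open connection, each call resumes where the previous one stopped, so the file is read exactly once and only one chunk is held in memory at a time.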
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
HI,

You can try using dbLoad() from the hash package to load. Also, if you need to chunk the data, you can use the ff package.

A.K.

----- Original Message -----
From: Rantony <antony.akkara at ge.com>
To: r-help at r-project.org
Sent: Tuesday, July 24, 2012 9:45 AM
Subject: [R] ERROR : cannot allocate vector of size (in MB & GB)

Hi,

Here in R, I need to load a huge file(.csv) , its size is 200MB. [may come more than 1GB sometimes].
[...]
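[Editor's note: if the ff route is taken, the usual entry point is read.csv.ffdf(), which reads the file in chunks into a file-backed ffdf object so only one chunk sits in RAM at a time. A sketch only, not run here; the file name and chunk sizes are placeholders, and the exact arguments should be checked against the ff documentation:]

```r
library(ff)

# Read the CSV chunk-by-chunk into an on-disk ffdf object.
# first.rows / next.rows control how many rows each pass loads.
big <- read.csv.ffdf(file = "yourLargeCSV.csv", header = TRUE,
                     first.rows = 10000, next.rows = 50000)

nrow(big)   # the full row count, even though the data lives on disk
```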
Hi Arun,

But I am using Windows (XP).

From: arun kirshna [via R] [mailto:ml-node+s789695n4639435h23@n4.nabble.com]
Sent: Tuesday, August 07, 2012 10:49 PM
To: Akkara, Antony (GE Energy, Non-GE)
Subject: Re: ERROR : cannot allocate vector of size (in MB & GB)

HI,

If you are using linux, this should split the file at the linux prompt:

# Assuming your file is a 160KB file:
split -b 40k file.csv outputfilename
# This will output four 40KB files: outputfilename + suffix (aa, ab, ac, ad)

A.K.
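[Editor's note: a caveat on the `split -b` suggestion above: it splits on byte counts, so it can cut a CSV row in half mid-line. Splitting on line counts keeps every row intact; on Windows XP this would need a Unix toolkit such as Cygwin (an assumption; file names below are placeholders):]

```shell
# Split file.csv into pieces of at most 100000 complete lines each.
# Output files are outputfilename + suffix (aa, ab, ac, ...),
# and no line is ever broken across two files.
split -l 100000 file.csv outputfilename
```

Note that, like the byte-based version, this puts the CSV header only in the first piece; the readLines() variant earlier in the thread handles the header explicitly.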