Hi,

Here in R, I need to load a huge .csv file; its size is 200MB (and may
sometimes be more than 1GB).
When I try to load it into a variable it takes a long time, and when I
then do a cbind by groups, I get an error like this:

" Error: cannot allocate vector of size 82.4 Mb "

My requirement is to split the data from the huge .csv file into a
number of small .csv files.
The number of lines to 'split by' is given as input.

Below is my code:
-------------------------------
SplitLargeCSVToMany <- function(DataMatrix, Destination, NoOfLineToGroup)
{
    test <- data.frame(read.csv(DataMatrix))

    # create groups of NoOfLineToGroup rows
    group <- rep(1:NROW(test), each=NoOfLineToGroup)
    new.test <- cbind(test, group=group)
    new.test2 <- new.test
    new.test2[,ncol(new.test2)] <- NULL

    # now get indices to write out
    indices <- split(seq(nrow(test)), new.test[, 'group'])

    # now write out the files
    for (i in names(indices))
    {
        write.csv(new.test2[indices[[i]],],
                  file=paste(Destination, "data.", i, ".csv", sep=""),
                  row.names=FALSE)
    }
}
-----------------------------------------------------
My system Configuration is,
Intel Core2 Duo
speed : 3GHz
2 GB RAM
OS: Windows-XP [ServicePack-3]
---------------------------------------------------
Any hope to solve this issue ?
Thanks in advance,
Antony.
--
View this message in context:
http://r.789695.n4.nabble.com/ERROR-cannot-allocate-vector-of-size-in-MB-GB-tp4637597.html
Sent from the R help mailing list archive at Nabble.com.
Sarah Goslee
2012-Jul-24 17:59 UTC
[R] ERROR : cannot allocate vector of size (in MB & GB)
Sure, get more RAM. 2GB is a tiny amount if you need to load files of
1GB into R, and as you've discovered it won't work.

You can try a few simpler things, like making sure there's nothing
loaded into R except what you absolutely need. It looks like there's no
reason to read the entire file into R at once for what you want to do,
so you could also load a chunk, process that, then move on to the next
one.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
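Sarah's load-a-chunk-at-a-time idea can be sketched roughly like this (a sketch only, assuming the file has a header row; the file name and chunk size are placeholders):

```r
con <- file("bigfile.csv", "r")           # placeholder file name
header <- readLines(con, n = 1)           # keep the header for each chunk
repeat {
    lines <- readLines(con, n = 100000)   # read 100K rows at a time
    if (length(lines) == 0) break
    # rebuild a small data frame from just this chunk
    chunk <- read.csv(textConnection(c(header, lines)))
    # ... process 'chunk' here, then let it be garbage-collected ...
}
close(con)
```

Only one chunk's worth of rows is ever held as a data frame, so peak memory stays roughly constant regardless of the file size.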
try this:

input <- file("yourLargeCSV", "r")
fileNo <- 1
repeat {
    myLines <- readLines(input, n=100000)  # 100K lines / file
    if (length(myLines) == 0) break
    writeLines(myLines, sprintf("output%03d.csv", fileNo))
    fileNo <- fileNo + 1
}
close(input)
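One caveat with the snippet above: only the first output file will contain the CSV header row. A variant that repeats the header in every piece (a sketch; the file names are placeholders as before):

```r
input <- file("yourLargeCSV", "r")
header <- readLines(input, n = 1)          # keep the header row
fileNo <- 1
repeat {
    myLines <- readLines(input, n = 100000)  # 100K data lines per file
    if (length(myLines) == 0) break
    # prepend the header so each piece is a valid standalone CSV
    writeLines(c(header, myLines), sprintf("output%03d.csv", fileNo))
    fileNo <- fileNo + 1
}
close(input)
```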
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Hi,

You can try using dbLoad() from the hash package to load the data.
Also, if you need to chunk the data, you can use the ff package.

A.K.
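A minimal sketch of the ff route (the file name, header assumption, and row counts are placeholders, not from the original post):

```r
library(ff)
# read.csv.ffdf keeps the table on disk in ff's own format and reads it
# in blocks, so only one block of rows is in RAM at a time.
x <- read.csv.ffdf(file = "big.csv", header = TRUE,
                   first.rows = 10000, next.rows = 50000)
dim(x)   # inspect size without pulling everything into memory
```

Subsets of an ffdf (e.g. `x[1:100, ]`) come back as ordinary data frames, so chunks can still be written out with write.csv.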
----- Original Message -----
From: Rantony <antony.akkara at ge.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, July 24, 2012 9:45 AM
Subject: [R] ERROR : cannot allocate vector of size (in MB & GB)
Hi Arun,
But I am using Windows(XP).

From: arun kirshna [via R]
Sent: Tuesday, August 07, 2012 10:49 PM
To: Akkara, Antony (GE Energy, Non-GE)
Subject: Re: ERROR : cannot allocate vector of size (in MB & GB)

HI,
If you are using linux, this should split the file at the linux prompt:

# Assuming your file size is a 160KB file:
split -b 40k file.csv outputfilename
# This will output four 40KB files: outputfilename+suffix (aa,ab,ac,ad)

A.K.

--
View this message in context:
http://r.789695.n4.nabble.com/ERROR-cannot-allocate-vector-of-size-in-MB-GB-tp4637597p4639548.html
Sent from the R help mailing list archive at Nabble.com.
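Worth noting: `split -b` cuts on byte boundaries and can break a CSV row in half. Where GNU split is available, `-l` splits on whole lines instead (a sketch; the file and prefix names are placeholders):

```shell
# Split into pieces of 100000 whole lines each; pieces are named
# part_aa, part_ab, ... and no row is ever cut mid-line.
split -l 100000 file.csv part_
```

On Windows XP the same `split` binary can be had through Cygwin or similar ports, though the R-based chunking shown earlier in the thread avoids the extra dependency.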