I am using rubyzip and am trying to put a huge CSV file with 1.4 million rows into a zip file. Under JRuby I get an out-of-heap-space error.

I believe the error happens in the block below:

    Zip::ZipOutputStream.open(zip_path) do |zos|
      zos.put_next_entry(File.basename(csv_path))
      zos.print IO.read(csv_path)
    end

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Luis Lavena
2012-May-09 17:27 UTC
Re: out of memory (java heap space) on zip creation (jruby)
On Wednesday, May 9, 2012 1:52:27 PM UTC-3, Jedrin wrote:
> I am using rubyzip and am trying to put a huge csv file with 1.4
> million rows into the zip file.
> Using jruby I get an out of heap error.
>
> I believe the error happens in the block below:
>
> Zip::ZipOutputStream.open(zip_path) do |zos|
>   zos.put_next_entry(File.basename(csv_path))
>   zos.print IO.read(csv_path)
> end

You're reading the entire file contents into memory and then saving.

See whether there is a way for you to stream chunks (16 kilobytes, for example) into the zip stream.

--
Luis Lavena

To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/pd99kWagyskJ.
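The chunked approach suggested here can be sketched with plain IO objects. This is a minimal illustration, not rubyzip code: `copy_in_chunks` and `CHUNK_SIZE` are hypothetical names, and StringIO objects stand in for the CSV file and the zip stream so the sketch runs with the standard library alone. It assumes only that the destination responds to `write`.

```ruby
require 'stringio'

CHUNK_SIZE = 16 * 1024 # 16 kilobytes, the size suggested above

# Copies src to dest one chunk at a time, so at most chunk_size bytes
# of the source are held in memory at once. Returns the bytes copied.
def copy_in_chunks(src, dest, chunk_size = CHUNK_SIZE)
  total = 0
  # IO#read(length) returns nil at end of file, which ends the loop.
  while (chunk = src.read(chunk_size))
    dest.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Stand-ins for the real CSV file and zip stream.
source = StringIO.new("a,b,c\n" * 10_000)
sink = StringIO.new
copied = copy_in_chunks(source, sink)
puts "copied #{copied} bytes"

# With rubyzip the call would presumably look like this (untested here,
# and assuming the zip stream accepts write):
#   Zip::ZipOutputStream.open(zip_path) do |zos|
#     zos.put_next_entry(File.basename(csv_path))
#     File.open(csv_path, 'rb') { |f| copy_in_chunks(f, zos) }
#   end
```

The point of the loop is that memory use is bounded by the chunk size rather than by the size of the CSV file.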
The error happens on the line:

    zos.print IO.read(csv_path)

I see that p zos.class shows Zip::ZipOutputStream and that the print method is inherited from:

http://rubyzip.sourceforge.net/classes/IOExtras/AbstractOutputStream.html

where print is shown to be this, according to the docs:

    # File lib/zip/ioextras.rb, line 130
    def print(*params)
      self << params.to_s << $\.to_s
    end

I am not sure offhand how to stream the data, but I gathered that the problem comes from reading the whole file into memory.
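To see what the quoted print body does with its argument, the sketch below reproduces only its string-building expression in a hypothetical helper, `print_payload`. It is not rubyzip code. On Ruby 1.9 and later, Array#to_s is the same as inspect, while on Ruby 1.8 it concatenated the elements. Either way, a second string at least as large as the data is built before anything is written.

```ruby
# Mirrors the expression from the quoted print method:
# *params collects the arguments into an Array, which is then
# converted to a String, followed by the output record separator $\.
def print_payload(*params)
  params.to_s + $\.to_s
end

sample = print_payload("a,b\n")
puts sample
puts "input: #{"a,b\n".bytesize} bytes, payload: #{sample.bytesize} bytes"
```

For a file read in one piece, that means the full contents plus a converted copy are on the heap at the same time.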
Greg Akins
2012-May-09 18:42 UTC
Re: Re: out of memory (java heap space) on zip creation (jruby)
On Wed, May 9, 2012 at 2:07 PM, Jedrin <jrubiando-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I am not sure offhand how to stream the data, but gathered that the
> problem was from reading the file into memory

The default heap size for the JVM is pretty small. I believe you can pass args to the JVM when you start jruby.

If you do something like -xmx1024m (not sure that syntax is exactly correct, but it's close) you might get enough. Of course, that depends on the size of the file.

--
Greg Akins
http://twitter.com/akinsgre
On May 9, 2:42 pm, Greg Akins <angryg...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> The default heapsize for the jvm is pretty small. I believe you can pass
> args to jvm when you start jruby
>
> if you do something like -xmx1024m (Not sure that syntax is exactly
> correct, but it's close) you might get enough. Of course that depends on
> the size of the file
>
> --
> Greg Akins http://twitter.com/akinsgre

Well, the CSV file has something like 1.4 million rows and maybe 20 columns. When I get a chance, maybe I'll look into that if that seems like the thing to try.
Robert Walker
2012-May-09 20:05 UTC
Re: out of memory (java heap space) on zip creation (jruby)
Jedrin wrote in post #1060204:
> On May 9, 2:42 pm, Greg Akins <angryg...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> The default heapsize for the jvm is pretty small. I believe you can pass
>> args to jvm when you start jruby
>>
>> if you do something like -xmx1024m (Not sure that syntax is exactly
>> correct, but it's close) you might get enough. Of course that depends on
>> the size of the file
>>
>> --
>> Greg Akins http://twitter.com/akinsgre
>
> Well, the csv file has something like 1.4 million rows and maybe 20
> columns or something like that. When I get a chance, maybe I'll look
> into that if that seems like the thing to try ..

"When I get a chance, maybe..."???

Greg gave you the answer. A default JVM instance's heap space is limited to 64 megabytes. If the file you're loading, plus the memory consumed by your application, goes over that limit, the JVM will report "out of memory" and begin exhibiting unpredictable behavior.

It makes no difference how much physical RAM your machine might contain. The JVM will NOT use more heap space than the maximum defined by the -Xmx argument (-Xmx64m being the default when not specified).

--
Posted via http://www.ruby-forum.com/.
> Greg gave you the answer. A default JVM instance heap space is limited
> to 64 Megabytes. If the file you're loading, plus the memory consumed by
> your application, goes over that memory limit the JVM will report "out
> of memory" and begin exhibiting unpredictable behavior.
>
> It makes no difference how much physical RAM your machine might contain.
> The JVM will NOT use more heap space than the maximum defined by the
> -Xmx argument (-Xmx64m being the default when not specified).
>
> --
> Posted via http://www.ruby-forum.com/.

So I launched my Sinatra app like this; from my Google searches, the -J arg looks like what I want:

    jruby -J-Xmx1024m -S recordset.rb

When I tried to download the CSV file (which the server puts into the zip file and then crashes), I got the same heap space error, but it seemed to run longer before it crashed. If I try to increase that number much higher than 1024m, I get:

    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    JVM creation failed
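One way to confirm that a -J-Xmx setting actually reached the JVM is to ask the runtime for its maximum heap from inside the app. A minimal sketch, assuming JRuby's Java integration and java.lang.Runtime; `max_heap_mb` is a hypothetical helper name, and it returns nil on interpreters other than JRuby.

```ruby
# Reports the JVM's maximum heap in megabytes when running on JRuby,
# or nil on other Ruby implementations where there is no JVM heap.
def max_heap_mb
  return nil unless RUBY_PLATFORM == 'java'

  require 'java'
  # Runtime#maxMemory returns the most memory the JVM will try to use, in bytes.
  java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
end

heap = max_heap_mb
puts(heap ? "JVM max heap: #{heap} MB" : 'Not running on JRuby; no JVM heap to report')
```

Printing this at startup makes it easy to tell whether a crash happened under the intended limit or under the launcher's default.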
Greg Akins
2012-May-09 20:52 UTC
Re: Re: out of memory (java heap space) on zip creation (jruby)
On Wed, May 9, 2012 at 4:42 PM, Jedrin <jrubiando-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> When I tried to download the csv file (which the server puts into the
> zip file and then crashes),
> I got the same heap space error, but it seemed like it did run longer
> before it crashed. If I try to increase that number much higher than
> 1024m, I get:

The heap contains all the objects created for the application. In this case, it looks like your file is still too big.

> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> JVM creation failed

This means that you tried to allocate more than is available on the machine.

Are you doing this for a single load, or will it be an application that will commonly receive large files?

If it's the latter, I'd probably try to redesign the code you're using to load the files. Sounds like this is part of a third-party gem? If that's the case, maybe they have some mechanism for handling larger files?

--
Greg Akins
http://twitter.com/akinsgre
> The heap contains all the objects created for the application.. In this
> case, it looks like your file is still too big
>
> > Error occurred during initialization of VM
> > Could not reserve enough space for object heap
> > JVM creation failed
>
> This means that you tried to allocate more than is available on the machine
>
> Are you doing this for a single load, or will it be an application that
> will commonly receive large files?
>
> If it's the latter, I'd probably try to redesign the code you're using to
> load the files. Sounds like this is part of a third party gem? If that's
> the case, maybe they have some mechanism for handling larger files?
>
> --
> Greg Akins http://twitter.com/akinsgre

What I do is create a CSV file from the database. I had some memory problems there, but using Active Record's find_in_batches() seemed to solve that.

The CSV file has 1.4 million rows. It gets created successfully. I then use the rubyzip gem to create a zip file that contains just that CSV file. I used examples I found from Google searches on how to create the zip file, which are shown earlier in the thread. I looked at the class info on the web for rubyzip and didn't see an obvious way to stream data into the zip file. Tomorrow I can look at some other way to create a zip file, perhaps using a different gem.
Luis Lavena
2012-May-09 23:58 UTC
Re: out of memory (java heap space) on zip creation (jruby)
On Wednesday, May 9, 2012 6:21:39 PM UTC-3, Jedrin wrote:
> > The heap contains all the objects created for the application.. In this
> > case, it looks like your file is still too big
> >
> > > Error occurred during initialization of VM
> > > Could not reserve enough space for object heap
> > > JVM creation failed
> >
> > This means that you tried to allocate more than is available on the
> > machine
> >
> > Are you doing this for a single load, or will it be an application that
> > will commonly receive large files?
> >
> > If it's the latter, I'd probably try to redesign the code you're using
> > to load the files. Sounds like this is part of a third party gem? If
> > that's the case, maybe they have some mechanism for handling larger files?
> >
> > --
> > Greg Akins http://twitter.com/akinsgre
>
> What I do is create a csv file from the database. I had some memory
> problems there, but using active record find_in_batches() seemed to
> solve that.
>
> The CSV file has 1.4 million rows. It gets created successfully. I
> then use rubyzip gem to create a zip file that just contains that CSV
> file. I just used examples I found from google searches on how to
> create the zip file which are shown earlier up in the thread. I looked
> at the class info on the web for rubyzip and didn't see an obvious way
> to stream data into the zip file. Tomorrow I can look at perhaps some
> other way to create a zip file using a different gem or some such ..

As I mentioned in my previous reply, and similar to the problem you had when creating the file: you're trying to load the whole thing.

There are two options for this:

A) You stream the contents of your CSV file, reading by chunks into a ZipStream

or

B) You zip the file from outside Ruby (shelling out to gzip, for example)

--
Luis Lavena

To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/mwyK5VTPabEJ.
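If a .gz file is acceptable in place of a .zip, as the mention of gzip in option B suggests, the compression can also be streamed from inside Ruby. Note that this swaps in a different technique from either option as stated: it does not shell out, and it uses the standard library's Zlib::GzipWriter instead of rubyzip. `gzip_file` is a hypothetical helper name, and the temp file stands in for the real CSV.

```ruby
require 'zlib'
require 'tempfile'

# Compresses src_path into dest_path as gzip, reading the source in
# fixed-size chunks so the whole file is never held in memory.
def gzip_file(src_path, dest_path, chunk_size = 16 * 1024)
  File.open(src_path, 'rb') do |src|
    Zlib::GzipWriter.open(dest_path) do |gz|
      while (chunk = src.read(chunk_size))
        gz.write(chunk)
      end
    end
  end
  dest_path
end

# Stand-in for the large CSV file.
src = Tempfile.new(['rows', '.csv'])
src.write("id,name\n" + "1,example\n" * 50_000)
src.close

gz_path = gzip_file(src.path, src.path + '.gz')

# Read the archive back to check the round trip.
restored = Zlib::GzipReader.open(gz_path) { |gz| gz.read }
puts "original: #{File.size(src.path)} bytes, gzip: #{File.size(gz_path)} bytes"
```

Unlike a zip archive, a gzip file holds a single compressed stream with no directory of entries, which is enough when the download is one CSV file.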
> As I mentioned in my previous reply and similar to the problem you had when
> creating the file: you're trying to load the whole thing.
>
> There are two options for this:
>
> A) You stream the contents of your CSV file, reading by chunks into a
> ZipStream

That's exactly what I would like to do. I wasn't sure offhand whether the zip method will read it that way or how to pass it. I was hoping for an idea on how to do that.

The code where it all happens is here, and the second line is where it crashes:

    zos.put_next_entry(File.basename(fpath))
    zos.print IO.read(fpath)

zos is an instance of Zip::ZipOutputStream. The print method is inherited from IOExtras::AbstractOutputStream.

According to the docs, print() is like this:

    def print(*params)
      self << params.to_s << $\.to_s
    end

Since it does params.to_s, I'm guessing that is going to put it all into memory. The other methods may have similar problems. However, the putc method looked interesting. There is a putc() defined like this according to the docs:

    def putc(anObject)
      self << case anObject
              when Fixnum then anObject.chr
              when String then anObject
              else raise TypeError, "putc: Only Fixnum and String supported"
              end
      anObject
    end

So I tried that. My code is below, followed by the output. The file I was trying to zip in this test was another zip file. The result appeared to be a bit bigger than it should have been, and when I tried to open it, I got an error saying it was corrupted. This isn't quite the same as the CSV problem, since I am putting a zip file into a zip file here.

    def zput(zos,fpath)
      p fpath
      zos.put_next_entry(File.basename(fpath))
      f = File.new(fpath)
      chunk_sz = 10000000
      while !f.eof?
        data = f.read(chunk_sz)
        zos.putc data
        puts 'read ' + data.size.to_s + ' bytes'
      end
    end

    "web.war"
    read 10000000 bytes
    read 10000000 bytes
    read 8573823 bytes
    "data.war"
    read 10000000 bytes
    read 8655347 bytes
    "big.zip"
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 3431079 bytes
I changed the putc in the post above to a write, followed by zos.print "" at the very end. It appears that print() adds $\ to the file. The byte size of the zip file inside the zip was short by two bytes, and I still get corrupted zip file errors on it.
It's late Friday and I am done for the day, but I just tried something else. It may be that I need to open the file in binary mode, which I didn't do. Initial tests seem to indicate that may be the case.

Thanks for everyone's help.
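A binary-mode version of the earlier zput might look like the sketch below. It is an illustration under stated assumptions, not a confirmed fix: `FakeZipStream` is a hypothetical stand-in that records what it is given so the sketch runs without rubyzip, and the real zip stream is assumed to accept `write` as described in the previous post. The file is opened with 'rb' so the bytes are read without newline or encoding conversion, and the block form of File.open closes it afterwards.

```ruby
require 'tempfile'

CHUNK_SIZE = 1024 * 1024 # bytes per read; an arbitrary illustrative value

# Hypothetical stand-in for the zip output stream: it only records the
# bytes written under each entry name.
class FakeZipStream
  attr_reader :entries

  def initialize
    @entries = {}
  end

  def put_next_entry(name)
    @current = @entries[name] = ''.b
  end

  def write(data)
    @current << data
    data.bytesize
  end
end

# Streams the file at fpath into zos in binary mode, chunk by chunk.
def zput(zos, fpath, chunk_size = CHUNK_SIZE)
  zos.put_next_entry(File.basename(fpath))
  File.open(fpath, 'rb') do |f|
    while (data = f.read(chunk_size))
      zos.write(data)
    end
  end
end

# Binary sample containing bytes that text mode can alter on some platforms.
payload = "PK\x03\x04\r\n\x1A\x00\xFF".b * 1_000
tmp = Tempfile.new(['inner', '.zip'])
tmp.binmode
tmp.write(payload)
tmp.close

stream = FakeZipStream.new
zput(stream, tmp.path, 4096)
puts "stored #{stream.entries[File.basename(tmp.path)].bytesize} bytes"
```

Comparing the stored byte count with File.size of the source is a quick check for the kind of size mismatch reported earlier in the thread.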