Hi,

I'm developing an application where I'll have to store a lot of images coming from the users, and I'm still not sure whether I should store them in MySQL as BLOBs or just store them on the filesystem.

If I store them on the filesystem, how do I scale once I have multiple servers?

Thanks,
Pratik
--
rm -rf / 2>/dev/null - http://null.in
On 6/4/06, Pratik <pratiknaik@gmail.com> wrote:
> I'm developing an application where I'll have to store a lot of images
> coming from the users. And I'm still not sure if I should store them
> in MySQL as blob or just store them on filesystem.
>
> If I store them on filesystem, how to scale when I'll have to have
> multiple servers ?

Indeed, storing in the filesystem is a "quick and dirty" solution with many problems. Scaling is just one of them. If your binary image data is part of your system's data, use the database.

--
-Alder
Pratik wrote:
> I'm developing an application where I'll have to store a lot of images
> coming from the users. And I'm still not sure if I should store them
> in MySQL as blob or just store them on filesystem.
>
> If I store them on filesystem, how to scale when I'll have to have
> multiple servers ?

How to scale images on the file system with multiple servers:

Start from scratch (0 users/hits/images) and move to the next level when the load/response/metric of your choice becomes unacceptable. At each step, various hardware upgrades and optimizations are possible (additional/faster disks, disk arrays), so there are more steps than these.

1) Store the images on the same box as your db, web server and application.
2) Move your db to a box of its own.
3) Move your application to a box of its own. Images stay on the web server box. Route image requests separately from application requests on the web tier.
4) Add application servers. The web tier can, in all probability, serve static requests faster than your application.
5) Move the images to a dedicated image server.
6) Create an image server cluster.
7) Create multiple global image server clusters.

Your path through these steps could be slightly different depending on the complexity of your application, the size and number of images, your requirement for immediate availability, patterns of use of images, frequency of addition of new images, processing done on newly acquired images, etc. You might add a dedicated image preprocessing server at some stage of this build-out.

I would be interested in seeing how you could handle images in the db through the same scaling scenario. I'm not saying you couldn't do it, or even that it might not be a better fit under certain conditions, but it would not be my first choice in common web applications that include 5-50 images per page. If you go the db route, at every step of the build-out you are going to have to serve the images through the whole stack (db=>app=>web). That reduces your flexibility in expanding and introduces a lot of overhead for every transaction. If that weren't enough, replicating multiple dbs is much more brittle than syncing multiple file systems, in my experience.

That you are thinking about this at all is probably a premature optimization. You don't really know what the use patterns of your application are, where the bottlenecks are, what problems can be solved with existing hardware, etc. You don't even really know what "a lot" is. Is it 10K, 100K, 1M, 10M, or 100M? How many users? How many images per user? How often do they view them?

For now, encapsulate access to the images, and you can change the storage when the requirements become clearer.

--
Ray
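The "encapsulate access" advice can be sketched roughly as follows. This is only an illustration, not anything from the thread: the class names `FileSystemImageStore` and `MemoryImageStore`, the three-method interface (`write`, `read`, `delete`), and the key format are all assumptions. The point is that the application only ever handles keys and bytes, so the backend can be replaced later without touching callers.

```ruby
require "fileutils"

# Hypothetical storage interface: callers see keys and bytes only,
# never paths or SQL, so the backend can be swapped later.
class FileSystemImageStore
  def initialize(root)
    @root = root
  end

  # Write the image bytes under the given key and return the key.
  def write(key, data)
    path = path_for(key)
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, data)
    key
  end

  # Return the image bytes, or nil if nothing is stored under the key.
  def read(key)
    path = path_for(key)
    File.exist?(path) ? File.binread(path) : nil
  end

  # Remove the image; a missing file is not an error.
  def delete(key)
    FileUtils.rm_f(path_for(key))
  end

  private

  def path_for(key)
    # Reject keys that could escape the root directory (e.g. "../x").
    unless key.match?(/\A[\w\-]+(\.\w+)?\z/)
      raise ArgumentError, "bad key: #{key.inspect}"
    end
    File.join(@root, key)
  end
end

# A second backend with the same three methods. The in-memory Hash is a
# stand-in for a BLOB column, MogileFS, or any other store.
class MemoryImageStore
  def initialize
    @blobs = {}
  end

  def write(key, data)
    @blobs[key] = data.dup
    key
  end

  def read(key)
    @blobs[key]
  end

  def delete(key)
    @blobs.delete(key)
  end
end
```

Application code would hold one store object (for example `store = FileSystemImageStore.new("/var/images")`, a made-up path) and call `store.write` and `store.read` everywhere; moving to another backend then means changing that one line.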
If this is a big solution then I would consider MogileFS. I first heard about it in a podcast interview with the creator of odeo.com, where they use it to store and distribute MP3s. It's also used at LiveJournal. It's good at distributing the load, at replicating files for data safety, and at fault tolerance. Another plus is that from the Ruby client level it's actually simpler than file storage.

The server and service info is here: http://www.danga.com/mogilefs/
Ruby client libraries (mogilefs-client) are here: http://dev.robotcoop.com/Libraries/

--
Jon Gretar Borgthorsson
http://www.jongretar.net/
> If I store them on filesystem, how to scale when I'll have to have
> multiple servers ?

Unless you're Flickr, it's likely that NFS will carry you a long way for very little investment in complexity. The 37signals cluster is using NFS to handle all file uploads for hosted applications.

--
David Heinemeier Hansson
http://www.loudthinking.com -- Broadcasting Brain
http://www.basecamphq.com -- Online project management
http://www.backpackit.com -- Personal information manager
http://www.rubyonrails.com -- Web-application framework
Yeah, I'm with David on this one. We just wrote a remote-backup application for our senior design project here in Portland, using MySQL as the DB back end, and we saw the DB freak out with files stored inside the tables. Not a good idea: MySQL keeps each table in a single file, and our machine was pretty wimpy (half a gig of RAM and 100 GB HDDs), so it was swapping like there was no tomorrow once we approached 1 GB worth of stored files. We also looked at doing a distributed version for scaling, but had the same problem with swapping.

Store the links/paths to the files in the DB, keep the files themselves on a FS/NFS, and have fun!

Phil Johnston
http://newsclobber.com // News Aggregation for fun!
http://gwid.dietpeach.com // Event planning for fun!
Have you tried MogileFS? http://www.danga.com/mogilefs

--
Best Regards,
Caiwangqin
http://www.uuzone.com
Mobile: +8613951787088
Tel: +86025-84818086 ext 233
Fax: +86025-84814993
Thanks everyone :)

I'll post about whatever approach I take. I'm sure I'll need more input on the setup.

Regards,
Pratik
--
rm -rf / 2>/dev/null - http://null.in
Julian 'Julik' Tarkhanov
2006-Jun-05 18:06 UTC
[Rails] Hosting images : DB or File System
On 5-jun-2006, at 6:35, David Heinemeier Hansson wrote:
>> If I store them on filesystem, how to scale when I'll have to have
>> multiple servers ?
>
> Unless you're Flickr, it's likely that NFS will carry you a long way
> for very little investment in complexity. The 37signals cluster is
> using NFS to handle all file uploads for hosted applications.

The only thing to watch out for is the directory size: some filesystems will throw up (literally) once 3-4 thousand files are in a single directory. file_column manages this very nicely by segmenting uploads into dirs all by itself.

Storing images in a DB is a no-no: it's +1 (convenient for you) and -6 for all the others.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
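To make the directory-segmenting idea concrete, here is a minimal sketch. It is not file_column's actual layout; `partitioned_dir`, the nine-digit padding, and the `uploads` root are assumptions chosen for illustration. The record id is zero-padded and cut into three-digit groups, so each directory level holds at most 1,000 entries.

```ruby
# Hypothetical helper: derive a nested upload directory from a record id.
# Illustrates segmenting in general; it is NOT file_column's own scheme.
def partitioned_dir(root, id)
  # Nine digits split cleanly into three groups; larger ids would need
  # more padding, so they are rejected here rather than silently cut.
  unless id.is_a?(Integer) && id.between?(0, 999_999_999)
    raise ArgumentError, "id out of range: #{id.inspect}"
  end

  padded = format("%09d", id)              # 1234567 -> "001234567"
  File.join(root, *padded.scan(/\d{3}/))   # -> root/001/234/567
end
```

For example, `partitioned_dir("uploads", 1234567)` gives `uploads/001/234/567`, and the files for that record would live inside that leaf directory.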
The Wikimedia project (including Wikipedia) stores files in the DB without using anything fancy, and they don't seem to have a problem.

Now, I'm not saying we all should do that; I'm just adding a point to the discussion.

-Nathan
Well... not everyone can afford 150 servers, and not everyone gets free bandwidth and rack space in two different countries. And they still have bad days.

--
Jon Gretar Borgthorsson
http://www.jongretar.net/
> The only thing to watch out for is for the directory length - some
> filesystems will throw up (literally) after 3-4 thousand files are in
> a single directory

The subject is misleading. All filesystems are databases; some are just more efficient than others, and you're going to get more efficiency out of a database backed by a block device than out of one backed by another database (e.g. MySQL).

Reiser4 claims to handle 10 million files in a directory with no problem. I'm guessing ZFS is up to the task as well. On crustier filesystems, though, subdir=filename.split("")[0..n].join("/") can buy a lot of leeway.

--
Posted via http://www.ruby-forum.com/.
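The one-liner above can be wrapped up roughly like this. The method names and the `depth` parameter are made up for the sketch; note that Ruby's `[0..n]` is inclusive, so it takes n + 1 characters, whereas `depth` below counts characters directly. The `hashed_subdir` variant is an extra assumption, not something from the post: it may spread files more evenly when many names share a prefix (camera-style `IMG_0001.jpg`, for instance).

```ruby
require "digest"

# Turn the first `depth` characters of the filename into directory levels,
# e.g. "kitten.jpg" with depth 2 -> "k/i".
def prefix_subdir(filename, depth = 2)
  filename.split("")[0...depth].join("/")
end

# Full storage path: root/<prefix levels>/<filename>.
def sharded_path(root, filename, depth = 2)
  File.join(root, prefix_subdir(filename, depth), filename)
end

# Assumed alternative: derive the levels from an MD5 hex digest of the
# name instead of the name itself, so similar names land in different dirs.
def hashed_subdir(filename, depth = 2)
  Digest::MD5.hexdigest(filename)[0, depth].chars.join("/")
end
```

With the defaults, `sharded_path("images", "kitten.jpg")` returns `images/k/i/kitten.jpg`. Filenames should be sanitized before use, since a leading `.` or `/` would become a directory level.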