Hi all,

Do you store your images as blobs directly on the db or store a path to a file?

Generally I just store them as files, but I've seen a good deal of Rails code that stores images in the db.

So, what are the pros and cons?

I understand that storing blobs will make the db rather slow, and doing a find_all will return the blobs with the record data and we'll probably seldom need them. Seems like a good way to waste memory.
Caio Chassot wrote:
> Do you store your images as blobs directly on the db or store a path to
> a file?

Back in the day I used to avoid it - but now I embrace the idea.

> So, what are the pros and cons?

By storing them in a database, you get:

 * scalability (multiple servers use one DB, rather than a single NFS'd directory)
 * the comforting idea that everything is in one place (less complexity)
 * convenience (portability, migration, backup)
 * no filesystem limitations (DB limits are generally MUCH higher than FS)
 * security (to a minor degree, the less you access the FS directly the better!)

> I understand that storing blobs will make the db rather slow, and doing
> a find_all will return the blobs with the record data and we'll probably
> seldom need them. Seems like a good way to waste memory.

You'd probably be wise to split the blob data, or image data, out into its own table. Something like this:

    images
    ======
    id
    type [jpeg, png, etc.]
    data

    people
    ======
    id
    image_id
    ...

--Steve
Caio Chassot wrote:
> Do you store your images as blobs directly on the db or store a path to
> a file?

Path to a file when possible, db when the data's mobile.

> Generally I just store them as files, but I've seen a good deal of rails
> code that stores images in the db.
>
> So, what are the pros and cons?

+ You can keep all your dynamic data in one place, and keep it backed up or transported with a single system (database dumps).
+ You don't have to handle what happens when the image gets deleted but the DB record is still there.
- Slow compared to static file serving (but then, what isn't?).
- Caching becomes tricky - or at least 'one more thing to do'.

> I understand that storing blobs will make the db rather slow, and doing
> a find_all will return the blobs with the record data and we'll probably
> seldom need them.

You can get around that by having a second table for the actual blob data, such that the find(:all) only gets whatever metadata you might want (and possibly a thumbnail or two), and then you can :include the second table to grab the image when you need it.

-- A
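The two-table split suggested in the replies above can be tried out without a database. What follows is a plain-Ruby stand-in, not ActiveRecord: the hashes `PEOPLE` and `IMAGES` play the role of the two tables, and `Person#image` is a hypothetical accessor that only touches the blob table when it is called. The column name `kind` is an assumption; in Rails a column named `type` is reserved for single-table inheritance, so another name is usually the safer choice.

```ruby
# Plain-Ruby stand-in for the two-table layout; not ActiveRecord.
# IMAGES holds the heavy blob rows, PEOPLE holds only light metadata.
IMAGES = {
  10 => { kind: 'png', data: 'fake image bytes' }
}
PEOPLE = {
  1 => { name: 'Ann', image_id: 10 },
  2 => { name: 'Bob', image_id: nil }
}

# Counts how often the blob table is read, to make the laziness visible.
$blob_reads = 0

class Person
  attr_reader :id, :name, :image_id

  def initialize(id, row)
    @id = id
    @name = row[:name]
    @image_id = row[:image_id]
  end

  # Listing people reads only the light table.
  def self.all
    PEOPLE.map { |id, row| new(id, row) }
  end

  # The blob row is fetched on the first call and memoized afterwards.
  def image
    return nil if image_id.nil?
    @image ||= begin
      $blob_reads += 1
      IMAGES.fetch(image_id)
    end
  end
end

people = Person.all
people.map(&:name)     # builds a list without reading any blob
people.first.image     # first access reads the blob row once
people.first.image     # memoized, no further read
```

The same shape carries over to a real schema: the list view works from the light table, and only the action that actually serves the picture follows the association.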
Before I used Rails, I just about shared your opinion. But with Rails, caching the images in the DB to file is so easy and effortless that all of a sudden it makes a whole lot of sense to store certain images in the DB.

First of all, there are NO memory or performance issues, since you cache them to file anyway. The advantage is that if you store certain images (like a product image, or a user profile image) in the DB, there is only one single central data storage for your models. You don't have to remember, back up or maintain the loose relationship between your Product with id 25 and the corresponding 25.jpg in whichever directory it's stored. Just store the product images in the DB, cache them to file, and that's it.

Regards,
Tomas Jogin

On 10/3/05, Caio Chassot <k@v2studio.com> wrote:
> Do you store your images as blobs directly on the db or store a path to
> a file?
> [...]
Stephen, you nailed exactly what I had in mind. I've been considering the same pros, and exactly the same storage approach (keeping the blobs in their own table).

I think I'll give this idea a go.

I suppose you also have an image controller and image caching. Any specific advice in this area? I'm wondering how exactly you handle the content-type / extension issue. I have some ideas, but I'd much rather hear from someone who's been there before.

On Oct 03, 2005, at 14:10, Stephen Waits wrote:
> You'd probably be wise to split the blob data, or image data out into
> its own table.
> [...]
> First of all, there are NO memory or performance issues, since you
> cache them to file anyways. The advantage is that if you store certain
> images (like a product image, or a user profile image) in the DB,
> there is only one single central data storage for your models. You
> don't have to remember, backup or maintain the loose relationship
> between your Product with id 25 and the corresponding 25.jpg in
> whichever directory it's stored. Just store the product images in the
> DB, cache them to file, and that's it.

I actually have a module that manages image file to record relations, and there isn't much advantage in running one rsync over running two.

Having said that, I keep coming up with cases where having the image within the db would make things much simpler, and I'm really buying the idea of having a central storage. (Maybe it is better to run just one rsync :)
On 3 Oct 2005, at 19:17, Tomas Jogin wrote:
> First of all, there are NO memory or performance issues, since you
> cache them to file anyways.
> [...]
> Just store the product images in the DB, cache them to file, and
> that's it.

And then:

1. Know that Rails loads EVERY image into the memory of your app every time anybody sees the name of the model object on a page (in a list, perhaps). Remember that Rails has no lazy loading. Which automatically means +1 model for your app (which will control images and just images).

2. Increase the time to make your database dumps by an order of magnitude (as well as the time to load them) - your 1 rsync is going to take at least the same amount of time as the 2 you had before.

3. For every image view, make the request pass through 3-4 layers of software instead of just 1. Compare:

    client -> server -> filesystem    # for file storage
    client -> server -> FCGI socket -> Rails -> DB -> filesystem

Note that the second one is a loopback - the DB has to find the image and send the whole image LOB to Rails (via a socket), then Rails has to send it to the server - and that for every model being loaded! Even with my limited computer science background I know that IPC _is expensive_ and costs time.

Besides, every image you load is going to have to come through Rails. Which, in turn, means that you will have to cache your images to make them available in a static fashion - which, in turn, means that you will be storing every image twice. And Rails is really not the fastest software out there (if you want its power, you'd better use it for something other than what your web server already does best).

By contrast, the only thing you have to worry about when using the filesystem is deleting that 25.jpg when its record goes away. That way you can truly embrace Rails page caching as well (since not only the HTML but also the images have to bypass your application to be served quickly). Come on, don't waste resources - I mean, what is all this AR power for, then?

I indeed only see 2 meaningful situations where it is advisable to store images as LOBs:

1. You want to mirror your DB or cluster it and you want stuff to be really in sync.
2. You got sick of versioning client files on the server as opposed to your own code and stylesheets.
3. You are too lazy to use callbacks.
4. We don't have a built-in file_column in Rails which would handle file replacement and such (this one is long overdue).

My 2c.

-- Julian "Julik" Tarkhanov
Julian 'Julik' Tarkhanov wrote:
> And then:
> 1. know that Rails loads EVERY image into memory of your app every time
> anybody sees the name of the model object on a page (in a list perhaps).

That's why you use a second table for the actual blob data.

> Remember that Rails has no lazy loading.

Yes it does.

> Which automatically means +1 model for your app (which will control
> images and just images).

Can't see that as a bad thing, somehow - you get to add domain-specific actions to your image handling, like thumbnailing, comment-handling, and so on.

> 2. Increase the time to make your database dumps on order of magnitude
> (as well as the time to load them) - your 1 rsync is going to take at
> least the same amount of time as the 2 you had before.

Yeah, but it's less complex to handle (and update) one process than two.

> 3. For every image view, make the request pass 3-4 layers of software
> instead of just 1. Compare:
> client -> server -> filesystem # for file storage
> client -> server -> FCGI socket -> Rails -> DB -> filesystem

Which is why you cache the file on the filesystem on the first access, or on update.

> Note that the second one is a loopback - DB has to find the image and
> send the whole image LOB to Rails (via a socket), then Rails has to
> send it to the server - and that for every model being loaded!
> [...] I know that IPC _is expensive_ and costs time.

Combine caching with a two-table image model and that just about goes away.

> Besides, every image you load is going to have to come through Rails.

...once...

> Which, in turn, means that you will have to cache your images to make
> them available in a static fashion - which, in turn, means that you
> will be storing every image twice.

True, but you don't have to care about the second store.

> I indeed only see 2 meaningful situations where it is advisable to
> store images as LOBs:
> 1. You want to mirror your DB or cluster it and you want stuff to be
> really in sync.

Extremely useful.

> 2. You got sick of versioning client files on the server as opposed to
> your own code and stylesheets

Depends on the client files. In something like a CMS, you certainly *don't* want the client DB updates to affect the code versioning.

I don't think anyone's suggesting that *all* site images should go into the db, but for those that change with site content, it's a damn good idea.

-- Alex
On 4 Oct 2005, at 11:49, Alex Young wrote:
> [...]
> I don't think anyone's suggesting that *all* site images should go into
> the db, but for those that change with site content, it's a damn
> good idea.

Odd. We both have our arguments :-) I agree that storing images in the DB is _easier_ and much more _convenient_, but it incurs too much overhead (and I fear overhead all the more because Rails is not the fastest web framework out there).

Let's agree that if maintaining convenience (or a single storage point) is more important than the speed of the app, you win :-) I think it's fair to say that if you use the DB you get convenience over speed, and the opposite if you use the filesystem.

Personally, I think the tools AR gives us let us just hit Page.destroy_all and be sure that all images are going to be unlinked automatically. But this one is my personal opinion of course :-)

-- Julian "Julik" Tarkhanov
On Oct 4, 2005, at 2:49 AM, Alex Young wrote:
> Julian 'Julik' Tarkhanov wrote:
>> Remember that Rails has no lazy loading.
> Yes it does.

ActiveRecord lazy-loads associations, not attributes. This does pose a problem for tables with expensive LOB fields.

Regards,
jeremy
Jeremy Kemper wrote:
> On Oct 4, 2005, at 2:49 AM, Alex Young wrote:
>> Julian 'Julik' Tarkhanov wrote:
>>> Remember that Rails has no lazy loading.
>> Yes it does.
>
> ActiveRecord lazy-loads associations, not attributes.
> This does pose a problem for tables with expensive LOB fields.

True - but, in context, perhaps I should have been clearer that I was referring to the lazy loading from the second LOB table that would be possible with that arrangement...

-- Alex
I can't believe that no one's mentioned file_column as the easiest way to handle images in Rails.
I'm glad you guys are talking about storing binary resources (in this case images) in the database. Most likely, if you are considering storing binary resources in a database, they are resources that will be created and added to the system dynamically over time.

I've thought about it a bit and would prefer to store all user/admin created data in a database, even binary data. Of course there are repercussions to this, as mentioned before:

1. Database dumps
2. Table sizes - 2GB max? That is roughly 40,000 images at 50KB each - would need a numbering system to locate the image in a table in a database. Then what about fragmentation if you delete an image?

Depending on the requests a website receives, normally a few application/web servers need to serve the file requests with one database. Requesting the images from the database on every request is of course not an option. If you have 20,000 images that are used 99% of the time, caching all of them in memory on your application/web server shouldn't be a problem if you have 2GB of RAM on your servers.

Now what about the resources needed to process the 1 request in 100 that cannot hit the cache? We cannot be using precious database/network resources for these requests. I think there should be another level of cache, on the file system, for the least frequently used resources. So, basically, on the first request, if the web/application server does not have the image in RAM, it will check the file system, and if it is not found there either, ask the database.

By combining these 2 methods you get all your data centralized and managed in a database. Plus you will get the performance benefits transparently. Of course then you will have a copy of each image on every server. If you want to take this conversation further, this has the same issue as database clustering (replicate all data to each node or have a SAN).

This was just an idea I was toying with when confronted with the "storing images" issue. There are more issues to flesh out with a system like this, like comparing the database timestamp and the file's timestamp to make sure the web server has the latest files. But the basic idea is there.
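The lookup order described in this message (RAM, then file system, then database), together with the timestamp comparison, could look roughly like the sketch below. Everything in it is hypothetical: `TieredImageStore`, the `db` hash standing in for the images table, and the `updated_at` field are assumptions made for illustration, not part of Rails.

```ruby
require 'tmpdir'

# Read-through lookup: memory first, then a file cache, then the database.
# `db` is a Hash standing in for the images table:
#   id => { data: String, updated_at: Time }
class TieredImageStore
  attr_reader :db_hits

  def initialize(db, cache_dir)
    @db = db
    @cache_dir = cache_dir
    @memory = {}    # id => image bytes
    @db_hits = 0    # counts blob reads only, not timestamp checks
  end

  def fetch(id)
    return @memory[id] if @memory.key?(id)

    path = path_for(id)
    data =
      if File.exist?(path) && fresh?(id, path)
        File.binread(path)
      else
        load_from_db(id, path)
      end
    @memory[id] = data
  end

  private

  def path_for(id)
    File.join(@cache_dir, "#{id}.bin")
  end

  # A cached file is stale if the row changed after the file was written.
  # This still asks the database for the (small) timestamp.
  def fresh?(id, path)
    File.mtime(path) >= @db.fetch(id)[:updated_at]
  end

  def load_from_db(id, path)
    @db_hits += 1
    data = @db.fetch(id)[:data]
    File.binwrite(path, data)
    data
  end
end

Dir.mktmpdir do |dir|
  db = { 1 => { data: 'image bytes', updated_at: Time.now - 60 } }
  store = TieredImageStore.new(db, dir)
  store.fetch(1)   # miss everywhere: reads the database, writes the file
  store.fetch(1)   # served from memory
end
```

Note that the memory tier in this sketch is never invalidated, so a long-running process would also need an expiry rule for it; the timestamp check only protects the file tier.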
On Oct 3, 2005, at 11:26 AM, Caio Chassot wrote:
> I suppose you also have an image controller and image caching. Any
> specific advice in this area? I'm wondering how exactly you handle
> the content-type / extension issue. I have some ideas, but I'd much
> rather hear from someone who's been there before.
> [...]

Ok, so I am looking for more of the nuts and bolts about how exactly this is done: are your images stored as base64-encoded strings or as raw data, and either way, how do you correctly insert/modify/update the data? Currently, I am storing the image data as a base64-encoded string, but am running into problems getting it to display correctly in the browser. Also, how/where do you do the caching?

A quick howto or at least a better description of the guts of how all this is done would be very much appreciated.

Thanks!

-- Kimball
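On the base64 question: one possible cause of a broken image (a guess, not a diagnosis of the setup above) is that the base64 text itself reaches the browser instead of the decoded bytes, or that the content type does not match the data. A minimal sketch in plain Ruby, using `pack`/`unpack1` from core Ruby; `sniff_content_type`, `encode_for_text_column` and `decode_from_text_column` are hypothetical helper names.

```ruby
# Guess a MIME type from the leading "magic" bytes of the image data.
# Only three formats are covered; anything else gets a generic type.
def sniff_content_type(bytes)
  head = bytes.b  # compare as raw bytes, whatever the string's encoding
  if head.start_with?("\x89PNG\r\n\x1A\n".b)
    'image/png'
  elsif head.start_with?("\xFF\xD8\xFF".b)
    'image/jpeg'
  elsif head.start_with?('GIF87a'.b, 'GIF89a'.b)
    'image/gif'
  else
    'application/octet-stream'
  end
end

# Encode for storage in a text column (strict base64, no line breaks).
def encode_for_text_column(bytes)
  [bytes].pack('m0')
end

# Decode again before the bytes are sent to the client.
def decode_from_text_column(text)
  text.unpack1('m')
end

raw    = "\x89PNG\r\n\x1A\n".b + 'rest of the file'.b
stored = encode_for_text_column(raw)      # safe for a text column
served = decode_from_text_column(stored)  # identical to raw again

sniff_content_type(served)  # recognised as a PNG
sniff_content_type(stored)  # the base64 text is not an image
```

In a Rails controller the decoded bytes would typically go out through `send_data`, with `:type` set to the content type and `:disposition => 'inline'`. Storing the raw bytes in a binary column avoids the encode/decode step altogether, at the cost of needing a column type and driver that handle binary data correctly.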
On Oct 04, 2005, at 14:32, Joe Toth wrote:
> storing binary resources in a database they are resources that will be
> created and added to the system dynamically over time.

Yes.

> So, basically on the first request if the web/application
> server does not have the image in RAM, it will check the file system,
> if not found either, ask the database.

You cache everything to the file system using Rails' caches_page, and let the webserver cache the most popular requests to memory if it's so inclined.

> There are more issues to flush out with a system like this, like
> comparing database timestamp and file's timestamp to make sure the web
> server has the latest files. But, the basic idea is there.

You flush the filesystem cache when the db entry is updated, via one of the callback hooks in ActiveRecord (or with sweepers, which is basically the same thing without the boring part).
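Flushing the file cache when the record changes can be sketched in plain Ruby as below. `CachedImage` and its methods are hypothetical stand-ins for an ActiveRecord model with `after_save` / `after_destroy` callbacks (or a sweeper); the only point is that every write removes the cached file, so the next request regenerates it.

```ruby
require 'tmpdir'
require 'fileutils'

# Stand-in for a model whose bytes are also cached as a static file.
class CachedImage
  attr_reader :id, :data

  def initialize(id, data, cache_dir)
    @id = id
    @data = data
    @cache_dir = cache_dir
  end

  def cache_path
    File.join(@cache_dir, "#{id}.jpg")
  end

  # What the image controller would do on a cache miss.
  def write_cache
    File.binwrite(cache_path, data)
  end

  # Plays the role of an update followed by an after_save callback.
  def update(new_data)
    @data = new_data
    expire_cache
  end

  # Plays the role of a destroy followed by an after_destroy callback.
  def destroy
    expire_cache
  end

  private

  # rm_f does not raise if the file is already gone.
  def expire_cache
    FileUtils.rm_f(cache_path)
  end
end

Dir.mktmpdir do |dir|
  image = CachedImage.new(25, 'old bytes', dir)
  image.write_cache               # 25.jpg now exists in the cache dir
  image.update('new bytes')       # the stale 25.jpg is removed here
  File.exist?(image.cache_path)   # false until the next cache miss
end
```

Deleting on write rather than rewriting keeps the model ignorant of how the file is produced; the web server simply falls through to the application the next time the image is requested.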
On Oct 04, 2005, at 13:13, Jeremy Kemper wrote:
> ActiveRecord lazy-loads associations, not attributes.
> This does pose a problem for tables with expensive LOB fields.

You can use a custom finder that does not select the blob field, and a custom field accessor that lazily loads it when first called.

Should Rails implement lazy loading of expensive fields? I mean, is that a desirable feature, and what are the pros and cons of having it built in?
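A custom finder plus a lazy accessor might look like the sketch below. It is plain Ruby rather than ActiveRecord: `ROWS` stands in for the table, `Photo.find_light` for a finder that leaves out the blob column, and `Photo#data` for the accessor that fetches it on first use. All of these names are made up for illustration.

```ruby
# ROWS stands in for a table with one expensive column (:data).
ROWS = {
  1 => { title: 'Sunset', data: 'large blob 1' },
  2 => { title: 'Harbor', data: 'large blob 2' }
}

class Photo
  LIGHT_COLUMNS = [:title]

  attr_reader :id, :title

  # Class-level counter so the number of blob fetches can be observed.
  class << self
    attr_accessor :blob_queries
  end
  self.blob_queries = 0

  def initialize(id, attrs)
    @id = id
    @title = attrs[:title]
  end

  # Like "SELECT id, title FROM photos": the blob column is never read.
  def self.find_light
    ROWS.map { |id, row| new(id, row.slice(*LIGHT_COLUMNS)) }
  end

  # Like "SELECT data FROM photos WHERE id = ?", run once per object.
  def data
    return @data if defined?(@data)
    self.class.blob_queries += 1
    @data = ROWS.fetch(id)[:data]
  end
end

photos = Photo.find_light
photos.map(&:title)   # no blob has been fetched yet
photos.first.data     # the blob for this one record is fetched now
```

The trade-off is one extra query per record whose blob is actually used, which is cheap for a detail page and expensive if a loop touches `data` on every record in a list.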
On Oct 04, 2005, at 14:15, Kyle Maxwell wrote:
> I can't believe that no one's mentioned file_column as the easiest way
> to handle images in Rails.

What's that?
On Oct 07, 2005, at 14:04, Caio Chassot wrote:
> Should rails implement lazy loading of expensive fields? I mean, is
> that desirable feature, what are the pros and cons of having it
> builtin?

I see Rails trunk implements a :select option for AR::Base.find, so you can override the default value of "*" (in "SELECT * FROM...").

The documentation says it's useful for joins, but with a little tweaking it looks like it can be used to implement lazy loading when desirable.

It seems, though, that in the case of lazy loading a more useful approach would be:

    Model.find :all, :select_except => :my_blob_field

Which would be equivalent to:

    Model.find :all, :select => (Model.column_names - ['my_blob_field']).join(", ")
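The `:select_except` option is only a proposal in this message, not something Rails is known to provide. The column arithmetic behind it is easy to try in plain Ruby, though. `select_list_except` is a hypothetical helper and the column names are invented:

```ruby
# Build a SELECT column list that leaves out the named columns.
# Accepts strings or symbols for the excluded names.
def select_list_except(column_names, *excluded)
  (column_names - excluded.flatten.map(&:to_s)).join(', ')
end

columns = %w[id title content_type my_blob_field]
select_list_except(columns, :my_blob_field)
# => "id, title, content_type"
```

The resulting string is what would be handed to the `:select` option, with `Model.column_names` supplying the first argument.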