Mike Laurence
2008-Apr-19 21:45 UTC
Thesaurus - use rails/database, or serve files via apache?
I need to serve thesaurus content via AJAX requests. I can think of several ways to do it, but performance will definitely be an issue: if there are thousands upon thousands of requests, I want to make sure it's as fast and efficient as possible.

So, what do you folks think is the optimal way to go about this? The obvious route is to use a controller that queries a database for the word and returns a simple list of synonyms, but I wonder if it would be faster to use some sort of caching. I'm pondering "exploding" the thesaurus data out into thousands of folders and subfolders and small text files, and serving it up via Apache:

/aar/aardvark

If Apache returns something, it would be the list of synonyms at /aar/aardvark. If there is no word there, or it has no synonyms, it could just return a 404, and the AJAX request would deal with the failure appropriately. These folders could be nested enough so that no folder had too many thousands of entries (because that could be a system bottleneck).

Any opinions?

Thanks for your input!

Mike Laurence
mikelaurence.com
--
Posted via http://www.ruby-forum.com/.

You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
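A minimal sketch of the "exploded" layout described above, in Ruby since the thread is about Rails. The helper names `synonym_path` and `explode_thesaurus`, the three-letter prefix depth, the one-synonym-per-line file format, and the restriction to plain a-z words are all assumptions for illustration; none of them is specified in the post.

```ruby
require "fileutils"

# Hypothetical helper: maps a word to its nested relative path,
# e.g. "aardvark" -> "aar/aardvark". The prefix depth of 3 mirrors the
# /aar/aardvark example; a real data set might need deeper nesting.
def synonym_path(word, prefix_length = 3)
  normalized = word.downcase.strip
  # Assumption: only plain a-z words. This also keeps "../" out of paths.
  unless normalized.match?(/\A[a-z]+\z/)
    raise ArgumentError, "unsupported word: #{word.inspect}"
  end
  File.join(normalized[0, prefix_length], normalized)
end

# Hypothetical helper: writes one small text file per word under root.
# Words with no synonyms get no file, so a static web server would
# answer 404 for them, as the post suggests.
def explode_thesaurus(entries, root)
  entries.each do |word, synonyms|
    next if synonyms.empty?
    path = File.join(root, synonym_path(word))
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, synonyms.join("\n"))
  end
end

puts synonym_path("aardvark") # prints aar/aardvark
```

With the web server's document root pointed at the output directory, a request for /aar/aardvark would then be answered from a static file without touching Rails at all.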
Marc Byrd
2008-Apr-19 21:53 UTC
Re: Thesaurus - use rails/database, or serve files via apache?
Memcached would be great for this. You could even simply store the synonym list for every possible word, which is of course very inefficient from a storage point of view, but then again, so is all caching, by definition.

Marc
CloudCache.net
Mike Laurence
2008-Apr-19 22:02 UTC
Re: Thesaurus - use rails/database, or serve files via apach
Marc Byrd wrote:
> Memcached would be great for this.

That would be pretty speedy. One issue: the thesaurus data is about 12 MB per language, so if many languages are available, that could be hundreds of MB of RAM tied up. Not terrible, but not ideal.

Do you see any issues with the Apache model I mentioned above? I don't have much experience with Apache, so I'm unsure if there would be performance issues due to large numbers of folders/files in the paths.

Thanks!

Mike
--
Posted via http://www.ruby-forum.com/.
Marc Byrd
2008-Apr-19 22:19 UTC
Re: Thesaurus - use rails/database, or serve files via apach
Some sizing estimates:

Number of words in a good dictionary: 1M
Average length of word: 8 bytes
Average number of words in thesaurus for each word: 30
Size of memcached "exploded" thesaurus for each language: 256 MB
Cost of a 1.7 GB machine on EC2: $65/month.
Serving up thesaurus results fast enough for AJAX: priceless ;^)

Cheers,

Marc
CloudCache.net
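The post does not show how the 256 MB figure was derived. One way to land in that neighbourhood, sketched below, is to count the raw bytes of every key (the word) plus every value (its synonym list). The function name `exploded_size_bytes` is hypothetical, and the calculation is an assumption that ignores separators and memcached's own per-item overhead.

```ruby
# Inputs taken from the estimates above.
WORDS          = 1_000_000 # words in a good dictionary
AVG_WORD_BYTES = 8         # average word length in bytes
AVG_SYNONYMS   = 30        # average synonyms per word

# Hypothetical helper: raw payload size of a fully "exploded" thesaurus,
# one cache entry per word. Overhead is deliberately left out.
def exploded_size_bytes(words, avg_word_bytes, avg_synonyms)
  key_bytes   = words * avg_word_bytes                # the looked-up words
  value_bytes = words * avg_synonyms * avg_word_bytes # their synonym lists
  key_bytes + value_bytes
end

total = exploded_size_bytes(WORDS, AVG_WORD_BYTES, AVG_SYNONYMS)
puts total # prints 248000000
```

That is about 248 MB of raw data per language, close to the 256 MB quoted, so a handful of languages would fit in the 1.7 GB machine mentioned above only if per-item overhead stays modest.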
Mike,

My opinion:

1- Cache is way faster than the file system
2- Once it is cached, it doesn't matter if it comes from the file system or the database
3- Managing your thesaurus in the file system could become a big mess

So, I would definitely go for DB + memcached.

Cheers,
Sazima
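The "DB + memcached" suggestion is commonly implemented as a read-through (cache-aside) lookup: check the cache first, fall back to the database on a miss, then store the result. The sketch below illustrates only that control flow. `FakeCache` is a plain in-memory stand-in, not a real memcached client, and the `Thesaurus` class, the `synonyms:` key prefix, and the Hash used as the "database" are all hypothetical.

```ruby
# In-memory stand-in for a cache client. A real deployment would use a
# memcached client library here; this class only mimics get/set.
class FakeCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Hypothetical lookup service: cache first, database on a miss.
class Thesaurus
  attr_reader :db_hits

  # database is any object responding to fetch(word, default),
  # for example a Hash of word => synonym list.
  def initialize(database, cache)
    @database = database
    @cache = cache
    @db_hits = 0
  end

  def synonyms(word)
    key = "synonyms:#{word.downcase}"
    cached = @cache.get(key)
    return cached unless cached.nil?

    # Cache miss: query the slower store once, then remember the answer.
    # Empty lists are cached too, so unknown words do not hit it again.
    @db_hits += 1
    result = @database.fetch(word.downcase, [])
    @cache.set(key, result)
    result
  end
end

thesaurus = Thesaurus.new({ "fast" => ["quick", "speedy"] }, FakeCache.new)
thesaurus.synonyms("fast") # first call reads the database
thesaurus.synonyms("fast") # second call is served from the cache
puts thesaurus.db_hits     # prints 1
```

Because only the words that are actually requested end up cached, this variant would not need the full per-language data set in RAM up front, which speaks to the memory concern raised earlier in the thread.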