We have been using Instiki 0.10.1 to build our wiki site, wiki.perlchina.org. After it was spammed, we used a script to re-post the pages (exported as markup) to a new wiki site, and applied a captcha patch provided by James Qiang. Shortly afterwards, the Google search bot started requesting some resources, the site hit a fatal error and stopped. Here is the ticket I submitted, with the Instiki output pasted into it: http://dev.instiki.org/ticket/215

Many Perl programmers are waiting for the wiki site to recover for their daily work, so we hope you can look into it and get it working again. I am chunzi, and I will stay on irc://irc.freenode.net/ #perlchina; I hope you can join there so we can solve it together. Thanks.
chunzi wrote:
> We have been using Instiki 0.10.1 to build our wiki site, wiki.perlchina.org. [...]

Your Instiki ran out of memory because it had to remember too many page revisions (created by spambots). Recent downtimes of both rubyonrails.org and instiki.org were caused by the same thing.

As I understand it now, this problem (RAM abuse on wikis with a lot of page revisions, usually due to spam) comes from the fact that the Instiki rendering engine (app/models/chunks and friends) unnecessarily keeps a lot of data in the attributes of Revision objects that it doesn't really need. Namely, it remembers (and stores in Madeleine snapshots) all the chunks (fragments of wiki markup) that it sees while rendering a page.

So, let's say you have a humble page with 2 kb of text. With the cached results of HTML rendering and all the chunks, it can occupy, say, ~50 kb of RAM in the ObjectSpace. Multiply that by 200 revisions (thanks to the spam/restore cycle), and all of a sudden we are talking 10 Mb for just that one page.

SOLUTIONS

1. Survival for now

SVN trunk works around this problem to some extent by clearing all the cached data on startup (although that means 10-20 seconds of startup time for relatively large wikis). With it, your wiki should be able to start and stay up for at least some time. You will need a robots.txt file on your site to prevent web crawlers from loading old page revisions (doing so recreates their display cache, which eventually blows the wiki out of RAM). And, I'm afraid, you will have to restart it from time to time. You may also think about performing an export/import; this gets rid of all revision history and drastically reduces memory utilisation. Besides, some form of spam blocking (such as the captcha thingie on your backup site, or the content filter that I committed to Instiki SVN a couple of days ago) helps to prevent the explosion of page revisions.

2. Tactical fix

I need to make some changes in what a Revision remembers about its display cache. I cannot make it compatible with old storages (which is one big reason why I didn't do it until now). So you'll need to do an export/import to migrate the storage to this version, and that will delete the revision history. Old Madeleine storage compatibility is subtle and bug-ridden. Which brings me to:

3. Strategic solution

Instiki with an SQL backend. If you follow dev.instiki.org, you may notice that things are moving in that direction now. Madeleine will have to go. This would take care of a whole list of problems faced by large/public wikis, including the one discussed above.

Best regards,
Alex
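For reference, a robots.txt along the lines Alex suggests might look like the sketch below. The path prefixes are assumptions for a web published under /wiki/; check the revision and rollback URLs of your own installation and adjust accordingly.

    # Hypothetical robots.txt sketch: keep crawlers away from per-revision
    # pages, whose rendering re-creates the display caches described above.
    # Paths below assume a web served under /wiki/.
    User-agent: *
    Disallow: /wiki/revision/
    Disallow: /wiki/rollback/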
Alex Verhovsky wrote:
> Your Instiki ran out of memory because it had to remember too many page
> revisions (created by spambots). [...]

Well, thanks a lot. I changed wiki/rollback.rhtml line 2 from

    @title = "Rollback to #{@page.plain_name} Rev ##{@revision.number}"

to

    @title = "Rollback to #{@page.plain_name} Rev ##{@revision.number}" if @revision

and line 13 from

    <textarea name="content" style="font-size: 12px; width: 450px; height: 500px"><%= @revision.content %></textarea>

to

    <textarea name="content" style="font-size: 12px; width: 450px; height: 500px"><%= @revision.content if @revision %></textarea>

So far the problem seems to be gone (and of course, the captcha helps refuse the spam bots). We use Instiki because it is simple enough to let us concentrate on the business, and we hope Instiki stays that way. Thanks again.
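The guard works because the fatal error was presumably a NoMethodError: when a crawler requests a rollback URL for a revision that no longer exists, @revision is nil, and calling a method on nil raises. A minimal standalone Ruby illustration of the idiom (variable names are just for this sketch):

    # Sketch of the nil-guard idiom used in the rollback.rhtml fix above.
    revision = nil                           # e.g. a crawler asks for a purged revision
    # revision.content                       # would raise NoMethodError on nil
    content = revision.content if revision   # guarded: skipped when revision is nil
    puts content.inspect                     # => nil (assignment parsed, never executed)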
> SOLUTIONS
>
> 1. Survival for now
>
> SVN trunk works around this problem to some extent by clearing all the
> cached data on startup (although that means 10-20 seconds of startup
> time for relatively large wikis). [...]

If I understand you correctly, the more revisions there are, the bigger the memory problem is. We have restored our wiki site and all pages now have no revisions, so that should fix the memory problem at least for now, right?

Can you specify which file in SVN trunk is needed to achieve solution one? Is this the solution that rubyonrails.org is using?

Qiang
James.Q.L wrote:
> If I understand you correctly, the more revisions there are, the bigger
> the memory problem is. We have restored our wiki site and all pages now
> have no revisions, so that should fix the memory problem at least for
> now, right?

Yes.

> Can you specify which file in SVN trunk is needed to achieve solution one?
> Is this the solution that rubyonrails.org is using?

It's these two pieces from wiki_service.rb:

1. A method to clear all rendering caches on all revisions of all pages in the system:

    # One interesting property of Madeleine as a persistence mechanism is that it
    # saves (and restores) the whole ObjectSpace. And in there, storage from an
    # older version may contain who knows what in temporary variables, such as
    # caches of various kinds. The reason why it is nearly impossible to control
    # is that there may be bugs, people may use modified versions of things, etc
    # etc etc. Therefore, upon loading the storage from a file, it is a good idea
    # to clear all such variables. It would be better yet if Madeleine could be
    # somehow instructed not to save that data in a snapshot at all. Alas, such a
    # feature is not presently available.
    def clear_all_caches
      return if @system.webs.nil?
      @system.webs.each_value do |web|
        next if web.nil? or web.pages.nil?
        web.pages.each_value do |page|
          next if page.nil? or page.revisions.nil?
          page.revisions.each { |revision| revision.clear_display_cache }
        end
      end
    end

2. A place that invokes the above:

    def instance
      @madeleine ||= MadeleineServer.new(self)
      @system = @madeleine.system
      clear_all_caches
      return @system
    end

In the next release I will probably make clearing of the caches a command-line option, as it is quite useless to call it on every startup, but it takes quite a bit of time to re-render all revisions.

Alex
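A command-line switch of the kind Alex mentions could be wired up roughly as in the sketch below. The --clear-cache flag name and the startup wiring are assumptions for illustration, not actual Instiki code; only WikiService.instance and clear_all_caches appear in the thread above.

    # Hypothetical sketch: make cache clearing opt-in at startup, rather than
    # calling it unconditionally inside WikiService.instance.
    require 'optparse'

    clear_caches = false
    OptionParser.new do |opts|
      opts.banner = 'Usage: instiki [options]'
      opts.on('--clear-cache', 'Clear display caches of all revisions on startup') do
        clear_caches = true
      end
    end.parse!(ARGV)

    wiki = WikiService.instance                   # loads the Madeleine snapshot
    WikiService.clear_all_caches if clear_caches  # assumes the method is exposed on the class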