I've been meaning for months to summarize the patterns of use and abuse of the Rails wiki. Today I decided to do this by reviewing all the updates in a typical day - and yesterday (Friday 26th May 2006, GMT) looked a good enough candidate. Getting these figures has involved some hand counting, so they might be a little bit off.

There were 373 changes, affecting 57 pages. 34 pages had valid updates (86 changes), and 25 pages were spammed (248 changes) or despammed (39 changes). Two pages had valid as well as spam-related changes.

Useful editing activity centred on the RealWorldUsage page. This page started overflowing the wiki's 64K limit on 14th March 2006, and has suffered from truncation on most updates since then. Yesterday Larry Gilbert split the content between two new pages, RealWorldUsagePage1 and RealWorldUsagePage2, and began bringing lost content back from past versions. This accounted for 23 changes on 3 pages. The other useful activity averaged about 2 changes each on 31 pages.

The most heavily spammed page, addentry.php, had 174 updates (and passed version 10,000) yesterday. Nobody bothers to remove spam on that page - the page has never had valid content. The other 74 spammings were spread over 24 pages, and were countered by 39 rollbacks.

Some pages are more popular than others for spamming; the popular ones are spammed many times more often than they are rolled back, so they only show valid content for a small proportion of the time. Rolling these pages back is tedious, as it involves looking back through many versions for a valid one to roll back to (and the wiki is quite slow).

Useful pages suffering from heavy spamming include MySQL, ActiveRecordAssociations, DeadPages (which would be useful if anyone took any notice of it), Contributors, and OpenSourceProjects. Various individuals' pages are badly hit too, e.g. DimitrySabanin, PabloFlores and DanielVonFange.
IP addresses used by spammers are mostly faked, and the user names they use are often randomly generated.

I don't believe the present mechanisms for controlling updates to the wiki are adequate. It's possible that using a CAPTCHA would improve matters, but I'd rather see registration required for wiki editing. Cookies could be used to remember registered users.

As well as stopping bad content getting in, it is high time someone was able to strip the existing junk pages out. I sent Dan a list of about 300 junk pages (pages which had never held any useful content) back in February, and later added those to the DeadPages wiki page, but the pages are still there. (I've noticed occasionally since then that someone has, for the first time, put valid content into a page which has been around for months, so trawling for junk pages would have to be done again when someone is ready to start deleting.)

Deletion could be logical (like the Attic in CVS) - so long as it removes pages from normal view, including the All Pages and Recently Revised lists.

Once these things are sorted out it will be easier to focus on improving the wiki content.

Remember the bit about the Broken Window Theory in the Pragmatic Programmer?

"In the original experiment leading to the 'Broken Window Theory,' an abandoned car sat for a week untouched. But once a single window was broken, the car was stripped and turned upside down within hours."

The spam and junk pages on the wiki are a broken window in the Rails neighbourhood. Please fix it - I'd rather be contributing value than rolling back spam.

regards

Justin
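The logical deletion suggested above could look roughly like the sketch below. It is only an illustration in Python, not the Rails wiki's code: the `Page` and `Wiki` classes, the `deleted` flag and all method names are hypothetical. The idea is that a deleted page stays in storage (as with the CVS Attic) but is left out of the All Pages and Recently Revised lists.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    """Hypothetical page record; 'deleted' marks a logical deletion."""
    name: str
    revised_at: datetime
    deleted: bool = False


class Wiki:
    """Hypothetical in-memory wiki, used only to illustrate the idea."""

    def __init__(self):
        self._pages = {}

    def add(self, page):
        self._pages[page.name] = page

    def logically_delete(self, name):
        # The page and its history stay in storage; only the flag changes.
        self._pages[name].deleted = True

    def restore(self, name):
        self._pages[name].deleted = False

    def all_pages(self):
        # Equivalent of the "All Pages" list: deleted pages are hidden.
        return sorted(p.name for p in self._pages.values() if not p.deleted)

    def recently_revised(self, limit=10):
        # Equivalent of the "Recently Revised" list, newest first.
        visible = [p for p in self._pages.values() if not p.deleted]
        visible.sort(key=lambda p: p.revised_at, reverse=True)
        return [p.name for p in visible[:limit]]

    def attic(self):
        # Logically deleted pages, still recoverable.
        return sorted(p.name for p in self._pages.values() if p.deleted)


wiki = Wiki()
wiki.add(Page("MySQL", datetime(2006, 5, 26, 10, 0)))
wiki.add(Page("addentry.php", datetime(2006, 5, 26, 23, 0)))
wiki.logically_delete("addentry.php")
print(wiki.all_pages())
print(wiki.attic())
```

Keeping the page in storage means a mistaken deletion can be undone with `restore`, which matters given that junk pages occasionally receive valid content later.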
On May 27, 2006, at 9:05 AM, Justin Forder wrote:

> I've been meaning for months to summarize the patterns of use and
> abuse of the Rails wiki. Today I decided to do this by reviewing
> all the updates in a typical day - and yesterday (Friday 26th May
> 2006, GMT) looked a good enough candidate.
>
> <snip>
>
> window was broken, the car was stripped and turned upside down within
> hours."
>
> The spam and junk pages on the wiki are a broken window in the
> Rails neighbourhood. Please fix it - I'd rather be contributing
> value than rolling back spam.
>
> regards
>
> Justin
> _______________________________________________
> Rails-core mailing list
> Rails-core@lists.rubyonrails.org
> http://lists.rubyonrails.org/mailman/listinfo/rails-core

I agree that the wiki needs something to happen to make it stand up to the massive amount of spam it gets. I wonder if we should look at what Jim Weirich is working on for the rubygarden wiki? He has written a new wiki engine for it called ruse with a major focus on keeping it spam free. Details here: http://onestepback.org/demos/ruse.htm

Thoughts?

-Ezra
Ezra Zygmuntowicz wrote:

> I agree that the wiki needs something to happen to make it stand up
> to the massive amount of spam it gets.

It's actually quite simple: create a regex filter for words like "x anax", "l oans", "p harmacy", that will keep out 99% of the spam. The rest is easy to deal with.
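A regex filter of the kind described above might be sketched as follows. This is a minimal Python illustration, not the filter the Rails wiki actually uses: the three-word list comes from the message, while tolerating up to two separator characters between letters (so that spaced-out spellings also match) is an added assumption, and all names are hypothetical.

```python
import re

# Example words taken from the message; a real list would be longer.
BLOCKED_WORDS = ["xanax", "loans", "pharmacy"]

# Assumption: allow up to two non-alphanumeric characters between letters,
# so spellings such as "l oans" or "l-o-a-n-s" are caught as well.
SEPARATOR = r"[\W_]{0,2}"


def obfuscation_tolerant(word):
    """Build a pattern fragment matching the word with optional separators."""
    return SEPARATOR.join(re.escape(ch) for ch in word)


SPAM_PATTERN = re.compile(
    r"\b(?:" + "|".join(obfuscation_tolerant(w) for w in BLOCKED_WORDS) + r")\b",
    re.IGNORECASE,
)


def looks_like_spam(text):
    """Return True if the text contains any blocked word."""
    return SPAM_PATTERN.search(text) is not None


print(looks_like_spam("Cheap l oans available here"))
print(looks_like_spam("has_many loads associated rows lazily"))
```

A word list like this will also reject legitimate text that happens to use a blocked word, so it would probably need tuning against real page content.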
CAPTCHA is a very good idea - I would hazard a guess that it will eliminate almost all of the spam altogether. Ezra's idea is a good secondary measure.

On 5/28/06, Andreas Schwarz <usenet@andreas-s.net> wrote:

> Ezra Zygmuntowicz wrote:
> > I agree that the wiki needs something to happen to make it stand up
> > to the massive amount of spam it gets.
>
> It's actually quite simple: create a regex filter for words like "x
> anax", "l oans", "p harmacy", that will keep out 99% of the spam. The
> rest is easy to deal with.
Ezra Zygmuntowicz wrote:

> I agree that the wiki needs something to happen to make it stand up
> to the massive amount of spam it gets. I wonder if we should look at
> what Jim Weirich is working on for the rubygarden wiki? He has written a
> new wiki engine for it called ruse with a major focus on keeping it spam
> free. Details here:
>
> http://onestepback.org/demos/ruse.htm
>
> Thoughts?

Thanks, Ezra - that's very interesting. His screencast of what used to be involved in despamming the rubygarden wiki is similar to the process I and others go through with the Rails wiki.

I see that Ruse is now in live use for rubygarden. The ability to take anonymous edits that then require approval, and the way in which a new registered user goes through a probationary period before becoming trusted, are both good, and the tarpit is a brilliant idea.

There's a summary of the antispam features here: http://wikis.onestepback.org/Ruse/page/show/AntiSpamMeasures

The main wiki page for Ruse is here: http://wikis.onestepback.org/Ruse/page/show/RuseWiki

Most interestingly, it's a Rails application. I don't know yet if the markup is pluggable - I would hope so.

regards

Justin
Andreas Schwarz wrote:

> Ezra Zygmuntowicz wrote:
>> I agree that the wiki needs something to happen to make it stand up
>> to the massive amount of spam it gets.
>
> It's actually quite simple: create a regex filter for words like "x
> anax", "l oans", "p harmacy", that will keep out 99% of the spam. The
> rest is easy to deal with.

I saved a copy of all the spam I removed earlier today; many pages had the same dating-related links. A new page (liming) was full of links and Chinese text (a little of the text was English, containing items like "Water jet cutting machine" and "CNC Router"). One page had spam about debt consolidation, a couple just had brief messages of the "brilliant site" variety, and none (for a change) related to pharmaceuticals or porn.

Blocking specific words is only partly effective, and can interfere with legitimate content. There is some regex-based blocking on the Rails wiki, and it has been reasonably effective in blocking patterns used to hide links. On the other hand, I was forced to edit one page in order to roll it back because the wiki now blocks mail.ru email addresses, and the original page content had a legitimate one.

regards

Justin
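One way to keep a pattern filter from blocking rollbacks like the one described above is to check only the lines that have never appeared in the page's history. The Python sketch below illustrates that idea; it is not how the Rails wiki works. The `history` argument (a list of earlier revision texts), the function names and the example address are all hypothetical, and the mail.ru pattern stands in for whatever the wiki really blocks.

```python
import re

# Stand-in for the wiki's real rule: block mail.ru email addresses.
BLOCKED_PATTERN = re.compile(r"[\w.+-]+@mail\.ru\b", re.IGNORECASE)


def new_lines(new_text, history):
    """Return the lines of new_text that appear in no earlier revision."""
    known = set()
    for revision in history:
        known.update(revision.splitlines())
    return [line for line in new_text.splitlines() if line not in known]


def edit_allowed(new_text, history):
    """Apply the blocked pattern only to lines the page has never held."""
    return not any(
        BLOCKED_PATTERN.search(line) for line in new_lines(new_text, history)
    )


history = [
    "Contact: someone@mail.ru\nI use Rails at work.",  # legitimate revision
    "Great site! http://spam.example",                 # later spam revision
]
print(edit_allowed(history[0], history))               # rollback to old text
print(edit_allowed("Write to promo@mail.ru", history)) # new blocked content
```

The trade-off is that any line accepted once, including spam that slipped through before a rule existed, can be re-posted without being checked, so this would have to sit alongside other measures.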