Talin
2010-Oct-13 21:25 UTC
[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?
One of the tools I use most frequently when coding is Google codesearch. Unfortunately, llvm.org's robots.txt appears to block all crawlers from indexing the llvm.org svn archive. This means that when you search for an LLVM-related symbol in code search, you get one of the many (possibly out-of-date) mirrors, rather than the up-to-date llvm.org version. This is sad.

For more info, see the codesearch FAQ entry (item 9): http://www.google.com/intl/en/help/faq_codesearch.html#regexp

-- Talin
Anton Korobeynikov
2010-Oct-14 06:10 UTC
> indexing the llvm.org svn archive. This means that when you search for an
> LLVM-related symbol in code search, you get one of the many (possibly
> out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> sad.

This is intentional. The workload of the server was pretty huge w/o this.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Nick Lewycky
2010-Oct-14 06:18 UTC
Anton Korobeynikov wrote:
>> indexing the llvm.org svn archive. This means that when you search for an
>> LLVM-related symbol in code search, you get one of the many (possibly
>> out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
>> sad.
> This is intentional. The workload of the server was pretty huge w/o this.

That was the old server though, wasn't it? Would we actually have any problems if we reenabled this?

Nick
Talin
2010-Oct-14 17:28 UTC
On Wed, Oct 13, 2010 at 11:10 PM, Anton Korobeynikov <anton at korobeynikov.info> wrote:
> > indexing the llvm.org svn archive. This means that when you search for an
> > LLVM-related symbol in code search, you get one of the many (possibly
> > out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> > sad.
> This is intentional. The workload of the server was pretty huge w/o this.

Could we at least add a rule allowing the codesearch crawler, rather than opening it up to all crawlers? The user agent string is SVN/1.5.4/GoogleCodeSearch.

-- Talin
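Talin's proposal amounts to a robots.txt with one group for the codesearch crawler and a catch-all group that keeps blocking everyone else. A minimal sketch of such a file, checked with Python's standard urllib.robotparser, is shown here. It is a hypothetical illustration, not llvm.org's actual robots.txt: the GoogleCodeSearch token the group is keyed on and the svn URL are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the first group lets a crawler matching the token
# "GoogleCodeSearch" fetch everything (an empty Disallow means "allow all");
# the "*" group keeps blocking every other crawler. The token is an
# assumption; the real one should come from Google's codesearch docs.
ROBOTS_TXT = """\
User-agent: GoogleCodeSearch
Disallow:

User-agent: *
Disallow: /
"""

# Illustrative URL in the svn archive (the exact path is an assumption).
SVN_URL = "http://llvm.org/svn/llvm-project/llvm/trunk/"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Compare the dedicated token, another crawler, and the full UA string
    # quoted in the thread.
    for agent in ("GoogleCodeSearch", "Googlebot", "SVN/1.5.4/GoogleCodeSearch"):
        print(f"{agent}: {rp.can_fetch(agent, SVN_URL)}")
```

One detail worth checking before relying on such a rule: urllib.robotparser compares only the part of the agent string before the first "/", so the full SVN/1.5.4/GoogleCodeSearch string is matched as "SVN" and falls through to the "*" group. Real crawlers apply their own matching rules, so the token to list is best taken from the codesearch FAQ rather than guessed.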