similar to: [LLVMdev] llvm.org robots.txt prevents crawling by Google code search?

Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?"

2010 Oct 14
1
[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?
On Wed, Oct 13, 2010 at 11:10 PM, Anton Korobeynikov <anton at korobeynikov.info> wrote: > > indexing the llvm.org svn archive. This means that when you search for an LLVM-related symbol in code search, you get one of the many (possibly out-of-date) mirrors, rather than the up-to-date llvm.org version. This is sad. > This is intentional. The
2010 Oct 14
0
[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?
> indexing the llvm.org svn archive. This means that when you search for an LLVM-related symbol in code search, you get one of the many (possibly out-of-date) mirrors, rather than the up-to-date llvm.org version. This is sad. This is intentional. The workload of the server was pretty huge w/o this. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics,
2009 Aug 28
4
favicon.ico and robots.txt
Hello, I'm running an Apache 2.2 webserver on CentOS 5.3. I'm seeing frequent requests for robots.txt and favicon.ico; from the logs, those files should be in the document root area. What are these files? Is this something the rpm installs, or do I have to retrieve or generate them? Thanks. Dave.
2012 Nov 17
1
fast parallel crawling of file systems
Hi, I use a disk space inventory tool called TreeSizePro to scan filesystems on Windows and Linux boxes. On Linux systems I export these shares via Samba to scan them. TreeSizePro is multi-threaded (32 crawlers) and I run it on Windows 7. I am scanning file systems that are local to the Linux servers and also NFS mounts that are re-exported via Samba. If I scan a windows 2008 server I can
2006 Apr 16
4
Preventing crawlers on link_to's
My understanding was that using the :post=>true on a link_to() was supposed to prevent search engine crawlers from triggering the link. However, this does not seem to be working for me. Is there something else that I should be/can be doing to accomplish this? Thanks. -Matt
2010 Jan 16
3
httpd and robots.txt
would anyone out there care to share their robots.txt experience using centos as a webserver and their robots.txt files? i realize this is a somewhat simple exercise, yet i am sure there are both large and small hosters out there and possibly those that have high traffic modify their robots.txt files differently than others? please share if you can or care to. for years we have just
2006 Feb 10
3
robots.txt best practices
I'd been ignoring this error message in my log for a while: ActionController::RoutingError (Recognition failed for "/robots.txt"): I had never touched robots.txt. So I decided to make it a proper robots.txt file. I found this great article... http://www.ilovejackdaniels.com/seo/robots-txt-file/ ...where Dave explains the ins and outs of the file. Before I changed mine, I
2007 Jul 27
3
Is mechanize thread safe?
Hello all, I was just wondering if anybody knew whether mechanize is supposed to be thread-safe or not? I didn't really find any information about it anywhere. I've been getting a strange error in protocol.rb when I run a script that uses mechanize in a multi-threaded fashion, but not with a single thread. I'm trying to write a spider that does multiple gets in
2009 May 12
4
Controlling outbound bandwidth utilization by port
Among other things, I run an http server on my home DSL line (6M/768kbit). The content includes several large image galleries, and when certain crawlers hit our server w/ multiple large image uploads, we end up with large ping time delays - sufficient to disrupt the kids' on-line gaming. Attempts to control this with robots.txt have not been very successful; Solaris IPQoS appears quite
2009 Aug 06
2
robots.txt
Hi all, I have again noticed that the wiki does not really show up in search results and wonder if it has any impact that robots.txt on wiki.centos.org is empty. Perhaps it should at least contain User-agent: * ? Best Regards Marcus
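On the question of whether an empty robots.txt matters: under the usual robots exclusion rules, an empty file places no restrictions on crawlers, so it should behave the same as "User-agent: *" followed by an empty "Disallow:" line. A minimal sketch that checks this with Python's standard urllib.robotparser module follows; the is_allowed helper and the wiki.example.org URL are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL, used only for illustration.
URL = "http://wiki.example.org/HowTos"

EMPTY = ""                                   # an empty robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:"       # explicit "allow everything"
BLOCK_ALL = "User-agent: *\nDisallow: /"     # block the whole site

for name, text in [("empty", EMPTY), ("allow all", ALLOW_ALL), ("block all", BLOCK_ALL)]:
    print(name, is_allowed(text, URL))
```

If the empty and explicit "allow everything" files both report the URL as fetchable, the empty robots.txt is unlikely to be the reason pages are missing from search results.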
2010 Oct 26
2
Opensource Websearch Engine Project
Hi, I'm Pierre-Louis Dehapiot from Paris, France. I am studying computing programming at the ECE (a french school) and this year, the topic of my project is "google and indexing". To summarize, it deals with creating my own google in only one year :p ! I saw that you made yourself an opensource websearch engine written in C (Xapian). I already made the php/CSS interface for my own
2008 Sep 23
1
[LLVMdev] Web Server Problems Persist
Hi John, > If you run into problems, please email llvmdev. I'll periodically check > llvm.org to make sure it's still up. I'm seeing long delays on llvm.org again. Pages are served eventually, but it takes minutes for each request. Are there any dynamic scripts on the server that can eat a lot of resources? I think the nightly tester result pages would qualify? Perhaps
2011 Nov 11
1
When collected warnings exceeds 50
Hi, I've been tracking down a memory leak in an rApache application, http://data.vanderbilt.edu/rapache/bbplot. The code was deployed in 2007 and has survived numerous upgrades of both R and rApache (including upgrades and bugs in RMySQL). It's written in such a way so that web crawlers will download every possible URL the app will create. It's not a high-traffic app, but just about
2011 Jan 07
5
Deployment issues
Hello, For those of you who have solved the learning hurdle of rails deployment, all I can say is congratulations! I'm struggling, frustrated that my app, which runs so well on my linux box, generates such odd errors on my vps and completely fails to do anything. For example, at the moment my production.log file has the error: ActionController::RoutingError (No route matches
2002 Aug 01
1
Outlook/Express Crawling with Domains
I'm gonna try to give as complete a description as I can. Maybe someone can point me in the right direction, as I haven't seen anything exactly like this. I'm attempting to switch from Workgroup to Domain at work. On a couple machines I did fresh installs of XP, used the registry patch, and successfully got logons, roaming profiles working. However, when a user attempts to open
2007 Feb 08
1
Getting custom field data from the page through crawling
Now on to my next question.. I've got the search and indexing working well for now.. My next quest is to implement a system of creating custom fields in the index. Our site is fully dynamic. That is, every page is generated in PHP and there are enough different kinds of pages that I wouldn't want to get into the business of indexing the DB directly, so I think that using htdig to crawl
2001 Dec 30
1
WARNING: Your email is vulnerable to SPAM Robots!
Dear Email user, Your email address samba@samba.org was harvested by a SPAM robot. It got your address from the webpage http://us1.samba.org/samba/docs/man/smb.conf.5.html For more information about SPAM Robots and how you can protect yourself, Click Here http://www.email-cloak.com/default.asp?ID=396631&T=35429 Sincerely, Champ Mitchell Anti-Spam Services http://www.email-cloak.com
2002 Aug 02
1
{samba digest, Vol 1 #1475 - 26 msgs} Outlook/Express Crawling with Domains
> Message: 7 > From: "Glover George" <dime@gulfsales.com> > Also, as a side note, maybe someone might have the answer to this as well. Once I switched to a domain, the user can no longer open Windows Messenger in XP. Is this normal? Is there some setting I need to change? (i.e., is Messenger looking for an exchange server only when in
2008 Mar 25
0
Questions about backgroundrb
Cc'ing to the list for archival purposes: On Tue, Mar 25, 2008 at 7:55 PM, Brian Noguchi <brian.noguchi at gmail.com> wrote: > Hi Hemant, > > I'm Brian Noguchi, a developer in the Bay Area. I have some questions about backgroundrb, and I found your contact info on a forum. I figured it's probably best to get answers straight from the source.