thr3ads.net - similar to: "[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?"

Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?"

[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?

2010 Oct 14

On Wed, Oct 13, 2010 at 11:10 PM, Anton Korobeynikov <anton at korobeynikov.info> wrote:
> > indexing the llvm.org svn archive. This means that when you search for an LLVM-related symbol in code search, you get one of the many (possibly out-of-date) mirrors, rather than the up-to-date llvm.org version. This is sad.
> This is intentional. The

[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?

2010 Oct 14

> indexing the llvm.org svn archive. This means that when you search for an LLVM-related symbol in code search, you get one of the many (possibly out-of-date) mirrors, rather than the up-to-date llvm.org version. This is sad.
This is intentional. The workload of the server was pretty huge w/o this. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics,

favicon.ico and robots.txt

2009 Aug 28

Hello, I'm running an Apache 2.2 web server on CentOS 5.3. In the logs I'm seeing frequent requests for robots.txt and favicon.ico, files that should be in the document root area. What are these files? Is this something the RPM installs, or do I have to retrieve or generate them? Thanks. Dave.

fast parallel crawling of file systems

2012 Nov 17

Hi, I use a disk space inventory tool called TreeSizePro to scan filesystems on Windows and Linux boxes. On Linux systems I export these shares via Samba to scan them. TreeSizePro is multi-threaded (32 crawlers) and I run it on Windows 7. I am scanning file systems that are local to the Linux servers and also NFS mounts that are re-exported via Samba. If I scan a Windows 2008 server I can

Preventing crawlers on link_to's

2006 Apr 16

My understanding was that using the :post=>true on a link_to() was supposed to prevent search engine crawlers from triggering the link. However, this does not seem to be working for me. Is there something else that I should be/can be doing to accomplish this? Thanks. -Matt

httpd and robots.txt

2010 Jan 16

Would anyone out there care to share their robots.txt experience using CentOS as a web server, along with their robots.txt files? I realize this is a somewhat simple exercise, yet I am sure there are both large and small hosters out there, and possibly those that have high traffic modify their robots.txt files differently than others. Please share if you can or care to. For years we have just

robots.txt best practices

2006 Feb 10

I'd been ignoring this error message in my log for a while: ActionController::RoutingError (Recognition failed for "/robots.txt"): I had never touched robots.txt. So I decided to make it a proper robots.txt file. I found this great article... http://www.ilovejackdaniels.com/seo/robots-txt-file/ ...where Dave explains the ins and outs of the file. Before I changed mine, I

Is mechanize thread safe?

2007 Jul 27

Hello all, I was just wondering if anybody knew whether mechanize is supposed to be thread-safe or not? I didn't really find any information about it anywhere. I've been getting a strange error in protocol.rb when I run a script that uses mechanize in a multithreaded fashion, but not with a single thread. I'm trying to write a spider that does multiple gets in

Controlling outbound bandwidth utilization by port

2009 May 12

Among other things, I run an http server on my home DSL line (6M/768kbit). The content includes several large image galleries, and when certain crawlers hit our server w/ multiple large image uploads, we end up with large ping time delays - sufficient to disrupt the kids' on-line gaming. Attempts to control this with robots.txt have not been very successful; Solaris IPQoS appears quite

robots.txt

2009 Aug 06

Hi all, I have again noticed that the wiki does not really show up in search results and wonder if it has any impact that robots.txt on wiki.centos.org is empty. Perhaps it should at least contain User-agent: * ? Best Regards Marcus
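
The question in this snippet turns on how a crawler interprets an empty robots.txt. The following is a minimal sketch using Python's standard `urllib.robotparser`, not anything taken from the thread: the crawler name `ExampleBot`, the helper `allowed`, and all URLs and paths are hypothetical. Under this parser, an empty file and a file containing only `User-agent: *` both permit every URL, while a `Disallow` rule blocks the matching path prefix. Real search engines may apply their own rules, so treat this as an illustration of the convention only.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `agent` defaults to a hypothetical crawler name.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt contents, for illustration only.
EMPTY = ""
AGENT_ONLY = "User-agent: *\n"
BLOCK_SVN = "User-agent: *\nDisallow: /svn/\n"

if __name__ == "__main__":
    # An empty file places no restrictions on crawlers.
    print(allowed(EMPTY, "http://wiki.example.org/HowTos"))
    # A bare "User-agent: *" line with no rules also restricts nothing.
    print(allowed(AGENT_ONLY, "http://wiki.example.org/HowTos"))
    # A Disallow rule blocks URLs under the listed prefix only.
    print(allowed(BLOCK_SVN, "http://example.org/svn/trunk/README"))
    print(allowed(BLOCK_SVN, "http://example.org/docs/"))
```

If this sketch reflects how the major crawlers behave, adding `User-agent: *` alone to an empty file would not change what gets indexed.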

Opensource Websearch Engine Project

2010 Oct 26

Hi, I'm Pierre-Louis Dehapiot from Paris, France. I am studying computer programming at the ECE (a French school) and this year, the topic of my project is "Google and indexing". To summarize, it deals with creating my own Google in only one year :p ! I saw that you made yourself an open-source web search engine written in C (Xapian). I already made the PHP/CSS interface for my own

Opensource Websearch Engine Project

2010 Oct 26

[LLVMdev] Web Server Problems Persist

2008 Sep 23

Hi John,
> If you run into problems, please email llvmdev. I'll periodically check llvm.org to make sure it's still up.
I'm seeing long delays on llvm.org again. Pages are served eventually, but it takes minutes for each request. Are there any dynamic scripts on the server that can eat a lot of resources? I think the nightly tester result pages would qualify? Perhaps

When collected warnings exceeds 50

2011 Nov 11

Hi, I've been tracking down a memory leak in an rApache application, http://data.vanderbilt.edu/rapache/bbplot. The code was deployed in 2007 and has survived numerous upgrades of both R and rApache (including upgrades and bugs in RMySQL). It's written in such a way so that web crawlers will download every possible URL the app will create. It's not a high-traffic app, but just about

Deployment issues

2011 Jan 07

Hello, For those of you who have solved the learning hurdle of Rails deployment, all I can say is congratulations! I'm struggling, frustrated that my app, which runs so well on my Linux box, generates such odd errors on my VPS and completely fails to do anything. For example, at the moment my production.log file has the error: ActionController::RoutingError (No route matches

Outlook/Express Crawling with Domains

2002 Aug 01

I'm gonna try to give as complete a description as I can. Maybe someone can point me in the right direction, as I haven't seen anything exactly like this. I'm attempting to switch from Workgroup to Domain at work. On a couple of machines I did fresh installs of XP, used the registry patch, and successfully got logons and roaming profiles working. However, when a user attempts to open

Getting custom field data from the page through crawling

2007 Feb 08

Now on to my next question. I've got the search and indexing working well for now. My next quest is to implement a system of creating custom fields in the index. Our site is fully dynamic. That is, every page is generated in PHP and there are enough different kinds of pages that I wouldn't want to get into the business of indexing the DB directly, so I think that using htdig to crawl

WARNING: Your email is vulnerable to SPAM Robots!

2001 Dec 30

Dear Email user, Your email address samba@samba.org was harvested by a SPAM robot. It got your address from the webpage http://us1.samba.org/samba/docs/man/smb.conf.5.html For more information about SPAM Robots and how you can protect yourself, Click Here http://www.email-cloak.com/default.asp?ID=396631&T=35429 Sincerely, Champ Mitchell Anti-Spam Services http://www.email-cloak.com

{samba digest, Vol 1 #1475 - 26 msgs} Outlook/Express Crawling with Domains

2002 Aug 02

> Message: 7
> From: "Glover George" <dime@gulfsales.com>
> Also, as a side note, maybe someone might have the answer to this as well. Once I switched to a domain, the user can no longer open Windows Messenger in XP. Is this normal? Is there some setting I need to change? (i.e., is Messenger looking for an exchange server only when in

Questions about backgroundrb

2008 Mar 25

Cc'ing to the list for archival purposes:
On Tue, Mar 25, 2008 at 7:55 PM, Brian Noguchi <brian.noguchi at gmail.com> wrote:
> Hi Hemant,
>
> I'm Brian Noguchi, a developer in the Bay Area. I have some questions about backgroundrb, and I found your contact info on a forum. I figured it's probably best to get answers straight from the source.
