Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?"
2010 Oct 14
1
[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?
On Wed, Oct 13, 2010 at 11:10 PM, Anton Korobeynikov <
anton at korobeynikov.info> wrote:
> > indexing the llvm.org svn archive. This means that when you search for an
> > LLVM-related symbol in code search, you get one of the many (possibly
> > out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> > sad.
> This is intentional. The
2010 Oct 14
0
[LLVMdev] llvm.org robots.txt prevents crawling by Google code search?
> indexing the llvm.org svn archive. This means that when you search for an
> LLVM-related symbol in code search, you get one of the many (possibly
> out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> sad.
This is intentional. The workload of the server was pretty huge w/o this.
--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics,
2009 Aug 28
4
favicon.ico and robots.txt
Hello,
I'm running an apache 2.2 webserver on centos 5.3. I'm seeing
frequent requests for robots.txt and favicon.ico from the logs those files
should be in the document root area. What are these files, is this something
the rpm installs, or do i have to retrieve or generate them?
Thanks.
Dave.
2012 Nov 17
1
fast parallel crawling of file systems
Hi, I use a disk space inventory tool called TreeSizePro to scan
filesystems on windows and linux boxes. On Linux systems I export
these shares via samba to scan them. TreeSizePro is multi-threaded (32
crawlers) and I run it on windows 7. I am scanning file systems that
are local to the linux servers and also nfs mounts that are
re-exported via samba.
If I scan a windows 2008 server I can
2006 Apr 16
4
Preventing crawlers on link_to's
My understanding was that using the :post=>true on a link_to() was supposed
to prevent search engine crawlers from triggering the link. However, this
does not seem to be working for me. Is there something else that I should
be/can be doing to accomplish this? Thanks.
-Matt
2010 Jan 16
3
httpd and robots.txt
would anyone out there care to share their robots.txt experience using
centos as a webserver and their robots.txt files?
i realize this is a somewhat simple exercise, yet i am sure there are both
large and small hosters out there and possibly those that have high traffic
modify their robots.txt files differently than others?
please share if you can or care to please?
for years we have just
2006 Feb 10
3
robots.txt best practices
I'd been ignoring this error message in my log for a while:
ActionController::RoutingError (Recognition failed for "/robots.txt"):
I had never touched robots.txt. So I decided to make it a proper robots.txt file
I found this great article...
http://www.ilovejackdaniels.com/seo/robots-txt-file/
...where Dave explains the ins and outs of the file.
Before I changed mine, I
2007 Jul 27
3
Is mechanize thread safe?
Hello all,
I was just wondering if anybody knew whether mechanize is supposed to
be thread-safe or not? I didn't really find any information about it
anywhere. I've been getting a strange error in protocol.rb when I run
a script that uses mechanize in a multi threaded fashion, but not with
a single thread.
I'm trying to write a spider that does multiple gets in
2009 May 12
4
Controlling outbound bandwidth utilization by port
Among other things, I run an http server on my home DSL line
(6M/768kbit). The content includes several large image
galleries, and when certain crawlers hit our server w/
multiple large image uploads, we end up with large
ping time delays - sufficient to disrupt the kids'
on-line gaming. Attempts to control this with robots.txt
have not been very successful; Solaris IPQoS appears quite
2009 Aug 06
2
robots.txt
Hi all,
I have again noticed that the wiki does not really show up in search
results and wonder if it has any impact that robots.txt on
wiki.centos.org is empty.
Perhaps it should at least contain User-agent: * ?
Best Regards
Marcus
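
A minimal sketch of the question raised in the message above, using Python's standard urllib.robotparser. It compares an empty robots.txt with one that contains "User-agent: *" and an empty Disallow line, and with one that blocks everything. The URL and the "ExampleBot" agent name are hypothetical, and real search engines may apply their own rules beyond what this parser models.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page, used only for illustration.
URL = "http://wiki.example.org/SomePage"

EMPTY = ""                                   # an empty robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # explicit "nothing is disallowed"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # everything is disallowed

if __name__ == "__main__":
    for name, text in [("empty", EMPTY), ("allow-all", ALLOW_ALL), ("block-all", BLOCK_ALL)]:
        print(name, allowed(text, URL))
```

Under this parser the empty file and the explicit allow-all file behave the same way, so an empty robots.txt on its own would not be expected to keep pages out of search results.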
2010 Oct 26
2
Opensource Websearch Engine Project
Hi,
I'm Pierre-Louis Dehapiot from Paris, France. I am studying computer programming at the ECE (a French school) and this year, the topic of my project is "google and indexing".
To summarize, it deals with creating my own google in only one year :p !
I saw that you made yourself an opensource websearch engine written in C (Xapian).
I already made the php/CSS interface for my own
2008 Sep 23
1
[LLVMdev] Web Server Problems Persist
Hi John,
> If you run into problems, please email llvmdev. I'll periodically check
> llvm.org to make sure it's still up.
I'm seeing long delays on llvm.org again. Pages are served eventually, but it
takes minutes for each request.
Are there any dynamic scripts on the server that can eat a lot of resources? I
think the nightly tester result pages would qualify? Perhaps
2011 Nov 11
1
When collected warnings exceeds 50
Hi,
I've been tracking down a memory leak in an rApache application,
http://data.vanderbilt.edu/rapache/bbplot. The code was deployed in
2007 and has survived numerous upgrades of both R and rApache
(including upgrades and bugs in RMySQL). It's written in such a way so
that web crawlers will download every possible URL the app will
create. It's not a high-traffic app, but just about
2011 Jan 07
5
Deployment issues
Hello,
For those of you who have solved the learning hurdle of rails deployment,
all I can say is congratulations! I'm struggling, frustrated that my app,
which runs so well on my linux box, generates such odd errors on my vps and
completely fails to do anything. For example, at the moment my
production.log file has the error:
ActionController::RoutingError (No route matches
2002 Aug 01
1
Outlook/Express Crawling with Domains
I'm gonna try to give as complete a description as I can. Maybe someone
can point me in the right direction, as I haven't seen anything exactly
like this.
I'm attempting to switch from Workgroup to Domain at work. On a couple
machines I did fresh installs of XP, used the registry patch, and
successfully got logons and roaming profiles working. However, when a user
attempts to open
2007 Feb 08
1
Getting custom field data from the page through crawling
Now on to my next question.. I've got the search and indexing working well for now..
My next quest is to implement a system of creating custom fields in the index. Our site
is fully dynamic. That is, every page is generated in PHP and there are enough
different kinds of pages that I wouldn't want to get into the business of indexing the
DB directly, so I think that using htdig to crawl
2001 Dec 30
1
WARNING: Your email is vulnerable to SPAM Robots!
Dear Email user,
Your email address samba@samba.org was harvested by a SPAM robot. It got
your address from the webpage
http://us1.samba.org/samba/docs/man/smb.conf.5.html
For more information about SPAM Robots and how you can protect yourself,
Click Here
http://www.email-cloak.com/default.asp?ID=396631&T=35429
Sincerely,
Champ Mitchell
Anti-Spam Services
http://www.email-cloak.com
2002 Aug 02
1
{samba digest, Vol 1 #1475 - 26 msgs} Outlook/Express Crawling with Domains
> Message: 7
> From: "Glover George" <dime@gulfsales.com>
> Also, as a side note, maybe someone might have the answer to this as
> well. Once I switched to a domain, the user can no longer open Windows
> Messenger in XP. Is this normal? Is there some setting I need to
> change? (i.e., is Messenger looking for an exchange server only when in
>
2008 Mar 25
0
Questions about backgroundrb
Cc'ing to the list for archival purposes:
On Tue, Mar 25, 2008 at 7:55 PM, Brian Noguchi <brian.noguchi at gmail.com> wrote:
> Hi Hemant,
>
> I'm Brian Noguchi, a developer in the Bay Area. I have some questions about
> backgroundrb, and I found your contact info on a forum. I figured it's
> probably best to get answers straight from the source.
>
>