search for: crawling

Displaying 20 results from an estimated 638 matches for "crawling".

2018 Jan 08
0
different names for bricks
I just noticed that gluster volume info foo and gluster volume heal foo statistics use different indices for brick numbers. Info uses 1-based but heal statistics uses 0-based.
gluster volume info clifford
Volume Name: clifford
Type: Distributed-Replicate
Volume ID: 0e33ff98-53e8-40cf-bdb0-3e18406a945a
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1:
2007 Sep 18
4
basic rdig setup
I'm developing locally on Windows and I have a remote dev box that runs Linux. I'm trying to use RDig just to index using urls, no files. Both use acts_as_ferret for an administrative search that works fine. On the Windows machine, I get no errors, but get no results. On the Linux machine, I get: File Not Found Error occured at <except.c>:93 in xraise Error occured in
2008 Mar 25
0
Questions about backgroundrb
...web crawler runs once, it is then scheduled to run periodically on a daily basis, saving information to the db but not generating any xml or json to send back to the view. The questions I have: Is the following the best/most scalable way to implement?...Each site I am crawling gets its own worker -- e.g., MyspaceWorker. Within each worker, I have a crawl method that uses concurrency to avoid latency when crawling one set of several web pages within one website. If 2 users decide to track two different sets of pages from a given website, then I declare t...
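The pattern that post describes (one worker per site, with concurrent page fetches inside the worker so network latency overlaps) is independent of backgroundrb. As a rough sketch only, the inner step might look like the following Python; the URLs, function names, and thread count are illustrative assumptions, not taken from the thread.

# Minimal sketch (Python 3) of "one crawl per site, concurrent page fetches inside it".
# The URLs and worker count below are placeholders, not from the original post.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    # Download one page; the work is I/O-bound, so threads overlap the latency.
    with urlopen(url, timeout=10) as resp:
        return url, resp.read()

def crawl_site(page_urls, max_workers=8):
    # Fetch every tracked page of a single site concurrently; return url -> body.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, page_urls))

if __name__ == "__main__":
    pages = ["http://example.com/a", "http://example.com/b"]  # hypothetical tracked pages
    bodies = crawl_site(pages)
    print({url: len(body) for url, body in bodies.items()})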
2017 Sep 29
1
Gluster geo replication volume is faulty
I am trying to set up geo-replication between two gluster volumes. I have set up two replica 2 arbiter 1 volumes with 9 bricks.
[root@gfs1 ~]# gluster volume info
Volume Name: gfsvol
Type: Distributed-Replicate
Volume ID: c2fb4365-480b-4d37-8c7d-c3046bca7306
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: gfs2:/gfs/brick1/gv0
Brick2:
2018 Nov 15
3
Mail slowed down to a crawl...
Been moving along just fine for a couple years now, then in the last two days, email has slowed to a crawl. Retrieving email via IMAP is very slow, progress bar in the mail client shows that it is downloading messages almost constantly, like it never closes the connection, or it's getting the messages very slowly. The server that dovecot/postfix is on has also bogged down to a crawl. The
2009 May 27
1
R package installation (PR#13726)
Full_Name: Lukasz Andrzej Bartnik
Version: 2.8.1
OS: RHELS 5.2
Submission from: (NULL) (194.181.94.250)
Compile R for 32 bit on a 64 bit machine:
unset LD_LIBRARY_PATH
unset R_LD_LIBRARY_PATH
export CC="gcc -m32"
export CXXFLAGS="-m32 -O2 -g"
export FFLAGS="-m32 -O2 -g"
export FCFLAGS="-m32 -O2 -g"
export OBJCFLAGS="-m32 -O2 -g"
export LIBnn=lib
2006 Oct 23
3
Design Dilemma - Please Help
Hi, I'm new. ;-) I'm creating a little rails app that will crawl the web on a regular basis and then show the results. The crawling will be scheduled, likely a cron job. I can't wrap my head around where to put my crawler. It doesn't seem to fit. An example:
Model - News Story
Controllers - Grabs a story from the DB, Sort the Stories, Search the Stories etc.
View - HTML News Story, RSS Story etc.
Then a I...
2006 Mar 29
1
htdig with omega for multiple URLs (websites)
...Htdig looks better than my original idea - wget, you were right. Using htdig, I can crawl and search a single website - but I need to integrate search of pages spread over 100+ sites. Learning, learning.... Htdig uses a separate document database for every website (one database per URL to initiate crawling). Htdig can also merge result databases to allow searching of the integrated results. If you still have the script around that you said you wrote to use htdig as a crawler front-end for omega, I would be really interested to see it. My htdig crawls a single site. I need to learn how to crawl multiple sites an...
2018 Feb 07
2
add geo-replication "passive" node after node replacement
Hi all, I had a replica 2 gluster 3.12 between S1 and S2 (1 brick per node) geo-replicated to S5, where both S1 and S2 were visible in the geo-replication status and S2 was "active" while S1 was "passive". I had to replace S1 with S3, so I did an "add-brick replica 3 S3" and then a "remove-brick replica 2 S1". Now I again have a replica 2 gluster between S3 and S2
2017 Oct 06
0
Gluster geo replication volume is faulty
On 09/29/2017 09:30 PM, rick sanchez wrote:
> I am trying to set up geo-replication between two gluster volumes
>
> I have set up two replica 2 arbiter 1 volumes with 9 bricks
>
> [root@gfs1 ~]# gluster volume info
> Volume Name: gfsvol
> Type: Distributed-Replicate
> Volume ID: c2fb4365-480b-4d37-8c7d-c3046bca7306
> Status: Started
> Snapshot Count: 0
> Number
2018 Feb 21
2
Geo replication snapshot error
Hi all, I use gluster 3.12 on CentOS 7. I am writing a snapshot program for my geo-replicated cluster. When I started running tests with my application, I found very strange behavior regarding geo-replication in gluster. I have set up my geo-replication according to the docs: http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ Both master and slave clusters are
2007 Feb 08
1
Getting custom field data from the page through crawling
Now on to my next question. I've got the search and indexing working well for now. My next quest is to implement a system of creating custom fields in the index. Our site is fully dynamic. That is, every page is generated in PHP and there are enough different kinds of pages that I wouldn't want to get into the business of indexing the DB directly, so I think that using htdig to crawl
2019 Aug 04
2
browsers slowing Centos 7 installation to a crawl
elinks does not seem to be working for me. I typed in google.com as my first URL. There seems to be no way out of google, nor any way further in. No place to type a URL. What appears to be the search window is black and does not accept input. Oops. Now I seem to have clicked on google help or something. There seems to be no way to back up. Ok. Found the left arrow. Still no way to search or to get out
2018 Nov 15
0
Mail slowed down to a crawl...
On 15.11.2018 21:00, StarionTech (IMAP) wrote:
> Been moving along just fine for a couple years now, then in the last two days, email has slowed to a crawl.
>
> Retrieving email via IMAP is very slow, progress bar in the mail client shows that it is downloading messages almost constantly, like it never closes the connection, or it's getting the messages very slowly. The server that
2000 Jul 28
0
M$ Photodraw 2000 brings box to a crawl
LTho wrote:
> One of my users is using Microsoft Photodraw 2000 ver 2.0.0.0915. She can save to the samba server 3, maybe 4 times, after that the box slows to a crawl; everybody starts getting semaphore timeouts on their boxes, and linux activities slow to a crawl.
On Linux, I'd connect from a test client, find the right smbd process (using smbstatus and ps) and
2006 Jan 19
1
IMQ slows computer to a crawl
I am attempting to implement IMQ on a 2.4.31 version kernel with iptables 1.3.3. I am following the example at http://www.linuximq.net/usage.html. When I enter the line iptables -t mangle -A POSTROUTING -o eth1 -j IMQ --todev1 (eth1 is the external interface), the computer slows to a crawl. OK, the CPU is only an AMD K6 233 which is not the world's greatest CPU, but egress shaping is
2018 Nov 16
1
Mail slowed down to a crawl...
Yes, I thought all of those things too. Spent a good part of yesterday eliminating them as the cause. top and iotop do not show any issues. No spam filtering on this box (it's a separate server/software). Drive is 1/3 full. No SMART errors. The queue problem (and I don't know why this isn't handled better) turned out to be a couple of corrupted messages in the queue. Had about 400 messages
2012 Nov 17
1
fast parallel crawling of file systems
Hi, I use a disk space inventory tool called TreeSizePro to scan filesystems on Windows and Linux boxes. On Linux systems I export these shares via samba to scan them. TreeSizePro is multi-threaded (32 crawlers) and I run it on Windows 7. I am scanning file systems that are local to the Linux servers and also NFS mounts that are re-exported via samba. If I scan a Windows 2008 server I can
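The approach TreeSizePro takes (many crawler threads walking the tree in parallel) can be approximated directly on the Linux side. Below is a small illustrative Python sketch of a multi-threaded scanner that sums file sizes per top-level directory; the root path and worker count are assumptions, not taken from the post.

# Rough sketch of a parallel file-system crawler summing sizes per subtree.
# The root path ("/srv/export") and 32 workers are illustrative assumptions.
import os
from concurrent.futures import ThreadPoolExecutor

def subtree_size(path):
    # Walk one subtree and return (path, total bytes); unreadable entries are skipped.
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda e: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return path, total

def scan(root, workers=32):
    # Crawl each top-level directory under root in its own thread, like 32 parallel crawlers.
    tops = [entry.path for entry in os.scandir(root) if entry.is_dir(follow_symlinks=False)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(subtree_size, tops))

if __name__ == "__main__":
    for path, size in sorted(scan("/srv/export").items()):  # hypothetical export path
        print(f"{size:>15}  {path}")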
2003 Apr 17
0
2.2.8 slows down to a crawl within a few days
We have been using samba for years now and are generally very happy. But since a recent upgrade to 2.2.8 we have seen samba slowing down periodically to a crawl every 2 or 3 days. BTW the start of these problems coincided with the upgrade to samba 2.2.8 and a partial cabling upgrade to 100 MBit, which also coincided with the introduction of the first Win XP Pro machines to our LAN. ;-( We have
2019 Aug 04
4
browsers slowing Centos 7 installation to a crawl
My video problems mentioned in a previous thread are gone, though I do not know why. Now my problem is that whenever I have a browser open and an internet connection, my CentOS 7 slows to a crawl. Chromium seems to be the least bad. Sometimes it slows to the point that I cannot even move the mouse. Even switching between virtual terminals takes a while sometimes. When I get there, top generally