Displaying 20 results from an estimated 500 matches similar to: "Thousands of EPOLLERR - disconnecting now"
2018 Feb 08
0
Thousands of EPOLLERR - disconnecting now
On Thu, Feb 8, 2018 at 2:04 PM, Gino Lisignoli <glisignoli at gmail.com> wrote:
> Hello
>
> I have a large cluster in which every node is logging:
>
> I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR -
> disconnecting now
>
At a rate of around 4 or 5 per second per node, which is adding up to a
> lot of messages. This seems to happen while my
2018 Jan 07
1
Clear heal statistics
Is there any way to clear the historic statistics from the command "gluster
volume heal <volume_name> statistics"?
It seems the command takes longer and longer to run each time it is used,
to the point where it times out and no longer works.
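For reference, a minimal hedged sketch of lighter-weight alternatives to the full
statistics crawl (standard gluster CLI forms; <volume_name> is a placeholder):
# per-brick count of entries still pending heal, usually much faster
gluster volume heal <volume_name> statistics heal-count
# list the pending entries themselves
gluster volume heal <volume_name> info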
2017 Nov 21
1
Brick and Subvolume Info
Hello
I have a Distributed-Replicate volume and I would like to know if it is
possible to see what sub-volume a brick belongs to, eg:
A Distributed-Replicate volume containing:
Number of Bricks: 2 x 2 = 4
Brick1: node1.localdomain:/mnt/data1/brick1
Brick2: node2.localdomain:/mnt/data1/brick1
Brick3: node1.localdomain:/mnt/data2/brick2
Brick4: node2.localdomain:/mnt/data2/brick2
Is it possible
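A hedged illustration of how the grouping works (the replicate-N names follow
GlusterFS's usual convention; the volfile path is an assumption and varies by version):
# Bricks in `gluster volume info` are listed in replica-set order, so for a
# 2 x 2 volume consecutive pairs form one subvolume:
#   Brick1 + Brick2 -> <volume>-replicate-0
#   Brick3 + Brick4 -> <volume>-replicate-1
# The generated client volfile spells the grouping out explicitly:
grep -A4 'cluster/replicate' /var/lib/glusterd/vols/<volume>/trusted-<volume>.tcp-fuse.vol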
2017 Aug 21
1
Glusterd not working with systemd in redhat 7
Hi!
Please see below. Note that web1.dasilva.network is the address of the
local machine where one of the bricks is installed and that tries to mount.
[2017-08-20 20:30:40.359236] I [MSGID: 100030] [glusterfsd.c:2476:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.11.2
(args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2017-08-20 20:30:40.973249] I [MSGID: 106478]
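If it helps, a hedged sketch of the usual systemd-side checks (unit name as shipped
on RHEL/CentOS 7):
systemctl status glusterd
systemctl is-enabled glusterd
journalctl -u glusterd -b --no-pager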
2017 Aug 21
0
Glusterd not working with systemd in redhat 7
On Mon, Aug 21, 2017 at 2:49 AM, Cesar da Silva <thunderlight1 at gmail.com>
wrote:
> Hi!
> I am having the same issue but I am running Ubuntu v16.04.
> It does not mount during boot, but works if I mount it manually. I am
> running the Gluster-server on the same machines (3 machines)
> Here is the /etc/fstab file
>
> /dev/sdb1 /data/gluster ext4 defaults 0 0
>
>
2017 Aug 20
2
Glusterd not working with systemd in redhat 7
Hi!
I am having the same issue but I am running Ubuntu v16.04.
It does not mount during boot, but works if I mount it manually. I am
running the Gluster-server on the same machines (3 machines)
Here is the /etc/fstab file
/dev/sdb1 /data/gluster ext4 defaults 0 0
web1.dasilva.network:/www /mnt/glusterfs/www glusterfs
defaults,_netdev,log-level=debug,log-file=/var/log/gluster.log 0 0
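In case it is useful, a hedged sketch of an fstab entry that defers the gluster mount
until the network and glusterd are up; the x-systemd.* options come from
systemd.mount(5), not from this thread:
web1.dasilva.network:/www /mnt/glusterfs/www glusterfs defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=glusterd.service,log-level=debug,log-file=/var/log/gluster.log 0 0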
2017 Aug 06
1
[3.11.2] Bricks disconnect from gluster with 0-transport: EPOLLERR
Hi,
I have a distributed volume which runs on Fedora 26 systems with
glusterfs 3.11.2 from gluster.org repos:
----------
[root at taupo ~]# glusterd --version
glusterfs 3.11.2
gluster> volume info gv2
Volume Name: gv2
Type: Distribute
Volume ID: 6b468f43-3857-4506-917c-7eaaaef9b6ee
Status: Started
Snapshot Count: 0
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1:
2017 Sep 13
1
[3.11.2] Bricks disconnect from gluster with 0-transport: EPOLLERR
I ran into something like this in 3.10.4 and filed two bugs for it:
https://bugzilla.redhat.com/show_bug.cgi?id=1491059
https://bugzilla.redhat.com/show_bug.cgi?id=1491060
Please see the above bugs for full detail.
In summary, my issue was related to glusterd's pid handling of pid files
when it starts self-heal and bricks. The issues are:
a. brick pid file leaves stale pid and brick fails
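As a rough way to spot that symptom, a hedged sketch (the pid-file path and name are
assumptions; glusterd keeps brick pid files under /var/lib/glusterd/vols/<volume>/run/
or /var/run/gluster/ depending on version):
pidfile=/var/lib/glusterd/vols/<volume>/run/node1-mnt-data1-brick1.pid  # assumed name
pid=$(cat "$pidfile")
kill -0 "$pid" 2>/dev/null || echo "stale pid $pid in $pidfile - brick not running"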
2018 Mar 21
2
Brick process not starting after reinstall
Hi all,
our systems have suffered a host failure in a replica three setup.
The host needed a complete reinstall. I followed the RH guide to
'replace a host with the same hostname'
(https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-replacing_hosts).
The machine has the same OS (CentOS 7). The new machine got a minor
version number newer
2018 Mar 21
0
Brick process not starting after reinstall
Could you share the following information:
1. gluster --version
2. output of gluster volume status
3. glusterd log and all brick log files from the node where bricks didn't
come up.
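A minimal sketch of gathering those three items (log locations are the usual
/var/log/glusterfs defaults and may differ on your install):
gluster --version
gluster volume status
tar czf gluster-diagnostics.tar.gz /var/log/glusterfs/glusterd.log /var/log/glusterfs/bricks/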
On Wed, Mar 21, 2018 at 12:35 PM, Richard Neuboeck <hawk at tbi.univie.ac.at>
wrote:
> Hi all,
>
> our systems have suffered a host failure in a replica three setup.
> The host needed a
2018 Sep 07
3
Auth process sometimes stop responding after upgrade
On Friday, 7 September 2018 at 10:06:00 CEST, Sami Ketola wrote:
> > On 7 Sep 2018, at 11.00, Simone Lazzaris <s.lazzaris at interactive.eu>
> > wrote:
> >
> >
> > The only suspect thing is this:
> >
> > Sep 6 14:45:41 imap-front13 dovecot: director: doveadm: Host
> > 192.168.1.142
> > vhost count changed from 100 to 0
>
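If it is useful, a hedged sketch of checking the director's view and restoring the
vhost count from the log line above (syntax from doveadm-director(1)):
doveadm director status
doveadm director update 192.168.1.142 100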
2019 Mar 08
1
Dovecot v2.3.5 released
On 7.3.2019 23.37, A. Schulze via dovecot wrote:
>
> Am 07.03.19 um 17:33 schrieb Aki Tuomi via dovecot:
>
>>> test-http-client-errors.c:2989: Assert failed: FALSE
>>> connection timed out ................................................. : FAILED
> Hello Aki,
>
>> Are you running with valgrind or on a really slow system?
> I'm not aware my buildsystem
2017 Aug 15
2
Is transport=rdma tested with "stripe"?
On Tue, Aug 15, 2017 at 01:04:11PM +0000, Hatazaki, Takao wrote:
> Ji-Hyeon,
>
> You're saying that "stripe=2 transport=rdma" should work. Ok, that
> was the first thing I wanted to know. I'll put together logs later this week.
Note that "stripe" is not tested much and practically unmaintained. We
do not advise you to use it. If you have large files that you
2018 Sep 07
6
Auth process sometimes stop responding after upgrade
Some more information: the issue has just occurred, again on an instance without the
"service_count = 0" configuration directive on pop3-login.
I've observed that while the issue is occurring, the director process goes to 100% CPU. I've
straced the process. It is seemingly looping:
...
...
epoll_ctl(13, EPOLL_CTL_ADD, 78, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
{u32=149035320,
2015 Jun 21
3
dovecot auth using 100% CPU
Every few days I find that dovecot auth is using all my CPU.
This is from dovecot 2.2.13; I've just upgraded to 2.2.18.
strace -r -p 17956 output:
Process 17956 attached
0.000000 lseek(19, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
0.000057 getsockname(19, {sa_family=AF_LOCAL, NULL}, [2]) = 0
0.000043 epoll_ctl(15, EPOLL_CTL_ADD, 19, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
2017 Aug 16
0
Is transport=rdma tested with "stripe"?
> Note that "stripe" is not tested much and practically unmaintained.
Ah, this was what I suspected. Understood. I'll be happy with "shard".
Having said that, "stripe" works fine with transport=tcp. The failure reproduces with just 2 RDMA servers (with InfiniBand), one of which also acts as a client.
I looked into logs. I paste lengthy logs below with
2010 Oct 10
3
pop3 TCP_CORK too late error
I was stracing a pop3 process and noticed that the TCP_CORK option
isn't set soon enough:
epoll_wait(8, {{EPOLLOUT, {u32=37481984, u64=37481984}}}, 38, 207) = 1
write(41, "iTxPBrNlaNFao+yQzLhuO4/+tQ5cuiKSe"..., 224) = 224
epoll_ctl(8, EPOLL_CTL_MOD, 41, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP,
{u32=37481984, u64=37481984}}) = 0
pread(19,
2018 Sep 07
1
Auth process sometimes stop responding after upgrade
On 7 Sep 2018, at 19.43, Timo Sirainen <tss at iki.fi> wrote:
>
> On 7 Sep 2018, at 16.50, Simone Lazzaris <s.lazzaris at interactive.eu <mailto:s.lazzaris at interactive.eu>> wrote:
>>
>> Some more information: the issue has just occurred, again on an instance without the "service_count = 0" configuration directive on pop3-login.
>>
>>
2018 Mar 27
4
[PATCH net V2] vhost: correctly remove wait queue during poll failure
We tried to remove the vq poll from the wait queue, but did not check whether
or not it was actually on a list first. This can lead to a double free. Fix
this by switching to vhost_poll_stop(), which zeros poll->wqh after removing
poll from the waitqueue to make sure it won't be freed twice.
Cc: Darren Kenny <darren.kenny at oracle.com>
Reported-by: syzbot+c0272972b01b872e604a at