similar to: really large number of skipped files after a scrub

Displaying 20 results from an estimated 3000 matches similar to: "really large number of skipped files after a scrub"

2017 Aug 16
0
Is transport=rdma tested with "stripe"?
> Note that "stripe" is not tested much and practically unmaintained. Ah, this was what I suspected. Understood. I'll be happy with "shard". Having said that, "stripe" works fine with transport=tcp. The failure reproduces with just 2 RDMA servers (with InfiniBand), one of those acts also as a client. I looked into logs. I paste lengthy logs below with
2017 Dec 15
3
Production Volume will not start
Hi all, I have an issue where our volume will not start from any node. When attempting to start the volume it will eventually return: Error: Request timed out. For some time after that, the volume is locked and we either have to wait or restart Gluster services. In the glusterd.log, it shows the following: [2017-12-15 18:00:12.423478] I [glusterd-utils.c:5926:glusterd_brick_start]
2017 Dec 18
0
Production Volume will not start
On Sat, Dec 16, 2017 at 12:45 AM, Matt Waymack <mwaymack at nsgdv.com> wrote: > Hi all, > > > > I have an issue where our volume will not start from any node. When > attempting to start the volume it will eventually return: > > Error: Request timed out > > > > For some time after that, the volume is locked and we either have to wait > or restart
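
A generic first-response checklist for a volume that times out on start (a sketch only; "gv0" stands in for the affected volume):

  gluster volume status gv0        # see which bricks and daemons are up
  gluster volume start gv0 force   # retry, forcing brick processes to start
  systemctl restart glusterd       # restart the management daemon on each node
  tail -n 50 /var/log/glusterfs/glusterd.log
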
2017 Nov 07
2
Enabling Halo sets volume RO
Hi all, I'm taking a stab at deploying a storage cluster to explore the Halo AFR feature and running into some trouble. In GCE, I have 4 instances, each with one 10gb brick. 2 instances are in the US and the other 2 are in Asia (with the hope that this will drive up latency sufficiently). The bricks make up a Replica-4 volume. Before I enable halo, I can mount the volume and r/w files. The
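
For context, halo replication is controlled by volume options along these lines (assuming a Gluster release with Halo AFR support; the values are illustrative):

  gluster volume set <volname> cluster.halo-enabled yes
  gluster volume set <volname> cluster.halo-max-latency 10   # ms; replicate only to low-latency bricks
  gluster volume set <volname> cluster.halo-min-replicas 2

If the volume goes read-only after enabling halo, checking cluster.quorum-type and cluster.quorum-count is a reasonable next step, since quorum loss is the usual cause of an RO volume.
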
2018 Jan 17
1
[Possibile SPAM] Re: Strange messages in mnt-xxx.log
Here's the volume info: Volume Name: gv2a2 Type: Replicate Volume ID: 83c84774-2068-4bfc-b0b9-3e6b93705b9f Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: gluster1:/bricks/brick2/gv2a2 Brick2: gluster3:/bricks/brick3/gv2a2 Brick3: gluster2:/bricks/arbiter_brick_gv2a2/gv2a2 (arbiter) Options Reconfigured: storage.owner-gid: 107
2018 Jan 23
1
[Possibile SPAM] Re: Strange messages in mnt-xxx.log
On 17 January 2018 at 16:04, Ing. Luca Lazzeroni - Trend Servizi Srl < luca at trendservizi.it> wrote: > Here's the volume info: > > > Volume Name: gv2a2 > Type: Replicate > Volume ID: 83c84774-2068-4bfc-b0b9-3e6b93705b9f > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1:
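
The "1 x (2 + 1) = 3" layout above corresponds to a replica-3 arbiter volume; for reference, such a volume is created along these lines (brick paths taken from the thread):

  gluster volume create gv2a2 replica 3 arbiter 1 \
      gluster1:/bricks/brick2/gv2a2 \
      gluster3:/bricks/brick3/gv2a2 \
      gluster2:/bricks/arbiter_brick_gv2a2/gv2a2
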
2018 Jan 02
0
Wrong volume size with df
For what it's worth here, after I added a hot tier to the pool, df now reports the correct combined size of all bricks instead of just one brick. Not sure if that gives you any clues for this... maybe adding another brick to the pool would have a similar effect? On Thu, Dec 21, 2017 at 11:44 AM, Tom Fite <tomfite at gmail.com> wrote: > Sure! > > > 1 -
2017 Dec 21
3
Wrong volume size with df
Sure! > 1 - output of gluster volume heal <volname> info Brick pod-sjc1-gluster1:/data/brick1/gv0 Status: Connected Number of entries: 0 Brick pod-sjc1-gluster2:/data/brick1/gv0 Status: Connected Number of entries: 0 Brick pod-sjc1-gluster1:/data/brick2/gv0 Status: Connected Number of entries: 0 Brick pod-sjc1-gluster2:/data/brick2/gv0 Status: Connected Number of entries: 0 Brick
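
When df disagrees with the expected volume size, comparing the client's view with glusterd's per-brick accounting can help; a sketch (the mount point is an assumption):

  gluster volume heal gv0 info          # confirm no pending heals
  gluster volume status gv0 detail      # per-brick capacity as glusterd sees it
  df -h /mnt/gv0                        # capacity as the client sees it
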
2017 Aug 18
1
Is transport=rdma tested with "stripe"?
On Wed, Aug 16, 2017 at 4:44 PM, Hatazaki, Takao <takao.hatazaki at hpe.com> wrote: >> Note that "stripe" is not tested much and practically unmaintained. > > Ah, this was what I suspected. Understood. I'll be happy with "shard". > > Having said that, "stripe" works fine with transport=tcp. The failure reproduces with just 2 RDMA servers
2018 Jan 10
2
Blocking IO when hot tier promotion daemon runs
The sizes of the files are extremely varied: there are millions of small (<1 MB) files and thousands of files larger than 1 GB. Attached is the tier log for gluster1 and gluster2. These are full of "demotion failed" messages, which is also shown in the status: [root at pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status Node Promoted files Demoted files
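
For repeated "demotion failed" messages, the tier watermarks and mode are the usual knobs to inspect; a sketch using gluster volume get:

  gluster volume tier gv0 status
  gluster volume get gv0 cluster.tier-mode
  gluster volume get gv0 cluster.watermark-hi
  gluster volume get gv0 cluster.watermark-low
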
2018 Jan 10
0
Blocking IO when hot tier promotion daemon runs
I should add that additional testing has shown that only accessing files is held up; IO is not interrupted for existing transfers. I think this points to the heat metadata in the sqlite DB for the tier. Is it possible that a table is temporarily locked while the promotion daemon runs, so that the calls to update the access count on files are blocked? On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite
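
To test the locked-table theory, one could locate and open the tier's sqlite database on a brick; this is a sketch only, and the exact DB location under the brick's .glusterfs directory is an assumption:

  find /data/brick1/gv0/.glusterfs -name '*.db'
  sqlite3 /path/to/found.db '.tables'   # inspect the schema while the promotion daemon runs
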
2018 Jan 18
0
Blocking IO when hot tier promotion daemon runs
Thanks for the info, Hari. Sorry about the bad gluster volume info; I grabbed that from a file, not realizing it was out of date. Here's a current configuration showing the active hot tier: [root at pod-sjc1-gluster1 ~]# gluster volume info Volume Name: gv0 Type: Tier Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196 Status: Started Snapshot Count: 13 Number of Bricks: 8 Transport-type: tcp Hot
2018 Jan 18
2
Blocking IO when hot tier promotion daemon runs
Hi Tom, The volume info doesn't show the hot bricks. I think you took the volume info output before attaching the hot tier. Can you send the volume info of the current setup where you see this issue? The logs you sent are from a later point in time; the issue is hit earlier than what is available in those logs. I need the logs from an earlier time. And along with the entire tier
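
For reference, a hot tier is attached with a command along these lines (the SSD brick paths are placeholders), after which gluster volume info lists the hot bricks:

  gluster volume tier gv0 attach replica 2 \
      pod-sjc1-gluster1:/ssd/brick/gv0 pod-sjc1-gluster2:/ssd/brick/gv0
  gluster volume info gv0
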
2018 Jan 16
2
Strange messages in mnt-xxx.log
Hi, I'm testing gluster 3.12.4 and, by inspecting the log file /var/log/glusterfs/mnt-gv0.log (gv0 is the volume name), I found many lines saying: [2018-01-15 09:45:41.066914] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-gv0-dht: Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 [2018-01-15 09:45:45.755021] I [MSGID: 109063]
2018 Jan 17
0
Strange messages in mnt-xxx.log
Hi, On 16 January 2018 at 18:56, Ing. Luca Lazzeroni - Trend Servizi Srl < luca at trendservizi.it> wrote: > Hi, > > I'm testing gluster 3.12.4 and, by inspecting log files > /var/log/glusterfs/mnt-gv0.log (gv0 is the volume name), I found many lines > saying: > > [2018-01-15 09:45:41.066914] I [MSGID: 109063] > [dht-layout.c:716:dht_layout_normalize]
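
Layout anomalies ("Holes=1") reported by DHT are commonly addressed with a fix-layout rebalance; a sketch:

  gluster volume rebalance gv0 fix-layout start
  gluster volume rebalance gv0 status
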
2018 Feb 07
0
Fwd: Troubleshooting glusterfs
Hello Nithya! Thank you for your help on figuring this out! We changed our configuration and, after a successful test yesterday, we have run into a new issue today. The test, which included moderate read/write (~20-30 Mb/s) and scaling of the storage, ran for about 3 hours, and at some point the system got stuck. At the user level there are errors such as the following when trying to work with the filesystem: OSError:
2018 Feb 05
2
Fwd: Troubleshooting glusterfs
Hello Nithya! Thank you so much, I think we are close to building a stable storage solution according to your recommendations. Here's our rebalance log - please don't pay attention to error messages after 9AM - this is when we manually destroyed the volume to recreate it for further testing. Also, all remove-brick operations you can see in the log were executed manually when recreating the volume.
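
For reference, the remove-brick operations mentioned above follow a start/status/commit cycle (the brick path is a placeholder):

  gluster volume remove-brick gv0 server1:/bricks/b27 start
  gluster volume remove-brick gv0 server1:/bricks/b27 status
  gluster volume remove-brick gv0 server1:/bricks/b27 commit   # only once status shows completed
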
2018 Feb 04
1
Troubleshooting glusterfs
Please help troubleshoot glusterfs with the following setup: distributed volume without replication, sharding enabled. [root at master-5f81bad0054a11e8bf7d0671029ed6b8 uploads]# gluster volume info Volume Name: gv0 Type: Distribute Volume ID: 1a7e05f6-4aa8-48d3-b8e3-300637031925 Status: Started Snapshot Count: 0 Number of Bricks: 27 Transport-type: tcp Bricks: Brick1:
2018 Feb 04
1
Fwd: Troubleshooting glusterfs
Please help troubleshoot glusterfs with the following setup: distributed volume without replication, sharding enabled. # cat /etc/centos-release CentOS release 6.9 (Final) # glusterfs --version glusterfs 3.12.3 [root at master-5f81bad0054a11e8bf7d0671029ed6b8 uploads]# gluster volume info Volume Name: gv0 Type: Distribute Volume ID: 1a7e05f6-4aa8-48d3-b8e3-300637031925 Status:
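
To confirm the shard configuration on such a volume, gluster volume get works per option; a sketch:

  gluster volume get gv0 features.shard
  gluster volume get gv0 features.shard-block-size
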
2018 Jan 10
0
Blocking IO when hot tier promotion daemon runs
Hi, Can you send the volume info, the volume status output, and the tier logs? I also need to know the size of the files that are being stored. On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <tomfite at gmail.com> wrote: > I've recently enabled an SSD backed 2 TB hot tier on my 150 TB 2 server / 3 > bricks per server distributed replicated volume. > > I'm seeing IO get blocked
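
Collecting the requested information typically amounts to something like this (the output file names are placeholders):

  gluster volume info gv0 > vol-info.txt
  gluster volume status gv0 detail > vol-status.txt
  tar czf gluster-logs.tgz /var/log/glusterfs/
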