Milind - Thank you for the response..

>> What are the high and low watermarks for the tier set at ?

# gluster volume get <vol> cluster.watermark-hi
Option                  Value
------                  -----
cluster.watermark-hi    90

# gluster volume get <vol> cluster.watermark-low
Option                  Value
------                  -----
cluster.watermark-low   75

>> What is the size of the file that failed to migrate as per the following tierd log:
>> [2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)

The file was a Word doc @ 29K in size.

>> If possible, a *gluster volume info* would also help, instead of going to and fro with questions.

# gluster vol info

Volume Name: ctdb
Type: Replicate
Volume ID: f679c476-e0dd-4f3a-9813-1b26016b5384
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: <node1>:/mnt/ctdb_local/brick
Brick2: <node2>:/mnt/ctdb_local/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

Volume Name: <vol>
Type: Tier
Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
Status: Started
Snapshot Count: 0
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: <node2>:/mnt/brick_nvme1/brick
Brick2: <node1>:/mnt/brick_nvme2/brick
Brick3: <node2>:/mnt/brick_nvme2/brick
Brick4: <node1>:/mnt/brick_nvme1/brick
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: <node1>:/mnt/brick1/brick
Brick6: <node2>:/mnt/brick2/brick
Brick7: <node1>:/mnt/brick2/brick
Brick8: <node2>:/mnt/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 16GB
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
performance.io-cache: off
performance.quick-read: off
cluster.tier-max-files: 1000000

HB

On Sun, Oct 22, 2017 at 8:41 AM, Milind Changire <mchangir at redhat.com> wrote:

> Herb,
> What are the high and low watermarks for the tier set at ?
>
> # gluster volume get <vol> cluster.watermark-hi
>
> # gluster volume get <vol> cluster.watermark-low
>
> What is the size of the file that failed to migrate as per the following tierd log:
>
> [2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>
> If possible, a *gluster volume info* would also help, instead of going to and fro with questions.
>
> --
> Milind
>
> On Fri, Oct 20, 2017 at 12:42 AM, Herb Burnswell <herbert.burnswell at gmail.com> wrote:
>
>> All,
>>
>> I am new to gluster and have some questions/concerns about some tiering errors that I see in the log files.
>>
>> OS: CentOS 7.3.1611
>> Gluster version: 3.10.5
>> Samba version: 4.6.2
>>
>> I see the following (scrubbed):
>>
>> Node 1 /var/log/glusterfs/tier/<vol>/tierd.log:
>>
>> [2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>> [2017-10-19 17:52:07.525110] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
>> [2017-10-19 17:52:07.526088] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
>> [2017-10-19 17:52:07.526111] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>> [2017-10-19 17:52:07.527214] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file> [No space left on device]
>> [2017-10-19 17:52:07.527244] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fb4411c4-a387-4e5f-a2b7-897633ef4aa8)
>> [2017-10-19 17:52:07.533510] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
>> [2017-10-19 17:52:07.534434] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
>> [2017-10-19 17:52:07.534453] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>> [2017-10-19 17:52:07.535570] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file> [No space left on device]
>> [2017-10-19 17:52:07.535594] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fba421e7-0500-47c4-bf67-10a40690e13d)
>> [2017-10-19 17:52:07.541363] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
>> [2017-10-19 17:52:07.542296] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
>> [2017-10-19 17:52:07.542357] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>> [2017-10-19 17:52:07.543480] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file> [No space left on device]
>> [2017-10-19 17:52:07.543521] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fe6799e1-42e6-43e5-a7eb-ac8facfcbc9f)
>> [2017-10-19 17:52:07.549959] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
>> [2017-10-19 17:52:07.550901] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
>> [2017-10-19 17:52:07.550922] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>> [2017-10-19 17:52:07.551896] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file> [No space left on device]
>> [2017-10-19 17:52:07.551917] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:ffe3a3f2-b170-43f0-a9fb-97c78e3173eb)
>> [2017-10-19 17:52:07.551945] E [MSGID: 109037] [tier.c:2565:tier_run] 0-<vol>-tier-dht: Promotion failed
>>
>> Node 1 /var/log/samba/glusterfs-<vol>-pool.log:
>>
>> [2017-10-18 17:13:41.481860] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
>> [2017-10-18 17:13:41.481860] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
>> [2017-10-18 17:13:41.485916] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1ff570, flags=00) on file 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
>> [2017-10-18 17:13:41.488223] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
>> [2017-10-18 17:13:41.488235] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
>> [2017-10-18 17:13:41.489060] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1feb50, flags=00) on file 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
>> [2017-10-18 17:13:42.339936] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
>> [2017-10-18 17:13:42.339988] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
>> [2017-10-18 17:13:42.343769] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf2012c0, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>> [2017-10-18 17:13:42.345374] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
>> [2017-10-18 17:13:42.345401] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
>> [2017-10-18 17:13:42.346259] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf201130, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>> [2017-10-18 17:13:59.541591] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2017-10-18 17:13:59.541748] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2017-10-18 17:13:59.541887] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2017-10-18 17:13:59.541977] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
>>
>> Node 2 /var/log/gluster/tier/<vol>/tierd.log:
>>
>> [2017-10-16 15:54:08.662873] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fffd714e-b2d2-42d3-a31f-72673276e3d0)
>> [2017-10-16 16:00:07.201584] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f10365e1-747b-4985-97b9-8b5dc61ac464)
>> [2017-10-16 16:00:07.372559] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f95f17bf-b696-44cd-aae0-d8ac38149aa5)
>> [2017-10-16 16:06:06.880522] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:ec451f6c-8971-4f9b-a04f-00f96db9b46a)
>> [2017-10-16 16:06:08.062080] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:e658cd70-3f6d-4b25-8d9f-0d4c24d3ec5d)
>> [2017-10-16 16:06:08.288298] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f22df67a-88e5-4fae-aab0-b00e04f9a6e1)
>> [2017-10-18 15:55:06.446416] I [MSGID: 109028] [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1376671.00 secs
>> [2017-10-18 15:55:06.446433] I [MSGID: 109028] [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 47887089, failures: 3594, skipped: 0
>> [2017-10-19 00:00:00.501576] I [MSGID: 109038] [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction on cold tier
>> [2017-10-19 00:00:00.502016] I [MSGID: 109038] [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on cold tier
>> [2017-10-19 00:00:00.501608] I [MSGID: 109038] [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction on cold tier
>> [2017-10-19 00:00:00.502076] I [MSGID: 109038] [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on cold tier
>> [2017-10-19 16:03:49.522991] I [MSGID: 109028] [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1463594.00 secs
>> [2017-10-19 16:03:49.523017] I [MSGID: 109028] [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 52790654, failures: 3594, skipped: 0
>>
>> Node 2 /var/log/samba/glusterfs-<vol>-pool.log:
>>
>> [2017-10-18 16:49:09.218062] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
>> [2017-10-18 16:49:09.218254] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f009b36bac0, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>> [2017-10-18 16:49:09.222783] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2017-10-18 16:49:09.222912] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2017-10-18 16:49:09.223079] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
>> [2017-10-18 16:49:09.223200] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
>>
>> Status:
>>
>> # gluster vol tier <vol> status
>>
>> Node      Promoted files   Demoted files   Status        run time in h:m:s
>> --------- ---------        ---------       ---------     ---------
>> Node1     190861           0               in progress   408:34:13
>> Node2     0                0               in progress   408:34:14
>>
>> Hot tier bricks:
>>
>> # df -h
>>
>> /dev/mapper/vg_bricks-brick_nvme1  1.4T  551G  883G  39%  /mnt/brick_nvme1
>> /dev/mapper/vg_bricks-brick_nvme2  1.4T  512G  922G  36%  /mnt/brick_nvme2
>>
>> Can anyone point me in the right direction as to what may be going on? Any guidance is greatly appreciated.
>>
>> Thanks in advance,
>>
>> HB
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Milind
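
As a rough cross-check of the numbers in this message, the sketch below compares the reported hot-tier brick usage (39% and 36% from the df output) with the reported watermarks (low 75, hi 90). The classification rule is an assumption drawn from the general description of the watermark options (below the low watermark promotions are expected, above the high watermark they are not); the function name and the exact boundary handling are illustrative only, and df on one node is only an approximation of how full the hot tier as a whole is.

```python
# Watermark values and brick usage are copied from the thread; the
# classification rule below is an assumption, not Gluster's actual code.
WATERMARK_LOW = 75   # cluster.watermark-low
WATERMARK_HI = 90    # cluster.watermark-hi


def tier_state(used_pct, low=WATERMARK_LOW, hi=WATERMARK_HI):
    """Place a usage percentage relative to the two watermarks."""
    if used_pct < low:
        return "below-low"   # promotions would be expected to proceed
    if used_pct <= hi:
        return "between"     # promotion and demotion both possible
    return "above-hi"        # promotions would be expected to stop


# Use% column of the `df -h` output for the two NVMe brick filesystems.
hot_bricks = {"/mnt/brick_nvme1": 39, "/mnt/brick_nvme2": 36}

for mount, used_pct in hot_bricks.items():
    print(mount, used_pct, tier_state(used_pct))
```

Both bricks fall well below the low watermark under this rule, which suggests the "No space left on device" errors are unlikely to be explained by the watermark settings alone.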
Herb,
I'm trying to weed out issues here.

So, I can see quota turned *on* and would like you to check the quota settings and test to see system behavior *if quota is turned off*.

Although the file size that failed migration was 29K, I'm being a bit paranoid while weeding out issues.

Are you still facing tiering errors?
I can see your response to Alex with the disk space consumption and found it a bit ambiguous w.r.t. state of affairs.

--
Milind

On Tue, Oct 24, 2017 at 11:34 PM, Herb Burnswell <herbert.burnswell at gmail.com> wrote:

> Milind - Thank you for the response..
>
> >> What are the high and low watermarks for the tier set at ?
>
> # gluster volume get <vol> cluster.watermark-hi
> Option                  Value
> ------                  -----
> cluster.watermark-hi    90
>
> # gluster volume get <vol> cluster.watermark-low
> Option                  Value
> ------                  -----
> cluster.watermark-low   75
>
> >> What is the size of the file that failed to migrate as per the following tierd log:
> >> [2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>
> The file was a word doc @ 29K in size.
>
> >> If possible, a *gluster volume info* would also help, instead of going to and fro with questions.
Milind - Thank you for your help, I appreciate it.

It appears that tiering behaves the same when quota is turned off. Volume info:

# gluster vol info <vol>

Volume Name: <vol>
Type: Tier
Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
Status: Started
Snapshot Count: 0
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: <node2>:/mnt/brick_nvme1/brick
Brick2: <node1>:/mnt/brick_nvme2/brick
Brick3: <node2>:/mnt/brick_nvme2/brick
Brick4: <node1>:/mnt/brick_nvme1/brick
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: <node1>:/mnt/brick1/brick
Brick6: <node2>:/mnt/brick2/brick
Brick7: <node1>:/mnt/brick2/brick
Brick8: <node2>:/mnt/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 16GB
features.inode-quota: off
features.quota: off
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
performance.io-cache: off
performance.quick-read: off
cluster.tier-max-files: 1000000

Errors in /var/log/glusterfs/tier/<vol>/tierd.log on node1 after turning off quota:

[2017-10-27 18:38:08.880502] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/83540503.jpg
[2017-10-27 18:38:08.880686] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/83540503.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.880717] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/83540503.jpg
[2017-10-27 18:38:08.881101] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/83540503.jpg [No space left on device]
[2017-10-27 18:38:08.881145] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 83540503.jpg(gfid:00cf352a-0a21-42d3-91ae-fe6fc63fac9d)
[2017-10-27 18:38:08.891692] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/152640504.jpg
[2017-10-27 18:38:08.891876] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/152640504.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.891899] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/152640504.jpg
[2017-10-27 18:38:08.920077] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/152640504.jpg [No space left on device]
[2017-10-27 18:38:08.920121] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 152640504.jpg(gfid:0436b8b5-2e15-411e-acfa-a5870cf125bf)
[2017-10-27 18:38:08.952939] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/89240318.jpg
[2017-10-27 18:38:08.953121] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/89240318.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.953147] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/89240318.jpg
[2017-10-27 18:38:08.959510] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/89240318.jpg [No space left on device]
[2017-10-27 18:38:08.959560] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 89240318.jpg(gfid:1143c9bb-ea79-4c15-ad03-97a611d53135)
[2017-10-27 18:38:08.986665] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/106056906.jpg
[2017-10-27 18:38:08.986871] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/106056906.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.986904] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/106056906.jpg
[2017-10-27 18:38:08.991468] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/106056906.jpg [No space left on device]
[2017-10-27 18:38:08.991505] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 106056906.jpg(gfid:07f5e5d4-315f-4299-a62f-6bd8f159c89d)
[2017-10-27 18:38:09.025433] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/114649988.jpg

I wanted to add a couple of data points here:

- Most (95%) of the logging goes to node1 of the 2 node cluster. The tierd.log file on node1 is 588M in size due to all of the failure errors. The tierd.log file on node2 is only ~205K in size. I believe I posted earlier that all promoted files are listed on node1:

# gluster vol tier <vol> status

Node       Promoted files   Demoted files   Status        run time in h:m:s
------     ---------        ---------       ---------     ---------
<node2>    0                0               in progress   601:37:43
<node1>    271966           0               in progress   601:37:42

Is this expected behavior?

- We are sharing the data (the same share) via SMB and AFP so it can be accessed by PCs and Macs. The Macs are using AFP since they have so much difficulty with SMB and network file shares. I know the Macs create all kinds of 'special' files when working on the share. Could there be a problem with certain files and tiering? For example (from node2 tierd.log):

[2017-10-26 19:30:08.147159] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:db430070-b9c5-4bd2-b4c6-a347b838a97e)
[2017-10-26 22:28:08.218565] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:f745bea6-04bd-4904-8237-1bd7c9c92f5b)
[2017-10-26 22:28:08.221909] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:bed73314-8740-4822-9fb7-95257434e283)
[2017-10-26 22:28:08.223767] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:bf1df49b-c264-449d-9bc6-65bcfd48fa4e)

The .DS_Store files are Mac specific files. Since users work directly off of the share, are there potential problems with tiering and locks? I do see warnings (on node1 tierd.log):

[2017-10-27 18:30:08.719976] W [MSGID: 109023] [dht-rebalance.c:639:__is_file_migratable] 0-<vol>-tier-dht: Migrate file failed: /path/to/file.ai: File has locks. Skipping file migration
[2017-10-27 18:32:08.483971] W [MSGID: 109023] [dht-rebalance.c:639:__is_file_migratable] 0-<vol>-tier-dht: Migrate file failed: /path/to/file-v1.ai: File has locks. Skipping file migration

- The directory structure (built up over many years) has spaces in the names of files and folders, sometimes, I'm finding, even at the end of a file name. Could spaces in the names of files and folders be causing issues with tiering?

I'm still not sure where the [No space left on device] messages are coming from, as it does not appear that there are any space issues. Even before I turned off quota on the volume the sizing appeared to be fine:

# gluster vol quota <vol> list

Path     Hard-limit   Soft-limit      Used     Available   Soft-limit exceeded?   Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------
/path1   500.0GB      80%(400.0GB)    1.9MB    500.0GB     No                     No
/path2   25.0TB       80%(20.0TB)     19.2TB   5.8TB       No                     No

I will have some time this weekend to take the shares offline. Are there any steps I can take to clean up the hot tier, resync, or otherwise ensure that everything is in a good state?

Thanks in advance.

HB

On Thu, Oct 26, 2017 at 9:17 PM, Milind Changire <mchangir at redhat.com> wrote:

> Herb,
> I'm trying to weed out issues here.
>
> So, I can see quota turned *on* and would like you to check the quota
> settings and test to see system behavior *if quota is turned off*.
>
> Although the file size that failed migration was 29K, I'm being a bit
> paranoid while weeding out issues.
>
> Are you still facing tiering errors ?
> I can see your response to Alex with the disk space consumption and found
> it a bit ambiguous w.r.t. state of affairs.
>
> --
> Milind
>
> On Tue, Oct 24, 2017 at 11:34 PM, Herb Burnswell <herbert.burnswell at gmail.com> wrote:
>
>> Milind - Thank you for the response..
>>
>> >> What are the high and low watermarks for the tier set at ?
>>
>> # gluster volume get <vol> cluster.watermark-hi
>> Option                 Value
>> ------                 -----
>> cluster.watermark-hi   90
>>
>> # gluster volume get <vol> cluster.watermark-low
>> Option                  Value
>> ------                  -----
>> cluster.watermark-low   75
>>
>> >> What is the size of the file that failed to migrate as per the
>> following tierd log:
>>
>> >> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>> failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>>
>> The file was a word doc @ 29K in size.
>>
>> >>If possible, a *gluster volume info* would also help, instead of going
>> to and fro with questions.
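Since the node1 tierd.log is 588M, a summary is easier to reason about than the raw file. Below is a minimal Python sketch (not part of gluster; the function name `summarize` and the command-line usage are made up for illustration) that tallies entries by level, MSGID and function, and counts the file names in "Promotion failed for" entries, flagging dotfiles such as .DS_Store and names with leading or trailing spaces. It assumes each entry sits on a single line in the actual log file, in the same layout as the excerpts in this thread.

```python
import re
import sys
from collections import Counter

# Entry layout assumed from the excerpts in this thread, e.g.
# [2017-10-27 18:38:08.881101] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate ...
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)
# File name and gfid from a "Promotion failed for <name>(gfid:<uuid>)" message.
PROMO_RE = re.compile(r"Promotion failed for (?P<name>.*)\(gfid:(?P<gfid>[0-9a-f-]+)\)")


def summarize(lines):
    """Return (entries per (level, msgid, func), failed names, odd-name counts)."""
    by_msgid = Counter()
    names = Counter()
    odd = Counter()
    for raw in lines:
        m = LOG_RE.match(raw.rstrip("\n"))
        if not m:
            continue  # skip lines that do not look like a log entry
        by_msgid[(m["level"], m["msgid"], m["func"])] += 1
        p = PROMO_RE.search(m["msg"])
        if p:
            name = p["name"]
            names[name] += 1
            if name != name.strip():
                odd["leading/trailing space"] += 1
            if name.startswith("."):
                odd["dotfile"] += 1
    return by_msgid, names, odd


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python tierd_summary.py /var/log/glusterfs/tier/<vol>/tierd.log
    with open(sys.argv[1], errors="replace") as fh:
        by_msgid, names, odd = summarize(fh)
    for key, count in by_msgid.most_common():
        print(count, *key)
    print("most frequent failed names:", names.most_common(10))
    print("odd names:", dict(odd))
```

If the dotfile and trailing-space counts turn out to be a small fraction of the failed promotions, that would suggest the Mac files and the spaces in names are not the main cause, though it would not rule them out.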