Jarsulic, Michael [CRI]
2017-Jun-06 14:11 UTC
[Gluster-users] Files Missing on Client Side; Still available on bricks
Hello,

I am still working on recovering from a few failed OS hard drives on my
gluster storage and have been removing and re-adding bricks quite a bit. I
noticed last night that some of the directories are not visible when I
access them through the client, but are still on the brick. For example:

Client:

# ls /scratch/dw
Ethiopian_imputation  HGDP  Rolwaling  Tibetan_Alignment

Brick:

# ls /data/brick1/scratch/dw
1000GP_Phase3  Ethiopian_imputation  HGDP  Rolwaling  SGDP
Siberian_imputation  Tibetan_Alignment  mapata

However, the directory is accessible on the client side (just not visible):

# stat /scratch/dw/SGDP
  File: `/scratch/dw/SGDP'
  Size: 212992      Blocks: 416        IO Block: 131072   directory
Device: 21h/33d  Inode: 11986142482805280401  Links: 2
Access: (0775/drwxrwxr-x)  Uid: (339748621/dw)   Gid: (339748621/dw)
Access: 2017-06-02 16:00:02.398109000 -0500
Modify: 2017-06-06 06:59:13.004947703 -0500
Change: 2017-06-06 06:59:13.004947703 -0500

The only place I see the directory mentioned in the log files is in the
rebalance logs. The following piece may provide a clue as to what is going
on:

[2017-06-05 20:46:51.752726] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/HGDP00476_chr6.tped gfid not present
[2017-06-05 20:46:51.752742] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005441-DNA_B08_chr4.tmp gfid not present
[2017-06-05 20:46:51.752773] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005441-DNA_B08.geno.tmp gfid not present
[2017-06-05 20:46:51.752789] E [MSGID: 109010] [dht-rebalance.c:2259:gf_defrag_get_entry] 0-hpcscratch-dht: /dw/SGDP/LP6005443-DNA_D02_chr4.out gfid not present

These errors appeared yesterday during a rebalance that failed. However,
running a rebalance fix-layout allowed me to clean up these errors and
successfully complete a migration to a re-added brick.

Here is the information for my storage cluster:

# gluster volume info

Volume Name: hpcscratch
Type: Distribute
Volume ID: 80b8eeed-1e72-45b9-8402-e01ae0130105
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: fs001-ib:/data/brick2/scratch
Brick2: fs003-ib:/data/brick5/scratch
Brick3: fs003-ib:/data/brick6/scratch
Brick4: fs004-ib:/data/brick7/scratch
Brick5: fs001-ib:/data/brick1/scratch
Brick6: fs004-ib:/data/brick8/scratch
Options Reconfigured:
server.event-threads: 8
performance.client-io-threads: on
client.event-threads: 8
performance.cache-size: 32MB
performance.readdir-ahead: on
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO

Mount points for the bricks:

/dev/sdb on /data/brick2 type xfs (rw,noatime,nobarrier)
/dev/sda on /data/brick1 type xfs (rw,noatime,nobarrier)

Mount point on the client:

10.xx.xx.xx:/hpcscratch on /scratch type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

My question is: what are some possible root causes of this issue, and what
is the recommended way to recover from it? Let me know if you need any more
information.

--
Mike Jarsulic
Sr. HPC Administrator
Center for Research Informatics | University of Chicago
773.702.2066
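For reference, in a pure distribute (DHT) volume a directory is expected to
exist on every brick, with a matching trusted.gfid xattr and a valid
trusted.glusterfs.dht layout xattr; DHT typically lists a directory from only
one of its subvolumes during readdir to avoid duplicates, so a brick where
the entry or its gfid xattr is missing can make the directory vanish from ls
while it stays reachable by name. A minimal check, using the brick paths and
hosts from the message above (a sketch, not an authoritative procedure):

On each brick server (fs001, fs003, fs004):

# ls -ld /data/brick*/scratch/dw/SGDP
# getfattr -m . -d -e hex /data/brick1/scratch/dw/SGDP

The trusted.gfid value should be identical on every brick; a brick where it
is absent would also fit the "gfid not present" rebalance errors quoted above.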
Pranith Kumar Karampuri
2017-Jun-08 07:34 UTC
[Gluster-users] Files Missing on Client Side; Still available on bricks
+Raghavendra/Nithya

On Tue, Jun 6, 2017 at 7:41 PM, Jarsulic, Michael [CRI]
<mjarsulic at bsd.uchicago.edu> wrote:

> [original message quoted in full above]

--
Pranith
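A common first step in this situation is to force a named lookup of the
affected directory from a FUSE client, which normally triggers DHT's
directory self-heal and recreates the entry (and its layout) on any brick
where it is missing; if listings are still inconsistent afterwards, a
fix-layout pass rewrites the layout xattrs without migrating data. A rough
sketch, using the volume name and mount point from the thread:

On the client:

# stat /scratch/dw/SGDP
# ls -l /scratch/dw/SGDP > /dev/null

On any server in the trusted pool:

# gluster volume rebalance hpcscratch fix-layout start
# gluster volume rebalance hpcscratch status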