Gudrun Mareike Amedick
2020-Feb-17 11:05 UTC
[Gluster-users] remove-brick seems to delete file content
Hi,

I'm currently removing a few bricks from a distributed dispersed volume using gluster volume remove-brick. I'm running GlusterFS 6.6. It triggered a rebalance that is supposed to remove the data from the bricks. This morning, it had ~50,000 failures on each server. I found a whole bunch of log entries like this:

[2020-02-17 10:02:47.971011] I [dht-rebalance.c:1589:dht_migrate_file] 0-OMICS-dht: $FILE: attempting to move from OMICS-disperse-0 to OMICS-disperse-10
[2020-02-17 10:02:47.997915] W [MSGID: 0] [dht-rebalance.c:1026:__dht_check_free_space] 0-OMICS-dht: Write will cross min-free-disk for file - $FILE on subvol - OMICS-disperse-10. Looking for new subvol
[2020-02-17 10:02:47.997970] I [MSGID: 0] [dht-rebalance.c:1082:__dht_check_free_space] 0-OMICS-dht: new target found - OMICS-disperse-1 for file - $FILE
[2020-02-17 10:02:48.192873] I [MSGID: 0] [dht-rebalance.c:1788:dht_migrate_file] 0-OMICS-dht: destination for file - $FILE is changed to - OMICS-disperse-1
[2020-02-17 10:02:48.407606] E [MSGID: 109023] [dht-rebalance.c:2055:dht_migrate_file] 0-OMICS-dht: failed to set xattr on $FILE in OMICS-disperse-10 [Operation not supported]
[2020-02-17 10:02:48.414374] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-OMICS-dht: migrate-data failed for $FILE [Operation not supported]

The bricks for subvol disperse-10 have indeed hit 90% during the rebalance. Subvol disperse-1 is way lower.

If I look for $FILE on the bricks, I find copies on both subvol disperse-0 and subvol disperse-1, and those on subvol disperse-1 look weird (bricks 0100 and 0101 belong to subvol disperse-0, bricks 0102 and 0103 are part of subvol disperse-1):

# ls -lah $BRICKS/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0100/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0101/$FILE
-rw-r--r-- 2 $USER $GROUP    0 Feb 17 11:02 $BRICK0102/$FILE
-rw-r--r-- 2 $USER $GROUP    0 Feb 17 11:02 $BRICK0103/$FILE

This doesn't look like a linkfile.
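For context on the linkfile comparison: a DHT linkfile normally shows up on a brick as a zero-byte regular file whose only mode bit is the sticky bit (---------T) and which carries the trusted.glusterfs.dht.linkto extended attribute. The Python sketch below applies that rule to a brick path. It is an illustration only, not a Gluster tool: the function names and category labels are made up for this example, it must run as root on the brick server to see trusted.* attributes, and it does not account for files that are mid-migration.

```python
import os
import stat

# Extended attribute that DHT sets on linkfiles (pointer to the real subvolume).
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def classify(mode, size, xattr_names):
    """Classify a brick entry from its stat mode, size and xattr names.

    The labels returned here are hypothetical names for this sketch.
    """
    if not stat.S_ISREG(mode):
        return "not-regular"
    perm = stat.S_IMODE(mode)
    # Linkfile heuristic: only the sticky bit set, zero bytes, linkto xattr present.
    if perm == stat.S_ISVTX and size == 0 and LINKTO_XATTR in xattr_names:
        return "linkfile"
    if size == 0:
        return "empty-non-linkfile"
    return "data"


def classify_path(path):
    """Stat a file on a brick and classify it (Linux only, needs root for trusted.*)."""
    st = os.lstat(path)
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError:
        # xattrs unsupported or unreadable; treat as having none.
        names = []
    return classify(st.st_mode, st.st_size, names)
```

Under this rule, the zero-byte -rw-r--r-- entries on bricks 0102 and 0103 would be reported as "empty-non-linkfile" regardless of their attributes, because their mode is 0644 rather than sticky-only.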
Some of those files are empty on the client side, some aren't. But since those aren't my files, I can't tell for sure whether they are supposed to look empty. The empty ones report a file size of 0 (du -h $FILE) from the client side, but they do have a size (and content) on the server side in their original subvolume, so I'm guessing they shouldn't be empty :(

I stopped the remove-brick operation because this looked weird.

Is this supposed to happen? Or is the rebalance screwing up when trying to move things to a brick that's already full?

I'm removing the subvolume disperse-12. Is it intended that data from subvol disperse-0 is being moved?

Should I open a bug report?

And, most importantly, are those weird non-linkfile-but-empty files going to be a problem, and if yes, how do I get rid of them safely? Can I restore the content of those files that are currently shown as empty?

Thanks in advance and kind regards,

Gudrun Amedick