Shyam Ranganathan
2018-Aug-31 11:59 UTC
[Gluster-users] Bug with hardlink limitation in 3.12.13 ?
On 08/31/2018 07:15 AM, Reiner Keller wrote:
> Hello,
>
> Yesterday I got an unexpected "No space left on device" error on my new
> gluster volume, caused by too many hardlinks. This happened while I was
> running an "rsync --aAHXxv ..." replication from the old gluster servers
> to the new ones - each running the latest version 3.12.13 (to change the
> volume schema from 2x2 to 3x1 with quorum, on a fresh Debian Stretch
> setup instead of Jessie).

I suspect you have hit this:
https://bugzilla.redhat.com/show_bug.cgi?id=1602262#c5

I further suspect your older setup was 3.10 based and not 3.12 based.

There is an additional feature added in 3.12 that stores GFID to path
conversion details using xattrs (see "GFID to path" in
https://docs.gluster.org/en/latest/release-notes/3.12.0/#major-changes-and-features),
due to which the xattr storage limit is reached/breached on ext4 based
bricks.

To check whether you are facing an issue similar to the one in the bug
above, I would check if the brick logs throw up the no-space error on a
gfid2path set failure.

To get around the problem, I would suggest using xfs as the backing FS
for the brick (considering you have close to 250 odd hardlinks to a
file). I would not attempt to disable the gfid2path feature, as it is
useful for getting to the real file given just a GFID and is already part
of the core on-disk Gluster metadata. (It can be shut off, but I would
refrain from doing so.)

> When I deduplicated the volume with "rdfind" around half a year ago,
> hardlinking worked fine (I think that was glusterfs around version
> 3.12.8 - 3.12.10?).
>
> My search for documentation found only the parameter
> "storage.max-hardlinks" with a default of 100 for version 4.0.
> I checked my gluster 3.12.13, but there the parameter is not yet
> implemented.
>
> I tested/proved it by running a small test both directly on the
> underlying ext4 filesystem of the brick and on the gluster volume using
> the same ext4 brick:
>
> Test line:
>
>     mkdir test; cd test; echo "hello" > test; for I in $(seq 1 100); do ln test test-$I; done
>
> * On the ext4 fs (old brick: xfs) I could create 100 hardlinks without
>   problems (from the documentation I found, ext4 has a limit of 65,000
>   hardlinks compiled in).
> * On the current GlusterFS (same on my old and new gluster volumes) I
>   could now create only up to 45 hardlinks.
>
> But from the deduplication around 6 months ago I can find e.g. a file
> with 240 hardlinks, and there is no problem using these referenced files
> (caused by multiple languages / multiple uploads per language,
> production/staging system cloned...).
>
> My current workaround is to use duplicated content, but it would be
> great if this could be fixed in the next versions ;)
>
> (Saltstack doesn't yet support successful setup of glusterfs 4.0
> peers/volumes; something in the output of the "gluster --xml
> --mode=script" call must be weird, but I haven't seen any differences so
> far.)
>
> Bests
>
> Reiner
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
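The quoted one-liner can be wrapped into a small script that reports how many links succeed before the first failure, which makes comparing bricks easier. This is a sketch based on the test in the thread: run it inside a scratch directory on the mount under test; on a plain local filesystem all 100 links should succeed, while on the affected Gluster mounts it stops around 45.

```shell
# Create a scratch dir, then count successful hardlinks before the first
# failure (ENOSPC on the affected Gluster mounts, per the thread).
mkdir -p linktest && cd linktest
echo "hello" > test
created=0
for i in $(seq 1 100); do
    if ln test "test-$i" 2>/dev/null; then
        created=$((created + 1))
    else
        break    # first failure - e.g. the xattr block limit on ext4 bricks
    fi
done
echo "successful hardlinks: $created"
```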
Reiner Keller
2018-Aug-31 17:06 UTC
[Gluster-users] Bug with hardlink limitation in 3.12.13 ?
Hello,

Am 31.08.2018 um 13:59 schrieb Shyam Ranganathan:
> I suspect you have hit this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1602262#c5
>
> I further suspect your older setup was 3.10 based and not 3.12 based.
>
> There is an additional feature added in 3.12 that stores GFID to path
> conversion details using xattrs (see "GFID to path" in
> https://docs.gluster.org/en/latest/release-notes/3.12.0/#major-changes-and-features),
> due to which the xattr storage limit is reached/breached on ext4 based
> bricks.
>
> To check if you are facing an issue similar to the one in the bug
> provided above, I would check if the brick logs throw up the no space
> error on a gfid2path set failure.

Thanks for the hint.

From the log output (= no gfid2path errors) it seems not to be the
problem, although the old gluster volume was set up with version 3.10.x
(or even 3.8.x, I think).

I wrote that I could reproduce it on the new ext4 and on the old xfs
gluster volumes with version 3.12.13, while it was running fine with
~3.12.8 (half a year ago) without problems. But I just saw that my old
main volume wasn't/isn't xfs but also ext4. Digging into the logs, I
could see that in January I was still running 3.10.8 / 3.10.9 and first
switched to 3.12.9 / the 3.12 branch in April.

From the entry sizes/differences your suggestion would fit:

    https://manpages.debian.org/testing/manpages/xattr.7.en.html or
    http://man7.org/linux/man-pages/man5/attr.5.html

    In the current ext2, ext3, and ext4 filesystem implementations, the
    total bytes used by the names and values of all of a file's extended
    attributes must fit in a single filesystem block (1024, 2048 or 4096
    bytes, depending on the block size specified when the filesystem was
    created).

because I can see differences by volume setup type:

* With the ext4 "defaults" setup I got the error after 44 successful
  links. /etc/mke2fs.conf:

    [defaults]
            base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
            default_mntopts = acl,user_xattr
            enable_periodic_fsck = 0
            blocksize = 4096
            inode_size = 256
            inode_ratio = 16384

    [fs_types]
            ext3 = {
                    features = has_journal
            }
            ext4 = {
                    features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize
                    inode_size = 256
            }
    ...

* With the ext4 "small" setup (where I raised inode_size back to 256 when
  formatting), I could create only 10 successful links:

            small = {
                    blocksize = 1024
                    inode_size = 128          # in my volume's case also 256
                    inode_ratio = 4096
            }

which would match the blocksize limitation - here on the default ext4 fs:

    # attr -l test
    Attribute "gfid2path.3951a8fec4234683" has a 41 byte value for test
    Attribute "gfid" has a 16 byte value for test
    Attribute "afr.dirty" has a 12 byte value for test
    Attribute "gfid2path.003214300fcd4d34" has a 44 byte value for test
    ...
    Attribute "gfid2path.fe4d3e4d0bc31351" has a 44 byte value for test

    # attr -l test | grep gfid2path | wc -l
    46

    41 + 16 + 12 + 45 * 44 = 2049 (+ 256 inode_size + ???) <= 4096

With 1k blocksize I got only:

    # attr -l test
    Attribute "gfid2path.7a3f0fa0e8f7eba3" has a 41 byte value for test
    Attribute "gfid" has a 16 byte value for test
    Attribute "afr.dirty" has a 12 byte value for test
    Attribute "gfid2path.13e24c98a492d7f1" has a 43 byte value for test
    Attribute "gfid2path.1efa5641f9785d6c" has a 43 byte value for test
    Attribute "gfid2path.551dfafc5d4a7bda" has a 43 byte value for test
    Attribute "gfid2path.578dc56f20801437" has a 43 byte value for test
    Attribute "gfid2path.8e983883502e3c57" has a 43 byte value for test
    Attribute "gfid2path.94b700e1c7f156e3" has a 43 byte value for test
    Attribute "gfid2path.cbeb1108f9a34dac" has a 43 byte value for test
    Attribute "gfid2path.cd6ba60f624abc2b" has a 43 byte value for test
    Attribute "gfid2path.dbf95647d59cd047" has a 43 byte value for test
    Attribute "gfid2path.ec6198adc227befe" has a 44 byte value for test

    41 + 16 + 12 + 9 * 43 + 44 = 500 (+ 256 inode_size + ???) <= 1024

whatever the unknown missing (different) size is needed for.
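The sums above can be reproduced with plain shell arithmetic. Note that this deliberately counts only the xattr *values* observed via `attr -l`; the attribute names and per-entry on-disk overhead are the unknown "???" remainder, which is why the block fills up earlier than the value bytes alone would suggest.

```shell
# Reproduce the thread's xattr value-size sums (names and per-entry
# overhead are not included - that is the "???" in the text above).
base=$((41 + 16 + 12))            # original gfid2path + gfid + afr.dirty values
four_k=$((base + 45 * 44))        # 4k-block brick: 45 extra links at ~44 bytes
one_k=$((base + 9 * 43 + 44))     # 1k-block brick: 9 links at 43 bytes, one at 44
echo "4k-block value total: $four_k (block size 4096)"
echo "1k-block value total: $one_k (block size 1024)"
```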
But in the log I can see only this error, which is not very helpful
(here tested on another volume with ext4 "default" settings):

[2018-08-31 13:21:11.306022] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-0: remote operation failed: (/test/test-45 -> /test/test-46) [No space left on device]
[2018-08-31 13:21:11.306420] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-2: remote operation failed: (/test/test-45 -> /test/test-46) [No space left on device]
[2018-08-31 13:21:11.306466] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-1: remote operation failed: (/test/test-45 -> /test/test-46) [No space left on device]
[2018-08-31 13:21:11.307452] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 23122: LINK() /test/test-46 => -1 (No space left on device)
[2018-08-31 13:21:11.339428] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-0: remote operation failed: (/test/test-45 -> /test/test-47) [No space left on device]
[2018-08-31 13:21:11.339991] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-1: remote operation failed: (/test/test-45 -> /test/test-47) [No space left on device]
[2018-08-31 13:21:11.340039] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-2: remote operation failed: (/test/test-45 -> /test/test-47) [No space left on device]
[2018-08-31 13:21:11.341036] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 23125: LINK() /test/test-47 => -1 (No space left on device)
...
[2018-08-31 13:21:12.097966] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-0: remote operation failed: (/test/test-45 -> /test/test-100) [No space left on device]
[2018-08-31 13:21:12.098326] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-1: remote operation failed: (/test/test-45 -> /test/test-100) [No space left on device]
[2018-08-31 13:21:12.098412] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-staging-prudsys-client-2: remote operation failed: (/test/test-45 -> /test/test-100) [No space left on device]
[2018-08-31 13:21:12.101533] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 23285: LINK() /test/test-100 => -1 (No space left on device)
[2018-08-31 13:32:48.613484] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-staging-prudsys-dht: Found anomalies in (null) (gfid = 1923da4d-9661-4d53-84d6-7d196276a0fc). Holes=1 overlaps=0
[2018-08-31 13:32:48.613529] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-staging-prudsys-dht: Found anomalies in (null) (gfid = a04f8ab2-5b7a-490c-a3a6-71d9899295fa). Holes=1 overlaps=0
[2018-08-31 13:32:48.613556] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-staging-prudsys-dht: Found anomalies in (null) (gfid = 6d5ed713-7cff-4cf9-bb57-197a217051db). Holes=1 overlaps=0

The same log output with the old ext4 filesystem:

[2018-08-31 14:06:05.882886] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-2: remote operation failed: (/test/test-45 -> /test/test-46) [No space left on device]
[2018-08-31 14:06:05.883427] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-3: remote operation failed: (/test/test-45 -> /test/test-46) [No space left on device]
[2018-08-31 14:06:05.884821] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 15575982: LINK() /test/test-46 => -1 (No space left on device)
[2018-08-31 14:06:05.901852] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-2: remote operation failed: (/test/test-45 -> /test/test-47) [No space left on device]
[2018-08-31 14:06:05.902410] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-3: remote operation failed: (/test/test-45 -> /test/test-47) [No space left on device]
[2018-08-31 14:06:05.903968] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 15575985: LINK() /test/test-47 => -1 (No space left on device)
...
[2018-08-31 14:06:06.727908] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-2: remote operation failed: (/test/test-45 -> /test/test-100) [No space left on device]
[2018-08-31 14:06:06.728409] W [MSGID: 114031] [client-rpc-fops.c:2701:client3_3_link_cbk] 0-mygluster-client-3: remote operation failed: (/test/test-45 -> /test/test-100) [No space left on device]
[2018-08-31 14:06:06.729631] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 15576145: LINK() /test/test-100 => -1 (No space left on device)

and there are no more log lines referencing my test - I can see none of
the gfid2path errors you mentioned, but the error seems related to the
inode size, as shown above.
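A quick way to separate these messages from genuine brick-side gfid2path failures is to run the same filter over both the client and the brick logs (on most installations the brick logs live under /var/log/glusterfs/bricks/, but that path is an assumption; adjust to your setup). The demo below applies the filter to one of the sample lines from the output above:

```shell
# Filter for the two relevant error patterns. Pipe a client log or a
# brick log through this to spot gfid2path set failures vs. plain ENOSPC.
filter_errors() { grep -E 'gfid2path|No space left on device'; }

# Demo on a sample line taken from the FUSE client log above:
echo '[2018-08-31 13:21:11.307452] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 23122: LINK() /test/test-46 => -1 (No space left on device)' \
    | filter_errors
```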
Also interesting, as you mentioned: with the current 3.12.13 version on
another "old" GlusterFS volume with an xfs backend it works fine.

> To check if you are facing an issue similar to the one in the bug
> provided above, I would check if the brick logs throw up the no space
> error on a gfid2path set failure.

Is there some parameter to get more detailed error logging? From the
docs it looks like the defaults are already good:
https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/

    diagnostics.brick-log-level       Changes the log-level of the bricks.    INFO    DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE
    diagnostics.client-log-level      Changes the log-level of the clients.   INFO    DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE
    diagnostics.latency-measurement   Statistics related to the latency of each operation would be tracked.   Off   On/Off
    diagnostics.dump-fd-stats         Statistics related to file-operations would be tracked.                 Off   On

> To get around the problem, I would suggest using xfs as the backing FS
> for the brick (considering you have close to 250 odd hardlinks to a
> file). I would not attempt to disable the gfid2path feature, as it is
> useful for getting to the real file given just a GFID and is already
> part of the core on-disk Gluster metadata. (It can be shut off, but I
> would refrain from doing so.)

Since only a few 10s of GB of small files are duplicated like this, it
is much easier to simply use duplicated content again, and perhaps I can
also get people to clean up unneeded files.

>> My search for documentation found only the parameter
>> "storage.max-hardlinks" with a default of 100 for version 4.0.
>> I checked my gluster 3.12.13, but there the parameter is not yet
>> implemented.

If this problem is backend-filesystem related, it would be good to
document for 4.0 as well that the storage.max-hardlinks parameter only
works if the backend is e.g. xfs, or otherwise has enough xattr space
for it (ideally with a reference / short example of how to calculate
it)?
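For anyone who does want to move a brick to xfs as suggested above, a provisioning sketch follows. The device and mount-point names are placeholders, and it is destructive (it reformats the device), so do not run it as-is; the 512-byte inode size is the value commonly recommended in Gluster setup guides to leave room for Gluster's xattrs.

```shell
# ASSUMPTIONS: /dev/sdb1 is a spare device, /data/brick1 the brick path.
# WARNING: mkfs.xfs erases the device - adjust names before running.
DEV=/dev/sdb1
BRICK=/data/brick1

mkfs.xfs -f -i size=512 "$DEV"   # larger inodes leave room for Gluster xattrs
mkdir -p "$BRICK"
mount -t xfs "$DEV" "$BRICK"
```

Unlike ext4, xfs spills extended attributes into separate blocks when they outgrow the inode, which is why the xfs-backed volume in this thread does not hit the hardlink ceiling.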
Thanks and have a nice weekend

Reiner
Shyam Ranganathan
2018-Sep-10 16:32 UTC
[Gluster-users] Bug with hardlink limitation in 3.12.13 ?
On 08/31/2018 01:06 PM, Reiner Keller wrote:
> Hello,
>
> Am 31.08.2018 um 13:59 schrieb Shyam Ranganathan:
>> I suspect you have hit this:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1602262#c5
>>
>> I further suspect your older setup was 3.10 based and not 3.12 based.
>>
>> There is an additional feature added in 3.12 that stores GFID to path
>> conversion details using xattrs (see "GFID to path" in
>> https://docs.gluster.org/en/latest/release-notes/3.12.0/#major-changes-and-features),
>> due to which the xattr storage limit is reached/breached on ext4 based
>> bricks.
>>
>> To check if you are facing an issue similar to the one in the bug
>> provided above, I would check if the brick logs throw up the no space
>> error on a gfid2path set failure.
>
> Thanks for the hint.
>
> From the log output (= no gfid2path errors) it seems not to be the
> problem, although the old gluster volume was set up with version 3.10.x
> (or even 3.8.x, I think).
>
> I wrote that I could reproduce it on the new ext4 and on the old xfs
> gluster volumes with version 3.12.13, while it was running fine with
> ~3.12.8 (half a year ago) without problems.
>
> But I just saw that my old main volume wasn't/isn't xfs but also ext4.
> Digging into the logs, I could see that in January I was still running
> 3.10.8 / 3.10.9 and first switched to 3.12.9 / the 3.12 branch in
> April.
>
> From the entry sizes/differences your suggestion would fit:
>
>     https://manpages.debian.org/testing/manpages/xattr.7.en.html or
>     http://man7.org/linux/man-pages/man5/attr.5.html
>
>     In the current ext2, ext3, and ext4 filesystem implementations, the
>     total bytes used by the names and values of all of a file's
>     extended attributes must fit in a single filesystem block (1024,
>     2048 or 4096 bytes, depending on the block size specified when the
>     filesystem was created).
>
> because I can see differences by volume setup type:

<huge snip>

So in short, the inode size limits in ext4 impact the hard link counts
that can be created in Gluster, which is the limitation that you hit -
would that be a correct summary?

>> To check if you are facing an issue similar to the one in the bug
>> provided above, I would check if the brick logs throw up the no space
>> error on a gfid2path set failure.
>
> Is there some parameter to get more detailed error logging? From the
> docs it looks like the defaults are already good:

The error logs posted are from the client (FUSE mount) logs; the log
lines with the gfid2path that I was mentioning are on the bricks. There
is no further logging level that needs to change to see the said errors,
as these are at warning severity and above.

>>> My search for documentation found only the parameter
>>> "storage.max-hardlinks" with a default of 100 for version 4.0.
>>> I checked my gluster 3.12.13, but there the parameter is not yet
>>> implemented.
>
> If this problem is backend-filesystem related, it would be good to
> document for 4.0 as well that the storage.max-hardlinks parameter only
> works if the backend is e.g. xfs, or otherwise has enough xattr space
> for it (ideally with a reference / short example of how to calculate
> it)?

Fair point; I raised a github issue around the same here [1]
(contributions welcome :) ).

Regards,
Shyam

[1] Gluster documentation github issue for hardlink and ext4
limitations: https://github.com/gluster/glusterdocs/issues/418
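For reference, on glusterfs 4.0 and later the option discussed above can be inspected and changed with the gluster CLI (not available in 3.12.x, as noted in the thread). The volume name below is a placeholder, and note that raising the option cannot help past the ext4 per-block xattr limit - the backing filesystem still has to have room for the gfid2path xattrs:

```shell
# Hypothetical volume name "myvol"; requires glusterfs >= 4.0.
gluster volume get myvol storage.max-hardlinks    # default: 100
gluster volume set myvol storage.max-hardlinks 200
```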