Joseph Qi
2022-Nov-21 01:31 UTC
[Ocfs2-devel] dinode link count inconsistency in ocfs2_read_links_count() logic
Hi,

ocfs2_mknod()/ocfs2_link()/ocfs2_rename() will always check ocfs2_link_max(), so it seems that it won't overflow.
I wonder if you've encountered another bug that leads to dinode link count leaking.

Thanks,
Joseph

On 11/18/22 5:47 PM, Alexey Asemov (Alex/AT) via Ocfs2-devel wrote:
> Hello,
>
> While diagnosing an unlink hang issue, I found an inconsistency in dinode hardlink count handling.
>
> If we take a look at ocfs2.h (sections below the text), we see that ocfs2_set_links_count() always stores the high portion of the link count into the dinode, but ocfs2_read_links_count() retrieves and adds the high portion *only* when the inode has the OCFS2_INDEXED_DIR_FL flag set.
>
> The problem is that ocfs2_read_links_count() is used throughout the code, not only for directories. For files, the OCFS2_INDEXED_DIR_FL flag is never present, so when the number of hardlinks to a file goes over 65535, it will be written correctly to the dinode as 65536, but will then always be read back into the inode as 0 because of the check. This causes a wrong link count of 0 in stat() and, somehow, a total hang on an attempt to unlink the 'parent' inode's directory entry (this is the issue I hit, but I have not diagnosed how it happens, because I stumbled on the wrong link count immediately and dug in that direction).
>
> While I am not sure about the internals, I suggest removing the
>
> - if (di->i_dyn_features & cpu_to_le16(OCFS2_INDEXED_DIR_FL))
>
> check inside ocfs2_read_links_count() for the time being, so that it always reads the correct link count from the dinode, be it a file or a directory.
>
> I do not see the dinode link count fields used outside the ocfs2.h API, so it should be the semantically correct thing to do, but could someone acquainted with the logic please confirm?
>
> KR,
> Alex
>
> ---
>
> static inline unsigned int ocfs2_read_links_count(struct ocfs2_dinode *di)
> {
>         u32 nlink = le16_to_cpu(di->i_links_count);
>         u32 hi = le16_to_cpu(di->i_links_count_hi);
>
>         if (di->i_dyn_features & cpu_to_le16(OCFS2_INDEXED_DIR_FL))
>                 nlink |= (hi << OCFS2_LINKS_HI_SHIFT);
>
>         return nlink;
> }
>
> static inline void ocfs2_set_links_count(struct ocfs2_dinode *di, u32 nlink)
> {
>         u16 lo, hi;
>
>         lo = nlink;
>         hi = nlink >> OCFS2_LINKS_HI_SHIFT;
>
>         di->i_links_count = cpu_to_le16(lo);
>         di->i_links_count_hi = cpu_to_le16(hi);
> }
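To make the arithmetic behind the report concrete, here is a user-space sketch of the two quoted helpers next to a reader with the flag check removed, as proposed. It is a sketch under stated assumptions, not the kernel code: le16_to_cpu()/cpu_to_le16() are stubbed as identity (i.e. a little-endian host), OCFS2_LINKS_HI_SHIFT is assumed to be 16, OCFS2_INDEXED_DIR_FL is assumed to be 0x0008, and struct sketch_dinode plus all sketch_* function names are hypothetical.

```c
/*
 * Standalone sketch of the split on-disk link count.
 * ASSUMPTIONS (illustration only, not the kernel headers):
 *  - le16_to_cpu()/cpu_to_le16() are identity, i.e. a little-endian host;
 *  - OCFS2_LINKS_HI_SHIFT is 16 and OCFS2_INDEXED_DIR_FL is 0x0008;
 *  - struct sketch_dinode keeps only the three fields used here.
 */
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define OCFS2_LINKS_HI_SHIFT 16
#define OCFS2_INDEXED_DIR_FL 0x0008
#define le16_to_cpu(x) (x)          /* stub: no byte swap */
#define cpu_to_le16(x) ((u16)(x))   /* stub: truncate to 16 bits */

struct sketch_dinode {
	u16 i_links_count;    /* low 16 bits of the link count */
	u16 i_links_count_hi; /* high 16 bits of the link count */
	u16 i_dyn_features;   /* per-inode feature flags */
};

/* Writer, as quoted: always stores both halves. */
static inline void sketch_set_links_count(struct sketch_dinode *di, u32 nlink)
{
	di->i_links_count = cpu_to_le16(nlink);
	di->i_links_count_hi = cpu_to_le16(nlink >> OCFS2_LINKS_HI_SHIFT);
}

/* Reader, as quoted: adds the high half only for indexed directories. */
static inline unsigned int
sketch_read_links_count_current(const struct sketch_dinode *di)
{
	u32 nlink = le16_to_cpu(di->i_links_count);
	u32 hi = le16_to_cpu(di->i_links_count_hi);

	if (di->i_dyn_features & cpu_to_le16(OCFS2_INDEXED_DIR_FL))
		nlink |= (hi << OCFS2_LINKS_HI_SHIFT);
	return nlink;
}

/* Reader with the proposed change: always adds the high half. */
static inline unsigned int
sketch_read_links_count_proposed(const struct sketch_dinode *di)
{
	u32 nlink = le16_to_cpu(di->i_links_count);
	u32 hi = le16_to_cpu(di->i_links_count_hi);

	return nlink | (hi << OCFS2_LINKS_HI_SHIFT);
}
```

For a regular-file dinode (no OCFS2_INDEXED_DIR_FL), storing 65536 leaves i_links_count = 0 and i_links_count_hi = 1, so the current reader returns 0 while the proposed one returns 65536; for a dinode that does carry the flag, both readers agree.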
Alexey Asemov (Alex/AT)
2022-Nov-21 07:27 UTC
Hello,

ocfs2_link_max() relies on the filesystem-wide indexed directories flag, while ocfs2_read_links_count() relies on the per-dinode flag, and that is exactly the inconsistency in question.

I am not versed enough in the code to be sure exactly why this limit exists, but supporting OCFS2_DX_LINK_MAX seems to be no problem for the code. Only optionally adding the high part of the link count just seems weird to me: the field is there (and zeroed) even without indexed dirs, so always using it feels logical.

---
        if (ocfs2_supports_indexed_dirs(osb))
                return OCFS2_DX_LINK_MAX;

        return OCFS2_LINK_MAX;

On 21.11.2022 4:31, Joseph Qi wrote:
> Hi,
>
> ocfs2_mknod()/ocfs2_link()/ocfs2_rename() will always check
> ocfs2_link_max(), so it seems that it won't overflow.
> I wonder if you've encountered another bug that leads to dinode link count
> leaking.
>
> Thanks,
> Joseph
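A minimal sketch of the mismatch described above: the limit check is chosen per filesystem, while the high half of the count is only read back per inode. The constants are assumptions (OCFS2_LINK_MAX taken as 32000 and OCFS2_DX_LINK_MAX as 2^31 - 1, believed to match ocfs2_fs.h), and every sketch_* name is a hypothetical stand-in; sketch_may_add_link() only models the kind of limit check Joseph mentions, not the exact code in ocfs2_link().

```c
/*
 * Sketch: limit keyed off a filesystem-wide feature vs. reader keyed off
 * a per-inode flag. ASSUMED values: link max 32000 without indexed dirs,
 * 2^31 - 1 with them, and a 16-bit low/high split of the on-disk count.
 * All names are hypothetical stand-ins, not kernel functions.
 */
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LINK_MAX       32000u
#define SKETCH_DX_LINK_MAX    ((1u << 31) - 1u)
#define SKETCH_LINKS_HI_SHIFT 16

/* Mirrors the quoted ocfs2_link_max(): one limit per filesystem. */
static inline uint32_t sketch_link_max(bool fs_supports_indexed_dirs)
{
	return fs_supports_indexed_dirs ? SKETCH_DX_LINK_MAX : SKETCH_LINK_MAX;
}

/* Models an EMLINK-style check made before adding one more link. */
static inline bool sketch_may_add_link(uint32_t nlink, bool fs_supports_indexed_dirs)
{
	return nlink < sketch_link_max(fs_supports_indexed_dirs);
}

/* Store a count as lo/hi halves, then read it back the way the current
 * reader does: the high half is only added for indexed-dir inodes. */
static inline uint32_t sketch_round_trip(uint32_t nlink, bool inode_is_indexed_dir)
{
	uint16_t lo = (uint16_t)nlink;
	uint16_t hi = (uint16_t)(nlink >> SKETCH_LINKS_HI_SHIFT);
	uint32_t back = lo;

	if (inode_is_indexed_dir)
		back |= (uint32_t)hi << SKETCH_LINKS_HI_SHIFT;
	return back;
}
```

On an indexed-dirs filesystem a regular file at 65535 links passes the limit check, yet its 65536th link round-trips to 0, because a file's dinode never carries the per-inode indexed-dir flag. Without indexed dirs the 32000 limit keeps the high half at zero, which is why the bug stays hidden there.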
Alexey Asemov (Alex/AT)
2022-Nov-21 16:15 UTC
The modification I mentioned does fix the issue (well, obviously).

Reproducing it is trivial as well:

- mkfs with the indexed-dirs option (otherwise the low hardlink limit applies)
- Create a file
- Create 65535 hardlinks to it; stat will now show 65536 hardlinks (that comes from the local inode cache)
- Clear the inode cache (for me, echo 3 > /proc/sys/vm/drop_caches is enough) or reboot
- stat again and you will see a hardlink count of 0, which is the result of the actual inconsistency
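The link-creation steps above can be scripted; below is a hedged C sketch using the POSIX link() and stat() calls. make_hardlinks() and its <target>.<n> naming scheme are hypothetical, and the mkfs, cache-drop and second stat steps are still done from the shell as listed. The idea is to run it against a file on the OCFS2 mount with count = 65535.

```c
/*
 * Hypothetical helper for the "create a file, add 65535 hard links" steps:
 * creates `count` extra hard links to `target`, named <target>.1 ..
 * <target>.<count>, then returns st_nlink as reported by stat(), or -1 on
 * error. Assumes `target` already exists (e.g. on the OCFS2 mount).
 */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static long make_hardlinks(const char *target, unsigned long count)
{
	char name[4096];
	struct stat st;

	for (unsigned long i = 1; i <= count; i++) {
		int n = snprintf(name, sizeof(name), "%s.%lu", target, i);

		if (n < 0 || (size_t)n >= sizeof(name))
			return -1;              /* encoding error or path too long */
		if (link(target, name) != 0) {
			perror("link");         /* e.g. EMLINK once a limit is hit */
			return -1;
		}
	}
	if (stat(target, &st) != 0) {
		perror("stat");
		return -1;
	}
	return (long)st.st_nlink;       /* served from the inode cache here */
}
```

If all 65535 links are created, the call should report 65536 straight away; per the report, the wrong count of 0 only appears after the inode is re-read from disk once the caches are dropped.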