thr3ads.net - rsync - [Bug 10925] New: non-atomic xattr replacement in btrfs => rsync --read-batch random errors [Nov 2014]

If this information is useful, please help other people find it:
Share via:

samba-bugs at samba.org

2014-Nov-07 01:38 UTC

[Bug 10925] New: non-atomic xattr replacement in btrfs => rsync --read-batch random errors

https://bugzilla.samba.org/show_bug.cgi?id=10925

            Bug ID: 10925
           Summary: non-atomic xattr replacement in btrfs => rsync
                    --read-batch random errors
           Product: rsync
           Version: 3.1.0
          Hardware: All
               URL: http://article.gmane.org/gmane.comp.file-systems.btrfs
                    /40013
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: wayned at samba.org
          Reporter: oliva at gnu.org
        QA Contact: rsync-qa at samba.org

I tried to cross-post this by email to btrfs and rsync lists, but the latter
bounced because I'm not subscribed, so here's a copy of the post there,
with
additional info at the end, that I forgot to put in the initial message, but
that I posted to the former as a followup:

A few days ago, I started using rsync batches to archive old copies of ceph OSD
snapshots for certain kinds of disaster recovery.  This seems to exercise an
unexpected race condition in rsync, which happens to expose what appears to be
a race condition in btrfs, causing random scary but harmless errors when
replaying the rsync batches.

strace has revealed that the two rsync processes running concurrently to apply
the batch both attempt to access xattrs of the same directory concurrently.  I
understand rsync is supposed to avoid this, but something's going wrong with
that.  Here's the smoking gun, snipped from strace -p 27251 -p 27253 -o
smoking.gun, where both processes are started from a single rsync
--read-batch=- -aHAX --del ... run:

0: 27251 stat("osd/0.6ed_head/DIR_D/DIR_E/DIR_6",  <unfinished
...>
1: 27253 stat("osd/0.6ed_head/DIR_D/DIR_E/DIR_6",
{st_mode=S_IFDIR|0755,
st_size=5470, ...}) = 0
2: 27251 <... stat resumed> {st_mode=S_IFDIR|0755, st_size=5470, ...}) = 0
3: 27253 llistxattr("osd/0.6ed_head/DIR_D/DIR_E/DIR_6",
"user.cephos.phash.contents\0", 1024) = 27
4: 27251 llistxattr("osd/0.6ed_head/DIR_D/DIR_E/DIR_6", 
<unfinished ...>
5: 27253 lsetxattr("osd/0.6ed_head/DIR_D/DIR_E/DIR_6",
"user.cephos.phash.contents",
"\x01F\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x03\x00\x00", 17, 0
<unfinished ...>
6: 27251 <... llistxattr resumed>
"user.cephos.phash.contents\0", 1024) = 27
7: 27251 lgetxattr("osd/0.6ed_head/DIR_D/DIR_E/DIR_6",
"user.cephos.phash.contents", 0x0, 0) = -1 ENODATA (No data available)
8: 27253 <... lsetxattr resumed> )         = 0
9: 27253 utimensat(AT_FDCWD, "osd/0.6ed_head/DIR_D/DIR_E/DIR_6",
{UTIME_NOW,
{1407992261, 0}}, AT_SYMLINK_NOFOLLOW) = 0
a: 27251 write(2, "rsync: get_xattr_data: lgetxattr"..., 181) = 181

lines 0-2, 3-6 and 5-8, show concurrent access of both rsync processes to the
same directory.  This wouldn't be a problem, not even for replaying batches,
for the lsetxattr would put the intended xattr value in there regardless of
whether the scanner saw the xattr value before or after that.


What makes the problem visible is that btrfs appears to have a race in its
handling of xattr replacement, leaving a window between the removal of the old
value and the insertion of the new one, as shown by lines 5-8.  line 3 show the
attribute existed before, and lines 5-8 show it disappears in line 7, while
lsetxattr still runs to replace it.

If rsync tries hard enough to hit this window, the lgetxattr concurrent to the
lsetxattr eventually hits, and then rsync reports an error:

rsync: get_xattr_data:
lgetxattr(""/media/px/snapshots/cluster/20141102-to-20140816/osd/0.6ed_head/DIR_D/DIR_E/DIR_6"","user.cephos.phash.contents",0)
failed: No data available (61)

At the end, it exits with a nonzero status, even though nothing really wrong
went on and the tree ended up looking just as it was supposed to.


Now, I'm a bit concerned because the btrfs race condition, if exercised on
security-related xattrs or ACLs, could cause data to become visible that
shouldn't, which could turn this into a locally exploitable security issue. 
Sure enough nobody goes nuts repeatedly changing the ACLs of a dir or file
containing information that should be guarded by it, so as to increase the
likelihood that an attacker succeeds in accessing the data, but still...  I
don't think the temporary removal of the xattr for subsequent insertion
should
be visible at all.

I'm sorry for reporting a potential security issue like that, but by the
time
it occurred to me that it might have potential security implications, I'd
already mentioned the problem on #btrfs at FreeNode, so the horse was out of
the barn already :-(


The bugs described above occurred with rsync-3.1.0-5.fc20.x86_64 and
kernel-libre-3.16.7-200.fc20.gnu.x86_64.  The btrfs code in kernel-libre is
unchanged from the corresponding Fedora kernel.  The distro is BLAG
200k/x86_64, under development.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

samba-bugs at samba.org

2014-Nov-19 16:21 UTC

head link

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

https://bugzilla.samba.org/show_bug.cgi?id=10925

--- Comment #1 from roland <devzero at web.de> ---
interesting find, if btrfs has xattr races, but the question is how to produce
an appropriate repro case.

on a tiny btrfs here on my debian wheezy system, i did some massive parallel
run of 

"setfattr -h -n user.myattr -v attrval dir" in one window, 
whereas doing
getfattr -h -n user.myattr dir  in another window, 

and would have expected to seem some errors. i didn`t. as this is the default
kernel, maybe the race is being introduced in a later version.

furthermore - what specific problem (security or whatever) do you see here with
rsync, if there is a race window between setting and reading an xattr (which
seems is not to be expected behaviour of a filesystem) ?

also see:
https://btrfs.wiki.kernel.org/index.php/Main_Page
If you have any bug, problems, performance issues or questions while using
Btrfs, please email the Btrfs mailing list (no subscription required). Please
report bugs also on Bugzilla.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

samba-bugs at samba.org

2014-Nov-19 17:03 UTC

head link

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

https://bugzilla.samba.org/show_bug.cgi?id=10925

--- Comment #2 from roland <devzero at web.de> ---
it has been resolved already - two days after this report !!!

http://marc.info/?l=linux-btrfs&m=141552257121836&w=2

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

samba-bugs at samba.org

2014-Nov-22 05:13 UTC

head link

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

https://bugzilla.samba.org/show_bug.cgi?id=10925

--- Comment #3 from Alexandre Oliva <oliva at gnu.org> ---
Yeah, the btrfs bug is now fixed, so now the rsync --read-batch misbehavior
changed from bad error reports to silent waste of cpu cycles: it doesn't
make
sense for rsync to test and set xattrs concurrently on the same files: it
should either test them in the scanner process BEFORE telling the changer
process whether xattr changes need to be applied in the first place, OR it
should check them AFTER the changer has made the changes.  As it stands, the
check is wasteful: it is either ignored altogether, or it may see xattrs before
they are changed and conclude the changes have to be made again.  Either way,
we waste cpu cycles, syscalls, and even disk space, if we modify a
copy-on-write snapshot overwriting xattrs with the values they already have,
unnecessarily touching inodes' ctimes in the process.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

samba-bugs at samba.org

2014-Nov-22 11:31 UTC

head link

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

https://bugzilla.samba.org/show_bug.cgi?id=10925

--- Comment #4 from roland <devzero at web.de> ---
yes, you are right. thanks for pointing it out. 

wouldn`t it make sense to close this one and make  a clean, new bugzilla entry
with a proper title what the problem is about - severity/importance
"enhancement" ?

what this report is about is not obvious for any reader without reading deeper
into it - and in the end it`s not related to btrfs at all, besides the fact
that btrfs misbehaviour helped finding this issue.

so i would make another bugzilla entry with short/proper description about
what`s the problem and point to this one for reference

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

Possibly Parallel Threads

Search for more maybe matching threads

rsync - Nov 2014 - [Bug 10925] New: non-atomic xattr replacement in btrfs => rsync --read-batch random errors

[Bug 10925] New: non-atomic xattr replacement in btrfs => rsync --read-batch random errors

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

[Bug 10925] non-atomic xattr replacement in btrfs => rsync --read-batch random errors

Possibly Parallel Threads