Erik Schilling
2023-Sep-01 10:37 UTC
[BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
> Hi all!
>
> Some days ago I posted to #virtiofs:matrix.org, describing that I am
> observing what looks like a corruption when executing programs from a
> virtiofs-based filesystem.
>
> Over the last few days I spent more time drilling into the problem.
> This is an attempt at summarizing my findings in order to see what
> other people think about this.
>
> When running binaries mounted from virtiofs they may either: fail with
> a segfault, fail with badaddr, get stuck or - sometimes - succeed.
>
> Environment:
>   Host: Fedora 38 running 6.4.11-200.fc38.x86_64
>   Guest: Yocto-based image: 6.4.9-yocto-standard, aarch64
>   virtiofsd: latest main + some debug prints [1]
>   QEMU: built from recent git [2]
>
> virtiofsd invocation:
>   RUST_LOG="debug" ./virtiofsd --seccomp=none --sandbox=none \
>     --socket-path "fs.sock0" --shared-dir $PWD/share-dir/ --cache=never
>
> QEMU invocation:
>   ~/projects/qemu/build/qemu-system-aarch64 -kernel Image -machine virt \
>     -cpu cortex-a57 \
>     -serial mon:stdio \
>     -device virtio-net-pci,netdev=net0 \
>     -netdev user,id=net0,hostfwd=tcp::2223-:22 \
>     -display none -m 2048 -smp 4 \
>     -object memory-backend-memfd,id=mem,size=2048M,share=on \
>     -numa node,memdev=mem \
>     -hda trs-overlay-guest.qcow2 \
>     -chardev socket,id=char0,path="fs.sock0" \
>     -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=/dev/root \
>     -append 'root=/dev/vda2 ro log_buf_len=8M'
>
> I figured that launching virtiofsd with --cache=always masks the
> problem. Therefore, I set --cache=never, but I think I observed no
> difference compared to the default setting (auto).
>
> Adding logging to virtiofsd and the kernel _felt_ like it made the
> problem harder to reproduce - leaving me with the impression that some
> race is happening somewhere.
>
> Trying to rule out that virtiofsd is returning corrupted data, I added
> some logging and hash-sum calculation hacks to it [1]. The hashes check
> out across multiple accesses, and the order and kind of queued messages
> are exactly the same in both the error case and the crash case. fio was
> also unable to find any errors with a naive job description [3].
>
> Next, I tried to capture info on the guest side. This became a bit
> tricky since the crashes became pretty rare once I followed a fixed
> pattern of starting log capture, running perf and trying to reproduce
> the problem. Ultimately, I had the most consistent results with
> immediately running a program twice:
>
>   /mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /; \
>   /mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /
>
> (/mnt being the virtiofs mount)
>
> For collecting logs, I made a hack to the guest kernel in order to dump
> the page content after receiving the virtiofs responses [4].
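For context, the guest-side hack referenced in [4] amounts to hex-dumping the page a virtiofs reply was copied into, so the guest-visible bytes can be compared against what virtiofsd logged on the host. The snippet below is only a sketch of that idea - the helper name and call site are illustrative, not the actual patch (see the linked commit) - but it produces output in the same shape as the "virtio_fs rsp:" lines quoted below:

  #include <linux/printk.h>
  #include <linux/mm_types.h>

  /*
   * Illustrative debug helper: after a FUSE reply has been copied into
   * its destination page, dump the page pointer, the mapped address and
   * the received bytes, 16 bytes per row with an ASCII column.
   */
  static void virtio_fs_dump_rsp_page(struct page *page, const void *to,
                                      size_t len)
  {
          pr_info("virtio_fs: page: %p\n", page);
          pr_info("virtio_fs: to: %p\n", to);
          print_hex_dump(KERN_INFO, "virtio_fs rsp:", DUMP_PREFIX_NONE,
                         16, 1, to, len, true);
  }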
> Reproducing the problem with this leaves me with logs that seem to
> suggest that virtiofsd is returning identical content, but the guest
> kernel seems to receive differing pages:
>
> good-kernel [5]:
> kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x312 nodeid 0x1 in.len 56 out.len 104
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x314 nodeid 0x1 in.len 53 out.len 128
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x316 nodeid 0x29 in.len 56 out.len 104
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x318 nodeid 0x29 in.len 48 out.len 16
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x31a nodeid 0x29 in.len 80 out.len 832
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs: page: 000000006996d520
> kernel: virtio_fs: to: 00000000de590c14
> kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [...]
>
> bad-kernel [6]:
> kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x162 nodeid 0x1 in.len 56 out.len 104
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x164 nodeid 0x1 in.len 53 out.len 128
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x166 nodeid 0x16 in.len 56 out.len 104
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x168 nodeid 0x16 in.len 48 out.len 16
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x16a nodeid 0x16 in.len 80 out.len 832
> kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
> kernel: virtio_fs: page: 000000006ce9a559
> kernel: virtio_fs: to: 000000007ae8b946
> kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> kernel: virtio_fs rsp:80 40 de c8 ff ff 00 00 cc 2b 62 ae ff ff 00 00  .@.......+b.....
> kernel: virtio_fs rsp:02 4e de c8 ff ff 00 00 00 00 00 00 00 00 00 00  .N..............
> [...]
>
> When looking at the corresponding output from virtiofsd, it claims to
> have returned identical data:
>
> good-virtiofsd [7]:
> [DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=41, unique=794, pid=481
> [src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
>     fh: 31,
>     offset: 0,
>     size: 832,
>     read_flags: 2,
>     lock_owner: 6838554705639967244,
>     flags: 131072,
>     padding: 0,
> }
> [src/file_traits.rs:161] hash = 2308490450751364994
> [DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 794 }
>
> bad-virtiofsd [8]:
> [DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=22, unique=362, pid=406
> [src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
>     fh: 12,
>     offset: 0,
>     size: 832,
>     read_flags: 2,
>     lock_owner: 6181120926258395554,
>     flags: 131072,
>     padding: 0,
> }
> [src/file_traits.rs:161] hash = 2308490450751364994
> [DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 362 }
>
> The "corruption" only seems to happen in this one page; all other pages
> are identical between runs (except that the bad run terminates earlier).
>
> What do the experts think here? To me it feels a bit like some kind of
> corruption is going on. Or am I misinterpreting things here?
>
> Which further analysis steps would you suggest?
>
> Further notes:
>
> After collecting the above results, I realized that running the guest
> with -smp 1 makes the problems a lot worse. So maybe that is a better
> choice when trying to reproduce it.
>
> Repo with my scripts is available at:
> https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/
>
> The scripts are just quick and dirty implementations and are not
> particularly portable.

Summary of my testing during the last few days:

Testing with KCSAN revealed a few cases that look like missing READ_ONCE
annotations (will send patches separately), but none of that was related
to the immediate problem. I tried instrument_read() and another round of
logging with a delay added to virtio_fs_request_complete(). It looks like
the buffer gets corrupted before entering that function: neither KCSAN
nor manual sleeps + prints showed any corruption while in that function.

KASAN did not report any issues.

Patching virtiofsd to do an additional copy and going through rust-vmm's
.copy_to() function did not change the behaviour.

I will mostly be off next week, will continue analysis afterwards. Happy
to hear suggestions for other things to try :).

Good weekend,
- Erik

>
> - Erik
>
> [1] https://gitlab.com/ablu/virtiofsd/-/commit/18fd0c1849e15bc55fbdd6e1f169801b2b03da1f
> [2] https://gitlab.com/qemu-project/qemu/-/commit/50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4
> [3] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/397a6310dea35973025e3d61f46090bf0c092762/share-dir/write-and-verify-mmap.fio
> [4] https://github.com/Ablu/linux/commit/3880b9f8affb01aeabb0a04fe76ad7701dc0bb95
> [5] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-drop-bad-1.txt
> [6] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-bad-1.txt
> [7] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L2538-2549
> [8] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L1052-1063
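Background on the READ_ONCE remark above: KCSAN reports plain (unmarked) concurrent accesses to shared variables, and the usual fix is to mark the racy access so the compiler cannot tear, fuse or hoist it. The following is only a generic illustration of that pattern, with hypothetical names, not the actual patches mentioned above:

  #include <linux/compiler.h>   /* READ_ONCE / WRITE_ONCE */
  #include <linux/types.h>

  struct example {
          bool done;      /* written by one context, polled by another */
  };

  /* Racy: a plain read that KCSAN would flag as a data race. */
  static bool poll_done_plain(struct example *e)
  {
          return e->done;
  }

  /* Annotated: the marked accesses make the intentional race explicit. */
  static bool poll_done_marked(struct example *e)
  {
          return READ_ONCE(e->done);
  }

  static void set_done(struct example *e)
  {
          WRITE_ONCE(e->done, true);
  }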
Erik Schilling
2023-Sep-13 06:18 UTC
[BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
On Fri Sep 1, 2023 at 12:37 PM CEST, Erik Schilling wrote:
> On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
> > [...]
>
> [...]
>
> I will mostly be off next week, will continue analysis afterwards. Happy
> to hear suggestions for other things to try :).

Back from a week of vacation...

Summary of what was discussed on #virtiofs:matrix.org:

The issue only seems to happen in QEMU TCG scenarios (I tested aarch64
and x86_64 on x86_64, wizzard on Matrix tested arm32). CCing qemu-devel.
Maybe someone has some hints on where to focus the debugging efforts?

I am trying to build a complex monster script that traces the relevant
addresses in order to figure out whether the guest or the host does the
writes. But I am happy to hear about more clever ideas :).
- Erik
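Since the problem only reproduces under TCG, one way to attribute the stray writes might be a small TCG plugin that logs every guest store hitting the suspect guest-physical page: if the page changes without any such store being logged, the write presumably came from the host/vhost-user side rather than from guest code. The sketch below follows the style of the examples in QEMU's contrib/plugins; the watched address is a placeholder and the exact plugin API details should be double-checked against the qemu-plugin.h of the QEMU tree in use:

  #include <stdio.h>
  #include <stdint.h>
  #include <inttypes.h>
  #include <qemu-plugin.h>

  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

  /* Placeholder: guest-physical page to watch (taken from the debug logs). */
  static uint64_t watch_base; /* e.g. 0x40000000 */
  static uint64_t watch_size = 0x1000;

  /* Called for every instrumented guest store. */
  static void mem_cb(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
                     uint64_t vaddr, void *userdata)
  {
      struct qemu_plugin_hwaddr *hw = qemu_plugin_get_hwaddr(info, vaddr);
      uint64_t paddr;
      char msg[128];

      if (!hw || qemu_plugin_hwaddr_is_io(hw)) {
          return;
      }
      paddr = qemu_plugin_hwaddr_phys_addr(hw);
      if (paddr >= watch_base && paddr < watch_base + watch_size) {
          snprintf(msg, sizeof(msg),
                   "guest store: vcpu=%u vaddr=0x%" PRIx64 " paddr=0x%" PRIx64 "\n",
                   vcpu_index, vaddr, paddr);
          qemu_plugin_outs(msg);
      }
  }

  /* Register a write-only memory callback on every translated instruction. */
  static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
  {
      size_t i, n = qemu_plugin_tb_n_insns(tb);

      for (i = 0; i < n; i++) {
          struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
          qemu_plugin_register_vcpu_mem_cb(insn, mem_cb,
                                           QEMU_PLUGIN_CB_NO_REGS,
                                           QEMU_PLUGIN_MEM_W, NULL);
      }
  }

  QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                             const qemu_info_t *info,
                                             int argc, char **argv)
  {
      qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
      return 0;
  }

A plugin along these lines would be loaded with "-plugin ./watch-paddr.so" on the QEMU command line, with its output routed through the plugin log ("-d plugin").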