Dear all,
I already sent this email, but I mistakenly included attachments, so the
first message was rejected. Apologies if you already read the follow-up
with the ZFS properties.
I'm sending it again, using external hosting for the files where needed.
I recently ran into an issue with Samba 4.7.6-ubuntu (Ubuntu 18.04.5 LTS,
with the latest updates).
I'm not sure what caused the issue: around the same time I updated
Ubuntu (apt upgrade) and, separately, updated the ZFS kernel module to
2.0.4 from whatever version was available in the middle of last year.
Also, some time passed before I noticed the problem.
The issue manifests itself as follows: when I try to open some files
(not all of them) from any SMB share (I tried two different Windows 10
machines and one macOS 10.14 machine), the smbd3 process goes to 100%
CPU, it doesn't seem to deliver the file, and I have to kill it the hard
way, with "kill -9". I found that sometimes, after I kill smbd3, Windows
receives the file I was trying to open, but this doesn't always work:
sometimes the file is corrupted (truncated).
I tested the same files locally over SSH and they have no issues
whatsoever: I can read and copy them without problems, so I don't think
it's a hardware issue.
I run InfluxDB on the same server to track disk and system load, and I
can see that whenever smbd goes haywire, the ZFS cache statistics jump
from about 90 requested read ops/s to about 250K requested read ops/s;
thanks to caching, less than 1 op/s actually reaches the disk.
You can find the ZFS cache stats graph here: https://postimg.cc/ns315Kkn
I set the log level in smb.conf to 3, deleted all the existing logs, and
restarted smbd and nmbd. I then opened a file that I know triggers the
lock-up, and once I had confirmed the issue, I killed smbd3 and copied
the logs.
You can find the documents here:
smb.conf https://pastebin.com/SsVBhsLY
log.192.168.3.2 https://pastebin.com/uFqLEhVH
log.desktop-eiu2a2q https://pastebin.com/ReKn0qx6
log.nmbd https://pastebin.com/gMsvyqnB
log.smbd https://pastebin.com/e3PVjnZu
log. https://pastebin.com/BjutvvQv
Below is the output of "ls -l" and "getfacl" for a file that I know
causes the issue. I'm not posting the same information for other files
simply because their attributes are identical: there is no apparent
difference between the properties of unaffected and problematic files.
root@ml110g7# ls -l CI\ 2017.pdf
-rwxrw---- 1 olaf olaf 1570923 Feb 15 21:49 'CI 2017.pdf'
root@ml110g7# getfacl CI\ 2017.pdf
# file: CI 2017.pdf
# owner: olaf
# group: olaf
user::rwx
group::rw-
other::---
Here are the ZFS dataset properties that may be relevant; the remaining
ones are essentially at their defaults:
NAME            PROPERTY            VALUE
tank/home/olaf  atime               off
tank/home/olaf  aclmode             discard
tank/home/olaf  aclinherit          restricted
tank/home/olaf  xattr               sa
tank/home/olaf  sharesmb            off
tank/home/olaf  acltype             posix
tank/home/olaf  relatime            off
tank/home/olaf  redundant_metadata  all
Can someone please help me debug the issue?
I hope I've provided all the necessary information, but I can post more
if I missed anything.
Thanks in advance,
Olaf Marzocchi