Nico Kadel-Garcia
2021-Aug-01 22:22 UTC
[Samba] rsync copy operation fails on a CIFS mount
On Sun, Aug 1, 2021 at 11:14 AM SZIGETVÁRI János via samba <samba at lists.samba.org> wrote:
>
> Dear Members,
>
> Do you happen to have any recommendations on where I should begin?
>
> Best Regards,
> János

rsync, on top of CIFS, on top of the limitations of Windows filesystems, is a flipping adventure. I'd split the task into multiple rsync runs, each with its own source and matching target directory, to see whether it's the size of the task or specific files that trigger the issue. Overpopulated directories, with many thousands of files in a single directory, are a notorious issue.

In particular, if the filesystem is live and in use by other tools, files are likely to disappear during the synchronization. Running two "rsync -a --delete" at once to the same target directory can create this sort of adventure. Reducing the rsync migration sizes may help. And rsync is supposed to be pretty idempotent, so with normal configurations you should be able to simply run the same command twice and let it continue from where it failed, without manual tuning.

Nico Kadel-Garcia

> SZIGETVÁRI János <jszigetvari at gmail.com> wrote (on Fri, Jul 30, 2021, 12:52):
>
> > Dear Members,
> >
> > I work for a company that (among others) sells Ubuntu-based log storage
> > appliances. I ran into a problem where I'm trying to copy a large amount
> > of data over a CIFS mount from an Ubuntu 18.04 based appliance to a
> > Windows 2012 R2 Storage Server, and rsync fails one to three hours into
> > the copy operation with something like:
> > rsync: failed to set times on "FILENAME.U5EgGX": No such device (19)
> >
> > and I see a number of kernel logs just prior to that, which look like this:
> >
> > kernel: [3819786.441711] CIFS VFS: No task to wake, unknown frame received! NumMids 1
> > kernel: [3819786.441717] 00000000: 6c000000 424d53fe 00000040 00000000 ...l.SMB@.......
> > kernel: [3819786.441718] 00000010: 00000012 00000001 00000000 ffffffff ................
> > kernel: [3819786.441720] 00000020: ffffffff 00000000 00000000 00000000 ................
> > kernel: [3819786.441721] 00000030: 00000000 00000000 00000000 00000000 ................
> > kernel: [3819786.441722] 00000040: 00000000
> >
> > Who should I turn to in order to get to the bottom of this problem?
> > Should I file a bug for the Samba project? Or is it the kernel code that
> > may be affected? If that is the case, who should I turn to?
> >
> > I also tried to google around for a while, and I found the exact same
> > packet hexdump on the linux-cifs mailing list from 2012:
> > https://www.spinics.net/lists/linux-cifs/msg06634.html
> >
> > In that thread the person reporting the problem failed to reproduce the
> > problem a few weeks after reporting it.
> > There it was recommended to try mounting the share with SMB v1, but that
> > is out of the question nowadays.
> >
> > We tried forcing the mount to happen with vers=3 and 3.02, but it made no
> > difference.
> >
> > Best Regards,
> > János Szigetvári
> > --
> > Janos SZIGETVARI
> > RHCE, License no. 150-053-692
> > <https://www.redhat.com/rhtapps/verify/?certId=150-053-692>
> >
> > LinkedIn: linkedin.com/in/janosszigetvari
> >
> > Make the switch to open (source) applications, protocols, formats now:
> > - windows -> Linux, iexplore -> Firefox, msoffice -> LibreOffice
> > - msn -> jabber protocol (Pidgin, Google Talk)
> > - mp3 -> ogg, wmv -> ogg, jpg -> png, doc/xls/ppt -> odt/ods/odp
> >
> --
> To unsubscribe from this list go to the following URL and read the
> instructions: https://lists.samba.org/mailman/options/samba
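
Nico's suggestion to split the job into one rsync run per directory, and to simply rerun a failed run, could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `build_rsync_commands` and `run_with_retry`, the number of attempts, and any paths passed in are hypothetical and not taken from the appliance. Only the `rsync -a` invocation itself comes from the thread.

```python
import subprocess
from pathlib import Path


def build_rsync_commands(source_root, dest_root):
    """Return one rsync argv list per immediate subdirectory of source_root.

    Hypothetical helper: one run per directory makes it easier to see whether
    a failure follows a specific directory or the overall size of the job.
    """
    commands = []
    for sub in sorted(Path(source_root).iterdir()):
        if sub.is_dir():
            # Trailing slashes: copy the directory's contents into the
            # matching target directory rather than nesting it one level down.
            commands.append(
                ["rsync", "-a", f"{sub}/", f"{Path(dest_root) / sub.name}/"]
            )
    return commands


def run_with_retry(argv, attempts=2, runner=subprocess.run):
    """Run argv up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or 0 if all failed.
    Rerunning relies on rsync skipping files that are already up to date.
    """
    for attempt in range(1, attempts + 1):
        if runner(argv).returncode == 0:
            return attempt
    return 0
```

A caller would loop over `build_rsync_commands(src, dst)` and record which directories needed more than one attempt (or failed outright), which helps narrow down whether particular files or the elapsed time on the mount correlate with the error.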
Hi Rowland and Nico,

I wasn't at all saying that the problem is rsync related. I just included its output and the error string for context. To us it looks as if something goes wrong between the kernel CIFS code and the remote Windows server providing the share, a few hours into the copy operation.

Unfortunately this is an appliance we are talking about, and we can't easily modify subcomponents to work entirely differently than before.

Regarding the directory structure of the data we copy: we usually aim to move several directories every day, with each directory containing about 150 GB of data, made up of one larger file (around 100 GB in size) and 1000+ smaller files of less than 100 MB each. Either way, we are far from tens of thousands of files per directory.

The source files that are being moved are not in active use. We are talking about an archive job for files that are no longer being written to. And on the destination side, the only purpose of the storage server is to serve as off-appliance storage space, so neither the host itself nor any other clients access the files on the destination side while they are being copied, or even afterwards. The archive jobs use locks to ensure that only one is running at a time, and they execute sequentially if multiple jobs queue up.

So all in all, I would kindly ask for tips on how to troubleshoot this problem, and on whether a bug report should be opened for Samba or for the kernel CIFS code, or whether I should ask for help on one of the other samba-* mailing lists.

Thank you!

Best Regards,
János Szigetvári
--
Janos SZIGETVARI
RHCE, License no. 150-053-692
<https://www.redhat.com/rhtapps/verify/?certId=150-053-692>

LinkedIn: linkedin.com/in/janosszigetvari

Make the switch to open (source) applications, protocols, formats now:
- windows -> Linux, iexplore -> Firefox, msoffice -> LibreOffice
- msn -> jabber protocol (Pidgin, Google Talk)
- mp3 -> ogg, wmv -> ogg, jpg -> png, doc/xls/ppt -> odt/ods/odp
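
When deciding where to report the problem, it may help to decode the frame that the kernel dumped in the original report. The sketch below is a reading aid, not a diagnosis. It assumes the kernel printed each 32-bit word in little-endian host order (the ASCII column, `...l.SMB@`, is consistent with that) and that the frame starts with a 4-byte transport length followed by a standard 64-byte SMB2 header as laid out in MS-SMB2. The function names and dictionary keys are made up for illustration.

```python
import struct

# The 32-bit words exactly as they appear in the kernel log lines.
DUMP_WORDS = """
6c000000 424d53fe 00000040 00000000
00000012 00000001 00000000 ffffffff
ffffffff 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000
""".split()


def words_to_bytes(words):
    # Assumption: each printed word is a little-endian view of 4 wire bytes,
    # so packing it back as little-endian restores the original byte order.
    return b"".join(struct.pack("<I", int(w, 16)) for w in words)


def parse_smb2_header(raw):
    """Decode the transport length and the first 32 bytes of the SMB2 header."""
    # Transport framing: a zero byte plus a 3-byte big-endian length, read
    # here as one 4-byte big-endian integer (same value while byte 0 is 0).
    (length,) = struct.unpack(">I", raw[:4])
    (protocol_id, structure_size, credit_charge, status, command,
     credits, flags, next_command, message_id) = struct.unpack(
        "<4sHHIHHIIQ", raw[4:36]
    )
    return {
        "length": length,
        "protocol_id": protocol_id,
        "structure_size": structure_size,
        "credit_charge": credit_charge,
        "status": status,
        "command": command,
        "credits": credits,
        "flags": flags,
        "next_command": next_command,
        "message_id": message_id,
    }
```

Calling `parse_smb2_header(words_to_bytes(DUMP_WORDS))` yields the protocol ID `\xfeSMB`, command `0x0012`, flags `0x00000001` and a message ID of `0xFFFFFFFFFFFFFFFF`. In MS-SMB2, command `0x0012` is OPLOCK_BREAK, and an all-ones message ID is what a server uses for an unsolicited break notification. If that reading is right, the "unknown frame" may be a break notification from the Windows server that the kernel client could not match to a pending request. Since the "CIFS VFS" message comes from the kernel client, that would point more toward the linux-cifs list than toward the Samba server code, though this is an inference from a single frame.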