Hi Rowland and Nico,
I did not mean to suggest that the problem is rsync-related. I only included
its output and the error string for context.
To us, it looks as if something goes wrong between the kernel CIFS code and
the remote Windows server providing the share, a few hours into the copy
operation.
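For what it is worth, the header bytes in the kernel log quoted further down
can be decoded by hand. Here is a minimal Python sketch, assuming the dump
prints little-endian 32-bit words (which matches its ASCII column) and using
the SMB2 header layout from the MS-SMB2 specification. The names DUMP_WORDS
and decode_smb2_dump are made up for illustration.

```python
import struct

# Words copied from the kernel log quoted further down (offsets 0x00-0x40).
DUMP_WORDS = """
6c000000 424d53fe 00000040 00000000
00000012 00000001 00000000 ffffffff
ffffffff 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000
""".split()


def decode_smb2_dump(words):
    """Decode the transport length prefix and the 64-byte SMB2 header."""
    # Assumption: each word was printed from little-endian memory, so
    # packing it back as "<I" restores the original byte order.
    raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)
    # The first 4 bytes are the transport header: a zero byte followed by
    # the 3-byte big-endian length of the SMB2 message.
    length = int.from_bytes(raw[1:4], "big")
    (protocol_id, structure_size, credit_charge, status, command, credits,
     flags, next_command, message_id, reserved, tree_id, session_id,
     signature) = struct.unpack("<4sHHIHHIIQIIQ16s", raw[4:68])
    return {
        "length": length,
        "protocol_id": protocol_id,
        "structure_size": structure_size,
        "status": status,
        "command": command,
        "flags": flags,
        "message_id": message_id,
        "tree_id": tree_id,
        "session_id": session_id,
    }


if __name__ == "__main__":
    for key, value in decode_smb2_dump(DUMP_WORDS).items():
        shown = hex(value) if isinstance(value, int) else value
        print(f"{key:15} {shown}")
```

If I read the result correctly, the frame is a server-to-client message with
command 0x12 (SMB2 OPLOCK_BREAK) and MessageId 0xFFFFFFFFFFFFFFFF, which is
what an unsolicited oplock or lease break notification looks like. The
length of 108 bytes would leave 44 bytes after the header, which might point
to a lease break, but I may well be misinterpreting this.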
Unfortunately this is an appliance we are talking about, and we can't
easily modify subcomponents to work entirely differently than before.
Regarding the directory structure of the data we copy: we usually aim to
move a few to several directories every day. Each directory holds about
150 GB of data: one large file (around 100 GB) and 1000+ smaller files of
less than 100 MB each. Either way, we are far from having tens of thousands
of files per directory.
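To double-check these figures on a given directory, something along the
following lines could be used. It is only a sketch: the function name
summarize_dir is hypothetical, and it simply walks the tree and adds up
file sizes.

```python
import os


def summarize_dir(path):
    """Return (file_count, total_bytes, largest_bytes) for one directory tree."""
    count = total = largest = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat so that symlinks are measured, not followed
                size = os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            count += 1
            total += size
            largest = max(largest, size)
    return count, total, largest


if __name__ == "__main__":
    import sys

    # Usage: pass one or more directories to be archived on the command line.
    for directory in sys.argv[1:]:
        n, total, largest = summarize_dir(directory)
        print(f"{directory}: {n} files, {total} bytes, largest {largest} bytes")
```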
The source files that are being moved are not in active use. We are talking
about an archive job for files that are not being written to any longer.
On the destination side, the storage server's only purpose is to serve as
off-appliance storage space, so neither the host itself nor any other client
accesses the files there while they are being copied, or even afterwards.
The archive jobs use locks to ensure that only one is running at a time,
and execute sequentially if multiple jobs queue up.
So all in all, I would kindly ask for tips on how to troubleshoot this
problem, and whether a bug report should be opened for Samba or for the
kernel CIFS code, or whether I should ask for help on one of the other
samba-* mailing lists.
Thank you!
Best Regards,
János Szigetvári
--
Janos SZIGETVARI
RHCE, License no. 150-053-692
<https://www.redhat.com/rhtapps/verify/?certId=150-053-692>
LinkedIn: linkedin.com/in/janosszigetvari
Make the switch to open (source) applications, protocols, formats now:
- windows -> Linux, iexplore -> Firefox, msoffice -> LibreOffice
- msn -> jabber protocol (Pidgin, Google Talk)
- mp3 -> ogg, wmv -> ogg, jpg -> png, doc/xls/ppt -> odt/ods/odp
Nico Kadel-Garcia <nkadel at gmail.com> wrote (on Mon, Aug 2, 2021, 0:22):
> On Sun, Aug 1, 2021 at 11:14 AM SZIGETVÁRI János via samba
> <samba at lists.samba.org> wrote:
> >
> > Dear Members,
> >
> > Do you happen to have any recommendations on where I should begin?
> >
> > Best Regards,
> > János
>
> rsync, on top of CIFS, on top of the limitations of Windows
> filesystems, is a flipping adventure. I'd split the task into multiple
> rsync source and matching directories, to see if it's the size of the
> task or specific files that are triggering the issue. Overpopulated
> directories with too many thousands of files in that one directory are
> a notorious issue.
>
> In particular, if the filesystem is live and in use by other tools,
> files are likely to disappear during the synchronization. Running two
> "rsync -a --delete" at once to the same target directory can create
> this sort of adventure. Reducing the rsync migration sizes may help.
> And rsync is supposed to be pretty idempotent, so with normal
> configurations you should be able to simply run the same command twice
> and let it continue from where it failed, without manual tuning.
>
> Nico Kadel-Garcia
>
> > SZIGETVÁRI János <jszigetvari at gmail.com> wrote (on Fri, Jul 30, 2021, 12:52):
> >
> > > Dear Members,
> > >
> > > I work for a company that (among other things) sells Ubuntu-based log
> > > storage appliances. I ran into a problem where I'm trying to copy a
> > > large amount of data from an Ubuntu 18.04 based appliance over a CIFS
> > > mount to a Windows 2012 R2 Storage Server, and rsync fails 1 to 3
> > > hours into the copy operation with something like:
> > > rsync: failed to set times on "FILENAME.U5EgGX": No such device (19)
> > >
> > > and I see a number of kernel log entries just prior to that, which
> > > look like this:
> > >
> > > kernel: [3819786.441711] CIFS VFS: No task to wake, unknown frame received! NumMids 1
> > > kernel: [3819786.441717] 00000000: 6c000000 424d53fe 00000040 00000000 ...l.SMB@.......
> > > kernel: [3819786.441718] 00000010: 00000012 00000001 00000000 ffffffff ................
> > > kernel: [3819786.441720] 00000020: ffffffff 00000000 00000000 00000000 ................
> > > kernel: [3819786.441721] 00000030: 00000000 00000000 00000000 00000000 ................
> > > kernel: [3819786.441722] 00000040: 00000000
> > > Now, who should I turn to in order to get to the bottom of this problem?
> > > Should I file a bug for the Samba project? Or is it the kernel code
> > > that may be affected? If that is the case, who should I turn to?
> > >
> > > I also googled around for a while, and I found the exact same
> > > packet hexdump on the linux-cifs mailing list from 2012:
> > > https://www.spinics.net/lists/linux-cifs/msg06634.html
> > >
> > > In that thread, the person reporting the problem could no longer
> > > reproduce it a few weeks after reporting it.
> > > There it was recommended to try mounting the share with SMB v1, but
> > > that is out of the question nowadays.
> > >
> > > We tried forcing the mount to happen with vers=3 and 3.02, but it
> > > made no difference.
> > >
> > > Best Regards,
> > > János Szigetvári