awl1
2017-Aug-18 20:54 UTC
[Samba] Friendly Reminder: Would you please comment on my findings?
Hello Andrew,

many thanks for joining this discussion! :-)

On 18.08.2017 at 21:46, Andrew Bartlett wrote:
> I do realise you are in between a rock and a hard place. You have
> identified an interesting issue, triggered by a massive protocol change
> (so not able to be bisected down to a regression) that requires
> significant work to understand and may or not be possible to resolve.

Note that I have tracked down the issue to what I believe to be the root cause, and the root cause is NOT an issue in Samba, but an issue with Microsoft's SMB2/SMB3 client, which uses highly inefficient SMB2_FIND_ID_BOTH_DIRECTORY_INFO requests in SMB2/3 as opposed to the efficient FIND_FIRST2 requests in SMB1.

The main parts of my analysis of the issue are contained here:

https://lists.samba.org/archive/samba/2017-July/209749.html
https://lists.samba.org/archive/samba/2017-July/209750.html
https://lists.samba.org/archive/samba/2017-July/209751.html

Just citing key findings for your reference:

In SMB1, the Windows client executes one FIND_FIRST2 Request for each file to be copied (i.e. in my scenario, ~ 1000 requests), returning STATUS_NO_SUCH_FILE every time before actually creating/writing to the file.

When looking at the same file in SMB2 3.1.1, the Windows client issues a different Find operation (SMB2_FIND_ID_BOTH_DIRECTORY_INFO with Pattern "*") that does not look for the particular file name that is about to be written, but seems to try and list the whole current directory's content with a pattern of "*".
Note that, looping through the 2000 files to be written in my scenario, the length of Samba's Find Response increases with every file successfully copied: when copying file number 1000, the Find Response sends back a list of all 999 files that have been successfully copied to this directory before. This list of 999 file names is not needed for any meaningful purpose, as the only goal is to check whether file number 1000 already exists in this list of 999 files (which it of course never does!).

The last such call to SMB2_FIND_ID_BOTH_DIRECTORY_INFO contained in the traces has a response length of about 64kB (containing filenames that have already been written to the target directory but are not needed/helpful in any way) and interestingly does not return "STATUS_NO_MORE_FILES", but "STATUS_INFO_LENGTH_MISMATCH", maybe because the buffer size for the result of the pattern lookup is only 64kB!?

Looking at the exact same scenario from a Linux mount.cifs vers=3.0 client reveals only four (!!!) SMB2 Find requests for the whole scenario, whereas Windows Explorer sends no fewer than 2140 SMB2 Find requests to copy ~ 1000 files to the share (1036 times "SMB2_FIND_ID_BOTH_DIRECTORY_INFO, Pattern: *" plus 1104 times "SMB2_FIND_NAME_INFO, Pattern: <file name>"), and the Windows command line "xcopy" is even worse (3741 find requests in order to copy ~ 1000 files).

While even the Linux SMB2 client is still slower than the Windows SMB1 client, I tend to think that the remaining difference from 25 seconds with SMB 1.5 in Win10 to 36 seconds with SMB2 3.0 in Linux (44%) might be tolerable...
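To put rough numbers on the findings above, here is a minimal Python sketch. It is a simplified model, not a reproduction of the traces: it assumes that every copied file is preceded by exactly one wildcard listing of the files already present, and it ignores the 64kB response cap and any "." / ".." entries. All function names are hypothetical.

```python
def entries_returned_wildcard(n_files: int) -> int:
    """Total directory entries sent back if each copy is preceded by a
    "*" listing of everything already in the target directory.

    Before file k (counting from 1) is written, k - 1 files exist, so the
    total is 0 + 1 + ... + (n_files - 1), which grows quadratically.
    """
    return sum(range(n_files))


def entries_returned_by_name(n_files: int) -> int:
    """Total entries sent back if each copy is preceded by a lookup of the
    single file name about to be written, which never exists yet."""
    return 0


def slowdown_percent(baseline_s: float, measured_s: float) -> float:
    """Relative slowdown of measured_s versus baseline_s, in percent."""
    return (measured_s - baseline_s) * 100 / baseline_s


# Find request counts quoted for Windows Explorer in the message above.
WILDCARD_FINDS = 1036
NAME_FINDS = 1104
TOTAL_FINDS = WILDCARD_FINDS + NAME_FINDS

print(entries_returned_wildcard(1000))  # entries listed under the model
print(entries_returned_by_name(1000))
print(TOTAL_FINDS)
print(slowdown_percent(25, 36))
```

Under this model, the wildcard pattern returns several hundred thousand file names in total for 1000 files, while a per-name lookup returns none, which is consistent with the growing response sizes described above.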
So IMHO I have already uncovered that it is the implementation of the Windows SMB2/SMB3 client that is faulty, and what I'd ask the Samba team is to a) verify that my assessment is correct and b) engage in raising this huge performance regression with Microsoft (because this will definitely end up nowhere when I am trying to raise this with MS as a private individual customer based on a single Win10 Pro license)...

> Have you tried to engage with Thecus on this? I know it seems odd, and
> getting to speak with an engineer who actually understands what you are
> trying to warn them might be very difficult, but they will be upgrading
> at some point and then it is a regression to them, and they may have
> the incentive to look into it. It seems like a long shot, but similar
> long shots include getting the attention of another NAS Vendor already

Engaging with Thecus on this will be rejected, as my NAS (a 2008 N4200PRO) is an EOL product. I have compiled my own version of Samba 4.6.5 and deployed it onto my NAS as an installable module, replacing the default Samba 3.5.16.

> using Samba 4.x, like NETGEAR, or as an enterprise linux customer?

As the performance regression bug is in the Windows client, even using a very recent NAS with Samba 4.x will most definitely show the exact same behaviour.

> Does this just happen on your NAS, or can you reproduce on stock Samba
> locally on a PC? Are you sure it always happened with SMB2? If you
> can find any SMB2-supporting release (early support was in 3.5 I think,
> and 3.6 had it off by default) that is not slow then bisect your way
> between that and master, it might undercover a regression (for example,
> due to our symlinks security fix).

As I have tracked it down to be a Windows SMB2 client-side issue, this will most definitely show with every Windows SMB2 client and any Samba server that speaks SMB2 or higher (i.e. versions 3.6 onwards).
I had already tried to "bisect" this very early in the process and analyze other Samba versions, as laid out here:

https://lists.samba.org/archive/samba/2017-July/209731.html
http://home.mnet-online.de/awl1/Performance%20Regression.xls
http://home.mnet-online.de/awl1/Performance%20Regression.pdf

The results for different Samba server versions were consistent, but only then (i.e. after my bisect attempts) did it become apparent that it is rather the Windows client that is to blame, and only if the protocol is SMB2...

> I hope this helps,

It will be most helpful if somebody from the Samba team (whether Jeremy, you or somebody else) can spend some time trying to understand/reproduce/assess my analysis.

If you agree with my findings, the "real" work afterwards will be to raise the issue with Microsoft and make them aware of the detrimental effects that their client-side implementation of SMB2 has on performance, compared to the exact same scenario in SMB1.

As stated before, I am very convinced that I am not the only one affected by this issue. The sad truth rather seems to be that everybody else besides me has silently accepted the poor performance in a "huge number of small files" scenario, even though it only became as poor as it is with SMB2 and was perfectly fine before with SMB1... :-(:-(:-(

Did I succeed in making myself clear enough? (Not that easy for a non-native English speaker, as the issue is rather complex...)

Many thanks for considering my request for help with this & best regards
Andreas
Jeremy Allison
2017-Aug-18 21:17 UTC
[Samba] Friendly Reminder: Would you please comment on my findings?
On Fri, Aug 18, 2017 at 10:54:29PM +0200, awl1 wrote:
> Hello Andrew,
>
> many thanks for joining this discussion! :-)
>
> On 18.08.2017 at 21:46, Andrew Bartlett wrote:
> > I do realise you are in between a rock and a hard place. You have
> > identified an interesting issue, triggered by a massive protocol change
> > (so not able to be bisected down to a regression) that requires
> > significant work to understand and may or not be possible to resolve.
>
> Note that I have tracked down the issue to what I believe to be the
> root cause, and the root cause is NOT an issue in Samba, but an
> issue with Microsoft's SMB2/SMB3 client that uses completely
> inefficient SMB2_FIND_ID_BOTH_DIRECTORY_INFO requests in SMB2/3 as
> opposed to efficient FIND_FIRST2 requests in SMB1:
>
> The main parts of my analysis of the issue are contained here:
>
> https://lists.samba.org/archive/samba/2017-July/209749.html
> https://lists.samba.org/archive/samba/2017-July/209750.html
> https://lists.samba.org/archive/samba/2017-July/209751.html
>
> Just citing key findings for your reference:
>
> In SMB1, the Windows client executes one FIND_FIRST2 Request for
> each file to be copied (i.e. in my scenario, ~ 1000 requests)
> returning STATUS_NO_SUCH_FILE every time before actually
> creating/writing to the file.
>
> When looking at the same file in the SMB2 3.1.1, the Windows client
> issues a different Find operation (SMB2_FIND_ID_BOTH_DIRECTORY_INFO
> with Pattern "*") that does not look for the particular file name
> that is about to be written, but seems to try and list the whole
> current directory's content with a pattern of "*". Note that,
> looping through the 2000 files to be written in my scenario, the
> length of the Samba's Find Response increases with every file
> successfully copied: When copying file number 1000, the Find
> Response sends back a list of all 999 files that have been
> successfully copied to this directory before, and this list of 999
> file names is not needed for any meaningful purpose, as the goal
> only is to check whether file number 1000 already exists in this
> list of 999 files (which it of course never
> does!) or not. The last such call to
> SMB2_FIND_ID_BOTH_DIRECTORY_INFO contained in the traces has a
> response length of about 64kB (containing filenames that have
> already been written to the target directory but are not
> needed/helpful in any way) and interestingly does not return
> "STATUS_NO_MORE_FILES", but "STATUS_INFO_LENGTH_MISMATCH", maybe
> because the buffer size for the result of the pattern lookup is only
> 64kB!?

This might be hidden against Windows due to directory handle leases, which we don't yet support.
Andrew Bartlett
2017-Aug-18 21:27 UTC
[Samba] Friendly Reminder: Would you please comment on my findings?
On Fri, 2017-08-18 at 22:54 +0200, awl1 via samba wrote:
> On 18.08.2017 at 21:46, Andrew Bartlett wrote:
> > I do realise you are in between a rock and a hard place. You have
> > identified an interesting issue, triggered by a massive protocol change
> > (so not able to be bisected down to a regression) that requires
> > significant work to understand and may or not be possible to resolve.
>
> Note that I have tracked down the issue to what I believe to be the root
> cause, and the root cause is NOT an issue in Samba, but an issue with
> Microsoft's SMB2/SMB3 client that uses completely inefficient
> SMB2_FIND_ID_BOTH_DIRECTORY_INFO requests in SMB2/3 as opposed to
> efficient FIND_FIRST2 requests in SMB1:

Are these inefficient against a Windows server?

Either way, I suggest moving this to samba-technical. The reason is that there are actually some senior Microsoft folks on that list, who care about SMB2 and getting rid of SMB1, so that is the place to go if client changes are needed.

However, if it is fast against Windows, but slow against Samba, then that is harder, as (aside from being nice) there isn't the same motivation to fix the client, but it suggests there is something we can do to the server to make the client 'behave'.

Anyway, that still gives you a route forward. Please use a good subject like 'Excessive SMB2 SMB2_FIND_ID_BOTH_DIRECTORY_INFO generated with Windows Explorer, compared with SMB1 FIND_FIRST2 -> slowdown with many small files.'

Regarding Thecus, presumably they have a more recent product already using SMB2? That product would be their motivation, but you have to get past 1st level support :-)

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
awl1
2017-Aug-18 21:35 UTC
[Samba] Friendly Reminder: Would you please comment on my findings?
On 18.08.2017 at 23:17, Jeremy Allison via samba wrote:
> This might be hidden against Windows due to directory handle leases,
> which we don't yet support.

Are you saying that when I replace the Samba server with a Windows SMB2 share server, I should see better performance? I can easily test that and record a Wireshark trace for it if you like...

Layperson question: Would such a "directory handle lease" be something like a cache for SMB2 Find responses?

This would have to be a client-side cache in order to avoid sending all 1000 file names in the directory from server to client over the network with every single response.

Also, if it were indeed a client-side cache, it would need to be silently updated with every Create request/response cycle, because the number of files in the server-side directory always grows by the one file just written between one call to SMB2_FIND_ID_BOTH_DIRECTORY_INFO and the next...

Thanks & best regards
Andreas
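The client-side cache idea raised in the question above can be illustrated with a toy Python sketch. This is only a model of the reasoning in the message, not a description of how SMB2 directory leases are actually implemented by Windows or Samba; the class, its methods and the in-memory "server" are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Set


class CachedDirectory:
    """Toy client-side cache of one directory's file names."""

    def __init__(self, list_from_server: Callable[[], Iterable[str]]):
        self._list_from_server = list_from_server
        self._names: Optional[Set[str]] = None
        self.server_listings = 0  # number of full listings requested

    def _ensure_loaded(self) -> Set[str]:
        # Ask the server for a full listing only if nothing is cached.
        if self._names is None:
            self._names = set(self._list_from_server())
            self.server_listings += 1
        return self._names

    def exists(self, name: str) -> bool:
        return name in self._ensure_loaded()

    def record_create(self, name: str) -> None:
        # Keep the cache in step with a file this client just created.
        self._ensure_loaded().add(name)

    def invalidate(self) -> None:
        # Would be called if another client changed the directory.
        self._names = None


# Simulate copying 1000 files into an initially empty directory.
server_files: Set[str] = set()
cache = CachedDirectory(lambda: list(server_files))
for i in range(1000):
    name = f"file{i:04d}.txt"
    if not cache.exists(name):
        server_files.add(name)      # stands in for the Create + Write
        cache.record_create(name)   # local update, no new listing

print(cache.server_listings, len(server_files))
```

In this model the directory is listed once for the whole copy instead of once per file, which is the effect the message speculates a directory lease might have.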