Case van Rij
2021-Feb-10 18:40 UTC
[Samba] Benchmarking linux 5.10 smb3 client namespace performance
I've recently started looking at using Linux clients as SMB3 workload generators with the spec.org SpecSFS 2014 benchmark. For the initial performance comparison I'm using 4 Windows 2012R2 clients and 4 Linux 5.10.13-1.el7.elrepo.x86_64 clients. Both sets of clients use E5-2637 v4 @ 3.50GHz CPUs with 40GbE, talking to eight 40GbE NICs on a high-performance NAS array.

The first workload I looked at is the SWBUILD workload, where each client runs a netmist userspace process with a mostly namespace workload. With 4 Windows clients running 80 business metrics, this means 400 threads, each attempting to perform 100 operations per second. Over SMB3 this workload targets 40,000 operations per second, and achieves it with 1.368 ms average latency as measured by userspace:

Business   Requested    Achieved     Avg Lat
Metric     Op Rate      Op Rate      (ms)
80         40000.00     40000.420    1.368

For each thread, the workload looks like:

Write_file   ops = 1605    Avg Latency: 0.001544
Read_file    ops = 1447    Avg Latency: 0.001521
Mkdir        ops = 246     Avg Latency: 0.004653
Unlink       ops = 717     Avg Latency: 0.001635
Create       ops = 256     Avg Latency: 0.003522
Stat         ops = 16556   Avg Latency: 0.001536
Access       ops = 1401    Avg Latency: 0.001534
Chmod        ops = 1226    Avg Latency: 0.003059
Readdir      ops = 481     Avg Latency: 0.002131

For the initial Linux run I scaled it way down to 4 business metrics, i.e. 20 threads, each running 100 operations per second. The first Linux client runs 20 threads, mounting with vers=3.02,actimeo=120 across 4 SMB3 mounts (4 target IPs; a sketch of the mount invocation is at the end of this mail):

Business   Requested    Achieved     Avg Lat
Metric     Op Rate      Op Rate      (ms)
4          2000.00      427.417      46.612

Write_file   ops = 442    Avg Latency: 0.057040
Read_file    ops = 348    Avg Latency: 0.053756
Mkdir        ops = 58     Avg Latency: 0.125199
Unlink       ops = 178    Avg Latency: 0.045923
Create       ops = 55     Avg Latency: 0.107244
Stat         ops = 4069   Avg Latency: 0.047403
Access       ops = 308    Avg Latency: 0.048040
Chmod        ops = 294    Avg Latency: 0.046134
Readdir      ops = 133    Avg Latency: 0.038968

That is pretty surprising, especially since server-side latency averages sub-1ms! A PCAP analysis of SMB round-trip times confirms the same on the wire (tshark invocation at the end of this mail):

SMB2 SRT Statistics:
Filter: smb2.cmd

Index   Commands   Calls   Min SRT    Max SRT    Avg SRT    Sum SRT
5       Create     17763   0.000193   0.034437   0.001351   23.998490
6       Close      16477   0.000090   0.034437   0.001301   21.433632
7       Flush          9   0.000616   0.004260   0.001175    0.010574
8       Read         648   0.000111   0.007905   0.000855    0.554251
9       Write       1077   0.000121   0.006913   0.001817    1.956459
14      Find         209   0.000625   0.012334   0.001705    0.356392
16      GetInfo    15180   0.000193   0.034437   0.001291   19.601276
17      SetInfo      148   0.000843   0.010075   0.002392    0.354038
=================================================================

I scaled down the per-thread operation rate to 10 operations per second:

Business   Requested    Achieved     Avg Lat
Metric     Op Rate      Op Rate      (ms)
4          200.00       200.089      2.144
32         1600.00      1600.605     2.035
40         2000.00      2000.748     4.705
48         2400.00      2400.926     5.323
56         2800.00      2801.046     6.428
64         3200.00      3199.791     10.469
72         3600.00      3222.795     52.887

On the Linux side, each netmist load-generating thread averages 1% CPU, and the cifsd threads are under 1% CPU.

Long story short: before I start profiling the Linux kernel side, I would be curious whether anyone has performed similar tests and perhaps has solutions or is aware of known issues.

Thanks,
Case
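
For reference, a minimal sketch of the mount commands on each Linux client, assuming placeholder server addresses, share name, mount points and credentials file (only vers=3.02 and actimeo=120 are the actual options from the runs above):

  # placeholder addresses/paths; vers=3.02,actimeo=120 are the options used in the runs above
  mount -t cifs //192.0.2.11/specsfs /mnt/smb1 -o vers=3.02,actimeo=120,credentials=/root/smb.cred
  mount -t cifs //192.0.2.12/specsfs /mnt/smb2 -o vers=3.02,actimeo=120,credentials=/root/smb.cred
  mount -t cifs //192.0.2.13/specsfs /mnt/smb3 -o vers=3.02,actimeo=120,credentials=/root/smb.cred
  mount -t cifs //192.0.2.14/specsfs /mnt/smb4 -o vers=3.02,actimeo=120,credentials=/root/smb.cred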
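
The SMB2 SRT table above is Wireshark's service response time statistic; with a reasonably recent tshark it should be reproducible with something like the following (capture file name is a placeholder):

  # -q suppresses per-packet output; -z smb2,srt prints the SMB2 service response time table
  tshark -q -r client1-run.pcap -z smb2,srt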