Robert Milkowski
2008-Aug-06 00:49 UTC
[zfs-discuss] ZFS cache flushes and 6540 disk array - fixed
Hello zfs-discuss,
I've just done a quick test of the performance impact of SCSI cache
flushes on a 6540 disk array when using ZFS.
The configuration is: v490, S10U5 (137111-03), 2x 6540 disk
arrays with 7.10.25.10 firmware, host dual-ported. ZFS does the
mirroring between the 6540s. There is no other load on the 6540s apart
from this testing. Each LUN is a RAID-10 made of many disks on each 6540
(the exact layout doesn't really matter here).
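(The pool itself is nothing fancy - a single two-way mirror across the
two arrays, i.e. something along the lines of the command below would
recreate it, modulo the exact options used originally:)
# zpool create dslrp mirror c6t600A0B800029B74600004245486B68EBd0 c6t600A0B800029AF0000006CCC486B36ABd0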
# zpool status dslrp
pool: dslrp
state: ONLINE
scrub: scrub completed with 0 errors on Tue Aug 5 10:49:28 2008
config:
NAME STATE READ WRITE CKSUM
dslrp ONLINE 0 0 0
mirror ONLINE 0 0 0
c6t600A0B800029B74600004245486B68EBd0 ONLINE 0 0 0
c6t600A0B800029AF0000006CCC486B36ABd0 ONLINE 0 0 0
errors: No known data errors
I wrote a simple C program which, in a loop (N times), creates a
file with the O_DSYNC flag (synchronous writes), writes 255 bytes of
data, closes the file and removes it. I then measured the total execution
time with SCSI cache flushes in ZFS disabled and with them enabled. The
source code for the program is attached at the end of this post.
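(There is nothing special about building or running it - something like:
# cc -o filesync-1 filesync-1.c
# ./filesync-1 /dslrp/test/ 100000
is all it takes; any C compiler will do.)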
# echo "zfs_nocacheflush/D" | mdb -k
zfs_nocacheflush:
zfs_nocacheflush: 1
# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 44.947136
real 44.950
user 0.349
sys 18.188
# dtrace -n fbt::*SYNCHRONIZE_CACHE*:entry'{@=count();}' -n tick-1s'{printa(@);clear(@);}'
[not a single SYNCHRONIZE CACHE command, which is expected]
# iostat -xnzC 1
[...]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 4297.0 0.0 17236.1 0.0 1.0 0.0 0.2 0 98 c6
0.0 2149.5 0.0 8622.0 0.0 0.5 0.0 0.2 0 50 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 2147.5 0.0 8614.0 0.0 0.5 0.0 0.2 1 49 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 3892.9 0.0 15571.7 0.0 1.0 0.0 0.3 0 98 c6
0.0 1946.0 0.0 7783.9 0.0 0.6 0.0 0.3 0 57 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 1947.0 0.0 7787.9 0.0 0.4 0.0 0.2 0 41 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 4548.1 0.0 18192.3 0.0 1.0 0.0 0.2 0 97 c6
0.0 2274.0 0.0 9096.2 0.0 0.5 0.0 0.2 0 51 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 2274.0 0.0 9096.2 0.0 0.5 0.0 0.2 1 46 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 4632.8 0.0 18620.4 0.0 1.1 0.0 0.2 0 99 c6
0.0 2316.9 0.0 9310.2 0.0 0.5 0.0 0.2 1 47 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 2315.9 0.0 9310.2 0.0 0.6 0.0 0.2 1 52 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 4610.2 0.0 18150.6 0.0 1.0 0.0 0.2 0 97 c6
0.0 2304.1 0.0 9075.3 0.0 0.5 0.0 0.2 0 52 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 2306.1 0.0 9075.3 0.0 0.5 0.0 0.2 1 45 c6t600A0B800029B74600004245486B68EBd0
[...]
Now let's repeat the same test, but with ZFS sending SCSI cache flushes.
# echo "zfs_nocacheflush/W0" | mdb -kw
zfs_nocacheflush: 0x1 = 0x0
# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 53.809971
real 53.813
user 0.351
sys 22.107
# dtrace -n fbt::*SYNCHRONIZE_CACHE*:entry'{@=count();}' -n tick-1s'{printa(@);clear(@);}'
[...]
CPU ID FUNCTION:NAME
3 93193 :tick-1s
6000
3 93193 :tick-1s
7244
3 93193 :tick-1s
6172
3 93193 :tick-1s
7172
[...]
So now we are sending thousands of cache flushes per second as
expected.
# iostat -xnzC 1
[...]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 8386.6 0.0 17050.2 0.0 1.0 0.0 0.1 0 84 c6
0.0 4191.3 0.0 8525.1 0.0 0.5 0.0 0.1 0 43 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 4195.3 0.0 8525.1 0.0 0.5 0.0 0.1 0 41 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 7476.5 0.0 14953.0 0.0 0.9 0.0 0.1 0 87 c6
0.0 3738.3 0.0 7476.5 0.0 0.4 0.0 0.1 0 44 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 3738.3 0.0 7476.5 0.0 0.4 0.0 0.1 0 42 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 8146.9 0.0 16345.8 0.0 0.8 0.0 0.1 0 83 c6
0.0 4073.4 0.0 8172.9 0.0 0.4 0.0 0.1 0 42 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 4073.4 0.0 8172.9 0.0 0.4 0.0 0.1 0 41 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 7646.4 0.0 15288.8 0.0 0.8 0.0 0.1 0 80 c6
0.0 3823.2 0.0 7644.4 0.0 0.4 0.0 0.1 0 40 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 3823.2 0.0 7644.4 0.0 0.4 0.0 0.1 0 40 c6t600A0B800029B74600004245486B68EBd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 6620.2 0.0 13240.4 0.0 0.8 0.0 0.1 0 81 c6
0.0 3310.1 0.0 6620.2 0.0 0.4 0.0 0.1 0 41 c6t600A0B800029AF0000006CCC486B36ABd0
0.0 3310.1 0.0 6620.2 0.0 0.4 0.0 0.1 0 40 c6t600A0B800029B74600004245486B68EBd0
[...]
Looks like iostat is also counting the SCSI cache flushes - that's why
we see more IOPS while kw/s stays roughly the same, and why the average
I/O time is lower (each SCSI flush is essentially a no-op for the array,
so it drags the average down).
Total execution time for the test with SCSI flushes enabled in ZFS is
about 20% longer than with them disabled (53.8s vs 44.9s). Looks like
the 6540 disk array with firmware 7.10.25.10 effectively ignores SCSI
flushes now. Last time I checked on a 2530 array the difference was
about 7x, not just 20%.
IIRC someone reported that 6540 performance was badly hit by ZFS
sending SCSI cache flushes. Looks like it's been fixed in the recently
released firmware, so in most environments there is not much point in
disabling cache flushes on 6540+ZFS anymore.
Although there is still a 20% hit, please keep in mind that the program
I used for testing is rather extreme, and in most environments the
performance overhead should be much smaller than 20%. Nevertheless, if
you want to squeeze every last bit of performance out of ZFS+6540 (or a
similar array with NVRAM), disabling cache flushes won't hurt.
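(If you want them disabled permanently, rather than poking the live
kernel with mdb, the usual way is the standard /etc/system tunable plus
a reboot - generic syntax below, nothing specific to this box:)
* in /etc/system - disable ZFS SCSI cache flushes
set zfs:zfs_nocacheflush = 1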
Just out of curiosity, let's disable the ZIL (zil_disable=1):
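(zil_disable can be flipped on a live system the same way as
zfs_nocacheflush, roughly:
# echo "zil_disable/W1" | mdb -kw
IIRC it is only consulted at mount time, so the filesystem has to be
remounted for it to take effect.)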
# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 12.130675
real 12.134
user 0.328
sys 11.805
# ptime ./filesync-1 /tmp 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 3.098279
real 3.101
user 0.291
sys 2.809
That's interesting - tmpfs is about 4 times faster than ZFS with the
ZIL disabled in this test (so basically everything happens in memory),
and disabling checksums for the test filesystem doesn't change it.
Maybe it is mostly due to metadata checksums?
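(By "disabling checksums" I mean the user-data checksum property, i.e.
something like:
# zfs set checksum=off dslrp/test
for whatever the test dataset actually is. Metadata is always
checksummed regardless of that setting, which is why I suspect it.)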
// simple and ugly program
# cat filesync-1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    char *path;
    char filename[256];
    int n_req;
    int i, fd;
    hrtime_t stopper_start, stopper_stop;

    if (argc != 3) {
        printf("Usage: %s path N\n", argv[0]);
        exit(1);
    }

    path = argv[1];
    n_req = atoi(argv[2]);

    stopper_start = gethrtime();

    /* create, synchronously write, close and unlink the file N times */
    i = 0;
    while (i++ < n_req) {
        strcpy(filename, path);
        strcat(filename, "/filesync_test");
        fd = open(filename, O_CREAT|O_RDWR|O_DSYNC, 0644);
        write(fd, filename, 255);
        close(fd);
        unlink(filename);
    }

    stopper_stop = gethrtime();

    printf("Time in seconds to create and unlink %d files with O_DSYNC: %f\n\n",
        n_req, (float)(stopper_stop - stopper_start) / 1000000000);

    exit(0);
}
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com