Robert Milkowski
2008-Aug-06 00:49 UTC
[zfs-discuss] ZFS cache flushes and 6540 disk array - fixed
Hello zfs-discuss,

I've just done a quick test of the performance impact of SCSI cache flushes on a 6540 disk array when using ZFS.

The configuration is: V490, S10U5 (137111-03), 2x 6540 disk arrays with 7.10.25.10 firmware, host dual ported. ZFS mirrors between the two 6540s. There is no other load on the 6540s except for this testing. Each LUN is a RAID-10 made of many disks on each 6540 (it doesn't really matter here).

zpool status dslrp
  pool: dslrp
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Aug  5 10:49:28 2008
config:

        NAME                                       STATE     READ WRITE CKSUM
        dslrp                                      ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t600A0B800029B74600004245486B68EBd0  ONLINE       0     0     0
            c6t600A0B800029AF0000006CCC486B36ABd0  ONLINE       0     0     0

errors: No known data errors

I wrote a simple C program which, in a loop (N times), creates a file with the O_DSYNC flag (synchronous writes), writes 255 bytes of data, closes the file and removes it. I then measured total execution time with SCSI cache flushes disabled and enabled in ZFS. The source code for the program is attached at the end of this post.

First, with cache flushes disabled (zfs_nocacheflush=1):

# echo "zfs_nocacheflush/D" | mdb -k
zfs_nocacheflush:
zfs_nocacheflush:               1

# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 44.947136

real       44.950
user        0.349
sys        18.188

# dtrace -n fbt::*SYNCHRONIZE_CACHE*:entry'{@=count();}' -n tick-1s'{printa(@);clear(@);}'
[not a single SYNCHRONIZE CACHE, which is expected]

# iostat -xnzC 1
[...]
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4297.0    0.0 17236.1  0.0  1.0    0.0    0.2   0  98 c6
    0.0 2149.5    0.0  8622.0  0.0  0.5    0.0    0.2   0  50 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2147.5    0.0  8614.0  0.0  0.5    0.0    0.2   1  49 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 3892.9    0.0 15571.7  0.0  1.0    0.0    0.3   0  98 c6
    0.0 1946.0    0.0  7783.9  0.0  0.6    0.0    0.3   0  57 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 1947.0    0.0  7787.9  0.0  0.4    0.0    0.2   0  41 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4548.1    0.0 18192.3  0.0  1.0    0.0    0.2   0  97 c6
    0.0 2274.0    0.0  9096.2  0.0  0.5    0.0    0.2   0  51 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2274.0    0.0  9096.2  0.0  0.5    0.0    0.2   1  46 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4632.8    0.0 18620.4  0.0  1.1    0.0    0.2   0  99 c6
    0.0 2316.9    0.0  9310.2  0.0  0.5    0.0    0.2   1  47 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2315.9    0.0  9310.2  0.0  0.6    0.0    0.2   1  52 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 4610.2    0.0 18150.6  0.0  1.0    0.0    0.2   0  97 c6
    0.0 2304.1    0.0  9075.3  0.0  0.5    0.0    0.2   0  52 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 2306.1    0.0  9075.3  0.0  0.5    0.0    0.2   1  45 c6t600A0B800029B74600004245486B68EBd0
[...]

Now let's repeat the same test, but with ZFS sending SCSI flushes:

# echo "zfs_nocacheflush/W0" | mdb -kw
zfs_nocacheflush:               0x1             =       0x0

# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 53.809971

real       53.813
user        0.351
sys        22.107

# dtrace -n fbt::*SYNCHRONIZE_CACHE*:entry'{@=count();}' -n tick-1s'{printa(@);clear(@);}'
[...]
CPU     ID                    FUNCTION:NAME
  3  93193                         :tick-1s
             6000
  3  93193                         :tick-1s
             7244
  3  93193                         :tick-1s
             6172
  3  93193                         :tick-1s
             7172
[...]

So now we are sending thousands of cache flushes per second, as expected.

# iostat -xnzC 1
[...]
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 8386.6    0.0 17050.2  0.0  1.0    0.0    0.1   0  84 c6
    0.0 4191.3    0.0  8525.1  0.0  0.5    0.0    0.1   0  43 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 4195.3    0.0  8525.1  0.0  0.5    0.0    0.1   0  41 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 7476.5    0.0 14953.0  0.0  0.9    0.0    0.1   0  87 c6
    0.0 3738.3    0.0  7476.5  0.0  0.4    0.0    0.1   0  44 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 3738.3    0.0  7476.5  0.0  0.4    0.0    0.1   0  42 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 8146.9    0.0 16345.8  0.0  0.8    0.0    0.1   0  83 c6
    0.0 4073.4    0.0  8172.9  0.0  0.4    0.0    0.1   0  42 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 4073.4    0.0  8172.9  0.0  0.4    0.0    0.1   0  41 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 7646.4    0.0 15288.8  0.0  0.8    0.0    0.1   0  80 c6
    0.0 3823.2    0.0  7644.4  0.0  0.4    0.0    0.1   0  40 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 3823.2    0.0  7644.4  0.0  0.4    0.0    0.1   0  40 c6t600A0B800029B74600004245486B68EBd0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 6620.2    0.0 13240.4  0.0  0.8    0.0    0.1   0  81 c6
    0.0 3310.1    0.0  6620.2  0.0  0.4    0.0    0.1   0  41 c6t600A0B800029AF0000006CCC486B36ABd0
    0.0 3310.1    0.0  6620.2  0.0  0.4    0.0    0.1   0  40 c6t600A0B800029B74600004245486B68EBd0
[...]

It looks like iostat also counts the SCSI cache flushes - that's why we see roughly twice the write IOPS at about the same kw/s, and a lower average I/O time (each SCSI flush is effectively a NOP for the array, so it pulls the average down).

Total execution time with SCSI flushes enabled in ZFS is about 20% longer than with cache flushes disabled (53.81 s vs. 44.95 s, i.e. roughly 0.54 ms vs. 0.45 ms per create/write/unlink cycle).

It looks like the 6540 with firmware 7.10.25.10 does in fact ignore SCSI flushes. Last time I checked on a 2530 array the difference was about 7x, not just 20%. IIRC someone reported that 6540 performance was badly hit by ZFS sending SCSI cache flushes; it looks like that has been fixed in the recently released firmware, so in most environments there is not much point in disabling cache flushes on 6540+ZFS anymore. Although there is still a 20% hit, keep in mind that the program I used for testing is rather extreme, and in most environments the overhead should be much smaller than 20%. Nevertheless, if you want to squeeze every last bit of performance out of ZFS plus a 6540 (or a similar array with NVRAM), disabling cache flushes won't hurt.

Just out of curiosity, let's disable the ZIL (zil_disable=1):

# ptime ./filesync-1 /dslrp/test/ 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 12.130675

real       12.134
user        0.328
sys        11.805

# ptime ./filesync-1 /tmp 100000
Time in seconds to create and unlink 100000 files with O_DSYNC: 3.098279

real        3.101
user        0.291
sys         2.809

That's interesting - in this test tmpfs is about 4 times faster than ZFS with the ZIL disabled (where basically everything happens in memory), and disabling checksums on the test filesystem doesn't change it. Maybe it is mostly due to metadata checksums?
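One way to check that hypothesis would be a variant of the benchmark that rewrites a single pre-created file in place, so the create/unlink metadata path drops out of the timed loop entirely. The rough, untested sketch below is only an illustration (the rewritesync-1 name and the 0644 mode are made up, and it assumes the same Solaris-specific gethrtime() as filesync-1); comparing its ZFS and tmpfs numbers against the ones above would show how much of the gap comes from per-file metadata updates rather than the data writes themselves.

/* rewritesync-1.c - hypothetical variant: time N synchronous in-place rewrites */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
        char buf[255];
        int n_req, i, fd;
        hrtime_t start, stop;

        if (argc != 3) {
                printf("Usage: %s file N\n", argv[0]);
                exit(1);
        }
        n_req = atoi(argv[2]);
        memset(buf, 'x', sizeof (buf));

        /* Create the file once, outside the timed loop. */
        fd = open(argv[1], O_CREAT|O_RDWR|O_DSYNC, 0644);
        if (fd < 0) {
                perror("open");
                exit(1);
        }

        start = gethrtime();
        for (i = 0; i < n_req; i++) {
                /* Rewrite the same 255 bytes at offset 0, synchronously. */
                if (pwrite(fd, buf, 255, 0) != 255) {
                        perror("pwrite");
                        exit(1);
                }
        }
        stop = gethrtime();

        close(fd);
        printf("Time in seconds for %d O_DSYNC rewrites: %f\n",
            n_req, (double)(stop - start) / 1000000000);
        exit(0);
}

Something like ./rewritesync-1 /dslrp/test/rewrite_test 100000 versus the same run against /tmp would be the comparison of interest.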
// simple and ugly program
# cat filesync-1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <inttypes.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
        char *path;
        char filename[256];
        int n_req;
        int i, fd;
        hrtime_t stopper_start, stopper_stop;

        if (argc != 3) {
                printf("Usage: %s path N\n", argv[0]);
                exit(1);
        }

        path = argv[1];
        n_req = atoi(argv[2]);

        stopper_start = gethrtime();

        /* Create, synchronously write, close and unlink a file N times. */
        i = 0;
        while (i++ < n_req) {
                strcpy(filename, path);
                strcat(filename, "/filesync_test");
                fd = open(filename, O_CREAT|O_RDWR|O_DSYNC, 0644);
                write(fd, filename, 255);
                close(fd);
                unlink(filename);
        }

        stopper_stop = gethrtime();

        printf("Time in seconds to create and unlink %d files with O_DSYNC: %f\n\n",
            n_req, (float)(stopper_stop - stopper_start) / 1000000000);
        exit(0);
}

-- 
Best regards,
 Robert Milkowski
 mailto:milek at task.gda.pl
 http://milek.blogspot.com