Matthew C Aycock
2007-Dec-21 14:57 UTC
[zfs-discuss] Odd behavior of NFS of ZFS versus UFS
I have a test cluster running HA-NFS that shares both ufs and zfs based file systems. However, the behavior that I am seeing is a little perplexing.

The Setup: I have Sun Cluster 3.2 on a pair of SunBlade 1000s connecting to two T3B partner groups through a QLogic switch. All four bricks of the T3B are configured as RAID-5 with a hot spare. One brick from each pair is mirrored with VxVM 4.1 with a ufs file system on top of the mirror. I have mirrored the other two bricks via a Zpool. I have configured an HAStoragePlus resource for the datadg VxVM disk group and another one for the hazfs Zpool. Both are a part of my single nfs-rg. All machines are connected via 100MB switches.

I have a small test program that was created to detect a particular "problem" that we were having. It's very simple and I will include the C code at the end. What it does is time the creation of a file, an 8k synchronous write, and the close of the file. If the time is greater than 1 second, it prints out the elapsed time. Very simple.

The Test: I have two identical SunBlade 2500s that each mount a file system, run a loop of iozone 500 then sleep 10 seconds, and run nf (my test program) on the mounted file system. One does this on the ZFS based file system and the other on the UFS based one.

The Results: On the UFS based file system, nf reports ZERO output. Thus, it never took more than a second to do the test. On the ZFS based mount point I see multiple delays ranging from 2 to 6 seconds. So, I reversed the roles of the machines and ran the test again with virtually the same results.

The $1000 Question: Why would this happen?
The Code:

    #include <sys/types.h>
    #include <time.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    void main () {
        char nbuff[32];
        char data [8192];
        int fd;
        time_t start,finish;
        char date[256];

        while (1) {
            start=time(0);
            sprintf(nbuff,"TEMP%d", rand());
            fd=open(nbuff, O_RDWR| O_CREAT |O_SYNC, 0777);
            write (fd, data, sizeof (data));
            close (fd);
            unlink(nbuff);
            finish=time(0);
            if ((finish - start) > 1) {
                cftime(date, "%c", &start);
                fprintf(stderr,"%s elapsed=%d\n",date, finish-start);
            }
            sleep(1);
        }
    }

This message posted from opensolaris.org
Matthew C Aycock wrote:
> I have a test cluster running HA-NFS that shares both ufs and zfs based file systems. However, the behavior that I am seeing is a little perplexing.

Since this is a purely synchronous test, suspect the ZIL.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ZIL
 -- richard