We have a server using Solaris 10. It's a pair of systems with a shared J4200, with Solaris Cluster. It works very nicely; Solaris Cluster switches over transparently.

However, as an NFS server it is dog-slow. This is the usual synchronous-write problem. Setting zil_disable fixes the problem; otherwise it can take more than an hour to copy files that take me 2 min with our NetApp.

The obvious solution is to use a flash disk for the ZIL. However, I'm clueless what hardware to use. Can anyone suggest either a flash drive that will work in the J4200 (SATA), or some way to connect a drive to two machines so that Solaris Cluster will work? Sun used to claim that they were going to support a flash drive in the J4200. Now that statement seems to have disappeared, and their SATA flash drive seems to be vapor, despite appearing real on the Sun web site. (I tried to order one.)
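For context, once a suitable log device exists, attaching it to a pool is a one-liner. This is only a sketch: the pool name "tank" and the cXtYdZ device names are placeholders, not the poster's actual configuration.

    # Add a dedicated ZIL (slog) device to an existing pool.
    zpool add tank log c2t3d0

    # Or, to protect against losing the slog itself, mirror it:
    zpool add tank log mirror c2t3d0 c2t4d0

With a fast slog in place, the synchronous NFS writes land on the SSD instead of the data disks, which is what recovers copy times comparable to the NetApp.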
Charles Hedrick wrote:
> We have a server using Solaris 10. It's a pair of systems with a shared J4200, with Solaris Cluster. It works very nicely. Solaris Cluster switches over transparently.
>
> However as an NFS server it is dog-slow. This is the usual synchronous write problem. Setting zil_disable fixes the problem. Otherwise it can take more than an hour to copy files that take me 2 min with our netapp.
>
> The obvious solution is to use a flash disk for the ZIL. However I'm clueless what hardware to use. Can anyone suggest either a flash drive that will work in the J4200 (SATA), or some way to connect a drive to two machines so that Solaris Cluster will work? Sun used to claim that they were going to support a flash drive in the J4200. Now that statement seems to have disappeared, and their SATA flash drive seems to be vapor, despite appearing real on the Sun web site. (I tried to order one.)

I don't see anywhere a specific option from Sun for an SSD in the J4xxx arrays. HOWEVER, the Sun Storage 7210 is based on the J4400. You can try ordering option XTA7210-LOGZ18GB. This is the 18GB Zeus-based SSD with the 3.5" bracket for use in the J4xxx chassis that come with the 7210. I have no idea if this would be supported, but it certainly should work just fine.

Alternatively, you should be able to get an 18 or 32GB SSD for EACH machine you are using as a cluster head, assuming they're reasonably late-model machines (i.e. using 2.5" drives, and first offered within the last 2 years). This IS supported. However, doing it this way does run the risk of data loss during a failover - no data corruption, mind you, but since the original data on the SSD in the failed machine is inaccessible, it won't be available to the failover machine.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Thanks. That's what I was looking for. Yikes! I hadn't realized how expensive the Zeus is.

We're using Solaris Cluster, so if the system goes down, the other one takes over. That means that if the ZIL is on a local disk, we lose it in a crash. Might as well just set zil_disable (something I'm considering doing anyway).
It turns out that our storage is currently being used for:

* backups of various kinds, run daily by cron jobs
* saving old log files from our production application
* saving old versions of java files from our production application

Most of the usage is write-only, and a fair amount of it involves copying huge directories. There's no actual current user data. I think zil_disable may actually make sense.
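For reference, a minimal sketch of how zil_disable was typically set on Solaris 10 / early OpenSolaris kernels of that era, where the tunable still existed; it is pool-wide and carries the trust caveat Richard raises below.

    # In /etc/system (persistent, needs a reboot):
    set zfs:zil_disable = 1

    # Or on a live kernel with mdb (filesystems generally need a
    # remount before the change takes effect):
    echo zil_disable/W0t1 | mdb -kw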
On Dec 22, 2009, at 8:40 PM, Charles Hedrick <hedrick at rutgers.edu> wrote:

> It turns out that our storage is currently being used for
>
> * backups of various kinds, run daily by cron jobs
> * saving old log files from our production application
> * saving old versions of java files from our production application
>
> Most of the usage is write-only, and a fair amount of it involves
> copying huge directories. There's no actual current user data.
>
> I think zil_disable may actually make sense.

How about a zil comprised of two mirrored iSCSI vdevs formed from a SSD on each box?

An idea.

-Ross
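A sketch of the target-side half of that idea, using the old pre-COMSTAR iSCSI target that shipped with Solaris 10 at the time; the device path and target name are made-up placeholders, and the target daemon's base directory is assumed to be configured already.

    # On each cluster head, export the local SSD as an iSCSI target
    # using the old iscsitgt daemon (not COMSTAR). Device and target
    # name below are placeholders.
    iscsitadm create target -b /dev/rdsk/c3t0d0s2 slog-headA

    # Check that the target is advertised:
    iscsitadm list target -v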
On Dec 22, 2009, at 5:40 PM, Charles Hedrick wrote:

> It turns out that our storage is currently being used for
>
> * backups of various kinds, run daily by cron jobs
> * saving old log files from our production application
> * saving old versions of java files from our production application
>
> Most of the usage is write-only, and a fair amount of it involves
> copying huge directories. There's no actual current user data.
>
> I think zil_disable may actually make sense.

Except that with the ZIL disabled, you break the trust that the data was written. Kinda defeats the prime objective, no?
 -- richard
On Dec 22, 2009, at 8:58 PM, Richard Elling <richard.elling at gmail.com> wrote:

> On Dec 22, 2009, at 5:40 PM, Charles Hedrick wrote:
>
>> It turns out that our storage is currently being used for
>>
>> * backups of various kinds, run daily by cron jobs
>> * saving old log files from our production application
>> * saving old versions of java files from our production application
>>
>> Most of the usage is write-only, and a fair amount of it involves
>> copying huge directories. There's no actual current user data.
>>
>> I think zil_disable may actually make sense.
>
> Except that with the ZIL disabled, you break the trust that the data
> was written. Kinda defeats the prime objective, no?

But no pre-warp civilizations were influenced! Oh, you said objective...

-Ross
On Tue, 22 Dec 2009, Ross Walker wrote:

>> I think zil_disable may actually make sense.
>
> How about a zil comprised of two mirrored iSCSI vdevs formed from a SSD on
> each box?

I would not have believed that this is a useful idea except that I have seen "IOPS offload" to a server on the network work extremely well. Latencies on gigabit ethernet are pretty small these days.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Dec 22, 2009, at 9:08 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Tue, 22 Dec 2009, Ross Walker wrote:
>>> I think zil_disable may actually make sense.
>>
>> How about a zil comprised of two mirrored iSCSI vdevs formed from a
>> SSD on each box?
>
> I would not have believed that this is a useful idea except that I
> have seen "IOPS offload" to a server on the network work extremely
> well. Latencies on gigabit ethernet are pretty small these days.

Yes, GbE only adds about 100us to the latency, and when using a raw SSD as a backing store it should be a lot better than what the OP is doing now (and he can use one of the less costly models).

-Ross
Is iSCSI reliable enough for this?
Charles Hedrick wrote:
> Is iSCSI reliable enough for this?

YES.

The original idea is a good one, and one that I'd not thought of. The (old) iSCSI implementation is quite mature, if not anywhere near as nice (feature/flexibility-wise) as the new COMSTAR stuff.

I'm thinking that just putting in a straight-through cable between the two machines is the best idea here, rather than going through a switch.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
And how do you expect the mirrored iSCSI volume to work after failover, with the secondary (ex-primary) unreachable?

Regards,
Andrey

On Wed, Dec 23, 2009 at 9:40 AM, Erik Trimble <Erik.Trimble at sun.com> wrote:
> Charles Hedrick wrote:
>> Is iSCSI reliable enough for this?
>
> YES.
>
> The original idea is a good one, and one that I'd not thought of. The (old)
> iSCSI implementation is quite mature, if not anywhere near as nice
> (feature/flexibility-wise) as the new COMSTAR stuff.
>
> I'm thinking that just putting in a straight-through cable between the two
> machines is the best idea here, rather than going through a switch.
Andrey Kuzmin wrote:
> And how do you expect the mirrored iSCSI volume to work after
> failover, with the secondary (ex-primary) unreachable?

As a normal Degraded mirror. No problem.

The suggestion was to make the SSD on each machine an iSCSI volume, and add the two volumes as a mirrored ZIL into the zpool. It's a (relatively) simple and ingenious suggestion.

-Erik

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
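Putting the two halves together, the initiator side of the sketch might look like this on whichever head currently owns the pool. Again, the address, device names and pool name are placeholders, not anyone's real configuration.

    # Discover the peer head's iSCSI target (address is a placeholder):
    iscsiadm add discovery-address 192.168.10.2:3260
    iscsiadm modify discovery --sendtargets enable
    devfsadm -i iscsi   # create device nodes for the new LUN

    # Mirror the ZIL across the local SSD and the remote one seen over
    # iSCSI; c3t0d0 and the long cXt...d0 iSCSI name are placeholders.
    zpool add tank log mirror c3t0d0 c4t600144F0...d0

After a failover the unreachable half simply leaves the log mirror degraded, which is the behaviour described above.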
Erik.Trimble at Sun.COM said:
> The suggestion was to make the SSD on each machine an iSCSI volume, and add
> the two volumes as a mirrored ZIL into the zpool.

I've mentioned the following before.... For a poor-person's slog which gives decent NFS performance, we have had good results with allocating a slice on (e.g.) an X4150's internal disk, behind the internal Adaptec RAID controller. Said controller has only 256MB of NVRAM, but it made a big difference with NFS performance (look for the "tar unpack" results at the bottom of the page):

http://acc.ohsu.edu/~hakansom/j4400_bench.html

You can always replace them when funding for your Zeus SSDs comes in (:-).

Regards,

--
Marion Hakanson <hakansom at ohsu.edu>
OHSU Advanced Computing Center
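In that spirit, the slice-backed variant is just as short. The pool name, disk and slice numbers below are hypothetical, and the slice would normally be carved out with format(1M) beforehand.

    # Use a small slice on an NVRAM-backed internal disk as a cheap slog.
    # "tank" and c0t1d0s3 are hypothetical; pick an otherwise unused slice.
    zpool add tank log c0t1d0s3

    # Later, swap it out when a real SSD arrives (log-device removal
    # requires a pool version that supports it):
    zpool remove tank c0t1d0s3
    zpool add tank log c5t0d0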