I asked this on the x86 mailing list (and got a "it should work" answer), but this is probably a more appropriate place for it.

In a 2 node Sun Cluster (3.2 running Solaris 10 u8, but could be running u9 if needed), we're looking at moving from VxFS to ZFS. However, quite frankly, part of the rationale is L2ARC. Would it be possible to use internal (SSD) storage for the L2ARC in such a scenario? My understanding is that if a ZFS filesystem is passed from one node to another, the L2ARC has to be rebuilt. So, why can't it just be rebuilt on internal storage?

The nodes (x4240s) are identical and would have identical storage installed, so the paths would be the same.

Has anyone done anything similar to this? I'd love something more than "it should work" before dropping $25k on SSDs...

TIA,
matt
On 11/15/2010 2:55 PM, Matt Banks wrote:
> In a 2 node Sun Cluster (3.2 running Solaris 10 u8, but could be running u9 if needed), we're looking at moving from VxFS to ZFS. However, quite frankly, part of the rationale is L2ARC. Would it be possible to use internal (SSD) storage for the L2ARC in such a scenario? My understanding is that if a ZFS filesystem is passed from one node to another, the L2ARC has to be rebuilt. So, why can't it just be rebuilt on internal storage?
>
> The nodes (x4240s) are identical and would have identical storage installed, so the paths would be the same.

If your SSD is part of the shared storage (and, thus, visible from both nodes), then it will be part of the whole pool when exported/imported by the cluster failover software.

If, on the other hand, you have an SSD in each node that is attached to the shared-storage pool as L2ARC, then it's not visible to the other node, and the L2ARC would have to be reattached and rebuilt in a failover scenario.

If you are using ONLY X4240 systems, then you don't have ANY shared storage - ZFS isn't going to be able to "failover" between the two nodes. You'd have to mirror the data between the two nodes somehow; they wouldn't be part of the same zpool.

Really, what you want is something like a J4000-series array dual-attached to both X4240s, with the SSDs and HDs installed in the J4000-series chassis, not in the X4240s.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
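[For readers following along, here is a minimal sketch of the node-local L2ARC setup being discussed. The pool name "tank" and the device path c1t1d0 are placeholders, not anything named in this thread:

    # Add the node-local SSD to the shared-storage pool as an L2ARC
    # (cache) device. Cache devices hold no pool data, only read cache.
    zpool add tank cache c1t1d0

    # Confirm it appears under the "cache" section and watch it warm up.
    zpool status tank
    zpool iostat -v tank 5

Because the cache vdev lives on a disk only one node can see, the other node will not find it after a takeover, which is the crux of the question.]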
On Nov 15, 2010, at 4:15 PM, Erik Trimble wrote:
> If your SSD is part of the shared storage (and, thus, visible from both nodes), then it will be part of the whole pool when exported/imported by the cluster failover software.
>
> If, on the other hand, you have an SSD in each node that is attached to the shared-storage pool as L2ARC, then it's not visible to the other node, and the L2ARC would have to be reattached and rebuilt in a failover scenario.
>
> Really, what you want is something like a J4000-series array dual-attached to both X4240s, with the SSDs and HDs installed in the J4000-series chassis, not in the X4240s.

Believe you me, had the standalone J4x00s not been EOL'd on 24-Sept-10 (and if they supported SSDs), or if the 2540s/2501 we have attached to this cluster supported SSDs, that would be my first choice (honestly, I LOVE the J4x00s - we get great performance out of them every time we've installed them - better at times than 2540s or 6180s). However, at this point, the only real choice we seem to have for external storage from Oracle is an F5100 or stepping up to a 6580 with a CSM2 or a 7120. The 6580 obviously ain't gonna happen, and a 7120 leaves us with NFS, and NFS + Solaris + InterSystems Caché has massive performance issues. The F5100 may be an option, but I'd like to explore this first.

(In the interest of a complete description of this particular configuration: we have 2x 2540s - one of which has a 2501 attached to it - attached to 2x x4240s. The 2540s are entirely populated with 7200 rpm SATA drives. The external file systems are VxFS at this point, managed by Volume Manager, and have been in production for well over a year. When these systems were installed, ZFS still wasn't an option for us.)

I'm OK having to rebuild the L2ARC cache in case of a failover. They don't happen often. And it's not like this is entirely unprecedented: this is exactly the model Oracle uses for the 7000 series storage with cluster nodes. The "readzillas" (or whatever they're called now) are in the cluster nodes - meaning if one fails, the other takes over and has to rebuild its L2ARC.

I'm talking about having an SSD (or more, but let's say 1 for simplicity's sake) in each of the x4240s. One is sitting unused in node b, waiting for node a to fail. Node a's SSD is in use as L2ARC.
Then node a fails, the ZFS file systems fail over, and node b's SSD (located at the same path as it was in node a) is used as L2ARC for the failed-over file system.

The $2,400 for two Marlin SSDs is a LOT less money than the $47k (incl. HBAs) the "low-end" F5100 would run (MSRP).

matt
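[To make the proposed failover concrete, here is a hedged sketch of what the takeover could look like on node b. The pool name "tank" and device path c1t1d0 are placeholders, and in a Sun Cluster setup the HAStoragePlus resource would normally drive the import rather than a manual command:

    # On node b, after node a has gone away:
    zpool import -f tank          # take over the shared-storage pool

    # ZFS identifies vdevs by their on-disk labels, not by path, so the
    # pool will remember node a's cache device and report it as missing,
    # even though node b has a blank SSD at the same c1t1d0 path.
    # Drop the stale cache vdev (by the name zpool status reports) and
    # add the local SSD; the L2ARC then warms up from scratch.
    zpool remove tank c1t1d0
    zpool add -f tank cache c1t1d0

    zpool status tank             # pool data is unaffected throughout

This matches the behaviour described in the replies below: the pool imports fine without its cache device, and the penalty is only a cold read cache.]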
comment below...

On Nov 15, 2010, at 4:21 PM, Matt Banks wrote:
> I'm OK having to rebuild the L2ARC cache in case of a failover.

The L2ARC is rebuilt any time the pool is imported. If the L2ARC devices are not found, then the pool is still ok, but will be listed as degraded (see the definition of degraded in the zpool man page). This is harmless from a data-protection standpoint, though if you intend to run that way for a long time, you might just remove the L2ARC from the pool.
In the case of clusters with the L2ARC unshared, we do support this under NexentaStor HA-Cluster, and it is a fairly common case. I can't speak for what Oracle can "support."
 -- richard
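[Tying the two answers together: a cluster that keeps its L2ARC unshared only needs to reconcile the cache vdev after each takeover. Below is a hedged sketch of how that step could be packaged as a post-import hook for whatever the failover framework runs after importing the pool; the pool name, device path, and the hook script itself are assumptions for illustration, not anything prescribed in this thread:

    #!/bin/sh
    # Hypothetical post-import hook - names are placeholders, not from
    # this thread.  Run on the node that has just imported the pool.
    POOL=tank          # shared-storage pool
    SSD=c1t1d0         # this node's internal SSD (same path on both nodes)

    # Forget whichever cache device the pool last used on the other node
    # (harmless if it is already gone), then add this node's local SSD.
    # Neither step touches pool data; only the read cache is affected.
    zpool remove "$POOL" "$SSD" 2>/dev/null
    zpool add -f "$POOL" cache "$SSD"

    # Leave a record of the resulting pool layout for the admin.
    zpool status "$POOL"

Whether such a hook is run automatically or the remove/add is done by hand after a failover is purely an operational choice; either way the pool imports and serves data without it.]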