Jamie Krier
2012-Dec-12 18:21 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
I've hit this bug on four of my Solaris 11 servers. I'm looking for anyone else who has seen it, as well as comments or speculation on the cause.

This bug is pretty bad. If you are lucky, you can import the pool read-only and migrate it elsewhere.

I've also tried setting zfs:zfs_recover=1,aok=1, with varying results.

http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc

Hardware platform:

Supermicro X8DAH
144GB RAM
Supermicro SAS2 JBODs
LSI 9200-8e controllers (Phase 13 fw)
ZeusRAM log
ZeusIOPS SAS L2ARC
Seagate ST33000650SS SAS drives

All four servers are running the same hardware, so at first I suspected a problem there. I opened a ticket with Oracle, which ended with this email:

---------------------------------------------------------------------------------
We strongly expect that this is a software issue because this problem does not happen on Solaris 10. On Solaris 11, it happens with both the SPARC and the X64 versions of Solaris.

We have quite a few customers who have seen this issue and we are in the process of working on a fix. Because we do not know the source of the problem yet, I cannot speculate on the time to fix. This particular portion of Solaris 11 (the virtual memory sub-system) is quite different than in Solaris 10. We re-wrote the memory management in order to get ready for systems with much more memory than Solaris 10 was designed to handle.

Because this is the memory management system, there is not expected to be any work-around.

Depending on your company's requirements, one possibility is to use Solaris 10 until this issue is resolved.

I apologize for any inconvenience that this bug may cause. We are working on it as a Sev 1 Priority 1 in sustaining engineering.
---------------------------------------------------------------------------------

I am thinking about switching to an Illumos distro, but I'm wondering whether this problem may be present there as well.

Thanks

- Jamie
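For readers following along, the two recovery approaches Jamie mentions can be sketched roughly as below. This is only an illustration: `import_readonly` is a hypothetical helper name, the pool name is a placeholder, and the `/etc/system` lines are shown as comments because they are a boot-time configuration fragment rather than commands.

```shell
# Sketch only; the function name is made up for illustration.
# Import a damaged pool read-only so nothing is written to it while
# the data is copied to another system.
import_readonly() {
    zpool import -o readonly=on "$1"
}

# The tunables Jamie mentions go in /etc/system and take effect after
# a reboot. They relax ZFS consistency checks and are meant for
# recovery attempts only:
#
#   set zfs:zfs_recover=1
#   set aok=1
```

A call would look like `import_readonly pool-name`. As the thread reports, results with the tunables varied, so the read-only import is the safer first attempt.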
Thomas Nau
2012-Dec-12 19:20 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Jamie,

We ran into the same problem and had to migrate the pool while it was imported read-only. On top of that, we were advised NOT to use an L2ARC. Maybe you should consider that as well.

Thomas
Tomas Forsman
2012-Dec-12 20:47 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
On 12 December, 2012 - Thomas Nau sent me these 7,3K bytes:

> Jamie
> We ran into the same and had to migrate the pool while imported
> read-only. On top we were advised to NOT use an L2ARC. Maybe you
> should consider that as well

We also ran into something similar, imported read-only and created a new pool.

A few months later, we ran into an L2ARC bug (15809921), for which we've received an IDR that we have not applied yet. This bug caused the following:

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x132c1f>

on a 3x3 mirrored pool (triple-mirroring); all 9 disks had checksum errors.

/Tomas
--
Tomas Forsman, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Bob Friesenhahn
2012-Dec-13 15:03 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
On Wed, 12 Dec 2012, Jamie Krier wrote:

> I am thinking about switching to an Illumos distro, but wondering if this problem may be present there
> as well.

I believe that Illumos was forked before this new virtual memory sub-system was added to Solaris. There have not been such reports on the Illumos or OpenIndiana mailing lists, and I don't recall seeing this issue in the bug trackers.

Illumos is not so good at dealing with huge memory systems, but perhaps it is also more stable.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Jamie Krier
2012-Dec-15 01:55 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
I have removed all L2ARC devices as a precaution. Has anyone seen this error with no L2ARC device configured?
Cindy Swearingen
2012-Dec-17 20:54 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Hi Jamie,

No doubt. This is a bad bug and we apologize.

The support response you quoted contains a misconception: that this bug is related to the VM2 project. It is not. It's related to a problem that was introduced in the ZFS ARC code.

If you would send me your SR number privately, we can work with the support person to correct this misconception.

We agree with Thomas's advice that you should remove separate cache devices to help alleviate this problem.

To summarize:

1. If you are running Solaris 11 or Solaris 11.1 and have separate cache devices, you should remove them to avoid this problem. When the SRU that fixes this problem is available, apply the SRU. Solaris 10 releases are not impacted.

2. A MOS knowledge article (1497293.1) is available to help diagnose this problem.

3. File a MOS SR to get access to the IDR.

4. We hope to have the SRU information available in a few days.

Thanks, Cindy
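Removing the separate cache devices, as advised above, might be scripted along these lines. This is a rough sketch, not an Oracle-provided procedure: the two function names are hypothetical, and the parsing assumes the usual `zpool status` layout in which cache devices are listed, indented, under a line reading `cache`.

```shell
# Print the devices listed under the "cache" heading of `zpool status`.
# Assumes the section ends at a blank line or at another section
# heading such as "logs", "spares" or "errors:".
list_cache_devices() {
    zpool status "$1" | awk '
        $1 == "cache" { in_cache = 1; next }
        in_cache && NF == 0 { in_cache = 0 }
        in_cache && ($1 == "logs" || $1 == "spares" || $1 == "errors:") { in_cache = 0 }
        in_cache { print $1 }'
}

# Remove every cache device found in the pool given as $1.
remove_cache_devices() {
    for dev in $(list_cache_devices "$1"); do
        zpool remove "$1" "$dev"
    done
}
```

Running `list_cache_devices pool-name` first and reviewing the output before calling `remove_cache_devices pool-name` is the cautious way to use this.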
sol
2012-Dec-18 12:27 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
From: Cindy Swearingen <cindy.swearingen at oracle.com>

> No doubt. This is a bad bug and we apologize.
> 1. If you are running Solaris 11 or Solaris 11.1 and have separate
> cache devices, you should remove them to avoid this problem.

How is the 7000-series storage appliance affected?

> 2. A MOS knowledge article (1497293.1) is available to help diagnose this problem.

MOS isn't able to find this article when I search for it.
Cindy Swearingen
2012-Dec-18 16:45 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Hi Sol,

The appliance is affected as well.

I apologize. The MOS article is for internal diagnostics.

I'll provide a set of steps to identify this problem as soon as I understand them better.

Thanks, Cindy
Cindy Swearingen
2012-Dec-19 21:23 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Hi Everyone,

I was mistaken. The ZFSSA is not impacted by this bug.

I provided a set of steps below to help identify this problem.

If you file an SR, an IDR can be applied. Otherwise, you will need to wait for the SRU.

Thanks, Cindy

If you are running S11 or S11.1 and you have a ZFS storage pool with separate cache devices, consider running these steps to identify whether your pool is impacted.

Until an IDR is applied or the SRU is available, remove the cache devices.

1. Export the pool. This step is necessary because zdb needs to be run on a quiet pool.

# zpool export pool-name

2. Run zdb to identify space map inconsistencies.

# zdb -emm pool-name

3. Based on running zdb, determine your next step:

A. If zdb completes successfully, scrub the pool.

# zpool import pool-name
# zpool scrub pool-name

If scrubbing the pool finds no issues, then your pool is most likely not impacted by this problem.

If scrubbing the pool finds permanent metadata errors, then you should open an SR.

B. If zdb doesn't complete successfully, open an SR.
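The steps above can be strung together roughly as follows. This is a sketch under assumptions, not part of Cindy's instructions: `check_pool` is a hypothetical function name, and the default log location under /var/tmp is chosen only for illustration. The commands themselves are the ones given in the steps.

```shell
# Sketch of the diagnostic steps as one function.
#   $1 = pool name, $2 = optional file for zdb output (assumed default).
# Returns 0 if zdb completed and a scrub was started, 1 if zdb did
# not complete (open an SR), 2 if a zpool command itself failed.
check_pool() {
    pool=$1
    log=${2:-/var/tmp/zdb-$pool.out}

    # Step 1: zdb needs a quiet pool, so export it first.
    zpool export "$pool" || return 2

    # Step 2: look for space map inconsistencies.
    if zdb -emm "$pool" > "$log" 2>&1; then
        # Step 3A: zdb completed, so re-import the pool and scrub it.
        zpool import "$pool" && zpool scrub "$pool" || return 2
        echo "zdb completed; scrub started on $pool"
    else
        # Step 3B: zdb failed; the pool is left exported.
        echo "zdb did not complete on $pool; open an SR (output in $log)"
        return 1
    fi
}
```

The scrub runs in the background, so its result still has to be checked afterwards, for example with `zpool status -v pool-name`, looking for permanent metadata errors.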
Josh Simon
2012-Dec-27 19:11 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Hi,

We were hit by this bug as well on Solaris 11 (2+ months ago). Our only options were to import the pool read-only and transfer the data off to another system, or to restore from backups.

Oracle told us that the bug is caused by a race condition within read/re-write operations on the same block. There is a small window of opportunity for the in-memory data (in the ARC) for an individual data block to become corrupt when it is re-written while a read request is in flight for the same block and the pass is greater than 1. Assuming the re-write operation completes first, the read operation overwrites the in-memory copy using the older/stale on-disk data (corruption). If the read completes before the re-write, no corruption is seen. A very specific set of circumstances is needed to reproduce the issue.

Metaslabs are more commonly affected because they are re-written within the same birthtime more frequently than any other object.

Solaris 11.1 has a new feature (ZIO Join) that allows multiple read requests for the same data block to issue just 1 IO instead of individual IOs for each request. The bug still exists in S11.1, but the new code reduces the window of opportunity for this bug to almost zero.

The complete bug fix has already been implemented in Solaris 12 and is currently being tested in Solaris 11.2 and S10u11. From there it will be put into an SRU for S11.1 (I assume S11.0 as well).

I followed up with Oracle today and was told that their investigation uncovered that a rewrite may inherit a previous copy of a metadata block cached in L2ARC. As soon as the rewritten block is evicted from the ARC, the next read will fetch a stale inherited copy from L2ARC.

So not using L2ARC or CACHE devices sounds like a good idea to me!

Hopefully this nasty bug is fixed soon :(

Thanks,

Josh Simon
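The ordering Josh describes can be illustrated with a toy model. This is not ZFS code: `disk`, `arc` and `read_buf` are plain shell variables standing in for the on-disk block, the cached copy and the data held by the in-flight read, and `simulate` is a made-up name. It only shows why the result depends on which operation completes last.

```shell
# Toy model of the read/re-write race; all names are illustrative.
simulate() {
    order=$1              # "rewrite-first" or "read-first"
    disk="old"            # on-disk contents before the re-write
    arc="old"             # cached copy before the re-write
    read_buf=$disk        # the in-flight read already holds the old data

    case $order in
        rewrite-first)
            disk="new"; arc="new"   # re-write completes
            arc=$read_buf           # late read overwrites the ARC copy
            ;;
        read-first)
            arc=$read_buf           # read completes with current data
            disk="new"; arc="new"   # re-write then updates both
            ;;
    esac
    echo "disk=$disk arc=$arc"
}

simulate rewrite-first   # prints "disk=new arc=old" (stale cached copy)
simulate read-first      # prints "disk=new arc=new" (no corruption)
```

In the first case the cache ends up holding data older than what is on disk, which matches the corruption described above.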
Andras Spitzer
2012-Dec-28 06:11 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Josh,

You mention that Oracle is preparing patches for both Solaris 11.2 and S10u11. Does that mean that the bug exists in Solaris 10 as well? I may be wrong, but Cindy mentioned the bug is only in Solaris 11.

Regards,
sendai
cindy swearingen
2012-Dec-30 18:42 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Existing Solaris 10 releases are not impacted. S10u11 isn't released yet, so I think we can assume that this upcoming Solaris 10 release will include a preventative fix.

Thanks, Cindy
Robert Milkowski
2013-Jan-04 19:12 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
> Illumos is not so good at dealing with huge memory systems, but perhaps
> it is also more stable.

Well, I guess that it depends on your environment, but generally I would expect S11 to be more stable, if only because of the sheer number of bugs reported by paying customers and bug fixes by Oracle that Illumos is not getting (lack of resources, limited usage, etc.).

--
Robert Milkowski
http://milek.blogspot.com
Richard Elling
2013-Jan-04 21:55 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
On Jan 4, 2013, at 11:12 AM, Robert Milkowski <rmilkowski at task.gda.pl> wrote:

>> Illumos is not so good at dealing with huge memory systems, but perhaps
>> it is also more stable.
>
> Well, I guess that it depends on your environment, but generally I would
> expect S11 to be more stable if only because the sheer amount of bugs
> reported by paid customers and bug fixes by Oracle that Illumos is not
> getting (lack of resource, limited usage, etc.).

This is a two-edged sword. Software reliability analysis shows that the most reliable software is the software that is oldest and unchanged. But people also want new functionality. So while Oracle has more changes being implemented in Solaris, it is destabilizing while simultaneously improving reliability. Unfortunately, it is hard to get both wins. What is more likely is that new features are being driven into Solaris 11 that are destabilizing. By contrast, the number of new features being added to illumos-gate (not to be confused with illumos-based distros) is relatively modest, and in all cases they are not gratuitous.

 -- richard

--
Richard.Elling at RichardElling.com
+1-760-896-4422
Jamie Krier
2013-Jan-12 02:10 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
It appears this bug has been fixed in Solaris 11.1 SRU 3.4:

7191375 15809921 SUNBT7191375 metadata rewrites should coordinate with l2arc

Cindy, can you confirm?

Thanks
Cindy Swearingen
2013-Jan-14 15:26 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Hi Jamie,

Yes, that is correct.

The S11u1 version of this bug is:

https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

and has this notation which means Solaris 11.1 SRU 3.4:

Changeset pushed to build 0.175.1.3.0.4.0

Thanks,

Cindy
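To check whether a system is at or past the build named above, the dotted build strings can be compared field by field. The helper below is a sketch; `branch_at_least` is a hypothetical name and is not part of any Solaris tool. How the installed build string is obtained is left to the reader. One assumption is that `pkg list -H entire` reports a version such as `0.5.11-0.175.1.3.0.4.0`, with the build after the hyphen.

```shell
# Return 0 (true) when dotted build string $1 is >= $2.
# Fields are compared numerically, left to right; missing fields
# count as 0.
branch_at_least() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        h=${h:-0}
        w=${w:-0}
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
        # Drop the field just compared; stop when no dot remains.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

if branch_at_least 0.175.1.2.0.3.0 0.175.1.3.0.4.0; then
    echo "fix build or later"
else
    echo "older than the fix build"
fi
# prints "older than the fix build"
```

The comparison is numeric per field, so a build such as 0.175.1.10.0.1.0 is correctly treated as newer than 0.175.1.3.0.4.0.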
Tomas Forsman
2013-Jan-14 19:48 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
On 14 January, 2013 - Cindy Swearingen sent me these 2,3K bytes:

> Hi Jamie,
>
> Yes, that is correct.
>
> The S11u1 version of this bug is:
>
> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

Host oraclecorp.com not found: 3(NXDOMAIN)

Would oracle.internal be a better domain name?

> and has this notation which means Solaris 11.1 SRU 3.4:
>
> Changeset pushed to build 0.175.1.3.0.4.0
>
> Thanks,
>
> Cindy

/Tomas
--
Tomas Forsman, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Ian Collins
2013-Jan-14 21:02 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Cindy Swearingen wrote:

> Hi Jamie,
>
> Yes, that is correct.
>
> The S11u1 version of this bug is:
>
> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599
>
> and has this notation which means Solaris 11.1 SRU 3.4:
>
> Changeset pushed to build 0.175.1.3.0.4.0

Hello Cindy,

I really really hope this will be a public update.

Within a week of upgrading to 11.1 I hit this bug and I had to rebuild my main pool. I'm still restoring backups.

Without this fix, 11.1 is a bomb waiting to go off!

-- 
Ian.
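For anyone matching installed package versions against this fix, here is a rough sketch of how the build string quoted above could be decoded into the release name. The only thing the thread confirms is that 0.175.1.3.0.4.0 means Solaris 11.1 SRU 3.4; the field names below (update, sru, reserved, sru_build, respin) and the helper names parse_branch and describe are my own labels for illustration, not an official API.

```python
from typing import NamedTuple


class BranchVersion(NamedTuple):
    """Fields of a Solaris 11 build string. The names are assumptions."""
    major: str       # "0.175", assumed here to denote Solaris 11
    update: int      # 1 in the quoted string, i.e. Solaris 11.1
    sru: int         # 3 in the quoted string, i.e. SRU 3
    reserved: int    # assumed unused
    sru_build: int   # 4 in the quoted string, i.e. the ".4" in SRU 3.4
    respin: int      # assumed to be a rebuild counter


def parse_branch(branch: str) -> BranchVersion:
    """Split a string such as '0.175.1.3.0.4.0' into its seven fields."""
    parts = branch.split(".")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dot-separated fields, got {len(parts)}")
    major = ".".join(parts[:2])
    update, sru, reserved, sru_build, respin = (int(p) for p in parts[2:])
    return BranchVersion(major, update, sru, reserved, sru_build, respin)


def describe(branch: str) -> str:
    """Render the build string the way it is written in this thread."""
    v = parse_branch(branch)
    return f"Solaris 11.{v.update} SRU {v.sru}.{v.sru_build}"


print(describe("0.175.1.3.0.4.0"))
```

Under that reading, a system at or beyond update 1, SRU 3, build 4 should carry the changeset, but check the actual SRU readme rather than relying on this sketch.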
Nico Williams
2013-Jan-14 21:02 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
On Mon, Jan 14, 2013 at 1:48 PM, Tomas Forsman <stric at acc.umu.se> wrote:

>> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599
>
> Host oraclecorp.com not found: 3(NXDOMAIN)
>
> Would oracle.internal be a better domain name?

Things like that cannot be changed easily. They (Oracle) are stuck with that domain name for the foreseeable future. Also, whoever thought it up probably didn't consider leakage of internal URIs to the outside. *shrug*
Cindy Swearingen
2013-Jan-14 21:12 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
I believe the bug.oraclecorp.com URL is accessible with a support contract, but it's difficult for me to test.

I should have mentioned it. I apologize.

cs

On 01/14/13 14:02, Nico Williams wrote:
> On Mon, Jan 14, 2013 at 1:48 PM, Tomas Forsman <stric at acc.umu.se> wrote:
>>> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599
>>
>> Host oraclecorp.com not found: 3(NXDOMAIN)
>>
>> Would oracle.internal be a better domain name?
>
> Things like that cannot be changed easily. They (Oracle) are stuck
> with that domain name for the foreseeable future. Also, whoever thought
> it up probably didn't consider leakage of internal URIs to the
> outside. *shrug*
Tomas Forsman
2013-Jan-14 21:28 UTC
[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
On 14 January, 2013 - Cindy Swearingen sent me these 1,0K bytes:

> I believe the bug.oraclecorp.com URL is accessible with a support
> contract, but it's difficult for me to test.

Support contract or not, the domain is not exposed to the internet.

> I should have mentioned it. I apologize.
>
> cs
>
> On 01/14/13 14:02, Nico Williams wrote:
>> On Mon, Jan 14, 2013 at 1:48 PM, Tomas Forsman <stric at acc.umu.se> wrote:
>>>> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599
>>>
>>> Host oraclecorp.com not found: 3(NXDOMAIN)
>>>
>>> Would oracle.internal be a better domain name?
>>
>> Things like that cannot be changed easily. They (Oracle) are stuck
>> with that domain name for the foreseeable future. Also, whoever thought
>> it up probably didn't consider leakage of internal URIs to the
>> outside. *shrug*

/Tomas
-- 
Tomas Forsman, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se