Ivan Rodriguez
2012-Jan-27 04:25 UTC
[zfs-discuss] zfs and iscsi performance help

Dear fellows,

We have a backup server with a 20 TB zpool; we transfer information to it using zfs snapshots every day (we have around 300 filesystems on that pool). The storage is a Dell MD3000i connected by iSCSI, and the pool is currently version 10. The same storage is connected to another server with a smaller 3 TB pool (also zpool version 10); that server is working fine and speed between the storage and the server is good. On the server with the 20 TB pool, however, performance is a problem: after we restart the server performance is good, but over time, say a week, it keeps dropping until we have to bounce the server again (the same behaviour occurs with the new version of Solaris, except that performance drops within 2 days). There are no errors in the logs, on the storage, or in zpool status -v.

We suspect the pool has some issues, probably corruption somewhere. We tested Solaris 10 8/11 with zpool version 29 (although we haven't upgraded the pool itself); with the new Solaris the performance is even worse, and every time we restart the server we get messages like this:

SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
DESC: All faults associated with an event id have been addressed.
Refer to http://sun.com/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.
[ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor

And we need to export and import the pool in order to be able to access it.

Now my question: do you know whether upgrading the pool fixes issues in the pool's metadata? We've been holding back the upgrade because we know that after the upgrade there is no way to return to version 10.

Has anybody experienced corruption in a pool without a hardware failure?
Are there any tools or procedures to find corruption in a pool, or in the filesystems inside it, besides scrub?

So far we have gone through the cables, ports and controllers between the storage and the server, and everything seems fine; we've also swapped network interfaces, cables, switch ports, etc.

Any ideas would be really appreciated.

Cheers
Ivan
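P.S. For completeness, the sort of checks we run after each reboot (a rough sketch; "backup" stands in for the real pool name):

  zpool status -v backup                        # device state and checksum error counters
  zpool scrub backup                            # full on-line verification of all data
  fmdump -eV | tail                             # recent FMA error telemetry
  zpool export backup && zpool import backup    # the workaround we currently need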
Hung-Sheng Tsao (laoTsao)
2012-Jan-27 11:03 UTC
[zfs-discuss] zfs and iscsi performance help
hi

IMHO, upgrade to S11 if possible and use the COMSTAR-based iSCSI.
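To check which iSCSI stack a box is actually running (a sketch; the service FMRIs are from memory and vary between releases):

  svcs -a | grep -i iscsi
  # S10 initiator:               svc:/network/iscsi_initiator:default
  # S11 initiator:               svc:/network/iscsi/initiator:default
  # COMSTAR target (if serving): svc:/network/iscsi/target:default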
Sent from my iPad

On Jan 26, 2012, at 23:25, Ivan Rodriguez <ivanoch at gmail.com> wrote:

> Dear fellows,
>
> We have a backup server with a 20 TB zpool [...]
> [remainder of the original message snipped]

On Fri, Jan 27, 2012 at 03:25:39PM +1100, Ivan Rodriguez wrote:
> We have a backup server with a 20 TB zpool [...] after we restart the
> server performance is good, but over time, say a week, it keeps
> dropping until we have to bounce the server again [...] There are no
> errors in the logs, on the storage, or in zpool status -v.

This sounds like a ZFS cache problem on the server. You might check on how the cache statistics change over time (one way to sample them is sketched below). Some tuning may eliminate this degradation. More memory may also help. Does a scrub show any errors? Does the performance drop affect reads, writes, or both?
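For example (a sketch; these are the stock ZFS arcstats kstat names, and the 10-second interval is arbitrary):

  # print ARC size and hit/miss counters every 10 seconds
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:hits zfs:0:arcstats:misses 10

If the hit rate decays over the week in step with the slowdown, the cache rather than the pool is the first suspect.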
> We tested Solaris 10 8/11 with zpool version 29 [...] every time we
> restart the server we get messages like this:
> [FMA "Resolved" messages snipped]
>
> And we need to export and import the pool in order to be able to access it.

This is a separate problem, introduced with an upgrade to the iSCSI service. The new one has a dependency on the name service (typically DNS), which means that it isn't available when the zpool import is done during boot. Check with Oracle support to see if they have found a solution.

-- 
-Gary Mills-        -refurb-        -Winnipeg, Manitoba, Canada-

Hi Ivan,

On Jan 26, 2012, at 8:25 PM, Ivan Rodriguez wrote:
> We suspect the pool has some issues, probably corruption somewhere. We
> tested Solaris 10 8/11 with zpool version 29 (although we haven't
> upgraded the pool itself); with the new Solaris the performance is
> even worse [...]

If you upgrade to zpool version 29 or later, then you will be tied to the lawnmower (Oracle) forever. Several changes related to snapshot performance were introduced in version 28 and earlier.

> every time we restart the server we get messages like this:
> [FMA "Resolved" messages snipped]
>
> And we need to export and import the pool in order to be able to access it.

The MD3000i systems that I have used have an irritating behavior when the LUNs are scanned (eg during zpool import). There is an out-of-band systems-management LUN that takes up to 1 minute to respond to a SCSI inquiry. During a zpool import, Solaris tries to inquire of each of the LUNs to see if they contain pool parts. Depending on the various timeout values set in the iSCSI client stack, this can be painful. I am not aware of a workaround or bug fix on the Dell side, and the Dell docs just say "don't use that LUN".
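A slow responder like that is easy to spot with plain iostat while the pool is busy (a sketch; nothing MD3000i-specific):

  # extended per-device statistics, skipping idle devices, 10-second samples
  iostat -xnz 10

A LUN whose asvc_t sits far above its neighbours' is the one to chase.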
> Now my question: do you know whether upgrading the pool fixes issues
> in the pool's metadata? We've been holding back the upgrade because we
> know that after the upgrade there is no way to return to version 10.

To remain more flexible, avoid zpool version 29 or later.

> Has anybody experienced corruption in a pool without a hardware failure?

Yes, but I don't think that is your current problem.

> Are there any tools or procedures to find corruption in a pool, or in
> the filesystems inside it, besides scrub?

scrub is the method.

> So far we have gone through the cables, ports and controllers between
> the storage and the server [...]
>
> Any ideas would be really appreciated.

HTH,
 -- richard

-- 
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422