Ivan Rodriguez
2012-Jan-27 04:25 UTC
[zfs-discuss] zfs and iscsi performance help

Dear fellows,

We have a backup server with a 20 TB zpool; we transfer information to it using zfs snapshots every day (we have around 300 filesystems on that pool). The storage is a Dell MD3000i connected by iSCSI, and the pool is currently version 10. The same storage is connected to another server with a smaller 3 TB pool (also zpool version 10); that server is working fine and speed between the storage and the server is good. On the server with the 20 TB pool, however, performance is a problem: after we restart the server performance is good, but over time, say a week, it keeps dropping until we have to bounce the server again (the same behaviour occurs with the new version of Solaris, except that performance drops within 2 days). There are no errors in the logs, on the storage, or in zpool status -v.

We suspect the pool has some issues, probably corruption somewhere. We tested Solaris 10 8/11 with zpool version 29 (although we haven't upgraded the pool itself); with the new Solaris the performance is even worse, and every time we restart the server we get messages like this:

SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
DESC: All faults associated with an event id have been addressed.
Refer to http://sun.com/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.
[ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor

And we need to export and import the pool in order to be able to access it.

Now my question: do you know whether upgrading the pool fixes issues in the pool's metadata? We've been holding back the upgrade because we know that after the upgrade there is no way to return to version 10.

Has anybody experienced corruption in a pool without a hardware failure?
Are there any tools or procedures to find corruption in a pool, or in the filesystems inside it, besides scrub?

So far we have gone through the cables, ports and controllers between the storage and the server, and everything seems fine; we've also swapped network interfaces, cables, switch ports, etc.

Any ideas would be really appreciated.

Cheers
Ivan
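P.S. For completeness, the sort of checks we run after each reboot (a rough sketch; "backup" stands in for the real pool name):

  zpool status -v backup                        # device state and checksum error counters
  zpool scrub backup                            # full on-line verification of all data
  fmdump -eV | tail                             # recent FMA error telemetry
  zpool export backup && zpool import backup    # the workaround we currently need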
Hung-Sheng Tsao (laoTsao)
2012-Jan-27 11:03 UTC
[zfs-discuss] zfs and iscsi performance help
hi

IMHO, upgrade to S11 if possible and use the COMSTAR-based iSCSI.
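To check which iSCSI stack a box is actually running (a sketch; the service FMRIs are from memory and vary between releases):

  svcs -a | grep -i iscsi
  # S10 initiator:               svc:/network/iscsi_initiator:default
  # S11 initiator:               svc:/network/iscsi/initiator:default
  # COMSTAR target (if serving): svc:/network/iscsi/target:default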
Sent from my iPad

On Jan 26, 2012, at 23:25, Ivan Rodriguez <ivanoch at gmail.com> wrote:

> Dear fellows,
>
> We have a backup server with a 20 TB zpool [...]
> [remainder of the original message snipped]

On Fri, Jan 27, 2012 at 03:25:39PM +1100, Ivan Rodriguez wrote:
> We have a backup server with a 20 TB zpool [...] after we restart the
> server performance is good, but over time, say a week, it keeps
> dropping until we have to bounce the server again [...] There are no
> errors in the logs, on the storage, or in zpool status -v.

This sounds like a ZFS cache problem on the server. You might check on how the cache statistics change over time (one way to sample them is sketched below). Some tuning may eliminate this degradation. More memory may also help. Does a scrub show any errors? Does the performance drop affect reads, writes, or both?
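For example (a sketch; these are the stock ZFS arcstats kstat names, and the 10-second interval is arbitrary):

  # print ARC size and hit/miss counters every 10 seconds
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:hits zfs:0:arcstats:misses 10

If the hit rate decays over the week in step with the slowdown, the cache rather than the pool is the first suspect.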
> We tested Solaris 10 8/11 with zpool version 29 [...] every time we
> restart the server we get messages like this:
> [FMA "Resolved" messages snipped]
>
> And we need to export and import the pool in order to be able to access it.

This is a separate problem, introduced with an upgrade to the iSCSI service. The new one has a dependency on the name service (typically DNS), which means that it isn't available when the zpool import is done during boot. Check with Oracle support to see if they have found a solution.

-- 
-Gary Mills-        -refurb-        -Winnipeg, Manitoba, Canada-

Hi Ivan,

On Jan 26, 2012, at 8:25 PM, Ivan Rodriguez wrote:
> We suspect the pool has some issues, probably corruption somewhere. We
> tested Solaris 10 8/11 with zpool version 29 (although we haven't
> upgraded the pool itself); with the new Solaris the performance is
> even worse [...]

If you upgrade to zpool version 29 or later, then you will be tied to the lawnmower (Oracle) forever. Several changes related to snapshot performance were introduced in version 28 and earlier.

> every time we restart the server we get messages like this:
> [FMA "Resolved" messages snipped]
>
> And we need to export and import the pool in order to be able to access it.

The MD3000i systems that I have used have an irritating behavior when the LUNs are scanned (eg during zpool import). There is an out-of-band systems-management LUN that takes up to 1 minute to respond to a SCSI inquiry. During a zpool import, Solaris tries to inquire of each of the LUNs to see if they contain pool parts. Depending on the various timeout values set in the iSCSI client stack, this can be painful. I am not aware of a workaround or bug fix on the Dell side, and the Dell docs just say "don't use that LUN".
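A slow responder like that is easy to spot with plain iostat while the pool is busy (a sketch; nothing MD3000i-specific):

  # extended per-device statistics, skipping idle devices, 10-second samples
  iostat -xnz 10

A LUN whose asvc_t sits far above its neighbours' is the one to chase.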
> Now my question: do you know whether upgrading the pool fixes issues
> in the pool's metadata? We've been holding back the upgrade because we
> know that after the upgrade there is no way to return to version 10.

To remain more flexible, avoid zpool version 29 or later.

> Has anybody experienced corruption in a pool without a hardware failure?

Yes, but I don't think that is your current problem.

> Are there any tools or procedures to find corruption in a pool, or in
> the filesystems inside it, besides scrub?

scrub is the method.

> So far we have gone through the cables, ports and controllers between
> the storage and the server [...]
>
> Any ideas would be really appreciated.

HTH,
 -- richard

-- 
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422