Hi all,

I'm new here; I've only been using ZFS for about half a year. We are running Nexenta in a corporate pilot environment. These days, while trying to move around 4TB of data from an old pool (4*2TB raidz) to a new pool (11*2TB raidz2), the transfer never seems to finish successfully:

1. I tried cp first. The initial speed is great at 250MB/s, but it suddenly drops to 400KB/s once around 400-600GB has been copied. I've tried several times.
2. rsync. Speed crawls at around 15MB/s. It does finish, but performance seems degraded even after rsync is done.
3. zfs send | recv. Speed is close to cp, but it still suddenly drops to a few hundred KB/s after around 1.5TB has been migrated.
4. cp to a remote server over a 1Gb NIC. This finishes, but I can see a significant system slowdown.

What is the root cause of this? Is the memory too low? All my ZFS systems have 12GB of memory; I'm considering adding at least 12GB more to each of them.

thanks
fei
On 09/ 9/10 01:14 PM, Fei Xu wrote:

> I'm new here; I've only been using ZFS for about half a year. We are running Nexenta in a corporate pilot environment. These days, while trying to move around 4TB of data from an old pool (4*2TB raidz) to a new pool (11*2TB raidz2), the transfer never seems to finish successfully:
> 1. I tried cp first. The initial speed is great at 250MB/s, but it suddenly drops to 400KB/s once around 400-600GB has been copied. I've tried several times.

11*2TB raidz2 is a bad idea. The stripe is too wide, so performance will suffer.

> 2. rsync. Speed crawls at around 15MB/s. It does finish, but performance seems degraded even after rsync is done.
> 3. zfs send | recv. Speed is close to cp, but it still suddenly drops to a few hundred KB/s after around 1.5TB has been migrated.
> 4. cp to a remote server over a 1Gb NIC. This finishes, but I can see a significant system slowdown.
>
> What is the root cause of this? Is the memory too low? All my ZFS systems have 12GB of memory; I'm considering adding at least 12GB more to each of them.

Run zpool iostat -v on the receiver when things slow down and post the result. I moved (send/receive) 2TB of data across a LAN last week and there wasn't any variation in speed.

-- Ian.
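For example, to watch the receive side at a ten-second interval while the copy runs (the pool name here stands in for the destination pool named later in the thread; the interval is arbitrary):

    zpool iostat -v sh001a 10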
Thank you Ian. I've rebuilt the pool as 9*2TB raidz2 and started the zfs send; results should come out in about 3 hours.

thanks
fei
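For reference, a migration like this is typically driven by a recursive snapshot piped between the pools. A minimal sketch, using the pool names that appear later in the thread; the snapshot name and receive flags are illustrative, not necessarily what was actually run:

    # snapshot the source, then replicate the whole tree into the new pool
    zfs snapshot -r vol1@migrate
    zfs send -R vol1@migrate | zfs recv -Fdu sh001a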
Now it gets extremely slow at around 400G sent.

The first iostat result below was captured when the send operation started:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sh001a      37.6G  16.2T      0  1.17K     82   146M
  raidz2    37.6G  16.2T      0  1.17K     82   146M
    c0t10d0     -      -      0    201    974  21.0M
    c0t11d0     -      -      0    201    974  21.1M
    c0t23d0     -      -      0    201  1.56K  21.0M
    c0t24d0     -      -      0    201  1.26K  21.0M
    c0t25d0     -      -      0    201    662  21.1M
    c0t26d0     -      -      0    201  1.26K  21.1M
    c0t2d0      -      -      0    202    974  21.1M
    c0t5d0      -      -      0    200    662  20.9M
    c0t6d0      -      -      0    200  1.26K  21.0M
----------  -----  -----  -----  -----  -----  -----
syspool     10.6G   137G     11     13   668K   137K
  c3d0s0    10.6G   137G     11     13   668K   137K
----------  -----  -----  -----  -----  -----  -----
vol1        5.40T  1.85T    621      5  76.9M  12.4K
  raidz1    5.40T  1.85T    621      5  76.9M  12.4K
    c0t22d0     -      -    280      3  19.5M  14.2K
    c0t3d0      -      -    279      3  19.5M  13.9K
    c0t20d0     -      -    280      3  19.5M  14.2K
    c0t21d0     -      -    280      3  19.5M  13.9K
----------  -----  -----  -----  -----  -----  -----

The result below is from when the zfs send got stuck at 397G. The disk I/O looks quite normal, so where is the data going? Note that iostat itself responded very slowly.

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sh001a       397G  15.9T      0  1.08K    490   136M
  raidz2     397G  15.9T      0  1.08K    490   136M
    c0t10d0     -      -      0    185  1.68K  19.4M
    c0t11d0     -      -      0    185  1.71K  19.4M
    c0t23d0     -      -      0    185  1.99K  19.4M
    c0t24d0     -      -      0    185  1.79K  19.4M
    c0t25d0     -      -      0    185  2.10K  19.4M
    c0t26d0     -      -      0    185  2.07K  19.4M
    c0t2d0      -      -      0    185  1.99K  19.4M
    c0t5d0      -      -      0    185  2.12K  19.4M
    c0t6d0      -      -      0    185  2.23K  19.4M
----------  -----  -----  -----  -----  -----  -----
syspool     10.6G   137G      2      6   131K  48.0K
  c3d0s0    10.6G   137G      2      6   131K  48.0K
----------  -----  -----  -----  -----  -----  -----
vol1        5.40T  1.85T   1009      1   125M  2.85K
  raidz1    5.40T  1.85T   1009      1   125M  2.85K
    c0t22d0     -      -    453      0  31.6M  2.64K
    c0t3d0      -      -    452      0  31.6M  2.58K
    c0t20d0     -      -    453      0  31.6M  2.64K
    c0t21d0     -      -    453      0  31.6M  2.56K
----------  -----  -----  -----  -----  -----  -----
On 09/ 9/10 02:42 PM, Fei Xu wrote:

> Now it gets extremely slow at around 400G sent.
>
> The first iostat result below was captured when the send operation started:
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> sh001a      37.6G  16.2T      0  1.17K     82   146M
>   raidz2    37.6G  16.2T      0  1.17K     82   146M

<snip>

> The result below is from when the zfs send got stuck at 397G. The disk I/O looks quite normal, so where is the data going? Note that iostat itself responded very slowly.
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> sh001a       397G  15.9T      0  1.08K    490   136M
>   raidz2     397G  15.9T      0  1.08K    490   136M

Have you got dedup enabled? Note the read bandwidth is much higher.

-- Ian.
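For the record, the property is easy to confirm directly on the pools in question:

    zfs get dedup sh001a vol1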
> Have you got dedup enabled? Note the read bandwidth is much higher.
>
> -- Ian.

No, dedup is not enabled, since it's still not stable enough even for a test environment.

Here is a JPG of the read/write indicator. The RED line is read and the GREEN line is write. You can see that because the destination pool has more disks, and therefore more throughput capacity, most of the time it is just waiting on reads from the source pool.

http://bbs.manmi.com/UploadFile/2010-9/20109911304144848.jpg
I dug deeper into it and might have found some useful information. I attached an X25 SSD as ZIL to see if it helps, but no luck. I ran iostat -xnz for more details and got the interesting results below (maybe too long). Some explanation:

1. c2d0 is the SSD for the ZIL.
2. c0t3d0, c0t20d0, c0t21d0, c0t22d0 make up the source pool.
3. The first result matches zpool iostat -v. I ran -xnz several times, and the first result was always good.
4. Then, 30 seconds later, comes the second block, and then the 3rd and 4th. You can see my source pool disks are extremely slow and busy, one by one. asvc_t is high, %b is high. That's abnormal.
5. I think it might be because those 4 drives are older WD Green disks with the head park/unpark "defect" and the TLER behavior; the other twelve 2TB disks are the latest revision, which already has the fix.
6. Let me try the test on 1TB disks and see if there is any issue.

root@sh1slcs001:/volumes# iostat -xnz 30
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.4    0.0     3.0      0.3   0.0   0.0     0.0      0.2   0   0  c2d0
    1.4    8.9    57.8     40.3   0.0   0.1     2.5      5.8   1   5  c3d0
    0.0    0.0     0.0      0.0   0.0   0.0     0.0    986.5   0   0  fd0
    2.8  154.8   141.5  16517.7   0.0   2.4     0.1     15.0   2  25  c0t2d0
  408.4    0.3 29140.4      0.9   0.0   4.1     0.1     10.1   2  55  c0t3d0
    2.8  154.9   136.7  16519.6   0.0   2.4     0.1     14.9   2  25  c0t5d0
    2.8  155.0   139.4  16520.9   0.0   2.4     0.1     15.0   2  26  c0t6d0
    0.0    0.0     0.5      0.0   0.0   0.0     0.0      0.1   0   0  c0t7d0
    0.0    0.0     0.5      0.0   0.0   0.0     0.0      3.4   0   0  c0t8d0
    0.0    0.0     0.4      0.0   0.0   0.0     0.0      0.1   0   0  c0t9d0
    2.8  154.9   139.2  16523.9   0.0   2.4     0.1     15.2   2  26  c0t10d0
    2.8  155.0   136.1  16525.4   0.0   2.5     0.1     15.7   2  26  c0t11d0
  409.9    0.3 29152.1      0.9   0.0   3.9     0.1      9.4   2  52  c0t20d0
  409.5    0.3 29152.0      0.9   0.0   3.9     0.1      9.6   2  52  c0t21d0
  409.2    0.3 29155.7      0.9   0.0   4.1     0.1     10.0   2  55  c0t22d0
    2.8  155.0   138.6  16527.6   0.0   2.4     0.1     15.2   2  26  c0t23d0
    2.9  154.9   138.0  16528.8   0.0   2.4     0.1     15.1   2  26  c0t24d0
    2.9  155.0   136.1  16530.2   0.0   2.5     0.1     15.8   2  27  c0t25d0
    2.8  155.1   139.7  16531.6   0.0   2.4     0.1     15.5   2  26  c0t26d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.3    0.0     1.2      0.0   0.0   0.0     0.0      0.1   0   0  c2d0
    0.1   17.7     0.1     51.7   0.0   0.1     0.2      4.1   0   7  c3d0
    0.1    2.1     0.0     79.8   0.0   0.0     0.1      4.0   0   0  c0t2d0
    0.2    0.0     7.1      0.0   0.1   2.3   278.5  11365.1   1  46  c0t3d0
    0.1    2.2     0.0     79.9   0.0   0.0     0.1      3.7   0   0  c0t5d0
    0.1    2.3     0.0     80.0   0.0   0.0     0.1      9.2   0   0  c0t6d0
    0.1    2.5     0.0     80.1   0.0   0.0     0.1      3.8   0   0  c0t10d0
    0.1    2.4     0.0     80.0   0.0   0.0     0.1      9.5   0   0  c0t11d0
    1.9    0.0   133.0      0.0   0.1   2.8    60.2   1520.6   2  51  c0t20d0
    1.6    0.0   110.9      0.0   0.0   0.0     0.0      7.6   0   0  c0t21d0
    1.6    0.0   109.5      0.0   0.0   0.0     0.0     11.5   0   0  c0t22d0
    0.1    2.4     0.0     80.0   0.0   0.0     0.1      8.6   0   0  c0t23d0
    0.1    2.6     0.0     80.0   0.0   0.0     0.1      2.9   0   0  c0t24d0
    0.1    2.4     0.0     79.9   0.0   0.0     0.1      6.2   0   0  c0t25d0
    0.1    2.3     0.0     79.9   0.0   0.0     0.1      7.8   0   0  c0t26d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.1    0.0     0.6      0.0   0.0   0.0     0.0      0.1   0   0  c2d0
    0.0   21.3     0.0     82.3   0.0   0.1     0.4      4.0   0   8  c3d0
    0.0    2.1     0.0     68.2   0.0   0.0     0.2      3.8   0   0  c0t2d0
    0.7    0.0    39.1      0.0   0.0   0.6    64.0    884.1   1  10  c0t3d0
    0.0    2.2     0.0     68.2   0.0   0.0     0.1      3.5   0   0  c0t5d0
    0.0    2.3     0.0     68.3   0.0   0.0     0.1      3.5   0   0  c0t6d0
    0.0    2.3     0.0     68.3   0.0   0.0     0.1      3.2   0   0  c0t10d0
    0.0    2.2     0.0     68.3   0.0   0.0     0.1      3.7   0   0  c0t11d0
    2.0    0.0   133.7      0.0   0.0   0.0     0.0      9.4   0   0  c0t20d0
    2.1    0.0   135.8      0.0   0.1   5.2    67.8   2498.1   3  88  c0t21d0
    2.8    0.0   134.4      0.0   0.0   0.0     0.0      3.2   0   0  c0t22d0
    0.0    2.1     0.0     68.2   0.0   0.0     0.1      3.6   0   0  c0t23d0
    0.0    2.4     0.0     68.4   0.0   0.0     0.1      3.1   0   0  c0t24d0
    0.0    2.4     0.0     68.4   0.0   0.0     0.1      3.4   0   0  c0t25d0
    0.0    2.2     0.0     68.3   0.0   0.0     0.1      3.3   0   0  c0t26d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.1    0.0     0.6      0.0   0.0   0.0     0.0      0.1   0   0  c2d0
    0.1   16.5     0.1     44.5   0.0   0.1     0.2      4.8   0   7  c3d0
    0.0    2.3     0.0     73.8   0.0   0.0     0.1      3.8   0   0  c0t2d0
    3.5    0.0   246.8      0.0   0.0   0.8     6.3    229.8   1  20  c0t3d0
    0.0    2.0     0.0     73.7   0.0   0.1     0.1     48.3   0   2  c0t5d0
    0.0    2.2     0.0     73.8   0.0   0.0     0.1      5.4   0   0  c0t6d0
    0.0    2.4     0.0     74.0   0.0   0.0     0.1      5.4   0   0  c0t10d0
    0.0    2.5     0.0     74.0   0.0   0.0     0.1      4.7   0   0  c0t11d0
    3.1    0.0   136.6      0.0   0.0   0.0     0.0      1.4   0   0  c0t20d0
    0.7    0.0    29.2      0.0   0.0   0.6     0.0    911.0   0  12  c0t21d0
    1.9    0.0   138.7      0.0   0.1   4.7    73.0   2428.6   2  66  c0t22d0
    0.0    2.2     0.0     73.9   0.0   0.0     0.1      5.4   0   0  c0t23d0
    0.0    2.3     0.0     73.9   0.0   0.1     0.1     27.7   0   2  c0t24d0
    0.0    2.4     0.0     74.0   0.0   0.0     0.1      4.7   0   0  c0t25d0
    0.0    2.3     0.0     73.9   0.0   0.0     0.1      7.3   0   0  c0t26d0
On 08 September, 2010 - Fei Xu sent me these 5,9K bytes:

> I dug deeper into it and might have found some useful information.
> I attached an X25 SSD as ZIL to see if it helps, but no luck.
> I ran iostat -xnz for more details and got the interesting results below. Some explanation:
> 1. c2d0 is the SSD for the ZIL.
> 2. c0t3d0, c0t20d0, c0t21d0, c0t22d0 make up the source pool.

...

>     r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
>     0.2    0.0     7.1      0.0   0.1   2.3   278.5  11365.1   1  46  c0t3d0

Service time here is crap. 11 seconds to reply.

>     1.9    0.0   133.0      0.0   0.1   2.8    60.2   1520.6   2  51  c0t20d0

1.5 seconds to reply. Crap.

...

>     0.7    0.0    39.1      0.0   0.0   0.6    64.0    884.1   1  10  c0t3d0
>     2.1    0.0   135.8      0.0   0.1   5.2    67.8   2498.1   3  88  c0t21d0

...

>     3.5    0.0   246.8      0.0   0.0   0.8     6.3    229.8   1  20  c0t3d0
>     0.7    0.0    29.2      0.0   0.0   0.6     0.0    911.0   0  12  c0t21d0
>     1.9    0.0   138.7      0.0   0.1   4.7    73.0   2428.6   2  66  c0t22d0

Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. The disk is probably having a hard time reading the data or something.

/Tomas
--
Tomas Ögren, stric@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
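To catch these outliers without eyeballing every 30-second block, something like this one-liner works (a sketch: field 8 of iostat -xnz output is asvc_t, and the 100 ms threshold is arbitrary):

    # print any device whose average service time exceeds 100 ms
    iostat -xnz 30 | awk '$8+0 > 100 {print $11, $8 " ms"}'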
> Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. The disk is probably having a hard time reading the data or something.

Yeah, that should not go over 15ms. I just cannot understand why it starts OK with hundreds of GB transferred and then suddenly falls "asleep". By the way, the WDIDLE timer is already disabled, which might otherwise cause some issues. I've switched to another system to test zfs send between an 8*1TB pool and a 4*1TB pool; hope everything is OK in that case.
On Sep 9, 2010, at 8:27 AM, Fei Xu <twinsenxu@hotmail.com> wrote:

> Yeah, that should not go over 15ms. I just cannot understand why it starts OK with hundreds of GB transferred and then suddenly falls "asleep".
> By the way, the WDIDLE timer is already disabled, which might otherwise cause some issues. I've switched to another system to test zfs send between an 8*1TB pool and a 4*1TB pool; hope everything is OK in that case.

This might be the dreaded WD TLER issue. Basically, after a bit error the drive keeps retrying the read operation over and over, trying to recover from the read error itself. With ZFS one really needs to disable this behavior and have the drive fail immediately.

Check your drives to see if they have this feature. If so, think about replacing the drives in the source pool that have long service times, and make sure this feature is disabled on the destination pool drives.

-Ross
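If smartmontools is installed, the drive's error-recovery timeout can be queried (and, on drives that support SCT ERC, set) from the command line. A sketch, with an illustrative device path whose exact form depends on the platform:

    # query the current read/write error-recovery timeouts
    smartctl -l scterc /dev/rdsk/c0t3d0
    # cap both at 7.0 seconds (units are tenths of a second)
    smartctl -l scterc,70,70 /dev/rdsk/c0t3d0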
On Sep 9, 2010, at 8:27 AM, Fei Xu <twinsenxu@hotmail.com> wrote:

> This might be the dreaded WD TLER issue. Basically, after a bit error the drive keeps retrying the read operation over and over, trying to recover from the read error itself. With ZFS one really needs to disable this behavior and have the drive fail immediately.
>
> Check your drives to see if they have this feature. If so, think about replacing the drives in the source pool that have long service times, and make sure this feature is disabled on the destination pool drives.
>
> -Ross

It might be due to TLER issues, but I'd try pinning the Greens down to SATA1 mode (use the jumper, or force it via the controller). It might help a bit with these disks, although they are not really suitable for use in any RAID configuration due to the TLER issue, which cannot be disabled in later firmware versions.

Yours
Markus Kovero
On Thu, 9 Sep 2010 14:05:51 +0000, Markus Kovero <Markus.Kovero@nebula.fi> wrote:

> It might be due to TLER issues, but I'd try pinning the Greens down to SATA1 mode (use the jumper, or force it via the controller). It might help a bit with these disks, although they are not really suitable for use in any RAID configuration due to the TLER issue, which cannot be disabled in later firmware versions.

Just to clarify: do you mean TLER should be off or on? TLER = Time Limited Error Recovery, so the drive only takes a maximum time (e.g. 7 seconds) to retrieve data before returning an error. So when you say "cannot be disabled", I think you mean "cannot be ENABLED"?

I've been doing a lot of research for a new storage box at work, and from reading the info available in the Storage forum on hardforum.com, the experts there seem to recommend NOT having TLER enabled when using ZFS, as ZFS can be configured for its own timeouts, etc. The main reason to use TLER is with hardware RAID cards, which will kick a drive out of the array if it takes longer than 10 seconds.

Can anyone else here comment if they have had experience with the WD drives and ZFS, and whether they have TLER enabled or disabled?

Cheers,
Mark
>>>>> "ml" == Mark Little <marklittle@koallo.com> writes:

    ml> Just to clarify: do you mean TLER should be off or on?

It should be set to "do not have asvc_t of 11 seconds and <1 io/s"... which is not one of the settings of the TLER knob.

This isn't a problem with the TLER *setting*. TLER does not even apply unless the drive has a latent sector error. TLER does not even apply unless the drive has a latent sector error. TLER does not even apply unless the drive has a latent sector error. GOT IT?

So if the drive is not defective, but is erratically showing huge latency when not busy, this isn't a TLER problem. It's a drive-is-an-unpredictable-piece-of-junk problem. Will the problem go away if you change the TLER setting to the opposite of whatever it is? Who knows?! It shouldn't, based on the claimed purpose of TLER, but in reality, maybe, maybe not, because the drive shouldn't ("shouldn't", haha) act like that to begin with. The problem will be more likely to go away if you replace the drive with a different model, though.

    ml> Storage forum on hardforum.com, the experts there seem to
    ml> recommend NOT having TLER enabled when using ZFS as ZFS can be
    ml> configured for its timeouts, etc,

I don't believe there are any configurable timeouts in ZFS. The ZFS developers take the position that timeouts are not our problem and push all that work down the stack to the controller driver and the disk driver, which cooperate (this is two drivers now, plus perhaps a third "SCSI mid-layer", for some controllers but not others) to implement a variety of inconsistent, silly, undocumented, cargo-cult flailing timeout regimes that we all have to put up with. However, they are always quite long. The ATA max timeout is 30 sec, and AIUI they are all much longer than that.

My new favorite thing, though, is the reference counting. OS: "This disk/iSCSI disk is 'busy' so you can't detach it." Me: "Bullshit. YOINK, detached, now deal with it." IMO this area is in need of some serious bar-raising.

    ml> and the main reason to use TLER is when using those drives
    ml> with hardware RAID cards which will kick a drive out of the
    ml> array if it takes longer than 10 seconds.

Yup, which is something the drive will not do unless it encounters an ERROR. That is the E in TLER. In other words, the feature as described prevents you from noticing and invoking warranty replacement on your about-to-fail drive. For this you pay double. Have I got that right?

In any case, the obvious proper place to fix this is in the RAID-on-a-card firmware, not the disk firmware, if it even needs fixing, which is unclear to me. Unless the disk manufacturers are going to offer a feature "do not spend more than 1 second out of every 2 seconds 'trying harder' to read marginal data, just return errors", which would actually have real value, the only reason TLER exists is that it can convince all you gamers to pay twice as much for a drive because they've flipped a single bit in the firmware and then shovelled a big pile of bullshit into your heads.

    ml> Can anyone else here comment if they have had experience with
    ml> the WD drives and ZFS and if they have TLER enabled or
    ml> disabled?

I do not have any problems with drives dropping out of ZFS using the normal TLER setting. I do have problems with slowly-failing drives fucking up the whole system. ZFS doesn't deal with them gracefully, and I have to find the bad drive and remove it by hand.
All this stuff about spares automatically replacing drives while users never notice is largely a fantasy. Neither observation leads me to want TLER.

However, observations like this "why did my disks suddenly slow down?" lead me to avoid WD drives, period, for ZFS or not ZFS or anything at all. Whipping up all this marketing silliness around TLER also leads me to avoid them, because I know they will shovel bullshit and FUD to justify jacked prices.
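For what it's worth, the timeout that actually governs this on Solaris-derived systems lives in the sd disk driver, not in ZFS, and it is tunable. A sketch of the /etc/system entry, assuming the disks attach through sd; the default is 60 seconds, a reboot is required, and values this low carry real risk:

    * /etc/system: lower the sd per-command timeout from its 60 s default
    set sd:sd_io_time = 10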
Just to update the status and findings.

I've checked the TLER settings, and they are off by default.

I moved the source pool to another chassis and ran the 3.8TB send again. This time, no problems at all! The differences are:
1. New chassis.
2. Bigger memory: 32GB vs. 12GB.
3. Although the WDIDLE timer is disabled by default, I changed the HD mode from silent to performance in HDTune. I once read on some website that this might also fix the disk head park/unpark issue (aka C1).

So TLER does not seem to be the root cause, or at least having it off is OK.

My next steps will be:
1. Move the HDs back to see if it was the "performance mode" change that fixed the issue.
2. If not, add more memory and try again.

By the way, in HDTune I saw that C7 (Ultra DMA CRC error count) is a little high, which indicates a potential connection issue. Maybe it is all caused by the enclosure?
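That counter can also be read outside HDTune: smartmontools reports it as SMART attribute 199. A sketch, with an illustrative device path:

    # a raw value of attribute 199 that keeps climbing usually points
    # at cabling or the backplane rather than the disk itself
    smartctl -A /dev/rdsk/c0t3d0 | grep -i crc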
On 2010-Sep-11 00:52 UTC, Richard Elling wrote, re: [zfs-discuss] performance leakage when copy huge data:
On Sep 9, 2010, at 5:55 PM, Fei Xu wrote:

> Just to update the status and findings.

Thanks for the update.

> I've checked the TLER settings, and they are off by default.
>
> I moved the source pool to another chassis and ran the 3.8TB send again. This time, no problems at all! The differences are:
> 1. New chassis.

Can you describe the old and new chassis in detail? Model numbers?

> 2. Bigger memory: 32GB vs. 12GB.

It is not a memory issue.

> 3. Although the WDIDLE timer is disabled by default, I changed the HD mode from silent to performance in HDTune. I once read on some website that this might also fix the disk head park/unpark issue (aka C1).

Not a bad idea.

> So TLER does not seem to be the root cause, or at least having it off is OK.

Definitely not a TLER issue.

> My next steps will be:
> 1. Move the HDs back to see if it was the "performance mode" change that fixed the issue.
> 2. If not, add more memory and try again.

It is not a memory issue.

> By the way, in HDTune I saw that C7 (Ultra DMA CRC error count) is a little high, which indicates a potential connection issue. Maybe it is all caused by the enclosure?

Bingo!
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
Richard Elling
richard@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
> > By the way, in HDTune I saw that C7 (Ultra DMA CRC error count) is a little high, which indicates a potential connection issue. Maybe it is all caused by the enclosure?
>
> Bingo!

You are right. I've done a lot of tests and narrowed the defect down to the "problem hardware". The two pools work fine in one chassis, but after moving them back to the original enclosure, both cp and zfs send fail again.

I also noticed that when the machine boots up and reads the ZFS configuration, there is a warning message:

    Reading ZFS config:
    *Warning* /pci@0,0/pci8086,340f@8/pci15d9,1@0 (mpt0):
    Discovery in progress, can't verify IO unit config.

I searched a lot but cannot find more details.

My two server configurations:
1. "Problem chassis": Supermicro SuperChassis 847E2, Tysonberg MB with onboard LSI 1068E (IT mode, which directly exposes the HDs to the system without RAID), single Xeon 5520.
2. "Good chassis": self-developed chassis by another department, S5000WB MB, single E5504, two PCIe 4x LSI 3081 HBA cards.

The SAS cables all seem to be connected correctly. I suspect an issue with the onboard 1068E, and will move an LSI 3081 card to the "problem" server to test.
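Two quick checks with stock Solaris tooling that could help confirm a transport-level fault on the suspect chassis (nothing here is specific to this pool layout):

    # per-device error counters; climbing "Transport Errors" implicate
    # cabling, backplane/expander, or HBA rather than the media
    iostat -En

    # dump the FMA error log and look for transport/CRC ereports
    fmdump -eV | more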