Kyle McDonald
2010-May-20 15:05 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Hi all,

I recently installed Nexenta Community 3.0.2 on one of my servers:

IBM eSeries X346
  2.8GHz Xeon
  12GB DDR2 RAM
  1 built-in BGE interface for management
  4-port Intel GigE card aggregated for data
  IBM ServRAID 7k with 256MB BB cache (isp driver)
    6 RAID0 single-drive LUNs (so I can use the cache)
      1 18GB LUN for the rpool
      5 300GB LUNs for the data pool
  1 RAIDZ1 pool from the 5 300GB drives
    4 test filesystems
      1 No DeDup, No Compression
      1 DeDup, No Compression
      1 No DeDup, Compression
      1 DeDup, Compression

This is pretty old hardware, so I wasn't expecting miracles, but I thought I'd give it a shot.

My workload is NFS service to software build servers (CVS checkouts, untarring files, compiling, etc.). I'm hoping the many CVS checkout trees will lend themselves well to DeDup, and I know source code should compress easily.

I set up one client with a single GigE connection, mounted the four filesystems (plus one from the NetApp we have here), and wrote a loop to time both untarring the gcc-4.3.3 sources to those 5 filesystems and to 1 local directory, and removing the sources again with rm -rf.

In the local directory the untar took 28 seconds and the remove took 10 seconds. Then, on the first ZFS/NFS filesystem mount, it took basically forever and hung the Nexenta server. I was watching it on the web admin page and it all looked fine for a while. Then the client started reporting 'NFS Server not responding, still trying...'. For a while there were also 'NFS Server OK' messages, and the web GUI remained responsive. Eventually the OK messages stopped and the web GUI froze.

I went and rebooted the NFS client, thinking that if the requests stopped the server might catch up, but it never started responding again.

I was only untarring a file. How did this bring the machine down? I hadn't even gotten to the filesystems that had DeDup or Compression turned on, so those shouldn't have affected things yet.

Any ideas?

-Kyle
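For readers who want to repeat the test, the timing loop described in this message could look roughly like the sketch below. It is not Kyle's actual script (the post does not include it); the function names and the `untar-test` scratch directory are assumptions made for illustration, and the wall-clock timings say nothing about when the NFS server has committed the data.

```python
import shutil
import tarfile
import time
from pathlib import Path


def time_untar_and_remove(tarball, target_dir):
    """Extract `tarball` under `target_dir`, then delete it again.

    Returns (untar_seconds, remove_seconds) as wall-clock durations.
    """
    # Hypothetical scratch subdirectory so nothing else in target_dir is touched.
    workdir = Path(target_dir) / "untar-test"
    workdir.mkdir(parents=True, exist_ok=True)

    start = time.perf_counter()
    with tarfile.open(tarball) as archive:
        archive.extractall(workdir)
    untar_seconds = time.perf_counter() - start

    start = time.perf_counter()
    shutil.rmtree(workdir)  # same effect as rm -rf on the extracted tree
    remove_seconds = time.perf_counter() - start

    return untar_seconds, remove_seconds


def run_benchmark(tarball, target_dirs):
    """Run the untar/remove timing once per directory, in the order given."""
    results = {}
    for target in target_dirs:
        results[str(target)] = time_untar_and_remove(tarball, target)
    return results
```

Calling `run_benchmark` with the path to the gcc tarball and a list holding the local directory plus each NFS mount point would return one pair of timings per directory, which makes the local and NFS numbers directly comparable.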
Erast
2010-May-20 17:00 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Hi Kyle,

Very likely you hit a driver bug in isp. After the reboot, take a look at the /var/adm/messages file; anything related might shed some light. I wouldn't suspect the Intel GigE card: it is a fairly good one and its driver is very stable.

Also, some upgrades have been posted. Make sure the kernel displays 134e after you reboot into the new upgrade checkpoint. The upgrade command:

nmc$ setup appliance upgrade
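The suggestion to look through /var/adm/messages for anything related can be partly automated. The sketch below filters a log file for lines that mention a driver name or a few generic trouble words. The keyword list is an assumption, not something taken from Nexenta documentation, and the log format is deliberately not parsed, since the thread does not show any actual log lines.

```python
import re


def find_driver_messages(log_path, keywords=("isp", "panic", "timeout")):
    """Return the lines of `log_path` that mention any keyword.

    A keyword matches as a whole word, optionally followed by digits,
    so "isp" also matches an instance name such as "isp0" but not
    an unrelated word like "display". Matching is case-insensitive.
    """
    alternatives = "|".join(re.escape(word) for word in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\d*\b", re.IGNORECASE)

    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, errors="replace") as log:
        return [line.rstrip("\n") for line in log if pattern.search(line)]
```

On the server this might be called as `find_driver_messages("/var/adm/messages")`, with the keywords adjusted to whichever driver is under suspicion.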
Roy Sigurd Karlsbakk
2010-May-20 17:58 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
----- "Travis Tabbal" <travis at tabbal.net> wrote:
> Disable ZIL and test again. NFS does a lot of sync writes and kills
> performance. Disabling ZIL (or using the synchronicity option if a
> build with that ever comes out) will prevent that behavior, and should
> get your NFS performance close to local. It's up to you if you want to
> leave it that way. There are reasons not to as well. NFS clients can
> get corrupted views of the filesystem should the server go down before
> a write flush is completed. ZIL prevents that problem. In my case, the
> clients aren't on a UPS while the server is, so it's not an issue. :)

Disabling ZIL is, according to ZFS best practice, NOT recommended. Get some SSDs for the ZIL instead, preferably mirrored. You won't need a lot; the ZIL never uses more than half the RAM size.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all educators to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Travis Tabbal
2010-May-20 17:59 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Disable ZIL and test again. NFS does a lot of sync writes and kills performance. Disabling ZIL (or using the synchronicity option if a build with that ever comes out) will prevent that behavior, and should get your NFS performance close to local.

It's up to you if you want to leave it that way. There are reasons not to as well. NFS clients can get corrupted views of the filesystem should the server go down before a write flush is completed. ZIL prevents that problem. In my case, the clients aren't on a UPS while the server is, so it's not an issue. :)
David Magda
2010-May-20 18:32 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Thu, May 20, 2010 13:58, Roy Sigurd Karlsbakk wrote:
> ----- "Travis Tabbal" <travis at tabbal.net> wrote:
>
>> Disable ZIL and test again. NFS does a lot of sync writes and kills
>> performance. Disabling ZIL (or using the synchronicity option if a
>> build with that ever comes out) will prevent that behavior, and should
>> get your NFS performance close to local. It's up to you if you want to
>> leave it that way. There are reasons not to as well. NFS clients can
>> get corrupted views of the filesystem should the server go down before
>> a write flush is completed. ZIL prevents that problem. In my case, the
>> clients aren't on a UPS while the server is, so it's not an issue. :)
>
> Disabling ZIL is, according to ZFS best practice, NOT recommended. Get
> some SSD for the Zil instead, preferably mirrored. You won't need a lot,
> ZIL never uses more than half the RAM size

Disabling the ZIL is an easy way to TEST whether a separate ZIL device would be helpful. If things speed up after turning it off, then you'd turn it back on and go purchase an SSD. There's no sense spending money if it won't fix the problem.

To the OP, see Section 2.7 ("Disabling the ZIL (Don't)") of:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

As mentioned, you do NOT want to run with this in production, but it is a quick way to check.
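The test proposed here comes down to comparing two timings of the same workload, one with the ZIL enabled and one with it disabled. A minimal sketch of that comparison follows. The function names and the 2x threshold are arbitrary assumptions for illustration; the thread gives no figure for how much speedup would justify buying an SSD.

```python
def zil_speedup(seconds_zil_on, seconds_zil_off):
    """Return how many times faster the run was with the ZIL disabled."""
    if seconds_zil_on <= 0 or seconds_zil_off <= 0:
        raise ValueError("timings must be positive")
    return seconds_zil_on / seconds_zil_off


def slog_worth_considering(seconds_zil_on, seconds_zil_off, min_speedup=2.0):
    """True if disabling the ZIL sped the test up by at least `min_speedup`.

    `min_speedup` is an arbitrary cut-off chosen for this sketch.
    """
    return zil_speedup(seconds_zil_on, seconds_zil_off) >= min_speedup
```

With invented numbers, `slog_worth_considering(300.0, 30.0)` returns `True` (a 10x speedup), while `slog_worth_considering(30.0, 28.0)` returns `False`, which would suggest the bottleneck lies somewhere other than synchronous writes.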
Kyle McDonald
2010-May-21 14:48 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
<SNIP a whole lot of ZIL/SLOG discussion>

Hi guys.

Yep, I know about the ZIL and SSD slogs. While I was setting Nexenta up, it offered to disable the ZIL entirely. For now I left it on. In the end (hopefully for only specific filesystems, once that feature is released) I'll end up disabling the ZIL for our software builds, since:

1) The builds are disposable. We only need to save them if they finish, and we can restart them if needed.
2) The build servers are not on UPS, so a power failure is likely to make the clients lose all state and need to restart anyway.

But this issue I've seen with Nexenta is not due to the ZIL. It runs until it literally crashes the machine. It's not just slow; it brings the machine to its knees. I believe it does have something to do with exhausting memory, though. As Erast says, it may be the IPS driver (though I've used that on b130 of SXCE without issues), or who knows what else.

I did download some updates from Nexenta yesterday. I'm going to try to retest today or tomorrow.

-Kyle