Hello All,

In a moment of insanity I've upgraded from a 5200+ to a Phenom 9600 on my ZFS server, and I've had a lot of problems with hard hangs when accessing the pool. The motherboard is an Asus M2N32-WS, which has had the latest available BIOS upgrade installed to support the Phenom.

bash-3.2# psrinfo -pv
The physical processor has 4 virtual processors (0-3)
  x86 (AuthenticAMD 100F22 family 16 model 2 step 2 clock 2310 MHz)
        AMD Phenom(tm) 9600 Quad-Core Processor

The pool is spread across 12 disks (3 x 4-disk raidz groups) attached to both the motherboard and a Supermicro AOC-SAT2-MV8 in a PCI-X slot (marvell88sx driver). The hangs occur during large writes to the pool, e.g. a 10G mkfile, usually just after the physical disk access starts, and the file is not created in the directory on the pool at all. The system hard hangs at this point: even when booting under kmdb there's no panic string, and after setting snooping=1 in /etc/system there's no crash dump created after it reboots.

Doing the same operation to a single UFS disk attached to the motherboard's ATA133 interface doesn't cause a problem, and neither does writing to a raidz pool created from 4 files on that ATA disk. If I use psradm to disable any 2 cores on the Phenom there's no problem with the mkfile either, but turn a third on and it'll hang. This is with the virtualization and PowerNow extensions disabled in the BIOS.

So, before I go and shout at the motherboard manufacturer, are there any components in b78 that might not be expecting a quad-core AMD CPU? Possibly in the marvell88sx driver? Or is there anything more I can do to track this issue down?

Thanks,
Alan

This message posted from opensolaris.org
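The isolation steps described in the post above (take some cores offline with psradm, then repeat the large write) can be sketched as a small script. This is only a sketch under stated assumptions, not something from the original post: the path /tank/testfile and the helper names run and try_with_offline are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. It relies on the standard Solaris psradm options -f (offline) and -n (online) and on mkfile.

```shell
#!/bin/sh
# Sketch: repeat the large write with a chosen set of cores offline.
# TESTFILE is a hypothetical path on the affected pool; adjust it.
TESTFILE="${TESTFILE:-/tank/testfile}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = really run them

# Print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Take the listed processor IDs offline, attempt the write, then
# bring them back online (psradm -f = offline, -n = online).
try_with_offline() {
    run psradm -f "$@"
    run mkfile 10g "$TESTFILE"
    run rm -f "$TESTFILE"   # only reached if the system did not hang
    run psradm -n "$@"
}

# Example: test with cores 2 and 3 offline, as in the post.
try_with_offline 2 3
```

Calling try_with_offline with different pairs of processor IDs would show whether the hang depends on which cores are active or only on how many.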
On Sat, 12 Jan 2008, Alan Romeril wrote:

[ .... reformatted .... ]

> [snip]
> So, before I go and shout at the motherboard manufacturer are
> there any components in b78 that might not be expecting a quad core
> AMD cpu?  Possibly in the marvell88sx driver?  Or is there anything
> more I can do to track this issue down.

Please read the tomshardware.com article[1], where the author found that Phenom upgrade compatibility is not what AMD would have expected/predicted/published.
It's also possible that your CPU VRM (voltage regulators) can't supply the necessary current when the Phenom gets really busy.

The only way to diagnose this issue is to apply "swap-tronics" to the motherboard and power supply. Welcome to the bleeding edge! :(

IMHO Phenom is far from ready for prime time. And this is coming from an AMD fanboy who has built, bought and recommended AMD-based systems exclusively for the last 2 1/2 years+.

Squawking at the motherboard maker is unlikely to get you any satisfaction IMHO. Cut your losses and go back to the 5200+, or build a system based on a Penryn chip when the less expensive Penryn family members become available - proba-bobly[2] within 60 days.

As an aside, with ZFS, you gain more by maxing out your memory than by spending the equivalent dollars on a CPU upgrade. And memory has *never* been this inexpensive. Recommendation: max out your memory and tune your 5200+ based system for max memory throughput[3].

PS: IMHO Phenom won't be a real contender until they triple the L3 memory. The architecture is sound, but currently cache-starved IMHO.

PPS: On a Sun x2200 system (bottom-of-the-line config [2*2.2GHz dual core CPUs] purchased during Sun's anniversary sale) we "pushed in" a SAS controller, two 140GB SAS disks and 24GB of 3rd party RAM[4]. Yes - configured for ZFS boot and ZFS-based filesystems exclusively and currently running snv_68 (due to be upgraded when build 80 ships). You cannot believe how responsive this system is - mainly due to the RAM. For a highly performant ZFS system, there are 3 things that you should maximize/optimize:

1) RAM capacity
2) RAM capacity
3) RAM capacity

PPPS: Sorry to beat this horse into submission - but! If you have a choice (at a given budget) of 800MHz memory parts at N gigabytes (capacity), or 667MHz (or 533MHz) memory parts at N * 2 gigabytes - *always*[5] go with the config that gives you the maximum memory capacity.
You really won't notice the difference between 800MHz memory parts and 667MHz memory parts, but you *will* notice the difference between the system with 8GB of RAM and (the same system with) 16GB of RAM when it comes to ZFS (and overall) performance.

[1] http://www.tomshardware.com/2007/12/26/phenom_motherboards/
[2] deliberate new word - represents techno uncertainty
[3] memtestx86 v3 is your friend. Available on the UBCD (Ultimate Boot CD)
[4] odd mixture of 1GB and 2GB parts
[5] there are some very rare exceptions to this rule - for really unusual workload scenarios (like scientific computing).

HTH.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.
al at logical-approach.com
Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from "sugar-coating school"? Sorry - I never attended! :)
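On the RAM advice in the reply above: ZFS keeps its in-memory cache (the ARC) in system RAM, so one way to see how much of the installed memory it is using is to compare the figure from prtconf with the ARC size from kstat. The following is a minimal sketch and not part of the original reply; it assumes the usual "Memory size: N Megabytes" line printed by prtconf and the zfs:0:arcstats:size kstat (in bytes), and the function names mem_mb, arc_mb and arc_percent are hypothetical.

```shell
#!/bin/sh
# Sketch: report what share of installed RAM the ZFS ARC occupies.

# Read prtconf output on stdin, print installed memory in megabytes.
mem_mb() {
    awk '/^Memory size:/ { print $3 }'
}

# Read "kstat -p zfs:0:arcstats:size" output on stdin (name, then
# value in bytes) and print the ARC size in megabytes.
arc_mb() {
    awk '{ printf "%d\n", $2 / 1048576 }'
}

# Print the ARC share of RAM as a whole percentage.
# $1 = ARC size in MB, $2 = installed memory in MB
arc_percent() {
    echo $(( $1 * 100 / $2 ))
}

# On a Solaris host the pieces would be combined like this:
#   total=$(prtconf | mem_mb)
#   arc=$(kstat -p zfs:0:arcstats:size | arc_mb)
#   echo "ARC uses $(arc_percent "$arc" "$total")% of ${total} MB"
```

The parsing is split into small functions that read stdin so they can be checked against captured output without access to the machine itself.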
Hi,

On Sat, 12 Jan 2008, Alan Romeril wrote:

> Hello All,
> In a moment of insanity I've upgraded from a 5200+ to a Phenom 9600 on my
> zfs server and I've had a lot of problems with hard hangs when accessing the
> pool.
> The motherboard is an Asus M2N32-WS, which has had the latest available BIOS
> upgrade installed to support the Phenom.
>
> bash-3.2# psrinfo -pv
> The physical processor has 4 virtual processors (0-3)
>   x86 (AuthenticAMD 100F22 family 16 model 2 step 2 clock 2310 MHz)
>         AMD Phenom(tm) 9600 Quad-Core Processor

I have almost the same configuration, and the same problem :-(

ASUS M2N32 WS Professional, BIOS 1703 (latest available)
Phenom 9600
Kingston PC2-5300, 2GB x 4
AOC-SAT2-MV8, 5 x Samsung 750GB disks
OpenSolaris NV81

This machine boots fine, and I can log in with GNOME as the desktop. But when I try to copy files to a ZFS filesystem in a zpool created as a raidz of the five Samsung disks, it crashes before 30MB has been copied. It has crashed with an FMA error, with a panic, or just with a hang.

> The pool is spread across 12 disks ( 3 x 4 disk raidz groups ) attached to
> both the motherboard and a Supermicro AOC-SAT2-MV8 in a PCI-X slot
> (marvell88sx driver). The hangs occur during large writes to the pool, i.e a
> 10G mkfile, usually just after the physical disk access start, and the file
> is not created in the directory on the pool at all. The system hard hangs at
> this point, even with booting under kmdb there's no panic string and after
> setting snooping=1 in /etc/system there's no crash dump created after it
> reboots. Doing the same operation to a single UFS disk attached to the
> motherboard's ATA133 interface doesn't cause a problem, neither does writing
> to a raidz pool created from 4 files on that ATA disk. If I use psradm and
> disable any 2 cores on the Phenom there's no problem with the mkfile either,
> but turn a third on and it'll hang. This is with the virtualization, and
> power now extensions disabled in the BIOS.

Thanks for the tip about disabling 2 cores; this also works for me. But I have had one crash after about 1 day.

And the question remains: how do I run with all 4 cores?

/stefan
Hi Al,

Thanks for the tips. I've maxed out the memory on the board now (up to 8GB from 4GB), and you are dead right about it being cheap to do so. I'd upgraded the power supply, as I thought that was an issue since the original couldn't provide enough start-up current, but that didn't make much difference to the hard hangs.

However, after moaning to ASUS I was given a beta BIOS (version 1802, if anyone else needs to chase it up), and that has made a big difference to the system. It's now stable! I'm going to keep an eye on things and see how well it performs; hopefully it'll be worth the upgrade cost and hassle.

Cheers,
Alan

> [snip]
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss