Hello,

I've got 9 domUs. Each is a RHEL 5.2 instance with 1G ram, 1 cpu, and a 100G drive, paravirtualized. The drives are created like this:

  pfexec zfs create -s -V 100G datastore/virtMachine1

The hardware is a Dell 2900, 48G ram, 3.06T of 15k rpm sas drives. OpenSolaris seems to be fairly happy on this system.

When I ran everything as zones it was all fine and fast, but the vendor requires RHEL, and I refuse to give up ZFS, so I had to fire up xVM just so I could run MySQL inside an x86 container called RHEL 5.2.

Anyway, these domUs boot, run, and work pretty well (slower than zones by about 17%, btw), and generally work fine. Except that they crash pretty regularly, anywhere between 6 and 10 days apart. I've been searching forums, etc. Not sure what to do. Here's a log entry:

Jul 1 15:15:08 ecw-mysql1 unix: [ID 836849 kern.notice]
Jul 1 15:15:08 ecw-mysql1 ^Mpanic[cpu0]/thread=ffffff005b7e1c80:
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff005b7e1120 addr=fffffe0a3e18ec20
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 unix: [ID 839527 kern.notice] sched:
Jul 1 15:15:08 ecw-mysql1 unix: [ID 753105 kern.notice] #pf Page fault
Jul 1 15:15:08 ecw-mysql1 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xfffffe0a3e18ec20
Jul 1 15:15:08 ecw-mysql1 unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffffb8a0663, sp=0xffffff005b7e1218, eflags=0x10246
Jul 1 15:15:08 ecw-mysql1 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2660<vmxe,xmme,fxsr,mce,pae>
Jul 1 15:15:08 ecw-mysql1 unix: [ID 624947 kern.notice] cr2: fffffe0a3e18ec20
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] rdi: fffffe0a3e18ec20 rsi: 0 rdx: e0508673
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] rcx: 3 r8: 0 r9: ffffff0cb9384000
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] rax: 0 rbx: e0508673 rbp: ffffff005b7e12b0
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] r10: 0 r11: ffffff0000002000 r12: 0
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] r13: 1 r14: fffffe0a3e18ec20 r15: e0508673
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] fsb: 0 gsb: fffffffffbc5ef70 ds: 4b
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] trp: e err: 3 rip: fffffffffb8a0663
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] cs: e030 rfl: 10246 rsp: ffffff005b7e1218
Jul 1 15:15:08 ecw-mysql1 unix: [ID 266532 kern.notice] ss: e02b
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1000 unix:die+10f ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1110 unix:trap+1768 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1120 unix:_cmntrap+12f ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e12b0 unix:atomic_cas_ptr+3 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1350 unix:hati_pte_map+160 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e13d0 unix:hati_load_common+15d ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1490 unix:hat_devload+15d ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e14f0 rootnex:rootnex_map_regspec+151 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e15a0 rootnex:rootnex_map+141 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e15f0 genunix:ddi_map+51 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e16e0 npe:npe_bus_map+43d ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1720 pcie_pci:pepb_bus_map+31 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1760 pcie_pci:pepb_bus_map+31 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e17b0 genunix:ddi_map+51 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1870 genunix:ddi_regs_map_setup+d5 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e18c0 genunix:pci_config_setup+69 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1900 pcie:pcie_init_bus+41 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1a30 pcie_pci:pepb_initchild+bc ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1ab0 pcie_pci:pepb_ctlops+276 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1af0 genunix:init_node+78 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1b30 genunix:i_ndi_config_node+fa ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1b60 genunix:i_ndi_init_hw_children+48 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1bc0 genunix:config_immediate_children+83 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1c10 genunix:devi_config_common+a6 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1c60 genunix:mt_config_thread+53 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1c70 unix:thread_start+8 ()
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 672855 kern.notice] syncing file systems...
Jul 1 15:15:09 ecw-mysql1 genunix: [ID 904073 kern.notice] done
Jul 1 15:15:10 ecw-mysql1 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jul 1 15:17:51 ecw-mysql1 genunix: [ID 409368 kern.notice] ^M100% done: 1588175 pages dumped, compression ratio 3.44,
Jul 1 15:17:51 ecw-mysql1 genunix: [ID 851671 kern.notice] dump succeeded

Anyway, sometimes they blame the xVM hypervisor for the crash, sometimes not. I've got twin Dell 2900s and have moved the domUs from one machine to the other, same results.

Name                            ID   Mem VCPUs      State   Time(s)
Def                                  1024     1                  0.0
Domain-0                         0  34154     8     r-----   3871.7
EDB_Bs                           1   1024     1     -b----    803.8
EDB_Faare                        8   2048     2     -b----   1099.3
EDB_Gral                         7   1024     1     -b----    185.2
EDB_NC                           6   1024     1     -b----    290.0
EDB_Tg                           2   1024     1     -b----     45.0
EDB_Way                          3   1024     1     -b----     62.3
EDB_Wel                          5   1024     1     -b----    278.2
EDB_Wnd                          9   1024     1     -b----    306.9
EHX_Dbase                       10   4096     1     -b----     51.3
Iine                             4   1024     1     -b----     76.0
Repair                                512     1                 13.1

Anyway, the crashes occur when the Time(s) for any one domU gets up around 25000 or so. These are production databases, so they do get a lot of work.

Anyway, it's aggravating when the servers die like that, but zfs is there helping out, so that's nice. No idea if any of this makes sense, it's late, and I'm not too concerned about it anymore, but any help would be great!

thanks,
Jack
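[Note: since the domU drives are created as sparse zvols (the -s flag), their actual space consumption can be checked with ordinary zfs properties. The dataset name below is the one from the post; a minimal check, assuming that layout:]

  pfexec zfs get volsize,refreservation,used datastore/virtMachine1
  # -s leaves refreservation at "none", so "used" only grows as the guest writes data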
On Thu, Jul 2, 2009 at 1:55 PM, Jack <no-reply@opensolaris.org> wrote:
> When I ran all zones, everything was fine and fast, however vendor requires RHEL,
> and I refuse to give up ZFS, so I had to fire up xVM just so I could run MySQL
> inside an x86 container called RHEL 5.2

Is that all you're running on the domU? MySQL? Since MySQL is owned by Sun, they SHOULD support running in a Solaris zone :P

And why 5.2? Why not 5.3? Last time I checked, my 5.3 domU works fine running Alfresco (which includes MySQL). You might want to try upgrading at least the domU kernel.

Also, what version of OpenSolaris are you running?

--
Fajar
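[Note: on a RHEL 5.x paravirtualized guest the Xen-aware kernel normally ships in the kernel-xen package, so upgrading just the domU kernel as suggested here would look roughly like this, run inside the guest; package name assumed from stock RHEL 5 packaging:]

  # inside the RHEL 5.2 domU, with update channels available
  yum update kernel-xen
  reboot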
Jack wrote:
> Hello,
> I've got 9 domUs. They are each a RHEL 5.2 instance. They have 1G ram, 1 cpu,
> 100G drive. They are paravirtualized. The drives used are created as such:
>
>   pfexec zfs create -s -V 100G datastore/virtMachine1
>
> The hardware is a Dell 2900, 48G ram, 3.06T of 15k rpm sas drives.
> OpenSolaris seems to be fairly happy on this system.

This doesn't have anything to do with your problem... But I'm curious, what do you have for CPUs? Are you limiting dom0 memory on boot? What about the number of CPUs dom0 can use? e.g.

  kernel /boot/amd64/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g dom0_max_vcpus=2 dom0_vcpus_pin=true

> When I ran all zones, everything was fine and fast, however vendor requires RHEL,
> and I refuse to give up ZFS, so I had to fire up xVM just so I could run MySQL
> inside an x86 container called RHEL 5.2
>
> Anyway, these domUs boot, run, work pretty well (slower than zones by about 17%
> btw), and generally work fine.
>
> Except that they crash pretty regularly anywhere between 6 and 10 days.

What version of OpenSolaris are you using? Are you using stock Xen bits (that come with OpenSolaris)? The bug looks familiar, but I'll have to do some searching...

MRJ

> [rest of Jack's post, including the panic log and domain list, quoted in full; snipped]
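[Note: the build and hypervisor details MRJ is asking about can be read straight off the dom0; a quick check, assuming a standard OpenSolaris xVM install:]

  cat /etc/release          # reports the snv_ build
  uname -a                  # shows the i86xpv kernel when booted under xVM
  xm info | grep xen_       # hypervisor major/minor version from the xVM tools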
Okay, the version of OpenSolaris is:

  Sun Microsystems Inc.   SunOS 5.11   snv_101b   November 2008

xVM is the 3.1 (whatever is stock with OSOL). I don't have anything unusual. The hardware is as such: Dell 2900, 48G ram, Intel(r) Xeon(r) CPU E5420 @ 2.50GHz, dual 250G sata drives for the rpool, 8 450G sas drives for the datastore pool. It is really fast and does a great job, with the exception of the crashing.

And yes, I agree, I'd just run MySQL on Solaris if I could. It's the vendor that's the moron, not me ;) eClinicalWorks is so backwards they think that RHEL is a better platform for MySQL than OSOL, and they have it in their heads that x86 virtualization is somehow superior to zones. I fought this for about 6 months, but they are threatening to pull our contract, so I have to yield to the RHEL madness. Anyway, that's why I don't mind it crashing... it's their fault, eCW is filled with idiocy. It's a bit slower now with RHEL on top of xVM on top of Solaris - 17% or so, but whatever. </rant>

Okay, so if there is anything I can do to improve the product, I'd be glad to. Consider this a good test platform, as this is real data (9 active clinics) with a real load. It's never working the cpu hard at all, just the drives get hit pretty good. eClinicalWorks uses full joins for about 80% of their db queries...
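[Note: given that the CPUs are mostly idle but "the drives get hit pretty good", per-vdev I/O on the datastore pool is worth watching while the databases are busy. These are standard zpool commands; the pool name is the one from the posts above:]

  zpool status datastore            # layout and health of the 8-disk sas pool
  zpool iostat -v datastore 5       # per-vdev bandwidth and ops every 5 seconds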
> This doesn't have anything to do with your problem...
> But I'm curious, what do you have for CPUs?
> Are you limiting dom0 memory on boot? What about
> the number of CPUs dom0 can use? e.g.
>
>   kernel /boot/amd64/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g dom0_max_vcpus=2 dom0_vcpus_pin=true

Okay, this is an Intel(r) Xeon(r) CPU E5420 @ 2.50GHz based system; there are 8 cores (twin quad-cores).

No, I'm not doing anything to the boot of xVM. I used to do that on my Linux versions of Xen, but got out of the habit when I saw xVM managing it okay on its own. Plus, I didn't read anywhere that they wanted that to happen. Guess I'll give that a shot. But what about ZFS? Doesn't it want just a ton of memory? Seems like I should have a bunch around for that. That's why I asked for 48G ram on these machines ;)

This is my grub entry:

  title Solaris xVM
  findroot (pool_rpool,0,a)
  kernel$ /boot/$ISADIR/xen.gz
  module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS
  module$ /platform/i86pc/$ISADIR/boot_archive

> What version of OpenSolaris are you using? Are you using stock Xen bits (that
> come with opensolaris)? The bug looks familiar, but I'll have to do some searching...

Sun Microsystems Inc. SunOS 5.11 snv_101b November 2008, and yes, stock xVM.

Now, I've tried xVM on another 2900 configured identically to this one with snv_111b - have it running right now, and performance is really bad there. It's so slow it's unreal, so I just quit on that and went back to the one that worked pretty well. Boot times on the newer 0906 version are around 20 minutes for rhel5.2, and centos 5.3 won't even boot after being installed. That's why I quit there - had something that worked, why complain ;)
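[Note: to answer the "doesn't ZFS want a ton of memory" question empirically, the ARC's current size and ceiling can be pulled from kernel statistics before deciding how much memory to leave to dom0; these are standard Solaris kstat names:]

  kstat -p zfs:0:arcstats:size      # current ARC size in bytes
  kstat -p zfs:0:arcstats:c_max     # current ARC upper bound in bytes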
On Sat, Jul 4, 2009 at 2:05 AM, Jack <no-reply@opensolaris.org> wrote:
> Okay, the version of OpenSolaris is:
> Sun Microsystems Inc.   SunOS 5.11   snv_101b   November 2008

That's pretty old :P I suggest you upgrade to the latest bits (117), but try it on dev servers first. One of the things to watch out for is that OpenSolaris > snv_105 uses Crossbow, which might give you problems if you use vlans (as in, you need some config adjustments).

--
Fajar
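[Note: on an OpenSolaris 2008.11-era install, moving to newer bits such as snv_117 generally means pointing pkg(5) at the dev repository and running an image-update, which lands in a new boot environment so it can be rolled back. A sketch, assuming the dev repository URL as it was published at the time; older pkg builds spell the subcommand set-authority rather than set-publisher:]

  pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
  pfexec pkg image-update
  beadm list      # the previous boot environment stays selectable at the GRUB menu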
Jack wrote:
>> This doesn't have anything to do with your problem...
>> But I'm curious, what do you have for CPUs?
>> Are you limiting dom0 memory on boot? What about
>> the number of CPUs dom0 can use? e.g.
>>
>>   kernel /boot/amd64/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g dom0_max_vcpus=2 dom0_vcpus_pin=true
>
> Okay, this is an Intel(r) Xeon(r) CPU E5420 @ 2.50GHz based system; there are
> 8 cores (twin quad-cores).
>
> No, I'm not doing anything to the boot of xVM. I used to do that on my Linux versions
> of Xen, but got out of the habit when I saw xVM managing it okay on its own. Plus,
> I didn't read anywhere that they wanted that to happen. Guess I'll give that a shot. But
> what about ZFS? Doesn't it want just a ton of memory? Seems like I should have a
> bunch around for that. That's why I asked for 48G ram on these machines ;)
>
> This is my grub entry:
>
>   title Solaris xVM
>   findroot (pool_rpool,0,a)
>   kernel$ /boot/$ISADIR/xen.gz
>   module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS
>   module$ /platform/i86pc/$ISADIR/boot_archive

You should limit the dom0 cpus to 2-4, depending on your load (mpstat) when doing a lot of IO. Even better if you can isolate dom0 CPUs from the guests. This is easier if you have a really large system, of course :-)

You also should limit the dom0 memory so you don't balloon dom0 memory down. For your setup, I would think 16g should be plenty... But you won't know for sure until you try :-)

  kernel$ /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1 dom0_mem=16g dom0_max_vcpus=4 dom0_vcpus_pin=true

As a safety net, you could restrict dom0 ballooning so you don't accidentally take away its memory.

  svccfg -s xvm/xend setprop config/dom0-min-mem=16000
  svcadm refresh xvm/xend; svcadm restart xvm/xend

Since you're using zfs in dom0, you should limit the size of the arc. I would start with 1/2 the memory in your dom0 if you have >= 4G. e.g.

  echo "set zfs:zfs_arc_max = 0x200000000" >> /etc/system

>> What version of OpenSolaris are you using? Are you using stock Xen bits (that
>> come with opensolaris)? The bug looks familiar, but I'll have to do some searching...
>
> Sun Microsystems Inc. SunOS 5.11 snv_101b November 2008, and yes, stock xVM.
>
> Now, I've tried xVM on another 2900 configured identically to this one with
> snv_111b - have it running right now, and performance is really bad there. It's
> so slow it's unreal, so I just quit on that and went back to the one that worked
> pretty well. Boot times on the newer 0906 version are around 20 minutes for
> rhel5.2, and centos 5.3 won't even boot after being installed. That's why I quit
> there - had something that worked, why complain ;)

The centos5.3 problem is likely
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6836480

But you shouldn't be seeing things go slow like that. Can you try the above menu.lst changes and arc limits to see if it still runs slow? You can copy the xdb driver from b118 and it should fix the centos5.3 problem you're seeing.

What kind of disk is the boot disk (sata)? If sata, I assume it's not running in IDE mode?

Thanks,

MRJ
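[Note: after editing menu.lst and /etc/system and rebooting, the settings suggested above can be sanity-checked with something along these lines, assuming the stock xVM toolstack:]

  xm list Domain-0                          # should show roughly 16384 MB and 4 VCPUs
  svcprop -p config/dom0-min-mem xvm/xend   # the ballooning floor set above
  kstat -p zfs:0:arcstats:c_max             # should report 8589934592 (0x200000000, i.e. 8 GiB)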
"You also should limit the dom0 memory so you don''t balloon dom0 memory down" So, does that cause problems to have the full allocation of ram available and then as you add VMs, it disappear? I''ve not yet implemented your changes, but will do so shortly (life and work, both maxed), and report the results back. Should be interesting! thanks, jdownes -- This message posted from opensolaris.org
Jack wrote:> "You also should limit the dom0 memory so you don''t balloon dom0 memory down" > > So, does that cause problems to have the full allocation of ram available> and then as you add VMs, it disappear? It does not interact well with zfs... MRJ> I''ve not yet implemented your> changes, but will do so shortly (life and work, both maxed), and report > the results back. Should be interesting!> > thanks, > jdownes
Okay, I've done this:

  kernel$ /boot/$ISADIR/xen.gz dom0_mem=16g dom0_max_vcpus=4 dom0_vcpus_pin=true

I removed the com port stuff, assuming that to be the serial console - we don't even have one here, just use ssh and KVMs.

Also, since I'm using 16G for main memory of domain0, I followed your instruction to use half for the arc:

  set zfs:zfs_arc_max = 0x800000000

is in /etc/system... hopefully that means use 8G for the arc ;)
Jack wrote:
> Okay, I've done this:
>
>   kernel$ /boot/$ISADIR/xen.gz dom0_mem=16g dom0_max_vcpus=4 dom0_vcpus_pin=true
>
> I removed the com port stuff, assuming that to be the serial console - we don't
> even have one here, just use ssh and KVMs.
>
> Also, since I'm using 16G for main memory of domain0, I followed your instruction
> to use half for the arc:
>
>   set zfs:zfs_arc_max = 0x800000000
>
> is in /etc/system... hopefully that means use 8G for the arc ;)

Nope :-)

  set zfs:zfs_arc_max = 0x200000000
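[Note: the correction comes down to hex arithmetic; the value placed in /etc/system above is 32 GiB, not 8 GiB:]

  0x800000000 = 34,359,738,368 bytes = 32 GiB
  0x200000000 =  8,589,934,592 bytes =  8 GiB   (8 * 1024^3)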
Hmm, I just checked the arc size with Ben Rockwood's arc_summary script and it shows:

  ARC Size:
          Current Size:             8070 MB (arcsize)
          Target Size (Adaptive):   14292 MB (c)
          Min Size (Hard Limit):    1918 MB (zfs_arc_min)
          Max Size (Hard Limit):    15351 MB (zfs_arc_max)

I made the change to /etc/system just to see if it'd crash by letting it run like that. Guess not; it didn't change anything either, though. Perhaps zfs "knew" what I meant? Anyway, after the change, I have very similar numbers... which is probably just the movement of the arc on its own... never mind the change ;)

  ARC Size:
          Current Size:             8175 MB (arcsize)
          Target Size (Adaptive):   15351 MB (c)
          Min Size (Hard Limit):    1918 MB (zfs_arc_min)
          Max Size (Hard Limit):    15351 MB (zfs_arc_max)
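[Note: a setting in /etc/system only takes effect at the next boot, which would explain why Max Size is still sitting at roughly 15 GB here. On a live dom0 the cap can also be lowered with mdb, though that is an unsupported poke at a running kernel and worth trying on a test box first:]

  # as root on the running system; use with care
  echo "zfs_arc_max/Z 0x200000000" | mdb -kw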