Hi all,

we are encountering severe problems on our X4240 (64GB, 16 disks) running Solaris 10 and ZFS. From time to time (5-6 times a day):

* FrontBase hangs or crashes
* VBox virtual machines hang
* Other applications show a rubber effect (white screen) while the windows are being moved

I have been tearing my hair out trying to work out where this comes from. It could be software bugs, but in all these applications from different vendors? It could be a Solaris bug or bad memory, but that seems rather unlikely.

Then a thought hit me. On another machine with 6GB RAM I fired up a second virtual machine (vbox). This drove the machine almost to a halt. The second vbox instance never came up. I finally saw a panel raised by the first vbox instance saying that there was not enough memory available (a non-severe vbox error) and that the virtual machine had been halted! After killing the process of the second vbox I could simply press resume, and the first vbox machine continued to work properly.

OK, now this starts to make sense. My idea is that ZFS is blocking/allocating all of the available system memory. When an app (FrontBase, VBox, ...) is started and suddenly requests larger chunks of memory from the system, the malloc calls fail, either because ZFS has allocated all the memory or because the system cannot release the memory quickly enough to make it available for the requesting apps. The malloc then fails or times out, which is not caught in the apps and makes them hang, crash, or stall for minutes.

Does this make any sense? Any similar experiences? What can I do about that?

Thanks a lot,

Andreas
Hi all,

> My idea is that ZFS is blocking/allocating all of the available system
> memory. [...] Does this make any sense? Any similar experiences?

Follow-up to my own message. On the X4240 I have

set zfs:zfs_arc_max = 0x780000000

in /etc/system. Would it be a good idea to reduce that to, say,

set zfs:zfs_arc_max = 0x280000000

? Hints greatly appreciated!

Thanks,

Andreas
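For reference, zfs_arc_max is given in bytes, so the two hex values above are easier to compare once converted to GiB. A minimal sketch, assuming a shell with 64-bit arithmetic (bash, ksh93, or similar); the helper name `arc_gib` is made up for illustration:

```shell
#!/bin/sh
# Convert a zfs_arc_max byte value (hex or decimal) to whole GiB.
# arc_gib is a hypothetical helper; it assumes 64-bit shell arithmetic.
arc_gib() {
    echo $(( $1 / 1073741824 ))
}

arc_gib 0x780000000   # current setting  -> 30
arc_gib 0x280000000   # proposed setting -> 10
```

So the current setting caps the ARC at 30 GiB of the 64 GB installed, and the proposed one at 10 GiB.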
On Thu, 22 Apr 2010, Andreas Höschler wrote:
> we are encountering severe problems on our X4240 (64GB, 16 disks) running
> Solaris 10 and ZFS. From time to time (5-6 times a day)
> [...]
> I have been tearing my hair off where this comes from. Could be software
> bugs, but in all these applications from different vendors? Could be a
> Solaris bug or bad memory!? Rather unlikely.

I see that no one has responded yet.

You are jumping to conclusions that zfs and its memory usage is somehow responsible for the problem you are seeing. The problem could be due to a faulty/failing disk, a poor connection with a disk, or some other hardware issue. A failing disk can easily make the system pause temporarily like that.

As root you can run '/usr/sbin/fmdump -ef' to see all the fault events as they are reported. Be sure to execute '/usr/sbin/fmadm faulty' to see if a fault has already been identified on your system. Also execute '/usr/bin/iostat -xe' to see if there are errors reported against some of your disks, or if some are reported as being abnormally slow.

You might also want to verify that your Solaris 10 is current. I notice that you did not identify what Solaris 10 you are using.

> On another machine with 6GB RAM I fired up a second virtual machine (vbox).
> This drove the machine almost to a halt. The second vbox instance never
> came up. I finally saw a panel raised by the first vbox instance that there
> was not enough memory available (non severe vbox error) and the virtual
> machine was halted!! After killing the process of the second vbox I could
> simply press resume and the first vbox machine continued to work properly.

Maybe you should read the VirtualBox documentation. There is a note about Solaris 10 and about how VirtualBox may fail if it can't get enough contiguous memory space.

Maybe I am lucky since I have run three VirtualBox instances at a time (2GB allocation each) on my system with no problem at all.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
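The three checks suggested above can be collected into one pass. This is only a sketch: the wrapper `run_if_present` is a made-up helper, the script has to be run as root on the Solaris host, and it uses `fmdump -e` without `-f` so that it prints the logged error events and exits instead of following the log.

```shell
#!/bin/sh
# Run a diagnostic command if it exists; otherwise say that it was skipped.
# run_if_present is a hypothetical helper, not part of Solaris.
run_if_present() {
    cmd=$1
    shift
    if [ -x "$cmd" ]; then
        echo "== $cmd $*"
        "$cmd" "$@"
    else
        echo "skipped: $cmd not found"
    fi
}

run_if_present /usr/sbin/fmadm faulty   # faults already diagnosed by FMA
run_if_present /usr/bin/iostat -xe      # per-device error counters and service times
run_if_present /usr/sbin/fmdump -e      # logged error events (add -f to follow new ones)
```

Disks whose error columns are non-zero, or whose service times stand out from the rest, are the ones to look at first.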
Hi Bob,

> The problem could be due to a faulty/failing disk, a poor connection
> with a disk, or some other hardware issue. A failing disk can easily
> make the system pause temporarily like that.
>
> As root you can run '/usr/sbin/fmdump -ef' to see all the fault events
> as they are reported. Be sure to execute '/usr/sbin/fmadm faulty' to
> see if a fault has already been identified on your system. Also
> execute '/usr/bin/iostat -xe' to see if there are errors reported
> against some of your disks, or if some are reported as being
> abnormally slow.
>
> You might also want to verify that your Solaris 10 is current. I
> notice that you did not identify what Solaris 10 you are using.

Thanks a lot for these hints. I checked all this. On my mirror server I found a faulty DIMM with these commands. But on the main server exhibiting the described problem everything seems fine.

> Maybe you should read the VirtualBox documentation. There is a note
> about Solaris 10 and about how VirtualBox may fail if it can't get
> enough contiguous memory space.
>
> Maybe I am lucky since I have run three VirtualBox instances at a time
> (2GB allocation each) on my system with no problem at all.

I have inserted

set zfs:zfs_arc_max = 0x200000000

in /etc/system and rebooted the machine having 64GB of memory. Tomorrow will show whether this did the trick!

Thanks a lot,

Andreas
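The value 0x200000000 is 8 GiB expressed in bytes. To produce the /etc/system line for some other cap, the arithmetic can be sketched as follows, assuming a shell with 64-bit arithmetic; `arc_max_line` is a hypothetical helper name:

```shell
#!/bin/sh
# Print an /etc/system line that caps the ZFS ARC at the given number of GiB.
# arc_max_line is a hypothetical helper; it assumes 64-bit shell arithmetic.
arc_max_line() {
    printf 'set zfs:zfs_arc_max = 0x%x\n' $(( $1 * 1073741824 ))
}

arc_max_line 8   # prints: set zfs:zfs_arc_max = 0x200000000
```

As in the message above, the setting is only read at boot, so a reboot is needed after editing /etc/system.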
On Fri, 23 Apr 2010, Andreas Höschler wrote:
>> Maybe I am lucky since I have run three VirtualBox instances at a time
>> (2GB allocation each) on my system with no problem at all.
>
> I have inserted
>
> set zfs:zfs_arc_max = 0x200000000
>
> in /etc/system and rebooted the machine having 64GB of memory. Tomorrow
> will show whether this did the trick!

This *could* help if your server runs a rather strange and intermittent program which suddenly requests a huge amount of memory, accesses all that memory, and then releases the memory. ZFS actually gives memory back to the kernel when requested, but of course it needs to determine which memory should be returned. It seems unlikely that this would cause other applications to freeze unless there is a common dependency.

I do limit the size of the ARC on my system because I do run programs which request a lot of memory and then quit.

Bob
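To check after the reboot that the cap is in effect and how much of it the ARC is using, the ARC counters can be read with kstat. A rough sketch, assuming the statistics are exposed as zfs:0:arcstats with `size` and `c_max` fields, as on Solaris 10; `bytes_to_mib` is a made-up helper:

```shell
#!/bin/sh
# Print the current ARC size and its cap in MiB from the arcstats kstat.
# bytes_to_mib is a hypothetical helper; it assumes 64-bit shell arithmetic.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

if command -v kstat >/dev/null 2>&1; then
    # kstat -p prints "module:instance:name:statistic<TAB>value"
    size=$(kstat -p zfs:0:arcstats:size | awk '{print $2}')
    cmax=$(kstat -p zfs:0:arcstats:c_max | awk '{print $2}')
    echo "ARC size: $(bytes_to_mib "$size") MiB of $(bytes_to_mib "$cmax") MiB cap"
else
    echo "kstat not found; this check only applies to Solaris"
fi
```

With zfs_arc_max = 0x200000000 the reported cap should come out as 8192 MiB.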