XenFolk,

I was given the task of bringing up some useful virtual servers under OpenSolaris on a SunFire X2200, as a demonstration. After moving it to a colo, I find that it crashes on occasion for no discernible reason. And the remote console doesn't seem to work remotely! I need help or suggestions for debugging.

Details:

I brought up the system in my basement, and was able to access serial console, video console, eLOM SSH, eLOM Web, and Web remote console over my home network. Attempts to create a RAID drive on it showed that it didn't really have hardware RAID, only assistance for software RAID. After installing the OpenSolaris CD I had downloaded, I realized that it was a minimal distribution. I downloaded SXCE version 107, and installed it using ZFS RAID to mirror the two 1 TB drives together. I re-booted it as Solaris xVM dom0. I installed fully virtual copies of SXCE, Fedora Linux 10, and Ubuntu Linux 8.10 server, somewhere along the line fixing a buggy script /usr/lib/xen/scripts/vbd-check with all the quotes that it needs [vs. a limited number in the official fix that I later found]. It seemed to work fine.

I disconnected my VT200, monitor, keyboard, mouse, etc. and went to a colo, where it was installed with only two network connections: one to the eLOM port and one to the system network port. I verified that there was SSH access to all virtual machines and the eLOM, and left.

After a few days, I found problems browsing to or SSH'ing in to the virtual systems. I believe that, the first time, neither Linux VM responded to Web or SSH, and the SXCE VM was slow to respond to SSH. When I logged into the dom0 SXCE, I found that various commands would not run, even though an 'ls' showed them in their "bin" directories, and testing showed that I could not read the contents of those or other files. I could still SSH to the eLOM port, and I could also browse to the SSL Web port for the eLOM, so I could re-boot the system.
It came back up with all four installed OSes (one dom0 and three domU) working fine.

Since then, this has happened at intervals between one day and several weeks [I was hoping that magic had struck], and when it happens I usually cannot log in at all to the systems, vs. what happened that first time. I am not constantly monitoring them, so it is likely that the first time I chanced on a degraded mode on its way to crashing.

About the SXCE domU that was slow to respond - on that one, I NFS auto-mount my home directory from the dom0 to the domU. I was going to do that for the Linux domU's as well, but (a) it wasn't working, and (b) they wanted different contents in the user home directories. Anyway, after each re-boot the NFS auto-mount takes much longer than I think it should. This may or may not be an actual bug that needs fixing.

As far as what appears on the console, I wish I could tell you. The remote console worked fine when I was on the same network, but over the Internet it looks like SSH and the HTTPS Web interface work, but when I invoke the remote console from the same machines that worked locally, the Java machine loads javaRKVM.jnlp, grinds for a while, asks about the certificates [twice], and then spits out an "IOException / Create Connection Failure!" window, and the Sun eLOM Remote Console window just sits there blankly. If it matters, my network is NATted from the world; but the documentation does not say anything about this or other potential problems.

Has anyone experienced these problems before? Does anyone have a fix for them? Does anyone have suggestions as to how they may be debugged?

Thanks!

-- 
/*********************************************************************\
**
** Joe Yao				jsdy@tux.org - Joseph S. D. Yao
**
\*********************************************************************/