Hello all,

Sorry to bug the list again, but is anyone running xen on recent Dell hardware equipped with the PERC 4e/[SD]i controller? I've been having issues with random hangs, and I'm trying to narrow down the list of possible causes.

Symptom: moderate I/O causes my xen-running machine to lock hard. Doing something like "find /" does it every time, but I've also had hangs while booting virtual domains, doing a dd, etc. It's happened in both domain0 and in other virtual domains.

This hang during find does not occur when using a stock Debian (2.4.28) kernel nor a self-compiled 2.6.10 kernel (which is my first big clue that it might be a xen-related issue). I've consulted with Dell on this, and they recommended a firmware upgrade, but that did nothing for my particular problem.

I'm running 2.6.10 in domain0, using the debian packages provided by Adam Heath. The driver I'm trying to use is the new megaraid driver, not the legacy one, which doesn't seem to recognize the raid controller at all.

My first guess was flaky hardware, but testing with non-xen kernels makes me think otherwise. Does anyone have any ideas what I could try next?

thanks,
-michal urbanski

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
> Sorry to bug the list again, but is anyone running xen on recent Dell
> hardware equipped with the PERC 4e/[SD]i controller? I've been having
> issues with random hangs, and I'm trying to narrow down the list of
> possible causes.

That's worrying.

When you say 'hangs', does the machine lock solid, or do non disk-bound processes continue OK (e.g. if you have a running 'top' does it continue to update?). Also, does the machine eventually reboot, or just hang.

If you switch to a text console and run the 'find', do you see an Oops message come out?

What model of Dell machine is this?

Ian
On Thu, Mar 24, 2005 at 09:33:45PM -0000, Ian Pratt wrote:
> > Sorry to bug the list again, but is anyone running xen on recent Dell
> > hardware equipped with the PERC 4e/[SD]i controller? I've been having
> > issues with random hangs, and I'm trying to narrow down the list of
> > possible causes.
>
> That's worrying.
>
> When you say 'hangs', does the machine lock solid, or do non disk-bound
> processes continue OK (e.g. if you have a running 'top' does it continue
> to update?). Also, does the machine eventually reboot, or just hang.
>
> If you switch to a text console and run the 'find', do you see an Oops
> message come out?
>
> What model of Dell machine is this?

It hangs, solid. No ssh, dead to the network, console dead, etc. No reboot... I've come back to a hung machine after a weekend, and it was still hung.

Once it hangs, I can't get anything on a console or anything. Nothing makes it down to the logs, either, as far as I can tell.

The machine is a PowerEdge 2850, the controller is a PERC 4e/Di.

-michal
> It hangs, solid. No ssh, dead to the network, console dead, etc. No
> reboot... I've come back to a hung machine after a weekend, and it was
> still hung.
>
> Once it hangs, I can't get anything on a console or anything. Nothing
> makes it down to the logs, either, as far as I can tell.

Please can you switch to a text console and then try and get it to hang. Hopefully something will come out.

Also, please can you add 'watchdog' to the Xen command line in grub.conf.

> The machine is a PowerEdge 2850, the controller is a PERC 4e/Di.

Is this the standard on-board SCSI controller?

Ian
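For anyone following the watchdog suggestion: with GRUB (legacy), the Xen hypervisor is loaded by the `kernel` line and the domain0 Linux kernel by a `module` line, so the option has to go on the former. The sketch below is one way to check that; it is not from the thread. The function name `check_xen_watchdog`, the default path `/boot/grub/grub.conf`, and the assumption that the hypervisor image is named `xen*.gz` are all illustrative and may not match a given install.

```shell
#!/bin/sh
# Hypothetical helper: report whether each Xen "kernel" line in a GRUB
# (legacy) config carries the "watchdog" option.
# Assumptions: default config path, and a hypervisor image named xen*.gz.
check_xen_watchdog() {
    conf="${1:-/boot/grub/grub.conf}"
    awk '
        # Only the hypervisor line counts; "module" lines belong to domain0.
        $1 == "kernel" && $2 ~ "xen[^/]*\\.gz$" {
            found = 1
            ok = 0
            for (i = 3; i <= NF; i++) if ($i == "watchdog") ok = 1
            printf "%s: line %d: %s\n", (ok ? "ok" : "MISSING"), NR, $0
            if (!ok) missing = 1
        }
        # Succeed only if at least one Xen line exists and none lack the option.
        END { exit ((found && !missing) ? 0 : 1) }
    ' "$conf"
}
```

Calling `check_xen_watchdog /boot/grub/menu.lst` (or whichever file your GRUB reads) prints one line per Xen entry and returns non-zero if any entry is missing the option, which catches the easy mistake of appending `watchdog` to the `module` line instead.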
Ian Pratt wrote:
>>The machine is a PowerEdge 2850, the controller is a PERC 4e/Di.
>
> Is this the standard on-board SCSI controller?

It's an embedded LSI Logic MegaRAID controller.

I've heard that the stock firmware (v 513O) for this controller on the Dell PowerEdge 2800/2850 is very buggy.

I suggest that Michal upgrade the controller's firmware (to v 516A - the latest and greatest) and try again.

If that still doesn't fix it, then I suggest that Michal contact Dell's technical support and get the machine repaired or replaced.

-- 
Phil Brutsche
phil@brutsche.us
On Thu, Mar 24, 2005 at 04:34:26PM -0600, Phil Brutsche wrote:
> Ian Pratt wrote:
> >>The machine is a PowerEdge 2850, the controller is a PERC 4e/Di.
> >
> > Is this the standard on-board SCSI controller?
>
> It's an embedded LSI Logic MegaRAID controller.
>
> I've heard that the stock firmware (v 513O) for this controller on the
> Dell PowerEdge 2800/2850 is very buggy.
>
> I suggest that Michal upgrade the controller's firmware (to v 516A - the
> latest and greatest) and try again.
>
> If that still doesn't fix it, then I suggest that Michal contact Dell's
> technical support and get the machine repaired or replaced.

Hi, yes, I've got the latest version of the 516A firmware (got that from Dell a few days ago). I'm just running some tests now to try to get it to crash in order to answer Ian's previous questions...

-michal
On Thu, Mar 24, 2005 at 09:45:51PM -0000, Ian Pratt wrote:
> > It hangs, solid. No ssh, dead to the network, console dead, etc. No
> > reboot... I've come back to a hung machine after a weekend, and it was
> > still hung.
> >
> > Once it hangs, I can't get anything on a console or anything. Nothing
> > makes it down to the logs, either, as far as I can tell.
>
> Please can you switch to a text console and then try and get it to hang.
> Hopefully something will come out.
>
> Also, please can you add 'watchdog' to the Xen command line in
> grub.conf.
>
> > The machine is a PowerEdge 2850, the controller is a PERC 4e/Di.
>
> Is this the standard on-board SCSI controller?
>
> Ian

Sorry the response took so long... anyway, it looks like it's a xen problem. I booted the machine with a 2.6.10 non-xen kernel, and had it doing heavy I/O for the past three days... it stayed up.

On a text console, when it hangs I get nothing. I've added watchdog to the relevant line in grub.conf, and that doesn't seem to do anything (it does seem like it takes longer for it to crash, but that's probably my imagination :)

I'm waiting for it to hang while I'm tail -f'ing various things in /var/log... is kern.log the one I should be looking at?

The drive controller isn't the onboard one, it's an optional dual-channel RAID controller.

-michal
On Thu, Mar 24, 2005 at 09:45:51PM -0000, Ian Pratt wrote:
> > It hangs, solid. No ssh, dead to the network, console dead, etc. No
> > reboot... I've come back to a hung machine after a weekend, and it was
> > still hung.
> >
> > Once it hangs, I can't get anything on a console or anything. Nothing
> > makes it down to the logs, either, as far as I can tell.
>
> Please can you switch to a text console and then try and get it to hang.
> Hopefully something will come out.
>
> Also, please can you add 'watchdog' to the Xen command line in
> grub.conf.
>
> > The machine is a PowerEdge 2850, the controller is a PERC 4e/Di.
>
> Is this the standard on-board SCSI controller?
>
> Ian

Sorry this reply took so long.

Upon hanging, I get nothing from the console... even after having turned watchdog on. I'm trying to force a crash now while I monitor various things in /var/log... is kern.log the one I should be looking at?

Over the long weekend, I ran the machine on a non-xen 2.6.10 kernel, doing a big 100 GB dd and a "find /" repeatedly, two activities that normally cause the machine to fall over... and it didn't hang.

I'm not sure if this anecdote is enough to confirm that it is indeed a xen-related problem and not something else (like, say, hardware), but it is a data point nonetheless. Is there anything else I can check?

-michal urbanski

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
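The dd-plus-find workload described above can be scripted so that the same load is applied under the Xen and non-Xen kernels. This is only a sketch of that idea, not the script actually used in the thread: the function name `io_stress`, its parameters, and the choice of `/dev/zero` as the data source are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical reproduction loop: each pass does one large sequential write
# with dd, then walks a directory tree with find. Progress is echoed so the
# last line left on a text console shows how far the run got before a hang.
io_stress() {
    scratch="$1"      # file dd writes to (place it on the RAID volume under test)
    size_mb="$2"      # write size in MiB; 102400 is roughly the 100 GB mentioned
    passes="$3"       # number of dd + find passes
    tree="${4:-/}"    # directory tree to walk, "/" by default
    n=1
    while [ "$n" -le "$passes" ]; do
        echo "pass $n: dd ${size_mb} MiB -> $scratch"
        dd if=/dev/zero of="$scratch" bs=1048576 count="$size_mb" 2>/dev/null || return 1
        sync
        echo "pass $n: find $tree"
        # -xdev keeps the walk on one filesystem; errors (e.g. permissions) are ignored.
        find "$tree" -xdev > /dev/null 2>&1
        rm -f "$scratch"
        n=$((n + 1))
    done
    echo "completed $passes passes"
}
```

For example, `io_stress /var/tmp/stress.bin 102400 20` would attempt twenty passes of a roughly 100 GB write followed by a walk of the root filesystem. Running it from a text console rather than over ssh keeps the last progress line visible if the machine locks up.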