Hi, all

After deploying several Xen hosts for various purposes (staging, production, development, QA, etc.), I've been seeing strange problems involving rpm and yum. The two most frequently occurring problems are:

1. An indefinite lock-up ("hang") when using yum or rpm. Running strace shows that the process is waiting on a futex, and it stays that way indefinitely (I'm not sure whether it ever completes).

2. rpmdb corruption: rpm/yum complains about an incorrect db version, a corrupted index, or similar.

In both cases, deleting /var/lib/rpm/__db* and (optionally) recreating the db with rpm --rebuilddb fixes it, but the problem recurs later. One workaround I've found is to force LD_ASSUME_KERNEL=x, where x is a really old kernel version (pre-2.4.6).

Has anyone else observed this problem? I'm not sure whether /lib/tls is to blame (I left it in place because I need db4 and a lot of other things). I'm using xen-2.0.7 and noticed this with 2.0.6 as well.

As always, thanks in advance for the help. :-)

- gino
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
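The recovery steps described in the first message (delete /var/lib/rpm/__db*, then optionally run rpm --rebuilddb) can be scripted roughly as follows. This is a minimal sketch, not a tested tool: the function name clear_rpmdb_env, its argument order, and the "rebuild" keyword are hypothetical. It assumes no rpm or yum process still has the database open, so kill any hung one first, as Ted describes.

```shell
# Hypothetical helper: remove the Berkeley DB environment files (__db*)
# from an rpm database directory, then optionally rebuild with rpm --rebuilddb.
# Assumes no rpm/yum process is still using the database.
clear_rpmdb_env() {
    dbdir=${1:-/var/lib/rpm}   # rpm's default database directory
    rebuild=${2:-no}           # pass "rebuild" to also run rpm --rebuilddb

    if [ ! -d "$dbdir" ]; then
        echo "clear_rpmdb_env: $dbdir is not a directory" >&2
        return 1
    fi

    # Only the __db* environment files are removed; Packages and the
    # index files are left in place.
    rm -f "$dbdir"/__db*

    if [ "$rebuild" = "rebuild" ]; then
        rpm --dbpath "$dbdir" --rebuilddb
    fi
}
```

Typical use, as root: clear_rpmdb_env /var/lib/rpm rebuild. Taking the directory as an argument makes it possible to try the function on a copy of the database first. The LD_ASSUME_KERNEL workaround is separate and is set per command, for example LD_ASSUME_KERNEL=<old version> rpm -qa.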
On Thu, 2005-09-22 at 11:26 -0700, Gino LV. Ledesma wrote:
> Anyone else observed this problem?

I have a CentOS 4.1 box that was running Xen 2.0.7 with CentOS 4.1 and FC4 domUs, and I did not see any such issues. It is running 2.0-testing right now. I just ran yum updates on a CentOS 4.1 VM and an FC4 VM with no problems.

That rpm issue should be very sporadic with newer versions of rpm. I used to see it quite a bit in the early NPTL, older-rpm days, but it is very rare these days.

I rarely had to rebuild the db to fix it; generally, just removing the rpm db files resolved it. This was on RH8 through FCX:

kill -9 <pid of rpm>
rm -rf /var/lib/rpm/__db*

Regards,
Ted
Gino LV. Ledesma
2005-Sep-22 22:08 UTC
Re: [Xen-users] Frequent rpm db corruption / lock-up
Thanks for the reply. This is exactly what I've been experiencing, which is very odd. I remember seeing these problems back in the 2.4->2.6 migration, and I do know it has something to do with NPTL. Only the domUs are affected (all running CentOS 4.1 with the latest updates applied). In a test running "yum search foo" 10 times, the problem can show up by the 4th or 5th run. It is worse when two processes access the rpm db at the same time, e.g. someone running rpm -qf <foo> while someone else runs rpm -qa.

- gino

On 9/22/05, Ted Kaczmarek <tedkaz@optonline.net> wrote:
> That rpm issue should be very sporadic with newer versions of rpm. I used
> to see it quite a bit in the early NPTL, older-rpm days, but it is very
> rare these days.
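Gino's test (running "yum search foo" ten times and watching for a hang, or running two rpm queries at once) can be made repeatable with a small loop that puts a time limit on each run. This is a sketch under assumptions: the function name repro_loop and its arguments are hypothetical, and it relies on the timeout utility from GNU coreutils, which may not be available on a 2005-era system. Any non-zero exit, including a timeout, counts as a failure, so check that the command succeeds on its own first.

```shell
# Hypothetical reproduction loop: run a command up to RUNS times, killing
# any run that exceeds LIMIT seconds, and report the first run that fails.
# Usage: repro_loop RUNS LIMIT command [args...]
repro_loop() {
    runs=$1; limit=$2; shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        # timeout (GNU coreutils) returns non-zero if the command times out
        if ! timeout "$limit" "$@" >/dev/null 2>&1; then
            echo "run $i: failed or hung (limit ${limit}s)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs runs completed"
}
```

For the single-process case: repro_loop 10 120 yum search foo. For the concurrent case, run two loops side by side, e.g. repro_loop 10 60 rpm -qa & repro_loop 10 60 rpm -qf /bin/sh; wait. A run reported as hung is a good moment to attach strace to the stuck process and confirm the futex wait before cleaning up.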