thr3ads.net - Xen users - [Xen-users] Frequent rpm db corruption / lock-up [Sep 2005]

Gino LV. Ledesma

2005-Sep-22 18:26 UTC

[Xen-users] Frequent rpm db corruption / lock-up

Hi, all

After deploying several xen hosts for various purposes (staging,
production, development, qa, etc), I've been seeing strange problems
involving rpm and yum. Two of the most frequently occurring problems
are:

1. Indefinite lock-up / "hang" when using yum or rpm -- doing an
strace shows that the process is waiting for a futex to complete, and
this will take forever (not sure if it actually completes).

2. rpmdb corruption -- rpm / yum complains about incorrect db version,
corrupted index, or whatnot.

In both cases, deleting /var/lib/rpm/__db* and (optionally) recreating
the db via rpm --rebuilddb fixes it, but the problem recurs later. One
workaround I've found is to force LD_ASSUME_KERNEL=x, where x is a
really old kernel (pre-2.4.6).

Has anyone else observed this problem? I'm not sure whether /lib/tls is
to blame (I left it in place because I need db4 and a lot of other
things). I'm using xen-2.0.7 and noticed this with 2.0.6 as well.

As always, thanks for the help in advance. :-)

- gino
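
For reference, the LD_ASSUME_KERNEL workaround described above could be wrapped roughly as below. This is a minimal sketch, not a tested fix: the function name with_assumed_kernel, the ASSUME_KERNEL override and the 2.2.5 default are assumptions chosen for illustration (the message only says "pre-2.4.6"), and the variable only matters on glibc builds that still ship the older thread libraries.

```shell
# with_assumed_kernel CMD [ARGS...]
# Run CMD with LD_ASSUME_KERNEL set, so the dynamic linker picks libraries
# built for an older kernel ABI. ASSUME_KERNEL overrides the version used;
# the 2.2.5 default is an assumed example value, not one from the thread.
with_assumed_kernel() {
    LD_ASSUME_KERNEL="${ASSUME_KERNEL:-2.2.5}" "$@"
}

# Example use (not run here):
#   with_assumed_kernel rpm -qa
#   ASSUME_KERNEL=2.4.1 with_assumed_kernel yum search foo
```

Wrapping the command keeps the setting scoped to rpm or yum, rather than exporting it for the whole login session.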

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Ted Kaczmarek

2005-Sep-22 20:15 UTC

Re: [Xen-users] Frequent rpm db corruption / lock-up

On Thu, 2005-09-22 at 11:26 -0700, Gino LV. Ledesma wrote:
> Hi, all
>
> After deploying several xen hosts for various purposes (staging,
> production, development, qa, etc), I've been seeing strange problems
> involving rpm and yum. Two of the most frequently occurring problems
> are:
>
> 1. Indefinite lock-up / "hang" when using yum or rpm -- doing an
> strace shows that the process is waiting for a futex to complete, and
> this will take forever (not sure if it actually completes).
>
> 2. rpmdb corruption -- rpm / yum complains about incorrect db version,
> corrupted index, or whatnot.
>
> In both cases, deleting /var/lib/rpm/__db* and recreating the db via
> rpm --rebuilddb (optional) fixes it. But the problem will recur again
> later. One fix I've found is to force it to use LD_ASSUME_KERNEL=x
> where x is a really old kernel (pre-2.4.6).
>
> Anyone else observed this problem? I'm not sure if /lib/tls is to be
> blamed (I left it in place because I need db4 and a lot of other
> things). Using xen-2.0.7 and noticed this with 2.0.6 as well.
>
> As always, thanks for the help in advance. :-)
>
> - gino

I have a CentOS 4.1 machine that was running 2.0.7 with CentOS 4.1 and
FC4 domUs and did not see any such issues. It is running 2.0 testing
right now. I just ran yum updates on a CentOS 4.1 and an FC4 VM with no
problems.

That rpm issue should be very sporadic with newer versions of rpm. I
used to see it quite a bit in the early NPTL, older-rpm days, but it is
very rare these days.

I rarely had to rebuild the db to fix it; generally just removing the
rpm __db* files resolved it. This was RH8 through FCX.

kill -9 $(pidof rpm)
rm -rf /var/lib/rpm/__db*

Regards,
Ted
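
The cleanup described in this thread (stop the stuck process, remove the __db* files, optionally rebuild) might be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes nothing else is using the rpm database while it runs.

```shell
# clear_rpm_db_env [DBDIR]
# Remove the Berkeley DB environment files (__db.001, __db.002, ...) from
# DBDIR, leaving Packages and the index files untouched.
clear_rpm_db_env() {
    dbdir="${1:-/var/lib/rpm}"
    rm -f "$dbdir"/__db*
}

# rebuild_rpm_db [DBDIR]
# Optional second step: regenerate the indexes from the installed
# package headers.
rebuild_rpm_db() {
    dbdir="${1:-/var/lib/rpm}"
    rpm --rebuilddb --dbpath "$dbdir"
}

# Example use as root, after killing any hung rpm or yum process:
#   clear_rpm_db_env && rebuild_rpm_db
```

Taking the directory as an argument makes it possible to try the removal step on a scratch copy of /var/lib/rpm before touching the real one.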



Gino LV. Ledesma

2005-Sep-22 22:08 UTC

Re: [Xen-users] Frequent rpm db corruption / lock-up

Thanks for the reply.

This is exactly what I have been experiencing, which is very odd. I
remember seeing these back in the 2.4->2.6 migration and I do know it
has something to do with NPTL.

Only the domUs are affected (all running CentOS 4.1 with latest
updates applied).

When I run "yum search foo" 10 times in a row, the problem can show up
by the 4th or 5th run. It is worse when two processes access the rpm db
at the same time -- e.g. someone doing rpm -qf <foo> while another is
doing rpm -qa.

- gino
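
A reproduction along the lines of the test above could be scripted roughly like this. It is a sketch under assumptions: run_n and run_pair are hypothetical helper names, /bin/sh in the usage lines stands in for the unspecified <foo>, and a run that truly hangs on the futex will block the loop rather than be reported as a failure.

```shell
# run_n COUNT CMD [ARGS...]
# Run CMD COUNT times in a row; report the first run that exits non-zero.
run_n() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs succeeded"
}

# run_pair 'CMD1' 'CMD2'
# Start two command strings at the same time and wait for both;
# returns non-zero if either one fails.
run_pair() {
    sh -c "$1" >/dev/null 2>&1 &
    p1=$!
    sh -c "$2" >/dev/null 2>&1 &
    p2=$!
    rc=0
    wait "$p1" || rc=1
    wait "$p2" || rc=1
    return "$rc"
}

# Example use (not run here):
#   run_n 10 yum search foo
#   run_pair 'rpm -qf /bin/sh' 'rpm -qa'
```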

