You can look at /var/log/ovirt-server/db-omatic.log. The node probably
times out because it stops answering heartbeats.
To get more detail, you can run the db-omatic script in non-daemon mode
(/usr/share/ovirt-server/db-omatic/db_omatic.rb -n).
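For example, something along these lines on the server (I'm assuming the
init script is called ovirt-db-omatic here, adjust to whatever your
install uses; stop the daemonized instance first so the two don't fight
over the same queue):

    # check the last entries in the log
    tail -n 100 /var/log/ovirt-server/db-omatic.log
    # stop the daemonized instance, then run db-omatic in the foreground
    service ovirt-db-omatic stop
    /usr/share/ovirt-server/db-omatic/db_omatic.rb -n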
I see that very often on Fedora 13, a bit less often on Fedora 12.
This happens because the Ruby AMQP bindings get stuck when they have to
handle too many threads.
There's no fix for this yet, but there is a workaround: whenever that
happens, restart everything on the node and the server with these scripts:
http://ovirt.pastebin.com/JjNpEDak
http://ovirt.pastebin.com/tPAPJBpB
You can put them in a cron job.
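For example, a cron entry like this (the path and interval are just an
example, point it at wherever you saved the script; note this restarts
everything periodically whether the node is stuck or not):

    # /etc/cron.d/ovirt-restart -- restart ovirt services every 15 minutes
    */15 * * * * root /usr/local/sbin/ovirt-restart.sh >> /var/log/ovirt-restart.log 2>&1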
On 08/17/2010 05:35 AM, Justin Clacherty wrote:
> After running for a while the node becomes "unavailable" in the server
> UI. All VMs running on that node also become unavailable. The node is
> still running fine as are all the VMs, they're just no longer
> manageable.
>
> I looked on the node and everything appeared to be running fine. Looked
> on the server and ovirt-taskomatic was stopped (this seems to happen
> quite a bit). Restarted it but that didn't help. Restarting Matahari
> on the node sends information to the server but the node does not become
> available. The only way I've been able to get it back is to shutdown
> all the VMs and reboot the node and management server. Is anyone else
> seeing this happen? What else can I look at when it happens again?
>
> Cheers,
> Justin.
>
> _______________________________________________
> Ovirt-devel mailing list
> Ovirt-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/ovirt-devel
>