Hi all,
We use the next branch of oVirt for preproduction. We have noticed that many
bugs appear as the number of messages in qpidd increases. qpidd itself seems
to be doing its job; most of the issues appear to come from Qmf::Query.
For example, see db-omatic lines 265-296.
When you restart db-omatic with multiple nodes, multiple threads are
launched (line 266), and they hang on:
qmf_host = @qmfc.objects(Qmf::Query.new(:class => "node"),
'hostname' => host_info['hostname'])
The call never returns, yet qpidd keeps answering the requests made by
ruby-qmf correctly.
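
One way to keep a stuck query from blocking its thread forever is to put a
deadline around the call. This is only a minimal sketch: with_deadline is a
hypothetical helper, not part of db-omatic, and the two lambdas stand in for
@qmfc.objects. Whether Timeout can interrupt a call that is blocked inside the
native ruby-qmf binding has not been verified here.

```ruby
require 'timeout'

# Hypothetical helper (not part of db-omatic): runs the block, but gives up
# after `seconds` and returns nil instead of waiting forever.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

# Stand-ins for the real query: one never returns, one answers at once.
hung_query = -> { sleep }      # sleep without an argument never wakes up
fast_query = -> { [:node] }

p with_deadline(0.2) { hung_query.call }
p with_deadline(0.2) { fast_query.call }
```

With such a helper, a nil result could be handled like the existing
"if !qmf_host" case, so a stuck query would be logged and skipped.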
Our workaround is to:
- stop libvirt-qpid on every node,
- restart db-omatic,
- start libvirt-qpid on each node, one after another.
This works and gives us a consistent database for db-omatic.
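
The order of the steps above can be written down as a small dry-run plan. This
is a sketch under assumptions: the service command strings and the node names
are placeholders that may differ on a given installation, and the script only
prints the steps without running anything.

```ruby
# Assumed command strings; adjust to the init scripts actually installed.
STOP_AGENT       = 'service libvirt-qpid stop'
START_AGENT      = 'service libvirt-qpid start'
RESTART_DBOMATIC = 'service ovirt-db-omatic restart'

# Builds the ordered list of steps: stop every agent, restart db-omatic,
# then start the agents again one node at a time.
def restart_plan(nodes)
  plan = nodes.map { |node| [:remote, node, STOP_AGENT] }
  plan << [:local, RESTART_DBOMATIC]
  nodes.each { |node| plan << [:remote, node, START_AGENT] }
  plan
end

# Hypothetical node names, for illustration only.
restart_plan(%w[node1 node2]).each do |kind, *rest|
  puts "#{kind}: #{rest.join(' -> ')}"
end
```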
What do you think about replacing the Thread.new on line 266 with a begin?
The concurrent requests that db-omatic sends to qpidd seem to be the origin
of the hang.
<code snippet of db-omatic lines 265,296>
if state == Host::STATE_AVAILABLE
  Thread.new do
    @logger.info "#{host_info['hostname']} has moved to available, sleeping for updates to vms."
    sleep(20)
    # At this point we want to set all domains that are
    # unreachable to stopped. We're using a thread here to
    # sleep for 10 seconds outside of the main dbomatic loop.
    # If after 10 seconds with this host up there are still
    # domains set to 'unreachable', then we're going to guess
    # the node rebooted and so the domains should be set to
    # stopped.
    @logger.info "Checking for dead VMs on newly available host #{host_info['hostname']}."
    # Double check to make sure this host is still up.
    begin
      qmf_host = @qmfc.objects(Qmf::Query.new(:class => "node"), 'hostname' => host_info['hostname'])
      if !qmf_host
        @logger.info "Host #{host_info['hostname']} is not up after waiting 20 seconds, skipping dead VM check."
      else
        db_vm = Vm.find(:all, :conditions => ["host_id = ? AND state = ?", db_host.id, Vm::STATE_UNREACHABLE])
        db_vm.each do |vm|
          @logger.info "Moving vm #{vm.description} in state #{vm.state} to state stopped."
          set_vm_stopped(vm)
          vm.save!
        end
      end
    rescue Exception => e # just log any errors here
      @logger.info "Exception checking for dead VMs (could be normal): #{e.message}"
      @logger.info e.backtrace
    end
  end
end
</code>
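
A possible alternative to removing the thread is to keep Thread.new for the
20-second sleep and let only one query at a time reach qpidd. The sketch below
assumes that concurrency really is the cause, which is not confirmed.
SerializedConsole and FakeConsole are hypothetical names; FakeConsole only
stands in for @qmfc so that the example runs without a broker.

```ruby
# Hypothetical wrapper: forwards `objects` to the wrapped console while
# holding a lock, so callers on different threads take turns.
class SerializedConsole
  def initialize(console)
    @console = console
    @lock = Mutex.new
  end

  def objects(*args)
    @lock.synchronize { @console.objects(*args) }
  end
end

# Stand-in for the QMF console; records how many calls overlap.
class FakeConsole
  attr_reader :max_in_flight

  def initialize
    @in_flight = 0
    @max_in_flight = 0
    @guard = Mutex.new
  end

  def objects(*_args)
    @guard.synchronize do
      @in_flight += 1
      @max_in_flight = [@max_in_flight, @in_flight].max
    end
    sleep 0.01                      # pretend the query takes a moment
    @guard.synchronize { @in_flight -= 1 }
    [:node]
  end
end

fake = FakeConsole.new
console = SerializedConsole.new(fake)
threads = 5.times.map do |i|
  Thread.new { console.objects('hostname' => "node#{i}") }
end
threads.each(&:join)
puts "max concurrent queries: #{fake.max_in_flight}"
```

One caveat of this design: if a query hangs while holding the lock, every
other thread waits behind it, so it would only help if the hang is really
triggered by overlapping requests.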
--
Pierre-Gilles Mialon
Responsable hébergement :: Head of Hosting services
pmialon at linagora.com :: +33.1 58 18 65 46
Linagora :: http://www.linagora.com
27 rue de Berri :: 75008 PARIS