I ran into an odd problem today. I wanted to share it here in the hopes of maybe saving someone else some lost time.

When you run libvirtd as an unprivileged user (e.g., if you target qemu:///session from a non-root account), then libvirt will open a unix domain socket in one of two places:

- If XDG_RUNTIME_DIR is defined, then inside $XDG_RUNTIME_DIR/libvirt/libvirt-sock

- If XDG_RUNTIME_DIR is *not* defined, then inside $HOME/.cache/libvirt/libvirt-sock

With a CentOS 7 system, at least, if you ssh directly into an account, XDG_RUNTIME_DIR is set. But! If you `su -` to the account from root, e.g.:

    # su - stack

then XDG_RUNTIME_DIR is *not* set.

The problem is a little subtle, because most operations you will perform will work just fine in both cases: you can query for defined but not active guests, storage pools, volumes, and so forth without a problem and you'll get the same answer.

The problem crops up when you start a guest, which results in a persistent libvirtd process. Now, depending on how you got to your account, you will either (a) talk to the persistent process, and you'll be able to see the running guests, or (b) you'll end up spawning a new ephemeral libvirtd process listening in the *other* location, and you won't see anything, and you will wonder why there is a qemu process running for your guest but it's not showing up in "virsh list" and what the heck is going on here.

I don't know if there's a good solution to this, but the failure mode is really non-obvious.

Cheers,

-- 
Lars Kellogg-Stedman <lars@redhat.com> | larsks @ {freenode,twitter,github}
Cloud Engineering / OpenStack          | http://blog.oddbit.com/
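The two socket-location rules in the message above can be written down as a small shell function. This is only a sketch that mirrors the rules as stated in the thread, not libvirt's own code; the name `session_sock_path` is hypothetical, and treating an empty XDG_RUNTIME_DIR like an unset one is an assumption of the sketch.

```shell
#!/bin/sh
# Sketch of the socket-location rule described in the message above.
# It mirrors the two bullet points; it is not libvirt's own code, and
# session_sock_path is a hypothetical helper name. An empty
# XDG_RUNTIME_DIR is treated the same as an unset one (assumption).
session_sock_path() {
    if [ -n "${XDG_RUNTIME_DIR:-}" ]; then
        # e.g. an ssh login, where pam_systemd set XDG_RUNTIME_DIR
        printf '%s\n' "$XDG_RUNTIME_DIR/libvirt/libvirt-sock"
    else
        # e.g. a shell obtained with `su -` from root
        printf '%s\n' "$HOME/.cache/libvirt/libvirt-sock"
    fi
}

# Print the path that applies to the current shell.
session_sock_path
```

Running it once from an ssh login and once after `su -` should print two different paths, which is exactly the mismatch described above: each shell ends up talking to (or spawning) a different session daemon.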
On Wed, Mar 09, 2016 at 01:01:40PM -0500, Lars Kellogg-Stedman wrote:
> [...]
> With a CentOS 7 system, at least, if you ssh directly into an
> account, XDG_RUNTIME_DIR is set. But! If you `su -` to the account
> from root, e.g.:
>
>   # su - stack
>
> then XDG_RUNTIME_DIR is *not* set.

I see, I didn't realize this. See below for a quick test, based on what you mentioned.

> The problem is a little subtle, because most operations you will
> perform will work just fine in both cases: you can query for defined
> but not active guests, storage pools, volumes, and so forth without a
> problem and you'll get the same answer.

Let's put this to the test. I'm in a root shell:

    $ whoami
    root

`su -` into a user:

    $ su - kashyapc
    $ echo $XDG_RUNTIME_DIR

Try to enumerate instances; it fails on the first attempt:

    $ virsh list
    error: failed to connect to the hypervisor
    error: no valid connection
    error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory

The above step seems to have prompted libvirt to create the socket:

    $ file ~/.cache/libvirt/libvirt-sock
    /home/kashyapc/.cache/libvirt/libvirt-sock: socket

So the second attempt to enumerate instances works just fine, since the socket now exists.

* * *

> The problem crops up when you start a guest, which results in a
> persistent libvirtd process.
> Now, depending on how you got to your
> account, you will either (a) talk to the persistent process, and
> you'll be able to see the running guests, or (b) you'll end up
> spawning a new ephemeral libvirtd process listening in the *other*
> location, and you won't see anything, and you will wonder why there is
> a qemu process running for your guest but it's not showing up in
> "virsh list" and what the heck is going on here.

Test-2
------

If I don't `su -` to get to my account at first, but spawn a new shell (Ctrl + Shift + t), the XDG_RUNTIME_DIR variable is set:

    $ echo $XDG_RUNTIME_DIR
    /run/user/1000

    $ virsh list --all
     Id    Name                           State
    ----------------------------------------------------
     -     vm1                            shut off

    $ virsh start vm1
    Domain vm1 started

And the socket is created under /run/user/1000/libvirt:

    $ ls /run/user/1000/libvirt/
    hostdevmgr  libvirtd.pid  libvirt-sock  network  qemu  storage

    $ ls ~/.cache/libvirt/
    hostdevmgr  libvirt  network  qemu  storage  virsh

Then, continuing in the same shell as above, do:

    $ sudo -i
    $ su - kashyapc

Now, again try to enumerate the instance we started a few steps above. This fails, because virsh is now looking for the socket in the other location:

    $ virsh list
    error: failed to connect to the hypervisor
    error: no valid connection
    error: Failed to connect socket to '/home/kashyapc/.cache/libvirt/libvirt-sock': No such file or directory

The socket does get created anyway, so try to enumerate the instance again:

    $ virsh list

    $ virsh list --all
     Id    Name                           State
    ----------------------------------------------------
     -     vm1                            shut off

As you've observed, while the QEMU process for the above VM is still intact, from libvirt's perspective the instance is (a) not enumerated by `virsh list`, and (b) listed as "shut off" when the '--all' flag is supplied to `virsh list`, which can cause even more confusion.
(Finally: in the above session, if I log out of the `su -`'ed user session and the root session, then I'm back to the 'pristine shell state' where libvirt behaves 'properly'.)

> I don't know if there's a good solution to this, but the failure mode
> is really non-obvious.

This seems worth filing a bug for.

-- 
/kashyap
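When a running guest "disappears" like this, it can help to check which of the two candidate sockets actually exists for the current user. The sketch below assumes the runtime directory is /run/user/<uid> (matching Test-2 above, where uid 1000 maps to /run/user/1000); `check_session_sockets` is a hypothetical helper, not part of libvirt or virsh.

```shell
#!/bin/sh
# Hypothetical diagnostic: report which of the two candidate session
# sockets exist right now. Assumes logind's runtime directory lives at
# /run/user/<uid>, as seen in Test-2 above.
check_session_sockets() {
    runtime_sock="/run/user/$(id -u)/libvirt/libvirt-sock"
    cache_sock="$HOME/.cache/libvirt/libvirt-sock"
    for sock in "$runtime_sock" "$cache_sock"; do
        # -S is true only if the path exists and is a socket
        if [ -S "$sock" ]; then
            echo "present: $sock"
        else
            echo "absent:  $sock"
        fi
    done
}

check_session_sockets
```

If both lines report "present", two session daemons may be running (or a stale socket was left behind), and which one virsh talks to depends on whether XDG_RUNTIME_DIR is set in the current shell.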
On Wed, Mar 09, 2016 at 01:01:40PM -0500, Lars Kellogg-Stedman wrote:
> [...]
> With a CentOS 7 system, at least, if you ssh directly into an
> account, XDG_RUNTIME_DIR is set. But! If you `su -` to the account
> from root, e.g.:
>
>   # su - stack
>
> then XDG_RUNTIME_DIR is *not* set. The problem is a little subtle,
> because most operations you will perform will work just fine in both
> cases: you can query for defined but not active guests, storage
> pools, volumes, and so forth without a problem and you'll get the same
> answer.

IMHO this is a bug in the PAM config. We really expect to see the same environment set up no matter how you log in: text console vs su vs ssh vs GDM. If that's not happening, it's always going to cause bad behaviour across many apps, not only libvirt.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
On Thu, Mar 10, 2016 at 12:37:31PM +0000, Daniel P. Berrange wrote:
> On Wed, Mar 09, 2016 at 01:01:40PM -0500, Lars Kellogg-Stedman wrote:
> > [...]
> > With a CentOS 7 system, at least, if you ssh directly into an
> > account, XDG_RUNTIME_DIR is set. But! If you `su -` to the account
> > from root, e.g.:
> >
> >   # su - stack
> >
> > then XDG_RUNTIME_DIR is *not* set. [...]
>
> IMHO this is a bug in the PAM config. We really expect to see the
> same environment set up no matter how you log in: text console vs su
> vs ssh vs GDM. If that's not happening, it's always going to cause
> bad behaviour across many apps, not only libvirt.

Talking to Alexander Bokovoy (of FreeIPA, CCed) on IRC about this topic, he says:

    'su -' does initialize the environment to start a shell as a login
    shell. It clears out everything but TERM from the old environment
    and sets up a new one.
    If your shell for $user does not set XDG_RUNTIME_DIR, then that's
    the issue, not PAM.

    XDG_RUNTIME_DIR is set by pam_systemd after logind has created a
    session for that user, but only if the user who authenticated is
    the same as the original user of the session. When you do
    'su - $user' as root, you get this [error message is manually
    wrapped for this email]:

      su[9188]: pam_systemd(su-l:session): pam-systemd initializing
      su[9188]: pam_systemd(su-l:session): Asking logind to create session:
                uid=1792600000 pid=9188 service=su-l type=tty class=user
                desktop= seat= vtnr=0 tty=pts/1 display= remote=no
                remote_user=root remote_host
      su[9188]: pam_systemd(su-l:session): Cannot create session: Already
                running in a session

    [NOTE: to get these messages, you need to add the 'debug' option
    to pam_systemd.so in /etc/pam.d/system-auth]

    `su -` isn't the best tool, specifically under systemd; it may be
    more efficient to use systemd tools to create sessions and
    activate/switch them.

-- 
/kashyap
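Given the explanation above (pam_systemd sets XDG_RUNTIME_DIR only when logind actually creates a session), one possible stop-gap after `su -`, not something proposed in this thread, is to point XDG_RUNTIME_DIR at the runtime directory that logind already created for the same user's *other* login session. This is a hedged sketch: it assumes that directory lives at /run/user/<uid> and only exists while the user has another logind session, and `use_existing_runtime_dir` is a hypothetical name. The optional argument exists only so the base directory can be overridden.

```shell
#!/bin/sh
# Hypothetical stop-gap for a `su -` shell: if XDG_RUNTIME_DIR is unset
# but logind already created /run/user/<uid> for another session of this
# user, reuse it so virsh finds the same libvirtd socket as that session.
# The optional first argument overrides the base directory (default /run/user).
use_existing_runtime_dir() {
    base="${1:-/run/user}"
    candidate="$base/$(id -u)"
    # Only act when the variable is missing and the directory exists and
    # is owned by the current user (-O is supported by bash).
    if [ -z "${XDG_RUNTIME_DIR:-}" ] && [ -d "$candidate" ] && [ -O "$candidate" ]; then
        XDG_RUNTIME_DIR="$candidate"
        export XDG_RUNTIME_DIR
    fi
}

use_existing_runtime_dir
echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR:-<unset>}"
```

Because it exports a variable, the function has to run in the current shell (for example by sourcing the file with `.`), not in a child process. It also inherits the other session's lifetime: logind may remove that directory once the user's last real session ends, which is one reason the PAM/session-based fixes discussed in this thread are cleaner.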
Lars Kellogg-Stedman <lars@redhat.com> writes:

> I ran into an odd problem today. [...]
>
> The problem crops up when you start a guest, which results in a
> persistent libvirtd process. Now, depending on how you got to your
> account, you will either (a) talk to the persistent process, and
> you'll be able to see the running guests, or (b) you'll end up
> spawning a new ephemeral libvirtd process listening in the *other*
> location, and you won't see anything, [...]
>
> I don't know if there's a good solution to this, but the failure mode
> is really non-obvious.

I've run into the same problem while trying to inspect the VMs that users have created using qemu:///session on a shared shell server. I reached the conclusion that the ultimate problem was, as is suggested elsewhere in this thread, with PAM, and with how su does not register a new session with PAM/logind/systemd.
As far as I know, there is no widely used way for an administrator to get a shell as some user, registered with PAM, without first authenticating specifically as that user. (There is `machinectl shell`, but that is pretty new, and also rather inflexible.)

I solved this problem by writing a small Kerberos plugin to allow administrator principals in our domain to authenticate as any Unix user; then, to use qemu:///session, they can simply ssh in. This also allows administrators to use qemu+ssh://someuser@host/session on remote hosts.
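For reference, the two routes mentioned above look roughly like this. `machinectl shell` comes from systemd (as far as I know it first appeared in a systemd release newer than the one shipped with CentOS 7), and `.host` is its name for the local machine; the qemu+ssh URI reaches the user's session daemon through an ordinary ssh login, which is the route described above. `someuser` and `host` are placeholders, and `session_uri` is a hypothetical helper; the real commands are left as comments because they need an actual host and user.

```shell
#!/bin/sh
# Hypothetical helper: build the qemu+ssh session URI described above.
# "someuser" and "host" are placeholders, not real accounts or machines.
session_uri() {
    printf 'qemu+ssh://%s@%s/session\n' "$1" "$2"
}

# Example invocations (commented out because they need a real host/user):
#   virsh -c "$(session_uri someuser host)" list --all
#   machinectl shell someuser@.host    # local login session (newer systemd)
session_uri someuser host
```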