On Thu, Sep 08, 2022 at 02:23:31PM +0100, Daniel P. Berrang? wrote:> On Thu, Sep 08, 2022 at 03:10:09PM +0200, Roman Mohr wrote: > > On Thu, Sep 8, 2022 at 2:56 PM Daniel P. Berrang? <berrange at redhat.com> > > wrote: > > > > > On Thu, Sep 08, 2022 at 02:24:00PM +0200, Roman Mohr wrote: > > > > Hi, > > > > > > > > > > > > I have a question regarding capability caching in the context of > > > KubeVirt. > > > > Since we start in KubeVirt one libvirt instance per VM, libvirt has to > > > > re-discover on every VM start the qemu capabilities which leads to a > > > 1-2s+ > > > > delay in startup. > > > > > > > > We already discover the features in a dedicated KubeVirt pod on each > > > node. > > > > Therefore I tried to copy the capabilities over to see if that would > > > work. > > > > > > > > It looks like in general it could work, but libvirt seems to detect a > > > > mismatch in the exposed KVM CPU ID in every pod. Therefore it invalidates > > > > the cache. The recreated capability cache looks esctly like the original > > > > one though ... > > > > > > > > The check responsible for the invalidation is this: > > > > > > > > ``` > > > > Outdated capabilities for '%s': host cpuid changed > > > > ``` > > > > > > > > So the KVM_GET_SUPPORTED_CPUID call seems to return > > > > slightly different values in different containers. > > > > > > > > After trying out the attached golang scripts in different containers, I > > > > could indeed see differences. > > > > > > > > I can however not really judge what the differences in these KVM function > > > > registers mean and I am curious if someone else knows. The files are > > > > attached too (as json for easy diffing). > > > > > > Can you confirm whether the two attached data files were captured > > > by containers running on the same physical host, or could each > > > container have run on a different host. > > > > > > > They are coming from the same host, that is the most surprising bit for me. > > I am also very sure that this is the case, because I only had one k8s node > > from where I took these. > > The containers however differ (obviously) on namespaces and on the > > privilege level (less obvious). The handler dump is from a fully privileged > > container. > > The privilege level sounds like something that might be impactful, > so I'll investigate that. I'd be pretty surprised for namespaces > to have any impact thnough.The privilege level is a red herring. Peter reminded me that we have to filter out some parts of CPUID because the APIC IDs vary depending on what host CPU the task executes on. https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virhostcpu.c#L1346 In the 2 jSON files you provide, the differences i see should already be matched by /* filter out local apic id */ if (entry->function == 0x01 && entry->index == 0x00) entry->ebx &= 0x00ffffff; if (entry->function == 0x0b) entry->edx &= 0xffffff00; so those differences ought not to be causing the cache to be invalidated. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Sep 8, 2022 at 4:04 PM Daniel P. Berrang? <berrange at redhat.com> wrote:> On Thu, Sep 08, 2022 at 02:23:31PM +0100, Daniel P. Berrang? wrote: > > On Thu, Sep 08, 2022 at 03:10:09PM +0200, Roman Mohr wrote: > > > On Thu, Sep 8, 2022 at 2:56 PM Daniel P. Berrang? <berrange at redhat.com > > > > > wrote: > > > > > > > On Thu, Sep 08, 2022 at 02:24:00PM +0200, Roman Mohr wrote: > > > > > Hi, > > > > > > > > > > > > > > > I have a question regarding capability caching in the context of > > > > KubeVirt. > > > > > Since we start in KubeVirt one libvirt instance per VM, libvirt > has to > > > > > re-discover on every VM start the qemu capabilities which leads to > a > > > > 1-2s+ > > > > > delay in startup. > > > > > > > > > > We already discover the features in a dedicated KubeVirt pod on > each > > > > node. > > > > > Therefore I tried to copy the capabilities over to see if that > would > > > > work. > > > > > > > > > > It looks like in general it could work, but libvirt seems to > detect a > > > > > mismatch in the exposed KVM CPU ID in every pod. Therefore it > invalidates > > > > > the cache. The recreated capability cache looks esctly like the > original > > > > > one though ... > > > > > > > > > > The check responsible for the invalidation is this: > > > > > > > > > > ``` > > > > > Outdated capabilities for '%s': host cpuid changed > > > > > ``` > > > > > > > > > > So the KVM_GET_SUPPORTED_CPUID call seems to return > > > > > slightly different values in different containers. > > > > > > > > > > After trying out the attached golang scripts in different > containers, I > > > > > could indeed see differences. > > > > > > > > > > I can however not really judge what the differences in these KVM > function > > > > > registers mean and I am curious if someone else knows. The files > are > > > > > attached too (as json for easy diffing). > > > > > > > > Can you confirm whether the two attached data files were captured > > > > by containers running on the same physical host, or could each > > > > container have run on a different host. > > > > > > > > > > They are coming from the same host, that is the most surprising bit > for me. > > > I am also very sure that this is the case, because I only had one k8s > node > > > from where I took these. > > > The containers however differ (obviously) on namespaces and on the > > > privilege level (less obvious). The handler dump is from a fully > privileged > > > container. > > > > The privilege level sounds like something that might be impactful, > > so I'll investigate that. I'd be pretty surprised for namespaces > > to have any impact thnough. > > The privilege level is a red herring. Peter reminded me that we have > to filter out some parts of CPUID because the APIC IDs vary depending > on what host CPU the task executes on. > > > https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virhostcpu.c#L1346 > > In the 2 jSON files you provide, the differences i see should already > be matched by > > /* filter out local apic id */ > if (entry->function == 0x01 && entry->index == 0x00) > entry->ebx &= 0x00ffffff; > if (entry->function == 0x0b) > entry->edx &= 0xffffff00; > > so those differences ought not to be causing the cache to be > invalidated. >Hm, maybe I misinterpreted the logs then. The snipped I looked at was this: ``` {"component":"virt-launcher","level":"info","msg":"/dev/kvm has changed (1661786802 vs 0)","pos":"virQEMUCapsKVMUsable:4850","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.936000Z"} {"component":"virt-launcher","level":"info","msg":"a=0x7f8138153ba0, b=0x7f818001c480","pos":"virCPUDataIsIdentical:1178","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"} {"component":"virt-launcher","level":"info","msg":"Outdated capabilities for '/usr/bin/qemu-system-x86_64': host cpuid changed","pos":"virQEMUCapsIsValid:4993","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"} {"component":"virt-launcher","level":"info","msg":"Outdated cached capabilities '/var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml' for '/usr/bin/qemu-system-x86_64'","pos":"virFileCacheLoad:163","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"} ``` I had the impression from the code that the `/dev/kvm` change (because the containers are not created at the same time) does not invalidate it either. I added the whole debug log, maybe I missed something obvious. Does it make a difference if the cache is created via `virsh domcapabilities` and `virsh capabilities` or via defining the first domain? Best regards, Roman> > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- > https://www.instagram.com/dberrange :| > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://listman.redhat.com/archives/libvirt-users/attachments/20220908/af7f961f/attachment-0001.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: debug.log.gz Type: application/gzip Size: 169826 bytes Desc: not available URL: <http://listman.redhat.com/archives/libvirt-users/attachments/20220908/af7f961f/attachment-0001.gz>
On Thu, Sep 08, 2022 at 04:22:17PM +0200, Roman Mohr wrote:> On Thu, Sep 8, 2022 at 4:04 PM Daniel P. Berrang? <berrange at redhat.com> > wrote: > > > On Thu, Sep 08, 2022 at 02:23:31PM +0100, Daniel P. Berrang? wrote: > > > On Thu, Sep 08, 2022 at 03:10:09PM +0200, Roman Mohr wrote: > > > > On Thu, Sep 8, 2022 at 2:56 PM Daniel P. Berrang? <berrange at redhat.com > > > > > > > wrote: > > > > > > > > > On Thu, Sep 08, 2022 at 02:24:00PM +0200, Roman Mohr wrote: > > > > > > Hi, > > > > > > > > > > > > > > > > > > I have a question regarding capability caching in the context of > > > > > KubeVirt. > > > > > > Since we start in KubeVirt one libvirt instance per VM, libvirt > > has to > > > > > > re-discover on every VM start the qemu capabilities which leads to > > a > > > > > 1-2s+ > > > > > > delay in startup. > > > > > > > > > > > > We already discover the features in a dedicated KubeVirt pod on > > each > > > > > node. > > > > > > Therefore I tried to copy the capabilities over to see if that > > would > > > > > work. > > > > > > > > > > > > It looks like in general it could work, but libvirt seems to > > detect a > > > > > > mismatch in the exposed KVM CPU ID in every pod. Therefore it > > invalidates > > > > > > the cache. The recreated capability cache looks esctly like the > > original > > > > > > one though ... > > > > > > > > > > > > The check responsible for the invalidation is this: > > > > > > > > > > > > ``` > > > > > > Outdated capabilities for '%s': host cpuid changed > > > > > > ``` > > > > > > > > > > > > So the KVM_GET_SUPPORTED_CPUID call seems to return > > > > > > slightly different values in different containers. > > > > > > > > > > > > After trying out the attached golang scripts in different > > containers, I > > > > > > could indeed see differences. > > > > > > > > > > > > I can however not really judge what the differences in these KVM > > function > > > > > > registers mean and I am curious if someone else knows. The files > > are > > > > > > attached too (as json for easy diffing). > > > > > > > > > > Can you confirm whether the two attached data files were captured > > > > > by containers running on the same physical host, or could each > > > > > container have run on a different host. > > > > > > > > > > > > > They are coming from the same host, that is the most surprising bit > > for me. > > > > I am also very sure that this is the case, because I only had one k8s > > node > > > > from where I took these. > > > > The containers however differ (obviously) on namespaces and on the > > > > privilege level (less obvious). The handler dump is from a fully > > privileged > > > > container. > > > > > > The privilege level sounds like something that might be impactful, > > > so I'll investigate that. I'd be pretty surprised for namespaces > > > to have any impact thnough. > > > > The privilege level is a red herring. Peter reminded me that we have > > to filter out some parts of CPUID because the APIC IDs vary depending > > on what host CPU the task executes on. > > > > > > https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virhostcpu.c#L1346 > > > > In the 2 jSON files you provide, the differences i see should already > > be matched by > > > > /* filter out local apic id */ > > if (entry->function == 0x01 && entry->index == 0x00) > > entry->ebx &= 0x00ffffff; > > if (entry->function == 0x0b) > > entry->edx &= 0xffffff00; > > > > so those differences ought not to be causing the cache to be > > invalidated. > > > > Hm, maybe I misinterpreted the logs then. The snipped I looked at was this: > > > ``` > {"component":"virt-launcher","level":"info","msg":"/dev/kvm has changed > (1661786802 vs > 0)","pos":"virQEMUCapsKVMUsable:4850","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.936000Z"} > {"component":"virt-launcher","level":"info","msg":"a=0x7f8138153ba0, > b=0x7f818001c480","pos":"virCPUDataIsIdentical:1178","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"} > {"component":"virt-launcher","level":"info","msg":"Outdated capabilities > for '/usr/bin/qemu-system-x86_64': host cpuid > changed","pos":"virQEMUCapsIsValid:4993","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"} > {"component":"virt-launcher","level":"info","msg":"Outdated cached > capabilities > '/var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml' > for > '/usr/bin/qemu-system-x86_64'","pos":"virFileCacheLoad:163","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"} > ```Can you capture the /var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml' from the virt-handler and virt-launcher pods. It contains a <cpuid> block that will show us the differences libvirt recored, /after/, libvirt has done its filtering. This will show if there si more we need to filter.> I had the impression from the code that the `/dev/kvm` change (because the > containers are not created at the same time) does not invalidate it either. > > I added the whole debug log, maybe I missed something obvious. > > Does it make a difference if the cache is created via `virsh > domcapabilities` and `virsh capabilities` or via defining the first domain?They'll all end up at the same caching scode so should not make any difference.> > Best regards, > Roman > > > > > > With regards, > > Daniel > > -- > > |: https://berrange.com -o- > > https://www.flickr.com/photos/dberrange :| > > |: https://libvirt.org -o- > > https://fstop138.berrange.com :| > > |: https://entangle-photo.org -o- > > https://www.instagram.com/dberrange :| > > > >With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|