Daniel P. Berrangé <berrange@redhat.com> writes:> On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote: >> Hi, >> > >> I've met two situations with NVDIMM support in libvirt where I'm not >> sure all the parties (libvirt & I) do the things correctly. >> >> The first problem is with memory alignment and size changes. In >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also >> makes some NVDIMM size changes for better alignments, in >> qemuDomainMemoryDeviceAlignSize. This can lead to the size being >> rounded up, exceeding the size of the backing device and QEMU failing to >> start the VM for that reason (I've experienced that actually). I work >> with emulated NVDIMM devices, not a bare metal hardware, so one might >> argue that in practice the device sizes should already be aligned, but >> I'm not sure it must be always the case considering labels or whatever >> else the user decides to set up. And I still don't feel very >> comfortable that I have to count with two internal size adjustments >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal >> of getting the VM started and having the NVDIMM aligned properly to make >> (non-NVDIMM) memory hot plug working. Is the size alignment performed >> by libvirt, especially rounding up, completely correct for NVDIMMs? > > The comment on the function says QEMU aligns to "page size", which > is something that can vary depending not only on architecture, and > also the build config options for the kernel on that architecture. > eg aarch64 has different page size in RHEL than other distros because > of different choice of page size in kernel config. > > Libvirt rounds up to 1 MB, essentially so that the size works no matter > what architecture or build options were used. I think this is quite > compelling as I don't think mgmt apps are likely to care enough about > non-x86 architectures to pick the right rounded sizes. > > If we're enforcing this 1 MB rounding though, we really should be > documenting it clearly, so that apps can pick the right backing file > size. I think we dropped the ball on docs.I still can't see it in the documentation, would it be possible to be clear about it in the docs, please? For first, it's not very intuitive to figure out that (if I've figured out it correctly) on POWER one *must* specify the NVDIMM size S as S == aligned_size + label_size and that size is used for the QEMU device; while on x86_64 one can specify any size S and align_up(S) will be used for the QEMU device (and label size doesn't influence the value). And additional alignment may be required for having any memory hot plug working. For second, and more importantly, I'm afraid that without documenting it, future changes may break the current behavior without warning. For example, the recent changes regarding POWER alignment in 6.7.0 are for good IMHO and one can use the same size with both 6.7 and 6.6 versions, but they could still cause pre-6.7 sizes stop working. Thanks, Milan
On Thu, Sep 10, 2020 at 04:26:40PM +0200, Milan Zamazal wrote:> Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote: > >> Hi, > >> > > > >> I've met two situations with NVDIMM support in libvirt where I'm not > >> sure all the parties (libvirt & I) do the things correctly. > >> > >> The first problem is with memory alignment and size changes. In > >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also > >> makes some NVDIMM size changes for better alignments, in > >> qemuDomainMemoryDeviceAlignSize. This can lead to the size being > >> rounded up, exceeding the size of the backing device and QEMU failing to > >> start the VM for that reason (I've experienced that actually). I work > >> with emulated NVDIMM devices, not a bare metal hardware, so one might > >> argue that in practice the device sizes should already be aligned, but > >> I'm not sure it must be always the case considering labels or whatever > >> else the user decides to set up. And I still don't feel very > >> comfortable that I have to count with two internal size adjustments > >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal > >> of getting the VM started and having the NVDIMM aligned properly to make > >> (non-NVDIMM) memory hot plug working. Is the size alignment performed > >> by libvirt, especially rounding up, completely correct for NVDIMMs? > > > > The comment on the function says QEMU aligns to "page size", which > > is something that can vary depending not only on architecture, and > > also the build config options for the kernel on that architecture. > > eg aarch64 has different page size in RHEL than other distros because > > of different choice of page size in kernel config. > > > > Libvirt rounds up to 1 MB, essentially so that the size works no matter > > what architecture or build options were used. I think this is quite > > compelling as I don't think mgmt apps are likely to care enough about > > non-x86 architectures to pick the right rounded sizes. > > > > If we're enforcing this 1 MB rounding though, we really should be > > documenting it clearly, so that apps can pick the right backing file > > size. I think we dropped the ball on docs. > > I still can't see it in the documentation, would it be possible to be > clear about it in the docs, please? For first, it's not very intuitive > to figure out that (if I've figured out it correctly) on POWER one > *must* specify the NVDIMM size S as > > S == aligned_size + label_size > > and that size is used for the QEMU device; while on x86_64 one can > specify any size S and > > align_up(S) > > will be used for the QEMU device (and label size doesn't influence the > value). And additional alignment may be required for having any memory > hot plug working. > > For second, and more importantly, I'm afraid that without documenting > it, future changes may break the current behavior without warning. For > example, the recent changes regarding POWER alignment in 6.7.0 are for > good IMHO and one can use the same size with both 6.7 and 6.6 versions, > but they could still cause pre-6.7 sizes stop working.I don't know what changes you are referring to here, but if they were in libvirt I'd consider that a bug - we shouldn't break a previously working configuration by increasing required alignment. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes:> On Thu, Sep 10, 2020 at 04:26:40PM +0200, Milan Zamazal wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> > >> > On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote: >> >> Hi, >> >> >> > >> >> I've met two situations with NVDIMM support in libvirt where I'm not >> >> sure all the parties (libvirt & I) do the things correctly. >> >> >> >> The first problem is with memory alignment and size changes. In >> >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also >> >> makes some NVDIMM size changes for better alignments, in >> >> qemuDomainMemoryDeviceAlignSize. This can lead to the size being >> >> rounded up, exceeding the size of the backing device and QEMU failing to >> >> start the VM for that reason (I've experienced that actually). I work >> >> with emulated NVDIMM devices, not a bare metal hardware, so one might >> >> argue that in practice the device sizes should already be aligned, but >> >> I'm not sure it must be always the case considering labels or whatever >> >> else the user decides to set up. And I still don't feel very >> >> comfortable that I have to count with two internal size adjustments >> >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal >> >> of getting the VM started and having the NVDIMM aligned properly to make >> >> (non-NVDIMM) memory hot plug working. Is the size alignment performed >> >> by libvirt, especially rounding up, completely correct for NVDIMMs? >> > >> > The comment on the function says QEMU aligns to "page size", which >> > is something that can vary depending not only on architecture, and >> > also the build config options for the kernel on that architecture. >> > eg aarch64 has different page size in RHEL than other distros because >> > of different choice of page size in kernel config. >> > >> > Libvirt rounds up to 1 MB, essentially so that the size works no matter >> > what architecture or build options were used. I think this is quite >> > compelling as I don't think mgmt apps are likely to care enough about >> > non-x86 architectures to pick the right rounded sizes. >> > >> > If we're enforcing this 1 MB rounding though, we really should be >> > documenting it clearly, so that apps can pick the right backing file >> > size. I think we dropped the ball on docs. >> >> I still can't see it in the documentation, would it be possible to be >> clear about it in the docs, please? For first, it's not very intuitive >> to figure out that (if I've figured out it correctly) on POWER one >> *must* specify the NVDIMM size S as >> >> S == aligned_size + label_size >> >> and that size is used for the QEMU device; while on x86_64 one can >> specify any size S and >> >> align_up(S) >> >> will be used for the QEMU device (and label size doesn't influence the >> value). And additional alignment may be required for having any memory >> hot plug working. >> >> For second, and more importantly, I'm afraid that without documenting >> it, future changes may break the current behavior without warning. For >> example, the recent changes regarding POWER alignment in 6.7.0 are for >> good IMHO and one can use the same size with both 6.7 and 6.6 versions, >> but they could still cause pre-6.7 sizes stop working. > > I don't know what changes you are referring to here, but if they were > in libvirt I'd consider that a bug - we shouldn't break a previously > working configuration by increasing required alignment.I mean disabling the auto alignment in https://gitlab.com/libvirt/libvirt/-/commit/07de813924caf37e535855541c0c1183d9d382e2 and replacing it with validation in https://gitlab.com/libvirt/libvirt/-/commit/0ccceaa57c50e5ee528f7073fa8723afd62b88b7 That change can cause a VM fail to start but after (manually) adjusting the device size, all should work all right. Changes that would actually change sizes would be more dangerous. Regards, Milan