Danilo Krummrich
2025-Apr-23 15:02 UTC
[PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote:> Hello, Danilo, > Thanks for all the feedback. Due to the volume of feedback, I will respond > incrementally in multiple emails so we can discuss as we go - hope that's Ok but > let me know if that is annoying.That's perfectly fine, whatever works best for you. :)> On 4/23/2025 10:06 AM, Danilo Krummrich wrote: > > >> +impl Vbios { > >> + /// Read bytes from the ROM at the current end of the data vector > >> + fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result { > >> + let current_len = data.len(); > >> + let start = ROM_OFFSET + current_len; > >> + > >> + // Ensure length is a multiple of 4 for 32-bit reads > >> + if len % core::mem::size_of::<u32>() != 0 { > >> + pr_err!("VBIOS read length {} is not a multiple of 4\n", len); > > > > Please don't use any of the pr_*() print macros within a driver, use the dev_*() > > ones instead. > > Ok I'll switch to this. One slight complication is I've to retrieve the 'dev' > from the Bar0 and pass that along, but that should be doable.You can also pass the pci::Device reference to VBios::probe() directly.> > > > >> + return Err(EINVAL); > >> + } > >> + > >> + // Allocate and zero-initialize the required memory > > > > That's obvious from the code, if you feel this needs a comment, better explain > > what we need it for, why zero-initialize, etc. > > Sure, actually the extends_with() is a performance optimization as we want to do > only a single allocation and then fill in the allocated data. It has nothing to > do with 0-initializing per-se. I will adjust the comment, but: > > This code... > > >> + data.extend_with(len, 0, GFP_KERNEL)?; > >> + with_bar!(?bar0, |bar0_ref| { > >> + let dst = &mut data[current_len..current_len + len]; > >> + for (idx, chunk) in dst > >> + .chunks_exact_mut(core::mem::size_of::<u32>()) > >> + .enumerate() > >> + { > >> + let addr = start + (idx * core::mem::size_of::<u32>()); > >> + // Convert the u32 to a 4 byte array. We use the .to_ne_bytes() > >> + // method out of convenience to convert the 32-bit integer as it > >> + // is in memory into a byte array without any endianness > >> + // conversion or byte-swapping. > >> + chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes()); > >> + } > >> + Ok(()) > >> + })?; > >> + > >> + Ok(()) > >> + } > ..actually initially was: > > + with_bar!(self.bar0, |bar0| { > + // Get current length > + let current_len = self.data.len(); > + > + // Read ROM data bytes push directly to vector > + for i in 0..bytes as usize { > + // Read byte from the VBIOS ROM and push it to the data vector > + let rom_addr = ROM_OFFSET + current_len + i; > + let byte = bar0.try_readb(rom_addr)?; > + self.data.push(byte, GFP_KERNEL)?; > > Where this bit could result in a lot of allocation. > > There was an unsafe() way of not having to do this but we settled with > extends_with(). > > Thoughts?If I understand you correctly, you just want to make sure that subsequent push() calls don't re-allocate? If so, you can just use reserve() [1] and keep the subsequent push() calls. [1] https://rust.docs.kernel.org/kernel/alloc/kvec/struct.Vec.html#method.reserve
Joel Fernandes
2025-Apr-24 19:19 UTC
[PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote: [..]> > >> + data.extend_with(len, 0, GFP_KERNEL)?; > > >> + with_bar!(?bar0, |bar0_ref| { > > >> + let dst = &mut data[current_len..current_len + len]; > > >> + for (idx, chunk) in dst > > >> + .chunks_exact_mut(core::mem::size_of::<u32>()) > > >> + .enumerate() > > >> + { > > >> + let addr = start + (idx * core::mem::size_of::<u32>()); > > >> + // Convert the u32 to a 4 byte array. We use the .to_ne_bytes() > > >> + // method out of convenience to convert the 32-bit integer as it > > >> + // is in memory into a byte array without any endianness > > >> + // conversion or byte-swapping. > > >> + chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes()); > > >> + } > > >> + Ok(()) > > >> + })?; > > >> + > > >> + Ok(()) > > >> + } > > ..actually initially was: > > > > + with_bar!(self.bar0, |bar0| { > > + // Get current length > > + let current_len = self.data.len(); > > + > > + // Read ROM data bytes push directly to vector > > + for i in 0..bytes as usize { > > + // Read byte from the VBIOS ROM and push it to the data vector > > + let rom_addr = ROM_OFFSET + current_len + i; > > + let byte = bar0.try_readb(rom_addr)?; > > + self.data.push(byte, GFP_KERNEL)?; > > > > Where this bit could result in a lot of allocation. > > > > There was an unsafe() way of not having to do this but we settled with > > extends_with(). > > > > Thoughts? > > If I understand you correctly, you just want to make sure that subsequent push() > calls don't re-allocate? If so, you can just use reserve() [1] and keep the > subsequent push() calls. > > [1] https://rust.docs.kernel.org/kernel/alloc/kvec/struct.Vec.html#method.reserveOk that does turn out to be cleaner! I replaced it with the following and it works. Let me know if it looks good now? Here's a preview: - data.extend_with(len, 0, GFP_KERNEL)?; + data.reserve(len, GFP_KERNEL)?; + with_bar_res!(bar0, |bar0_ref| { - let dst = &mut data[current_len..current_len + len]; - for (idx, chunk) in dst - .chunks_exact_mut(core::mem::size_of::<u32>()) - .enumerate() - { - let addr = start + (idx * core::mem::size_of::<u32>()); - // Convert the u32 to a 4 byte array. We use the .to_ne_bytes() - // method out of convenience to convert the 32-bit integer as it - // is in memory into a byte array without any endianness - // conversion or byte-swapping. - chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes()); + // Read ROM data bytes and push directly to vector + for i in 0..len { + // Read 32-bit word from the VBIOS ROM + let rom_addr = start + i * core::mem::size_of::<u32>(); + let word = bar0_ref.try_read32(rom_addr)?; + + // Convert the u32 to a 4 byte array and push each byte + word.to_ne_bytes().iter().try_for_each(|&b| data.push(b, GFP_KERNEL))?; } Thanks.
Joel Fernandes
2025-Apr-24 19:54 UTC
[PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote:> On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote: > > Hello, Danilo, > > Thanks for all the feedback. Due to the volume of feedback, I will respond > > incrementally in multiple emails so we can discuss as we go - hope that's Ok but > > let me know if that is annoying. > > That's perfectly fine, whatever works best for you. :) > > > On 4/23/2025 10:06 AM, Danilo Krummrich wrote: > > > > >> +impl Vbios { > > >> + /// Read bytes from the ROM at the current end of the data vector > > >> + fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result { > > >> + let current_len = data.len(); > > >> + let start = ROM_OFFSET + current_len; > > >> + > > >> + // Ensure length is a multiple of 4 for 32-bit reads > > >> + if len % core::mem::size_of::<u32>() != 0 { > > >> + pr_err!("VBIOS read length {} is not a multiple of 4\n", len); > > > > > > Please don't use any of the pr_*() print macros within a driver, use the dev_*() > > > ones instead. > > > > Ok I'll switch to this. One slight complication is I've to retrieve the 'dev' > > from the Bar0 and pass that along, but that should be doable. > > You can also pass the pci::Device reference to VBios::probe() directly.This turns out to be rather difficult to do in the whole vbios.rs because we'd have to them propogate pdev to various class methods which may print errors (some of which don't make sense to pass pdev to, like try_from()). But I can do it in probe() (or new() as we call it now). See below for preview diff doing this for many prints where possible, does this work for you? Preview diff (give or take rustfmt): ---8<----------------------- diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs index 43cf34a078ae..808e8446ac79 100644 --- a/drivers/gpu/nova-core/firmware/gsp.rs +++ b/drivers/gpu/nova-core/firmware/gsp.rs @@ -236,7 +236,7 @@ pub(crate) fn new( falcon: &Falcon<Gsp>, pdev: &Device, bar: &Devres<Bar0>, - bios: &Vbios, + bios: &Vbios<'_>, cmd: FwsecCommand, ) -> Result<Self> { let v3_desc = bios.fwsec_header()?; diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs index 0069d6ec8751..aa301e2a7111 100644 --- a/drivers/gpu/nova-core/gpu.rs +++ b/drivers/gpu/nova-core/gpu.rs @@ -259,7 +259,7 @@ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit< let fb_layout = FbLayout::new(spec.chipset, &bar, &fw.bootloader)?; pr_info!("{:#x?}\n", fb_layout); - let bios = Vbios::new(&bar)?; + let bios = Vbios::new(pdev, &bar)?; // TODO: should we write 0x0 back when we drop this object? let sysmem_flush = DmaObject::new(pdev.as_ref(), 0x1000, "sysmem flush page")?; diff --git a/drivers/gpu/nova-core/vbios.rs b/drivers/gpu/nova-core/vbios.rs index f0c43a5143d0..c5a8333f00b2 100644 --- a/drivers/gpu/nova-core/vbios.rs +++ b/drivers/gpu/nova-core/vbios.rs @@ -8,6 +8,7 @@ use kernel::devres::Devres; use kernel::error::Result; use kernel::prelude::*; +use kernel::pci; /// The offset of the VBIOS ROM in the BAR0 space. const ROM_OFFSET: usize = 0x300000; @@ -24,7 +25,8 @@ const FALCON_UCODE_ENTRY_APPID_FWSEC_DBG: u8 = 0x45; const FALCON_UCODE_ENTRY_APPID_FWSEC_PROD: u8 = 0x85; -pub(crate) struct Vbios { +pub(crate) struct Vbios<'a> { + pdev: &'a pci::Device, pub fwsec_image: Option<FwSecBiosImage>, // VBIOS data vector: As BIOS images are scanned, they are added to this vector // for reference or copying into other data structures. It is the entire @@ -35,7 +37,7 @@ pub(crate) struct Vbios { data: Option<KVec<u8>>, } -impl Vbios { +impl Vbios<'_> { /// Read bytes from the ROM at the current end of the data vector fn read_more(&mut self, bar0: &Devres<Bar0>, len: usize) -> Result { let data = self.data.as_mut().ok_or(EINVAL)?; @@ -44,7 +46,7 @@ fn read_more(&mut self, bar0: &Devres<Bar0>, len: usize) -> Result { // Ensure length is a multiple of 4 for 32-bit reads if len % core::mem::size_of::<u32>() != 0 { - pr_err!("VBIOS read length {} is not a multiple of 4\n", len); + dev_err!(self.pdev.as_ref(), "VBIOS read length {} is not a multiple of 4\n", len); return Err(EINVAL); } @@ -73,7 +75,7 @@ fn read_more_at_offset( len: usize, ) -> Result { if offset > BIOS_MAX_SCAN_LEN { - pr_err!("Error: exceeded BIOS scan limit.\n"); + dev_err!(self.pdev.as_ref(), "Error: exceeded BIOS scan limit.\n"); return Err(EINVAL); } @@ -101,13 +103,14 @@ fn read_bios_image_at_offset( let data_len = self.data.as_ref().ok_or(EINVAL)?.len(); if offset + len > data_len { self.read_more_at_offset(bar0, offset, len).inspect_err(|e| { - pr_err!("Failed to read more at offset {:#x}: {:?}\n", offset, e) + dev_err!(self.pdev.as_ref(), "Failed to read more at offset {:#x}: {:?}\n", offset, e) })?; } let data = self.data.as_ref().ok_or(EINVAL)?; BiosImage::try_from(&data[offset..offset + len]).inspect_err(|e| { - pr_err!( + dev_err!( + self.pdev.as_ref(), "Failed to create BiosImage at offset {:#x}: {:?}\n", offset, e @@ -117,8 +120,9 @@ fn read_bios_image_at_offset( /// Probe for VBIOS extraction /// Once the VBIOS object is built, bar0 is not read for vbios purposes anymore. - pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> { + pub(crate) fn new(pdev: &pci::Device, bar0: &Devres<Bar0>) -> Result<Self> { let mut vbios = Self { + pdev, fwsec_image: None, data: Some(KVec::new()), }; @@ -137,7 +141,8 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> { vbios.read_bios_image_at_offset(bar0, cur_offset, BIOS_READ_AHEAD_SIZE) .and_then(|image| image.image_size_bytes()) .inspect_err(|e| { - pr_err!( + dev_err!( + vbios.pdev.as_ref(), "Failed to parse initial BIOS image headers at offset {:#x}: {:?}\n", cur_offset, e @@ -148,7 +153,8 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> { let full_image vbios.read_bios_image_at_offset(bar0, cur_offset, image_size) .inspect_err(|e| { - pr_err!( + dev_err!( + vbios.pdev.as_ref(), "Failed to parse full BIOS image at offset {:#x}: {:?}\n", cur_offset, e @@ -158,7 +164,8 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> { // Determine the image type let image_type = full_image.image_type_str(); - pr_info!( + dev_info!( + vbios.pdev.as_ref(), "Found BIOS image at offset {:#x}, size: {:#x}, type: {}\n", cur_offset, image_size, @@ -195,7 +202,7 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> { // Safety check - don't go beyond BIOS_MAX_SCAN_LEN (1MB) if cur_offset > BIOS_MAX_SCAN_LEN { - pr_err!("Error: exceeded BIOS scan limit, stopping scan\n"); + dev_err!(vbios.pdev.as_ref(), "Error: exceeded BIOS scan limit, stopping scan\n"); break; } } // end of loop @@ -212,9 +219,9 @@ pub(crate) fn new(bar0: &Devres<Bar0>) -> Result<Self> { { second .setup_falcon_data(pci_at, first) - .inspect_err(|e| pr_err!("Falcon data setup failed: {:?}\n", e))?; + .inspect_err(|e| dev_err!(vbios.pdev.as_ref(), "Falcon data setup failed: {:?}\n", e))?; } else { - pr_err!("Missing required images for falcon data setup, skipping\n"); + dev_err!(vbios.pdev.as_ref(), "Missing required images for falcon data setup, skipping\n"); } second // Return the potentially modified second image };
Danilo Krummrich
2025-Apr-24 20:01 UTC
[PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
On Thu, Apr 24, 2025 at 03:19:00PM -0400, Joel Fernandes wrote:> On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote: > > [..] > > > > >> + data.extend_with(len, 0, GFP_KERNEL)?; > > > >> + with_bar!(?bar0, |bar0_ref| { > > > >> + let dst = &mut data[current_len..current_len + len]; > > > >> + for (idx, chunk) in dst > > > >> + .chunks_exact_mut(core::mem::size_of::<u32>()) > > > >> + .enumerate() > > > >> + { > > > >> + let addr = start + (idx * core::mem::size_of::<u32>()); > > > >> + // Convert the u32 to a 4 byte array. We use the .to_ne_bytes() > > > >> + // method out of convenience to convert the 32-bit integer as it > > > >> + // is in memory into a byte array without any endianness > > > >> + // conversion or byte-swapping. > > > >> + chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes()); > > > >> + } > > > >> + Ok(()) > > > >> + })?; > > > >> + > > > >> + Ok(()) > > > >> + } > > > ..actually initially was: > > > > > > + with_bar!(self.bar0, |bar0| { > > > + // Get current length > > > + let current_len = self.data.len(); > > > + > > > + // Read ROM data bytes push directly to vector > > > + for i in 0..bytes as usize { > > > + // Read byte from the VBIOS ROM and push it to the data vector > > > + let rom_addr = ROM_OFFSET + current_len + i; > > > + let byte = bar0.try_readb(rom_addr)?; > > > + self.data.push(byte, GFP_KERNEL)?; > > > > > > Where this bit could result in a lot of allocation. > > > > > > There was an unsafe() way of not having to do this but we settled with > > > extends_with(). > > > > > > Thoughts? > > > > If I understand you correctly, you just want to make sure that subsequent push() > > calls don't re-allocate? If so, you can just use reserve() [1] and keep the > > subsequent push() calls. > > > > [1] https://rust.docs.kernel.org/kernel/alloc/kvec/struct.Vec.html#method.reserve > > > > Ok that does turn out to be cleaner! I replaced it with the following and it works. > > Let me know if it looks good now? Here's a preview: > > - data.extend_with(len, 0, GFP_KERNEL)?; > + data.reserve(len, GFP_KERNEL)?; > + > with_bar_res!(bar0, |bar0_ref| { > - let dst = &mut data[current_len..current_len + len]; > - for (idx, chunk) in dst > - .chunks_exact_mut(core::mem::size_of::<u32>()) > - .enumerate() > - { > - let addr = start + (idx * core::mem::size_of::<u32>()); > - // Convert the u32 to a 4 byte array. We use the .to_ne_bytes() > - // method out of convenience to convert the 32-bit integer as it > - // is in memory into a byte array without any endianness > - // conversion or byte-swapping. > - chunk.copy_from_slice(&bar0_ref.try_read32(addr)?.to_ne_bytes()); > + // Read ROM data bytes and push directly to vector > + for i in 0..len { > + // Read 32-bit word from the VBIOS ROM > + let rom_addr = start + i * core::mem::size_of::<u32>(); > + let word = bar0_ref.try_read32(rom_addr)?; > + > + // Convert the u32 to a 4 byte array and push each byte > + word.to_ne_bytes().iter().try_for_each(|&b| data.push(b, GFP_KERNEL))?; > }Looks good to me, thanks!
Danilo Krummrich
2025-Apr-24 20:17 UTC
[PATCH 13/16] gpu: nova-core: Add support for VBIOS ucode extraction for boot
On Thu, Apr 24, 2025 at 03:54:48PM -0400, Joel Fernandes wrote:> On Wed, Apr 23, 2025 at 05:02:58PM +0200, Danilo Krummrich wrote: > > On Wed, Apr 23, 2025 at 10:52:42AM -0400, Joel Fernandes wrote: > > > Hello, Danilo, > > > Thanks for all the feedback. Due to the volume of feedback, I will respond > > > incrementally in multiple emails so we can discuss as we go - hope that's Ok but > > > let me know if that is annoying. > > > > That's perfectly fine, whatever works best for you. :) > > > > > On 4/23/2025 10:06 AM, Danilo Krummrich wrote: > > > > > > >> +impl Vbios { > > > >> + /// Read bytes from the ROM at the current end of the data vector > > > >> + fn read_more(bar0: &Devres<Bar0>, data: &mut KVec<u8>, len: usize) -> Result { > > > >> + let current_len = data.len(); > > > >> + let start = ROM_OFFSET + current_len; > > > >> + > > > >> + // Ensure length is a multiple of 4 for 32-bit reads > > > >> + if len % core::mem::size_of::<u32>() != 0 { > > > >> + pr_err!("VBIOS read length {} is not a multiple of 4\n", len); > > > > > > > > Please don't use any of the pr_*() print macros within a driver, use the dev_*() > > > > ones instead. > > > > > > Ok I'll switch to this. One slight complication is I've to retrieve the 'dev' > > > from the Bar0 and pass that along, but that should be doable. > > > > You can also pass the pci::Device reference to VBios::probe() directly. > > > This turns out to be rather difficult to do in the whole vbios.rs because > we'd have to them propogate pdev to various class methods which may print > errorsNote that you can always create an ARef<pci::Device> instance from a &pci::Device, which you can store temporarily if needed. Though I don't think it's needed here.> (some of which don't make sense to pass pdev to, like try_from()).Yeah, it's indeed difficult with a TryFrom or From impl. I guess you're referring to things like impl TryFrom<&[u8]> for PcirStruct and I actually think that's a bit of an abuse of the TryFrom trait. A &[u8] isn't really something that is "natural" to convert to a PcirStruct. Instead you should just move this code into a normal constructor, i.e. PcirStruct::new(). Here you can then also pass a device reference to print errors. We should really stick to dev_*() print macros from within driver code.