Hi,

On 2022-08-14 09:50:35 +0800, Xuan Zhuo wrote:
> Sorry, I didn't get any valuable information from the logs, can you tell me how
> to get such an image? Or how your [1] script is executed.

Is there specific information you'd like from the VM? I just recreated the
problem and can extract.

The last image that succeeded getting built is publicly available, so you
could create a gcp VM for that, go to /usr/src/linux, git pull, make & install
the new kernel and reproduce the problem that way. The git pull will take a
bit because it's a shallow clone...

gcloud compute instances create myvm --preemptible --project your-gcp-project --image-project pg-ci-images --image pg-ci-sid-newkernel-2022-08-12t06-52 --zone us-west1-a --custom-cpu=4 --custom-memory=4 --metadata=serial-port-enable=true

If you want to log in via serial console, you'd have to set a password before
rebooting.

gcloud compute connect-to-serial-port --zone us-west1-a --project=pg-ci-images-dev myvm

Executing the script requires a gcp key with the right to create instances and
images. Here's how to invoke it:

PACKER_LOG=1 GOOGLE_APPLICATION_CREDENTIALS=~/image-builder@pg-ci-images-dev.iam.gserviceaccount.com.json \
  packer build \
  -var gcp_project=pg-ci-images-dev \
  -var "image_date=$(date --utc +'%Y-%m-%dt%H-%M')" \
  -var "task_name=sid-newkernel" \
  -only 'linux.googlecompute.sid-newkernel' \
  -on-error=ask \
  packer/linux_debian.pkr.hcl

Of course you'd need to change the gcp_project= variable to point to the
project you have access to and GOOGLE_APPLICATION_CREDENTIALS to point to your
gcp key.

Initially (package upgrades, kernel builds) the VM would be SSH
accessible. After building the kernel it's only accessible via serial console.

I can probably also get you the image in some other form that you prefer,
although I don't know if the problem will reproduce outside gcp.
If helpful I could upload a "broken" gcp image that you could use to

>
> [1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225

Greetings,

Andres Freund
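
The in-VM steps mentioned in the message above (go to /usr/src/linux, git pull, make & install the new kernel) could be sketched roughly as follows. This is only an illustration: the helper names `run` and `rebuild_kernel`, the `DRY_RUN` switch, and the exact make targets are assumptions, not taken from the message, and the real run would need root for the install steps.

```shell
#!/bin/sh
# Hypothetical helpers; the message only says
# "git pull, make & install the new kernel".
# The make targets below are assumed, not confirmed by the author.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Update the kernel tree, build it, and install it.
rebuild_kernel() {
    src="${1:-/usr/src/linux}"
    njobs="$(nproc 2>/dev/null || echo 1)"
    run cd "$src" &&
    run git pull &&
    run make -j"$njobs" &&
    run make modules_install &&
    run make install
}
```

Calling `DRY_RUN=1 rebuild_kernel` only prints the commands, which is a cheap way to check the sequence before running it for real. As the message notes, a password has to be set before rebooting if serial console login is wanted.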
Hi,

On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> Is there specific information you'd like from the VM? I just recreated the
> problem and can extract.

Actually, after reproducing I seem to now hit a likely different issue. I
guess I should have checked exactly the revision I had a problem with earlier,
rather than doing a git pull (up to aea23e7c464b).

[ 0.727199] scsi host0: Virtio SCSI HBA
[ 0.732257] scsi 0:0:1:0: Direct-Access Google PersistentDisk 1 PQ: 0 ANSI: 6
[ 0.736259] Freeing initrd memory: 7236K
[ 0.741743] sd 0:0:1:0: Attached scsi generic sg0 type 0
[ 0.742569] sd 0:0:1:0: [sda] 52428800 512-byte logical blocks: (26.8 GB/25.0 GiB)
[ 0.742628] tun: Universal TUN/TAP device driver, 1.6
[ 0.743730] sd 0:0:1:0: [sda] 4096-byte physical blocks
[ 0.748026] sd 0:0:1:0: [sda] Write Protect is off
[ 0.750684] sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 0.795519] BUG: unable to handle page fault for address: ffffa3107bd80008
[ 0.795753] sky2: driver version 1.30
[ 0.796500] #PF: supervisor read access in kernel mode
[ 0.797252] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 0.796500] #PF: error_code(0x0000) - not-present page
[ 0.796500] PGD 100001067 P4D 100001067 PUD 0
[ 0.796500] Oops: 0000 [#1] PREEMPT SMP PTI
[ 0.796500] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.19.0-origin-14013-gaea23e7c464b #2
[ 0.798728] ehci-pci: EHCI PCI platform driver
[ 0.796500] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 0.800112] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.796500] RIP: 0010:kmem_cache_free+0x155/0x3e0
[ 0.801875] ohci-pci: OHCI PCI platform driver
[ 0.796500] Code: 02 00 00 65 48 ff 08 e8 e9 cd e6 ff 66 90 8b 45 28 48 c7 04 03 00 00 00 00 48 85 db 74 38 48 8b 45 00 65 48 03 05 fb 13 34 6d <48> 8b 50 08 4c 39 60 10 0f 85 da 01 00 00 8b 4d 28 48 8b 00 48 89
[ 0.803798] uhci_hcd: USB Universal Host Controller Interface driver
[ 0.796500] RSP: 0000:ffffa29cc0134e80 EFLAGS: 00010286
[ 0.805319] RAX: ffffa3107bd80000 RBX: ffff998840b253c0 RCX: ffff029c00000000
[ 0.805319] RDX: 0000000000000000 RSI: ffffc8f280000000 RDI: ffff998840ab2300
[ 0.805319] RBP: ffff998840ab2300 R08: fffffffffff0bddf R09: 0000000000000008
[ 0.805319] R10: ffffffff93e060c0 R11: ffffa29cc0134ff8 R12: ffffc8f28402c940
[ 0.805319] R13: ffffffff92f17edd R14: 0000000000001000 R15: 0000000000001000
[ 0.805319] FS: 0000000000000000(0000) GS:ffff99887bd80000(0000) knlGS:0000000000000000
[ 0.805319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.805319] CR2: ffffa3107bd80008 CR3: 000000002720c001 CR4: 00000000003706e0
[ 0.805319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.805319] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.805319] Call Trace:
[ 0.805319] <IRQ>
[ 0.805319] blk_update_request+0xfd/0x3d0
[ 0.805319] ? detach_buf_split+0x6a/0x150
[ 0.805319] scsi_end_request+0x22/0x1b0
[ 0.805319] scsi_io_completion+0x3c/0x750
[ 0.805319] blk_complete_reqs+0x38/0x50
[ 0.805319] __do_softirq+0xe1/0x2ed
[ 0.805319] ? handle_edge_irq+0x9a/0x230
[ 0.805319] __irq_exit_rcu+0xa6/0x100
[ 0.805319] common_interrupt+0xa5/0xc0
[ 0.805319] </IRQ>
[ 0.805319] <TASK>
[ 0.805319] asm_common_interrupt+0x22/0x40
[ 0.805319] RIP: 0010:acpi_idle_do_entry+0x46/0x60
[ 0.805319] Code: 75 08 48 8b 15 2f 1a 19 01 ed c3 cc cc cc cc 65 48 8b 04 25 00 ad 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 9c 0d 5b 00 fb f4 <fa> c3 cc cc cc cc e9 2f fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
[ 0.805319] RSP: 0000:ffffa29cc00a7e68 EFLAGS: 00000246
[ 0.805319] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000098d
[ 0.805319] RDX: ffff99887bd80000 RSI: ffff998840b2c000 RDI: ffff998840b2c064
[ 0.805319] RBP: ffff998841a2a400 R08: fffffffffff0be0e R09: 0000000157c1aaba
[ 0.805319] R10: 0000000000000018 R11: 0000000000000c27 R12: ffffffff93fc46a0
[ 0.805319] R13: ffff998840b2c064 R14: 0000000000000001 R15: 0000000000000000
[ 0.805319] acpi_idle_enter+0x9f/0x100
[ 0.805319] cpuidle_enter_state+0x84/0x400
[ 0.805319] cpuidle_enter+0x24/0x40
[ 0.805319] do_idle+0x1df/0x260
[ 0.805319] cpu_startup_entry+0x14/0x20
[ 0.805319] start_secondary+0xe8/0xf0
[ 0.805319] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.805319] </TASK>
[ 0.805319] Modules linked in:
[ 0.805319] CR2: ffffa3107bd80008
[ 0.805319] ---[ end trace 0000000000000000 ]---

Regards,

Andres
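
The `#PF: error_code(0x0000)` line in the oops above can be read bit by bit. Below is a minimal sketch, assuming the standard x86 page-fault error code layout (bit 0: protection violation vs. not-present page, bit 1: write, bit 2: user mode, bit 3: reserved bit set, bit 4: instruction fetch). The function name `decode_pf_error_code` is hypothetical and not part of any kernel tooling.

```shell
#!/bin/sh
# Hypothetical helper: decode an x86 page-fault error code into words.
# Accepts decimal or 0x-prefixed hex, e.g. the 0x0000 from the oops.
decode_pf_error_code() {
    code=$(( $1 ))

    # Bit 0: 0 = page not present, 1 = protection violation
    if [ $(( code & 1 )) -ne 0 ]; then
        cause="protection violation"
    else
        cause="not-present page"
    fi

    # Bit 1: 0 = read, 1 = write
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi

    # Bit 2: 0 = supervisor, 1 = user
    if [ $(( code & 4 )) -ne 0 ]; then mode="user"; else mode="supervisor"; fi

    # Bit 4: the fault was an instruction fetch
    if [ $(( code & 16 )) -ne 0 ]; then access="instruction fetch"; fi

    # Bit 3: a reserved bit was set in a page-table entry
    extra=""
    if [ $(( code & 8 )) -ne 0 ]; then extra=", reserved bit set"; fi

    printf '%s %s access, %s%s\n' "$mode" "$access" "$cause" "$extra"
}
```

For the value in this report, `decode_pf_error_code 0x0000` prints `supervisor read access, not-present page`, which agrees with the two `#PF:` lines in the log.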