Guido Falsi
2017-Jun-22 17:06 UTC
vnode_pager_generic_getpages_done: I/O read error 5 caused by r318394 (was Re: FreeBSD 11.1-BETA1 Now Available)
On 06/22/17 18:38, Warner Losh wrote:
> On Thu, Jun 22, 2017 at 2:26 AM, Guido Falsi <madpilot at freebsd.org> wrote:
> > On 06/21/17 16:59, Guido Falsi wrote:
> > > On 06/13/17 13:44, Peter Blok wrote:
> > > > Hi,
> > > >
> > > > For a while now, I'm not able to build a RPI1-B image from -stable.
> > > > I have narrowed it down to fix 318394, which adds a refresh option
> > > > to geom_label. If I undo this fix in today's stable it works ok. If
> > > > I don't I'm getting continuously:
> > > >
> > > > vm_fault: pager read error, pid 1 (init)
> > > > vnode_pager_generic_getpages_done: I/O read error 5
> > > >
> > > > I have looked at the fix and I can't figure out why it breaks the
> > > > code.
> > > >
> > > > And yes I have tried various other SD cards - they all have the
> > > > same issue.
> > >
> > > Hi,
> > >
> > > I'm seeing similar symptoms with NanoBSD images on PCEngines ALIX and
> > > APU2 boards, using compactflash and SD card storage respectively. The
> > > problem has appeared as soon as I started testing 11.1-BETA1 from the
> > > stable branch.
> > >
> > > Problem appears when I update the image, using a slightly modified
> > > version of the standard nanobsd update and updatep[12] scripts. My
> > > changes are not in the dd/gpart commands though, which are the same.
> > > gpart seems the most likely candidate though.
> > >
> > > I have just discovered this thread and I will test reverting r318394
> > > soon. Thanks to Peter for narrowing it down!
> > >
> > > Maybe this is related to having the disks mounted read-only?
> >
> > I noticed that after the problem appears many commands, including
> > shutdown, start failing with "device not configured" for all mounted
> > FSes. I'm even unable to "ls /dev".
> >
> > Looks like the geom refresh changes devices from below the system in a
> > way which triggers this reaction.
> >
> > I don't know the geom code and have been unable to find an immediate
> > problem in the commit mentioned above. I'd really like some help to
> > know where to look, or what kind of debugging information is needed.
> >
> > This is quite a bad bug for people running NanoBSD and should be fixed
> > before the release.
>
> So can I recreate this with the embedded-type NanoBSD image? If this
> change breaks NanoBSD, it will need to be reverted...

You should be able to reproduce it with a nanobsd image, then updating it using the standard script, which dumps the new image in the "other" partition and uses gpart to configure the new partition as bootable.

I'm using a slightly modified update script which also mounts the new partition in /mnt and performs some operations there. Then it unmounts the partition and launches the "gpart set -a active -i ${_to} ${NANO_DRIVE}" command (which I suspect is exactly where the actual problem is happening).

I also tested reverting the change and can confirm that it makes the problem go away.

I'm sure it can be triggered by other gpart operations. I'm trying to understand exactly which operations.

I'll follow up as soon as I have an easier use case to reproduce it. I first need to revert to an image affected by the problem.

Thanks for your feedback!

--
Guido Falsi <madpilot at FreeBSD.org>
Guido Falsi
2017-Jun-22 20:02 UTC
vnode_pager_generic_getpages_done: I/O read error 5 caused by r318394 (was Re: FreeBSD 11.1-BETA1 Now Available)
On 06/22/17 19:06, Guido Falsi wrote:
> On 06/22/17 18:38, Warner Losh wrote:

> I'll followup as soon as I have easier use case to reproduce it. I first
> need to revert to an image affected by the problem.

I have made a few more tests. I am able to trigger this bug easily by running gpart. I'm testing on a PCEngines APU2 board with an SD memory card.

# gpart set -a active -i 1 mmcsd0
active set on mmcsd0s1
# fsck_ffs -n /dev/mmcsd0s1a
** /dev/mmcsd0s1a (NO WRITE)
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
Segmentation fault
# shutdown -r now
/sbin/shutdown: Device not configured

Also, if I open another shell, many operations fail there that do not fail in the previous root shell:

> tail /var/log/messages
/usr/bin/tail: Device not configured.

BTW, while testing this multiple times I also had the root shell segfault while browsing history, so it should be quite easy to reproduce on your side too.

Running the gpart set command triggers it every time, with slightly different but always disruptive symptoms.

There is a chance it only shows with these embedded systems' storage controllers, though.

--
Guido Falsi <madpilot at FreeBSD.org>
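
For anyone trying to reproduce this, the sequence above could be wrapped in a small script along the following lines. This is only a sketch: the RUN, DISK and INDEX variables and the run/repro helpers are made-up names for illustration, and the /dev/${DISK}s${INDEX}a path simply assumes the same slice layout as the mmcsd0s1a device in the report. By default it only prints the commands; set RUN=1 on a disposable test box to actually execute them.

```shell
#!/bin/sh
# Reproduction sketch for the gpart-triggered failure described above.
# Dry-run by default: commands are printed, not executed.
# RUN, DISK and INDEX are hypothetical knobs, not part of any FreeBSD tool.
DISK="${DISK:-mmcsd0}"   # device name taken from the report
INDEX="${INDEX:-1}"      # slice index taken from the report

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"             # really execute (test systems only)
    else
        echo "+ $*"      # dry-run: show what would be executed
    fi
}

repro() {
    # Step 1: the command reported to trigger the problem.
    run gpart set -a active -i "$INDEX" "$DISK"
    # Step 2: read-only filesystem check (assumes an 'a' partition in the slice).
    run fsck_ffs -n "/dev/${DISK}s${INDEX}a"
    # Step 3: an ordinary read that failed with "Device not configured".
    run tail /var/log/messages
}

repro
```

No "set -e" is used on purpose, so that later steps still run when an earlier one fails, as happened in the session above.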