On Sat, Mar 13, 2021 at 9:43 PM Kevin Oberman <rkoberman at gmail.com> wrote:

> No improvement with stable/13-n244880-cec3990d347. It may be worse. An
> attempt to unpack firefox-86.0.1,2 saw disk rates in the range of 800K to
> 2 MB/s, with repeated 30 second freezes. I have no idea what made it so
> much worse, but I'm forced to start wondering if it could be a hardware
> issue. The disk drive was already replaced once due to a bad bearing. Went
> from a WD Black to a Seagate. Since it just keeps getting worse, I must
> consider that possibility. It is odd, though, that it was suddenly worse
> with the updated system.
>
> I think I will try going back to n244765-a00bf7d9bba (March 4) and see if
> it improves. If it does, I can likely eliminate bad hardware.
> --
> Kevin Oberman, Part time kid herder and retired Network Engineer
> E-mail: rkoberman at gmail.com
> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
>
>
> On Sat, Mar 13, 2021 at 6:00 PM Warner Losh <imp at bsdimp.com> wrote:
>
>> On Sat, Mar 13, 2021 at 6:37 PM Kevin Oberman <rkoberman at gmail.com>
>> wrote:
>>
>>> I have been dealing with this for a long time, since head back in
>>> September, through 13-stable of Mar-4. I have seen no improvement over
>>> this time. It seems (my perception without supporting data) that it got
>>> worse in the timeframe of the BETA-3 tag. I was running stable, so not
>>> quite BETA-3. It also does not help that I have also been bitten by the
>>> P-State related freeze issue, which has some similarities. Disabling
>>> P-states has almost eliminated that issue, though, with only three
>>> occurrences since I disabled them in late January.
>>>
>>> As a result, I don't think it is a recent change, but a problem that
>>> has existed for at least 3 months. This was made worse by two hardware
>>> issues that kept the system unavailable for most of the time between
>>> buying it last spring and getting the keyboard replaced in January.
>>> (Both the mainboard and the disk drive had already been replaced.)
>>> There was another slow I/O issue that I had assumed was the same as
>>> mine, but was reportedly fixed with BETA-4. A few are still seeing slow
>>> I/O, so I assume that there were different issues with I/O. Since
>>> CometLake systems seem pretty uncommon, it might be related to that.
>>>
>>
>> It was a change from last fall, or set of changes. RC1 or definitely RC2
>> has fixes to regain performance lost. If BETA4 was the last one you
>> evaluated, perhaps you could do a couple tests with RC2 now that it's out
>> to see if it is the same thing?
>>
>> Warner
>>
>>
>>> Kevin Oberman, Part time kid herder and retired Network Engineer
>>> E-mail: rkoberman at gmail.com
>>> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
>>>
>>>
>>> On Sat, Mar 13, 2021 at 4:36 PM Warner Losh <imp at bsdimp.com> wrote:
>>>
>>>> On Sat, Mar 13, 2021 at 5:33 PM Kevin Oberman <rkoberman at gmail.com>
>>>> wrote:
>>>>
>>>>> Just spent a little time looking at my issue and have a few more
>>>>> notes:
>>>>>
>>>>
>>>> What version did you evaluate? There's a number of changes lately that
>>>> could have a big impact on this...
>>>>
>>>> Warner
>>>>
>>>>
>>>>> Seems to only occur on large r/w operations from/to the same disk.
>>>>> "cp big-file /other/file/on/same/disk" or tar/untar operations on
>>>>> large files. Hit this today updating firefox.
>>>>>
>>>>> I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing
>>>>> other things while it was running slowly, the disk would appear to
>>>>> lock up. E.g. pwd(1) seemed to completely lock up the system, but I
>>>>> could still ping it and, after about 30 seconds, things came back to
>>>>> life. It was also not instantaneous. Disc activity dropped to <1MB/s
>>>>> for a few seconds before everything froze.
>>>>>
>>>>> During the untar of firefox, I saw this several times. I also looked
>>>>> at my console, where I found these errors:
>>>>> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192
>>>>> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096
>>>>>
>>>>> I should note that some operations continue just fine while this is
>>>>> going on until I do something that freezes the system. I assume that
>>>>> this eliminates the disk drive and low-level driver. Is vfs a
>>>>> possible issue? It had some serious work in the past few months by
>>>>> markj. That does not explain why more people are not seeing this.
>>>>>
>>>>> I have been seeing this since at least September 2020, so it goes
>>>>> back a way. As this CometLake system will not run graphics on 12, I
>>>>> can't confirm operation before 13.
>>>>> --
>>>>> Kevin Oberman, Part time kid herder and retired Network Engineer
>>>>> E-mail: rkoberman at gmail.com
>>>>> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
>>>>>
>>>>>
>>>>> On Fri, Mar 5, 2021 at 10:47 PM Mark Millard via freebsd-stable <
>>>>> freebsd-stable at freebsd.org> wrote:
>>>>>
>>>>> > Konstantin Belousov kostikbel at gmail.com wrote on
>>>>> > Fri Mar 5 23:12:13 UTC 2021 :
>>>>> >
>>>>> > > On Sat, Mar 06, 2021 at 12:27:55AM +0200, Christos Chatzaras wrote:
>>>>> > . . .
>>>>> > > > Command: /usr/bin/time -l portsnap extract (these tests done
>>>>> > > > with 2 different idle servers but with same 4TB HDDs models)
>>>>> > > >
>>>>> > > > FreeBSD 12.2p4
>>>>> > > >
>>>>> > > >        99.45 real        34.90 user        59.63 sys
>>>>> > > >       100.00 real        34.91 user        59.97 sys
>>>>> > > >        82.95 real        35.98 user        60.68 sys
>>>>> > > >
>>>>> > > > FreeBSD 13.0-RC1
>>>>> > > >
>>>>> > > >       217.43 real        75.67 user       110.97 sys
>>>>> > > >       125.50 real        63.00 user        96.47 sys
>>>>> > > >       118.93 real        62.91 user        96.28 sys
>>>>> > > . . .
>>>>> > > In the portsnap results for 13RC1, the variance is too high to
>>>>> > > conclude anything, I think.
>>>>> >
>>>>> > I'll note that there are other reports of wide variance
>>>>> > in transfer rates observed during an overall operation
>>>>> > such as "make extract". The one I'm thinking of is:
>>>>> >
>>>>> > https://lists.freebsd.org/pipermail/freebsd-stable/2021-March/093251.html
>>>>> >
>>>>> > which is an update to earlier reports, but based on more recent
>>>>> > stable/13. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253968
>>>>> > comment 4 has some more notes about the context. The "make extract"
>>>>> > for firefox likely is not as complicated as the portsnap extract
>>>>> > example's execution structure.
>>>>> >
>>>>> > Might be something to keep an eye on if there are on-going
>>>>> > examples over time.
>>>>> >
>>>>> > ===
>>>>> > Mark Millard
>>>>> > marklmi at yahoo.com
>>>>> > ( dsl-only.net went
>>>>> > away in early 2018-Mar)
>>>>> >

Backing off to Mar. 4 was not an improvement. My untar did seem better for
a couple of minutes, but then the display froze again for 30 seconds and
disk performance dropped to <1MB/s. Then things got really bad and behaved
in a manner that was baffling to me. The screen froze again, but stayed
frozen after half a minute. I clicked on a couple of buttons in Firefox to
no effect and then hit ctrl-q to quit. After the long pause, I pressed the
power button to try to force a shutdown. Suddenly, it started unwinding
everything I had done during the freeze. My browser did the updates from my
mouse clicks, including quitting. It then switched to a different workspace
from ctrl-alt-right and did a clean shutdown. ????

Do I also have a graphics issue? Examining log files shows no indication
that anything was happening. SMART shows no errors and reasonable values
for everything. No indication of a HW problem. The system performs well
unless I do something that tries a bulk disk data move. Building world
takes about 75 minutes. I just have a very hard time building big ports.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman at gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
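
To put numbers on the stalls described above, one option is to time a
chunked write with an fsync() after each chunk and record the per-chunk
rate. What follows is only a rough sketch under stated assumptions: the
function names, the chunk size a caller would pick, and any "slow"
threshold are hypothetical and chosen for illustration, and an fsync() per
chunk is not the same I/O pattern that tar or cp generate.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and fsync() */

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Transfer rate in MB/s (MiB per second) between two clock samples. */
double
rate_mb_per_s(size_t bytes, struct timespec t0, struct timespec t1)
{
	double secs = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

	if (secs <= 0.0)
		return (0.0);	/* avoid dividing by zero on a coarse clock */
	return ((double)bytes / (1024.0 * 1024.0) / secs);
}

/* Count the samples that fall below a caller-chosen threshold. */
size_t
count_slow_samples(const double *rates, size_t n, double threshold_mb_s)
{
	size_t slow = 0;

	assert(rates != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (rates[i] < threshold_mb_s)
			slow++;
	return (slow);
}

/*
 * Hypothetical probe: write nchunks chunks of chunk_bytes to path, with an
 * fsync() after each one, and store the per-chunk rate in rates[].  A short
 * write is treated as an error.  Returns 0 on success, -1 on failure.
 */
int
probe_write_rates(const char *path, size_t chunk_bytes, size_t nchunks,
    double *rates)
{
	struct timespec t0, t1;
	char *buf;
	int fd, rc = 0;

	if ((buf = malloc(chunk_bytes)) == NULL)
		return (-1);
	memset(buf, 0xA5, chunk_bytes);
	if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0) {
		free(buf);
		return (-1);
	}
	for (size_t i = 0; i < nchunks; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (write(fd, buf, chunk_bytes) != (ssize_t)chunk_bytes ||
		    fsync(fd) != 0) {
			rc = -1;
			break;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		rates[i] = rate_mb_per_s(chunk_bytes, t0, t1);
	}
	close(fd);
	free(buf);
	return (rc);
}
```

A caller would run probe_write_rates() against a file on the affected disk
and then pass the resulting array to count_slow_samples() with whatever
threshold seems meaningful for that drive. Logging a timestamp next to each
sample would make it possible to line the slow chunks up with the console
messages.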
On 2021-Mar-14, at 11:09, Kevin Oberman <rkoberman at gmail.com> wrote:

> . . .
>
> Seems to only occur on large r/w operations from/to the same disk. "cp
> big-file /other/file/on/same/disk" or tar/untar operations on large files.
> Hit this today updating firefox.
>
> I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other
> things while it was running slowly, the disk would appear to lock up. E.g.
> pwd(1) seemed to completely lock up the system, but I could still ping it
> and, after about 30 seconds, things came back to life. It was also not
> instantaneous. Disc activity dropped to <1MB/s for a few seconds before
> everything froze.
>
> During the untar of firefox, I saw this several times. I also looked at my
> console, where I found these errors:
> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192
> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096

Does anyone know: Are those messages normal "reading is taking a
rather long time" notices, or is their presence more useful
information in some way about the type of problem or context
for the problem?

As for the tests: Are these messages always present near the
time frame when the problem occurs? Never present near a
period when the problem does not occur?

It appears that the messages are associated with reading
the disk(s), not directly with writing them, where the reads
take more than "hz * 20" time units to complete. (I'm looking
at main (14) code.) What might contribute to the time taken
for the pending read(s)?

/*
 * swap_pager_getpages() - bring pages in from swap
 *
 *	Attempt to page in the pages in array "ma" of length "count".  The
 *	caller may optionally specify that additional pages preceding and
 *	succeeding the specified range be paged in.  The number of such pages
 *	is returned in the "rbehind" and "rahead" parameters, and they will
 *	be in the inactive queue upon return.
 *
 *	The pages in "ma" must be busied and will remain busied upon return.
 */
static int
swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count,
    int *rbehind, int *rahead)
{
. . .
	/*
	 * Wait for the pages we want to complete.  VPO_SWAPINPROG is always
	 * cleared on completion.  If an I/O error occurs, SWAPBLK_NONE
	 * is set in the metadata for each page in the request.
	 */
	VM_OBJECT_WLOCK(object);
	/* This could be implemented more efficiently with aflags */
	while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
		ma[0]->oflags |= VPO_SWAPSLEEP;
		VM_CNT_INC(v_intrans);
		if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
		    "swread", hz * 20)) {
			printf(
"swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
			    bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
		}
	}
	VM_OBJECT_WUNLOCK(object);
. . .

where:

#define	VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo)		\
	rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo))

and:

#define	rw_sleep(chan, rw, pri, wmesg, timo)				\
	_sleep((chan), &(rw)->lock_object, (pri), (wmesg),		\
	    tick_sbt * (timo), 0, C_HARDCLOCK)

(I do not claim to be able to interpret the implications of the
code that leads to the messages. But seeing some of the code
might prompt a thought by someone who knows the code's context
and operation.)

> . . .
> Backing off to Mar. 4 was not an improvement. My untar did seem better for
> a couple of minutes, but then the display froze again for 30 seconds and
> disk performance dropped to <1MB/s.

You were able to see the disk performance drop while the
display was frozen?

It might not be the best for monitoring, but I'll ask this
in terms of top output: Do Inact, Laundry, Wired, Free,
or other such fields show anything fairly unique around the
problematical time frame(s)?

> Then things got really bad and behaved in a manner that was baffling to
> me. The screen froze again, but stayed frozen after half a minute. I
> clicked on a couple of buttons in Firefox to no effect and then hit ctrl-q
> to quit. After the long pause, I pressed the power button to try to force
> a shutdown. Suddenly, it started unwinding everything I had done during
> the freeze. My browser did the updates from my mouse clicks, including
> quitting. It then switched to a different workspace from ctrl-alt-right
> and did a clean shutdown. ????
>
> Do I also have a graphics issue? Examining log files shows no indication
> that anything was happening. SMART shows no errors and reasonable values
> for everything. No indication of a HW problem. The system performs well
> unless I do something that tries a bulk disk data move. Building world
> takes about 75 minutes. I just have a very hard time building big ports.

Almost like things were stuck-sleeping and then the sleep(s)
finished?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
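
For what it is worth, the timeout arithmetic in the quoted loop can be
modeled in userland. The sketch below is not kernel code and makes no claim
about what the swap pager does beyond what the excerpt shows: each sleep is
bounded by "hz * 20" ticks, and each time that bound expires one message is
printed and the loop sleeps again. The names ticks_to_seconds() and
count_wait_warnings() are made up for illustration, and a read that
completes exactly at the timeout is arbitrarily treated as not timing out.

```c
#include <assert.h>

/* Convert a tick count to seconds for a given hz (ticks per second). */
double
ticks_to_seconds(long ticks, int hz)
{
	assert(hz > 0);
	return ((double)ticks / (double)hz);
}

/*
 * Simplified model of the quoted wait loop: return how many times the
 * "hz * 20" tick timeout would expire (one message per expiry) before a
 * read that stays pending for read_latency_ticks finally completes.
 */
int
count_wait_warnings(long read_latency_ticks, int hz)
{
	long timo, waited = 0;
	int warnings = 0;

	assert(hz > 0);
	timo = (long)hz * 20;	/* same expression as in the excerpt */
	while (read_latency_ticks - waited > timo) {
		waited += timo;	/* slept the full timeout, not woken */
		warnings++;	/* the kernel would printf() here */
	}
	return (warnings);
}
```

Whatever hz is set to, "hz * 20" ticks comes out to 20 seconds, so under
this model every console line corresponds to a read that had already been
pending for at least 20 seconds. That would at least be consistent with the
roughly 30 second freezes in the report, though it says nothing about why
the reads took that long.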