On Wed, Jun 04, 2008 at 06:07:47PM +0300, Andriy Gapon wrote:
> I wouldn't report this if not for one coincidence (which is described
> below). I have too few facts, so this is more of a mystery problem
> tale than a real problem report.
>
> There are two systems:
> 1. old, slow, i386, UP, 7-STABLE
> 2. new, fast, amd64, MP, 6.3-RELEASE
>
> Systems are located at different physical locations.
>
> What is common between them:
> 1. they both have the same backup strategy where dumps of certain levels
> are performed on certain days; there are monthly dumps of level 2 (on
> first day of each month), weekly dumps of level 4 (each Sunday) and
> daily dumps of levels > 5 (each day except for Sunday - but including
> the firsts).
> Dumps are done on live filesystems using -L.
> Dumps are initially written to the same disk and only later transferred
> to archive media.
> 2. both kernels are compiled with softupdates support but there are no
> filesystems with it enabled
> 3. both systems have root partition gmirror-ed, it is dumped
> 4. both systems have gjournal support (on 6.X it is added via a
> "non-official" patch), there are gjournaled filesystems on both
> systems and they are dumped.
>
> On June 1 (Sunday) exactly the same thing happened on both systems.
> At 4AM the monthly level 2 dump was started and successfully performed.
> At 5AM the weekly level 4 dump was started.
> Somewhere in the process of it the system locked up.
> When I physically accessed the systems I found the following: keyboard
> didn't respond[*], screen froze, no pings. After reset I found that
> logs stopped being updated at some time shortly after 5AM.
> [*] - although on amd64 system I was able to switch exactly once between
> virtual terminals (actually from X terminal to console terminal). But
> that's all, no LED responses, no special combinations (like break to
> debugger - it is compiled in / enabled).
>
> This coincidence in details (and that one successful VT switch) led me
> to believe that this was some lockup in the kernel rather than a hardware
> problem. Also, I have used that backup scheme for almost a year and never
> had such a problem before. I just checked and this was the first time that
> the 1st of a month fell on Sunday, so this was the first time when level
> 2 dump was followed by level 4 dump. In previous months it was followed
> by level > 6 dumps.
>
> All in all, quite strange.
Do you use snapshots on the gjournaled fs? I believe this is problematic.
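
The date coincidence in the quoted report can be checked mechanically. Here is a
minimal Python sketch, assuming the backup scheme went live some time after
July 1, 2007 (the report only says "almost a year", so the window start below
is a guess, and the function name is made up for illustration):

```python
from datetime import date


def first_of_month_sundays(start, end):
    """Return every 1st-of-month date within [start, end] that is a Sunday."""
    hits = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        d = date(year, month, 1)
        # date.weekday(): Monday is 0, Sunday is 6
        if start <= d <= end and d.weekday() == 6:
            hits.append(d)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return hits


# Assumed window: scheme in use from just after 2007-07-01 until the lockup.
sundays = first_of_month_sundays(date(2007, 7, 2), date(2008, 6, 1))
print(sundays)
```

With that assumed window the only hit is 2008-06-01, which is consistent with
the report that a level 2 dump had never before been followed by a level 4
dump on the same day.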