thr3ads.net - Xen devel - [Xen-devel] file corruption!!! [Jul 2004]

James Harper

2004-Jul-16 06:06 UTC

[Xen-devel] file corruption!!!

This strikes me as quite nasty, although i've no idea if it's
xen related at all... i was just doing a 'make world' under
dom0, when the compile crashed out thus:

gcc -D__KERNEL__ -I/usr/src/xeno-unstable.bk/linux-2.4.26-xen0/include -Wall
-Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common
-fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc
-iwithprefix include -DKBUILD_BASENAME=svcsock -c -o svcsock.o svcsock.c
svcsock.c:144:25: warning: null character(s) preserved in literal
svcsock.c:144:25: missing terminating " character
svcsock.c:1176:1: unterminated argument list invoking macro "dprintk"
svcsock.c: In function `svc_sock_enqueue':
svcsock.c:144: error: `dprintk' undeclared (first use in this function)
svcsock.c:144: error: (Each undeclared identifier is reported only once
svcsock.c:144: error: for each function it appears in.)
svcsock.c:144: error: parse error at end of input
svcsock.c:118: warning: unused variable `rqstp'
svcsock.c:66: warning: `svc_setup_socket' declared `static' but never defined
svcsock.c:67: warning: `svc_udp_data_ready' declared `static' but never defined
svcsock.c:68: warning: `svc_udp_recvfrom' declared `static' but never defined
svcsock.c:69: warning: `svc_udp_sendto' declared `static' but never defined
svcsock.c:116: warning: `svc_sock_enqueue' defined but not used

and sure enough svcsock.c had a big lump of corruption right in the middle of
it. My most recent reboot was afaik orderly and so the corruption
shouldn't have come from an unclean halt. My last compile was right
before my last reboot (i think) and the svcsock.c was not corrupt then.

I copied the file back from a known good source and restarted the compile, but
it aborted pretty quickly with internal gcc errors etc.

I then tried a basic file corruption test - copying large files of known data
back and forth lots and then finally compare to the original but that started
seg faulting, and then the process hung in a 'D' state, so i'm rebooting now.
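
A copy-and-compare test of the kind described above could look roughly like the following. This is only a sketch in Python, not the procedure actually used here; the function names, file names, sizes and round count are all illustrative assumptions.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_roundtrip_test(workdir, size_mb=64, rounds=10):
    """Write size_mb MiB of random data, copy it back and forth `rounds`
    times, and return the round numbers whose copy differs from the original.
    """
    src = os.path.join(workdir, "original.bin")
    with open(src, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1 << 20))
    expected = sha256_of(src)

    mismatches = []
    current = src
    for i in range(1, rounds + 1):
        # Alternate between two destination names so each round copies
        # the previous round's output rather than the pristine original.
        dst = os.path.join(workdir, f"copy_{i % 2}.bin")
        shutil.copyfile(current, dst)
        if sha256_of(dst) != expected:
            mismatches.append(i)
        current = dst
    return mismatches
```

Calling `copy_roundtrip_test("/some/scratch/dir")` returns an empty list when every copy matches. Note that reads may be served from the page cache rather than the disk, so files much larger than RAM give a more meaningful result.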

When it comes back up i'll try to break it again.

My tree is about 30 hours old. I should probably completely refresh it to ensure
there is no other corruption... is there a bk command to do that? or at least to
sum all the files to check them against the originals?
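
Independently of any bk command, one generic way to sum all the files and check them against the originals is to checksum both trees and compare. The sketch below assumes a second, known-good checkout is available; the function names are made up for illustration, and build outputs or revision-control metadata would show up as differences unless filtered out.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only hash regular files.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(suspect_root, reference_root):
    """Return (changed, missing, extra) lists of relative paths.

    changed: present in both trees but with different contents
    missing: present only in the reference tree
    extra:   present only in the suspect tree
    """
    suspect = tree_digests(suspect_root)
    reference = tree_digests(reference_root)
    changed = sorted(p for p in suspect.keys() & reference.keys()
                     if suspect[p] != reference[p])
    missing = sorted(reference.keys() - suspect.keys())
    extra = sorted(suspect.keys() - reference.keys())
    return changed, missing, extra
```

A file such as svcsock.c with a corrupt block in the middle would then appear in the `changed` list.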

thanks

James

Andy Isaacson

2004-Jul-17 15:25 UTC

Re: [Xen-devel] file corruption!!!

On Fri, Jul 16, 2004 at 04:06:20PM +1000, James Harper wrote:
> This strikes me as quite nasty, although i've no idea if it's xen
> related at all... i was just doing a 'make world' under dom0, when the
> compile crashed out thus:
> 
> svcsock.c:144:25: warning: null character(s) preserved in literal
> svcsock.c:144:25: missing terminating " character
> svcsock.c:1176:1: unterminated argument list invoking macro "dprintk"
[snip]
> and sure enough svcsock.c had a big lump of corruption right in the
> middle of it. My most recent reboot was afaik orderly and so the
> corruption shouldn't have come from an unclean halt. My last compile
> was right before my last reboot (i think) and the svcsock.c was not
> corrupt then.
> 
> I copied the file back from a known good source and restarted the
> compile, but it aborted pretty quickly with internal gcc errors etc.
> 
> I then tried a basic file corruption test - copying large files of
> known data back and forth lots and then finally compare to the
> original but that started seg faulting, and then the process hung in a
> 'D' state, so i'm rebooting now.
> 
> When it comes back up i'll try to break it again.

Sounds like hardware failure.  Run memtest86 for an hour or so and let
us know whether it finds any errors.

-andy


-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel

Chris Andrews

2004-Jul-17 18:03 UTC

Re: [Xen-devel] file corruption!!!

On 17 Jul 2004, at 16:25, Andy Isaacson wrote:
> On Fri, Jul 16, 2004 at 04:06:20PM +1000, James Harper wrote:
>> I copied the file back from a known good source and restarted the
>> compile, but it aborted pretty quickly with internal gcc errors etc.
>>
>> I then tried a basic file corruption test - copying large files of
>> known data back and forth lots and then finally compare to the
>> original but that started seg faulting, and then the process hung in a
>> 'D' state, so i'm rebooting now.
>>
>> When it comes back up i'll try to break it again.
>
> Sounds like hardware failure.  Run memtest86 for an hour or so and let
> us know whether it finds any errors.

Well, I also have similar corruption in my domain0 filesystems, and
I've heard of another instance, so there could be something in this.

Running -unstable as of yesterday, but with v1.9 of
linux-2.4.26-xen-sparse/arch/xen/drivers/blkif/backend/vbd.c, as the
v1.11 changes seem to break exporting block devices from domain0. The
filesystems are on devicemapper LVs, but copying to a file and
exporting the loop device didn't help -- the xenU kernel reports 'no
device' for /dev/sda1 when it tries to mount root. The same export
statement worked when booting the kernel with earlier vbd.c.

I've just rebooted into a 2.6.6 kernel and fscked everything, and
domain0's /usr needed plenty of repair. I'm going to let the machine
chug for a bit in 2.6.6 and see if any problems show up. (The machine
is remote, so I can't easily run memtest86.)

I also have a serial console problem - the console connection works
fine with Linux 2.6, but appears to only work in one direction with Xen
- I see the bios output, grub output, then Xen output, but once domain0
is booting I can't send characters, although ^A^A^A does switch between
domain0 and Xen and I do see further console jibber.

I'm booting with this grub stanza:

title Xen Virtualised Linux
root (hd0,0)
kernel /boot/xen.gz dom0_mem=131072 com1=9600,8n1 watchdog console=com1,vga
module /boot/vmlinuz-2.4.26-xen0 root=/dev/sda1 ro console=tty0 console=ttyS0,9600 panic=30


Cheers,
Chris.




Ian Pratt

2004-Jul-17 20:21 UTC

Re: [Xen-devel] file corruption!!!

> Well, I also have similar corruption in my domain0 filesystems, and
> I've heard of another instance, so there could be something in this.

That's obviously very worrying to hear -- we just haven't seen
anything like this.

It would be very interesting to hear whether you get the problem
with the 2.6.7 xen linux. It might give us a clue as to whether
the problem is with the backend blk driver or within the domain
itself (the 2.6.7 implementation is completely different).

> Running -unstable as of yesterday, but with v1.9 of
> linux-2.4.26-xen-sparse/arch/xen/drivers/blkif/backend/vbd.c, as the
> v1.11 changes seem to break exporting block devices from domain0. The
> filesystems are on devicemapper LVs, but copying to a file and
> exporting the loop device didn't help -- the xenU kernel reports 'no
> device' for /dev/sda1 when it tries to mount root. The same export
> statement worked when booting the kernel with earlier vbd.c.

I've checked in a fix. It should work fine with LVM and loop
devices again.

> I also have a serial console problem - the console connection works
> fine with Linux 2.6, but appears to only work in one direction with Xen
> - I see the bios output, grub output, then Xen output, but once domain0
> is booting I can't send characters, although ^A^A^A does switch between
> domain0 and Xen and I do see further console jibber.
>
> I'm booting with this grub stanza:
>
> title Xen Virtualised Linux
> root (hd0,0)
> kernel /boot/xen.gz dom0_mem=131072 com1=9600,8n1 watchdog console=com1,vga
> module /boot/vmlinuz-2.4.26-xen0 root=/dev/sda1 ro console=tty0 console=ttyS0,9600 panic=30

Nothing obviously wrong here, though could you try the simpler:

module /boot/vmlinuz-2.4.26-xen0 root=/dev/sda1 ro console=ttyS0

What happens if you run a getty on /dev/ttyS0 ?

Ian





Chris Andrews

2004-Jul-18 16:22 UTC

Re: [Xen-devel] file corruption!!!

On 17 Jul 2004, at 21:21, Ian Pratt wrote:
> It would be very interesting to hear whether you get the problem
> with the 2.6.7 xen linux. It might give us a clue as to whether
> the problem is with the backend blk driver or within the domain
> itself (the 2.6.7 implementation is completely different).

I can certainly give the 2.6.7 guest another try. I did have it
booting, but I didn't persist with it long enough to tell if there was
fs corruption -- there seemed to be issues loading modules, and when I
compiled everything in, I got a gpf when racoon tried to use a PF_KEY
socket. I'll try and get some useful dumps for both these problems.

> I've checked in a fix. It should work fine with LVM and loop
> devices again.

All seems well with that fix, with domain 0 using LVs, and exporting
them to other domains. I'll give the machine some load and see what
happens.

> Nothing obviously wrong here, though could you try the simpler:
>
> module /boot/vmlinuz-2.4.26-xen0 root=/dev/sda1 ro console=ttyS0
>
> What happens if you run a getty on /dev/ttyS0 ?

That module line gives the same symptoms. I do have a getty on ttyS0,
and I see the login banner from it, but can't log in.

Actually I do have problems with Linux 2.6.6 on the same system. Once 
the kernel initialises the serial driver, the port settings appear to 
change -- I get the symptoms of incorrect baud rate. When userspace 
starts, it seems to switch back (although I have to reset my terminal). 
The hardware is a Dell 1650, with console redirection on, but 
redirection after boot disabled.

Domain 0 runs debian testing -- would I need to disable the calls to 
setserial in the initscripts, or should they just fail safely?

Cheers,
Chris.




Ian Pratt

2004-Jul-18 17:48 UTC

Re: [Xen-devel] file corruption!!!

> On 17 Jul 2004, at 21:21, Ian Pratt wrote:
> 
> > It would be very interesting to hear whether you get the problem
> > with the 2.6.7 xen linux. It might give us a clue as to whether
> > the problem is with the backend blk driver or within the domain
> > itself (the 2.6.7 implementation is completely different).
> 
> I can certainly give the 2.6.7 guest another try. I did have it
> booting, but I didn't persist with it long enough to tell if there was
> fs corruption -- there seemed to be issues loading modules, and when I
> compiled everything in, I got a gpf when racoon tried to use a PF_KEY
> socket. I'll try and get some useful dumps for both these problems.

I haven't tried loading modules, but I can't think why it
wouldn't work (assuming the mechanism is basically the same as
2.4).

BTW:  what's racoon, and what's a PF_KEY socket?

> > I've checked in a fix. It should work fine with LVM and loop
> > devices again.
> 
> All seems well with that fix, with domain 0 using LVs, and exporting
> them to other domains. I'll give the machine some load and see what
> happens.

Thanks for the confirmation.

> > Nothing obviously wrong here, though could you try the simpler:
> >
> > module /boot/vmlinuz-2.4.26-xen0 root=/dev/sda1 ro console=ttyS0
> >
> > What happens if you run a getty on /dev/ttyS0 ?
> 
> That module line gives the same symptoms. I do have a getty on ttyS0,
> and I see the login banner from it, but can't log in.

There was a bug along these lines, but it's believed fixed. If
you're using the latest repo, that's a concern.

> Actually I do have problems with Linux 2.6.6 on the same system. Once
> the kernel initialises the serial driver, the port settings appear to
> change -- I get the symptoms of incorrect baud rate. When userspace
> starts, it seems to switch back (although I have to reset my terminal).
> The hardware is a Dell 1650, with console redirection on, but
> redirection after boot disabled.

With the default configuration, xen owns the serial uart at all
times, so linux shouldn't be able to mess with the baud rate
etc.

> Domain 0 runs debian testing -- would I need to disable the calls to
> setserial in the initscripts, or should they just fail safely?

I'm not sure how setserial works -- presumably it tries to do
ioctls on /dev/ttyS0 rather than trying to inb/outb the uart
directly?  The former should just be ignored by the xencons ttyS0
driver and do no harm. If the latter, it's possible that it is
messing things up as the default configuration is to allow
dom0 any IO privs it asks for.

Disabling them seems safest in the first instance.

Ian



Chris Andrews

2004-Jul-18 22:21 UTC

Re: [Xen-devel] file corruption!!!

On 18 Jul 2004, at 17:22, Chris Andrews wrote:
>
>> I've checked in a fix. It should work fine with LVM and loop
>> devices again.
>
> All seems well with that fix, with domain 0 using LVs, and exporting
> them to other domains. I'll give the machine some load and see what
> happens.

So, I've got the following results. The machine has an aacraid
controller, which reports no errors. My test, such as it is, is to
rebuild the 'festival' package -- simply because that's what I was
doing when I first saw corruption.

2.4.26 plus the device-mapper patch and the VFS-locking patch - stable.
Xen, 2.4.26 domain0 plus device-mapper, VFS-locking, domain 0 only - stable.
Xen, 2.4.26 domain0 etc, and a 2.6.7-xenU guest - stable.
Xen, 2.4.26 domain0 and a 2.4.26-xenU guest - corruption in dom0, and an oops.

The oops below happened shortly after I started the 2.4 guest, and
killed the machine. I'll run memtest86 as soon as I've got it built
with serial support and hooked into grub...

Chris.



Unable to handle kernel paging request at virtual address 258b0619
c0124f32
Oops: 0000
CPU:    0
EIP:    0819:[<c0124f32>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00213202
eax: 258b0601   ebx: 00000001   ecx: 258b0601   edx: 00000000
esi: c66bdaa0   edi: c1a31034   ebp: c66bdaa0   esp: c767dbd8
ds: 0821   es: 0821   ss: 0821
Process find (pid: 2877, stackpage=c767d000)<1>
Stack: c0261d6c 00000000 c66bdaa0 c75eb220 c0261da3 c66bdaa0 c66bdaa0 c0261ed9
       c66bdaa0 00000000 00000030 952163ff c0288163 c66bdaa0 c66bdaa0 00000000
       c66bdaa0 c1189800 00000000 c02ac920 00000001 952163cf c02766a0 007de526
Call Trace: [<c0261d6c>] [<c0261da3>] [<c0261ed9>] [<c0288163>] [<c02ac920>]
    [<c02766a0>] [<c028feb2>] [<c0290353>] [<c02af7e0>] [<c027667e>] [<c02af7e0>]
    [<c02766a0>] [<c0276859>] [<c026e85a>] [<c02b04b7>] [<c026e506>] [<c02766a0>]
    [<c026e85a>] [<c02766a0>] [<c027646e>] [<c02766a0>] [<c02667d7>] [<c0266935>]
    [<c0266a93>] [<c010ddba>] [<c01b07ee>] [<c01b4ffd>] [<c01aefab>] [<c01bf403>]
    [<c01be83d>] [<c01bf23f>] [<c01c0ab7>] [<c01bef3b>] [<c01bc5fa>] [<c01c0920>]
    [<c012c273>] [<c01aee97>] [<c01a0833>]
  <0>Kernel panic: Aiee, killing interrupt handler!
Warning (Oops_read): Code line not seen, dumping what data is available


 >>EIP; c0124f32 <__free_pages+2/20>   <====
 >>eax; 258b0601 <__start___xen_guest+258ad3b2/c00fcdb1>
 >>ecx; 258b0601 <__start___xen_guest+258ad3b2/c00fcdb1>

Trace; c0261d6c <skb_release_data+7c/a0>
Trace; c0261da3 <kfree_skbmem+13/30>
Trace; c0261ed9 <__kfree_skb+119/190>
Trace; c0288163 <tcp_rcv_established+483/850>
Trace; c02ac920 <br_handle_frame_finish+0/170>
Trace; c02766a0 <ip_rcv_finish+0/210>
Trace; c028feb2 <tcp_v4_do_rcv+122/130>
Trace; c0290353 <tcp_v4_rcv+493/5d0>
Trace; c02af7e0 <br_nf_pre_routing_finish+0/280>
Trace; c027667e <ip_local_deliver_finish+14e/170>
Trace; c02af7e0 <br_nf_pre_routing_finish+0/280>
Trace; c02766a0 <ip_rcv_finish+0/210>
Trace; c0276859 <ip_rcv_finish+1b9/210>
Trace; c026e85a <nf_hook_slow+7a/190>
Trace; c02b04b7 <ipv4_sabotage_in+27/30>
Trace; c026e506 <nf_iterate+76/b0>
Trace; c02766a0 <ip_rcv_finish+0/210>
Trace; c026e85a <nf_hook_slow+7a/190>
Trace; c02766a0 <ip_rcv_finish+0/210>
Trace; c027646e <ip_rcv+19e/260>
Trace; c02766a0 <ip_rcv_finish+0/210>
Trace; c02667d7 <netif_receive_skb+137/210>
Trace; c0266935 <process_backlog+85/160>
Trace; c0266a93 <net_rx_action+83/160>
Trace; c010ddba <do_softirq+da/f0>
Trace; c01b07ee <do_IRQ+9e/a0>
Trace; c01b4ffd <evtchn_do_upcall+ad/110>
Trace; c01aefab <hypervisor_callback+33/49>
Trace; c01bf403 <opost_block+b3/180>
Trace; c01be83d <tty_default_put_char+2d/40>
Trace; c01bf23f <opost+9f/1b0>
Trace; c01c0ab7 <write_chan+197/220>
Trace; c01bef3b <do_tty_write+db/134>
Trace; c01bc5fa <tty_write+13a/170>
Trace; c01c0920 <write_chan+0/220>
Trace; c012c273 <sys_write+a3/140>
Trace; c01aee97 <system_call+2f/33>
Trace; c01a0833 <nlmsvc_decode_notify+23/70>


1 warning and 1 error issued.  Results may not be reliable.




Chris Andrews

2004-Jul-18 22:43 UTC

Re: [Xen-devel] file corruption!!!

On 18 Jul 2004, at 18:48, Ian Pratt wrote:
>>
>> On 17 Jul 2004, at 21:21, Ian Pratt wrote:
>>
>>> It would be very interesting to hear whether you get the problem
>>> with the 2.6.7 xen linux. It might give us a clue as to whether
>>> the problem is with the backend blk driver or within the domain
>>> itself (the 2.6.7 implementation is completely different).
>>
>> I can certainly give the 2.6.7 guest another try. I did have it
>> booting, but I didn't persist with it long enough to tell if there was
>> fs corruption -- there seemed to be issues loading modules, and when I
>> compiled everything in, I got a gpf when racoon tried to use a PF_KEY
>> socket. I'll try and get some useful dumps for both these problems.
>
> I haven't tried loading modules, but I can't think why it
> wouldn't work (assuming the mechanism is basically the same as
> 2.4).

It's different enough to need new userspace tools. The symptoms of
failure are a GPF, and the userspace process stuck in D (be it insmod
or lsmod). The results of feeding the GPF to ksymoops are below (I
hesitate to say it's actually decoded).

> BTW:  what's racoon, and what's a PF_KEY socket?

racoon is the ISAKMP daemon used with the 2.6 kernel's KAME IPSec code.
It uses a PF_KEY socket to communicate with the kernel. I've
successfully used it in a 2.4 guest.

Chris.


No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU:    0
EIP:    0061:[<c01471a7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246   (2.6.7-xenU)
eax: 00000600   ebx: c5400000   ecx: 00000001   edx: 00000600
esi: c0102c54   edi: c5089000   ebp: c5087000   esp: c04b1ec4
ds: 0069   es: 0069   ss: 0069
Stack: c0102c50 c5087000 00002000 c122c6a8 c122c6e0 00000001 c01473f8 c122c6a8
       c5087000 fffffffe c0147491 c5087000 00000000 c5055c19 c5084380 c5015000
       fffffffe c5084380 c014753e c5087000 00000001 c012d9c3 c5087000 c5087000
Call Trace:
  c04b1ed0: [<c01473f8>]  c04b1ee0: [<c0147491>]  c04b1f00: [<c014753e>]
  c04b1f0c: [<c012d9c3>]  c04b1f38: [<c02da440>]  c04b1f94: [<c012dc5d>]
  c04b1fb4: [<c010a663>]
Code: 0f 22 e2 0f 20 d9 0f 22 d9 0f 22 e0 83 c4 0c 5b 5e 5f c3 e8


 >>EIP; c01471a7 <unmap_vm_area+5d/80>   <====
 >>ebx; c5400000 <pg0+50c8000/3bcc5000>
 >>esi; c0102c54 <swapper_pg_dir+c54/1000>
 >>edi; c5089000 <pg0+4d51000/3bcc5000>
 >>ebp; c5087000 <pg0+4d4f000/3bcc5000>
 >>esp; c04b1ec4 <pg0+179ec4/3bcc5000>

Code;  c01471a7 <unmap_vm_area+5d/80>
00000000 <_EIP>:
Code;  c01471a7 <unmap_vm_area+5d/80>   <====
    0:   0f 22 e2                  mov    %edx,%cr4   <====
Code;  c01471aa <unmap_vm_area+60/80>
    3:   0f 20 d9                  mov    %cr3,%ecx
Code;  c01471ad <unmap_vm_area+63/80>
    6:   0f 22 d9                  mov    %ecx,%cr3
Code;  c01471b0 <unmap_vm_area+66/80>
    9:   0f 22 e0                  mov    %eax,%cr4
Code;  c01471b3 <unmap_vm_area+69/80>
    c:   83 c4 0c                  add    $0xc,%esp
Code;  c01471b6 <unmap_vm_area+6c/80>
    f:   5b                        pop    %ebx
Code;  c01471b7 <unmap_vm_area+6d/80>
   10:   5e                        pop    %esi
Code;  c01471b8 <unmap_vm_area+6e/80>
   11:   5f                        pop    %edi
Code;  c01471b9 <unmap_vm_area+6f/80>
   12:   c3                        ret
Code;  c01471ba <unmap_vm_area+70/80>
   13:   e8 00 00 00 00            call   18 <_EIP+0x18>




James Harper

2004-Jul-18 23:36 UTC

RE: [Xen-devel] file corruption!!!

I'm not in a position to test this, but is it possible that the
corruption problem could manifest itself after an out of memory condition? When
I first noticed the corruption I rebooted as quickly as possible so it
didn't continue and so didn't check, but it's possible that it ran out of
memory first. I guess I could test this but don't really want to do anything
to risk corruption any further :)

speaking of memory, I have 3 domains running currently, 0 + 2U, all declared
with 128mb memory, but xm list shows this:
Dom  Name             Mem(MB)  CPU  State  Time(s)
0    Domain-0             119    0  r----   1293.0
6    gaia                 127    1  -b---     81.9
7    mail2                126    0  -b---   1597.9

'free' under mail2 and gaia shows 128124 as the total amount of memory.

I appreciate that maybe something about dom0 means that it shows something
different, but why would the other two report different amounts of memory when
they both have the same amount??? Both are running identical kernels.

James

James Harper

2004-Jul-19 01:02 UTC

RE: [Xen-devel] file corruption!!!

I just tried another bk pull + make world, and it failed because it
couldn't gunzip linux-2.4.26.tar.gz. I tried it manually and sure
enough it failed. 'xm list' etc just seg faulted too.

After a reboot though, the file was fine again, so the corruption in this case
was a read error not a write error. I'm assuming that if I had done
enough io to flush any buffers and then tried to gunzip the file again it
probably would have worked.

Just prior to this I had run a little C program which would just try and
allocate memory in 1mb chunks until it was killed. After reboot I tried the
same thing again and it appears to be staying up okay now, unfortunately. It
almost seems like I only start to get errors after a day or so uptime and a fair
bit of I/O.
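
The C program itself isn't shown here. A rough stand-in for the same idea, written in Python purely for illustration, might look like the sketch below; the function name, the `limit_mb` safety cap and the 4096-byte page size are assumptions, not details from the original program.

```python
def allocate_until_limit(chunk_mb=1, limit_mb=None, page_size=4096):
    """Allocate memory in chunk_mb MiB chunks and keep every chunk alive.

    Stops once another chunk would exceed limit_mb, or on MemoryError.
    With limit_mb=None it keeps going until allocation fails or the
    process is killed. Returns the number of MiB allocated.
    """
    chunks = []
    allocated = 0
    chunk_bytes = chunk_mb * 1024 * 1024
    try:
        while limit_mb is None or allocated + chunk_mb <= limit_mb:
            block = bytearray(chunk_bytes)
            # Write one byte per page so the kernel has to back the
            # allocation with real memory, not just reserve address space.
            for offset in range(0, chunk_bytes, page_size):
                block[offset] = 0xFF
            chunks.append(block)
            allocated += chunk_mb
    except MemoryError:
        pass
    return allocated
```

Running it without a limit is likely to end with the kernel's OOM killer stepping in, so it is best tried only on a machine that can be rebooted freely.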

Curiously though, the first time I ran my memory exhausting program, all my xenU
domains restarted...

Since starting this email I have managed to induce corruption again,
i'll reboot and try it again without starting any other domains.

The server is a Compaq ProLiant 1600 2x550mhz P3 with 768mb memory. All the
memory is ECC and up until I acquired it for Linux purposes, it was running as
another company's main Windows server, so I wouldn't have
suspected a hardware issue.

I'll follow up shortly hopefully with some instructions on inducing the
corruption on this server for anyone else to try to see if we have a general
problem.

There haven't been any fixes in the last 2 days that would correct this
problem have there? I'm a few days out of date i think.

James

From: James Harper
Sent: Mon 19/07/2004 9:36 AM
To: xen-devel@lists.sourceforge.net
Subject: RE: [Xen-devel] file corruption!!!

I''m not in a position to test this, but is it possible that the
corruption problem could manifest itself after an out of memory condition? When
I first noticed the corruption I rebooted as quickly as possible so it
didn''t continue and so didn''t check, but it''s
possible that it ran out of memory first. I guess I could test this but
don''t really want to do anything to risk corruption any further :)

speaking of memory, I have 3 domains running currently, 0 + 2U, all declared
with 128mb memory, but xm list shows this:
Dom  Name             Mem(MB)  CPU  State  Time(s)
0    Domain-0             119    0  r----   1293.0
6    gaia                 127    1  -b---     81.9
7    mail2                126    0  -b---   1597.9

''free'' under mail2 and gaia shows 128124 as the total amount
of memory.

I appreciate that maybe something about dom0 means that it shows something
different, but why would the other two report different amounts of memory when
they both have the same amount??? Both are running identical kernels.

James

From: Chris Andrews
Sent: Mon 19/07/2004 8:43 AM
To: xen-devel@lists.sourceforge.net
Subject: Re: [Xen-devel] file corruption!!!

On 18 Jul 2004, at 18:48, Ian Pratt wrote:
>>
>> On 17 Jul 2004, at 21:21, Ian Pratt wrote:
>>
>>> It would be very interesting to hear whether you get the problem
>>> with the 2.6.7 xen linux. It might give us a clue as to whether
>>> the problem is with the backend blk driver or within the domain
>>> itself (the 2.6.7 implementation is completely different).
>>
>> I can certainly give the 2.6.7 guest another try. I did have it
>> booting, but I didn't persist with it long enough to tell if there was
>> fs corruption -- there seemed to be issues loading modules, and when I
>> compiled everything in, I got a gpf when racoon tried to use a PF_KEY
>> socket. I'll try and get some useful dumps for both these problems.
>
> I haven't tried loading modules, but I can't think why it
> wouldn't work (assuming the mechanism is basically the same as
> 2.4).

It's different enough to need new userspace tools. The symptoms of
failure are a GPF, and the userspace process stuck in D (be it insmod
or lsmod). The results of feeding the GPF to ksymoops are below (I
hesitate to say it's actually decoded).

> BTW:  what's racoon, and what's a PF_KEY socket?

racoon is the ISAKMP daemon used with the 2.6 kernel's KAME IPSec code.
It uses a PF_KEY socket to communicate with the kernel. I've
successfully used it in a 2.4 guest.

Chris.

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU:    0
EIP:    0061:[<c01471a7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246   (2.6.7-xenU)
eax: 00000600   ebx: c5400000   ecx: 00000001   edx: 00000600
esi: c0102c54   edi: c5089000   ebp: c5087000   esp: c04b1ec4
ds: 0069   es: 0069   ss: 0069
Stack: c0102c50 c5087000 00002000 c122c6a8 c122c6e0 00000001 c01473f8 c122c6a8
       c5087000 fffffffe c0147491 c5087000 00000000 c5055c19 c5084380 c5015000
       fffffffe c5084380 c014753e c5087000 00000001 c012d9c3 c5087000 c5087000
Call Trace:
  c04b1ed0: [<c01473f8>]  c04b1ee0: [<c0147491>]  c04b1f00: [<c014753e>]
  c04b1f0c: [<c012d9c3>]  c04b1f38: [<c02da440>]  c04b1f94: [<c012dc5d>]
  c04b1fb4: [<c010a663>]
Code: 0f 22 e2 0f 20 d9 0f 22 d9 0f 22 e0 83 c4 0c 5b 5e 5f c3 e8

 >>EIP; c01471a7 <unmap_vm_area+5d/80>   <====
 >>ebx; c5400000 <pg0+50c8000/3bcc5000>
 >>esi; c0102c54 <swapper_pg_dir+c54/1000>
 >>edi; c5089000 <pg0+4d51000/3bcc5000>
 >>ebp; c5087000 <pg0+4d4f000/3bcc5000>
 >>esp; c04b1ec4 <pg0+179ec4/3bcc5000>

Code;  c01471a7 <unmap_vm_area+5d/80>
00000000 <_EIP>:
Code;  c01471a7 <unmap_vm_area+5d/80>   <====
    0:   0f 22 e2                  mov    %edx,%cr4   <====
Code;  c01471aa <unmap_vm_area+60/80>
    3:   0f 20 d9                  mov    %cr3,%ecx
Code;  c01471ad <unmap_vm_area+63/80>
    6:   0f 22 d9                  mov    %ecx,%cr3
Code;  c01471b0 <unmap_vm_area+66/80>
    9:   0f 22 e0                  mov    %eax,%cr4
Code;  c01471b3 <unmap_vm_area+69/80>
    c:   83 c4 0c                  add    $0xc,%esp
Code;  c01471b6 <unmap_vm_area+6c/80>
    f:   5b                        pop    %ebx
Code;  c01471b7 <unmap_vm_area+6d/80>
   10:   5e                        pop    %esi
Code;  c01471b8 <unmap_vm_area+6e/80>
   11:   5f                        pop    %edi
Code;  c01471b9 <unmap_vm_area+6f/80>
   12:   c3                        ret
Code;  c01471ba <unmap_vm_area+70/80>
   13:   e8 00 00 00 00            call   18 <_EIP+0x18>
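
The faulting instruction and the three that follow it all move values to or from control registers, which the x86 architecture permits only at privilege level 0. The EIP line shows code segment selector 0061, whose low two bits give a privilege level of 1, so a general protection fault on the first `mov %edx,%cr4` would be consistent with the dump; the thread itself does not confirm the cause. As a minimal sketch for cross-checking the ksymoops disassembly, the following decodes those opcode bytes by hand. The helper name `decode_cr_move` is hypothetical and it handles only the register-to-register forms `0F 20 /r` and `0F 22 /r`.

```python
# Hypothetical helper, not part of ksymoops: decodes only 3-byte
# MOV instructions to or from a control register (0F 20 /r, 0F 22 /r).
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_cr_move(code: bytes) -> str:
    """Return the AT&T-syntax text for a control-register MOV."""
    if len(code) != 3 or code[0] != 0x0F or code[1] not in (0x20, 0x22):
        raise ValueError("not a 3-byte MOV to/from a control register")
    modrm = code[2]
    cr = (modrm >> 3) & 0b111    # ModRM.reg selects the control register
    gpr = GPR32[modrm & 0b111]   # ModRM.rm selects the general register
    if code[1] == 0x22:          # 0F 22 /r writes the control register
        return f"mov %{gpr},%cr{cr}"
    return f"mov %cr{cr},%{gpr}"  # 0F 20 /r reads the control register


# Byte sequences copied from the "Code:" line of the dump above.
for raw in ("0f 22 e2", "0f 20 d9", "0f 22 d9", "0f 22 e0"):
    print(raw, "->", decode_cr_move(bytes.fromhex(raw)))
```

The four decoded instructions match the ksymoops listing, so the disassembly itself looks trustworthy even though no module symbols were available.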

-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel

Keir Fraser

2004-Jul-19 07:30 UTC

Re: [Xen-devel] file corruption!!!

> I'm not in a position to test this, but is it possible that the
> corruption problem could manifest itself after an out of memory
> condition? When I first noticed the corruption I rebooted as quickly as
> possible so it didn't continue and so didn't check, but it's possible
> that it ran out of memory first. I guess I could test this but don't
> really want to do anything to risk corruption any further :)
>
> speaking of memory, I have 3 domains running currently, 0 + 2U, all
> declared with 128mb memory, but xm list shows this:
> Dom  Name             Mem(MB)  CPU  State  Time(s)
> 0    Domain-0             119    0  r----   1293.0
> 6    gaia                 127    1  -b---     81.9
> 7    mail2                126    0  -b---   1597.9
>
> 'free' under mail2 and gaia shows 128124 as the total amount of memory.
>
> I appreciate that maybe something about dom0 means that it shows
> something different, but why would the other two report different
> amounts of memory when they both have the same amount??? Both are
> running identical kernels.
>
> James

The backend network and blkdev drivers in DOM0 allocate multi-MB
chunks of memory, then free all the pages in that chunk back to Xen.
The chunks are then used for ephemeral mappings of I/O buffers from
frontend drivers in other guest OSes.

The total used for this is about 7 or 8MB, so DOM0's estimate of its
memory allocation will be thrown off by about that amount. It doesn't
realise that those large allocated chunks have had all their memory
released. :-)

 -- Keir
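
To put the quoted figures side by side, here is a small sketch that computes how far each domain's `xm list` value falls below the declared 128 MB. The numbers are copied from the output quoted above; treating the `free` total as kB is an assumption based on that tool's usual default unit.

```python
# Figures copied from the `xm list` output quoted in the thread.
DECLARED_MB = 128
reported_mb = {"Domain-0": 119, "gaia": 127, "mail2": 126}

# Gap between the declared allocation and what `xm list` reports.
shortfall_mb = {name: DECLARED_MB - mem for name, mem in reported_mb.items()}

# `free` showed a total of 128124; assumed here to be in kB.
free_total_mib = 128124 / 1024

for name, gap in shortfall_mb.items():
    print(f"{name}: {gap} MB below the declared {DECLARED_MB} MB")
print(f"free total in the guests: {free_total_mib:.1f} MiB")
```

Domain-0's gap of 9 MB is in the same range as the 7 or 8 MB described above. The explanation covers only DOM0, so the 1 MB difference between gaia and mail2 remains unaccounted for in this message.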


