Hi,

With the small debug patch below and CONFIG_DEBUG_SLAB=y I get
plenty of these messages in the dom0 kernel log as soon as I
start a domU:

  Slab corruption: start=dc423000, len=4096
  Slab name: xen-skb
  000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This is hg1a7383f849896e60f8be631c96fa2b461f502615.

any idea?

  Gerd

Index: linux-2.6.11/mm/slab.c
==================================================================
--- linux-2.6.11.orig/mm/slab.c	2005-03-02 08:38:38.000000000 +0100
+++ linux-2.6.11/mm/slab.c	2005-07-07 11:31:47.000000000 +0200
@@ -1007,6 +1007,9 @@ static void print_objinfo(kmem_cache_t *
 	int i, size;
 	char *realobj;
 
+	if (cachep->name) {
+		printk(KERN_ERR "Slab name: %s\n", cachep->name);
+	}
 	if (cachep->flags & SLAB_RED_ZONE) {
 		printk(KERN_ERR "Redzone: 0x%lx/0x%lx.\n",
 			*dbg_redzone1(cachep, objp),

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> With the small debug patch below and CONFIG_DEBUG_SLAB=y I get
> plenty of these messages in the dom0 kernel log as soon as I
> start a domU:
>
>   Slab corruption: start=dc423000, len=4096
>   Slab name: xen-skb
>   000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> This is hg1a7383f849896e60f8be631c96fa2b461f502615.
>
> any idea?

No, although I think this is a real bug -- I first observed it
yesterday after a few different changes went in; the most likely
culprit is my change to the builder to move the store page to a
different location... but I think that change is correct (at least,
the previous behaviour was certainly incorrect :-)

One thing to check is what the underlying pfn and/or mfn is for the
relevant slab(s), which could point to an allocation issue.

cheers,

S.
> One thing to check is what the underlying pfn and/or mfn is for the
> relevant slab(s), which could point to an allocation issue.

Looks like xen allocates via kmem_cache_alloc() and releases via
kfree(), which is illegal according to a comment in mm/slab.c.

Seems to work nevertheless, but maybe it's pure luck and we'll
hit a bug sooner or later. I'll try to fix this and see if the
problem goes away then ...

  Gerd

-- 
panic("it works"); /* avoid being flooded with debug messages */
On 7 Jul 2005, at 12:35, Gerd Knorr wrote:

> Looks like xen allocates via kmem_cache_alloc() and releases via
> kfree(), which is illegal according to a comment in mm/slab.c
>
> Seems to work nevertheless, but maybe it's pure luck and we'll
> hit a bug sooner or later. I'll try to fix this and see if the
> problem goes away then ...

I think this works because kfree calls kmem_cache_free after finding
the cache pointer it has squirreled away.

Where do we do this in XenLinux? Maybe there was a reason I did it
that way. :-)

 -- Keir
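The lookup Keir describes can be pictured with a small userspace sketch. This is a toy model under stated assumptions, not the kernel's code: every name here (toy_cache, toy_cache_alloc, toy_cache_free, toy_free_any) is hypothetical, each object occupies one whole page, and a per-page owner table stands in for the cache pointer the real allocator keeps with the page. The point is only that a generic free can recover the owning cache from the object address alone.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NPAGES    8u

/* Hypothetical stand-in for kmem_cache_t. */
struct toy_cache {
    const char *name;
    unsigned    frees;   /* objects returned to this cache so far */
};

/* Backing memory plus a per-page owner table; the table plays the role
 * of the "squirreled away" cache pointer. */
static unsigned char     toy_mem[TOY_NPAGES * TOY_PAGE_SIZE];
static struct toy_cache *toy_page_owner[TOY_NPAGES];
static size_t            toy_next_page;

/* Allocate one page-sized object from a named cache and record the owner. */
void *toy_cache_alloc(struct toy_cache *c)
{
    if (toy_next_page >= TOY_NPAGES)
        return NULL;
    toy_page_owner[toy_next_page] = c;
    return toy_mem + TOY_PAGE_SIZE * toy_next_page++;
}

/* Free into an explicitly named cache (kmem_cache_free() analogue). */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    (void)obj;
    c->frees++;
}

/* Generic free (kfree() analogue): derive the page from the address,
 * look up its owner, then use the cache-specific free path. */
struct toy_cache *toy_free_any(void *obj)
{
    unsigned char *p = obj;
    size_t idx;

    assert(p >= toy_mem && p < toy_mem + sizeof toy_mem);
    idx = (size_t)(p - toy_mem) / TOY_PAGE_SIZE;
    toy_cache_free(toy_page_owner[idx], obj);
    return toy_page_owner[idx];
}
```

In this model an object obtained with toy_cache_alloc(&skb_cache) and released with toy_free_any() still ends up in skb_cache, which is the behaviour the thread attributes to kfree().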
> > Looks like xen allocates via kmem_cache_alloc() and releases via
> > kfree(), which is illegal according to a comment in mm/slab.c

Actually this isn't in arch/xen code, so native linux probably does
the same.

> I think this works because kfree calls kmem_cache_free after finding
> the cache pointer it has squirreled away.

Hmm, maybe the comment is outdated then ...

  Gerd

-- 
panic("it works"); /* avoid being flooded with debug messages */
Hi,

seems the copy code in netback may trigger this:

[ ... ]
kfree: dc81a000
kmem_cache_alloc: dc81a000
netif_be_start_xmit: copy skb dc927238/db78a022 -> nskb dc83cb30/dc81a010
kmem_cache_alloc: dcf5f000
kfree: db78a000
kfree: dc81a000
Slab corruption: start=dc81a000, i=0, len=4096
Slab name: xen-skb
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

With the debug patch below

  Gerd

Index: linux-2.6.11/mm/slab.c
==================================================================
--- linux-2.6.11.orig/mm/slab.c	2005-03-02 08:38:38.000000000 +0100
+++ linux-2.6.11/mm/slab.c	2005-07-07 14:11:17.000000000 +0200
@@ -1007,6 +1007,9 @@ static void print_objinfo(kmem_cache_t *
 	int i, size;
 	char *realobj;
 
+	if (cachep->name) {
+		printk(KERN_ERR "Slab name: %s\n", cachep->name);
+	}
 	if (cachep->flags & SLAB_RED_ZONE) {
 		printk(KERN_ERR "Redzone: 0x%lx/0x%lx.\n",
 			*dbg_redzone1(cachep, objp),
@@ -1049,8 +1052,8 @@ static void check_poison_obj(kmem_cache_
 			/* Mismatch ! */
 			/* Print header */
 			if (lines == 0) {
-				printk(KERN_ERR "Slab corruption: start=%p, len=%d\n",
-						realobj, size);
+				printk(KERN_ERR "Slab corruption: start=%p, i=%d, len=%d\n",
+						realobj, i, size);
 				print_objinfo(cachep, objp, 0);
 			}
 			/* Hexdump the affected line */
@@ -2294,9 +2297,17 @@ static inline void __cache_free (kmem_ca
  * Allocate an object from this cache.  The flags are only relevant
  * if the cache has no available objects.
  */
+
+extern kmem_cache_t *skbuff_cachep; /* in arch/xen/kernel/skbuff.c */
+
 void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)
 {
-	return __cache_alloc(cachep, flags);
+	void *rc = __cache_alloc(cachep, flags);
+
+	if (skbuff_cachep == cachep) {
+		printk("%s: %p\n", __FUNCTION__, rc);
+	}
+	return rc;
 }
 
 EXPORT_SYMBOL(kmem_cache_alloc);
@@ -2530,6 +2541,9 @@ void kmem_cache_free (kmem_cache_t *cach
 {
 	unsigned long flags;
 
+	if (skbuff_cachep == cachep) {
+		printk("%s: %p\n", __FUNCTION__, objp);
+	}
 	local_irq_save(flags);
 	__cache_free(cachep, objp);
 	local_irq_restore(flags);
@@ -2575,6 +2589,9 @@ void kfree (const void *objp)
 	local_irq_save(flags);
 	kfree_debugcheck(objp);
 	c = GET_PAGE_CACHE(virt_to_page(objp));
+	if (skbuff_cachep == c) {
+		printk("%s: %p\n", __FUNCTION__, objp);
+	}
 	__cache_free(c, (void*)objp);
 	local_irq_restore(flags);
 }
Index: linux-2.6.11/arch/xen/kernel/skbuff.c
==================================================================
--- linux-2.6.11.orig/arch/xen/kernel/skbuff.c	2005-07-07 11:04:31.000000000 +0200
+++ linux-2.6.11/arch/xen/kernel/skbuff.c	2005-07-07 14:09:37.000000000 +0200
@@ -27,6 +27,8 @@ EXPORT_SYMBOL(__dev_alloc_skb);
 struct sk_buff *__dev_alloc_skb(unsigned int length, int gfp_mask)
 {
 	struct sk_buff *skb;
+
+	BUG_ON(length+16 > PAGE_SIZE);
 	skb = alloc_skb_from_cache(skbuff_cachep, length + 16, gfp_mask);
 	if ( likely(skb != NULL) )
 		skb_reserve(skb, 16);
Index: linux-2.6.11/drivers/xen/netback/netback.c
==================================================================
--- linux-2.6.11.orig/drivers/xen/netback/netback.c	2005-07-07 11:04:31.000000000 +0200
+++ linux-2.6.11/drivers/xen/netback/netback.c	2005-07-07 14:12:51.000000000 +0200
@@ -151,6 +151,8 @@ int netif_be_start_xmit(struct sk_buff *
 	struct sk_buff *nskb = dev_alloc_skb(hlen + skb->len);
 	if ( unlikely(nskb == NULL) )
 		goto drop;
+	printk("%s: copy skb %p/%p -> nskb %p/%p\n", __FUNCTION__,
+	       skb, skb->data, nskb, nskb->data);
 	skb_reserve(nskb, hlen);
 	__skb_put(nskb, skb->len);
 	if (skb_copy_bits(skb, -hlen, nskb->data - hlen, skb->len + hlen))
Hi,

Maybe related: I also see tcp connection stalls. Just booting domain0
is enough for that, whereas I see the slab corruption stuff only
after trying to boot some domU.

As the tcpdump below shows, the xen machine sends the same packet over
and over again. I'd guess some kind of memory corruption which kills
the packet checksum and makes eskarina drop the packet. Maybe just
use-after-free, as the slabdebug stuff will fill released memory blocks
with some pattern.

  Gerd

14:56:41.445799 IP eskarina.40106 > master-xen.ssh: S 4050091082:4050091082(0) win 5840 <mss 1460,sackOK,timestamp 2080736744 0,nop,wscale 2>
14:56:41.446106 IP master-xen.ssh > eskarina.40106: S 892717242:892717242(0) ack 4050091083 win 5792 <mss 1460,sackOK,timestamp 5230 2080736744,nop,wscale 2>
14:56:41.446129 IP eskarina.40106 > master-xen.ssh: . ack 1 win 1460 <nop,nop,timestamp 2080736744 5230>
14:56:41.461316 IP master-xen.ssh > eskarina.40106: P 1:24(23) ack 1 win 1448 <nop,nop,timestamp 5232 2080736744>
14:56:41.461720 IP eskarina.40106 > master-xen.ssh: . ack 24 win 1460 <nop,nop,timestamp 2080736760 5232>
14:56:41.462250 IP eskarina.40106 > master-xen.ssh: P 1:23(22) ack 24 win 1460 <nop,nop,timestamp 2080736760 5232>
14:56:41.462641 IP master-xen.ssh > eskarina.40106: . ack 23 win 1448 <nop,nop,timestamp 5232 2080736760>
14:56:41.463196 IP eskarina.40106 > master-xen.ssh: P 23:663(640) ack 24 win 1460 <nop,nop,timestamp 2080736761 5232>
14:56:41.463988 IP master-xen.ssh > eskarina.40106: . ack 663 win 1768 <nop,nop,timestamp 5232 2080736761>
14:56:41.464835 IP master-xen.ssh > eskarina.40106: P 24:664(640) ack 663 win 1768 <nop,nop,timestamp 5232 2080736761>
14:56:41.465183 IP eskarina.40106 > master-xen.ssh: P 663:687(24) ack 664 win 1780 <nop,nop,timestamp 2080736763 5232>
14:56:41.469605 IP master-xen.ssh > eskarina.40106: P 664:816(152) ack 687 win 1768 <nop,nop,timestamp 5233 2080736763>
14:56:41.485010 IP eskarina.40106 > master-xen.ssh: P 687:831(144) ack 816 win 2100 <nop,nop,timestamp 2080736783 5233>
14:56:41.493626 IP master-xen.ssh > eskarina.40106: P 816:1280(464) ack 831 win 1768 <nop,nop,timestamp 5235 2080736783>
14:56:41.495546 IP eskarina.40106 > master-xen.ssh: P 831:847(16) ack 1280 win 2420 <nop,nop,timestamp 2080736793 5235>
14:56:41.529119 IP master-xen.ssh > eskarina.40106: . ack 847 win 1768 <nop,nop,timestamp 5239 2080736793>
14:56:41.529143 IP eskarina.40106 > master-xen.ssh: P 847:895(48) ack 1280 win 2420 <nop,nop,timestamp 2080736827 5239>
14:56:41.529488 IP master-xen.ssh > eskarina.40106: . ack 895 win 1768 <nop,nop,timestamp 5239 2080736827>
14:56:41.529642 IP master-xen.ssh > eskarina.40106: P 1280:1328(48) ack 895 win 1768 <nop,nop,timestamp 5239 2080736827>
14:56:41.530430 IP eskarina.40106 > master-xen.ssh: P 895:959(64) ack 1328 win 2420 <nop,nop,timestamp 2080736828 5239>
14:56:41.534746 IP master-xen.ssh > eskarina.40106: P 1328:1392(64) ack 959 win 1768 <nop,nop,timestamp 5239 2080736828>
14:56:41.534918 IP eskarina.40106 > master-xen.ssh: P 959:1199(240) ack 1392 win 2420 <nop,nop,timestamp 2080736833 5239>
14:56:41.536297 IP master-xen.ssh > eskarina.40106: P 1392:1584(192) ack 1199 win 2088 <nop,nop,timestamp 5239 2080736833>
14:56:41.538960 IP eskarina.40106 > master-xen.ssh: P 1199:1583(384) ack 1584 win 2420 <nop,nop,timestamp 2080736837 5239>
14:56:41.541838 IP master-xen.ssh > eskarina.40106: P 1584:1616(32) ack 1583 win 2408 <nop,nop,timestamp 5240 2080736837>
14:56:41.542436 IP eskarina.40106 > master-xen.ssh: P 1583:1647(64) ack 1616 win 2420 <nop,nop,timestamp 2080736840 5240>
14:56:41.579115 IP master-xen.ssh > eskarina.40106: . ack 1647 win 2408 <nop,nop,timestamp 5244 2080736840>
14:56:41.580513 IP master-xen.ssh > eskarina.40106: P 1616:1664(48) ack 1647 win 2408 <nop,nop,timestamp 5244 2080736840>
14:56:41.581070 IP eskarina.40106 > master-xen.ssh: P 1647:2287(640) ack 1664 win 2420 <nop,nop,timestamp 2080736879 5244>
14:56:41.581859 IP master-xen.ssh > eskarina.40106: . ack 2287 win 2728 <nop,nop,timestamp 5244 2080736879>
14:56:41.613880 IP master-xen.ssh > eskarina.40106: P 1664:1712(48) ack 2287 win 2728 <nop,nop,timestamp 5247 2080736879>
14:56:41.614052 IP master-xen.ssh > eskarina.40106: P 1712:1840(128) ack 2287 win 2728 <nop,nop,timestamp 5247 2080736879>
14:56:41.653594 IP eskarina.40106 > master-xen.ssh: . ack 1840 win 2740 <nop,nop,timestamp 2080736952 5247>
14:56:41.662633 IP master-xen.ssh > eskarina.40106: P 1840:1904(64) ack 2287 win 2728 <nop,nop,timestamp 5252 2080736952>
14:56:41.662697 IP eskarina.40106 > master-xen.ssh: . ack 1904 win 2740 <nop,nop,timestamp 2080736961 5252>
14:56:41.668714 IP master-xen.ssh > eskarina.40106: P 1904:1984(80) ack 2287 win 2728 <nop,nop,timestamp 5252 2080736961>
14:56:41.668798 IP eskarina.40106 > master-xen.ssh: . ack 1984 win 2740 <nop,nop,timestamp 2080736967 5252>
14:56:41.668836 IP master-xen.ssh > eskarina.40106: P 1984:2048(64) ack 2287 win 2728 <nop,nop,timestamp 5252 2080736961>
14:56:41.668874 IP eskarina.40106 > master-xen.ssh: . ack 2048 win 2740 <nop,nop,timestamp 2080736967 5252>
14:56:41.669064 IP master-xen.ssh > eskarina.40106: P 2048:2096(48) ack 2287 win 2728 <nop,nop,timestamp 5252 2080736961>
14:56:41.669254 IP eskarina.40106 > master-xen.ssh: . ack 2096 win 2740 <nop,nop,timestamp 2080736967 5252>
14:56:41.670240 IP master-xen.ssh > eskarina.40106: P 2096:2160(64) ack 2287 win 2728 <nop,nop,timestamp 5253 2080736967>
14:56:41.671049 IP eskarina.40106 > master-xen.ssh: . ack 2160 win 2740 <nop,nop,timestamp 2080736969 5253>
14:56:43.375106 IP eskarina.40106 > master-xen.ssh: P 2287:2335(48) ack 2160 win 2740 <nop,nop,timestamp 2080738673 5253>
14:56:43.376098 IP master-xen.ssh > eskarina.40106: P 2160:2208(48) ack 2335 win 2728 <nop,nop,timestamp 5423 2080738673>
14:56:43.376119 IP eskarina.40106 > master-xen.ssh: . ack 2208 win 2740 <nop,nop,timestamp 2080738674 5423>
14:56:43.518162 IP eskarina.40106 > master-xen.ssh: P 2335:2383(48) ack 2208 win 2740 <nop,nop,timestamp 2080738816 5423>
14:56:43.518924 IP master-xen.ssh > eskarina.40106: P 2208:2256(48) ack 2383 win 2728 <nop,nop,timestamp 5437 2080738816>
14:56:43.518942 IP eskarina.40106 > master-xen.ssh: . ack 2256 win 2740 <nop,nop,timestamp 2080738817 5437>
14:56:43.599847 IP eskarina.40106 > master-xen.ssh: P 2383:2431(48) ack 2256 win 2740 <nop,nop,timestamp 2080738898 5437>
14:56:43.600591 IP master-xen.ssh > eskarina.40106: P 2256:2304(48) ack 2431 win 2728 <nop,nop,timestamp 5446 2080738898>
14:56:43.600607 IP eskarina.40106 > master-xen.ssh: . ack 2304 win 2740 <nop,nop,timestamp 2080738899 5446>
14:56:43.773975 IP eskarina.40106 > master-xen.ssh: P 2431:2479(48) ack 2304 win 2740 <nop,nop,timestamp 2080739072 5446>
14:56:43.774815 IP master-xen.ssh > eskarina.40106: P 2304:2352(48) ack 2479 win 2728 <nop,nop,timestamp 5463 2080739072>
14:56:43.774834 IP eskarina.40106 > master-xen.ssh: . ack 2352 win 2740 <nop,nop,timestamp 2080739073 5463>
14:56:43.843676 IP eskarina.40106 > master-xen.ssh: P 2479:2527(48) ack 2352 win 2740 <nop,nop,timestamp 2080739142 5463>
14:56:43.844417 IP master-xen.ssh > eskarina.40106: P 2352:2400(48) ack 2527 win 2728 <nop,nop,timestamp 5470 2080739142>
14:56:43.844601 IP eskarina.40106 > master-xen.ssh: . ack 2400 win 2740 <nop,nop,timestamp 2080739143 5470>
14:56:44.200677 IP eskarina.40106 > master-xen.ssh: P 2527:2575(48) ack 2400 win 2740 <nop,nop,timestamp 2080739499 5470>
14:56:44.201595 IP master-xen.ssh > eskarina.40106: P 2400:2448(48) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739499>
14:56:44.201616 IP eskarina.40106 > master-xen.ssh: . ack 2448 win 2740 <nop,nop,timestamp 2080739500 5506>
14:56:44.207227 IP master-xen.ssh > eskarina.40106: . 2448:3896(1448) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739500>
14:56:44.207255 IP eskarina.40106 > master-xen.ssh: . ack 3896 win 3464 <nop,nop,timestamp 2080739506 5506>
14:56:44.208455 IP master-xen.ssh > eskarina.40106: . 3896:5344(1448) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739500>
14:56:44.208475 IP eskarina.40106 > master-xen.ssh: . ack 5344 win 4188 <nop,nop,timestamp 2080739507 5506>
14:56:44.209923 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739506>
14:56:44.211154 IP master-xen.ssh > eskarina.40106: . 6792:8240(1448) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739506>
14:56:44.211174 IP eskarina.40106 > master-xen.ssh: . ack 5344 win 4188 <nop,nop,timestamp 2080739510 5506,nop,nop,sack sack 1 {6792:8240} >
14:56:44.212383 IP master-xen.ssh > eskarina.40106: . 8240:9688(1448) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739507>
14:56:44.213790 IP master-xen.ssh > eskarina.40106: . 9688:11136(1448) ack 2575 win 2728 <nop,nop,timestamp 5506 2080739507>
14:56:44.213798 IP eskarina.40106 > master-xen.ssh: . ack 5344 win 4188 <nop,nop,timestamp 2080739512 5506,nop,nop,sack sack 2 {9688:11136}{6792:8240} >
14:56:44.215022 IP master-xen.ssh > eskarina.40106: . 11136:12584(1448) ack 2575 win 2728 <nop,nop,timestamp 5507 2080739510>
14:56:44.215034 IP eskarina.40106 > master-xen.ssh: . ack 5344 win 4188 <nop,nop,timestamp 2080739513 5506,nop,nop,sack sack 2 {9688:12584}{6792:8240} >
14:56:44.216438 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 5507 2080739512>
14:56:44.420361 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 5528 2080739513>
14:56:44.840346 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 5570 2080739513>
14:56:45.680357 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 5654 2080739513>
14:56:47.360382 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 5822 2080739513>
14:56:50.720720 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 6158 2080739513>
14:56:57.440528 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 6830 2080739513>
14:57:10.880688 IP master-xen.ssh > eskarina.40106: . 5344:6792(1448) ack 2575 win 2728 <nop,nop,timestamp 8174 2080739513>
On 7 Jul 2005, at 14:17, Gerd Knorr wrote:

> Maybe related: I also see tcp connection stalls. Just booting domain0
> is enough for that, whereas I see the slab corruption stuff only
> after trying to boot some domU.

Does it happen before starting xend?

 -- Keir
On Thu, Jul 07, 2005 at 03:05:04PM +0100, Keir Fraser wrote:
>
> On 7 Jul 2005, at 14:17, Gerd Knorr wrote:
>
> > Maybe related: I also see tcp connection stalls. Just booting domain0
> > is enough for that, whereas I see the slab corruption stuff only
> > after trying to boot some domU.
>
> Does it happen before starting xend?

Yes. It's not the usual xend network setup bug ;)

The tcpdump was taken on eskarina (not the xen host), so the
packets really go out to the wire ...

  Gerd

-- 
panic("it works"); /* avoid being flooded with debug messages */
Gerd Knorr <kraxel@suse.de> writes:

> The tcpdump was taken on eskarina (not the xen host), so the
> packets really go out to the wire ...

A more detailed tcpdump of another connection stall:

16:32:44.311412 IP master-xen.ssh > eskarina.49606: P 2256:2304(48) ack 2335 win 2728 <nop,nop,timestamp 4294945237 2086500670>
    0x0000: 4510 0064 71b9 4000 4006 3798 952c b36c E..dq.@.@.7..,.l
    0x0010: 952c b36d 0016 c1c6 8b94 e24a 5c6d 220a .,.m.......J\m".
    0x0020: 8018 0aa8 7028 0000 0101 080a ffff a9d5 ....p(..........
    0x0030: 7c5d 793e 8889 411e e090 5982 d79e fc6b |]y>..A...Y....k
    0x0040: d9c6 3535 b109 5a9b e38f 3a3a e695 29af ..55..Z...::..).
    0x0050: 81db 9dc7 a1a5 56c1 5586 1f3d ab76 1fde ......V.U..=.v..
    0x0060: 46d5 5d6f                               F.]o
16:32:44.311416 IP eskarina.49606 > master-xen.ssh: . ack 2304 win 2420 <nop,nop,timestamp 2086500691 4294945237>
    0x0000: 4510 0034 68ba 4000 4006 40c7 952c b36d E..4h.@.@.@..,.m
    0x0010: 952c b36c c1c6 0016 5c6d 220a 8b94 e27a .,.l....\m"....z
    0x0020: 8010 0974 8e2c 0000 0101 080a 7c5d 7953 ...t.,......|]yS
    0x0030: ffff a9d5                               ....
16:32:44.311568 IP master-xen.ssh > eskarina.49606: P 2304:2384(80) ack 2335 win 2728 <nop,nop,timestamp 4294945237 2086500670>
    0x0000: 4510 0084 71bb 4000 4006 3776 952c b36c E...q.@.@.7v.,.l
    0x0010: 952c b36d 0016 c1c6 8b94 e27a 5c6d 220a .,.m.......z\m".
    0x0020: 8018 0aa8 0ee4 0000 0101 080a ffff a9d5 ................
    0x0030: 7c5d 793e b9cb e472 6e18 cf4c f551 af4d |]y>...rn..L.Q.M
    0x0040: 0131 a5c8 178e 610f 08ce 253b 084f b7cc .1....a...%;.O..
    0x0050: d7bb a84a 4865 322d 3634 ce9c 5026 2ad6 ...JHe2-64..P&*.
    0x0060: b41b 830b 0000 0000 0000 0000 0000 0000 ................
    0x0070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0x0080: 0000 0000                               ....
16:32:44.311866 IP master-xen.ssh > eskarina.49606: P 2384:2448(64) ack 2335 win 2728 <nop,nop,timestamp 4294945237 2086500670>
    0x0000: 4510 0074 71bd 4000 4006 3784 952c b36c E..tq.@.@.7..,.l
    0x0010: 952c b36d 0016 c1c6 8b94 e2ca 5c6d 220a .,.m........\m".
    0x0020: 8018 0aa8 045e 0000 0101 080a ffff a9d5 .....^..........
    0x0030: 7c5d 793e 2134 146a 6adb 9a0e abc3 fc9d |]y>!4.jj.......
    0x0040: 1350 7f6f eac9 b4d4 5059 6cf4 dcdd 085f .P.o....PYl...._
    0x0050: a943 68d7 9f60 b799 ddbf 0eee 6a8c 850d .Ch..`......j...
    0x0060: 4294 4541 19c3 eddf 7fe5 afe5 6573 5c8f B.EA........es\.
    0x0070: 62f2 a6ae                               b...
16:32:44.311875 IP eskarina.49606 > master-xen.ssh: . ack 2304 win 2420 <nop,nop,timestamp 2086500692 4294945237,nop,nop,sack sack 1 {2384:2448} >
    0x0000: 4510 0040 68bc 4000 4006 40b9 952c b36d E..@h.@.@.@..,.m
    0x0010: 952c b36c c1c6 0016 5c6d 220a 8b94 e27a .,.l....\m"....z
    0x0020: b010 0974 7b15 0000 0101 080a 7c5d 7954 ...t{.......|]yT
    0x0030: ffff a9d5 0101 050a 8b94 e2ca 8b94 e30a ................

The 2304:2384 payload packet is broken and causes the stalls: the end
of the packet is filled with zeros (offsets 0x64-0x83, i.e. the last 32
of its 80 payload bytes). master-xen tries to resend the broken packet
over and over again, but that doesn't work of course ...

  Gerd

-- 
panic("it works"); /* avoid being flooded with debug messages */
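Why a zeroed tail would make the receiver drop the segment can be sketched with the 16-bit ones-complement Internet checksum from RFC 1071. This is only an illustration under assumptions: it sums a bare byte buffer and leaves out the TCP pseudo-header, the helper names inet_checksum and payload_intact are made up for this note, and it presumes the checksum was computed over the intended payload before the tail got overwritten, which the thread does not establish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement Internet checksum (RFC 1071) over a byte buffer.
 * Intended for packet-sized buffers; the 32-bit accumulator is ample there. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add up big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Hypothetical helper: 1 if the buffer still matches a checksum taken earlier. */
int payload_intact(const uint8_t *data, size_t len, uint16_t expected)
{
    return inet_checksum(data, len) == expected;
}
```

If a checksum is taken over a payload and part of that payload is then zeroed, payload_intact() reports a mismatch, so a receiver doing the equivalent check would discard every retransmission of the same damaged bytes.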
Gerd Knorr <kraxel@suse.de> writes:

> With the small debug patch below and CONFIG_DEBUG_SLAB=y I get
> plenty of these messages in the dom0 kernel log as soon as I
> start a domU:
>
>   Slab corruption: start=dc423000, len=4096
>   Slab name: xen-skb
>   000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

After digging in the netback code I'm not sure any more that this is
a real bug. It could also be that the slab debugging can't deal with
the page mapping tricks the netback driver does when sending/receiving
packets. The start address mentioned refers to a page allocated
via alloc_mfn() in netback.c ...

  Gerd

-- 
panic("it works"); /* avoid being flooded with debug messages */
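How a poison check could misfire in this situation can be shown with a toy model. The sketch below is an assumption-laden illustration, not the kernel's check_poison_obj(): the function names are hypothetical, and the two byte values mirror what I understand the kernel's POISON_FREE (0x6b) and POISON_END (0xa5) to be. A freed object is filled with the pattern, and a later check reports the first byte that no longer carries it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON_FREE 0x6b   /* assumed fill byte for freed objects */
#define TOY_POISON_END  0xa5   /* assumed marker in the final byte */

/* Fill a freed object with the poison pattern. */
void toy_poison(unsigned char *obj, size_t size)
{
    assert(obj != NULL && size > 0);
    memset(obj, TOY_POISON_FREE, size - 1);
    obj[size - 1] = TOY_POISON_END;
}

/* Return the offset of the first byte that lost its poison, or -1 if
 * the whole object is still intact. */
long toy_check_poison(const unsigned char *obj, size_t size)
{
    size_t i;

    assert(obj != NULL && size > 0);
    for (i = 0; i < size; i++) {
        unsigned char want =
            (i == size - 1) ? TOY_POISON_END : TOY_POISON_FREE;
        if (obj[i] != want)
            return (long)i;
    }
    return -1;
}
```

In this model, if the memory behind a freed 4096-byte object is exchanged for a zero-filled page, the check fails at offset 0 and every dumped byte is 00, which resembles the "i=0" all-zero reports earlier in the thread. No stray write by any driver is needed to produce that picture.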
Nivedita Singhvi
2005-Jul-07 16:06 UTC
Re: [Xen-devel] bug: slab corruption (net backend?)
Gerd Knorr wrote:
> Hi,
>
> Maybe related: I also see tcp connection stalls. Just booting domain0
> is enough for that, whereas I see the slab corruption stuff only
> after trying to boot some domU.
>
> As the tcpdump below shows the xen machine sends the same packet over
> and over again. I'd guess some kind of memory corruption which kills
> the packet checksum and makes eskarina drop the packet. Maybe just
> use-after-free as the slabdebug stuff will fill released memory blocks
> with some pattern.

Gerd,

Can you disable tcp checksum offload (ethtool -K tx off) and see
if you can reproduce this problem?

thanks,
Nivedita

> [ ... ]

-- 
---
Nivedita Singhvi (nivedita@us.ibm.com Lotus)
Niv Singhvi (niv@us.ibm.com IMAP)
(503) 578-4580 T/L 775-4580
> >As the tcpdump below shows the xen machine sends the same packet over
> >and over again. I'd guess some kind of memory corruption which kills
> >the packet checksum and makes eskarina drop the packet. Maybe just
> >use-after-free as the slabdebug stuff will fill released memory blocks
> >with some pattern.
>
> Can you disable tcp checksum offload (ethtool -K tx off) and see
> if you can reproduce this problem?

Never seen that again since I've turned off CONFIG_DEBUG_SLAB,
which seems to be incompatible with the page mapping the xen
backend drivers do.

  Gerd

--
panic("it works"); /* avoid being flooded with debug messages */
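To illustrate the use-after-free theory quoted above: slab debugging fills a freed object with a poison byte and later verifies that the pattern is still intact, so a "Slab corruption" report means something wrote to the object after it was released. The user-space C sketch below mimics that check. The function names are hypothetical, and the byte values 0x6b and 0xa5 are the Linux slab poison constants as best I recall them, so treat both as assumptions rather than as the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed poison values, modelled on the Linux slab debug constants. */
#define POISON_FREE 0x6b  /* fill byte for a freed object */
#define POISON_END  0xa5  /* marker stored in the last byte */

/* Hypothetical helper: fill a freed object with the poison pattern. */
static void poison_obj(unsigned char *obj, size_t len)
{
    assert(len > 0);
    memset(obj, POISON_FREE, len);
    obj[len - 1] = POISON_END;
}

/*
 * Hypothetical helper: return the offset of the first byte that no
 * longer matches the poison pattern, or -1 if the object is intact.
 */
static long first_corrupt_byte(const unsigned char *obj, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char expect = (i == len - 1) ? POISON_END : POISON_FREE;
        if (obj[i] != expect)
            return (long)i;
    }
    return -1;
}
```

Under these assumptions, an object that was zeroed after being freed (as in the all-zero dump from the original report) would fail the check at offset 0.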
On 11 Jul 2005, at 14:25, Gerd Knorr wrote:

>> Can you disable tcp checksum offload (ethtool -K tx off) and see
>> if you can reproduce this problem?
>
> Never seen that again since I've turned off CONFIG_DEBUG_SLAB,
> which seems to be incompatible with the page mapping the xen
> backend drivers do.

Well, I'm not 100% sure of this. It looks like guard bytes aren't
placed around slab allocations when the slab object size is near a
power of two. But if guard headers/footers are added to our skbuff
data objects then yes: that will be broken for xen netback.

  -- Keir
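A rough C sketch of the two points Keir raises, under stated assumptions. The size rule below (always guard objects smaller than a page; guard larger ones only if three extra words keep the size in the same power-of-two bucket) is my reading of "near a power of two" and is not confirmed by this thread. The helper names, the 4096-byte page size and the one-word header guard are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WORD_BYTES sizeof(void *)  /* assumed size of one guard word */
#define PAGE_BYTES 4096u           /* assumed page size */

/* 1-based position of the highest set bit; 0 when x == 0. */
static int highest_bit(size_t x)
{
    int n = 0;
    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Assumed rule: small objects always get guard words; large objects
 * get them only if three extra words do not push the object into the
 * next power-of-two size bucket.
 */
static bool gets_guard_words(size_t size)
{
    if (size < PAGE_BYTES)
        return true;
    return highest_bit(size - 1) == highest_bit(size - 1 + 3 * WORD_BYTES);
}

/*
 * A header guard word shifts the payload start by one word, so an
 * object placed at a page-aligned slab offset stops being page-aligned.
 */
static bool payload_page_aligned(size_t slab_offset, bool guarded)
{
    assert(slab_offset % WORD_BYTES == 0);
    size_t start = slab_offset + (guarded ? WORD_BYTES : 0);
    return start % PAGE_BYTES == 0;
}
```

With this rule a 4096-byte object, such as the ones in the xen-skb cache from the original report, gets no guard words and stays page-aligned. If a guard header were added, the payload would start one word into the page, which is the case Keir expects to break netback.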
Not sure whether this is related or not, but interdomain NFS stops
working with the latest xen-unstable. I haven't taken a closer look
yet, though. The last time the same thing happened was when checksum
offloading was added.

- Bin

On 7/11/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> On 11 Jul 2005, at 14:25, Gerd Knorr wrote:
>
> >> Can you disable tcp checksum offload (ethtool -K tx off) and see
> >> if you can reproduce this problem?
> >
> > Never seen that again since I've turned off CONFIG_DEBUG_SLAB,
> > which seems to be incompatible with the page mapping the xen
> > backend drivers do.
>
> Well, I'm not 100% sure of this. It looks like guard bytes aren't
> placed around slab allocations when slab object size is near a power of
> two. But if guard header/footers are added to our skbuff data objects
> then yes: that will be broken for xen netback.
>
> -- Keir