I was running an I/O performance benchmark using iozone3.
In my build, dom0 resides on a small USB stick and all the storage comes from an NFS mount.
I tested NFS performance on both dom0 and domU, mounting from the same server.

The dom0 test works fine, but the domU run suffers from an unstable NFS mount.
Since this is an NFS root, the domU simply appears to freeze.

The logs from both ends of the NFS mount show that the connection is broken.
Note that the client timestamps are about 20 seconds ahead of the server's.

From the domU (client end):
Jan 4 23:31:16 debvm kernel: [ 371.008142] nfs: server 192.168.1.8 not responding, still trying //(once)
Jan 4 23:31:25 debvm kernel: [ 379.928142] nfs: server 192.168.1.8 not responding, still trying //(28 times within the same second)
Jan 4 23:31:26 debvm kernel: [ 381.396143] nfs: server 192.168.1.8 not responding, still trying //(once)
Jan 4 23:31:44 debvm kernel: [ 399.452129] nfs: server 192.168.1.8 not responding, still trying //(14 times within the same second)
Jan 4 23:31:45 debvm kernel: [ 399.524210] nfs: server 192.168.1.8 not responding, still trying //(15 times within the same second)
Jan 4 23:31:46 debvm kernel: [ 400.964142] nfs: server 192.168.1.8 not responding, still trying //(once)
Jan 4 23:31:55 debvm kernel: [ 410.468787] nfs: server 192.168.1.8 OK //(25 times within the same second)
Jan 4 23:31:56 debvm kernel: [ 410.520202] nfs: server 192.168.1.8 OK //(32 times within the same second)
Jan 4 23:32:05 debvm kernel: [ 420.208141] nfs: server 192.168.1.8 not responding, still trying //(21 times within the same second)
Jan 4 23:32:09 debvm kernel: [ 424.367613] nfs: server 192.168.1.8 OK //(25 times within the same second)
Jan 4 23:32:11 debvm kernel: [ 425.764143] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:32:11 debvm kernel: [ 425.772031] nfs: server 192.168.1.8 OK
Jan 4 23:32:11 debvm kernel: [ 426.466328] nfs: server 192.168.1.8 OK
Jan 4 23:33:32 debvm kernel: [ 507.136150] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:34:20 debvm kernel: [ 555.170556] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:37:28 debvm kernel: [ 742.616155] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:39:39 debvm kernel: [ 873.880200] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:40:15 debvm kernel: [ 909.987313] nfs: server 192.168.1.8 OK //(91 times within the same second)
Jan 4 23:40:27 debvm kernel: [ 921.776152] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:40:34 debvm kernel: [ 929.314639] nfs: server 192.168.1.8 OK
Jan 4 23:42:05 debvm kernel: [ 1019.584149] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:42:13 debvm kernel: [ 1028.504158] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:42:53 debvm kernel: [ 1067.565487] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:44:28 debvm kernel: [ 1163.368977] nfs: server 192.168.1.8 OK
Jan 4 23:44:33 debvm kernel: [ 1168.337859] nfs: server 192.168.1.8 OK
Jan 4 23:45:41 debvm kernel: [ 1236.448135] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:49:37 debvm kernel: [ 1471.960302] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:51:00 debvm kernel: [ 1554.982479] nfs: server 192.168.1.8 OK

From the server side:
Jan 4 23:31:33 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:31:33 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:39:50 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:39:50 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:39:50 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:40:10 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:44:01 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:44:01 Hasim kernel: net_ratelimit: 11 callbacks suppressed
Jan 4 23:44:01 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:50:38 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:50:38 Hasim kernel: nfsd: peername failed (err 107)!

Any suggestions on how to debug this issue?
My Xen version is 4.2.1, the domU kernel is 3.6.9, and the domU is PVHVM.

Thanks,
Timothy
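For anyone reading the server-side messages: the numbers in "got error -104" and "peername failed (err 107)" are Linux errno values, which the kernel log prints in negated form in the first case. Below is a minimal sketch for decoding them, assuming the server uses the standard Linux errno numbering. The table is hardcoded for illustration (it is not read from the server), and the function name describe_kernel_error is hypothetical.

```python
# Assumption: standard Linux errno numbering (asm-generic/errno.h).
# Other operating systems assign different numbers to these errors.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}


def describe_kernel_error(code: int) -> str:
    """Return a readable description for an errno value from a kernel log.

    Kernel messages often print the negated value (e.g. -104), so the
    sign is dropped before the lookup.
    """
    number = abs(code)
    name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this table"))
    return f"{number} {name}: {text}"


if __name__ == "__main__":
    # The two codes that appear in the server log above.
    for code in (-104, 107):
        print(describe_kernel_error(code))
```

Read this way, the server log says that nfsd saw the TCP connection reset while sending a reply, and that the socket was no longer connected when it then asked for the peer address. That seems consistent with the connection being torn down from the client direction, but it does not by itself say which component dropped it.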
Forward this to the devel list.

---------- Forwarded message ----------
From: G.R. <firemeteor@users.sourceforge.net>
Date: Sat, Jan 5, 2013 at 1:12 AM
Subject: Unstable NFS mount at heavy load.
To: xen-users@lists.xen.org
Nobody has responded...

Stefano, could you point me to the PVNET owner?
I suspect this has something to do with the net emulation.

Thanks,
Timothy

On Sat, Jan 5, 2013 at 1:12 PM, G.R. <firemeteor@users.sourceforge.net> wrote:
> Forward this to the devel list.
Do you mean the maintainer of the Linux PV network frontend and backend drivers (netfront and netback)?
That would be Konrad.

On Tue, 8 Jan 2013, G.R. wrote:
> Nobody responses...
>
> Stefano, could you point me to the PVNET owner?
> I suspect this has something to do with the net emulation.
>
> Thanks,
> Timothy
Hi Konrad,
Do you have any suggestions on how to troubleshoot the NFS mount issue described in this thread?
The broken connection looks quite suspicious to me.

Thanks,
Timothy

On Wed, Jan 9, 2013 at 1:15 AM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> Do you mean the maintainer of the Linux PV network frontend and backend
> drivers (netfront and netback)?
> That would be Konrad.
Hi Konrad, do you have any suggestions on how to debug this?

Thanks,
Timothy

On Wed, Jan 9, 2013 at 4:47 PM, G.R. <firemeteor@users.sourceforge.net> wrote:
> Hi Konrad,
> Do you have any suggestion how to troubleshooting the NFS mount issue
> as described below?
> The broken connection is quite suspicious to me.
On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
> Hi Konrad, do you have any suggestion how to debug?

Is your dom0 32-bit or 64-bit? And what kind of network card are you
using for the NFS traffic?
On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
>> Hi Konrad, do you have any suggestion how to debug?
>
> Is your dom0 32-bit or 64-bit? And what kind of network card are you
> using for the NFS traffic?
>

I have both 64-bit dom0 && domU.
The physical card I have is RTL8111/8168B (rev 06) (10ec:8168).
And the virtual card I used is e1000, but I guess this is not
important since I've seen this in the log:
Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
platform PCI driver have been compiled for this kernel: unplug
emulated NICs.

I'm thinking of dumping the traffic to check when I get spare time.
Do you think this is a good idea or do you have other suggestions?

Thanks,
Timothy

PS: I'm on xen testing 4.2.1. The dom0 is a debian 3.6.6 kernel. The
domU is a 3.6.9 kernel built from the debian source package.
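For the traffic dump mentioned above, the capture could be restricted to the NFS conversation so that the file stays manageable under iozone load. A minimal sketch that only builds the tcpdump command line; it assumes tcpdump is installed, and the interface name eth0 and the output file nfs.pcap are hypothetical placeholders. 192.168.1.8 is the server address from the logs and 2049 is the standard NFS port:

```python
import shlex

NFS_PORT = 2049  # standard NFS port; adjust if the export uses another one


def tcpdump_command(server, iface="eth0", outfile="nfs.pcap"):
    """Build a tcpdump argument list that records NFS traffic to/from server.

    iface and outfile are placeholder defaults, not values from the thread.
    """
    return [
        "tcpdump",
        "-i", iface,    # capture interface
        "-s", "0",      # do not truncate packets
        "-w", outfile,  # write raw packets to a pcap file
        "host", server, "and", "port", str(NFS_PORT),
    ]


if __name__ == "__main__":
    # Print the command instead of running it; capturing needs root.
    print(shlex.join(tcpdump_command("192.168.1.8")))
```

Running the same command on the server and inside the domU at the same time makes it possible to check whether both ends see the same packets.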
On Mon, Jan 21, 2013 at 12:01:43AM +0800, G.R. wrote:
> I have both 64-bit dom0 && domU.
> The physical card I have is RTL8111/8168B (rev 06) (10ec:8168).
> And the virtual card I used is e1000, but I guess this is not
> important since I've seen this in the log:
> Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
> platform PCI driver have been compiled for this kernel: unplug
> emulated NICs.
>
> I'm thinking of dumping the traffic to check when I get spare time.
> Do you think this is a good idea or do you have other suggestions?

Well, the thread on "Fatal crash on xen4.2 HVM + qemu-xen dm + NFS"
seems to imply that this is a problem with NFS tcp-retransmit.

And I've seen similar issues as well - but only on skge, tg3, and
r8169 - but only when using the 32-bit domain0.
I don't know if the issue I am hitting is the same thing.
On Wed, Jan 23, 2013 at 4:29 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> Well, the thread on "Fatal crash on xen4.2 HVM + qemu-xen dm + NFS"
> seems to imply that this is a problem with NFS tcp-retransmit.
>
> And I've seen similar issues as well - but only on skge, tg3, and
> r8169 - but only when using the 32-bit domain0.
> I don't know if the issue I am hitting is the same thing.
>

I checked the thread and unfortunately did not find anything conclusive.
In my case, my dom0 seems to work fine and even the domU is still alive
-- everything is back in order after the mount recovers (typically within
a couple of minutes).

According to the traffic I captured, the server is kind of busy and keeps
sending ZeroWindow for a while, and the client in the domU resets the
connection after retrying 6 times within 15 seconds.
I'm not sure whether this is correct client behavior when the server is
misbehaving.
But why does this only happen with the domU client?

Please find the traffic log in the attached file.
I've captured the traffic on both the server and the domU, and there
appears to be no mismatch between the two captures.

Thanks,
Timothy

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
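To pull the ZeroWindow advertisements and the resets out of a capture like the one described above, Wireshark's command-line tool can filter on its TCP analysis flags. A minimal sketch that only builds the tshark command line; it assumes a tshark release that accepts -Y for display filters (older releases used -R), and the capture file name nfs.pcap is a hypothetical placeholder:

```python
import shlex

# Wireshark display filter: packets flagged as zero-window by the TCP
# analysis, or packets with the RST flag set.
ZERO_WINDOW_OR_RST = "tcp.analysis.zero_window || tcp.flags.reset == 1"


def tshark_command(pcap):
    """Build a tshark argument list that lists zero-window and RST packets."""
    return ["tshark", "-r", pcap, "-Y", ZERO_WINDOW_OR_RST]


if __name__ == "__main__":
    # Print the command; running it requires tshark and an existing capture.
    print(shlex.join(tshark_command("nfs.pcap")))
```

The timestamps of the filtered packets show how long the zero-window period lasts before the reset, which can be compared with the 15 seconds observed here.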
> I checked the thread and unfortunately did not find anything conclusive.
> In my case, my dom0 seems to work fine and even the domU is still alive
> -- everything is back in order after the mount recovers (typically within
> a couple of minutes).
>
> According to the traffic I captured, the server is kind of busy and keeps
> sending ZeroWindow for a while, and the client in the domU resets the
> connection after retrying 6 times within 15 seconds.
> I'm not sure whether this is correct client behavior when the server is
> misbehaving.
> But why does this only happen with the domU client?

Well, I have to apologize for this thread.
After some more experiments, I found that the symptom is not specific to
domU. Both dom0 and a non-Xen system suffer from this issue, so this must
be a server-side fault and has nothing to do with Xen.

This may have something to do with my unusual setup (ext4 on a loop image,
mounted on the server and exported through NFS).
In any case, the symptom does not seem to be fixed in a recent kernel
(3.6.11 tried).

Thanks,
Timothy
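When comparing clients (domU, dom0, a non-Xen machine), it helps to reduce each client's kernel log to stall windows, measured from the first "not responding" message to the next "OK". A minimal sketch, assuming the log format shown in the original post; the function name and regex are illustrative, and SAMPLE is a four-line excerpt from that log:

```python
import re

# Captures the kernel uptime stamp, the server address and the state.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+nfs: server (\S+) (not responding|OK)")

SAMPLE = [
    "Jan 4 23:31:16 debvm kernel: [ 371.008142] nfs: server 192.168.1.8 not responding, still trying",
    "Jan 4 23:31:55 debvm kernel: [ 410.468787] nfs: server 192.168.1.8 OK",
    "Jan 4 23:32:05 debvm kernel: [ 420.208141] nfs: server 192.168.1.8 not responding, still trying",
    "Jan 4 23:32:09 debvm kernel: [ 424.367613] nfs: server 192.168.1.8 OK",
]


def stall_windows(lines):
    """Yield (start, end, duration) in seconds of kernel uptime per stall."""
    start = None
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an NFS state message
        t, state = float(m.group(1)), m.group(3)
        if state == "not responding":
            if start is None:  # repeated warnings extend the same stall
                start = t
        elif start is not None:  # first "OK" after a warning ends the stall
            yield start, t, t - start
            start = None


if __name__ == "__main__":
    for start, end, duration in stall_windows(SAMPLE):
        print(f"{start:10.3f} -> {end:10.3f}  stalled {duration:.1f}s")
```

Feeding it the logs of the different clients gives directly comparable stall lengths, which is one way to confirm that the problem follows the server rather than the client.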