Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 06:37 UTC
[Xen-users] Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Dear All,

I have created a virtual high performance computing (HPC) cluster of 6 compute nodes with MPICH2, using Xen-based Fedora 11 Linux 64-bit paravirtualized (PV) domU guests. Dom0 is Fedora 11 Linux 64-bit. My Intel Desktop Board DQ45CB has a single onboard Gigabit LAN network adapter.

I am able to bring up the ring of mpd on the set of 6 compute nodes. However, I am consistently encountering the "(mpiexec 392): no msg recvd from mpd when expecting ack of request" error.

After much troubleshooting, I have found that there are Receive Errors (RX-ERR) on the virtual network interface eth0 of all six compute nodes. All 6 compute nodes are identical F11 Linux 64-bit PV virtual machines.

Here is my PV guest configuration for node 1:

[enming@fedora11-x86-64-host xen]$ cat enming-f11-pv-hpc-node0001
name="enming-f11-pv-hpc-node0001"
memory=512
disk = ['phy:/dev/virtualmachines/f11-pv-hpc-node0001,xvda,w' ]
vif = [ 'mac=00:16:3E:69:E9:11,bridge=eth0' ]
vfb = [ 'vnc=1,vncunused=1,vncdisplay=0,vnclisten=127.0.0.1,vncpasswd=' ]
vncconsole=1
bootloader = "/usr/bin/pygrub"
#kernel = "/home/enming/fedora11/vmlinuz"
#ramdisk = "/home/enming/fedora11/initrd.img"
vcpus=2
on_reboot = 'restart'
on_crash = 'restart'

Will there be any problems with Xen networking for MPICH2 applications? Or is it just a fine-tuning exercise for Xen networking? I am using PV guests because PV guests have much higher performance than HVM guests.

Here are my mpich-discuss mailing list threads:

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005883.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005887.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005889.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005890.html
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005891.html

Please advise on the RX-ERR.

Thank you very much.

-- 
Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering)
Alma Maters:
(1) Singapore Polytechnic
(2) National University of Singapore
My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com
My Youtube videos: http://www.youtube.com/user/enmingteo
Email: space.time.universe@gmail.com
MSN: teoenming@hotmail.com
Mobile Phone (SingTel): +65-9648-9798
Mobile Phone (Starhub Prepaid): +65-8369-2618
Age: 31 (as at 30 Oct 2009)
Height: 1.78 meters
Race: Chinese
Dialect: Hokkien
Street: Bedok Reservoir Road
Country: Singapore
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
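The RX-ERR figure mentioned above is the receive-error column that `netstat -i` prints; on Linux the same counter is exposed per interface in /proc/net/dev. The following is a minimal sketch, not taken from the thread, of reading that counter inside a guest. The function names and the sample text are illustrative assumptions, and the sample numbers are made up.

```python
# Sketch only: helper names and SAMPLE values are hypothetical, not from the thread.

def parse_rx_errors(proc_net_dev_text):
    """Return {interface: receive error count} from /proc/net/dev-style text."""
    errors = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, counters = line.partition(":")
        fields = counters.split()
        # Receive columns are: bytes packets errs drop fifo frame compressed multicast
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        errors[name.strip()] = int(fields[2])
    return errors


def read_rx_errors(path="/proc/net/dev"):
    """Read the live counters on a Linux guest."""
    with open(path) as handle:
        return parse_rx_errors(handle.read())


# Made-up sample in the /proc/net/dev layout, used only to exercise the parser.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  250000    2000    3    0    0     0          0         0   180000    1500    0    0    0     0       0          0
"""
```

Calling `read_rx_errors()` before and after an mpiexec attempt would show whether the eth0 counter grows during the failure.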
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 06:53 UTC
[Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Dear All,

Here are more virtual network interface eth0 kernel messages. Notice the "net eth0: rx->offset: 0" messages. Are they of significance?

*Node 1*

Oct 30 22:40:34 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.253:1009 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:40:56 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.252:877 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:41:19 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.251:1000 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:41:41 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.250:882 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:42:04 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated mount request from 192.168.1.249:953 for /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin)
Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd starting; no mpdid yet
Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd has mpdid=enming-f11-pv-hpc-node0001_48545 (port=48545)
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:40 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: __ratelimit: 12 callbacks suppressed
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295

*Node 6*

Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd starting; no mpdid yet
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd has mpdid=enming-f11-pv-hpc-node0006_52805 (port=52805)
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295

*Node 1 NFS Server Configuration*

[root@enming-f11-pv-hpc-node0001 ~]# cat /etc/exports
/home/enming/mpich2-install/bin 192.168.1.0/24(ro)

*Node 2 /etc/fstab Configuration Entry for NFS Client*

192.168.1.254:/home/enming/mpich2-install/bin /home/enming/mpich2-install/bin nfs rsize=8192,wsize=8192,timeo=14,intr
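One observation about the "size: 4294967295" value in the kernel messages above: it equals 2^32 - 1, which is what -1 looks like when a 32-bit value is printed as unsigned. Whether the driver is in fact reporting a negative status here is an assumption; the thread does not confirm it. The sketch below shows the arithmetic and counts the warnings per node from syslog text. The function names and the regular expression are illustrative, not part of any Xen tool.

```python
import re
from collections import Counter

# Sketch only: pattern and helper names are hypothetical.
# Matches lines such as:
#   "... enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295"
RX_WARNING = re.compile(
    r"(?P<host>\S+) kernel: net (?P<iface>\S+): "
    r"rx->offset: (?P<offset>\w+), size: (?P<size>\d+)"
)


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def count_rx_warnings(log_text):
    """Count rx->offset warnings per (host, interface) pair."""
    counts = Counter()
    for match in RX_WARNING.finditer(log_text):
        counts[(match.group("host"), match.group("iface"))] += 1
    return counts


# Three lines copied from the log excerpt in this message.
SAMPLE_LOG = """Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: 0, size: 4294967295
Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd starting; no mpdid yet
"""
```

Note that the "__ratelimit: 12 callbacks suppressed" line means the kernel dropped some of these messages, so any count taken from the log is a lower bound.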
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 07:53 UTC
[Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Hi,

I have reverted to the 2-node troubleshooting scenario. I have started node 1 and node 2.

On node 1, I will try to bring up the ring of mpd for the 2 nodes using mpdboot and try to execute mpiexec. On node 2, I will capture the tcpdump messages on virtual network interface eth0.

Please see attached PNG screenshots. They are numbered in sequence.

Please check if there are any problems.

Thank you.
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 08:12 UTC
[Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Dear All,

I have googled something which may help to solve my problem.

*[Xen-devel] Network drop on domU (netfront: rx->offset: 0, size: 4294967295)*
http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01274.html

Virtualization Tip: Always disable checksumming on virtual ethernet devices
http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices

Let me try to work on it first.
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 08:35 UTC
[Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Dear All,

I have solved the problem. With reference to

http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01327.html

and

http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices

I executed the following command as root on all 6 of my compute nodes (each compute node is an F11 Linux 64-bit PV virtual machine):

# ethtool -K eth0 tx off gso on

Now I can successfully run mpiexec to execute MPI and non-MPI jobs on my virtual HPC compute cluster.

--
Mr. Teo En Ming (Zhang Enming)
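For anyone repeating the fix on several guests, here is a minimal sketch of how the same command could be planned for all six nodes from one machine. It is only an illustration, not what was actually run in this thread: it assumes root SSH access to each domU, and it takes the node addresses 192.168.1.254 down to 192.168.1.249 from the NFS configuration and mountd messages posted earlier. The helper names `ssh_command` and `plan_fix` are hypothetical. The script only prints the commands; it does not execute anything.

```python
import shlex

# Addresses seen earlier in this thread: 192.168.1.254 is the NFS server
# (node 1); .253 down to .249 are the clients that mounted from it.
NODE_ADDRESSES = [f"192.168.1.{last}" for last in range(254, 248, -1)]

# The exact command reported in the message above, split into arguments.
ETHTOOL_FIX = ["ethtool", "-K", "eth0", "tx", "off", "gso", "on"]


def ssh_command(address, remote_argv, user="root"):
    """Build an ssh argument list that would run remote_argv on one node.

    Root SSH access to the guests is an assumption of this sketch.
    """
    return ["ssh", f"{user}@{address}", shlex.join(remote_argv)]


def plan_fix(addresses=NODE_ADDRESSES):
    """Return one ssh command per node. Nothing is executed here."""
    return [ssh_command(address, ETHTOOL_FIX) for address in addresses]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for argv in plan_fix():
        print(shlex.join(argv))
```

Settings applied with `ethtool -K` are runtime settings and are not kept across a reboot of the guest, so the command would have to be reapplied after each restart unless it is added to the guest's own network start-up configuration.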
Boris Derzhavets
2009-Oct-30 08:41 UTC
Re: [Xen-users] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
What kind of tcpdump reports, obtained on Dom0 or on some other box on the LAN, brought you to this idea?

Wrong checksum offloading at the DomU front-end network driver does happen (in my experience with the RTL PCI Gigabit Ethernet 8110SC/8169 on SNV and OSOL; the RTL PCI-E Ethernet 8111SC, however, works fine), but not necessarily.

> Virtualization Tip: Always disable checksumming on virtual ethernet devices

Why always?

Boris.
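As background to the checksum question, the following is a simplified sketch of the 16-bit ones'-complement Internet checksum described in RFC 1071, which is the kind of checksum that "tx checksumming" hands off to the device. It is an illustration only: the real TCP and UDP checksums also cover a pseudo-header, which is left out here, and the function names are hypothetical. The sample bytes are the worked example from RFC 1071, not data captured from this cluster.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones'-complement checksum described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verifies(data_with_checksum: bytes) -> bool:
    """Receiver-side check: data plus a correct checksum sums to zero."""
    return internet_checksum(data_with_checksum) == 0


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
    checksum = internet_checksum(payload)
    print(hex(checksum))
    # A packet whose checksum was filled in passes the check.
    print(verifies(payload + checksum.to_bytes(2, "big")))
    # A packet whose checksum field was left for an offload step that
    # never happened (modelled here as zero bytes) fails the check.
    print(verifies(payload + b"\x00\x00"))
```

If the sending side leaves the checksum to be filled in later and no later stage does it, the receiver sees a packet that fails this check and drops it. That is one plausible way, though not confirmed in this thread, for checksum offload on a virtual interface to show up as receive errors.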
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 09:09 UTC
[Xen-devel] Re: [Xen-users] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Dear Boris,>What kind of tcpdump reports , obtained on Dom0 or some >other box on theLAN>brings you you to this idea ?I executed "tcpdump -vvv -i eth0" on my compute node 2 (F11 PV domU).>Wrong checksum offloading at DomU front end network driver >happens ( in myexperience with RTL PCI Gigabit Ethernet >8110SC/8169 on SNV and OSOL,>however RTL PCI-E Ethernet 8111SC works fine) , but not >necessarily.???>> Virtualization Tip: Always disable checksumming on >virtual ethernetdevices>Why always ?It''s the title of the article which you have used for your own Xen virtual machines in the other xen-devel mailing list thread. -- Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical Engineering) Alma Maters: (1) Singapore Polytechnic (2) National University of Singapore My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com My Youtube videos: http://www.youtube.com/user/enmingteo Email: space.time.universe@gmail.com MSN: teoenming@hotmail.com Mobile Phone (SingTel): +65-9648-9798 Mobile Phone (Starhub Prepaid): +65-8369-2618 Age: 31 (as at 30 Oct 2009) Height: 1.78 meters Race: Chinese Dialect: Hokkien Street: Bedok Reservoir Road Country: Singapore On Fri, Oct 30, 2009 at 4:41 PM, Boris Derzhavets <bderzhavets@yahoo.com>wrote:> What kind of tcpdump reports , obtained on Dom0 or some other box on the > LAN > brings you you to this idea ? > > Wrong checksum offloading at DomU front end network driver happens ( in my > experience with RTL PCI Gigabit Ethernet 8110SC/8169 on SNV and OSOL, > however RTL PCI-E Ethernet 8111SC works fine) , but not necessarily. > > > > Virtualization Tip: Always disable checksumming on virtual ethernet > devices > Why always ? > > Boris. > > > --- On *Fri, 10/30/09, Mr. Teo En Ming (Zhang Enming) < > space.time.universe@gmail.com>* wrote: > > > From: Mr. 
Teo En Ming (Zhang Enming) <space.time.universe@gmail.com> > Subject: [Xen-users] Re: Using Xen Virtualization Environment for > Development and Testing of Supercomputing and High Performance Computing > (HPC) Cluster MPICH2 MPI-2 Applications > To: xen-devel@lists.xensource.com, xen-users@lists.xensource.com > Cc: space.time.universe@gmail.com > Date: Friday, October 30, 2009, 4:12 AM > > > Dear All, > > I have googled something which may help to solve my problem. > > *[Xen-devel] Network drop on domU (netfront: rx->offset: 0, size: > 4294967295)* > http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01274.html > > Virtualization Tip: Always disable checksumming on virtual ethernet devices > > http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices > > > Let me try to work on it first. > > -- > Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical > Engineering) > Alma Maters: > (1) Singapore Polytechnic > (2) National University of Singapore > My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com > My Youtube videos: http://www.youtube.com/user/enmingteo > Email: space.time.universe@gmail.com<http://mc/compose?to=space.time.universe@gmail.com> > MSN: teoenming@hotmail.com <http://mc/compose?to=teoenming@hotmail.com> > Mobile Phone (SingTel): +65-9648-9798 > Mobile Phone (Starhub Prepaid): +65-8369-2618 > Age: 31 (as at 30 Oct 2009) > Height: 1.78 meters > Race: Chinese > Dialect: Hokkien > Street: Bedok Reservoir Road > Country: Singapore > > On Fri, Oct 30, 2009 at 3:53 PM, Mr. Teo En Ming (Zhang Enming) < > space.time.universe@gmail.com<http://mc/compose?to=space.time.universe@gmail.com> > > wrote: > >> Hi, >> >> I have reverted to the 2-node troubleshooting scenario. I have started >> node 1 and node 2. >> >> On node 1, I will try to bring up the ring of mpd for the 2 nodes using >> mpdboot and try to execute mpiexec. 
On node 2, I will capture the tcpdump >> messages on virtual network interface eth0. >> >> Please see attached PNG screenshots. They are numbered in sequence. >> >> Please check if there are any problems. >> >> Thank you. >> >> -- >> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical >> Engineering) >> Alma Maters: >> (1) Singapore Polytechnic >> (2) National University of Singapore >> My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com >> My Youtube videos: http://www.youtube.com/user/enmingteo >> Email: space.time.universe@gmail.com<http://mc/compose?to=space.time.universe@gmail.com> >> MSN: teoenming@hotmail.com <http://mc/compose?to=teoenming@hotmail.com> >> Mobile Phone (SingTel): +65-9648-9798 >> Mobile Phone (Starhub Prepaid): +65-8369-2618 >> Age: 31 (as at 30 Oct 2009) >> Height: 1.78 meters >> Race: Chinese >> Dialect: Hokkien >> Street: Bedok Reservoir Road >> Country: Singapore >> >> On Fri, Oct 30, 2009 at 2:53 PM, Mr. Teo En Ming (Zhang Enming) < >> space.time.universe@gmail.com<http://mc/compose?to=space.time.universe@gmail.com> >> > wrote: >> >>> Dear All, >>> >>> Here are more virtual network interface eth0 kernel messages. Notice the >>> "net eth0: rx->offset: 0" messages. Are they of significance? 
>>> >>> *Node 1* >>> >>> Oct 30 22:40:34 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated >>> mount request from 192.168.1.253:1009 for >>> /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin) >>> Oct 30 22:40:56 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated >>> mount request from 192.168.1.252:877 for /home/enming/mpich2-install/bin >>> (/home/enming/mpich2-install/bin) >>> Oct 30 22:41:19 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated >>> mount request from 192.168.1.251:1000 for >>> /home/enming/mpich2-install/bin (/home/enming/mpich2-install/bin) >>> Oct 30 22:41:41 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated >>> mount request from 192.168.1.250:882 for /home/enming/mpich2-install/bin >>> (/home/enming/mpich2-install/bin) >>> Oct 30 22:42:04 enming-f11-pv-hpc-node0001 mountd[1304]: authenticated >>> mount request from 192.168.1.249:953 for /home/enming/mpich2-install/bin >>> (/home/enming/mpich2-install/bin) >>> Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd starting; no mpdid >>> yet >>> Oct 30 22:42:34 enming-f11-pv-hpc-node0001 mpd: mpd has >>> mpdid=enming-f11-pv-hpc-node0001_48545 (port=48545) >>> Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:37 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:38 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:39 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:39 
enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:40 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: __ratelimit: 12 >>> callbacks suppressed >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:46 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:47 enming-f11-pv-hpc-node0001 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> >>> *Node 6* >>> >>> Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:44 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd starting; no mpdid >>> yet >>> Oct 30 22:42:48 enming-f11-pv-hpc-node0006 mpd: mpd has >>> mpdid=enming-f11-pv-hpc-node0006_52805 (port=52805) >>> Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, 
size: 4294967295 >>> Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> Oct 30 22:46:00 enming-f11-pv-hpc-node0006 kernel: net eth0: rx->offset: >>> 0, size: 4294967295 >>> >>> *Node 1 NFS Server Configuration* >>> >>> [root@enming-f11-pv-hpc-node0001 ~]# cat /etc/exports >>> /home/enming/mpich2-install/bin 192.168.1.0/24(ro)<http://192.168.1.0/24%28ro%29> >>> >>> *Node 2 /etc/fstab Configuration Entry for NFS Client* >>> >>> 192.168.1.254:/home/enming/mpich2-install/bin >>> /home/enming/mpich2-install/bin nfs >>> rsize=8192,wsize=8192,timeo=14,intr >>> >>> >>> -- >>> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics) BEng(Hons)(Mechanical >>> Engineering) >>> Alma Maters: >>> (1) Singapore Polytechnic >>> (2) National University of Singapore >>> My blog URL: http://teo-en-ming-aka-zhang-enming.blogspot.com >>> My Youtube videos: http://www.youtube.com/user/enmingteo >>> Email: space.time.universe@gmail.com<http://mc/compose?to=space.time.universe@gmail.com> >>> MSN: teoenming@hotmail.com <http://mc/compose?to=teoenming@hotmail.com> >>> Mobile Phone (SingTel): +65-9648-9798 >>> Mobile Phone (Starhub Prepaid): +65-8369-2618 >>> Age: 31 (as at 30 Oct 2009) >>> Height: 1.78 meters >>> Race: Chinese >>> Dialect: Hokkien >>> Street: Bedok Reservoir Road >>> Country: Singapore >>> >>> On Fri, Oct 30, 2009 at 2:37 PM, Mr. Teo En Ming (Zhang Enming) < >>> space.time.universe@gmail.com<http://mc/compose?to=space.time.universe@gmail.com> >>> > wrote: >>> >>>> Dear All, >>>> >>>> I have created a virtual high performance computing (HPC) cluster of 6 >>>> compute nodes with MPICH2 using Xen-based Fedora 11 Linux 64-bit >>>> paravirtualized (PV) domU guests. Dom0 is Fedora 11 Linux 64-bit. My Intel >>>> Desktop Board DQ45CB has a single onboard Gigabit LAN network adapter. >>>> >>>> I am able to bring up the ring of mpd on the set of 6 compute nodes. 
>>>> However, I am consistently encountering the "(mpiexec 392): no msg >>>> recvd from mpd when expecting ack of request" error. >>>> >>>> After much troubleshooting, I have found that there are Receive Errors >>>> (RX-ERR) in the virtual network interface eth0 of all the six compute nodes. >>>> All the 6 compute nodes are identical F11 linux 64-bit PV virtual machines. >>>> >>>> Here is my PV guest configuration for node 1: >>>> >>>> [enming@fedora11-x86-64-host xen]$ cat enming-f11-pv-hpc-node0001 >>>> name="enming-f11-pv-hpc-node0001" >>>> memory=512 >>>> disk = [''phy:/dev/virtualmachines/f11-pv-hpc-node0001,xvda,w'' ] >>>> vif = [ ''mac=00:16:3E:69:E9:11,bridge=eth0'' ] >>>> vfb = [ ''vnc=1,vncunused=1,vncdisplay=0,vnclisten=127.0.0.1,vncpasswd='' >>>> ] >>>> vncconsole=1 >>>> bootloader = "/usr/bin/pygrub" >>>> #kernel = "/home/enming/fedora11/vmlinuz" >>>> #ramdisk = "/home/enming/fedora11/initrd.img" >>>> vcpus=2 >>>> on_reboot = ''restart'' >>>> on_crash = ''restart'' >>>> >>>> Will there be any problems with Xen networking for MPICH2 applications? >>>> Or it''s just a fine-tuning exercise for Xen networking? I am using PV guests >>>> because PV guests have much higher performance than HVM guests. >>>> >>>> Here are my mpich-discuss mailing list threads: >>>> >>>> >>>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005883.html >>>> >>>> >>>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005887.html >>>> >>>> >>>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005889.html >>>> >>>> >>>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005890.html >>>> >>>> >>>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2009-October/005891.html >>>> >>>> Please advise on the RX-ERR. >>>> >>>> Thank you very much. >>>> >>>> -- >>>> Mr. 
Teo En Ming (Zhang Enming)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2009-Oct-30 09:49 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On Fri, Oct 30, 2009 at 04:35:00PM +0800, Mr. Teo En Ming (Zhang Enming) wrote:
> Dear All,
>
> I have solved the problem.
>
> With reference to
> [1] http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01327.html
> and
> [2] http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices
> , I have executed the following command as root on all my 6 compute nodes
> (each compute node is a F11 linux 64-bit PV virtual machine).
>
> # ethtool -K eth0 tx off gso on

Jeremy: Do you know why this is still needed nowadays?

Where's the bug? Why doesn't it work with the defaults?

-- Pasi

<snip>

> On Fri, Oct 30, 2009 at 4:12 PM, Mr. Teo En Ming (Zhang Enming)
> <space.time.universe@gmail.com> wrote:
>
> Dear All,
>
> I have googled something which may help to solve my problem.
>
> [Xen-devel] Network drop on domU (netfront: rx->offset: 0, size: 4294967295)
> http://lists.xensource.com/archives/html/xen-devel/2009-05/msg01274.html
>
> Virtualization Tip: Always disable checksumming on virtual ethernet devices
> http://hightechsorcery.com/2008/03/virtualization-tip-always-disable-checksumming-virtual-ethernet-devices
>
> Let me try to work on it first.
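The workaround quoted in this message was run by hand on each of the six guests. A minimal sketch of scripting that step follows. It assumes the guest hostnames follow the enming-f11-pv-hpc-nodeNNNN pattern seen in the logs, that they resolve from wherever the script runs, and that root SSH access is available; the function names and the DRY_RUN variable are hypothetical helpers, not part of any tool mentioned in the thread.

```shell
#!/bin/sh
# Sketch only: hostnames are inferred from the log excerpts in this thread,
# and root SSH access to every guest is an assumption.

# Print the six guest hostnames, node0001 .. node0006.
node_names() {
    i=1
    while [ "$i" -le 6 ]; do
        printf 'enming-f11-pv-hpc-node%04d\n' "$i"
        i=$((i + 1))
    done
}

# Run the workaround from the thread on every guest.
# With DRY_RUN=1 the ssh commands are printed instead of executed.
apply_on_all() {
    for host in $(node_names); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'ssh root@%s ethtool -K eth0 tx off gso on\n' "$host"
        else
            ssh "root@$host" ethtool -K eth0 tx off gso on
        fi
    done
}
```

Running `DRY_RUN=1 apply_on_all` first shows what would be executed; calling `apply_on_all` without it performs the change, which does not persist across guest reboots.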
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 10:56 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Check out my screenshots (15 png images) at my blog here:

http://teo-en-ming-aka-zhang-enming.blogspot.com/2009/10/using-xen-virtualization-environment.html

--
Mr. Teo En Ming (Zhang Enming)
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 11:28 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
How can I get the PV guest compute nodes to automatically execute "ethtool -K eth0 tx off gso on" at boot time apart from appending the command to /etc/rc.local?

--
Mr. Teo En Ming (Zhang Enming)
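One possible alternative to /etc/rc.local, offered as a sketch rather than a confirmed answer from the thread: on Fedora-style initscripts, an executable /sbin/ifup-local is, as far as I know, run with the interface name as its first argument after an interface comes up. Whether this holds on the Fedora 11 guests should be checked before relying on it. The function name and the DRY_RUN variable below are hypothetical.

```shell
#!/bin/sh
# Sketch of a possible /sbin/ifup-local (assumption: the guest's initscripts
# call this file with the device name as $1 once the interface is up).

# Apply the workaround from this thread to eth0 only.
# With DRY_RUN=1 the command is printed instead of executed.
apply_offload() {
    dev=$1
    [ "$dev" = "eth0" ] || return 0
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'ethtool -K %s tx off gso on\n' "$dev"
    else
        ethtool -K "$dev" tx off gso on
    fi
}

# Only act when a device name was actually passed in.
if [ "$#" -gt 0 ]; then
    apply_offload "$1"
fi
```

Tying the setting to interface bring-up means it is reapplied whenever eth0 is restarted, not only at boot, which rc.local would not cover.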
Jeremy Fitzhardinge
2009-Oct-30 15:22 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On 10/30/09 02:49, Pasi Kärkkäinen wrote:
> On Fri, Oct 30, 2009 at 04:35:00PM +0800, Mr. Teo En Ming (Zhang Enming) wrote:
>> I have executed the following command as root on all my 6 compute nodes
>> (each compute node is a F11 linux 64-bit PV virtual machine).
>>
>> # ethtool -K eth0 tx off gso on
>
> Jeremy: Do you know why this is still needed nowadays?
>
> Where's the bug? Why doesn't it work with the defaults?

I committed a bugfix relating to gso/tso a couple of days ago, so I think (hope!) this shouldn't be necessary any more.

J
Pasi Kärkkäinen
2009-Oct-30 16:04 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On Fri, Oct 30, 2009 at 08:22:11AM -0700, Jeremy Fitzhardinge wrote:
> I committed bugfix relating to gso/tso a couple of days ago, so I think
> (hope!) this shouldn't be necessary any more.

OK. So it's just a matter of upgrading the dom0 kernel, and then verifying that it works without the ethtool hacks?

-- Pasi
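A rough sketch of how that verification might be done inside a guest after the upgrade. It counts the "rx->offset: 0, size: 4294967295" kernel messages reported earlier in the thread and reads the interface's receive-error counter from sysfs. The function names are hypothetical, and which log source applies (dmesg or a syslog file) is an assumption about the guest setup.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not from the thread.

# Count occurrences of the netfront error message in log text read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_rx_offset_msgs() {
    grep -cF 'rx->offset: 0, size: 4294967295' || true
}

# Print the kernel's receive-error counter for an interface (e.g. eth0).
rx_errors() {
    cat "/sys/class/net/$1/statistics/rx_errors"
}
```

For example, after re-enabling the default offloads, run an MPI job and compare `dmesg | count_rx_offset_msgs` and `rx_errors eth0` before and after; both staying unchanged would suggest the workaround is no longer needed.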
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 16:07 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Oh no. Another 1 hour to compile latest pv-ops dom0 kernel :-)

--
Mr. Teo En Ming (Zhang Enming)
Jeremy Fitzhardinge
2009-Oct-30 16:10 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On 10/30/09 09:07, Mr. Teo En Ming (Zhang Enming) wrote:
> Oh no. Another 1 hour to compile latest pv-ops dom0 kernel :-)

It shouldn't take that long? An incremental build should be a matter of seconds; even a full rebuild is only a few minutes on any recent machine...

J
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 16:14 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Strange. It took me nearly 1 hour to compile on my Intel Pentium Dual Core E6300 2.8 GHz processor on an Intel Desktop Board DQ45CB with 8 GB of DDR2-800 memory.

--
Mr. Teo En Ming (Zhang Enming)
Jeremy Fitzhardinge
2009-Oct-30 16:24 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On 10/30/09 09:14, Mr. Teo En Ming (Zhang Enming) wrote:
> Strange. I took nearly 1 hour to compile on my Intel Pentium Dual Core
> E6300 2.8 GHz processor on Intel Desktop Board DQ45CB with 8 GB
> DDR2-800 memory.

It depends a lot on how much kernel you have configured; I can believe a full distro-like config with everything built with modules would take a lot longer. Are you using make's -j option to use multiple cores? I tend to use "make -j4" to get good processor use on a dual-core system (rough rule of thumb, -jX where X = 2*ncpu).

J
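The rule of thumb in this message can be sketched in shell. This is only an illustration: the variable names `ncpu` and `njobs` are made up here, and the snippet prints the resulting make invocation rather than running it.

```shell
#!/bin/sh
# Number of CPUs currently online, as reported by the system.
ncpu=$(getconf _NPROCESSORS_ONLN)

# Rule of thumb quoted in the thread: X = 2 * ncpu.
njobs=$((2 * ncpu))

# Print the command instead of running it, so this is safe to try anywhere.
echo "make -j${njobs}"
```

On a machine with two online CPUs, such as the dual-core system discussed here, this works out to the "make -j4" mentioned in the message.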
Pasi Kärkkäinen
2009-Oct-30 16:29 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On Fri, Oct 30, 2009 at 09:24:33AM -0700, Jeremy Fitzhardinge wrote:
> On 10/30/09 09:14, Mr. Teo En Ming (Zhang Enming) wrote:
> > Strange. I took nearly 1 hour to compile on my Intel Pentium Dual Core
> > E6300 2.8 GHz processor on Intel Desktop Board DQ45CB with 8 GB
> > DDR2-800 memory.
>
> It depends a lot on how much kernel you have configured; I can believe a
> full distro-like config with everything built with modules would take a
> lot longer. Are you using make's -j option to use multiple cores? I
> tend to use "make -j4" to get good processor use on a dual-core system
> (rough rule of thumb, -jX where X = 2*ncpu).

On a Core2 Quad 2.66 GHz CPU with 8 GB of RAM, a full build of the Fedora 12 default config takes around 18 minutes with -j6.

-- Pasi
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 16:35 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
I see. So you are referring to those extremely lean and heavily customized kernels tailor-made for your hardware. Kernel modules for hardware that does not exist on the machine are not built, so such a build would take only a few minutes.

I did not use the -j argument to make. I will remember to use -j4 the next time I compile a kernel.

For my part, I just take the default kernel configuration file for the Fedora 11 Linux 64-bit OS (default pv-ops domU, but no dom0 support), run "make oldconfig", accept the defaults for whatever new kernel features there are, and then configure whatever Xen options are necessary. So my pv-ops dom0 kernel is extremely distro-like.

-- Mr. Teo En Ming (Zhang Enming)
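The configuration workflow described in this message could look roughly like the following. It is a sketch under assumptions, not the poster's actual commands: the function names `run` and `prepare_config` and the `DRY_RUN` switch are invented for illustration, the starting config is assumed to be the running Fedora kernel's file under /boot, and the Xen option selection is left to an interactive `make menuconfig`.

```shell
#!/bin/sh
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Intended to be called from the top of a kernel source tree.
prepare_config() {
    # Start from the distro config of the running kernel (assumed location).
    run cp "/boot/config-$(uname -r)" .config
    # Refresh the config for the new tree; the empty answers fed by `yes ""`
    # accept the default for every new option.
    run sh -c 'yes "" | make oldconfig'
    # Enable the Xen dom0 options by hand.
    run make menuconfig
}
```

Running `DRY_RUN=1 prepare_config` only prints the three steps, which makes it easy to review them before touching a real source tree.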
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 16:39 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Fedora 12 released already? It's not November 2009 yet.

Why not use -j8?

-- Mr. Teo En Ming (Zhang Enming)
Mr. Teo En Ming (Zhang Enming)
2009-Oct-30 16:57 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
The latest pv-ops dom0 kernel in xen/master is now 2.6.31.5?

-- Mr. Teo En Ming (Zhang Enming)
Pasi Kärkkäinen
2009-Oct-30 16:58 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On Sat, Oct 31, 2009 at 12:39:10AM +0800, Mr. Teo En Ming (Zhang Enming) wrote:
> Fedora 12 released already? It's not November 2009 yet.

Yeah, it's not yet released.. I'm running Fedora rawhide, which is the development version of 12.

> Why not use -j8?

I haven't really benchmarked which is the optimal value, but I thought 6 might be better than 4, and 8 might be already too much.

-- Pasi
Pasi Kärkkäinen
2009-Oct-30 17:00 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On Sat, Oct 31, 2009 at 12:57:13AM +0800, Mr. Teo En Ming (Zhang Enming) wrote:
> The latest pv-ops dom0 kernel in xen/master is now 2.6.31.5?

Does seem like it yet:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=xen/master

-- Pasi
Pasi Kärkkäinen
2009-Oct-30 17:26 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
On Fri, Oct 30, 2009 at 07:00:17PM +0200, Pasi Kärkkäinen wrote:
> On Sat, Oct 31, 2009 at 12:57:13AM +0800, Mr. Teo En Ming (Zhang Enming) wrote:
> > The latest pv-ops dom0 kernel in xen/master is now 2.6.31.5?
>
> Does seem like it yet:
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=xen/master

Doesn't.. can't type today.

-- Pasi
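Besides browsing the gitweb shortlog, the base version of a checked-out tree can be read from its top-level Makefile. The helper below is a minimal sketch; the function name `kernel_version_from_makefile` is made up for this example, and inside a real tree `make kernelversion` should report the same information.

```shell
#!/bin/sh
# Print VERSION.PATCHLEVEL.SUBLEVEL followed by EXTRAVERSION, read from the
# kernel Makefile given as the first argument.
kernel_version_from_makefile() {
    awk -F ' *= *' '
        /^VERSION *=/      { v = $2 }
        /^PATCHLEVEL *=/   { p = $2 }
        /^SUBLEVEL *=/     { s = $2 }
        /^EXTRAVERSION *=/ { e = $2 }
        END { print v "." p "." s e }
    ' "$1"
}
```

For a tree whose Makefile sets VERSION = 2, PATCHLEVEL = 6, SUBLEVEL = 31 and EXTRAVERSION = .5, the helper prints 2.6.31.5.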
Mr. Teo En Ming (Zhang Enming)
2009-Nov-02 10:22 UTC
Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Hi All,

I have just downloaded, configured, compiled and installed the latest pv-ops dom0 kernel 2.6.31.5.

Linux fedora11-x86-64-host 2.6.31.5-xen-enming.teo #1 SMP Mon Nov 2 17:46:35 SGT 2009 x86_64 x86_64 x86_64 GNU/Linux

For the first time, I have used the -j4 argument to make (I am using an Intel Pentium Dual Core E6300 2.8 GHz on Intel DQ45CB).

I started the kernel build process on Monday, 2 November 2009, at 5:23 P.M. Singapore time. The kernel build was completed the same day at 5:51 P.M. Singapore time. The entire kernel build process now took only 28 minutes instead of the usual 1 hour without the -jX argument.

I am happy to announce that the "net eth0: rx->offset: 0, size: 4294967295" message does not pop up any more. I can start the ring of mpd daemons (on 2 nodes) and run mpiexec to execute MPI/parallel jobs without having to execute "ethtool -K eth0 tx off gso on" in every compute node (PV guest).

Thanks Jeremy.

-- Mr. Teo En Ming (Zhang Enming)
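To confirm on each guest that the warning is really gone, the kernel log can be searched for the exact message text. A small sketch, assuming the message wording quoted in the report; the function name `count_rx_offset_warnings` is hypothetical. As an aside, 4294967295 is 2^32 - 1, which is how -1 appears when printed as an unsigned 32-bit value.

```shell
#!/bin/sh
# Count the lines on stdin that contain the warning quoted in the thread.
# grep -F treats the pattern as a fixed string; grep -c prints the count and
# exits non-zero when the count is 0, hence the "|| true".
count_rx_offset_warnings() {
    grep -F -c 'rx->offset: 0, size: 4294967295' || true
}
```

Inside a PV guest this could be used as `dmesg | count_rx_offset_warnings`; a result of 0 means the message was not logged since boot. The current offload settings of the interface can be listed with `ethtool -k eth0` (lowercase k queries, uppercase K changes them).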
Mr. Teo En Ming (Zhang Enming)
2009-Nov-02 10:49 UTC
[Xen-users] Re: [Xen-devel] Re: Using Xen Virtualization Environment for Development and Testing of Supercomputing and High Performance Computing (HPC) Cluster MPICH2 MPI-2 Applications
Attached is my pv-ops dom0 kernel 2.6.31.5 configuration file.

-- Mr. Teo En Ming (Zhang Enming)