Hello.

We created an OST on an OSS. But when I try to mount the OST, it keeps saying:

mount.lustre /dev/vg/ost002 /vol/srv1/ost002
mount.lustre: mount /dev/vg/ost002 at /vol/srv1/ost002 failed: Operation already in progress
The target service is already running. (/dev/vg/ost002)

However:

mount | grep -i ost002

Nothing is mounted....

lctl even shows this OST, and the client is able to see it:

lfs df -h
...
lfs001-OST0005_UUID 492.2G 445.2G 22.0G 90% /lfs/srv5/lfs001[OST:5]
...

The MDS/OSS version:

lustre: 1.6.5.52
kernel: patchless
build: 1.6.5.52-19691231190000-PRISTINE-.var.tmp.linux-2.6.18.x86_64-2.6.18-prep
(2.6.18 = kernel version)

I don't think it's bugzilla 11564, because my Lustre fs name is only 6 characters long.

Also, when the client tries to access the new OST's space, it simply hangs. It placed it in "bloc

Any thoughts about this?

TIA
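P.S. For completeness, here is roughly how I am checking the state on the OSS (just a sketch of the commands; /dev/vg/ost002 and the mountpoint are as above):

```shell
# Nothing for this target appears in the mount table...
mount -t lustre
mount | grep -i ost002

# ...yet the Lustre device stack still shows the OST as configured
lctl dl
cat /proc/fs/lustre/devices
```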
Anyone?

On Wed, Feb 25, 2009 at 7:43 PM, Mag Gam <magawake at gmail.com> wrote:
> [...]
Any ideas? I am still unable to mount this new OST. I stopped the client hang problem by disabling the OST via lctl, but it is a crazy problem indeed.

I would love to know how to activate the OST.

On Wed, Feb 25, 2009 at 4:43 PM, Mag Gam <magawake at gmail.com> wrote:
> [...]
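P.S. For the record, I disabled it on the MDS roughly like this (the device number is only an example; it has to be looked up first):

```shell
# On the MDS: find the OSC device number for the new OST
lctl dl | grep lfs001-OST0005

# mark that device inactive so clients stop hanging on the OST
lctl --device 7 deactivate    # "7" is a hypothetical device number

# presumably re-enabled later with:
# lctl --device 7 activate
```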
Mag,

Can you send us the output from your kernel log after you try the mount command that is failing?

Just run 'dmesg' and send us the last 20 lines or so.

evan

On 2/26/09 7:12 PM, "Mag Gam" <magawake at gmail.com> wrote:
> [...]
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
Thank you for getting back to me on this.

So, when I try to mount the **new** OST I keep getting these messages.

For some reason the new OST is active on the MGS side, and I am not sure why. I think I made a mistake by trying to mount a new OST while clients were still active.

When I try to activate the bad OST, I get this message:

Lustre: 11647:0:(ldlm_lib.c:736:target_handle_connect()) lfs001-OST0005: cookie lfs001-mdtlov_UUID seen on new NID mds_ip_addr at tcp when existing NID 0 at lo is already connected
Feb 27 11:59:01 oss_server kernel: Lustre: 11647:0:(ldlm_lib.c:736:target_handle_connect()) Skipped 4 previous similar messages
Feb 27 11:59:01 mds_server kernel: Lustre: 3426:0:(import.c:411:import_select_connection()) lfs001-OST0005-osc: tried all connections, increasing latency to 51s
Feb 27 11:59:01 oss_server kernel: LustreError: 11647:0:(ldlm_lib.c:1614:target_send_reply_msg()) @@@ processing error (-114) req at ffff8104251a4400 x388745/t0 o8-><?>@<?>:0/0 lens 240/144 e 0 to 0 dl 1235754041 ref 1 fl Interpret:/0/0 rc -114/0
Feb 27 11:59:01 mds_server kernel: Lustre: 3426:0:(import.c:411:import_select_connection()) Skipped 6 previous similar messages
Feb 27 11:59:01 mds_server kernel: LustreError: 11-0: an error occurred while communicating with oss_ip at tcp. The ost_connect operation failed with -114
Feb 27 11:59:01 mds_server kernel: LustreError: Skipped 12 previous similar messages

oss_server kernel: LustreError: 11556:0:(ldlm_lib.c:1614:target_send_reply_msg()) @@@ processing error (-114) req at ffff81042150a000 x388953/t0 o8-><?>@<?>:0/0 lens 240/144 e 0 to 0 dl 1235754240 ref 1 fl Interpret:/0/0 rc -114/0

Also, I was wondering if there is a way to reset the state of my OST. It keeps thinking it is already mounted, even after a reboot. Any way to say "hey, I am not mounted"? :-)

Would a writeconf help on the OST? I am hesitant to run one on it.
TIA

On Fri, Feb 27, 2009 at 11:41 AM, Evan Felix <evan.felix at pnl.gov> wrote:
> [...]
(Sorry, adding the entire list for Evan's response.)

On Sat, Feb 28, 2009 at 8:38 AM, Mag Gam <magawake at gmail.com> wrote:
> Thank you for getting back to me on this.
> [...]
OK, I did a tunefs.lustre --writeconf /dev/

and tried to mount it up; still the same error, "Operation already in progress". The target service is already running.

I am not sure what else I can try...

Any suggestions?

TIA

On Sat, Feb 28, 2009 at 8:39 AM, Mag Gam <magawake at gmail.com> wrote:
> (Sorry, adding the entire list for Evan's response.)
> [...]
OK, try this. I know you've done it already, but it may help us to understand:

1. Reboot both the OST and the MGS machine.
2. Run tunefs.lustre --writeconf /dev/<device>.
3. Mount the MGS, then the OST.
4. Run dmesg on both servers so we can see what that first mount failure says. Since it also looks like they are on the same machine, we may only get one dmesg output here.

Also, it looks like you are having trouble communicating with the server at times. Is it possible there are communication errors? What version of Lustre is this?

evan

On 3/1/09 5:19 PM, "Mag Gam" <magawake at gmail.com> wrote:
> OK, I did a tunefs.lustre --writeconf /dev/
> [...]
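In shell terms, the steps above would look something like this (the device paths and mountpoints are only examples; substitute your own):

```shell
# 1. after rebooting both nodes, regenerate the config log for the OST
tunefs.lustre --writeconf /dev/vg/ost002

# 2. mount the MGS/MDT first, then the OST
mount -t lustre /dev/vg/mdt /mnt/mdt             # hypothetical MGS/MDT device
mount -t lustre /dev/vg/ost002 /vol/srv1/ost002  # the failing OST

# 3. capture the kernel log right after the failing mount
dmesg | tail -40
```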