Reto Gantenbein
2008-Apr-29 19:06 UTC
[Lustre-discuss] lustre_mgs: operation ... on unconnected MGS
Dear Lustre users,

I set up a Lustre file system with 7 OSTs (Fibre Channel RAIDs) and an MGS/MDT, exported via two nodes. One node has the MGS/MDT and 3 OSTs mounted, the other has 4 OSTs. The nodes run the Lustre-patched 2.6.18 vanilla kernel. The clients are patchless and run the 2.6.22 Gentoo kernel. Lustre 1.6.4.3 is compiled from sources under Gentoo Linux.

The two nodes are called lustre01 and lustre02.

I formatted the MGS/MDT on lustre01 with:

mkfs.lustre --mgs --mdt --fsname=homefs --failnode=lustre02@tcp --reformat /dev/sdb

Then I mounted it and formatted the OSTs, also on lustre01, with:

mkfs.lustre --ost --mgsnode=lustre01@tcp --mgsnode=lustre02@tcp --fsname=homefs --failnode=lustre02@tcp --index=1 /dev/sdc

and so on...

Is there already a general mistake in this installation setup?

The OSTs are distributed over both servers to increase bandwidth and also for failover reasons. All OSTs and the MGS are connected to both servers but only mounted on a single one.

Now to my problem: I mounted the file system from a client with IP 10.1.1.65, and these are the messages that appear in the system log:

lustre01 LustreError: 13533:0:(handler.c:148:mds_sendpage()) @@@ bulk failed: timeout 0(4096), evicting 87fb775c-8f64-5d85-2a95-8fb595e62892@NET_0x200000a010141_UUID
lustre01 req@ffff81011dc72e00 x2483/t0 o37->87fb775c-8f64-5d85-2a95-8fb595e62892@NET_0x200000a010141_UUID:-1 lens 296/296 ref 0 fl Interpret:/0/0 rc 0/0

lustre01 LustreError: 13469:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107) req@ffff81011d704a00 x2479/t0 o400-><?>@<?>:-1 lens 128/0 ref 0 fl Interpret:/0/0 rc -107/0

lustre01 LustreError: 13469:0:(handler.c:1499:mds_handle()) operation 400 on unconnected MDS from 12345-10.1.1.65@tcp

lustre01 LustreError: 13535:0:(mgs_handler.c:515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS

lustre01 LustreError: 13535:0:(mgs_handler.c:515:mgs_handle()) lustre_mgs: operation 501 on unconnected MGS

I already tried to find some answers on the net, but without much success. I cannot find what these messages mean or where they come from.

Maybe it also helps to show you my device list:

lustre01:
lctl > device_list
  0 UP mgs MGS MGS 11
  1 UP mgc MGC10.1.140.2@tcp 89b4c0f0-c602-0857-c22e-ed232d8ad7aa 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov homefs-mdtlov homefs-mdtlov_UUID 4
  4 UP mds homefs-MDT0000 homefs-MDT0000_UUID 5
  5 UP osc homefs-OST0001-osc homefs-mdtlov_UUID 5
  6 UP osc homefs-OST0004-osc homefs-mdtlov_UUID 5
  7 UP osc homefs-OST0005-osc homefs-mdtlov_UUID 5
  8 UP osc homefs-OST0002-osc homefs-mdtlov_UUID 5
  9 UP osc homefs-OST0003-osc homefs-mdtlov_UUID 5
 10 UP osc homefs-OST0006-osc homefs-mdtlov_UUID 5
 11 UP osc homefs-OST0007-osc homefs-mdtlov_UUID 5
 12 UP mgc MGC10.1.140.1@tcp c8ad2ab0-9eef-b334-37af-85734b53ac94 5
 13 UP ost OSS OSS_uuid 3
 14 UP obdfilter homefs-OST0001 homefs-OST0001_UUID 7
 15 UP obdfilter homefs-OST0004 homefs-OST0004_UUID 7
 16 UP obdfilter homefs-OST0005 homefs-OST0005_UUID 7

lustre02:
lctl > device_list
  0 UP mgc MGC10.1.140.1@tcp 6154baf3-e830-81d9-ff6c-451d107650c1 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter homefs-OST0002 homefs-OST0002_UUID 7
  3 UP obdfilter homefs-OST0003 homefs-OST0003_UUID 7
  4 UP obdfilter homefs-OST0006 homefs-OST0006_UUID 7
  5 UP obdfilter homefs-OST0007 homefs-OST0007_UUID 7

Can someone give me some hints? What is going wrong here?

Kind regards,
Reto Gantenbein
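For context, here is a minimal sketch of the full formatting and mounting sequence described above. The OST indices follow the device lists shown above; the mount points, the device names other than /dev/sdb and /dev/sdc, and the --failnode value used for the OSTs on lustre02 are assumptions for illustration only, not taken from the original message.

# lustre01: format the combined MGS/MDT and mount it first (command as posted)
mkfs.lustre --mgs --mdt --fsname=homefs --failnode=lustre02@tcp --reformat /dev/sdb
mount -t lustre /dev/sdb /mnt/mdt            # mount point assumed

# lustre01: format and mount its three OSTs (indices 1, 4, 5 per the device list;
# device names other than /dev/sdc are assumed)
mkfs.lustre --ost --mgsnode=lustre01@tcp --mgsnode=lustre02@tcp \
    --fsname=homefs --failnode=lustre02@tcp --index=1 /dev/sdc
mount -t lustre /dev/sdc /mnt/ost0001
# ... repeat with --index=4 and --index=5 on the other two devices

# lustre02: format and mount its four OSTs (indices 2, 3, 6, 7 per the device list;
# --failnode=lustre01@tcp is an assumed failover pairing, device names assumed)
mkfs.lustre --ost --mgsnode=lustre01@tcp --mgsnode=lustre02@tcp \
    --fsname=homefs --failnode=lustre01@tcp --index=2 /dev/sdc
mount -t lustre /dev/sdc /mnt/ost0002
# ... repeat with --index=3, --index=6 and --index=7

# client 10.1.1.65: mount the file system, listing both MGS NIDs for failover
mount -t lustre lustre01@tcp:lustre02@tcp:/homefs /mnt/homefs

Listing both --mgsnode values at format time, and both MGS NIDs in the client mount, is what lets the OSTs and clients find the MGS on whichever node it is currently running after a failover.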
Reto Gantenbein
2008-Apr-30 14:33 UTC
[Lustre-discuss] lustre_mgs: operation ... on unconnected MGS
Hello everybody,

I did a clean install of the OSTs/MGS and now it seems to work without these errors. But it is still unclear to me why they appeared or what they explicitly mean. Simpler error messages would be a nice thing in Lustre, especially for newbies.

Cheers,
Reto
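A full reformat is generally not the only way to get back to a clean configuration: in Lustre 1.6 the configuration logs served by the MGS can be regenerated with tunefs.lustre --writeconf. A rough sketch, with the mount points and device names carried over as assumptions from the first message:

# Unmount the clients first, then all targets on both nodes
umount /mnt/homefs                    # on each client
umount /mnt/ost0001                   # ... and every other OST mount
umount /mnt/mdt                       # on lustre01

# Erase the configuration logs so they are rewritten at the next mount:
# MGS/MDT first, then every OST
tunefs.lustre --writeconf /dev/sdb    # on lustre01 (MGS/MDT)
tunefs.lustre --writeconf /dev/sdc    # repeat on every OST device, both nodes

# Remount in order: MGS/MDT, then the OSTs, then the clients
mount -t lustre /dev/sdb /mnt/mdt
mount -t lustre /dev/sdc /mnt/ost0001
mount -t lustre lustre01@tcp:lustre02@tcp:/homefs /mnt/homefs

Note that regenerating the logs this way also discards parameters previously set with lctl conf_param, so any such settings would have to be reapplied afterwards.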