On Fri, 2009-04-17 at 15:51 -0400, Roger Spellman wrote:
> Hi,
>
> I just upgraded some servers to 1.6.7.1, and I started getting some
> error messages. So, I reformatted my file system, and started again:
> Here are the messages on the MDS:
>
>
>
> Lustre: MDT storage-MDT0000 now serving storage-MDT0000_UUID
> (storage-MDT0000/82f9201e-d26f-a513-4ad8-bdb4091f8afd) with recovery
> enabled
>
> Lustre: 6890:0:(lproc_mds.c:271:lprocfs_wr_group_upcall())
> storage-MDT0000: group upcall set to /usr/sbin/l_getgroups
>
> Lustre: storage-MDT0000.mdt: set parameter
> group_upcall=/usr/sbin/l_getgroups
>
> Lustre: Server storage-MDT0000 on device /dev/sdb2 has started
>
> Lustre: Request x7 sent from storage-OST0000-osc to NID 10.2.46.2@o2ib
> 0s ago has timed out (limit 5s).
>
> Lustre: Request x8 sent from storage-OST0001-osc to NID 10.2.46.3@o2ib
> 0s ago has timed out (limit 5s).
Hm, the MDS sent these requests while o2ib was not yet ready to send.
The '0s ago' says this was a network issue when sending the request,
not really a timeout.
>
> Lustre: 6749:0:(lproc_mds.c:271:lprocfs_wr_group_upcall())
> storage-MDT0000: group upcall set to NONE
>
> Lustre: 6544:0:(import.c:507:import_select_connection())
> storage-OST0000-osc: tried all connections, increasing latency to 5s
>
> Lustre: 6544:0:(import.c:507:import_select_connection())
> storage-OST0001-osc: tried all connections, increasing latency to 5s
>
> Lustre: 6543:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts
> are active, abort quota recovery
>
> Lustre: 6543:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts
> are active, abort quota recovery
>
> Lustre: MDS storage-MDT0000: storage-OST0000_UUID now active,
> resetting orphans
>
> Lustre: MDS storage-MDT0000: storage-OST0002_UUID now active,
> resetting orphans
>
> Lustre: Skipped 1 previous similar message
>
>
But later it looks like the connections completed fine.
>
>
> I don't seem to have any error messages on the OSTs. I tested my
> network, and it is running well.
>
>
>
> Any thoughts?
These aren't errors, just notices. They say that the first connect
request sent from the MDS to each OST timed out (or o2ib was not ready
to send), so the MDS couldn't connect to the OSTs on the first pass,
but after 5s it reconnected successfully.
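
If you want to pick these notices out of a longer log, here is a minimal
Python sketch. It is only my reading of the messages above, not anything
Lustre ships: the function name classify() and the labels "send-failure"
and "timeout" are made up for illustration, and it assumes each message
has been joined back onto a single line. A request reported as timed out
"0s ago" is treated as a failed send rather than a real timeout.

```python
import re

# Pattern for the "Request ... has timed out" notice quoted above.
# The NID separator is accepted as "@" or as " at " (list archives
# sometimes rewrite "@").
TIMEOUT_RE = re.compile(
    r"Request (?P<xid>x\d+) sent from (?P<src>\S+) to NID "
    r"(?P<addr>\S+?)(?:@| at )(?P<net>\S+)\s+"
    r"(?P<age>\d+)s ago has timed out \(limit (?P<limit>\d+)s\)"
)


def classify(line):
    """Return (source, nid, kind) for a timeout notice, else None.

    kind is "send-failure" when the request is reported as sent 0s ago
    (it cannot have waited out the limit), otherwise "timeout".
    This is a heuristic, not an official Lustre classification.
    """
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    age = int(m.group("age"))
    kind = "send-failure" if age == 0 else "timeout"
    nid = m.group("addr") + "@" + m.group("net")
    return (m.group("src"), nid, kind)


if __name__ == "__main__":
    sample = ("Lustre: Request x7 sent from storage-OST0000-osc to NID "
              "10.2.46.2@o2ib 0s ago has timed out (limit 5s).")
    print(classify(sample))
```

Lines that come back as "send-failure" right after the MDT starts, and
are followed by "now active, resetting orphans" for the same OSTs, fit
the harmless startup case described above.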
>
--
Alex Lyashkov <alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems