I have a small Lustre test cluster with eight OSTs running. The servers
were shut off over the weekend; upon turning them back on and trying to
start up Lustre, I seem to have lost my OSTs.

[root@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc MGC192.168.1.254@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 3

Everything in the messages log appears fine, as if it were just a normal
startup of Lustre, except for the messages below. I'm not sure what log
file the error is referring to, and the message gives little detail on
where I should start looking.

Jun 16 20:13:55 node1-eth0 kernel: LustreError:
3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
CONFIGS/lustre-MDT0000T: -28
Jun 16 20:13:55 node1-eth0 kernel: LustreError:
3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
lustre-MDT0000 (-28)

Can anyone give me an idea of what happened? Thanks
On Tue, Jun 16, 2009 at 8:25 PM, Michael Di Domenico
<mdidomenico4@gmail.com> wrote:
> I have a small Lustre test cluster with eight OSTs running. The
> servers were shut off over the weekend; upon turning them back on and
> trying to start up Lustre, I seem to have lost my OSTs.
> [...]
> Jun 16 20:13:55 node1-eth0 kernel: LustreError:
> 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
> CONFIGS/lustre-MDT0000T: -28
> Jun 16 20:13:55 node1-eth0 kernel: LustreError:
> 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
> lustre-MDT0000 (-28)

According to the Lustre manual, the -28 at the end of the line is an
error code:

    -28  -ENOSPC  The file system is out of space or out of inodes.
    Use lfs df (to query the amount of file system space) or lfs df -i
    (to query the number of inodes).

Verified by:

[root@node1 ~]$ df -i
Filesystem            Inodes   IUsed    IFree IUse% Mounted on
/dev/md2             1280000   42132  1237868    4% /
/dev/md0              255232      45   255187    1% /boot
tmpfs                 124645       1   124644    1% /dev/shm
/dev/md3               63872      24    63848    1% /mgs
/dev/md4              255040  255040        0  100% /mdt
/dev/md5            29892608   28726 29863882    1% /ost

I only put 500k files in the filesystem; I would not have thought the
MDT would have used up its inodes that fast.
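[The same lookup can be done without reaching for the manual: LustreError
trailers like "(-28)" are negated POSIX errno values, so a one-liner
resolves them. A generic sketch, not a Lustre-specific tool; it assumes
python3 is on the path.]

```shell
# Resolve the errno behind a LustreError trailer such as "(-28)":
# drop the minus sign and look the number up in the errno table.
python3 -c 'import errno, os; n = 28; print(errno.errorcode[n], "-", os.strerror(n))'
```

On Linux this prints "ENOSPC - No space left on device", matching the
manual's table.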
do you have many small files?

On Tue, Jun 16, 2009 at 8:58 PM, Michael Di Domenico
<mdidomenico4@gmail.com> wrote:
> According to the Lustre manual, the -28 at the end of the line is an
> error code:
>
>     -28  -ENOSPC  The file system is out of space or out of inodes.
>
> Verified by:
>
> [root@node1 ~]$ df -i
> Filesystem            Inodes   IUsed    IFree IUse% Mounted on
> /dev/md4              255040  255040        0  100% /mdt
> /dev/md5            29892608   28726 29863882    1% /ost
> [...]
>
> I only put 500k files in the filesystem; I would not have thought the
> MDT would have used up its inodes that fast.
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
As long as the inode discussion is up, two questions: what exactly is
stored in the inode (how big should I make them)? I've read the manual
about this and it doesn't really say, except for the notation about
stripes/OSTs.

Is there a "proper" way of moving or recreating the MDT filesystem to
hold more inodes, or is backup -> reformat -> restore the proper
procedure?

Sorry to hijack your thread.

Regards,
Timh

2009/6/17 Mag Gam <magawake@gmail.com>:
> do you have many small files?
> [...]

--
Timh Bergström
System Operations Manager
Diino AB - www.diino.com
:wq
On Tue, Jun 16, 2009 at 11:08 PM, Mag Gam <magawake@gmail.com> wrote:
> do you have many small files?

There was a mix of small vs. medium-sized files. I reread the "Sizing
MDT" section in the manual and see my error. That section should be in
big bold letters at the very beginning... :)
2009/6/17 Timh Bergström <timh.bergstrom@diino.net>:
> As long as the inode discussion is up, two questions: what exactly is
> stored in the inode (how big should I make them)? I've read the manual
> about this and it doesn't really say, except for the notation about
> stripes/OSTs.
>
> Is there a "proper" way of moving or recreating the MDT filesystem to
> hold more inodes, or is backup -> reformat -> restore the proper
> procedure?
>
> Sorry to hijack your thread.

It's okay; I have roughly the same question. In my current case the
filesystem is only a test, so I can just recreate it, but I can see this
happening in production. I can prepare for it not to happen, but users
are unpredictable...
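[For the sizing question, the arithmetic is simple enough to sketch. The
numbers below are hypothetical; the real bytes-per-inode ratio is
whatever the MDT was formatted with, e.g. a "-i" value passed through
mkfs.lustre --mkfsoptions.]

```shell
# The MDT needs one inode per file in the entire Lustre filesystem.
# ldiskfs allocates one inode per bytes_per_inode of device space, so
# the inode budget is roughly device_size / bytes_per_inode.
mdt_bytes=$((4 * 1024 * 1024 * 1024))   # hypothetical 4 GiB MDT
bytes_per_inode=4096                    # hypothetical "-i 4096" format ratio
echo "approx inode budget: $((mdt_bytes / bytes_per_inode))"
```

With those assumed numbers the budget is about 1048576 files; turned
around, holding N files needs an MDT of roughly N * bytes_per_inode,
plus headroom.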
Michael Di Domenico wrote:
> According to the Lustre manual, the -28 at the end of the line is an
> error code:
>
>     -28  -ENOSPC  The file system is out of space or out of inodes.
>     Use lfs df (to query the amount of file system space) or lfs df -i
>     (to query the number of inodes).
>
> Verified by:
>
> [root@node1 ~]$ df -i
> Filesystem            Inodes   IUsed    IFree IUse% Mounted on
> /dev/md4              255040  255040        0  100% /mdt
> /dev/md5            29892608   28726 29863882    1% /ost
> [...]
>
> I only put 500k files in the filesystem; I would not have thought the
> MDT would have used up its inodes that fast.

The MDT consumes one inode for each file in the global Lustre file
system. You have plenty of OST space, but no inodes: there are 255K
inodes on the MDS, and you are trying to create 500k files.

cliffw
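[That arithmetic can be checked directly against the df -i output
earlier in the thread:]

```shell
# One MDT inode is consumed per Lustre file, so compare the MDT's total
# inode count (from "df -i" on /mdt) with the number of files created.
mdt_inodes=255040
files_created=500000
if [ "$files_created" -gt "$mdt_inodes" ]; then
    echo "MDT is short by $((files_created - mdt_inodes)) inodes"
fi
```

This prints "MDT is short by 244960 inodes" for the numbers in this
thread, which is exactly why the creates started failing with -ENOSPC
while the OSTs still had space.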