Alastair Ferguson
2013-Jun-17 07:12 UTC
lctl --device XX deactivate doesn't make OST read only
OK, bit of a weird one. Three OSTs are at 100%, but there is 30TB of free space across the other OSTs, so I do:

    lfs df -h

and get this line for one of the OSTs I need to deactivate:

    AC3-OST000c_UUID 14.3T 13.6T 87.4M 100% /data[OST:12]

Then:

    lctl dl

    19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5

Then:

    lctl --device 19 deactivate

after which lctl dl shows:

    19 IN osc AC3-OST000c-osc AC3-mdtlov_UUID 5

It should be read only now, right?

Then, to find the files in the filesystem (/data) and strip out all the stuff you don't need:

    lfs getstripe -O AC3-OST000c_UUID -rv -d /data | grep /data >> ost000c_raw.txt

Then:

    while read line; do cp -p "$line" "$line.___bak"; rm -f "$line"; mv "$line.___bak" "$line"; done < ost000c_raw.txt

This should move the data off the OST, but it doesn't. I have used this procedure before to remove data from a whole server (which worked), and I could see the OST emptying in lfs df -h. In this case the usage goes up and down, suggesting it is copying BACK to the same OST despite the fact that it is IN, not UP, when lctl dl is run.

How can I get files off this OST, as I get errors saying no space on device?

Alastair Ferguson
IT Manager
Capital Markets CRC Limited (CMCRC)
Telephone: +61 2 8088 4222
Mobile: +61 424 235 159
Fax: +61 2 8088 4201
www.cmcrc.com

Capital Markets CRC Ltd - Confidential Communication
The information contained in this e-mail is confidential. It is intended for the addressee only. If you receive this e-mail by mistake please promptly inform us by reply e-mail and then delete the e-mail and destroy any printed copy. You must not disclose or use in any way the information in the e-mail. There is no warranty that this e-mail is error or virus free. It may be a private communication, and if so, does not represent the views of the CMCRC and its associates. If it is a private communication, care should be taken in opening it to ensure that undue offence is not given.
_______________________________________________ Lustre-discuss mailing list Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Dilger, Andreas
2013-Jun-17 23:26 UTC
Re: lctl --device XX deactivate doesn't make OST read only
On 2013/17/06 1:12 AM, "Alastair Ferguson" <aferguson-KXB8ZELdThkAvxtiuMwx3w@public.gmane.org> wrote:
>OK, bit of a weird one, so 3 OSTs are 100%, but there is 30TB of free
>space around the other OSTs, so I do:
>
>lfs df -h
>
>Get this part as one of the OSTs I need to deactivate:
>
>AC3-OST000c_UUID 14.3T 13.6T 87.4M 100% /data[OST:12]
>
>then
>
>lctl dl
>
> 19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>
>Then
>
>lctl --device 19 deactivate
>
>then
>
>lctl dl:
>
> 19 IN osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>
>Should be read only right?

Right, this is the MDS OSC device, so no new files should be allocated on that OST.

>Then
>
>lfs getstripe -O AC3-OST000c_UUID -rv -d /data | grep /data >> ost000c_raw.txt
>
>To find the files in the filesystem (/data) and strip out all the stuff
>you don't need. Then:
>
>while read line; do cp -p "$line" "$line.___bak"; rm -f "$line"; mv "$line.___bak" "$line"; done < ost000c_raw.txt
>
>This should move the data off the OST but it doesn't. I have used this
>procedure before to remove data from a whole server (which worked) and I
>can see when I lfs df -h the ost emptying but in this case it goes up and
>down suggesting it is copying BACK to the same OST despite the fact it is
>IN not UP when lctl dl is run.

You should look at "lfs_migrate" and its man page, for a more robust mechanism for doing the above migration. Your script is unsafe if interrupted after "rm -f" but before "mv" moves the old file into place. You can also use "lfs_migrate" in a pipeline, so that it only moves new files, while your script would re-move the same files repeatedly if interrupted and restarted.

>How can I get files off this as I get errors saying no space on device??

Your process _should_ be working, but if you are moving small files the effects may be slow. As mentioned in the "lfs_migrate" man page, you should select large files to migrate, since you will get better IO performance, and will free space more quickly.

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
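The interruption hazard described above comes from deleting the original before the copy is renamed into place. A minimal sketch of the same copy-based approach without that window follows, assuming plain POSIX cp and mv: copy to a temporary name in the same directory, then rename it over the original, since a rename replaces the target in one step. The function name recopy_in_place and the ".recopy.$$" suffix are hypothetical, and unlike lfs_migrate this sketch does not check whether the file was modified while it was being copied.

```shell
#!/bin/sh
# Re-copy a file so its data is written out again, without ever leaving
# the path missing. Hypothetical helper, not part of Lustre.
recopy_in_place() {
    src=$1
    tmp="$src.recopy.$$"        # temporary name in the same directory

    # Copy first. If the copy fails (for example, no space left),
    # remove the partial copy and leave the original untouched.
    if ! cp -p -- "$src" "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi

    # Rename over the original. There is no separate "rm" step, so an
    # interruption leaves either the old file or the new one in place.
    mv -f -- "$tmp" "$src"
}

# Usage, one path per line as in the list file from the original post:
#   while IFS= read -r path; do recopy_in_place "$path"; done < ost000c_raw.txt
```

If the run is interrupted, the worst case is a leftover ".recopy" temporary file next to an intact original, which can be deleted by hand.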
Alastair Ferguson
2013-Jun-17 23:56 UTC
Re: lctl --device XX deactivate doesn't make OST read only
Awesome, thanks! Doing a:

    lfs find /data -obd AC3-OST000a_UUID -size +4G | lfs_migrate -y

now (as per a man page I found).
Alastair Ferguson
2013-Jun-18 05:10 UTC
Re: lctl --device XX deactivate doesn't make OST read only
Update: It keeps on showing about 60MB of spare space, but the percentage is now 94% instead of 100%. On a 15TB OST that should be approximately 800GB free. Does it take a while for the free MB figure to change after this procedure?
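The estimate in this update can be checked with a line of arithmetic. The sketch below uses the 14.3T size that lfs df -h reported earlier in the thread together with the 94% figure. The function name expected_free_gb is hypothetical, and treating 1T as 1024G is an assumption. It only checks the arithmetic and says nothing about reserved space or how lfs df itself derives Use%.

```shell
#!/bin/sh
# Expected free space on an OST, given its size and reported Use%.
# Hypothetical helper for checking the numbers quoted in this thread.
expected_free_gb() {
    # $1 = OST size in TB, $2 = reported use percentage
    awk -v size_tb="$1" -v use_pct="$2" \
        'BEGIN { printf "%.0f\n", size_tb * 1024 * (100 - use_pct) / 100 }'
}

# 14.3T OST reported at 94% used
expected_free_gb 14.3 94
```

A result in the hundreds of GB next to a reported tens of MB available means one of the two figures was misread or is stale.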
Alastair Ferguson
2013-Jun-18 23:14 UTC
Re: lctl --device XX deactivate doesn't make OST read only
Sorry - final update.

It appears that two OSTs are both still at 100% (don't know how I got that wrong) with 40MB of space.

I tried:

    lfs find /data -O AC3-OST000a_UUID -size +20G | lfs_migrate -y

Now getting this:

    /data/smarts/ksc_mq/am/03723.am: llapi_semantic_traverse: Failed to open '/data/home/zzhao/workspace/topical_collocation_model/results/ir_evaluation/sjmn2k_tng/DocLDA/k050-alpha0.10-gamma0.01/GibbsRun-2': No such file or directory (2)
    error: find failed for +20G.
    rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
    rsync: write failed on "/data/smarts/ksc_mq/am/03723.am.tmp.N13504": No space left on device (28)

Also doing:

    lfs_migrate /data/workflow

(8TB in size) and

    lfs_migrate /data/raw

(15TB), and still:

    AC3-OST000a_UUID 14.3T 13.6T 46.1M 100% /data[OST:10]
    AC3-OST0010_UUID 7.2T 6.8T 46.1M 100% /data[OST:16]

We can't run our processes because of the no space on device errors. Help!
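Since it was easy to misread which OSTs were actually full, a small filter over the lfs df -h output can list them directly. This is a sketch that assumes the column layout shown in this thread (UUID, size, used, available, Use%, mount point with an [OST:n] suffix). The function name full_osts and the threshold of 95 are hypothetical.

```shell
#!/bin/sh
# Print the UUID, available space and Use% of every OST whose Use% is at
# or above a threshold. Reads `lfs df -h` output on standard input.
full_osts() {
    # $1 = threshold percentage
    awk -v limit="$1" '
        /\[OST:[0-9]+\]/ {
            pct = $5
            sub(/%$/, "", pct)              # drop the trailing percent sign
            if (pct + 0 >= limit + 0)
                print $1, $4, $5
        }'
}

# Usage:
#   lfs df -h /data | full_osts 95
```

Only lines carrying an [OST:n] mount suffix are considered, so MDT lines and the filesystem summary line are skipped.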
Alastair Ferguson
2013-Jun-19 01:41 UTC
Fwd: lctl --device XX deactivate doesn't make OST read only
Update - lfs df -h is not working correctly. It said I had 44M free 110%, so I did:

    lfs find /data -O AC3-OST0010_UUID -size +20G

It found /data/smarts/ksc_mq/am/03456.am, so I did:

    cp -vp /data/smarts/ksc_mq/am/03456.am /data/smarts/ksc_mq/am/03456.am.bkp

Then, when it had finished:

    rm -f /data/smarts/ksc_mq/am/03456.am
    mv /data/smarts/ksc_mq/am/03456.am.bkp /data/smarts/ksc_mq/am/03456.am

This file was 359GB, therefore lfs df -h HAS TO BE wrong. How can I make it right?