Hello,

I'm still playing with Lustre. I wanted to know what happens if one of the OSTs disappears. So I created a file (it seems that the file is stored on only one OST; is that expected?), unmounted the OST from the metadata server, and tried to read the file. The reading process has been frozen for about 18 hours now. I would expect an I/O error to be returned.

In dmesg I have:

[75510.730390] Lustre: 2111:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages
[76060.424140] LustreError: 11-0: an error occurred while communicating with 195.113.235.225 at tcp. The ost_connect operation failed with -19
[76060.424197] LustreError: Skipped 24 previous similar messages
[76135.381367] Lustre: 2111:0:(import.c:395:import_select_connection()) l_smaug2-OST0002-osc-ffff8104371bbc00: tried all connections, increasing latency to 21s
[76135.381432] Lustre: 2111:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages
[76685.075827] LustreError: 11-0: an error occurred while communicating with 195.113.235.225 at tcp. The ost_connect operation failed with -19
[76685.075885] LustreError: Skipped 24 previous similar messages
[76760.031536] Lustre: 2111:0:(import.c:395:import_select_connection()) l_smaug2-OST0002-osc-ffff8104371bbc00: tried all connections, increasing latency to 51s
[76760.031601] Lustre: 2111:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages
[77309.725917] LustreError: 11-0: an error occurred while communicating with 195.113.235.225 at tcp. The ost_connect operation failed with -19
[77309.725977] LustreError: Skipped 24 previous similar messages
[77384.683624] Lustre: 2111:0:(import.c:395:import_select_connection()) l_smaug2-OST0002-osc-ffff8104371bbc00: tried all connections, increasing latency to 51s
[77384.683689] Lustre: 2111:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages

-- 
Lukáš Hejtmánek

On Fri, Oct 17, 2008 at 10:47:16AM +0200, Lukas Hejtmanek wrote:
> So I created a file (it seems that the file is stored only to one OST, is it
> expected?), I unmounted the OST from the metadata server, I tried to read the
> file and the reading process is frozen (it has been about 18 hours now).

By default, the client just waits for the OST to become available again (it also tries to reach the OST through its failover partners, if any) and does not return any error to the application. This is needed for transparent failover.

The Lustre manual provides more information about this:
http://manual.lustre.org/manual/LustreManual16_HTML/Failover.html#50642999_pgfId-5529

More particularly:
http://manual.lustre.org/manual/LustreManual16_HTML/Failover.html#50642999_pgfId-1287626
http://manual.lustre.org/manual/LustreManual16_HTML/Failover.html#50642999_pgfId-1287643

> I would expect that I/O error will be returned.

If you want to get EIO, you should use the failout mode. This can be set at mkfs time:

    mkfs.lustre ... --param="failover.mode=failout" $dev

or with tunefs.lustre:

    tunefs.lustre --writeconf --param="failover.mode=failout" $dev

Johann
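
As a side note on reading the log: the -19 reported for ost_connect is a negated Linux errno value, and 19 is ENODEV ("No such device"), which would be consistent with the OST no longer being mounted on the server. The following is only a minimal sketch of decoding such a line. The variable names and the small lookup table are made up for illustration and are not part of Lustre; the sample line is copied from the dmesg output in the original report.

```shell
#!/bin/sh
# Sample line copied from the dmesg output in the original report.
line='[76060.424140] LustreError: 11-0: an error occurred while communicating with 195.113.235.225 at tcp. The ost_connect operation failed with -19'

# Extract the operation name and the (positive) errno value from the line.
op=$(printf '%s\n' "$line" | sed -n 's/.*The \([a-z_]*\) operation failed with -\([0-9][0-9]*\).*/\1/p')
code=$(printf '%s\n' "$line" | sed -n 's/.*operation failed with -\([0-9][0-9]*\).*/\1/p')

# Hand-written lookup for a few Linux errno values (illustrative, not complete).
case "$code" in
    5)   name="EIO (I/O error)" ;;
    19)  name="ENODEV (No such device)" ;;
    110) name="ETIMEDOUT (Connection timed out)" ;;
    *)   name="unknown errno" ;;
esac

echo "$op failed with errno $code: $name"
```

With the default failover mode the client keeps retrying after this error rather than passing anything to the application, which matches the repeating "tried all connections" messages; in failout mode the application would be expected to see EIO (errno 5) instead.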