pascal.deveze at bull.net
2009-Jun-05 15:34 UTC
[Lustre-discuss] Réf. : Re: Réf. : Re: [ ROMIOReq #940] a new Lustre ADIO driver]
Rob,

> i_noncontig and noncontig require fcntl() locks, which Lustre supports
> only if you mount with a special mount option (I don't remember what
> that is). Was that the cause for the 'abort' (it should be pretty
> clear from the error messages).

mount | grep lustre:
XX.XX.XX.XX@tcp:/romio on /mnt/romio type lustre (rw,user_xattr,acl,flock)

Error message (same for i_noncontig and noncontig):
rank 0 in job 180 inti12_52233 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

> If coll_test passed, that's very good progress.
>
> Can you tell me more about the noncontig_coll and noncontig_coll2 test
> failures?

noncontig_coll: the data in the file is wrong
Process 1: buf 1 is 0, should be 5001
Process 1: buf 3 is 0, should be 5003
Process 1: buf 5 is 0, should be 5005
Process 1: buf 7 is 0, should be 5007
Process 1: buf 9 is 0, should be 5009
Process 1: buf 11 is 0, should be 5011
Process 1: buf 13 is 0, should be 5013
Process 1: buf 15 is 0, should be 5015
Process 1: buf 17 is 0, should be 5017
Process 1: buf 19 is 0, should be 5019
Process 1: buf 21 is 0, should be 5021
Process 1: buf 23 is 0, should be 5023
.............

noncontig_coll2:
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(476)..................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(82)...................:
MPIC_Sendrecv(164).................:
MPIC_Wait(405).....................:
MPIDI_CH3I_Progress(149)...........:
MPID_nem_mpich2_blocking_recv(1074):
MPID_nem_tcp_connpoll(1667)........:
state_commrdy_handler(1517)........:
MPID_nem_tcp_recv_handler(1413)....: socket closed
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(82).............:
MPIC_Sendrecv(158)...........:
MPID_Isend(113)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(29)..........:
MPID_nem_tcp_iSendContig(371): writev to socket failed - Broken pipe
rank 0 in job 183 inti12_52233 caused collective abort of all ranks

I'll investigate more next week.

Pascal
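
P.S. Since the question was whether fcntl() locks are usable on the mount, a quick probe outside of ROMIO might help rule that out. The following is only a rough sketch, not part of the ROMIO test suite: the helper name fcntl_locks_work is made up for illustration, and it assumes Python is available on the client node. It creates a scratch file in the given directory and tries to take and release a non-blocking byte-range write lock on it. On a file system where fcntl() locks are not enabled, the lock call would be expected to raise OSError, in which case the helper returns False.

```python
import fcntl
import os
import tempfile


def fcntl_locks_work(directory):
    """Return True if an fcntl() byte-range lock can be taken and released
    on a scratch file created in `directory` (hypothetical helper)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # lockf() is implemented on top of fcntl() record locks.
            # LOCK_NB makes the call fail immediately instead of blocking.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1024, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1024, 0, os.SEEK_SET)
            return True
        except OSError:
            # The file system refused the lock request.
            return False
    finally:
        os.close(fd)
        os.unlink(path)
```

With the mount shown above, the call would be fcntl_locks_work("/mnt/romio"). A True result would only say that a single uncontended lock succeeds on that client; it would not by itself explain the signal 9 abort.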