Dear all,

I have read the Lustre manual roughly, but I am still wondering about the MDS operation flow:

1) Clients get the layout and capabilities from the MDS, then do IO operations. When a client modifies the stripes, the modification time (mtime) should be updated. Who (the clients or the OSSs) sends the update request to the MDS, and when (after closing the file or after the modification)?

2) I have read several documents, but they disagree about unlinking a file: after all stripes are unlinked, who sends the message that all stripes have been unlinked? One says it is the client that wants to delete the file (in the manual), but another says it is the OSTs (in the Xyratex Lustre Architecture Priorities Overview).

3) During creation, the MDS may ask the OSSs to allocate the available stripes, then the clients can write or append data. But who keeps the information about the available space in a stripe? For example, when all stripes are used up and another new stripe (or stripes) is needed, how do things go in this situation? (Who first finds that there is no available space and then sends a request to the MDS to allocate a new stripe?)

4) Is the opened_file list kept in the active MDS's memory?

Maybe my description is involved. Could anyone give me some hints?

Thank you very much,
Best regards,
Liao
On 2011-07-01, at 12:54 PM, Jianwei Liao <liaotoad at 163.com> wrote:
> I have read the Lustre manual roughly, but I am still wondering about
> the MDS operation flow:
> 1) Clients get the layout and capabilities from the MDS, then do IO
> operations. When a client modifies the stripes, the modification
> time (mtime) should be updated. Who (the clients or the OSSs) sends
> the update request to the MDS, and when (after closing the file or
> after the modification)?

The mtime value is distributed over all of the servers that store part of the file. When clients write data to an OST (as each RPC is sent), the mtime+ctime from the client is stored on that OST object. If the client is setting timestamps on the file, the mtime+ctime from the client is stored on the MDT inode.

When the client is retrieving the mtime, it accesses the MDT inode and all OST objects where the file is striped and uses the mtime on the node with the newest ctime.

> 2) I have read several documents, but they disagree about unlinking a
> file: after all stripes are unlinked, who sends the message that all
> stripes have been unlinked? One says it is the client that wants to
> delete the file (in the manual), but another says it is the OSTs (in
> the Xyratex Lustre Architecture Priorities Overview).

The manual is correct. Don't be confused by the Architectural Priorities document. That only contains features which do not exist yet.

> 3) During creation, the MDS may ask the OSSs to allocate the
> available stripes, then the clients can write or append data. But
> who keeps the information about the available space in a stripe?

The client and OST continually negotiate how much space is available for the client to keep in its writeback cache, so that the client does not cache unwritten data that cannot fit on the OST. That is called "space grant".

> For example, when all stripes are used up and another new stripe (or
> stripes) is needed, how do things go in this situation? (Who first
> finds that there is no available space and then sends a request to
> the MDS to allocate a new stripe?)

There is currently no mechanism for adding more stripes to a file if one of the OSTs is full. Each file can write to a stripe until the OST has no more space. This is unlike other distributed filesystems where a "stripe" is a fixed or maximum sized unit of space.

> 4) Is the opened_file list kept in the active MDS's memory?

Yes. There is one such list per client.

Cheers, Andreas
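The mtime rule described in the reply above (use the mtime from whichever object has the newest ctime) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Timestamps` type and the `merged_mtime` function are hypothetical names, not Lustre APIs, and timestamps are reduced to plain integers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Timestamps:
    """mtime/ctime pair stored on one server-side object (hypothetical type)."""
    source: str  # e.g. "MDT inode" or "OST object 0"
    mtime: int   # seconds since the epoch
    ctime: int


def merged_mtime(mdt: Timestamps, ost_objects: Iterable[Timestamps]) -> Timestamps:
    """Return the timestamps of the object with the newest ctime."""
    return max([mdt, *ost_objects], key=lambda t: t.ctime)


# Case 1: the last change was a data write to OST object 0.
mdt = Timestamps("MDT inode", mtime=1000, ctime=1000)
osts = [
    Timestamps("OST object 0", mtime=1200, ctime=1200),
    Timestamps("OST object 1", mtime=1100, ctime=1100),
]
print(merged_mtime(mdt, osts))

# Case 2: the client later set the mtime back to 500 explicitly.
# The MDT inode now has the newest ctime, so its (older) mtime wins.
mdt_after_set = Timestamps("MDT inode", mtime=500, ctime=1300)
print(merged_mtime(mdt_after_set, osts))
```

The second case shows why the comparison is on ctime rather than on mtime itself: an explicitly set timestamp may be older than the last write, yet it is still the most recent change to the file.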
Dear Dilger,

Thank you for your reply! But something is still unclear to me.

On 07/03/2011 08:54 AM, Andreas Dilger wrote:
> On 2011-07-01, at 12:54 PM, Jianwei Liao <liaotoad at 163.com> wrote:
>> I have read the Lustre manual roughly, but I am still wondering about
>> the MDS operation flow:
>> 1) Clients get the layout and capabilities from the MDS, then do IO
>> operations. When a client modifies the stripes, the modification
>> time (mtime) should be updated. Who (the clients or the OSSs) sends
>> the update request to the MDS, and when (after closing the file or
>> after the modification)?
> The mtime value is distributed over all of the servers that store part of the file. When clients write data to an OST (as each RPC is sent), the mtime+ctime from the client is stored on that OST object. If the client is setting timestamps on the file, the mtime+ctime from the client is stored on the MDT inode.
>
> When the client is retrieving the mtime, it accesses the MDT inode and all OST objects where the file is striped and uses the mtime on the node with the newest ctime.
>
>> 2) I have read several documents, but they disagree about unlinking a
>> file: after all stripes are unlinked, who sends the message that all
>> stripes have been unlinked? One says it is the client that wants to
>> delete the file (in the manual), but another says it is the OSTs (in
>> the Xyratex Lustre Architecture Priorities Overview).
> The manual is correct. Don't be confused by the Architectural Priorities document. That only contains features which do not exist yet.
>
>> 3) During creation, the MDS may ask the OSSs to allocate the
>> available stripes, then the clients can write or append data. But
>> who keeps the information about the available space in a stripe?
> The client and OST continually negotiate how much space is available for the client to keep in its writeback cache, so that the client does not cache unwritten data that cannot fit on the OST. That is called "space grant".
>
>> For example, when all stripes are used up and another new stripe (or
>> stripes) is needed, how do things go in this situation? (Who first
>> finds that there is no available space and then sends a request to
>> the MDS to allocate a new stripe?)
> There is currently no mechanism for adding more stripes to a file if one of the OSTs is full. Each file can write to a stripe until the OST has no more space. This is unlike other distributed filesystems where a "stripe" is a fixed or maximum sized unit of space.
>

It seems that this answer conflicts with the previous one. If the client does not cache unwritten data that cannot fit on the OST, how can the file keep writing to a stripe until the OST has no more space? And is this stripe the last one in the layout?

>> 4) Is the opened_file list kept in the active MDS's memory?
> Yes. There is one such list per client.
>
> Cheers, Andreas

Best regards,
Liao
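One possible reading of the two quoted answers is that the grant only limits how much unwritten data a client may hold in its cache, while the file itself can keep growing until the OST is full. The toy model below illustrates that reading. It is a sketch, not Lustre's implementation: the `Ost` and `Client` classes and their methods are hypothetical, and treating a write that cannot obtain a grant as an immediate ENOSPC is a simplifying assumption.

```python
class Ost:
    """Toy OST that tracks free space and space promised to client caches."""

    def __init__(self, free_bytes):
        self.free = free_bytes  # bytes not yet written to disk
        self.granted = 0        # bytes of free space promised to clients

    def request_grant(self, want):
        # Never promise more than the space that is still unpromised.
        give = max(0, min(want, self.free - self.granted))
        self.granted += give
        return give

    def commit(self, nbytes):
        # Cached data arrives; it consumes both the promise and the space.
        self.granted -= nbytes
        self.free -= nbytes


class Client:
    """Toy client that caches a write only if it holds enough grant."""

    def __init__(self, ost):
        self.ost = ost
        self.grant = 0  # grant held but not yet used
        self.dirty = 0  # cached, unwritten bytes

    def write(self, nbytes):
        if self.grant < nbytes:
            self.grant += self.ost.request_grant(nbytes - self.grant)
        if self.grant < nbytes:
            return "ENOSPC"  # simplification: no grant means no space
        self.grant -= nbytes
        self.dirty += nbytes
        return "cached"

    def flush(self):
        self.ost.commit(self.dirty)
        self.dirty = 0


ost = Ost(free_bytes=100)
client = Client(ost)
print(client.write(60))  # fits within the grant, so it is cached
print(client.write(60))  # only 40 bytes can still be granted
print(client.write(40))  # uses the 40 bytes of grant already held
client.flush()
print(ost.free)          # the stripe grew until the OST was full
```

In this model the cached data never exceeds what the OST can store, yet nothing caps the size of the stripe other than the OST's remaining space.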