On 2015/1/27 15:08, Srinivas Eeda wrote:
> Hi Yangwenfang,
>
> thank you very much for initiating this RFC :). This feature is long overdue
> for OCFS2 and we are also interested in implementing it. Wengang (cc'ed) has
> been analysing it and attempting an implementation. We haven't looked at
> splitting and merging range locks yet, but we have looked at lock fairness
> and range locking. Wengang has made some of the dlm changes to see how it can
> be done, but the other changes are still work in progress. We will email more
> details in the coming few days.
>
> Since you are also looking into it, it would be great if we could collaborate
> on this feature. Can you please share more info on the demo code you
> mentioned? For example, what does it do and how much work has been done on it?
>
Hi,
About 6k lines of code were modified in our demo, including dlmglue and dlm.
Code modifications:
1. read/write IO: get the range (start, end) and call ocfs2_range_lock.
2. dlmglue:
   modify the key data structure: each inode has one ocfs2_lock_res holding
   many range locks, each covering a different range.
   detect conflicts between multiple threads within the node.
   manage the cache of range locks to support delayed unlock.
3. dlm:
   detect conflicts between multiple nodes.
   add splitting and merging of range locks.
4. lib: interval tree.

> One of the things we considered was making the rw lock itself support range
> locking, which is a different approach from what you mentioned. Is there any
> reason why the rw lock cannot be used and we need a new ip_range_lock_lockres?
>
The RW lock can be used, but adding the feature to rw_lock is complicated
because the RW lock is also used in read/write/truncate.
Byte range lock is only beneficial for update writes, so I only modified write
IO in the demo to get performance results as soon as possible.
I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) is equivalent to
ocfs2_rw_lock(ex). Am I right?

> Thanks,
> --Srini
>
>
> On 01/26/2015 04:28 AM, yangwenfang wrote:
>> What:
>> A byte range lock locks a region of a file, rather than the whole file, to
>> speed up concurrent reading/writing.
>>
>> Why:
>> Currently ocfs2 does not support byte range locks. Since multiple nodes
>> may concurrently update/write at different positions of the same file
>> in database workloads, the performance (tpmC) of DB+ocfs2 is much poorer
>> than that of DB+GPFS when running TPC-C.
>> To improve the efficiency of parallel accesses to the same file, we have
>> implemented a demo of the range lock feature, which is already supported
>> by Lustre and GPFS, so that a file can be updated by different nodes in
>> the cluster when they are accessing different blocks.
>>
>> How:
>> Key issues in design and implementation:
>> 1. In ocfs2, each file has only one lock, which cannot distinguish
>> different positions in the file.
>> One solution is to add a range field (start, end) to a lock. For example:
>>
>> ocfs2_lock_res(N1)              dlm_lock_resource(Master)  ocfs2_lock_res(N2)
>> ocfs2_res_range_lock(0,9)   ----dlm_lock(0,9)   N1
>>                                 dlm_lock(10,19) N2   <--ocfs2_res_range_lock(10,19)
>> ocfs2_res_range_lock(20,29) ----dlm_lock(20,29) N1
>>                                 dlm_lock(30,49) N2   <--ocfs2_res_range_lock(30,49)
>> ocfs2_res_range_lock(50,59) ----dlm_lock(50,59) N1
>>                                 dlm_lock(60,69) N2   <--ocfs2_res_range_lock(60,69)
>>
>> Each lock resource uses an interval tree to manage the ranges, which
>> supports basic operations like add, delete, insert, find, split and merge.
>> The most important issue is to determine whether ranges conflict.
>> Conflict-free ranges of the same file can be accessed concurrently.
>> Otherwise, a node must wait for the conflicting lock to be released
>> before accessing that range of the file.
>>
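To make the conflict check concrete, here is a minimal sketch in Python (a
model only, not the demo code). It assumes inclusive byte ranges, the usual
PR/EX compatibility (PR is compatible with PR, EX with nothing), and that
locks held by the same node never conflict at the dlm level. The RFC does not
spell out the compatibility matrix or the levels in the diagram, so those
parts, and the names RangeLock, overlaps, conflicts and find_conflicts, are
assumptions for illustration. A linear scan stands in for the interval tree.

```python
from dataclasses import dataclass

PR, EX = "PR", "EX"  # protected-read and exclusive levels


@dataclass(frozen=True)
class RangeLock:
    node: str   # holder, e.g. "N1"
    start: int  # first byte, inclusive
    end: int    # last byte, inclusive
    level: str  # PR or EX


def overlaps(a, b):
    """True if the two inclusive ranges share at least one byte."""
    return a.start <= b.end and b.start <= a.end


def conflicts(a, b):
    """Assumed rule: different nodes, overlapping ranges, at least one EX."""
    if a.node == b.node:
        return False
    return overlaps(a, b) and EX in (a.level, b.level)


def find_conflicts(granted, request):
    """Linear scan; an interval tree would answer the same query faster."""
    return [g for g in granted if conflicts(g, request)]


# Granted locks modelled on the diagram above (EX level is an assumption).
granted = [
    RangeLock("N1", 0, 9, EX),
    RangeLock("N2", 10, 19, EX),
    RangeLock("N1", 20, 29, EX),
]
# N2 asks for (5,12) EX: only N1's (0,9) lock stands in the way.
blocked = find_conflicts(granted, RangeLock("N2", 5, 12, EX))
```

With this rule, a request for a range that no other node holds, such as
(30,49) by N2, finds no conflicts and can be granted immediately.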
>> Byte range lock supports split and merge rules: for the same level, the
>> larger scope wins; for different levels, write > read (if a node holds an
>> EX lock on range (start, end), then it also holds a PR range lock on
>> (start, end)).
>> For example:
>> (1) merge: N1 holds range locks (0,9) PR and (5,19) PR; they are merged
>> into (0,19) PR;
>> (2) merge: N1 holds range locks (0,9) PR and (5,19) EX; the merged locks
>> become (0,19) PR and (5,19) EX;
>> (3) split: N1 holds range lock (0,9) PR and N2 tries to lock (0,5) PR; N1
>> should split its lock and keep (6,9) PR.
>>
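The three examples above can be reproduced with a small Python sketch (again
a model, not the demo code). It treats ranges as inclusive (start, end)
tuples, merges only overlapping ranges (whether adjacent ranges should also
merge is not stated in the RFC), and applies the "EX implies PR" rule when
merging. The helper names merge_ranges, merge_held and split_range are
hypothetical.

```python
def merge_ranges(ranges):
    """Union overlapping inclusive (start, end) ranges, sorted by start."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merge_held(locks):
    """locks: list of (start, end, level) held by one node.

    EX coverage is the union of the EX ranges; PR coverage is the union
    of all ranges, because holding EX implies holding PR.
    """
    ex = merge_ranges([(s, e) for s, e, level in locks if level == "EX"])
    pr = merge_ranges([(s, e) for s, e, _ in locks])
    return {"PR": pr, "EX": ex}


def split_range(held, taken):
    """Parts of the inclusive range `held` left after giving up `taken`."""
    held_start, held_end = held
    taken_start, taken_end = taken
    if taken_end < held_start or taken_start > held_end:
        return [held]  # no overlap, nothing to give up
    parts = []
    if taken_start > held_start:
        parts.append((held_start, taken_start - 1))
    if taken_end < held_end:
        parts.append((taken_end + 1, held_end))
    return parts


example1 = merge_held([(0, 9, "PR"), (5, 19, "PR")])
example2 = merge_held([(0, 9, "PR"), (5, 19, "EX")])
example3 = split_range((0, 9), (0, 5))
```

Here example1, example2 and example3 correspond to cases (1), (2) and (3):
one PR range (0,19); PR (0,19) plus EX (5,19); and the remaining range (6,9).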
>> 2. In ocfs2, there are only three types of lock resources: rw, inode and
>> open, which protect different contents.
>> We need to add another lock resource (ip_range_lock_lockres) to protect
>> different ranges in the IO read/write process.
>> For example, buffered read/write:
>> (1) ocfs2_file_aio_write ------------> ocfs2_file_aio_write
>>       ocfs2_rw_lock(ex)                  ocfs2_rw_lock(pr)
>>                                          ocfs2_range_lock(start, end, ex)
>>       ocfs2_write_begin
>>         ocfs2_inode_lock(ex)               ocfs2_inode_lock(pr)
>>                                            if append, upgrade to ex;
>> (2) ocfs2_file_aio_read -------------> no need to change.
>>       ocfs2_readpage
>>         ocfs2_inode_lock(pr)
>> (3) but read-ahead is a problem:
>>     ocfs2_readpages -----------------> ocfs2_readpages
>>       ocfs2_inode_lock(pr)               ocfs2_inode_lock(pr)
>>                                          ocfs2_range_lock(start, end, pr)
>>
>> Limitations based on our assumptions:
>> 1. Byte range lock is only beneficial for update writes.
>> 2. Delayed unlock leads to too many locks.
>> 3. Significant source code modification is required, involving almost the
>> whole dlmglue and dlm modules.
>>
>> As described above, there are also many limitations based on our
>> assumptions.
>> Many thanks for any advice.
>>
>> thanks.
>>