Hi all,

This is a follow-up to last month's thread "One or two OSS, no difference?". In that thread, Andreas stated:

"There is work currently underway to improve the SMP scaling performance for the RPC handling layer in Lustre. Currently that limits the delivered RPC rate to 10-15k/sec or so."

My question is: does the limitation also apply to RDMA over IB? By SMP, I assume Andreas was talking about the CPUs, right? Since RDMA can bypass the host CPU, does that mean it can also bypass the limitation?

Thanks,
Jiahua
Yes, the RPC rate is limited to roughly that rate by locking in the Lustre code, even with RDMA.

Kevin
You mean it is inherent in the code? Can you point me to the actual code, if possible? I am just curious why. Any pointers or hints would be appreciated.

Thanks,
Jiahua
The story is roughly this: if you have to take dozens of global locks over the lifetime of an RPC, then the code can't scale well on a large SMP system, no matter what kind of network you are using, so the problem is scattered everywhere.

Also, we are trying to reduce RPC bouncing between CPUs. In the current code, a request can be received by CPU A, then queued on CPU B, processed by CPU C, and replied to by CPU D; that is very bad on a large SMP system because of the data traffic between CPUs.

Regards,
Liang
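To make the contention point concrete, here is a toy sketch in user-space C (not actual Lustre code; all names are made up for illustration): a single request queue protected by one global lock, which every service thread on every CPU must take to enqueue or dequeue a request, so the lock, not the network, becomes the ceiling on the delivered RPC rate.

/*
 * Toy sketch, not Lustre code: one global queue, one global lock.
 * Every enqueue (network receive) and dequeue (service thread) takes
 * the same lock, so adding CPUs mostly adds contention on it.
 */
#include <pthread.h>
#include <stdlib.h>

struct rpc_request {
        struct rpc_request *next;
};

static pthread_mutex_t     queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rpc_request *queue_head;          /* single global queue */

/* Called from the network receive path to hand a request to the service. */
static void rpc_enqueue(struct rpc_request *req)
{
        pthread_mutex_lock(&queue_lock);        /* global lock, taken per RPC */
        req->next = queue_head;
        queue_head = req;
        pthread_mutex_unlock(&queue_lock);
}

/* Every service thread, on every CPU, contends on the same lock. */
static void *service_thread(void *arg)
{
        (void)arg;
        for (;;) {                              /* busy-waits, for brevity */
                struct rpc_request *req;

                pthread_mutex_lock(&queue_lock);  /* same global lock again */
                req = queue_head;
                if (req != NULL)
                        queue_head = req->next;
                pthread_mutex_unlock(&queue_lock);

                if (req != NULL)
                        free(req);              /* "handle" and "reply" here */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[8];
        int i;

        for (i = 0; i < 8; i++)
                pthread_create(&tid[i], NULL, service_thread, NULL);

        for (i = 0; i < 100000; i++)
                rpc_enqueue(calloc(1, sizeof(struct rpc_request)));

        pthread_join(tid[0], NULL);             /* sketch: threads never exit */
        return 0;
}

A real server takes many such shared locks in kernel code over the life of each RPC; the sketch compresses them into one mutex only to show why the network type (RDMA or not) does not change the ceiling.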
Thanks for your answers! A few more questions:

* Do you only lock for writes? What if I only read? Do you still lock even for simultaneous reads?
* Is the limitation system-wide or per server? That is, can I improve performance by adding more OSSs or OSTs?
* By RPC bouncing, are you talking about the Linux storage stack? It is not inherent to Lustre, right?

Thanks,
Jiahua
Sorry to send it again! Can anyone help?

Jiahua
Jiahua wrote:
> * Do you only lock for writes? What if I only read? Do you still lock
> even for simultaneous reads?

"Lock" here means operating-system synchronization inside the Lustre code, not a DLM lock.

> * Is the limitation system-wide or per server? That is, can I
> improve performance by adding more OSSs or OSTs?

The SMP improvements are about the performance of handling small RPCs, so they mostly matter for metadata performance, or for I/O performance on NUMA systems. They are about fully driving a single machine, not about the scalability of the whole cluster.

> * By RPC bouncing, are you talking about the Linux storage stack? It
> is not inherent to Lustre, right?

It is about the Lustre stack.
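For the bouncing point, here is a companion sketch (again user-space C, not Lustre code, with made-up names) of the general direction being described: give each CPU its own queue and its own lock, and pin one service thread per CPU, so a request received on CPU N is also queued, handled, and replied to on CPU N instead of wandering across CPUs A, B, C, and D.

/*
 * Companion sketch, not Lustre code: one queue and one lock per CPU,
 * with each service thread pinned to its CPU, so a request stays on
 * the CPU it arrived on instead of bouncing between CPUs.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define NCPU 4                                  /* assumed CPU count */

struct rpc_request {
        struct rpc_request *next;
};

static struct cpu_queue {
        pthread_mutex_t     lock;               /* per-CPU lock, no global one */
        struct rpc_request *head;
} queues[NCPU];

/* The receive path enqueues on the CPU the request arrived on. */
static void rpc_enqueue(int cpu, struct rpc_request *req)
{
        struct cpu_queue *q = &queues[cpu];

        pthread_mutex_lock(&q->lock);
        req->next = q->head;
        q->head = req;
        pthread_mutex_unlock(&q->lock);
}

/* One service thread per CPU, pinned so handling stays local. */
static void *service_thread(void *arg)
{
        int cpu = (int)(long)arg;
        struct cpu_queue *q = &queues[cpu];
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        for (;;) {                              /* busy-waits, for brevity */
                struct rpc_request *req;

                pthread_mutex_lock(&q->lock);
                req = q->head;
                if (req != NULL)
                        q->head = req->next;
                pthread_mutex_unlock(&q->lock);

                if (req != NULL)
                        free(req);              /* handle + reply on this CPU */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NCPU];
        long i;

        for (i = 0; i < NCPU; i++) {
                pthread_mutex_init(&queues[i].lock, NULL);
                pthread_create(&tid[i], NULL, service_thread, (void *)i);
        }

        for (i = 0; i < 100000; i++)
                rpc_enqueue((int)(i % NCPU),
                            calloc(1, sizeof(struct rpc_request)));

        pthread_join(tid[0], NULL);             /* sketch: threads never exit */
        return 0;
}

This per-CPU partitioning is only a sketch of the idea, not a claim about how the actual SMP scaling work was implemented; as noted above, it is about how hard a single server can be driven, not about cluster-wide scalability.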