Hi,

I'm not sure if dtrace-discuss is a good place to put this, but I'll run it through this group anyway.

Solaris doesn't seem to like the way SunDS 5.2 uses poll(2). What I'm seeing is that SunDS inserts new pollfd_t entries near the head of the array it passes to poll(2). The problem is that this causes Solaris to invalidate the existing pcacheset even though the pcacheset is only "shifted". I got this answer by watching pcache_cmp(). I can supply the D script if anyone is interested.

On my system, this resulted in pollsys() running in 2 distinct time buckets: 4-8ms vs 62-200us of system time. This is due to ns-slapd polling 1k+ fds. The more expensive bucket occurs >50% of the time. Even then, we are looking at a little over 5% CPU time on a 900MHz US3, but it gets worse as the connection count increases.

To get around this, either Solaris needs a more intelligent (expensive!) pcache*() or SunDS needs to change the way it maintains its pollfd_t[]. The latter seems easier. In any case, the lesson here is that poll(2) is extremely sensitive to the way you expand pollfd_t[].

BTW, is there a way to know which fds fired when poll(2) returns? We can of course watch any subsequent read/write(2) calls, but I would appreciate something closer.

Finally, I'm declaring my addiction to DTrace & OpenSolaris. I wouldn't know where to start tracking the problem, much less get this close in half a day. Great tools!!! Now, if I can have my hands on the SunDS code as well...

--
Just me,
Wire ...
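To illustrate the "change the way it maintains its pollfd_t[]" option, here is a minimal C sketch of an array that never shifts existing entries: new fds reuse a free slot or are appended at the tail, and removed fds are marked with a negative fd, which poll(2) ignores. The names pollset_t, pollset_add(), pollset_remove() and POLLSET_MAX are hypothetical and are not taken from SunDS or Solaris. Whether this layout actually avoids the pcacheset invalidation would have to be confirmed with the same pcache_cmp() tracing.

```c
#include <assert.h>
#include <poll.h>

#define POLLSET_MAX 4096        /* hypothetical fixed capacity for this sketch */

/* Hypothetical container; not taken from SunDS or Solaris sources. */
typedef struct pollset {
    struct pollfd fds[POLLSET_MAX];
    nfds_t        nfds;         /* number of slots handed to poll(2) */
} pollset_t;

/*
 * Add fd without moving any existing entry: reuse the first free slot
 * (fd < 0) if there is one, otherwise append at the tail.
 * Returns the slot index, or -1 when the set is full.
 */
static int pollset_add(pollset_t *ps, int fd, short events)
{
    nfds_t i;

    for (i = 0; i < ps->nfds; i++) {
        if (ps->fds[i].fd < 0)
            break;              /* found a previously released slot */
    }
    if (i == ps->nfds) {
        if (ps->nfds == POLLSET_MAX)
            return -1;
        ps->nfds++;             /* grow at the tail only */
    }
    ps->fds[i].fd = fd;
    ps->fds[i].events = events;
    ps->fds[i].revents = 0;
    return (int)i;
}

/*
 * Remove an entry by marking its slot unused instead of compacting the
 * array; poll(2) ignores entries whose fd is negative.
 */
static void pollset_remove(pollset_t *ps, int slot)
{
    assert(slot >= 0 && (nfds_t)slot < ps->nfds);
    ps->fds[slot].fd = -1;
    ps->fds[slot].events = 0;
    ps->fds[slot].revents = 0;
}
```

The linear search for a free slot is only there to keep the sketch short; a real server would more likely keep a free list. The point is that every surviving fd stays at the same index from one poll(2) call to the next.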
On Tue, Aug 30, 2005 at 08:42:47PM +0800, Wee Yeh Tan wrote:
> Hi,
>
> I'm not sure if dtrace-discuss is a good place to put this but I'll
> run it through this group anyway.
>
> Solaris doesn't seem to like the way SunDS 5.2 uses poll(2). What I'm
> seeing is that SunDS inserts new pollfd_t near the head of the array
> it passes to poll(2). The problem is that this causes Solaris to
> invalidate the existing pcacheset even though pcacheset is only
> "shifted". I got this answer by watching pcache_cmp(). I can supply
> the D-script if anyone is interested.

That seems wasteful, in any case; doesn't it have to do a memmove() or something to shift the array?

> On my system, this resulted in pollsys() running in 2 distinct time
> buckets of 4-8ms vs 62-200us system time. This is due to the ns-slapd
> polling 1k+ fds. The more expensive bucket is happening >50% of the
> time. Even then, we are looking at a little over 5% CPU time on a
> 900MHz US3 but it gets worse as the connection count increases.
>
> To get around this, either Solaris needs to get a more intelligent
> (expensive!) pcache*() or SunDS needs to change the way it maintains
> its pollfd_t[]. The latter seems easier. In any case, the lesson
> here is that poll(2) is extremely sensitive to the way you expand
> pollfd_t[].

<nod>

> btw, is there a way to know which fds fired when poll(2) returns. We
> can of course watch any subsequent read/write(2) calls but I will
> appreciate something closer.

file descriptor of interest. The array's members are pollfd structures, which contain the following members:

    int fd;         /* file descriptor */
    short events;   /* requested events */
    short revents;  /* returned events */

The 'revents' field will be non-zero for the fd if it had events.

Or are you trying to do this from dtrace?

Cheers,
- jonathan

> Finally, I'm declaring my addiction to DTrace & OpenSolaris. I
> wouldn't know where to start tracking the problem much less get this
> close in half a day. Great tools!!! Now, if I can have my hands on
> SunDS codes as well...
>
> --
> Just me,
> Wire ...

--
Jonathan Adams, Solaris Kernel Development
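From user code, the 'revents' check described above amounts to one pass over the array after poll(2) returns. A minimal C sketch follows; collect_ready() and its parameters are hypothetical names chosen for illustration. It stops early once it has seen as many entries as poll(2) reported, which helps when only a few of 1k+ fds fired.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/*
 * Copy the descriptors that reported events into ready[] (capacity max).
 * nready is the value poll(2) returned, i.e. the number of entries with
 * a non-zero revents. Returns how many fds were stored in ready[].
 */
static int collect_ready(const struct pollfd *fds, nfds_t nfds, int nready,
                         int *ready, int max)
{
    int found = 0;

    assert(fds != NULL && ready != NULL);
    for (nfds_t i = 0; i < nfds && found < nready && found < max; i++) {
        /* Skip unused slots (fd < 0) and entries with no events. */
        if (fds[i].fd >= 0 && fds[i].revents != 0)
            ready[found++] = fds[i].fd;
    }
    return found;
}
```

A caller would pass the return value of poll(2) as nready and then inspect the revents bits (POLLIN, POLLOUT, POLLHUP, ...) of each reported fd as needed.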
On 8/31/05, Jonathan Adams <jonathan.adams at sun.com> wrote:
> That seems wasteful, in any case; doesn't it have to do a memmove() or
> something to shift the array?

Exactly! My guess is that SunDS is more concerned with very short-lived connections and maintains a queue-like structure for them.

> file descriptor of interest. The array's members are pollfd
> structures, which contain the following members:
>
>     int fd;         /* file descriptor */
>     short events;   /* requested events */
>     short revents;  /* returned events */
>
> The 'revents' field will be non-zero for the fd if it had events.
>
> Or are you trying to do this from dtrace?

Yep, I'm hoping to do it by watching some events, perhaps pollwakeup() or something, rather than constructing some sort of "loop".

--
Just me,
Wire ...