Ian Pratt
2006-Jun-15 07:06 UTC
[Xen-tools] RE: [Xen-devel] Hi, something about the xentrace tool
> If overflow occurs, it is not handled. The mechanism I implemented was
> just designed to drastically reduce the probability of overflow.

It does count the number of lost trace messages and add a trace message
to that effect though, right?

Thanks,
Ian

> Currently, the trace buffer "high water" mark is set to 50%. That is,
> when the hypervisor trace buffer becomes 1/2 full, it sends a soft
> interrupt to wake up xenbaked from its blocking select(). If nobody
> wakes up to read trace records from the trace buffer, I take that to
> mean that nobody cares about the trace records. When somebody does care,
> they will read those records in a timely manner. Obviously, the
> hypervisor cannot "block" if there is no room in the trace buffers; in
> this case, new trace records simply overwrite old ones, and the old ones
> are lost.
>
> If you encounter a situation where trace records are being generated too
> fast, and fill up the trace buffer too quickly, then the simple next
> step is to increase the size of the trace buffers. So far, use of the
> trace records has not been linked to anything so critical that it's
> necessary to take extraordinary measures to avoid loss of data.
>
> Rob
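For illustration, here is a minimal sketch of the high-water-mark scheme Rob describes: the producer checks how full the ring is after adding a record and notifies the consumer once it crosses half full. The ring layout and notify_consumer() are assumptions made for this sketch, not the actual Xen trace-buffer code (see xen/common/trace.c for the real thing).

/* Minimal sketch of the "high water mark" notification described above.
 * The ring layout and notify_consumer() are hypothetical. */
#include <stdint.h>

struct trace_ring {
    uint32_t prod;      /* next slot the producer will write        */
    uint32_t cons;      /* next slot the consumer will read         */
    uint32_t size;      /* total number of record slots in the ring */
};

static void notify_consumer(void)
{
    /* In Xen this would be the soft interrupt that wakes xenbaked out
     * of its blocking select(); here it is just a stand-in. */
}

static void ring_put(struct trace_ring *r)
{
    uint32_t used;

    /* Old records are simply overwritten on overflow. */
    r->prod = (r->prod + 1) % r->size;
    used = (r->prod - r->cons + r->size) % r->size;

    /* Wake the consumer once the ring is half full. */
    if ( used >= r->size / 2 )
        notify_consumer();
}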
rickey berkeley
2006-Jun-15 08:58 UTC
[Xen-users] Re: [Xen-devel] Hi, something about the xentrace tool
> > If you encounter a situation where trace records are being generated too
> > fast, and fill up the trace buffer too quickly, then the simple next
> > step is to increase the size of the trace buffers. So far, use of the
> > trace records has not been linked to anything so critical that it's
> > necessary to take extraordinary measures to avoid loss of data.
> >
> > Rob

Hi Rob,

Since xentrace can be used as a performance tracing and debugging tool:
you mean that when transferring large amounts of data from kernel space
to user space, xentrace uses its own mechanism to relay the data and
balance the transfer speed, and we can enlarge the buffer size if we
want to save more raw trace data?

Does this mechanism noticeably affect system performance? As we know,
copying huge amounts of raw data from kernel space to user space can
consume a lot of CPU time and system resources.

How about making use of relayfs? It is a standardized way of
transferring large amounts of data from kernel space to user space.

Anyway, it is just an idea.
Ian Pratt wrote:
>> If overflow occurs, it is not handled. The mechanism I implemented was
>> just designed to drastically reduce the probability of overflow.
>
> It does count the number of lost trace messages and add a trace message
> to that effect though, right?

No, but I'll add that to the list of things to do in the future.

Rob
rickey berkeley wrote:
> Does this mechanism noticeably affect system performance? As we know,
> copying huge amounts of raw data from kernel space to user space can
> consume a lot of CPU time and system resources.

I wouldn't call the amount of data "huge". Even on a very busy system,
where there are thousands of trace records being generated every second,
that's still a pretty small amount of data. (The size of a trace record
is something like 50 or 60 bytes.)

Also, the data is not "copied" from kernel space to user space. There is
a shared memory buffer which Xen writes into, and the user app reads out
of. Memory read speeds are currently in the GB/s range. So to answer
your question, I don't think that this mechanism affects system
performance in any significant way.

> How about making use of relayfs? It is a standardized way of
> transferring large amounts of data from kernel space to user space.

If the data were only being transferred between the Linux kernel and a
Linux app, then I'd say yeah, relayfs sounds like a cool thing to do.
However, the trace records are generated by the Xen hypervisor, not the
Linux kernel. The hypervisor doesn't have relayfs (or any fs, for that
matter), so you're stuck with involving the Linux kernel, which would
read stuff from a shared hypervisor buffer and then present the data to
userland via relayfs. That doesn't sound like a better solution than
what we have now.

Rob
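To make the shared-buffer point concrete, here is a rough sketch of what a user-space consumer loop might look like once the trace pages have been mapped into its address space. The ring header, record layout, and field names are assumptions for illustration only; the point is that no data is copied between kernel and user space, the consumer simply reads the mapped memory.

/* Rough sketch of reading records out of a shared-memory trace ring.
 * The ring and record layouts are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

struct rec {                 /* hypothetical trace record */
    uint64_t tsc;
    uint32_t event;
    uint32_t data[5];
};

struct ring {                /* hypothetical ring header  */
    volatile uint32_t prod;
    volatile uint32_t cons;
    uint32_t size;
    struct rec recs[];
};

static void consume(struct ring *r)
{
    /* prod and cons are assumed to be free-running counters; the slot
     * index is taken modulo the ring size. */
    while ( r->cons != r->prod )
    {
        struct rec *rec = &r->recs[r->cons % r->size];
        printf("event %#x at tsc %llu\n",
               rec->event, (unsigned long long)rec->tsc);
        r->cons++;           /* nothing is copied back to the HV */
    }
}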
On 6/15/06, Rob Gardner <rob.gardner@hp.com> wrote:
> I wouldn't call the amount of data "huge". Even on a very busy system,
> where there are thousands of trace records being generated every second,
> that's still a pretty small amount of data. (The size of a trace record
> is something like 50 or 60 bytes.)

For the record, I think the trace record size in the trace buffers is
probably 32 bytes:

struct {
    unsigned long long rdtsc;   /* 8            */
    unsigned long event;        /* + 4 = 12     */
    unsigned long data[5];      /* + (4 * 5) = 32 */
};

The size on disk from xentrace is 36 bytes (it adds 4 bytes for the cpu).

If someone were really worried about copy time, one could write
something which uses raw disks (or, perhaps, the O_DIRECT flag) to DMA
data straight from the buffers to the disk. But I'm not really worried
about it at this point. :-)

Peace,
-George
On 6/15/06, George Dunlap <dunlapg@umich.edu> wrote:
> For the record, I think the trace record size in the trace buffers is
> probably 32 bytes:

On 32-bit architectures, that is... (Sorry for the 32-bit
provincialism... haven't coded on a 64-bit box yet.)

-G
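The difference comes from unsigned long being 4 bytes on 32-bit x86 but 8 bytes on LP64 targets. A quick check using the layout quoted above; the sizes in the comment assume typical ILP32/LP64 ABIs:

/* Quick check of the record size on the current architecture; the
 * layout mirrors the struct quoted earlier in the thread. */
#include <stdio.h>

struct t_rec_like {
    unsigned long long rdtsc;
    unsigned long event;
    unsigned long data[5];
};

int main(void)
{
    /* 32 bytes on ILP32, 56 bytes on LP64 (8 + 8 + 5*8). */
    printf("sizeof(struct t_rec_like) = %zu\n", sizeof(struct t_rec_like));
    return 0;
}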
George Dunlap wrote:
> On 6/15/06, Rob Gardner <rob.gardner@hp.com> wrote:
>> I wouldn't call the amount of data "huge". Even on a very busy system,
>> where there are thousands of trace records being generated every second,
>> that's still a pretty small amount of data. (The size of a trace record
>> is something like 50 or 60 bytes.)
>
> For the record, I think the trace record size in the trace buffers is
> probably 32 bytes:

You're right, I was thinking everything is 64 bits these days. In any
case, it's a small amount of data.

> If someone were really worried about copy time, one could write
> something which uses raw disks (or, perhaps, the O_DIRECT flag) to DMA
> data straight from the buffers to the disk.

Once again, there is no explicit copying of the data between kernel and
user space, so nobody should be worried about it.

Rob
On 6/15/06, Rob Gardner <rob.gardner@hp.com> wrote:
> > If someone were really worried about copy time, one could write
> > something which uses raw disks (or, perhaps, the O_DIRECT flag) to DMA
> > data straight from the buffers to the disk.
>
> Once again, there is no explicit copying of the data between kernel and
> user space, so nobody should be worried about it.

There's no copying from the HV to the xentrace process. But there is
copying from xentrace to the dom0 kernel for the output file. Some
copying is necessary right now, because rather than writing out the
pages verbatim, xentrace writes out the pcpu before writing out each
record:

void write_rec(unsigned int cpu, struct t_rec *rec, FILE *out)
{
    size_t written = 0;
    written += fwrite(&cpu, sizeof(cpu), 1, out);
    written += fwrite(rec, sizeof(*rec), 1, out);
    if ( written != 2 )
    {
        PERROR("Failed to write trace record");
        exit(EXIT_FAILURE);
    }
}

If we wanted to make it zero copy all the way from the HV to the disk,
we could have the xentrace process one stream per cpu, and do
whatever's necessary to use DMA. (Does anyone know if O_DIRECT will
do direct DMA, or if one would have to use a raw disk?)

But I think we all seem to agree, this is not a high priority. :-)

-George
George Dunlap wrote:
> There's no copying from the HV to the xentrace process. But there is
> copying from xentrace to the dom0 kernel for the output file. Some
> copying is necessary right now, because rather than writing out the
> pages verbatim, xentrace writes out the pcpu before writing out each
> record:
>
> void write_rec(unsigned int cpu, struct t_rec *rec, FILE *out)
> {
>     size_t written = 0;
>     written += fwrite(&cpu, sizeof(cpu), 1, out);
>     written += fwrite(rec, sizeof(*rec), 1, out);
>     if ( written != 2 )
>     {
>         PERROR("Failed to write trace record");
>         exit(EXIT_FAILURE);
>     }
> }
>
> If we wanted to make it zero copy all the way from the HV to the disk,
> we could have the xentrace process one stream per cpu, and do
> whatever's necessary to use DMA. (Does anyone know if O_DIRECT will
> do direct DMA, or if one would have to use a raw disk?)

So you're saying that if we didn't have to write the cpu number, then we
could bypass stdio and directly do a write() using the trace buffer? And
this would be better because it would avoid a memory to memory copy, and
use DMA immediately on the trace buffer memory? Do I understand you
correctly?

Assuming this is what you mean, allow me to correct a slight logic flaw.
Stdio is there for a reason; doing lots of raw I/O using very small
buffers is highly inefficient. There's the overhead of kernel entry/exit
and of setting up and tearing down DMA transactions. And writing to a
block device will result in I/Os that are multiples of the device's
block size, so writing a 32-byte trace record will probably cause a
512-byte block to actually be written to disk. So bypassing stdio in
this case will result in lots more disk accesses, lots more DMA
setup/teardown, and lots more system calls. In other words, the
performance is going to be horrible.

The stdio library greatly reduces all this overhead by buffering stuff
in memory until there's enough to make a genuine I/O relatively
efficient. In this case, the memory copies are intentional and
beneficial; we do not want to eliminate them in our quest for "zero
copy".

Rob
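As an aside, if larger batching were ever wanted without giving up the stdio interface, the stdio buffer itself can simply be enlarged with setvbuf(3). This is purely an illustration of the buffering point above; nothing in this thread says xentrace actually does this, and the file name and buffer size are arbitrary:

/* Illustration of the stdio buffering point: keep the small per-record
 * fwrite() calls, but make the underlying buffer large so that actual
 * write() syscalls happen in big, efficient chunks. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *out = fopen("trace.out", "wb");
    if ( out == NULL )
    {
        perror("fopen");
        return EXIT_FAILURE;
    }

    /* 1 MB of user-space buffering before any write() is issued. */
    if ( setvbuf(out, NULL, _IOFBF, 1 << 20) != 0 )
        perror("setvbuf");

    /* ... many small fwrite() calls of ~32-36 byte records here ... */

    fclose(out);
    return EXIT_SUCCESS;
}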
On 6/19/06, Rob Gardner <rob.gardner@hp.com> wrote:
> Stdio is there for a reason; doing lots of raw I/O using
> very small buffers is highly inefficient.

You misunderstand me. :-) I meant to write out (via DMA) several pages
at a time, straight from the HV trace buffers. The default tbuf size in
xentrace is 20 pages, so if (as the plan is) xentrace were notified when
the buffer is half full, we could easily write out 10 pages in one
transaction. The tbuf size could be increased if DMA setup/teardown
overhead were an issue at that scale.

You're right, for traces that fit in the file cache, buffering is a big
win. The copy overhead is negligible, writes to disk are more efficient,
and the data will be in the file cache for reading during subsequent
analysis. But for traces that won't fit in the file cache, the best
thing would be to get them to disk with as little copying and
cache-trashing as possible. Some of my recent traces have been on the
order of 10 gigabytes.

I haven't done much to modify xentrace, because I'm not worried about
the trace overhead at this point. But I've had to pull some tricks to
get my analysis tools to run in anything like a reasonable amount of
time.

-George
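For the traces-that-don't-fit-in-the-file-cache case George describes, the usual recipe is O_DIRECT with an aligned buffer and page-multiple transfer sizes. A hedged sketch follows; the alignment and size constants are assumptions, the data here is dummy rather than real trace pages, and actual O_DIRECT requirements depend on the kernel and filesystem:

/* Sketch of writing several trace pages in one O_DIRECT transaction,
 * bypassing the buffer cache. */
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define NPAGES    10            /* half of a 20-page trace buffer */

int main(void)
{
    void *buf;
    int fd = open("trace.bin", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if ( fd < 0 ) { perror("open"); return EXIT_FAILURE; }

    /* O_DIRECT needs an aligned buffer and an aligned transfer size. */
    if ( posix_memalign(&buf, PAGE_SIZE, NPAGES * PAGE_SIZE) != 0 )
    {
        perror("posix_memalign");
        return EXIT_FAILURE;
    }

    /* In reality this would be the mapped trace-buffer pages; here we
     * just fill the buffer with dummy data. */
    memset(buf, 0, NPAGES * PAGE_SIZE);

    if ( write(fd, buf, NPAGES * PAGE_SIZE) != NPAGES * PAGE_SIZE )
        perror("write");

    free(buf);
    close(fd);
    return EXIT_SUCCESS;
}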
George Dunlap wrote:
> You misunderstand me. :-) I meant to write out (via DMA) several
> pages at a time, straight from the HV trace buffers. The default tbuf
> size in xentrace is 20 pages, so if (as the plan is) xentrace were
> notified when the buffer is half full, we could easily write out 10
> pages in one transaction. The tbuf size could be increased if DMA
> setup/teardown overhead were an issue at that scale.
> ...
> Some of my recent traces have been on the order of 10 gigabytes. I
> haven't done much to modify xentrace, because I'm not worried about
> the trace overhead at this point. But I've had to pull some tricks to
> get my analysis tools to run in anything like a reasonable amount of
> time.

I am glad to discover that I misunderstood you. ;) But I am still having
trouble understanding what the actual problem is, or even if one exists.
If you have a trace that is 10 gigabytes, that's several days (maybe
weeks) worth of trace records, depending on the rate at which they're
generated. A memory to memory copy of 10 gigabytes will take mere
seconds on any modern machine, and amortized over a few days, I don't
see how it's worth any work to further reduce or eliminate it. Is the
system so cpu-bound that the loss of a few seconds over several days is
that serious? Even compared to the disk I/O to write out 10 GB, which is
probably several minutes, I don't see how the memory copies are a big
deal. Perhaps kernel buffer cache effects are noticeable, but again at
the data rate you're talking about, the cache will only get completely
purged once every 5 or 10 hours.

If your analysis tools take a long time to run, I'd guess it's because
of the size of the data, not because system resources are being hogged
by xentrace. If you are generating that much data, maybe you should
consider methods to reduce it. Take a look at the trace code
(xen/common/trace.c) and you'll see that there is a facility to mask out
tracing of certain events, classes of events, and cpus. You might use
this to drastically reduce the number of trace records generated. For
instance, if you are not interested in tracing memory-related events,
you don't want to be storing TRC_MEM records, which account for a large
percentage of the trace records generated on a busy system.

Rob
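A sketch of the kind of filtering Rob is pointing at: a single mask test on the fast path drops unwanted records before they are ever written into the buffer. The variable name, class encoding, and values below are illustrative assumptions; the real masking logic lives in xen/common/trace.c and the public trace headers:

/* Illustration of event-class masking in a trace fast path.  The mask
 * variable and the class encoding are assumptions made for this sketch. */
#include <stdint.h>

#define TRC_CLS_MASK   0xffff0000u      /* assumed: class in high bits */

static uint32_t tb_event_mask = 0xffffffffu;   /* trace everything */

static int event_enabled(uint32_t event)
{
    /* Drop the record early if its class is masked out. */
    return (tb_event_mask & event & TRC_CLS_MASK) != 0;
}

static void trace(uint32_t event /*, data... */)
{
    if ( !event_enabled(event) )
        return;                 /* nothing is written into the buffer */

    /* ... format the record and advance the producer index ... */
}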
On 6/19/06, Rob Gardner <rob.gardner@hp.com> wrote:
> I am glad to discover that I misunderstood you. ;) But I am still having
> trouble understanding what the actual problem is, or even if one exists.

Well, I ran some tests, and no problem exists, yet. Running the following:

# time xentrace -e 0x81000 /tmp/test22-passmark.trace
change evtmask to 0x81000

real    7m15.456s
user    0m0.080s
sys     0m0.050s

# ls -l /tmp/test22-passmark.trace
-rw-r--r-- 1 root root 2654091720 Jun 21 14:49 /tmp/test22-passmark.trace

So although 2.6 gigabytes of trace data was generated in 7 minutes, the
total time spent in user and system mode (if the numbers time reports
are accurate) was less than 0.13 seconds.

The only potential issues would be with cache trashing -- both the
buffer cache (from plain writes to the file) and the cpu caches (from
copying the data). If anyone finds a workload this is a problem for, we
can look at it then.

-George