Hi guys,

I hope to get your valuable input on this pet project of mine; please feel free to share your ideas, suggestions, and recommendations.

I've collected a large number of memory traces, almost 10 GB of data. These traces were gathered from a set of servers, desktops, and laptops in a university CS department. Each trace file contains a list of hashes representing the contents of the machine's memory, as well as some metadata about the running processes and OS type. The traces have been grouped by type and date. Traces were recorded approximately every 30 minutes, although no traces were acquired while machines were turned off or away from an internet connection for a long period.

Each trace file is split into two portions. The top segment is ASCII text containing the system metadata: the operating system type and a list of running processes. This is followed by binary data containing the list of hashes generated for each page in the system, stored as consecutive 32-bit values.

There is a simple tool called "traceReader" for extracting the hashes from a trace file. It takes the file to be parsed as an argument and outputs the hash list as a series of integer values. If you would like to compare two traces to estimate the amount of sharing between them, you could run:

./traceReader trace-x.dat > trace-all
./traceReader trace-y.dat >> trace-all
cat trace-all | sort | uniq -c

This will tell you the number of times each hash occurs in the system.

Now, my idea is to take the trace for every interval (every 30 minutes) for each of the systems and find the frequency of each memory hash. I then plan to collect the highest-frequency hashes (those occurring most often) over the entire hour (60 minutes) and then divide the memory into 'k' different patterns based on these frequency counts.
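The per-interval counting step (the `sort | uniq -c` pipeline done in Python) could look something like the minimal sketch below. It assumes the trace layout described above (consecutive little-endian 32-bit hashes in the binary section); `hashes_from_binary` and `top_k_hashes` are hypothetical helper names, not part of traceReader:

```python
from collections import Counter
from struct import iter_unpack

def hashes_from_binary(data: bytes):
    """Unpack the binary section of a trace: consecutive 32-bit hash
    values (little-endian assumed). The real traceReader tool instead
    prints these as integers on stdout."""
    return [h for (h,) in iter_unpack("<I", data)]

def top_k_hashes(hashes, k):
    """Count how often each hash occurs in an interval and return the
    k most frequent ones - the 'sort | uniq -c' step, plus top-k."""
    counts = Counter(hashes)
    return [h for h, _ in counts.most_common(k)]
```

The top-k hashes returned for each hour would then serve as the k pattern representatives described next.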
For instance, if hashes 14F430C8, 1550068, 15AD480A, 161384B6, 16985213, 17CA274B, 18E5F038, and 1A3329 have the highest frequencies, then I might divide the memory into 8 patterns (k=8). I plan to use the Approximate Nearest Neighbor (ANN) library (http://www.cs.umd.edu/~mount/ANN/) for this division. ANN needs a set of query points, a set of data points, and a dimension. In my case, I guess the query points can be all the hashes other than the highest-frequency ones, the data points are all the hashes for the hour, and the dimension can be 1.

I can thus formulate the memory patterns for every hour; I then plan to formulate memory patterns for every 3 hours, 6 hours, 12 hours, and finally the full 24 hours. Armed with these statistics, I plan to compare the patterns based on the time of day. I hope to overlap the patterns and create what I call "heat zones" for memory based on the time of day, and finally come up with a suitable report.

The overall objective of this project is to establish a relation between memory page access and the interval of the day: for specific intervals there are certain memory "heat zones". I understand that these heat zones might change and may not be consistent across systems and users. This study only intends to establish the relationship; it does not attempt any qualitative or quantitative analysis of the heat zones per system and user. That can be considered an extension of this work.

Please feel free to comment and suggest any new insights.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
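Since the dimension here is 1, the division into k patterns boils down to assigning each remaining hash to the nearest high-frequency hash, which binary search handles exactly. The sketch below is a pure-Python stand-in for the ANN library call (ANN is really aimed at higher dimensions); `assign_to_patterns` is a hypothetical name, not an ANN API:

```python
import bisect

def assign_to_patterns(hashes, centers):
    """Assign each hash to the nearest of the k high-frequency hashes
    (exact 1-D nearest neighbour via binary search on the sorted
    centers). Returns {center: [hashes assigned to it]}."""
    centers = sorted(centers)
    patterns = {c: [] for c in centers}
    for h in hashes:
        i = bisect.bisect_left(centers, h)
        # the nearest center is either just below or just above h
        best = min(centers[max(i - 1, 0):i + 1],
                   key=lambda c: abs(c - h))
        patterns[best].append(h)
    return patterns
```

Each resulting bucket is one of the k memory patterns for that hour; the same assignment can then be repeated over the 3-, 6-, 12-, and 24-hour windows.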
On Wed, Aug 31, 2011 at 1:11 AM, Dushmanta Mohapatra <dmpatra@gmail.com> wrote:
>
> On Tue, Aug 30, 2011 at 7:09 AM, Sameer Pramod Niphadkar
> <spniphadkar@gmail.com> wrote:
>>
>> On Mon, Aug 29, 2011 at 10:20 PM, Dushmanta Mohapatra <dmpatra@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I am Dushmanta, a fellow CS PhD student at GaTech.
>> >
>> > I also work in memory-related areas in virtualization, and I am
>> > interested in learning more about this project. Would you be willing
>> > to discuss your project, and perhaps other memory-related research?
>> >
>> > Please let me know.
>> >
>> > Dushmanta
>>
>> Hi Dushmanta,
>>
>> I'm glad you are interested in this project; I hope you can offer your
>> insights.
>>
>> Well... the idea of page sharing is not very new, in both physical and
>> virtualized systems, and there are different ways to implement it. But I
>> thought we could extend the working-set model further: as most of us
>> know, identical systems belonging to a particular network might, during
>> a given time frame, end up
>
> What is "network" referring to here?

The network is the subnet to which all the physical systems are
connected - in this case, the university CS department subnet from
which all the traces were collected. Please see:
http://traces.cs.umass.edu/index.php/CpuMem/CpuMem

>> accessing similar physical memory frame blocks (a block here being a
>> group of frames). I intend to find out whether there is any kind of
>> correlation between this universal time period and page access.
>
> I also do not understand what "universal time period" refers to.

Universal time is the actual wall-clock time at which the traces were
collected.

>> I intend to see whether the working-set analogy can be applied to the
>> entire memory address space with respect to a universal time interval;
>> that is, whether some sort of pattern emerges for physical memory
>> access based on time and space.
> Does the entire memory address space refer to the memory contents of
> all the VMs?

Yes, and the system need not necessarily host VMs; at present we may
not care whether it does.

> I am guessing that the universal time interval just refers to a time
> interval over which you analyze the memory access patterns of all the
> VMs.

The idea here is to do real analysis of memory access patterns over a
period of time. For VMs it may be even more important to do so, as most
VMs end up sharing a lot of memory. I just want to know whether this
seems a feasible project, as it would involve a great deal of
randomness: systems executing many different processes over time, each
with some amount of page caching as well as page faults. So the first
step would be to see whether this is possible with the aid of some sort
of pattern-classifying algorithm. I also hope to find any previous
research on memory access using traces.

>> Please feel free to express your opinions and ideas about the same.
>> I'm also looking for a good pattern-classification algorithm, similar
>> to approximate nearest neighbor, for two-dimensional data analysis
>> (Time vs. Memory Address).
>>
>> regards,
>> Sameer
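For the two-dimensional (Time vs. Memory Address) case, one simple candidate classifier is k-means. The sketch below is a plain Lloyd-iteration version in pure Python, offered only as an illustration of the kind of algorithm that could work here, not a claim that it is the right choice; `kmeans_2d` and its parameters are hypothetical:

```python
import random

def kmeans_2d(points, k, iters=20, seed=0):
    """Tiny k-means over (time, address) pairs: one candidate pattern
    classifier for 2-D Time vs Memory Address data. Plain Lloyd
    iterations; a sketch, not a tuned implementation."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # k distinct starting centers
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the nearest current center (squared Euclidean distance)
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                  (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # move each center to the mean of its cluster (keep it if empty)
        centers = [(sum(x for x, _ in c) / len(c),
                    sum(y for _, y in c) / len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

An approximate-nearest-neighbor structure (e.g. the ANN library mentioned above) would only be needed at much larger scale, where the per-point linear scan over centers becomes the bottleneck.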