thr3ads.net - R help - [R] Producing multiple analyses (histograms/kernel densities) of network timings between groups [Aug 2013]


Jack Challen

2013-Aug-14 15:23 UTC

[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups

(This is a repost from a little while ago. I assume my mail got silently bounced
because I used some rather strange email routing. If it did get through, and I
simply haven't seen it or a response, then please accept my apologies)
Hi,

I'm new to R, and new to statistics. I'm *trying* to learn R, but
I'm struggling with the R-intro, mainly (I think) due to the fact that I
have no background in stats, and some of the language is unfamiliar to me (I
started with C and Perl, mainly) so I might use the wrong terms. I think the
"R in action" book might help, but recommendations are welcome.


I have a whole bunch of network timings (ICMP echos) between different groups of
nodes using two different networks. I want to compare the timings between the
groups and across networks, as I /believe/ that one network has much greater
variability than the other. I want to prove this, one way or the other, and I
think a graphical view of the ~20000 results would help. The initial
histograms/kernel densities I've produced so far support that theory (i.e.
they look a bit like the Normal distribution, but one network is much more
"stretched out" and "bumpy"), but I've resorted to
pre-processing that data in Perl in order to produce the graphs. I think R can
be used to do all of this in one.
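
A minimal sketch of how the "one network is more variable" idea could be checked numerically alongside the graphs. The timings here are simulated stand-ins, and the network labels "eth" and "other" are assumptions for illustration only. Note that var.test() assumes roughly Normal data, so it is a rough guide at best if the real timings are "bumpy".

```r
# Simulated stand-ins for the two networks' timings (not real data)
set.seed(1)
timings <- data.frame(
    Network = factor(rep(c("eth", "other"), each = 500)),
    Time = c(rnorm(500, mean = 0.2, sd = 0.02),   # tighter network
             rnorm(500, mean = 0.2, sd = 0.08))   # more variable network
)

# Spread per network: standard deviation and interquartile range
sds <- tapply(timings$Time, timings$Network, sd)
iqrs <- tapply(timings$Time, timings$Network, IQR)
print(rbind(sd = sds, iqr = iqrs))

# F test of equal variances between the two networks
vt <- var.test(Time ~ Network, data = timings)
print(vt$p.value)
```

The IQR is included because it is less sensitive than the standard deviation to a few very slow pings.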

For each network, I have files like this:

==
RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
==
The columns are: From, To, and Time taken. There are 4 rooms in total.
The data's unsorted, and there will be multiple pairs (i.e. I haven't
done de-duplication of pairings via the handshake algorithm, I just pinged
everything from everything). There will be multiple entries for each pairing.

The graphs I think I want to produce are:

For "From RoomA", overlay each timing graph for every other room. That
means there will be 4 kernel densities (well actually I'd take a histogram
plotted as a line, as I think that's more appropriate, and I don't know
what a kernel density is) on one graph.
I'd also like to do the above for "From RoomB", "From RoomC", and
"From RoomD", so I'd end up with 4 graphs (all with the same
xlim/ylim) each with 4 lines plotted. I'd eventually like those
produced as vector Postscript for inclusion in a report, but I think I
can handle that with ?postscript() and ?layout()
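
A "histogram plotted as a line" can be sketched by computing the histogram without drawing it and then joining the bin midpoints. The timings below are simulated, and the choice of 30 breaks is an arbitrary assumption. The kernel density is overlaid as a dashed line for comparison: it is essentially a smoothed version of the same picture.

```r
set.seed(2)
times <- rnorm(1000, mean = 0.2, sd = 0.05)   # simulated timings, not real data

# Compute the histogram only; plot = FALSE returns breaks, mids, counts, density
h <- hist(times, breaks = 30, plot = FALSE)

# Join the bin midpoints with a line instead of drawing bars
plot(h$mids, h$density, type = "l", xlab = "Time", ylab = "Density",
     main = "Histogram as a line")

# Kernel density estimate of the same data, dashed
lines(density(times), lty = 2)
```

Using h$density rather than h$counts keeps the line on the same scale as density(), so the two curves can share one y axis.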

I've got as far as importing the data with
read.table("eth_ping_timings.dat", col.names=c("From", "To", "Time"))
Then I can do "standard" simple operations on Foo$Time.
"Factoring" (if that is indeed the term) is where I fall down. I
simply don't know how to break out the pairings.
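
One possible way to break out the pairings is split(), which groups a vector by one or more factors. This is only a sketch: the three rows are the sample lines from the post, read from a string instead of eth_ping_timings.dat, and the names Foo and by_pair are illustrative.

```r
# Sample rows in the same layout as the posted file
txt <- "RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12"
Foo <- read.table(text = txt, col.names = c("From", "To", "Time"),
                  stringsAsFactors = TRUE)

# One vector of timings per From/To pairing, named "From.To";
# drop = TRUE leaves out pairings that never occur in the data
by_pair <- split(Foo$Time, list(Foo$From, Foo$To), drop = TRUE)
print(names(by_pair))

# Summaries per pairing, e.g. the mean time
print(sapply(by_pair, mean))

# Plain logical indexing also works for a single pairing
a_to_b <- Foo$Time[Foo$From == "RoomA" & Foo$To == "RoomB"]
```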

Is R actually the way to go for this? I feel pretty confident I could cobble
together some Perl which produces Postscript to describe the curves, but I
suspect that once I produce these graphs, I will immediately think of other
questions to ask, and R sounds like it's the proper tool to ask those
questions.

cheers
jack


David Carlson

2013-Aug-14 21:03 UTC


[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups

I'm not sure I follow you exactly so let's start with some data
and one graph and move on from there:

First the data (I'm assuming you don't have A to A so you really
want 3 lines on a graph)?

set.seed(42)
# All 12 ordered From/To pairings among rooms A-D (no room paired with itself)
pairs <- structure(list(
    From = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
        .Label = c("A", "B", "C", "D"), class = "factor"),
    To = structure(c(2L, 3L, 4L, 1L, 3L, 4L, 1L, 2L, 4L, 1L, 2L, 3L),
        .Label = c("A", "B", "C", "D"), class = "factor")),
    .Names = c("From", "To"), class = "data.frame", row.names = c(NA, -12L))
# Draw 1000 pairings with replacement and attach simulated timings
net <- data.frame(pairs[sample.int(12, 1000, replace=TRUE),],
	Time=rnorm(1000, .2, .05))

Now generate one plot:

plot(density(net$Time[net$From=="A" & net$To=="B"]), xlim=c(0, .4),
	ylim=c(0, 8), main="From A")
lines(density(net$Time[net$From=="A" & net$To=="C"]), lty=2)
lines(density(net$Time[net$From=="A" & net$To=="D"]), lty=3)
legend("topright", c("B", "C", "D"), lty=1:3)

Is this on the right track?
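
If the single plot is on the right track, the same idea could be extended to one panel per "From" room with a loop, arranged with layout() and written to a Postscript file. This is a sketch under assumptions: the data are simulated again (built with expand.grid() rather than the structure() call), the output path is a temporary file, and the fixed xlim/ylim values are guesses that suit this simulated data.

```r
set.seed(42)
rooms <- c("A", "B", "C", "D")

# Simulated timings for every ordered pair of distinct rooms (not real data)
grid <- expand.grid(From = rooms, To = rooms, stringsAsFactors = FALSE)
grid <- grid[grid$From != grid$To, ]
net <- data.frame(grid[sample.int(nrow(grid), 1000, replace = TRUE), ],
                  Time = rnorm(1000, 0.2, 0.05))

out <- tempfile(fileext = ".eps")   # hypothetical output path
postscript(out, width = 8, height = 8, horizontal = FALSE,
           onefile = FALSE, paper = "special")
layout(matrix(1:4, nrow = 2, byrow = TRUE))   # 2 x 2 grid of panels

for (from in rooms) {
    others <- setdiff(rooms, from)
    # Empty frame first, so every panel shares the same axes
    plot(NULL, xlim = c(0, 0.4), ylim = c(0, 10),
         xlab = "Time", ylab = "Density", main = paste("From", from))
    # One density line per destination room, distinguished by line type
    for (i in seq_along(others)) {
        sel <- net$From == from & net$To == others[i]
        lines(density(net$Time[sel]), lty = i)
    }
    legend("topright", others, lty = seq_along(others))
}
dev.off()
```

Drawing an empty frame and adding every curve with lines() avoids treating the first destination room as a special case.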

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

