Jack Challen
2013-Aug-14 15:23 UTC
[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups
(This is a repost from a little while ago. I assume my mail got silently bounced because I used some rather strange email routing. If it did get through, and I simply haven't seen it or a response, then please accept my apologies.)

Hi,

I'm new to R, and new to statistics. I'm *trying* to learn R, but I'm struggling with the R-intro, mainly (I think) because I have no background in stats and some of the language is unfamiliar to me (I started with C and Perl, mainly), so I might use the wrong terms. I think the "R in Action" book might help, but recommendations are welcome.

I have a whole bunch of network timings (ICMP echoes) between different groups of nodes using two different networks. I want to compare the timings between the groups and across networks, as I /believe/ that one network has much greater variability than the other. I want to prove this, one way or the other, and I think a graphical view of the ~20000 results would help. The initial histograms/kernel densities I've produced so far support that theory (i.e. they look a bit like the Normal distribution, but one network is much more "stretched out" and "bumpy"), but I've resorted to pre-processing the data in Perl in order to produce the graphs. I think R can be used to do all of this in one go.

For each network, I have files like this:

==
RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
==

The columns are: From, To, and Time taken. There are 4 rooms in total. The data's unsorted, and both directions of each pairing are present (i.e. I haven't done de-duplication of pairings via the handshake algorithm, I just pinged everything from everything). There will be multiple entries for each pairing.

The graphs I think I want to produce are:

For "From RoomA", overlay each timing graph for every other room. That means there will be 4 kernel densities (well, actually I'd take a histogram plotted as a line, as I think that's more appropriate, and I don't know what a kernel density is) on one graph.

I'd also like to do the above for "From RoomB", "From RoomC", and "From RoomD", so I'd end up with 4 graphs (all with the same xlim/ylim), each with 4 lines plotted. I'd eventually like those produced as vector Postscript for inclusion in a report, but I think I can handle that with ?postscript() and ?layout().

I've got as far as importing the data with

read.table("eth_ping_timings.dat", col.names=c("From", "To", "Time"))

Then I can do "standard" simple operations on Foo$Time. "Factoring" (if that is indeed the term) is where I fall down. I simply don't know how to break out the pairings.

Is R actually the way to go for this? I feel pretty confident I could cobble together some Perl which produces Postscript to describe the curves, but I suspect that once I produce these graphs, I will immediately think of other questions to ask, and R sounds like it's the proper tool to ask those questions.

cheers
jack
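One way the pairings described above might be broken out is with split(), which groups a vector by one or more factors. The following is only a minimal sketch: the sample rows and the names `txt`, `timings`, `by_pair` and `pair_sd` are made up for illustration, and the real data would come from the file instead of an inline string.

```r
# Hypothetical sample in the From/To/Time layout; the values are invented.
txt <- "RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
RoomA RoomB 0.30
RoomC RoomA 0.16
RoomB RoomA 0.14"

# Read the three columns; From and To become factors.
timings <- read.table(text = txt, col.names = c("From", "To", "Time"),
                      stringsAsFactors = TRUE)

# One numeric vector of times per (From, To) pairing.
# drop = TRUE omits pairings that never occur in the data.
by_pair <- split(timings$Time, list(timings$From, timings$To), drop = TRUE)

# Standard deviation of each pairing, as a first numeric look at variability.
pair_sd <- sapply(by_pair, sd)
print(pair_sd)
```

The list elements are named "From.To" (for example "RoomA.RoomB"), so a single pairing can be pulled out with by_pair[["RoomA.RoomB"]] and passed to hist() or density().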
David Carlson
2013-Aug-14 21:03 UTC
[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups
I'm not sure I follow you exactly, so let's start with some data and one graph and move on from there.

First the data (I'm assuming you don't have A to A, so you really want 3 lines on a graph):

set.seed(42)
# The 12 ordered pairs of distinct rooms (A-D)
pairs <- structure(list(From = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
    3L, 3L, 3L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"),
    class = "factor"), To = structure(c(2L, 3L, 4L, 1L, 3L, 4L,
    1L, 2L, 4L, 1L, 2L, 3L), .Label = c("A", "B", "C", "D"),
    class = "factor")), .Names = c("From", "To"),
    class = "data.frame", row.names = c(NA, -12L))
# 1000 simulated timings, each assigned to a randomly chosen pair
net <- data.frame(pairs[sample.int(12, 1000, replace=TRUE),],
    Time=rnorm(1000, .2, .05))

Now generate one plot:

# Density of A-to-B timings, then overlay A-to-C and A-to-D
plot(density(net$Time[net$From=="A" & net$To=="B"]), xlim=c(0, .4),
    ylim=c(0, 8), main="From A")
lines(density(net$Time[net$From=="A" & net$To=="C"]), lty=2)
lines(density(net$Time[net$From=="A" & net$To=="D"]), lty=3)
legend("topright", c("B", "C", "D"), lty=1:3)

Is this on the right track?

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
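The single-plot approach above might be extended to the four "From" panels with a loop, writing the result to a Postscript file. This is a rough sketch under stated assumptions: the data are simulated in the same spirit as the reply (built with expand.grid() rather than the dput() output), the output path `out` is hypothetical, and the xlim/ylim values are guesses that would need adjusting for real timings.

```r
set.seed(42)
rooms <- c("A", "B", "C", "D")

# Simulated stand-in for the real data: all ordered pairs of distinct rooms.
pairs <- expand.grid(From = rooms, To = rooms, stringsAsFactors = FALSE)
pairs <- pairs[pairs$From != pairs$To, ]
net <- data.frame(pairs[sample.int(nrow(pairs), 1000, replace = TRUE), ],
                  Time = rnorm(1000, 0.2, 0.05))

# Hypothetical output file; paper = "special" makes width/height the page size.
out <- file.path(tempdir(), "timings.ps")
postscript(out, width = 8, height = 8, horizontal = FALSE, paper = "special")

par(mfrow = c(2, 2))  # 2 x 2 grid: one panel per "From" room
for (from in rooms) {
  others <- setdiff(rooms, from)
  # Empty plot with the same axis limits in every panel
  plot(0, 0, type = "n", xlim = c(0, 0.4), ylim = c(0, 10),
       main = paste("From", from), xlab = "Time", ylab = "Density")
  for (i in seq_along(others)) {
    x <- net$Time[net$From == from & net$To == others[i]]
    lines(density(x), lty = i)  # one line type per destination room
  }
  legend("topright", others, lty = seq_along(others))
}
dev.off()
```

If a histogram drawn as a line is preferred to a kernel density, hist(x, plot = FALSE) returns the bin midpoints and densities in its mids and density components, which could be passed to lines() in place of density(x).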