William Simpson
2009-Feb-16 16:21 UTC
[R] scatterplot and correlation for weird data format
I have data in a format like this:

name ssex sex view num rating rt
ahl4 f m f 56 -108 2246
ahl4 f m f 74 85 1444
ahl4 f m f 52 151 1595
ahl4 f m f 85 1 1447
ahl4 f m f 53 46 1716
ahl4 f m f 37 145 1276
ahl4 f m f 50 98 1465
ahl4 f m f 51 -26 1322
ahl4 f m f 38 -97 1790
ahl4 f m f 14 -158 865
...
ahl4 f m p 43 -136 1669
ahl4 f m p 10 -59 808
ahl4 f m p 67 -111 1279
ahl4 f m p 85 -86 994
ahl4 f m p 100 134 1337
ahl4 f m p 76 56 665
ahl4 f m p 51 -49 594
ahl4 f m p 33 -118 505
ahl4 f m p 49 -156 1283
...
and so on for many subjects (name)

I would like to do a scatterplot of the rating given by each subject (with identifier "name") for the frontal (view=="f") and profile (view=="p") views of each face (each face has an identifier "num"). I'd like to find the correlation as well.

For each subject, since there are 100 faces, there will be 100 points on the scatterplot. I would just lump all the subjects' data together for the plot and correlation I think (unless somebody tells me I should do each subject separately).

I'm stumped on how to do this. Thanks very much for any help!

Bill
hadley wickham
2009-Feb-16 17:13 UTC
[R] scatterplot and correlation for weird data format
On Mon, Feb 16, 2009 at 10:21 AM, William Simpson <william.a.simpson at gmail.com> wrote:
> I have data in a format like this:
>
> name ssex sex view num rating rt
> ahl4 f m f 56 -108 2246
> ahl4 f m f 74 85 1444
> ahl4 f m f 52 151 1595
> ahl4 f m f 85 1 1447
> ahl4 f m f 53 46 1716
> ahl4 f m f 37 145 1276
> ahl4 f m f 50 98 1465
> ahl4 f m f 51 -26 1322
> ahl4 f m f 38 -97 1790
> ahl4 f m f 14 -158 865
> ...
> ahl4 f m p 43 -136 1669
> ahl4 f m p 10 -59 808
> ahl4 f m p 67 -111 1279
> ahl4 f m p 85 -86 994
> ahl4 f m p 100 134 1337
> ahl4 f m p 76 56 665
> ahl4 f m p 51 -49 594
> ahl4 f m p 33 -118 505
> ahl4 f m p 49 -156 1283
> ...
> and so on for many subjects (name)
>
> I would like to do a scatterplot of the rating given by each subject
> (with identifier "name") for the frontal (view=="f") and profile
> (view=="p") views of each face (each face has an identifier "num").
> I'd like to find the correlation as well.
> For each subject, since there are 100 faces, there will be 100 points
> on the scatterplot. I would just lump all the subjects' data together
> for the plot and correlation I think (unless somebody tells me I
> should do each subject separately).

You might find the reshape package, http://had.co.nz/reshape, helpful. You could do something like:

dfm <- melt(mydataframe, m = c("num", "rating", "rt"))
cast(dfm, ... ~ view, subset = variable == "rating")

Then do a scatterplot of the variables f and p.

Hadley

--
http://had.co.nz/
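For readers without the reshape package, the same long-to-wide step can be sketched in base R. This is only an illustration under assumptions: the data frame `ratings` and its values are made up to mimic the posted layout, and `wide`, `rating_f`, and `rating_p` are names chosen here, not anything from the thread. The sketch keeps `name` and `num` as identifying columns so that each face's frontal and profile ratings end up on the same row.

```r
# Toy data in the same long layout as the post (values invented for illustration).
set.seed(1)
ratings <- expand.grid(num = 1:5, view = c("f", "p"),
                       name = c("ahl4", "subj2"),
                       stringsAsFactors = FALSE)
ratings$rating <- round(runif(nrow(ratings), -160, 160))

# Split by view, then pair the two ratings of each face within each subject.
front   <- ratings[ratings$view == "f", c("name", "num", "rating")]
profile <- ratings[ratings$view == "p", c("name", "num", "rating")]
wide <- merge(front, profile, by = c("name", "num"),
              suffixes = c("_f", "_p"))

# Pooled scatterplot and correlation across all subjects.
plot(wide$rating_f, wide$rating_p,
     xlab = "Rating, frontal view", ylab = "Rating, profile view")
r_pooled <- cor(wide$rating_f, wide$rating_p)

# Per-subject correlations, in case pooling turns out to be inappropriate.
r_by_subject <- sapply(split(wide, wide$name),
                       function(d) cor(d$rating_f, d$rating_p))
```

Comparing `r_pooled` with the entries of `r_by_subject` gives a quick check on whether lumping the subjects together hides differences between them.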
William Simpson wrote:
> I have data in a format like this:
>
> name ssex sex view num rating rt
> ahl4 f m f 56 -108 2246
> ahl4 f m f 74 85 1444
> ahl4 f m f 52 151 1595
> ahl4 f m f 85 1 1447
> ahl4 f m f 53 46 1716
> ahl4 f m f 37 145 1276
> ahl4 f m f 50 98 1465
> ahl4 f m f 51 -26 1322
> ahl4 f m f 38 -97 1790
> ahl4 f m f 14 -158 865
> ...
> ahl4 f m p 43 -136 1669
> ahl4 f m p 10 -59 808
> ahl4 f m p 67 -111 1279
> ahl4 f m p 85 -86 994
> ahl4 f m p 100 134 1337
> ahl4 f m p 76 56 665
> ahl4 f m p 51 -49 594
> ahl4 f m p 33 -118 505
> ahl4 f m p 49 -156 1283
> ...
> and so on for many subjects (name)
>
> I would like to do a scatterplot of the rating given by each subject
> (with identifier "name") for the frontal (view=="f") and profile
> (view=="p") views of each face (each face has an identifier "num").
> I'd like to find the correlation as well.
> For each subject, since there are 100 faces, there will be 100 points
> on the scatterplot. I would just lump all the subjects' data together
> for the plot and correlation I think (unless somebody tells me I
> should do each subject separately).
>
> I'm stumped on how to do this. Thanks very much for any help!
>

Hi Bill,

The first thing that comes to mind is a variation on count.overplot, a function that displays the number of overplotted points for a given tolerance rather than a blur of separate symbols. The problem would be separating the various categories of experimental stimuli in your case. You could use, say, "F" and "P" as suffixes for the counts to indicate orientation, color to indicate sex of face, male/female symbol for sex of respondent, and so on. The problem is that you end up with a difficult-to-interpret plot, as each entry (of which there will still be many) must be decoded by the viewer. If you think this is worth pursuing, email me and I will try to outline a way to do it.

Another, perhaps simpler way is to define a summary score for each subject for each class of face and plot that.

Jim
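The summary-score idea might look something like the following in base R. It is a rough sketch, not Jim's own method: the toy data frame `ratings`, the choice of the mean as the summary score, and the use of face sex and view as the "classes" are all assumptions made here for illustration.

```r
# Toy data (invented): two subjects, four faces, both views.
set.seed(2)
ratings <- expand.grid(num = 1:4, view = c("f", "p"),
                       name = c("ahl4", "subj2"),
                       stringsAsFactors = FALSE)
# Assume the sex of each face is fixed by its identifier.
ratings$sex <- ifelse(ratings$num %% 2 == 0, "m", "f")
ratings$rating <- round(runif(nrow(ratings), -160, 160))

# One summary score (here the mean rating) per subject, face sex, and view.
summary_scores <- aggregate(rating ~ name + sex + view,
                            data = ratings, FUN = mean)

# Put the frontal and profile means side by side and plot them.
means_f <- summary_scores[summary_scores$view == "f", c("name", "sex", "rating")]
means_p <- summary_scores[summary_scores$view == "p", c("name", "sex", "rating")]
means_wide <- merge(means_f, means_p, by = c("name", "sex"),
                    suffixes = c("_f", "_p"))
plot(means_wide$rating_f, means_wide$rating_p,
     xlab = "Mean rating, frontal view", ylab = "Mean rating, profile view")
```

With many subjects this gives a handful of points per subject instead of 100, at the cost of hiding the face-by-face agreement between views.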