Keith Larson
2011-Dec-18 22:38 UTC
[R] Identifying records with the correct number of repeated measures
Dear list,

I have a dataset in which we sampled multiple individuals either 1 or 9 times. Our measurement variable is 'Delta13C' (see the sample dataset below). I cannot figure out how to efficiently use a vector command (preferably) or a loop to create a new vector of the names of the individuals sampled 9 times. Note that the 'FeatherPosition' variable will only be "P1" for individuals sampled once, while it will be %in% c('P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'P9') for individuals sampled 9 times. In my sample data below, the new vector (e.g. WW_Names) would include only 'WW_08I_01' and 'WW_08I_03'.

Two other quick questions:
1) How can I renumber my 'ROWID'? When I subset my complete dataset to a smaller one, the old ROWIDs are no longer meaningful.
2) When I subset my dataset, my 'factor' variables still contain all the levels from the complete dataset. How can I reset these factor variables to condense my 'dump' file as much as possible?

Many Holiday Cheers from a NEW R user!
Keith

Sample data:

WW_Sample_SI <-
structure(list(Individual_ID = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 5L
), .Label = c("WW_08I_01", "WW_08I_02", "WW_08I_03", "WW_08I_04",
"WW_08I_05"), class = "factor"), Site_Name = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "Anjan", class = "factor"), Latitude = c(63.72935,
63.72935, 63.72935, 63.72935, 63.72935, 63.72935, 63.72935, 63.72935,
63.72935, 63.72935, 63.72935, 63.72935, 63.72935, 63.72935, 63.72935,
63.72935, 63.72935, 63.72935, 63.72935, 63.72935, 63.72935),
    Longitude = c(12.54022, 12.54022, 12.54022, 12.54022, 12.54022,
    12.54022, 12.54022, 12.54022, 12.54022, 12.54022, 12.54022,
    12.54022, 12.54022, 12.54022, 12.54022, 12.54022, 12.54022,
    12.54022, 12.54022, 12.54022, 12.54022), FeatherPosition = structure(c(1L,
    2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 1L, 2L, 3L, 4L, 5L, 6L,
    7L, 8L, 9L, 1L, 1L), .Label = c("P1", "P2", "P3", "P4", "P5",
    "P6", "P7", "P8", "P9"), class = "factor"), Delta13C = c(-18.3,
    -18.53, -19.55, -20.18, -20.96, -21.08, -21.5, -17.42, -13.18,
    -19.95, -22.3, -22.2, -22.18, -22.14, -21.55, -20.85, -23.1,
    -20.75, -20.9, -21.61, -22.24)), .Names = c("Individual_ID",
"Site_Name", "Latitude", "Longitude", "FeatherPosition", "Delta13C"
), class = "data.frame", row.names = c("1282", "1277", "1279",
"1270", "1272", "1274", "1280", "1276", "1271", "1284", "1289",
"1290", "1295", "1293", "1292", "1288", "1291", "1285", "1297",
"1298", "1299"))

*******************************************************************************************
Keith Larson, PhD Student
Evolutionary Ecology, Lund University
Sölvegatan 37
223 62 Lund
Sweden
Phone: +46 (0)46 2229014
Mobile: +46 (0)73 0465016
Fax: +46 (0)46 2224716
Skype: sternacaspia
FB: keith.w.larson at gmail.com
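A minimal R sketch of two vectorized routes to the main question, offered as an illustration rather than anything posted in the thread. To stay self-contained it rebuilds only three columns of the sample data with rep() (Site_Name, Latitude and Longitude are constant and omitted). Route 1 counts rows per individual with ave(); route 2 follows the note about FeatherPosition and keeps individuals whose positions cover all of P1 to P9. The names n_per_id, all_feathers, positions_by_id, has_all, WW_Names2 and WW_9 are hypothetical.

```r
# Rebuild only the columns of the WW_Sample_SI example needed here
# (Site_Name, Latitude and Longitude are constant, so they are left out).
WW_Sample_SI <- data.frame(
  Individual_ID = factor(rep(c("WW_08I_01", "WW_08I_02", "WW_08I_03",
                               "WW_08I_04", "WW_08I_05"),
                             times = c(9, 1, 9, 1, 1))),
  FeatherPosition = factor(c(paste0("P", 1:9), "P1",
                             paste0("P", 1:9), "P1", "P1")),
  Delta13C = c(-18.3, -18.53, -19.55, -20.18, -20.96, -21.08, -21.5,
               -17.42, -13.18, -19.95, -22.3, -22.2, -22.18, -22.14,
               -21.55, -20.85, -23.1, -20.75, -20.9, -21.61, -22.24)
)

# Route 1: ave() puts, on every row, the number of rows for that row's individual.
n_per_id <- ave(WW_Sample_SI$Delta13C, WW_Sample_SI$Individual_ID, FUN = length)
WW_Names <- unique(as.character(WW_Sample_SI$Individual_ID[n_per_id == 9]))

# Route 2: keep individuals whose FeatherPosition values include every P1..P9.
all_feathers <- paste0("P", 1:9)
positions_by_id <- split(as.character(WW_Sample_SI$FeatherPosition),
                         WW_Sample_SI$Individual_ID)
has_all <- sapply(positions_by_id, function(p) all(all_feathers %in% p))
WW_Names2 <- names(has_all)[has_all]

# Subset the data frame to the individuals sampled 9 times.
WW_9 <- WW_Sample_SI[WW_Sample_SI$Individual_ID %in% WW_Names, ]

print(WW_Names)  # [1] "WW_08I_01" "WW_08I_03"
```

Counting rows assumes there are no duplicated or missing measurements; the feather-set check is stricter, since an individual with nine rows but a repeated position (two P3 rows, say) would fail it.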
Sarah Goslee
2011-Dec-18 23:35 UTC
[R] Identifying records with the correct number of repeated measures
Thank you for asking a clear question and including a reproducible small example.

Here's one possible (2-line) solution to your main question, and both the others:

> WW_Names <- table(WW_Sample_SI$Individual_ID)
> WW_Names <- names(WW_Names)[WW_Names == 9]
> WW_Names
[1] "WW_08I_01" "WW_08I_03"
>
> #by ROWID you mean row names? If so:
> row.names(WW_Sample_SI) <- 1:nrow(WW_Sample_SI)
> head(WW_Sample_SI)
  Individual_ID Site_Name Latitude Longitude FeatherPosition Delta13C
1     WW_08I_01     Anjan 63.72935  12.54022              P1   -18.30
2     WW_08I_01     Anjan 63.72935  12.54022              P2   -18.53
3     WW_08I_01     Anjan 63.72935  12.54022              P3   -19.55
4     WW_08I_01     Anjan 63.72935  12.54022              P4   -20.18
5     WW_08I_01     Anjan 63.72935  12.54022              P5   -20.96
6     WW_08I_01     Anjan 63.72935  12.54022              P6   -21.08
>
> # factor() can be used to eliminate unused levels
> # your sample data doesn't have any, but here's an example:
> testdata <- factor(c("a", "a", "b", "c", "d"))
> str(testdata)
 Factor w/ 4 levels "a","b","c","d": 1 1 2 3 4
> testdata <- testdata[1:3]
> str(testdata)
 Factor w/ 4 levels "a","b","c","d": 1 1 2
> testdata <- factor(testdata)
> str(testdata)
 Factor w/ 2 levels "a","b": 1 1 2

Sarah

--
Sarah Goslee
http://www.sarahgoslee.com
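Both follow-up questions can also be handled for a whole data frame in one step each. This is a hedged sketch, not part of the original reply: it uses a hypothetical six-row data frame WW (a few rows borrowed from the sample data), subsets it, then calls base R's droplevels(), which re-creates every factor column with only the levels still in use, and assigns NULL to the row names so R regenerates automatic row names 1..n. The names WW and WW_sub are illustrative only.

```r
# Hypothetical six-row data frame with "leftover" row names and a factor
# that has five levels.
WW <- data.frame(
  Individual_ID = factor(c("WW_08I_01", "WW_08I_01", "WW_08I_02",
                           "WW_08I_03", "WW_08I_04", "WW_08I_05")),
  Delta13C = c(-18.3, -18.53, -19.95, -22.3, -21.61, -22.24),
  row.names = c("1282", "1277", "1284", "1289", "1298", "1299")
)

# Subsetting keeps the old row names and all five factor levels.
WW_sub <- WW[WW$Individual_ID %in% c("WW_08I_01", "WW_08I_03"), ]
print(rownames(WW_sub))               # "1282" "1277" "1289"
print(nlevels(WW_sub$Individual_ID))  # 5

# droplevels() on a data frame trims unused levels in every factor column.
WW_sub <- droplevels(WW_sub)
# Assigning NULL makes R regenerate automatic row names 1..n.
rownames(WW_sub) <- NULL

# The dput() output now lists only the two levels that remain.
dput(WW_sub)
```

Calling factor() as in the reply works one column at a time; droplevels() is convenient when several factor columns (in the full data, Individual_ID, Site_Name and FeatherPosition) all need trimming before the data are dumped.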