I have data on the current and previous locations of individuals. I would like to build a matrix of bilateral movement between locations, so that the final output looks like the second table below.

I have tried using crosstab() from the ecodist package, but I do not have another variable to measure the flow. Ultimately I would like to compute the probability of movement between cities (movement to city_i / total movement from city_j).

Is it possible to aggregate the data in this way? Any guidance would be highly appreciated. Thank you!

# Original data
structure(list(id = 101:115, current_location = structure(c(2L,
8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label = c("Austin",
"Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
"New York"), class = "factor"), previous_location = structure(c(6L,
2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label = c("Atlanta",
"Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
), class = "factor")), class = "data.frame", row.names = c(NA,
-15L))

# Expected output
structure(list(X = structure(c(3L, 1L, 2L), .Label = c("Austin",
"Houston", "OKC"), class = "factor"), Boston = c(2L, NA, NA),
New.York = c(NA, 2L, 2L), Cambridge = c(2L, NA, NA)), class = "data.frame",
row.names = c(NA, -3L))

Sincerely,

Milu
Dear Miluji,

If I understand correctly, this should get you what you need.

# Assumption: dcast() here is the one from the reshape2 package
library(reshape2)

temp1 <-
structure(list(id = 101:115, current_location = structure(c(2L,
8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label = c("Austin",
"Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
"New York"), class = "factor"), previous_location = structure(c(6L,
2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label = c("Atlanta",
"Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
), class = "factor")), class = "data.frame", row.names = c(NA,
-15L))

# With no aggregation function given, dcast() falls back to length(),
# so each cell is the number of individuals who made that move
dcast(temp1, previous_location ~ current_location)

On Tue, May 8, 2018 at 12:10 PM, Miluji Sb <milujisb at gmail.com> wrote:
> I have data on current and previous location of individuals. I would like
> to have a matrix with bilateral movement between locations.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Or in base R, see ?xtabs, as in:

xtabs(~ previous_location + current_location, data = x)

(Here x is your data frame. You can convert the 0s to NAs if you like.)

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, May 8, 2018 at 9:21 AM, Huzefa Khalil <huzefa.khalil at umich.edu> wrote:
> Dear Miluji,
>
> If I understand correctly, this should get you what you need.
>
> dcast(temp1, previous_location ~ current_location)
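The original question also asked for the probability of movement (movement to city_i / total movement from city_j), which the count table alone does not give. A minimal base R sketch of one way to get there from the xtabs() suggestion follows. The object names (moves, counts, probs, counts_na, wide) are hypothetical, the data frame simply spells out the (previous, current) pairs encoded in the posted dput() output, and row-wise normalisation with prop.table() is an assumption about what "total movement from city_j" means.

```r
# Hypothetical object names; values reproduce the posted (previous, current) pairs
moves <- data.frame(
  id = 101:115,
  previous_location = c("OKC", "Austin", "Houston", "OKC", "Tulsa",
                        "New Orleans", "Atlanta", "Cleveland", "OKC", "Austin",
                        "OKC", "Austin", "Houston", "OKC", "Tulsa"),
  current_location = c("Boston", "New York", "New York", "Cambridge", "Lynn",
                       "Houston", "Austin", "Boston", "New Orleans", "Durham",
                       "Boston", "New York", "New York", "Cambridge", "Lynn")
)

# Counts of moves: rows are origins (previous), columns are destinations (current)
counts <- xtabs(~ previous_location + current_location, data = moves)

# Divide each row by its total, so every origin's row sums to 1
probs <- prop.table(counts, margin = 1)
probs["OKC", "Boston"]  # 2 of the 5 moves out of OKC went to Boston: 0.4

# Optional: show empty origin/destination pairs as NA, as in the expected output
counts_na <- counts
counts_na[counts_na == 0] <- NA

# Optional: a plain data frame with origins as row names
wide <- as.data.frame.matrix(counts_na)
```

Normalising with margin = 1 treats each origin separately; margin = 2 would instead give, for each destination, the share of arrivals by origin.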