Rune Grønseth
2017-May-16 09:30 UTC
[R] Extracting metadata information to corresponding dissimilarity matrix
Hi,

I am an R beginner. I've tried googling and reading, but this might be too simple to be found in the documentation.

I have a dissimilarity index (symmetric matrix) from which I have extracted the unique values using the ecodist package command "lower". There are 14 observations, so there are 91 unique comparisons.

After this I'd like to extract the corresponding metadata from a separate data frame (the 14 observations organized in rows identified by a samplenumber vector, plus other variables such as gender, age, et cetera). The aim is to have a new data frame with 91 rows and metadata vectors giving me the value of the dissimilarity index and the gender of each of the two observations that are compared by the dissimilarity metric. So if I'm looking for gender differences, I need 5 vectors in the data frame: samplenumber1, samplenumber2, gender1, gender2 and the dissimilarity metric.

Does anyone have suggestions or experience in reformatting data in this manner? This is just a test dataset. My full dataset has more than 100 observations, so I need more general code, if that is possible.

With great appreciation of any help.

Rune Grønseth

---
Rune Grønseth, MD, PhD, postdoctoral fellow
Department of Thoracic Medicine
Haukeland University Hospital
N-5021 Bergen
Norway
Jeff Newmiller
2017-May-16 13:47 UTC
[R] Extracting metadata information to corresponding dissimilarity matrix
Hello R Beginner...

It is good that you are articulate, but R code has subtleties that words miss, so you really need to provide sample code and sample data to convey where you are. This is not necessarily easy, but it avoids a lot of us fixing the wrong problem, and you might even solve your own problem in the course of making a simple example. Help with doing this is available on the Web [1] [2].

I suspect that the reshape function, or one of the many alternative packages like reshape2 or tidyr, is what you are looking for.

You also need to read the Posting Guide, which among other things mentions that this is a plain text mailing list. It is up to you to figure out how to adjust your email program to send only plain text, but if you don't, we may not be able to read your garbled code and may ignore you.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://adv-r.had.co.nz/Reproducibility.html

--
Sent from my phone. Please excuse my brevity.
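For readers unsure what such sample code and data might look like, here is a minimal sketch. It is not taken from the thread: the data frame `meta`, its values, and the use of `dist()` on age are made-up stand-ins for the poster's real metadata and dissimilarity index. `dput()` is the base R function that prints an object as R code, which pastes cleanly into a plain-text email.

```r
# Hypothetical stand-in for the poster's metadata: 4 samples instead of 14.
meta <- data.frame(
  samplenumber = 1:4,
  gender = c("F", "M", "F", "M"),
  age = c(34, 51, 47, 29)
)

# Assumed stand-in for the real dissimilarity index: distances between ages.
d <- dist(meta$age)

# dput() writes each object as R code that others can paste and run.
dput(meta)
dput(as.matrix(d))
```

Pasting the `dput()` output into the message lets readers rebuild exactly the same objects, regardless of how the mail client wraps printed tables.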
David L Carlson
2017-May-16 16:21 UTC
[R] Extracting metadata information to corresponding dissimilarity matrix
I think this is what you are trying to do. I've created a data set with 7 rows and a similarity matrix based on age:

set.seed(42)
dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
                  age=sample.int(75, 7))
sim <- max(dist(dta$age)) - dist(dta$age)  # already lower triangular
sim
#    1  2  3  4  5  6
# 2 24
# 3 21 59
# 4 40 46 43
# 5  0 38 41 22
# 6  7 45 48 29 55
# 7 55 31 28 47  7 14

# Now duplicate dta:
dta1 <- dta
names(dta1) <- c("ID1", "gender1", "age1")
dta2 <- dta
names(dta2) <- c("ID2", "gender2", "age2")

# Now merge and eliminate unneeded rows
dta12 <- merge(dta2, dta1)  # order is important
dta12 <- dta12[dta12$ID1 < dta12$ID2, ]

# Finally combine the similarities with the combined data and rearrange
# the variable names
dta12 <- data.frame(dta12, sim=as.vector(sim))  # posted as "dta12mod", a typo
dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "sim")]
dta12
#    ID1 ID2 gender1 gender2 age1 age2 sim
# 2    1   2       F       F   11   49  24
# 3    1   3       F       M   11   52  21
# 4    1   4       F       F   11   33  40
# 5    1   5       F       F   11   73   0
# 6    1   6       F       F   11   66   7
# 7    1   7       F       F   11   18  55
# 10   2   3       F       M   49   52  59
# 11   2   4       F       F   49   33  46
# 12   2   5       F       F   49   73  38
# 13   2   6       F       F   49   66  45
# 14   2   7       F       F   49   18  31
# 18   3   4       M       F   52   33  43
# 19   3   5       M       F   52   73  41
# 20   3   6       M       F   52   66  48
# 21   3   7       M       F   52   18  28
# 26   4   5       F       F   33   73  22
# 27   4   6       F       F   33   66  29
# 28   4   7       F       F   33   18  47
# 34   5   6       F       F   73   66  55
# 35   5   7       F       F   73   18   7
# 42   6   7       F       F   66   18  14

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
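If the starting point is the full symmetric matrix rather than a dist object, the same table can be sketched by indexing the lower triangle directly. This is an alternative to the merge approach above, not something proposed in the thread. The objects `meta`, `m` and `pair_df` and all their values are hypothetical, and the sketch assumes the rows of `meta` are in the same order as the rows and columns of `m`.

```r
# Hypothetical metadata; rows are assumed to match the matrix order below.
meta <- data.frame(samplenumber = c(101, 102, 103, 104),
                   gender = c("F", "M", "F", "M"),
                   stringsAsFactors = FALSE)

# Hypothetical full symmetric dissimilarity matrix (4 x 4).
m <- as.matrix(dist(c(34, 51, 47, 29)))

# Row and column positions of every cell below the diagonal,
# listed column by column (the order a dist object also uses).
idx <- which(lower.tri(m), arr.ind = TRUE)

# One row per unique pair: the column index is the first sample,
# the row index is the second.
pair_df <- data.frame(
  samplenumber1 = meta$samplenumber[idx[, "col"]],
  samplenumber2 = meta$samplenumber[idx[, "row"]],
  gender1 = meta$gender[idx[, "col"]],
  gender2 = meta$gender[idx[, "row"]],
  dissimilarity = m[idx]
)
pair_df
```

Because each value is looked up by its own row and column position, this version does not depend on the ordering produced by `merge()`, and it scales to any number of observations.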
David L Carlson
2017-May-16 16:44 UTC
[R] Extracting metadata information to corresponding dissimilarity matrix
Fixing a typo in the original, adding a simplification, and using dissimilarity instead of similarity:

set.seed(42)
dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
                  age=sample.int(75, 7))
dsim <- dist(dta$age)  # distance, already lower triangular
dsim

dta1 <- dta
names(dta1) <- paste0(names(dta), "1")  # generalizes to more than 3 columns
dta2 <- dta
names(dta2) <- paste0(names(dta), "2")

dta12 <- merge(dta2, dta1)  # order is important
dta12 <- dta12[dta12$ID1 < dta12$ID2, ]  # get rid of duplicates
dta12 <- data.frame(dta12, dsim=as.vector(dsim))  # Typo was here
dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "dsim")]
dta12

David C
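Since the approach relies on the merged rows coming out in the same order as the values stored in the dist object, a quick check of that alignment may be worthwhile before applying it to real data. The sketch below rebuilds the example (using `setNames()` as a shorthand for the renaming steps, which is my substitution) and then tests the alignment in two ways. It only works as written because the distance here was computed from age alone; with a real dissimilarity index only the second check would apply.

```r
set.seed(42)
dta <- data.frame(ID = 1:7,
                  gender = sample(c("M", "F"), 7, replace = TRUE),
                  age = sample.int(75, 7))
dsim <- dist(dta$age)

# Same steps as in the post, with setNames() doing the renaming.
dta1 <- setNames(dta, paste0(names(dta), "1"))
dta2 <- setNames(dta, paste0(names(dta), "2"))
dta12 <- merge(dta2, dta1)               # no shared columns: all 49 combinations
dta12 <- dta12[dta12$ID1 < dta12$ID2, ]  # keep each pair once
dta12$dsim <- as.vector(dsim)

# Check 1: the distance was built from age, so it must equal |age1 - age2|.
stopifnot(isTRUE(all.equal(dta12$dsim, abs(dta12$age1 - dta12$age2))))

# Check 2: look each pair up by position in the full matrix.
# This relies on ID running 1..7 in row order, as it does here.
full <- as.matrix(dsim)
stopifnot(isTRUE(all.equal(dta12$dsim, full[cbind(dta12$ID2, dta12$ID1)])))
```

If either `stopifnot()` call fails, the rows and the distances are misaligned, for example because the two data frames were passed to `merge()` in the other order.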