michael.reed at reagan.com
2017-Jul-16 03:49 UTC
[R] Arranging column data to create plots
Dear All,

I need some help arranging data that was imported.

The imported data frame looks something like this (the actual file is huge, so this is example data)

DF:
IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
Name1  21  15  25  10
Name2  15  18  35  24  27  45
Name3  17  21  30  22  15  40  32  55

I would like to create a new data frame with the following

NewDF:
IDKey  X   Y
Name1  21  15
Name1  25  10
Name2  15  18
Name2  35  24
Name2  27  45
Name3  17  21
Name3  30  22
Name3  15  40
Name3  32  55

With the data like this I think I can do the following

ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line

and get 3 lines with the various number of points.

The point is that each of the XY pairs is a data point tied to NameX. I would like to rearrange the data so I can plot the points/lines by the IDKey. There will be at least 2 points, but the number of points for each IDKey can be as many as 4.

I have tried using the gather() function from the tidyverse package, but I can't make it work. The issue is that I believe I need two separate gather statements (one for X, another for Y) to consolidate the data. This causes the pairs to not stay together and the data becomes jumbled.

Thoughts?

Thanks for your help

Michael E. Reed
Hi Michael,

Try gather from the tidyr package.

HTH
Ulrik

Michael Reed via R-help <r-help at r-project.org> wrote on Sun, 16 July 2017, 10:19:
> I have tried using the gather() function from the tidyverse package, but I
> can't make it work.
On Sat, 15 Jul 2017, Michael Reed via R-help wrote:

> Dear All,
>
> I need some help arranging data that was imported.

It would be helpful if you were to use dput to give us the sample data since you say you have already imported it.

> The imported data frame looks something like this (the actual file is
> huge, so this is example data)
>
> DF:
> IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
> Name1 21 15 25 10
> Name2 15 18 35 24 27 45
> Name3 17 21 30 22 15 40 32 55

That data is missing in X3 etc, but would be NA in an actual data frame, so I don't know if my workaround was the same as your workaround. Dput would have clarified the starting point.

> I would like to create a new data frame with the following
>
> NewDF:
> IDKey X Y
> Name1 21 15
> Name1 25 10
> Name2 15 18
> Name2 35 24
> Name2 27 45
> Name3 17 21
> Name3 30 22
> Name3 15 40
> Name3 32 55
>
> With the data like this I think I can do the following
>
> ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line

You are missing parentheses. If you use the reprex library to test your examples before posting them, you can be sure your simple errors don't send us off on wild goose chases.

> and get 3 lines with the various number of points.
>
> The point is that each of the XY pairs is a data point tied to NameX.
> I would like to rearrange the data so I can plot the points/lines by the
> IDKey. There will be at least 2 points, but the number of points for
> each IDKey can be as many as 4.
>
> I have tried using the gather() function from the tidyverse package, but

The tidyverse package is a virtual package that pulls in many packages.

> I can't make it work. The issue is that I believe I need two separate
> gather statements (one for X, another for Y) to consolidate the data.
> This causes the pairs to not stay together and the data becomes jumbled.

No, what you need is a gather-spread.
######
library(dplyr)
library(tidyr)

# Sample data, with the empty cells written out as NA
DF <- read.table( text =
"IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
Name1 21 15 25 10 NA NA NA NA
Name2 15 18 35 24 27 45 NA NA
Name3 17 21 30 22 15 40 32 55
", header=TRUE, as.is=TRUE )

NewDF <- (   DF
         %>% gather( XY, value, -IDKey )              # stack X1..Y4 into key/value rows
         %>% separate( XY, c( "Coord", "Num" ), 1 )   # "X1" -> Coord "X", Num "1"
         %>% spread( Coord, value )                   # one X and one Y column per pair
         %>% filter( !is.na( X ) & !is.na( Y ) )      # drop the empty pairs
         )
######

---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
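To see why a single gather keeps each X with its Y (where two separate gathers do not), it may help to run the pipeline one step at a time. The following is only a sketch on the example data from this thread; the intermediate names `long` and `wide` are illustrative and do not appear in the original posts, and the final filter is written in base R rather than with dplyr.

```r
library(tidyr)

# Example data from the thread, with empty cells written as NA.
DF <- read.table(text = "IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
Name1 21 15 25 10 NA NA NA NA
Name2 15 18 35 24 27 45 NA NA
Name3 17 21 30 22 15 40 32 55
", header = TRUE, as.is = TRUE)

# Step 1: stack all eight coordinate columns into key/value rows.
# 3 rows x 8 coordinate columns gives 24 rows.
long <- gather(DF, XY, value, -IDKey)

# Step 2: split the key after its first character, so "X1" becomes
# Coord = "X" and Num = "1". Num is what ties each X to its Y.
long <- separate(long, XY, c("Coord", "Num"), sep = 1)

# Step 3: spread Coord back out into columns. Each (IDKey, Num)
# combination becomes one row holding an X and a Y: 3 x 4 = 12 rows.
wide <- spread(long, Coord, value)

# Step 4: drop the pairs that were empty in the original table.
NewDF <- wide[!is.na(wide$X) & !is.na(wide$Y), ]
print(NewDF)
```

The pair index `Num` is the piece that two independent gathers lose: without it there is nothing to say which X row belongs with which Y row.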
Correction at the end.

On Sun, 16 Jul 2017, Jeff Newmiller wrote:
> No, what you need is a gather-spread.

Sorry, should have practiced what I preached...

##########
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)

DF <- structure(list(IDKey = c("Name1", "Name2", "Name3"), X1 = c(21L,
15L, 17L), Y1 = c(15L, 18L, 21L), X2 = c(25L, 35L, 30L), Y2 = c(10L,
24L, 22L), X3 = c(NA, 27L, 15L), Y3 = c(NA, 45L, 40L), X4 = c(NA,
NA, 32L), Y4 = c(NA, NA, 55L)), .Names = c("IDKey", "X1", "Y1",
"X2", "Y2", "X3", "Y3", "X4", "Y4"), class = "data.frame", row.names = c(NA,
-3L))

NewDF <- (   DF
         %>% gather( XY, value, -IDKey )
         %>% separate( XY, c( "Coord", "Num" ), 1 )
         %>% spread( Coord, value )
         %>% filter( !is.na( X ) & !is.na( Y ) )
         )
NewDF
#>   IDKey Num  X  Y
#> 1 Name1   1 21 15
#> 2 Name1   2 25 10
#> 3 Name2   1 15 18
#> 4 Name2   2 35 24
#> 5 Name2   3 27 45
#> 6 Name3   1 17 21
#> 7 Name3   2 30 22
#> 8 Name3   3 15 40
#> 9 Name3   4 32 55
##########

---------------------------------------------------------------------------
Jeff Newmiller
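For completeness, here is a sketch of the plotting step the original question was aiming at, with the parentheses balanced. Two things in it are assumptions that go beyond this thread: it reshapes with `pivot_longer()`, which needs tidyr 1.0.0 or later and postdates these posts, rather than the gather-spread shown above; and the plot object name `p` is illustrative only.

```r
library(tidyr)
library(ggplot2)

# Example data from the thread, with empty cells written as NA.
DF <- read.table(text = "IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
Name1 21 15 25 10 NA NA NA NA
Name2 15 18 35 24 27 45 NA NA
Name3 17 21 30 22 15 40 32 55
", header = TRUE, as.is = TRUE)

# Split each column name into a letter (X or Y) and a pair index.
# ".value" sends the letter part to the output column names, so X and Y
# stay side by side; values_drop_na removes the empty pairs.
NewDF <- pivot_longer(
  DF,
  cols = -IDKey,
  names_to = c(".value", "Num"),
  names_pattern = "([XY])([0-9]+)",
  values_drop_na = TRUE
)

# Note the closing parenthesis after aes(...) and the call geom_line().
p <- ggplot(NewDF, aes(x = X, y = Y, color = IDKey)) +
  geom_line() +
  geom_point()
print(p)
```

One design point to be aware of: `geom_line()` connects points in order of their X value, so Name3 would be drawn as 15, 17, 30, 32 rather than in pair order. If the pairs should be joined in the order X1, X2, X3, X4, `geom_path()` connects them in row order instead.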