Joel Pulliam
2013-Jan-14 22:30 UTC
[R] I'm trying to parse 1 column of a dataframe into 3 separate columns
I have a factor called 'utm_medium' in the dataframe 'data'

> str(data$utm_medium)
Factor w/ 396925 levels "","affiliateID=&sessionID=0000821850667323ec6ae6cffd28f380&etag=",..: 366183 355880 357141 20908 357513 365348 368088 360827 31704 364767 ...

The data in this factor is delimited with '&'. I basically want the affiliateID, sessionID and etag data separated. Ex.

> data$utm_medium[1:10]
[1] affiliateID=4f3ac4b695e7d&sessionID=993f4c447e68dfc36ed692223349f2e3&etag
[2] affiliateID=4f3ac4b695e7d&sessionID=209dd9986ace55d50a450afeba62b78f&etag
[3] affiliateID=4f3ac4b695e7d&sessionID=2efdb8e1e1f5ac9c0d5baec355c78f85&etag
[4] affiliateID=&sessionID=5a6ca9d41148f30ce694628427af7991&etag
[5] affiliateID=4f3ac4b695e7d&sessionID=331fbcdf1f3d5e7bac0d92c12e19f63d&etag
[6] affiliateID=4f3ac4b695e7d&sessionID=8fc27c8478e9bd30043ea4d3c7ddb29c&etag
[7] affiliateID=4f3ac4b695e7d&sessionID=af467d480addffca43ffbdbce1edfdb4&etag
[8] affiliateID=4f3ac4b695e7d&sessionID=598645e05a187ee63ff922a36360f021&etag
[9] affiliateID=&sessionID=8895e21d0842ed45063ba8328dc3bc61&etag
[10] affiliateID=4f3ac4b695e7d&sessionID=88ca2998c5a91b6efbece0c4f79caeb7&etag
396925 Levels: ... affiliateID=50bfbbbeed918&sessionID=5c49c142cbf1b149c6a4647d1a4fc97b&etag

I've parsed it via:

test <- as.character(data$utm_medium)
test <- strsplit(test, "&")

which results in a list, which I 'unlisted':

test2 <- unlist(test)

and then attempted to extract into separate vectors:

a <- vector(mode = "character", length = length(test2))
s <- vector(mode = "character", length = length(test2))
e <- vector(mode = "character", length = length(test2))
i <- 1
j <- 1

for (i in 1:length(test2))
{
  a[j] <- test2[i]
  s[j] <- test2[i+1]
  e[j] <- test2[i+2]
  i <- i + 3
  j <- j + 1
}

This code runs, but I'm indexing it incorrectly and I can't figure out why. I'll sleep on it tonight and probably figure it out, but I can't help thinking that there's a much easier way to parse this data. Help! Please!
joel
David L Carlson
2013-Jan-14 23:31 UTC
[R] I'm trying to parse 1 column of a dataframe into 3 separate columns
How about

a <- sapply(test, function(x) x[1])
s <- sapply(test, function(x) x[2])
e <- sapply(test, function(x) x[3])

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
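For anyone reading this thread later, the sapply() suggestion can be sketched end to end on a few of the values from the question. This is only an illustration under assumptions: the sample vector 'utm' stands in for data$utm_medium, the helpers 'nth' and 'strip_key' are hypothetical names, and the fields are assumed to always appear in the order affiliateID, sessionID, etag. vapply() is swapped in for sapply() so that the result is guaranteed to be a character vector, and the "key=" prefix is removed so only the values remain.

```r
# Hypothetical sample standing in for data$utm_medium (values copied from the post)
utm <- factor(c(
  "affiliateID=4f3ac4b695e7d&sessionID=993f4c447e68dfc36ed692223349f2e3&etag",
  "affiliateID=&sessionID=5a6ca9d41148f30ce694628427af7991&etag",
  "affiliateID=4f3ac4b695e7d&sessionID=209dd9986ace55d50a450afeba62b78f&etag"
))

# Split each string on "&"; fixed = TRUE treats "&" as a literal character.
# The result is a list with one character vector per row.
parts <- strsplit(as.character(utm), "&", fixed = TRUE)

# Take the n-th piece of every row (NA where a row has fewer than n pieces)
nth <- function(n) vapply(parts, function(x) x[n], character(1))

# Remove everything up to and including the first "=", leaving only the value
strip_key <- function(x) sub("^[^=]*=?", "", x)

# Assumes the pieces always come in the order affiliateID, sessionID, etag
parsed <- data.frame(
  affiliateID = strip_key(nth(1)),
  sessionID   = strip_key(nth(2)),
  etag        = strip_key(nth(3)),
  stringsAsFactors = FALSE
)

print(parsed)
```

On the indexing problem in the original loop: in R, assigning to i inside the body of a for loop has no effect on the next iteration, because for takes the next value from 1:length(test2) each time round. The loop therefore advances by 1 instead of 3, so a[j] ends up as test2[j] and s and e are the same vector shifted by one and two positions. Working on the list returned by strsplit(), as above, avoids the manual index arithmetic altogether.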