Hi,
I have fixed-width data that I would like to split into columns. Here is a snapshot of the data (the actual data is a list object):

lst1Sub <- c("20131124GGG1 23.00", "20131125GGG1 15.00", "20131128GGG1 0.00",
             "201312 1GGG1 0.00", "201312 4GGG1 0.00", "201312 7GGG1 10.00",
             "20131210GGG1 0.00", "20131213GGG1 0.00", "20131216GGG1 0.00",
             "20131219GGG1 0.00", "20131222GGG1 0.00", "20131225GGG1 0.00",
             "20131228GGG1 0.00")

The following script splits the data into [Year Month Day Site Precipitation]:
------------------------------------------------------------------------
library(stringr)
dateSite <- gsub("(.*G.{3}).*", "\\1", lst1Sub)
dat1 <- data.frame(Year = as.numeric(substr(dateSite, 1, 4)), Month = as.numeric(substr(dateSite, 5, 6)),
                   Day = as.numeric(substr(dateSite, 7, 8)), Site = substr(dateSite, 9, 12),
                   Rain = substr(dateSite, 13, 18), stringsAsFactors = FALSE)
lst3 <- lapply(lst1Sub, function(x) {
  dateSite <- gsub("(.*G.{3}).*", "\\1", x)
  dat1 <- data.frame(Year = as.numeric(substr(dateSite, 1, 4)), Month = as.numeric(substr(dateSite, 5, 6)),
                     Day = as.numeric(substr(dateSite, 7, 8)), Site = substr(dateSite, 9, 12),
                     stringsAsFactors = FALSE)
  Sims <- str_trim(gsub(".*G.{3}\\s?(.*)", "\\1", x))
  # insert a space where a value runs into a following negative value, e.g. "1.00-2.00"
  Sims[grep("\\d+-", Sims)] <- gsub("(.*)([-][0-9]+\\.[0-9]+)", "\\1 \\2",
                                    gsub("^([0-9]+\\.[0-9]+)(.*)", "\\1 \\2", Sims[grep("\\d+-", Sims)]))
  Sims1 <- read.table(text = Sims, header = FALSE)
  names(Sims1) <- c("Precipitation")
  dat2 <- cbind(dat1, Sims1)
})
------------------------------------------------------------------------

Problem: the above script drops the first digit of my precipitation values. For example, after splitting, "20131124GGG1 23.00" becomes
2013 11 24 GGG1 3.00
instead of
2013 11 24 GGG1 23.00 (the right answer).

Is something wrong with the string trimming? Is there another way to arrive at the same answer?

Thanks,
AT.
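The missing digit is most likely caused by greediness in the pattern ".*G.{3}": the ".*" runs as far right as it can, so it stops at the last "G" (the third one in "GGG1"), and ".{3}" then swallows the "1", the separating blank and the first digit of the rain value. Since every record in the sample keeps the date in columns 1-8 and the site in columns 9-12, a sketch that slices by position instead of matching patterns avoids the problem. The helper name split_fixed is hypothetical, and the records are assumed to be a plain character vector (apply unlist() first if they arrive as a list).

```r
# Records as posted; assumed to be a plain character vector
# (apply unlist() first if they are stored in a list).
lst1Sub <- c("20131124GGG1 23.00", "20131125GGG1 15.00", "20131128GGG1 0.00",
             "201312 1GGG1 0.00", "201312 4GGG1 0.00", "201312 7GGG1 10.00",
             "20131210GGG1 0.00", "20131213GGG1 0.00", "20131216GGG1 0.00",
             "20131219GGG1 0.00", "20131222GGG1 0.00", "20131225GGG1 0.00",
             "20131228GGG1 0.00")

# The original pattern: greedy ".*" runs to the LAST "G", so ".{3}" eats
# the "1", the blank and the leading "2" of the rain value.
greedy <- gsub(".*G.{3}\\s?(.*)", "\\1", lst1Sub[1])
greedy  # "3.00"

# Hypothetical helper: split each record by fixed character positions.
split_fixed <- function(x) {
  data.frame(Year  = as.integer(substr(x, 1, 4)),
             Month = as.integer(substr(x, 5, 6)),
             Day   = as.integer(substr(x, 7, 8)),            # " 1" -> 1, leading blank is ignored
             Site  = substr(x, 9, 12),
             Precipitation = as.numeric(substring(x, 13)),   # column 13 to end of record
             stringsAsFactors = FALSE)
}

dat <- split_fixed(lst1Sub)
head(dat)
```

If you would rather keep the regex route, anchoring the pattern to a fixed count, e.g. sub("^.{12}", "", x), removes exactly the date and site and cannot overshoot.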
?read.fortran

Clint Bowman                  INTERNET: clint at ecy.wa.gov
Air Quality Modeler           INTERNET: clint at math.utah.edu
Department of Ecology         VOICE: (360) 407-6815
PO Box 47600                  FAX: (360) 407-7534
Olympia, WA 98504-7600
USPS: PO Box 47600, Olympia, WA 98504-7600
Parcels: 300 Desmond Drive, Lacey, WA 98503-1274

On Wed, 22 Oct 2014, Zilefac Elvis wrote:
> <snip>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
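The read.fortran() pointer refers to the utils function that reads fixed-width fields described by Fortran-style format codes (I = integer, A = character, F = real, each followed by its width). A minimal sketch, assuming the records are a character vector that can be fed through textConnection(); the column names are illustrative. "F6.0" is used rather than "F6.2" to sidestep any implied-decimal scaling of the rain field.

```r
lst1Sub <- c("20131124GGG1 23.00", "20131125GGG1 15.00", "20131128GGG1 0.00",
             "201312 1GGG1 0.00", "201312 4GGG1 0.00", "201312 7GGG1 10.00",
             "20131210GGG1 0.00", "20131213GGG1 0.00", "20131216GGG1 0.00",
             "20131219GGG1 0.00", "20131222GGG1 0.00", "20131225GGG1 0.00",
             "20131228GGG1 0.00")

# I4 = year, I2 = month, I2 = day, A4 = site, F6.0 = rain (6 characters wide)
con <- textConnection(lst1Sub)
rain <- read.fortran(con, c("I4", "I2", "I2", "A4", "F6.0"),
                     col.names = c("Year", "Month", "Day", "Site", "Precipitation"))
close(con)
rain
```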
This seems to do a fair bit of it on your example data; you can pull out the date bits separately using Date functions if you need them.

decode.lst <- function(x) {
  data.frame(Date = as.Date(substr(x, 1, 8), format = "%Y%m%d"),
             Site = substr(x, 9, 12),
             Precipitation = as.numeric(substring(x, 13)))
}
decode.lst(lst1Sub)

S Ellison

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Zilefac Elvis
> Sent: 22 October 2014 16:38
> To: R. Help
> Subject: [R] Split fixed width data in R
>
> <snip>
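Pulling out the date bits with Date functions, as suggested, could look like this: a sketch that repeats decode.lst() so it runs on its own and recovers the year, month and day from the Date column with format(). The data are assumed to be a plain character vector. Records such as "201312 1" rely on strptime() accepting the blank before the day, so it is worth checking that no Date comes back NA.

```r
lst1Sub <- c("20131124GGG1 23.00", "20131125GGG1 15.00", "20131128GGG1 0.00",
             "201312 1GGG1 0.00", "201312 4GGG1 0.00", "201312 7GGG1 10.00",
             "20131210GGG1 0.00", "20131213GGG1 0.00", "20131216GGG1 0.00",
             "20131219GGG1 0.00", "20131222GGG1 0.00", "20131225GGG1 0.00",
             "20131228GGG1 0.00")

decode.lst <- function(x) {
  data.frame(Date = as.Date(substr(x, 1, 8), format = "%Y%m%d"),
             Site = substr(x, 9, 12),
             Precipitation = as.numeric(substring(x, 13)))
}

d <- decode.lst(lst1Sub)
# format() renders each Date component as text; as.integer() makes it numeric
d$Year  <- as.integer(format(d$Date, "%Y"))
d$Month <- as.integer(format(d$Date, "%m"))
d$Day   <- as.integer(format(d$Date, "%d"))
any(is.na(d$Date))  # check that every date parsed
d
```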
On Wed, Oct 22, 2014 at 10:37 AM, Zilefac Elvis <zilefacelvis at yahoo.com> wrote:

> Hi,
> I have fixed width data that I would like to split into columns. Here is a
> snapshot of the data (actual data is a list object):
> <snip>
>
> Thanks,
> AT.

I see you already have an answer that you like. I will add that read.fwf might also be a possibility. It's difficult for me to tell whether that last column is always 6 characters in length.

file <- textConnection(list_object)
read.fwf(file = file, widths = c(4, 2, 2, 4, 6))

--
The temperature of the aqueous content of an unremittingly ogled culinary vessel will not achieve 100 degrees on the Celsius scale.
Maranatha! <><
John McKown
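The read.fwf() suggestion, filled out as a self-contained sketch: the widths 4, 2, 2, 4, 6 follow the layout of the posted sample, the col.names are illustrative, and strip.white = TRUE (passed through to read.table()) trims the padding blanks. If the last field can be wider than 6 characters, enlarge the final width accordingly; lines shorter than the total width are read without complaint.

```r
lst1Sub <- c("20131124GGG1 23.00", "20131125GGG1 15.00", "20131128GGG1 0.00",
             "201312 1GGG1 0.00", "201312 4GGG1 0.00", "201312 7GGG1 10.00",
             "20131210GGG1 0.00", "20131213GGG1 0.00", "20131216GGG1 0.00",
             "20131219GGG1 0.00", "20131222GGG1 0.00", "20131225GGG1 0.00",
             "20131228GGG1 0.00")

# Feed the character vector to read.fwf() through a text connection
con <- textConnection(lst1Sub)
out <- read.fwf(con, widths = c(4, 2, 2, 4, 6),
                col.names = c("Year", "Month", "Day", "Site", "Precipitation"),
                strip.white = TRUE)
close(con)
out
```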