Hey R users,

suppose we have data:

[1] 2010.12.26 00:00:52 688,88 11,69 43,00
[2] 11,69 43,00
[3] 11,69 43,00
[4] 11,69 43,00
[5] 11,69 43,00
[6] 11,69 43,00
[7] 11,69 43,00
[8] 11,69 43,00
[9] 11,69 43,00
[10] 11,69 43,00
[11] 11,69 43,00
[12] 11,69 43,00
[13] 11,69 43,00
[14] 11,69 43,00
[15] 11,69 43,00
[16] 11,69 43,00
[17] 11,69 43,00
[18] 11,69 43,00
[19] 11,69 43,00
[20] 11,69 43,00
[21] 11,69 43,00
[22] 11,69 43,00
[23] 11,69 43,00
[24] 11,69 43,00
[25] 11,69 43,00
[26] 11,69 43,00
[27] 11,69 43,00
[28] 11,69 43,00
[29] 11,69 43,00
[30] 11,69 43,00
[31] 11,69 43,00
[32] 11,69 43,00
[33] 11,69 43,00
[34] 11,69 43,00
[35] 11,69 43,00
[36] 11,69 43,00
[37] 11,69 43,00
[38] 11,69 43,00
[39] 11,69 43,00
[40] 11,69 43,00
[41] 11,69 43,00
[42] 11,69 43,00
[43] 11,69 43,00
[44] 11,69 43,00
[45] 11,69 43,00
[46] 11,69 43,00
[47] 11,69 43,00
[48] 11,69 43,00
[49] 11,69 43,00
[50] 11,69 43,00
[51] 11,69 43,00
[52] 11,69 43,00
[53] 11,69 43,00
[54] 11,69 43,00
[55] 11,69 43,00
[56] 11,69 43,00
[57] 11,69 43,00
[58] 11,69 43,00
[59] 11,69 43,00
[60] 11,69 43,00
[61] 2010.12.26 00:01:52 696,19 11,69 43,00
[62] 11,69 43,00
[63] 11,69 43,00
[64] 11,69 43,00
[65] 11,69 43,00
[66] 11,69 43,00
..................................... etc.

Is there a way to split the data into a date column and V2, V3 and V4 columns, and
erase the lines without a date, so that the data would look like this:

date                 V2   V3     V4
2010.12.26 00:01:52  555  11.67  44
2010.12.26 00:02:52  566  11.67  44

etc.

Thanks a lot!

--
Simonas Kecorius

[[alternative HTML version deleted]]
Hi,

It would be helpful if you dput() the data.

If I read your data like this:

dat1<-read.table(text="
 2010.12.26 00:00:52    688,88 11,69    43,00
  11,69    43,00
   11,69    43,00
   11,69    43,00
   11,69    43,00
 2010.12.26 00:01:52    696,19    11,69    43,00
",sep="",header=FALSE,fill=TRUE,dec=",",stringsAsFactors=FALSE)
dat2<-na.omit(dat1)
dat2
#          V1       V2     V3    V4 V5
#1 2010.12.26 00:00:52 688.88 11.69 43
#6 2010.12.26 00:01:52 696.19 11.69 43

#or
dat1[complete.cases(dat1),]

A.K.

----- Original Message -----
From: Simonas Kecorius <simolas2008 at gmail.com>
To: r-help at r-project.org
Sent: Monday, December 17, 2012 7:04 AM
Subject: [R] split character line into rows

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,

Just to add:

dat1<-read.table(text="
 2010.12.26 00:00:52    688,88 11,69    43,00
  11,69    43,00
   11,69    43,00
   11,69    43,00
   11,69    43,00
 2010.12.26 00:01:52    696,19    11,69    43,00
",sep="",header=FALSE,fill=TRUE,dec=",",stringsAsFactors=FALSE)
dat2<-na.omit(dat1)
dat2$Date<-as.POSIXct(paste(dat2$V1,dat2$V2,sep=" "),format="%Y.%m.%d %H:%M:%S")
dat3<-dat2[,c(6,3:5)]
colnames(dat3)[2:4]<-paste0("V",2:4)
dat3
#                 Date     V2    V3 V4
#1 2010-12-26 00:00:52 688.88 11.69 43
#6 2010-12-26 00:01:52 696.19 11.69 43

A.K.
Certainly.

But you'd be better advised to use dput(head(yourdata, 20)) to provide
data, since we don't actually know what's in your data after it has
passed through print, copy, and email. How you got it into R may also
be relevant.

Also, I don't see how you get from the given data to the desired results:

Given data, first line with date:

[1] 2010.12.26 00:00:52 688,88 11,69 43,00

First line of result:

2010.12.26 00:01:52 555 11.67 44

I'm guessing that there might be tab characters in the data as
separators, or are they spaces (this is why we need dput), and you
want the commas as decimal marks rather than separators?

If it were me, I'd remove the non-date rows outside of R using grep,
then use read.csv2() to import the rest. But you can achieve much the same
effect using grep() within R to get the rows with dates, then
strsplit() to divide them into separate elements. Assuming that these
are character vectors, that is.

For actual working code, you need to provide actual working data.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
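A minimal sketch of the grep()/strsplit() route suggested above, under the assumption that the data are a character vector with one record per element and whitespace as the separator. The vector `txt`, the helper `num()` and the result name `res` are made up for illustration; the real file may well differ (tabs, "[n]" prefixes), which is exactly why dput() output was requested.

```r
# Hypothetical input: a character vector, one record per element,
# built by hand from the first lines of the posted data.
txt <- c("2010.12.26 00:00:52 688,88 11,69 43,00",
         "11,69 43,00",
         "11,69 43,00",
         "2010.12.26 00:01:52 696,19 11,69 43,00",
         "11,69 43,00")

# Keep only the elements that start with a date (YYYY.MM.DD).
dated <- grep("^[0-9]{4}\\.[0-9]{2}\\.[0-9]{2} ", txt, value = TRUE)

# Split each kept line on runs of whitespace; each line yields 5 fields,
# so rbind() gives a character matrix with one row per dated line.
parts <- strsplit(dated, "[[:space:]]+")
m <- do.call(rbind, parts)

# Turn decimal commas into points before converting to numeric.
num <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

res <- data.frame(
  date = as.POSIXct(paste(m[, 1], m[, 2]),
                    format = "%Y.%m.%d %H:%M:%S", tz = "UTC"),
  V2 = num(m[, 3]),
  V3 = num(m[, 4]),
  V4 = num(m[, 5])
)
res
```

The time zone is fixed to UTC here only to keep the sketch reproducible; whether that is right for the actual measurements is not known from the thread.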
On Dec 17, 2012, at 4:04 AM, Simonas Kecorius wrote:

> Hey R users,
>
> suppose we have data:

txt <- readLines(textConnection("[1] 2010.12.26 00:00:52 688,88 11,69 43,00
[2] 11,69 43,00
[3] 11,69 43,00
[4] 11,69 43,00
[5] 11,69 43,00
[6] 11,69 43,00
[7] 11,69 43,00
[8] 11,69 43,00
[9] 11,69 43,00
[10] 11,69 43,00
[11] 11,69 43,00
[12] 11,69 43,00
[13] 11,69 43,00
[14] 11,69 43,00
[15] 11,69 43,00
[16] 11,69 43,00
[17] 11,69 43,00
[18] 11,69 43,00
[19] 11,69 43,00
[20] 11,69 43,00
[21] 11,69 43,00
[22] 11,69 43,00
[23] 11,69 43,00
[24] 11,69 43,00
[25] 11,69 43,00
[26] 11,69 43,00
[27] 11,69 43,00
[28] 11,69 43,00
[29] 11,69 43,00
[30] 11,69 43,00
[31] 11,69 43,00
[32] 11,69 43,00
[33] 11,69 43,00
[34] 11,69 43,00
[35] 11,69 43,00
[36] 11,69 43,00
[37] 11,69 43,00
[38] 11,69 43,00
[39] 11,69 43,00
[40] 11,69 43,00
[41] 11,69 43,00
[42] 11,69 43,00
[43] 11,69 43,00
[44] 11,69 43,00
[45] 11,69 43,00
[46] 11,69 43,00
[47] 11,69 43,00
[48] 11,69 43,00
[49] 11,69 43,00
[50] 11,69 43,00
[51] 11,69 43,00
[52] 11,69 43,00
[53] 11,69 43,00
[54] 11,69 43,00
[55] 11,69 43,00
[56] 11,69 43,00
[57] 11,69 43,00
[58] 11,69 43,00
[59] 11,69 43,00
[60] 11,69 43,00
[61] 2010.12.26 00:01:52 696,19 11,69 43,00
[62] 11,69 43,00
[63] 11,69 43,00
[64] 11,69 43,00
[65] 11,69 43,00
[66] 11,69 43,00"))

# drop the leading "[n]" index from each line
txt <- sub("\\[.+\\]","", txt)
# keep only the lines containing four digits followed by a dot (the date)
read.table(text=txt[ grepl("[[:digit:]]{4}\\.", txt) ] )
#           V1       V2     V3    V4    V5
# 1 2010.12.26 00:00:52 688,88 11,69 43,00
# 2 2010.12.26 00:01:52 696,19 11,69 43,00

Since you seemed to be using commas for decimal points, I thought searching
for "NNNN." as a pattern might be sufficient, but you could extend that to a
full date-matching pattern if needed.

> [[alternative HTML version deleted]]

Please read the Posting Guide and learn to post in plain text.

David Winsemius
Alameda, CA, USA
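One way the "full date matching pattern" mentioned above might look, as a sketch only. It assumes every wanted line starts (after the "[n]" index) with a stamp of the form YYYY.MM.DD HH:MM:SS; the shortened `txt` vector and the names `stamp`, `keep` and `res` are illustrative, not taken from the thread. Passing dec="," to read.table() is a variation on the code above so that the value columns come back numeric rather than as text with commas.

```r
# Hypothetical input mimicking printed output with "[n]" prefixes.
txt <- c("[1] 2010.12.26 00:00:52 688,88 11,69 43,00",
         "[2] 11,69 43,00",
         "[3] 11,69 43,00",
         "[61] 2010.12.26 00:01:52 696,19 11,69 43,00",
         "[62] 11,69 43,00")

# Strip the leading "[n]" index and any whitespace that follows it.
txt <- sub("^\\[[0-9]+\\][[:space:]]*", "", txt)

# Match a full "YYYY.MM.DD HH:MM:SS" stamp at the start of the line.
stamp <- "^[0-9]{4}\\.[0-9]{2}\\.[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"
keep <- grepl(stamp, txt)

# dec = "," makes read.table() read "688,88" as the number 688.88.
res <- read.table(text = txt[keep], dec = ",", header = FALSE,
                  stringsAsFactors = FALSE)
str(res)
```

Anchoring the pattern at the start of the line is stricter than the "NNNN." search: a value such as "1234.5" elsewhere in a line would no longer cause a false match.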
Hi,

This could also work:

# txt here is the vector as returned by readLines(), before the sub() step
max(nchar(txt))
#[1] 58
res<-read.table(text=substr(txt[nchar(txt)>20],5,58),sep="",dec=",",header=FALSE,stringsAsFactors=FALSE)
res
#          V1       V2     V3    V4 V5
#1 2010.12.26 00:00:52 688.88 11.69 43
#2 2010.12.26 00:01:52 696.19 11.69 43

A.K.
Hi,

This could be read using the first method I suggested. I suggested the
substr() method because I thought each line would contain square brackets.

dat1<-read.table("preila.txt",sep="",header=FALSE,fill=TRUE,dec=",",stringsAsFactors=FALSE)
dat2<-na.omit(dat1)
dat2$Date<-as.POSIXct(paste(dat2$V1,dat2$V2,sep=" "),format="%Y.%m.%d %H:%M:%S")
dat3<-dat2[,c(6,3:5)]
colnames(dat3)[2:4]<-paste0("V",2:4)
str(dat3)
#'data.frame':    1437 obs. of  4 variables:
# $ Date: POSIXct, format: "2010-12-26 00:00:52" "2010-12-26 00:01:52" ...
# $ V2  : num  689 696 712 721 726 ...
# $ V3  : num  11.7 11.7 11.7 11.7 11.7 ...
# $ V4  : num  43 43 43 43 43 43 43 43 43 43 ...
row.names(dat3)<-1:nrow(dat3)
head(dat3)
#                 Date     V2    V3 V4
#1 2010-12-26 00:00:52 688.88 11.69 43
#2 2010-12-26 00:01:52 696.19 11.69 43
#3 2010-12-26 00:02:52 712.26 11.69 43
#4 2010-12-26 00:03:52 720.70 11.69 43
#5 2010-12-26 00:04:52 726.16 11.69 43
#6 2010-12-26 00:05:52 713.26 11.69 43

A.K.

________________________________
From: Simonas Kecorius <simolas2008 at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, December 17, 2012 4:41 PM
Subject: Re: [R] split character line into rows

Hey Arun,
thanks for the quick response and your kind help. I appreciate it a lot.
When I input the data as you wrote, there is no problem. It separates by
itself. But when I try to read the data from a txt file, everything goes
wrong... I have attached the raw txt data. Maybe it will help.
Thanks a lot for your attempts to solve my problems.

p.s. I don't know how to reply to my own post, so I wrote straight to
your mail. Sorry for that.

--
Simonas Kecorius
On Mon, Dec 17, 2012 at 4:43 PM, Simonas Kecorius <simolas2008 at gmail.com> wrote:
> Hey Sarah,
> thanks for the quick response and your kind help. I appreciate it a lot.
> You are completely right. When I input the data as Arun suggested, there are no
> problems. But when I try to read it from a txt file, something goes wrong.
> I have attached the raw txt data. Maybe it will help.
> Thanks a lot for your attempts to solve my problems.

What have you tried to read it? And where are the dput() results that I requested?

> p.s. I don't know how to reply to my own post, so I wrote straight
> to your mail. Sorry for that.

Choose "reply all" or include r-help at r-project.org in the To: line.

--
Sarah Goslee
http://www.functionaldiversity.org
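For readers unsure what the dput() request amounts to, a small sketch follows. `yourdata` is a made-up stand-in, not the poster's file, and the tab separators in it are an assumption echoing the guess made earlier in the thread. The point is only that dput() output shows such characters explicitly and can be pasted straight back into R.

```r
# Hypothetical stand-in for the poster's data: a character vector whose
# separators are tabs, which a printed-and-copied version would not reveal.
yourdata <- c("2010.12.26 00:00:52\t688,88\t11,69\t43,00",
              "\t\t11,69\t43,00")

# dput() writes an R expression that recreates the object exactly;
# tabs appear as \t escapes in that expression.
out <- capture.output(dput(head(yourdata, 20)))
cat(out, sep = "\n")

# Anyone can evaluate the pasted expression and get identical data back.
restored <- eval(parse(text = out))
identical(restored, yourdata)
```

Pasting the text printed by cat() into an email gives other list members the same object, whatever the mail client does to the whitespace of an ordinary printout.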