Dear Eliza,
Try this:
Lines1<-readLines(textConnection("1911.01.01       7.87
1911.01.02       9.26
1911.01.03       8.06
1911.01.04       8.13
1911.01.05      12.90
1911.02.06       5.45
1911.02.07       3.26
1911.03.08       5.70
1911.03.09       9.24
1911.04.10       7.60
1911.05.11      14.82
1911.05.12      14.10
1911.06.13       7.87
1911.06.14       9.26
1911.07.15       8.06
1911.07.16       8.13
1911.08.17      12.90
1911.08.18       5.45
1911.09.19       3.26
1911.09.20       5.70
1911.10.21       9.24
1911.10.22       7.60
1911.11.23      14.82
1911.12.24      14.10"))
Lines2<-Lines1[Lines1!=""]
library(stringr)
str_count(Lines2, " ")
# [1] 7 7 7 7 6 7 7 7 7 7 6 6 7 7 7 7 6 7 7 7 7 7 6 6
Lines2[str_count(Lines2," ")==7]<-
  str_replace(Lines2[str_count(Lines2," ")==7],"\\s+","     ") #reduced by 2 spaces
Lines2[str_count(Lines2," ")==6]<-
  str_replace(Lines2[str_count(Lines2," ")==6],"\\s+","    ") #reduced by 2 spaces
str_count(Lines2," ")
# [1] 5 5 5 5 4 5 5 5 5 5 4 4 5 5 5 5 4 5 5 5 5 5 4 4
substr(Lines2[substr(Lines2,6,6)==0|substr(Lines2,9,9)==0],6,6)<-" "
substr(Lines2[substr(Lines2,6,6)==0|substr(Lines2,9,9)==0],9,9)<-" "
str_count(Lines2," ") #this counts every space in the line, so the spaces that replaced the zeros are included
# [1] 7 7 7 7 6 7 7 7 7 6 5 5 6 6 6 6 5 6 6 6 5 5 4 4
Lines2
# [1] "1911. 1. 1     7.87" "1911. 1. 2     9.26" "1911. 1. 3     8.06"
# [4] "1911. 1. 4     8.13" "1911. 1. 5    12.90" "1911. 2. 6     5.45"
# [7] "1911. 2. 7     3.26" "1911. 3. 8     5.70" "1911. 3. 9     9.24"
#[10] "1911. 4.10     7.60" "1911. 5.11    14.82" "1911. 5.12    14.10"
#[13] "1911. 6.13     7.87" "1911. 6.14     9.26" "1911. 7.15     8.06"
#[16] "1911. 7.16     8.13" "1911. 8.17    12.90" "1911. 8.18     5.45"
#[19] "1911. 9.19     3.26" "1911. 9.20     5.70" "1911.10.21     9.24"
#[22] "1911.10.22     7.60" "1911.11.23    14.82" "1911.12.24    14.10"
A.K.
________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
Sent: Friday, February 15, 2013 12:38 PM
Subject: data formatting
Dear Arun,
[The text file is also attached in case the formatting is changed.]
I need your data-management expertise on the following issue.
I have data like the following table:
1911.01.01       7.87 ##(7 spaces between the columns)
1911.01.02       9.26 ##(7 spaces between the columns)
1911.01.03       8.06 ##(7 spaces between the columns)
1911.01.04       8.13 ##(7 spaces between the columns)
1911.01.05      12.90 ##(6 spaces between the columns)
1911.02.06       5.45 ##(7 spaces between the columns)
1911.02.07       3.26 ##(7 spaces between the columns)
1911.03.08       5.70 ##(7 spaces between the columns)
1911.03.09       9.24 ##(7 spaces between the columns)
1911.04.10       7.60 ##(7 spaces between the columns)
1911.05.11      14.82 ##(6 spaces between the columns)
1911.05.12      14.10 ##(6 spaces between the columns)
1911.06.13       7.87 ##(7 spaces between the columns)
1911.06.14       9.26 ##(7 spaces between the columns)
1911.07.15       8.06 ##(7 spaces between the columns)
1911.07.16       8.13 ##(7 spaces between the columns)
1911.08.17      12.90 ##(6 spaces between the columns)
1911.08.18       5.45 ##(7 spaces between the columns)
1911.09.19       3.26 ##(7 spaces between the columns)
1911.09.20       5.70 ##(7 spaces between the columns)
1911.10.21       9.24 ##(7 spaces between the columns)
1911.10.22       7.60 ##(7 spaces between the columns)
1911.11.23      14.82 ##(6 spaces between the columns)
1911.12.24      14.10 ##(6 spaces between the columns)
I want it to be in the following format, and afterwards I want to save that
file in ".txt" format.
1911. 1. 1     7.87 ##(5 spaces between the columns)
1911. 1. 2     9.26 ##(5 spaces between the columns)
1911. 1. 3     8.06 ##(5 spaces between the columns)
1911. 1. 4     8.13 ##(5 spaces between the columns)
1911. 1. 5    12.90 ##(4 spaces between the columns)
1911. 2. 6     5.45 ##(5 spaces between the columns)
1911. 2. 7     3.26 ##(5 spaces between the columns)
1911. 3. 8     5.70 ##(5 spaces between the columns)
1911. 3. 9     9.24 ##(5 spaces between the columns)
1911. 4.10     7.60 ##(5 spaces between the columns)
1911. 5.11    14.82 ##(4 spaces between the columns)
1911. 5.12    14.10 ##(4 spaces between the columns)
1911. 6.13     7.87 ##(5 spaces between the columns)
1911. 6.14     9.26 ##(5 spaces between the columns)
1911. 7.15     8.06 ##(5 spaces between the columns)
1911. 7.16     8.13 ##(5 spaces between the columns)
1911. 8.17    12.90 ##(4 spaces between the columns)
1911. 8.18     5.45 ##(5 spaces between the columns)
1911. 9.19     3.26 ##(5 spaces between the columns)
1911. 9.20     5.70 ##(5 spaces between the columns)
1911.10.21     9.24 ##(5 spaces between the columns)
1911.10.22     7.60 ##(5 spaces between the columns)
1911.11.23    14.82 ##(4 spaces between the columns)
1911.12.24    14.10 ##(4 spaces between the columns)
As you can see, the spaces between the columns need to be reduced in the output
file, and the leading zeros in the month and day fields of the date column need
to be replaced with spaces.
Thank you very much in advance,
elisa
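The requested layout is effectively fixed-width: a 10-character date followed by the value right-aligned in a 9-character field. Under that assumption, a sketch that parses each line and rebuilds it with sprintf() avoids counting spaces altogether. Here `reformat_lines` is a hypothetical helper name used only for illustration, and the field width is inferred from the desired output rather than stated anywhere in the thread.

```r
# Sketch: rebuild each line from its parsed fields instead of counting spaces.
# Assumes every non-empty line is "YYYY.MM.DD<spaces>value" and that the value
# should be right-aligned in a 9-character field (an inferred layout).
reformat_lines <- function(lines) {
  lines <- trimws(lines)                      # drop leading/trailing blanks
  lines <- lines[lines != ""]                 # drop empty lines
  parts <- strsplit(lines, "\\s+")            # split into date and value
  dates <- vapply(parts, `[`, character(1), 1)
  values <- as.numeric(vapply(parts, `[`, character(1), 2))
  ymd <- do.call(rbind, strsplit(dates, ".", fixed = TRUE))
  # %2d pads single-digit months and days with a space instead of a zero
  sprintf("%s.%2d.%2d%9.2f",
          ymd[, 1], as.integer(ymd[, 2]), as.integer(ymd[, 3]), values)
}

reformat_lines(c("1911.01.01       7.87", "1911.01.05      12.90"))
```

Because the month and day are formatted separately, a date such as 1996.12.01 keeps both digits of the month.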
Hi Eliza,
Suppose you have 147 data files in the same working directory. Here, I am
using "Eliza1.txt" and a modified "Eliza2.txt" (attached).
list.files()
#[1] "Eliza1.txt" "Eliza2.txt"
#Count the spaces. gsub() is used because there were spaces at the end of the
#lines (possibly due to a formatting error), which it removes. If there are no
#spaces at the end, you don't need gsub().
lapply(list.files(),function(i) str_count(gsub(" $","",readLines(i))," "))
#[[1]]
# [1] 7 7 7 7 6 7 7 7 7 7 6 6 7 7 7 7 6 7 7 7 7 7 6 6
#
#[[2]]
# [1] 7 7 7 7 6 7 7 7 7 7 6 6 7 7 7 7 6 7 7 7 7 7 6 6
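Note that list.files() with no arguments returns every file in the working directory, so any non-data file there would be read as well. A minimal sketch that restricts the listing to ".txt" files and names the result by file follows; `read_txt_files` is a hypothetical helper, not part of the code in this thread.

```r
# Sketch: read every .txt file in a directory into a named list of lines.
# `read_txt_files` and its `dir` argument are illustrative names.
read_txt_files <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  out <- lapply(files, readLines)
  # name each element after its file, without the extension
  names(out) <- sub("\\.txt$", "", basename(files))
  out
}
```

With 147 files in one folder, `read_txt_files()` would return a list of 147 character vectors that can be passed to lapply() in the same way as above.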
res<- lapply(list.files(),function(i) {
  Lines2<-gsub(" $","",readLines(i))
  Lines2[str_count(Lines2," ")==7]<- str_replace(Lines2[str_count(Lines2," ")==7],"\\s+","     ")
  Lines2[str_count(Lines2," ")==6]<- str_replace(Lines2[str_count(Lines2," ")==6],"\\s+","    ")
  substr(Lines2[substr(Lines2,6,6)==0|substr(Lines2,9,9)==0],6,6)<-" "
  substr(Lines2[substr(Lines2,6,6)==0|substr(Lines2,9,9)==0],9,9)<-" "
  Lines2
})
names(res)<-gsub("\\..*","",list.files())
res
#$Eliza1
# [1] "1911. 1. 1     7.87" "1911. 1. 2     9.26" "1911. 1. 3     8.06"
# [4] "1911. 1. 4     8.13" "1911. 1. 5    12.90" "1911. 2. 6     5.45"
# [7] "1911. 2. 7     3.26" "1911. 3. 8     5.70" "1911. 3. 9     9.24"
#[10] "1911. 4.10     7.60" "1911. 5.11    14.82" "1911. 5.12    14.10"
#[13] "1911. 6.13     7.87" "1911. 6.14     9.26" "1911. 7.15     8.06"
#[16] "1911. 7.16     8.13" "1911. 8.17    12.90" "1911. 8.18     5.45"
#[19] "1911. 9.19     3.26" "1911. 9.20     5.70" "1911.10.21     9.24"
#[22] "1911.10.22     7.60" "1911.11.23    14.82" "1911.12.24    14.10"
#$Eliza2
# [1] "1911. 1. 1     4.87"  "1911. 1. 2     11.26" "1911. 1. 3     6.06"
# [4] "1911. 1. 4     8.13"  "1911. 1. 5    11.90"  "1911. 2. 6     5.55"
# [7] "1911. 2. 7     3.16"  "1911. 3. 8     5.10"  "1911. 3. 9     9.34"
#[10] "1911. 4.10     7.10"  "1911. 5.11    14.92"  "1911. 5.12    14.20"
#[13] "1911. 6.13     7.77"  "1911. 6.14     9.36"  "1911. 7.15     8.66"
#[16] "1911. 7.16     8.23"  "1911. 8.17    11.90"  "1911. 8.18     15.45"
#[19] "1911. 9.19     13.26" "1911. 9.20     15.77" "1911.10.21     19.34"
#[22] "1911.10.22     7.66"  "1911.11.23    14.84"  "1911.12.24    14.11"
lapply(res,function(x) str_count(x," "))
#$Eliza1
# [1] 7 7 7 7 6 7 7 7 7 6 5 5 6 6 6 6 5 6 6 6 5 5 4 4
#$Eliza2
# [1] 7 7 7 7 6 7 7 7 7 6 5 5 6 6 6 6 5 6 6 6 5 5 4 4
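The original question also asked to save the result in ".txt" format. One way to do that, sketched here under the assumption that the list is named as above, is to write each element with writeLines(). The function name `write_formatted` and the output folder `formatted` are hypothetical choices for illustration.

```r
# Sketch: write each element of a named list of character vectors to its own
# .txt file. `write_formatted` and `out_dir` are illustrative names.
write_formatted <- function(res, out_dir = "formatted") {
  dir.create(out_dir, showWarnings = FALSE)
  paths <- file.path(out_dir, paste0(names(res), ".txt"))
  for (k in seq_along(res)) {
    writeLines(res[[k]], paths[k])   # one formatted line per row of the file
  }
  invisible(paths)
}
```

Writing into a separate folder keeps the formatted files out of the working directory, so a later call to list.files() there does not pick them up as input.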
Hope this helps.
A.K.
________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
Sent: Friday, February 15, 2013 4:47 PM
Subject: RE: data formatting
Thank you very much for replying, Arun. I just need to know what change I will
have to make if I am importing 147 data files into a list. What difference will
it make to the first command, which is:
Lines1<-readLines(textConnection("1911.01.01       7.87
1911.01.02       9.26
1911.01.03       8.06
1911.01.04       8.13
1911.01.05      12.90
1911.02.06       5.45
1911.02.07       3.26
1911.03.08       5.70
1911.03.09       9.24
1911.04.10       7.60
1911.05.11      14.82
1911.05.12      14.10
1911.06.13       7.87
1911.06.14       9.26
1911.07.15       8.06
1911.07.16       8.13
1911.08.17      12.90
1911.08.18       5.45
1911.09.19       3.26
1911.09.20       5.70
1911.10.21       9.24
1911.10.22       7.60
1911.11.23      14.82
1911.12.24      14.10"))
Thank you so very much...
elisa
> Date: Fri, 15 Feb 2013 11:11:36 -0800
> From: smartpink111 at yahoo.com
> Subject: Re: data formatting
> To: eliza_botto at hotmail.com
> CC: r-help at r-project.org
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Eliza1.txt
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20130215/60cfa746/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Eliza2.txt
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20130215/60cfa746/attachment-0001.txt>
Dear Eliza,
Thank you for testing the code. Sorry, it was a mistake.
I created one more file "Eliza3.txt" (attached)
Try this.
library(stringr)
lapply(list.files(),function(i) str_count(gsub(" $","",readLines(i))," "))
#[[1]]
# [1] 7 7 7 7 6 7 7 7 7 7 6 6 7 7 7 7 6 7 7 7 7 7 6 6
#[[2]]
# [1] 7 7 7 7 6 7 7 7 7 7 6 6 7 7 7 7 6 7 7 7 7 7 6 6
#[[3]]
# [1] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6
resNew<-lapply(list.files(),function(i) {
  Lines2<-gsub(" $","",readLines(i))
  Lines2[str_count(Lines2," ")==7]<- str_replace(Lines2[str_count(Lines2," ")==7],"\\s+","     ")
  Lines2[str_count(Lines2," ")==6]<- str_replace(Lines2[str_count(Lines2," ")==6],"\\s+","    ")
  #replace a zero at position 6 (month) and at position 9 (day) independently
  substr(Lines2,6,6)<-ifelse(substr(Lines2,6,6)==0," ",substr(Lines2,6,6))
  substr(Lines2,9,9)<-ifelse(substr(Lines2,9,9)==0," ",substr(Lines2,9,9))
  Lines2
})
names(resNew)<- gsub("\\..*","",list.files())
lapply(resNew,function(x) str_count(x," "))
#$Eliza1
# [1] 7 7 7 7 6 7 7 7 7 6 5 5 6 6 6 6 5 6 6 6 5 5 4 4
#$Eliza2
# [1] 7 7 7 7 6 7 7 7 7 6 5 5 6 6 6 6 5 6 6 6 5 5 4 4
#$Eliza3
# [1] 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 5 5 4
resNew
#$Eliza1
# [1] "1911. 1. 1     7.87" "1911. 1. 2     9.26" "1911. 1. 3     8.06"
# [4] "1911. 1. 4     8.13" "1911. 1. 5    12.90" "1911. 2. 6     5.45"
# [7] "1911. 2. 7     3.26" "1911. 3. 8     5.70" "1911. 3. 9     9.24"
#[10] "1911. 4.10     7.60" "1911. 5.11    14.82" "1911. 5.12    14.10"
#[13] "1911. 6.13     7.87" "1911. 6.14     9.26" "1911. 7.15     8.06"
#[16] "1911. 7.16     8.13" "1911. 8.17    12.90" "1911. 8.18     5.45"
#[19] "1911. 9.19     3.26" "1911. 9.20     5.70" "1911.10.21     9.24"
#[22] "1911.10.22     7.60" "1911.11.23    14.82" "1911.12.24    14.10"
#$Eliza2
# [1] "1911. 1. 1     4.87"  "1911. 1. 2     11.26" "1911. 1. 3     6.06"
# [4] "1911. 1. 4     8.13"  "1911. 1. 5    11.90"  "1911. 2. 6     5.55"
# [7] "1911. 2. 7     3.16"  "1911. 3. 8     5.10"  "1911. 3. 9     9.34"
#[10] "1911. 4.10     7.10"  "1911. 5.11    14.92"  "1911. 5.12    14.20"
#[13] "1911. 6.13     7.77"  "1911. 6.14     9.36"  "1911. 7.15     8.66"
#[16] "1911. 7.16     8.23"  "1911. 8.17    11.90"  "1911. 8.18     15.45"
#[19] "1911. 9.19     13.26" "1911. 9.20     15.77" "1911.10.21     19.34"
#[22] "1911.10.22     7.66"  "1911.11.23    14.84"  "1911.12.24    14.11"
#$Eliza3
# [1] "1996.11.24     3.26" "1996.11.25     3.02" "1996.11.26     3.61"
# [4] "1996.11.27     3.43" "1996.11.28     5.91" "1996.11.29     4.48"
# [7] "1996.11.30     1.30" "1996.12. 1     1.50" "1996.12. 2     2.37"
#[10] "1996.12. 3     3.62" "1996.12. 4     4.60" "1996.12. 5     4.34"
#[13] "1996.12. 6     3.26" "1996.12. 7     3.02" "1996.12. 8     3.61"
#[16] "1996.12. 9     3.43" "1996.12.10     5.91" "1996.12.11     4.48"
#[19] "1911.12.12    12.92"
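To see why the earlier version dropped a digit, it may help to run both zero-replacement strategies on a single date with a two-digit month and a day below 10. This is only a small reproduction, assuming the line has already had its column spacing reduced; the variable names `x`, `old` and `new` are illustrative.

```r
# Sketch: compare the two zero-replacement strategies on one line.
x <- "1996.12.01     1.50"

# Earlier version: the same OR condition selects the line for both
# assignments, so position 6 is blanked although only position 9 is "0".
old <- x
substr(old[substr(old, 6, 6) == "0" | substr(old, 9, 9) == "0"], 6, 6) <- " "
substr(old[substr(old, 6, 6) == "0" | substr(old, 9, 9) == "0"], 9, 9) <- " "

# Corrected version: each position is tested and replaced on its own.
new <- x
substr(new, 6, 6) <- ifelse(substr(new, 6, 6) == "0", " ", substr(new, 6, 6))
substr(new, 9, 9) <- ifelse(substr(new, 9, 9) == "0", " ", substr(new, 9, 9))

old  # "1996. 2. 1     1.50"
new  # "1996.12. 1     1.50"
```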
Hope it works.
A.K.
________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
Sent: Friday, February 15, 2013 7:14 PM
Subject: RE: data formatting
Dear Arun,
Thank you very much. These commands did work, but there is one slight mistake
I want to point out. When the month has two digits and the day is less than 10,
the tens digit of the month gets deleted.
For example, see rows [31382,] to [31390,] of the following table: instead of
12, it wrote 2.
[31375,] "1996.11.24     3.26"
[31376,] "1996.11.25     3.02"
[31377,] "1996.11.26     3.61"
[31378,] "1996.11.27     3.43"
[31379,] "1996.11.28     5.91"
[31380,] "1996.11.29     4.48"
[31381,] "1996.11.30     1.30"
[31382,] "1996. 2. 1     1.50"
[31383,] "1996. 2. 2     2.37"
[31384,] "1996. 2. 3     3.62"
[31385,] "1996. 2. 4     4.60"
[31386,] "1996. 2. 5     4.34"
[31387,] "1996. 2. 6     3.26"
[31388,] "1996. 2. 7     3.02"
[31389,] "1996. 2. 8     3.61"
[31390,] "1996. 2. 9     3.43"
[31391,] "1996.12.10     5.91"
[31392,] "1996.12.11     4.48"
[31393,] "1996.12.12     1.30"
[31394,] "1996.12.13     1.50"
[31395,] "1996.12.14     2.37"
[31396,] "1996.12.15     3.62"
[31397,] "1996.12.16     4.60"
[31398,] "1996.12.17     4.34"
[31399,] "1996.12.18     3.26"
[31400,] "1996.12.19     3.02"
[31401,] "1996.12.20     3.61"
[31402,] "1996.12.21     3.43"
[31403,] "1996.12.22     5.91"
[31404,] "1996.12.23     4.48"
[31405,] "1996.12.24     1.30"
[31406,] "1996.12.25     1.50"
[31407,] "1996.12.26     2.37"
[31408,] "1996.12.27     3.62"
[31409,] "1996.12.28     4.60"
[31410,] "1996.12.29     4.34"
[31411,] "1996.12.30     3.26"
[31412,] "1996.12.31     3.02"
[31413,] "1911. 1. 1     3.61"
[31414,] "1911. 1. 2     3.43"
[31415,] "1911. 1. 3     5.91"
[31416,] "1911. 1. 4     4.48"
Thank you so very much...
elisa
> Date: Fri, 15 Feb 2013 14:48:17 -0800
> From: smartpink111 at yahoo.com
> Subject: Re: data formatting
> To: eliza_botto at hotmail.com
> CC: r-help at r-project.org
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Eliza3.txt
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20130215/134eb387/attachment.txt>