Hello All,

I have a data file in text format that contains two data sets. The data sets follow one another without a break. Each data set starts with a header line that gives the number of data rows and the name of the data series. For example, the first data set has the header "6240 Terry Cove-Model", and the data for that series follow for up to 6240 rows. Then the next data set starts with its own header, such as "5200 Terry-Observed".

The sample data look like this:

6240 Terry Cove-Model
300 .300110459327698
300.041656494141 .289277672767639
300.083343505859 .276237487792969
300.125 .258902788162231
300.166656494141 .236579895019531
300.208343505859 .221315026283264
300.25 .214318037033081
300.291656494141 .190926909446716
300.333343505859 .158144593238831
300.375 .113302707672119
300.416656494141 .103684902191162
300.458343505859 9.72903966903687E-02
300.5 8.76948833465576E-02
300.541656494141 8.42459201812744E-02
300.583343505859 .078397274017334
300.625 8.44632387161255E-02
300.666656494141 9.32939052581787E-02
300.708343505859 .113663911819458
300.75 .123064398765564
300.791656494141 .157548069953918
300.833343505859 .148393034934998
300.875 .135645747184753
300.916656494141 .137590646743774
300.958343505859 .133154153823853
301 .131152510643005
301.041656494141 .114152908325195
301.083343505859 8.04083347320557E-02
301.125 5.53587675094604E-02
301.166656494141 3.17397117614746E-02
301.208343505859 4.07266616821289E-03
301.25 -2.15455293655396E-02
301.291656494141 -4.07489538192749E-02
301.333343505859 -5.85414171218872E-02
301.375 -7.53517150878906E-02
301.416656494141 -8.49723815917969E-02
301.458343505859 -7.91778564453125E-02
301.5 -7.02846050262451E-02
301.541656494141 -7.24701881408691E-02
301.583343505859 -7.76907205581665E-02
301.625 -6.82642459869385E-02
62401 Terry Cove-Data
300 .216407993
300.0042 .204216005
300.0083 .210311999
300.0125 .195071996
300.0167 .192023999
300.0208 .179831992
300.025 .188976001
300.0292 .185928004
300.0333 .195071996
300.0375 .219456009
300.0417 .210311999
300.0458 .204216005
300.05 .195071996
300.0542 .188976001
300.0583 .195071996
300.0625 .195071996
300.0667 .185928004
300.0708 .173735998
300.075 .170688001
300.0792 .167640004
300.0833 .167640004
300.0875 .167640004
300.0917 .167640004
300.0958 .161543991
300.1 .1524
300.1042 .158495994
300.1083 .149352003
300.1125 .158495994
300.1167 .1524
300.1208 .1524
300.125 .149352003
300.1292 .143256
300.1333 .146303997
300.1375 .149352003
300.1417 .146303997
300.1458 .137159996
300.15 .131064002
300.1542 .124967999
300.1583 .128015996
300.1625 .124967999
300.1667 .131064002
300.1708 .124967999
300.175 .124967999
300.1792 .134111999
300.1833 .118871996
300.1875 .128015996
300.1917 .131064002
300.1958 .128015996
300.2 .131064002
300.2042 .128015996
300.2083 .121920002
300.2125 .115823999
300.2167 .112776001
300.2208 .103632001
300.225 .097535998
300.2292 .103632001
300.2333 .094488001
300.2375 .082296003
300.2417 .0762
300.2458 .079247997
300.25 .067056
300.2542 .064007998
300.2583 .045720002
300.2625 .033528
300.2667 .036575999
300.2708 .036575999
300.275 .036575999
300.2792 .027432001
300.2833 .027432001
300.2875 .021336
300.2917 .012192
300.2958 .009144
300.3 .009144
300.3042 .003048
300.3083 0
300.3125 -.003048
300.3167 -.006096
300.3208 0
300.325 .006096
300.3292 -.003048
300.3333 .006096

The full data set can be downloaded from
https://www.dropbox.com/s/chhw3vz6ru1godk/Practicedata.Dat

I want to make a comparison graph of the modeled and observed series. Once I can read the two data sets, either as two separate sets or combined into one, I will be able to create the time series graph.

I also need to create a subset in which both series share common time points, since one series may have more intervals than the other. Once I have two data sets on the same interval, I want to plot a correlation graph.

I hope I have made clear what I want to do.

Thank you so much.

Best Regards,
Janesh
Janesh Devkota
2013-Jan-21 08:19 UTC
[R] How to read a file with two data sets in text format
I was able to read the data using the following code:
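library(ggplot2)
# First block: the header line plus 6240 data rows
jd1 <- read.table('Practicedata.dat', header=TRUE, sep="\t", nrows=6240)
# Second block: skip the first header and its 6240 data rows
jd2 <- read.table('Practicedata.dat', header=TRUE, sep="\t", skip=6241)
colnames(jd1) <- c("Date", "Mod")
colnames(jd2) <- c("Date", "Obs")
# Model series as a line, with the observed series overlaid in red
p <- ggplot(jd1, aes(x=Date, y=Mod)) + geom_line()
p <- p + geom_line(data=jd2, aes(x=Date, y=Obs), color="red")
p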
Now I want to make a scatter plot of jd1$Mod against jd2$Obs, but I
cannot create one because the two have different numbers of rows. Since
Mod has fewer rows, I am planning to use the dates of Mod and find the
corresponding values of Obs for those times. How can I find the values
of Obs that correspond to the dates in jd1?
Also, is there a way to add the regression equation and the correlation
coefficient to the scatter plot?
Thank you so much.
Best Regards,
Janesh
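One possible way to pair the two series is to interpolate the observed values at the model time stamps and then compare them row by row. The sketch below is only an illustration under stated assumptions: it uses small made-up data frames in place of jd1 and jd2, and linear interpolation with approx() is just one reading of "corresponding values", not something this thread settles.

```r
# Made-up stand-ins for jd1 (model, coarse steps) and jd2 (observed, fine steps);
# the real data frames would come from Practicedata.dat.
jd1 <- data.frame(Date = c(300, 300.125, 300.25, 300.375),
                  Mod  = c(0.30, 0.26, 0.21, 0.11))
jd2 <- data.frame(Date = 300 + 0.05 * (0:8),
                  Obs  = c(0.22, 0.20, 0.15, 0.13, 0.10, 0.07, 0.01, -0.02, -0.05))

# Linearly interpolate Obs at the model time stamps.
# Model times outside the observed range give NA (approx() default, rule = 1).
obs_at_mod <- approx(x = jd2$Date, y = jd2$Obs, xout = jd1$Date)$y

# Keep only the rows where both series have a value
paired <- data.frame(Date = jd1$Date, Mod = jd1$Mod, Obs = obs_at_mod)
paired <- paired[complete.cases(paired), ]

# Correlation coefficient and least-squares fit of Mod on Obs
r   <- cor(paired$Obs, paired$Mod)
fit <- lm(Mod ~ Obs, data = paired)
b   <- coef(fit)

# Scatter plot with the fitted line; equation and r go in the legend
plot(paired$Obs, paired$Mod, xlab = "Observed", ylab = "Modeled")
abline(fit, col = "red")
legend("topleft", bty = "n",
       legend = c(sprintf("Mod = %.3f + %.3f * Obs", b[1], b[2]),
                  sprintf("r = %.3f", r)))
```

If the observed series is noisy, matching each model time to the nearest observed time, or averaging the observations within each model interval, may be a better choice than interpolation.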
jim holtman
2013-Jan-21 13:31 UTC
[R] How to read a file with two data sets in text format
Here is one way to read the data. I modified the line counts in your
sample headers to match the number of data rows actually present:
x <- readLines(textConnection("40 Terry Cove-Model
300 .300110459327698
300.041656494141 .289277672767639
300.083343505859 .276237487792969
300.125 .258902788162231
300.166656494141 .236579895019531
300.208343505859 .221315026283264
300.25 .214318037033081
300.291656494141 .190926909446716
300.333343505859 .158144593238831
300.375 .113302707672119
300.416656494141 .103684902191162
300.458343505859 9.72903966903687E-02
300.5 8.76948833465576E-02
300.541656494141 8.42459201812744E-02
300.583343505859 .078397274017334
300.625 8.44632387161255E-02
300.666656494141 9.32939052581787E-02
300.708343505859 .113663911819458
300.75 .123064398765564
300.791656494141 .157548069953918
300.833343505859 .148393034934998
300.875 .135645747184753
300.916656494141 .137590646743774
300.958343505859 .133154153823853
301 .131152510643005
301.041656494141 .114152908325195
301.083343505859 8.04083347320557E-02
301.125 5.53587675094604E-02
301.166656494141 3.17397117614746E-02
301.208343505859 4.07266616821289E-03
301.25 -2.15455293655396E-02
301.291656494141 -4.07489538192749E-02
301.333343505859 -5.85414171218872E-02
301.375 -7.53517150878906E-02
301.416656494141 -8.49723815917969E-02
301.458343505859 -7.91778564453125E-02
301.5 -7.02846050262451E-02
301.541656494141 -7.24701881408691E-02
301.583343505859 -7.76907205581665E-02
301.625 -6.82642459869385E-02
81 Terry Cove-Data
300 .216407993
300.0042 .204216005
300.0083 .210311999
300.0125 .195071996
300.0167 .192023999
300.0208 .179831992
300.025 .188976001
300.0292 .185928004
300.0333 .195071996
300.0375 .219456009
300.0417 .210311999
300.0458 .204216005
300.05 .195071996
300.0542 .188976001
300.0583 .195071996
300.0625 .195071996
300.0667 .185928004
300.0708 .173735998
300.075 .170688001
300.0792 .167640004
300.0833 .167640004
300.0875 .167640004
300.0917 .167640004
300.0958 .161543991
300.1 .1524
300.1042 .158495994
300.1083 .149352003
300.1125 .158495994
300.1167 .1524
300.1208 .1524
300.125 .149352003
300.1292 .143256
300.1333 .146303997
300.1375 .149352003
300.1417 .146303997
300.1458 .137159996
300.15 .131064002
300.1542 .124967999
300.1583 .128015996
300.1625 .124967999
300.1667 .131064002
300.1708 .124967999
300.175 .124967999
300.1792 .134111999
300.1833 .118871996
300.1875 .128015996
300.1917 .131064002
300.1958 .128015996
300.2 .131064002
300.2042 .128015996
300.2083 .121920002
300.2125 .115823999
300.2167 .112776001
300.2208 .103632001
300.225 .097535998
300.2292 .103632001
300.2333 .094488001
300.2375 .082296003
300.2417 .0762
300.2458 .079247997
300.25 .067056
300.2542 .064007998
300.2583 .045720002
300.2625 .033528
300.2667 .036575999
300.2708 .036575999
300.275 .036575999
300.2792 .027432001
300.2833 .027432001
300.2875 .021336
300.2917 .012192
300.2958 .009144
300.3 .009144
300.3042 .003048
300.3083 0
300.3125 -.003048
300.3167 -.006096
300.3208 0
300.325 .006096
300.3292 -.003048
300.3333 .006096"))
indx <- grep("^[0-9]+ [[:alpha:]]", x)  # determine where the breaks are
# read data into a list
result <- lapply(indx, function(.start){
    # extract the line count
    n <- as.integer(sub("^\\s*([0-9]+).*", "\\1", x[.start]))
    read.table(text = x[seq(.start + 1L, length = n)])
})
> str(result)
List of 2
 $ :'data.frame': 40 obs. of 2 variables:
  ..$ V1: num [1:40] 300 300 300 300 300 ...
  ..$ V2: num [1:40] 0.3 0.289 0.276 0.259 0.237 ...
 $ :'data.frame': 81 obs. of 2 variables:
  ..$ V1: num [1:81] 300 300 300 300 300 ...
  ..$ V2: num [1:81] 0.216 0.204 0.21 0.195 0.192 ...
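The same idea can be applied to a file on disk rather than pasted text. The following is a minimal sketch, assuming every block starts with a "<count> <name>" header line as described in the question; read_blocks is a hypothetical helper name, the column names Date and Value are illustrative, and a small temporary file stands in for Practicedata.dat.

```r
# Hypothetical helper: split a file of "<count> <name>" blocks into
# a named list of data frames, one per block.
read_blocks <- function(path) {
  x <- readLines(path)
  # Header lines: an integer, whitespace, then a name starting with a letter
  indx <- grep("^\\s*[0-9]+\\s+[[:alpha:]]", x)
  blocks <- lapply(indx, function(start) {
    # The leading integer is the number of data rows in this block
    n <- as.integer(sub("^\\s*([0-9]+).*", "\\1", x[start]))
    read.table(text = x[seq(start + 1L, length.out = n)],
               col.names = c("Date", "Value"))
  })
  # Use the text after the count as the name of each block
  names(blocks) <- sub("^\\s*[0-9]+\\s+", "", x[indx])
  blocks
}

# Small stand-in file; the real input would be Practicedata.dat
tmp <- tempfile(fileext = ".dat")
writeLines(c("3 Terry Cove-Model",
             "300 .30", "300.125 .26", "300.25 8.5E-02",
             "2 Terry Cove-Data",
             "300 .22", "300.0042 .20"), tmp)

result <- read_blocks(tmp)
str(result)
```

This relies on the count in each header being accurate. If a count is larger than the number of rows that follow, the block would run into the next header or past the end of the file and read.table() would fail or return NA rows.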
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.