thr3ads.net - R help - [R] Help with isolating and comparing data from two files. [May 2011]


ajn21

2011-May-23 04:00 UTC

[R] Help with isolating and comparing data from two files.

Hello,

I was hoping that someone would be able to help me, or at least point me in
the right direction, with a problem I am having. I am a new R user, and
I've been reading tutorials, but they haven't been much help to me so far.

The problem is relatively simple as I've already created working solutions
in Java and Perl, but I need a solution in R as well. 

I have two text files, say pos.txt and reg.txt. In pos.txt, the data is
listed for example:

c22 1445  - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15
...

etc. for thousands of lines. The most important column is column 2, which
lists "position" (e.g. 1445, 1542, 1678). In reg.txt, data is listed as:

c22 1440 1500 cpg: 44 56 ......
c22 1520 1700 cpg: 56 87 ......
c22 1800 1900 cpg: 58 90 ......
...

where the values in column 2 are the "start" positions and the values in
column 3 are the "end" positions. There are 10 columns in total, but I have
listed only the first few. Also, the two files have different lengths.
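As a sketch of the reading step, `read.table()` can assign column names at read time through its `col.names` argument. The names used here (chr, position, strand, context, count1, count2 for pos.txt; chr, start, end, label, v5, v6 for reg.txt) are hypothetical, and only the six reg.txt columns shown above are modelled; the real 10-column file would need ten names. The `text =` argument stands in for the real file names.

```r
# Sample rows from the post; for the real file use read.table("pos.txt", ...)
pos <- read.table(text = "
c22 1445 - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15",
  col.names = c("chr", "position", "strand", "context", "count1", "count2"),
  stringsAsFactors = FALSE)

# Only the first six reg.txt columns appear in the post; the real file
# has 10 columns, so it would need 10 (hypothetical) names here
reg <- read.table(text = "
c22 1440 1500 cpg: 44 56
c22 1520 1700 cpg: 56 87
c22 1800 1900 cpg: 58 90",
  col.names = c("chr", "start", "end", "label", "v5", "v6"),
  stringsAsFactors = FALSE)

str(pos)   # position is read as integer, strand as character
str(reg)   # start and end are read as integer
```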


Essentially, I need to take each position listed in column 2 of pos.txt and
find the region in reg.txt (based on its start and end positions) that
contains it. Then I need to print:

c22 "start" "end" "position" + 1 5 

where the last three columns also come from pos.txt (i.e. not every line
ends in + 1 5; each line carries the values from the matching row of
pos.txt). The position must also lie between the start and end positions.
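One way to express this in R without nested loops is to join the two tables on the chromosome column and keep only the pairs whose position lies inside the region. The sketch below assumes the sample rows shown above and hypothetical column names (chr, position, strand, context, count1, count2 for pos.txt; chr, start, end for reg.txt, reading only the first three reg.txt columns). Because `merge()` pairs every position with every region on the same chromosome before filtering, memory use grows with the product of the two table sizes, which may matter for files with thousands of lines.

```r
# Sample rows from the post; column names are hypothetical
pos <- read.table(text = "
c22 1445 - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15",
  col.names = c("chr", "position", "strand", "context", "count1", "count2"),
  stringsAsFactors = FALSE)
reg <- read.table(text = "
c22 1440 1500
c22 1520 1700
c22 1800 1900",
  col.names = c("chr", "start", "end"), stringsAsFactors = FALSE)

# Pair every region with every position on the same chromosome ...
both <- merge(reg, pos, by = "chr")
# ... then keep only the pairs where start <= position <= end
hits <- both[both$position >= both$start & both$position <= both$end, ]

# Output columns: chr start end position strand count1 count2
out <- hits[order(hits$position),
            c("chr", "start", "end", "position", "strand", "count1", "count2")]
write.table(out, quote = FALSE, row.names = FALSE, col.names = FALSE)
```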

So far I've been able to use read.table to create a data frame for each text
file, and I've also named each column (e.g. reg.data$end) and I can output
each column individually. However, the problem I keep facing is how to
compare the numbers for "position" in pos.txt to the numbers for "start" and
"end" in reg.txt. I tried to use:

if ((pos >= start) | (pos <= end))..

but an error comes up that says the files aren't the same length.
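Two points about that comparison, shown as a minimal sketch with the sample regions from the post and a single hypothetical position `p`. First, "inside the region" needs both tests to hold, so they combine with `&`; with `|`, any position passes at least one test for every region whose start is not greater than its end. Second, `if()` expects a single TRUE/FALSE, so comparing one position against the whole start and end columns and reducing the result with `which()` or `any()` avoids lining up two columns of different lengths.

```r
# Sample regions from the post (chr, start and end only)
reg <- data.frame(chr = "c22",
                  start = c(1440, 1520, 1800),
                  end   = c(1500, 1700, 1900),
                  stringsAsFactors = FALSE)

p <- 1542   # one position from column 2 of pos.txt

# One TRUE/FALSE per region: start <= p <= end
inside <- p >= reg$start & p <= reg$end
which(inside)   # 2: the index of the region containing p

# any() collapses the vector to the single value if() expects
if (any(inside)) {
  print(reg[inside, ])
}

# With | instead of &, every region "matches"
all(p >= reg$start | p <= reg$end)   # TRUE
```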

In Java and Perl I used nested loops to cycle through each element in one
file, and compare it to every element in the other file, and then printed to
a new text file. As such, I was trying to learn a bit more about arrays in
R, but if you know of a better way in R to do this then please let me know.
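If the regions in reg.txt never overlap, base R's `findInterval()` offers a loop-free alternative (a different technique from the nested loops): after the regions are sorted by start, it returns, for each position, the index of the last region starting at or before that position (0 if none), and one vectorized comparison against that region's end completes the check. This sketch assumes non-overlapping regions, a single chromosome (split the data by chromosome first otherwise), and hypothetical column names.

```r
# Sample data from the post; column names are hypothetical
pos <- data.frame(chr = "c22",
                  position = c(1445, 1542, 1678),
                  strand = c("-", "+", "+"),
                  count1 = c(1, 2, 13),
                  count2 = c(4, 3, 15),
                  stringsAsFactors = FALSE)
reg <- data.frame(chr = "c22",
                  start = c(1440, 1520, 1800),
                  end   = c(1500, 1700, 1900),
                  stringsAsFactors = FALSE)

# findInterval() requires the start positions in non-decreasing order
reg <- reg[order(reg$start), ]

# For each position: index of the last region with start <= position
# (0 when the position lies before every region)
i <- findInterval(pos$position, reg$start)

# A match also needs position <= end of that region
ok <- i > 0
ok[ok] <- pos$position[ok] <= reg$end[i[ok]]

out <- cbind(reg[i[ok], c("chr", "start", "end")],
             pos[ok, c("position", "strand", "count1", "count2")])
rownames(out) <- NULL
out
```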

Any help is greatly appreciated.

Thank you,
AJ

--
View this message in context:
http://r.789695.n4.nabble.com/Help-with-isolating-and-comparing-data-from-two-files-tp3543170p3543170.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2011-May-23 12:23 UTC


[R] Help with isolating and comparing data from two files.

Is this what you are after?

> pos
   V1   V2 V3 V4 V5 V6
1 c22 1445  - CG  1  4
2 c22 1542  + CG  2  3
3 c22 1678  + CG 13 15
> reg
   V1   V2   V3   V4 V5 V6     V7
1 c22 1440 1500 cpg: 44 56 ......
2 c22 1520 1700 cpg: 56 87 ......
3 c22 1800 1900 cpg: 58 90 ......
> # iterate through 'reg', picking out the matching 'pos' entries
> result <- lapply(seq(nrow(reg)), function(i){
+     # get indices of matching positions
+     indx <- (pos$V2 >= reg$V2[i]) & (pos$V2 <= reg$V3[i])
+     if (!any(indx)) return(NULL)  # no match
+     # create new dataframe
+     cbind(reg[rep(i, sum(indx)), 1:3], pos[indx, ])
+ })
> do.call(rbind, result)
     V1   V2   V3  V1   V2 V3 V4 V5 V6
1   c22 1440 1500 c22 1445  - CG  1  4
2   c22 1520 1700 c22 1542  + CG  2  3
2.1 c22 1520 1700 c22 1678  + CG 13 15
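The transcript above works on data frames already in the workspace. A self-contained sketch of the same per-region loop, run on the sample rows from the post and finishing with the step the Java and Perl versions performed (writing the matches to a new text file), might look like this. Building each piece as a data frame with its own names is a small variation that avoids the duplicated V1/V2/V3 column names; those names and the `tempfile()` output path are placeholders.

```r
pos <- read.table(text = "
c22 1445 - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15", stringsAsFactors = FALSE)
reg <- read.table(text = "
c22 1440 1500 cpg: 44 56
c22 1520 1700 cpg: 56 87
c22 1800 1900 cpg: 58 90", stringsAsFactors = FALSE)

# Same per-region idea as above, but each piece gets its own
# (hypothetical) column names so rbind() sees no duplicated names
result <- lapply(seq_len(nrow(reg)), function(i) {
  indx <- (pos$V2 >= reg$V2[i]) & (pos$V2 <= reg$V3[i])
  if (!any(indx)) return(NULL)          # no position in this region
  data.frame(chr = reg$V1[i], start = reg$V2[i], end = reg$V3[i],
             position = pos$V2[indx], strand = pos$V3[indx],
             count1 = pos$V5[indx], count2 = pos$V6[indx],
             stringsAsFactors = FALSE)
})
matched <- do.call(rbind, result)       # NULL entries are dropped

# Write "chr start end position strand count1 count2" lines;
# tempfile() stands in for a real output path such as "matched.txt"
out_file <- tempfile(fileext = ".txt")
write.table(matched, out_file, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out_file)
```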

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
