Here is a start; you can change the column names:
> x
chr start end peak_loc cluster_TC strand peak_TC
1 chr1 564620 564649 chr1:564644..564645,+ 94 + 10
2 chr1 565369 565404 chr1:565371..565372,+ 217 + 8
3 chr1 565463 565541 chr1:565480..565481,+ 1214 + 15
4 chr1 565653 565697 chr1:565662..565663,+ 1031 + 28
5 chr1 565861 565922 chr1:565883..565884,+ 316 + 12
6 chr1 566537 566573 chr1:566564..566565,+ 119 + 11
> y <- sub("^.*:([[:digit:]]+)..([[:digit:]]+).*", "\\1 \\2", x$peak_loc)
> y
[1] "564644 564645" "565371 565372" "565480 565481" "565662 565663"
[5] "565883 565884" "566564 566565"
> y <- strsplit(y, ' ')
> y
[[1]]
[1] "564644" "564645"
[[2]]
[1] "565371" "565372"
[[3]]
[1] "565480" "565481"
[[4]]
[1] "565662" "565663"
[[5]]
[1] "565883" "565884"
[[6]]
[1] "566564" "566565"
> x.new <- cbind(x, do.call(rbind, y))
> x.new
   chr  start    end              peak_loc cluster_TC strand peak_TC      1      2
1 chr1 564620 564649 chr1:564644..564645,+         94      +      10 564644 564645
2 chr1 565369 565404 chr1:565371..565372,+        217      +       8 565371 565372
3 chr1 565463 565541 chr1:565480..565481,+       1214      +      15 565480 565481
4 chr1 565653 565697 chr1:565662..565663,+       1031      +      28 565662 565663
5 chr1 565861 565922 chr1:565883..565884,+        316      +      12 565883 565884
6 chr1 566537 566573 chr1:566564..566565,+        119      +      11 566564 566565
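
To go the rest of the way, here is a rough sketch that pulls each coordinate out directly, names the new columns start_peak and end_peak (the names used in the question), and drops peak_loc. It rebuilds the sample data so it can be run on its own. Escaping the two dots in the pattern is my own tightening and is not strictly needed for this data. The new columns are appended at the end rather than placed in position 4.

```r
# Sample data rebuilt from the post so the example is self-contained
x <- data.frame(
  chr        = "chr1",
  start      = c(564620, 565369, 565463, 565653, 565861, 566537),
  end        = c(564649, 565404, 565541, 565697, 565922, 566573),
  peak_loc   = c("chr1:564644..564645,+", "chr1:565371..565372,+",
                 "chr1:565480..565481,+", "chr1:565662..565663,+",
                 "chr1:565883..565884,+", "chr1:566564..566565,+"),
  cluster_TC = c(94, 217, 1214, 1031, 316, 119),
  strand     = "+",
  peak_TC    = c(10, 8, 15, 28, 12, 11),
  stringsAsFactors = FALSE
)

# Capture the two runs of digits on either side of the literal ".."
pat <- "^.*:([[:digit:]]+)\\.\\.([[:digit:]]+).*$"

# One sub() call per capture group, converted from character to integer
x$start_peak <- as.integer(sub(pat, "\\1", x$peak_loc))
x$end_peak   <- as.integer(sub(pat, "\\2", x$peak_loc))

# Drop the original combined column
x$peak_loc <- NULL
x
```

Calling sub() once per capture group avoids the strsplit/rbind step, at the cost of running the pattern twice.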
On Mon, Jun 6, 2011 at 8:22 PM, ads pit <deconstructed.morning at gmail.com> wrote:
> Hi all,
> I am given a data frame in which one of the columns holds several pieces of
> information together; see column 4, peak_loc:
>    chr  start    end              peak_loc cluster_TC strand peak_TC
> 1 chr1 564620 564649 chr1:564644..564645,+         94      +      10
> 2 chr1 565369 565404 chr1:565371..565372,+        217      +       8
> 3 chr1 565463 565541 chr1:565480..565481,+       1214      +      15
> 4 chr1 565653 565697 chr1:565662..565663,+       1031      +      28
> 5 chr1 565861 565922 chr1:565883..565884,+        316      +      12
> 6 chr1 566537 566573 chr1:566564..566565,+        119      +      11
>
>
> I am trying to find out if there's a way to extract the coordinates given
> in the 4th column and replace this column with two others that would have
> the start coord and the end coord. So instead of chr1:564644..564645,+
> I would obtain:
> start_peak  end_peak
> 564644      564645
>
> Best,
> nanami
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?