thr3ads.net - R help - [R] How to Store the executed values in a dataframe & rle function [Sep 2011]


sujitha

2011-Sep-26 14:30 UTC

[R] How to Store the executed values in a dataframe & rle function

Hi group, 

This is what my test file looks like:
Chr start end sample1 sample2 
chr2 9896633 9896683 0 0 
chr2 9896639 9896690 0 0 
chr2 14314039 14314098 0 -0.35 
chr2 14404467 14404502 0 -0.35 
chr2 14421718 14421777 -0.43 -0.35 
chr2 16031710 16031769 -0.43 -0.35 
chr2 16036178 16036237 -0.43 -0.35 
chr2 16048665 16048724 -0.43 -0.35 
chr2 37491676 37491735 0 0 
chr2 37702947 37703009 0 0 
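A reproducible way to load the rows above, assuming the file is whitespace-separated with the header line exactly as shown. R column names are case-sensitive, so with this header the columns are `sample1`, `start` and `end`. The code further down refers to `m$Sample1`, `m$Start` and `m$End`, which would be NULL unless the real file's header is capitalised that way (my reading, not confirmed in the post).

```r
# Assumption: whitespace-separated file with the header shown above.
# For the real file, pass its path instead of `text = ...`
# (very old R versions lack `text =`; wrap the string in textConnection() there).
m <- read.table(text = "
Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0", header = TRUE, stringsAsFactors = FALSE)

str(m)     # Chr is character, start/end are integer, samples are numeric
names(m)   # case-sensitive: "sample1", not "Sample1"
```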

This is the output that I am expecting: 
Sample Chr Start End Values Probes 
sample1 chr2 9896633 14404502 0 4 
sample1 chr2 14421718 16048724 -0.43 4 
sample1 chr2 37491676 37703009 0 2
sample2 chr2 9896633 9896690 0 2
sample2 chr2 14314039 16048724 -0.35 6
sample2 chr2 37491676 37703009 0 2 

Here the Chr value is the same in every row, but it could be any other value
as well, so Chr should be the unique value within each group of similar
values. Start is the smallest start value over the rows in the group (for
sample1, the first 4 rows share the same value), and End is the largest end
value. Values is the single value shared by the group, and Probes is the
number of rows in the group.
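The rule above maps directly onto rle(): the run lengths are Probes, the run values are Values, and cumsum() of the lengths gives the row where each run ends. A minimal sketch for sample1 alone, with the start/end columns typed in from the test file; the variable names are my own, not from the post.

```r
# sample1 values and the start/end columns from the test file
vals      <- c(0, 0, 0, 0, -0.43, -0.43, -0.43, -0.43, 0, 0)
start_pos <- c(9896633, 9896639, 14314039, 14404467, 14421718,
               16031710, 16036178, 16048665, 37491676, 37702947)
end_pos   <- c(9896683, 9896690, 14314098, 14404502, 14421777,
               16031769, 16036237, 16048724, 37491735, 37703009)

r     <- rle(vals)                # runs of identical consecutive values
last  <- cumsum(r$lengths)        # row index where each run ends
first <- last - r$lengths + 1     # row index where each run begins

runs <- data.frame(
  Start  = mapply(function(a, b) min(start_pos[a:b]), first, last),
  End    = mapply(function(a, b) max(end_pos[a:b]), first, last),
  Values = r$values,
  Probes = r$lengths
)
runs
```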

Code:
m <- read.table("test.txt", sep = '\t', header = TRUE,
                colClasses = c('character', 'integer', 'integer', 'numeric', 'numeric'))
# reading the test file
s <- data.frame(c(rle(m$Sample1)[[2]], rle(m$Sample2)[[2]]),
                c(rle(m$Sample1)[[1]], rle(m$Sample2)[[1]]))
# to get the last 2 columns
names(s) = c("Values", "Probes")
G = 1
for (i in 1:length(s$Probes)) {
  if (G == 1) {
    first <- unique(m$Chr[G:s$Probes[i]])
    second <- min(m$Start[G:s$Probes[i]])
    third <- max(m$End[G:s$Probes[i]])
    c <- cbind(first, second, third, s$Values[i], s$Probes[i])
    print(c)
    G = (G + s$Probes[i])
  } else if ((G - 1) < length(m$Sample1)) {
    first <- unique(m$Chr[G:(G + s$Probes[i] - 1)])
    second <- min(m$Start[G:(G + s$Probes[i] - 1)])
    third <- max(m$End[G:(G + s$Probes[i] - 1)])
    c <- cbind(first, second, third, s$Values[i], s$Probes[i])
    print(c)
    G = (G + s$Probes[i])
  } else {
    G = 1
    first <- unique(m$Chr[G:s$Probes[i]])
    second <- min(m$Start[G:s$Probes[i]])
    third <- max(m$End[G:s$Probes[i]])
    c <- cbind(first, second, third, s$Values[i], s$Probes[i])
    print(c)
    G = (G + s$Probes[i])
  }
}
So the output is:
     first  second    third
[1,] "chr2" "9896633" "14404502" "0" "4"
     first  second     third
[1,] "chr2" "14421718" "16048724" "-0.43" "4"
     first  second     third
[1,] "chr2" "37491676" "37703009" "0" "2"
     first  second    third
[1,] "chr2" "9896633" "9896690" "0" "2"
     first  second     third
[1,] "chr2" "14314039" "16048724" "-0.35" "6"
     first  second     third
[1,] "chr2" "37491676" "37703009" "0" "2"

I get almost the required output, but I need 3 modifications to this code:
1) This is just a small part of the file (with 2 samples); my actual file
has 150 samples. How do I write the rle function for that?
2) How do I store all the executed c values in a data frame (here I am just
printing them)?
3) How do I include the sample name in the output?
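One way to cover all three points while keeping the rle() idea: loop over every sample column, build one small data frame per sample (with the sample name in it), and stack them with do.call(rbind, ...) instead of printing. A sketch, assuming the sample columns are everything after Chr, start and end, and that Chr is taken from the first row of each run.

```r
# Test data from the post; a 150-sample file would be read the same way
m <- read.table(text = "
Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0", header = TRUE, stringsAsFactors = FALSE)

# Assumption: every column after Chr, start, end is a sample
sample_cols <- names(m)[-(1:3)]

pieces <- list()   # one data frame per sample, collected instead of printed
for (s in sample_cols) {
  r     <- rle(m[[s]])              # runs of identical consecutive values
  last  <- cumsum(r$lengths)        # last row of each run
  first <- last - r$lengths + 1     # first row of each run
  pieces[[s]] <- data.frame(
    Sample = s,                     # question 3: keep the sample name
    Chr    = m$Chr[first],          # Chr of the run's first row
    Start  = mapply(function(a, b) min(m$start[a:b]), first, last),
    End    = mapply(function(a, b) max(m$end[a:b]), first, last),
    Values = r$values,
    Probes = r$lengths,
    stringsAsFactors = FALSE
  )
}
result <- do.call(rbind, pieces)    # question 2: one data frame
rownames(result) <- NULL
result
```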
Waiting for your reply,
Thanks, 
Suji 


--
View this message in context:
http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2011-Sep-28 15:37 UTC


[R] How to Store the executed values in a dataframe & rle function

Here is one approach:
> x <- read.table(textConnection("Chr start end sample1 sample2
+ chr2 9896633 9896683 0 0
+ chr2 9896639 9896690 0 0
+ chr2 14314039 14314098 0 -0.35
+ chr2 14404467 14404502 0 -0.35
+ chr2 14421718 14421777 -0.43 -0.35
+ chr2 16031710 16031769 -0.43 -0.35
+ chr2 16036178 16036237 -0.43 -0.35
+ chr2 16048665 16048724 -0.43 -0.35
+ chr2 37491676 37491735 0 0
+ chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
> closeAllConnections()
>
> result <- lapply(c('sample1', 'sample2'), function(.samp){
+     # split by breaks in the values
+     .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
+
+     # combine the list of dataframes
+     .range <- do.call(rbind, lapply(.grps, function(.set){
+         # create a dataframe of the results
+         data.frame(Sample = .samp
+                    , Chr = .set$Chr[1L]
+                    , Start = min(.set$start)
+                    , End = max(.set$end)
+                    , Values = .set[[.samp]][1L]
+                    , Probes = nrow(.set)
+                    )
+         }))
+     })
> # put the list of dataframes together
> result <- do.call(rbind, result)
> result
    Sample  Chr    Start      End Values Probes
0  sample1 chr2  9896633 14404502   0.00      4
1  sample1 chr2 14421718 16048724  -0.43      4
2  sample1 chr2 37491676 37703009   0.00      2
01 sample2 chr2  9896633  9896690   0.00      2
11 sample2 chr2 14314039 16048724  -0.35      6
21 sample2 chr2 37491676 37703009   0.00      2
>
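The grouping key cumsum(c(0, diff(x[[.samp]]) != 0)) starts a new group only when the sample value changes. Since Chr "can be any other value as well", a run of equal values could span two chromosomes and be merged into one row. A sketch of a variant that also breaks on a Chr change and picks up every sample column by name; both the Chr rule and the `^sample` naming pattern are assumptions about the real file, `summarise_runs` is a hypothetical helper name, and the toy data is made up for illustration.

```r
# Toy data (made up): the chromosome changes in the middle of a run
x <- read.table(text = "
Chr start end sample1 sample2
chr1 100 150 0 0.2
chr1 160 210 0 0.2
chr2 50 90 0 0.2
chr2 95 140 -0.5 0.2", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: one row per run of equal values in column `samp`
summarise_runs <- function(x, samp) {
  n <- nrow(x)
  # TRUE where a new run starts: the value changed, or (assumption) Chr changed
  brk  <- c(FALSE, diff(x[[samp]]) != 0 | x$Chr[-1] != x$Chr[-n])
  grps <- split(x, cumsum(brk))
  do.call(rbind, lapply(grps, function(set) data.frame(
    Sample = samp, Chr = set$Chr[1L],
    Start = min(set$start), End = max(set$end),
    Values = set[[samp]][1L], Probes = nrow(set),
    stringsAsFactors = FALSE)))
}

# Assumption: every sample column name starts with "sample"
samples <- grep("^sample", names(x), value = TRUE)
result  <- do.call(rbind, lapply(samples, function(s) summarise_runs(x, s)))
rownames(result) <- NULL
result
```

With 150 samples nothing changes except the number of names grep() returns.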



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
