thr3ads.net - R help - [R] frequency, count rows, data for heat map [Aug 2010]

If this information is useful, please help other people find it:
Share via:

rtsweeney

2010-Aug-25 14:53 UTC

[R] frequency, count rows, data for heat map

Hi all, 
I have read posts of heat map creation but I am one step prior --
Here is what I am trying to do and wonder if you have any tips?
We are trying to map sequence reads from tumors to viral genomes.

Example input file :
111     abc
111     sdf
111     xyz
1079   abc
1079   xyz
1079   xyz
5576   abc
5576   sdf
5576   sdf

How may xyz's are there for 1079 and 111? How many abc's, etc?
How many times did reads from sample (1079) align to virus xyz. 
In some cases there are thousands per virus in a give sample, sometimes one.
The original file (two columns by tens of thousands of rows; 20 MB) is
text file (tab delimited).

Output file:
         abc  sdf  xyz
111     1      1     1
1079   1      0     2
5576   1      2     0

Or, other ways to generate this data so I can then use it for heat map
creation? 

Thanks for any help you may have, 

rtsweeney
palo alto, ca
-- 
View this message in context:
http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
Sent from the R help mailing list archive at Nabble.com.

Jan van der Laan

2010-Aug-25 15:08 UTC

head link

[R] frequency, count rows, data for heat map

Your problem is not completely clear to me, but perhaps something like

data <- data.frame(
   a = rep(c(1,2), each=10),
   b = rep(c('a', 'b', 'c', 'd'), 5))
library(plyr)
daply(data, a ~ b, nrow)

does what you need.

Regards,
Jan

On Wed, Aug 25, 2010 at 4:53 PM, rtsweeney <tripsweeney at gmail.com>
wrote:>
> Hi all,
> I have read posts of heat map creation but I am one step prior --
> Here is what I am trying to do and wonder if you have any tips?
> We are trying to map sequence reads from tumors to viral genomes.
>
> Example input file :
> 111 ? ? abc
> 111 ? ? sdf
> 111 ? ? xyz
> 1079 ? abc
> 1079 ? xyz
> 1079 ? xyz
> 5576 ? abc
> 5576 ? sdf
> 5576 ? sdf
>
> How may xyz's are there for 1079 and 111? How many abc's, etc?
> How many times did reads from sample (1079) align to virus xyz.
> In some cases there are thousands per virus in a give sample, sometimes
one.
> The original file (two columns by tens of thousands of rows; 20 MB) is
> text file (tab delimited).
>
> Output file:
> ? ? ? ? abc ?sdf ?xyz
> 111 ? ? 1 ? ? ?1 ? ? 1
> 1079 ? 1 ? ? ?0 ? ? 2
> 5576 ? 1 ? ? ?2 ? ? 0
>
> Or, other ways to generate this data so I can then use it for heat map
> creation?
>
> Thanks for any help you may have,
>
> rtsweeney
> palo alto, ca
> --
> View this message in context:
http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Dennis Murphy

2010-Aug-25 17:19 UTC

head link

[R] frequency, count rows, data for heat map

Hi:

Here are a couple of ways to render a basic 2D table. Let's call your input
data frame dat:
> names(dat) <- c('samp', 'sequen')
> ssTab <- as.data.frame(with(dat, table(samp, sequen)))
> ssTab   # data frame version  samp sequen Freq
1  111    abc    1
2 1079    abc    1
3 5576    abc    1
4  111    sdf    1
5 1079    sdf    0
6 5576    sdf    2
7  111    xyz    1
8 1079    xyz    2
9 5576    xyz    0> with(dat, table(samp, sequen))   # table version      sequen
samp   abc sdf xyz
  111    1   1   1
  1079   1   0   2
  5576   1   2   0

HTH,
Dennis

On Wed, Aug 25, 2010 at 7:53 AM, rtsweeney <tripsweeney@gmail.com> wrote:
>
> Hi all,
> I have read posts of heat map creation but I am one step prior --
> Here is what I am trying to do and wonder if you have any tips?
> We are trying to map sequence reads from tumors to viral genomes.
>
> Example input file :
> 111     abc
> 111     sdf
> 111     xyz
> 1079   abc
> 1079   xyz
> 1079   xyz
> 5576   abc
> 5576   sdf
> 5576   sdf
>
> How may xyz's are there for 1079 and 111? How many abc's, etc?
> How many times did reads from sample (1079) align to virus xyz.
> In some cases there are thousands per virus in a give sample, sometimes
> one.
> The original file (two columns by tens of thousands of rows; 20 MB) is
> text file (tab delimited).
>
> Output file:
>         abc  sdf  xyz
> 111     1      1     1
> 1079   1      0     2
> 5576   1      2     0
>
> Or, other ways to generate this data so I can then use it for heat map
> creation?
>
> Thanks for any help you may have,
>
> rtsweeney
> palo alto, ca
> --
> View this message in context:
>
http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Jan van der Laan

2010-Aug-26 07:02 UTC

head link

[R] frequency, count rows, data for heat map

Please, reply to the r-help and not only to me personally. That way
others can can also help, or perhaps benefit from the answers.

You can use strplit to remove the last part of the strings. strplit
returns a list of character vectors from which you (if I understand
you correctly) only want to select the first element. I use laply from
the plyr library for this, although there are probably also other ways
of doing this.

library(plyr)
dat$V3 <- laply(strsplit(as.character(dat$V1), '_'), function(l)
l[1])

After that you can use daply as I showed in my previous post
[daply(dat, V3 ~ V2, nrow)] or use the methods suggested by Dennis
Murphy to build your table.

Regards,

Jan



On Thu, Aug 26, 2010 at 1:41 AM, Trip Sweeney <tripsweeney at gmail.com>
wrote:> Jan,
> Thanks for responding to my post to listeserve about arranging data matrix
> for heat map.
> I am still a beginner, so the below is the code I used for the matrix and
> did not yet learn how to
> input 'data.frame' (which I need to know to use your code). The
below code
> works
> and mock.txt file is attached. There is one thing, though. The input in
> column 1 is tricky
> in the mock.txt file. I need it to sum per unique ID based on character
> prior to the "_"
> So, for example the current script call 1079_17891 and 1079_14794 uniques
> when I want
> them to be tallied together since they are both part of same 1079 samples.
> Occasionally
> a sample has three characters before the "_", like 111_463428 etc
in
> mock.txt. The substring
> after the "_" is variable length. In the end, it should be one
row for 1079,
> one for 111, and one for 5576.
> Can you help me with this modification of the code? Any advice much
> appreciated. Sincerely, Trip
>
> dat<-read.table('mock.txt',sep="\t")
>
sumData=matrix(NA,nrow=length(unique(dat[,1])),ncol=length(unique(dat[,2])))
> rownames(sumData)<-unique(dat[,1])
> colnames(sumData)<-unique(dat[,2])
>
> for (i in 1:dim(sumData)[1]){
>   for(j in 1:dim(sumData)[2]){
>      sumData[i,j]<-sum (dat[,1]==unique(dat[,1])[i] &
> dat[,2]==unique(dat[,2])[j])
>   }
> }
>
>
write.table(sumData,"SummarizedData.txt",sep="\t",col.names=NA)
>


On Wed, Aug 25, 2010 at 4:53 PM, rtsweeney <tripsweeney at gmail.com>
wrote:>
> Hi all,
> I have read posts of heat map creation but I am one step prior --
> Here is what I am trying to do and wonder if you have any tips?
> We are trying to map sequence reads from tumors to viral genomes.
>
> Example input file :
> 111 ? ? abc
> 111 ? ? sdf
> 111 ? ? xyz
> 1079 ? abc
> 1079 ? xyz
> 1079 ? xyz
> 5576 ? abc
> 5576 ? sdf
> 5576 ? sdf
>
> How may xyz's are there for 1079 and 111? How many abc's, etc?
> How many times did reads from sample (1079) align to virus xyz.
> In some cases there are thousands per virus in a give sample, sometimes
one.
> The original file (two columns by tens of thousands of rows; 20 MB) is
> text file (tab delimited).
>
> Output file:
> ? ? ? ? abc ?sdf ?xyz
> 111 ? ? 1 ? ? ?1 ? ? 1
> 1079 ? 1 ? ? ?0 ? ? 2
> 5576 ? 1 ? ? ?2 ? ? 0
>
> Or, other ways to generate this data so I can then use it for heat map
> creation?
>
> Thanks for any help you may have,
>
> rtsweeney
> palo alto, ca
> --
> View this message in context:
http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>-------------- next part --------------
1079_346	281416490|ref|NC_013643.1|
1079_346	281416323|ref|NC_013646.1|
1079_378	9629367|ref|NC_001803.1|
1079_588	30984428|ref|NC_004812.1|
1079_1292	9629367|ref|NC_001803.1|
1079_3956	9629357|ref|NC_001802.1|
1079_4736	9629357|ref|NC_001802.1|
1079_7732	21427641|ref|NC_004015.1|
1079_7855	118197620|ref|NC_008584.1|
1079_8618	32453484|ref|NC_004928.1|
1079_11540	10140926|ref|NC_002531.1|
1079_14794	9629367|ref|NC_001803.1|
1079_15738	109255272|ref|NC_008168.1|
1079_17891	299778956|ref|NC_014260.1|
1079_18414	157781212|ref|NC_009823.1|
1079_18414	157781216|ref|NC_009824.1|
1079_20312	9629367|ref|NC_001803.1|
1079_20497	9629357|ref|NC_001802.1|
1079_26750	9629367|ref|NC_001803.1|
1079_27926	9628113|ref|NC_001659.1|
1079_27926	9628113|ref|NC_001659.1|
1079_28033	84662653|ref|NC_007710.1|
1079_30020	47835019|ref|NC_004333.2|
1079_30371	9629367|ref|NC_001803.1|
1079_35750	50313241|ref|NC_001491.2|
1079_35750	50313241|ref|NC_001491.2|
111_463428	56694721|ref|NC_006560.1|
111_464636	114680053|ref|NC_008349.1|
111_464636	9627742|ref|NC_001623.1|
111_465190	9627186|ref|NC_001539.1|
111_467613	51557483|ref|NC_006151.1|
111_467613	51557483|ref|NC_006151.1|
111_467975	9627742|ref|NC_001623.1|
111_467975	114680053|ref|NC_008349.1|
111_467975	23577820|ref|NC_004323.1|
111_469706	21426072|ref|NC_004003.1|
111_469706	21426072|ref|NC_004003.1|
111_469793	146261990|ref|NC_001826.2|
111_470996	203454602|ref|NC_011273.1|
111_473637	281415946|ref|NC_013650.1|
111_473637	203458877|ref|NC_011269.1|
111_473637	109393216|ref|NC_008207.1|
111_473637	203457352|ref|NC_011272.1|
111_473637	203460520|ref|NC_011270.1|
111_473637	29566511|ref|NC_004687.1|
111_473637	204305660|ref|NC_011271.1|
5576_315871	168804017|ref|NC_010356.1|
5576_316443	9629198|ref|NC_001781.1|
5576_324191	148727082|ref|NC_009541.1|
5576_327936	9629267|ref|NC_001798.1|
5576_327936	9629267|ref|NC_001798.1|
5576_327936	9629267|ref|NC_001798.1|
5576_330546	216905965|ref|NC_011645.1|
5576_333512	57659681|ref|NC_006659.1|
5576_333512	57753428|ref|NC_006634.1|
5576_333512	57659681|ref|NC_006659.1|
5576_353878	20522096|ref|NC_003795.1|
5576_354562	9627186|ref|NC_001539.1|
5576_354577	19718363|ref|NC_003461.1|
5576_358444	48696722|ref|NC_005881.1|
5576_358444	48696722|ref|NC_005881.1|
5576_366975	9629178|ref|NC_001753.1|
5576_368020	239505241|ref|NC_012783.1|
5576_371413	48696722|ref|NC_005881.1|
5576_371413	48696722|ref|NC_005881.1|
5576_375881	48696722|ref|NC_005881.1|

Maybe Matching Threads

Search for more apparently analagous threads

R help - Aug 2010 - frequency, count rows, data for heat map

[R] frequency, count rows, data for heat map

[R] frequency, count rows, data for heat map

[R] frequency, count rows, data for heat map

[R] frequency, count rows, data for heat map

Maybe Matching Threads