Allan Kamau
2007-Jul-30 14:22 UTC
[R] Matrix nesting (was Re: Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.)
Success, thanks Patrick. Below is the final matrix construction code.

# myVariableNames was set earlier to names(x.val)
x=list()
x[length(myVariableNames)]<-NA
names(x)<-names(x.val)
for (i in myVariableNames){
  # distinct residues seen for this variable, and their counts
  residues=names(x.val[[i]])
  residuesFrequencies=as.vector(x.val[[i]])
  names(residuesFrequencies)=residues
  someList<-list(frequency=residuesFrequencies)
  x[i]<-someList
}

#The output
> x[16:18]
$PR12
 I
10

$PR13
K R
8 2

$PR14
I V
2 8

----- Original Message ----
From: Patrick Burns <pburns at pburns.seanet.com>
To: Allan Kamau <kamauallan at yahoo.com>
Sent: Monday, July 30, 2007 12:01:32 PM
Subject: Re: [R] Matrix nesting (was Re: Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.)

I think you want your main matrix to be of mode list.
S Poetry talks about this some.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Allan Kamau wrote:

>Hi
>
>I would like to nest matrices. Is there a way of doing so? I am
>getting "number of items to replace is not a multiple of replacement
>length" errors (probably R is trying to flatten the matrix into a
>vector and complains if the vector is larger than 1 element during
>the insert).
>
>I have a matrix (see below) in which I would like to place one other
>matrix into each k[2,i] position (where i is a value between 1 and 4).
>
>Why? Each value in k[1,i] may represent several (1 or more) key-value
>results which I would like to capture in the corresponding k[2,i]
>element.
>
>> k
>                [,1]   [,2]   [,3]   [,4]
>myVariableNames "PR10" "PR11" "PR12" "PR13"
>x2              "0"    "0"    "0"    "0"
>
>Allan.
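Patrick's suggestion of a main matrix of mode list can be sketched roughly as follows. This is only an illustration, not code from the thread: the small x.val is a hypothetical stand-in built from the counts shown in the output above, and the row name "frequency" is an assumption.

```r
# Hypothetical stand-in for the thread's x.val: one table of counts per variable
x.val <- list(
  PR12 = table(rep("I", 10)),
  PR13 = table(c(rep("K", 8), rep("R", 2))),
  PR14 = table(c(rep("I", 2), rep("V", 8)))
)

# A list with a dim attribute is a matrix of mode list:
# every cell can hold an object of any length
k <- vector("list", 2 * length(x.val))
dim(k) <- c(2, length(x.val))
rownames(k) <- c("myVariableNames", "frequency")

for (j in seq_along(x.val)) {
  counts <- x.val[[j]]
  # [[ ]] stores the whole object in one cell, so nothing is flattened
  k[[1, j]] <- names(x.val)[j]
  k[[2, j]] <- setNames(as.vector(counts), names(counts))
}

k[[1, 2]]          # variable name held in column 2
k[[2, 2]]          # named vector of counts nested in the same column
k[[2, 2]][["K"]]   # count of residue "K" for that variable
```

Assigning with k[2, j] <- value (single brackets) is what tends to trigger the "number of items to replace" error when the value has more than one element; double brackets place the whole object in the cell.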
>
>----- Original Message ----
>From: Allan Kamau <kamauallan at yahoo.com>
>To: jim holtman <jholtman at gmail.com>
>Cc: r-help at stat.math.ethz.ch
>Sent: Saturday, July 28, 2007 2:48:47 PM
>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>
>Hi Jim,
>The problem description.
>I am trying to identify mutations in a given gene from
>a particular genome (biological genome sequence).
>I have two CSV files consisting of sequences. One file
>consists of reference (documented, curated, accepted as
>standard) sequences. The other consists of sample
>sequences within which I am trying to identify mutations.
>In both files an individual sequence is contained in
>a single record; its amino acid residues (the actual
>sequence of letters, each representing a given amino
>acid, for example "A" stands for "Alanine", "C" for
>"Cysteine" and so on) are each allocated a single field
>in the CSV file.
>The sequences in both files have been well aligned;
>each contains 115 residues, with the first residue in
>field 5. Fields 1 to 4 are allocated for metadata
>(name of sequence and so on).
>My task is to compile a residue occurrence count for
>each residue present in a given field in the reference
>sequence dataset and use this information when reading
>each sequence in the sample dataset to identify a
>mutation. For example, if a "P" is found at position 9
>of the sample sequence "bb", and according to our
>reference summaries "P" either does not occur at
>position 9 at all or has an occurrence of only 10% or
>so, it will be classified as a mutation (I could
>employ a cutoff parameter for mutation
>classification).
>
>
>Allan.
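The classification described above (reference frequencies per position, plus a cutoff) might be sketched along these lines. Everything named here is an assumption for illustration: the tiny `reference` data frame, the helper `is_mutation`, the 10% default cutoff, and the choice of a less-than-or-equal comparison.

```r
# Hypothetical reference data: rows are sequences, columns are aligned positions
reference <- data.frame(
  PR13 = c("K", "K", "R", "K", "K", "R", "K", "K", "K", "K"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  stringsAsFactors = FALSE
)

# Relative frequency of each residue at each position of the reference set
ref.freq <- lapply(reference, function(col) prop.table(table(col)))

# Flag a sample residue as a candidate mutation when its reference
# frequency is at or below the cutoff; a residue never seen in the
# reference at that position is treated as frequency 0
is_mutation <- function(position, residue, cutoff = 0.10) {
  freq <- ref.freq[[position]]
  observed <- if (residue %in% names(freq)) as.numeric(freq[residue]) else 0
  observed <= cutoff
}

is_mutation("PR14", "P")   # "P" never occurs at PR14 in the reference
is_mutation("PR14", "V")   # "V" is the common residue at PR14
```

Whether a residue at exactly the cutoff counts as a mutation is a judgment call that the description leaves open.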
>
>--- jim holtman <jholtman at gmail.com> wrote:
>
>>results=()#character()
>>myVariableNames=names(x.val)
>>results[length(myVariableNames)]<-NA
>>
>>for (i in myVariableNames){
>>  results[i]<-names(x.val[[i]]) # this does not work it returns a NULL (how can i convert this to x.val$"somevalue" ? )
>>}
>>
>>On 7/27/07, Allan Kamau <kamauallan at yahoo.com> wrote:
>>>Hi All,
>>>I am having difficulties finding a substitute for the command
>>>"names(x.val$PR14)" so that I could generate the command on the
>>>fly for all of PR14 to PR200 (please see the previous discussion
>>>below to understand what the object x.val contains). I have tried
>>>the following
>>>
>>>> results=()#character()
>>>> myVariableNames=names(x.val)
>>>> results[length(myVariableNames)]<-NA
>>>> for as.vector(unlist(strsplit(str,",")),mode="list")
>>>+ results[i]<-names(x.val$i) # this does not work it returns a NULL (how can i convert this to x.val$"somevalue" ? )
>>>> }
>>>
>>>Allan.
>>>
>>>
>>>----- Original Message ----
>>>From: Allan Kamau <kamauallan at yahoo.com>
>>>To: r-help at stat.math.ethz.ch
>>>Sent: Thursday, July 26, 2007 10:03:17 AM
>>>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>>>
>>>Thanks so much Jim, Adaikalavan, Gabor and others for the help and
>>>suggestions.
>>>The solution will result in a matrix containing nested matrices to
>>>enable each variable name, each variable's distinct values and the
>>>count of each distinct value to be accessible individually.
>>>The main matrix will contain the variable names, the first level
>>>nested matrices will consist of the variable's unique values, and
>>>each such entry will contain a one element vector to hold the count
>>>or occurrence frequency.
>>>This matrix can now be used in comparing other similar datasets for
>>>variable values and their frequencies.
>>>Building on the input received so far, a probable solution for
>>>building the matrix will include the following.
>>>
>>>1) I read the csv file (containing column headers)
>>>> my_data=read.table("<path/to/my/data.csv>",header=TRUE,sep=",",dec=".",fill=TRUE)
>>>
>>>2) I group the values in each variable, producing an occurrence
>>>count (frequency)
>>>> x.val<-apply(my_data,2,table)
>>>
>>>3) I obtain a vector of the names of the variables in the table
>>>> names(x.val)
>>>
>>>4) Now I make use of the names (obtained in step 3) to obtain a
>>>vector of distinct values in a given variable (in the example below
>>>the variable name is $PR14)
>>>> names(x.val$PR14)
>>>
>>>5) I obtain a vector (with one element) of the frequency of a value
>>>obtained from the step above (in our example the value is "V")
>>>> as.vector(x.val$PR14["V"])
>>>
>>>Todo:
>>>Now I will need to place the steps above in a script (consisting of
>>>loops) to build the matrix; steps 4 and 5 seem tricky to do
>>>programmatically.
>>>
>>>Allan.
>>>
>>>
>>>----- Original Message ----
>>>From: jim holtman <jholtman at gmail.com>
>>>To: Allan Kamau <kamauallan at yahoo.com>
>>>Cc: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>; r-help at stat.math.ethz.ch
>>>Sent: Wednesday, July 25, 2007 1:50:55 PM
>>>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>>>
>>>Also if you want to access the individual values, you can just leave
>>>it as a list:
>>>
>>>> x.val <- apply(x, 2, table)
>>>> # access each value
>>>> x.val$PR14["V"]
>>>V
>>>8
>>>
>>>
>>>On 7/25/07, Allan Kamau <kamauallan at yahoo.com> wrote:
>>>>A subset of the data looks as follows
>>>>
>>>>> df[1:10,14:20]
>>>>   PR10 PR11 PR12 PR13 PR14 PR15 PR16
>>>>1     V    T    I    K    V    G    D
>>>>2     V    S    I    K    V    G    G
>>>>3     V    T    I    R    V    G    G
>>>>4     V    S    I    K    I    G    G
>>>>5     V    S    I    K    V    G    G
>>>>6     V    S    I    R    V    G    G
>>>>7     V    T    I    K    I    G    G
>>>>8     V    S    I    K    V    E    G
>>>>9     V    S    I    K    V    G    G
>>>>10    V    S    I    K    V    G    G
>>>>
>>>>The result I would like is as follows
>>>>
>>>>PR10    PR11       PR12   ...
>>>>[V:10]  [S:7,T:3]  [I:10]
>>>>
>>>>The result can be in a matrix or a vector and each variable name,
>>>>value and frequency should be accessible so as to be used for
>>>>comparisons with another dataset later.
>>>>The frequency can be a count or a percentage.
>>>>
>>>>
>>>>Allan.
>>>>
>>>>
>>>>----- Original Message ----
>>>>From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>>>>To: Allan Kamau <kamauallan at yahoo.com>
>>>>Cc: r-help at stat.math.ethz.ch
>>>>Sent: Tuesday, July 24, 2007 10:21:51 PM
>>>>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>>>>
>>>>The name of the table should give you the "value". And if you have a
>>>>matrix, you just need to convert it into a vector first.
>>>>
>>>> > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
>>>> > m
>>>>     [,1] [,2] [,3]
>>>>[1,] "A"  "C"  "B"
>>>>[2,] "B"  "D"  "C"
>>>>[3,] "C"  "E"  "D"
>>>> > tb <- table( as.vector(m) )
>>>> > tb
>>>>
>>>>A B C D E
>>>>1 2 3 2 1
>>>> > paste( names(tb), ":", tb, sep="" )
>>>>[1] "A:1" "B:2" "C:3" "D:2" "E:1"
>>>>
>>>>If this is not what you want, then please give a simple example.
>>>>
>>>>Regards, Adai
>>>>
>>>>
>>>>
>>>>Allan Kamau wrote:
>>>>>Hi all,
>>>>>If the question below has been answered before I
>>>>>apologize for the posting.
>>>>>I would like to get the frequencies of occurrence of
>>>>>all values in a given variable in a multivariate
>>>>>dataset. In short for each variable (or field) a
>>>>>summary of values contained within in a value:frequency
>
>=== message truncated ===
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
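The per-variable value:frequency summary asked for in the original question at the bottom of this thread could be sketched as below, combining the table() and paste() calls suggested in the replies. The data frame is a hypothetical three-column subset typed in from the sample data quoted in the thread, and the helper name `summarise_column` is an assumption.

```r
# Hypothetical subset mirroring columns PR10 to PR12 of the quoted sample data
df <- data.frame(
  PR10 = rep("V", 10),
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR12 = rep("I", 10),
  stringsAsFactors = FALSE
)

# Collapse one column into a single "value:count,value:count" string
summarise_column <- function(col) {
  tb <- table(col)
  paste(names(tb), ":", tb, sep = "", collapse = ",")
}

# One summary string per variable, named by variable
freq.summary <- sapply(df, summarise_column)
freq.summary
```

Keeping the tables themselves in a list, as suggested in the replies, is probably more convenient when individual counts need to be compared later; the pasted strings are mainly useful for display.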