thr3ads.net - R help - [R] difficult data manipulation question [Jul 2006]


markleeds at verizon.net

2006-Jul-03 20:37 UTC

[R] difficult data manipulation question

Hi everyone,

Suppose I have a matrix in which some column names are identical.
For example, TEMP:

  "AAA"  "BBB"  "CCC"  "DDD"  "AAA"  "BBB"
    0      2      1      2      0      0
    2      3      7      6      0      1
    1.5    4      9      9      6      0
    1.0    6      10     11     3      3


I haven't checked yet whether identical column names are allowed
in a matrix, but I hope they are.

Assuming that they are, I would like to take the matrix and
make a new matrix with the following requirements.

1) Whenever a column name is unique, just take that column for the new
matrix.

2) Whenever a column name is not unique, take the column with that name
that has the most non-zero elements (in the case of
ties, I don't care which one is picked).

So, in this case, the resulting matrix would just be the first 4 columns.

I realize (or at least I think) that
sum(TEMP[, columnname] != 0) will give me the
number of non-zero elements in a column with the name columnname,
but how to use this to deal with the non-uniqueness and solve my particular
problem is beyond me. Plus, I think the command will
bomb because columnname will not always be unique?
Thanks for any help. I realize this is not a trivial problem, so I really
appreciate it.

                                          Mark
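
For reference, a small sketch of the two doubts raised in the post, using the example values given there. Base R matrices do accept duplicate column names, and indexing by a duplicated name does not fail: it returns only the first matching column. TEMP is the name from the post; first_aaa and nonzero_counts are names introduced here for illustration.

```r
# Build the example matrix; assigning colnames does not enforce uniqueness.
TEMP <- matrix(c(0,   2, 1,  2,  0, 0,
                 2,   3, 7,  6,  0, 1,
                 1.5, 4, 9,  9,  6, 0,
                 1.0, 6, 10, 11, 3, 3),
               nrow = 4, byrow = TRUE)
colnames(TEMP) <- c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB")

# Indexing by a duplicated name returns only the FIRST match (column 1 here).
first_aaa <- TEMP[, "AAA"]

# Count non-zero entries per column by position, which avoids the
# ambiguity of looking columns up by name.
nonzero_counts <- colSums(TEMP != 0)
```

Because name lookup only ever sees the first duplicate, any comparison between same-named columns has to work on column positions rather than names.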

Gabor Grothendieck

2006-Jul-03 20:58 UTC

[R] difficult data manipulation question

Try this:

# test data
# read in header separately so R does not make column names unique
Lines <- "AAA BBB CCC DDD AAA BBB
   0      2      1     2      0      0
   2      3      7     6      0      1
   1.5    4      9     9      6      0
   1.0    6      10    11     3      3
"
DF <- read.table(textConnection(Lines), skip = 1)
names(DF) <- scan(textConnection(Lines), what = "", nlines = 1)

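# x holds the indices of the columns that share one name; return the index
# of the one with the most non-zero entries (ties go to the first)
f <- function(x) x[which.max(colSums(DF[x] != 0))]
# apply f to each group of same-named columns: one column index per name
tapply(seq_along(DF), names(DF), f)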
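
The tapply call above returns, for each distinct name, the index of the column to keep rather than the reduced table itself. A minimal sketch of the remaining step follows, assuming the goal is a matrix with one column per name. The names pick, keep and result are introduced here, and read.table(text = ..., check.names = FALSE) is used as an alternative way of keeping the duplicate names on input.

```r
Lines <- "AAA BBB CCC DDD AAA BBB
0   2 1  2  0 0
2   3 7  6  0 1
1.5 4 9  9  6 0
1.0 6 10 11 3 3
"
# check.names = FALSE stops read.table from renaming the duplicated headers.
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE)

# For each group of same-named columns, keep the index of the column
# with the most non-zero entries (which.max takes the first on ties).
pick <- function(idx) idx[which.max(colSums(DF[idx] != 0))]
keep <- tapply(seq_along(DF), names(DF), pick)

# Subset by position, in the original column order, and convert to a matrix.
result <- as.matrix(DF[sort(as.vector(keep))])
```

Sorting the indices keeps the surviving columns in their original left-to-right order; tapply on its own returns them in alphabetical order of the names.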

