Gundala Viswanath
2008-Jun-19 14:59 UTC
[R] Create Matrix from Loop of Vectors, Sort It and Pick Top-K
Hi,
I have the following dataset (simplified for example).
__DATA__
300.35 200.25 104.30
22.00 31.12 89.99
444.50 22.10 43.00
22.10 200.55 66.77
Now from that I wish to do the following:
1. Compute variance of each row
2. Pick the top-2 rows with the highest variance
3. Store those selected rows for further processing
To achieve this, I tried to: a) read the table and compute
the variance of each row, b) append the variance to its original
row in a vector, c) store each vector in a multidimensional array (matrix),
d) sort that array. But I am stuck at step (b).
Can anybody suggest the best way to achieve
my aim above?
This is the sample code I have so far (not working).
__BEGIN__
#data <- read.table("testdata.txt")
# Is this a right way to initialize?
all.arr = NULL
for (gi in 1:nofrow) {
  gex <- as.vector(data.matrix(data[gi,],rownames.force=FALSE))
  #compute variance
  gexvar <- var(gex)
  # join variance with its original vector
  nvec <- c(gexvar,gex)
  # I'm stuck here.....This doesn't seem to work
  all.arr <- data.frame(nvec)
}
print(all.arr)
__END__
--
Gundala Viswanath
Jakarta - Indonesia
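The loop in the post reassigns all.arr on every pass, so only the last row survives, and nofrow is never defined. As a minimal sketch of the append-then-sort idea (steps a to d), assuming the data are all numeric and using a preallocated matrix (the name top2 and the inline read.table(text = ...) call are illustrative, not from the post):

```r
# Example data from the post, entered inline so the sketch is self-contained
data <- read.table(text = "300.35 200.25 104.30
22.00 31.12 89.99
444.50 22.10 43.00
22.10 200.55 66.77", header = FALSE)

nofrow <- nrow(data)

# Preallocate one output row per input row; column 1 will hold the variance
all.arr <- matrix(NA_real_, nrow = nofrow, ncol = ncol(data) + 1)

for (gi in seq_len(nofrow)) {
  # Row gi as a plain numeric vector
  gex <- unlist(data[gi, ], use.names = FALSE)
  # Store the variance followed by the original values
  all.arr[gi, ] <- c(var(gex), gex)
}

# Sort by the variance column (descending) and keep the top 2 rows
top2 <- all.arr[order(all.arr[, 1], decreasing = TRUE)[1:2], ]
print(top2)
```

Filling a preallocated matrix avoids growing an object inside the loop; calling rbind() on each pass would also work for small data.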
Jorge Ivan Velez
2008-Jun-19 15:20 UTC
[R] Create Matrix from Loop of Vectors, Sort It and Pick Top-K
Dear Gundala,
Try this:
# Data set
DF=read.table(textConnection("300.35 200.25 104.30
22.00 31.12 89.99
444.50 22.10 43.00
22.10 200.55 66.77"),header=FALSE,sep="")
# Variances
VAR=apply(DF,1,var)
# Order
pos=order(VAR)
# Print VAR and pos
VAR
pos
# ordered VAR
VAR[pos]
# top-2 highest VAR (last two entries of the ascending order; 3:4 because there are 4 rows)
VAR[pos][3:4]
HTH,
Jorge
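The code above returns the two largest variances, not the rows they came from. A possible extension, sketched here under the assumption that the rows themselves are wanted (the names k, top and top.rows are illustrative), is to index DF with the positions from order():

```r
# Same example data as above
DF <- read.table(text = "300.35 200.25 104.30
22.00 31.12 89.99
444.50 22.10 43.00
22.10 200.55 66.77", header = FALSE)

# Row-wise variances
VAR <- apply(DF, 1, var)

# Number of rows to keep
k <- 2

# Positions of the k largest variances, largest first;
# min() guards against k exceeding the number of rows
top <- order(VAR, decreasing = TRUE)[seq_len(min(k, nrow(DF)))]

# The selected rows, stored for further processing
top.rows <- DF[top, ]
top.rows
```

Using decreasing = TRUE with a variable k avoids hard-coding 3:4, so the same lines work for a table with any number of rows.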
Marc Schwartz
2008-Jun-19 16:46 UTC
[R] Create Matrix from Loop of Vectors, Sort It and Pick Top-K
on 06/19/2008 09:59 AM Gundala Viswanath wrote:
> 1. Compute variance of each row
> 2. Pick top-2 row with highest variance
> 3. Store those selected rows for further processing

If your data is contained in a data frame 'DF':

> DF
      V1     V2     V3
1 300.35 200.25 104.30
2  22.00  31.12  89.99
3 444.50  22.10  43.00
4  22.10 200.55  66.77

# Get row-wise variances and cbind() them to DF
> DF.var <- cbind(DF, var = apply(DF, 1, var, na.rm = TRUE))

> DF.var
      V1     V2     V3       var
1 300.35 200.25 104.30  9610.336
2  22.00  31.12  89.99  1361.915
3 444.50  22.10  43.00 56676.803
4  22.10 200.55  66.77  8622.817

# Sort DF by 'var' using order()
> DF.var[order(DF.var$var, decreasing = TRUE), ]
      V1     V2     V3       var
3 444.50  22.10  43.00 56676.803
1 300.35 200.25 104.30  9610.336
4  22.10 200.55  66.77  8622.817
2  22.00  31.12  89.99  1361.915

To get the top 2, you can take a couple of approaches:

> DF.var[order(DF.var$var, decreasing = TRUE)[1:2], ]
      V1     V2    V3       var
3 444.50  22.10  43.0 56676.803
1 300.35 200.25 104.3  9610.336

or

> head(DF.var[order(DF.var$var, decreasing = TRUE), ], 2)
      V1     V2    V3       var
3 444.50  22.10  43.0 56676.803
1 300.35 200.25 104.3  9610.336

See ?cbind, ?apply, ?order and ?head for more information.

HTH,

Marc Schwartz
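For larger tables, the cbind() and order() steps can be wrapped in a small helper. The sketch below is one way to do it, not part of the original reply: the function name top_k_by_row_var and its arguments are hypothetical, it assumes all columns are numeric with no missing values, and it swaps apply(DF, 1, var) for a vectorized computation with rowSums() and rowMeans().

```r
# Hypothetical helper: return the k rows of x with the largest row variance,
# with the variance attached as a 'var' column.
# Assumes x is all numeric and contains no NA values.
top_k_by_row_var <- function(x, k = 2) {
  m <- as.matrix(x)
  # Sample variance of each row: sum of squared deviations / (n - 1).
  # rowMeans(m) has one entry per row, so the subtraction recycles
  # down the columns and lines up with the rows of m.
  row.var <- rowSums((m - rowMeans(m))^2) / (ncol(m) - 1)
  # Positions of the k largest variances, largest first
  keep <- order(row.var, decreasing = TRUE)[seq_len(min(k, nrow(m)))]
  cbind(x[keep, , drop = FALSE], var = row.var[keep])
}

DF <- read.table(text = "300.35 200.25 104.30
22.00 31.12 89.99
444.50 22.10 43.00
22.10 200.55 66.77", header = FALSE)

res <- top_k_by_row_var(DF, k = 2)
res
```

Unlike the na.rm = TRUE call above, this version does not skip missing values, so any NA in a row gives an NA variance for that row.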