thr3ads.net - R help - [R] newbie: new_data_frame <- selected set of rows [Nov 2006]


Darek Kedra

2006-Nov-30 22:23 UTC

[R] newbie: new_data_frame <- selected set of rows

Hello,

this is probably trivial, but I failed to find this
particular snippet of code.

What I have:
my_dataframe (contains, say, 40k rows and 4 columns)
distances (vector with Euclidean distances between a
query vector and each of the rows of my_dataframe)

What I do:
After scaling the data in my_dataframe I calculate the distances,
order them, and then extract the top five hits:

my_dataframe <- read.table("myDB.csv", header=FALSE,
                           dec=".", sep=";", row.names=1)
#reads the whole file

scaled_DB <- scale(my_dataframe, center=FALSE)
#scales the values

require(hopach)
#loads the required R package

distances <- order(distancevector(scaled_DB,
                                  scaled_DB['query',], d="euclid"))
#calculates distances and returns the row indices ordered from
#the lowest distance

for(i in distances[1:5]) print(my_dataframe[i,])
#prints top five hits just for debugging
 
What I want to do:
1) create a small top_five frame;
sadly this does not work:
for(i in distances[1:5]) top_five[i,] <- my_dataframe[i,]

2) after I have top_five I would like to get the index
of my query entry, something along the lines of Python's
top_five.index('query_string')

3) possibly combine the values in distances with row names
from my_dataframe:
row_1 distance_from_query1
row_2 distance_from_query2

Thank you very much for your help

Darek Kedra

Darek Kedra

2006-Dec-01 22:52 UTC


[R] newbie: new_data_frame <- selected set of rows

Two missing things:

> distances
 [1] 13 14 10 11  2  4  6  1  3  9  8 12  7  5

#numbers correspond to rows in my_dataframe

> my_dataframe
                      V2         V3         V4         V5         V6
ENSP00000354687 35660.45 0.04794521 0.05479452 0.06849315 0.07534247
ENSP00000355046 38942.77 0.02967359 0.04451039 0.04451039 0.06824926
ENSP00000354499 57041.21 0.04700855 0.08760684 0.11965812 0.06196581

ENSP00000354687 etc are rownames.

I am trying to get top five row names with smallest
distances from a given vector as calculated by
distancevector from hopach.



Darek Kedra





 

Philipp Pagel

2006-Dec-02 13:16 UTC


[R] newbie: new_data_frame <- selected set of rows

Hi!
> distances <- order(distancevector(scaled_DB, scaled_DB['query',],
>                                   d="euclid"))
Just compute the distances WITHOUT ordering, here. And then
> 1) create a small top_five frame
top = scaled_DB[rank(distances)<=5, ]

rank() is better for this than order() in case there are ties.
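
To illustrate the point about ties, here is a minimal sketch on made-up distances (the values and the names "a" to "g" are hypothetical, not from the thread). Note that rank() defaults to ties.method = "average", which can push rows tied at the cut-off above 5 and drop them; ties.method = "min" is one way to keep every row tied with the fifth-best.

```r
# Hypothetical distances for seven rows; rows "c", "d" and "e" tie.
distances <- c(a = 0.9, b = 0.1, c = 0.4, d = 0.4, e = 0.4, f = 0.2, g = 0.3)

# order() returns row positions sorted by distance. Taking the first five
# always yields exactly five rows, so the tie is cut by original position.
top_by_order <- names(distances)[order(distances)[1:5]]

# rank() returns one rank per row, so it can be compared against 5 directly.
# ties.method = "min" gives every tied row the lowest rank of its group,
# so all rows tied with the fifth-best are kept.
top_by_rank <- names(distances)[rank(distances, ties.method = "min") <= 5]

print(top_by_order)
print(top_by_rank)
```

With these values the order() version keeps five rows and leaves out "e", while the rank() version keeps all six rows up to and including the tie.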
> 2) after I got top_five I would like to get the index
> of my query entry, something along Python's
> top_five.index('query_string')
You mean by row name?

which(row.names(scaled_DB)=='query_string')

But why would you need the index? If you want to get the respective row,
index by row name:

my_dataframe['query_string', ]
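
A small sketch of both lookups on a mock data frame (the column values and the row names "ENSP1" and "ENSP3" are made up for illustration). match() is shown as an alternative to which(); it returns only the first position, or NA if the name is absent.

```r
# Mock data frame with hypothetical row names.
my_dataframe <- data.frame(V2 = c(1.5, 2.5, 3.5), V3 = c(10, 20, 30),
                           row.names = c("ENSP1", "query_string", "ENSP3"))

# Position of the row, similar in spirit to Python's list.index()
idx <- which(row.names(my_dataframe) == "query_string")

# match() returns the first matching position, or NA if there is none
idx2 <- match("query_string", row.names(my_dataframe))

# Selecting by row name and selecting by position give the same row
by_name <- my_dataframe["query_string", ]
by_pos  <- my_dataframe[idx, ]

print(idx)
print(by_name)
```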
> 3) possibly combine values in distances with row names
> from my_dataframe:
> row_1 distance_from_query1
> row_2 distance_from_query2
The easiest way to store the distances along with the original names and
data would be to simply make distances a column in your data frame,
which is what I would have done to begin with. The entire procedure
would then look like this:

my_dataframe = read.table( ... )
scaled_mat <- scale(my_dataframe, center=FALSE)   # scale() returns a matrix
scaled_DB <- as.data.frame(scaled_mat)            # data frame, so that $ works
scaled_DB$dist1 = distancevector(scaled_mat, scaled_mat['query1',], ...)
scaled_DB$dist2 = distancevector(scaled_mat, scaled_mat['query2',], ...)
scaled_DB$dist3 = distancevector(scaled_mat, scaled_mat['query3',], ...)
...
top1 = scaled_DB[rank(scaled_DB$dist1)<=5, ]
...
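
For anyone who wants to try the procedure without the input file or hopach, here is a self-contained sketch under some assumptions: the data are random, the row name "query" is hypothetical, and a plain Euclidean distance written in base R stands in for distancevector(..., d="euclid"). It has not been checked against hopach's output. The scaled values are kept as a matrix for the arithmetic and converted to a data frame only to hold the distance column.

```r
set.seed(1)  # reproducible mock data

# Mock data: 14 rows, 5 numeric columns, hypothetical row names.
my_dataframe <- data.frame(matrix(runif(14 * 5), 14, 5))
row.names(my_dataframe) <- c("query", paste("ENSP", 2:14, sep = ""))

# scale() returns a matrix; keep it for the distance arithmetic.
scaled_mat <- scale(my_dataframe, center = FALSE)

# Euclidean distance from every row to the query row
# (base R stand-in for the hopach call, an assumption of this sketch).
query <- scaled_mat["query", ]
dist_to_query <- sqrt(rowSums(sweep(scaled_mat, 2, query)^2))

# Store the distances as a column next to the scaled data.
scaled_DB <- as.data.frame(scaled_mat)
scaled_DB$dist <- dist_to_query

# Five closest rows, nearest first (the query itself has distance 0).
top_five <- scaled_DB[rank(scaled_DB$dist, ties.method = "min") <= 5, ]
top_five <- top_five[order(top_five$dist), ]
print(top_five)
```

Because the query row is part of the data, it always appears as the first hit; drop it before ranking if only the other rows are of interest.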

cu
	Philipp

-- 
Dr. Philipp Pagel                            Tel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics      Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

 and

Institute for Bioinformatics / MIPS          Tel.  +49-89-3187 3675
GSF - National Research Center               Fax.  +49-89-3187 3585
      for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel

Neuro LeSuperHéros

2006-Dec-03 16:16 UTC


[R] newbie: new_data_frame <- selected set of rows

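#Mock df creation
my_dataframe <- data.frame(matrix(runif(14*5),14,5))
row.names(my_dataframe) <- paste("ENSP",1:14,sep="")
distances <- c(13,14,10,11,2,4,6,1,3,9,8,12,7,5)

#Top five rows if distances holds the raw distances
head(my_dataframe[order(distances),],5)
#If distances is already the output of order() (row indices, as in the
#original post), index with it directly instead
my_dataframe[distances[1:5],]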
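
In the original question, distances was assigned the result of order(), so it holds row indices rather than distance values. The sketch below makes that distinction explicit and also pairs row names with their distances (point 3 of the question). The raw distance values are invented so that their ordering reproduces the index vector posted above; the names raw_dist, ord and hits are hypothetical.

```r
# Mock data frame with hypothetical row names.
my_dataframe <- data.frame(matrix(runif(14 * 5), 14, 5))
row.names(my_dataframe) <- paste("ENSP", 1:14, sep = "")

# Invented raw distances, one per row, not yet ordered.
raw_dist <- c(0.8, 0.5, 0.9, 0.6, 1.4, 0.7, 1.3,
              1.1, 1.0, 0.3, 0.4, 1.2, 0.1, 0.2)

# order() turns them into row indices, nearest first.
ord <- order(raw_dist)

# Row names paired with their distances, nearest first.
hits <- data.frame(row = row.names(my_dataframe)[ord],
                   distance = raw_dist[ord])

# Top five rows: indexing with the order() output directly...
top_a <- my_dataframe[ord[1:5], ]
# ...gives the same rows as ordering by the raw distances and taking head().
top_b <- head(my_dataframe[order(raw_dist), ], 5)

print(head(hits, 5))
```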

