Hello,

this is probably trivial, but I failed to find this particular snippet of code.

What I have:

my_dataframe  (contains, say, 40k rows and 4 columns)
distances     (vector of Euclidean distances between a query vector and each of the rows of my_dataframe)

What I do: after scaling my_dataframe, I calculate the distances, order them, and then extract the top five hits:

    my_dataframe <- read.table("myDB.csv", header=F, dec=".", sep=";", row.names=1)  # reads the whole file
    scaled_DB <- scale(my_dataframe, center=FALSE)  # scales the values
    require(hopach)  # loads the necessary R package
    # calculates the distances and returns the row order, smallest distance first
    distances <- order(distancevector(scaled_DB, scaled_DB['query',], d="euclid"))
    for(i in distances[1:5]) print(my_dataframe[i,])  # prints top five hits, just for debugging

What I want to do:

1) Create a small top_five frame. Sadly, this does not work:

    for(i in distances[1:5]) top_five[i,] <- my_dataframe[i,]

2) Once I have top_five, I would like to get the index of my query entry, something along the lines of Python's top_five.index('query_string').

3) Possibly combine the values in distances with the row names from my_dataframe:

    row_1 distance_from_query1
    row_2 distance_from_query2

Thank you very much for your help

Darek Kedra
Two missing things:

> distances
 [1] 13 14 10 11  2  4  6  1  3  9  8 12  7  5

# numbers correspond to rows in my_dataframe

> my_dataframe
                      V2         V3         V4         V5         V6
ENSP00000354687 35660.45 0.04794521 0.05479452 0.06849315 0.07534247
ENSP00000355046 38942.77 0.02967359 0.04451039 0.04451039 0.06824926
ENSP00000354499 57041.21 0.04700855 0.08760684 0.11965812 0.06196581

ENSP00000354687 etc. are row names.

I am trying to get the five row names with the smallest distances from a given vector, as calculated by distancevector from hopach.

Darek Kedra
Hi!

> distances <- order(distancevector(scaled_DB, scaled_DB['query',],
> d="euclid"))

Just compute the distances WITHOUT ordering here. And then:

> 1) create a small top_five frame

    top = scaled_DB[rank(distances)<=5, ]

rank() is better for this than order() in case there are ties.

> 2) after I got top_five I woul like to get the index
> of my query entry, something along Pythons
> top_five.index('query_string')

You mean by row name?

    which(row.names(scaled_DB)=='query_string')

But why would you need the index? If you want the respective row, index by row name:

    my_dataframe['query_string', ]

> 3) possibly combine values in distances with row names
> from my_dataframe:
> row_1 distance_from_query1
> row_2 distance_from_query2

The easiest way to store the distances along with the original names and data would be to simply make the distances a column in a data frame, which is what I would have done to begin with. Note that scale() returns a matrix, so convert it to a data frame before adding columns with $. The entire procedure would then look like this:

    my_dataframe = read.table( ... )
    scaled_DB = scale(my_dataframe, center=FALSE)   # a matrix
    result = as.data.frame(scaled_DB)               # data frame that will hold the distance columns
    result$dist1 = distancevector(scaled_DB, scaled_DB['query1',], ...)
    result$dist2 = distancevector(scaled_DB, scaled_DB['query2',], ...)
    result$dist3 = distancevector(scaled_DB, scaled_DB['query3',], ...)
    ...
    top1 = result[rank(result$dist1)<=5, ]
    ...

cu
Philipp

--
Dr. Philipp Pagel                              Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics        Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

and

Institute for Bioinformatics / MIPS            Tel. +49-89-3187 3675
GSF - National Research Center                 Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel
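For readers without hopach installed, the procedure described above can be sketched in base R. This is only an illustration under stated assumptions: the mock data, the row name "query", and the names dist_to_query, result and query_pos are hypothetical, and the Euclidean distance is computed by hand as a stand-in for distancevector(..., d="euclid").

```r
# Mock data standing in for myDB.csv (hypothetical values and row names)
set.seed(1)
my_dataframe <- data.frame(matrix(runif(20 * 4), nrow = 20, ncol = 4))
row.names(my_dataframe) <- c("query", paste0("row_", 1:19))

# scale() returns a matrix; the row names are preserved
scaled_DB <- scale(my_dataframe, center = FALSE)

# Euclidean distance from the query row to every row, in base R
# (assumption: a stand-in for hopach's distancevector with d = "euclid")
query <- scaled_DB["query", ]
dist_to_query <- sqrt(rowSums(sweep(scaled_DB, 2, query)^2))

# 3) keep the distances next to the row names by storing them as a column
result <- data.frame(my_dataframe, dist = dist_to_query)

# 1) the five closest rows, sorted by distance
#    (the query row itself is included, with distance 0)
top_five <- head(result[order(result$dist), ], 5)

# 2) position of the query inside top_five, similar to Python's list.index()
query_pos <- match("query", row.names(top_five))
```

Sorting with order() and then taking head() always yields exactly five rows; the rank(...) <= 5 filter suggested above can return more or fewer when distances are tied.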
Neuro LeSuperHéros
2006-Dec-03 16:16 UTC
[R] newbie: new_data_frame <- selected set of rows
    # Mock df creation
    my_dataframe <- data.frame(matrix(runif(14*5),14,5))
    row.names(my_dataframe) <- paste("ENSP",1:14,sep="")
    # distances as posted: already the output of order(), i.e. row numbers, nearest first
    distances <- c(13,14,10,11,2,4,6,1,3,9,8,12,7,5)
    # so index with it directly; calling order() on it again would give ranks instead
    head(my_dataframe[distances,],5)

>From: Darek Kedra <darked90 at yahoo.com>
>To: r-help at stat.math.ethz.ch
>Subject: Re: [R] newbie: new_data_frame <- selected set of rows
>Date: Fri, 1 Dec 2006 14:52:25 -0800 (PST)
>
>Two missing things:
>
> >distances
> [1] 13 14 10 11 2 4 6 1 3 9 8 12 7 5
>
>#numbers correspond to rows in my_dataframe
>
> > my_dataframe
>                       V2         V3         V4         V5         V6
>ENSP00000354687 35660.45 0.04794521 0.05479452 0.06849315 0.07534247
>ENSP00000355046 38942.77 0.02967359 0.04451039 0.04451039 0.06824926
>ENSP00000354499 57041.21 0.04700855 0.08760684 0.11965812 0.06196581
>
>ENSP00000354687 etc are rownames.
>
>I am trying to get top five row names with smallest
>distances from a given vector as calculated by
>distancevector from hopach.
>
>Darek Kedra
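Since the thread mixes order() and rank(), a small sketch of the difference may help. The distance values and names below are made up for illustration; only order(), rank() and names() from base R are used.

```r
# Hypothetical distance values for five named rows
d <- c(a = 0.9, b = 0.1, c = 0.5, d = 0.0, e = 0.3)

# order() returns row positions, nearest first
ord <- order(d)

# rank() returns the rank of each row, in the original row order
rnk <- rank(d)

# Three nearest row names via order(): sorted by distance
nearest_by_order <- names(d)[ord[1:3]]

# Three nearest row names via rank(): same set, but in original row order
nearest_by_rank <- names(d)[rnk <= 3]

# Applying order() to an order() result yields the ranks (no ties here),
# which is why a vector that is already ordered should be used as an index directly
ranks_again <- order(ord)
```

Both approaches select the same rows when there are no ties; they differ in the order of the result and in how tied distances are handled.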