Hello,
this is probably trivial, but I failed to find this particular snippet of code.
What I have:
my_dataframe (contains, say, 40k rows and 4 columns)
distances (vector of Euclidean distances between a query vector and each of
the rows of my_dataframe)
What I do:
after scaling my_dataframe I calculate the distances, order them, then extract
the top five hits:
my_dataframe <- read.table("myDB.csv", header=FALSE,
dec=".", sep=";",
row.names=1)
#reads the whole file
scaled_DB <- scale(my_dataframe, center=FALSE)
#scales the values
library(hopach)
#loads the required R package (provides distancevector)
distances <- order(distancevector(scaled_DB, scaled_DB['query',],
d="euclid"))
#calculates distances; order() then returns the row indices sorted from the lowest distance
for(i in distances[1:5]) print(my_dataframe[i,])
#prints top five hits just for debugging
What I want to do:
1) create a small top_five data frame;
sadly, this does not work:
for(i in distances[1:5]) top_five[i,] <- my_dataframe[i,]
2) once I have top_five, I would like to get the index of my query entry,
something along the lines of Python's
top_five.index('query_string')
3) possibly combine the values in distances with the row names from my_dataframe:
row_1 distance_from_query1
row_2 distance_from_query2
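
For illustration, here is a rough sketch of how the three steps might look.
It is only a sketch under assumptions: the toy data, the base R distance
calculation (used in place of hopach's distancevector so that it runs on its
own) and the names toy, dist_values, ord, query_pos and hits are all made up
for this example. The loop in 1) most likely fails because top_five does not
exist before the first assignment; indexing with the whole vector of row
positions avoids the loop.

```r
# Toy stand-in for my_dataframe (hypothetical data; the real table
# comes from myDB.csv). One row is named "query".
toy <- data.frame(a = c(1.5, 2, 10, 1.1, 5, 1),
                  b = c(1.2, 2, 10, 0.9, 5, 1),
                  row.names = c("row_1", "row_2", "row_3",
                                "row_4", "row_5", "query"))
scaled <- scale(toy, center = FALSE)

# Euclidean distance from the query row to every row, in base R.
# sweep() subtracts the query vector from each row of the matrix.
diffs <- sweep(scaled, 2, scaled["query", ])
dist_values <- sqrt(rowSums(diffs^2))  # named vector, names = row names

# Row positions sorted by increasing distance.
ord <- order(dist_values)

# 1) Select the five nearest rows in one step instead of looping.
top_five <- toy[ord[1:5], ]

# 2) Position of the query entry within top_five (NA if it is absent).
query_pos <- match("query", rownames(top_five))

# 3) Row names paired with their distances, nearest first.
hits <- data.frame(row = names(dist_values)[ord],
                   distance = unname(dist_values[ord]))
print(hits)
```

Since the query row is part of the table, it sits at distance 0 from itself
and so normally comes out first; ord[2:6] would skip it if only the other
rows are wanted.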
Thank you very much for your help
Darek Kedra