Hi, I have a mysql table with fields userid,track,frequency e.g u1,1,10 u1,2,100 u1,3,110 u1,4,200 u1,5,120 u1,6,130 . u2,1,23 . . where "frequency" is the number of times a music track is played by a "userid" I need to turn my 'frequency' table into a rating table (it's for a recommender system). So, for each user, I need to categorise the frequency of tracks played by quintile so that each particular track can have 5 ratings (1-5), with the ratings allocated as follows: inter-quintile range 100-80% = rating 5, inter-quintile range 80-60% = rating 4, ..., inter-quintile range 20-0% = rating 1) Hence, I need to create a table with fields userid,track,rating: u1,1,1 u1,2, 3 ... Can anybody help me to do this with R? Regards Gawesh [[alternative HTML version deleted]]
try this:> # create some data > x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = '')+ , track = rep(1:20, 20) + , freq = floor(runif(400, 10, 200)) + , stringsAsFactors = FALSE + )> # get the quantiles for each track > tq <- tapply(x$freq, x$track, quantile, prob = c(.2, .4, .6, .8, 1)) > # create a matrix with the rownames as the tracks to use in the findInterval > tqm <- do.call(rbind, tq) > # now put the ratings > require(data.table) > x.dt <- data.table(x) > x.new <- x.dt[,+ list(userid = userid + , freq = freq + , rating = findInterval(freq + # use track as index into quantile matrix + , tqm[as.character(track[1L]),] + , rightmost.closed = TRUE + ) + 1L + ) + , by = track]> > head(x.new)track userid freq rating [1,] 1 u1 10 1 [2,] 1 u2 15 1 [3,] 1 u3 126 4 [4,] 1 u4 117 3 [5,] 1 u5 76 2 [6,] 1 u6 103 3>On Sun, May 8, 2011 at 2:48 PM, gj <gawesh at gmail.com> wrote:> Hi, > > I have a mysql table with fields userid,track,frequency e.g > u1,1,10 > u1,2,100 > u1,3,110 > u1,4,200 > u1,5,120 > u1,6,130 > . > u2,1,23 > . > . > where "frequency" is the number of times a music track is played by a > "userid" > > I need to turn my 'frequency' table into a rating table (it's for a > recommender system). So, for each user, I need to categorise the frequency > of tracks played by quintile so that each particular track can have 5 > ratings (1-5), with the ratings allocated as follows: inter-quintile range > 100-80% = rating 5, ? inter-quintile range 80-60% = rating 4, > ..., inter-quintile range 20-0% = rating 1) > > Hence, I need to create a table with fields userid,track,rating: > u1,1,1 > u1,2, 3 > ... > > Can anybody help me to do this with R? > > Regards > Gawesh > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
One way to get the ratings would be to use the ave() function: rating = ave(x$freq,x$track, FUN=function(x)cut(x,quantile(x,(0:5)/5),include.lowest=TRUE)) - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spector at stat.berkeley.edu On Sun, 8 May 2011, gj wrote:> Hi, > > I have a mysql table with fields userid,track,frequency e.g > u1,1,10 > u1,2,100 > u1,3,110 > u1,4,200 > u1,5,120 > u1,6,130 > . > u2,1,23 > . > . > where "frequency" is the number of times a music track is played by a > "userid" > > I need to turn my 'frequency' table into a rating table (it's for a > recommender system). So, for each user, I need to categorise the frequency > of tracks played by quintile so that each particular track can have 5 > ratings (1-5), with the ratings allocated as follows: inter-quintile range > 100-80% = rating 5, inter-quintile range 80-60% = rating 4, > ..., inter-quintile range 20-0% = rating 1) > > Hence, I need to create a table with fields userid,track,rating: > u1,1,1 > u1,2, 3 > ... > > Can anybody help me to do this with R? > > Regards > Gawesh > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Seemingly Similar Threads
- Package "demography" - calculating quintiles of survival probabilities
- Creating quintiles on monthly basis
- conditional weighted quintiles
- R: writing data from one matrix into another with keeping NA's
- GAM model with interactions between continuous variables and factors