I'm fairly new to R. The language is amazing, but I'm having trouble navigating packages. I have a solution that handles the problems I'm working on, but I don't know if it could be solved more cleanly with mle, bbmle, maxLik, etc.

Here's an example problem first. I have run many WAV files through voice recognition software; the software returns 50 hypotheses for each, together with scores S_{ni} indicating how 'good' the i^th hypothesis is. I want to map the S_{ni} to a probability distribution. So I'm using MLE to fit a function f that maps scores to logs of relative probabilities. This means maximising

\sum_n [ f(S_{nc_n}) - \log \sum_i \exp f(S_{ni}) ]

where c_n is the index of the correct hypothesis for the n^th sample.

Here's the code:

library(dplyr)   # provides %>% and filter()
library(nloptr)  # provides nloptr()

# Mean over samples of f(correct score) - log sum_i exp f(score_i)
ave_log_likelihood = function(f, scores) {
  def = scores %>% filter(Sc > 0)
  log_likelihoods = with(def, f(Sc) - matrixStats::rowLogSumExps(f(S), na.rm = TRUE))
  return(mean(log_likelihoods))
}

nlopts = list(algorithm = "NLOPT_LN_BOBYQA", maxeval = 500, print_level = 0)

# Fit f(x) = a * x by minimising the negative mean log-likelihood, starting from a = 0.01
best_linear_fit = function(scores) {
  res <- nloptr(c(0.01),
                function(a) -ave_log_likelihood(function(x) (a * x), scores),
                opts = nlopts)
  return(data.frame(log_likelihood = -res$objective,
                    slope = res$solution,
                    doubling = log(2)/res$solution))
}

Now, I need to write a lot of variants of this with different objectives and with different classes of function. But there's a lot of verbiage in best_linear_fit which would currently be copy/pasted. Also, as written it makes it messy to fit on training data and then evaluate on test data.

I'd appreciate any advice on packages that might make it easier to write this more cleanly, ideally using the idioms used in `lm`, etc., such as formulae and `predict`. (Any pointers on writing cleaner R code would also be lovely!)

Thanks in advance;
Mohan
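For readers following the objective in the post above, here is a short restatement as a worked equation. It assumes (this is an interpretation, not something the post states outright) that the scores of each sample are turned into probabilities by a softmax over f, and that the `doubling` value returned by best_linear_fit is the score difference that doubles the relative probability when f(x) = a x.

```latex
% Probability assigned to the correct hypothesis c_n of sample n
P(c_n \mid S_n) = \frac{\exp f(S_{n c_n})}{\sum_i \exp f(S_{ni})}

% Taking logs and summing over samples gives the objective in the post
\log L(f) = \sum_n \Big[ f(S_{n c_n}) - \log \sum_i \exp f(S_{ni}) \Big]

% For linear f(x) = a x, raising a score by d multiplies its relative
% probability by exp(a d); setting this factor equal to 2 gives
\exp(a d) = 2 \quad\Longrightarrow\quad d = \frac{\log 2}{a}
```

The posted code averages the per-sample terms with mean() rather than summing them; that rescales the objective by a positive constant and leaves the maximising f unchanged.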
Are you familiar with R resources you can search?

1. CRAN task views: https://cran.r-project.org/web/views/

2. For searching: https://rseek.org/
   Searching on "maximum likelihood" there appeared to bring up relevant resources.

3. RStudio resources: https://education.rstudio.com/
   Note: RStudio is a private company that is not part of the R Foundation, but may have useful programming resources for you.

4. Tons of online tutorials: Just search!

I have not looked at your code in any detail, but I'd be willing to bet you're trying to reinvent a square wheel.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Oct 21, 2019 at 8:21 AM Mohan Ganesalingam <mohan at what3words.com> wrote:
> I'm fairly new to R. The language is amazing, but I'm having trouble
> navigating packages. I have a solution that handles the problems I'm
> working on, but I don't know if it could be solved more cleanly with mle,
> bbmle, maxLik, etc.
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Bert, thanks for the quick reply. I spent a while searching before I posted, and also read through the documentation for the mle function and the maxLik and bbmle packages. As you say, it seems likely I'm reinventing something standard, but nothing I can find quite seems to do what I need. Hence posting on the mailing list.

best wishes,
Mohan

On Mon, 21 Oct 2019 at 16:40, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> I have not looked at your code in any detail, but I'd be willing to bet
> you're trying to reinvent a square wheel.
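The request in the original post (less copy/paste per function class, and fitting on training data then evaluating on test data in the style of `lm` and `predict`) could be sketched roughly as below. This is a minimal sketch in base R only, not a recommendation of any particular package. All helper names (row_log_sum_exp, mean_log_lik, fit_score_model, linear_family) are hypothetical, base `optim` stands in for nloptr, and the data layout (a score matrix S plus a vector `correct` of column indices) is an assumption that differs from the Sc/S columns in the posted code. The data are simulated purely for illustration.

```r
# Hypothetical helper names throughout; base R only.

# Numerically stable log(sum(exp(x))) for each row of a matrix.
row_log_sum_exp <- function(m) {
  row_max <- apply(m, 1, max)
  row_max + log(rowSums(exp(m - row_max)))
}

# Mean log-likelihood of the correct hypothesis under a softmax over f(scores).
# S: n x k score matrix; correct: length-n vector of column indices (assumed layout).
mean_log_lik <- function(f, S, correct) {
  fS <- f(S)
  picked <- fS[cbind(seq_len(nrow(S)), correct)]
  mean(picked - row_log_sum_exp(fS))
}

# Fit any parametrised family: family(par) must return a function of scores.
fit_score_model <- function(family, start, S, correct) {
  objective <- function(par) -mean_log_lik(family(par), S, correct)
  res <- optim(start, objective, method = "BFGS")
  structure(list(par = res$par,
                 f = family(res$par),
                 log_likelihood = -res$value,
                 convergence = res$convergence),
            class = "score_model")
}

# S3 predict() method: n x k matrix of probabilities for new scores.
predict.score_model <- function(object, newdata, ...) {
  fS <- object$f(newdata)
  exp(fS - row_log_sum_exp(fS))
}

# One function class; other classes only need a new one-line family.
linear_family <- function(par) function(x) par[1] * x

# Simulated data: the correct index is drawn from a softmax with slope 2.
set.seed(1)
n <- 400
k <- 5
S <- matrix(rnorm(n * k), n, k)
P <- exp(2 * S)
P <- P / rowSums(P)
correct <- apply(P, 1, function(p) sample.int(k, 1, prob = p))

train <- 1:300
test <- 301:400
fit <- fit_score_model(linear_family, start = 0.01, S[train, ], correct[train])
test_ll <- mean_log_lik(fit$f, S[test, ], correct[test])  # held-out evaluation
probs <- predict(fit, S[test, ])                          # rows sum to 1
doubling <- log(2) / fit$par                              # as in the posted code
```

The design choice here is that a function class is a closure factory (parameters in, function of scores out), so a new class or a new objective is a one-line change and the optimiser call is written once. A formula interface is not attempted.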