Leonard Mada
2022-Jan-19 02:39 UTC
[R] How to convert category (or range/group) into continuous?
Dear Marna,

If you want to extract the middle of those intervals, please find below an improved variant of Luigi's code.

Note:
- it is more efficient to process the levels of a factor, instead of all the individual strings;
- I envision that there are benefits in a large data frame (> 1 million rows), although I have not explicitly checked it;
- the code also handles the open/closed intervals better;
- the returned data structure may require some tweaking (currently returns a data.frame);

### Middle of an Interval
mid.factor = function(x, inf.to = NULL, split.str=",") {
    lvl0 = levels(x); lvl = lvl0;
    # strip the opening "(" or "[" and the closing ")" or "]"
    lvl = sub("^[(\\[]", "", lvl);
    lvl = sub("[])]$", "", lvl); # tricky;
    lvl = strsplit(lvl, split.str);
    lvl = lapply(lvl, function(x) as.numeric(x));
    if( ! is.null(inf.to)) {
        # flag intervals with an infinite bound
        FUN = function(x) {
            if(any(x == Inf)) 1
            else if(any(x == - Inf)) -1
            else 0;
        }
        whatInf = sapply(lvl, FUN);
        # TODO: more advanced;
        # the whole interval is replaced by the given value
        lvl[whatInf == -1] = inf.to[1];
        lvl[whatInf ==  1] = inf.to[2];
    }
    mid = sapply(lvl, mean);
    lvl = data.frame(lvl=lvl0, mid=mid);
    # note: merge() sorts the result by lvl,
    # so the row order of x is not preserved;
    merge(data.frame(lvl=x), lvl, by="lvl");
}

# uses the daT data frame;
# requires a factor:
# - this is probably the case with the original data;
daT$group = as.factor(daT$group);
mid.factor(daT$group);

I have uploaded this code also on my GitHub list of useful data tools:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

Sincerely,

Leonard
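As a possible illustration of the level-based approach, here is a minimal self-contained sketch. It is not the code from the message: the helper name mid.levels and the sample data are made up for the example, and it assumes finite intervals in the format produced by cut(). Instead of merge(), it indexes the per-level midpoints with the integer codes of the factor, which keeps the original row order.

```r
# Hypothetical helper (not part of the original mail):
# midpoint of each interval level, mapped back to x in the original order.
mid.levels = function(x, split.str = ",") {
    lvl = levels(x);
    # strip the opening "(" or "[" and the closing ")" or "]"
    lvl = sub("^[(\\[]", "", lvl);
    lvl = sub("[])]$", "", lvl);
    # one pair of bounds per level; the mean of the pair is the midpoint
    mid = sapply(strsplit(lvl, split.str, fixed = TRUE),
        function(s) mean(as.numeric(s)));
    # the integer codes of the factor index into the per-level midpoints
    mid[as.integer(x)];
}

# Made-up sample data, binned with cut()
val = c(0.5, 3.2, 7.9, 1.1, 9.5);
grp = cut(val, breaks = c(0, 2, 4, 6, 8, 10));
print(data.frame(val = val, group = grp, mid = mid.levels(grp)));
```

The string processing still runs only once per level, so the cost of the parsing step does not grow with the number of rows.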
Leonard Mada
2022-Jan-20 18:44 UTC
[R] How to convert category (or range/group) into continuous?
Dear Marna,

I have revisited your initial mail and I am still unsure what your true statistical intention was. Unfortunately, you did not provide any feedback on whether any of the solutions helped.

Looking at the data more carefully, I see that you are trying to plot the cumulative frequency. There is an easy way to do this in R.

# generate some continuous data
x = runif(1000, 0, 2.5);
# let's partially discretize it:
x = round(x, 2);
# Cumulative frequency
x.cum = ecdf(x);
plot(x.cum, do.points=FALSE, lwd=2, xlim=c(0, 3));

# adding some vertical lines
# using function from previous mail:
daT$group = as.factor(daT$group);
v = mid.factor(daT$group);
# this seems to be on a logarithmic scale:
# [Note: a geometric mean may have been more appropriate]
abline(v=v$mid, col="red");

I hope this helps,

Leonard

On 1/19/2022 4:39 AM, Leonard Mada wrote:
> Dear Marna,
>
> If you want to extract the middle of those intervals, please find
> below an improved variant of Rui's code. [edit: corrected name]
>
> [...]
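
The remark about the geometric mean could be sketched as follows. This is only an illustration under stated assumptions: the helper name gmid.levels and the sample groups are hypothetical, and all interval bounds are assumed to be finite and strictly positive, since the logarithm is undefined otherwise.

```r
# Hypothetical helper (not part of the original mail):
# geometric midpoint of each interval level, i.e. sqrt(lower * upper).
# Assumes finite, strictly positive bounds.
gmid.levels = function(x, split.str = ",") {
    lvl = levels(x);
    # strip the opening "(" or "[" and the closing ")" or "]"
    lvl = sub("^[(\\[]", "", lvl);
    lvl = sub("[])]$", "", lvl);
    # geometric mean of the two bounds, computed on the log scale
    gmid = sapply(strsplit(lvl, split.str, fixed = TRUE),
        function(s) exp(mean(log(as.numeric(s)))));
    # map the per-level values back to x in the original order
    gmid[as.integer(x)];
}

# Made-up groups on a logarithmic scale
grp = factor(c("(1,10]", "(10,100]", "(100,1000]", "(10,100]"));
print(gmid.levels(grp));
```

On a logarithmic axis the geometric midpoint sits in the visual centre of each interval, whereas the arithmetic midpoint is pulled towards the upper bound.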