Leonard Mada
2022-Jan-19 02:39 UTC
[R] How to convert category (or range/group) into continuous?
Dear Marna,
If you want to extract the middle of those intervals, please find below
an improved variant of Rui's code.
Note:
- it is more efficient to process the levels of a factor, instead of all
the individual strings;
- I envision that there are benefits in a large data frame (> 1 million
rows) - although I have not explicitly checked it;
- the code also handles open/closed intervals better;
- the returned data structure may require some tweaking (currently
returns a data.frame);
### Middle of an Interval
mid.factor = function(x, inf.to = NULL, split.str=",") {
    lvl0 = levels(x); lvl = lvl0;
    lvl = sub("^[(\\[]", "", lvl);
    lvl = sub("[])]$", "", lvl); # tricky;
    lvl = strsplit(lvl, split.str);
    lvl = lapply(lvl, function(x) as.numeric(x));
    if( ! is.null(inf.to)) {
        FUN = function(x) {
            if(any(x == Inf)) 1
            else if(any(x == - Inf)) -1
            else 0;
        }
        whatInf = sapply(lvl, FUN);
        # TODO: more advanced;
        lvl[whatInf == -1] = inf.to[1];
        lvl[whatInf ==  1] = inf.to[2];
    }
    mid = sapply(lvl, mean);
    lvl = data.frame(lvl=lvl0, mid=mid);
    merge(data.frame(lvl=x), lvl, by="lvl");
}
# uses the daT data frame;
# requires a factor:
# - this is probably the case with the original data;
daT$group = as.factor(daT$group);
mid.factor(daT$group);
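One possible tweak to the returned structure, as a sketch only: merge() sorts its result by the key column by default, so the rows come back in level order rather than in the order of x. Indexing the per-level midpoints with the integer codes of the factor keeps the original order and returns a plain numeric vector. The name mid.levels and the sample data are hypothetical and not part of the code above; infinite bounds are not handled here.

```r
# Hypothetical helper: midpoint per level, mapped back in the original order.
mid.levels = function(x, split.str = ",") {
    lvl = levels(x);
    lvl = sub("^[(\\[]", "", lvl);   # drop the opening bracket
    lvl = sub("[])]$", "", lvl);     # drop the closing bracket
    lvl = strsplit(lvl, split.str, fixed = TRUE);
    mid = sapply(lvl, function(s) mean(as.numeric(s)));
    # as.integer(x) is the level index of every element of x
    mid[as.integer(x)];
}

# made-up example data
x = factor(c("(10,20]", "(0,10]", "(10,20]", "[20,30)"));
res = mid.levels(x);
print(res);
```

The string processing still runs once per level, not once per row, so the efficiency argument from the note above is unchanged.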
I have also uploaded this code to my GitHub collection of useful data tools:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R
Sincerely,
Leonard
Leonard Mada
2022-Jan-20 18:44 UTC
[R] How to convert category (or range/group) into continuous?
Dear Marna,
I have revisited your initial mail and I am still unsure what your true
statistical intention was. Unfortunately, you did not provide any
feedback on whether any of the solutions helped.
Looking at the data more carefully, I see that you are trying to plot the
cumulative frequency. There is an easy way to do this in R.
# generate some continuous data
x = runif(1000, 0, 2.5);
# let's partially discretize it:
x = round(x, 2);
# Cumulative frequency
x.cum = ecdf(x);
plot(x.cum, do.points=FALSE, lwd=2, xlim=c(0, 3));
# adding some vertical lines
# using function from previous mail:
daT$group = as.factor(daT$group);
v = mid.factor(daT$group);
# this seems to be on a logarithmic scale:
# [Note: a geometric mean may have been more appropriate]
abline(v=v$mid, col="red");
I hope this helps,
Leonard
On 1/19/2022 4:39 AM, Leonard Mada wrote:
> Dear Marna,
>
> If you want to extract the middle of those intervals, please find
> below an improved variant of Rui's code. [edit: corrected name]
>
> [...]
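The remark about the geometric mean can be illustrated with a small sketch. It assumes that both interval bounds are strictly positive; the helper name mid.geom and the bounds below are made up for illustration and do not come from the original data.

```r
# Hypothetical helper: geometric midpoint of an interval with bounds a, b > 0.
mid.geom = function(a, b) {
    exp((log(a) + log(b)) / 2);   # equivalent to sqrt(a * b)
}

# made-up bounds spanning several orders of magnitude
lo = c(1, 10, 100);
hi = c(10, 100, 1000);
m.arith = (lo + hi) / 2;      # arithmetic midpoints
m.geom  = mid.geom(lo, hi);   # geometric midpoints
print(data.frame(lo, hi, m.arith, m.geom));
```

On a logarithmic axis the geometric midpoint sits in the visual centre of each interval, whereas the arithmetic midpoint is pulled towards the upper bound.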