Thanks for your comments. Actually, only the last group has a single element. The first group is always "full" of members, so it works fine. Some constant spacing between the groups would be good as well, so I will check quantiles.

Thanks for the great support and the time invested in this.

Regards
Alex

On Wednesday, November 4, 2015 3:34 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:

Whatever approach is "best" to define subsets depends completely on the semantics of the data. Your approach (a fixed number of equally spaced breaks) is the right one if the absolute ranges of the data are important. It should be obvious that either the top or the bottom group could contain only a single element, and also that any or all of the intermediate groups could be empty. If you want to control the number of elements in your groups, use quantiles instead. Your application may require defining the breaks in other ways.

The code I have given you doesn't generalize well, as it depends on the equal spacing of breaks. As I mentioned earlier, I would not store the groups at all, but would define a function that returns a vector of elements in the group, and in the function body I would clearly and explicitly define the conditions for group membership (and comment it). That is how you make code for a task like this explicit and _maintainable_.

Cheers,
Boris

On Nov 4, 2015, at 9:19 AM, Alaios <alaios at yahoo.com> wrote:

> Thanks, everything is solved and I was even able to plot boxplots as needed.
> The only minor issue is that the max element falls in the last category and is the only element there. Perhaps this comes from the way my data look.
> Regards
> Alex
>
> On Wednesday, November 4, 2015 3:06 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>
> The breaks are just the min() and max() in your groups. Something like
>
>   sprintf("[%5.2f,%5.2f]", min(dBin[groups==2]), max(dBin[groups==2]))
>
> ... should achieve what you need.
>
> B.
>
> On Nov 4, 2015, at 8:45 AM, Alaios <alaios at yahoo.com> wrote:
>
> > You are right.
> > By labels I mean the "categories" or "breaks" that my data fall in.
> > To be part of group 2, for example, a value has to be in the range [110,223). I need to keep those for my plots.
> >
> > Did I describe it more precisely now?
> > Alex
> >
> > On Wednesday, November 4, 2015 2:09 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> >
> > I don't understand:
> > - where does the "label" come from? (It's not an element of your data that I see.)
> > - what do you want to do with this "label", i.e. how does it need to be associated with the data?
> >
> > B.
> >
> > On Nov 4, 2015, at 7:57 AM, Alaios <alaios at yahoo.com> wrote:
> >
> > > Thanks, it works great and gives me group numbers as integers, so I can use which() to pick out the elements of a group as needed (which(groups == 2)).
> > >
> > > The question, though, is how to also keep the labels for each group. For example, that my first group is [13,206).
> > >
> > > Regards
> > > Alex
> > >
> > > On Wednesday, November 4, 2015 1:00 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> > >
> > > I would transform the original numbers into integers which you can use as group labels. The row numbers of the group labels are the indexes of your values.
> > >
> > > Example: assume your input vector is dBin
> > >
> > > nGroups <- 5  # number of groups
> > > groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin))  # rescale to the range [0,1]
> > > groups <- floor(groups * nGroups) + 1  # discretize to nGroups integers
> > >
> > > Now you can e.g. get the indices for group 2
> > >
> > > which(groups == 2)
> > >
> > > Depending on the nature of your input data, it may be better to keep these groups in a column adjacent to your values, rather than in a separate vector, or even better to just calculate the groups on the fly in your downstream analysis with the approach given above in a function, rather than storing them at all. These are simple operations that should not add perceptibly to execution time.
> > >
> > > Cheers,
> > > Boris
> > >
> > > On Nov 4, 2015, at 6:40 AM, Alaios via R-help <r-help at r-project.org> wrote:
> > >
> > > > Thanks for the answer. split() does not give me the indexes, though, only the group each value falls in. I also need the index of the group: is it the first, the second, ... group?
> > > > Alex
> > > >
> > > > On Tuesday, November 3, 2015 5:05 PM, Ista Zahn <istazahn at gmail.com> wrote:
> > > >
> > > > Probably
> > > >
> > > > split(binDistance, test).
> > > >
> > > > Best,
> > > > Ista
> > > >
> > > > On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help <r-help at r-project.org> wrote:
> > > >> Dear all, I am not exactly sure what the proper name is for what I am trying to do.
> > > >> I have a vector that looks like
> > > >> binDistance
> > > >>            [,1]
> > > >>  [1,] 238.95162
> > > >>  [2,] 143.08590
> > > >>  [3,]  88.50923
> > > >>  [4,] 177.67884
> > > >>  [5,] 277.54116
> > > >>  [6,] 342.94689
> > > >>  [7,] 241.60905
> > > >>  [8,] 177.81969
> > > >>  [9,] 211.25559
> > > >> [10,] 279.72702
> > > >> [11,] 381.95738
> > > >> [12,] 483.76363
> > > >> [13,] 480.98841
> > > >> [14,] 369.75241
> > > >> [15,] 267.73650
> > > >> [16,] 138.55959
> > > >> [17,] 137.93181
> > > >> [18,] 184.75200
> > > >> [19,] 254.64359
> > > >> [20,] 328.87785
> > > >> [21,] 273.15577
> > > >> [22,] 252.52830
> > > >> [23,] 252.52830
> > > >> [24,] 252.52830
> > > >> [25,] 262.20084
> > > >> [26,] 314.93064
> > > >> [27,] 366.02996
> > > >> [28,] 442.77467
> > > >> [29,] 521.20323
> > > >> [30,] 465.33071
> > > >> [31,] 366.60582
> > > >> [32,]  13.69540
> > > >> so numbers that start from 13 and go up to a maximum of 522 (I also have many other similar sets). I want to put these numbers into 5 categories and thus I have tried cut
> > > >>
> > > >> Browse[2]> test<-cut(binDistance,seq(min(binDistance)-0.00001,max(binDistance),length.out=scaleLength+1))
> > > >> Browse[2]> test
> > > >>  [1] (217,318]  (115,217]  (13.7,115] (115,217]  (217,318]  (318,420]
> > > >>  [7] (217,318]  (115,217]  (115,217]  (217,318]  (318,420]  (420,521]
> > > >> [13] (420,521]  (318,420]  (217,318]  (115,217]  (115,217]  (115,217]
> > > >> [19] (217,318]  (318,420]  (217,318]  (217,318]  (217,318]  (217,318]
> > > >> [25] (217,318]  (217,318]  (318,420]  (420,521]  (420,521]  (420,521]
> > > >> [31] (318,420]  (13.7,115]
> > > >> Levels: (13.7,115] (115,217] (217,318] (318,420] (420,521]
> > > >>
> > > >> I then want the numbers of my initial vector that fall within the same "category", let's say (318,420], to be collected in a vector. To rephrase: I want the indexes of my initial vector that have a value between 318 and 420 to be put in one vector that I can then process as I want.
> > > >> How can I do that effectively in R?
> > > >> I would like to thank you for your reply.
> > > >> Regards
> > > >> Alex
> > > >>
> > > >> [[alternative HTML version deleted]]
> > > >>
> > > >> ______________________________________________
> > > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > >> and provide commented, minimal, self-contained, reproducible code.
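
The advice in the message above (compute group membership in a commented function instead of storing the groups, and consider quantiles when group sizes matter) could be sketched roughly as follows. This is only an illustration: `group_index`, `n_groups`, `q_breaks` and `q_groups` are made-up names, and the `pmin()` clamp is an added assumption that is not in the quoted code. Without it, `floor(scaled * nGroups) + 1` maps the maximum to `nGroups + 1`, which may be why the largest value ended up alone in its own group.

```r
# Values copied from the thread; dBin is the name used in the quoted code.
dBin <- c(238.95162, 143.0859, 88.50923, 177.67884, 277.54116, 342.94689,
          241.60905, 177.81969, 211.25559, 279.72702, 381.95738, 483.76363,
          480.98841, 369.75241, 267.7365, 138.55959, 137.93181, 184.752,
          254.64359, 328.87785, 273.15577, 252.5283, 252.5283, 252.5283,
          262.20084, 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
          366.60582, 13.6954)

# Hypothetical helper: indices of the elements of x that belong to group k.
# Membership rule: rescale x to [0, 1], cut into n_groups equal-width bins,
# and fold the maximum (which would otherwise get label n_groups + 1)
# into the top bin.
group_index <- function(x, k, n_groups = 5) {
  scaled <- (x - min(x)) / (max(x) - min(x))
  g <- pmin(floor(scaled * n_groups) + 1, n_groups)
  which(g == k)
}

group_index(dBin, 2)        # positions of the group-2 elements
dBin[group_index(dBin, 2)]  # the corresponding values

# Quantile-based breaks aim at groups of similar size instead of
# similar width. unique() guards against duplicated breaks caused by ties.
q_breaks <- quantile(dBin, probs = seq(0, 1, length.out = 6))
q_groups <- cut(dBin, breaks = unique(q_breaks), include.lowest = TRUE)
table(q_groups)
```

Because membership is recomputed on each call, nothing has to be kept in sync with the data; the cost is negligible for vectors of this size.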
I have been vaguely following this thread and have become very confused given the complications that seem to have appeared.

The original question was:

>>>>> On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help <r-help at r-project.org> wrote:
>>>>>> Dear all, I am not exactly sure on what is the proper name of what I am trying to do.
>>>>>> I have a vector that looks like

Actually you appear to have a 32 x 1 *matrix* (NOT the same thing!) that looks like:

>>>>>> binDistance
>>>>>>            [,1]
>>>>>>  [1,] 238.95162
>>>>>>  [2,] 143.08590
>>>>>>  [3,]  88.50923
>>>>>>  [4,] 177.67884
>>>>>>  [5,] 277.54116
>>>>>>  [6,] 342.94689
>>>>>>  [7,] 241.60905
>>>>>>  [8,] 177.81969
>>>>>>  [9,] 211.25559
>>>>>> [10,] 279.72702
>>>>>> [11,] 381.95738
>>>>>> [12,] 483.76363
>>>>>> [13,] 480.98841
>>>>>> [14,] 369.75241
>>>>>> [15,] 267.73650
>>>>>> [16,] 138.55959
>>>>>> [17,] 137.93181
>>>>>> [18,] 184.75200
>>>>>> [19,] 254.64359
>>>>>> [20,] 328.87785
>>>>>> [21,] 273.15577
>>>>>> [22,] 252.52830
>>>>>> [23,] 252.52830
>>>>>> [24,] 252.52830
>>>>>> [25,] 262.20084
>>>>>> [26,] 314.93064
>>>>>> [27,] 366.02996
>>>>>> [28,] 442.77467
>>>>>> [29,] 521.20323
>>>>>> [30,] 465.33071
>>>>>> [31,] 366.60582
>>>>>> [32,]  13.69540

A later addendum to the question indicated that the OP wanted labels for the result consisting of the endpoints of the intervals into which the data were subdivided. Unless I am misunderstanding, this is trivial to accomplish using cut() and split():

x <- c(238.95162, 143.0859, 88.50923, 177.67884, 277.54116, 342.94689,
       241.60905, 177.81969, 211.25559, 279.72702, 381.95738, 483.76363,
       480.98841, 369.75241, 267.7365, 138.55959, 137.93181, 184.752,
       254.64359, 328.87785, 273.15577, 252.5283, 252.5283, 252.5283,
       262.20084, 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
       366.60582, 13.6954)
f <- cut(x,5)
y <- split(x,f)
y

$`(13.2,115]`
[1] 88.50923 13.69540

$`(115,217]`
[1] 143.0859 177.6788 177.8197 211.2556 138.5596 137.9318 184.7520

$`(217,318]`
 [1] 238.9516 277.5412 241.6090 279.7270 267.7365 254.6436 273.1558 252.5283
 [9] 252.5283 252.5283 262.2008 314.9306

$`(318,420]`
[1] 342.9469 381.9574 369.7524 328.8779 366.0300 366.6058

$`(420,522]`
[1] 483.7636 480.9884 442.7747 521.2032 465.3307

Is this not the result that you want? If not, what *is* the result that you want?

cheers,

Rolf Turner

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
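
The original question also asked for the positions of the values, not only the values themselves. One way to get both from the same cut() factor is to split the index vector rather than the data. The sketch below is an illustration under that reading of the question; `idx` is a made-up name, and the matrix is rebuilt here only to show the conversion to a plain vector.

```r
# Rebuild the 32 x 1 matrix from the thread, then drop its dimensions.
binDistance <- matrix(
  c(238.95162, 143.0859, 88.50923, 177.67884, 277.54116, 342.94689,
    241.60905, 177.81969, 211.25559, 279.72702, 381.95738, 483.76363,
    480.98841, 369.75241, 267.7365, 138.55959, 137.93181, 184.752,
    254.64359, 328.87785, 273.15577, 252.5283, 252.5283, 252.5283,
    262.20084, 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
    366.60582, 13.6954),
  ncol = 1)
x <- as.vector(binDistance)  # plain numeric vector, no dim attribute

f <- cut(x, 5)                 # factor of interval labels, one per value
idx <- split(seq_along(x), f)  # positions in x, grouped by interval

names(idx)    # the interval labels, in level order
idx[[4]]      # positions of the values in the fourth interval
x[idx[[4]]]   # the values at those positions
```

Each list element is named by its interval, so the group number (position in the list), the label (name), and the members (contents) all stay together.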
Thanks. That is what I want. It is more that I do not know how to read the factors that these two functions return:

Browse[1]> y
$`13.6954016405008`
[1] (13.2,115]
Levels: (13.2,115] (115,217] (217,318] (318,420] (420,522]

$`88.5092280867206`
[1] (13.2,115]
Levels: (13.2,115] (115,217] (217,318] (318,420] (420,522]

$`137.931810364616`
[1] (115,217]
Levels: (13.2,115] (115,217] (217,318] (318,420] (420,522]

str(y)
List of 30
 $ 13.6954016405008: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 1
 $ 88.5092280867206: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 1
 $ 137.931810364616: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 138.559590072838: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 143.085897171535: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 177.678839068735: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 177.819693807561: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 184.752000138622: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 211.255591076421: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 2
 $ 238.951618624679: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 3
 $ 241.609050762905: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 3
 $ 252.528297510773: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 3 3 3
 $ 254.643586371518: Factor w/ 5 levels "(13.2,115]","(115,217]",..: 3

I need to be able to keep the items within their groups and at the same time keep the label of each group, so that I can use it for plotting. How can I do that?

Regards
Alex

On Wednesday, November 4, 2015 11:20 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:

> [...]
> Unless I am misunderstanding, this is trivial to accomplish using cut() and split():
> [...]
> f <- cut(x,5)
> y <- split(x,f)
> [...]
> Is this not the result that you want? If not, what *is* the result that you want?
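
The output quoted in the reply above (a list of 30 one-element factors, each named by a data value) is what `split(f, x)` would produce, so the two arguments may simply have been swapped. That is an inference from the printed output, not something stated in the thread. The sketch below contrasts the two calls and shows one way to keep values, positions and interval labels together for plotting; `y_swapped` and `d` are made-up names.

```r
x <- c(238.95162, 143.0859, 88.50923, 177.67884, 277.54116, 342.94689,
       241.60905, 177.81969, 211.25559, 279.72702, 381.95738, 483.76363,
       480.98841, 369.75241, 267.7365, 138.55959, 137.93181, 184.752,
       254.64359, 328.87785, 273.15577, 252.5283, 252.5283, 252.5283,
       262.20084, 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
       366.60582, 13.6954)
f <- cut(x, 5)

# Arguments swapped: the factor is split by data value, giving one
# entry per distinct value, each holding interval labels.
y_swapped <- split(f, x)

# Intended call: the data are split by interval, giving one numeric
# vector per interval, named by the interval label.
y <- split(x, f)

length(y_swapped)  # number of distinct values in x
length(y)          # number of intervals
names(y)           # interval labels, usable directly as plot labels

# boxplot() accepts the named list and labels each box by its interval.
# Drawn only in an interactive session so the script also runs headless.
if (interactive()) {
  boxplot(y, xlab = "binDistance interval", ylab = "binDistance")
}

# A data frame keeps position, value and interval label side by side.
d <- data.frame(index = seq_along(x), value = x, group = f)
head(d)
```

With the data frame, the formula interface `boxplot(value ~ group, data = d)` gives the same plot without calling split() at all.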