Thanks, it works great. It gives me the group numbers as integers, so I can use which() to pick out the elements of each group as needed, e.g. which(groups == 2).

One question, though: how can I also keep the labels for each group? For example, that my first group is [13,206).

Regards
Alex

On Wednesday, November 4, 2015 1:00 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:

> I would transform the original numbers into integers which you can use as group labels. The row numbers of the group labels are the indexes of your values.
>
> Example: assume your input vector is dBin
>
> nGroups <- 5  # number of groups
> groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin))  # rescale to the range [0,1]
> groups <- floor(groups * nGroups) + 1  # discretize to nGroups integers
>
> Now you can e.g. get the indices for group 2
>
> groups[groups == 2]
>
> Depending on the nature of your input data, it may be better to keep these groups in a column adjacent to your values, rather than in a separate vector, or even better to just calculate the groups on the fly in your downstream analysis with the approach given above in a function, rather than storing them at all. These are simple operations that should not add perceptibly to execution time.
>
> Cheers,
> Boris
>
> On Nov 4, 2015, at 6:40 AM, Alaios via R-help <r-help at r-project.org> wrote:
>
> > Thanks for the answer. Split does not give me the indexes, though, only which group the values fall in. I also need the index of the group: is it the first, the second, ... group?
> > Alex
> >
> > On Tuesday, November 3, 2015 5:05 PM, Ista Zahn <istazahn at gmail.com> wrote:
> >
> > Probably
> >
> > split(binDistance, test).
> >
> > Best,
> > Ista
> >
> > On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help <r-help at r-project.org> wrote:
> >> Dear all,
> >> I am not exactly sure what the proper name is for what I am trying to do.
> >> I have a vector that looks like
> >> binDistance
> >>            [,1]
> >>  [1,] 238.95162
> >>  [2,] 143.08590
> >>  [3,]  88.50923
> >>  [4,] 177.67884
> >>  [5,] 277.54116
> >>  [6,] 342.94689
> >>  [7,] 241.60905
> >>  [8,] 177.81969
> >>  [9,] 211.25559
> >> [10,] 279.72702
> >> [11,] 381.95738
> >> [12,] 483.76363
> >> [13,] 480.98841
> >> [14,] 369.75241
> >> [15,] 267.73650
> >> [16,] 138.55959
> >> [17,] 137.93181
> >> [18,] 184.75200
> >> [19,] 254.64359
> >> [20,] 328.87785
> >> [21,] 273.15577
> >> [22,] 252.52830
> >> [23,] 252.52830
> >> [24,] 252.52830
> >> [25,] 262.20084
> >> [26,] 314.93064
> >> [27,] 366.02996
> >> [28,] 442.77467
> >> [29,] 521.20323
> >> [30,] 465.33071
> >> [31,] 366.60582
> >> [32,]  13.69540
> >> So the numbers start from 13 and go up to a maximum of 522 (I also have many other similar sets). I want to put these numbers into 5 categories, and thus I have tried cut:
> >>
> >> Browse[2]> test<-cut(binDistance,seq(min(binDistance)-0.00001,max(binDistance),length.out=scaleLength+1))
> >> Browse[2]> test
> >>  [1] (217,318]  (115,217]  (13.7,115] (115,217]  (217,318]  (318,420]
> >>  [7] (217,318]  (115,217]  (115,217]  (217,318]  (318,420]  (420,521]
> >> [13] (420,521]  (318,420]  (217,318]  (115,217]  (115,217]  (115,217]
> >> [19] (217,318]  (318,420]  (217,318]  (217,318]  (217,318]  (217,318]
> >> [25] (217,318]  (217,318]  (318,420]  (420,521]  (420,521]  (420,521]
> >> [31] (318,420]  (13.7,115]
> >> Levels: (13.7,115] (115,217] (217,318] (318,420] (420,521]
> >>
> >> I then want the numbers of my initial vector that fall within the same "category", let's say (318,420], to be collected in a vector. To rephrase: I want the indexes of my initial vector that have a value between 318 and 420 to be put in the same vector, which I can then process as I want.
> >> How can I do that effectively in R?
> >> I would like to thank you for your reply.
> >> Regards
> >> Alex
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
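For readers following the thread, here is a minimal sketch of the integer-grouping approach quoted above. It assumes a plain numeric vector and uses only the first eight values of binDistance from the original post. Two points are additions rather than part of the quoted code: the maximum rescales to exactly 1 and so would land in group nGroups + 1 unless it is clamped, and `groups[groups == 2]` returns the group codes themselves, so which() is used to get positions. The names `scaled`, `idx2` and `idxByGroup` are illustrative only.

```r
# First eight values of binDistance from the original post (illustrative subset)
dBin <- c(238.95162, 143.08590, 88.50923, 177.67884,
          277.54116, 342.94689, 241.60905, 177.81969)

nGroups <- 5  # number of groups

# Rescale to the range [0, 1], then discretize to integers
scaled <- (dBin - min(dBin)) / (max(dBin) - min(dBin))
groups <- floor(scaled * nGroups) + 1

# Assumption/addition: the maximum maps to nGroups + 1, so clamp it
# into the last group
groups <- pmin(groups, nGroups)

# Positions in dBin of the values that belong to group 2
idx2 <- which(groups == 2)

# Positions for every group at once, one list element per group
# (empty groups are kept because the factor levels are fixed)
idxByGroup <- split(seq_along(dBin), factor(groups, levels = seq_len(nGroups)))
```

With the positions in hand, `dBin[idx2]` recovers the values of group 2, and `idxByGroup[["2"]]` holds the same positions as `idx2`.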
I don't understand:
- where does the "label" come from? (It's not an element of your data that I see.)
- what do you want to do with this "label", i.e. how does it need to be associated with the data?

B.

On Nov 4, 2015, at 7:57 AM, Alaios <alaios at yahoo.com> wrote:

> Thanks, it works great. It gives me the group numbers as integers, so I can use which() to pick out the elements of each group as needed, e.g. which(groups == 2).
>
> One question, though: how can I also keep the labels for each group? For example, that my first group is [13,206).
>
> Regards
> Alex
You are right. By labels I mean the "categories" or "breaks" that my data fall in. To be part of group 2, for example, a value has to be in the range [110,223). I need to keep those for my plots.

Did I describe it more precisely now?
Alex

On Wednesday, November 4, 2015 2:09 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:

> I don't understand:
> - where does the "label" come from? (It's not an element of your data that I see.)
> - what do you want to do with this "label", i.e. how does it need to be associated with the data?
>
> B.
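One possible way to keep the break labels alongside the integer group numbers is to let cut() do the binning and then read both the codes and the labels off the resulting factor. This is a sketch under assumptions, not an answer given in the thread: it uses only the first eight values of binDistance, left-closed intervals (right = FALSE) to match the [a,b) notation used in the question, and illustrative names (`f`, `groupLabels`, `idxByLabel`, `groupTable`).

```r
# First eight values of binDistance from the original post (illustrative subset)
binDistance <- c(238.95162, 143.08590, 88.50923, 177.67884,
                 277.54116, 342.94689, 241.60905, 177.81969)
nGroups <- 5

# Equally spaced break points from the minimum to the maximum
breaks <- seq(min(binDistance), max(binDistance), length.out = nGroups + 1)

# right = FALSE gives left-closed [a,b) intervals; include.lowest = TRUE
# closes the last interval so the maximum is not turned into NA
f <- cut(binDistance, breaks = breaks, right = FALSE, include.lowest = TRUE)

groups      <- as.integer(f)  # integer group number for each value
groupLabels <- levels(f)      # interval label for each group number

# Positions in the original vector, one list element per interval label
idxByLabel <- split(seq_along(binDistance), f)

# Lookup table tying group number, label and numeric bounds together,
# e.g. for annotating plots
groupTable <- data.frame(group = seq_len(nGroups),
                         label = groupLabels,
                         lower = breaks[-length(breaks)],
                         upper = breaks[-1])
```

Here `groupLabels[groups]` gives the label of every value, `which(groups == 2)` gives the positions of group 2, and `groupTable` keeps the numeric bounds available when the rounded text labels are not precise enough.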