thr3ads.net - R help - [R] merging-binning data [Nov 2015]


Alaios

2015-Nov-04 11:40 UTC

[R] merging-binning data

Thanks for the answer. However, split() does not give me the indexes, only
the group each value falls in. I also need the index of each group: is it the
first group, the second, and so on?
Alex
 


On Tuesday, November 3, 2015 5:05 PM, Ista Zahn <istazahn at gmail.com> wrote:
   

 Probably

split(binDistance, test).

Best,
Ista

On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help <r-help at r-project.org> wrote:
> Dear all,
> I am not exactly sure what the proper name is for what I am trying to do.
> I have a vector that looks like
>  binDistance
>            [,1]
>  [1,] 238.95162
>  [2,] 143.08590
>  [3,]  88.50923
>  [4,] 177.67884
>  [5,] 277.54116
>  [6,] 342.94689
>  [7,] 241.60905
>  [8,] 177.81969
>  [9,] 211.25559
> [10,] 279.72702
> [11,] 381.95738
> [12,] 483.76363
> [13,] 480.98841
> [14,] 369.75241
> [15,] 267.73650
> [16,] 138.55959
> [17,] 137.93181
> [18,] 184.75200
> [19,] 254.64359
> [20,] 328.87785
> [21,] 273.15577
> [22,] 252.52830
> [23,] 252.52830
> [24,] 252.52830
> [25,] 262.20084
> [26,] 314.93064
> [27,] 366.02996
> [28,] 442.77467
> [29,] 521.20323
> [30,] 465.33071
> [31,] 366.60582
> [32,]  13.69540
> So the numbers start at about 13.7 and go up to a maximum of about 521.2 (I
> also have many other similar sets). I want to put these numbers into 5
> categories, and thus I have tried cut:
>
>
> Browse[2]> test<-cut(binDistance,seq(min(binDistance)-0.00001,max(binDistance),length.out=scaleLength+1))
> Browse[2]> test
>  [1] (217,318]  (115,217]  (13.7,115] (115,217]  (217,318]  (318,420]
>  [7] (217,318]  (115,217]  (115,217]  (217,318]  (318,420]  (420,521]
> [13] (420,521]  (318,420]  (217,318]  (115,217]  (115,217]  (115,217]
> [19] (217,318]  (318,420]  (217,318]  (217,318]  (217,318]  (217,318]
> [25] (217,318]  (217,318]  (318,420]  (420,521]  (420,521]  (420,521]
> [31] (318,420]  (13.7,115]
> Levels: (13.7,115] (115,217] (217,318] (318,420] (420,521]
>
>
> I then want the numbers of my initial vector that fall within the same
> "category", say (318,420], to be collected in one vector. To rephrase: I
> want the indexes of my initial vector that have a value between 318 and 420
> to be put in the same vector, which I can then process as I want.
> How can I do that efficiently in R?
> Thank you in advance for your reply.
> Regards
> Alex
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
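
For the indexes rather than the values, one option is to split the positions of the vector instead of the data itself. The following is only a minimal sketch under some assumptions: binDistance is entered as a plain numeric vector (in the post it is a one-column matrix), scaleLength is taken to be 5 because the post asks for 5 categories, and the names idxByBin and groupNumber are made up for illustration.

```r
# Data copied from the original post, as a plain numeric vector.
binDistance <- c(238.95162, 143.08590,  88.50923, 177.67884, 277.54116,
                 342.94689, 241.60905, 177.81969, 211.25559, 279.72702,
                 381.95738, 483.76363, 480.98841, 369.75241, 267.73650,
                 138.55959, 137.93181, 184.75200, 254.64359, 328.87785,
                 273.15577, 252.52830, 252.52830, 252.52830, 262.20084,
                 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
                 366.60582,  13.69540)
scaleLength <- 5  # assumption: the post asks for 5 categories

# Same binning as in the post.
test <- cut(binDistance,
            seq(min(binDistance) - 0.00001, max(binDistance),
                length.out = scaleLength + 1))

# Split the positions 1..n by bin, so each list element holds indexes.
idxByBin <- split(seq_along(binDistance), test)
idxByBin[["(318,420]"]]   # indexes of the values in that bin: 6 11 14 20 27 31

# The factor codes answer "is it the first, the second, ... group?".
groupNumber <- as.integer(test)
```

Indexing the original vector with one list element, for example binDistance[idxByBin[[4]]], then returns the values of that bin again.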
  

Boris Steipe

2015-Nov-04 12:00 UTC

[R] merging-binning data

I would transform the original numbers into integers which you can use as group
labels. The row numbers of the group labels are the indexes of your values.

Example: assume your input vector is dBin

nGroups <- 5  # number of groups
groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin))  # rescale to the range [0,1]
groups <- floor(groups * nGroups) + 1   # discretize to integer group labels
groups <- pmin(groups, nGroups)         # keep the maximum value in the last group

Now you can, e.g., get the indices for group 2:

which(groups == 2)

Depending on the nature of your input data, it may be better to keep these
groups in a column adjacent to your values, rather than in a separate vector, or
even better to just calculate the groups on the fly in your downstream analysis
with the approach given above in a function, rather than storing them at all.
These are simple operations that should not add perceptibly to execution time.

Cheers,
Boris
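
The suggestion to calculate the groups on the fly in a function could look roughly like the sketch below. The helper name binIndex and the short example vector are hypothetical and not from the thread; the sketch also assumes that the input has at least two distinct values, since otherwise the rescaling divides by zero.

```r
# Hypothetical helper (not from the thread): equal-width integer bin labels.
binIndex <- function(x, nGroups) {
  rng <- range(x)
  scaled <- (x - rng[1]) / (rng[2] - rng[1])   # rescale to [0, 1]
  # floor() maps the maximum to nGroups + 1, so cap it at nGroups.
  pmin(floor(scaled * nGroups) + 1, nGroups)
}

# Made-up illustrative data.
dBin <- c(0, 10, 25, 50, 99, 100)

groups <- binIndex(dBin, 5)
which(groups == 2)    # indexes of the values in group 2
dBin[groups == 2]     # the values themselves
```

Because the helper is cheap, it can be called wherever the groups are needed instead of storing them next to the data.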







Alaios

2015-Nov-04 12:57 UTC

[R] merging-binning data

Thanks, it works great. It gives me the group numbers as integers, so I can use
which() to group the elements as needed, e.g. which(groups == 2).
One question, though: how can I also keep the labels for each group? For
example, that my first group is [13,206).
Regards
Alex
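
One possible way to keep both the integer group numbers and a readable label per group is to let cut() build left-closed intervals and read the labels from the factor levels. This is only a sketch with a made-up example vector; the variable names are illustrative, and the equal-width breaks are an assumption that mirrors the rescaling approach discussed in this thread.

```r
# Made-up illustrative data.
dBin <- c(0, 10, 25, 50, 99, 100)
nGroups <- 5

# Equal-width breaks from the minimum to the maximum.
breaks <- seq(min(dBin), max(dBin), length.out = nGroups + 1)

# right = FALSE gives intervals of the form [a,b);
# include.lowest = TRUE closes the last interval so the maximum is kept.
f <- cut(dBin, breaks, right = FALSE, include.lowest = TRUE)

groups <- as.integer(f)    # integer group number for each value
binLabels <- levels(f)     # one interval label per group

which(groups == 2)         # indexes of the values in group 2
binLabels[2]               # the label belonging to group 2

# Indexes per group, with the interval labels as list names.
idxByLabel <- split(seq_along(dBin), f)
```

The factor f carries everything at once: as.integer(f) recovers the group number and levels(f) the label, so nothing needs to be stored separately.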
 

