Displaying 20 results from an estimated 4000 matches similar to: "grouping"
2009 Jun 04
4
Binning or grouping data
Newbie here. Many apologies in advance for using the incorrect lingo. I'm
new to statistics and VERY new to R.
I'm attempting to "group" or "bin" data together in order to analyze them as
a combined group rather than as a discrete set. I'll provide a simple example
of the data for illustrative purposes.
Patient ID | Charges | Age | Race
1 |
2011 Dec 06
1
help wrapping findInterval into a function
Dear R Community,
I hope you might be able to assist with a small problem creating a function.
I am working with water-quality data sets that contain the concentration of
many different elements in water samples. I need to assign quality-control
flags to values that fall into various concentration ranges. Rather than a
web of nested if statements, I am employing the findInterval function to
2016 Aug 04
1
findInterval(all.inside=TRUE) for degenerate 'vec' arguments
What should findInterval(x,vec,all.inside=TRUE) return when length(vec)<=1,
so there are no inside intervals?
R-3.3.0 gives a decreasing map of x->output when length(vec)==1 and -1's
when length(vec)==0. Would '0' in all those cases be better?
> findInterval(x=c(10, 11, 12), vec=11, all.inside=TRUE,
rightmost.closed=FALSE, left.open=FALSE)
[1] 1 0 0
>
2024 Sep 16
1
findInterval
Suppose we have `dat` shown below and we want to find the `y` value
corresponding to the last value in `x` equal to the corresponding component
of `seek` and we wish to return an output the same length as `seek` using
`findInterval` to perform the search. This returns the correct result:
dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
y = c(37, 12, 19, 30, 6, 15),
seek = 1:6)
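The code from the original post is cut off in this excerpt. As a minimal sketch of one way such a lookup could be written (not the poster's own code), the snippet below uses `findInterval` to locate the last `x` that is less than or equal to each `seek` value and keeps `y` only on an exact match. Returning `NA` when there is no match is an assumption; the names `ix`, `hit`, and `res` are illustrative.

```r
# Sketch only: not the code from the original post, which is truncated above.
dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
                  y = c(37, 12, 19, 30, 6, 15),
                  seek = 1:6)

# findInterval() requires 'vec' sorted non-decreasingly; dat$x already is.
# For each seek value it returns the index of the last x <= seek (0 if none).
ix <- findInterval(dat$seek, dat$x)

# Keep y only where that last x equals seek exactly; otherwise NA (assumption).
hit <- ix > 0 & dat$x[pmax(ix, 1)] == dat$seek
res <- ifelse(hit, dat$y[pmax(ix, 1)], NA)
res
```

With this data, only the seek values 2, 3, and 4 occur in `x`, so the remaining positions come back as `NA`.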
2024 Sep 17
1
findInterval
>>>>> Gabor Grothendieck
>>>>> on Mon, 16 Sep 2024 11:21:55 -0400 writes:
> Suppose we have `dat` shown below and we want to find the `y` value
> corresponding to the last value in `x` equal to the corresponding component
> of `seek` and we wish to return an output the same length as `seek` using
> `findInterval` to perform the
2011 Apr 04
2
General binary search?
Is there a generic binary search routine in a standard library which
a) works for character vectors
b) runs in O(log(N)) time?
I'm aware of findInterval(x,vec), but it is restricted to numeric vectors.
I'm also aware of various hashing solutions (e.g. new.env(hash=TRUE) and
fastmatch), but I need the greatest-lower-bound match in my application.
findInterval is also slow for
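For illustration, a greatest-lower-bound binary search over a sorted character vector might look like the sketch below. The function name `glb_index` is hypothetical and not part of any standard library. String comparison in R follows the current locale's collation, so `vec` is assumed to have been sorted with `sort()` in that same locale. An interpreted loop like this makes O(log N) comparisons but is not necessarily faster in practice than compiled alternatives.

```r
# Hypothetical helper: index of the last element of 'vec' that is <= 'key',
# or 0 if every element is greater. 'vec' must be sorted in the current locale.
glb_index <- function(key, vec) {
  lo <- 0L
  hi <- length(vec)
  # Invariant: all elements at positions <= lo are <= key,
  # and all elements at positions > hi are > key.
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L   # upper midpoint, so the range always shrinks
    if (vec[mid] <= key) {
      lo <- mid
    } else {
      hi <- mid - 1L
    }
  }
  lo
}

fruits <- sort(c("cherry", "apple", "banana"))
glb_index("blueberry", fruits)   # position of "banana"
glb_index("aardvark", fruits)    # 0: nothing sorts at or below the key
```

The function handles a single key at a time; wrapping it in `vapply` would be one way to look up several keys.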
2010 Jul 12
2
findInterval and data resolution
Hello Wise Ones...
I need a clever way around a problem with findInterval. Consider:
vec1 <- 1:10
vec2 <- seq(1, 10, by = 0.1)
x1 <- c(2:3)
a1 <- findInterval(x1, vec1); a1 # example 1
a2 <- findInterval(x1, vec2); a2 # example 2
In the problem I'm working on, vec* may be either integer or numeric, like
vec1 and vec2. I need to remove one or more sections of this vector;
2013 Sep 13
2
how to get values within a threshold
input:
> values
[1] 0.854400 1.648465 1.829830 1.874704 7.670915 7.673585 7.722619
> thresholds
[1] 1 3 5 7 9
expected output:
[1] 1 4 4 4 7
That is, need a vector of indexes of the maximum value below the threshold.
e.g.
First element is "1", because value[1] is the largest below threshold "1".
Second element is "4", because value[4] is the
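One possible approach, sketched here under the assumption that `values` is already sorted in increasing order (as it is in the example), is to call `findInterval` with the thresholds as the query points. For each threshold it returns the index of the last value that is less than or equal to it.

```r
values <- c(0.854400, 1.648465, 1.829830, 1.874704,
            7.670915, 7.673585, 7.722619)
thresholds <- c(1, 3, 5, 7, 9)

# Index of the last element of 'values' that is <= each threshold.
# A result of 0 would mean no value lies at or below that threshold.
idx <- findInterval(thresholds, values)
idx
```

Note that this counts values equal to a threshold as "below" it. If strictly-below is required, `findInterval(thresholds, values, left.open = TRUE)` (available since R 3.3.0) is one option.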
2011 Feb 17
2
does range of values in array include a third value?
I'm using the range command to get the minimum and maximum values of an array as in
x <- range(array_y)
which gives me two values such as
[1] -2 9
I need to be able to test if this range of values includes a third value. For example I'd like to query
1) does the range of -2 to 9 include 3, answer TRUE
2) does the range of -2 to 9 include -6, answer FALSE?
All values could be
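A minimal sketch of such a test, assuming the endpoints count as inside the range: compare the value against both elements returned by `range`. The helper name `in_range` and the sample data in `array_y` are made up for illustration.

```r
array_y <- c(-2, 0, 4, 9)   # hypothetical data whose range is -2 to 9
x <- range(array_y)

# TRUE when v lies between the minimum and maximum, endpoints included.
in_range <- function(v, r) v >= r[1] & v <= r[2]

in_range(3, x)    # TRUE
in_range(-6, x)   # FALSE
```

Because the comparison operators are vectorised, `in_range(c(3, -6), x)` tests several values in one call.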
2018 Apr 19
0
create multiple categorical variables in a data frame using a loop
> On Apr 19, 2018, at 11:20 AM, Ding, Yuan Chun <ycding at coh.org> wrote:
>
> Hi All,
>
> I want to create a categorical variable, cat.pfoa, in the file of pfas.pheno (a data frame) based on log2pfoa values. I can do it using the following code.
>
> pfas.pheno <-within(pfas.pheno, {cat.pfoa<-NA
> cat.pfoa[pfas.pheno$log2pfoa
2018 Apr 20
1
create multiple categorical variables in a data frame using a loop
> On Apr 19, 2018, at 1:22 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>
>> On Apr 19, 2018, at 11:20 AM, Ding, Yuan Chun <ycding at coh.org> wrote:
>>
>> Hi All,
>>
>> I want to create a categorical variable, cat.pfoa, in the file of pfas.pheno (a data frame) based on log2pfoa values. I can do it using the following code.
2008 Sep 22
1
findInterval(), binary search, log(N) complexity
Dear R users,
the help for findInterval(x,vec) suggests a logarithmic dependence on N
(=length(vec)), which would imply a binary search type algorithm.
However, when I "test" this hypothesis, in the following manner:
set.seed(-3645);
l <- vector();
N.seq <- c(5000, 500000, 1000000, 10000000, 50000000);k <- 1
for (N in N.seq){
tmp <- sort(round(stats::rt(N, df=2), 2));
2013 Jun 18
2
find closest value in a vector based on another vector values
Dear All,
would you please provide your thoughts on the following:
let us say I have:
a <-c(1,5,8,15,32,69)
b <-c(8.5,33)
and I would like to extract from "a" the two values that are closest to the values in "b", where the length of these vectors may change but b will always be shorter than "a". So at the end based on this example I should have the result
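The expected result is cut off in this excerpt. As a hedged sketch of one way to pick the nearest element, the code below computes, for each value in `b`, the element of `a` with the smallest absolute difference. The name `closest` is illustrative, and ties are resolved in favour of the earlier element because `which.min` returns the first minimum.

```r
a <- c(1, 5, 8, 15, 32, 69)
b <- c(8.5, 33)

# For each element of b, take the element of a at the smallest absolute distance.
closest <- sapply(b, function(v) a[which.min(abs(a - v))])
closest
```

This compares every element of `b` with every element of `a`, which is fine for short vectors; for long sorted vectors a `findInterval`-based lookup of the two neighbours would scale better.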
2009 May 22
5
Need a faster function to replace missing data
Dear List,
I need some help in coming up with a function that will take two data sets, determine if a value is missing in one, find a value in the second that was taken at about the same time, and substitute the second value in for where the first should have been. My problem is from a fish tracking study. We put acoustic tags in fish and track them for several days. Location data is supposed
2018 Apr 19
4
create multiple categorical variables in a data frame using a loop
Hi All,
I want to create a categorical variable, cat.pfoa, in the file of pfas.pheno (a data frame) based on log2pfoa values. I can do it using the following code.
pfas.pheno <-within(pfas.pheno, {cat.pfoa<-NA
cat.pfoa[pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)]<-0
cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-2
2010 May 28
4
vlookup in R?
Hi R-users,
I would like to look up the values of seq that match my rand values. In Excel I would use =VLOOKUP(G2,$E$2:$F$32,2). For example, rand=0.262 gives approximately seq=120, rand=0.964293344 gives seq=460, and so on.
E         F    G
cdf       seq  rand
0.00E+00  0    0.262123478
1.56E-03  20   0.964293344
1.55E-02  40   0.494827113
5.30E-02  60
2012 Feb 28
4
vlookup type function
Hi
I'm looking for an Excel Vlookup type function in R.
Example:
list <- c(1,2,3,4,5,6,7)
base <- c(2.2,3,5.2)
What I want is, for each number in base, the highest value in list,
which is equal to or less than the number in base
So the results would be:
base         list
2.2  ------> 2
3    ------> 3
5.2  ------> 5
Thanks for your help!
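A minimal sketch of one way to get this behaviour, assuming the lookup vector is sorted in increasing order: `findInterval` returns, for each number in `base`, the position of the largest element of `list` that is less than or equal to it. The variables are renamed here to `list_vals` and `base_vals` (illustrative names) so that the function `list` is not masked, and returning `NA` when nothing qualifies is an assumption.

```r
list_vals <- c(1, 2, 3, 4, 5, 6, 7)   # 'list' in the post
base_vals <- c(2.2, 3, 5.2)           # 'base' in the post

# Position of the largest lookup value <= each base value (0 if there is none).
idx <- findInterval(base_vals, list_vals)

# Map positions back to values; NA when no lookup value qualifies (assumption).
matched <- ifelse(idx > 0, list_vals[pmax(idx, 1)], NA)

data.frame(base = base_vals, list = matched)
```

For the example data this pairs 2.2 with 2, 3 with 3, and 5.2 with 5, matching the table in the question.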
2013 Feb 01
2
Nested loop and output help
Hello Everyone,
My name is Thomas and I have been using R for one week. I recently found
your site and have been able to search the archives of posts. This has
given me some great information that has allowed me to craft an initial
design to an inquiry I would like to make into the breakdown of McNemar's
test. I have read an intro to R manual and the posting guides and hope I am
not violating
2020 Mar 05
3
findInterval Documentation Suggestion
I've found over time that R documentation that comes off as terse at
first blush is usually revealed to be precise, concise, and complete
on close reading. I'm sure this is also true of `?findInterval`, but
for whatever reason my brain simply refuses to extract meaning from it.
Part of the problem may be that we interact with the function via a
compressed form of the bounds of the
2012 Mar 07
2
Plot by factors
Hello everyone,
I am doing a study on tides and have two variables that I want to
relate: the sea height and the type of tide (spring, neap, or
intermediate).
Some simulated data could be:
> datos <- data.frame(v1=sin(1:50), v2= rep(c("a","b","c"), each = 5,
> len = 50))
Now my question: if I plot the tide height, it would be: