Displaying 20 results from an estimated 30000 matches similar to: "working with summarized data"
2003 Sep 10
2
Plot survey data
I am trying to make plots that take into account survey weights. This is a
survey of the US population. To start with, I want to explore the data using pairs,
plot, coplot, and lattice. Are there specialized methods that handle survey
weights for plotting? Any pointers?
Anupam.
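A minimal sketch of one option, assuming the survey package; the data frame df and its columns wt, income, and age are hypothetical:
library(survey)
des <- svydesign(ids = ~1, weights = ~wt, data = df)
svyhist(~income, des)                          # histogram using the weights
svyplot(income ~ age, des, style = "bubble")   # bubble area reflects weight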
2002 Feb 06
4
Weighted median
Is there a weighted median function out there, similar to weighted.mean()
but for medians? If not, I'll try to implement or port one myself.
The need for a weighted median came from the following optimization
problem:
x* = argmin_x ( a|x| + sum_{k=1}^n |x - b_k| )
where
a : is a *positive* real scalar
x : is a real scalar
n : is an integer
b_k: are negative and positive scalars
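A minimal sketch of a weighted median (wmedian is a made-up name, and this is the lower weighted median convention): sort by value and take the first value whose cumulative weight reaches half the total. The optimization above is then solved by the weighted median of c(0, b) with weights c(a, 1, ..., 1), since the term a|x| behaves like one data point at 0 with weight a.
wmedian <- function(x, w) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  x[which(cumsum(w) >= sum(w) / 2)[1]]   # first value past half the weight
}
a <- 2; b <- c(-3, -1, 2, 5)
wmedian(c(0, b), c(a, rep(1, length(b))))   # minimizer x* of the sum above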
2008 Oct 07
2
weighted quantiles
I have a set of values and their corresponding weights. I can use the
function weighted.mean to calculate the weighted mean; I would like to
similarly calculate the weighted median and quantiles. Is there a
function in R that can do this?
thanks,
Spencer
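Base R has no weighted quantile; Hmisc::wtd.quantile is one packaged option. A minimal sketch generalizing the weighted-median idea (wquantile is a made-up name, using the simplest inverse-CDF convention, one of several):
wquantile <- function(x, w, p) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  sapply(p, function(pp) x[which(cumsum(w) >= pp * sum(w))[1]])
}
wquantile(c(10, 20, 30), w = c(1, 1, 2), p = c(0.25, 0.5, 0.75))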
2004 Apr 12
2
Complex sample variances
Hello,
Is there a way to get complex sample variances in the survey package for summary statistics other than means? If not, can they be added in a future version? It would be great to have them for totals, quantiles, ratios, and tables (e.g. row percents, column percents, etc.).
Thanks.
Fred
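In the survey package, totals and ratios do carry design-based standard errors, retrievable with SE(). A minimal sketch (df and its columns are hypothetical):
library(survey)
des <- svydesign(ids = ~1, weights = ~wt, data = df)
SE(svytotal(~income, des))            # design-based SE of a total
SE(svyratio(~income, ~hours, des))    # design-based SE of a ratio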
2006 Sep 11
2
faster way?
Hi,
Is there a faster way to do this? It takes forever, even on a
moderately sized dataset.
n <- dim(dsn)[1]
dsn2 <- dsn[order(-dsn$xhat), ]          # sort by xhat, descending
dsn2[1, "cumx"] <- dsn2[1, "xhat"]
for (i in 2:n) {                         # running total, row by row
    dsn2[i, "cumx"] <- dsn2[i - 1, "cumx"] + dsn2[i, "xhat"]
}
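The loop can be replaced by cumsum(), which computes the running total in one vectorized pass. A minimal sketch of the equivalent computation:
dsn2 <- dsn[order(-dsn$xhat), ]   # same descending sort
dsn2$cumx <- cumsum(dsn2$xhat)    # running total, no loop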
2017 Oct 08
2
Discourage the weights= option of lm with summarized data
Indeed: using 'weights' is not meant to indicate that the same
observation is repeated 'n' times. As I showed, this gives erroneous
results. Hence I suggested that it be discouraged rather than
encouraged in the Details section of lm in the Reference manual.
Arie
-----Original Message-----
On Sat, 7 Oct 2017, wolfgang.viechtbauer at maastrichtuniversity.nl wrote:
Using
2017 Oct 07
1
Discourage the weights= option of lm with summarized data
In the Details section of lm (linear models) in the Reference manual,
it is suggested to use the weights= option for summarized data. This
should be discouraged rather than encouraged. The motivation for this is
as follows.
With summarized data, the standard errors get smaller as the number
of observations increases. However, the standard errors in lm do not get
smaller when, for instance, all weights
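A minimal toy illustration of the point (variable names are made up): the coefficients agree, but the weighted fit does not reproduce the standard errors of the fully expanded data.
x <- c(1, 2, 3)
y <- c(2.1, 3.9, 6.2)
w <- c(10, 10, 10)   # each row is meant to summarize 10 identical observations
fit_w <- lm(y ~ x, weights = w)
fit_r <- lm(y ~ x, data = data.frame(x = rep(x, w), y = rep(y, w)))
summary(fit_w)$coefficients   # 1 residual df
summary(fit_r)$coefficients   # 28 residual df, different standard errors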
2017 Oct 09
2
Discourage the weights= option of lm with summarized data
Yes. Thank you; I should have quoted it.
I suggest removing this text, or adding the word "not" at the beginning.
Arie
On Sun, Oct 8, 2017 at 4:38 PM, Viechtbauer Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> Ah, I think you are referring to this part from ?lm:
>
> "(including the case that there are w_i observations equal to y_i and
2017 Dec 03
1
Discourage the weights= option of lm with summarized data
Peter,
This is a highly structured text. Just for the discussion, I separate
the building blocks, where (D), (E), and (F) are new:
BEGIN OF TEXT --------------------
(A)
Non-'NULL' 'weights' can be used to indicate that different
observations have different variances (with the values in 'weights'
being inversely proportional to the variances);
(B)
or equivalently, when the elements of
2017 Oct 12
4
Discourage the weights= option of lm with summarized data
OK. We now have three suggestions for repairing the text:
- remove the text
- add "not" at the beginning of the text
- add a warning at the end of the text; something like:
"Note that in this case the standard errors of the parameter estimates are
in general not correct, and hence neither are the t values and the p values.
Also the number of degrees of freedom is not correct. (The parameter
2009 Jun 30
2
odd behaviour in quantreg::rq
Hi,
I am trying to use quantile regression to perform weighted comparisons of the
median across groups. This works most of the time; however, I am seeing some
odd output in summary(rq()):
Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights = area_fraction)
Coefficients:
            Value     Std. Error  t value   Pr(>|t|)
(Intercept) 45.44262  3.64706     12.46007
2007 Feb 07
3
boxplot statistics in ggplot
I need to make weighted boxplots. I found that ggplot makes them. I
would, however, like to label them with the boxplot statistics (the
median, q1, and q3). With the boxplot function in base R, I could output the
boxplot statistics and then use text() to place the
labels on the plot. How would one do it with ggplot?
Vikas
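A minimal sketch of one approach (ggplot2 assumed; df, grp, and y are made-up names): compute the statistics per group first, then place them with geom_text(). For the weighted case, swap quantile() for a weighted quantile function.
library(ggplot2)
df <- data.frame(grp = rep(c("a", "b"), each = 50), y = rnorm(100))
labs <- do.call(rbind, lapply(split(df, df$grp), function(d) {
  q <- quantile(d$y, c(0.25, 0.5, 0.75))   # q1, median, q3
  data.frame(grp = d$grp[1], y = q)
}))
ggplot(df, aes(grp, y)) +
  geom_boxplot() +
  geom_text(data = labs, aes(label = round(y, 2)), hjust = -0.6)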
2011 Jan 27
2
Extrapolating values from a glm fit
Dear R-help,
I have fitted a glm logistic function to dichotomous forced-choice
responses varying according to the time interval between two stimuli. The x values
are time separations in milliseconds, and the y values are proportions of
responses for one of the stimuli. Now I am trying to extrapolate the x values
for y values (proportions) of .25, .5, and .75. I have tried several
predict parameters, and they
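A minimal sketch of the inversion (toy data; this inverts the fitted coefficients rather than using predict()): for a logistic fit, the x giving proportion p is (qlogis(p) - intercept) / slope. MASS::dose.p computes the same quantities with standard errors.
x <- seq(-100, 100, by = 10)                  # time separation in ms
k <- rbinom(length(x), 20, plogis(x / 30))    # successes out of 20 trials
fit <- glm(cbind(k, 20 - k) ~ x, family = binomial)
p <- c(0.25, 0.5, 0.75)
(qlogis(p) - coef(fit)[1]) / coef(fit)[2]     # x at each proportion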
2009 Nov 18
2
Median on Aggregated data
Folks,
I have the following code, which works fine on smaller data sets. For
larger datasets, it runs out of memory and far too slowly, because we
are essentially creating large vectors with rep() and then calling
median() on them. (I learned this approach from a post on the web.)
Below that, I have written the corresponding SAS code. The SAS code
is fast because I can just tell the proc
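A minimal sketch that avoids the expansion entirely (agg_median is a made-up name): sort the distinct values, walk the cumulative counts, and average the two middle positions.
agg_median <- function(value, count) {
  o <- order(value)
  value <- value[o]; count <- count[o]
  cum <- cumsum(count)
  n <- cum[length(cum)]
  p1 <- floor((n + 1) / 2); p2 <- ceiling((n + 1) / 2)   # middle positions
  (value[which(cum >= p1)[1]] + value[which(cum >= p2)[1]]) / 2
}
agg_median(c(1, 2, 3), c(2, 1, 2))   # same as median(c(1, 1, 2, 3, 3))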
2003 Oct 02
3
Query: weighting cells in histogram
I have the 'breaks' for the histogram ('hist'), but I want to weight the cells instead of using the actual observations. I thought that using freq=FALSE implied that the numbers in 'x' were weights, but this turned out to be wrong.
Any help and/or comment is very much appreciated.
Regards,
Mårten
Mårten Bjellerup
Doctoral Student in Economics
School of Management and Economics
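A minimal sketch of one workaround (made-up data; assumes no empty cells): build the histogram object without plotting, overwrite its counts with weighted cell totals, then plot it.
x <- runif(100)
breaks <- seq(0, 1, by = 0.2)
w <- runif(100)                                         # per-observation weights
h <- hist(x, breaks = breaks, plot = FALSE)
h$counts <- as.vector(tapply(w, cut(x, breaks), sum))   # weighted totals
plot(h, main = "Weighted histogram")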
2009 Nov 21
4
other decriptive stats packages
I just found the following list, and I wondered if anybody could add to it, as I
have to characterize a large data set and am new to R. The list below was
so helpful. Can you add to it?
Just to forestall confusion amongst those who would like to use one of
the functions called "describe"...
Hmisc package - describe
  numeric:
    name
    count of observations
    count of missing
2012 Oct 30
6
standard error for quantile
Dear all
I have a question about the standard error of quantiles, partly practical,
partly theoretical. I know that
x <- rlnorm(100000, log(200), log(2))
quantile(x, c(.10, .5, .99))
computes quantiles, but I would like to know whether there is any function to
find the standard error (or any other dispersion measure) of these estimated
values.
And here is a theoretical one: I feel that when I compute the median from
given
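For the practical part, a minimal bootstrap sketch (plain resampling; packages such as quantreg or survey offer model-based alternatives):
set.seed(1)
x <- rlnorm(100000, log(200), log(2))
probs <- c(.10, .5, .99)
boot_q <- replicate(200, quantile(sample(x, replace = TRUE), probs))
apply(boot_q, 1, sd)   # bootstrap SE of each estimated quantile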
2006 Mar 11
1
Quicker quantiles?
Motivated by Deepayan's recent inquiries about the efficiency of the
R 'quantile' function:
http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
http://tolstoy.newcastle.edu.au/R/devel/06/03/4358.html
I decided to try to revive an old project to implement a version of
the Floyd and Rivest (1975) algorithm for finding quantiles with O(n)
comparisons. I used
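For flavor, a minimal sketch of quickselect, the simpler relative of Floyd-Rivest selection, also with expected O(n) comparisons (not the implementation discussed here):
qselect <- function(x, k) {        # k-th smallest element of x
  repeat {
    p <- x[sample(length(x), 1)]   # random pivot
    lo <- x[x < p]; nlo <- length(lo); neq <- sum(x == p)
    if (k <= nlo) x <- lo
    else if (k <= nlo + neq) return(p)
    else { k <- k - nlo - neq; x <- x[x > p] }
  }
}
qselect(rnorm(1e5), 5e4)   # roughly the sample median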
2010 Mar 30
2
weighted.median function from package R.basic
Dear all,
I want to apply a weighted median to a huge dataset, and I remember a
function from the package R.basic that could do this using an internal
sorting algorithm, qsort. This sped things up quite a bit. Alas, I can't
find that package anywhere anymore. There is a weighted.median function in
the package limma too, but I haven't used that before.
Anybody who knows what happened to
2010 Nov 12
3
predict.coxph
Since I read the list in digest form (and was out ill yesterday), I'm
late to the discussion.
There are 3 steps for predicting survival using a Cox model:
1. Fit the data:
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
The biggest question to answer here is which covariates you wish to base
the prediction on. There is the usual tradeoff between too few (leave
out something
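A minimal sketch of where this leads (not necessarily the author's remaining steps): survfit() turns the fitted Cox model into predicted survival curves for chosen covariate values.
library(survival)
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
sf <- survfit(fit, newdata = data.frame(age = 60, ph.ecog = 1))
plot(sf, xlab = "Days", ylab = "Predicted survival")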