Displaying 20 results from an estimated 10000 matches similar to: "how to create a variable to rank within subgroups"
2005 Jan 06
6
"labels" attached to variable names
Hi,
Can we attach a more descriptive "label" (I may use the wrong
terminology, which would explain why I found nothing on the FAQ) to
variable names, and later have an easy way to switch to these labels in
plots? I fear this is not possible and one must enter this by hand as
ylab and xlab when making plots.
Thanks in advance,
Denis Chabot
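A minimal sketch of one way to do this in base R, assuming it is acceptable to store the description as a "label" attribute on each column. The data frame, its column names, and the helper get_label() are made up for illustration and are not from the original post.

```r
# Hypothetical example data (not from the original post)
fish <- data.frame(len = c(10, 20, 30), wt = c(1.5, 3.1, 4.8))

# Store a descriptive label as an attribute of each column
attr(fish$len, "label") <- "Fork length (cm)"
attr(fish$wt,  "label") <- "Body mass (kg)"

# Hypothetical helper: return the label if present, else the variable name
get_label <- function(data, name) {
  lab <- attr(data[[name]], "label")
  if (is.null(lab)) name else lab
}

xl <- get_label(fish, "len")
yl <- get_label(fish, "wt")
```

A plot call could then pass xlab = xl and ylab = yl. Subsetting a vector with [ drops most attributes, so labels may need to be reattached after such steps. The Hmisc package offers a label() function along similar lines.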
2005 Sep 26
4
p-level in packages mgcv and gam
Hi,
I am fairly new to GAM and started using package mgcv. I like the
fact that optimal smoothing is automatically used (i.e. df are not
determined a priori but calculated by the gam procedure).
But the mgcv manual warns that p-level for the smooth can be
underestimated when df are estimated by the model. Most of the time
my p-levels are so small that even doubling them would not result
2009 Mar 10
4
puzzled by math on date-time objects
Hi,
I don't understand the following. When I create a small artificial set
of date information in class POSIXct, I can calculate the mean and the
median:
a = as.POSIXct(Sys.time())
a = a + 60*0:10; a
[1] "2009-03-10 11:30:16 EDT" "2009-03-10 11:31:16 EDT" "2009-03-10 11:32:16 EDT"
[4] "2009-03-10 11:33:16 EDT" "2009-03-10 11:34:16
2007 Mar 18
2
italics letter in roman string
Hi,
As part of the legend to a plot, I need to have the "n" in italics
because it is a requirement of the journal I aim to publish in:
"This study, n = 3293"
Presently I have:
legend(20, 105, "This study, n = 3293", pch=1, col=rgb(0,0,0,0.5),
pt.cex=0.3, cex=0.8, bty="n")
I suppose I could leave a blank in place of the "n",
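One possible approach, sketched here under the assumption that a plotmath expression is acceptable for the legend text: italic() inside expression() sets only the "n" in italics. The empty plot and its axis ranges are hypothetical, chosen only so the legend position from the post falls inside the plotting region.

```r
# Legend text with only the "n" in italics (plotmath)
lab <- expression(paste("This study, ", italic(n), " = 3293"))

pdf(NULL)  # null device, so nothing is written to disk in this sketch
plot(c(0, 100), c(0, 110), type = "n", xlab = "x", ylab = "y")  # hypothetical axes
legend(20, 105, legend = lab, pch = 1, col = rgb(0, 0, 0, 0.5),
       pt.cex = 0.3, cex = 0.8, bty = "n")
invisible(dev.off())
```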
2006 Sep 20
1
functionality of "update" in SAS
Dear list,
I've tried to search the archives but found nothing, although I may
use the wrong wording in my searches. I've also double-checked the
upData function in Hmisc, but it does something else.
I'm wondering if one can update a dataframe by "forcing into" it a
shorter dataframe containing the corrections, like the "update"
provided in SAS data steps.
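A rough sketch of how such an update could be done in base R with match(), assuming a single key column and assuming that an NA in the corrections means "leave the original value alone". Both data frames and all column names are invented for illustration.

```r
# Hypothetical master data and a shorter data frame of corrections
main <- data.frame(id = 1:5, wt = c(10, 11, 12, 13, 14), len = c(1, 2, 3, 4, 5))
fix  <- data.frame(id = c(2, 4), wt = c(NA, 99), len = c(20, NA))

key <- "id"
idx <- match(fix[[key]], main[[key]])   # row of 'main' matching each correction

for (v in setdiff(names(fix), key)) {
  # Apply a correction only if the key was found and the new value is not NA
  ok <- !is.na(idx) & !is.na(fix[[v]])
  main[[v]][idx[ok]] <- fix[[v]][ok]
}
```

With these inputs only wt for id 4 and len for id 2 change. Keys present in the corrections but absent from the master data are silently ignored in this sketch.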
2005 Feb 04
5
2 small problems: integer division and the nature of NA
Hi,
I'm wondering why
48 %/% 2 gives 24
but
4.8 %/% 0.2 gives 23...
I'm not trying to round up here, but to find out how many times
something fits into something else, and the answer should have been the
same for both examples, no?
On a different topic, I like the behavior of NAs better in R than in
SAS (at least they are not considered the smallest value for a
variable), but at the
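On the integer-division question above: neither 4.8 nor 0.2 can be stored exactly in binary floating point, so the computed quotient can fall just short of 24 and then be floored to 23. The sketch below prints the stored values and shows one possible workaround, rounding the quotient before flooring. The helper count_fits() and its digits tolerance are assumptions for illustration, not a base R function.

```r
# Show the values actually stored for the two decimal constants
print(sprintf("%.20f", 4.8))
print(sprintf("%.20f", 0.2))

# Hypothetical helper: how many times y fits into x, tolerating
# representation error by rounding the quotient first
count_fits <- function(x, y, digits = 10) {
  floor(round(x / y, digits))
}

res <- count_fits(4.8, 0.2)
```

Another option is to rescale both numbers to integers first (here 48 and 2) when the data allow it.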
2006 Feb 08
1
plotting lines that break if data break
Hi,
Sometimes data series (not necessarily time series) suffer breaks
where data were expected, but not collected. Often the regular
"lines" command to add such data to a plot is what I want, but other
times I'd like the line to break where the data series is
interrupted, instead of the line jumping to the next point in the
series uninterrupted. Usually my data file
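lines() leaves a gap wherever x or y is NA, so one way to get broken lines is to insert an NA row at each interruption. The sketch below assumes an interruption can be detected as a step in x larger than some threshold. The helper break_at_gaps() and the sample values are hypothetical.

```r
# Hypothetical helper: insert an NA row after every x-step larger than max_gap
break_at_gaps <- function(x, y, max_gap) {
  gap_after <- which(diff(x) > max_gap)          # positions followed by a gap
  n_gap <- length(gap_after)
  pos <- c(seq_along(x), gap_after + 0.5)        # sort keys; NA rows fall in between
  ord <- order(pos)
  data.frame(x = c(x, rep(NA, n_gap))[ord],
             y = c(y, rep(NA, n_gap))[ord])
}

# Made-up series with missing observations between x = 3 and x = 7
d <- break_at_gaps(x = c(1, 2, 3, 7, 8), y = c(5, 6, 5, 9, 8), max_gap = 1)
```

Calling lines(d$x, d$y) on an existing plot would then draw two separate segments.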
2006 Oct 26
2
distance between legend title and legend box
Hi,
I've looked at the parameters available for the legend function and
cannot find a way to change the distance between the top of the box
surrounding a legend and the legend's title. I have a math expression
that raises the height of my title.
If you don't mind the nonsensical title I give to the legend for
this plot (Figure 3.20 in R Graphics):
with(iris,
2005 Jan 19
2
recoding large number of categories (select in SAS)
Hi,
I have data on stomach contents. Possible prey species are in the
hundreds, so a list of prey codes has been in use in many labs doing
this kind of work.
When it comes time to do analyses on these data one often wants to regroup
prey in broader categories, especially for rare prey.
In SAS you can nest a large number of "if-else", or do this more
cleanly with "select"
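In R, a common alternative to long if-else chains is a lookup table combined with match(). The sketch below assumes the regrouping can be written as a two-column table. The prey codes, group names, and the "other" fallback are invented for illustration.

```r
# Hypothetical prey codes as recorded in the data
prey_code <- c(101, 102, 205, 999, 310, 102)

# Hypothetical lookup table: detailed code -> broader category
lookup <- data.frame(
  code  = c(101, 102, 205, 310),
  group = c("fish", "fish", "crustacean", "mollusc"),
  stringsAsFactors = FALSE
)

# Recode by matching each code against the table
prey_group <- lookup$group[match(prey_code, lookup$code)]
prey_group[is.na(prey_group)] <- "other"   # codes absent from the table
```

The lookup table could be kept in a separate file and read in, so hundreds of codes need not be typed into the script.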
2006 Nov 08
2
combining dataframes with different numbers of columns
Dear list members,
I have to combine dataframes together. However they contain different
numbers of variables. It is possible that all the variables in the
dataframe with fewer variables are contained in the dataframe with
more variables, though it is not always the case.
There are key variables identifying observations. These could be used
in a merge statement, although this won't
2006 Feb 05
1
how to extract predicted values from a quantreg fit?
Hi,
I have used package quantreg to estimate a non-linear fit to the
lowest part of my data points. It works great, by the way.
But I'd like to extract the predicted values. The help for
predict.qss1 indicates this:
predict.qss1(object, newdata, ...)
and states that newdata is a data frame describing the observations
at which prediction is to be made.
I used the same technique I used
2007 May 22
5
Reducing the size of pdf graphics files produced with R
Hi,
Without trying to print 1000000 points (see
<http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42105.html>), I often print
maps for which I do not want to lose too much coastline detail,
and/or plots with 1000-5000 points (yes, some are on top of each
other, but using transparency (i.e. rgb colors with alpha
information) this actually comes through as useful information.
2006 Sep 13
1
reshaping a dataset
Hi,
I'm trying to move to R the last few data handling routines I was
performing in SAS.
I'm working on stomach content data. In the simplified example I
provide below, there are variables describing the origin of each prey
item (nbpc is a ship number, each ship may have been used on
different trips, each trip has stations, and individual fish (tagno)
can be caught at each
2010 Jan 08
2
A better way to Rank Data that considers "ties"
This will start off sounding very easy, but I think it will be very
complicated.
Let's say that I have a matrix, which shows the number of apples that each
person in a group has.
OriginalMatrix<-matrix(c(2,3,5,4,6),nrow=5,ncol=1,byrow=T,dimnames=list(c("Bob","Frank","Joe","Jim","David"),c("Apples")))
Apples
Bob 2
2005 Oct 05
3
testing non-linear component in mgcv:gam
Hi,
I need further help with my GAMs. Most models I test are very
obviously non-linear. Yet, to be on the safe side, I report the
significance of the smooth (default output of mgcv's summary.gam) and
confirm it deviates significantly from linearity.
I do the latter by fitting a second model where the same predictor is
entered without the s(), and then use anova.gam to compare the
2003 Jul 22
1
rank with ties
Hi,
Is there a function like rank but that solves the ties by randomly assigning
a value (doesn't average ranks of ties).
This is what I actually need:
I want to make NA all elements of each column in an array that are ranked in
a position larger than rankmax for each column.
# Say I've got an array b:
b<-cbind(c(1:5,5:1),c(1,12,14,2,5,4:8))
#> b
# [,1] [,2]
#[1,] 1 1
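In current versions of R, rank() accepts ties.method = "random", which breaks ties at random instead of averaging them. A minimal sketch of the column-wise trimming described above, using the array b from the post. The value of rankmax and the seed are assumptions chosen for illustration.

```r
b <- cbind(c(1:5, 5:1), c(1, 12, 14, 2, 5, 4:8))
rankmax <- 3          # hypothetical cut-off
set.seed(1)           # only to make the random tie-breaking repeatable

# Rank each column separately, breaking ties at random
r <- apply(b, 2, rank, ties.method = "random")

# Blank out every element ranked beyond rankmax in its column
b_trim <- b
b_trim[r > rankmax] <- NA
```

Because ties are broken at random, exactly rankmax values survive in each column, but which of the tied values survive depends on the seed.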
2006 Oct 10
3
Rank Function
Does anyone know why the two rank functions give
different results? I need to use the rank function in
a "for" loop, so the sequence to be ranked is given
values in the form of part (1). How can I use
assignment like in part (1) to get correct ranks as in
part (2)?
Thank You
Part (1)
i<-1.94
b<-0.95-i
c<-1.73-i
d<-2.62-i
y<-c(0.68,0.95,b,c,d)
y
0.68 0.95 -0.99
2009 Jul 23
2
alternative to rbind within a loop
Hi,
I often have to do this:
select a folder (directory) containing a few hundred data files in csv
format (up to 1000 files, in fact)
open each file, transform some character variables into date-time format
make into a dataframe (involves getting rid of a few variables I don't
need)
concatenate to the master dataframe that will eventually contain the
data from all the files in the
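A frequently used alternative to growing a data frame inside a loop is to read every file into a list and bind the list once at the end. The sketch below writes three small CSV files to a temporary folder so it runs on its own. The column names, the date format, and the helper read_one() are assumptions, not details from the original post.

```r
# Create a temporary folder with a few made-up CSV files
folder <- tempfile("csvdemo")
dir.create(folder)
for (i in 1:3) {
  write.csv(data.frame(id = i, when = "2009-07-23 10:00:00",
                       value = i * 1.5, junk = "x"),
            file.path(folder, sprintf("file%02d.csv", i)), row.names = FALSE)
}

# Hypothetical per-file routine: read, convert the date-time, drop unused columns
read_one <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  d$when <- as.POSIXct(d$when, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  d[, c("id", "when", "value")]
}

files  <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
master <- do.call(rbind, lapply(files, read_one))   # bind once, after the loop
```

Binding once avoids copying the growing master data frame on every iteration, which is usually where the time goes with many files.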
2010 Oct 17
1
lattice xyplot - formatting of multiple Y variables when using subgroups
Hi all,
Using xyplot I want to plot two Y variables (y1, y2) versus X, conditional
on the group.
How can I obtain a line (type="l") for one relationship (ie. y1 ~ x) and
points (type="p") for the other (y2 ~ x) ?
library(lattice)
# create some sample data
df<-data.frame(group=as.factor(c(rep("a",4), rep("b",4))), # grouping variable for conditional
2005 Nov 01
1
percent rank by an index key?
What is the easiest way to calculate a percent rank “by” an index key?
For example, I have a dataset with 3 fields:
Year, State, Income.
I wish to calculate the rank, by year, by state.
I also wish to calculate the “percent rank”, where I define percent rank as rank/n.
(n is the number of numeric data points within each date-state grouping.)
This is what I am
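A minimal base R sketch of a within-group rank and percent rank using ave(), assuming the three fields named above and the stated definition rank/n with n counting only non-missing values. The data values are invented for illustration.

```r
# Made-up data with the three fields from the post
dat <- data.frame(
  Year   = c(2000, 2000, 2000, 2000, 2001, 2001, 2001),
  State  = c("NY", "NY", "NY", "CA", "NY", "NY", "CA"),
  Income = c(50, 30, NA, 40, 20, 60, 10)
)

# Rank of Income within each Year-State group; NA stays NA
dat$rank <- ave(dat$Income, dat$Year, dat$State,
                FUN = function(x) rank(x, na.last = "keep"))

# n = number of non-missing values within each Year-State group
dat$n <- ave(dat$Income, dat$Year, dat$State,
             FUN = function(x) sum(!is.na(x)))

# Percent rank as defined in the post: rank / n
dat$pct_rank <- dat$rank / dat$n
```

rank() averages ties by default. Its ties.method argument can be changed if a different treatment of ties is wanted.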