thr3ads.net - similar to: "how to create a variable to rank within subgroups"

Displaying 20 results from an estimated 10000 matches similar to: "how to create a variable to rank within subgroups"

2005 Jan 06

"labels" attached to variable names

Hi, Can we attach a more descriptive "label" (I may use the wrong terminology, which would explain why I found nothing on the FAQ) to variable names, and later have an easy way to switch to these labels in plots? I fear this is not possible and one must enter this by hand as ylab and xlab when making plots. Thanks in advance, Denis Chabot

2005 Sep 26

p-level in packages mgcv and gam

Hi, I am fairly new to GAM and started using package mgcv. I like the fact that optimal smoothing is automatically used (i.e. df are not determined a priori but calculated by the gam procedure). But the mgcv manual warns that p-level for the smooth can be underestimated when df are estimated by the model. Most of the time my p-levels are so small that even doubling them would not result

2009 Mar 10

puzzled by math on date-time objects

Hi, I don't understand the following. When I create a small artificial set of date information in class POSIXct, I can calculate the mean and the median: a = as.POSIXct(Sys.time()) a = a + 60*0:10; a [1] "2009-03-10 11:30:16 EDT" "2009-03-10 11:31:16 EDT" "2009-03-10 11:32:16 EDT" [4] "2009-03-10 11:33:16 EDT" "2009-03-10 11:34:16

2007 Mar 18

italics letter in roman string

Hi, As part of the legend to a plot, I need to have the "n" in italics because it is a requirement of the journal I aim to publish in: "This study, n = 3293" Presently I have: legend(20, 105, "This study, n = 3293", pch=1, col=rgb(0,0,0,0.5), pt.cex=0.3, cex=0.8, bty="n") I suppose I could leave a blank in place of the "n",
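
One possible approach, sketched here with base R plotmath: build the legend text as an expression so that only the "n" is set in italics. The empty plot and the null PDF device are assumptions added so the snippet runs on its own; the legend() arguments are those quoted in the post.

```r
# Null PDF device: lets the sketch run without opening a window or writing a file.
pdf(NULL)
plot(1:100, 1:100, type = "n")

# plotmath: italic() applies only to the "n"; the quoted pieces stay roman.
lab <- expression(paste("This study, ", italic(n), " = 3293"))

legend(20, 105, legend = lab, pch = 1, col = rgb(0, 0, 0, 0.5),
       pt.cex = 0.3, cex = 0.8, bty = "n")
invisible(dev.off())
```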

2006 Sep 20

functionality of "update" in SAS

Dear list, I've tried to search the archives but found nothing, although I may use the wrong wording in my searches. I've also double-checked the upData function in Hmisc, but it does something else. I'm wondering if one can update a dataframe by "forcing into" it a shorter dataframe containing the corrections, like the "update" provided in SAS data steps.
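
A minimal sketch of one way to get this kind of update in base R, assuming a single key column: match() locates the rows to correct, and the corrected values are assigned in place. The data frames master and fixes and the columns id and value are hypothetical names used only for illustration.

```r
# Hypothetical data: a master table and a shorter table of corrections.
master <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))
fixes  <- data.frame(id = c(2, 4), value = c(21, 39))

# Find, for each correction, the row of master with the same key.
idx <- match(fixes$id, master$id)

# Guard: every correction must refer to an existing key.
stopifnot(!anyNA(idx))

# Overwrite only the matched rows; all other rows are left untouched.
master$value[idx] <- fixes$value
master
```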

2005 Feb 04

2 small problems: integer division and the nature of NA

Hi, I'm wondering why 48 %/% 2 gives 24 but 4.8 %/% 0.2 gives 23... I'm not trying to round up here, but to find out how many times something fits into something else, and the answer should have been the same for both examples, no? On a different topic, I like the behavior of NAs better in R than in SAS (at least they are not considered the smallest value for a variable), but at the
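
The difference comes from binary floating point: neither 4.8 nor 0.2 can be stored exactly, so the computed ratio falls just below 24 and integer division drops the fraction. The sketch below shows this; using round() is only one possible workaround and assumes the quotient is known to be a whole number.

```r
n_exact <- 48 %/% 2      # 24: both operands are exactly representable
n_float <- 4.8 %/% 0.2   # 23, as reported in the post

# Printing with extra digits shows the ratio sits just below 24.
print(4.8 / 0.2, digits = 17)

# Assumption: the true quotient is a whole number, so rounding recovers it.
n_round <- round(4.8 / 0.2)
```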

2006 Feb 08

plotting lines that break if data break

Hi, Sometimes data series (not necessarily time series) suffer breaks where data were expected, but not collected. Often the regular "lines" command to add such data to a plot is what I want, but other times I'd like the line to break where the data series is interrupted, instead of the line jumping to the next point in the series uninterrupted. Usually my data file
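
lines() does not draw segments to or from points whose coordinates are NA, so one option is to pad the series with NA rows where observations are missing. The sketch below assumes the expected x values form a known regular sequence; obs and full are hypothetical names.

```r
# Hypothetical series with no rows for x = 4, 5, 6.
obs <- data.frame(x = c(1, 2, 3, 7, 8, 9), y = c(2, 4, 3, 6, 5, 7))

# Merge against the full expected sequence; missing x values get y = NA.
full <- merge(data.frame(x = 1:9), obs, all.x = TRUE)

pdf(NULL)                          # null device so the sketch runs anywhere
plot(full$x, full$y, type = "n")
lines(full$x, full$y)              # the line is interrupted across the NA rows
invisible(dev.off())
```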

2006 Oct 26

distance between legend title and legend box

Hi, I've looked at the parameters available for the legend function and cannot find a way to change the distance between the top of the box surrounding a legend and the legend's title. I have a math expression that raises the height of my title. If you don't mind the non-sensical title I give to the legend for this plot (Figure 3.20 in R Graphics): with(iris,

2005 Jan 19

recoding large number of categories (select in SAS)

Hi, I have data on stomach contents. Possible prey species are in the hundreds, so a list of prey codes has been in use in many labs doing this kind of work. When the time comes to do analyses on these data one often wants to regroup prey in broader categories, especially for rare prey. In SAS you can nest a large number of "if-else", or do this more cleanly with "select"
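
One common R idiom for this kind of recoding is a named lookup vector indexed by the code. The sketch below is illustrative only: the prey codes, the group names, and the "other" fallback are all made up.

```r
# Hypothetical prey codes as they might appear in the data.
prey_code <- c(101, 102, 205, 310, 999)

# Lookup table: names are the codes, values are the broader categories.
lookup <- c("101" = "shrimp", "102" = "shrimp",
            "205" = "fish",   "310" = "squid")

# Index the lookup by code; codes absent from the table come back as NA.
prey_group <- unname(lookup[as.character(prey_code)])

# Assumption: anything not listed is lumped into a catch-all category.
prey_group[is.na(prey_group)] <- "other"
prey_group
```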

2006 Nov 08

combining dataframes with different numbers of columns

Dear list members, I have to combine dataframes together. However they contain different numbers of variables. It is possible that all the variables in the dataframe with fewer variables are contained in the dataframe with more variables, though it is not always the case. There are key variables identifying observations. These could be used in a merge statement, although this won't
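
A minimal sketch of stacking two data frames whose columns differ: each one is padded with NA for the columns it lacks, then the two are bound row-wise. The data frames a and b and the helper pad() are hypothetical; this assumes stacking is wanted rather than a key-based merge.

```r
# Hypothetical inputs: b lacks the column y.
a <- data.frame(id = 1:2, x = c(1.5, 2.5), y = c("u", "v"))
b <- data.frame(id = 3:4, x = c(3.5, 4.5))

all_cols <- union(names(a), names(b))

# Hypothetical helper: add any missing columns as NA, then fix the column order.
pad <- function(d) {
  for (nm in setdiff(all_cols, names(d))) d[[nm]] <- NA
  d[all_cols]
}

combined <- rbind(pad(a), pad(b))
combined
```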

2006 Feb 05

how to extract predicted values from a quantreg fit?

Hi, I have used package quantreg to estimate a non-linear fit to the lowest part of my data points. It works great, by the way. But I'd like to extract the predicted values. The help for predict.qss1 indicates this: predict.qss1(object, newdata, ...) and states that newdata is a data frame describing the observations at which prediction is to be made. I used the same technique I used

2007 May 22

Reducing the size of pdf graphics files produced with R

Hi, Without trying to print 1000000 points (see <http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42105.html>), I often print maps for which I do not want to lose too much of coastline detail, and/or plots with 1000-5000 points (yes, some are on top of each other, but using transparency (i.e. rgb colors with alpha information) this actually comes through as useful information.

2006 Sep 13

reshaping a dataset

Hi, I'm trying to move to R the last few data handling routines I was performing in SAS. I'm working on stomach content data. In the simplified example I provide below, there are variables describing the origin of each prey item (nbpc is a ship number, each ship may have been used on different trips, each trip has stations, and individual fish (tagno) can be caught at each

2010 Jan 08

A better way to Rank Data that considers "ties"

This will start off sounding very easy, but I think it will be very complicated. Let's say that I have a matrix, which shows the number of apples that each person in a group has. OriginalMatrix<-matrix(c(2,3,5,4,6),nrow=5,ncol=1,byrow=T,dimnames=list(c("Bob","Frank","Joe","Jim","David"),c("Apples"))) Apples Bob 2

2005 Oct 05

testing non-linear component in mgcv:gam

Hi, I need further help with my GAMs. Most models I test are very obviously non-linear. Yet, to be on the safe side, I report the significance of the smooth (default output of mgcv's summary.gam) and confirm it deviates significantly from linearity. I do the latter by fitting a second model where the same predictor is entered without the s(), and then use anova.gam to compare the

2003 Jul 22

rank with ties

Hi, Is there a function like rank but that solves the ties by randomly assigning a value (doesn't average ranks of ties)? This is what I actually need: I want to make NA all elements of each column in an array that are ranked in a position larger than rankmax for each column. # Say I've got an array b: b<-cbind(c(1:5,5:1),c(1,12,14,2,5,4:8)) #> b # [,1] [,2] #[1,] 1 1
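
Base R's rank() accepts ties.method = "random", which breaks ties at random instead of averaging. The sketch below applies it column by column to the array b from the post; the cutoff rankmax = 4 and the seed are assumptions chosen for illustration.

```r
set.seed(1)      # assumption: fixed seed so the random tie-breaking is repeatable
b <- cbind(c(1:5, 5:1), c(1, 12, 14, 2, 5, 4:8))
rankmax <- 4     # hypothetical cutoff

# For each column: rank with random tie-breaking, then blank out ranks above the cutoff.
trimmed <- apply(b, 2, function(col) {
  r <- rank(col, ties.method = "random")
  col[r > rankmax] <- NA
  col
})
trimmed
```

Because random tie-breaking yields distinct ranks, each column keeps exactly rankmax values.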

2006 Oct 10

Rank Function

Does anyone know why the two rank functions give different results? I need to use the rank function in a "for" loop, so the sequence to be ranked is given values in the form of part (1). How can I use assignment like in part (1) to get correct ranks as in part (2)? Thank You Part (1) i<-1.94 b<-0.95-i c<-1.73-i d<-2.62-i y<-c(0.68,0.95,b,c,d) y 0.68 0.95 -0.99

2009 Jul 23

alternative to rbind within a loop

Hi, I often have to do this:
- select a folder (directory) containing a few hundred data files in csv format (up to 1000 files, in fact)
- open each file
- transform some character variables in date-time format
- make into a dataframe (involves getting rid of a few variables I don't need)
- concatenate to the master dataframe that will eventually contain the data from all the files in the
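
A common alternative to growing a data frame with rbind inside a loop is to read every file into a list and bind once at the end. The sketch below writes three tiny CSV files to a temporary folder so that it is self-contained; the folder, the file names, the column names, and the helper read_one() are all hypothetical.

```r
# Set-up for the sketch only: a temporary folder with three small CSV files.
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(file = i, when = "2009-07-23 10:00:00", v = i * 1:2),
            file.path(dir, sprintf("f%d.csv", i)), row.names = FALSE)
}

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical per-file step: read, then convert the character column to date-time.
read_one <- function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  d$when <- as.POSIXct(d$when, tz = "UTC")
  d
}

# Read everything into a list, then bind once instead of once per file.
master <- do.call(rbind, lapply(files, read_one))
```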

2010 Oct 17

lattice xyplot - formatting of multiple Y variables when using subgroups

Hi all, Using xyplot I want to print two Y variables (y1, y2) versus X, conditional on the group. How can I obtain a line (type="l") for one relationship (i.e. y1 ~ x) and points (type="p") for the other (y2 ~ x) ? library(lattice) # create some sample data df<-data.frame(group=as.factor(c(rep("a",4), rep("b",4))), # grouping variable for conditional

2005 Nov 01

percent rank by an index key?

What is the easiest way to calculate a percent rank “by” an index key? For example, I have a dataset with 3 fields: Year, State, Income. I wish to calculate the rank, by year, by state. I also wish to calculate the “percent rank”, where I define percent rank as rank/n. (n is the number of numeric data points within each date-state grouping.) This is what I am
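
One way to rank within subgroups in base R is ave(), which applies a function separately within each combination of grouping variables and returns a vector aligned with the original rows. The sketch below uses the field names from the post (Year, State, Income); the data values and the data frame name inc are made up.

```r
# Hypothetical data with the three fields named in the post.
inc <- data.frame(
  Year   = c(2004, 2004, 2004, 2005, 2005),
  State  = c("NY", "NY", "NY", "CA", "CA"),
  Income = c(30, 50, 40, 20, 60)
)

# Rank of Income within each Year-State group.
inc$rank <- ave(inc$Income, inc$Year, inc$State, FUN = rank)

# Percent rank as defined in the post: rank / n within the group.
inc$pct_rank <- ave(inc$Income, inc$Year, inc$State,
                    FUN = function(x) rank(x) / length(x))
inc
```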