thr3ads.net - similar to: "Removing outliers"

Displaying 20 results from an estimated 4000 matches similar to: "Removing outliers"

cph/nomogram Design/RMS package hazard ratio: interquartile vs per unit

2011 Oct 21

Hello, I am constructing a nomogram using the cph and nomogram commands in Dr. Harrell's Design/RMS package. The HRs that I obtain for dichotomous and categorical variables are identical to those that I obtain using STATA stcox. However, the interquartile HR I obtain for continuous variables is obviously different, since STATA gives me the HR for each unit (year, centimeter, etc.) like coxph would

Boxplot not doing what I think it should

2011 Feb 24

My box plot below is drawing its upper whisker all the way to the last point, instead of showing the point as an outlier. Am I misunderstanding, or is it a bug? Help(boxplot) states for the parameter "range" that "this determines how far the plot whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point which is no more than range times the

box and whisker (PR#13821)

2009 Jul 12

In a Box and Whisker plot, I thought that when there are outliers both above and below the whiskers, then the whiskers should both be the same length (plus or minus 1.5 times the inter-quartile range). If you look at the plot for SilwoodWeather on p.155 of The R Book you will see that for November (month = 11) the upper whisker is shorter than the lower, while for other months with

loop of quartile groups

2012 Oct 17

Greetings R users, My goal is to generate quartile groups of each variable in my data set. I would like each experiment to have its designated group added as a subsequent column. I can accomplish this individually with the following code: brks <- with(data_variables, cut2(var2, g=4)) #I don't want the actual numbers, I need a numbered group data$test1=factor(brks,

outlier threshold

2005 Feb 25

For the analysis of financial data with a large variance, what is the best way to select an outlier threshold? Listed below, is there a best method to select an outlier threshold and how does R calculate it? In R, how do you find the outlier threshold through an interquartile range? In R, how do you find the outlier threshold using the hist command? In R, how do you find the outlier threshold

Forcing results from lm into datframe

2010 Oct 26

Hi I need some help getting results from multiple linear models into a dataframe. Let me explain the problem. I have a dataframe with ejection fraction results measured over a number of quartiles and grouped by base_study. My dataframe (800 different base_studies) looks like > afvtprelvefs basestudy quartile ef ef_std entropy CBP0908020 1 21.6 0.53 3.27

Quartile regression question

2008 Jun 13

I have data that looks like lake,loglength,logweight 1,2.369215857,1.929418926 1,2.426511261,2.230448921 1,2.434568904,2.298853076 1,2.437750563,2.298853076 1,2.442479769,2.230448921 1,2.445604203,2.356025857 ... 102,2.722633923,3.310268367 102,2.781755375,3.502153893 102,2.836324116,3.683407299 102,2.802773725,3.583312152 102,2.790285164,3.546419267 102,2.806179974,3.599118565

use of class variable in r as in Proc means of sas

2009 Sep 22

Hi everyone, I need to calculate quartile values of a variable grouped by another variable, the same as with the aggregate function (where, I think, only median, mean or similar functions are possible). Could you please help me obtain the other quartile values (5, 10, 25, 75, 90) in the same way as the median using aggregate? Thanks in advance. data: zip price 60000 567000 60001 478654 60004 485647 60001

Quartiles and Inter-Quartile Range

2010 Jan 22

Why am I getting a wrong result for quartiles? here is my code: > cbiomass = c(910, 1058, 929, 1103, 1056, 1022, 1255, 1121, 1111, 1192, > 1074, 1415) > summary(cbiomass) > IQR(cbiomass) The result R gives me is: For the summary > Min. 1st Qu. Median Mean 3rd Qu. Max. 910 1048 1088 1104 1139 1415 For IQR > 91.25 ********* The true Q1 is 1039
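
A minimal sketch in R of where the two values may come from. It assumes the poster's "true Q1" of 1039 is the median of the lower half of the data (Tukey's lower hinge); that reading is an assumption, since the post is cut off at this point. By default, quantile(), summary() and IQR() use linear interpolation (type = 7), while fivenum() returns the hinges.

```r
# Data exactly as given in the post above.
cbiomass <- c(910, 1058, 929, 1103, 1056, 1022, 1255, 1121, 1111, 1192, 1074, 1415)

# Default quantile definition (type = 7), the one behind summary() and IQR().
q_default   <- unname(quantile(cbiomass, c(0.25, 0.75)))  # 1047.5 and 1138.75
iqr_default <- IQR(cbiomass)                              # 91.25

# Tukey's five-number summary: elements 2 and 4 are the lower and upper hinges,
# i.e. the medians of the lower and upper halves of the sorted data.
hinges <- fivenum(cbiomass)[c(2, 4)]                      # 1039 and 1156.5

# quantile() supports other definitions through `type`; type = 2 averages the
# two neighbouring order statistics at a discontinuity.
q1_type2 <- unname(quantile(cbiomass, 0.25, type = 2))    # 1039

print(q_default)
print(iqr_default)
print(hinges)
print(q1_type2)
```

summary() prints the type 7 values rounded to four significant digits, which is why the post shows 1048 and 1139. Neither result is wrong; they follow different quartile definitions.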

random number generation

2003 Oct 28

Hi every one, I am trying to generate a normally distributed random variable with the following descriptive statistics, min=1, max=99, variance=125, mean=38.32, 1st quartile=38, median=40, 3rd quartile=40, skewness=-0.274. I know the "rnorm" will allow me to simulate random numbers with mean 38.32 and Sd=11.18(sqrt(125)). But I need to have the above mentioned descriptive

Splitting Data Into Different Series

2012 Aug 06

Dear R Community, I'm trying to write a loop to split my data into different series. I need to make a new matrix (or series) according to the series code. For instance, every time the "code" column assumes the value "433" I need to save "date", "value", and "code" into the "dados433" matrix. Please take a look at the following

Getting wrong NA values using "for" cmd

2011 Jul 08

Hi there, I'm having a problem constructing a vector using the "for" command: I have one matrix named 'dados' (the same as /data/ in Portuguese), for example: > dados[140:150,] [,1] [,2] [,3] [1,] 212.7298 0.14 0.11 [2,] 213.3778 0.14 0.11 [3,] 214.0257 0.15 0.11 [4,] 214.6737 0.15 0.12 [5,] 215.3217 0.15 0.12 [6,] 215.9696 0.15 0.12 [7,] 216.6176 0.16

Summary vs fivenum results for Q3

2007 Oct 09

I've just started using R and am still a neophyte, but I found the following curious result. I'm using the current version of R (2.5.1 (2007-06-27) ). Why are the results for the third quartile different in the output from the summary and fivenum commands? For the following data set 457 514 530 530 538 560 687 745 745 778 786 790 792

Bug: floating point bug in nclass.FD can cause hist() to crash

2017 May 18

Hello everybody, This is a bug involving functions in core R package: graphics::hist.default, grDevices::nclass.FD, and base::pretty.default. It is not yet on Bugzilla. I cannot submit it myself, as I do not have an account. Could somebody else add it for me, perhaps? That would be much appreciated. Kind regards, Sietse Sietse Brouwer Summary ------- Floating point errors can cause a data

How to define proper breaks in RFM analysis

2017 Oct 13

> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote: > > Hi > > You expect us to solve your problem but you ignore advice already received. > > Your data are unreadable, use dput(yourdata) instead. see ?dput > >> test<-read.table("clipboard", heade=T) > Error in scan(file = file, what = what, sep = sep, quote = quote,

Problems with variable types.

2003 Mar 06

Hi all, I have problems in a dataframe variables types. Look: from a loop function: for(...){ ... dados.fin <- rbind(dados.fin, c(L=j, A=j^2, Nsp=nsps, N=length(amosfin$SP), AmT="am",NAm=nam, AMST=amst)) dados.fin <- rbind(dados.fin, c(L=j, A=j^2,

Plot.svm error

2008 Jan 02

Hi all, Sorry to be bothering again with probably an easy error to fix, but I've been trying to solve the problem and haven't been able yet to do it. So I'm doing this: > dados<-read.table("b.txt",sep="",nrows=30000) >

Tables and merge

2011 Jul 06

----- Original Message ----- From: "Silvano" <silvano at uel.br> To: <r-help at r-project.org> Sent: Thursday, June 30, 2011 9:07 AM Subject: Tables and merge > Hi, > > I have 21 files which is common variable CODE. > Each file refers to a question. > > I would like to join the 21 files into one, to construct > tables for each question by CODE. >

How to define proper breaks in RFM analysis

2017 Oct 13

Hemant's problem is that the indicators are not distributed uniformly. With a uniform distribution, categorization gives a reasonably optimal separation of cases. One approach would be to drop categorization and calculate the overall score as the mean of the standardized indicator scores. Whether this is an option I do not know. I did offer an "eyeball" set of breaks in a previous

Defining reference category for a cph model summary inside of a "for" loop

2008 Mar 28

I have the following code. > f <- cph(formula = Surv(TimeToDeath, Dead == "Yes") ~1,data=single.dat, x=T, y=T, surv=T) > for(i in c('A', 'B', 'C', 'D', 'E', 'F')){ > f <-update(f,as.formula(paste('Surv(TimeToDeath, Dead == "Yes")~',i,sep=''))) > print(summary(f, paste(i,"=1st
