Displaying 20 results from an estimated 3000 matches similar to: "odd behavior of "summary" function"
2007 Oct 09
3
Summary vs fivenum results for Q3
I've just started using R and am still a neophyte, but I found the following curious result. I'm using the current version of R (2.5.1 (2007-06-27) ).
Why are the results for the third quartile different in the output from the summary and fivenum commands? For the following data set
457 514 530 530 538 560 687 745 745 778 786 790 792
2010 Jan 22
2
Quartiles and Inter-Quartile Range
Why am I getting a wrong result for quartiles?
here is my code:
> cbiomass = c(910, 1058, 929, 1103, 1056, 1022, 1255, 1121, 1111, 1192,
> 1074, 1415)
> summary(cbiomass)
> IQR(cbiomass)
The result R gives me is:
For the summary
> Min. 1st Qu. Median Mean 3rd Qu. Max.
910 1048 1088 1104 1139 1415
For IQR
> 91.25
*********
The true Q1 is 1039
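Both quartile threads above come down to which quantile definition is in use. As a sketch using the cbiomass values quoted in the post: summary() and IQR() go through quantile() with its default type = 7 (linear interpolation), while fivenum() returns Tukey's hinges, the medians of the lower and upper halves, which is where 1039 comes from. Neither result is wrong; they are different conventions.

```r
cbiomass <- c(910, 1058, 929, 1103, 1056, 1022, 1255, 1121, 1111, 1192,
              1074, 1415)

# Default used by summary() and IQR(): linear interpolation (type = 7).
q7 <- quantile(cbiomass, probs = c(0.25, 0.75), names = FALSE)

# Tukey's hinges: with 12 values, the medians of the lower and upper halves.
hinges <- fivenum(cbiomass)[c(2, 4)]

# Type 2 averages at discontinuities; for this data it agrees with the hinges.
q2 <- quantile(cbiomass, probs = c(0.25, 0.75), type = 2, names = FALSE)

print(q7)        # 1047.50 1138.75 (summary() shows these as 1048 and 1139)
print(diff(q7))  # 91.25, the value IQR() reports
print(hinges)    # 1039.0 1156.5
```

Passing type = 2 (or another type from ?quantile) to quantile() is one way to reproduce a textbook value when the default does not match it.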
2007 Nov 18
2
Obtaining x-values from ECDF
Dear Group,
I am using the ecdf function as follows:
cawa.cdp <- ecdf(cawaocc$LEFF80)
summary(cawa.cdp)
Empirical CDF: 223 unique values with summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.07918 1.35700 1.68600 1.61000 1.91200 2.70000
I can see by the summary that the y-value for the 3rd quartile is 1.912.
How can I obtain the x-value for a specified y-value (e.g., 0.8)?
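A note on the question above: summary() of an ecdf object summarises the x-values (the observations), so 1.912 is already an x-value, roughly the point where the ECDF reaches 0.75. Going from a y-value back to an x-value is a quantile lookup. A minimal sketch, with made-up data standing in for cawaocc$LEFF80 (which is not shown in the post):

```r
# Hypothetical stand-in for cawaocc$LEFF80.
leff <- c(0.4, 0.9, 1.1, 1.3, 1.5, 1.6, 1.8, 1.9, 2.2, 2.7)
Fn <- ecdf(leff)

# type = 1 is the inverse of the empirical CDF: the smallest observed x
# with Fn(x) >= p.
x80 <- quantile(leff, probs = 0.8, type = 1, names = FALSE)

print(x80)      # an observed data value
print(Fn(x80))  # at least 0.8
```

The default type = 7 interpolates between observations instead, so it may return a value that is not in the data.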
2005 Apr 28
3
have to point it out again: a distribution question
Stock returns and other financial data have often been found to be heavy-tailed.
Even Cauchy distributions (without even a first absolute moment) have been
entertained as models.
Your qq function subtracts numbers on the scale of a normal (0,1)
distribution from the input data. When the input data are scaled so that
they are insignificant compared to 1, say, then you get essentially the
2010 Jul 29
2
ggplot2 histograms... a subtle error found
Hello all,
I have a peculiar and particular bug that I stumbled across with
ggplot2. I cannot seem to replicate it with anything other than my specific
data set.
Here is the problem:
- when I try to plot a histogram, allowing for ggplot2 to decide the
binwidths itself, I get the following error:
- stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all,
I am using the package "random forest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start
2011 Feb 08
4
manipulating the Date & Time classes
Hello,
This is mostly to developers, but in case I missed something in my
literature search, I am sending this to the broader audience.
- Are there any plans in the works to make "time" classes a bit more
friendly to the rest of the "R" world? I am not suggesting to allow for
fancy functions to manipulate times, per se, or to figure out how to
properly
2011 Feb 24
1
Boxplot not doing what I think it should
My box plot below is drawing its upper whisker all the way to the last point, instead of showing the point as an outlier. Am I misunderstanding, or is it a bug?
Help(boxplot) states for the parameter "range" that "this determines how far the plot whiskers extend out from the box. If range is positive, the whiskers extend to the most extreme data point which is no more than range times the
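The rule quoted from the help page can be checked numerically with boxplot.stats(), which boxplot() uses to compute whisker ends and outliers. A small sketch with made-up data (the poster's data is not shown): the box length is measured between the hinges, and with the default coef = 1.5 (the "range" argument of boxplot) any point further than 1.5 box lengths from the box is reported as an outlier.

```r
# Hypothetical data with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(x, coef = 1.5)
print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$out)    # points beyond the whiskers

# coef = 0 stretches the whiskers to the data extremes, so nothing is flagged.
s0 <- boxplot.stats(x, coef = 0)
print(s0$stats[5])
print(length(s0$out))
```

If the upper whisker reaches the last point, that point lies within 1.5 box lengths of the upper hinge, or range was set to 0.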
2010 Jul 13
1
question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Hi everyone,
I have another "Random Forest" package question:
- my (presumably incorrect) understanding of the varImpPlot is that it
should plot the "% increase in MSE" and "IncNodePurity" exactly as can be
found from the "importance" section of the model results.
- However, the plot does not, in fact, match the "importance"
2010 Dec 20
1
ideas, modeling highly discrete time-series data
Hello all,
First of all, thanks to those of you who helped me a week or so ago with
managing a time series with varying gaps between the data series in 'R'.
(My final preferred solution was to use "its" function & then
forecast(Arima( ) ). )
My next question is a general statistical question where I'd like some
advice, for those willing / able to proffer any wisdom:
2010 Dec 03
2
How to get 'R' to talk BACK to other languages / scripts??
Hey everyone,
I know that I can call 'R' from other scripts, and that I can make
command calls from 'R' (e.g., using system() ). But how can I get 'R' to
RETURN values to the script that called it. E.g., I would like to be able
to do something like the following (as a simpler example) from a bash
script:
#!/bin/bash
myTest=echo /usr/local/bin/R --no-restore
2005 Oct 04
6
boxplot statistics
I have read and reread the boxplot and the boxplot stats page, and I
still cannot understand how and what boxplot shows. I realize that
this might be due to me not knowing enough statistics, but anyway...
First, how does boxplot determine the size of the box? And is the line
inside the box the mean or the median (or something completely
different?) And how does it determine how long out the
2010 Dec 17
2
how to convert "sloppy data" into a time series?
Hi All,
First let me state that I did search for a while on r-help, google, and
using the "sos" package inside of 'R', without much luck. I want to know
how to create a univariate time series from a set of data that will have
huge time gaps in it. For instance, here is a snapshot of a piece of data
that I would like to analyze:
*Row queued_time
2007 Feb 22
1
Diagnostic Tests: Jarque-Bera Test / RAMSEY
Hello R-Users,
The following questions are not R-technical, but more of general statistical
nature.
1. NORMALITY
I built a normal linear regression model and now I want to check for the
residual normality assumption. If I check the distribution graphically and
look at the descriptive characteristics (skewness and kurtosis are below 1),
I would confirm that the residuals are normally
2011 Jan 12
2
syntax for extending a line in a script??
Hello,
A hopefully simple question. I use 'R' through emacs, but I suspect the
following would occur with any manner of text editor:
- my editor has a normally quite handy feature where it will
automatically indent to the appropriate level when I start a new line.
However, this occasionally creates cases where there is no friendly way to
break a long line of code into
2012 Oct 17
2
loop of quartile groups
Greetings R users,
My goal is to generate quartile groups of each variable in my data set. I
would like each experiment to have its designated group added as a
subsequent column. I can accomplish this individually with the following
code:
brks <- with(data_variables,
cut2(var2, g=4))
#I don't want the actual numbers, I need a numbered group
data$test1=factor(brks,
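One way to repeat this for every variable, sketched in base R rather than with cut2() from the post: compute quartile breaks with quantile() and let cut() return group numbers directly. The data frame contents and the "_q" column suffix are assumptions made for illustration.

```r
# Hypothetical data; the poster's data_variables is not shown.
data_variables <- data.frame(var1 = c(5, 3, 8, 1, 7, 2, 6, 4),
                             var2 = c(10, 80, 30, 60, 20, 70, 40, 50))

# Map a numeric vector to quartile group numbers 1..4.
quartile_group <- function(x) {
  brks <- quantile(x, probs = seq(0, 1, by = 0.25), na.rm = TRUE)
  cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)
}

# Add one "<name>_q" column per original variable.
for (v in names(data_variables)) {
  data_variables[[paste0(v, "_q")]] <- quartile_group(data_variables[[v]])
}

print(data_variables)
```

With heavily tied data the quartile breaks may coincide, in which case cut() stops with an error about non-unique breaks and the breaks need to be handled separately.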
2010 Oct 26
2
Forcing results from lm into dataframe
Hi
I need some help getting results from multiple linear models into a dataframe.
Let me explain the problem.
I have a dataframe with ejection fraction results measured over a number of quartiles and grouped by base_study.
My dataframe (800 different base_studies) looks like
> afvtprelvefs
basestudy quartile ef ef_std entropy
CBP0908020 1 21.6 0.53 3.27
2009 Sep 22
5
use of class variable in r as in Proc means of sas
Hi everyone, I need to calculate quantile values of one variable grouped by
another variable,
the same way the aggregate function does (though I think only the median, mean,
or similar functions are possible there).
Could you please help me achieve the same for other quantile
values (5, 10, 25, 75, 90), as for the median using aggregate?
Thanks in advance.
data :
zip price
60000 567000
60001 478654
60004 485647
60001
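aggregate() accepts any function, and extra arguments are passed through to it, so quantile() with a probs vector can be used directly. A minimal sketch: the zip/price layout follows the post, but the rows and the name homes are made up for illustration, since the posted data is cut off.

```r
# Hypothetical prices; only the zip/price layout follows the post.
homes <- data.frame(
  zip   = c(60000, 60000, 60000, 60001, 60001, 60001),
  price = c(567000, 512000, 498000, 478654, 455000, 520000)
)

probs <- c(0.05, 0.10, 0.25, 0.75, 0.90)

# Extra arguments after FUN are passed on to quantile().
by_zip <- aggregate(price ~ zip, data = homes, FUN = quantile, probs = probs)

print(by_zip)
```

The price column of the result is a matrix with one column per requested probability and one row per zip code.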
2003 Oct 28
4
random number generation
Hi every one,
I am trying to generate a normally distributed random variable with the
following descriptive statistics,
min=1, max=99, variance=125, mean=38.32, 1st quartile=38, median=40, 3rd
quartile=40, skewness=-0.274.
I know the "rnorm" will allow me to simulate random numbers with mean 38.32
and Sd=11.18(sqrt(125)). But I need to have the above mentioned descriptive