Displaying 20 results from an estimated 8000 matches similar to: "Randomly selecting one row for each factor level"
2006 Apr 20
1
Randomly selecting one row for each factor level [Broadcast]
The following should work:
> dfr.samp <- dfr[tapply(1:nrow(dfr), dfr$x, sample, 1),]
> dfr.samp
x y z
10 a 10 J
2 b 2 B
9 c 9 I
Andy
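A minimal reproducible sketch of the approach above. The data frame here is hypothetical (the original `dfr` is cut off in the quoted message), and `pick_one` is an illustrative helper name, not part of the reply. One caveat of passing `sample` directly: when a level has a single row whose index is n > 1, `sample(n, 1)` draws from `1:n` rather than returning n, so the sketch indexes into the vector of row numbers instead.

```r
# Hypothetical data: three factor levels; level "c" has only one row.
dfr <- data.frame(x = c("a", "a", "b", "b", "b", "c"),
                  y = 1:6,
                  z = LETTERS[1:6])

# Pick one element of i at random; safe even when length(i) == 1.
pick_one <- function(i) i[sample.int(length(i), 1)]

set.seed(1)  # only so the illustration is repeatable
rows <- tapply(seq_len(nrow(dfr)), dfr$x, pick_one)
dfr.samp <- dfr[rows, ]
dfr.samp
```

Each level of `x` contributes exactly one row, and the single-row level always returns its own row.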
From: Kelly Hildner
>
> I don't use R much, and I have been unable to figure out how
> to get the
> subset of my data frame that I would like.
>
> For example, if this were my data frame:
>
> > dfr <-
2003 Jun 10
1
color coding a legend
I'm using R 1.6.2 on a Windows 2000 machine.
I've plotted the results of an MDS run labeled by a numerical ID, and
color coded by a group code:
plot(cv.mds.spr$points, type="n", main="Non-Metric Multidimensional
Scaling of SprRun CV Watersheds")
text(cv.mds.spr$points, labels = as.character(cv.wshed.id.spr), col =
codes(cv.wshed.grp), cex=.75)
Question is, how do I
2005 May 02
2
Nonparametric Tukey-type multiple comparisons "Nemenyi" test
I am trying to do a Nonparametric Tukey-type multiple comparison
post-hoc test to determine which groups are significantly different. I
have read the dialogue on this topic from the R-help, and am still not
clear why no statistical packages include this test as an option? Is it
not an appropriate test to conduct on non-normally distributed data? Is
the only option to calculate it by hand
2003 Feb 20
2
subset with NA
Easy question that I can't find an answer for. I'm trying to subset a
data frame and want to exclude the positive values, i.e. I want the NA
values.
My data:
> summary(temp$tuna)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 2 3 3 4 5 1211
Querying for
subset(temp, tuna %in% "NA", select....
subset(temp, tuna == NA,
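Neither attempt above can select the missing values: `tuna == NA` evaluates to `NA` for every row, which `subset` treats as "do not keep", and `%in% "NA"` compares against the literal string "NA". A minimal sketch using `is.na()`, with a hypothetical stand-in for `temp` (the poster's real data is not shown, and the `site` column is invented for illustration):

```r
# Hypothetical stand-in for the poster's data frame.
temp <- data.frame(site = 1:6, tuna = c(1, NA, 3, NA, 5, 2))

# Comparison with NA is itself NA, so this keeps no rows.
nrow(subset(temp, tuna == NA))

# is.na() tests for missingness directly.
temp.na <- subset(temp, is.na(tuna))
temp.na
```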
2016 Mar 22
3
Memory usage in prcomp
Hi All:
I am running prcomp on a very large array, roughly [500000, 3650]. The array itself is 16GB. I am running on a Unix machine and am running "top" at the same time and am quite surprised to see that the application memory usage is 76GB. I have the "tol" set very high (.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD
2002 Oct 25
1
reshape: duplicate rows to multiple cols
I have a dataframe that I'm trying to reshape, and need advice. My data:
> klam.merge[200:225,]
stream lulc x sumlength pct.lgth
200 1223030419685 92 0.25000000 9.89 2.52780586
201 1223030419686 23 0.00274154 4.73 0.05796068
202 1223030419686 41 0.75009917 4.73 15.85833341
203 1223030419686 42 2.65000000 4.73 56.02536998
204
2002 Oct 17
4
Newbie Time Series Questions
I have a data set of monthly river flows from 1960-2000, which are
similar in structure to the nottem data:
> klam.flow
Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep
1961 1461 1716 2524 1773 1906 2005 1756 1575 1387 983 1094 1382
1962 1907 2253 1985 1907 1769 1676 2634 1386 929 766 968 1309
...
I tried plotting with
> ts.plot(klam.flow)
Which quickly led me to
2009 Jul 12
1
Booting problem with memdisk + Thinkpad + USB
Hi,
I encountered a booting problem with memdisk 2.83, USB and IBM Thinkpad
T61, apparently the same issue as described here:
http://syslinux.zytor.com/archives/2008-April/009850.html
The boot process always stops after "Loading boot sector... booting...".
With debug tracers enabled, the last few output lines are:
Loading boot sector... FR<p>Dbooting...
2008 Mar 06
2
How to hold a value(Mean sq) with a string
Hi all:
Can someone advise me on how to hold the residuals
Mean sq value in a string
so it can be used in other calculations?
I was trying something like this:
Msquare<-dfr$Mean sq but it fails. Thanks
dfr <- read.table(textConnection("percentQ Efficiency
1.565 0.0125
1.94 0.0213
0.876 0.003736
1.027 0.006
1.536 0.0148
1.536 0.0162
2.607 0.02
1.456 0.0157
2.16 0.0103
2002 Oct 24
3
model.matrix (via predict) (PR#2206)
Full_Name: Glenn Stone
Version: 1.5.1 and 1.6.0
OS: win2000
Submission from: (NULL) (168.140.227.9)
The following code produces incorrect fitted values in version 1.5.1 and an
error in 1.6.0
Error in "contrasts<-"(*tmp*, value = "contr.treatment") :
contrasts apply only to factors
In addition: Warning message:
variable ihalf is not a factor in:
1997 Apr 08
1
R-alpha: User friendly functions
A loose idea for *post*-0.50 development
I've been giving some (but not all that much) thought to whether
some of the conceptual difficulties facing newcomers could be avoided
by having simplified functions for common operations. We already have
parts of this, e.g. in Kurt's ctest routines. Specifically, I was
thinking about data frames: How about
2010 Oct 04
2
i have aproblem --thank you
Dear professor:
Thank you for your help. With your help I developed the nomogram successfully.
After that I wanted to do internal validation of the model. I used bootpred to do it, and then I encountered a problem again, like this: (Error in complete.cases(x, y, wt) : not all arguments have the same length)
I hope you can tell me where the mistake is, and maybe I have
2007 May 18
4
Simple programming question
Hi R-users,
I have a simple question for R heavy users. If I have a data frame like this
dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ),]
and I want to score values or points in variable named "var3" following this
kind of logic:
1. the highest value of var3 within category (variable named
2011 Jun 29
2
Indexing to Insert values from a dataframe into a matrix
Hello,
I think this is a simple problem but I am not coming up with a simple
solution. I think it is just an indexing problem.
I can easily replace values in a matrix from a dataframe when the
dataframe has row and column numbers. In the example below I use row
and column names and I cannot get it to work.
#make a matrix where rows and columns are the lat and long for a
bounding box of Australia
2011 Mar 28
2
GSoC 2011 Weighting Schemes
Hi, guys
I am Wenjin from Graduate School of Chinese Academy of Science, pursuing a
master's degree. My current research interests include using data mining
and information retrieval technology to analyze software engineering (SE)
data and support SE.
I am very interested in the "Weight Schemes" project, and in the last few
days I have learned some details about the DFR model family by
2010 Feb 27
1
Newbie help with ANOVA and lm.
Would someone be so kind as to explain in English what the ANOVA code (anova.lm) is doing? I am having a hard time reconciling what the text books have as a brute force regression and the formula algorithm in 'R'. Specifically I see:
p <- object$rank
if (p > 0L) {
p1 <- 1L:p
comp <- object$effects[p1]
asgn <-
2007 May 20
2
Number of NA's in every second column
Hi R-users,
How do I count the number of NAs in each row across every second column of my
data frame?
As a starting point:
dfr <- data.frame(sapply(x, function(x) sample(0:x, 6, replace = TRUE)))
dfr[dfr==0] <- NA
So, I would like to count the number of NA in row one, two, three etc. of
columns X1, X3, X5 etc.
Thanks in advance
Lauri
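One possible way to get these counts, sketched under an assumption: `x` is not defined in the post, so it is taken here to be `1:6`, which yields columns X1 to X6. The names `odd.cols` and `na.count` are illustrative.

```r
set.seed(42)  # only so the illustration is repeatable
x <- 1:6      # assumed; `x` is not defined in the post
dfr <- data.frame(sapply(x, function(x) sample(0:x, 6, replace = TRUE)))
dfr[dfr == 0] <- NA

# Positions of every second column, starting with the first: X1, X3, X5, ...
odd.cols <- seq(1, ncol(dfr), by = 2)

# Number of NAs per row within those columns only.
na.count <- rowSums(is.na(dfr[, odd.cols, drop = FALSE]))
na.count
```

`is.na()` on a data frame returns a logical matrix, so `rowSums()` counts the `TRUE` entries row by row; `drop = FALSE` keeps the result a data frame even if only one column is selected.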
2008 May 30
1
Question about adding text to xYplot(Hmisc)
Hello,
I have been trying to make a graph that has error bars and text at
specific positions.
I used the following code from the help file of xYplot(Hmisc) as an
example, except that I added a myPanel function, which is just supposed to add
letters of the alphabet at positions aligned at y = 3.
It constantly gives me the error:
"Error using packet 1 argument "subscripts" is
2017 May 31
2
stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
OTOH,
> sapply(1:9, function(i){
+ sum(dfr$time <= quantile(dfr$time, 1./3., type = i))
+ })
[1] 8 8 6 6 6 6 8 6 6
Only the default (type = 7) and the first two types give the result lines()
gives now. I think there are plenty of reasons why any of the other
6 types might be better suited to Tukey's method.
So to my mind, changing the definition of line() to give sensible