similar to: Randomly selecting one row for each factor level

2006 Apr 20
1
Randomly selecting one row for each factor level [Broadcast]
The following should work:
> dfr.samp <- dfr[tapply(1:nrow(dfr), dfr$x, sample, 1),]
> dfr.samp
   x  y z
10 a 10 J
2  b  2 B
9  c  9 I
Andy
From: Kelly Hildner
>
> I don't use R much, and I have been unable to figure out how to get the
> subset of my data frame that I would like.
>
> For example, if this were my data frame:
>
> > dfr <-
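The tapply() answer above can be sketched as a runnable example. The original data frame is cut off in the snippet, so `dfr` below is a hypothetical stand-in, and `pick_one` is an illustrative helper rather than part of the reply. It also guards against one known quirk: `sample(x, 1)` with a single number `x` draws from `1:x`, which matters when a level has only one row.

```r
# Hypothetical data; the poster's actual data frame is truncated in the snippet.
set.seed(1)
dfr <- data.frame(x = rep(c("a", "b", "c"), times = c(4, 3, 1)),
                  y = 1:8,
                  z = LETTERS[1:8])

# Pick one element of idx by position, so a length-one idx is returned as-is
# instead of being treated by sample() as the range 1:idx.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

# Split the row numbers by level of x, then choose one row number per level.
rows <- vapply(split(seq_len(nrow(dfr)), dfr$x), pick_one, integer(1))
dfr.samp <- dfr[rows, ]
dfr.samp
```

Level "c" has a single row (row 8), so that row is always the one selected for it.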
2003 Jun 10
1
color coding a legend
I'm using R 1.6.2 on a Windows 2000 machine. I've plotted the results of an MDS run labeled by a numerical ID, and color coded by a group code:
plot(cv.mds.spr$points, type="n", main="Non-Metric Multidimensional Scaling of SprRun CV Watersheds")
text(cv.mds.spr$points, labels = as.character(cv.wshed.id.spr), col = codes(cv.wshed.grp), cex=.75)
Question is, how do I
2005 May 02
2
Nonparametric Tukey-type multiple comparisons "Nemenyi" test
I am trying to do a Nonparametric Tukey-type multiple comparison post-hoc test to determine which groups are significantly different. I have read the dialogue on this topic from the R-help, and am still not clear why no statistical packages include this test as an option? Is it not an appropriate test to conduct on non-normally distributed data? Is the only option to calculate it by hand
2003 Feb 20
2
subset with NA
Easy question that I can't find an answer for. I'm trying to subset a data frame and want to exclude the positive values, i.e. I want the NA values. My data:
> summary(temp$tuna)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1       2       3       3       4       5    1211
Querying for
subset(temp, tuna %in% "NA", select....
subset(temp, tuna == NA,
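A minimal sketch of why the `tuna == NA` attempt returns nothing, and of the usual alternative with is.na(). The data frame `temp` here is a small hypothetical stand-in for the poster's data.

```r
# Hypothetical stand-in for the poster's `temp` data frame.
temp <- data.frame(id = 1:6, tuna = c(1, NA, 3, NA, 5, NA))

# Comparing with NA yields NA for every row, and subset() drops rows
# whose condition is NA, so this selects zero rows.
nrow(subset(temp, tuna == NA))

# is.na() is the test for missing values.
temp.na <- subset(temp, is.na(tuna))

# Equivalent selection with plain indexing.
temp.na2 <- temp[is.na(temp$tuna), ]
temp.na
```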
2016 Mar 22
3
Memory usage in prcomp
Hi All: I am running prcomp on a very large array, roughly [500000, 3650]. The array itself is 16GB. I am running on a Unix machine and am running "top" at the same time and am quite surprised to see that the application memory usage is 76GB. I have the "tol" set very high (.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD
2002 Oct 25
1
reshape: duplicate rows to multiple cols
I have a dataframe that I'm trying to reshape, and need advice. My data:
> klam.merge[200:225,]
           stream lulc          x sumlength    pct.lgth
200 1223030419685   92 0.25000000      9.89  2.52780586
201 1223030419686   23 0.00274154      4.73  0.05796068
202 1223030419686   41 0.75009917      4.73 15.85833341
203 1223030419686   42 2.65000000      4.73 56.02536998
204
2002 Oct 17
4
Newbie Time Series Questions
I have a data set of monthly river flows from 1960-2000, which are similar in structure to the nottem data:
> klam.flow
      Oct  Nov  Dec  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep
1961 1461 1716 2524 1773 1906 2005 1756 1575 1387  983 1094 1382
1962 1907 2253 1985 1907 1769 1676 2634 1386  929  766  968 1309
...
I tried plotting with
> ts.plot(klam.flow)
Which quickly led me to
2009 Jul 12
1
Booting problem with memdisk + Thinkpad + USB
Hi, I encountered a booting problem with memdisk 2.83, USB and IBM Thinkpad T61, apparently the same issue as described here: http://syslinux.zytor.com/archives/2008-April/009850.html The boot process always stops after "Loading boot sector... booting...". With debug tracers enabled, the last few output lines are: Loading boot sector... FR<p>Dbooting...
2008 Mar 06
2
How to hold a value(Mean sq) with a string
Hi all: Can someone advise me on how to hold the residuals Mean sq value in a string so it can be used in other calculations? I was trying something like this:
Msquare<-dfr$Mean sq
but it fails. Thanks
dfr <- read.table(textConnection("percentQ Efficiency
1.565 0.0125
1.94 0.0213
0.876 0.003736
1.027 0.006
1.536 0.0148
1.536 0.0162
2.607 0.02
1.456 0.0157
2.16 0.0103
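One way to keep the residual mean square for later use is to index the ANOVA table by row and column name. This is only a sketch: it uses just the nine data rows visible in the truncated snippet, and the model `Efficiency ~ percentQ` is an assumption, since the poster's model is not shown.

```r
# Only the rows visible in the snippet; the original data are truncated.
dfr <- read.table(text = "percentQ Efficiency
1.565 0.0125
1.94 0.0213
0.876 0.003736
1.027 0.006
1.536 0.0148
1.536 0.0162
2.607 0.02
1.456 0.0157
2.16 0.0103", header = TRUE)

# Assumed model; the poster's actual formula is not given.
fit <- lm(Efficiency ~ percentQ, data = dfr)
aov.tab <- anova(fit)

# The column name "Mean Sq" contains a space, so `$Mean sq` is a syntax
# error; index by name instead.
Msquare <- aov.tab["Residuals", "Mean Sq"]
Msquare
```

`Msquare` is an ordinary numeric value, so it can be used directly in further arithmetic.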
2002 Oct 24
3
model.matrix (via predict) (PR#2206)
Full_Name: Glenn Stone
Version: 1.5.1 and 1.6.0
OS: win2000
Submission from: (NULL) (168.140.227.9)
The following code produces incorrect fitted values in version 1.5.1 and an error in 1.6.0
Error in "contrasts<-"(*tmp*, value = "contr.treatment") :
  contrasts apply only to factors
In addition: Warning message:
variable ihalf is not a factor in:
1997 Apr 08
1
R-alpha: User friendly functions
A loose idea for *post*-0.50 development. I've been giving some (but not all that much) thought to whether some of the conceptual difficulties facing newcomers could be avoided by having simplified functions for common operations. We already have parts of this, e.g. in Kurt's ctest routines. Specifically, I was thinking about data frames: How about
2010 Oct 04
2
i have aproblem --thank you
Dear professor: thank you for your help. With your help I developed the nomogram successfully. After that I wanted to do the internal validation of the model. I used bootpred to do it, and then I encountered a problem again: "Error in complete.cases(x, y, wt) : not all arguments have the same length". I hope you can tell me where the mistake is, and maybe I have
2007 May 18
4
Simple programming question
Hi R-users, I have a simple question for R heavy users. If I have a data frame like this
dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4), var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ),]
and I want to score values or points in variable named "var3" following this kind of logic:
1. the highest value of var3 within category (variable named
2011 Jun 29
2
Indexing to Insert values from a dataframe into a matrix
Hello, I think this is a simple problem but I am not coming up with a simple solution. I think it is just an indexing problem. I can easily replace values in a matrix from a dataframe when the dataframe has row and column numbers. In the example below I use row and column names and I cannot get it to work.
#make a matrix where rows and columns are the lat and long for a bounding box of Australia
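The poster's example is cut off, so the following is only a sketch with hypothetical latitudes, longitudes, and values. It relies on the fact that a matrix with dimnames can be indexed by a two-column character matrix of (row name, column name) pairs.

```r
# Hypothetical grid; the poster's bounding box is not shown in the snippet.
lats <- c("-10", "-20", "-30")
lons <- c("115", "130", "145")
m <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(lats, lons))

# Hypothetical data frame of values keyed by lat and long.
df <- data.frame(lat = c(-10, -30), lon = c(130, 115), value = c(1.5, 2.5))

# Build (row name, column name) pairs; names are matched as character strings.
idx <- cbind(as.character(df$lat), as.character(df$lon))
m[idx] <- df$value
m
```

The names in the data frame must match the dimnames exactly as strings, so coordinates that print differently (for example "-10" versus "-10.0") would need to be formatted consistently first.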
2011 Mar 28
2
GSoC 2011 Weighting Schemes
Hi guys, I am Wenjin from the Graduate School of Chinese Academy of Science, pursuing a master's degree. My current research interests include using data mining and information retrieval technology to analyze software engineering (SE) data and support SE. I am very interested in the "Weight Schemes" project, and in the last few days I have learnt some details about the DFR model family by
2010 Feb 27
1
Newbie help with ANOVA and lm.
Would someone be so kind as to explain in English what the ANOVA code (anova.lm) is doing? I am having a hard time reconciling what the text books have as a brute force regression and the formula algorithm in 'R'. Specifically I see:
p <- object$rank
if (p > 0L) {
    p1 <- 1L:p
    comp <- object$effects[p1]
    asgn <-
2007 May 20
2
Number of NA's in every second column
Hi R-users, How do I calculate the number of NAs in each row over every second column in my data frame? As a starting point:
dfr <- data.frame(sapply(x, function(x) sample(0:x, 6, replace = TRUE)))
dfr[dfr==0] <- NA
So, I would like to count the number of NAs in row one, two, three etc. of columns X1, X3, X5 etc. Thanks in advance, Lauri
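A minimal sketch of one way to do this with is.na() and rowSums(). The poster's `x` is not defined in the snippet, so `1:6` is assumed here to give six columns named X1 to X6.

```r
# `x` is undefined in the original post; 1:6 is an assumption.
set.seed(42)
dfr <- data.frame(sapply(1:6, function(x) sample(0:x, 6, replace = TRUE)))
dfr[dfr == 0] <- NA

# Positions of every second column, starting from the first: X1, X3, X5.
odd <- seq(1, ncol(dfr), by = 2)

# is.na() gives a logical matrix; rowSums() counts the TRUEs per row.
na.count <- rowSums(is.na(dfr[, odd, drop = FALSE]))
na.count
```

`drop = FALSE` keeps the selection a data frame even if only one column is selected, so rowSums() still works.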
2008 May 30
1
Question about adding text to xYplot(Hmisc)
Hello, I have been trying to make a graph that has error bars and text at specific positions. I used the following code from the help file of xYplot(Hmisc) as an example, except I added a myPanel function, which is just supposed to add letters from the alphabet at the position aligned at y = 3. It constantly gives me an error: "Error using packet 1 argument "subscripts" is
2017 May 31
2
stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
OTOH,
> sapply(1:9, function(i){
+   sum(dfr$time <= quantile(dfr$time, 1./3., type = i))
+ })
[1] 8 8 6 6 6 6 8 6 6
Only the default (type = 7) and the first two types give the result lines() gives now. I think there are plenty of reasons to give why any of the other 6 types might be better suited in Tukey's method. So to my mind, changing the definition of line() to give sensible