thr3ads.net - similar to: "Randomly selecting one row for each factor level"

Displaying 20 results from an estimated 8000 matches similar to: "Randomly selecting one row for each factor level"

Randomly selecting one row for each factor level [Broadcast]

2006 Apr 20

The following should work:

> dfr.samp <- dfr[tapply(1:nrow(dfr), dfr$x, sample, 1),]
> dfr.samp
   x  y z
10 a 10 J
2  b  2 B
9  c  9 I

Andy

From: Kelly Hildner
>
> I don't use R much, and I have been unable to figure out how to get the
> subset of my data frame that I would like.
>
> For example, if this were my data frame:
>
> > dfr <-
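A minimal sketch of the same idea, with one caveat made explicit. The data frame below is hypothetical (only the column names x, y, z appear in the thread), and pick_one is a helper name introduced here for illustration. It guards against a documented quirk of sample(): when given a single number n, sample(n, 1) draws from 1:n rather than returning n, which matters for a level that has only one row.

```r
# Hypothetical data frame; only the column names x, y, z come from the thread.
set.seed(1)
dfr <- data.frame(x = rep(c("a", "b", "c"), times = c(4, 3, 1)),
                  y = 1:8,
                  z = LETTERS[1:8])

# pick_one (hypothetical helper) returns one element of idx chosen at random.
# Indexing with sample.int() avoids the sample(8, 1) special case for level
# "c", whose only row index is the single number 8.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

# One randomly chosen row index per level of x, then the matching rows.
rows <- tapply(seq_len(nrow(dfr)), dfr$x, pick_one)
dfr.samp <- dfr[as.integer(rows), ]
```

With this layout, level "c" always yields row 8, while the rows for "a" and "b" vary with the seed.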

color coding a legend

2003 Jun 10

I'm using R 1.6.2 on a Windows 2000 machine. I've plotted the results of an MDS run labeled by a numerical ID, and color coded by a group code:

plot(cv.mds.spr$points, type="n", main="Non-Metric Multidimensional Scaling of SprRun CV Watersheds")
text(cv.mds.spr$points, labels = as.character(cv.wshed.id.spr), col = codes(cv.wshed.grp), cex=.75)

Question is, how do I

Nonparametric Tukey-type multiple comparisons "Nemenyi" test

2005 May 02

I am trying to do a Nonparametric Tukey-type multiple comparison post-hoc test to determine which groups are significantly different. I have read the dialogue on this topic from the R-help, and am still not clear why no statistical packages include this test as an option? Is it not an appropriate test to conduct on non-normally distributed data? Is the only option to calculate it by hand

subset with NA

2003 Feb 20

Easy question that I can't find an answer for. I'm trying to subset a data frame and want to exclude the positive values, i.e. I want the NA values. My data:

> summary(temp$tuna)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1       2       3       3       4       5    1211

Querying for subset(temp, tuna %in% "NA", select.... subset(temp, tuna == NA,
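A short sketch of why the two attempts above return nothing and what is usually used instead. The data frame is a small hypothetical stand-in for temp. In R, any comparison with NA yields NA, and subset() drops rows whose condition is NA, so is.na() is the usual test for missing values.

```r
# Hypothetical stand-in for the poster's data frame.
temp <- data.frame(id = 1:6, tuna = c(1, NA, 3, NA, 5, 2))

# tuna == NA evaluates to NA for every row, so subset() keeps no rows.
none <- subset(temp, tuna == NA)

# is.na() tests for missingness directly.
na_rows <- subset(temp, is.na(tuna))

# Equivalent selection with plain indexing.
na_rows2 <- temp[is.na(temp$tuna), ]
```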

Memory usage in prcomp

2016 Mar 22

Hi All: I am running prcomp on a very large array, roughly [500000, 3650]. The array itself is 16GB. I am running on a Unix machine and am running "top" at the same time and am quite surprised to see that the application memory usage is 76GB. I have the "tol" set very high (.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD

Memory usage in prcomp

2016 Mar 22

reshape: duplicate rows to multiple cols

2002 Oct 25

I have a dataframe that I'm trying to reshape, and need advice. My data:

> klam.merge[200:225,]
           stream lulc          x sumlength    pct.lgth
200 1223030419685   92 0.25000000      9.89  2.52780586
201 1223030419686   23 0.00274154      4.73  0.05796068
202 1223030419686   41 0.75009917      4.73 15.85833341
203 1223030419686   42 2.65000000      4.73 56.02536998
204

Newbie Time Series Questions

2002 Oct 17

I have a data set of monthly river flows from 1960-2000, which are similar in structure to the nottem data:

> klam.flow
      Oct  Nov  Dec  Jan  Feb  Mar  Apr  May  Jun Jul  Aug  Sep
1961 1461 1716 2524 1773 1906 2005 1756 1575 1387 983 1094 1382
1962 1907 2253 1985 1907 1769 1676 2634 1386  929 766  968 1309
...

I tried plotting with

> ts.plot(klam.flow)

Which quickly led me to
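One possible way to get a single continuous monthly series from a year-by-month table like this, sketched under assumptions: the two rows shown above are typed in as a matrix, and the row label 1961 is taken to be a water year that starts in October 1960 (consistent with the stated 1960-2000 range, but not confirmed in the post). The name flow.ts is introduced here for illustration.

```r
# The two rows quoted in the post, entered as a year-by-month matrix.
klam.flow <- matrix(
  c(1461, 1716, 2524, 1773, 1906, 2005, 1756, 1575, 1387, 983, 1094, 1382,
    1907, 2253, 1985, 1907, 1769, 1676, 2634, 1386,  929, 766,  968, 1309),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1961", "1962"),
                  c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar",
                    "Apr", "May", "Jun", "Jul", "Aug", "Sep")))

# t() then as.vector() reads the values row by row, i.e. in calendar order.
# Assumption: the first value belongs to October 1960.
flow.ts <- ts(as.vector(t(klam.flow)), start = c(1960, 10), frequency = 12)
```

Calling plot(flow.ts) on the result draws one line over time instead of treating each month column as a separate series.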

Booting problem with memdisk + Thinkpad + USB

2009 Jul 12

Hi, I encountered a booting problem with memdisk 2.83, USB and IBM Thinkpad T61, apparently the same issue as described here: http://syslinux.zytor.com/archives/2008-April/009850.html The boot process always stops after "Loading boot sector... booting...". With debug tracers enabled, the last few output lines are: Loading boot sector... FR<p>Dbooting...

How to hold a value(Mean sq) with a string

2008 Mar 06

Hi all: Can someone advise me on how to hold the residuals Mean sq value on a string so it can be used in other calculations. I was trying something like this:

Msquare<-dfr$Mean sq

but it fails. Thanks

dfr <- read.table(textConnection("percentQ Efficiency
1.565 0.0125
1.94 0.0213
0.876 0.003736
1.027 0.006
1.536 0.0148
1.536 0.0162
2.607 0.02
1.456 0.0157
2.16 0.0103
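A hedged sketch of one way to store the residual mean square in a variable. It uses only the nine data rows visible in the snippet, and it assumes a simple linear model of Efficiency on percentQ, since the poster's actual model is cut off. The "Mean Sq" column contains a space, so it is reached with bracket indexing on the anova table rather than with $.

```r
# The nine rows visible in the post (the original data continue beyond these).
dfr <- read.table(text = "percentQ Efficiency
1.565 0.0125
1.94 0.0213
0.876 0.003736
1.027 0.006
1.536 0.0148
1.536 0.0162
2.607 0.02
1.456 0.0157
2.16 0.0103", header = TRUE)

# Assumed model; the thread does not show which model was fitted.
fit <- lm(Efficiency ~ percentQ, data = dfr)
tab <- anova(fit)

# Row "Residuals", column "Mean Sq" of the anova table, as a plain number.
Msquare <- tab["Residuals", "Mean Sq"]
```

Msquare is then an ordinary numeric value that can be used in further arithmetic.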

model.matrix (via predict) (PR#2206)

2002 Oct 24

Full_Name: Glenn Stone
Version: 1.5.1 and 1.6.0
OS: win2000
Submission from: (NULL) (168.140.227.9)

The following code produces incorrect fitted values in version 1.5.1 and an error in 1.6.0

Error in "contrasts<-"(*tmp*, value = "contr.treatment") :
        contrasts apply only to factors
In addition: Warning message:
variable ihalf is not a factor in:

R-alpha: User friendly functions

1997 Apr 08

A loose idea for *post*-0.50 development: I've been giving some (but not all that much) thought to whether some of the conceptual difficulties facing newcomers could be avoided by having simplified functions for common operations. We already have parts of this, e.g. in Kurt's ctest routines. Specifically, I was thinking about data frames: How about

I have a problem -- thank you

2010 Oct 04

Dear professor: thank you for your help. With your help I developed the nomogram successfully. After that I wanted to do the internal validation of the model. I used bootpred to do it, and then I encountered a problem again, as follows: (Error in complete.cases(x, y, wt) : not all arguments have the same length). I hope you can tell me where the mistake is, and maybe I have

Simple programming question

2007 May 18

Hi R-users, I have a simple question for R heavy users. If I have a data frame like this

dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4), var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ),]

and I want to score values or points in variable named "var3" following this kind of logic:

1. the highest value of var3 within category (variable named

Indexing to Insert values from a dataframe into a matrix

2011 Jun 29

Hello, I think this is a simple problem but I am not coming up with a simple solution. I think it is just an indexing problem. I can easily replace values in a matrix from a dataframe when the dataframe has row and column numbers. In the example below I use row and column names and I cannot get it to work.

#make a matrix where rows and columns are the lat and long for a bounding box of Australia
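A minimal sketch of name-based matrix assignment, since the poster's own example is cut off. All names and values here (m, vals, the lat/lon grid) are hypothetical. R accepts a two-column character matrix as an index, matching the first column against row names and the second against column names, so each data frame row addresses exactly one cell.

```r
# Hypothetical coarse grid; dimnames hold latitude (rows) and longitude (cols).
lats <- c("-10", "-20", "-30")
lons <- c("110", "120", "130", "140")
m <- matrix(NA_real_, nrow = length(lats), ncol = length(lons),
            dimnames = list(lats, lons))

# Hypothetical data frame of values keyed by lat and lon.
vals <- data.frame(lat = c(-20, -30), lon = c(110, 140), value = c(1.5, 2.5))

# Two-column character index: column 1 matches row names, column 2 column names.
idx <- cbind(as.character(vals$lat), as.character(vals$lon))
m[idx] <- vals$value
```

The character conversion has to reproduce the dimnames exactly (for example "-20", not "-20.0"), otherwise the lookup fails with a subscript error.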

GSoC 2011 Weighting Schemes

2011 Mar 28

Hi, guys. I am Wenjin from the Graduate School of Chinese Academy of Science, pursuing a master's degree, and my current research interests include using data mining and information retrieval technology to analyze software engineering (SE) data and support SE. I am very interested in the "Weight Schemes" project, and in the last few days I have learnt some detail about DFR model family by

Newbie help with ANOVA and lm.

2010 Feb 27

Would someone be so kind as to explain in English what the ANOVA code (anova.lm) is doing? I am having a hard time reconciling what the text books have as a brute force regression and the formula algorithm in 'R'. Specifically I see:

p <- object$rank
if (p > 0L) {
    p1 <- 1L:p
    comp <- object$effects[p1]
    asgn <-

Number of NA's in every second column

2007 May 20

Hi R-users, How do I count the number of NA's in each row, over every second column of my data frame? As a starting point:

dfr <- data.frame(sapply(x, function(x) sample(0:x, 6, replace = TRUE)))
dfr[dfr==0] <- NA

So, I would like to count the number of NA in row one, two, three etc. of columns X1, X3, X5 etc.

Thanks in advance
Lauri
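A short sketch of one way to do this. The post does not define x, so x <- 1:6 below is an assumption made only so the starting code runs; odd and na.count are names introduced here. The idea is to select columns 1, 3, 5, ... with seq() and count missing values per row with rowSums() on the is.na() result.

```r
set.seed(42)
x <- 1:6  # assumption: x is not defined in the post

# The poster's starting point: six columns X1..X6, zeros turned into NA.
dfr <- data.frame(sapply(x, function(x) sample(0:x, 6, replace = TRUE)))
dfr[dfr == 0] <- NA

# Positions of every second column, starting with the first (X1, X3, X5).
odd <- seq(1, ncol(dfr), by = 2)

# is.na() gives a logical matrix; rowSums() counts the TRUEs in each row.
na.count <- rowSums(is.na(dfr[, odd]))
```

na.count has one entry per row of dfr, in row order.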

Question about adding text to xYplot(Hmisc)

2008 May 30

Hello, I have been trying to make a graph that has error bars and text at specific positions. I used the following code from the help file of xYplot(Hmisc) as an example, except that I added a myPanel function, which is just supposed to add letters of the alphabet at positions aligned at y = 3. It constantly gives me the error: "Error using packet 1 argument "subscripts" is

stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

2017 May 31

OTOH,

> sapply(1:9, function(i){
+ sum(dfr$time <= quantile(dfr$time, 1./3., type = i))
+ })
[1] 8 8 6 6 6 6 8 6 6

Only the default (type = 7) and the first two types give the result lines() gives now. I think there are plenty of reasons why any of the other 6 types might be better suited in Tukey's method. So to my mind, changing the definition of line() to give sensible
