thr3ads.net - similar to: "R seems to "stall" after several hours on a long series of analyses... where to start?"

Displaying 20 results from an estimated 4000 matches similar to: "R seems to "stall" after several hours on a long series of analyses... where to start?"

R seems to "stall" after several hours on a long series of analyses... where to start?

2005 Nov 07

Not sure where to even start on this.... I'm hoping there's some debugging I can do... I have a loop that cycles through several different data sets (same structure, different info), performing randomForest growth and predictions... saving out the predictions for later study... I get about 5 hours in (9%... of the planned iterations.. yikes!) and R just freezes. This happens in

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 27

"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be used in conjunction. If "strata" is not specified, the class labels will be used.
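To make the advice above concrete, here is a minimal sketch of using "sampsize" together with "strata" on unbalanced two-class data. It assumes the randomForest package is installed; the simulated data, the variable names, and the choice of drawing the minority-class count from each class are illustrative assumptions, not something stated in the original reply.

```r
# Sketch only: assumes the randomForest package is installed.
library(randomForest)

set.seed(1)
# Hypothetical unbalanced data: roughly 10% of cases fall in class "1"
n <- 2000
x <- data.frame(a = rnorm(n), b = rnorm(n))
y <- factor(as.integer(x$a + rnorm(n) > 1.8))

# Size of the smaller class
n_min <- min(table(y))

# Stratify the per-tree sample by class and draw n_min cases from each class
fit <- randomForest(x, y, strata = y, sampsize = c(n_min, n_min), ntree = 100)
print(fit$confusion)
```

Leaving out "strata = y" should give the same stratification here, since the reply says the class labels are used when "strata" is not specified.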

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 27

Sorry for the repost, but I've really been looking, and can't find any syntax direction on this issue... Just browsing the documentation, and searching the list came up short... I have some unbalanced data and was wondering if, in a "0" v "1" classification forest, some combo of these options might yield better predictions when the proportion of one class is low (less

Passing a list object to lapply

2009 Aug 11

Hello, I'm having difficulty passing an object name to a lapply function. Can somebody tell me the trick to make this work?
#Works
T13702 <- TRACKDATA[["13702.xls"]][["data"]]
min(unlist(lapply(list(T13702), function(x) mdy.date(x[1, 2], x[1, 1], x[1, 3]))))
16553
#Works
d<-2
assign(paste("T",substr(names(TRACKDATA)[d],1,(nchar(names(TRACKDATA)[d]

Creating new columns inside a loop

2005 Aug 10

Ok, I know R isn't an optimal environment for looping (or so I've heard) but I have a need to loop through columns of data and create new columns of data based on calculations within rows... I'm sure there's a help file, but I'm not sure what search terms to use to find it! The problem is that these new columns need to have names that I can later access... Like NewVar1,

memory, speed, and assigning results into new v. existing variable

2007 Mar 23

I have a very large data frame, and I'm doing a conversion of all columns into factors. Takes a while (thanks to folks here though, for making faster!), but am wondering about optimization from a memory perspective... Internally, am I better off assigning into a new data frame, or doing one of these: dataframe<-someoperation(dataframe) It would seem that re-assigning into the same data

replacing all NA's in a dataframe with zeros...

2007 Mar 15

I've seen how to replace the NA's in a single column within a data frame: > mydata$ncigs[is.na(mydata$ncigs)]<-0 But this is just one column... I have thousands of columns (!) that I need to do this for, and I can't figure out a way, outside of the dreaded loop, to replace all NA's in an entire data frame (all vars) without naming each var separately. Yikes. I'm racking my
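One common base-R way to do what this post asks is to index the whole data frame with the logical matrix that is.na() returns. The sketch below uses a small made-up data frame; the column names other than "ncigs" are hypothetical, and all columns are assumed to be numeric.

```r
# Hypothetical data frame with NAs scattered across several numeric columns
mydata <- data.frame(
  ncigs   = c(3, NA, 10),
  ndrinks = c(NA, 2, NA),
  age     = c(40, 51, NA)
)

# is.na() on a data frame returns a logical matrix of the same shape,
# which can be used to assign into every NA cell at once
mydata[is.na(mydata)] <- 0
print(mydata)
```

This relies on the columns being numeric. Factor columns would need separate handling, because 0 is not one of their existing levels.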

glmmPQL & Wald-type F-tests

2008 Oct 03

Hello, Might anyone know how to conduct Wald-type F-tests of the fixed effects estimated by glmmPQL? I see this implemented in SAS (GLIMMIX), and have seen it recommended in user group discussions, but haven't come across any code to accomplish it. I understand the anova function treats a glmmPQL fit as an lme fit, with the test assumptions based on maximum likelihood, which is inappropriate

abline(h=whatever) not working in candleChart() (in quantmod)?

2010 Nov 21

Hello, all-- I am having some fun playing with the graphing in quantmod-- very nice! I am writing a function to calculate (and hopefully plot) support and resistance lines, but the usual plot call of "abline(h=value)" does not seem to work. Here's my code: require(quantmod) AAPL<-getYahooData("AAPL") candleChart(AAPL,subset="last 3

TryCatch() with read.csv("http://...")

2005 Nov 23

Hi, folks! I'm trying to pull in data using read.csv("my URL goes here"), and it really works fantastically. Amazing to pull in live data right off the internet, into RAM, and get busy... however... occasionally there is a server problem, or the data are not up yet, and instead of pushing through a nice CSV file, the server sends a 404 "Not Found" page... Since the
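The pattern this post is asking about can be sketched with base R's tryCatch(). The helper name read_csv_or_null is hypothetical, and the sketch assumes that a failed request surfaces as an R error from read.csv(). If the server instead returns an error page with a success status, read.csv() may parse that page without complaint, and an extra check on the result would be needed.

```r
# Hypothetical helper: returns a data frame, or NULL if the read fails
read_csv_or_null <- function(path_or_url) {
  tryCatch(
    read.csv(path_or_url),
    error = function(e) {
      # Report the problem and fall through with NULL instead of stopping
      message("Could not read ", path_or_url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with a location that does not exist (a local path stands in for a bad URL)
result <- read_csv_or_null(file.path(tempdir(), "does_not_exist.csv"))
if (is.null(result)) message("Skipping this source for now")
```

Returning NULL lets a calling loop test the result and move on to the next source instead of halting.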

Anyone have any code for importing data from NAMCS?

2005 Sep 13

The National Ambulatory Medical Care Survey is a free data set from the CDC that I'd like to analyze using the "survey" package in R. Before I dive in, though, it occurred to me that someone may already have gone to the trouble of writing code that will bring in the data and assign the variable names and value labels. This is a big file, so doing it from scratch will take

Not sure if this is "aggregate" or some other task.

2005 May 15

I have data where I've taken some measurements three times... twice in rapid succession so I could check test-retest reliability of a piece of equipment, and then a third measurement some time later. Now I'd like to do an analysis where I have two scores... the first being the mean of the first two taken the same day, and the second being the one taken later. I have a lot of

Attached file following download failure

2009 Aug 12

Hello, I'm working with a package that uses download.file in functions to extract information from remote databases. My current environment is Windows XP Pro SP3, R 2.7. A full extraction can be a great deal of data, so the download is accomplished in generally manageable packets, such that a single download will result in many files, which are written to a directory. It is not uncommon for a

Survey proportions... Can I use population as denominator?

2006 May 23

Just giving the survey package a spin... I'm accustomed to Stata, and it seems very similar in many respects. One thing is throwing me, however. I've gotten my data in, and specified the design. Looks like the weighting is right (based on published population estimates from these data), but now I'd like to check my "marginal means" for proportions against those that have

Insert value from same column of another row (lag across observations)

2005 Oct 09

I know I've done this before, but it's been a while and I can't find quite what I need in the help files or archives. I have a text field in a very large data frame. I'd like to add a column that represents the value from an existing field, from the next record (the data are sorted). I'm trying to represent "what happens tomorrow", so the "today" row would
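A minimal base-R sketch of the "next record" column described in this post: drop the first element of the column and pad the end with NA. The data frame and its column names are made up for illustration, and the rows are assumed to be already sorted.

```r
# Hypothetical sorted data frame with a text field
df <- data.frame(
  day    = 1:4,
  status = c("up", "down", "flat", "up"),
  stringsAsFactors = FALSE
)

# Shift the column up by one row; the last row has no "tomorrow", so it gets NA
df$status_next <- c(df$status[-1], NA)
print(df)
```

If the text field were stored as a factor, converting it with as.character() first would avoid shifting the underlying integer codes.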

"Survey" package and NAMCS data... unsure of specification

2005 Oct 04

Hello, all. I wanted to use the "survey" package to analyze data from the National Ambulatory Medical Care Survey, and am having some difficulty translating the analysis keywords from one package (Stata) to the other (R). The data were collected using multistage probability sampling, and there are variables included to identify the sampling units and weights. Documentation from the

calcMin

2013 Feb 19

I tried to use calcMin with a function that uses a number of ... arguments (all args from resid on) besides the vector of parameters being fit. Same idea as optim, nlm, nlminb for which this form of ... syntax works. But with calcMin I get an error regarding unused arguments. No partial matches to previous arguments that I can see. Anybody know the reason or fix for this?

randomForest gives different results for formula call v. x, y methods. Why?

2007 Apr 29

Just out of curiosity, I took the default "iris" example in the RF helpfile... but seeing the admonition against using the formula interface for large data sets, I wanted to play around a bit to see how the various options affected the output. Found something interesting I couldn't find documentation for... Just like the example... > set.seed(12) # to be sure I have

Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 25

Just browsing the documentation, and searching the list came up short... I have some unbalanced data and was wondering if, in a "0" v "1" classification forest, these options might yield better predictions when the proportion of one class is low (less than 10% in a sample of 2,000 observations). Not sure how to specify these terms... from the docs, we have: classwt: Priors

How might I -remove- a tree from a random forest?

2007 May 08

I see the function "getTree", which is very interesting. As I'm trying to teach myself more and more about R, and dealing with lists, it occurred to me that it might be fun to remove (as in delete) a single tree from a forest...say to go from 500 to 499. I know, I know... "why?" Why, to play, of course! I've been doing a lot of reading on various tuning parameters,
