thr3ads.net - similar to: "Discretizing data rows into regular intervals"

Displaying 20 results from an estimated 2000 matches similar to: "Discretizing data rows into regular intervals"

2009 May 05

Oracle-JRuby error

I am trying to migrate from RoR/MySQL to JRoR/Oracle. I am using Active Record JDBC to talk to the database. The migration process to create and populate the database tables has been painful. My latest issue is that the method new_date is undefined in the JDBC adapter. I have the following gems installed:

    *** LOCAL GEMS ***
    actionmailer (2.2.2)
    actionpack (2.2.2)
    activerecord (2.2.2)

2012 Jul 19

finding the values to minimize sum of functions

Hi fellow R users, I am desperately hoping there is an easy way to do this in R. Say I have three functions: f(x) = x^2, f(y) = 2y^2, f(z) = 3z^2, constrained such that x+y+z=c (let c=1 for simplicity). I want to find the values of x,y,z that will minimize f(x) + f(y) + f(z). I know I can use the optim function when there is only one function, but don't know how to set it up when there are
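For this particular problem a closed-form answer can be worked out with Lagrange multipliers, which gives a reference value to compare against whatever a numerical optimizer returns. The derivation below is a sketch that assumes the three terms are minimized jointly with c = 1; the multiplier symbol \lambda is introduced here and does not appear in the original post.

```latex
\begin{aligned}
\mathcal{L}(x,y,z,\lambda) &= x^2 + 2y^2 + 3z^2 - \lambda\,(x + y + z - 1) \\
\frac{\partial \mathcal{L}}{\partial x} = 0 &\;\Rightarrow\; 2x = \lambda, \qquad
\frac{\partial \mathcal{L}}{\partial y} = 0 \;\Rightarrow\; 4y = \lambda, \qquad
\frac{\partial \mathcal{L}}{\partial z} = 0 \;\Rightarrow\; 6z = \lambda \\
x + y + z = 1 &\;\Rightarrow\; \lambda\left(\tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{6}\right) = \tfrac{11}{12}\,\lambda = 1
\;\Rightarrow\; \lambda = \tfrac{12}{11} \\
(x, y, z) &= \left(\tfrac{6}{11},\; \tfrac{3}{11},\; \tfrac{2}{11}\right) \\
x^2 + 2y^2 + 3z^2 &= \tfrac{36 + 18 + 12}{121} = \tfrac{6}{11}
\end{aligned}
```

Because the objective is strictly convex and the constraint is linear, this stationary point is the unique minimum.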

2011 Mar 05

subsetting data by specified observation number

Hi members, I'd like to thank you guys ahead of time for the help. I'm kind of stuck. I have a data frame with ID and position numbers:

    1> head(failed.3)
             id position
    1  10000997        2
    4  1000RW_M        2
    15 1006RW_G        2
    24 1012RW_M        3
    28 10160917        2
    30 1016RW_M       13

I'd like to use this to subset out a large dataset and keep only the observation

2012 Jul 24

Regular Expression

Hi-- I have three columns in an input file:

    MONTH    QUARTER  YEAR
    2012-07  2012-3   2012
    2001-07  2001-3   2001
    2002-01  2002-1   2002

I want to make output like so:

    MONTH  QUARTER  YEAR
    07     3        2012
    07     3        2001
    01     1        2002

I was having some trouble getting the regular expression to work. I think it should
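The transformation described in this post amounts to dropping a leading "YYYY-" prefix from the first two columns. The following is a minimal Python sketch of that idea, not the answer given in the thread; the helper name strip_year_prefix and the in-memory rows list are assumptions made for illustration.

```python
import re

def strip_year_prefix(value: str) -> str:
    """Remove a leading four-digit year and hyphen, e.g. '2012-07' -> '07'."""
    return re.sub(r"^\d{4}-", "", value)

# Sample rows copied from the post: (MONTH, QUARTER, YEAR)
rows = [
    ("2012-07", "2012-3", "2012"),
    ("2001-07", "2001-3", "2001"),
    ("2002-01", "2002-1", "2002"),
]

# YEAR has no hyphen, so only MONTH and QUARTER need the substitution.
cleaned = [(strip_year_prefix(m), strip_year_prefix(q), y) for m, q, y in rows]

for row in cleaned:
    print(*row)
```

The same pattern, `^\d{4}-`, should carry over to R's sub() with the backslash doubled inside the string literal.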

2002 May 07

Discretization of numeric attributes

Dear R-helpers: I am interested in discretization methods for numerical attributes, as they are reported in the 'machine learning' community. For example, the work of Fayyad & Irani (IJCAI-93), Kononenko, entropy-based approaches, MDL principle, the C4.5 approach, etc. I am especially interested in those methods that take a factor as goal target into account for discretizing

2008 Jan 23

How to do more advanced cross tabulation in R?

Hi, I am trying to reproduce some functionalities of Excel pivot table in R. Sadly, I couldn't figure out how to do it. I am wondering if this is even possible in R. Does anyone know? Here is an example:

    year=rep(2003,16)
    quarter=rep(1:4,each=4)
    sales=1:16
    company=rep(c("a","b","c","d"),4)
    df=data.frame(year,quarter,sales,company)
    #this is the

2006 Feb 17

(Newbie) Functions on vectors

Folks, I want to make the following function more efficient, by vectorizing it:

    getCriterionDecisionDate <- function (quarter , year) {
      if (length(quarter) != length(year))
        stop ("Quarter and year vectors of unequal length!");
      ret <- character(0);
      for (i in 1:length(quarter)) {
        currQuarter <- quarter[i];
        currYear <- year[i];
        if ((currQuarter < 1) |

2010 Jan 16

Comparing dates in dataframes

I have two data frames. One (arr) has all arrivals to an airport for a year, and the other (gw) has the dates and quarter hour of the day when the weather is good. arr has a Date and quarter hour column.

    >names(arr)
    [1] "Date" "weekday" "hour" "month" "minute"
    [6] "quarter" "ICAO"

2010 Feb 14

Newbie woes with *apply

Dataframe cust has Date-type column open.date. I wish to set up another column, with (first day of) the quarter of open.date. To be comprehensive (of course, improvement suggestions are welcome),

    month = function(date) { return(as.numeric(format(date,"%m"))) }
    first.day.of.month = function(date) { return(date + 1 - as.numeric(format(date,"%d"))) }
    first.day.of.quarter =
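The "first day of the quarter" calculation the poster is after can be done with integer arithmetic on the month, with no need for per-row branching. Below is a minimal Python sketch of that arithmetic, assuming calendar quarters starting in January, April, July and October; the function name first_day_of_quarter is hypothetical and this is not the code from the thread.

```python
from datetime import date

def first_day_of_quarter(d: date) -> date:
    """Return the first day of the calendar quarter containing `d`."""
    # Months 1-3 -> 1, 4-6 -> 4, 7-9 -> 7, 10-12 -> 10
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)

# Hypothetical sample values standing in for the open.date column
open_dates = [date(2010, 2, 14), date(2010, 6, 30), date(2010, 12, 31)]
quarter_starts = [first_day_of_quarter(d) for d in open_dates]

for d, q in zip(open_dates, quarter_starts):
    print(d.isoformat(), "->", q.isoformat())
```

The same month formula is elementwise, so it should apply equally to a whole column at once in a vectorized language.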

2010 Jan 18

Using the output of strsplit

I successfully combined my data frames, and am now on my next hurdle. I had combined the data and quarter, and used tapply to count the entries for each unique date/quarter pair.

    ar= tapply(ewrgnd$gw, list(ewrgnd$dq), sum) #for each date/quarter combination sums the gw (which are all 1)
    dq=row.names(ar)
    spl=strsplit(dq)

But I need to split them back into the separate date and quarter. So I used

2009 Aug 13

Finding minimum of time subset

Dear List, I have a data frame of data taken every few seconds. I would like to subset the data to retain only the data taken on the quarter hour, and as close to the quarter hour as possible. So far I have figured out how to subset the data to the quarter hour, but not how to keep only the minimum time for each quarter hour. For example:

2009 Feb 19

table with 3 variables

I have the initial matrix:

    > data.frame(Subject=rep(100:101, each=4), Quarter=rep(paste("Q",1:4, sep=""),2), Boolean = rep(c("Y","N"),4))
      Subject Quarter Boolean
    1     100      Q1       Y
    2     100      Q2       N
    3     100      Q3       Y
    4     100      Q4       N
    5     101      Q1       Y
    6     101      Q2       N
    7     101      Q3       Y
    8     101

2004 Jul 26

group definition for a bootstrap

Hi, This is probably really simple, but I am clearly not R-minded, I have read the help files, and reread them, and I still can't work out what to do... I have a data frame (d) with 3 columns (age (0-5), quarter (1-4) and x). I want to estimate the precision of my mean x by age and quarter, so I want to carry out a bootstrap for each group. I am trying to do this within a loop, so I don't

2011 Jul 15

Convert continuous variable into discrete variable

Dear all, I have a continuous variable that can take on values between 0 and 100, for example:

    x<-runif(100,0,100)

I also have a second variable that defines a series of thresholds, for example:

    y<-c(3, 4.5, 6, 8)

I would like to convert my continuous variable into a discrete one using the threshold variables:

If x is between 0 and 3 the discrete variable should be 1
If x is between 3
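Mapping a value to the index of the interval it falls into is a sorted-search problem. The sketch below shows the idea in Python using the standard bisect module; the function name discretize is hypothetical, and the treatment of values that land exactly on a threshold is an assumption, since the post is cut off before it says how boundaries should be handled.

```python
from bisect import bisect_right

def discretize(value, thresholds):
    """Return the 1-based index of the interval containing `value`.

    `thresholds` must be sorted in ascending order. A value equal to a
    threshold is placed in the interval above it (an assumption, not
    something stated in the original post).
    """
    return bisect_right(thresholds, value) + 1

thresholds = [3, 4.5, 6, 8]          # the y vector from the post
samples = [1.0, 3.0, 5.0, 7.9, 50.0]  # hypothetical x values
codes = [discretize(v, thresholds) for v in samples]
print(codes)  # [1, 2, 3, 4, 5]
```

In R itself, the base functions cut() and findInterval() are the usual tools for this kind of binning.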

2012 Mar 12

mapply & assign to generate functions

Hi, I have a problem that I'm finding a bit tricky. I'm trying to use mapply and assign to generate curried functions. For example, if I have the function divide

    divide <- function(x, y) { x / y }

And I want the end result to be functionally equivalent to:

    half <- function(x) divide(x, 2)
    third <- function(x) divide(x, 3)
    quarter <- function(x) divide(x, 4)

But I want
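The general technique here is partial application: fixing one argument of a two-argument function and generating the resulting one-argument functions from a table of names and values. The Python sketch below illustrates it with functools.partial; it assumes the goal is to build half, third and quarter programmatically, and the divisors mapping is introduced only for illustration.

```python
from functools import partial

def divide(x, y):
    return x / y

# Hypothetical table of function names and the divisor each one fixes
divisors = {"half": 2, "third": 3, "quarter": 4}

# Each partial object binds y at creation time, which avoids the common
# pitfall of closures in a loop all capturing the final loop value.
curried = {name: partial(divide, y=d) for name, d in divisors.items()}

print(curried["half"](10))     # 5.0
print(curried["quarter"](10))  # 2.5
```

Keeping the generated functions in a dictionary rather than assigning them to top-level names is a design choice made for this sketch; it keeps the namespace clean and makes the set of generated functions easy to enumerate.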

2004 Jun 25

trouble using boot package

Hello, I am trying to carry out a bootstrap analysis (using the boot package) on a table and cannot work out how to get the results I need! I have a table ("d2") with 4 columns: "ID_code", "Age", "Quarter" and "StomWt". Age (0-5) and Quarter (1-4) are my strata. Therefore I wish to estimate the confidence intervals for the mean StomWt for each Age

2009 Feb 19

table with 3 varialbes

2010 Feb 02

Writing out csv files

In my code, I calculate the maximum values with 2 factors using

    maxr=with(arrdf, tapply(rate,list(weekday,quarter), max, na.rm=T))

and I want to write out the file so that Excel can read it. I used

    write.table(maxr, fname, sep=",", col.names=TRUE, row.names=TRUE, quote=TRUE, na="0")

which works, and yields something like

2006 Jul 25

Overplotting: plot() invocation looks ugly ... suggestions?

Hi WizaRds, I'd like to overplot UK fuel consumption per quarter over the course of five years. Sounds simple enough? Unless I'm missing something, the following seems very involved for what I'm trying to do. Any suggestions on simplifications? The way I did it is awkward mainly because of the first call to plot ... but isn't this necessary, especially to set limits for the

2008 Jun 05

Improving data processing efficiency

Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows. My goal is this: For every quarter since issue for each IPO, I need to find a "matched" firm in the same
