thr3ads.net - similar to: "data manipulation and summaries with few million rows"

Displaying 20 results from an estimated 9000 matches similar to: "data manipulation and summaries with few million rows"

formatting a 6 million row data set; creating a censoring variable

2011 Aug 31

List, Consider the following data.

   gender mygroup id
1       F       A  1
2       F       B  2
3       F       B  2
4       F       B  2
5       F       C  2
6       F       C  2
7       F       C  2
8       F       D  2
9       F       D  2
10      F       D  2
11      F       D  2
12      F       D  2
13      F       D  2
14      M       A  3
15      M       A  3
16      M       A  3
17

aggregate syntax for grouped column means

2011 Nov 29

I am calculating the mean of each column grouped by the variable 'id'. I do this using aggregate, data.table, and plyr. My aggregate results do not match the other two, and I am trying to figure out what is incorrect with my syntax. Any suggestions? Thanks. Here is the data. myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61, 30.59, 30.84, 30.98, 30.79, 30.79,

data manipulation/subsetting and relation matrix

2009 Dec 08

Hi List, Here is some example data. myDat <- read.table(textConnection("group id 1 101 1 201 1 301 2 401 2 501 2 601 3 701 3 801 3 901"),header=TRUE) closeAllConnections() corr_mat <-read.table(textConnection("1 1 .5 0 0 0 0 0 0 0 2 .5 1 0 0 0 0 0 0 0 3 0 0 1.0 0 0 0 0 0 0 4 0 0 0 1 .5 .5 0 0 0 5 0 0 0 .5 1

partial match of one column in data frame to another character vector

2010 Aug 10

Here is some data (dput output below) > myData id group 1 D599 A 2 002-0004 B 3 F01932 A 18 F16 B 19

store list objects in data.table

2024 Sep 21

I am trying to store regression objects in a data.table df <- data.frame(x = rnorm(20)) df[, "y"] <- with(df, x + 0.1 * x^2 + 0.2 * rnorm(20)) mydt <- data.table(mypower = c(1, 2), myreg = list(lm(y ~ x, data = df), lm(y ~ x + I(x^2), data = df))) mydt # mypower myreg # <num> <list> #1: 1 <lm[12]> #2: 2 <lm[12]> But mydt[1, 2]

Extracting year from a date object

2009 Oct 06

Hi all, this one left me a bit puzzled, as I don't seem to find a function to perform this easily. I must have overlooked the obvious, so sorry in advance. I have a list of dates in numerical format (i.e. 34576), defined as the number of days that passed since January 1st 1900. So I apply the function: > MyDate <- as.Date(34576, origin="1900-01-01") > MyDate [1]

cbind in aggregate formula - based on an existing object (vector)

2011 Jul 14

Hello! I am aggregating using a formula in aggregate - of the type: aggregate(cbind(var1,var2,var3)~factor1+factor2,sum,data=mydata) However, I actually have an object (vector of my variables to be aggregated): myvars<-c("var1","var2","var3") I'd like my aggregate formula (its "cbind" part) to be able to use my "myvars" object. Is it

POSIX, time zone and Windows

2006 May 21

Dear Listers, Apologize to pile up on the 'tz' issue in POSIX objects. I have a 'simple' thing on which I must make up my mind but cannot do it from the existing R-help threads. I am currently working on dog telemetry in China, and download time information from GPS collars. I would like to set up the corresponding POSIXxx variables in R to a given time zone. Eg Pekin

Data arrangement for PLSDA using the ropls package

2017 Sep 18

Hello, I would like to do a partial least squares discriminant analysis (PLSDA) in R using the package "ropls", which is available via the R command: source("https://bioconductor.org/biocLite.R") I am trying to do a PLSDA to illustrate the impact of two genders (AP,C) on 5 compounds measured in persons (samples). When I try to do a PLSDA I get the warning

using "aggregate" when variable names contain spaces

2011 Apr 18

Hello! My data set has many variables. Unfortunately, many of those variables contain spaces in their names. I need advice on how to refer to variable names in the formula for "aggregate". See example below: ### Generating example data set: mydate = rep(seq(as.Date("2008-12-01"), length = 3, by = "month"),4) value1=c(1,10,100,2,20,200,3,30,300,4,40,400)

Does POSIXlt extract date components properly?

2011 Mar 01

I would like to use POSIX classes to store dates and extract components of dates. Following the example in Spector ("Data Manipulation in R"), I create a date > mydate = as.POSIXlt('2005-4-19 7:01:00') I then successfully extract the day with the command > mydate$day [1] 19 But when I try to extract the month > mydate$mon [1] 3 it returns the wrong month. And

merging 2 frames while keeping all the entries from the "reference" frame

2011 Apr 04

Hello! I have my data frame "mydata" (below) and data frame "reference" - that contains all the dates I would like to be present in the final data frame. I am trying to merge them so that the resulting data frame contains all 8 dates in both subgroups (i.e., Group1 should have 8 rows and Group2 too). But when I merge them, it's not coming out this way. Any hint would be

Create a vector without using an external 'if statement'

2013 Apr 02

Dear R-users, suppose I have three dataframes like these df1: mydate min_temp 31032013 12 01042013 8 02042013 -999 df2: mydate min_temp 31032013 10 01042013 11 02042013 14 df3: mydate min_temp 31032013 4 01042013 3 02042013 5 where -999 means that the temperature data is not available (at the moment I cannot change it to NA because I am not the db administrator); suppose also that oggi is

plotting large time series

2011 Oct 27

Hello, I have a problem with plotting large time series, since I want to store the results in a .PDF file (I want to store several pages of plots). The PDF files get too large to be handled (> 10MB, one was even 200MB big). So I wonder if there would be a possibility to either - reduce the file size of the PDF - change the way the plot is generated to reduce the plot size? I use:

new data.frame summed by date

2009 Aug 28

Hi, I wonder if someone can suggest how to create a new data.frame Y from X where X$PL_Pos is summed by each unique X$MyDate. Y should end up with two (or more) columns Y$MyDate and Y$PL_Sum with its value being the cumsum of all the values in X for that date. - a 'daily cumsum'. Thanks, Mark TStoDate = function (TSDate) { X = strptime(TSDate + 19e6L, "%Y%m%d")

improve formatting of HTML table

2011 Aug 11

I am trying to improve the look of an HTML table for a report (that needs to be pasted into Word). Here is an example. table2 <- structure(c(26L, 0L, 40L, 0L, 10L, 0L, 0L, 188L, 0L, 281L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L), .Dim = c(6L, 3L), .Dimnames = structure(list( myvar = c("Don't know", "Somewhat likely", "Somewhat unlikely", "Very

Creating a "shifted" month (one that starts not on the first of each month but on another date)

2011 May 19

Hello! I have a data frame with dates. I need to create a new "month" that starts on the 20th of each month - because I'll need to aggregate my data later by that "shifted" month. I wrote the code below and it works. However, I was wondering if there is some ready-made function in some package - that makes it easier/more elegant? Thanks a lot! # Example data:

Fwd: Windows upssched does not work

2020 Mar 10

Hi, I have problem with upssched on windows. Upssched is not executed. I have 2 scripts, 1 for notification in upsmon and second for scheduling in upssched. Monitoring is working fine, script write to log. I'm using binary windows installer 2.6.5-6 from NUT. Here are my configs: --- nut.conf MODE=netclient --- upsmon.conf MONITOR ups_1000 at 192.168.3.95 1 <user> <password> slave

Is this a documentation bug? Spss dates import

2009 Mar 11

Hello R-user bug seekers are needed! In order to perform these simple tasks you have to use a copy of SPSS and obviously R. The problem is that date conversion of data coming from SPSS gives wrong results, if we follow ?as.POSIXct ## SPSS dates (R-help 2006-02-17) z <- c(10485849600, 10477641600, 10561104000, 10562745600) as.Date(as.POSIXct(z, origin="1582-10-14",

finding row and column indices of date in multiple columns of a data frame

2009 Jan 21

Hi, I have a data.frame SAMPLES with columns: Site Site# Season Day1 Day2 Day3 Day1, Day2, Day3 are class "Date", the other columns are numeric or factor. I have a date "mydate" that may or may not be listed in my data.frame and I need to find that out. If "mydate" is there, I want to get the number of the data.frame row where it occurs.
