similar to: data manipulation and summaries with few million rows

Displaying 20 results from an estimated 8000 matches similar to: "data manipulation and summaries with few million rows"

2011 Aug 31
1
formatting a 6 million row data set; creating a censoring variable
List, Consider the following data. gender mygroup id 1 F A 1 2 F B 2 3 F B 2 4 F B 2 5 F C 2 6 F C 2 7 F C 2 8 F D 2 9 F D 2 10 F D 2 11 F D 2 12 F D 2 13 F D 2 14 M A 3 15 M A 3 16 M A 3 17
2011 Nov 29
2
aggregate syntax for grouped column means
I am calculating the mean of each column grouped by the variable 'id'. I do this using aggregate, data.table, and plyr. My aggregate results do not match the other two, and I am trying to figure out what is incorrect with my syntax. Any suggestions? Thanks. Here is the data. myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61, 30.59, 30.84, 30.98, 30.79, 30.79,
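This is one frequent cause of this kind of mismatch. It is an assumption here, because the truncated snippet does not show whether `myData` contains `NA`s. By default the formula method of `aggregate()` uses `na.action = na.omit`, which drops every row with an `NA` in *any* variable. A grouped mean in `data.table` or `plyr` only skips what `mean()` is told to skip. The minimal sketch below uses a small made-up data frame:

```r
# Hypothetical toy data: one NA in var2, none in var1
myData <- data.frame(id   = c(1, 1, 2, 2),
                     var1 = c(1, 3, 5, 7),
                     var2 = c(NA, 2, 4, 6))

# Formula method: rows with an NA in ANY column are dropped (na.action = na.omit),
# so the id 1 mean of var1 is computed from row 2 only
byFormula <- aggregate(. ~ id, data = myData, FUN = mean)

# Data-frame method: every row is kept; na.rm = TRUE is passed on to mean()
byDataFrame <- aggregate(myData[c("var1", "var2")], by = list(id = myData$id),
                         FUN = mean, na.rm = TRUE)

print(byFormula)    # var1 for id 1 is 3
print(byDataFrame)  # var1 for id 1 is 2
```

If the formula interface is preferred, adding `na.action = na.pass` together with `na.rm = TRUE` keeps the incomplete rows.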
2009 Dec 08
1
data manipulation/subsetting and relation matrix
Hi List, Here is some example data. myDat <- read.table(textConnection("group id 1 101 1 201 1 301 2 401 2 501 2 601 3 701 3 801 3 901"),header=TRUE) closeAllConnections() corr_mat <-read.table(textConnection("1 1 .5 0 0 0 0 0 0 0 2 .5 1 0 0 0 0 0 0 0 3 0 0 1.0 0 0 0 0 0 0 4 0 0 0 1 .5 .5 0 0 0 5 0 0 0 .5 1
2010 Aug 10
1
partial match of one column in data frame to another character vector
Here is some data (dput output below) > myData id group 1 D599 A 2 002-0004 B 3 F01932 A 18 F16 B 19
2024 Sep 21
3
store list objects in data.table
I am trying to store regression objects in a data.table: df <- data.frame(x = rnorm(20)) df[, "y"] <- with(df, x + 0.1 * x^2 + 0.2 * rnorm(20)) mydt <- data.table(mypower = c(1, 2), myreg = list(lm(y ~ x, data = df), lm(y ~ x + I(x^2), data = df))) mydt #   mypower    myreg #     <num>   <list> #1:       1 <lm[12]> #2:       2 <lm[12]> But mydt[1, 2]
2009 Oct 06
2
Extracting year from a date object
Hi all, this one left me a bit puzzled, as I can't seem to find a function that does this easily. I must have overlooked the obvious, so sorry in advance. I have a list of dates in numerical format (e.g. 34576), defined as the number of days that have passed since January 1st, 1900. So I apply the function: > MyDate <- as.Date(34576, origin="1900-01-01") > MyDate [1]
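Here is a minimal sketch of two base-R ways to pull the year out of a `Date` built this way. `format()` returns a character string. The `POSIXlt` field `year` counts years since 1900, so 1900 has to be added:

```r
# Day count measured from 1900-01-01, as described in the question
MyDate <- as.Date(34576, origin = "1900-01-01")

# Option 1: format the date and keep only the four-digit year (character)
yearChr <- format(MyDate, "%Y")

# Option 2: convert to POSIXlt; $year counts years since 1900
yearNum <- as.POSIXlt(MyDate)$year + 1900

print(MyDate)   # "1994-09-01"
print(yearChr)  # "1994"
print(yearNum)  # 1994
```

If the numbers come from an Excel sheet, check the origin. The examples in `?as.Date` convert Windows Excel serial dates with `origin = "1899-12-30"`, because Excel treats 1900 as a leap year.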
2011 Jul 14
2
cbind in aggregate formula - based on an existing object (vector)
Hello! I am aggregating using a formula in aggregate - of the type: aggregate(cbind(var1,var2,var3)~factor1+factor2,sum,data=mydata) However, I actually have an object (vector of my variables to be aggregated): myvars<-c("var1","var2","var3") I'd like my aggregate formula (its "cbind" part) to be able to use my "myvars" object. Is it
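One base-R way to do this is to build the formula as text from `myvars` and convert it with `as.formula()`. This is a sketch with invented example data. The grouping names `factor1` and `factor2` are taken from the question:

```r
# Hypothetical example data with two grouping factors
mydata <- data.frame(factor1 = c("a", "a", "b", "b"),
                     factor2 = c("x", "x", "x", "y"),
                     var1 = 1:4, var2 = 5:8, var3 = 9:12)

myvars <- c("var1", "var2", "var3")

# Build "cbind(var1, var2, var3) ~ factor1 + factor2" as text, then parse it
fml <- as.formula(paste0("cbind(", paste(myvars, collapse = ", "), ")",
                         " ~ factor1 + factor2"))

res <- aggregate(fml, data = mydata, FUN = sum)
print(res)
```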
2006 May 21
1
POSIX, time zone and Windows
Dear Listers, apologies for piling on to the 'tz' issue in POSIX objects. I have a 'simple' question that I need to settle but cannot resolve from the existing R-help threads. I am currently working on dog telemetry in China, and I download time information from GPS collars. I would like to set up the corresponding POSIXxx variables in R in a given time zone, e.g. Beijing
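The usual base-R approach is to store the GPS times with an explicit `tz` and convert them for display with `format(..., tz = )`. The sketch below assumes the collars report UTC, which GPS receivers typically do. It also assumes the IANA zone `"Asia/Shanghai"` (the zone name normally used for Beijing time, UTC+8) is available in the R installation's time-zone database:

```r
# Hypothetical GPS fix, assumed to be recorded in UTC
fix <- as.POSIXct("2006-05-21 08:00:00", tz = "UTC")

# Same instant shown in Beijing time (UTC+8, no daylight saving)
local <- format(fix, tz = "Asia/Shanghai", usetz = TRUE)
print(local)

# To attach Beijing time to the object itself, set its tzone attribute;
# the underlying instant (seconds since 1970-01-01 UTC) does not change
fixBJ <- fix
attr(fixBJ, "tzone") <- "Asia/Shanghai"
print(fixBJ)
```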
2017 Sep 18
1
Data arrangement for PLSDA using the ropls package
Hello, I would like to do a partial least squares discriminant analysis (PLSDA) in R using the package "ropls", which is available in R via the command: source("https://bioconductor.org/biocLite.R"). I am trying to use a PLSDA to illustrate the impact of two genders (AP, C) on 5 compounds measured in persons (samples). When I try to do a PLSDA I get the warning
2011 Apr 18
1
using "aggregate" when variable names contain spaces
Hello! My data set has many variables. Unfortunately, many of those variables contain spaces in their names. I need advice on how to refer to such variable names in the formula for "aggregate". See example below: ### Generating example data set: mydate = rep(seq(as.Date("2008-12-01"), length = 3, by = "month"),4) value1=c(1,10,100,2,20,200,3,30,300,4,40,400)
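In an R formula, a non-syntactic name such as one containing a space can be quoted with backticks. This is a sketch built on the question's example values. The column name `value 1` is invented to reproduce the problem, and `check.names = FALSE` keeps the space intact:

```r
# Keep the space in the column name (check.names = FALSE)
df <- data.frame(
  mydate = rep(seq(as.Date("2008-12-01"), length.out = 3, by = "month"), 4),
  `value 1` = c(1, 10, 100, 2, 20, 200, 3, 30, 300, 4, 40, 400),
  check.names = FALSE
)

# Backticks let a non-syntactic name be used inside a formula
res <- aggregate(`value 1` ~ mydate, data = df, FUN = sum)
print(res)
```

The data-frame method avoids formulas altogether: `aggregate(df["value 1"], by = df["mydate"], FUN = sum)`.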
2011 Mar 01
2
Does POSIXlt extract date components properly?
I would like to use POSIX classes to store dates and extract components of dates. Following the example in Spector ("Data Manipulation in R"), I create a date: > mydate = as.POSIXlt('2005-4-19 7:01:00') I then successfully extract the day with the command > mydate$day [1] 19 But when I try to extract the month > mydate$mon [1] 3 it returns the wrong month. And
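This is documented behaviour of `POSIXlt`. The `mon` field counts months from 0 (January) to 11, `year` counts years since 1900, and the day of the month is stored in `mday`. Here is a short sketch. The explicit `tz = "UTC"` is added only to make it reproducible:

```r
mydate <- as.POSIXlt("2005-04-19 07:01:00", tz = "UTC")

print(mydate$mday)         # day of month: 19
print(mydate$mon + 1)      # mon is 0-based, so add 1 to get 4 (April)
print(mydate$year + 1900)  # year is stored as years since 1900: 2005

# format() avoids the offsets altogether
print(format(mydate, "%d/%m/%Y"))  # "19/04/2005"
```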
2011 Apr 04
2
merging 2 frames while keeping all the entries from the "reference" frame
Hello! I have my data frame "mydata" (below) and a data frame "reference" that contains all the dates I would like to be present in the final data frame. I am trying to merge them so that the resulting data frame contains all 8 dates in both subgroups (i.e., Group1 should have 8 rows and so should Group2). But when I merge them, it does not come out this way. Any hint would be
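A plain `merge(reference, mydata, all.x = TRUE)` keeps each reference date only once, not once per group. One base-R sketch first builds every group-by-date combination with `expand.grid()` and then merges with `all.x = TRUE`. The data below is invented, and the column names `group` and `value` are hypothetical, since the question's own data is not shown:

```r
# Hypothetical data: two groups with gaps, plus a reference list of 8 dates
reference <- data.frame(mydate = seq(as.Date("2011-01-01"), by = "day", length.out = 8))
mydata <- data.frame(
  group  = c("Group1", "Group1", "Group1", "Group2", "Group2"),
  mydate = as.Date(c("2011-01-01", "2011-01-04", "2011-01-08",
                     "2011-01-02", "2011-01-05")),
  value  = c(5, 3, 7, 2, 9)
)

# Every group x reference-date combination (the "complete" skeleton)
skeleton <- expand.grid(mydate = reference$mydate,
                        group = unique(mydata$group),
                        stringsAsFactors = FALSE)

# all.x = TRUE keeps skeleton rows with no match; their value becomes NA
full <- merge(skeleton, mydata, by = c("group", "mydate"), all.x = TRUE)
print(table(full$group))  # 8 rows for each group
```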
2013 Apr 02
2
Create a vector without using an external 'if statement'
Dear R-users, suppose I have three dataframes like these df1: mydate min_temp 31032013 12 01042013 8 02042013 -999 df2: mydate min_temp 31032013 10 01042013 11 02042013 14 df3: mydate min_temp 31032013 4 01042013 3 02042013 5 where -999 means that the temperature data is not available (at the moment I cannot change it to NA because I am not the db administrator); suppose also that oggi is
2011 Oct 27
1
plotting large time series
Hello, I have a problem with plotting large time series, since I want to store the results in a PDF file (several pages of plots). The PDF files get too large to handle (> 10 MB; one was even 200 MB). So I wonder if there is a possibility to either - reduce the file size of the PDF, or - change the way the plot is generated to reduce the plot size? I use:
2009 Aug 28
2
new data.frame summed by date
Hi, I wonder if someone can suggest how to create a new data.frame Y from X where X$PL_Pos is summed by each unique X$MyDate. Y should end up with two (or more) columns Y$MyDate and Y$PL_Sum with its value being the cumsum of all the values in X for that date. - a 'daily cumsum'. Thanks, Mark TStoDate = function (TSDate) { X = strptime(TSDate + 19e6L, "%Y%m%d")
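The snippet's own conversion helper is truncated, so this sketch assumes `X` already has a `Date` column `MyDate` and a numeric column `PL_Pos`, and the data below is invented. A per-date total can be computed with `aggregate()`. If a running total across dates is also what "daily cumsum" means, `cumsum()` can be applied to the date-ordered result:

```r
X <- data.frame(
  MyDate = as.Date(c("2009-08-26", "2009-08-26", "2009-08-27",
                     "2009-08-28", "2009-08-28")),
  PL_Pos = c(100, -40, 25, 10, 5)
)

# One row per unique date, PL_Pos summed within each date
Y <- aggregate(PL_Pos ~ MyDate, data = X, FUN = sum)
names(Y)[names(Y) == "PL_Pos"] <- "PL_Sum"

# Optional running total across days (aggregate returns the dates in sorted order)
Y$PL_Cum <- cumsum(Y$PL_Sum)
print(Y)
```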
2011 Aug 11
3
improve formatting of HTML table
I am trying to improve the look of an HTML table for a report (that needs to be pasted into Word). Here is an example. table2 <- structure(c(26L, 0L, 40L, 0L, 10L, 0L, 0L, 188L, 0L, 281L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L), .Dim = c(6L, 3L), .Dimnames = structure(list( myvar = c("Don't know", "Somewhat likely", "Somewhat unlikely", "Very
2011 May 19
1
Creating a "shifted" month (one that starts not on the first of each month but on another date)
Hello! I have a data frame with dates. I need to create a new "month" that starts on the 20th of each month - because I'll need to aggregate my data later by that "shifted" month. I wrote the code below and it works. However, I was wondering if there is some ready-made function in some package - that makes it easier/more elegant? Thanks a lot! # Example data:
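This is a compact base-R alternative, not a ready-made package function. Every "shifted month" runs from the 20th to the 19th of the following month, so subtracting 19 days maps it onto an ordinary calendar month, which `format()` can then label. The dates are invented. Labelling each period by the month in which it starts is a choice made for this sketch:

```r
# Hypothetical example dates
mydata <- data.frame(mydate = as.Date(c("2011-04-19", "2011-04-20", "2011-05-01",
                                        "2011-05-19", "2011-05-20")))

# Moving every date back 19 days maps each 20th-to-19th window onto one calendar
# month; the label is the month in which the shifted period starts
mydata$shifted_month <- format(mydata$mydate - 19, "%Y-%m")
print(mydata)
```

The resulting character column can be used directly as a grouping variable in `aggregate()`.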
2020 Mar 10
2
Fwd: Windows upssched does not work
Hi, I have a problem with upssched on Windows: upssched is not executed. I have 2 scripts, one for notification in upsmon and a second for scheduling in upssched. Monitoring is working fine, and the script writes to the log. I'm using the binary Windows installer 2.6.5-6 from NUT. Here are my configs: --- nut.conf MODE=netclient --- upsmon.conf MONITOR ups_1000 at 192.168.3.95 1 <user> <password> slave
2009 Mar 11
1
Is this a documentation bug? Spss dates import
Hello R-users, bug seekers are needed! In order to perform these simple tasks you need a copy of SPSS and, obviously, R. The problem is that date conversion of data coming from SPSS gives wrong results if we follow ?as.POSIXct: ## SPSS dates (R-help 2006-02-17) z <- c(10485849600, 10477641600, 10561104000, 10562745600) as.Date(as.POSIXct(z, origin="1582-10-14",
2009 Jan 21
1
finding row and column indices of date in multiple columns of a data frame
Hi, I have a data.frame SAMPLES with columns: Site Site# Season Day1 Day2 Day3 Day1, Day2, Day3 are class "Date", the other columns are numeric or factor. I have a date "mydate" that may or may not be listed in my data.frame and I need to find that out. If "mydate" is there, I want to get the number of the data.frame row where it occurs.
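This is one base-R sketch. It compares each `Date` column with `mydate` separately, binds the results into a logical matrix, and passes that matrix to `which(..., arr.ind = TRUE)`. Comparing column by column avoids relying on how `==` dispatches between a whole data frame and a `Date`. The contents of `SAMPLES` are invented, and only the `Site` and `Day1` to `Day3` columns are kept:

```r
# Hypothetical version of SAMPLES with three Date columns
SAMPLES <- data.frame(
  Site = c("A", "B", "C"),
  Day1 = as.Date(c("2009-01-05", "2009-01-06", "2009-01-07")),
  Day2 = as.Date(c("2009-01-12", "2009-01-20", "2009-01-14")),
  Day3 = as.Date(c("2009-01-19", "2009-01-27", "2009-01-20"))
)
mydate <- as.Date("2009-01-20")
dayCols <- c("Day1", "Day2", "Day3")

# Compare each Date column separately, then bind the results into a logical matrix
hits <- do.call(cbind, lapply(SAMPLES[dayCols], function(col) col == mydate))

print(any(hits, na.rm = TRUE))      # is mydate present at all?
idx <- which(hits, arr.ind = TRUE)  # row/column positions of every match (NAs ignored)
print(idx)
print(unique(idx[, "row"]))         # data.frame rows containing mydate
```

The `col` values in `idx` are positions within `dayCols`, not within the full data frame.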