Displaying 20 results from an estimated 8000 matches similar to: "data manipulation and summaries with few million rows"
2011 Aug 31
1
formatting a 6 million row data set; creating a censoring variable
List,
Consider the following data.
gender mygroup id
1 F A 1
2 F B 2
3 F B 2
4 F B 2
5 F C 2
6 F C 2
7 F C 2
8 F D 2
9 F D 2
10 F D 2
11 F D 2
12 F D 2
13 F D 2
14 M A 3
15 M A 3
16 M A 3
17
2011 Nov 29
2
aggregate syntax for grouped column means
I am calculating the mean of each column grouped by the variable 'id'.
I do this using aggregate, data.table, and plyr. My aggregate results
do not match the other two, and I am trying to figure out what is
incorrect with my syntax. Any suggestions? Thanks.
Here is the data.
myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61,
30.59, 30.84, 30.98, 30.79, 30.79,
2009 Dec 08
1
data manipulation/subsetting and relation matrix
Hi List,
Here is some example data.
myDat <- read.table(textConnection("group id
1 101
1 201
1 301
2 401
2 501
2 601
3 701
3 801
3 901"),header=TRUE)
closeAllConnections()
corr_mat <-read.table(textConnection("1 1 .5 0 0 0 0 0 0 0
2 .5 1 0 0 0 0 0 0 0
3 0 0 1.0 0 0 0 0 0 0
4 0 0 0 1 .5 .5 0 0 0
5 0 0 0 .5 1
2010 Aug 10
1
partial match of one column in data frame to another character vector
Here is some data (dput output below)
> myData
id group
1 D599 A
2 002-0004 B
3 F01932 A
18 F16 B
19
2024 Sep 21
3
store list objects in data.table
I am trying to store regression objects in a data.table
df <- data.frame(x = rnorm(20))
df[, "y"] <- with(df, x + 0.1 * x^2 + 0.2 * rnorm(20))
mydt <- data.table(mypower = c(1, 2), myreg = list(lm(y ~ x, data = df),
lm(y ~ x + I(x^2), data = df)))
mydt
#    mypower    myreg
#      <num>   <list>
# 1:       1 <lm[12]>
# 2:       2 <lm[12]>
But mydt[1, 2]
2009 Oct 06
2
Extracting year from a date object
Hi all,
this one left me a bit puzzled, as I don't seem to find a function to
perform this easily. I must have overlooked the obvious, so sorry in
advance.
I have a list of dates in numerical format (e.g. 34576), defined as
the number of days that have passed since January 1st, 1900. So I apply the
function:
> MyDate <-as.Date(34576,origin="1900-01-01")
> MyDate
[1]
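A minimal sketch of one way to get the year, assuming the serial number really counts days from 1900-01-01 as described in the post. The name `my_year` is illustrative and not from the original message.

```r
# Serial day count converted to a Date, as in the post above.
MyDate <- as.Date(34576, origin = "1900-01-01")

# format() with "%Y" returns the four-digit year as a string;
# as.integer() turns it into a number.
my_year <- as.integer(format(MyDate, "%Y"))
```

The same `format()` call accepts other conversion codes such as `"%m"` or `"%d"` if the month or day is needed instead.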
2011 Jul 14
2
cbind in aggregate formula - based on an existing object (vector)
Hello!
I am aggregating using a formula in aggregate - of the type:
aggregate(cbind(var1,var2,var3)~factor1+factor2,sum,data=mydata)
However, I actually have an object (vector of my variables to be aggregated):
myvars<-c("var1","var2","var3")
I'd like my aggregate formula (its "cbind" part) to be able to use my
"myvars" object. Is it
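One possible approach, sketched under the assumption that the variable names in `myvars` are syntactically valid: paste the names into a `cbind(...)` string and convert it with `as.formula()`. The example data frame below is made up for illustration; only `myvars`, `factor1`, `factor2` and the variable names come from the post.

```r
# Hypothetical data with the column names used in the post.
mydata <- data.frame(
  factor1 = c("a", "a", "b", "b"),
  factor2 = c("x", "x", "y", "y"),
  var1 = 1:4, var2 = 5:8, var3 = 9:12
)
myvars <- c("var1", "var2", "var3")

# Build "cbind(var1, var2, var3) ~ factor1 + factor2" as text,
# then turn the text into a formula object.
agg_formula <- as.formula(
  paste0("cbind(", paste(myvars, collapse = ", "), ") ~ factor1 + factor2")
)

# Sum each variable in myvars within each factor1/factor2 combination.
result <- aggregate(agg_formula, data = mydata, FUN = sum)
```

Building the formula from text keeps the call to `aggregate` unchanged when `myvars` grows or shrinks.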
2006 May 21
1
POSIX, time zone and Windows
Dear Listers,
Apologies for piling on the 'tz' issue in POSIX objects. I have a
'simple' thing on which I must make up my mind but cannot do it from the
existing R-help threads. I am currently working on dog telemetry in
China, and download time information from GPS collars. I would like to
set up the corresponding POSIXxx variables in R to a given time zone. Eg
Pekin
2017 Sep 18
1
Data arrangement for PLSDA using the ropls package
Hello,
I would like to do a partial least squares discriminant analysis (PLSDA) in R using the package "ropls",
which is available in R via the command:
source("https://bioconductor.org/biocLite.R")
I am trying to use a PLSDA to illustrate the impact of two genders (AP, C) on 5 compounds measured in persons (samples). When I try to do a PLSDA I get the warning
2011 Apr 18
1
using "aggregate" when variable names contain spaces
Hello!
My data set has many variables. Unfortunately, many of those variables
contain spaces in their names.
I need advice on: how to refer to variable names in the formula for
"aggregate". See example below:
### Generating example data set:
mydate = rep(seq(as.Date("2008-12-01"), length = 3, by = "month"),4)
value1=c(1,10,100,2,20,200,3,30,300,4,40,400)
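A minimal sketch of one common way to handle this, assuming the column names are kept as-is with `check.names = FALSE`: wrap each name containing a space in backticks inside the formula. The column names `my group` and `value 1` are hypothetical, since the post's actual names are not shown here.

```r
# Hypothetical data frame whose column names contain spaces.
mydata <- data.frame(
  `my group` = c("a", "a", "b"),
  `value 1`  = c(1, 10, 100),
  check.names = FALSE  # keep the spaces instead of converting them to dots
)

# Backticks let non-syntactic names appear in a formula.
result <- aggregate(`value 1` ~ `my group`, data = mydata, FUN = sum)
```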
2011 Mar 01
2
Does POSIXlt extract date components properly?
I would like to use POSIX classes to store dates and extract components of
dates. Following the example in Spector ("Data Manipulation in R"), I
create a date
> mydate = as.POSIXlt('2005-4-19 7:01:00')
I then successfully extract the day with the command
> mydate$day
[1] 19
But when I try to extract the month
> mydate$mon
[1] 3
it returns the wrong month. And
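A short sketch of what is probably going on, based on the documented `POSIXlt` conventions: the `mon` field counts months from 0 (January is 0), and the `year` field counts years since 1900, so 3 here corresponds to April. The `tz = "UTC"` argument and the names `month_number`, `year_number` and `day_number` are additions for this illustration.

```r
# Same date as in the post; a fixed time zone keeps the example reproducible.
mydate <- as.POSIXlt("2005-04-19 07:01:00", tz = "UTC")

month_number <- mydate$mon + 1      # mon is 0-11, so add 1 for the calendar month
year_number  <- mydate$year + 1900  # year is stored as years since 1900
day_number   <- mydate$mday         # day of the month, 1-31
```

`format(mydate, "%m")` is an alternative that returns the calendar month directly as a string.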
2011 Apr 04
2
merging 2 frames while keeping all the entries from the "reference" frame
Hello!
I have my data frame "mydata" (below) and data frame "reference" -
that contains all the dates I would like to be present in the final
data frame.
I am trying to merge them so that the resulting data frame contains
all 8 dates in both subgroups (i.e., Group1 should have 8 rows and
Group2 too). But when I merge them, it does not come out this way. Any
hint would be
2013 Apr 02
2
Create a vector without using an external 'if statement'
Dear R-users,
suppose I have three dataframes like these
df1:
mydate min_temp
31032013 12
01042013 8
02042013 -999
df2:
mydate min_temp
31032013 10
01042013 11
02042013 14
df3:
mydate min_temp
31032013 4
01042013 3
02042013 5
where -999 means that the temperature data is not available (at the moment I cannot change it to NA because I am not the db administrator);
suppose also that oggi is
2011 Oct 27
1
plotting large time series
hello,
I have a problem with plotting large time series, since I want to store
the results in a PDF file (I want to store several pages of plots). The
PDF files get too large to be handled (> 10MB; one was even 200MB).
So I wonder if there is a possibility to either
- reduce the file size of the PDF
- change the way the plot is generated to reduce the plot size?
I use:
2009 Aug 28
2
new data.frame summed by date
Hi,
I wonder if someone can suggest how to create a new data.frame Y
from X where X$PL_Pos is summed by each unique X$MyDate. Y should end
up with two (or more) columns Y$MyDate and Y$PL_Sum with its value
being the cumsum of all the values in X for that date. - a 'daily
cumsum'.
Thanks,
Mark
TStoDate = function (TSDate) {
X = strptime(TSDate + 19e6L, "%Y%m%d")
2011 Aug 11
3
improve formatting of HTML table
I am trying to improve the look of an HTML table for a report (that
needs to be pasted into Word).
Here is an example.
table2 <- structure(c(26L, 0L, 40L, 0L, 10L, 0L, 0L, 188L, 0L, 281L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 4L), .Dim = c(6L, 3L), .Dimnames = structure(list(
myvar = c("Don't know", "Somewhat likely", "Somewhat unlikely",
"Very
2011 May 19
1
Creating a "shifted" month (one that starts not on the first of each month but on another date)
Hello!
I have a data frame with dates. I need to create a new "month" that
starts on the 20th of each month - because I'll need to aggregate my
data later by that "shifted" month.
I wrote the code below and it works. However, I was wondering if there
is some ready-made function in some package - that makes it
easier/more elegant?
Thanks a lot!
# Example data:
2020 Mar 10
2
Fwd: Windows upssched does not work
Hi,
I have a problem with upssched on Windows: upssched is not executed. I
have 2 scripts, one for notification in upsmon and a second for scheduling
in upssched. Monitoring works fine and the script writes to the log. I'm
using the binary Windows installer 2.6.5-6 from NUT.
Here are my configs:
--- nut.conf
MODE=netclient
--- upsmon.conf
MONITOR ups_1000@192.168.3.95 1 <user> <password> slave
2009 Mar 11
1
Is this a documentation bug? Spss dates import
Hello R-user
bug seekers are needed!
In order to perform these simple tasks you have to use a copy of SPSS
and obviously R.
The problem is that date conversion of data coming from SPSS
gives wrong results, if we follow ?as.POSIXct
## SPSS dates (R-help 2006-02-17)
z <- c(10485849600, 10477641600, 10561104000, 10562745600)
as.Date(as.POSIXct(z, origin="1582-10-14",
2009 Jan 21
1
finding row and column indices of date in multiple columns of a data frame
Hi,
I have a data.frame SAMPLES with columns:
Site Site# Season Day1 Day2 Day3
Day1, Day2, Day3 are class "Date", the other columns are numeric or
factor.
I have a date "mydate" that may or may not be listed in my data.frame
and I need to find that out.
If "mydate" is there, I want to get the number of the data.frame row
where it occurs.
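A minimal sketch of one way to do this, assuming the date columns are named as in the post and the data frame has more than one row. The sample values and the names `day_cols`, `hits`, `idx` and `found` are made up for illustration.

```r
# Hypothetical stand-in for the SAMPLES data frame described above.
SAMPLES <- data.frame(
  Site = c("A", "B", "C"),
  Day1 = as.Date(c("2008-05-01", "2008-05-02", "2008-05-03")),
  Day2 = as.Date(c("2008-06-01", "2008-06-02", "2008-06-03")),
  Day3 = as.Date(c("2008-07-01", "2008-07-02", "2008-07-03"))
)
mydate <- as.Date("2008-07-02")
day_cols <- c("Day1", "Day2", "Day3")

# Compare every date column with mydate; with more than one row,
# sapply() returns a logical matrix (rows = data rows, columns = day_cols).
hits <- sapply(SAMPLES[day_cols], function(col) col == mydate)

# arr.ind = TRUE gives the row and column position of each TRUE entry.
# The column index is relative to day_cols, not to the full data frame.
idx <- which(hits, arr.ind = TRUE)
found <- nrow(idx) > 0  # TRUE when mydate occurs somewhere
```

`day_cols[idx[, "col"]]` then maps each column index back to the column name.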