Displaying 20 results from an estimated 60000 matches similar to: "Why does aggregate fail?"
2010 Feb 02
Writing out csv files
In my code, I calculate the maximum values by two factors using
maxr = with(arrdf, tapply(rate, list(weekday, quarter), max, na.rm = TRUE))
and I want to write out the file so that Excel can read it.
I used
write.table(maxr, fname, sep=",", col.names=TRUE, row.names=TRUE,
quote=TRUE, na="0")
which works, and yields something like
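The sketch below runs the same pipeline on hypothetical toy data. The values in arrdf are invented; only the column names come from the post. A detail that often trips up Excel is the header row. With row.names = TRUE and col.names = TRUE, write.table writes one fewer header field than data fields, so the column names shift one cell to the left. Setting col.names = NA adds an empty first header cell, which is the convention ?write.table documents for CSV files meant for spreadsheets.

```r
# Hypothetical toy data; only the column names (weekday, quarter, rate) come from the post
arrdf <- data.frame(
  weekday = rep(c("Mon", "Tue"), each = 4),
  quarter = rep(c(1, 2), times = 4),
  rate    = c(10, 12, NA, 8, 7, 15, 9, 11)
)

# Maximum rate in every weekday x quarter cell (a matrix with dimnames)
maxr <- with(arrdf, tapply(rate, list(weekday, quarter), max, na.rm = TRUE))

# col.names = NA writes an empty first header cell above the row names,
# so each header lines up with its data column when Excel opens the file
fname <- tempfile(fileext = ".csv")
write.table(maxr, fname, sep = ",", col.names = NA, row.names = TRUE,
            quote = TRUE, na = "0")
out <- readLines(fname)
print(out)
```

write.csv(maxr, fname, na = "0") should give the same layout, because write.csv uses sep = "," and col.names = NA by default when row names are written.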
2010 Apr 01
pdf files in loops
I need to make a bunch of PDF files of histograms. I tried
gatelist = unique(mdf$ArrivalGate)
for( gate in gatelist) {
outfile = paste("../", airport, "/", airport, "taxiHistogram", gate,
".pdf", sep="")
pdf(file = outfile, width = 10, height=8, par(lwd=1))
title=paste("Taxi time for Arrival Gate", gate, "by
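Here is a minimal sketch of one way to write one PDF per gate. It uses hypothetical data: the TaxiTime column and the value of airport are assumptions. It differs from the snippet in two ways. First, graphical parameters such as lwd are set with par() after the device is open, rather than passed as an extra argument to pdf(). Second, dev.off() closes each file before the next iteration starts.

```r
# Hypothetical toy data; ArrivalGate and airport come from the post, TaxiTime is assumed
set.seed(1)
mdf <- data.frame(ArrivalGate = rep(c("A1", "B2", "C3"), each = 50),
                  TaxiTime    = rexp(150, rate = 1/8))
airport <- "EWR"   # assumed value
outdir <- file.path(tempdir(), airport)
dir.create(outdir, showWarnings = FALSE)

for (gate in unique(mdf$ArrivalGate)) {
  outfile <- file.path(outdir, paste0(airport, "taxiHistogram", gate, ".pdf"))
  pdf(file = outfile, width = 10, height = 8)  # open one device per gate
  par(lwd = 1)                                 # set parameters once the device exists
  hist(mdf$TaxiTime[mdf$ArrivalGate == gate],
       main = paste("Taxi time for Arrival Gate", gate),
       xlab = "Taxi time (min)")
  dev.off()                                    # close this file before the next gate
}
list.files(outdir)
```

If the histograms are drawn with lattice instead, each plot must be wrapped in print() inside the loop, otherwise nothing is drawn (see R FAQ 7.22).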
2010 Jan 18
Using the output of strsplit
I successfully combined my data frames, and am now on my next hurdle.
I had combined the date and quarter, and used tapply to count the
entries for each unique date/quarter pair.
ar = tapply(ewrgnd$gw, list(ewrgnd$dq), sum)  # for each date/quarter combination, sums the gw (which are all 1)
But I need to split them back into the separate date and quarter. So I
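This is a minimal sketch of turning the combined names back into two columns. It assumes the date/quarter key was built with paste(..., sep = "_"). The real separator is not visible in the snippet, so adjust the split pattern to match. strsplit() returns a list of character vectors, and do.call(rbind, ...) stacks them into a two-column matrix.

```r
# Hypothetical result of tapply() over a combined "date_quarter" key
ar <- c("2009-01-01_3" = 4, "2009-01-01_4" = 2, "2009-01-02_3" = 5)

# strsplit() gives a list of c(date, quarter); rbind stacks them into a matrix
parts <- do.call(rbind, strsplit(names(ar), "_", fixed = TRUE))

res <- data.frame(date    = as.Date(parts[, 1]),
                  quarter = as.integer(parts[, 2]),
                  count   = unname(ar))
print(res)
```

An alternative is to avoid the combined key in the first place. Passing two grouping vectors, as in tapply(x, list(date, quarter), sum), returns a matrix whose row and column names already hold the two parts.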
2010 Jan 16
Comparing dates in dataframes
I have two data frames. One (arr) has all arrivals to an airport for a
year, and the other (gw) has the dates and quarter hour of the day when
the weather is good. arr has a Date and quarter hour column.
[1] "Date" "weekday" "hour" "month" "minute"
[6] "quarter" "ICAO"
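This minimal sketch uses hypothetical toy frames; only the Date, quarter and ICAO column names come from the post. The two options behave differently. merge() on both key columns keeps only the arrivals that fall in a good-weather quarter hour. The %in% version flags every arrival and leaves the row count of arr unchanged.

```r
# Hypothetical toy data; column names Date, quarter and ICAO follow the post
arr <- data.frame(Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
                  quarter = c(3, 4, 3),
                  ICAO    = c("AAL1", "DAL2", "UAL3"),
                  stringsAsFactors = FALSE)
gw <- data.frame(Date    = as.Date(c("2009-01-01", "2009-01-02")),
                 quarter = c(3, 3))

# Option 1: inner join on both keys keeps only good-weather arrivals
good <- merge(arr, gw, by = c("Date", "quarter"))

# Option 2: flag each arrival by building the same combined key on both sides
arr$goodwx <- paste(arr$Date, arr$quarter) %in% paste(gw$Date, gw$quarter)
print(good)
print(arr)
```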
2010 Apr 16
bwplot puts the bars in the wrong place
Dear R-Help,
With the attached data set, I am still getting incorrect bwplots
> xyplot(gdf$tt~gdf$OnHour |gdf$Runway, data=gdf) # Is correct
> bwplot(gdf$tt~gdf$OnHour |gdf$Runway, data=gdf, horizontal=FALSE) # Puts the boxes on the wrong x-axis values
# look especially at 0 and 3. How do I fix this?
What is happening?
Jim Rome
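The attached data is not available here, so the cause cannot be confirmed. Making the x positions explicit may still help. With horizontal = FALSE, bwplot() treats x as the categorical variable, whereas xyplot() places points at the numeric values. The sketch converts OnHour to a factor with every hour 0-23 as a level and keeps unused levels, so each box has its own fixed slot on the axis. The toy data is hypothetical. Also, when data = gdf is supplied, the formula can use bare column names.

```r
library(lattice)

# Hypothetical toy data standing in for the attached gdf (tt, OnHour, Runway)
set.seed(2)
gdf <- data.frame(tt     = rnorm(120, mean = 10, sd = 3),
                  OnHour = sample(c(0, 3, 5, 8), 120, replace = TRUE),
                  Runway = rep(c("22L", "4R"), 60))

# Explicit factor: each hour 0-23 gets its own slot on the x-axis, even if empty
gdf$OnHourF <- factor(gdf$OnHour, levels = 0:23)

# Bare column names inside data = gdf; keep empty hours so the axis runs 0..23
p <- bwplot(tt ~ OnHourF | Runway, data = gdf, horizontal = FALSE,
            drop.unused.levels = FALSE)
print(p)
```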
2006 Apr 24
Can you improve on this code?
# File app/models/timesheet.rb, line 27
def totals
  totals = Hash.new
  totals["Monday"] = totals["Tuesday"] = totals["Wednesday"] =
    totals["Thursday"] = totals["Friday"] = totals["Saturday"] =
    totals["Sunday"] = totals["Totals"] = 0 # initialise all to zero
  for item in
2010 Mar 20
How to select a row from one dataframe that is "close" to a row in another dataframe
I have two data frames of flight data, but they have very different
numbers of rows. They come from different sources, so the data are not
> names(oooi)
[1] "FltOrigDt" "MkdCrrCd"
[3] "MkdFltNbr" "DprtTrpnStnCd"
[5] "ArrTrpnStnCd" "ActualOutLocalTimestamp"
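This minimal sketch treats "close" as the same flight plus the nearest timestamp. The short column names flt and t are hypothetical stand-ins for the post's MkdFltNbr and ActualOutLocalTimestamp style fields, and the rows are invented. For each row of a, the code searches the rows of b with the same flight and picks the one with the smallest absolute time gap.

```r
# Hypothetical toy tables; flt (flight) and t (timestamp) are illustrative names
a <- data.frame(flt = c("AA1", "AA1", "DL2"),
                t   = as.POSIXct(c("2010-03-01 08:00", "2010-03-01 17:00",
                                   "2010-03-01 09:30"), tz = "UTC"),
                stringsAsFactors = FALSE)
b <- data.frame(flt  = c("AA1", "AA1", "DL2", "DL2"),
                t    = as.POSIXct(c("2010-03-01 08:05", "2010-03-01 16:50",
                                    "2010-03-01 09:00", "2010-03-01 12:00"), tz = "UTC"),
                gate = c("A1", "A7", "B3", "B4"),
                stringsAsFactors = FALSE)

# Index of the row of b with the same flight and the closest timestamp (NA if none)
nearest_row <- function(flt, t) {
  cand <- which(b$flt == flt)
  if (length(cand) == 0) return(NA_integer_)
  gap <- abs(as.numeric(difftime(b$t[cand], t, units = "mins")))
  cand[which.min(gap)]
}

idx <- vapply(seq_len(nrow(a)), function(i) nearest_row(a$flt[i], a$t[i]), integer(1))
a$gate <- b$gate[idx]
print(a)
```

This per-row search grows with the product of the two table sizes. For very large tables, sorting b by time and using findInterval() within each flight should scale better. A maximum allowed gap could also be added to reject poor matches.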
2011 May 25
Importing fixed-width data
I have a data set where the lines look like:
2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
Some lines are missing the field before and after the NON:
2011-05-13 00:00:05 EONBHS229 mia13001621NON
I read them into R using
df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
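This minimal sketch assumes the short lines always end in the 8-digit field followed directly by NON, as in the third sample line. Under the widths used, those lines lack the 3 characters before NON and the 1 character after it. The fix pads them with blanks so NON lands in its usual columns, then reads every line with the same widths. The sample lines are copied from the post. colClasses and strip.white are passed through read.fwf() to read.table().

```r
# Sample lines from the post
lines <- c("2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA",
           "2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM",
           "2011-05-13 00:00:05 EONBHS229 mia13001621NON")

# Assumed rule: a line ending in 8 digits followed directly by NON is missing the
# 2- and 1-character fields before NON and the 1-character field after it;
# insert 3 blanks before NON and 1 after so every line is 48 characters wide
fixed <- sub("([0-9]{8})NON$", "\\1   NON ", lines)

tf <- tempfile(fileext = ".txt")
writeLines(fixed, tf)
df <- read.fwf(tf, widths = c(19, -4, 7, 3, 8, 2, 1, 3, 1),
               colClasses = "character", strip.white = TRUE)
print(df)
```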
2011 Aug 14
Not sure how to use aggregate, colSums, by
I have a data frame called test, shown below, that I would like to summarize in
a particular way:
I want to show the column sums (columns y, f) grouped by country (column
e1). However, I'm looking for the data to be split according to column e2.
In other words, two tables of sums by country: one table for "con" and one
table for "std" in column e2. Finally at the
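This minimal sketch uses a hypothetical toy version of test, since the real table is not in the snippet. split() produces one data frame per level of e2. aggregate() with a cbind() left-hand side then sums y and f by country within each piece.

```r
# Hypothetical toy version of `test`: e1 = country, e2 = "con"/"std", y and f numeric
test <- data.frame(e1 = c("US", "US", "UK", "UK", "US"),
                   e2 = c("con", "std", "con", "std", "con"),
                   y  = 1:5,
                   f  = c(10, 20, 30, 40, 50),
                   stringsAsFactors = FALSE)

# One summary table per level of e2: sums of y and f by country
tabs <- lapply(split(test, test$e2),
               function(d) aggregate(cbind(y, f) ~ e1, data = d, FUN = sum))
print(tabs$con)
print(tabs$std)
```

aggregate(cbind(y, f) ~ e1 + e2, data = test, FUN = sum) gives the same numbers in a single long table, if one table is easier to work with.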
2009 Dec 30
What am I doing wrong in my loops?
Dear kind list people:
I have the following code:
[1] "0" "1" "2" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22"
2003 Oct 22
High frequency time-series
Having to collect hourly electricity loads and quarter-of-an-hour electricity production data for some years, I think the tidiest way to handle them is with ts, but I don't know how to define such a frequency starting from a given date.
Leafing through r-help mail archives I've found this *ALMOST* satisfactory message:
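This minimal sketch assumes the time unit is one day. Hourly data then has frequency = 24, and quarter-hour data has frequency = 96. One way to anchor a ts to a calendar date is to count days since 1970-01-01, so the series start can be converted back with as.Date(). The start date and the random values are hypothetical.

```r
# Hypothetical start date; time unit = days since 1970-01-01
day0 <- as.numeric(as.Date("2003-01-01"))

set.seed(6)
load_h  <- ts(rnorm(24 * 7), start = c(day0, 1), frequency = 24)  # hourly loads, one week
prod_qh <- ts(rnorm(96 * 7), start = c(day0, 1), frequency = 96)  # quarter-hour production

# Convert the series start back to a calendar date
as.Date(tsp(load_h)[1], origin = "1970-01-01")
```

For data with gaps or daylight-saving changes, a zoo object indexed by POSIXct times may be more convenient than ts, because ts only supports regularly spaced observations.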
2009 Dec 24
How to separate a data set by its factors
I have a large data set of airport data and wish to analyze it by hour
and day of the week. Hour and day of the week are both factors.
I can do something such as:
histogram(~(Arrival.Val) | DAY*Hour, type="count", breaks=60)
which displays the data the way I want it in principle, but the plots
are too small to read. I added layout=c(7,6,4) to the argument list, but
then I only get the first
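This minimal sketch assumes the goal is to see all 7 x 24 = 168 panels. layout = c(7, 6, 4) asks for 42 panels per page over 4 pages, and a screen device shows those pages one after another. A multi-page PDF keeps all four pages. The data frame and its values are hypothetical; only the variable names come from the post. The post's call also omits data =, so the sketch adds it.

```r
library(lattice)

# Hypothetical toy data: 10 observations for each DAY x Hour cell
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
set.seed(3)
arr <- data.frame(Arrival.Val = rpois(7 * 24 * 10, 8),
                  DAY  = factor(rep(days, each = 240), levels = days),
                  Hour = factor(rep(rep(0:23, each = 10), times = 7)))

# 168 panels at 7 columns x 6 rows per page = 4 pages in one PDF file
outfile <- tempfile(fileext = ".pdf")
pdf(outfile, width = 11, height = 8.5)
print(histogram(~ Arrival.Val | DAY * Hour, data = arr, type = "count",
                breaks = 20, layout = c(7, 6, 4)))
dev.off()
```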
2011 Jun 10
Double x grid in ggplot2
I am trying to overlay raw data with a boxplot as follows:
pp = qplot(factor(time, levels=0:60, ordered=TRUE),
error, data=dfsub, size=I(1), main =" title", ylab="Error
xlab="Time before ON (min)", alpha=I(1/10),
ylim=c(-30,40), geom="jitter") +
facet_wrap(~ runway, ncol=2) +
2011 Jul 17
How to speed up interpolation
df is a very large data frame with arrival estimates for many flights
(df$flightfact) at random times (df$PredTime). The error of the estimate
is df$dt.
My problem is that I want to know the prediction error at each minute
before landing. This code works, but is very slow, and dominates
everything. I tried using split(), but that rapidly ate up my 12 GB of
memory. So, is there a better R way of
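This minimal sketch assumes PredTime is measured in minutes before landing, and the rows are hypothetical. It splits row indices rather than the data frame itself. The per-flight pieces are then small integer vectors, not copies of every column. approx() then interpolates dt linearly at each whole minute. rule = 1 returns NA outside a flight's observed range.

```r
# Hypothetical toy data: PredTime is assumed to be minutes before landing
df <- data.frame(flightfact = factor(rep(c("F1", "F2"), each = 5)),
                 PredTime   = c(60, 45, 30, 15, 0, 50, 40, 20, 10, 0),
                 dt         = c(5, 4, 2, 1, 0, 6, 3, 2, 1, 0))
minutes <- 0:60

# Split row indices (cheap integer vectors), not the data frame itself
idx <- split(seq_len(nrow(df)), df$flightfact)

# Linear interpolation at each whole minute; flights with < 2 distinct times give NA
interp_one <- function(i) {
  if (length(unique(df$PredTime[i])) < 2) return(rep(NA_real_, length(minutes)))
  approx(df$PredTime[i], df$dt[i], xout = minutes, rule = 1)$y
}
errmat <- vapply(idx, interp_one, numeric(length(minutes)))  # rows = minutes, cols = flights
print(errmat[c(1, 31, 61), ])
```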
2011 Jun 08
How to suppress factor labels
I am using ggplot2 to make a boxplot that overlays a scatterplot:
pp = qplot(time, error, data=times, size=I(1), geom="jitter", main=title,
ylab="Error (min)", xlab="Time before ON (min)", alpha=I(1/10),
pp2 = pp + with(times, facet_wrap(~ runway, ncol=2))
print(pp2 + geom_boxplot(alpha=.5,
2006 Sep 25
Beginner question: select cases
Hello all,
I hope I chose the right list, as my question is a beginner question.
I have a data set with 3 columns, "London", "Rome" and "Vienna"; the
location is indicated by a 1, like this:
London Rome Vienna q1
0 0 1 4
0 1 0 2
1 0 0 3
I just want to calculate the means of a variable q1.
I tried following script:
# calculate the mean
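This minimal sketch uses the small table from the post. It assumes the goal is the mean of q1 computed separately for each city. Logical indexing with d[[city]] == 1 selects the rows coded 1 for that city.

```r
# Toy data copied from the table in the post
d <- data.frame(London = c(0, 0, 1), Rome = c(0, 1, 0), Vienna = c(1, 0, 0),
                q1 = c(4, 2, 3))

# Mean of q1 among the rows coded 1 for each city
cities <- c("London", "Rome", "Vienna")
city_means <- sapply(cities, function(city) mean(d$q1[d[[city]] == 1]))
print(city_means)   # London 3, Rome 2, Vienna 4
```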
2011 Feb 24
Rome TW on Ubuntu 10.10 (maverick)
I was asked to provide info about my attempt to run Rome TW on my Linux system. I'm not sure exactly what info is sought, so please ask for additional information.
The system I tried to use is a Dell Latitude D830 with 3 GB RAM and the latest BIOS version, A15.
I've installed Ubuntu 10.10 32-bit. I installed Compiz Fusion from the standard Ubuntu repos.
Additionally, I installed Wine 1.2
2008 Sep 22
zoo: hourly values (local time) not unique
I've got a time series as a zoo object which contains hourly values. My problem is that these values occur in every "real" hour with regard to daylight saving time. That is, the last Sunday in March has 23 values, whereas the last Sunday in October has 25 values instead of 24.
Thus if I try to aggregate the data using for example tapply (e.g. to get a monthly mean),
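This minimal sketch uses base R and an assumed European time zone; Europe/Berlin is only an example. Stored as POSIXct instants, the 25 hourly values of the October changeover day are all distinct, even though the local clock label 02:00 occurs twice. A POSIXct index, for example zoo(x, order.by = tt), avoids non-unique index values that local-time labels would produce. Grouping by format(tt, "%Y-%m") still assigns each value to the correct local month.

```r
tz <- "Europe/Berlin"   # assumed time zone for illustration

# 25 hourly instants covering the last Sunday of October 2008 (DST ends that day)
tt <- seq(as.POSIXct("2008-10-26 00:00", tz = tz), by = "hour", length.out = 25)

anyDuplicated(tt)                             # 0: the instants are unique
anyDuplicated(format(tt, "%Y-%m-%d %H:%M"))   # > 0: local 02:00 appears twice

# Grouping by local month works on the POSIXct index directly
x <- seq_along(tt)                            # hypothetical hourly values
tapply(x, format(tt, "%Y-%m"), mean)
```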
2012 Oct 25
Trying to use a function in aggregate
Hi - I am using R v 2.13.0. I am trying to use the aggregate function to
calculate the percent at length for each Trip_id and CommonName. Here is a
small subset of the data.
Trip_id Vessel CommonName Length Count
1 230 Sunlight Shad,American 19 1
2 230 Sunlight Shad,American 20 1
3 230 Sunlight Shad,American 21
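This minimal sketch assumes "percent at length" means each row's Count as a percentage of the total Count for its Trip_id and CommonName. ave() returns the group total aligned with every row, so there is no need to merge an aggregate() result back in. The third Count is cut off in the post, so the values below are hypothetical.

```r
# Hypothetical rows shaped like the post's subset (the third Count is invented)
fish <- data.frame(Trip_id    = 230,
                   Vessel     = "Sunlight",
                   CommonName = "Shad,American",
                   Length     = 19:21,
                   Count      = c(1, 1, 2),
                   stringsAsFactors = FALSE)

# Group total of Count for each Trip_id x CommonName, repeated on every row
grp_total <- ave(fish$Count, fish$Trip_id, fish$CommonName, FUN = sum)
fish$PctAtLength <- 100 * fish$Count / grp_total
print(fish)
```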
2009 Dec 26
Why do histogram bars vary their width?
histogram(~(Arrival4) | as.factor(Hour), type="count",
breaks=16,ylab="Arrival Count",
xlab="Arrival Rate/4",main="Friday EWR A22R D22L Configuration",
layout=c(6,4), par.strip.text=list(cex=0.7))
Why do I get plots with different bar widths? See attached.
Jim Rome
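This minimal sketch assumes Arrival4 takes only whole-number values, which the post does not show. When breaks is a single number, the bins may be computed from each panel's own data. With integer data, some equal-width bins can also cover one distinct value while others cover two, which makes bars look uneven. One explicit vector of half-integer break points gives every panel the same one-unit bins. The toy data is hypothetical.

```r
library(lattice)

# Hypothetical integer-valued arrival counts for each hour
set.seed(5)
d <- data.frame(Arrival4 = rpois(24 * 30, 6), Hour = rep(0:23, each = 30))

# One shared set of break points: a one-unit bin centred on every integer value
brks <- seq(min(d$Arrival4) - 0.5, max(d$Arrival4) + 0.5, by = 1)

p <- histogram(~ Arrival4 | as.factor(Hour), data = d, type = "count",
               breaks = brks, ylab = "Arrival Count", xlab = "Arrival Rate/4",
               layout = c(6, 4), par.strip.text = list(cex = 0.7))
print(p)
```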