I get a data frame on my end:
lines <- "2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
2011-05-13 00:00:05 EONBHS229 mia13001621NON"
df = read.fwf(textConnection(lines), widths=c(19,-4,7,3,8,2,1,3,1),
    col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
    colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
> df
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
3 2011-05-13 00:00:05 BHS229   mia 13001621      NO    N   <NA>   <NA>
> str(df)
'data.frame':   3 obs. of  8 variables:
 $ DateTime: POSIXct, format: "2011-05-13 00:00:00" "2011-05-13 00:00:01" ...
 $ Flight  : Factor w/ 3 levels "AAL223 ","AAL330 ",..: 2 1 3
 $ Dest    : Factor w/ 3 levels "dfa","laa","mia": 1 2 3
 $ ArrTime : Factor w/ 3 levels "13001621","13002516",..: 2 3 1
 $ MsgType : chr  "PS" "AS" "NO"
 $ Conf    : Factor w/ 3 levels ".","C","N": 2 1 3
 $ Runway  : Factor w/ 1 level "NON": 1 1 NA
 $ Source  : Factor w/ 2 levels "A","M": 1 2 NA
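
As a side note on the "list" and "numeric" results in your message: mode() reports how an object is stored, not what it is. A data frame is stored as a list of columns, and factors and POSIXct values are stored as numbers, so those answers are expected even when the conversions worked. class() is probably the more useful check. A minimal sketch, using a small made-up data frame (the names and values here are invented for illustration, not taken from your file):

```r
# Toy data frame, invented for illustration only
d <- data.frame(
  when = as.POSIXct(c("2011-05-13 00:00:00", "2011-05-13 00:00:01"), tz = "UTC"),
  dest = factor(c("dfa", "laa")),
  msg  = c("PS", "AS"),
  stringsAsFactors = FALSE
)

print(mode(d))           # storage mode of the container
print(class(d))          # what the object actually is
print(is.data.frame(d))

# Per column: storage mode versus (first) class
col_modes   <- sapply(d, mode)
col_classes <- sapply(d, function(x) class(x)[1])
print(col_modes)
print(col_classes)
```

Running sapply(df, function(x) class(x)[1]) on your own data frame should show whether the colClasses were applied.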
> sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55523)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets grid methods
[8] base
other attached packages:
[1] gplots_2.8.0 caTools_1.12 bitops_1.0-4.1 gdata_2.8.2
[5] gtools_2.6.2 sos_1.3-0 brew_1.0-6 lattice_0.19-26
[9] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2
loaded via a namespace (and not attached):
[1] tools_2.13.0
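
One more thing about row 3 above: because that line lacks the fields around NON, the fixed widths put "NO" in MsgType and "N" in Conf, and leave Runway and Source as NA. If that is not what you want, one option might be to pad the short lines before calling read.fwf. A rough sketch, assuming (my guess from the three sample lines, not something I can confirm for the whole file) that complete lines are 48 characters, short ones are 44, and the short ones are missing the 3 characters before NON and the 1 character after it:

```r
lines <- c("2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA",
           "2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM",
           "2011-05-13 00:00:05 EONBHS229 mia13001621NON")

# Assumption: a 44-character line lacks MsgType + Conf (3 chars) and Source (1 char).
# Insert blanks so that NON lines up with the Runway column again.
short <- nchar(lines) == 44
lines[short] <- paste(substr(lines[short], 1, 41), "   ",
                      substr(lines[short], 42, 44), " ", sep = "")

# 'padded' is just an illustrative name for the re-read data frame
padded <- read.fwf(textConnection(lines), widths = c(19,-4,7,3,8,2,1,3,1),
    col.names = c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
    colClasses = c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))

print(as.character(padded$Runway))
```

How the blank padded fields are represented afterwards (blank strings or NA) is worth checking with str(padded) before relying on it.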
Dennis
On Wed, May 25, 2011 at 8:42 AM, James Rome <jamesrome at gmail.com> wrote:
> I have a data set where the lines look like:
> 2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
> 2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
> Some lines are missing the field before and after the NON:
> 2011-05-13 00:00:05 EONBHS229 mia13001621NON
>
> I read them into R using
>   df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
>     col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
>     colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
>
> The documentation for read.fwf says that the data are read into a
> dataframe. Yet, I get a list, and the conversions I specified do not
> seem to have been obeyed:
>> df[1:20,]
>               DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
> 1  2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
> 2  2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
> . . .
>> sapply(df, mode)
>    DateTime      Flight        Dest     ArrTime     MsgType        Conf
>   "numeric"   "numeric"   "numeric"   "numeric" "character"   "numeric"
>      Runway      Source
>   "numeric"   "numeric"
>> dfn = df[!is.na(df$Source),]
>> mode(df)
> [1] "list"
>
> What am I doing wrong?
>
> Thanks,
> Jim Rome
>