This discussion is a bit weird, so can we step back?
Someone wants help reading in a file that apparently was not written
following any one of several consistent sets of rules.
If it was fixed width, R has functions (such as read.fwf) that can read that.
If it was separated by commas, tabs, single spaces, or arbitrary whitespace,
with or without a header line, we have functions that can read that if
properly called.
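For instance, if the sample data had been laid out in fixed-width fields, base R's read.fwf() could read it directly. This is a sketch under an assumed, hypothetical layout of four 4-character fields (not how the poster's file is actually laid out); the lines are built with sprintf() only so the column positions are easy to see. Once whitespace is stripped, blank fields come back as NA.

```r
# Hypothetical fixed-width layout (an assumption): x1, x2, x3, x4 each
# occupy 4 characters; sprintf("%-4s", ...) left-justifies and pads.
fmt <- function(a, b, c, d) sprintf("%-4s%-4s%-4s%-4s", a, b, c, d)
lines <- c(fmt("1",   "B22", "",    ""),
           fmt("2",   "",    "C33", ""),
           fmt("322", "B22", "",    "D34"),
           fmt("4",   "",    "",    "D44"))

con <- textConnection(lines)
dat <- read.fwf(con, widths = c(4, 4, 4, 4),
                col.names = c("x1", "x2", "x3", "x4"),
                strip.white = TRUE,   # "    " becomes ""
                na.strings = "")      # ... and "" becomes NA
close(con)
dat
```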
ALL the above normally assume that all the resulting columns are the same
length. If any are meant to be shorter, you still leave the separators in
place and put NA or something similar in the empty fields. Also, the
functions we normally talk about do NOT read in and produce multiple
vectors but something like a data.frame.
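Here is a minimal sketch of what leaving the separators in place looks like. It assumes the same sample has been rearranged by hand into CSV, with values placed according to the poster's desired output. Empty fields between commas then come back as NA.

```r
# Every row keeps all three commas, even when fields are empty.
csv <- "x1,x2,x3,x4
1,B22,,
2,,C33,
322,B22,,D34
4,,,D44"

# Treat empty strings (as well as the literal NA) as missing values.
dat <- read.csv(text = csv, na.strings = c("", "NA"))
dat
```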
So the choice is either to make sure the darn data is in a consistent
format, or try a different plan. Fair enough?
Some are suggesting parsing it yourself line by line. Certainly that can be
done. But unless you know some schema to help you disambiguate, what do you
do if you reach a row that is too short and has enough data for only two
columns? Which of the columns do you assign it to? If you had a clear rule, ...
And what if you have different data types? R does not handle that within a
single vector or column of a data.frame, although it can if you make it a
list column.
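A toy illustration of both points (the values are made up): combining a number and a string in one atomic vector silently coerces everything to character, whereas a list column lets each cell of a data.frame hold an arbitrary R object.

```r
# An atomic vector must have a single type, so 322 becomes "322".
v <- c(322, "D34")
print(class(v))

# A list column: each cell can hold a different type or length.
df <- data.frame(x1 = c(1, 322))
df$mixed <- list(3.5, c("B22", "D34"))
str(df)
```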
If this data is a one-time thing, perhaps it should be copied into something
like Excel by a human and edited so every column is filled as you wish,
THEN saved as something like a CSV file, and then it can happily be imported
the usual way, including NA values as needed.
If the person really wants 4 independent vectors of different lengths to
read in, there are plenty of ways to do that and no need to lump them in
this odd format.
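As one illustration, suppose each vector were written on its own line with its name first. This is a hypothetical layout, with values taken from the sample and placed the way the poster's desired output would place them. The lines can then be split into a named list of vectors of different lengths.

```r
# Hypothetical layout: one vector per line, name first.
txt <- c("x1 1 2 322 4 51 60",
         "x2 B22 B22 D62",
         "x3 C33 D53",
         "x4 D34 D44")

parts <- strsplit(txt, " +")
# Drop the first token of each line for the values; use it as the name.
vecs <- setNames(lapply(parts, `[`, -1),
                 vapply(parts, `[`, character(1), 1))
vecs$x1 <- as.numeric(vecs$x1)
str(vecs)
```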
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of jim holtman
Sent: Monday, February 22, 2021 9:01 PM
To: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Cc: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org>
Subject: Re: [R] Read
It looks like we can look at the last digit of the data and that would be
the column number; is that correct?
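If that rule holds, the line-by-line parsing becomes unambiguous. Here is a minimal R sketch. It assumes the first token of each row always belongs to x1 and that every later token ends in a single digit naming its target column. It stops with an error on any token that breaks the rule, or that collides with an already-filled column, rather than guessing.

```r
s <- "x1 x2 x3 x4
1 B22
2 C33
322 B22 D34
4 D44
51 D53
60 D62"

lns <- trimws(strsplit(s, "\n")[[1]])
lns <- lns[nzchar(lns)]                  # drop blank lines
hdr <- strsplit(lns[1], " +")[[1]]       # column names from the header

rows <- lapply(strsplit(lns[-1], " +"), function(tok) {
  out <- rep(NA_character_, length(hdr))
  out[1] <- tok[1]                       # assumed: first token is always x1
  for (v in tok[-1]) {
    # Assumed rule: the last character of the token is its column number.
    col <- suppressWarnings(as.integer(substring(v, nchar(v))))
    if (is.na(col) || col < 2 || col > length(hdr) || !is.na(out[col]))
      stop("cannot place token '", v, "'")
    out[col] <- v
  }
  out
})

m <- do.call(rbind, rows)
colnames(m) <- hdr
result <- as.data.frame(m, stringsAsFactors = FALSE)
result$x1 <- as.integer(result$x1)
result
```

Without a rule like this, there is no principled way to decide which column a lone value belongs to.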
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Mon, Feb 22, 2021 at 5:34 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
> This gets it into a data frame. If you know which columns should be
> numeric you can convert them.
>
> s <-
> "x1 x2 x3 x4
> 1 B22
> 2 C33
> 322 B22 D34
> 4 D44
> 51 D53
> 60 D62
> "
>
> tc <- textConnection( s )
> lns <- readLines(tc)
> close(tc)
> if ( "" == lns[ length( lns ) ] )
>   lns <- lns[ -length( lns ) ]
>
> L <- strsplit( lns, " +" )
> m <- do.call( rbind, lapply( L[-1], function(v)
>   if ( length(v) < length(L[[1]]) )
>     c( v, rep( NA, length(L[[1]]) - length(v) ) )
>   else v ) )
> colnames( m ) <- L[[1]]
> result <- as.data.frame( m, stringsAsFactors = FALSE )
> result
>
> On February 22, 2021 4:42:57 PM PST, Val <valkremk at gmail.com> wrote:
> >That is my problem. The spacing between columns is not consistent.
> >It may be a single space or multiple spaces (two or three).
> >
> >On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap <williamwdunlap at gmail.com>
> >wrote:
> >>
> >> You said the column values were separated by space characters.
> >> Copying the text from gmail shows that some column names and column
> >> values are separated by single spaces (e.g., between x1 and x2) and
> >> some by multiple spaces (e.g., between x3 and x4). Did the mail
> >> mess up the spacing, or is there some other way to tell where the
> >> omitted values are?
> >>
> >> -Bill
> >>
> >> On Mon, Feb 22, 2021 at 2:54 PM Val <valkremk at gmail.com> wrote:
> >> >
> >> > I tried that one and it did not work. Please see the error message:
> >> > Error in read.table(text = "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4 D44 \n51 D53\n60 D62 ", :
> >> >   more columns than column names
> >> >
> >> > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap <williamwdunlap at gmail.com> wrote:
> >> > >
> >> > > Since the columns in the file are separated by a space character,
> >> > > " ", add the read.table argument sep=" ".
> >> > >
> >> > > -Bill
> >> > >
> >> > > On Mon, Feb 22, 2021 at 2:21 PM Val <valkremk at gmail.com> wrote:
> >> > > >
> >> > > > Hi all, I am trying to read messy data but am facing difficulty.
> >> > > > The data has several columns separated by blank space(s). Each
> >> > > > column value may have different lengths across the rows. The
> >> > > > first row (header) has four columns. However, each row may not
> >> > > > have all four column values. For instance, the first data row has
> >> > > > only the first two column values. The fourth data row has the
> >> > > > first and last column values; the second and third column values
> >> > > > are missing for this row. How do I read this data set correctly?
> >> > > > Here is my sample data set, output and desired output. To make it
> >> > > > clear, I have added the row and column numbers to each data point.
> >> > > > I cannot use fixed-width format reading because each row may have
> >> > > > a different length for a given column.
> >> > > >
> >> > > > dat<-read.table(text="x1 x2 x3 x4
> >> > > > 1 B22
> >> > > > 2 C33
> >> > > > 322 B22 D34
> >> > > > 4 D44
> >> > > > 51 D53
> >> > > > 60 D62 ", header = TRUE, fill = TRUE, na.strings = c("", "NA"))
> >> > > >
> >> > > > Output
> >> > > > x1 x2 x3 x4
> >> > > > 1 1 B12 <NA> NA
> >> > > > 2 2 C23 <NA> NA
> >> > > > 3 322 B32 D34 NA
> >> > > > 4 4 D44 <NA> NA
> >> > > > 5 51 D53 <NA> NA
> >> > > > 6 60 D62 <NA> NA
> >> > > >
> >> > > >
> >> > > > Desired output
> >> > > > x1 x2 x3 x4
> >> > > > 1 1 B22 <NA> NA
> >> > > > 2 2 <NA> C33 NA
> >> > > > 3 322 B32 NA D34
> >> > > > 4 4 <NA> NA D44
> >> > > > 5 51 <NA> D53 NA
> >> > > > 6 60 D62 <NA> NA
> >> > > >
> >> > > > Thank you,
> >> > > >
> >> > > > ______________________________________________
> >> > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > > > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.