I messed up: I did not see your 'desired' output. Producing it will be
hard, since there is no consistent number of spaces that would indicate
the intended column. Do you have any hint as to how to interpret the
spacing, especially since you have several hundred more lines? Is the
output supposed to be 'fixed' field?
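
One way to check whether the spacing carries any information is to list
the character position at which each value starts on every line. This is
only a minimal sketch in base R, and it assumes the raw file still has
its original spacing. The spacing in `lines` below is invented for
illustration, since the mail flattened the real one.

```r
# Hypothetical sample: the spacing is made up for illustration only,
# because the original spacing did not survive the mail.
lines <- c("x1 x2 x3 x4",
           "1 B12",
           "2   C23",
           "322 B32   D34",
           "4     D44")

# For each line, the 1-based character position where each
# non-blank token starts
starts <- lapply(gregexpr("[^ ]+", lines), as.integer)
names(starts) <- lines
starts
```

If the start positions of the values line up with those of the header
(1, 4, 7, 10 in this made-up sample), the file can be treated as fixed
field. If they do not, as here, the spaces alone cannot tell us which
values were omitted.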
Jim Holtman
*Data Munger Guru*
*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*
On Mon, Feb 22, 2021 at 5:00 PM jim holtman <jholtman at gmail.com> wrote:
> Try this:
>
> > library(tidyverse)
>
> > text <- "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4 D44 \n51 D53\n60 D62 "
>
> > # read in the data as characters and replace multiple blanks with a single blank
> > input <- read_lines(text)
>
> > input <- str_replace_all(input, ' +', ' ')
>
> > mydata <- read_delim(input, ' ', col_names = TRUE)
> Warning: 5 parsing failures.
> row col expected actual file
> 1 -- 4 columns 3 columns literal data
> 2 -- 4 columns 3 columns literal data
> 4 -- 4 columns 3 columns literal data
> 5 -- 4 columns 2 columns literal data
> 6 -- 4 columns 3 columns literal data
>
> > mydata
> # A tibble: 6 x 4
> x1 x2 x3 x4
> <dbl> <chr> <chr> <lgl>
> 1 1 B12 NA NA
> 2 2 C23 NA NA
> 3 322 B32 D34 NA
> 4 4 D44 NA NA
> 5 51 D53 NA NA
> 6 60 D62 NA NA
> >
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Mon, Feb 22, 2021 at 4:49 PM Val <valkremk at gmail.com> wrote:
>
>> That is my problem. The spacing between columns is not consistent. It
>> may be single space or multiple spaces (two or three).
>>
>> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap <williamwdunlap at gmail.com> wrote:
>> >
>> > You said the column values were separated by space characters.
>> > Copying the text from gmail shows that some column names and column
>> > values are separated by single spaces (e.g., between x1 and x2) and
>> > some by multiple spaces (e.g., between x3 and x4). Did the mail mess
>> > up the spacing or is there some other way to tell where the omitted
>> > values are?
>> >
>> > -Bill
>> >
>> > On Mon, Feb 22, 2021 at 2:54 PM Val <valkremk at gmail.com> wrote:
>> > >
>> > > I tried that one and it did not work. Please see the error message:
>> > > Error in read.table(text = "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4 D44 \n51 D53\n60 D62 ", :
>> > >   more columns than column names
>> > >
>> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap <williamwdunlap at gmail.com> wrote:
>> > > >
>> > > > Since the columns in the file are separated by a space character, " ",
>> > > > add the read.table argument sep=" ".
>> > > >
>> > > > -Bill
>> > > >
>> > > > On Mon, Feb 22, 2021 at 2:21 PM Val <valkremk at gmail.com> wrote:
>> > > > >
>> > > > > Hi all, I am trying to read a messy data set but am having
>> > > > > difficulty. The data has several columns separated by blank
>> > > > > space(s). Each column value may have a different length across
>> > > > > the rows. The first row (header) has four columns. However, a
>> > > > > row may not have all four column values. For instance, the
>> > > > > first data row has only the first two column values. The fourth
>> > > > > data row has the first and last column values; the second and
>> > > > > third column values are missing for this row. How do I read
>> > > > > this data set correctly? Here is my sample data set, output and
>> > > > > desired output. To make each data point clear, I have added its
>> > > > > row and column numbers. I cannot use fixed width format reading
>> > > > > because each row may have a different length for a given column.
>> > > > >
>> > > > > dat<-read.table(text="x1 x2 x3 x4
>> > > > > 1 B22
>> > > > > 2 C33
>> > > > > 322 B22 D34
>> > > > > 4 D44
>> > > > > 51 D53
>> > > > > 60 D62 ",header=TRUE, fill=TRUE, na.strings=c("","NA"))
>> > > > >
>> > > > > Output
>> > > > > x1 x2 x3 x4
>> > > > > 1 1 B12 <NA> NA
>> > > > > 2 2 C23 <NA> NA
>> > > > > 3 322 B32 D34 NA
>> > > > > 4 4 D44 <NA> NA
>> > > > > 5 51 D53 <NA> NA
>> > > > > 6 60 D62 <NA> NA
>> > > > >
>> > > > >
>> > > > > Desired output
>> > > > > x1 x2 x3 x4
>> > > > > 1 1 B22 <NA> NA
>> > > > > 2 2 <NA> C33 NA
>> > > > > 3 322 B32 NA D34
>> > > > > 4 4 <NA> NA D44
>> > > > > 5 51 <NA> D53 NA
>> > > > > 6 60 D62 <NA> NA
>> > > > >
>> > > > > Thank you,
>> > > > >
>> > > > > ______________________________________________
>> > > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > > > and provide commented, minimal, self-contained, reproducible code.