thr3ads.net - R help - [R] Unable to read csv files with comma in values [Apr 2019]

Amit Govil

2019-Apr-06 14:03 UTC

[R] Unable to read csv files with comma in values

Hi,

I have a bunch of csv files to read in R. I'm unable to read them correctly
because in some of the files, there is a column ("Role") which has commas in
the values.

Sample data:

User, Role, Rule, GAPId
Sam, [HadoopAnalyst, DBA, Developer], R46443

I'm trying to play with the below code but it doesn't work:

files <- list.files(pattern='.*REDUNDANT(.*).csv$')

tbl <- sapply(files, function(f) {
  gsub('\\[|\\]', '"', readLines(f)) %>%
    read.csv(text = ., check.names = FALSE)
}) %>%
  bind_rows(.id = "id") %>%
  select(id, User, Rule) %>%
  distinct()

Please assist.

Thanks

Duncan Murdoch

2019-Apr-07 15:56 UTC

On 06/04/2019 10:03 a.m., Amit Govil wrote:
> Hi,
>
> I have a bunch of csv files to read in R. I'm unable to read them correctly
> because in some of the files, there is a column ("Role") which has commas in
> the values.
>
> Sample data:
>
> User, Role, Rule, GAPId
> Sam, [HadoopAnalyst, DBA, Developer], R46443
>
> I'm trying to play with the below code but it doesn't work:

Since you didn't give a reproducible example, you should at least say 
what "doesn't work" means.

But here's some general advice:  if you want to debug code, don't write 
huge expressions like the chain of functions below, put things in 
temporary variables and make sure you get what you were expecting at 
each stage.

Instead of

> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>
> tbl <- sapply(files, function(f) {
>    gsub('\\[|\\]', '"', readLines(f)) %>%
>      read.csv(text = ., check.names = FALSE)
> }) %>%
>    bind_rows(.id = "id") %>%
>    select(id, User, Rule) %>%
>    distinct()

try

files <- list.files(pattern='.*REDUNDANT(.*).csv$')

tmp1 <- sapply(files, function(f) {
   gsub('\\[|\\]', '"', readLines(f)) %>%
     read.csv(text = ., check.names = FALSE)
})

tmp2 <- tmp1 %>% bind_rows(.id = "id")

tmp3 <- tmp2 %>% select(id, User, Rule)

tbl <- tmp3 %>% distinct()

(You don't need pipes here, but it will make it easier to put the giant 
expression back together at the end.)

Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went 
wrong.
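
One candidate worth checking at the tmp1 stage is sapply() itself: when every
file yields a data frame with the same number of columns, sapply() simplifies
the result into a list-matrix rather than a list of data frames, and lapply()
does not. The sketch below illustrates the difference with two made-up
in-memory "files". The names raw and read_one and the sample rows are
assumptions for illustration, and base R stands in for the dplyr calls. It is
not a diagnosis of the original failure, which the thread never describes.

```r
# Two made-up in-memory "files" standing in for the real CSVs.
raw <- list(
  a.csv = c("User,Rule", "Sam,R46443"),
  b.csv = c("User,Rule", "Jan,R101")
)

# Hypothetical helper: parse one file's lines into a data frame.
read_one <- function(lines) read.csv(text = lines, check.names = FALSE)

tmp_s <- sapply(raw, read_one)   # simplified because all results have equal length
tmp_l <- lapply(raw, read_one)   # never simplified: a plain list of data frames

str(tmp_s)   # a 2 x 2 list-matrix, not data frames
str(tmp_l)   # list of 2 data frames

# Base-R stand-in for bind_rows(.id = "id"): tag each data frame, then stack.
tagged <- Map(function(df, id) cbind(id = id, df), tmp_l, names(tmp_l))
tbl <- unique(do.call(rbind, tagged)[, c("id", "User", "Rule")])
```

Looking at str() of each intermediate, as suggested above, makes this kind of
shape mismatch visible before the later steps fail.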

Duncan Murdoch

Bert Gunter

2019-Apr-07 16:55 UTC

(Note: This follows an earlier mistaken reply just to Duncan)

Multiple "amens!" to Duncan's comments...

However:

Here is a start at my interpretation of how to do what you want. Note first
that the header line of your "example" listed 4 fields, but the data line
showed only 3. I modified your example for 3 text fields, only one of which
has brackets ([...]) in it, I assume. Here is a little example of how to use
regexes to replace the commas within the brackets by "-", which would
presumably then allow you to easily convert the text into a data frame, e.g.
using textConnection() and read.csv. Obviously, if this is not what you meant,
read no further.

## Example
txt <- c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ",
         "Jan, DBA, R101",
         "Mary, [Stats, Designer, R], t14")

wh <- grep("\\[.+\\]", txt)  ## which records need to be modified?

## extract the bracketed expressions, changing "," to "-"
fixup <- gsub(" *, *", "-", sub(".+(\\[.+\\]).+", "\\1", txt[wh]))

## Unfortunately, the "replacement" argument in sub() is not vectorized, so
## we need a loop over the positions in wh (fixup[i] pairs with txt[wh[i]]):

for (i in seq_along(wh)) txt[wh[i]] <- sub("\\[.+\\]", fixup[i], txt[wh[i]])
## replace original bracketed text with fixed up bracketed text

> txt
[1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
[2] "Jan, DBA, R101"
[3] "Mary, [Stats-Designer-R], t14"
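
The loop can also be avoided with base R's regmatches<-(), which replaces
matched substrings in place across the whole vector. This is a minimal sketch,
assuming each line has at most one bracketed group and no nested brackets. The
three column names are borrowed from the start of the original sample header,
and the name m is just illustrative.

```r
txt <- c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ",
         "Jan, DBA, R101",
         "Mary, [Stats, Designer, R], t14")

# Locate the first [...] group in each line (-1 where there is none).
m <- regexpr("\\[[^]]*\\]", txt)

# Rewrite only the matched substrings, turning "," into "-".
regmatches(txt, m) <- gsub(" *, *", "-", regmatches(txt, m))

# The remaining commas are now all field separators.
df <- read.csv(text = txt, header = FALSE, strip.white = TRUE,
               col.names = c("User", "Role", "Rule"))
df
```

Passing the lines through the text argument of read.csv() plays the same role
as the textConnection() route mentioned above.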


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

