Dear friend Duncan,
Thank you so much for your kind reply. Yes, that is exactly what is
happening, there are a lot of NA values at the start, so R assumes that the
field is of type boolean. The challenge that I am facing is that I want to
read into R an Excel file that has many sheets (46 in this case) but I
wanted to combine all 46 sheets into a single dataframe (since the columns
are exactly the same for all 46 sheets). The rio package does this nicely,
the problem is that, once I have the full dataframe (which amounts to
roughly 2.98 million rows total), I cannot change the data type from
boolean to numeric. I tried doing dataset$my_field as.numeric(dataset$my_field),
I also tried to do dataset <-
dataset[complete.cases(dataset), ], didn't work either.
The only thing that worked for me was to take a single sheed and through
the read_excel function use the guess_max parameter and set it to a
sufficiently large number (a number >= to the total amount of the full
merged dataset). I want to automate the merging of the N number of Excel
sheets so that I don't have to be manually doing it. Unless there is a way
to accomplish something similar to what rio's package function import_list
does, that is able to keep the field's numeric data type nature.
Cheers,
Paul
El mar, 30 ene 2024 a las 12:23, Duncan Murdoch (<murdoch.duncan at
gmail.com>)
escribi?:
> On 30/01/2024 11:10 a.m., Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing well. I am currently using R version 4.3.2, and I
> have a
> > .xlsx file that has 46 sheets on it. I basically combined all 46
sheets
> > and read them as a single dataframe in R using package rio.
> >
> > I read a solution using package readlx, as suggested in a
StackOverflow
> > discussion as follows:
> > df <- read_excel(path = filepath, sheet = sheet_name, guess_max
> 100000).
> > Now, when you have so many sheets (46 in my case) in an Excel file,
the
> rio
> > methodology is more practical.
> >
> > This is what I did:
> > path > >
>
"C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
> > (4).xlsx"
> > figidat = import_list(path, rbind = TRUE) #here figidat refers to my
> dataset
> >
> > Now, it successfully imports and merges all records, however, some
fields
> > (despite being numeric), R interprets as a boolean field.
> >
> > Here is the structure of the field that is causing me problems (I
> apologize
> > for the length):
> > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> ...
> > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
> > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl",
"data.frame"
> > ), row.names = c(NA, -7033L))
> >
> > As you can see, when I do the dput, it gives me a bunch of TRUE and
FALSE
> > values, when in reality I have records with value $0, records with
> amounts
> >> $0 and also a bunch of blank records.
> >
> > Any help will be greatly appreciated.
>
> I don't know how read_excel() determines column types, but some
> functions look only at the first n rows to guess the type. It appears
> you have a lot of NA values at the start. That is a logical value, so
> that might be what is going wrong.
>
> In read.table() and related functions, you can specify the types of
> column explicitly. It sounds as though that's what you should do if
> read_excel() offers that as a possibility.
>
> Duncan Murdoch
>
[[alternative HTML version deleted]]