Hi Bert,
Below the information you asked me for:
nrow(mydataset)
[1] 2986276
########
sapply(mydataset, "class")
$`Transit Date`
[1] "POSIXct" "POSIXt"
$`Market Segment`
[1] "character"
$`N?mero de Tr?nsitos`
[1] "numeric"
$`Tar No`
[1] "character"
$`Beam Range (Operations)`
[1] "character"
$`Operational Vessel Ranges Group`
[1] "character"
$`Rcnst PCUMS`
[1] "numeric"
$`Toll Amount`
[1] "numeric"
$Beam
[1] "numeric"
$Length
[1] "numeric"
$`Trn Draft (FT)`
[1] "numeric"
$`Other Income Amt`
[1] "numeric"
$`Total Other Income Amount`
[1] "logical"
$`Booking Charges`
[1] "numeric"
$`Booking Cancellation`
[1] "logical"
$`Booking Auction`
[1] "logical"
$`_file`
[1] "integer"
Hope this helps you understand what I am dealling with.
Cheers,
Paul
El mar, 30 ene 2024 a las 14:19, Bert Gunter (<bgunter.4567 at gmail.com>)
escribi?:
> Incidentally, "didn't work" is not very useful information.
Please tell us
> exactly what error message or apparently aberrant result you received.
> Also, what do you get from:
>
> sapply(your_dataframe, "class")
> nrow(your_dataframe)
>
> (as I suspect what you think it is, isn't).
>
> Cheers,
> Bert
>
> On Tue, Jan 30, 2024 at 11:01?AM Bert Gunter <bgunter.4567 at
gmail.com>
> wrote:
>
>> "I cannot change the data type from
>> boolean to numeric. I tried doing dataset$my_field >>
as.numeric(dataset$my_field), I also tried to do dataset <-
>> dataset[complete.cases(dataset), ], didn't work either. "
>>
>> Sorry, but all I can say is: huh?
>>
>> > dt <- data.frame(a = c(NA,NA, FALSE, TRUE), b = 1:4)
>> > dt
>> a b
>> 1 NA 1
>> 2 NA 2
>> 3 FALSE 3
>> 4 TRUE 4
>> > sapply(dt, class)
>> a b
>> "logical" "integer"
>> > dt$a <- as.numeric(dt$a)
>> > dt
>> a b
>> 1 NA 1
>> 2 NA 2
>> 3 0 3
>> 4 1 4
>> > sapply(dt, class)
>> a b
>> "numeric" "integer"
>>
>> So either I'm missing something or you are. Happy to be corrected
and
>> chastised if the former.
>>
>> Cheers,
>> Bert
>>
>>
>> On Tue, Jan 30, 2024 at 10:41?AM Paul Bernal <paulbernal07 at
gmail.com>
>> wrote:
>>
>>> Dear friend Duncan,
>>>
>>> Thank you so much for your kind reply. Yes, that is exactly what is
>>> happening, there are a lot of NA values at the start, so R assumes
that
>>> the
>>> field is of type boolean. The challenge that I am facing is that I
want
>>> to
>>> read into R an Excel file that has many sheets (46 in this case)
but I
>>> wanted to combine all 46 sheets into a single dataframe (since the
>>> columns
>>> are exactly the same for all 46 sheets). The rio package does this
>>> nicely,
>>> the problem is that, once I have the full dataframe (which amounts
to
>>> roughly 2.98 million rows total), I cannot change the data type
from
>>> boolean to numeric. I tried doing dataset$my_field >>>
as.numeric(dataset$my_field), I also tried to do dataset <-
>>> dataset[complete.cases(dataset), ], didn't work either.
>>>
>>> The only thing that worked for me was to take a single sheed and
through
>>> the read_excel function use the guess_max parameter and set it to a
>>> sufficiently large number (a number >= to the total amount of
the full
>>> merged dataset). I want to automate the merging of the N number of
Excel
>>> sheets so that I don't have to be manually doing it. Unless
there is a
>>> way
>>> to accomplish something similar to what rio's package function
>>> import_list
>>> does, that is able to keep the field's numeric data type
nature.
>>>
>>> Cheers,
>>> Paul
>>>
>>> El mar, 30 ene 2024 a las 12:23, Duncan Murdoch (<
>>> murdoch.duncan at gmail.com>)
>>> escribi?:
>>>
>>> > On 30/01/2024 11:10 a.m., Paul Bernal wrote:
>>> > > Dear friends,
>>> > >
>>> > > Hope you are doing well. I am currently using R version
4.3.2, and I
>>> > have a
>>> > > .xlsx file that has 46 sheets on it. I basically combined
all 46
>>> sheets
>>> > > and read them as a single dataframe in R using package
rio.
>>> > >
>>> > > I read a solution using package readlx, as suggested in a
>>> StackOverflow
>>> > > discussion as follows:
>>> > > df <- read_excel(path = filepath, sheet = sheet_name,
guess_max >>> > 100000).
>>> > > Now, when you have so many sheets (46 in my case) in an
Excel file,
>>> the
>>> > rio
>>> > > methodology is more practical.
>>> > >
>>> > > This is what I did:
>>> > > path >>> > >
>>> >
>>>
"C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
>>> > > (4).xlsx"
>>> > > figidat = import_list(path, rbind = TRUE) #here figidat
refers to my
>>> > dataset
>>> > >
>>> > > Now, it successfully imports and merges all records,
however, some
>>> fields
>>> > > (despite being numeric), R interprets as a boolean field.
>>> > >
>>> > > Here is the structure of the field that is causing me
problems (I
>>> > apologize
>>> > > for the length):
>>> > > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA,
>>> > ...
>>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
NA, NA,
>>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE,
>>> > > FALSE, FALSE, FALSE)), class = c("tbl_df",
"tbl", "data.frame"
>>> > > ), row.names = c(NA, -7033L))
>>> > >
>>> > > As you can see, when I do the dput, it gives me a bunch
of TRUE and
>>> FALSE
>>> > > values, when in reality I have records with value $0,
records with
>>> > amounts
>>> > >> $0 and also a bunch of blank records.
>>> > >
>>> > > Any help will be greatly appreciated.
>>> >
>>> > I don't know how read_excel() determines column types, but
some
>>> > functions look only at the first n rows to guess the type. It
appears
>>> > you have a lot of NA values at the start. That is a logical
value, so
>>> > that might be what is going wrong.
>>> >
>>> > In read.table() and related functions, you can specify the
types of
>>> > column explicitly. It sounds as though that's what you
should do if
>>> > read_excel() offers that as a possibility.
>>> >
>>> > Duncan Murdoch
>>> >
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
[[alternative HTML version deleted]]