thr3ads.net - R help - [R] how to manage missing values correctly when importing a data frame [Sep 2016]

Stefano Sofia

2016-Sep-07 14:26 UTC

[R] how to manage missing values correctly when importing a data frame

Thank you for your explanations, and your patience.
With all the humility I can muster, I am not a beginner in R. That said, I
am really sorry if my question shows a serious lack of understanding of some
basic object types and their distinctions.

I still find it difficult to understand your comments (which are obviously
correct), and I beg your pardon if I keep asking you the same question.
In my query to the data frame, Station_RT is exactly 112, and there is only one
row where Station_RT is equal to 112. I would expect a unique value for
Test_20151231.
Why should R expect to handle the possibility of having Station_RT = NA?

# > Storia_RM_RT$Test_20151231[Storia_RM_RT$Station_RT == 112]
# What do you expect to have happen when Station_RT is NA? R has no idea
# whether it is 112 or not, so R returns an "I don't know" value that
# lets the user decide how to handle the missing data, rather than making
# assumptions.

Again, sorry for my question
Stefano

________________________________________
From: Sarah Goslee [sarah.goslee at gmail.com]
Sent: Wednesday, 7 September 2016 15:11
To: Stefano Sofia
Cc: r-help at r-project.org
Subject: Re: [R] how to manage missing values correctly when importing a data
frame

R is refusing to make unwarranted assumptions about your data.

See inline.

# it's nicer to use dput() instead of pasting raw data

Storia_RM_RT <- structure(list(
    Station_RM = c(1400L, 1460L, 1500L, 1520L), Sensor_RM = 2701:2704,
    Place_RM = c("Novafeltria", "Carpegna", "Pesaro", "Fano"),
    Y_init_RM = c(1959L, 1963L, 1957L, 1957L),
    M_init_RM = c(1L, 1L, 1L, 1L), D_init_RM = c(1L, 1L, 1L, 1L),
    Long_cent_RM = c(12.289552, 12.332614, 12.909822, 13.017591),
    Lat_cent_RM = c(43.890057, 43.778107, 43.910889, 43.840054),
    Height_RM = c(293L, 748L, 11L, 4L),
    Continues = c("NO", "SI", "SI", "SI"),
    Station_RT = c(NA, 702L, 112L, 152L),
    Sensor_RT = c(NA, 2954L, 1229L, 2671L),
    Place_RT = c(NA, "Carpegna", "Pesaro", "Fano"),
    Name1_RT = c(NA, "Carpegna", "Villa_Fastiggi", "Foce_Metauro"),
    Name2_RT = c(NA, "Carpegna", "Villa_Fastiggi", "Metaurilia"),
    Long_cent_RT = c(NA, 12.340618, 12.86939, 13.053796),
    Lat_cent_RT = c(NA, 43.780575, 43.89061, 43.826328),
    Height_RT = c(NA, 715, 22, 7.12),
    Actual_net = c("CAE", "RT", "RT", "RT"), Notes = c(NA, NA, NA, NA),
    Test_20141231 = c("NO", "NO", "YES", "YES"),
    Test_20151231 = c("NO", "NO", "YES", "YES")),
    .Names = c("Station_RM", "Sensor_RM", "Place_RM",
    "Y_init_RM", "M_init_RM", "D_init_RM", "Long_cent_RM", "Lat_cent_RM",
    "Height_RM", "Continues", "Station_RT", "Sensor_RT", "Place_RT",
    "Name1_RT", "Name2_RT", "Long_cent_RT", "Lat_cent_RT", "Height_RT",
    "Actual_net", "Notes", "Test_20141231", "Test_20151231"),
    class = "data.frame", row.names = c(NA, -4L))

> Storia_RM_RT$Test_20151231[Storia_RM_RT$Station_RM == 1500]
[1] "YES"

# Storia_RM_RT$Omogenea_20151231[Storia_RM_RT$Station_RT == 112]
# there's no such column; you probably mean Test_20151231
> Storia_RM_RT$Test_20151231[Storia_RM_RT$Station_RT == 112]
[1] NA    "YES"

# What do you expect to have happen when Station_RT is NA? R has no idea
# whether it is 112 or not, so R returns an "I don't know" value that
# lets the user decide how to handle the missing data, rather than making
# assumptions.
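
The behaviour described above can be reproduced on two short vectors. This is only a minimal sketch: the names station, test and idx are made up for illustration and hold the same values as the Station_RT and Test_20151231 columns.

```r
# Comparing a missing value with anything yields NA, not FALSE
NA == 112

station <- c(NA, 702L, 112L, 152L)
test    <- c("NO", "NO", "YES", "YES")

# The comparison is NA wherever station is NA
idx <- station == 112
idx          # NA FALSE  TRUE FALSE

# An NA in a logical index produces an NA element in the result
test[idx]    # NA "YES"
```

So the extra NA in the output does not come from the row where Station_RT is 112; it comes from the row where R cannot tell whether the condition holds.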

# But you probably want one of these constructions:

Storia_RM_RT$Test_20151231[Storia_RM_RT$Station_RT == 112 &
!is.na(Storia_RM_RT$Station_RT)]

# subset() treats an NA condition as FALSE and drops those rows, which is
# the assumption I'm assuming you want.
subset(Storia_RM_RT, Station_RT == 112)$Test_20151231

# This is the first form, somewhat more elegantly
with(Storia_RM_RT, Test_20151231[Station_RT == 112 & !is.na(Station_RT)])
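
Two further base R idioms give the same result as the constructions above. They are not part of the original reply and are shown only as a sketch on illustrative vectors (station and test are made-up names holding the same values as the two columns).

```r
station <- c(NA, 702L, 112L, 152L)
test    <- c("NO", "NO", "YES", "YES")

# Explicit form from the reply: exclude NA in the condition itself
test[station == 112 & !is.na(station)]   # "YES"

# which() keeps only the positions where the comparison is TRUE
test[which(station == 112)]              # "YES"

# %in% is based on match() and returns FALSE, never NA, for a missing value
test[station %in% 112]                   # "YES"
```

Each of these makes the same decision as subset(): a row with a missing Station_RT is treated as not matching.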

On Wed, Sep 7, 2016 at 7:09 AM, Stefano Sofia
<stefano.sofia at regione.marche.it> wrote:
> Dear R users,
> I have a data frame with 22 columns, called Storia_RM_RT. Here are the first 4 rows:
>
> Station_RM Sensor_RM Place_RM Y_init_RM M_init_RM D_init_RM Long_cent_RM Lat_cent_RM Height_RM Continues Station_RT Sensor_RT Place_RT Name1_RT Name2_RT Long_cent_RT Lat_cent_RT Height_RT Actual_net Notes Test_20141231 Test_20151231
> 1400 2701 Novafeltria 1959 1 1 12.289552 43.890057 293 NO NA NA NA NA NA NA NA NA CAE NA NO NO
> 1460 2702 Carpegna 1963 1 1 12.332614 43.778107 748 SI 702 2954 Carpegna Carpegna Carpegna 12.340618 43.780575 715 RT NA NO NO
> 1500 2703 Pesaro 1957 1 1 12.909822 43.910889 11 SI 112 1229 Pesaro Villa_Fastiggi Villa_Fastiggi 12.86939 43.890610 22 RT NA YES YES
> 1520 2704 Fano 1957 1 1 13.017591 43.840054 4 SI 152 2671 Fano Foce_Metauro Metaurilia 13.053796 43.826328 7.12 RT NA YES YES
>
> I load it with
> Storia_RM_RT <- read.table(file="Storia_RM_RT.txt", header = TRUE, sep=" ", dec = ".", stringsAsFactors = FALSE)
>
> print(Storia_RM_RT$Test_20151231[Storia_RM_RT$Station_RM == 1500]) gives
> [1] "YES"
>
> while
> print(Storia_RM_RT$Omogenea_20151231[Storia_RM_RT$Station_RT == 112]) gives
> [1] NA   "YES"
>
>
> print(lapply(Storia_RM_RT, class)) gives
>
> $Station_RM
> [1] "integer"
>
> $Sensor_RM
> [1] "integer"
>
> $Place_RM
> [1] "character"
>
> $Y_init_RM
> [1] "integer"
>
> $M_init_RM
> [1] "integer"
>
> $D_init_RM
> [1] "integer"
>
> $Long_cent_RM
> [1] "numeric"
>
> $Lat_cent_RM
> [1] "numeric"
>
> $Height_RM
> [1] "integer"
>
> $Continues
> [1] "character"
>
> $Station_RT
> [1] "integer"
>
> $Sensor_RT
> [1] "integer"
>
> $Place_RT
> [1] "character"
>
> $Name1_RT
> [1] "character"
>
> $Name2_RT
> [1] "character"
>
> $Long_cent_RT
> [1] "numeric"
>
> $Lat_cent_RT
> [1] "numeric"
>
> $Quota_RT
> [1] "numeric"
>
> $Actual_net
> [1] "character"
>
> $Notes
> [1] "logical"
>
> $Test_20141231
> [1] "character"
>
> $Test_20151231
> [1] "character"
>
> I am struggling to understand why the query through the field Station_RT
> does not work.
> Could somebody please help me manage the missing values correctly? Is
> the mistake somewhere else?
>
> Thank you
> Stefano Sofia
>
>
--
Sarah Goslee
http://www.functionaldiversity.org

Sarah Goslee

2016-Sep-07 14:39 UTC

On Wed, Sep 7, 2016 at 10:26 AM, Stefano Sofia
<stefano.sofia at regione.marche.it> wrote:
> Thank you for your explanations, and your patience.
> With all the humility I can muster, I am not a beginner in R. That said,
> I am really sorry if my question shows a serious lack of understanding of
> some basic object types and their distinctions.
>
> I still find it difficult to understand your comments (which are obviously
> correct), and I beg your pardon if I keep asking you the same question.
> In my query to the data frame, Station_RT is exactly 112, and there is only
> one row where Station_RT is equal to 112. I would expect a unique value for
> Test_20151231.
> Why should R expect to handle the possibility of having Station_RT = NA?

If a value for Station_RT is missing, how does R know whether it is
112 or not? It could be. Instead of assuming that it is not, R tells
the user that there is a potential problem, and it's on the user to
decide explicitly whether NA values should be included or not.

If you read further down, I showed you two ways to handle that, one
that makes the same assumption you do, that NA values cannot ever be
112, and one that requires you to explicitly state that you want NA
values to be ignored.
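
Since the thread started from the import step, one way to make that decision explicit is to count the missing values right after reading the file. The sketch below is an illustration under assumptions: it reads a three-column excerpt of the data from an inline string (txt and d are made-up names) instead of the real Storia_RM_RT.txt, and it spells out na.strings = "NA", which is already the default of read.table().

```r
# Inline excerpt standing in for the real file (assumption for illustration)
txt <- "Station_RM Station_RT Test_20151231
1400 NA NO
1460 702 NO
1500 112 YES
1520 152 YES"

d <- read.table(text = txt, header = TRUE, sep = " ",
                na.strings = "NA", stringsAsFactors = FALSE)

# Number of missing values per column, to see where NA can affect a query
colSums(is.na(d))

# Query that states explicitly that rows with a missing Station_RT are excluded
d[!is.na(d$Station_RT) & d$Station_RT == 112, "Test_20151231"]   # "YES"
```

Note that the string "NO" in the Test_20151231 column is read as ordinary text; only the exact string "NA" is converted to a missing value.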


Ivan Calandra

2016-Sep-07 14:56 UTC

Hi Stefano,

I agree that this behavior of R can be somewhat counter-intuitive, but 
this can be seen as a safety procedure, so that no assumptions are made 
and problems can be easily identified.

I would think that in this case, the input data is in the wrong format. 
Half the columns are for RM and the other for RT, but the headers are 
exactly the same. The problem then happens because you actually have 
only 3 lines of data for station RT but 4 for station RM. So it is 
filled with NA.

IMHO, it would be better to add a column "station" with values being
either RM or RT. In that case, you would not have whole NA lines. And
you would have fewer columns to work with. See what I mean?
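
One possible reading of this suggestion, sketched on a reduced version of the data: stack the RM and RT columns into a single set of columns plus a "station" column. The names wide, rm_part, rt_part, long, code and place are made up for illustration, and only two of the columns are carried over.

```r
# Reduced wide table: one row per RM station, RT columns may be all NA
wide <- data.frame(
  Station_RM = c(1400L, 1460L, 1500L, 1520L),
  Place_RM   = c("Novafeltria", "Carpegna", "Pesaro", "Fano"),
  Station_RT = c(NA, 702L, 112L, 152L),
  Place_RT   = c(NA, "Carpegna", "Pesaro", "Fano"),
  stringsAsFactors = FALSE
)

# One block per network, with identical column names
rm_part <- data.frame(station = "RM", code = wide$Station_RM,
                      place = wide$Place_RM, stringsAsFactors = FALSE)
rt_part <- data.frame(station = "RT", code = wide$Station_RT,
                      place = wide$Place_RT, stringsAsFactors = FALSE)

# Stack the blocks and drop the RT row that has no station code
long <- rbind(rm_part, rt_part)
long <- long[!is.na(long$code), ]

# The query no longer meets an NA in the code column
long[long$station == "RT" & long$code == 112, "place"]   # "Pesaro"
```

A layout like this loses the pairing between an RM station and its RT counterpart unless a shared key column is kept, so whether it fits depends on how the table is used.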

By the way, I like the matrix method for subsetting a data.frame, I find 
it easier and more flexible (maybe someone will tell if there are any 
drawbacks):
Storia_RM_RT[Storia_RM_RT$Station_RT==112, "Test_20151231"]
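
On the question of drawbacks: with a logical row index that contains NA, the matrix-style form behaves like the vector form and still returns an NA entry. A minimal sketch on a two-column excerpt (d is a made-up name):

```r
d <- data.frame(Station_RT = c(NA, 702L, 112L, 152L),
                Test_20151231 = c("NO", "NO", "YES", "YES"),
                stringsAsFactors = FALSE)

# The NA in the row index selects an all-NA row
d[d$Station_RT == 112, "Test_20151231"]          # NA "YES"

# Wrapping the condition in which() keeps only the TRUE positions
d[which(d$Station_RT == 112), "Test_20151231"]   # "YES"
```

Selecting a single column this way also returns a plain vector rather than a data frame, unless drop = FALSE is given.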

HTH,
Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calandra at univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

