thr3ads.net - R help - [R] combining data.frames with is.na & match (), two questions [Apr 2019]

Drake Gossi

2019-Apr-17 23:24 UTC

[R] combining data.frames with is.na & match (), two questions

Hello everyone,

I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
and I'm just having trouble understanding this maneuver.

In sum, I'm trying to combine data in two different data.frames.

This data.frame is called fruitNutr:

   Fruit Calories
1 banana      100
2   pear      100
3  mango      200

And this data.frame is called fruitData:

   Fruit  Color  Shape Juice
1  apple    red  round   1.0
2 banana yellow oblong   0.0
3   pear  green   pear   0.5
4 orange orange  round   1.0
5   kiwi  green  round   0.0

As you can see, these two data.frames overlap insofar as they both have
banana and pear. Next, the book suggests this:

fruitData$Calories <- NA

As a result, I've created a new column in the fruitData data.frame:

   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0       NA
3   pear  green   pear   0.5       NA
4 orange orange  round   1.0       NA
5   kiwi  green  round   0.0       NA

Then:

> index <- match(x = fruitData$Fruit, table = fruitNutr$Fruit)
> index
[1] NA  1  2 NA NA
> is.na(index)
[1]  TRUE FALSE FALSE  TRUE  TRUE
> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> fruitData
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   pear  green   pear   0.5      100
4 orange orange  round   1.0       NA
5   kiwi  green  round   0.0       NA
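For anyone who wants to run the book's steps end to end, here is a self-contained sketch that rebuilds both data frames from the tables above and repeats the three steps. The intermediate name `found` is an illustrative addition, and `stringsAsFactors = FALSE` is an assumption so the example behaves the same in any R version.

```r
# Values retyped from the two tables in the post; stringsAsFactors = FALSE
# keeps Fruit as plain text in both older and newer versions of R.
fruitNutr <- data.frame(
  Fruit    = c("banana", "pear", "mango"),
  Calories = c(100, 100, 200),
  stringsAsFactors = FALSE
)
fruitData <- data.frame(
  Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
  Color = c("red", "yellow", "green", "orange", "green"),
  Shape = c("round", "oblong", "pear", "round", "round"),
  Juice = c(1, 0, 0.5, 1, 0),
  stringsAsFactors = FALSE
)

# Step 1: start with an all-NA Calories column (capital C, as used below).
fruitData$Calories <- NA

# Step 2: for each fruit in fruitData, its row number in fruitNutr,
# or NA if fruitNutr has no such fruit.
index <- match(fruitData$Fruit, fruitNutr$Fruit)

# Step 3: copy calories only into the rows that found a match.
found <- !is.na(index)
fruitData$Calories[found] <- fruitNutr$Calories[index[found]]

print(fruitData)
```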

I get what the first part means, that first part being this:
fruitData$Calories[!is.na(index)]
go into the fruitData data.frame, specifically into the Calories column,
and only for the rows where !is.na(index) is TRUE. But I just literally
can't understand this last part: fruitNutr$Calories[index[!is.na(index)]]

Two questions.

   1. I just literally don't understand how this code works. It does work,
      of course, but I don't know what it's doing, specifically the
      [index[!is.na(index)]] part. Could someone explain it to me like I'm
      five? I'm new at this...
   2. And then: is there any other way to combine these two data.frames so
      that we get this same result? Maybe an easier-to-understand method?

That same result, again, is:

   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   pear  green   pear   0.5      100
4 orange orange  round   1.0       NA
5   kiwi  green  round   0.0       NA


Drake


Michael Dewey

2019-Apr-18 08:04 UTC

Dear Drake

See in-line comments

On 18/04/2019 00:24, Drake Gossi wrote:
>     1. I just literally don't understand how this code works. It does work,
>     of course, but I don't know what it's doing, specifically this
>     [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
>     I'm new at this...
Decompose it from the inside out. So

!is.na(index)

gives you a vector the same length as index, which is TRUE where index has a
value and FALSE where it is NA.

index[ something ]

gives you a vector of all the values of index for which something is TRUE
(in this case). Note that this vector will be shorter than something if
something contains any FALSE.
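To make the inside-out reading concrete, here is a sketch that runs each layer separately on the values from Drake's transcript (index and the calorie values are typed in by hand; `keep`, `matched_rows` and `looked_up` are illustrative names, not from the book). The outermost layer is the last step: the surviving values of index are row positions in fruitNutr, so fruitNutr$Calories[...] picks out the calories stored in those rows.

```r
index    <- c(NA, 1L, 2L, NA, NA)   # what match() returned in the post
calories <- c(100, 100, 200)        # stands in for fruitNutr$Calories

# Layer 1: which rows of fruitData found a match?
keep <- !is.na(index)
print(keep)                         # -> FALSE TRUE TRUE FALSE FALSE

# Layer 2: drop the NAs; what remains are row numbers in fruitNutr.
matched_rows <- index[keep]
print(matched_rows)                 # -> 1 2

# Layer 3: use those row numbers to look up calories (banana, pear).
looked_up <- calories[matched_rows]
print(looked_up)                    # -> 100 100

# Two TRUEs on the left-hand side, two values on the right, so the
# assignment fruitData$Calories[keep] <- looked_up lines up one-to-one.
stopifnot(sum(keep) == length(looked_up))
```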

That should help you get started. My personal opinion is that these things
are much clearer when done in separate stages:

keep <- !is.na(index)
index[keep]

and check the value of keep if it seems to have gone wrong.
>     that we get this same result? maybe an easier to understand method?
> 
> That same result, again, is
> 
> Fruit Color Shape Juice Calories
> 1 apple red round 1            N/A
> 2 banana yellow oblong 0 100
> 3 pear green pear 0.5 100
> 4 orange orange round 1            N/A
> 5 kiwi green round 0            N/A
> 
> 
> Drake
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> ---
> This email has been checked for viruses by AVG.
> https://www.avg.com
> 
> 
-- 
Michael
http://www.dewey.myzen.co.uk/home.html

peter dalgaard

2019-Apr-18 08:29 UTC

The whole thing is a merge operation, i.e.
> FruitNutr <- read.table(text="
+ Fruit  Calories
+ 1 banana 100
+ 2 pear 100
+ 3 mango 200
+ ")
> FruitData <- read.table(text="
+ Fruit Color Shape Juice
+ 1 apple red round 1
+ 2 banana yellow oblong 0
+ 3 pear green pear 0.5
+ 4 orange orange round 1
+ 5 kiwi green round 0
+ ")
> merge(FruitData, FruitNutr)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
> merge(FruitData, FruitNutr, all.x=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100

Mind you, merge() comes with its own set of confusing options in the more
complex cases, which may be why the authors have chosen a more elementary
approach.
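A sketch of the left join above with one extra step: merge() sorts its result by the key column, so if the original row order of fruitData matters, match() can restore it. The data frames are retyped from the thread with `stringsAsFactors = FALSE` (an assumption so the example behaves the same in any R version), and `by = "Fruit"` is spelled out rather than relying on merge()'s default of joining on the shared column names.

```r
fruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200),
                        stringsAsFactors = FALSE)
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Color = c("red", "yellow", "green", "orange", "green"),
                        Shape = c("round", "oblong", "pear", "round", "round"),
                        Juice = c(1, 0, 0.5, 1, 0),
                        stringsAsFactors = FALSE)

# Left join on Fruit: keep every row of fruitData, add Calories where found.
merged <- merge(fruitData, fruitNutr, by = "Fruit", all.x = TRUE)

# merge() sorted the rows by Fruit; match() gives each merged row's position
# in the original fruitData, and order() puts the rows back in that order.
merged <- merged[order(match(merged$Fruit, fruitData$Fruit)), ]
rownames(merged) <- NULL

print(merged)
```

The result has the same rows, in the same order, as the book's match()/is.na() version.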

-pd
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

PIKAL Petr

2019-Apr-18 08:31 UTC

Hi

I wonder why your textbook makes such a combination so complicated.

Given data frames fr1 and fr2:
> dput(fr1)
structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
"mango", "pear"), class = "factor"), Calories = c(100L, 100L,
200L)), class = "data.frame", row.names = c("1", "2", "3"))
> dput(fr2)
structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
"banana", "kiwi", "orange", "pear"), class = "factor"), Color = structure(c(3L,
4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label = c("oblong",
"pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
0)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))
> fr1
   Fruit Calories
1 banana      100
2   pear      100
3  mango      200
you can use merge() to combine the two data frames and get either all rows
from both:

> merge(fr2, fr1, all = TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
6  mango   <NA>   <NA>    NA      200

only the rows from the data frame with calories:

> merge(fr2, fr1, all.y = TRUE)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
3  mango   <NA>   <NA>    NA      200

or only the rows from the data frame with colours:

> merge(fr2, fr1, all.x = TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
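A compact sketch contrasting the merge() variants above, plus the default inner join, on retyped copies of fr1 and fr2 trimmed to a few columns. The keys are built as character rather than factor (an assumption for portability across R versions); with character keys the sorted all = TRUE result may list mango alphabetically rather than last as in the factor-based output above.

```r
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100, 100, 200), stringsAsFactors = FALSE)
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Juice = c(1, 0, 0.5, 1, 0), stringsAsFactors = FALSE)

inner <- merge(fr2, fr1, by = "Fruit")                # fruits in both only
full  <- merge(fr2, fr1, by = "Fruit", all = TRUE)    # every fruit in either
right <- merge(fr2, fr1, by = "Fruit", all.y = TRUE)  # every fruit in fr1
left  <- merge(fr2, fr1, by = "Fruit", all.x = TRUE)  # every fruit in fr2

# Row counts make the difference between the four joins easy to see.
print(sapply(list(inner = inner, full = full, right = right, left = left), nrow))
```

Columns coming from the "other" data frame are filled with NA wherever a fruit has no partner, e.g. Juice for mango in `full`.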

Cheers
Petr

Personal data: Information about processing and protection of business partner's personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/
Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/

Eric Berger

2019-Apr-18 08:53 UTC

Hi Drake,
Petr's suggestion to use the merge() function is good.
Another (possibly overkill) approach is to use functions from the dplyr
package, which is a fantastic package to get familiar with.
For example, the last alternative that Petr suggests is an example of what
is called a "left join" (meaning, when joining structures x and y, keep
all the x rows, even if there is no corresponding row for y).
You can do this via dplyr as follows:

dplyr::left_join(fr2, fr1, by = "Fruit")
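For comparison with the dplyr call, a hedged base-R sketch of the same left join built from the book's match() idea. `left_join_match()` is a hypothetical helper written for illustration, not part of base R or dplyr; it assumes a single key column, unique keys in y, and no other column names shared between x and y. Indexing a vector with an NA position returns NA, so unmatched rows come out as NA without a separate !is.na() step.

```r
# Hypothetical helper: left join of x and y on one key column via match().
# Assumes keys in y are unique (match() returns only the first hit).
left_join_match <- function(x, y, by) {
  idx   <- match(x[[by]], y[[by]])   # row of y for each row of x, or NA
  extra <- setdiff(names(y), by)     # columns to bring over from y
  for (col in extra) {
    x[[col]] <- y[[col]][idx]        # an NA position in idx yields NA
  }
  x
}

fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100, 100, 200), stringsAsFactors = FALSE)
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Juice = c(1, 0, 0.5, 1, 0), stringsAsFactors = FALSE)

res <- left_join_match(fr2, fr1, by = "Fruit")
print(res)
```

Unlike merge(fr2, fr1, all.x = TRUE), this keeps the rows in fr2's original order.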

HTH,
Eric



PIKAL Petr

2019-Apr-23 06:59 UTC

Hi

Please keep replies on r-help as well; others could give you different or
better solutions.

Regarding ordering, see ?order or ?sort. However, ordering is usually
necessary only for plotting or exporting data.
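A minimal sketch of the ?order suggestion, aimed at the request in Drake's message below (200 on top, the two 100s together, NA at the bottom). The data are retyped from the merge(fr2, fr1, all=T) output earlier in the thread, keeping only two columns; `all_fruit` and `by_cal` are illustrative names.

```r
all_fruit <- data.frame(
  Fruit    = c("apple", "banana", "kiwi", "orange", "pear", "mango"),
  Calories = c(NA, 100, NA, NA, 100, 200),
  stringsAsFactors = FALSE
)

# Highest calories first; order() puts NA values last by default
# (na.last = TRUE), so the unknown fruits end up at the bottom.
by_cal <- all_fruit[order(all_fruit$Calories, decreasing = TRUE), ]
print(by_cal)
```

Passing na.last = FALSE to order() would move the NA rows to the top instead.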

Cheers
Petr

From: Drake Gossi <drake.gossi at gmail.com>
Sent: Thursday, April 18, 2019 9:27 PM
To: PIKAL Petr <petr.pikal at precheza.cz>
Subject: Re: [R] combining data.frames with is.na & match (), two questions

Thanks Pikal,

Your answer was super helpful. I just learned a lot from you. The only thing I
have to figure out now is how to rearrange the numbers, say, so that 200 is on
top, and NA is on bottom, or so that the two 100 calories are together.
Something like that. Perhaps I'll try an ascending/descending function.

Thank you again.

D
