thr3ads.net - R help - [R] Taking the Average of a subset of data [Feb 2019]


Isaac Barnhart

2019-Feb-15 15:06 UTC

[R] Taking the Average of a subset of data

Hello all, I have another question. I'm working with the following dataset:






plot    plant   leaf_number     sen_score       plot_lai        plant_lai       lai_score       leaf_num
104     5       1       90      104     1       82      1
104     5       2       90      104     1       167     2
104     5       3       95      104     1       248     3
104     5       4       100     104     1       343     4
104     6       1       95      104     1       377     5
104     6       2       85      104     1       372     6
104     6       3       90      104     1       335     7
104     6       4       90      104     1       221     8
105     5       1       90      104     1       162     9
105     5       2       95      104     2       145     1
105     5       3       100     104     2       235     2
105     5       4       100     104     2       310     3
105     6       1       70      104     2       393     4
105     6       2       80      104     2       455     5
105     6       3       90      104     2       472     6
105     6       4       80      104     2       445     7
106     5       1       100     104     2       330     8
106     5       2       90      104     2       292     9
106     5       3       100     105     1       64      1
106     5       4       100     105     1       139     2
106     5       10      0       105     1       211     3
106     6       1       100     105     1       296     4
106     6       2       30      105     1       348     5
106     6       3       100     105     1       392     6
106     6       4       40      105     1       405     7
108     5       1       100     105     1       379     8
108     5       2       100     105     1       278     9
108     5       3       100     105     2       64      1
108     5       4       100     105     2       209     2

(Note: the 'plant' and 'leaf' columns should be separated. '51' means plant 5, leaf 1.)


This table shows two datasets side by side: the left 4 columns are one
measurement (leaf senescence), and the right 4 columns are another (leaf area
index). I have a large number of plots and several plants, more than are listed
here.


I need to sort both datasets (senescence and leaf area index) so that each plot
has the same number of leaves.


This is hard because sometimes a plot has more leaves in the 'senescence'
dataset, and sometimes it has more in the 'leaf area index' dataset. Is there a
way to sort both datasets so that this requirement is met? As I said, there is
no way to tell in advance which dataset has fewer leaves for a given plot; it
can be either one.


Any help would be appreciated!


Isaac



Bert Gunter

2019-Feb-15 20:43 UTC


Read the posting guide, please, paying particular attention to how to
provide reproducible data, e.g. via ?dput. You are much more likely to get
useful help if you do what it recommends and provide data for people to
work with.
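
As a minimal sketch of what the posting guide asks for, the snippet below runs dput() on a small data frame and checks that the printed text rebuilds the same object. The data frame `sen` and its values are hypothetical stand-ins, not the poster's real data.

```r
# Hypothetical toy data standing in for the real measurements
sen <- data.frame(plot = c(104L, 104L, 105L),
                  plant = c(5L, 5L, 5L),
                  leaf_number = c(1L, 2L, 1L),
                  sen_score = c(90L, 90L, 95L))

# dput() prints R code that recreates the object;
# that output can be pasted into a plain-text email
dput(sen)

# A reader rebuilds the object by evaluating the pasted text
sen_text <- paste(capture.output(dput(sen)), collapse = "\n")
sen_copy <- eval(parse(text = sen_text))
identical(sen, sen_copy)
```

Posting the dput() output rather than a formatted table avoids the column misalignment that plain-text tables tend to suffer in email.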

You also should provide code showing us what you tried. You appear not to
have done much homework of your own -- have you gone through some R
tutorials, for example? Which ones?

Cheers,
Bert





Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



PIKAL Petr

2019-Feb-18 10:47 UTC


Hi

Could you show what you intend to do with your data? What do you mean by
sorting the data to have the same number of leaves? Do you want to trim excess
rows in both data.frames to meet that condition?

I would suggest using merge.

merge(test1, test2, by.x=c("plot", "plant"),
by.y=c("plot_lai", "plant_lai"), all=TRUE)

which gives you one data.frame with rows corresponding to each plot and plant.

After that you could remove all rows that have NA in the respective columns,
which ensures that there is the same number of leaves in each column.
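
A minimal sketch of the merge-then-drop-NA idea follows, on hypothetical toy data. Two things here are assumptions that go beyond the call above: the plant IDs are made to match across the two frames (in the data as posted they are 5/6 in one and 1/2 in the other), and the leaf number is added as a third key so that each leaf pairs with at most one row from the other frame.

```r
# Hypothetical toy data; plant IDs are assumed to match across frames
sen <- data.frame(plot = c(104, 104, 104, 105, 105),
                  plant = c(1, 1, 1, 1, 1),
                  leaf_number = c(1, 2, 3, 1, 2),
                  sen_score = c(90, 90, 95, 70, 80))
lai <- data.frame(plot_lai = c(104, 104, 105, 105, 105),
                  plant_lai = c(1, 1, 1, 1, 1),
                  leaf_num = c(1, 2, 1, 2, 3),
                  lai_score = c(82, 167, 64, 139, 211))

# Outer join: a leaf present in only one frame gets NA in the other's score
# (using the leaf number as a key is an assumption, not part of the posted call)
both <- merge(sen, lai,
              by.x = c("plot", "plant", "leaf_number"),
              by.y = c("plot_lai", "plant_lai", "leaf_num"),
              all = TRUE)

# Keep only the leaves that were measured in both datasets
matched <- both[complete.cases(both[, c("sen_score", "lai_score")]), ]

# Leaves remaining per plot
table(matched$plot)
```

Without the leaf number in the key, merge() would pair every senescence row with every leaf area row of the same plot and plant, so whether that extra key is appropriate depends on whether the leaf numbering means the same thing in both datasets.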

Cheers
Petr

> dput(test1)
structure(list(plot = c(104L, 104L, 104L, 104L, 104L, 104L, 104L,
104L, 105L, 105L, 105L, 105L, 105L, 105L, 105L, 105L, 106L, 106L,
106L, 106L, 106L, 106L, 106L, 106L, 106L, 108L, 108L, 108L, 108L
), plant = c(5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 5L, 5L, 5L,
5L), leaf_number = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 10L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L), sen_score = c(90L, 90L, 95L, 100L, 95L, 85L,
90L, 90L, 90L, 95L, 100L, 100L, 70L, 80L, 90L, 80L, 100L, 90L,
100L, 100L, 0L, 100L, 30L, 100L, 40L, 100L, 100L, 100L, 100L)),
class = "data.frame", row.names = c(NA, -29L))

> dput(test2)
structure(list(plot_lai = c(104L, 104L, 104L, 104L, 104L, 104L,
104L, 104L, 104L, 104L, 104L, 104L, 104L, 104L, 104L, 104L, 104L,
104L, 105L, 105L, 105L, 105L, 105L, 105L, 105L, 105L, 105L, 105L,
105L), plant_lai = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L), lai_score = c(82L, 167L, 248L, 343L, 377L, 372L,
335L, 221L, 162L, 145L, 235L, 310L, 393L, 455L, 472L, 445L, 330L,
292L, 64L, 139L, 211L, 296L, 348L, 392L, 405L, 379L, 278L, 64L,
209L), leaf_num = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
1L, 2L)), class = "data.frame", row.names = c(NA, -29L))
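
If the goal is instead to trim excess rows so that each plot keeps the same number of leaves in both frames, one possible sketch is to count leaves per plot in each frame, take the smaller count, and keep only that many rows per plot. This is only one reading of the question. The toy data and the helper name `trim_to` are hypothetical, and the sketch keeps the first rows of each plot in their current order, which may or may not be the right leaves to keep.

```r
# Hypothetical toy data, one plant per plot for brevity
sen <- data.frame(plot = c(104, 104, 104, 105, 105),
                  leaf_number = c(1, 2, 3, 1, 2),
                  sen_score = c(90, 90, 95, 70, 80))
lai <- data.frame(plot_lai = c(104, 104, 105, 105, 105),
                  leaf_num = c(1, 2, 1, 2, 3),
                  lai_score = c(82, 167, 64, 139, 211))

# Number of leaves per plot in each frame
n_sen <- table(sen$plot)
n_lai <- table(lai$plot_lai)

# Plots present in both frames, with the smaller leaf count for each
plots <- intersect(names(n_sen), names(n_lai))
n_keep <- pmin(as.vector(n_sen[plots]), as.vector(n_lai[plots]))
names(n_keep) <- plots

# Hypothetical helper: keep the first n_keep rows of each shared plot
trim_to <- function(df, plot_col, n_keep) {
  df <- df[as.character(df[[plot_col]]) %in% names(n_keep), ]
  # Position of each row within its plot, in current row order
  pos <- ave(seq_len(nrow(df)), df[[plot_col]], FUN = seq_along)
  df[pos <= n_keep[as.character(df[[plot_col]])], ]
}

sen_trim <- trim_to(sen, "plot", n_keep)
lai_trim <- trim_to(lai, "plot_lai", n_keep)

# Both frames now have the same leaf count in every plot
table(sen_trim$plot)
table(lai_trim$plot_lai)
```

Because neither dataset is known in advance to be the shorter one, pmin() picks the smaller count plot by plot rather than treating one frame as the reference.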

