Phillip Heinrich
2020-Apr-21 20:47 UTC
[R] Subtracting Data Frame With a Different Number of Rows
I have two small data frames of baseball data. The first one is the mean number of runs that will score in each half inning for the 2018 Arizona Diamondbacks. The second data frame is the same information but for only one player. As you will see, the individual player did not come up to bat at any time during the season in two situations:

  with the bases loaded and no outs
  runners on first and third with one out

Overall

           RunnerCode Outs  MeanRuns
1         Bases Empty    0 0.5137615
2          Runner:1st    0 0.8967391
3          Runner:2nd    0 1.3018868
4   Runners:1st & 2nd    0 1.6551724
5          Runner:3rd    0 1.9545455
6   Runners:1st & 3rd    0 2.0571429
7   Runners:2nd & 3rd    0 2.1578947
8        Bases Loaded    0 3.2173913
9         Bases Empty    1 0.3963801
10         Runner:1st    1 0.6952596
11         Runner:2nd    1 0.9580838
12  Runners:1st & 2nd    1 1.4397163
13         Runner:3rd    1 1.5352113
14  Runners:1st & 3rd    1 1.5882353
15  Runners:2nd & 3rd    1 1.9215686
16       Bases Loaded    1 1.9193548
17        Bases Empty    2 0.4191011
18         Runner:1st    2 0.5531915
19         Runner:2nd    2 0.8777293
20  Runners:1st & 2nd    2 0.9553073
21         Runner:3rd    2 1.2783505
22  Runners:1st & 3rd    2 1.5851064
23  Runners:2nd & 3rd    2 1.2794118
24       Bases Loaded    2 1.388235

Individual Player

           RunnerCode Outs  MeanRuns
1         Bases Empty    0 0.4262295
2          Runner:1st    0 1.3200000
3          Runner:2nd    0 1.2857143
4   Runners:1st & 2nd    0 0.5714286
5          Runner:3rd    0 2.0000000
6   Runners:1st & 3rd    0 3.5000000
7   Runners:2nd & 3rd    0 1.0000000
8         Bases Empty    1 0.5238095
9          Runner:1st    1 0.6578947
10         Runner:2nd    1 0.3750000
11  Runners:1st & 2nd    1 1.4285714
12         Runner:3rd    1 1.4285714
13  Runners:2nd & 3rd    1 0.6666667
14       Bases Loaded    1 3.0000000
15        Bases Empty    2 0.3469388
16         Runner:1st    2 0.1363636
17         Runner:2nd    2 0.7142857
18  Runners:1st & 2nd    2 1.6666667
19         Runner:3rd    2 1.2500000
20  Runners:1st & 3rd    2 2.1428571
21  Runners:2nd & 3rd    2 1.5000000
22       Bases Loaded    2 2.2000000

RunnerCode is a factor
Outs are integers
MeanRuns is numerical data

I would like to subtract the second from the first as a way to evaluate the player's ability to produce runs.
As part of this analysis I would like to input the mean number of runs from the overall data frame into the two missing cells for the individual player: bases loaded with no outs, and 1st and 3rd with one out.

Can anyone give me some advice?
William Michels
2020-Apr-21 22:29 UTC
[R] Subtracting Data Frame With a Different Number of Rows
Hi Phillip,

You have two choices here:

1. Manually enter the missing rows into your individual.df using rbind(), and cbind() the overall.df and individual.df data frames together (assuming the rows line up properly), or
2. Use merge() to perform an SQL-like "Left Join", and copy values from the "overall" columns to fill in missing values in the "indiv" columns (imputation).

Below is code, starting from .tsv files, showing the second (merge) method. Note: I've only included the first 4 rows of output from the merge command (there are 24 rows total):

> overall <- read.delim("overall.R", sep="\t")
> indiv <- read.delim("individual.R", sep="\t")
> merge(overall, indiv, all.x=TRUE, by.x=c("RunnerCode", "Outs"), by.y=c("RunnerCode", "Outs"))
   RunnerCode Outs X.x MeanRuns.x X.y MeanRuns.y
1  BasesEmpty    0   1  0.5137615   1  0.4262295
2  BasesEmpty    1   9  0.3963801   8  0.5238095
3  BasesEmpty    2  17  0.4191011  15  0.3469388
4 BasesLoaded    0   8  3.2173913  NA         NA

HTH, Bill.

W. Michels, Ph.D.
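The merge-and-impute idea in the reply can be carried through to the subtraction the original question asks for. The following is only a minimal sketch under stated assumptions: the two data frames are built inline from four of the 24 base/out states (values copied from the tables in the thread) rather than read from files, and the column names MeanRuns.overall, MeanRuns.indiv, Imputed and Diff are illustrative choices, not names used in the thread.

```r
# Assumption: a small subset of the posted data, typed in by hand.
# The player has no row for "Bases Loaded" with 0 outs.
overall <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st", "Bases Loaded", "Bases Loaded"),
  Outs       = c(0L, 0L, 0L, 1L),
  MeanRuns   = c(0.5137615, 0.8967391, 3.2173913, 1.9193548)
)
indiv <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st", "Bases Loaded"),
  Outs       = c(0L, 0L, 1L),
  MeanRuns   = c(0.4262295, 1.3200000, 3.0000000)
)

# Left join: keep every row of 'overall'; states the player never
# batted in get NA in the player's MeanRuns column.
both <- merge(overall, indiv,
              by = c("RunnerCode", "Outs"),
              all.x = TRUE,
              suffixes = c(".overall", ".indiv"))

# Impute: copy the overall mean into the player's missing cells,
# and remember which rows were filled in this way.
both$Imputed <- is.na(both$MeanRuns.indiv)
both$MeanRuns.indiv[both$Imputed] <- both$MeanRuns.overall[both$Imputed]

# "Subtract the second from the first": overall minus individual.
both$Diff <- both$MeanRuns.overall - both$MeanRuns.indiv

# Order by outs, then base state, for easier reading.
both <- both[order(both$Outs, both$RunnerCode), ]
print(both)
```

Because the imputed cells are copies of the overall means, their difference is exactly 0; keeping the Imputed flag makes it easy to exclude those rows from any summary of the player.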