The 2014 MLB season marked the first time a team could use instant replay to challenge an umpire’s ruling. Replay was technically introduced in 2008, but until 2014 only home run boundary calls could be reviewed, and only at an umpire’s discretion. With less time spent arguing safe/out calls and more time spent consulting replay officials in New York, there would seem to be less opportunity for players and managers to argue and get ejected. But let’s consult the data first. I looked at ten years of ejection data (2008-2017) from the UEFL Portal to see if I could reveal any interesting patterns. The steps of my analysis are outlined below.

Data Pre-Processing

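Load Packages

#load packages used throughout the analysis
library(tidyverse) #dplyr, purrr, tidyr, forcats
library(lubridate) #ymd(), year()
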
Import Data

#load ejections data
load(file = "data/ejections.rda") 

Clean Data

The data is stored in a list of ten data frames, each holding one year’s worth of ejections. As the years progressed, some column names evolved into simpler variable titles, such as “Eject Reason” becoming just “Reason”. I’ll adjust the differing variable names to match the others before combining all ten data frames into one.

map(ejections, names) #review column names for each yearly dataset

#create function to replace '08-'12 column names
fix_columns <- function(df){
  new_names <- c("W/L Pre-" = "W/L at time?",
                 "W/L Final" = "W/L at final?",
                 "Ej Pos" = "Pos__1")
  #splice the named vector into rename() as new = old pairs
  df %>%
    rename(!!!new_names)
}

ejections[6:10] <- map(ejections[6:10], fix_columns)

#create anonymous functions to fix remaining differences
ejections[5:10] <- map(ejections[5:10], function(.x) .x %>% rename("Name" = "Ejected Name") %>% rename("Reason" = "Eject Reason"))
ejections[9:10] <- map(ejections[9:10], function(.x) .x %>% rename("Team" = "Ej Team"))

#create function to select columns for each dataset in list
select_columns <- function(df){
  df %>%
    #drop rows with a missing final result
    filter(`W/L Final` != "NA") %>%
    select("Date", "Team", "Ej Pos", "Name", "H", "AB", "W/L Pre-", "W/L Final", "RS", "RA", "Inn", "Reason", "Play Result")
}

ejections <- map(ejections, select_columns)

#combine all years
ejections <- bind_rows(ejections)
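
To see the rename-then-bind pattern in isolation, here’s a minimal sketch on toy data. The two small tables and the helper `harmonize()` are made up for illustration (they are not part of the UEFL data or of the code above), and I’m assuming dplyr 1.0 or later so that `any_of()` can be used inside `rename()`.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for two yearly tables (made-up rows, not UEFL data)
toy_2017 <- tibble(Name = c("A", "B"), Reason = c("Balls/Strikes", "Fighting"))
toy_2008 <- tibble(`Ejected Name` = "C", `Eject Reason` = "Safe/Out")

# Hypothetical helper: rename old-style columns only when they are present
harmonize <- function(df) {
  lookup <- c(Name = "Ejected Name", Reason = "Eject Reason")
  rename(df, any_of(lookup))
}

# Apply the helper to every table, then stack them into one data frame
toy_all <- list(toy_2017, toy_2008) %>%
  map(harmonize) %>%
  bind_rows()

names(toy_all)
nrow(toy_all)
```

Because `any_of()` skips columns that are missing, the same helper can be mapped over every year without indexing into the list, which is a design choice rather than what the original code does.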

From 2008-2017 there were a total of 1930 ejections. A quick snapshot of the data is shown below.

## Observations: 1,930
## Variables: 13
## $ Date          <dttm> 2017-04-05, 2017-04-06, 2017-04-09, 2017-04-10, 2…
## $ Team          <chr> "MIA", "LAD", "TB", "PHI", "COL", "SEA", "TEX", "T…
## $ `Ej Pos`      <chr> "Manager", "CF", "CF", "Manager", "Manager", "Mana…
## $ Name          <chr> "Don Mattingly", "Joc Pederson", "Kevin Kiermaier"…
## $ H             <dbl> NA, 0, 3, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, 0…
## $ AB            <dbl> NA, 3, 4, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, 3…
## $ `W/L Pre-`    <chr> "L", "W", "W", "Tie", "L", "L", "L", "L", "L", "L"…
## $ `W/L Final`   <chr> "L", "W", "W", "L", "L", "W", "L", "W", "L", "L", …
## $ RS            <dbl> 2, 0, 2, 1, 2, 3, 1, 2, 0, 0, 2, 2, 0, 2, 2, 6, 0,…
## $ RA            <dbl> 0, 0, 0, 2, 4, 1, 5, 0, 2, 0, 2, 2, 2, 2, 2, 2, 1,…
## $ Inn           <dbl> 7, 7, 7, 8, 5, 6, 3, 9, 8, 8, 5, 5, 8, 7, 7, 3, 7,…
## $ Reason        <chr> "Unsportsmanlike-NEC", "Balls/Strikes", "Balls/Str…
## $ `Play Result` <chr> "HBP", "K", "K", "Ball", "No Balk", "Fair", "Foul"…

After a few more housekeeping data manipulations, I think our data is ready to be explored!

#convert from DateTime to Date
ejections$Date <- ymd(ejections$Date)

#Create Year variable
ejections$Year <- year(ejections$Date)

#adjust differing team name abbreviations
ejections$Team <-
  fct_recode(ejections$Team, MIA = "FLA", ARI = "AZ", MIA = "FL", WAS = "WSH")

#Check swing ejections were not differentiated from ball/strikes until 2014
ejections$Reason <- fct_recode(ejections$Reason, `Balls/Strikes` = "Check Swing")

#recode levels
ejections$`W/L Pre-` <- recode(ejections$`W/L Pre-`, L = -1, Tie = 0, TIe = 0, W = 1) 
ejections$`W/L Final` <- recode(ejections$`W/L Final`, L = -1, W = 1)
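
As a quick check of what that recoding does, here’s a small sketch on a made-up vector (the values are illustrative, not rows from the data). It assumes `dplyr::recode()`, the same function used above, and shows that the misspelled “TIe” level ends up with the same code as “Tie”.

```r
library(dplyr)

# Made-up game states at the time of an ejection
pre <- c("L", "Tie", "TIe", "W")

# Map each label to a number so states can be compared with < and >
pre_num <- recode(pre, L = -1, Tie = 0, TIe = 0, W = 1)
pre_num
```

With the states on a numeric scale, “better than before” and “worse than before” become simple comparisons between the two columns.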

Ejections By Year & Cause

First let’s take a look at how ejections have changed since expanded replay was introduced in 2014.

From 2008 to 2017, MLB averaged 193 ejections per season. In 2014, the first year of expanded replay, the number of ejections increased by 21, eclipsing 200. The following year saw yet another jump, before the total dropped below the 10-year average in ’16 and ’17. While the replay system doesn’t appear to be deterring ejections, I wouldn’t say it’s encouraging them either. What could be changing, however, is the reason players and coaches get ejected. Let’s look at how the cause of arguments could be evolving.

ejections$Reason <- fct_other(ejections$Reason, keep = c("Balls/Strikes", "Fighting", "Replay Review", "Safe/Out", "Throwing At"))

data <- ejections %>%
  #number of ejections for each year/reason pair
  count(Year, Reason) %>%
  #share of that year's ejections attributed to each reason
  group_by(Year) %>%
  mutate(percent = n/sum(n)) %>%
  ungroup() %>%
  select(Year, Reason, percent) %>%
  #add explicit zeros for year/reason pairs with no ejections
  add_row(Year = 2008:2013, Reason = "Replay Review", percent = 0) %>%
  add_row(Year = 2016:2017, Reason = "Safe/Out", percent = 0)
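
To make the share calculation concrete, here’s a minimal sketch on a made-up ejection log with one row per ejection. The toy rows and the names `toy`, `shares`, and `bs` are assumptions for illustration only; the real numbers come from the UEFL data.

```r
library(dplyr)

# Made-up ejection log: one row per ejection (not UEFL data)
toy <- tibble(
  Year   = c(2013, 2013, 2013, 2013, 2017, 2017, 2017, 2017),
  Reason = c("Balls/Strikes", "Balls/Strikes", "Safe/Out", "Other",
             "Balls/Strikes", "Balls/Strikes", "Balls/Strikes", "Replay Review")
)

# Count ejections per year and reason, then divide by the yearly total
shares <- toy %>%
  count(Year, Reason) %>%
  group_by(Year) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

# Balls/Strikes share in each toy year: 2 of 4, then 3 of 4
bs <- shares %>% filter(Reason == "Balls/Strikes")
bs$percent
```

Within each year the shares sum to 1, so a reason that vanishes in one year has to be offset by growth somewhere else.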

Arguments over safe/out calls declined sharply once managers could challenge them, and disappeared entirely in 2016. A new type of ejection emerged as a result, though: disputes over a replay ruling, or over why a challenge was or was not granted, seemed to fill the gap. The “other” category covers plays involving interference, balks, or any other less common ruling. Perhaps most noteworthy is that from 2013 to 2017 the percentage of yearly ejections for arguing balls/strikes (which also includes check swings) is up 17%. Trust in an umpire’s ability to manage the strike zone seems to be eroding, at least somewhat. If that number doesn’t decrease anytime soon, I wouldn’t be surprised to see the automated strike zone make its debut sooner rather than later.

Ejections By Division

The following chart shows the total number of ejections by division. Try clicking on each square to explore which teams contribute the most to their division’s total, and hover over each to see its count.

divisions <- ejections %>%
  mutate(division = case_when(Team %in% c("STL", "CHC", "MIL", "CIN", "PIT") ~ "NL Central",
                              Team %in% c("LAD", "SF", "SD", "COL", "ARI") ~ "NL West",
                              Team %in% c("MIA", "PHI", "WAS", "NYM", "ATL") ~ "NL East",
                              Team %in% c("NYY", "BOS", "TB", "TOR", "BAL") ~ "AL East",
                              Team %in% c("LAA", "HOU", "OAK", "TEX", "SEA") ~ "AL West",
                              Team %in% c("CWS", "DET", "KC", "CLE", "MIN") ~ "AL Central")) %>%
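  group_by(Team, division) %>%
  #the original pipeline is cut off here; counting and sorting is an assumed
  #completion that matches the Team/division/n table below
  count() %>%
  ungroup() %>%
  arrange(desc(n))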

The most penalized division is the AL East, with 388 ejections over 10 years. On average, a team receives 6.43 ejections per season. The top five teams with the most ejections since ’08 are listed below.

Team  division    n
BOS   AL East     95
TOR   AL East     94
DET   AL Central  90
LAD   NL West     89
ATL   NL East     82
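
The league-wide averages quoted in this post follow from simple arithmetic on the 1930 total. Here’s a short sketch of that calculation; the variable names are mine, and the only outside assumption is that 30 clubs played in each of the ten seasons.

```r
total_ejections <- 1930  # total from the combined 2008-2017 data
n_seasons <- 10          # 2008 through 2017
n_teams <- 30            # assumption: 30 MLB clubs in every season

# Average ejections per season across the league
per_season <- total_ejections / n_seasons
per_season  # 193

# Average ejections per team per season
per_team_season <- total_ejections / (n_teams * n_seasons)
round(per_team_season, 2)  # 6.43
```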

Managerial Ejections

Broadcasters often say that after a manager is ejected, it’s worth paying attention to how the players respond. Relative to where the team stood at the time of the ejection, its final result is either better (positive outcome), worse (negative outcome), or unchanged (neutral outcome). The graph below plots all outcomes against each other among managers with more than 6 ejections over the past 10 years.

ejections$Name <- recode(ejections$Name, "A.J. Hinch" = "AJ Hinch", "Fredi Gonzales" = "Fredi Gonzalez")

manager_ej <- ejections %>%
  filter(`Ej Pos` == "Manager") %>%
  mutate(result = case_when(`W/L Final` > `W/L Pre-` ~ "Positive",
                            `W/L Final` < `W/L Pre-` ~ "Negative",
                            `W/L Final` == `W/L Pre-` ~ "Neutral")) %>%
  group_by(Name, result) %>%
  count() %>%
  spread(result, n, fill =0) %>%
  mutate(total = sum(Positive, Negative, Neutral)) %>%
  filter(total > 6) %>%
  mutate(diff = abs(Positive - Negative)/total)
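
Here’s a minimal sketch of the outcome classification on made-up rows, using the same -1/0/1 coding for losing, tied, and winning. The managers, the column names `pre` and `final`, and the values are all hypothetical and stand in for the real `W/L Pre-` and `W/L Final` columns.

```r
library(dplyr)

# Made-up manager ejections; -1 = losing, 0 = tied, 1 = winning
toy <- tibble(
  Name  = c("Manager A", "Manager A", "Manager A", "Manager B"),
  pre   = c(-1, 1,  0,  1),   # state when the manager was ejected
  final = c( 1, 1, -1, -1)    # final result of the game
)

# Compare the final result with the state at the time of the ejection
outcomes <- toy %>%
  mutate(result = case_when(final > pre ~ "Positive",
                            final < pre ~ "Negative",
                            TRUE        ~ "Neutral"))

outcomes$result
table(outcomes$result)
```

Note that a tied game that ends in a win counts as positive and one that ends in a loss counts as negative, since a tie is coded as 0.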

After a team’s manager has been ejected, the current result of the game rarely changes, meaning the team in the lead tends to hang on for the victory. Some managers, like Joe Girardi, Bobby Cox, and Bruce Bochy, have seen their teams rally to a win in their absence more often than they have seen a lead slip away or a tied game lost, while others, like Jim Riggleman, haven’t been so lucky.