In 2019, I really got into picking the winning team each week through survivor pools and general sports books. When I looked at each week's stats, I felt like I could see a story unfold. I would stare at the screen and feel like Alan from The Hangover when he played poker.

 

Each week I could go down the rabbit hole on several games and reach a solid conclusion, but I didn’t have the time to go that deep on every game. I needed more manpower – or rather, machine power. (Un)Luckily, 2020 offered some extra time around the house, so I learned some coding and researched a lot of football stats to address this.

 

How It Started

An afternoon of scrolling Reddit led me to a post about NFL stats. I found it cool and compelling enough to take a closer look. So, I took all the official team stats from NFL.com, threw them into Python, and looked for what correlated with wins. I got some interesting numbers, but I had to examine each one to decide whether it reflected causation or just correlation.
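For readers curious about the correlation step, here is a minimal sketch of what "seeing what correlates with wins" can look like in plain Python. It is not the machine's actual code. The `pearson` helper, the stat names, and the four-team numbers are all hypothetical. A high coefficient only flags a stat as worth a closer look. It doesn't prove the stat causes wins, which is why each result still needed a second look.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical season totals for four teams (illustrative only, not real NFL data).
team_stats = {
    "yards_per_attempt": [7.8, 6.9, 6.1, 5.4],
    "rushing_yards": [1650, 1900, 2100, 1800],
}
wins = [13, 10, 7, 4]

# Correlate every stat with wins, then list them from strongest positive to most negative.
correlations = {stat: pearson(values, wins) for stat, values in team_stats.items()}
for stat, r in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{stat}: r = {r:+.2f}")
```

With real data you would feed in one list per NFL.com stat across all 32 teams. You would then sort by `r` to decide which stats deserve a causation check.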

 

For example, rushing stats as a whole were fairly weak predictors. Teams run the ball to burn the clock when they have a lead, so a strong rushing attack usually doesn’t cause the win; it’s the result of already having the lead. And long runs (>40 yards) actually had a negative correlation with winning! Teams that broke big runs tended to lose more often. Unfortunately, this supports the current trend of devaluing Running Backs.

 

On the other hand, the stats shone in the passing game. QBR, Yards per Attempt, and Passing 1st Downs were very highly correlated with victories. Passing yards were not, because yardage can pile up during a loss, such as in garbage time. By contrast, QBR, YpA, and 1st Downs reflect good decision-making and control of the game. We are truly living in an era where passing is king. In fact, the team with the better QBR won 71% of the time.
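The "team with the better QBR won 71% of the time" figure is a simple hit rate. Here is a minimal sketch of how to compute one, assuming a hypothetical `Game` record that stores each side's QBR and the winner. The four games below are made up. The sketch also doesn't settle whether QBR means the single-game figure or the season-to-date figure; that is an assumption either way. Games where the QBRs are tied are skipped.

```python
from dataclasses import dataclass

@dataclass
class Game:
    home: str
    away: str
    home_qbr: float
    away_qbr: float
    winner: str

def better_qbr_win_rate(games):
    """Fraction of games (ignoring QBR ties) won by the team with the higher QBR."""
    decided = [g for g in games if g.home_qbr != g.away_qbr]
    if not decided:
        return 0.0
    hits = sum(
        1 for g in decided
        if g.winner == (g.home if g.home_qbr > g.away_qbr else g.away)
    )
    return hits / len(decided)

# Hypothetical results, not real scores or QBR values.
games = [
    Game("KC", "LV", 72.4, 48.1, "KC"),
    Game("BUF", "MIA", 55.0, 61.3, "MIA"),
    Game("DAL", "NYG", 40.2, 58.7, "DAL"),
    Game("SF", "SEA", 66.9, 52.5, "SF"),
]
print(f"Team with better QBR won {better_qbr_win_rate(games):.0%} of games")  # 75%
```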

 

Other factors, including sacks given up, turnovers, Tight End contribution, and a few others, all work together to form a model offense. The defense has matching counterparts for each of these. And of course, Special Teams does its thing off to the side – just like at practice.


Combining all of this info with a little code produced a basic program that models a football game.

 

How It’s Going

In 2021, the machine was 61% accurate. In 2022, Version 2.0 was 68% accurate for regular-season games and, after some fine-tuning, correctly picked 12 of 13 playoff games. The more data it had, the more accurate the machine became.

 

For reference, most professional analysts average around 65% accuracy. We’re looking to compete against the best of the best. This year, Version 3.0 is aiming to break 70% with some shiny new tools. Here is a summary of the flow and the upgrades.

 

Scrape the Stats


First, I get the stats. Each week, data is scraped and added to the database. In general, bigger stats mean bigger success: if an offense can generate more yards and points than a defense can limit, it usually wins. But that only holds up to a point. So the machine takes these team stats and breaks them into tiers, then places the offensive tiers against the defensive tiers in each game to establish a matchup.

The tier system gives some margin for error and respects the upper and lower limits of the league. Even 0-16 teams have been able to move the ball against playoff-bound teams.
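The tier idea can be sketched in a few lines. This is a minimal illustration, not the machine's implementation. It assumes tiers are equal-sized rank buckets (`n_tiers=3` over six made-up teams) and uses yards per game as the stat. It scores a matchup as the defense's tier minus the offense's tier. The helper names `assign_tiers` and `matchup_edge` are hypothetical.

```python
def assign_tiers(values, n_tiers=3, higher_is_better=True):
    """Rank teams on one stat and split them into n_tiers equal-sized groups (1 = best)."""
    order = sorted(values, key=values.get, reverse=higher_is_better)
    size = len(order) / n_tiers
    return {team: int(rank // size) + 1 for rank, team in enumerate(order)}

# Hypothetical yards per game; not real NFL numbers.
offense_ypg = {"KC": 390, "BUF": 375, "DET": 360, "NYJ": 290, "NE": 280, "CAR": 270}
defense_ypg_allowed = {"KC": 310, "BUF": 320, "DET": 340, "NYJ": 290, "NE": 300, "CAR": 370}

off_tiers = assign_tiers(offense_ypg)
# Fewer yards allowed is better, so the defense is ranked in ascending order.
def_tiers = assign_tiers(defense_ypg_allowed, higher_is_better=False)

def matchup_edge(offense_team, defense_team):
    """Positive when the offense sits in a better tier than the defense it faces."""
    return def_tiers[defense_team] - off_tiers[offense_team]

print(matchup_edge("KC", "CAR"))   # 2: top-tier offense vs. bottom-tier defense
print(matchup_edge("CAR", "NYJ"))  # -2: bottom-tier offense vs. top-tier defense
```

Comparing tiers rather than raw yardage is what gives the margin for error described above. A 10-yard difference between two teams in the same tier counts for nothing.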

 

Track Trajectory

Second, I’ve added a 3-week rolling average that tracks how well a team is performing relative to expectations. Most teams gradually improve throughout a season, but injuries and trades rock the boat. The average captures upswings and drop-offs, and it has extra value in tight divisional games. How a team performed four weeks ago doesn’t always indicate how it will perform this week.
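Here is a minimal sketch of a 3-week rolling average of "performance relative to expectations". It assumes performance is the actual point margin and the expectation is a predicted margin. Both the inputs and the `rolling_vs_expectation` name are hypothetical. In weeks 1 and 2 the function averages whatever games exist so far instead of waiting for a full window.

```python
from collections import deque

def rolling_vs_expectation(actual, expected, window=3):
    """Rolling mean of (actual - expected) over the most recent `window` weeks."""
    recent = deque(maxlen=window)  # old weeks fall off automatically
    trend = []
    for a, e in zip(actual, expected):
        recent.append(a - e)
        trend.append(sum(recent) / len(recent))
    return trend

# Hypothetical season: point margins vs. predicted margins.
actual_margin = [7, -3, 10, 14, -6, -10]
expected_margin = [3, 3, 3, 4, 4, 4]
trend = rolling_vs_expectation(actual_margin, expected_margin)
print([round(t, 2) for t in trend])
```

The last two weeks turn sharply negative. That is the kind of drop-off (an injury, say) that a season-long average would smooth over.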

 

This average 

 

Strength of Schedule

Third, I’ve added a model to track strength of schedule using a concept similar to Crabtree Points. If you haven’t heard of them, you’re not alone. This is the system the state of Maine used to select high school football teams for the state playoffs. A team’s winning percentage plus its opponents’ winning percentage gives a decent evaluation of how strong a team is. This helps a lot with cross-conference games.

Hypothetically, let’s say the Raiders are 1-4 and have primarily played the juggernaut AFC West teams. They are set to play the Panthers, who are 4-1 but have mostly played struggling NFC South teams. Can you really say the Panthers are the better team based on record alone? Crabtree Points help the machine account for scenarios like this.
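The Raiders/Panthers example can be run through the simple sum described above: the team's winning percentage plus its opponents' combined winning percentage. This is a simplified, Crabtree-like rating, not the official Maine formula. The opponents' records are invented to fit the hypothetical.

```python
def win_pct(wins, losses):
    """Winning percentage, treating a team with no games as 0.0."""
    games = wins + losses
    return wins / games if games else 0.0

def crabtree_style_points(record, opponent_records):
    """Team win% plus opponents' combined win% (a simplified Crabtree-like rating)."""
    opp_wins = sum(w for w, _ in opponent_records)
    opp_losses = sum(l for _, l in opponent_records)
    return win_pct(*record) + win_pct(opp_wins, opp_losses)

# Hypothetical records: (wins, losses) for each team and for the five opponents it faced.
raiders = crabtree_style_points((1, 4), [(4, 1), (4, 1), (3, 2), (5, 0), (2, 3)])
panthers = crabtree_style_points((4, 1), [(1, 4), (2, 3), (1, 4), (3, 2), (0, 5)])
print(f"Raiders {raiders:.2f}, Panthers {panthers:.2f}")  # Raiders 0.92, Panthers 1.08
```

On record alone the gap is 0.80 vs. 0.20. Once the schedule is factored in, it shrinks to 1.08 vs. 0.92, which makes the matchup look much closer than the standings suggest.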

 

Weigh the Passing Game

Lastly, I have the machine calculate the Killer Stat: the ratio of a team’s passing offense to its own passing defense. It measures how efficiently a team moves the ball through the air, and it becomes more important later in the season.
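The post doesn't say which passing numbers feed the Killer Stat. As a minimal sketch, assume it compares yards per pass attempt gained on offense with yards per attempt allowed on defense. The `killer_stat` function and its inputs are hypothetical.

```python
def killer_stat(pass_yards_gained, pass_attempts, pass_yards_allowed, opp_pass_attempts):
    """Offensive yards per attempt divided by yards per attempt allowed.

    Above 1.0: the team throws more efficiently than it lets opponents throw.
    """
    offense_ypa = pass_yards_gained / pass_attempts
    defense_ypa = pass_yards_allowed / opp_pass_attempts
    return offense_ypa / defense_ypa

# Hypothetical season totals: 8.0 YpA gained vs. 6.0 YpA allowed.
print(round(killer_stat(2400, 300, 2100, 350), 2))  # 1.33
```

Expressing it as a ratio rather than a difference makes it scale-free. That is convenient when comparing teams late in the season, after attempt totals have drifted apart.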

 

The End Game

All four modules are combined to pick the winners. So, stay tuned each week as I post my machine’s picks with some insight into why.
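The post doesn't spell out how the four modules are combined. One simple way to do it is a weighted sum of each team's module scores, with the higher total getting the pick. This minimal sketch uses invented weights and scores, so treat it as an illustration of the idea rather than the machine's formula. The `ModuleScores` and `pick_winner` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModuleScores:
    tier_edge: float    # offense tier vs. opposing defense tier
    trend: float        # 3-week rolling performance vs. expectation
    schedule: float     # Crabtree-style points
    killer_stat: float  # passing offense / passing defense ratio

# Hypothetical weights; the real weighting is not described in the post.
WEIGHTS = ModuleScores(tier_edge=1.0, trend=0.1, schedule=2.0, killer_stat=3.0)

def team_score(s: ModuleScores, w: ModuleScores = WEIGHTS) -> float:
    """Weighted sum of one team's module outputs."""
    return (w.tier_edge * s.tier_edge + w.trend * s.trend
            + w.schedule * s.schedule + w.killer_stat * s.killer_stat)

def pick_winner(team_a: str, a: ModuleScores, team_b: str, b: ModuleScores) -> str:
    """Pick the team with the higher combined score (ties go to team_a)."""
    return team_a if team_score(a) >= team_score(b) else team_b

panthers = ModuleScores(tier_edge=1, trend=2.0, schedule=1.08, killer_stat=1.2)
raiders = ModuleScores(tier_edge=-1, trend=-1.0, schedule=0.92, killer_stat=0.9)
print(pick_winner("Panthers", panthers, "Raiders", raiders))  # Panthers
```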
