re-en‧gi‧NēRD

Tuesday, September 27, 2016

EPL Advanced Table

Bookmark this page, as I'll be updating it throughout the season. If you have questions or ideas for improving the advanced table, I welcome the feedback.

Let's Connect!
@reenginerd on Twitter & Instagram
Tableau Public Repository
GitHub
StackOverflow

Thursday, September 8, 2016

Messing Around With Fantasy Football #2

In MAWFF#1, we looked at the scoring trends by position over the last 10 years in standard scoring fantasy leagues. Some might say that a point is a point, but when it comes to touchdowns, it's 6 (4 for standard league QBs) or 0. This leads to wild score swings for players who rely heavily on reaching the end zone. So I wanted to take a look at the players who depend most on this aspect of scoring, to figure out which ones I should be wary of counting on to repeat their 2015 performance.

Here's a look at the fantasy players who depend most on the TD:

So you need to decide for your team: Do you want a player who produces on the field each week and is reliable for 5-10 pts from yards gained, or do you want a guy who is going to go big some weeks and score 3 pts when he doesn't reach the end zone?

Look at Demaryius in 2013, Dez in 2014, and... 'Doug' in 2015. All 3 relied heavily on their TD production to achieve their great seasons. I like to make sure the offense will be the same before betting on this kind of performance again. Is the QB the same (cough... Romo)? Has the offense lost a guy whose nickname is Beastmode? How will that affect that guy's ability to get into the end zone?

Look at Tyrod and Cam last year. About the same production from yards gained, but Cam scored a touchdown per game more. Kirk Cousins found the end zone plenty last season, but are you gonna bet on Kirk Cousins?

At TE, look at Julius Thomas 2014 vs 2015. 3.8 points from yards per game each year, but a drop from 5.5 points from touchdowns to just 2.5 a game. Same on-field production... [At this point I realize I ended up with him at the end of my auction draft... and... replaced. Phew. Pretty sure I just reverse jinxed him, so you should start him week 1].

Looking back through RB, it's ~~fun~~ enraging to see the vultures make the list. Look at 2010, Javarris James 1.6 points from yards per game, but he scored 6 touchdowns in 10 games played. Lol, who? #specialist
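
One simple way to put a number on "depends on the TD" is the share of a player's per-game points that comes from touchdowns. This is only a sketch of that idea, not necessarily the measure behind the chart above; the function name `td_share` is made up for illustration, and the inputs are the Julius Thomas per-game figures quoted earlier.

```python
def td_share(yard_pts_per_game, td_pts_per_game):
    """Fraction of per-game fantasy points that came from touchdowns."""
    total = yard_pts_per_game + td_pts_per_game
    if total == 0:
        return 0.0  # no points at all, so no TD dependence to measure
    return td_pts_per_game / total


# Julius Thomas per-game figures quoted in the post (yards pts, TD pts)
jt_2014 = td_share(3.8, 5.5)
jt_2015 = td_share(3.8, 2.5)
print(f"2014: {jt_2014:.0%}, 2015: {jt_2015:.0%}")  # 2014: 59%, 2015: 40%
```

The yardage points are identical in both seasons, so the whole change in the share comes from the touchdown column.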

What do you guys think? Does anyone else like to break it down this way?

Wednesday, September 7, 2016

Messing Around With Fantasy Football #1

In this first MAWFF post, I take a quick look at how fantasy scoring has evolved over the last 10 years. In the graph below, I show the following:

Top 20 Players by Average Points Per Game by Position by Year
Must have played at least 10 games
Graph compares Points Per Game by Yards Gained vs Points Per Game by TD
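
For anyone who wants to reproduce the filter described in the list above, here is a rough sketch in plain Python. The row layout (`player`, `position`, `year`, `games`, `points`) and the sample rows are assumptions for illustration, not the actual dataset behind the graph.

```python
from collections import defaultdict


def top_n_by_position_year(rows, n=20, min_games=10):
    """Top n players by points per game within each (position, year) group.

    rows: dicts with hypothetical keys player, position, year, games, points.
    Players with fewer than min_games games are dropped before ranking.
    """
    groups = defaultdict(list)
    for r in rows:
        if r["games"] >= min_games:
            ppg = r["points"] / r["games"]
            groups[(r["position"], r["year"])].append((ppg, r["player"]))
    return {
        key: [name for _, name in sorted(players, reverse=True)[:n]]
        for key, players in groups.items()
    }


# Made-up rows: player B has the best average but only played 8 games
rows = [
    {"player": "A", "position": "QB", "year": 2015, "games": 16, "points": 320.0},
    {"player": "B", "position": "QB", "year": 2015, "games": 8, "points": 200.0},
    {"player": "C", "position": "QB", "year": 2015, "games": 10, "points": 150.0},
]
top = top_n_by_position_year(rows, n=2)
```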

A couple notable trends stick out:

QB: Notable rise in points per game in both yard and TD points.
RB: Reduction in both yard and TD points. This could be due to a reduction in overall RB productivity as the passing game has increased and/or a spreading of the RB load with more RBs by committee in use.
TE: Small but meaningful increase in TE offensive productivity, especially at the top end with the likes of Rob Gronkowski and Jimmy Graham.
WR: Not a huge change over 10 years. With the increase in QB productivity, the receiving yards and TDs look to be going to non-top 20 WRs. TEs and pass-catching RBs are the likely beneficiaries.

What do you guys think? How has this affected your draft over the last couple years?

Saturday, July 23, 2016

Road To The CrossFit Games: Part 4: Did The Best Athletes Make It To The Games?

Wow, what an awesome way (or annoying way if you are a Dave Castro hater) to start the games this year. ICYMI they made the athletes meet at 3:30am on Wednesday morning and flew them up to the CrossFit Ranch in Aromas where the inaugural Games were held. Whatever / Wherever / Whenever, amiright? Now while I’m sitting here writing this, the women are completing Murph 2.0. They nerfed Murph (giggle) this year by partitioning the reps, but it has still pushed these athletes to the limit – I don’t care how you split up the 600 Murph reps, you are going to face muscle failure... and then have to run another mile. And that is amazing, because these are the top 40 fittest men and women in the world… or are they?

With only 5 spots available from each Regional, it’s very likely that some great athletes won’t be competing at the Games due to the Regional that they competed at. There are clearly stronger Regionals that might deserve more than 5 bids, and conversely there are weaker Regionals. In part 4 of this series I’ve been doing on the CrossFit Games, I set out to score all 8 Regionals against each other to see how well the current system did at selecting the top 40 athletes. CFHQ has shown that they are always evolving this process, which has improved, and I believe will continue to improve. To finish this piece I’ll take a look at some ideas to help achieve that end.

GETTING STARTED

Before looking at the scores, I tried to eliminate (or at least be conscious of) any bias that might skew the results. I first thought there might be an advantage for the athletes who competed in the 3rd week, because they had more time to practice and strategize for the events. Here are the average times / scores for each event by week:

There does appear to be some advantage, but with a small sample size, it could be due to just having a stacked Regional. I wanted to improve this, so I eliminated the outliers by normalizing the Open scores of the athletes and considering only those within 1 standard deviation of the mean Open score. Let’s take another look.
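
The filter described above can be sketched roughly like this. It assumes a population standard deviation and a plain dict of athlete name to Open score; both are my assumptions for the example, and the scores are made up.

```python
from statistics import mean, pstdev


def within_one_sd(open_scores):
    """Keep athletes whose Open score is within 1 standard deviation of the mean.

    open_scores: dict of athlete -> Open score (hypothetical layout).
    """
    values = list(open_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD; a sample SD would be a valid choice too
    if sigma == 0:
        return dict(open_scores)  # everyone has the same score, nothing to filter
    return {a: s for a, s in open_scores.items() if abs((s - mu) / sigma) <= 1}


# Made-up scores: athlete "d" is the outlier
kept = within_one_sd({"a": 10, "b": 12, "c": 11, "d": 50})
```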

That is a much different picture. There is almost no improvement for the men and none for the women.

There are some other forms of bias that I cannot do anything about, such as the fact that most of these competitors didn’t actually compete against one another. That is, an athlete might have worked just hard enough to win his Regional when he could’ve done better.

SCORING

So with that out of the way, let’s talk about scoring. For this exercise I chose to score the Regionals like the Open. Scores for each event are rank ordered with your place in an event awarded as your score. An athlete's overall score is equal to the sum of his/her places in each event. The athlete with the lowest overall score is in 1st Overall Regional Place.
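
Here is a minimal sketch of that place-sum scoring, pooling athletes from every Regional into one list. The data layout and the `lower_is_better` flag (times vs. loads or reps) are assumptions for the example, the results are made up, and ties are not handled the way the Open handles them.

```python
def overall_places(event_results, lower_is_better):
    """Rank athletes by the sum of their event places (lowest total wins).

    event_results: {event: {athlete: result}} with athletes from all Regionals.
    lower_is_better: {event: bool}, True for timed events, False for loads/reps.
    Ties within an event are broken arbitrarily in this sketch.
    """
    totals = {}
    for event, results in event_results.items():
        ordered = sorted(results, key=results.get, reverse=not lower_is_better[event])
        for place, athlete in enumerate(ordered, start=1):
            totals[athlete] = totals.get(athlete, 0) + place
    return sorted(totals.items(), key=lambda kv: kv[1])


# Made-up results for three athletes in two events
events = {
    "run": {"x": 300, "y": 280, "z": 310},   # seconds, lower is better
    "lift": {"x": 220, "y": 180, "z": 200},  # pounds, higher is better
}
standings = overall_places(events, {"run": True, "lift": False})
```

In this toy example `x` places 2nd and 1st for 3 points, `y` places 1st and 3rd for 4, and `z` places 3rd and 2nd for 5.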

There are obviously other ways to attempt an overall scoring of the Regionals. The order may change, but the point is the same. Not all Regionals are created equal – it’s easier to make it to the Games in some Regionals than others. In a system set up the way these Regionals are, you can’t hope to get the top 40 athletes every time. This is a problem that many sports face, and people usually write off strong divisions to seasonality – that in the long run things even out. Where The CrossFit Games differs is that it is an open tournament with geographically based bids. Yes, some athletes do move around, particularly in North America, but it's not the same as a free agent / draft system that seeks to prop up weaker teams.

The CrossFit Games is still young, relatively speaking, and they make changes every year. As recently as 2014, each Region (17) held its own Regional. In that system, the top 3 athletes qualified for the Games in many Regions, with fewer in some. These would be the Games qualifiers in 2016 under that system:

Combining 17 Regions into 8 Regionals is a big improvement, as it lowers the error and allows strong Regions (e.g. Europe Women) to win 5 bids. When the Europe Women previously had only 3 bids (2014), then-reigning champ Samantha Briggs finished in 4th place and failed to qualify. Under the current system she would have qualified at the expense of a likely lesser athlete from the Africa Region. Despite the improvement, there are still some seemingly worthy athletes who didn't make the cut. Here is a look at this year's qualifiers and their Overall Regional Place:

There is definitely improvement, but there are still 6 women and 9 men who theoretically might be top 40 and didn't qualify. When the 13th best Regionals competitor fails to get 1 of 40 spots, there is an argument to be made for changes.

IDEAS FOR THE FUTURE

I’m certain that CFHQ is taking feedback and planning improvements without my help, but here are some ideas I have to improve the process. As with many of the major sports (in America at least), they could consider creating wildcard bids for the best athletes who did not auto-qualify at their Regional. To begin, this could mean altering the current system by reducing auto-bids to 4 per Regional (32 total auto-bids) and adding 8 wildcard spots to be awarded to the top 8 remaining finishers across all Regionals. Of course, to do that you also need a scoring system that can be used to compare the athletes from different regions. You could also allocate wildcard bids to 'Super-Regionals' such as 'The Americas' and 'The World'.
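
As a rough sketch of how the 4 + 8 idea could be applied, assuming each athlete already has a place within their Regional and an overall place across all Regionals: the tuple layout and the tiny example field are hypothetical, and the wildcards here go to the best overall finishers among athletes who missed an auto-bid.

```python
def select_qualifiers(athletes, auto_bids=4, wildcards=8):
    """Auto-bids by Regional place, then wildcards by overall place.

    athletes: list of (name, regional, regional_place, overall_place) tuples.
    """
    auto = [a for a in athletes if a[2] <= auto_bids]
    # Everyone who missed an auto-bid competes for the wildcards on overall place
    remaining = sorted((a for a in athletes if a[2] > auto_bids), key=lambda a: a[3])
    return auto + remaining[:wildcards]


# Made-up field: 2 Regionals, scaled down to 1 auto-bid each plus 1 wildcard
field = [
    ("A", "East", 1, 1),
    ("B", "East", 2, 2),
    ("C", "West", 1, 5),
    ("D", "West", 2, 6),
]
picked = [a[0] for a in select_qualifiers(field, auto_bids=1, wildcards=1)]
```

In the scaled-down example the wildcard goes to B, who finished 2nd in the stronger Regional but 2nd overall.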

To quantify the impact of potential changes, I’ve taken the average overall place of all 2016 Games qualifiers under the different selection systems:

1) Old System (Bids for each Region)
2) Current System (5 bids for each Regional)
3) Wildcard System (4 bids for each Regional / 1 Wildcard per Regional)

To judge the effectiveness of a system, I've defined the following term:

Selection Error = Average Overall Place – 20.5

20.5 is the theoretical best (lowest) Average Overall Place. If a system selects the top 40 placing overall athletes, the average place will equal 20.5. The smaller the Selection Error, the better the method.
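
The 20.5 is just the average of the places 1 through 40, i.e. (1 + 40) / 2. A small sketch of the calculation, with the function name `selection_error` made up for illustration and the baseline generalized to any number of qualifiers:

```python
def selection_error(overall_places):
    """Average overall place of the selected athletes minus the best possible average.

    overall_places: overall (cross-Regional) places of the qualifiers.
    """
    n = len(overall_places)
    best_possible = (n + 1) / 2  # mean of 1..n, which is 20.5 when n == 40
    return sum(overall_places) / n - best_possible


perfect = selection_error(list(range(1, 41)))        # exactly the top 40
one_miss = selection_error(list(range(1, 40)) + [60])  # 60th place got in over 40th
```

A perfect selection gives 0.0, and swapping the 40th-place athlete for the 60th-place athlete raises the average place by 20 / 40, so the error becomes 0.5.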

The wildcard system would make another huge jump forward in the selection process. Here are the athletes who would've qualified in 2016. It is clear that strong Regionals are rewarded, while still pulling athletes from around the globe:

CONCLUSIONS

No, the Regionals process wasn’t perfect, but it is getting better. I don’t want this to come across as harsh criticism, because that is not my intent, nor is it deserved. More than one major sport has dealt with a similar issue recently. Until recent years, the NBA was dominated by the Western Conference, leading people to suggest the league take the top 16 teams overall to the playoffs instead of 8 from each conference. The MLB added another wildcard team to the playoffs a couple years ago and has the wildcard teams play a ridiculous 1-game series to get into the real playoffs. Conversations around these issues usually lead to the same conclusions: 1) do we really care that much about the worst teams to get into the postseason, 2) just win your division… Ultimately that is how I feel here > #WINYOURDIVISION. There are other advantages to a wildcard system though. Mainly, with the possibility of an athlete on the bubble getting in, it keeps bubble athletes working as hard as possible, thereby improving the competition. Also, it ties a nice little bow around all of the Regionals, which might cause more fans of week 1 Regionals to watch the week 3 Regionals to see who will get the wildcard bids.

So keep doing what you’re doing CrossFit, and I look forward to what you have in store for the future.

-ReEngiNerd

Thursday, July 14, 2016

NBA: Tim Duncan, A Look Back.

With news of Tim Duncan's retirement this week, it is time to take a look at his place amongst the great power forwards in NBA history.

I found this list from ESPN that seemed like a good place to start. With 26,496 career points, he sits 14th all-time. I've always loved Timmy's bank shot, and hated meeting him in the playoffs. Did you know he is also one of the best defenders at his position? As I dug into the stats in this group, this one really stood out:

Just incredible, zero career personal fouls committed. Not a single one. Perfect. Contrast this with the Mailman's 4,578, and you can start to understand exactly what we are losing from the game.

You might be saying, that's not possible. But the career block numbers back this up:

Timmy, you'll be missed. The bank is closed.

Thursday, July 7, 2016

2016/04 TTM - Texas Alcohol Revenue (VERSION 2.0)

Minor Update

Added a few Venue Names, added a new feature to the dashboard (seasonality), and started getting into scrubbing the dataset.

Data Issues

Many of the venues have reported more than 12 months of data, which so far is usually due to having multiple reports in certain months. In some cases the extra report is for $0 (no harm there), but in others it's not so simple. Are these extra reports amendments? If so, there is some double counting going on. You can see in the tooltip how many periods are counted (should be 12). For now I've just started making note of these discrepancies.
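
A quick way to flag those discrepancies is to count reports per venue per month. This is only a sketch: the `(venue, month, amount)` tuple layout and the sample rows are assumptions, not the actual column names in the state's dataset.

```python
from collections import Counter


def months_with_multiple_reports(reports):
    """Return {(venue, month): count} for every venue-month reported more than once.

    reports: list of (venue, month, amount) tuples (hypothetical layout).
    """
    counts = Counter((venue, month) for venue, month, _ in reports)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows: "Bar A" filed two reports for January, one of them for $0
reports = [
    ("Bar A", "2016-01", 100.0),
    ("Bar A", "2016-01", 0.0),
    ("Bar A", "2016-02", 120.0),
    ("Bar B", "2016-01", 80.0),
]
flagged = months_with_multiple_reports(reports)
```

This only finds the duplicates. Deciding whether an extra report is an amendment or additional revenue still needs a look at the amounts.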

What other features would you like to see?

-ReEngiNerd

Tuesday, June 28, 2016

Colorado Retail Marijuana Sales (June 2016)

Borrowing the concept from my last post on Texas Alcohol Revenue, a friend pointed me towards the Colorado tax receipts on Retail Marijuana Sales:

It's hard to pull much out of this dataset besides "wow that's a lot". Sadly the state does not report sales by individual business as in the Texas Alcohol example. I hoped to see clearer seasonal trends, but it is difficult to do so in the early rapid growth stage. There does seem to be a slight dip in Q4, possibly due to family holidays? I guess this isn't a family activity quite yet...

-ReEngiNerd
