4 years of stats from Hoopdata.com

Errntknght

Registered User
Joined
Sep 24, 2002
Posts
6,342
Reaction score
319
Location
Phoenix
Recently someone mentioned the hoopdata.com site, and checking it out I found a wealth of data, just ripe for analysis. How could a mathematician resist? The data consists of yearly summary statistics for each team plus some calculated formulas like Hollinger's PER (for teams). Correlating each stat with the team's winning pct gives some information about how important each of them is to winning games.

Some stats, like 'point differential', are known to correlate well with winning pctage, and at the entry PTdiff you'll see it shows up with a 4 yr avg correlation coeff. of .974 (out of a maximum possible of 1.0). Hollinger's PER, at the bottom of the list, has an avg cc of .845, so it's not all that bad. Of course, it's a terribly complicated formula, and you can see that simple field goal percentage difference (own minus opponents', named FG%diff) correlates just as well. PTdiff, FG%diff, and 3P%diff are not given explicitly at Hoopdata.com, so I calculated them as PTS - OPTS (opponents' pts), FG% - OFG%, and 3P% - O3P%. Except for those three 'stats', the rest of the labels match those at hoopdata.com in the two statistical categories "Basic" & "Advanced". (You can get the definitions on hoopdata.com, should you be interested in the details.) I will mention that CHG means charges taken and DEF means CHG+BLK+STL.
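For anyone who wants to try this at home, here is a minimal sketch of the calculation described above: build the difference stat, then take the ordinary Pearson correlation against winning pct. The five "teams" and their numbers are made up purely for illustration (they are not hoopdata.com values), and the function name `pearson` is just my own label.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-team season averages (NOT real hoopdata.com numbers).
teams = [
    {"pts": 104.1, "opts": 97.0, "win_pct": 0.744},
    {"pts": 99.5, "opts": 98.8, "win_pct": 0.537},
    {"pts": 101.2, "opts": 103.9, "win_pct": 0.402},
    {"pts": 95.3, "opts": 102.6, "win_pct": 0.268},
    {"pts": 98.0, "opts": 96.1, "win_pct": 0.585},
]

win_pct = [t["win_pct"] for t in teams]
pt_diff = [t["pts"] - t["opts"] for t in teams]  # PTdiff = PTS - OPTS

cc_ptdiff = pearson(pt_diff, win_pct)
cc_pts = pearson([t["pts"] for t in teams], win_pct)

print(f"PTdiff vs win pct: {cc_ptdiff:+.3f}")
print(f"PTS    vs win pct: {cc_pts:+.3f}")
```

With real data there would be 30 rows per season, one per team, and the same call would be repeated for every stat column.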

Some things I found of interest:
How can FT% be of so little significance? Note that it had a negative correlation coefficient in 08-09. There are only 30 data points per year, one for each team, which means that an estimated correlation has a std deviation of about .18 (for a true correlation near zero), and the average of 4 years has a std dev of half that, or .09. If you look across the four years you can see that the numbers do jump around a good bit, except that the high ones don't vary as much (that's expected, just on statistical grounds). My explanation is that FT% doesn't correlate highly with winning because the variation among teams' yearly averages isn't that great to begin with, and perhaps teams with a number of bulky strong guys, who contribute in other ways, don't shoot FTs so well. You may have a better explanation.
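The .18 and .09 figures can be checked with a few lines. This sketch assumes the usual large-sample textbook approximation for the standard error of a sample correlation, (1 - r^2) / sqrt(n - 1), and it treats the four seasons as independent, which is generous since rosters carry over from year to year. The function name is my own.

```python
from math import sqrt

def approx_se_of_r(r, n):
    """Large-sample approximation to the standard error of a sample correlation."""
    return (1 - r ** 2) / sqrt(n - 1)

n_teams = 30

se_one_year = approx_se_of_r(0.0, n_teams)   # true correlation near zero
se_four_year_avg = se_one_year / sqrt(4)     # mean of 4 independent seasons
se_high_cc = approx_se_of_r(0.974, n_teams)  # a cc as high as PTdiff's average

print(f"one season, cc near 0:   {se_one_year:.3f}")
print(f"4-season average:        {se_four_year_avg:.3f}")
print(f"one season, cc of .974:  {se_high_cc:.3f}")
```

The last line is the reason the high correlations barely move from year to year while the small ones bounce around.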

STLs and CHGs also had negative years and little significance on average. One thought I had about CHGs was that teams that get lots of charges try to get them lots of times, and fairly often they fail to get the call, which is a significant negative. If you look at the progression over 4 years, the steadily declining value might well mean that the refs are siding more and more with the offensive player - or it could mean the players are pushing the envelope too far in trying to get a charge call. Of course, it's well within the expected random variation, and could mean zip. One thing that is clear is that taking charges is not something that is greatly beneficial in itself. Teams probably should discourage players who don't get the calls most of the time from trying it.
STLs having essentially zero value is right in line with what many coaches say - going for steals costs you significantly when you fail. Clearly, league wide, players are pushing the envelope.

Pace has a consistently negative value. I don't take that to mean that it is beneficial to slow the game down. For one thing, two of the teams that have pushed the pace the most over the last two years are simply two of the worst teams at any pace - GSW and NYK. Players with poor shot selection increase the pace of the game a small amount by not working for a better shot. Put a bunch of poor shot selection guys together and you have a losing team and a somewhat higher pace.

ORR, offensive rebound rate, has an almost zero cc, which seems to fly in the face of reason, but it is true that you have to miss a shot to get an offensive rebound, so that might be the reason for it. I'm a little surprised that it shows up in ORR, though - I've known for a while that it shows up in the raw count of offensive rebounds. ORR is not o-rbs per game; it's a percentage of missed shots, and you'd think that had to be beneficial. It doesn't appear in these two sets of stats, but opponents' offensive rebounds have a significant negative correlation with winning pctage, which would make you think own ORs are worth a fair bit when expressed as a pctage of missed shots.

For the proponents of Offensive and Defensive efficiency, you'll see that they show up with significantly higher cc's than raw PTS & OPTS, so you might expect their difference to be a better metric than PTdiff, but it is a dead heat between the two. In fact, not only do they average the same, they're almost identical every year.



stat.......... 06-07 .... 07-08 .... 08-09 .... 09-10
PTS ..... : . +0.320 .. +0.509 .. +0.297 .. +0.448 ... Avg: +0.394
OPTS .... : . -0.525 .. -0.542 .. -0.654 .. -0.606 ... Avg: -0.582
FG% ..... : . +0.455 .. +0.643 .. +0.649 .. +0.601 ... Avg: +0.587
OFG% .... : . -0.704 .. -0.764 .. -0.873 .. -0.749 ... Avg: -0.772
3P% ..... : . +0.421 .. +0.536 .. +0.483 .. +0.532 ... Avg: +0.493
O3P% .... : . -0.561 .. -0.565 .. -0.564 .. -0.649 ... Avg: -0.585
FT% ..... : . +0.107 .. +0.284 .. -0.160 .. +0.012 ... Avg: +0.061
AST ..... : . +0.405 .. +0.487 .. +0.335 .. +0.396 ... Avg: +0.406
TO ...... : . -0.548 .. -0.473 .. -0.374 .. -0.429 ... Avg: -0.456
STL ..... : . -0.051 .. +0.304 .. +0.208 .. -0.034 ... Avg: +0.107
BLK ..... : . +0.267 .. +0.313 .. +0.163 .. +0.403 ... Avg: +0.287
CHG ..... : . +0.388 .. +0.044 .. +0.066 .. -0.106 ... Avg: +0.098
DEF ..... : . +0.311 .. +0.384 .. +0.234 .. +0.198 ... Avg: +0.282
PF ...... : . -0.373 .. -0.104 .. -0.229 .. -0.172 ... Avg: -0.220
PTdiff .. : . +0.953 .. +0.978 .. +0.991 .. +0.974 ... Avg: +0.974
FG%diff . : . +0.800 .. +0.876 .. +0.904 .. +0.836 ... Avg: +0.854
3P%diff . : . +0.635 .. +0.687 .. +0.673 .. +0.774 ... Avg: +0.692
Below from statistical category Advanced
Pace .... : . -0.108 .. -0.117 .. -0.309 .. -0.282 ... Avg: -0.204
OffEff .. : . +0.703 .. +0.853 .. +0.823 .. +0.774 ... Avg: +0.788
DefEff .. : . -0.711 .. -0.826 .. -0.870 .. -0.740 ... Avg: -0.787
Diff .... : . +0.953 .. +0.977 .. +0.990 .. +0.972 ... Avg: +0.973
TS% ..... : . +0.493 .. +0.712 .. +0.724 .. +0.682 ... Avg: +0.653
AR ...... : . +0.496 .. +0.553 .. +0.522 .. +0.485 ... Avg: +0.514
TOR ..... : . -0.535 .. -0.454 .. -0.300 .. -0.345 ... Avg: -0.409
ORR ..... : . -0.080 .. +0.104 .. +0.172 .. +0.056 ... Avg: +0.063
DRR ..... : . +0.479 .. +0.323 .. +0.470 .. +0.501 ... Avg: +0.443
TRR ..... : . +0.397 .. +0.537 .. +0.691 .. +0.553 ... Avg: +0.544
EFF ..... : . +0.658 .. +0.735 .. +0.642 .. +0.736 ... Avg: +0.693
WS ...... : . +0.764 .. +0.842 .. +0.833 .. +0.822 ... Avg: +0.815
AWS ..... : . +0.718 .. +0.805 .. +0.767 .. +0.810 ... Avg: +0.775
PER ..... : . +0.794 .. +0.883 .. +0.860 .. +0.844 ... Avg: +0.845

The stats AR, TOR, ORR, DRR, TRR are not per game rates, but per possession rates for AR (Asts) and TOR (TOs) and per opportunity rates for the rebounds. TS%, EFF, WS, AWS, and PER are formulas of varying complexity - see Hoopdata.com.
 

Sunburn

ASFN Lifer
Joined
Oct 8, 2008
Posts
4,408
Reaction score
1,637
Location
Scottsdale
Excellent, excellent stuff. I'm bookmarking that site. I wonder if you could get a job with a team if you became adept enough with these statistics.
 

Errntknght

Teams are using stats more than ever, even though the game is not as ideally suited to it as baseball is. I think you have to understand the game quite thoroughly as well as being adept at statistics - the game is probably more important, in fact.

Joe Mama, a long time poster here, works for a company that supplies video clips from NBA games, organized a variety of ways, and perhaps some stats as well. If you're seriously interested in a career you might talk to him about how one gets started. Joe has the most astute bbal mind of anyone that ever comes in here, in case you hadn't figured that out for yourself.
 

Sunburn

Hey, thanks for the tip. I'll have to ask him about it.
 

Irish

Registered
Joined
Apr 11, 2008
Posts
2,668
Reaction score
0
Location
Arizona
Great stuff. I'm not entirely sure I "get" it all. But it does reinforce a few truths.

1. BEWARE OF STATS TAKEN OUT OF CONTEXT. My favorite is counting offensive rebounds without looking at team shooting percentage. :bang:

2. INDIVIDUAL VERSUS TEAM STATS: A player who scores a lot but doesn't pass the ball is obvious. Blocking out does not really generate stats, but is hugely valuable.

3. TURNOVERS TO GENERATE OFFENSE. Some coaches really push for turnovers because their team is so bad on offense.
 

elindholm

edited for content
Joined
Sep 14, 2002
Posts
27,541
Reaction score
9,821
Location
L.A. area
Thanks for posting that information, Errntknght. I suspect that the reason ORR is such a poor predictor of success is that teams that gamble on the offensive boards leave themselves vulnerable to fast breaks.

The near-identicality between PTdiff and O/Deffdiff is really striking.

I'm surprised that AR is so high, since (a) assists aren't awarded for FT attempts thwarted by fouls -- thus a team that gets to the line a lot is, comparatively, going to suffer in amassing assist totals -- and (b) the league's rules and patterns of star treatment seem to encourage one-on-one play.
 

cly2tw

Registered User
Joined
Oct 26, 2002
Posts
5,832
Reaction score
0
Isn't CC based on only 30 samples each year a little too raw?
 

Errntknght

Thirty teams is the entire population, so one can't increase the number of samples per year. Four years' worth of data does show how the various CCs jump around, and it also shows that the patterns remain fairly consistent. Averaging is also a legitimate statistical technique, so we have 120 data points for each stat to work with.
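To put numbers on "jump around", here is a small sketch that takes the four yearly CCs for a few stats straight from the table in the first post and computes their average and year-to-year spread. Nothing here is new data; the choice of which four stats to show is mine.

```python
from statistics import mean, stdev

# Yearly correlation coefficients copied from the table in the first post
# (06-07, 07-08, 08-09, 09-10).
yearly_cc = {
    "FT%": [0.107, 0.284, -0.160, 0.012],
    "STL": [-0.051, 0.304, 0.208, -0.034],
    "CHG": [0.388, 0.044, 0.066, -0.106],
    "PTdiff": [0.953, 0.978, 0.991, 0.974],
}

# For each stat: (4-year average, sample std dev across the 4 seasons).
summary = {stat: (mean(v), stdev(v)) for stat, v in yearly_cc.items()}

for stat, (avg, spread) in summary.items():
    print(f"{stat:7s} avg {avg:+.3f}  year-to-year std dev {spread:.3f}")
```

The weak stats wander by roughly the amount you would expect from 30-team samples, while PTdiff hardly moves at all.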

I'd be interested to know if there are some stats that would lead to enlightenment if only they were more precise?
 

Joe Mama

Moderator
Supporting Member
Joined
May 14, 2002
Posts
9,501
Reaction score
964
Location
Gilbert, AZ
Teams are using stats more than ever even though the game is not ideally suited to it, like baseball is. I think you have to understand the game quite thoroughly as well as being adept at statistics - the game is probably more important, in fact.

Joe Mama, a long time poster here, works for a company that supplies video clips from NBA games, organized a variety of ways, and perhaps some stats as well. If you're seriously interested in a career you might talk to him about how one gets started. Joe has the most astute bbal mind of anyone that ever comes in here, in case you hadn't figured that out for yourself.

I paid Errntknght to say that. :)

While I appreciate the compliment, the truth is that there are many, many people who frequent this message board who understand the game better and follow it more closely than me. My job gives me access to tools that most people not associated with a team don't have. I can pull up video of just about anybody I would want to see, but my work has very little to do with actually watching and breaking down basketball games myself. I tell people all the time that I watch far, far less basketball now than I did prior to this job. My opinions on draft picks, for example, are almost entirely based on the opinions of some of our long time loggers, whose minds I pick in May/June every year.

By the way, if anybody is interested in logging basketball games this upcoming season go ahead and send me a private message. We are already preparing for 2010-11.

Joe
 

Irish

I haven't made a lot of sense of many of the stats, but my concern is that it is very hard to separate what a player does versus what the rest of the team and the coach do.

Let me give an example. Steve Nash wasn't a whole lot better in 2002-03 than later, yet the team defense for the Mavs was vastly better. If you looked at his opponent's individual scoring, I'm guessing he fell off the table.

I'm not sure how it works, but I'm guessing that a straight on shot blocker can make up for a guy who is a weak man defender on the outside. The combination of having a weak man defender and the absence of a straight on shot blocker really hurts, but the measure is hard to find since length intimidates.

Man defense is at least partially a function of who a guy plays. A guy can look very bad when always matched against top players and much better if someone else does that job. Many "very good" defenders benefit from being on a strong defensive team.

Good defense is hard to measure directly. For example, keeping a guy to a low shooting percentage may be due to a weak total offense. Limiting a guy's attempts is fine unless it just shows that other guys were getting the shots.

The difficulty of measuring defense means we are left with subjective views.
 

Errntknght

I paid Errntknght to say that. :)

while I appreciate the compliment the truth is that there are many, many people who frequent this message board who understand the game better and follow it more closely than me. My job gives me access to tools that most people not associated with a team don't have. I can pull up video of just about anybody I would want to see, but actually my work has very little to do with actually watching and breaking down basketball games myself. I tell people all the time that I watch far, far less basketball now than I did prior to this job. My opinions on draft picks for example are almost entirely based on the opinions of some of our long time loggers whose minds I pick in May/June every year.

By the way, if anybody is interested in logging basketball games this upcoming season go ahead and send me a private message. We are already preparing for 2010-11.

Joe

You fooled me, you tricky son-of-a-gun... I'll bet you didn't post something unless you knew what you were talking about.

Brings to mind a famous quote of Mark Twain's: "Better to remain silent and be thought a fool than to speak out and remove all doubt."
 

Errntknght

Here is another group of stats, correlated with winning pctage as before. This is the stat group labeled 'Four Factors' on hoopdata.com. (Some of them are repeated from Basic & Advanced.)

I was so surprised that opponents' turnovers had a CC so much smaller than own turnovers that I ran a variety of tests on my computer program to make sure it didn't mess up the computations. I can't vouch for the data from hoopdata.com, but it seems unlikely that their opps turnover data is erroneous enough to cause the strange results - similar to own ORR.

I've come to grips with the low CC for opps TOs - it's got to be the same effect as STLs: in trying to force turnovers you give up something else, just like when you go for STLs. Of course, STLs are part of 'forced' turnovers, so it could be just the STLs dragging it down. Heck, we call all opps turnovers forced turnovers even though a good number of them are simple mistakes by the opps - and pretty much opps turnovers = opps unforced turnovers + own STLs.

One thing you will note is that opps ORR has exactly the same CC as own DRR with the opposite sign - that is exactly what should happen because oppORR = 100 - ownDRR. It's just a cross check that the data is consistent and the calculations are not totally out of whack.

A further consequence of the CC of ownORR being small is that oppDRR has to be small as well because oppDRR = 100 - ownORR. (They don't list oppDRR anyplace.)
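The cross check works because subtracting a stat from a constant flips the sign of its correlation and changes nothing else. A quick sketch with made-up numbers (six hypothetical teams, not hoopdata.com values) shows it:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical defensive rebound rates and winning pcts for six teams.
own_drr = [74.2, 71.5, 73.0, 70.1, 75.4, 72.3]
win_pct = [0.610, 0.450, 0.540, 0.330, 0.700, 0.480]

# Every available rebound goes to one side or the other.
opp_orr = [100 - d for d in own_drr]

cc_own_drr = pearson(own_drr, win_pct)
cc_opp_orr = pearson(opp_orr, win_pct)

print(f"own DRR: {cc_own_drr:+.3f}")
print(f"opp ORR: {cc_opp_orr:+.3f}")
```

The same argument is why a small CC for own ORR forces a small CC for opp DRR.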

This data shows that oppFTR, like ownFTR, has a small CC - I wouldn't know what to think if they were wildly different. I'm still surprised that FTR is so insignificant. FT% having a small CC I can understand, because that just means that poor teams shoot FTs about as well as good teams. One possible explanation for the small FTR CC would be the refs awarding more FTs to the team that is losing. Maybe they have a bias against large score differentials...


year..... : .. 06-07 .... 07-08 .... 08-09 .... 09-10 .... 4 year average

OffEff .. : . +0.703 .. +0.853 .. +0.823 .. +0.774 ... Avg: +0.788
DefEff .. : . -0.711 .. -0.826 .. -0.870 .. -0.740 ... Avg: -0.787
Diff .... : . +0.952 .. +0.978 .. +0.990 .. +0.972 ... Avg: +0.973
eFG% (effective FG%)
Own ..... : . +0.541 .. +0.705 .. +0.757 .. +0.650 ... Avg: +0.663
Opp ..... : . -0.721 .. -0.756 .. -0.852 .. -0.751 ... Avg: -0.770
Diff .... : . +0.844 .. +0.877 .. +0.918 .. +0.853 ... Avg: +0.873
FTR (FT rate: FT's attempted / FGs attempted)
Own ..... : . -0.266 .. +0.144 .. +0.204 .. +0.422 ... Avg: +0.126
Opp ..... : . -0.185 .. -0.120 .. -0.312 .. -0.182 ... Avg: -0.200
Diff .... : . -0.061 .. +0.218 .. +0.417 .. +0.462 ... Avg: +0.259
TOR (TO rate: turnovers / possessions)
Own ..... : . -0.535 .. -0.454 .. -0.300 .. -0.345 ... Avg: -0.409
Opp ..... : . +0.019 .. +0.253 .. +0.087 .. -0.077 ... Avg: +0.070
Diff .... : . -0.425 .. -0.498 .. -0.318 .. -0.181 ... Avg: -0.355
ORR (100*OR / (other team's defensive rebounds + OR))
Own ..... : . -0.080 .. +0.104 .. +0.172 .. +0.056 ... Avg: +0.063
Opp ..... : . -0.479 .. -0.323 .. -0.470 .. -0.501 ... Avg: -0.443
Diff .... : . +0.246 .. +0.257 .. +0.479 .. +0.328 ... Avg: +0.328
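For reference, the rate definitions in the table labels can be sketched as below. FTR, TOR and ORR follow the labels above; the eFG% formula is the commonly used one (a made three counts as 1.5 field goals), which I am assuming matches hoopdata.com. Possessions are taken as a given input rather than estimated, and the season totals are invented for illustration.

```python
def efg_pct(fgm, fg3m, fga):
    """Effective FG%: common definition, a made three counts as 1.5 makes."""
    return 100 * (fgm + 0.5 * fg3m) / fga

def ft_rate(fta, fga):
    """FTR: free throws attempted per field goal attempted."""
    return fta / fga

def to_rate(turnovers, possessions):
    """TOR: turnovers per possession."""
    return turnovers / possessions

def orb_rate(orb, opp_drb):
    """ORR: offensive rebounds as a percentage of available rebounds."""
    return 100 * orb / (orb + opp_drb)

# Hypothetical season totals for one team (NOT real data).
totals = {"fgm": 3100, "fg3m": 600, "fga": 6700, "fta": 2000,
          "to": 1150, "poss": 7800, "orb": 900, "opp_drb": 2500}

efg = efg_pct(totals["fgm"], totals["fg3m"], totals["fga"])
ftr = ft_rate(totals["fta"], totals["fga"])
tor = to_rate(totals["to"], totals["poss"])
orr = orb_rate(totals["orb"], totals["opp_drb"])
opp_drr = 100 - orr  # the opponent's defensive rebound rate is the complement

print(f"eFG% {efg:.1f}  FTR {ftr:.3f}  TOR {tor:.3f}  ORR {orr:.1f}  oppDRR {opp_drr:.1f}")
```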
 

Irish

In some ways, one of the more meaningful offensive stats is points per possession. A team that gets to the line a lot will do well, and a team that does not get many offensive rebounds will get hurt unless they hit an abnormal percentage of outside shots.

In theory a bad shooting team should get more offensive rebounds than a good shooting team because of greater opportunities. What is harder to keep track of is how many fast breaks are permitted when everyone crashes the boards.
 

Errntknght

Yeah, OffEff is a good stat, but isn't it rather obvious that scoring more pts/poss than your opponent does is the way to win ball games, since the game is designed to give an equal number of possessions to each team?

(I can't tell if you're aware of it or not but hoopdata calls pts per possession Offensive Efficiency and Defensive Efficiency is the same for opponents.)

The interesting stats to me are the ones that don't contribute to winning (+ or -) the way you'd expect them to. A steal is obviously a good thing in itself but since STLs don't contribute much to winning, clearly teams are giving up something to get those steals - nearly equal to the value of the steal from the statistics. Coaches know some players give up too much in going for steals but the stat tells them that league wide, players are giving up too much to get steals so they'd probably do well to rein their worst offenders in even more.

ORBs are similar: a rebound is valuable, but since more ORBs contribute little to winning, the coaches ought not encourage their guys to all hit the glass, unless the situation is desperate. Actually, what they ought to tell their players is to either go all out for the rebound or get their butts back on defense - the worst thing a guy can do is stand around waiting to see who gets the rebound before he moves. Naturally, the worse a guy's position is for rebounding, the more he should tend to run back. I imagine the better coaches work with their players on just this sort of thing - Larry Brown probably has a six step decision tree for it.

At one time the broadcasters used to talk about which offensive player had 'backcourt responsibility' at various points in time. You scarcely ever hear that nowadays, but that might well be because the announcers think the audience is not that sophisticated. Anyway, whoever had backcourt responsibility was not supposed to crash the boards.

One rather pervasive mystery to me is that for most stats that have an own and an opp version, the opp version has a higher correlation value than own. The items themselves appear to be perfectly symmetric, so you'd expect quite similar CCs. Interestingly, OffEff and DefEff are a pair that does not show this pattern (turnover rate, where own is far stronger than opp, is another).
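Here is a quick tally of that own-versus-opp pattern, using only the 4-year averages already posted in the two tables in this thread. The pairing labels are my own shorthand.

```python
# 4-year average CCs from the tables in this thread: (own, opp).
pairs = {
    "PTS/OPTS": (0.394, -0.582),
    "FG%/OFG%": (0.587, -0.772),
    "3P%/O3P%": (0.493, -0.585),
    "eFG%": (0.663, -0.770),
    "FTR": (0.126, -0.200),
    "TOR": (-0.409, 0.070),
    "ORR": (0.063, -0.443),
    "OffEff/DefEff": (0.788, -0.787),
}

# Compare magnitudes only; the signs are expected to be opposite.
opp_stronger = [name for name, (own, opp) in pairs.items() if abs(opp) > abs(own)]
own_stronger = [name for name in pairs if name not in opp_stronger]

print("opp version correlates more strongly:", ", ".join(opp_stronger))
print("own version correlates as or more strongly:", ", ".join(own_stronger))
```

Six of the eight pairs lean toward the opp side, so the lopsidedness is real in this data, whatever its cause.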
 

Irish

I remember the old joke, "The team that scores more than their opponents generally wins." Really obvious, but it does get obscured.

Sometimes stuff gets obscured by dropping context. For example, "Defense wins championships" is an obvious truth, yet every season there are several top defensive teams dropping into the lottery.

The more subtle problem is when a coach like Porter thinks that the "offense is fine, what is needed is better defense". The offense wasn't fine, because most teams give up more points early in the 24 seconds and tighten up later. When Porter stopped the team from taking early shots, the offense was harmed a lot.

Too many coaches still think that low scoring by opponents is good defense, when it actually reflects a slower tempo. I think the stats confirm that.
 