Good Stats Bad Stats
goodstatsbadstats.com
I watched the men’s downhill race yesterday. Matthias Mayer won the race. But as I watched the individual runs it was not at all clear that he would be expected to do so. The reason for this was in how the local (United States) network showed the split times as each racer moved down the course. My question, then: is there an alternative way to present the race that would give the viewing audience a better perspective on how their favorite racer is doing as he progresses through the course?
To be a bit clearer on what happened, let me focus on just three of the racers: Matthias Mayer, Bode Miller, and Christof Innerhofer. They finished in first, eighth, and second place respectively. Mayer was the first of the three down the course. So when Miller and Innerhofer ran the course Mayer already had the best time.
Miller was the second of the three to run the course. After the first split he held a lead of 0.27 seconds over Mayer. After the second split he held a lead of 0.31 seconds over Mayer. These were the numbers that the media presented to their audience. Yet Miller lost the race by 0.52 seconds. The media made much of Miller’s “errors” after the second split time.
Innerhofer had a very similar experience to that of Miller. At the first split he led Mayer by 0.58 seconds. He still had a lead of 0.54 seconds at the second split. Yet he lost the race by 0.06 seconds.
The fallacy in the media presentation was in comparing the race times of Miller and Innerhofer to those of Mayer for each split. Reviewing the split times for the top ten finishers, it becomes very clear that Mayer’s times were only about average for the first part of the course. He won because of how well he did on the second half of the course. At the first split Mayer was sixth of the top ten racers. At the second split he was also only sixth out of the top ten racers. At the third split he moved up to third place. It was only at the fourth split that he moved into the lead. Innerhofer was actually in first place at the first, second, and third splits.
As a consequence, both Miller and Innerhofer looked like they were doing much better than was actually true, because their times for the first two splits were compared to Mayer’s more average times. Thus the surprise when the racers got to the end of the course and had both lost to Mayer.
The analysts blamed Miller’s loss on his performance on the last part of the course, and that is indeed where he made his errors. In truth Mayer made his errors, if you want to call them that, on the early part of the course. It is just that his errors were of less consequence. The analysts chose not to focus on that part of the race. Had they done so, they would have been saying that both Miller and Innerhofer outperformed Mayer on that part of the course.
This is really an issue of variance. Each racer performs differently on each section of the course. There is natural variation due to course conditions, differences in the course between racers depending on the tracks of the previous racers, changes in weather conditions as the race progresses, and where each racer makes his errors or does an outstanding job of racing. All of this variability is ignored when subsequent racers’ split times are compared only to those of the lead racer at that point in the race.
That only gives an understanding of what happened in the race. The open question, and I do not offer a solution, is how to present the race result in real time so that the viewer has a better idea of how each racer is doing as he descends the course. That presentation also has to be clear to the viewing public, who we should all assume do not understand these statistical nuances. I wonder if viewers would understand the race better if each split were compared to the best time recorded at that split by any racer so far, rather than to the split time of the racer who holds the best finish time at that point.
The full results, including all the split times for the race, can be found here.
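To make the two comparisons concrete, here is a rough Python sketch using only the three racers and the gaps quoted above. The times are stored as offsets from Mayer rather than real clock times, only the first split, second split, and finish are included, and the function names are my own invention for illustration. It is not a proposal for how a broadcaster should actually compute anything.

```python
# Cumulative offsets in seconds relative to Mayer at three checkpoints
# (split 1, split 2, finish), taken from the gaps quoted in the post.
# Negative means ahead of Mayer. Racers are listed in start order.
OFFSETS = {
    "Mayer": [0.00, 0.00, 0.00],
    "Miller": [-0.27, -0.31, 0.52],
    "Innerhofer": [-0.58, -0.54, 0.06],
}


def earlier_racers(offsets, racer):
    """Names of the racers who started before `racer` (needs at least one)."""
    names = list(offsets)
    return names[: names.index(racer)]


def gaps_to_leader(offsets, racer):
    """Gap at each checkpoint to the earlier racer with the best finish time.

    This is the comparison the broadcast showed.
    """
    leader = min(earlier_racers(offsets, racer), key=lambda n: offsets[n][-1])
    return [round(mine - theirs, 2)
            for mine, theirs in zip(offsets[racer], offsets[leader])]


def gaps_to_best_splits(offsets, racer):
    """Gap at each checkpoint to the best time any earlier racer had there.

    This is the alternative comparison floated in the post.
    """
    earlier = earlier_racers(offsets, racer)
    best = [min(offsets[n][i] for n in earlier)
            for i in range(len(offsets[racer]))]
    return [round(mine - b, 2) for mine, b in zip(offsets[racer], best)]


if __name__ == "__main__":
    print("vs leader:     ", gaps_to_leader(OFFSETS, "Innerhofer"))
    print("vs best splits:", gaps_to_best_splits(OFFSETS, "Innerhofer"))
```

With just these three racers, Innerhofer's lead at the first two checkpoints shrinks from 0.58 and 0.54 seconds to 0.31 and 0.23 seconds, because he is now measured against Miller's faster early splits instead of Mayer's. With the full field the early leads would likely shrink further.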
I love it when the non-statisticians in the media catch a statistical error. I spotted this piece in the Washington Post by the guy who does the traffic reporting – Dr. Gridlock. The piece was actually on upcoming local hearings on the proposed fare increases for the Washington metro area Metro.
The problem was in a survey being conducted online by Metro. As Dr. Gridlock says:
Some questions encourage respondents toward a certain view of the transit authority: “Metro’s six-year Metro Forward rebuilding program has reached its halfway point. Riders are already seeing improvements in overall reliability of trains and buses, as well as escalator availability, continued investments over the next three years will build on those improvements.”
That is a big no-no in survey questionnaire design. The preamble to any question should not put forth a particular position.
Thank you, Dr. Gridlock, and please, Metro, don’t do that again.
The Prince George’s County, Maryland, school officials had a difficult decision to make. It seems that they managed to lose 500 tests that students took, tests that are used as one of the measures to determine who makes it into the county’s talented and gifted program. Unfortunately they decided on a biased method of dealing with the problem.
The tests that were lost were the Otis-Lennon School Ability Test. They only lost the tests for 500 students. They did not lose the tests for all of the students. The administrators made the obvious decision to administer a retake of the test. However, in the meantime they managed to figure out where they had misplaced the first round of tests.
They then decided that for those who took the test twice they would use the higher of the two scores to make the decision on who would make it into the talented and gifted program. And that created the biased selection method. The probability of making it into the program was greater for those who took the test twice. If they missed the cutoff the first time they had the opportunity to make it in the second round of testing.
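The size of that advantage is easy to sketch. Suppose, purely as an assumption for illustration, that a given student has the same chance p of clearing the cutoff on any one sitting and that the two sittings are independent. Real retest scores are correlated, so this overstates the effect for most students, but it shows the direction of the bias. The function name and the example values of p below are hypothetical.

```python
def pass_probability_best_of_two(p):
    """Chance of clearing the cutoff on at least one of two sittings.

    Assumes (hypothetically) the same pass chance p on each sitting
    and independence between the two sittings.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1.0 - (1.0 - p) ** 2


if __name__ == "__main__":
    # Illustrative single-sitting pass chances, not estimates from any data.
    for p in (0.1, 0.25, 0.5, 0.75):
        best_of_two = pass_probability_best_of_two(p)
        print(f"one sitting: {p:.2f}   higher of two: {best_of_two:.4f}")
```

Under these assumptions a student right on the borderline, with a 50 percent chance on one sitting, has a 75 percent chance when the higher of two scores is used. Students far above or far below the cutoff gain almost nothing, so the advantage is concentrated among the borderline cases.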
Now an unbiased method would have been to toss the second-round tests. But likely they would have had a problem with parents who were already upset that their kids had to take the test a second time. This way the administrators did not have to deal with parents saying “but Johnny passed the second time he took the test.” And they were unlikely to have to answer to parents who would say “but Jane did not get a second chance at the test.”
How bad was the problem? There is no data at this point to know. I have not seen a press release saying that because of the error an extra x students made it into the program. Selecting students for a gifted and talented program is inexact in many ways. Plus there are those who would argue that the programs should not even exist.
It would be interesting to see the two sets of scores for the 500 students involved and how those scores affected the eventual decision on which students made it into the talented and gifted program. It would give us one measure of the quality of the selection process.