<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Good&#160;Stats &#160;&#160;&#160;  Bad&#160;Stats</title>
	<atom:link href="http://goodstatsbadstats.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://goodstatsbadstats.com</link>
	<description>goodstatsbadstats.com</description>
	<lastBuildDate>Wed, 15 May 2013 21:50:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Drinking and Driving &#8211; Let the Battle Begin</title>
		<link>http://goodstatsbadstats.com/?p=1456</link>
		<comments>http://goodstatsbadstats.com/?p=1456#comments</comments>
		<pubDate>Wed, 15 May 2013 21:50:12 +0000</pubDate>
		<dc:creator>Larry</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Telling the Full Story]]></category>

		<guid isPermaLink="false">http://goodstatsbadstats.com/?p=1456</guid>
		<description><![CDATA[The National Transportation Safety Board (NTSB) released a report yesterday titled: Reaching Zero: Actions to Eliminate Alcohol &#8211; Impaired Driving. There were numerous recommendations and a good bit of data on a number of actions that could be taken to reduce traffic fatalities related to alcohol consumption. One recommendation garnered a good deal of media [...]]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://www.ntsb.gov/">National Transportation Safety Board (NTSB)</a> released a report yesterday titled: <a href="http://www.ntsb.gov/doclib/reports/2013/SR1301.pdf">Reaching Zero: Actions to Eliminate Alcohol-Impaired Driving</a>. There were numerous recommendations and a good bit of data on a number of actions that could be taken to reduce traffic fatalities related to alcohol consumption. One recommendation garnered a good deal of media and industry attention. That was the recommendation to reduce the legal limit for blood alcohol concentration (BAC) from 0.08 to 0.05.</p>
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/05/Risk-Chart.jpg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/05/Risk-Chart-300x234.jpg" alt="Risk Chart" width="400" height="300" class="alignright size-medium wp-image-1457" /></a>There were two charts in the NTSB report that are most useful in understanding the current situation as it relates to drinking and driving. The first showed the risk of a fatality as a function of the BAC. That graphic is shown on the right. This graphic shows that the relative risk is quite high at the current 0.08 limit. In fact it is also quite high at a 0.07 level. The NTSB clearly feels that a relative risk of 1.38 at the 0.05 level is high enough to justify classifying those driving at that level as impaired. </p>
<p>Most of the rest of the developed world seems to agree with the NTSB as <a href="http://chartsbin.com/view/2037">many countries</a> currently set the limit at 0.05. The <a href="http://www.who.int/gho/road_safety/legislation/alcohol_text/en/index.html">World Health Organization</a> is also in agreement with the 0.05 level. Clearly reasonable people have considered the facts and arrived at 0.05 as a reasonable level.</p>
<p>But has the NTSB made a good case for the proposed lowering of the limit? Certainly with close to a third of all traffic fatalities related to alcohol use the level is much too high and current methods to reduce that level are not effective. The current level of 10,000 deaths a year due to alcohol and driving is clearly unacceptable. We as a country would be outraged if 10,000 people a year were dying from any other preventable cause. Just consider the reaction any time there is an E. coli outbreak due to tainted items in the food chain.</p>
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/05/Distribution.jpg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/05/Distribution-300x179.jpg" alt="Distribution" width="400" height="240" class="alignright size-medium wp-image-1462" /></a>The figure at the right, also taken from the NTSB report, shows the distribution of traffic fatalities by BAC level. This chart shows that about 1,000 fatalities can be attributed to situations where the BAC was 0.05 to 0.07.</p>
<p>A major weakness in the report is the failure to provide detailed numbers for that group of people. The breakdowns are almost universally 0.01 to 0.07, and levels above 0.07. The 0.05 to 0.07 level is the targeted group in the recommendation, so more detailed data on that group should have been included. The additional weakness in the analysis is the lack of any clear linkage to the number of fatalities that would be eliminated by lowering the level. No method is going to eliminate all fatalities in that group. There will always be first offenders who cause fatalities. At the same time a reduction in the threshold BAC level would likely reduce fatalities in some of the other parts of the distribution. </p>
<p>Meanwhile the <a href="http://abionline.org/">American Beverage Institute</a> has <a href="http://abionline.org/restaurant-association-criticizes-ntsb-push-for-lower-legal-blood-alcohol-limit/">reacted vigorously</a> to the proposed lowering of the BAC criterion. They have described the proposal as ludicrous. That is a strong word and in my mind should not be used in the middle of a serious discussion of issues such as this. Nevertheless it does show their level of concern. Of course they do have a vested interest in protecting the profits of the alcohol and restaurant industries. Their first responsibility is not in protecting the lives of those on the roads of this country.</p>
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/05/aba-chart.jpg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/05/aba-chart-300x210.jpg" alt="aba chart" width="400" height="280" class="alignright size-medium wp-image-1468" /></a>Unfortunately the American Beverage Institute did not bring any data to bear on the discussion in support of their position. Surfing their web sites it becomes immediately clear that their main goal is to protect the industry from anything that would impact the sale of alcoholic beverages. One of their <a href="http://thenewprohibition.com/bad-stats.php">web sites even focused</a> on &#8220;Bad Stats.&#8221; When they attack the statistics of their opponents they need to ensure that their own statistics are clean. They failed to do that with the graphic to the right taken from their <a href="http://negligentdriving.com/">negligent driving web site</a>. What stands out here is that a site devoted to the issue of negligent driving makes, as the only issue on its home page, the case that drinking and driving is no longer the issue.</p>
<p>There are numerous problems with the chart. Start with the title: &#8220;Decline in Alcohol-Related Fatalities 1982-2011.&#8221; The chart shows only percentages. There are no counts. So the chart cannot show the actual decline in alcohol-related fatalities that has occurred. By using their title they create in the chart the implication of an increase in the number of non alcohol-related fatalities in the same time period. The truth is that the two percentages should add to 100%. A decrease in one percentage will be offset by an increase in the other. Then they cut off the lower limit of that chart at 30%, giving the mistaken impression that the share of alcohol-related fatalities is at a very low level when just about anyone should consider 30% to be unacceptably high.</p>
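<p>To see why percentages alone cannot show a decline in the number of fatalities, here is a minimal Python sketch. The counts are hypothetical numbers I made up for illustration; they are not taken from the chart or from any report.</p>

```python
# Hypothetical fatality counts, for illustration only (not taken from any report or chart).
counts_by_year = {
    "earlier year": {"alcohol_related": 12_000, "total": 30_000},
    "later year": {"alcohol_related": 13_000, "total": 40_000},
}

def shares(counts):
    """Return (alcohol-related %, non-alcohol-related %) of total fatalities."""
    alcohol_pct = 100 * counts["alcohol_related"] / counts["total"]
    # The two shares always sum to 100, so one falls exactly as much as the other rises.
    return alcohol_pct, 100 - alcohol_pct

for year, counts in counts_by_year.items():
    alcohol_pct, other_pct = shares(counts)
    print(f"{year}: {counts['alcohol_related']:,} alcohol-related deaths, "
          f"{alcohol_pct:.1f}% vs {other_pct:.1f}% of all fatalities")
```

<p>In this made-up case the alcohol-related share falls from 40% to 32.5% even though the number of alcohol-related deaths rises, which is why a chart of percentages needs the counts alongside it.</p>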
<p>A change to the legal BAC level, if it happens, will likely take years to implement. The arguments will bring forth a wealth of associated information and misinformation. I wait to see how the issue will evolve.</p>
]]></content:encoded>
			<wfw:commentRss>http://goodstatsbadstats.com/?feed=rss2&#038;p=1456</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Wealthy get Wealthier &#8211; Looking Deeper</title>
		<link>http://goodstatsbadstats.com/?p=1420</link>
		<comments>http://goodstatsbadstats.com/?p=1420#comments</comments>
		<pubDate>Sun, 28 Apr 2013 18:36:03 +0000</pubDate>
		<dc:creator>Larry</dc:creator>
				<category><![CDATA[Methodolgy Issues]]></category>

		<guid isPermaLink="false">http://goodstatsbadstats.com/?p=1420</guid>
		<description><![CDATA[After my last post I got to asking myself what should I see in the table in the Pew Research Center report on mean net worth. Specifically I was looking at the table on the right and asking what to expect for the percent change in mean net worth for those with a net worth [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/04/Pew-SIPP-Data.jpg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/04/Pew-SIPP-Data-300x169.jpg" alt="Pew - SIPP Data" width="300" height="169" class="alignright size-medium wp-image-1394" /></a>After my <a href="http://goodstatsbadstats.com/?p=1391">last post</a> I got to asking myself what should I see in the table in the <a href="http://www.pewsocialtrends.org/2013/04/23/a-rise-in-wealth-for-the-wealthydeclines-for-the-lower-93/">Pew Research Center report</a> on mean net worth. Specifically I was looking at the table on the right and asking what to expect for the percent change in mean net worth for those with a net worth above the $500,000 level given various scenarios on how income increased during the recovery from the recession.</p>
<p>This led me to the conclusion that the table itself is almost meaningless because the adjustment to bring the 2009 data up to current dollars in 2011 was done incorrectly. In fact with the methodology used and the nature of the table the percent change in mean net worth column for all but the last cell can be expected to be in the range of -5% regardless of what is happening in the recovery. This problem could well have been my number one issue in the previous post had I asked the right questions.</p>
<p>The Pew report using the table above claims:</p>
<blockquote><p>The net worth of the nation’s households increased from 2009 to 2011, but the increase in wealth was far from widely distributed among households. The vast majority of the nation’s households experienced a decline in net worth.</p></blockquote>
<p>and then goes on to say:</p>
<blockquote><p>Households in all eight net worth categories from negative or zero to $250,000 to $499,999 of net worth experienced a decline in mean net worth from 2009 to 2011.</p></blockquote>
<p>Such assertions simply cannot be made from the table provided and neither can they be substantiated by other data in the Census Bureau SIPP tables.</p>
<p>In order to create the table the authors of the Pew report needed to adjust the 2009 data for inflation to reflect current dollars. What they did was to adjust upwards the means for each net worth cell in the 2009 data by about 5%. This was the amount of inflation for the two year period. For example the Census Bureau reported a 2009 mean net worth of $73,458 for those with a net worth between $50,000 and $99,999. The Pew authors used the inflation adjustment to arrive at the $77,028 number in the table. Unfortunately the adjustment should have been made first on the cell limits. To do so they would have needed to compute the mean net worth in 2009 for those with a net worth between approximately $47,619 and $95,237. They would then adjust that value upwards by 5%. The $47,619 figure in 2009 is equivalent to $50,000 in 2011. To do this they would have to have used the micro-data file for the SIPP that the Census Bureau makes available. By skipping the first step the authors in effect made the adjustment twice: leaving the cell limits unadjusted is, in approximate terms, very close to having already made the inflation adjustment.</p>
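<p>As a rough check on the arithmetic, the following Python sketch converts the 2011 cell limits into 2009 dollars. It assumes inflation of exactly 5% over the two years (the actual factor is only "about 5%"), and the helper name is my own, used for illustration.</p>

```python
INFLATION = 1.05  # assumption: "about 5%" over 2009-2011, taken here as exactly 5%

def limits_in_2009_dollars(lower_2011, upper_2011, inflation=INFLATION):
    """Convert a cell's limits from 2011 dollars to 2009 dollars (hypothetical helper)."""
    return lower_2011 / inflation, upper_2011 / inflation

# The $50,000 to $99,999 net worth cell, expressed in 2009 dollars.
lower, upper = limits_in_2009_dollars(50_000, 99_999)
print(f"2009 cell limits: ${lower:,.0f} to ${upper:,.0f}")
```

<p>The mean for 2009 would then be computed over households inside those deflated limits, and only that mean would be inflated by 5%.</p>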
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/04/graph1.jpeg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/04/graph1-300x213.jpeg" alt="graph1" width="300" height="213" class="alignright size-medium wp-image-1425" /></a><br />
All this becomes a bit easier to visualize if we look at an actual distribution. The same factors apply for income distributions as they do for the distribution of net worth. I&#8217;ll look at the income distribution here as that is easier for many to understand. The results apply just as well to a distribution of net worth. The figure to the right represents the 2011 income distribution in the United States. I generated it as a simulation of the US household income distribution for 2011. It is based on a gamma distribution. The actual distribution can be seen <a href="http://www.theglitteringeye.com/images/us-income-distribution.gif">here</a>. The simulation is close enough to illustrate my points. </p>
<p>Consider 2011 incomes between $90,000 and $110,000. I have marked that group as between the two vertical red lines on the chart. I can compute a mean income for the individuals in that range. The value is just under $100,000. Now I want to look at the 2009 income distribution and ask how it changed. Like the authors of the Pew Research Center report I need to adjust for inflation so that I am working in current dollars. Doing this correctly is the problem. If I made the adjustment as in the Pew report I would look at the 2009 income distribution between $90,000 and $110,000, compute the mean income and then inflate the computed mean by the inflation adjustment between 2009 and 2011. That would be about 5%.</p>
<p>That is the wrong adjustment. Real dollars in 2009 would be 5% less than the 2011 numbers. So the correct procedure, as I outlined above, is to look at individuals in 2009 with incomes between about $85,714 and $104,761. I would compute the mean income for those individuals and then make the inflation adjustment. </p>
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/04/graph2.jpeg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/04/graph2-300x300.jpeg" alt="graph2" width="300" height="300" class="alignright size-medium wp-image-1430" /></a>With the correct adjustment they would have seen almost no changes in mean net worth for any but the top cell. Does this mean that households in those cells are only holding steady while those at the top are getting wealthier? No it does not. In any of the middle cells, as income or net worth increases, some households move out of the cell into the next higher cell, while others remain in the cell with higher incomes. New households enter the cell from a lower cell, replacing those who have left. The mean and the median for those in the cell at any point in time remain about the same.</p>
<p>The figure on the right illustrates this point and is based on the previously mentioned figure. The figure shows under the red line the distribution of households with incomes between $90,000 and $110,000 at some point in time. If everyone&#8217;s income increased by 5% the blue line represents the distribution of incomes at the new point in time for those who now have incomes between $90,000 and $110,000, but who in 2009 had incomes between $85,714 and $104,761. The income distribution in figure two has been shifted to the right and is shown in blue in the new figure.</p>
<p>Visualize computing the mean income for the two distributions. Be careful as we are computing the mean on the x-axis, not the y-axis. There are more households in the distribution at the second point in time, as illustrated by the higher curve, but the shape of the distribution is almost identical and thus the computed mean income will remain virtually unchanged. So unless there were major changes in the shape of the income distribution over the two year time span the actual measured change in mean income (and mean net worth if that was what I was looking at here) would remain almost constant. This is true for all of the bounded cells. What happens in the top cell is more complex and dependent on a number of factors.</p>
<p>Because the expected change is close to zero, when the authors of the Pew report made the inflation adjustment the way they did, they virtually ensured that the estimated change in net worth for all of the bounded cells would be negative and approximately the size of the inflation adjustment. It is no surprise that their table shows (erroneously) that mean net worth declined by about 5% for all of the bounded cells.</p>
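<p>The effect can be reproduced with a small simulation. This is only a sketch under stated assumptions: incomes follow a gamma distribution with a shape and scale I picked for illustration (not fitted to any data), inflation is exactly 5%, and every household's income grows by exactly the rate of inflation, so nothing changes in real terms.</p>

```python
import random

random.seed(2011)          # fixed seed so the run is repeatable
INFLATION = 1.05           # assumption: exactly 5% inflation from 2009 to 2011
LOW, HIGH = 90_000, 110_000

# Assumed 2009 incomes: gamma with shape 2 and scale $35,000 (illustrative values only).
incomes_2009 = [random.gammavariate(2.0, 35_000) for _ in range(200_000)]
# Scenario: every income grows by exactly the rate of inflation.
incomes_2011 = [x * INFLATION for x in incomes_2009]

def cell_mean(incomes, low, high):
    """Mean income of the households whose income falls in [low, high)."""
    cell = [x for x in incomes if low <= x < high]
    return sum(cell) / len(cell)

mean_2011 = cell_mean(incomes_2011, LOW, HIGH)

# Same dollar limits in both years, then inflate the 2009 mean.
unadjusted_2009 = cell_mean(incomes_2009, LOW, HIGH) * INFLATION
# Deflate the cell limits to 2009 dollars first, then inflate the 2009 mean.
adjusted_2009 = cell_mean(incomes_2009, LOW / INFLATION, HIGH / INFLATION) * INFLATION

unadjusted_change = mean_2011 / unadjusted_2009 - 1
adjusted_change = mean_2011 / adjusted_2009 - 1

print(f"Change with unadjusted cell limits: {unadjusted_change:.1%}")
print(f"Change with adjusted cell limits:   {adjusted_change:.1%}")
```

<p>Under these assumptions the first figure should come out close to -5% and the second close to zero, even though no household gained or lost anything in real terms.</p>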
]]></content:encoded>
			<wfw:commentRss>http://goodstatsbadstats.com/?feed=rss2&#038;p=1420</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Wealthy get Wealthier</title>
		<link>http://goodstatsbadstats.com/?p=1391</link>
		<comments>http://goodstatsbadstats.com/?p=1391#comments</comments>
		<pubDate>Thu, 25 Apr 2013 19:10:00 +0000</pubDate>
		<dc:creator>Larry</dc:creator>
				<category><![CDATA[Methodolgy Issues]]></category>
		<category><![CDATA[Statistical Literacy]]></category>
		<category><![CDATA[Telling the Full Story]]></category>

		<guid isPermaLink="false">http://goodstatsbadstats.com/?p=1391</guid>
		<description><![CDATA[The headlines this week read &#8220;Richest are getting richer.&#8221; This was based on the report from the Pew Research Center which in turn used data on wealth released by the Census Bureau from the Survey of Income and Program Participation (SIPP). The report from the Pew Research Center was titled: An Uneven Recovery, 2009-2011 &#8211; [...]]]></description>
				<content:encoded><![CDATA[<p>The headlines this week read &#8220;Richest are getting richer.&#8221; This was based on the report from the <a href="http://www.pewresearch.org">Pew Research Center</a> which in turn used data on wealth released by the <a href="http://www.census.gov">Census Bureau</a> from the <a href="http://www.census.gov/people/wealth/data/dtables.html">Survey of Income and Program Participation (SIPP)</a>. The report from the Pew Research Center was titled:<br />
<a href="http://www.pewsocialtrends.org/2013/04/23/a-rise-in-wealth-for-the-wealthydeclines-for-the-lower-93/">An Uneven Recovery, 2009-2011 &#8211; A Rise in Wealth for the Wealthy; Declines for the Lower 93%</a>. </p>
<p>Usually I like the work coming out of the Pew Research Center. They are generally very good with their analysis and good about describing how they do their work. But in this case they have needlessly complicated their analysis and in the process fallen short on a couple of basic statistical principles. </p>
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/04/Pew-SIPP-Data.jpg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/04/Pew-SIPP-Data-300x169.jpg" alt="Pew - SIPP Data" width="400" height="220" class="alignright size-medium wp-image-1394" /></a>The main headline for the report is that the top 7 percent of households saw their net worth increase by 28 percent between 2009 and 2011 while all other households saw a decrease in net worth. However, the SIPP data provided no information on wealth for the top 7 percent of households. The data only gave information for the top 13 percent of households. The Pew Research Center authors had to make a number of assumptions and calculations to get to the 7 percent number. The Pew version of the SIPP data table is shown to the right.</p>
<p><strong>Issue 1: Use of the mean instead of the median.</strong></p>
<p>In most discussions of income it is preferable to use the median rather than the mean, as unusually large but rare income level changes can translate into large shifts in the value of the mean while the median is immune to such effects. Outlier observations are prone to distort the value of the mean. This is especially true when working with the upper tail of the income distribution. In this case the mean and the median told two very different stories. The authors saw this and said in their report: </p>
<blockquote><p>Even though households with net worth of $500,000 or above saw their mean net worth increase from 2009 to 2011, this group’s median net worth decreased during the same period—to $836,033 in 2011 from $889,275 in 2009.</p></blockquote>
<p><a href="http://goodstatsbadstats.com/wp-content/uploads/2013/04/means-and-medians.jpg"><img src="http://goodstatsbadstats.com/wp-content/uploads/2013/04/means-and-medians-300x116.jpg" alt="means and medians" width="400" height="150" class="alignright size-medium wp-image-1399" /></a>But they then chose to ignore the issue. I have inserted an Excel table on the right showing the data by wealth level for both the means and the medians from the SIPP data. It is imperative that the difference between the two measures be dealt with in the discussions of the issues raised in the report. Otherwise the authors leave themselves open to criticism that they chose the measure beneficial to the point they wished to make. In this analysis the criticism is given greater credibility by the lengths to which the authors went so they could focus on the top 7 percent of households rather than on the top 13 percent of households as described in the SIPP. This enabled them to talk about a smaller proportion of households and to show a greater differential gain in net worth (28 vs 21 percent).</p>
<p><strong>Issue 2: Ignoring sampling error.</strong></p>
<p>The report barely mentions the issue of sampling error saying only: </p>
<blockquote><p>&#8220;The estimates are based on responses from a sample of the population and may differ from the actual values because of sampling variability and other factors.&#8221; </p></blockquote>
<p>The Census Bureau provided measures of sampling error for both mean and median net wealth estimates. Of most concern here is the large standard error on the 2011 estimate of mean net worth. The 90 percent confidence interval extends from $1,543,460 to $2,298,452. The fact that this confidence interval overlaps the 2009 estimate of net wealth is of significant concern. This does not necessarily mean that a classical test of the difference between the 2009 and the 2011 estimates would fail to demonstrate an increase in mean net worth, as the samples for the two years are based on the SIPP longitudinal panel using mostly the same respondents. However it does raise questions about the use of the mean.</p>
<p><strong>Issue 3: Computing the net worth for the top 7 percent of households.</strong></p>
<p>It is in the computation of the net worth for the top 7 percent of households that questions on the methodology come to the forefront. The Census Bureau provides no data that can be used to make a direct estimate, so the authors of the Pew study made a number of claims and assumptions. They start by claiming: </p>
<blockquote><p>&#8220;A simultaneous rise in the mean and decline in the median implies that aggregate net worth increased only among households above the median—that is, the 8 million households with net worth of $836,033 or more in 2011.&#8221;</p></blockquote>
<p>This is just plain wrong. There are any number of scenarios under which the aggregate net worth of those under the median can increase while the median remains unchanged or declines. Complicating this analysis is that there are two medians in play for those with net worth over $500,000 &#8211; the 2009 median of $889,275 and the 2011 median of $836,033. The authors choose to focus on the 2011 median of $836,033. But this means that since the number of households with a net worth over $500,000 was essentially the same at both points in time there are almost certainly more households in the range of $500,000 to $836,033 in 2011 than there were in 2009. And thus the aggregate net worth for this group could easily be larger in 2011 than it was in 2009.</p>
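<p>One such scenario can be written down in a few lines of Python. The five net worth values for each year are hypothetical, chosen only so that the medians match the reported $889,275 and $836,033 to the nearest thousand; they are not drawn from the SIPP data.</p>

```python
from statistics import mean, median

# Hypothetical net worths (in thousands of dollars) for five households above $500,000.
worth_2009 = [600, 700, 889, 1500, 3000]
worth_2011 = [800, 820, 836, 1600, 3500]

def aggregate_below_median(values):
    """Total net worth of the households strictly below the group's median."""
    m = median(values)
    return sum(v for v in values if v < m)

for year, values in (("2009", worth_2009), ("2011", worth_2011)):
    print(f"{year}: mean={mean(values):,.1f}  median={median(values):,}  "
          f"aggregate below median={aggregate_below_median(values):,}")
```

<p>Here the mean rises and the median falls, yet the aggregate net worth of the households below the median also rises, so the implication claimed in the report does not follow.</p>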
<p>The report goes on to say: </p>
<blockquote><p>&#8220;Those upper 7% of households had an estimated aggregate wealth gain of 28% from 2009 to 2011, while the estimated aggregate wealth of households in the $500,000 to $836,033 range fell by 4%.&#8221;</p></blockquote>
<p>The methodology used to get the 28% figure is described in a footnote on page 8 of the report. Since the actual data is not available the authors made the assumption that the distribution of wealth was uniform across the interval from $500,000 to $889,275 in 2009 and similarly assumed that the distribution was uniform across the interval from $500,000 to $836,033 in 2011. But this is the tail of the income distribution. An assumption of a uniform distribution is clearly wrong if one downloads the <a href="http://thedataweb.rm.census.gov/ftp/sipp_ftp.html">SIPP micro-data file</a> for either year and looks at a simple histogram. </p>
<p>Such obvious problems should not be found in a report from a research organization of the caliber of the Pew Research Center.</p>
<p><strong>Some parting thoughts.</strong></p>
<p>I have tried not to say I disagree with the overall conclusions the authors make. I do not think the data presented here supports those conclusions. Their logic on what is happening in the nation&#8217;s economy makes sense, and it seems very likely that with the recovery of the stock market the wealthy have gained disproportionately. I must ask: did they also lose disproportionately when the stock market fell? An analysis/report such as this in my mind does more to damage the credibility of the case the authors are making than to support their claims.</p>
]]></content:encoded>
			<wfw:commentRss>http://goodstatsbadstats.com/?feed=rss2&#038;p=1391</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
