<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Healthy Algorithms</title>
	<atom:link href="http://healthyalgorithms.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://healthyalgorithms.wordpress.com</link>
	<description>A blog about algorithms, combinatorics, and optimization applications in global health informatics.</description>
	<lastBuildDate>Fri, 16 Oct 2009 19:18:45 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='healthyalgorithms.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/67874012201ff9ae53597ea09aa85611?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>Healthy Algorithms</title>
		<link>http://healthyalgorithms.wordpress.com</link>
	</image>
			<item>
		<title>Dense-Subset Break-the-Bank Challenge</title>
		<link>http://healthyalgorithms.wordpress.com/2009/10/16/dense-subset-break-the-bank-challenge/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/10/16/dense-subset-break-the-bank-challenge/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 19:18:45 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[TCS]]></category>
		<category><![CDATA[combinatorial optimization]]></category>
		<category><![CDATA[cryptography]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[networkx]]></category>
		<category><![CDATA[aco]]></category>
		<category><![CDATA[dense subset problem]]></category>
		<category><![CDATA[finance]]></category>
		<category><![CDATA[challenge]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=696</guid>
		<description><![CDATA[I&#8217;m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I&#8217;ll like, and I want to mention it briefly before I fly.
Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I&#8217;ll like, and I want to mention it briefly before I fly.</p>
<p><a href="http://www.cs.princeton.edu/~rongge/derivative.pdf">Computational Complexity and Information Asymmetry in Financial Products</a> is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely, are blogging and twittering about it, and even reading it!</p>
<p><a href="http://www.freedom-to-tinker.com/blog/appel/intractability-financial-derivatives">Freedom to Tinker</a> has a nice summary of this paper, if you want to know what it&#8217;s about in a hurry.</p>
<p><a href="http://mat.tepper.cmu.edu/blog/?p=906">Mike Trick</a> makes the salient observation that NP-hard doesn&#8217;t mean computers can&#8217;t do it.  But the assumption that this paper is based on is not about worst-case complexity;  it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.</p>
<p>As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem.  My plan is to restate the problem here and share some Python code for generating instances of it.  Then you, me, and everyone else can have a handy instance to try optimizing.  I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than of cracking the P versus NP nut.<span id="more-696"></span></p>
<p>First, the <strong>Densest subgraph problem</strong> (bottom of p. 5):  </p>
<blockquote><p>Fix <em>M</em>, <em>N</em>, <em>D</em>, <em>m</em>, <em>n</em>, and <em>d</em> to be some parameters. The (average case, decision) densest subgraph problem with these parameters is to distinguish between the following two distributions <em>R</em> and <em>P</em> on (<em>M</em>, <em>N</em>, <em>D</em>) graphs, where <em>R</em> is obtained by choosing for every top vertex <em>D</em> random neighbors on the bottom; and <em>P</em> is obtained by first choosing random hidden subsets <em>S</em> from [<em>N</em>] and <em>T</em> from [<em>M</em>] with |<em>S</em>| = <em>n</em> and |<em>T</em>| = <em>m</em>, and then choosing <em>D</em> random neighbors for every vertex outside of <em>T</em>, and <em>D</em>-<em>d</em> random neighbors for every vertex in <em>T</em>. We then choose <em>d</em> random additional neighbors in <em>S</em> for every vertex in <em>T</em>.
</p></blockquote>
<p>Then the <strong>Densest subgraph assumption</strong> (middle of p. 6) is:</p>
<blockquote><p>Let (<em>N</em>, <em>M</em>, <em>D</em>, <em>n</em>, <em>m</em>, <em>d</em>) be such that <em>N</em> = o(<em>MD</em>), <img src='http://s1.wordpress.com/latex.php?latex=%28m+d%5E2%2Fn%29%5E2+%3D+o%28MD%5E2%2FN%29&#038;bg=ffffff&#038;fg=444444&#038;s=0' alt='(m d^2/n)^2 = o(MD^2/N)' title='(m d^2/n)^2 = o(MD^2/N)' class='latex' />, then there is no <img src='http://s2.wordpress.com/latex.php?latex=%5Cepsilon+%3E+0&#038;bg=ffffff&#038;fg=444444&#038;s=0' alt='\epsilon &gt; 0' title='\epsilon &gt; 0' class='latex' /> and poly-time algorithm that distinguishes between <em>R</em> and <em>P</em> with advantage <img src='http://s3.wordpress.com/latex.php?latex=%5Cepsilon&#038;bg=ffffff&#038;fg=444444&#038;s=0' alt='\epsilon' title='\epsilon' class='latex' />.</p></blockquote>
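<p>Since both conditions are asymptotic, no single parameter setting can satisfy or violate them; still, it can help to see how close a concrete instance sits to the boundary. Here is a minimal sketch (the helper <code>assumption_ratios</code> is hypothetical, not from the paper) that computes the two ratios the assumption requires to tend to zero, evaluated at the default parameters of the generator code in this post:</p>

```python
def assumption_ratios(M, N, D, m, n, d):
    """Ratios that the densest subgraph assumption asks to tend to 0.

    Hypothetical helper: N = o(MD) means N/(MD) -> 0, and
    (m d^2/n)^2 = o(M D^2/N) means their quotient -> 0.  For one fixed
    parameter setting these are just numbers, not evidence of hardness.
    """
    ratio1 = float(N) / (M * D)
    ratio2 = (float(m) * d**2 / n) ** 2 / (float(M) * D**2 / N)
    return ratio1, ratio2

# the defaults of planted_dense_subgraph in this post
r1, r2 = assumption_ratios(M=1000, N=1000, D=500, m=25, n=25, d=15)
print(r1, r2)  # 0.002 0.2025
```

<p>Both ratios are below 1 for the defaults, though the second is only about 0.2.</p>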
<p>Or, to say the same thing in Python, with a little help from networkx:</p>
<pre class="brush: python;">
import random
import networkx as nx

def planted_dense_subgraph(M=1000, N=1000, D=500, m=25, n=25, d=15):
    &quot;&quot;&quot; Generate a bipartite graph with a planted dense subgraph
    (distribution P)

    Parameters
    ----------
    M, N, D, m, n, d : int, optional
      M and N are the sizes of the top and bottom vertex sets, and m and
      n are the sizes of the hidden sets planted in them.  D is the
      degree of each top vertex, and d is the number of edges from each
      hidden top vertex into the hidden bottom set

    Returns
    -------
    G : Graph
      A bipartite graph, with vertices T_0, ..., T_{M-1} and
      B_0, ..., B_{N-1}
    T_hidden, B_hidden : lists
      The vertex sets of size m and n that are hidden in the T and B
      vertices
    &quot;&quot;&quot;

    T = ['T_%d'%i for i in range(M)]
    B = ['B_%d'%i for i in range(N)]

    T_hidden = random.sample(T, m)
    B_hidden = random.sample(B, n)
    T_hidden_set = set(T_hidden)  # constant-time membership tests

    G = nx.Graph()
    G.add_nodes_from(T)
    G.add_nodes_from(B)

    for t in T:
        if t in T_hidden_set:
            # D-d neighbors anywhere on the bottom, then d more inside
            # B_hidden; in a simple Graph any overlap between the two
            # samples collapses into a single edge
            nx.add_star(G, [t] + random.sample(B, D-d))
            nx.add_star(G, [t] + random.sample(B_hidden, d))
        else:
            nx.add_star(G, [t] + random.sample(B, D))

    return G, T_hidden, B_hidden

def random_graph(M=1000, N=1000, D=500):
    &quot;&quot;&quot; Generate a bipartite graph without a planted dense subgraph
    (distribution R)

    Parameters
    ----------
    M, N, D : int, optional
      M and N are the sizes of the top and bottom vertex sets, and D is
      the degree of each top vertex

    Returns
    -------
    G : Graph
      A bipartite graph, with vertices T_0, ..., T_{M-1} and
      B_0, ..., B_{N-1}
    &quot;&quot;&quot;

    T = ['T_%d'%i for i in range(M)]
    B = ['B_%d'%i for i in range(N)]

    G = nx.Graph()
    G.add_nodes_from(T)
    G.add_nodes_from(B)

    for t in T:
        # each top vertex gets D distinct random bottom neighbors
        nx.add_star(G, [t] + random.sample(B, D))

    return G
</pre>
<p>If I give you the graph produced by one of these functions, you can&#8217;t efficiently tell me which function I used with noticeably more accuracy than if you flipped a coin to decide.</p>
<p>As the authors say, this is <em>an assumption</em>.  It could be proven false by a clever algorithm tomorrow.</p>
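<p>For anyone who wants to take up the challenge, here is a minimal sketch of a naive distinguisher, meant as a baseline rather than a serious attack. It rests on one observation about the definition: under <em>P</em>, each vertex of the hidden bottom set gets about <em>md</em>/<em>n</em> more edges on average than the other bottom vertices (with the default parameters, a shift of about 15 on degrees near 500), so the largest bottom degrees should be slightly larger. The function names, the choice of <em>k</em>, and the threshold are all hypothetical, and how much, if at all, this beats a coin flip at these parameters is untested here.</p>

```python
from collections import Counter


def top_bottom_degree_sum(edges, k=25):
    """Sum of the k largest bottom-vertex degrees (hypothetical statistic).

    Bottom vertices are recognized by the 'B_' prefix used in
    planted_dense_subgraph and random_graph; each edge may be listed in
    either orientation.
    """
    degree = Counter()
    for u, v in edges:
        bottom = u if str(u).startswith('B_') else v
        degree[bottom] += 1
    return sum(sorted(degree.values(), reverse=True)[:k])


def guess_planted(edges, threshold, k=25):
    """Guess 'P' if the statistic exceeds the threshold, else 'R'.

    The threshold is an assumption; it would need to be calibrated,
    e.g. by computing the statistic on many graphs from random_graph().
    """
    return 'P' if top_bottom_degree_sum(edges, k) > threshold else 'R'
```

<p>To use it, one could compute <code>top_bottom_degree_sum(G.edges())</code> for many graphs <code>G = random_graph()</code>, pick a high quantile as the threshold, and then apply <code>guess_planted</code> to fresh graphs from both generators to estimate its advantage.</p>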
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/10/16/dense-subset-break-the-bank-challenge/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>
	</item>
		<item>
		<title>The Two Rules of Program Optimization</title>
		<link>http://healthyalgorithms.wordpress.com/2009/10/10/the-two-rules-of/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/10/10/the-two-rules-of/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 00:40:08 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[software engineering]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=687</guid>
		<description><![CDATA[Wow, where does the day go?  I spent all my non-meeting time debugging something.  At least I fixed it before 5 PM.
The details of the problem are boring, but the whole ordeal could have been avoided if I had just followed the two rules of optimizing software in my Generic Disease Modeling System. [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Wow, where does the day go?  I spent all my non-meeting time debugging something.  At least I fixed it before 5 PM.</p>
<p>The details of the problem are boring, but the whole ordeal could have been avoided if I had just followed the two rules of optimizing software in my Generic Disease Modeling System.  What are they?</p>
<ol>
<li>First Rule of Program Optimization: Don&#8217;t do it</li>
<li>Second Rule of Program Optimization (for experts only!):  Don&#8217;t do it yet</li>
</ol>
<p>Maybe next week I&#8217;ll get a second to write about the good kind of optimization:  my statistical physics friends have posted <a href="http://arxiv.org/abs/0910.0767v1">Clustering with Shallow Trees</a>, an article I co-authored, on the arXiv.  It is about an application of <a href="http://healthyalgorithms.wordpress.com/2008/10/28/minimum-spanning-trees-of-bounded-depth-random/">bounded-depth minimum spanning trees</a>.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/10/10/the-two-rules-of/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>
	</item>
		<item>
		<title>Conference you should know about</title>
		<link>http://healthyalgorithms.wordpress.com/2009/10/05/conference-you-should-know-about/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/10/05/conference-you-should-know-about/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 18:30:26 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[TCS]]></category>
		<category><![CDATA[global health]]></category>
		<category><![CDATA[ai4d]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[ict4d]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=678</guid>
		<description><![CDATA[This weekend marks the submission of my first &#8220;Global Health&#8221; paper.  Congratulations to me!  And many, many thanks to all the people who have worked with me to make it happen.  I&#8217;ll go into details sometime in the future; first, let me see how things go in the refereeing process.

While I was [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>This weekend marks the submission of my first &#8220;Global Health&#8221; paper.  Congratulations to me!  And many, many thanks to all the people who have worked with me to make it happen.  I&#8217;ll go into details sometime in the future; first, let me see how things go in the refereeing process.</p>
<p><a href="http://ai-d.org/"><img class="aligncenter size-full wp-image-680" src="http://healthyalgorithms.files.wordpress.com/2009/10/ai-d.png?w=413&#038;h=319" alt="" width="413" height="319" /></a></p>
<p>While I was over-working on that business, I got an interesting Call-for-Papers forwarded by global health/AI researcher Emma Brunskill. The <a href="http://ai-d.org/cfp.html">AAAI Spring Symposium on Artificial Intelligence for Development (AI-D)</a> is an effort to build a community of people applying computer science and artificial intelligence in less-developed settings.</p>
<p>TCS people, don&#8217;t let the &#8220;AI&#8221; in their title turn you off.  Eric Horvitz says that this is for all of us. <span id="more-678"></span></p>
<p>From the <a href="http://ai-d.org/cfp.html">CfP</a>:</p>
<blockquote><p>There has been great interest in information and communication technology for development (ICT-D) over the last several years. The work is diverse and extends from information technologies that provide infrastructure for micropayments to techniques for monitoring and enhancing the cultivation of crops. While efforts in ICT-D have been interdisciplinary, ICT-D has largely overlooked opportunities for harnessing machine learning and reasoning to create new kinds of services, and to serve a role in analyses of data that may provide insights about socioeconomic development for disadvantaged populations. The unprecedented volume of data currently being generated in the developing world on human health, movement, communication, and financial transactions provides new opportunities for applying machine learning methods to development efforts, however. Our aim is to foster the creation of a subfield of ICT-D, which we refer to as artificial intelligence for development (AI-D), to harness these opportunities.</p></blockquote>
<p>It&#8217;s great to see Computer Science trying to address the social issues of our time.</p>
<p><a href="http://www.flickr.com/photos/30686429@N07/sets/72157622330082619/show/with/3953914015/"><img src="http://healthyalgorithms.files.wordpress.com/2009/10/nerd-picket.jpg?w=500&#038;h=244" alt="" title="" width="500" height="244" class="aligncenter size-full wp-image-684" /></a></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/10/05/conference-you-should-know-about/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/10/ai-d.png" medium="image" />

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/10/nerd-picket.jpg" medium="image" />
	</item>
		<item>
		<title>Top</title>
		<link>http://healthyalgorithms.wordpress.com/2009/09/17/top/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/09/17/top/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 22:51:04 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[arxiv]]></category>
		<category><![CDATA[political science]]></category>
		<category><![CDATA[TCS]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=667</guid>
		<description><![CDATA[I don&#8217;t feel like having that post about how big things are brewing in US health care reform at the top of my blog anymore, so here is a quick replacement:  a ranking paper that caught my eye recently on the arXiv, where computer science is applied to politics:  On Ranking Senators By Their [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I don&#8217;t feel like having that post about how big things are brewing in US health care reform at the top of my blog anymore, so here is a quick replacement:  a ranking paper that caught my eye recently on the arXiv, where computer science is applied to politics:  <a href="http://arxiv.org/abs/0909.1418">On Ranking Senators By Their Votes</a>, by my fellow CMU alum, Mugizi Rwebangira (<a href="http://twitter.com/rweba">@rweba</a> on Twitter).</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/09/17/top/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>
	</item>
		<item>
		<title>Holiday reading</title>
		<link>http://healthyalgorithms.wordpress.com/2009/09/05/holiday-reading/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/09/05/holiday-reading/#comments</comments>
		<pubDate>Sat, 05 Sep 2009 00:39:28 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[global health]]></category>
		<category><![CDATA[videos]]></category>
		<category><![CDATA[foreclosures]]></category>
		<category><![CDATA[healthcare]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=662</guid>
		<description><![CDATA[Whoops, I got busy again and didn&#8217;t have time to make new pictures of TFR vs HDI for Rif and Tanja, let alone fix the Bayes factor estimation code or implement the nested sampling version (which I think will be the cool way to estimate evidence).  But coming soon: How MCMC is tying my [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Whoops, I got busy again and didn&#8217;t have time to make new pictures of <a href="http://healthyalgorithms.wordpress.com/2009/08/25/mcmc-in-python-pymc-for-bayesian-model-selection/">TFR vs HDI</a> for Rif and Tanja, let alone fix the Bayes factor estimation code or implement the nested sampling version (which I think will be the <em>cool</em> way to estimate evidence).  But coming soon: How MCMC is tying my new work in Health Metrics to my education in Operations Research.  That will be in two weeks, at best.</p>
<p>Until then, here is some light reading to get ready for a big week of US healthcare reform debate:  <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1416947">Get Sick, Get Out</a>, a survey conducted by lawyers interested in catastrophic medical payments and their connection to housing foreclosures.  It&#8217;s 40 pages long, but it&#8217;s in legal-journal format, where they have like 10 words per page if you skip the footnotes.  From the abstract:</p>
<blockquote><p>Half of all respondents (49%) indicated that their foreclosure was caused in part by a medical problem, including illness or injuries (32%), unmanageable medical bills (23%), lost work due to a medical problem (27%), or caring for sick family members (14%).</p></blockquote>
<p>I&#8217;m excited for the next week of healthcare reform debates.  When my most jaded friends are forwarding me Moveon.org videos (and I&#8217;m listening to 4 minutes of recent REM), I know something unusual is going on.</p>
<p><span style="text-align:center; display: block;"><a href="http://healthyalgorithms.wordpress.com/2009/09/05/holiday-reading/"><img src="http://img.youtube.com/vi/8GoFj8Fc9iM/2.jpg" alt="" /></a></span></p>
<p>Happy labor day weekend!</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/09/05/holiday-reading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>

		<media:content url="http://img.youtube.com/vi/8GoFj8Fc9iM/2.jpg" medium="image" />
	</item>
		<item>
		<title>MCMC in Python: PyMC for Bayesian Model Selection</title>
		<link>http://healthyalgorithms.wordpress.com/2009/08/25/mcmc-in-python-pymc-for-bayesian-model-selection/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/08/25/mcmc-in-python-pymc-for-bayesian-model-selection/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 00:07:28 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[MCMC]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[human development index]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[pymc]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[total fertility rate]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=602</guid>
		<description><![CDATA[(Updated 9/2/2009)
I never took a statistics class, so I only know the kind of statistics you learn on the street.  But now that I&#8217;m in global health research, I&#8217;ve been doing a lot of on-the-job learning.  This post is about something I&#8217;ve been reading about recently, how to decide if a simple statistical [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><span style="color:#ff0000;">(Updated 9/2/2009)</span></p>
<p>I never took a statistics class, so I only know the kind of statistics you learn on the street.  But now that I&#8217;m in global health research, I&#8217;ve been doing a lot of on-the-job learning.  This post is about something I&#8217;ve been reading about recently: how to decide if a simple statistical model is sufficient or if the data demands a more complicated one.  To keep the matter concrete (and controversial), I&#8217;ll focus on a claim from a recent paper in Nature that my colleague, Haidong Wang, chose for our IHME journal club last week:  <a href="http://www.nature.com/nature/journal/v460/n7256/full/nature08230.html">Advances in development reverse fertility declines</a>.  The title of this short letter boldly claims a causal link between total fertility rate (an instantaneous measure of how many babies a population is making) and the human development index (a composite measure of how &#8220;developed&#8221; a country is, on a scale of 0 to 1).  Exhibit A in their case is the following figure:</p>
<p><a href="http://www.nature.com/nature/journal/v460/n7256/fig_tab/nature08230_F1.html"><img class="aligncenter size-full wp-image-605" src="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr.png?w=330&#038;h=331" alt="" width="330" height="331" /></a></p>
<p>An astute observer of this chart might ask, &#8220;what&#8217;s up with the scales on those axes?&#8221;  But this post is not about the visual display of quantitative information.  It is about deciding whether the data have the piecewise linear relationship that Myrskyla et al. claim, and doing it in a Bayesian framework with Python and PyMC.  But let&#8217;s start with a figure where the axes have a familiar linear scale!<span id="more-602"></span></p>
<p><a href="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr2.png"><img class="aligncenter size-full wp-image-610" src="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr2.png?w=329&#038;h=300" alt="" width="329" height="300" /></a></p>
<p>I was able to produce this alternative plot without much work because the authors and the journal made their dataset easily available on the web as a csv file.  Thanks, all!  In fact, they have provided data for more years than just 1975 and 2005.  Here is everything from the years in between as well:</p>
<p><a href="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr31.png"><img class="aligncenter size-medium wp-image-622" src="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr31.png?w=300&#038;h=273" alt="" width="300" height="273" /></a></p>
<p>Since we discussed this in journal club, I can now repeat my fellow post-docs&#8217; observations and sound smart.  In fact, I already did, with that bit about the axes being on a non-linear (and non-logarithmic) scale.  The letter claims that there is a breakpoint around a human development index of .86, and that the relationship is piecewise linear:  TFR decreases with HDI below this value and increases above it.  The next observation, also not originally mine, is that a quadratic looks like it would fit the data pretty well, too.</p>
<p>To be slightly more formal, here are two alternative models for the association between HDI and TFR:</p>
<p><img src='http://s1.wordpress.com/latex.php?latex=M_1%3A+TFR_i+%5Csim+%5Cbeta_0+%2B+%5Cbeta_1+HDI_i+%2B+%5Cbeta_2+HDI_i%5E2+%2B+N%280%2C+%5Csigma%5E2%29&#038;bg=ffffff&#038;fg=444444&#038;s=0' alt='M_1: TFR_i \sim \beta_0 + \beta_1 HDI_i + \beta_2 HDI_i^2 + N(0, \sigma^2)' title='M_1: TFR_i \sim \beta_0 + \beta_1 HDI_i + \beta_2 HDI_i^2 + N(0, \sigma^2)' class='latex' /></p>
<p>and</p>
<p><img src='http://s2.wordpress.com/latex.php?latex=M_2%3A+TFR_i+%5Csim+%5Cbegin%7Bcases%7D%5Cbeta_0+%2B+%5Cbeta_1+%28HDI_i-.86%29%2C+%26+%5Ctext%7Bif+%7D+HDI_i+%5Cleq+.86%3B%5C%5C+%5Cbeta_0+%2B+%5Cbeta_2+%28HDI_i-.86%29%2C+%26+%5Ctext%7Botherwise%7D%3B%5Cend%7Bcases%7D+%2B+N%280%2C+%5Csigma%5E2%29.&#038;bg=ffffff&#038;fg=444444&#038;s=0' alt='M_2: TFR_i \sim \begin{cases}\beta_0 + \beta_1 (HDI_i-.86), &amp; \text{if } HDI_i \leq .86;\\ \beta_0 + \beta_2 (HDI_i-.86), &amp; \text{otherwise};\end{cases} + N(0, \sigma^2).' title='M_2: TFR_i \sim \begin{cases}\beta_0 + \beta_1 (HDI_i-.86), &amp; \text{if } HDI_i \leq .86;\\ \beta_0 + \beta_2 (HDI_i-.86), &amp; \text{otherwise};\end{cases} + N(0, \sigma^2).' class='latex' /></p>
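<p>To make the two models concrete, here is a minimal plain-Python sketch of their mean functions and the shared Gaussian noise term (the function names are hypothetical). Coefficients are indexed as in the formulas above, with beta_0 as the intercept, which is not the ordering numpy's <code>polyval</code> uses. This only evaluates the likelihood at fixed parameter values; the marginal likelihood Pr[data | M] that a Bayes factor needs averages this likelihood over the prior on the betas and sigma, and that average is what makes the computation hard.</p>

```python
import math


def m1_mean(hdi, beta0, beta1, beta2):
    """Expected TFR under M_1: a quadratic in HDI."""
    return beta0 + beta1 * hdi + beta2 * hdi ** 2


def m2_mean(hdi, beta0, beta1, beta2, knot=0.86):
    """Expected TFR under M_2: two lines that meet at HDI = knot."""
    slope = beta1 if hdi <= knot else beta2
    return beta0 + slope * (hdi - knot)


def normal_loglik(tfr, means, sigma):
    """Log-likelihood of observed TFRs given model means and noise sd sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mu) ** 2 / (2 * sigma ** 2)
               for y, mu in zip(tfr, means))
```

<p>For example, <code>normal_loglik(tfr, [m2_mean(h, b0, b1, b2) for h in hdi], sigma)</code> scores one parameter setting of M_2 against lists <code>hdi</code> and <code>tfr</code> of observations.</p>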
<p>How should I decide between these two models?  Well, I&#8217;m open to suggestions, but I learned about one way when I read a citation classic on <a href="ftp://stat-ftp.berkeley.edu/pub/users/binyu/212A/papers/Kass_1995.pdf">Bayes factors</a> by UW statistician Adrian Raftery and his co-author Robert Kass.  They describe the ratio <img src='http://s3.wordpress.com/latex.php?latex=K+%3D+%5CPr%5Bdata+%7C+M_2%5D+%2F+%5CPr%5Bdata+%7C+M_1%5D&#038;bg=ffffff&#038;fg=444444&#038;s=0' alt='K = \Pr[data | M_2] / \Pr[data | M_1]' title='K = \Pr[data | M_2] / \Pr[data | M_1]' class='latex' />.  On their scale, a <em>K</em> between 3.2 and 10 constitutes &#8220;substantial&#8221; evidence in favor of model 2, a <em>K</em> between 10 and 100 is &#8220;strong&#8221; evidence, and anything above 100 is &#8220;decisive&#8221;.</p>
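<p>To make the scale concrete, here is a small sketch that maps a Bayes factor onto the Kass and Raftery categories, using cut points of 3.2, 10 and 100.  The function name kass_raftery and its return format are my own choices.  A value below 1 is treated as evidence for model 1 by inverting it.</p>
<pre class="brush: python;">
import bisect

# Cut points and labels of the Bayes factor scale in Kass and Raftery (1995)
CUTS = [3.2, 10.0, 100.0]
LABELS = ["not worth more than a bare mention", "substantial", "strong", "decisive"]

def kass_raftery(K):
    """Return (strength of evidence, favored model) for K = Pr[data|M_2] / Pr[data|M_1]."""
    if min(K, 0.0) == K:  # i.e. K is zero or negative
        raise ValueError("a Bayes factor must be positive")
    k = max(K, 1.0 / K)  # size of the evidence for whichever model the ratio favors
    favored = "M_2" if k == K else "M_1"
    # bisect_right places a value equal to a cut point in the higher category
    return LABELS[bisect.bisect_right(CUTS, k)], favored

for K in [2.0, 5.0, 0.02, 150.0]:
    print(K, kass_raftery(K))
</pre>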
<p>Since I&#8217;ve titled this post in the form of a tutorial, I&#8217;m now going to go through calculating the Bayes factor with MCMC in Python, which turns out to be a slightly challenging computation (but easy to code; thanks, PyMC!).</p>
<p>To set up the models in the fully Bayesian way, we need some priors, which I&#8217;ve made up below in a convenient, reusable form:</p>
<pre class="brush: python;">
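from numpy import zeros, array, polyval
from pymc import Normal, Gamma, potential, normal_like

class linear:
    def __init__(self, X, Y, order=1, mu_beta=None, sigma_beta=1., mu_sigma=1.):
        if mu_beta is None:
            mu_beta = zeros(order+1)

        # priors: beta ~ N(mu_beta, sigma_beta^2) coefficient-wise, and
        # sigma ~ Gamma(1, rate=1/mu_sigma), i.e. exponential with mean mu_sigma
        self.beta = Normal('beta', mu_beta, sigma_beta**-2)
        self.sigma = Gamma('standard error', 1., 1./mu_sigma)

        @potential
        def data_potential(beta=self.beta, sigma=self.sigma,
                           X=X, Y=Y):
            mu = self.predict(beta, X)
            return normal_like(Y, mu, 1 / sigma**2)
        self.data_potential = data_potential

    def predict(self, beta, X):
        # polyval expects the highest-degree coefficient first
        return polyval(beta, X)

    def logp(self):
        # log-likelihood of the data (not the joint log-probability, which
        # would also include the priors) at every posterior draw
        logp = []
        for beta_val, sigma_val in zip(self.beta.trace(), self.sigma.trace()):
            self.beta.value = beta_val
            self.sigma.value = sigma_val
            logp.append(self.data_potential.logp)
        return array(logp)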
</pre>
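<p>One detail worth checking in the linear class: numpy&#8217;s polyval reads its coefficient vector highest power first.  With order=2, beta[0] therefore multiplies HDI squared, instead of playing the role of the intercept beta_0 in M_1.  With the default mu_beta of zeros, every coefficient has the same prior, so this only changes how the fitted beta should be read.  Here is a minimal numpy check, using made-up coefficient values:</p>
<pre class="brush: python;">
import numpy as np

def m1_mean(beta, hdi):
    """Mean of M_1, with beta = (beta_0, beta_1, beta_2) in the order written in the formula."""
    beta_0, beta_1, beta_2 = beta
    return beta_0 + beta_1 * hdi + beta_2 * hdi**2

beta = np.array([1.0, -2.0, 3.0])  # illustrative coefficients, not fitted values
hdi = np.array([0.5, 0.86, 0.95])

# np.polyval reads its first coefficient as the highest power, so reversing
# beta reproduces M_1; passing beta unreversed would evaluate 1*HDI^2 - 2*HDI + 3
print(np.polyval(beta[::-1], hdi))
print(m1_mean(beta, hdi))
</pre>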
<p>and</p>
<pre class="brush: python;">
from numpy import array
from pymc import Normal, Gamma, potential, normal_like

class piecewise_linear:
    def __init__(self, X, Y, breakpoint=.86,
                 mu_beta=[0., 0., 0.], sigma_beta=1., mu_sigma=1.):
        self.breakpoint = breakpoint
        self.beta = Normal('beta', mu_beta, sigma_beta**-2)
        self.sigma = Gamma('standard error', 1., 1./mu_sigma)

        @potential
        def data_potential(beta=self.beta, sigma=self.sigma,
                           X=X, Y=Y, breakpoint=self.breakpoint):
            mu = self.predict(beta, breakpoint, X)
            return normal_like(Y, mu, 1 / sigma**2)
        self.data_potential = data_potential

    def predict(self, beta, breakpoint, X):
        # beta[0] is the level at the breakpoint; beta[1] and beta[2] are the
        # slopes below and above it (X must be a numpy array)
        very_high_dev_indicator = X &gt;= breakpoint
        mu = (beta[0] + beta[1]*(X-breakpoint)) * (1 - very_high_dev_indicator)
        mu += (beta[0] + beta[2]*(X-breakpoint)) * very_high_dev_indicator
        return mu

    def logp(self):
        # log-likelihood of the data at every posterior draw
        logp = []
        for beta_val, sigma_val in zip(self.beta.trace(), self.sigma.trace()):
            self.beta.value = beta_val
            self.sigma.value = sigma_val
            logp.append(self.data_potential.logp)
        return array(logp)
</pre>
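<p>Because both pieces of M_2 share beta_0, the piecewise mean is continuous at the breakpoint.  One way to make that explicit, and to drop the indicator from predict(), is to write the mean with hinge terms.  The sketch below uses hypothetical function names and illustrative coefficients, and checks numerically that the two forms agree:</p>
<pre class="brush: python;">
import numpy as np

def mean_indicator(beta, breakpoint, X):
    """M_2 mean with an indicator for X at or above the breakpoint, as in predict()."""
    X = np.asarray(X, dtype=float)
    high = np.greater_equal(X, breakpoint)
    return np.where(high, beta[0] + beta[2] * (X - breakpoint),
                    beta[0] + beta[1] * (X - breakpoint))

def mean_hinge(beta, breakpoint, X):
    """Same mean as a sum of hinge terms: beta_0 + beta_1*min(d, 0) + beta_2*max(d, 0)."""
    d = np.asarray(X, dtype=float) - breakpoint
    return beta[0] + beta[1] * np.minimum(d, 0.0) + beta[2] * np.maximum(d, 0.0)

beta = [2.0, -9.7, -2.6]  # illustrative values only
X = np.linspace(0.3, 1.0, 8)
print(mean_hinge(beta, 0.86, X))
</pre>
<p>In the hinge form each slope multiplies a term that is zero on the other side of the breakpoint, so continuity at HDI = .86 holds by construction.</p>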
<p>Then, to estimate the Bayes factor, I can draw samples from the posterior distribution of each model and take the harmonic mean of the data likelihoods at those samples.  This harmonic mean estimates the marginal likelihood Pr[data | M] that appears in <em>K</em>.</p>
<pre class="brush: python;">
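import pymc
from numpy import exp, log
from pymc import MCMC

def bayes_factor(m1, m2, iter=1e6, burn=25000, thin=10, verbose=0):
    """Estimate K = Pr[data | M_2] / Pr[data | M_1] from the harmonic mean
    of the likelihoods at posterior draws (an estimator that can be very unstable)."""
    # draw from each posterior, then evaluate the data log-likelihood at every draw
    MCMC(m1).sample(int(iter), burn=burn, thin=thin, verbose=verbose)
    logp1 = m1.logp()

    MCMC(m2).sample(int(iter), burn=burn, thin=thin, verbose=verbose)
    logp2 = m2.logp()

    # log Pr[data | M] is estimated by -(logsum(-loglik) - log(n))
    K = exp(pymc.flib.logsum(-logp1) - log(len(logp1))
            - (pymc.flib.logsum(-logp2) - log(len(logp2))))

    return K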
</pre>
<p>Unfortunately, it seems to take a prohibitively large number of samples to get the same average twice.</p>
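<p>Part of the trouble is the harmonic mean estimator itself.  Its variance can be infinite, so the estimate keeps jumping whenever a rare low-likelihood draw turns up.  The sketch below shows this on a toy conjugate model where the exact answer is known.  It assumes y_i ~ N(theta, 1) with a N(0, 1) prior on theta, and it draws directly from the exact normal posterior instead of running MCMC.  The data are synthetic, and the function names are my own.</p>
<pre class="brush: python;">
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def harmonic_mean_log_evidence(loglik):
    """Harmonic mean estimate of log Pr[data | M] from log-likelihoods at posterior draws."""
    # 1 / Pr[data | M] is estimated by the sample mean of 1 / Pr[data | theta_i]
    return -log_mean_exp(-np.asarray(loglik, dtype=float))

def loglik(theta, y):
    """Log-likelihood of y_i ~ N(theta, 1) for each value in theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - theta) ** 2, axis=1)

def exact_log_evidence(y, tau0=1.0):
    """Closed form: with theta ~ N(0, tau0^2), y is N(0, I + tau0^2 * ones)."""
    n = len(y)
    cov = np.eye(n) + tau0**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=20)  # synthetic data, not the HDI/TFR data

# exact conjugate posterior for tau0 = 1: precision 1 + n, mean sum(y) / precision
post_prec = 1.0 + len(y)
post_mean = y.sum() / post_prec
for n_draws in [1_000, 100_000]:
    draws = rng.normal(post_mean, 1.0 / np.sqrt(post_prec), size=n_draws)
    print(n_draws, harmonic_mean_log_evidence(loglik(draws, y)))
print("exact", exact_log_evidence(y))
</pre>
<p>More draws typically move the estimate toward the exact value, but slowly and erratically.  The estimate also tends to overstate the evidence, because the rare draws that dominate 1/Pr[data | theta] are undersampled.</p>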
<p>To make it all run, I&#8217;ve made a little module to load the data, but I won&#8217;t bore you with the details;  it&#8217;s <a href="http://github.com/aflaxman/pymc_bayes_factor/tree/master">online here</a> if you want to play around with it yourself.</p>
<pre class="brush: python;">
&gt;&gt;&gt; import data, models, model_selection
&gt;&gt;&gt; m1 = models.linear(X=data.hdi, Y=data.tfr, order=2)
&gt;&gt;&gt; m2 = models.piecewise_linear(X=data.hdi, Y=data.tfr, mu_beta=[1,-10,1])
&gt;&gt;&gt; model_selection.bayes_factor(m1, m2, iter=1e7)
</pre>
<p>According to the Bayes factor, the piecewise linear model is (to be filled in soon) better than the quadratic model.  Or, more quantitatively, the observed data is (tba) times more likely under model 2 than model 1.  Cool!</p>
<p>As a side-effect, this yields an alternative way to ask if the association on the &#8220;high development&#8221; piece of the piecewise model is really positive:</p>
<pre class="brush: python;">
&gt;&gt;&gt; from pymc import MCMC
&gt;&gt;&gt; m2 = models.piecewise_linear(X=data.hdi, Y=data.tfr, breakpoint=.86, mu_beta=[1,-8,1], sigma_beta=1., mu_sigma=.1)
&gt;&gt;&gt; MCMC(m2).sample(iter=1000*1000+20000, thin=1000, burn=20000, verbose=1)
&gt;&gt;&gt; m2.beta.stats()['95% HPD interval']
array([[ 1.91520875,  2.01669476],
       [-9.84036014, -9.48869047],
       [-3.85410676, -1.23869569]])
&gt;&gt;&gt; m2.beta.stats()['mean']
</pre>
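<p>The posterior draws can also answer the &#8220;is it really positive?&#8221; question directly, as the share of draws of beta[2] above zero.  Here is a sketch of that, together with a common sample-based HPD approximation: the shortest interval covering 95% of the sorted draws.  This approximation may differ in details from how PyMC computes its interval.  The sketch runs on synthetic normal draws.  With PyMC2, the real draws should be available as m2.beta.trace()[:, 2], since the models above already call trace().</p>
<pre class="brush: python;">
import numpy as np

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing a fraction `mass` of the draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n))              # number of draws the interval must cover
    widths = x[k - 1:] - x[:n - k + 1]      # width of each run of k consecutive draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def prob_positive(draws):
    """Estimated posterior probability that the parameter is above zero."""
    return float(np.mean(np.greater(draws, 0.0)))

rng = np.random.default_rng(1)
fake_draws = rng.normal(-2.5, 0.7, size=50_000)  # synthetic stand-in for beta[2] draws
print(hpd_interval(fake_draws), prob_positive(fake_draws))
</pre>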
<p>The research questions for the computer scientist in me are: did I draw enough samples to get a correct answer, and did I really need to draw that many?</p>
<p>The tentative answers are no and no!  See the comments for leads on more efficient schemes.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/08/25/mcmc-in-python-pymc-for-bayesian-model-selection/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr.png" medium="image" />

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr2.png" medium="image" />

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/08/hdi_v_tfr31.png?w=300" medium="image" />
	</item>
		<item>
		<title>August is Too-Many-Projects Month</title>
		<link>http://healthyalgorithms.wordpress.com/2009/08/14/august-is-too-many-projects-month/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/08/14/august-is-too-many-projects-month/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 18:20:58 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[TCS]]></category>
		<category><![CDATA[combinatorial optimization]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[networkx]]></category>
		<category><![CDATA[personalized page rank]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=596</guid>
		<description><![CDATA[(Tap&#8230; tap&#8230; tap&#8230; is this thing on?  Good.)
July was vacation month, where I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don&#8217;t like biking on the side of a highway all day.  Don&#8217;t worry, I escaped in Coos Bay and took trains and buses between Eugene, [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>(Tap&#8230; tap&#8230; tap&#8230; is this thing on?  Good.)</p>
<p>July was vacation month: I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don&#8217;t like biking on the side of a highway all day.  Don&#8217;t worry, I escaped in Coos Bay and took trains and buses between Eugene, Santa Cruz, Berkeley, and SF for a vacation more my speed.</p>
<p>But now that I&#8217;m back, August is turning out to be project month.  I have 3 great TCS applications to global health in the pipeline, and I have big plans to tell you about them soon.  But one mixed blessing about these applications is that people actually want to see the results, like, yesterday!  So first I have to deal with the results, and then I can write papers and blogs about the techniques.</p>
<p>Since Project Month is a little over-booked, I&#8217;m going to have to triage one project today.  You&#8217;ve heard of the Netflix Prize, right?  Well, github.com is running a <a href="http://contest.github.com/">smaller-scale recommendation contest</a>, and I was messing around with personalized PageRank, which seems like a fine approach for recommending code repositories to hackers.  I haven&#8217;t got it working very well (best result: 15% of the holdout set recovered), but I was having fun with it.  Maybe someone else will take it up; let me know if you get it to work.  networkx + data = good times.</p>
<pre class="brush: python;">
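import networkx as nx
G = nx.Graph()  # bipartite graph: users on one side, repos on the other
with open('download/data.txt') as f:
    for l in f:  # each line has the form "user_id:repo_id"
        u_id, r_id = l.strip().split(':')
        # tagged tuples stand in for the user() and repo() helpers of the
        # original snippet, so user 7 and repo 7 stay distinct nodes
        G.add_edge(('user', u_id), ('repo', r_id))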
</pre>
<p>[<a href="http://github.com/aflaxman/ppr-github-contest/tree/master">get the code</a>]</p>
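<p>For anyone who wants to pick this up, here is a minimal sketch of the scoring step.  It is not the code from the linked repository.  It computes personalized PageRank by power iteration, where every random restart jumps back to the user being served.  It then ranks the repos that user is not already watching by their score.  It assumes nodes are ('user', id) and ('repo', id) tuples, and recommend() is a hypothetical helper.  networkx&#8217;s own pagerank function also accepts a personalization dictionary, which would be the natural choice on the full graph.</p>
<pre class="brush: python;">
from collections import defaultdict

def personalized_pagerank(edges, source, alpha=0.85, iters=100):
    """PageRank on an undirected graph where every restart jumps back to `source`.
    Assumes `source` appears in at least one edge."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    rank = {node: 0.0 for node in nbrs}
    rank[source] = 1.0
    for _ in range(iters):
        new = {node: 0.0 for node in nbrs}
        new[source] = 1.0 - alpha  # restart mass always returns to the source user
        for node, r in rank.items():
            share = alpha * r / len(nbrs[node])  # spread the rest evenly over neighbors
            for nb in nbrs[node]:
                new[nb] += share
        rank = new
    return rank

def recommend(edges, user, k=10, **kw):
    """Highest-ranked repos that `user` is not already watching (hypothetical helper)."""
    rank = personalized_pagerank(edges, user, **kw)
    watched = {repo for u, repo in edges if u == user}
    candidates = [(r, node) for node, r in rank.items()
                  if node[0] == 'repo' and node not in watched]
    return [node for r, node in sorted(candidates, reverse=True)[:k]]

edges = [(('user', '1'), ('repo', '10')), (('user', '1'), ('repo', '11')),
         (('user', '2'), ('repo', '11')), (('user', '2'), ('repo', '12')),
         (('user', '3'), ('repo', '13'))]
print(recommend(edges, ('user', '1'), k=2))
</pre>
<p>Keeping the restart distribution on a single user is what makes the scores personal.  Repos that share watchers with the user&#8217;s repos get mass, and repos in a disconnected part of the graph, like repo 13 here, get none.</p>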
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/08/14/august-is-too-many-projects-month/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>
	</item>
		<item>
		<title>ID Modeling Summer School</title>
		<link>http://healthyalgorithms.wordpress.com/2009/06/25/id-modeling-summer-school/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/06/25/id-modeling-summer-school/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 20:04:11 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[education]]></category>
		<category><![CDATA[global health]]></category>
		<category><![CDATA[infectious disease]]></category>
		<category><![CDATA[pet peeves]]></category>
		<category><![CDATA[sample binomial]]></category>
		<category><![CDATA[summer school]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[wolfram alpha]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=589</guid>
		<description><![CDATA[I&#8217;ve been spending the week at the Infectious Disease Modeling Summer School here at UW.  It&#8217;s very interesting, and good for me to learn more about how people in my new field think (especially people in my new field, outside of my little institute&#8230;)
I&#8217;ve discovered a pet peeve during this week of presentations, though. [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><a href="http://depts.washington.edu/sismid09"><img src="http://healthyalgorithms.files.wordpress.com/2009/06/sph1.png?w=105&#038;h=74" alt="" title="" width="105" height="74" class="alignright size-full wp-image-591" /></a>I&#8217;ve been spending the week at the <a href="http://depts.washington.edu/sismid09/">Infectious Disease Modeling Summer School</a> here at UW.  It&#8217;s very interesting, and good for me to learn more about how people in my new field think (especially people in my new field, <em>outside</em> of my little institute&#8230;)</p>
<p>I&#8217;ve discovered a pet peeve during this week of presentations, though.  I&#8217;ve seen a lot of numerical examples where the numbers work out perfectly&#8230; a little too perfectly.  If you split 1000 people into an experimental and a control group by choosing a random subset of 500, fine.  But if you look within each group to see how many have a trait that occurs independently with probability 0.2, you will rarely find exactly 100 in group A and 100 in group B.  I think a little more complexity in the numbers makes the example <em>easier</em> to understand.</p>
<p>I&#8217;m sure that you, my loyal reader, can generate random numbers from a multitude of distributions, if you wanted to spend the time.  But if you&#8217;re busy, busy, busy, then you can have wolfram alpha do all the work.  It actually comes through for that one:  &#8220;<a href="http://www83.wolframalpha.com/input/?i=sample+Binomial(500%2C+.2)">sample Binomial(500, .2)</a>&#8220;.</p>
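<p>The arithmetic behind the pet peeve is easy to check in Python, too.  Assume the two groups of 500 are disjoint and each person has the trait independently with probability 0.2.  Then each group&#8217;s count is Binomial(500, 0.2).  The sketch below computes the chance of exactly 100 in one group and in both, and draws a sample as the Wolfram Alpha query does.  It needs Python 3.8 or later for math.comb.</p>
<pre class="brush: python;">
import math
import numpy as np

def binom_pmf(k, n, p):
    """Exact probability of k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# chance that one group of 500 has exactly 100 people with the trait
p_exact = binom_pmf(100, 500, 0.2)
# the two groups are disjoint, so their counts are independent Binomial(500, 0.2)
p_both = p_exact**2
print(round(p_exact, 4), round(p_both, 5))

# the Python analogue of "sample Binomial(500, .2)", one draw per group
rng = np.random.default_rng()
print(rng.binomial(500, 0.2, size=2))
</pre>
<p>The first probability comes out at roughly 4.5%, so exactly 100 in both groups happens only about 0.2% of the time.</p>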
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/06/25/id-modeling-summer-school/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/06/sph1.png" medium="image" />
	</item>
		<item>
		<title>US Health Care Costs, cont.</title>
		<link>http://healthyalgorithms.wordpress.com/2009/06/24/us-health-care-costs-cont/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/06/24/us-health-care-costs-cont/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 02:52:26 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[Mysteries]]></category>
		<category><![CDATA[global health]]></category>
		<category><![CDATA[costs]]></category>
		<category><![CDATA[end-of-life care]]></category>
		<category><![CDATA[gawande]]></category>
		<category><![CDATA[medicare]]></category>
		<category><![CDATA[public health]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=574</guid>
		<description><![CDATA[I wrote two months ago about the mysterious differences in health care costs that I found so intriguing in a talk by Jonathan Skinner.  (That was two months ago?  Really?)  Since then, the surgeon/author Atul Gawande has brought the mystery to the national stage.  In a long story for the New [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I wrote <a href="http://healthyalgorithms.wordpress.com/2009/04/18/mysterious-question-differences-in-health-care-costs/">two months ago</a> about the mysterious differences in health care costs that I found so intriguing in a talk by Jonathan Skinner.  (That was <em>two months</em> ago?  Really?)  Since then, the surgeon/author Atul Gawande has brought the mystery to the national stage.  In <a href="http://www.newyorker.com/reporting/2009/06/01/090601fa_fact_gawande">a long story for the New Yorker</a>, he gave the non-technical version of Skinner&#8217;s talk, and <a href="http://www.newyorker.com/online/blogs/newsdesk/2009/06/atul-gawande-the-cost-conundrum-redux.html">today he addressed some of the feedback</a> that this article has received over the last month.</p>
<p>His short answer to the mystery is this:</p>
<blockquote><p>Analysis of Medicare data by the Dartmouth Atlas project shows the difference is due to marked differences in the amount of care ordered for patients—patients in McAllen receive vastly more diagnostic tests, hospital admissions, operations, specialist visits, and home nursing care than in El Paso.</p></blockquote>
<p>But that is not the end of the story.  It only takes a sentence to explain the &#8220;proximal&#8221; cause of these cost differences, but it takes the whole article for Gawande to do justice to his theory on the underlying cause, and his is certainly not the only theory.</p>
<p>Since his theory of the root cause of this inequality centers on physicians putting profit over patients, it has made some doctors uneasy.  Greg Roth, a physician I work with, hadn&#8217;t had time to read the article when we last chatted, but he did attend Skinner&#8217;s talk with me two months ago.  Greg told me about a detail that has emerged as doctors put Gawande&#8217;s article under their microscopes:  we might be making a mountain out of a molehill-sized mystery.</p>
<p>Look at this plot, which shows the complementary cumulative distribution function for the primary quantity in Gawande&#8217;s article, <a href="http://www.dartmouthatlas.org/data/download.shtm">Total Medicare reimbursements per enrollee for 2006</a>.</p>
<p><a href="http://healthyalgorithms.files.wordpress.com/2009/06/reimb_dist.png"><img src="http://healthyalgorithms.files.wordpress.com/2009/06/reimb_dist.png?w=500&#038;h=320" alt="" title="" width="500" height="320" class="aligncenter size-full wp-image-578" /></a></p>
<p>Investigative reporters have to get the story, and raking the muck way out in the tail of this distribution turned out to be a good bet this time.  But McAllen is 6 standard deviations above the mean (not to imply that this distribution is normal&#8230;  should it be?)  How much impact would it have, for the whole population, if the outliers were greatly improved?</p>
<p>If through anti-fraud policing, better culture, and general hard work, the top 10% of hospitals reduced their cost per patient to the national average, that would reduce the average cost by 3.6%.  Outliers show what is possible, but making a big change involves more than outliers.</p>
<p><a href="http://healthyalgorithms.files.wordpress.com/2009/06/reimb_dist_hypo.png"><img src="http://healthyalgorithms.files.wordpress.com/2009/06/reimb_dist_hypo.png?w=500&#038;h=516" alt="" title="" width="500" height="516" class="aligncenter size-full wp-image-581" /></a></p>
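<p>The what-if calculation above is simple to reproduce.  Here is a sketch, assuming one average cost per hospital (or region) in an array, with optional enrollee weights for the averages.  The function name is hypothetical, and the toy input is not the Dartmouth Atlas data, so it will not reproduce the 3.6% figure.</p>
<pre class="brush: python;">
import numpy as np

def pct_reduction_if_top_capped(costs, frac=0.10, weights=None):
    """Percent drop in the average cost if every unit in the top `frac` of the
    cost distribution were brought down to the current average."""
    costs = np.asarray(costs, dtype=float)
    avg = np.average(costs, weights=weights)
    # "top 10%" is taken by count of units, even when weights are given
    cutoff = np.quantile(costs, 1.0 - frac)
    top = np.greater(costs, cutoff)  # units strictly above the cutoff
    capped = np.where(top, np.minimum(costs, avg), costs)
    new_avg = np.average(capped, weights=weights)
    return 100.0 * (avg - new_avg) / avg

# toy data, not the Dartmouth Atlas figures
print(pct_reduction_if_top_capped(np.arange(1.0, 11.0)))
</pre>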
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/06/24/us-health-care-costs-cont/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/06/reimb_dist.png" medium="image" />

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/06/reimb_dist_hypo.png" medium="image" />
	</item>
		<item>
		<title>Population Health in Iran</title>
		<link>http://healthyalgorithms.wordpress.com/2009/06/18/population-health-in-iran/</link>
		<comments>http://healthyalgorithms.wordpress.com/2009/06/18/population-health-in-iran/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 16:13:22 +0000</pubDate>
		<dc:creator>Abraham Flaxman</dc:creator>
				<category><![CDATA[global health]]></category>
		<category><![CDATA[gbd]]></category>

		<guid isPermaLink="false">http://healthyalgorithms.wordpress.com/?p=567</guid>
		<description><![CDATA[The political situation in Iran has been in the news and on the nets a lot this week.  I hope that the friends and families of all my Iranian colleagues are safe.  I&#8217;m thinking of you.

I thought Obama&#8217;s statement was pretty astute, when he said, 
It’s not productive, given the history of US-Iranian [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The political situation in Iran has been in the news and on the nets a lot this week.  I hope that the friends and families of all my Iranian colleagues are safe.  I&#8217;m thinking of you.</p>
<p><a href="http://www.flickr.com/photos/fhashemi/3636326778/in/set-72157619758530748"><img src="http://healthyalgorithms.files.wordpress.com/2009/06/3636326778_7cbf8e2313.jpg?w=500&#038;h=333" alt="" title="" width="500" height="333" class="alignleft size-full wp-image-568" /></a><span id="more-567"></span></p>
<p>I thought Obama&#8217;s statement was pretty astute, when he said, </p>
<blockquote><p>It’s not productive, given the history of US-Iranian relations, to be seen as meddling, the US President meddling in Iranian elections.</p></blockquote>
<p>I remember learning from some Iranian computer scientist that his impression of the popular narrative of how the Shah returned to power in the 1950s differed <a href="http://en.wikipedia.org/wiki/1953_Iranian_coup_d%27%C3%A9tat">from my impression</a> in some critical details.</p>
<p>By coincidence, this week also sees the preliminary publication of the <a href="http://www.pophealthmetrics.com/content/7/1/9">Iran National Burden of Disease 2003 Study</a>.  This is a national-scale version of the big project that I have been working on for many months.  I plan to write more about the global version soon, especially the challenges for theoretical computer science that you might want to be working on.</p>
<p><a href="http://www.pophealthmetrics.com/content/pdf/1478-7954-7-9.pdf"><img src="http://healthyalgorithms.files.wordpress.com/2009/06/nbd-dalys-iran.png?w=500&#038;h=267" alt="" title="" width="500" height="267" class="aligncenter size-full wp-image-569" /></a></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://healthyalgorithms.wordpress.com/2009/06/18/population-health-in-iran/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">Abraham Flaxman</media:title>
		</media:content>

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/06/3636326778_7cbf8e2313.jpg" medium="image" />

		<media:content url="http://healthyalgorithms.files.wordpress.com/2009/06/nbd-dalys-iran.png" medium="image" />
	</item>
	</channel>
</rss>