I might have mentioned that I got to do some world traveling for my work recently. Seeing rural Tanzania was an experience that I still don’t really have good words to describe. But this is not a post about that. This is a post about a sticky idea I got stuck on in some science fiction I was reading during my multi-day to and fro travel.
On my around-the-world-in-4.5-days journey, I read the Jewish feminist sci-fi novel He, She, and It by Marge Piercy. It’s got a classic hard AI theme, about a robot that is so, so human… I’d recommend it. But the dilemma of whether a robot can make a minyan in the Reform tradition of 2059 has not stuck in my mind the way this one line about whales has: Keep reading →
Too bad for me, my first global health paper will have to be revised and resubmitted. In addition to some more substantive objections, the negative reviewer said “It is unclear what software was used to carry out the Bayesian estimation by MCMC. This is not possible in STATA and would be extremely difficult in the scripting language, Python.” It was difficult in Python! I doubt that any software would make it much easier, though.
Would you like to work with me applying computational algorithms to challenges in global health metrics? Then apply for the IHME post-graduate fellowship. Deadline is Feb 15.
(There is also a “pre-graduate” version, for those who have not started graduate school yet.)
I’m updating my CV, and that reminded me that I meant to promote this cool clustering technique that I was a little bit involved in, Clustering With Shallow Trees.
This goes way back to about halfway through my post-doc at MSR, when statistical physicist Riccardo Zecchina was visiting for a semester, and was teaching me about all of the “intractable” optimization problems that he can solve using his panoply of propagation algorithms. In particular, he was working on algorithms for certain types of Steiner tree optimization, and he had discovered that adding an extra constraint on the depth of the tree didn’t make the problem harder. (All variants of the problem he considers are NP-hard, but some are NP-harder than others.) Keep reading →
I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.
Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.
Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.
As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut. Keep reading →
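A minimal sketch of what generating such an instance might look like, assuming a "planted" setup: a sparse random graph with a hidden set of k vertices that are joined to each other with higher probability. The function names, parameter values, and the planted distribution itself are illustrative assumptions here; the distribution used in the paper may differ.

```python
import random
from itertools import combinations


def planted_dense_subgraph(n, k, p, q, seed=None):
    """Return (edges, planted) for a random graph on vertices 0..n-1.

    Hypothetical setup: each pair of vertices is joined with probability p,
    except pairs inside a hidden set of k vertices, which are joined with
    probability q.
    """
    rng = random.Random(seed)
    planted = set(rng.sample(range(n), k))
    edges = []
    for u, v in combinations(range(n), 2):
        prob = q if (u in planted and v in planted) else p
        if rng.random() < prob:
            edges.append((u, v))
    return edges, planted


def edge_fraction(edges, vertices):
    """Fraction of the possible pairs among `vertices` that are edges."""
    vertices = set(vertices)
    inside = sum(1 for u, v in edges if u in vertices and v in vertices)
    possible = len(vertices) * (len(vertices) - 1) // 2
    return inside / possible


# Illustrative parameter values, not taken from any paper.
edges, planted = planted_dense_subgraph(n=200, k=20, p=0.05, q=0.5, seed=1)
print(len(edges))
print(edge_fraction(edges, planted), edge_fraction(edges, range(200)))
```

The optimization challenge is then to take only `edges` and recover a set of k vertices with as many induced edges as possible, without peeking at `planted`.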
Wow, where does the day go? I spent all my non-meeting time debugging something. At least I fixed it before 5 PM.
The details of the problem are boring, but the whole ordeal could have been avoided if I had just followed the two rules of optimizing software in my Generic Disease Modeling System. What are they?
First Rule of Program Optimization: Don’t do it
Second Rule of Program Optimization (for experts only!): Don’t do it yet
Maybe next week I’ll get a second to write about the good kind of optimization; my statistical physics friends have posted an article on the arxiv which I am a co-author on, about an application of bounded-depth minimum spanning trees, Clustering with Shallow Trees.
This weekend marks the submission of my first “Global Health” paper. Congratulations to me! And many, many thanks to all the people who have worked with me to make it happen. I’ll go into details sometime in the future, but first let me see how things go in the refereeing process.
While I was over-working on that business, I got an interesting Call-for-Papers forwarded from global health/AI researcher Emma Brunskill. The AAAI Spring Symposium on Artificial Intelligence for Development (AI-D) is an effort to build a community of people applying computer science and artificial intelligence in less-developed settings.
TCS people, don’t let the “AI” in their title turn you off. Eric Horvitz says that this is for all of us. Keep reading →
I don’t feel like having that post about how big things are brewing in US health care reform on the top of my blog anymore, so here is a quick replacement: a ranking paper that caught my eye recently on arxiv, where computer science is applied to politics: On Ranking Senators By Their Votes, by my fellow CMU alum, Mugizi Rwebangira (@rweba on twitter).
Whoops, I got busy again and didn’t have time to make new pictures of TFR vs HDI for Rif and Tanja, let alone fix the Bayes factor estimation code or implement the nested sampling version (which I think will be the cool way to estimate evidence). But coming soon: How MCMC is tying my new work in Health Metrics to my education in Operations Research. That will be in two weeks, at best.
Until then, here is some light reading to get ready for a big week of US healthcare reform debate: Get Sick, Get Out, a survey conducted by lawyers interested in catastrophic medical payments and their connection to housing foreclosures. It’s 40 pages long, but it’s in legal-journal format, where they have like 10 words per page if you skip the footnotes. From the abstract:
Half of all respondents (49%) indicated that their foreclosure was caused in part by a medical problem, including illness or injuries (32%), unmanageable medical bills (23%), lost work due to a medical problem (27%), or caring for sick family members (14%).
I’m excited for the next week of healthcare reform debates. When my most jaded friends are forwarding me Moveon.org videos (and I’m listening to 4 minutes of recent REM), I know something unusual is going on.
I never took a statistics class, so I only know the kind of statistics you learn on the street. But now that I’m in global health research, I’ve been doing a lot of on-the-job learning. This post is about something I’ve been reading about recently: how to decide if a simple statistical model is sufficient or if the data demands a more complicated one. To keep the matter concrete (and controversial) I’ll focus on a claim from a recent paper in Nature that my colleague, Haidong Wang, chose for our IHME journal club last week: Advances in development reverse fertility declines. The title of this short letter boldly claims a causal link between total fertility rate (an instantaneous measure of how many babies a population is making) and the human development index (a composite measure of how “developed” a country is, on a scale of 0 to 1). Exhibit A in their case is the following figure:
An astute observer of this chart might ask, “what’s up with the scales on those axes?” But this post is not about the visual display of quantitative information. It is about deciding if the data has the piecewise linear relationship that Myrskylä et al. claim, and doing it in a Bayesian framework with Python and PyMC. But let’s start with a figure where the axes have a familiar linear scale! Keep reading →
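For a rough feel of this kind of model comparison without any MCMC, here is a minimal sketch that swaps in the Bayesian information criterion (BIC), a crude large-sample stand-in for a Bayes factor. It is not the PyMC approach of the post. It assumes Gaussian noise, fits the "piecewise" model as two unconnected lines split at the best breakpoint, and runs on synthetic data made up for illustration (not the fertility data); all function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns the residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def bic(rss, n, k):
    """BIC (up to a constant) for a Gaussian-error model with k parameters."""
    return n * math.log(rss / n) + k * math.log(n)


def compare(xs, ys, min_points=5):
    """Return (BIC of one line, best BIC of two lines split at a breakpoint)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    # One line: intercept, slope, noise variance.
    bic_linear = bic(fit_line(xs, ys), n, 3)
    best = math.inf
    for i in range(min_points, n - min_points + 1):
        left, right = pairs[:i], pairs[i:]
        rss = fit_line(*zip(*left)) + fit_line(*zip(*right))
        # Two lines (4 parameters), breakpoint, noise variance.
        best = min(best, bic(rss, n, 6))
    return bic_linear, best


# Synthetic data: a decline that reverses at x = 0.5, plus Gaussian noise.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [(-2 * x if x < 0.5 else -1 + 2 * (x - 0.5)) + rng.gauss(0, 0.1)
      for x in xs]

bic_linear, bic_piecewise = compare(xs, ys)
print(bic_linear, bic_piecewise)  # the lower BIC is the preferred model
```

The parameter-count penalty is what keeps the more flexible model from winning automatically: the two-line fit always has a smaller residual, so it has to improve the fit by enough to pay for its extra parameters.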