Yale School of Management

Research Overview

My research uses statistical machine learning ideas to improve the operations of complex systems. Drawing on actual questions faced by the medical profession, I develop data-driven tools for managing healthcare delivery that also have applications to other areas like fintech and marketing.


Machine learning for healthcare queueing data

In an emergency department, how many beds are needed in an auxiliary ward to handle demand during peak times? What is the impact on societal welfare if a new transplant organ allocation policy is adopted? From the organ transplant candidate’s viewpoint, should they accept or decline a particular donor organ? These questions can be answered by modelling the emergency department and the organ transplant waitlist as queueing networks. The abundance of sojourn data captured by modern IT systems makes it possible to accurately specify the arrival and transition processes of such queues in a data-driven way. In ongoing work, we study some of the questions above using novel machinery that we develop for inferring queueing dynamics.


A practical algorithm for hazard regression that generalizes all semiparametric models, including the venerated Cox proportional hazards and Aalen additive hazards models.


Medical outcomes prediction and evaluation

Patients near the end of life are often better served by hospice care than by aggressive and unnecessary therapy, so accurate predictions of survival are desirable for care planning. With the growth of Electronic Medical Records (EMR), machine learning techniques can now be applied to high-frequency medical data to identify high-risk patients. We develop such a tool for cancer patients, and find that it can improve quality of residual life while also saving unnecessary medical costs. Our team is currently adapting the method to create early warning systems for other areas of medicine as well.


Equally important to predicting future events is the study of past outcomes: Evidence-based medicine is only possible if sound causal inference methods exist for determining if the treatment and control groups differ in outcomes. However, the lack of information sometimes complicates things: For example, are needle exchange programs effective in checking the spread of HIV among drug users? Simply examining the proportion of returned needles that are infected may underestimate the population infection rate if uninfected needles are more likely to be brought in for exchange. Inspired by the ideas behind partial identification and robust optimization, we show that the outcome of interest can nonetheless be identified up to an interval. The upper or lower bound then provides efficient evaluation of the efficacy of a new medical treatment, sometimes reducing a two-sided p-value from 0.08 to 0.05.
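To make the idea of identifying a quantity "up to an interval" concrete, here is a minimal sketch of the textbook worst-case bound from the partial identification literature, applied to the needle example. It is an illustration only and not necessarily the method used in the work described above. The function name, its parameters, and the numbers in the usage example are all hypothetical.

```python
def worst_case_bounds(share_returned, infected_share_among_returned):
    """Worst-case bounds on the infection rate among ALL needles.

    Assumption (illustrative only): we observe the fraction of needles that
    are returned and the infected share among those returned, and we know
    nothing about the needles that are never returned.
    """
    for value in (share_returned, infected_share_among_returned):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares must lie in [0, 1]")

    # Infected needles that we actually observe, as a share of all needles.
    observed_infected = share_returned * infected_share_among_returned

    # Lower bound: none of the unreturned needles is infected.
    lower = observed_infected
    # Upper bound: every unreturned needle is infected.
    upper = observed_infected + (1.0 - share_returned)
    return lower, upper


# Hypothetical numbers: 80% of needles come back, 10% of those are infected.
lower, upper = worst_case_bounds(0.8, 0.1)
print(f"population infection rate lies in [{lower:.2f}, {upper:.2f}]")
```

The interval is wide when few needles are returned and collapses to a point when all of them are. Additional assumptions about who returns needles are what would tighten it.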


Resolves a well-known 90-year-old statistical practice used in randomized experiments, proposed by Neyman in 1923 (of Neyman-Pearson fame). The solution yields smaller p-values when testing for a difference in means between the treatment and control groups, sometimes reducing a two-sided p-value from 0.08 to 0.05.
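For readers unfamiliar with the classical practice referred to here, the sketch below shows the standard difference-in-means test with Neyman's conservative variance estimate, which is the baseline and not the sharper solution described above. The function name, the normal approximation for the p-value, and the sample data are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def neyman_difference_in_means(treated, control):
    """Classical difference-in-means test for a randomized experiment.

    Uses Neyman's conservative variance estimate s1^2/n1 + s0^2/n0 and a
    normal approximation for the two-sided p-value (an assumption here).
    """
    n1, n0 = len(treated), len(control)
    if n1 < 2 or n0 < 2:
        raise ValueError("each group needs at least two observations")

    diff = mean(treated) - mean(control)
    # Sample variances (n - 1 denominator), one per group.
    var_hat = variance(treated) / n1 + variance(control) / n0
    if var_hat == 0:
        raise ValueError("both groups are constant; the test is undefined")

    std_err = math.sqrt(var_hat)
    z = diff / std_err
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, std_err, p_value


# Hypothetical outcomes for a small experiment.
diff, std_err, p_value = neyman_difference_in_means([5, 6, 7, 8], [3, 4, 5, 6])
print(f"difference = {diff:.2f}, std. error = {std_err:.3f}, p = {p_value:.3f}")
```

Because this variance estimate is conservative, the standard error tends to be too large and the p-value too high. A tighter variance bound shrinks the standard error, which is how the same data can produce a smaller p-value.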


Financial incentives in healthcare

Healthcare spending in the U.S. continues to outstrip inflation, so there is a need for new payment models and more efficient practices to alleviate the financial pressure on payers, providers, and patients. One approach we consider is for dialysis providers to overbook a small amount of capacity to increase earnings without reducing patient access. In another study, we take the payer’s perspective and develop a data-driven pay-for-performance system for Medicare's dialysis program. We find that the system can potentially extend a patient’s lifespan by two weeks per year, cut Medicare expenditures, and also increase provider earnings.


Taking the consumer’s viewpoint, in recent work we examine how households can use Health Savings Accounts (HSAs) to maximize tax efficiency. Using a personalized health cost evolution model that we develop, we find that households can save anywhere from 1 to 27 percent in costs.