Yale School of Management

Research overview

My research develops data-driven models for healthcare delivery systems to help improve their efficiency. My work blends ideas from healthcare operations and modern statistical learning, and the specific problems I study are all driven by actual questions faced by physicians and policymakers.


Machine learning for healthcare queueing data

In an emergency department, how many beds are needed in an auxiliary ward to handle demand during peak times? What is the impact on societal welfare if a new transplant organ allocation policy is adopted? From the organ transplant candidate’s viewpoint, should they accept or decline a particular donor organ? These questions can be answered by modelling the emergency department and the organ transplant waitlist as queueing networks. The abundance of sojourn data captured by modern IT systems makes it possible to accurately specify the arrival and transition processes of such queues in a data-driven way. In ongoing work, we study some of the questions above using novel machinery that we develop for inferring queueing dynamics.


A practical algorithm for hazard regression that generalizes all semiparametric models, including the venerated Cox proportional-hazards and Aalen additive-hazards models.
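
The bed-sizing question raised above can be illustrated with a textbook calculation. The sketch below is not the inference machinery described in this section; it assumes a much simpler Erlang loss model (Poisson arrivals, patients turned away when every bed is full), and the function names and input values are hypothetical.

```python
def erlang_b(offered_load, beds):
    """Blocking probability of an Erlang loss system (M/M/c/c).

    Uses the standard recursion B(0) = 1,
    B(c) = a * B(c-1) / (c + a * B(c-1)), where a is the offered load.
    """
    blocking = 1.0
    for c in range(1, beds + 1):
        blocking = offered_load * blocking / (c + offered_load * blocking)
    return blocking


def beds_needed(arrival_rate, mean_stay, max_blocking):
    """Smallest number of beds whose blocking probability is within target.

    arrival_rate and mean_stay must use the same time unit (assumed inputs).
    """
    if not 0.0 < max_blocking < 1.0:
        raise ValueError("max_blocking must be strictly between 0 and 1")
    offered_load = arrival_rate * mean_stay  # average number of beds in demand
    beds = 0
    while erlang_b(offered_load, beds) > max_blocking:
        beds += 1
    return beds


# Hypothetical peak period: 2 arrivals per hour, 1-hour mean stay,
# and at most half of arriving patients may find the ward full.
print(beds_needed(arrival_rate=2.0, mean_stay=1.0, max_blocking=0.5))
```

A data-driven analysis would replace the fixed arrival rate and mean stay with quantities estimated from sojourn records, which is where the assumptions of this simple model stop holding.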


Medical outcomes prediction and evaluation

Patients near the end of life are often better served by hospice care than by aggressive and unnecessary therapy, hence accurate predictions of survival are desirable for care planning. With the growth of Electronic Medical Records (EMR), machine learning techniques can now be applied to high-frequency medical data to identify high-risk patients. We develop such a tool for cancer patients, and find that it can improve quality of residual life while also saving unnecessary medical costs. Our team is currently adapting the method to create early warning systems for other areas of medicine as well.


As important as predicting future events is the study of past outcomes: evidence-based medicine is only possible if sound causal inference methods exist for determining whether the treatment and control groups differ in outcomes. However, missing information sometimes complicates matters. For example, are needle exchange programs effective in checking the spread of HIV among drug users? Simply examining the proportion of returned needles that are infected may underestimate the population infection rate if uninfected needles are more likely to be brought in for exchange. Inspired by the ideas behind partial identification and robust optimization, we show that the outcome of interest can nonetheless be identified up to an interval. The upper or lower bound then provides efficient evaluation of the efficacy of a new medical treatment, sometimes reducing a one-sided p-value from 0.14 to 0.05.
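
The needle-exchange reasoning above can be made concrete with a small worked sketch. It is not the estimator from the papers listed below; it assumes a deliberately simple selection model in which an uninfected needle is at most `max_return_ratio` times as likely to be returned as an infected one. That parameter, the function name, and the input values are all hypothetical.

```python
def infection_rate_bounds(observed_rate, max_return_ratio):
    """Interval for the population infection rate under bounded selection.

    Assumed model (illustrative only): infected needles are returned with
    probability p1, uninfected ones with probability p0, and the unknown
    ratio r = p0 / p1 lies in [1, max_return_ratio].
    """
    if not 0.0 <= observed_rate <= 1.0:
        raise ValueError("observed_rate must be in [0, 1]")
    if max_return_ratio < 1.0:
        raise ValueError("max_return_ratio must be at least 1")

    def true_rate(r):
        # Among returned needles: observed = pi / (pi + (1 - pi) * r).
        # Solving for the population rate pi gives the expression below.
        return observed_rate * r / (1.0 - observed_rate + observed_rate * r)

    # true_rate is increasing in r, so the interval endpoints sit at r = 1
    # (no selection) and r = max_return_ratio (strongest allowed selection).
    return true_rate(1.0), true_rate(max_return_ratio)


lower, upper = infection_rate_bounds(observed_rate=0.20, max_return_ratio=2.0)
print(f"population infection rate lies in [{lower:.3f}, {upper:.3f}]")
```

With 20 percent of returned needles infected and uninfected needles up to twice as likely to be returned, the population rate could be anywhere from 20 percent to about 33 percent. The observed share is only the lower end of the interval, which is the underestimation concern described above.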


  • Aronow and Lee (2013): Interval estimation of population means under unknown but bounded probabilities of sample selection. Biometrika 100:1:235-240.
  • Aronow, Green, Lee (2014): Sharp bounds on the variance in randomized experiments. Annals of Statistics 42:3:850-871.

Resolves a well-known 90-year-old problem posed by Neyman (1923) (of Neyman-Pearson fame). Our solution yields smaller p-values when testing for a difference in means between the treatment and control groups, sometimes reducing a one-sided p-value from 0.14 to 0.05.

  • Adelson, Lee, Velji et al. (2017): Development of Imminent Mortality Predictor for Advanced Cancer (IMPAC), a Tool to Predict Short-Term Mortality in Hospitalized Advanced Cancer Patients. Journal of Oncology Practice (forthcoming)


Financial incentives in healthcare

Healthcare spending in the U.S. continues to outstrip inflation, so there is a need for new payment models and more efficient practices to alleviate the financial pressure on payers, providers, and patients. One approach we consider is for dialysis providers to overbook a small amount of capacity to increase earnings without reducing patient access. In another study, we take the payer’s perspective and develop a data-driven pay-for-performance system for Medicare’s dialysis program. We find that the system can potentially extend a patient’s lifespan by two weeks per year, cut Medicare expenditures, and also increase provider earnings.


Taking the consumer’s viewpoint, in recent work we examine how households can use Health Savings Accounts (HSAs) to maximize tax efficiency. Using a personalized health cost evolution model that we develop, we find that households can save anywhere from 1 to 27 percent in costs.


  • Lee and Zenios (2009): Optimal capacity overbooking for the regular treatment of chronic conditions. Operations Research 57:4:852-865.
  • Lee, Chertow, Zenios (2010): Re-exploring differences among for-profit and non-profit dialysis providers. Health Services Research 45:3:633-646.
  • Lee and Zenios (2012): An evidence-based incentive system for Medicare’s End-Stage Renal Disease Program. Management Science 58:6:1092-1105.
  • Lowsky, Lee, Zenios: Health Savings Accounts: Consumer contribution strategies & policy implications (available by request).