Interview: Sarah Jabbour, University of Michigan, on AI in healthcare

We recently caught up with Sarah Jabbour, PhD candidate in Computer Science and Engineering at the University of Michigan, to talk about her research on the challenges and opportunities of machine learning, human-AI interaction, and models for diagnosis in healthcare.

We asked Sarah to introduce herself, and to share with us a little bit about her journey to date, and how she came to be conducting this research on AI in healthcare.

“I’m in my fourth year of my PhD in computer science at the University of Michigan advised by Professors Jenna Wiens and David Fouhey. I was originally a business student, and during my undergrad I started to learn about AI through my business classes,” Sarah explained. “I thought it was very cool and that I’d enjoy learning about it, so I made the switch to computer science. Then I wanted to keep learning and I wanted to contribute to knowledge in AI, so that’s why I decided to pursue a PhD.”

One of Sarah’s advisors, Jenna, taught her about machine learning as an undergraduate and happened to have an undergrad research position available in her lab, so Sarah joined and ended up staying for her PhD.

“Jenna’s lab, Machine Learning for Data-Driven Decisions, is an AI lab that focuses on machine learning problems that are inspired by healthcare,” she said. “The lab works on many different areas in health – I work in respiratory failure with chest X-rays, and I have lab mates who work on problems related to sepsis prediction, or patient deterioration, or trying to predict if a patient acquires an infection while they’re in the hospital. There are many different facets in the lab.”

At the time Sarah joined, she had told Jenna that she enjoyed working with images and had recently taken a computer vision course taught by David Fouhey. Jenna mentioned that a collaborator had a chest X-ray data set they didn’t yet know what to do with, and offered Sarah the opportunity to look into it.

“I haven’t stopped since,” said Sarah. “That was May 2019. What I love about research is that you try to solve a problem and then 10 other problems pop up. That’s how we ended up here, building on that work.”

Researching bias in AI

“My work started by focusing on developing AI models that could predict the diagnosis of a patient based on their chest X-rays and their electronic health record data. We found that the models were doing really well, which is very exciting, but we always try to dig into what the model is actually basing its decisions on,” she explained. 

“We found that when the model looked at the chest X-rays, it would focus on areas that our clinical collaborators were telling us were not relevant to the diagnosis, like the text in the corner of an image, for example. We learned that models could pick up on spurious correlations in the data, such as a correlation between text in the image and a diagnosis of pneumonia, which can lead to biased predictions.”
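The “shortcut” failure Sarah describes can be illustrated with a small synthetic sketch (this is an illustration of the general phenomenon, not her actual pipeline or data): a classifier trained on data where an irrelevant marker feature – standing in for the text in the corner of an image – happens to correlate with the diagnosis will learn to lean on that marker, and its accuracy collapses once the correlation no longer holds.

```python
# Illustrative sketch of a spurious correlation (synthetic data, not the
# study's real model): the "marker" feature is irrelevant clinically but
# agrees with the label 95% of the time in the training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                           # diagnosis label (0/1)
signal = y + rng.normal(0, 1.0, n)                  # weak genuine clinical signal
marker = np.where(rng.random(n) < 0.95, y, 1 - y)   # spurious "text in corner" feature
X_train = np.column_stack([signal, marker])

model = LogisticRegression().fit(X_train, y)

# Test population where the spurious correlation no longer holds:
# the marker is now random, so the model's learned shortcut misleads it.
marker_test = rng.integers(0, 2, n)
X_test = np.column_stack([signal, marker_test])

acc_train = model.score(X_train, y)
acc_test = model.score(X_test, y)
print(f"accuracy where correlation holds:  {acc_train:.2f}")
print(f"accuracy where correlation breaks: {acc_test:.2f}")
```

On the training distribution the model looks excellent, because the marker is highly predictive there; on the shifted population its accuracy drops towards what the weak genuine signal alone supports – the same pattern Sarah’s group observed when the model keyed on image text rather than lung findings.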

This spurred Sarah on to take her research in that particular direction: looking into bias.

“We started by developing computational methods to try to mitigate this bias; but those don’t work 100 percent. Then we thought that perhaps clinicians at the bedside could be another mitigating factor for this bias – maybe when we give the clinician the model’s prediction, they could look at the AI’s explanation and identify that something is wrong. That’s how we started moving into the human-AI interaction space.”

Sarah undertook a clinical study around this topic, published in JAMA in December, which asked how diagnostic accuracy is affected when clinicians are provided with AI predictions and image-based AI model explanations, and whether those explanations can help clinicians when they are shown biased AI predictions.

“An example of the bias would be that oftentimes for heart disease, women are underdiagnosed compared to men,” Sarah noted. “When men are having heart attacks, their symptoms present differently from women’s, so women are often underdiagnosed because we are not paying attention to the right symptoms. That is reflected in the data sets that we use to train AI models. If we have more men with heart disease in the data set, then the model will learn to associate being male with having heart disease from their chest X-ray, and the model thinks that is what heart disease looks like in general. It could then fail in a population where this correlation doesn’t hold.”

Learnings and implications of the study

“When I built this study, I was really hopeful that the explanations would work,” Sarah said, “but we found that they didn’t. What we tested was the most extreme case – we would have the model pay attention to something completely irrelevant, something no doctor would pay attention to on the chest X-ray, because we thought the doctors in our study would pick it up. We found that this wasn’t the case. So what does that mean?”

The study led to a lot of learnings and raised a lot of implications, Sarah noted. “There’s plenty of discussion in terms of policy and regulation around AI. One of the things being discussed is that if AI is bad, we can use AI explanations to mitigate this. But that approach hasn’t been fully tested. There needs to be a lot of thought around what this regulation actually means; because some explanations, like we have shown, might not work. What does this mean downstream? Who would be liable for the outcomes of the clinician plus the model if the explanation didn’t work?”

Another learning Sarah raised is that there are a lot more things to think about in terms of deploying a model and showing it to clinicians. “They might need more medical AI training – maybe if they knew more about bias and how it can show up, they could be on the lookout for bias in the outputs of our model,” she said. “We might need better explanations, because the outcome of the study doesn’t mean that explanations never work, it just means that the specific explanations we tested didn’t work.”

In addition, Sarah shared, the study showed that when the model had reasonably good accuracy, it did help clinicians to make better diagnostic decisions. “There’s definitely a lot of potential. The outcome of this paper – that AI could be biased and hurt clinicians’ performance – shouldn’t deter people from continuing to work on this problem. We still showed that there is potential to help clinicians in making these decisions, so I think there are a lot of very interesting outcomes of the study that I’m excited to build on, and I hope other people build on as well.”

AI for diagnosis 

Sarah also shared with us some of her insights from a recent project looking at using AI for diagnosis. 

“We built an AI model that predicts the likelihood that a patient’s respiratory failure is due to pneumonia, heart failure and/or COPD,” she said. “Our model did reasonably well, and then we compared it to a randomly selected physician in our data set. What we showed was that the model did just as well as the randomly selected physician, but with significantly less data. The physicians who labelled the data had access to the entirety of the patient chart – any prior hospitalisation, prior diagnoses, prior treatments, responses to treatments. The model only had access to one chest X-ray at the time of respiratory failure, and a limited set of vitals and labs, and it did just as well. That was really exciting, because often clinicians have a hard time synthesising all the different data types that they have to consider when making these diagnoses, and the model is able to do it with significantly less information.”

The hope here is that the output of the model could provide a summarisation of sorts for the clinician, which they can consider alongside other sources.

There is definitely information missing, Sarah acknowledged, that the clinician can provide. “The model doesn’t see the physical presentation of the patient – what they are saying, how they are feeling. It doesn’t see the radiology report, either. So rather than seeing it as a complete information source, you could see this as the model synthesising the information it does have, to help guide the clinician towards a diagnosis.” 

What are you most excited about in the realm of AI and ML in health?

“I think over the last year to 18 months, with the release of large language models like ChatGPT and the push to understand how well they do compared to doctors, there’s a lot more talk around the idea of using AI in different areas, including health. What’s exciting for me is having more people excited about using AI for health,” Sarah stated. “We can only solve problems when we’re paying attention to them.”

“There used to be a lot more scepticism around wanting a human doctor and rejecting the technology, I think, with fears that the AI model might not be very good. Those are real worries, but there’s also a lot of excitement about what’s possible if we put in the work and the effort. If more people work on this idea, then we can slowly chip away at this problem and figure out how to actually implement models safely into healthcare.”

Sarah continued: “We’ve figured out a lot in terms of the model building and I’m excited about how we get the human to interact with the model, because the model itself is useless if nobody is using it. If we have a better idea of what clinicians actually want and how we can get them to use the model efficiently, then hopefully we can start to see some results in terms of better patient outcomes. There are very real potential benefits of using AI in healthcare, like getting patients the treatments they need quicker and reducing healthcare costs. I think we are getting to the point where we can say, ‘OK, how do we actually get this into the hospital?’”

Staying on the topic of translating research into practice, we asked Sarah what observations she had made about the symbiosis between research and practice relating to AI in health.

“I think it definitely depends on where you are,” Sarah considered. “For example, I’m doing my research at the University of Michigan – we have a large research hospital that is very willing to collaborate with us and put in the time to build the infrastructure to test these models in the hospital. But that is a big pain point for a lot of hospitals, where they don’t have the infrastructure to test new AI models.

“Jenna, my advisor, has done tremendous amounts of work to build these relationships between engineering and the hospital; but that needs to happen in a lot more places. It can’t just be the biggest research universities and the biggest hospitals, because then we’re only serving a small portion of the patient population that we could be helping.

“I think there’s a lot of room for improvement and unanswered questions about how we go from hospital A to hospital B, when hospital B doesn’t have the infrastructure. I think infrastructure is definitely a big barrier right now, so if we could solve that and give ourselves the ability to test these models in a much larger patient population, we could learn so much more about the models.”

Finally, we asked Sarah to look to the future. What would she choose to study or research, if she was given free rein and an unlimited budget?

“Right now I’m really interested in understanding how models like this can work across varying patient populations,” Sarah shared. “One thing we struggle with in AI for health is that we have limited data compared to other areas in AI. For example, ChatGPT was trained on billions of web pages. We don’t have all of the internet when we’re building AI models for health, so with unlimited funds I would survey clinicians across the world, I would test this model on patients across the world, and just get a better understanding of where it’s working and where it is not.

“We have a very good idea of how it works at Michigan Medicine, and we also did an external validation on data from BIDMC using a public dataset called MIMIC; but there’s so much data out there. I would love to have access to significantly more data, information from clinicians about what they want and how they use the model, and I would do what we’re doing now but on a much larger scale.”

We’d like to thank Sarah for taking the time out to talk to us, and for sharing her insights on current research in AI for health.

In another of our recent interviews, we spoke with John Klepper, co-founder and CEO of PIPRA (Pre-Interventional Preventive Risk Assessment), a Zurich-based medtech company that has developed its first product: an AI-based surgical tool to assess a patient’s risk of suffering from postoperative delirium.

We also chatted with Ricardo Baptista Leite, CEO at HealthAI, the global agency for responsible AI and health; founder and president of the UNITE Parliamentarians Network for Global Health; national spokesperson for health in Portugal’s Social Democratic Party; and guest lecturer at NOVA Medical School in Lisbon.