On Stanford's COVID-19 Vaccination Algorithm
This past week, Stanford hospital administrators used an algorithm to decide who should be in the first group to receive a COVID-19 vaccine. It didn't go well.
The algorithm's output clearly didn't prioritize frontline workers, including the ~1,300 hospital residents who work closely with COVID-19 patients. Only 7 of the 5,000 available first-round doses were slotted for residents, which is upsetting to say the least.
Administrators apologized on Friday and declared that they would revise their vaccination plan, but not before swiftly and squarely placing the blame on the algorithm.
On blaming the algorithm
Was this the algorithm's fault? Of course not.
Algorithms are designed, created, implemented, and tested by people. If algorithms aren't performing appropriately, responsibility lies with the people who made them.
As Cathy O'Neil writes, “These models are constructed not just from data but from the choices we make about which data to pay attention to—and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral. If we back away from them and treat mathematical models as a neutral and inevitable force, like the weather or the tides, we abdicate our responsibility.” Blaming "the algorithm" is an unacceptable abdication of responsibility.
In the case of Stanford, the intent behind using an algorithm ("to develop an ethical and equitable process for the distribution of the vaccine") made sense. Unfortunately, the project had serious flaws, and administrators must own responsibility for them.
Building successful algorithms takes a village
To design, build, and deploy a successful algorithm for a problem like vaccine prioritization, the steps a team would take might look something like this:

1. Define the problem and what success looks like.
2. Design the algorithm: decide who it applies to, which inputs matter, and how they are weighted.
3. Gather and validate the data those inputs require.
4. Create and implement the algorithm.
5. Test the outputs against the definition of success, including edge cases.
6. Deploy, with a human check on the results before they take effect.
7. Collect feedback and fold it back into the design.
The process above is focused on the more technical aspects of algorithm design and implementation, and doesn't include hand-offs, necessary training or coaching, communication, or any of the other pieces necessary for turning an algorithm idea into a successful reality.
At every step, it's important to work with subject matter experts to gain the crucial context and buy-in needed for designing, creating, testing, and deploying a successful model. It's also important to test and reflect at each stage to make sure that expectations (continue to) align with reality. And it probably goes without saying, but the more transparency throughout the whole process, the better for building trust in the results and avoiding missteps along the way.
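For concreteness, here is a minimal sketch (in Python, with entirely hypothetical inputs and weights, not Stanford's) of what the technical core of such a prioritization algorithm might look like. The specific numbers don't matter; what matters is that every choice is explicit, documented, and easy for subject matter experts to review.

```python
from dataclasses import dataclass

# Hypothetical inputs and weights -- every value here is a human decision
# that subject matter experts should review and sign off on.
WEIGHTS = {"covid_exposure": 0.5, "age_risk": 0.3, "job_risk": 0.2}

@dataclass
class StaffMember:
    role: str
    covid_exposure: float  # 0-1: share of shifts spent with COVID-19 patients
    age_risk: float        # 0-1: risk score derived from age band
    job_risk: float        # 0-1: risk score derived from job duties

def priority_score(person: StaffMember) -> float:
    """Weighted sum of documented risk factors; higher means vaccinate sooner."""
    return (
        WEIGHTS["covid_exposure"] * person.covid_exposure
        + WEIGHTS["age_risk"] * person.age_risk
        + WEIGHTS["job_risk"] * person.job_risk
    )

# A quick sanity check of the kind that should run at every stage:
# a resident on a COVID-19 ward should outscore a remote administrator.
resident = StaffMember("resident", covid_exposure=0.9, age_risk=0.2, job_risk=0.9)
administrator = StaffMember("administrator", covid_exposure=0.0, age_risk=0.3, job_risk=0.1)
assert priority_score(resident) > priority_score(administrator)
```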
The algorithm is just one part of the larger vaccine prioritization project which also includes planning, design, communication, implementation, human checks, and testing and feedback along the way. Failure at any stage can mean failure of the entire project. It's easy to level blame at "the algorithm", but here it's a convenient synecdoche for the project of vaccine prioritization as a whole, which was carried out by people; to blame the algorithm is not to acknowledge all of the other mistakes that had to have taken place to arrive at this outcome.
Ironically, while the project was obviously a failure, I'm not sure we even have enough information to know whether the actual algorithm is faulty (that is, whether the "correct" inputs would have produced the "correct" outputs), because the design and implementation were so far off the mark.
What went wrong with Stanford's algorithm?
The largest and most damning error of the Stanford algorithm project comes from the final stage, the human check. Human-in-the-loop is becoming a more common component of algorithm implementation to verify algorithm outputs and provide an opportunity to halt automation if things don't look right. Stanford did have this human check in place, and still rolled out the algorithm's output, which (rightfully) has raised serious questions about the priorities of the administration.
While leadership is pointing to an error in the algorithm meant to ensure equity and justice, our understanding is this error was identified on Tuesday and a decision was made not to review the vaccine allocation scheme before its release today. We believe that to achieve the aim of justice, there is a human responsibility of oversight over such algorithms to ensure that the results are equitable. Negligence to act on the error once it was found is astounding.
- Letter from Stanford's Chief Resident Council to administration
While the human check should have been an opportunity to correct any errors, the final lack of action was not the only mistake in Stanford's process. Here are some other issues that I see with the process of designing, creating, testing, and deploying the Stanford algorithm:
- The population fed into the algorithm seems to have excluded nurses, therapists, janitors, food service workers, and other essential frontline staff. This is a huge error in design -- if these populations weren't considered up front, the algorithm was never designed to be applied to them.
- It doesn't appear that a clear (or correct?) definition of success was set or that test cases were run, given the number of frontline staff who didn't make the top 5,000 vaccine slots and other seemingly strange prioritizations.
- Not all data elements seem to exist equally for everyone the algorithm was meant to be applied to. For example, residents don't have one set "location" (one of the seemingly important inputs), which seems to have hugely impacted their scores.
- The lack of available data elements points to a lack of testing and communication at several stages. Assuming that *some value* was input for residents in the design (a big assumption), it clearly didn't translate, which points to a lack of testing at implementation -- see the sketch after this list.
- The algorithm didn't deliver on previous pledges, pointing to large communication gaps (which manifest as design and testing issues).
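To make the data issue concrete, here is a hypothetical validation step (the field names and roles are made up for illustration): it refuses to score anyone until the required inputs actually exist for the whole population, rather than quietly defaulting missing values to something that tanks a group's scores.

```python
import pandas as pd

REQUIRED_FIELDS = ["role", "location", "covid_exposure"]
EXPECTED_ROLES = {"resident", "nurse", "therapist", "janitor", "food_service"}

def validate_inputs(staff: pd.DataFrame) -> None:
    """Fail loudly before scoring if the population or its data is incomplete."""
    missing_roles = EXPECTED_ROLES - set(staff["role"])
    if missing_roles:
        raise ValueError(f"Roles absent from the input population: {missing_roles}")

    incomplete = staff[staff[REQUIRED_FIELDS].isna().any(axis=1)]
    if not incomplete.empty:
        raise ValueError(
            f"{len(incomplete)} records are missing required inputs; "
            "do not silently default them to zero."
        )

# Hypothetical roster: residents rotate between units, so "location" is missing.
staff = pd.DataFrame({
    "role": ["nurse", "resident", "therapist", "janitor", "food_service", "resident"],
    "location": ["ICU", None, "rehab", "facilities", "cafeteria", None],
    "covid_exposure": [0.8, 0.9, 0.5, 0.6, 0.4, 0.95],
})

validate_inputs(staff)  # raises: 2 records are missing required inputs
```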
How do we prevent this from happening?
In my field (data science), there is increased conversation about creating ethical AI, but (from what I can tell without access to the underlying algorithm) the Stanford algorithm isn't AI, or deep learning, or even predictive. And yet it's still a glaring example of how algorithms can be used to produce unethical outcomes. Layering predictions on top only adds more room for failure to the modeling process I outlined above.
So, what can we do?
Stanford's Chief Resident Council asked for three things in their letter, which all really boil down to one thing: transparency. Transparency and communication throughout the whole project could have helped catch mistakes sooner and reverse them.
Testing also seems to be a huge hole here: testing that the assumed data was actually available, that prioritizations made sense across dimensions like age and COVID-19 exposure, and that outcomes matched the promises of administrators. Testing at all phases should have caught these errors.
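As a hypothetical illustration (none of these thresholds, roles, or field names come from Stanford's system), outcome-level tests can be as simple as turning the administration's promises into assertions that must pass before any allocation list goes out:

```python
FRONTLINE_ROLES = {"resident", "nurse", "therapist", "janitor", "food_service"}

def test_first_round_matches_promises(scored_staff, n_doses=5000):
    """Hypothetical acceptance test run on the algorithm's output before release."""
    top = sorted(scored_staff, key=lambda p: p["score"], reverse=True)[:n_doses]
    roles = [p["role"] for p in top]

    # Residents who work closely with COVID-19 patients shouldn't be shut out.
    n_residents = roles.count("resident")
    assert n_residents >= 1000, f"only {n_residents} residents in the top {n_doses}"

    # Patient-facing staff should make up the bulk of the first allocation.
    frontline_share = sum(r in FRONTLINE_ROLES for r in roles) / len(top)
    assert frontline_share >= 0.8, f"frontline share is only {frontline_share:.0%}"
```

Run against the actual output, a test like this fails loudly instead of quietly letting the list go out.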
Finally, we need to take the human responsibility in algorithms seriously. This includes the very human decisions that go into designing, planning, and executing projects centered on algorithms. For human-in-the-loop checks, this means providing training on what to look for, and what to do if things look wrong -- ideally with a way to "stop the line" and reverse course if needed.
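A "stop the line" mechanism doesn't need to be sophisticated. Here is one sketch of the idea (hypothetical, not Stanford's actual process): known issues block the release by default, and even a clean run requires an explicit, recorded sign-off from a named person.

```python
import datetime

def release_allocation(allocation, open_issues, approver):
    """Hypothetical release gate: unresolved issues block the rollout, and a
    release requires an explicit, logged decision from a named human."""
    if open_issues:
        for issue in open_issues:
            print(f"BLOCKED: {issue}")
        raise RuntimeError("Resolve or explicitly waive every issue before release.")

    answer = input(f"{approver}, release {len(allocation)} first-round slots? [y/N] ")
    if answer.strip().lower() != "y":
        raise RuntimeError(f"Release halted by {approver}.")

    print(f"Released by {approver} at {datetime.datetime.now().isoformat()}")
    return allocation
```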
Algorithms are already being used to determine COVID-19 vaccination prioritization at other hospitals (and more globally), and to shape patient outcomes more broadly. It's important that the people who create them take the time and put in the work to get them right.