For the past 2 years, Carbon Health’s Hands-Free AI Charting experience has freed up our providers to have more human and connected conversations with their patients, while maintaining the highest standards in medical documentation. In the time since releasing AI Charting, we have shipped features that simplify medical documentation, reduce coding and billing errors, and automate unique workflows like Workers’ Compensation visits, follow-ups, and referrals.
The next frontier feature we wanted to tackle for our AI-enabled CarbyOS EHR was real-time Clinical Decision Support Systems (CDSS). When surfaced in a friendly, suggestive format, CDSS can further empower providers with the knowledge and insights they need to deliver care. To frame the CDSS problem, we focused on 3 key deliverables for this experience:
Identifying the relevant medications, lab tests, and procedures for each appointment is uniquely suited to AI. Generative AI models have progressed remarkably over the past couple of years and are now capable of extracting rich insights from both unstructured patient-provider conversations and the structured clinical data already present in the CarbyOS EHR. Furthermore, we can leverage the AI infrastructure we built for AI Charting, centered around real-time transcription and external model provider APIs, to deliver a first-in-class provider experience.
We designed a workflow for CDSS to ensure accurate and reliable outputs. It revolves around 4 main steps, and is run continuously throughout the appointment to fetch the freshest context for the best suggestions:
This screenshot shows CDSS suggestions for a patient who came in with a respiratory infection. Orders suggested via CDSS appear in the Orders section under the heading Suggested by Carby. These suggestions can then be accepted via the Add to Orders button, edited via the ✏️ icon, or skipped via the 🚫 icon.
One of the first insights we found while working on the recommendation step in the CDSS framework was within the self-supervised fine-tuning process. In short, fine-tuning involves training an already pretrained large language model (LLM) on examples of the input-output pairs you want the model to generate. Self-supervision means we do not explicitly label or specify an order to predict after the appointment; instead, we use our historical appointment data to see what was actually discussed and ordered during the appointment by the provider. From this historical data, we built annotated transcripts of the conversation between the patient and provider, with each prior order inserted at the exact point in the conversation where it was placed. This annotation helped the LLM learn the dependencies between the actual conversation and the orders so that it could predict future orders at the right points in the appointment. For example, a single transcript with 4 actual orders could be used to generate 4 examples, each one with progressively more content (including prior orders) to predict the next order in the sequence.
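Concretely, the example-construction step might look like the minimal sketch below. The `OrderEvent` structure, the `transcript_offset` field, and the inline `[ORDER PLACED: …]` annotation format are hypothetical names used for illustration, not our actual data model.

```python
from dataclasses import dataclass


@dataclass
class OrderEvent:
    """An order (medication, lab test, or procedure) placed during the appointment."""
    transcript_offset: int  # character offset into the transcript when the order was placed
    description: str        # e.g. "rapid strep test"


def build_training_examples(transcript: str, orders: list[OrderEvent]) -> list[dict]:
    """Turn one annotated appointment into one fine-tuning example per order.

    Each example's input is the conversation up to the moment an order was placed,
    with earlier orders annotated inline at the points they occurred; the target
    is the next order in the sequence.
    """
    examples = []
    ordered = sorted(orders, key=lambda o: o.transcript_offset)
    for i, target in enumerate(ordered):
        # Interleave the conversation with prior orders at the points they were placed.
        parts, cursor = [], 0
        for prior in ordered[:i]:
            parts.append(transcript[cursor:prior.transcript_offset])
            parts.append(f"\n[ORDER PLACED: {prior.description}]\n")
            cursor = prior.transcript_offset
        parts.append(transcript[cursor:target.transcript_offset])
        examples.append({"input": "".join(parts), "output": target.description})
    return examples
```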
Another insight was related to how we prepared examples for the training process. Sometimes the annotated transcript examples we prepared contained several consecutive orders placed at roughly the same time. To improve the LLM’s ability to generalize to different scenarios, we decided to augment our example set by reordering these consecutive orders. For example, if drug A, lab test B, and procedure C were ordered in an appointment, we created additional synthetic examples containing other sequences such as B, C, A or C, A, B. This reordering let us create many more examples essentially for free and helped the LLM learn the relationships among different orders. We observed improvements in training performance with this strategy.
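As a rough illustration of the augmentation strategy, the helper below permutes a block of consecutive orders; the function name and the cap on variants are assumptions made purely for the example.

```python
from itertools import permutations


def augment_consecutive_orders(orders: list[str], max_variants: int = 5) -> list[list[str]]:
    """Generate alternative orderings for orders placed at roughly the same time.

    If drug A, lab test B, and procedure C were ordered together, this yields
    synthetic sequences like [B, C, A] and [C, A, B] in addition to the original.
    """
    variants = []
    for perm in permutations(orders):
        seq = list(perm)
        if seq != orders:  # skip the original ordering
            variants.append(seq)
        if len(variants) >= max_variants:
            break
    return variants


# Three consecutive orders yield up to five extra synthetic training sequences.
print(augment_consecutive_orders(["drug A", "lab test B", "procedure C"]))
```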
Finally, we also prepared examples where no order was placed. Although intent detection already acts as a “gatekeeper” for requesting recommendations, these “negative” examples help the LLM generalize by teaching it scenarios where a suggestion is not necessary. In fact, around 20% of the recommendations generated by the LLM are to suggest no order at all. With this, we wanted to instill a sense of judiciousness and prudence in the LLM so that it generates recommendations conservatively and only when it is “convinced” a suggestion is necessary given the context of the appointment.
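One way such a “negative” example might be represented is sketched below, assuming a hypothetical sentinel target rather than our actual label format.

```python
NO_ORDER_TOKEN = "NO_ORDER"  # hypothetical sentinel target meaning "no suggestion needed"


def build_negative_example(transcript_segment: str) -> dict:
    """Build a training example from a conversation span where no order was placed.

    Training on an explicit "no order" target teaches the model to suggest
    conservatively instead of always producing a recommendation.
    """
    return {"input": transcript_segment, "output": NO_ORDER_TOKEN}
```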
Delivering CDSS suggestions to providers in the CarbyOS EHR follows a relatively straightforward technical path. However, in shipping this feature, we encountered several challenges in optimizing the experience for providers. One of them was cost optimization.
To provide a seamless experience to providers during appointments, CDSS runs continuously throughout an appointment to generate the freshest suggestions regardless of the direction the conversation takes. When we initially released CDSS to a limited number of clinics, we found that the CDSS workflow was running too often, as frequently as once every 10 seconds. You might think a lot of meaningful information is shared in 10 seconds, but our analysis found that this cadence provided negligible incremental value while requiring a substantially higher number of model endpoints for inference. In fact, it made sense from both a technical and financial perspective to run the workflow less frequently. As we tuned the lag between recommendations, we eventually settled on 60 seconds, striking the right balance between being cost-effective and providing timely, relevant suggestions. As a result of this change, we reduced our endpoint costs by 90% and the average number of endpoints used by 60%.
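A simplified sketch of the cadence change is shown below; the callables `generate_suggestions` and `is_appointment_active` are placeholders for illustration, not our actual APIs, and the 60-second interval reflects the balance described above.

```python
import time

# Hypothetical cadence setting; 60 seconds balances cost against timely suggestions.
CDSS_RUN_INTERVAL_SECONDS = 60


def run_cdss_loop(appointment, generate_suggestions, is_appointment_active):
    """Run the CDSS workflow on a fixed cadence for the life of the appointment."""
    last_run = float("-inf")  # fire immediately on the first pass
    while is_appointment_active(appointment):
        now = time.monotonic()
        if now - last_run >= CDSS_RUN_INTERVAL_SECONDS:
            # Pull the freshest transcript and structured context, then request suggestions.
            generate_suggestions(appointment)
            last_run = now
        time.sleep(1)  # coarse polling; an event-driven scheduler would also work
```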
To date, CDSS has been used by over 500 providers in over 100,000 appointments. We are currently working on some new updates that will improve the provider experience:
Stay tuned!