Delivering Real-time Clinical Orders using AI

The Carbon Health AI/ML Team
September 9, 2025

Introduction

For the past 2 years, Carbon Health’s Hands-Free AI Charting experience has freed up our providers to have more human and connected conversations with their patients, while maintaining the highest standards in medical documentation. In the time since releasing AI Charting, we have shipped features that simplify medical documentation, reduce coding and billing errors, and automate unique workflows like Workers’ Compensation visits, follow-ups, and referrals. 

The next frontier for our AI-enabled CarbyOS EHR was real-time Clinical Decision Support Systems (CDSS). When surfaced in a friendly, suggestive format, CDSS can further empower providers with the knowledge and insights they need to deliver care. To frame the CDSS problem, we focused on 3 key deliverables for this experience:

  • Medications
  • Lab tests
  • Orderable procedures

Identifying the relevant medications, lab tests, and procedures for each appointment is uniquely suited to AI. Generative AI models have advanced remarkably over the past couple of years, and are now capable of extracting rich insights from both unstructured patient-provider conversations and structured clinical data already present in the CarbyOS EHR. Furthermore, we can leverage the infrastructure we built for AI Charting, centered around real-time transcription and external model provider APIs, to deliver a first-in-class provider experience.

CDSS Framework

We designed a workflow for CDSS to ensure accurate and reliable outputs. It revolves around 4 main steps, and is run continuously throughout the appointment to fetch the freshest context for the best suggestions:

  1. Preprocessing: To recommend the right order at the right time, we first assemble the inputs to our recommendation models. Details about the appointment (specialty, visit reason) and the patient (demographics) are collected for the model. In addition, we retrieve the transcribed portion of the patient-provider conversation taking place during the appointment to enrich the input with up-to-date context. Lab tests to order are customized based on the appointment's clinic location, and prior orders for the appointment are fetched so that we do not duplicate them.
  2. Intent Detection: Part of a healthcare provider's training and experience is the intuition for when a clinical order is necessary during a conversation with a patient. Incorrectly prescribing a medication or ordering a lab test inconsistent with clinical guidelines impacts a patient's quality of care. Within the CDSS framework, we capture this intuition when it is expressed in the conversation by identifying breakpoints in the appointment where the provider has signaled intent to place a clinical order. Intent detection is performed by smaller, cheaper proprietary models available via external APIs. If intent is affirmative, a recommendation is warranted; otherwise, the process repeats after some time has passed and more of the conversation has been transcribed. A provider can express multiple intents at a given point in time, so we make sure to detect each and every one.
  3. Recommendation: Once the provider's intent to order has been confirmed, a recommendation can be made. We experimented with a variety of AI models, both proprietary and open-source, and with techniques like prompt engineering, retrieval-augmented generation (RAG), and supervised fine-tuning. During this experimentation phase, we found that order recommendation requires a lot of in-house knowledge about Carbon Health and our catalog of medications, lab tests, and procedures. This knowledge was not straightforward to instill via additional prompting or via RAG: context windows are limited, and the model would still lack sufficient knowledge to recommend a relevant order. As for fine-tuning, we observed that fine-tuning a single model to generate all orders, regardless of type, resulted in "cross-contamination" of the recommendations. For example, a cross-contaminated order might contain a lab test description but also carry details like dosage, frequency, and drug id that are reserved for medications. Ultimately, we developed a suite of smaller, specialized, in-house fine-tuned models, each generating orders of a single specific type. Once the inputs are passed in and an intent to order has been identified, the recommendation model for that specific intent generates a suggestion. For example, an intent to order a medication results in a medication suggestion only.
  4. Postprocessing: Once a suggestion has been generated, we validate that it is actually available to order at that specific clinic location. To prevent hallucinations, downstream guardrails "ground" model predictions in reality: we use clinically approved catalogs of medications, lab tests, and procedures to validate and augment the suggestions. For example, each medication is verified against our medication catalog to confirm it is an actual medication with a dosage within appropriate limits, and to fill in additional details, such as its official package description. After validation, the suggestion is surfaced to the provider via the CarbyOS EHR and recorded internally so that it is not repeated to the provider.

The entire CDSS framework completes in less than 8 seconds, from start to finish, and occurs approximately 3-5 times per appointment.

This screenshot shows CDSS suggestions for a patient who came in with a respiratory infection. Orders suggested via CDSS appear in the Orders section under the heading Suggested by Carby. These suggestions can then be accepted via the Add to Orders button, edited via the ✏️ icon, or skipped via the 🚫 icon.

Insights into Recommendation

One of the first insights we found while working on the recommendation step of the CDSS framework came from the self-supervised fine-tuning process. In short, fine-tuning involves training an already pretrained large language model (LLM) on input-output pairs of the kind you want the model to generate. Self-supervision means we do not explicitly label or specify an order to predict after the appointment; instead, we use our historical appointments data to see what was actually discussed and ordered during the appointment by the provider themself. We developed an annotated transcript of the conversation between the patient and provider from historical data, annotated with prior orders at the exact time they were placed in relation to the conversation. This annotation helped the LLM learn the dependencies between the actual conversation and the orders, so that it could predict future orders at the right points in the appointment. For example, a single transcript with 4 actual orders could be used to generate 4 examples, each one with progressively more content (including prior orders) to predict the next order in the sequence.
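The example-generation scheme above can be sketched as follows. The data shapes (`segments`, `(segment_index, order)` pairs, the example dict) are illustrative assumptions, not the production schema.

```python
def build_examples(segments: list[str],
                   orders: list[tuple[int, str]]) -> list[dict]:
    """Turn one annotated transcript into one training example per
    historical order.

    `segments` are transcript chunks in time order; `orders` are
    (segment_index, order) pairs marking when each order was placed.
    Each example's input is the conversation (plus prior orders) up to
    that point, and its target is the next order in the sequence."""
    examples = []
    prior: list[str] = []
    for seg_idx, order in sorted(orders):
        context = " ".join(segments[: seg_idx + 1])
        examples.append({
            "input": {"transcript": context, "prior_orders": list(prior)},
            "target": order,
        })
        prior.append(order)  # later examples see earlier orders as context
    return examples
```

A transcript with 4 annotated orders thus yields 4 examples, each with a longer transcript prefix and a longer list of prior orders.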

Another insight was related to how we prepared examples for the training process. Sometimes, the annotated transcript examples we prepared would contain several consecutive orders placed at roughly the same time. To improve the LLM’s ability to generalize to different scenarios, we decided to augment our example set by reordering these consecutive orders. For example, if drug A, lab test B, and procedure C were ordered in a certain appointment, we created additional synthetic examples containing other sequences like B, C, A or C, A, B. This reordering would allow us to create many more examples effectively for free and further allow the LLM to learn the relationships among different orders. We observed improvements in training performance with this new strategy.
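This reordering augmentation is essentially a permutation over a group of roughly simultaneous orders; a minimal sketch, assuming a simple example dict with an `orders` list:

```python
from itertools import permutations

def augment_consecutive_orders(example: dict, max_group: int = 4) -> list[dict]:
    """Create synthetic training examples by reordering orders that were
    placed at roughly the same time. Groups larger than `max_group` are
    left alone, since n! permutations grow quickly."""
    orders = example["orders"]
    if len(orders) < 2 or len(orders) > max_group:
        return [example]
    return [{**example, "orders": list(p)} for p in permutations(orders)]
```

For a group of three orders (drug A, lab test B, procedure C) this yields all 6 orderings, turning one annotated example into six at no labeling cost.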

Finally, we also prepared examples where no order was placed. Although intent detection already acts as a "gatekeeper" for requesting recommendations, these "negative" examples help the LLM generalize by teaching it scenarios where a suggestion is not necessary. In fact, around 20% of recommendations generated by the LLMs are to recommend no order at all. With this, we wanted to inject a sense of judiciousness and prudence into the LLM so that it generates recommendations conservatively and only when it is "convinced" a suggestion is necessary given the context of the appointment.
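One way to mix such negatives into a training set is sketched below. The `<no_order>` sentinel, the example shape, and the mixing ratio are all illustrative assumptions, not the production values.

```python
import random

NO_ORDER = "<no_order>"  # hypothetical abstention target

def add_negative_examples(positives: list[dict],
                          quiet_contexts: list[str],
                          ratio: float = 0.2,
                          seed: int = 0) -> list[dict]:
    """Mix in 'no order' examples drawn from transcript windows where
    the provider placed nothing, so the model learns to abstain."""
    rng = random.Random(seed)
    n_neg = int(len(positives) * ratio)
    negatives = [
        {"input": {"transcript": rng.choice(quiet_contexts),
                   "prior_orders": []},
         "target": NO_ORDER}
        for _ in range(n_neg)
    ]
    mixed = positives + negatives
    rng.shuffle(mixed)  # avoid all negatives clustering at the end
    return mixed
```

Training on the mixed set lets the fine-tuned model emit the abstention target instead of forcing an order for every request.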

Cost Optimization

Delivering CDSS suggestions to providers in the CarbyOS EHR follows a relatively straightforward technical path. However, in shipping this feature, we encountered several challenges in optimizing the experience for providers. One of them was cost optimization.

To provide a seamless experience to providers during appointments, CDSS is run continuously throughout an appointment to generate the freshest suggestions regardless of the direction the conversation takes. When we initially released CDSS to a limited number of clinics, we found that the CDSS workflow was running too frequently, about once every 10 seconds. You might think a lot of meaningful information is shared in 10 seconds, but we found through analysis that this cadence provided negligible incremental value while requiring a substantially higher number of model endpoints for inference. In fact, it made sense from both a technical and financial perspective to run the workflow less frequently. As we tuned the lag between recommendations, we eventually settled on 60 seconds to strike the right balance between being cost-effective and providing timely, relevant suggestions. As a result of this change, we reduced our endpoint costs by 90% and the average number of endpoints used by 60%.
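The cadence change amounts to gating each CDSS cycle behind a minimum interval. A minimal sketch of such a throttle, assuming an injectable clock for testability (the class and its API are illustrative, not our production scheduler):

```python
import time

class CdssThrottle:
    """Allow at most one CDSS run per `interval_s` seconds
    (60 s in production, per the tuning described above)."""

    def __init__(self, interval_s: float = 60.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock  # injectable for testing
        self._last_run: float | None = None

    def should_run(self) -> bool:
        """Return True (and record the run) if the interval has elapsed."""
        now = self.clock()
        if self._last_run is None or now - self._last_run >= self.interval_s:
            self._last_run = now
            return True
        return False
```

Moving from a 10-second to a 60-second interval cuts the number of cycles per appointment by a factor of 6, which is consistent with the large drop in inference endpoints needed.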

What’s next?

To date, CDSS has been used by over 500 providers in over 100,000 appointments. We are currently working on some new updates that will improve the provider experience:

  • Improving the acceptance rate for suggestions
  • Expanding our catalog of ordered procedures
  • Completing medical notes within procedures
  • Recommending imaging orders

Stay tuned!

The Carbon Health AI/ML Team
