Clinical abbreviations detection and normalisation

Efficient communication between healthcare professionals about a patient’s health is crucial to delivering the best possible care.

Summer of Research

Project by Enno Huang, University of Auckland, supervised by Edmond Zhang (Orion Health and Precision Driven Health) and Yun Sing Koh (University of Auckland).

“2 yo F here for an RPE w/a recent URI who c/o ear pain y/d.”

If you can’t understand that sentence from a real paediatric note, you are not alone. Research shows many doctors struggle to interpret abbreviations in medical notes [1].

Patients with complex issues often see several doctors from different specialties, each of whom needs to understand notes written by other clinicians. Patients may also ask to see their own notes and, because of the unexplained abbreviations, will likely struggle to understand them.

There are more than 7000 deaths caused by medication errors each year in New Zealand. Unexplained or misunderstood clinical abbreviations are a contributing factor to these medication errors, which could be reduced through standardising or normalising abbreviations.

Enno Huang created an abbreviation detector to identify abbreviations and acronyms, and a normalisation program to replace them with terms people could understand or look up.

The detector uses a deep learning framework called the Attention Model, originally designed for English-to-French language translation. The detector breaks each word into small chunks for analysis and uses the word’s context to understand its meaning.
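To make the idea of "small chunks" and "context" concrete, here is a minimal sketch. It assumes the chunks are overlapping three-character pieces and the context is the two words either side. The article does not say how Enno's detector actually splits words, so the function names, chunk size and window size are illustrative only.

```python
def char_ngrams(word, n=3):
    """Split a word into overlapping character chunks of length n.

    Assumption: character trigrams stand in for the detector's "small chunks".
    """
    padded = f"<{word.lower()}>"  # boundary markers keep prefixes and suffixes distinct
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def with_context(tokens, index, window=2):
    """Return one token's chunks together with its neighbouring words."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return {"chunks": char_ngrams(tokens[index]), "left": left, "right": right}


tokens = "2 yo F here for an RPE w/a recent URI".split()
example = with_context(tokens, tokens.index("URI"))
print(example["chunks"])  # ['<ur', 'uri', 'ri>']
print(example["left"])    # ['w/a', 'recent']
```

A real detector would feed features like these into a neural network rather than printing them. The sketch only shows what kind of information is available for each word.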

The model was trained by analysing each word in the MIMIC-III medical database and giving it a unique vector – placing each word on a map where similar words are close together.
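The "map" can be pictured with a toy example. The three-dimensional vectors below are made up for illustration and are not taken from MIMIC-III or from Enno's model. Real word vectors usually have hundreds of dimensions. Cosine similarity is one common way to measure how close two vectors are, and it is assumed here rather than stated in the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors, invented for this example only.
toy_vectors = {
    "pt": [0.9, 0.1, 0.2],
    "patient": [0.8, 0.2, 0.1],
    "ear": [0.1, 0.9, 0.3],
}


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(nearest("pt", toy_vectors))  # patient
```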

Enno’s detector achieved an F1 score of 83.85%.
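An F1 score is the harmonic mean of precision (how many flagged words really were abbreviations) and recall (how many of the real abbreviations were flagged). The sketch below shows the arithmetic. The counts are invented for the example and are not the project's results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, returned as a fraction from 0 to 1."""
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    if flagged == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / flagged
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 abbreviations found correctly,
# 10 ordinary words flagged by mistake, 20 abbreviations missed.
score = f1_score(true_positives=80, false_positives=10, false_negatives=20)
print(f"{score:.2%}")
```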

After finding abbreviations in a clinical report, Enno needed a way to replace them with recognisable medical terms.

The Unified Medical Language System (UMLS) assigns each medical term an identification code called a CUI. Enno’s normalisation model matches each abbreviation with the CUI that best describes its meaning.

A fundamental problem is that some abbreviations have multiple meanings: “pt” could mean “patient” or “physical therapy”. Likewise, some medical terms can be written in several ways: “malignant neoplasms” could appear as “M/N”, “cancer” or “CA”.

The model assigns CUIs to abbreviations by using the context of the surrounding sentence. It checks each abbreviation against an “internal” list of definitions created during analysis of the MIMIC-III database, and an “external” list compiled from eight established sources [2].
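The lookup step can be sketched as follows. This is a simplified stand-in, not Enno's model: it picks a meaning by counting cue words shared with the sentence, whereas the project used a trained model. The two lists, the cue words and the identifiers such as `CUI-PATIENT` are all hypothetical placeholders, not real UMLS codes.

```python
# Hypothetical sense lists. The IDs are placeholders, not real UMLS CUIs.
INTERNAL = {
    "pt": [
        {"cui": "CUI-PATIENT", "term": "patient",
         "cues": {"admitted", "complains", "yo"}},
    ],
}
EXTERNAL = {
    "pt": [
        {"cui": "CUI-PATIENT", "term": "patient",
         "cues": {"seen", "history"}},
        {"cui": "CUI-PHYS-THERAPY", "term": "physical therapy",
         "cues": {"referred", "sessions", "rehab"}},
    ],
}


def candidate_senses(abbreviation):
    """Merge candidates from both lists, combining entries that share an ID."""
    merged = {}
    for inventory in (INTERNAL, EXTERNAL):
        for sense in inventory.get(abbreviation.lower(), []):
            entry = merged.setdefault(
                sense["cui"],
                {"cui": sense["cui"], "term": sense["term"], "cues": set()},
            )
            entry["cues"] |= sense["cues"]
    return list(merged.values())


def normalise(abbreviation, sentence):
    """Pick the candidate whose cue words overlap most with the sentence."""
    candidates = candidate_senses(abbreviation)
    if not candidates:
        return None  # unknown abbreviation: leave it for a human to resolve
    context = set(sentence.lower().split())
    best = max(candidates, key=lambda s: len(s["cues"] & context))
    return best["cui"], best["term"]


print(normalise("pt", "pt referred for rehab sessions twice weekly"))
print(normalise("pt", "pt admitted with history of asthma"))
```

Merging the two lists before scoring means a meaning found in either source is considered, which mirrors the article's point that more sources should give better coverage.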

This approach was 74.72% accurate at assigning the correct CUIs to 3187 abbreviations when tested with clinical notes.

Enno believes the model can be improved by drawing on more sources, such as PubMed and Wikipedia, to find more definitions for abbreviations. He expects that further work on the features and models will improve the results.

Enno Huang is one of 10 students who took part in the Summer of Research programme funded by Precision Driven Health. The research is at an early “proof of concept” stage. The projects offer fresh insights into what healthcare will look like when precision medicine is widely used.

  1. P. Das-Purkayastha, K. McLeod, and R. Canter, “Specialist medical abbreviations as a foreign language,” Journal of the Royal Society of Medicine, vol. 97, no. 9, pp. 456–456, 2004.
  2. L. V. Grossman, E. G. Mitchell, G. Hripcsak, C. Weng, and D. K. Vawdrey, “A method for harmonization of clinical abbreviation and acronym sense inventories,” Journal of Biomedical Informatics, vol. 88, pp. 62–69, 2018.