De-identification of GP e-referrals for a deep-learning-based triage decision support tool
Summer of Research project by Nick James (University of Auckland), supervised by Dr Edmond Zhang (Orion Health) and Dr Patrick Gladding (University of Auckland).
In New Zealand, private medical information cannot be used for research except when it directly relates to treating the patient it was collected from.
There are two ways to make this information available. The first is to get the patients’ consent to use their information for research. The second is to remove any information from the data that could identify the patients.
Medical student Nick James has developed a neural network model that can find identifying details in patient records, so that the details can be removed and the records made available for research purposes.
The patient records Nick worked with were 300 randomly selected discharge summaries from Waitemata District Health Board. Doctors write a discharge summary when a patient leaves hospital. There is no official form to fill in, so each discharge summary is written differently. The algorithm must analyse this natural-language text to find personal information.
Nick manually labelled the personal information in 180 of the discharge summaries and used them to train the algorithm to recognise identifying details. He used the remaining 120 summaries to validate and test the model. The model is a deep learning neural network that analyses individual words and their context to find named entities in the text and label them. The system can analyse hundreds of discharge summaries and detect identifying details such as names, contact information, National Health Index (NHI) numbers, healthcare workers’ names, and appointment dates with an F1 score of 99.6%. It performs best on birth dates, phone numbers, NHI numbers and addresses, and is slightly less successful at detecting names.
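The F1 score mentioned above is the harmonic mean of precision (how many of the detected details were correct) and recall (how many of the real details were detected). The short Python sketch below shows one common way to compute it at the entity level. It is only an illustration: the function name `entity_f1`, the span format `(start, end, label)` and the example spans are assumptions made here, not the project's actual evaluation code or data.

```python
def entity_f1(gold: set, predicted: set) -> float:
    """Entity-level F1: a prediction counts only if span and label both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # correct / all predicted
    recall = true_positives / len(gold)          # correct / all real entities
    return 2 * precision * recall / (precision + recall)


# Hypothetical character spans, invented purely for illustration.
gold = {(0, 10, "PATIENT"), (25, 32, "NHI"), (40, 50, "DOCTOR")}
predicted = {(0, 10, "PATIENT"), (25, 32, "NHI"), (60, 70, "DATE")}

score = entity_f1(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this made-up example two of the three predictions are correct and two of the three real entities are found, so precision and recall are both two thirds.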
Some of the inaccuracy came from the algorithm mislabelling patient names as doctors’ names, and from similar mistakes. Documents with this type of error can still be completely de-identified, because both patients’ and doctors’ names would be removed.
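To see why a wrong label does not prevent de-identification, consider a minimal Python sketch of a redaction step that masks every detected span regardless of its label. The function `redact`, the `[REDACTED]` placeholder, the sample sentence and the span offsets are all hypothetical and chosen for illustration; they do not describe the removal tool under development.

```python
def redact(text: str, spans: list, mask: str = "[REDACTED]") -> str:
    """Replace every labelled span with a mask, ignoring the label itself."""
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end, _label in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


# Invented example note; spans are (start, end, label) character offsets.
note = "Jane Doe was seen by Dr Smith."
correct = [(0, 8, "PATIENT"), (24, 29, "DOCTOR")]
confused = [(0, 8, "DOCTOR"), (24, 29, "DOCTOR")]  # patient mislabelled

print(redact(note, correct))
print(redact(note, confused))
```

Both calls produce the same masked text, because the label only affects how the error is scored, not whether the name is removed.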
Nick is currently developing another tool with fellow University of Auckland student Yicheng Shi to remove the identifying details. Continuing work on de-identifying patient records can make much more data available to health researchers, including the thousands of hospital ward round notes, nursing notes and clinic letters being produced every day.
This research may contribute to developing a large de-identified database of local information to improve health in New Zealand.
Nick James is one of 10 students who took part in the Summer of Research programme funded by Precision Driven Health. The research is at an early “proof of concept” stage. The projects offer fresh insights into what healthcare will look like when precision medicine is widely used.