Sahar Khalafi (سحر خلفی)
خلاصه سازی مبتني بر فنوتایپ پرونده الکترونیک سلامت
Phenotype-Based Summarization of Electronic Health Record
Today, medical centers record patients' treatment and clinical care in electronic health records. Much of the clinically relevant information is stored in clinical notes written in natural language. Analyzing and searching these notes is essential for patient care and computational modeling. The growing volume of clinical notes in electronic health records can harm clinical processes: important information may be overlooked, appropriate treatment may be delayed, and patients' health may ultimately be endangered.
Various data mining methods have been proposed to summarize clinical notes, including statistical methods that use knowledge bases, methods based on cue expressions, and deep learning models. A central challenge of these methods is their reliance on domain specialists to annotate the data, define comprehensive concepts and the relationships between them, create semantic representations of sentences, and enrich the knowledge base.
Identifying the phenotypes in clinical notes plays a vital role in resolving this issue. It also supports the identification of patient groups, a crucial task in the secondary use of electronic health records for clinical information management. The methods proposed so far for identifying disease phenotypes have not been accurate enough at extracting the relevant features. Conventional machine learning approaches require knowledge bases and the intervention of domain experts to perform feature engineering on clinical notes. Deep learning approaches, by contrast, learn features automatically with deep neural models, but these models are usually unable to extract semantic information and grammatical features effectively.
This study presents a model consisting of two units. The first is a deep-learning-based phenotype identification unit, which identifies the terms most relevant to cardiac and pulmonary phenotypes. The second is a phenotype-based summarization unit, which combines two knowledge bases to identify the sentences most relevant to cardiopulmonary phenotypic abnormalities: the output of the first unit serves as the internal knowledge base, and the Human Phenotype Ontology serves as the external knowledge base.
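As a rough illustration of how two term sources can be combined to select sentences, the following minimal sketch ranks sentences by weighted overlap with an internal and an external term set. It is not the method of this study: plain lexical overlap stands in for the embedding-based selection, and the function names, the weights, and the example terms are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_sentence(sentence, internal_terms, external_terms,
                   w_internal=0.5, w_external=0.5):
    """Weighted share of distinct tokens found in each term set.

    internal_terms: terms assumed to come from a phenotype identification model.
    external_terms: terms assumed to come from an ontology such as HPO.
    The weights are illustrative assumptions, not values from the study.
    """
    tokens = set(tokenize(sentence))
    if not tokens:
        return 0.0
    internal_hits = len(tokens & internal_terms)
    external_hits = len(tokens & external_terms)
    return (w_internal * internal_hits + w_external * external_hits) / len(tokens)

def summarize(note, internal_terms, external_terms, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]
    ranked = sorted(
        enumerate(sentences),
        key=lambda pair: score_sentence(pair[1], internal_terms, external_terms),
        reverse=True,
    )[:k]
    return [sentence for _, sentence in sorted(ranked)]

# Hypothetical example data, for illustration only.
note = ("Patient reports dyspnea and wheezing. Family visited today. "
        "Echo shows cardiomegaly with edema. Diet is regular.")
internal = {"dyspnea", "cardiomegaly"}
external = {"wheezing", "edema"}
summary = summarize(note, internal, external, k=2)
```

Keeping the two term sets separate makes it possible to weight model-derived terms and ontology terms differently, which is one plausible way to balance an internal and an external knowledge base.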
The proposed model extracts more features than existing methods and achieves a better F1 score. In addition, the phenotype-based summarization unit uses the phenotypes identified by the deep neural model to extract topics related to cardiac and pulmonary phenotypes automatically, without the need for domain experts. It can use content-based embeddings to capture the semantic representation of sentences, so sentence-level analysis does not require concepts and their related terms to be supplied in knowledge bases. The phenotype-based summarization system can therefore address the challenges that previous methods faced when using knowledge bases. The new methods for selecting relevant sentences introduced in this study improve the system's ROUGE scores compared with content-based summarizers such as BERT and with SUMMA, which is based on statistical methods.
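For readers unfamiliar with the ROUGE metric mentioned above, the following minimal sketch computes ROUGE-1 (unigram overlap with clipped counts) between a candidate summary and a reference. It assumes simple whitespace tokenization with no stemming or stopword removal; the function name and the example strings are hypothetical, and the evaluation setup of this study may differ.

```python
from collections import Counter

def rouge1(candidate, reference):
    """Return (precision, recall, F1) for unigram overlap.

    Counts are clipped: a word is matched at most as many times
    as it appears in both the candidate and the reference.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example strings, for illustration only.
precision, recall, f1 = rouge1("patient has dyspnea",
                               "patient has severe dyspnea")
```

Here every candidate word appears in the reference, so precision is 1.0, while recall is 0.75 because the reference word "severe" is missing from the candidate.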