Fahime Shahrokh (فهیمه شاهرخ)

ارائه روش تركيبي براي استخراج مفاهيم از متون زيست پزشكي با استفاده از روابط معنايي

A hybrid method for extracting concepts from biomedical texts using semantic relationships

Information extraction from biomedical texts is a central topic in text mining research for this field and aims to increase the reusability of medical data. Earlier information extraction involved finding a word or phrase containing specific words in the text and mapping it to entries in ontological references. In recent years, the many complexities of biomedical texts on the one hand, and the development of new and efficient text mining approaches on the other, have shifted the focus from information extraction toward concept extraction. A concept-based approach to text expressions, which considers the position of each word in the text, its written structure, and its relationship with other text components, can improve both the extraction of multiword concepts and the accuracy with which these concepts are classified.

Rule-based approaches, natural language processing techniques, and machine learning methods are used to extract biomedical concepts. The main weakness of these methods for concept extraction is that they focus on words independently and do not consider their semantic relationships. Another drawback of some strategies is that feature extraction is supervised during the feature engineering stage. Recent research has adopted machine learning and deep learning approaches such as neural networks and word embedding techniques.

Given the importance of identifying the semantic relationships among the words that form a multiword phrase and refer to a concept, this thesis presents an approach based on four different types of inputs to process medical expressions in terms of both their written form and their semantics. The resulting feature vector is processed in a BiLSTM+CRF-based classification layer. On the i2b2 2010 dataset, this model achieves an F1 score of 90.06 in recognizing named entities.
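
To illustrate why a CRF layer on top of per-token scores helps with multiword concepts, the following is a minimal sketch of Viterbi decoding, the standard inference procedure for a linear-chain CRF. It is not the thesis implementation. The function names (`viterbi_decode`, `bio_spans`), the BIO label set, and all scores are hypothetical and chosen only for illustration. In a real model the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import Dict, List, Tuple


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][y] is the per-token score of label y at position t.
    transitions[(p, y)] is the score of moving from label p to label y
    (missing pairs default to 0.0).
    """
    if not emissions:
        return []
    # best[t][y] = best score of any path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transitions.get((prev, y), 0.0)
                         + emissions[t][y])
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def bio_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (phrase, concept_type) pairs."""
    spans, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype))
    return spans


# Hypothetical scores for a three-token phrase (illustrative values only).
tokens = ["severe", "chest", "pain"]
labels = ["O", "B-problem", "I-problem"]
emissions = [
    {"O": 1.0, "B-problem": 2.0, "I-problem": 0.5},
    {"O": 1.5, "B-problem": 1.0, "I-problem": 1.2},  # token alone prefers "O"
    {"O": 0.3, "B-problem": 0.5, "I-problem": 2.0},
]
transitions = {
    ("O", "I-problem"): -10.0,          # "I-" should not follow "O"
    ("B-problem", "I-problem"): 1.0,
    ("I-problem", "I-problem"): 1.0,
}

greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, labels)
print(greedy)
print(decoded)
print(bio_spans(tokens, decoded))
```

In this toy example, choosing the best label for each token independently splits the phrase, while sequence-level decoding uses the transition scores to keep the three tokens together as a single concept. This is the kind of dependence between neighbouring words that the abstract identifies as missing from word-by-word methods.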