Creating Resources for Marking Diagnoses in Electronic Health Reports in Serbian

  • Ulfeta Marovac, State University of Novi Pazar
  • Aldina Avdić, State University of Novi Pazar
  • Dragan Janković, Faculty of Electronic Engineering - University of Nis
  • Sead Marovac, Department of General Surgery, General Hospital Novi Pazar
Keywords: medical reports, diagnoses, automatic marking, computer text processing, lexical resources

Abstract

Thanks to medical information systems, many medical reports are collected in electronic form every day. Apart from fields with a restricted set of allowed input values (the structured part), these reports contain free, unstructured text. This text gives a more detailed description of the patient's condition than the structured part can express, and it often includes symptoms, results of laboratory analyses, accompanying diagnoses, and similar details. Due to a lack of time, doctors often write these descriptions in non-standard ways, using their own abbreviations and synonyms, and the descriptions often contain typos. All of this makes it difficult to extract information from documents specific to the medical domain. This paper presents the creation of medical lexical resources for the automatic labeling of diagnosis terms in medical reports. Automatic marking of free text requires natural language processing methods as well as appropriate lexical resources. Since neither medical lexical resources nor a corpus of medical reports is publicly available for the Serbian language, the contribution of this paper is the construction of such resources for the automatic marking of diagnoses. Using the proposed resources, diagnosis codes, Latin terms, and Serbian terms specific to certain ICD-10 diagnoses can be mapped with a precision of 83.47%, 86.86%, and 78.29%, respectively.
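
To make the idea of lexicon-based marking concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a tiny hypothetical lexicon (`LEXICON`) that links an ICD-10 code to a few illustrative Latin and Serbian terms, a simplified pattern for diagnosis codes (`CODE_RE`), and plain case-insensitive substring matching; abbreviations, typos, and Serbian inflection, which the abstract names as the hard part, are not handled here. The helper `precision` computes the share of produced marks that also appear in a manually annotated reference set.

```python
import re

# Hypothetical toy lexicon (illustrative entries only, not the paper's resource):
# ICD-10 code -> Latin and Serbian terms associated with that diagnosis.
LEXICON = {
    "J18": {"latin": ["pneumonia"], "serbian": ["zapaljenje pluća"]},
    "I10": {"latin": ["hypertensio arterialis"], "serbian": ["povišen krvni pritisak"]},
}

# Simplified ICD-10 code shape (an assumption): letter, two digits, optional decimal part.
CODE_RE = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b")


def mark(text):
    """Return annotations (start, end, kind, icd10_code), sorted by position."""
    marks = []
    # 1) Diagnosis codes written explicitly in the report, e.g. "J18.9".
    for m in CODE_RE.finditer(text):
        marks.append((m.start(), m.end(), "code", m.group().split(".")[0]))
    # 2) Latin and Serbian terms found by case-insensitive lexicon lookup.
    for code, terms in LEXICON.items():
        for kind in ("latin", "serbian"):
            for term in terms[kind]:
                for m in re.finditer(re.escape(term), text, re.IGNORECASE):
                    marks.append((m.start(), m.end(), kind, code))
    return sorted(marks)


def precision(predicted, gold):
    """Fraction of predicted marks that are also present in the gold annotation."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(gold)) / len(predicted)


report = "Dg: J18.9 Pneumonia. Pacijent ima povišen krvni pritisak."
for start, end, kind, code in mark(report):
    print(kind, code, repr(report[start:end]))
```

Separate precision values per mark type, as reported in the abstract, could be obtained by filtering the predicted and reference marks on `kind` before calling `precision`.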

Published
2020-06-23
Section
Original Research Papers