Despite the rapid development of medicine and computer science in recent years, medical
treatment in modern clinical practice is often empirical and based on retrospective data.
With the growing number of patients and their concentration in large tertiary centers, it
becomes attractive to systematically collect clinical data and apply them to risk
stratification models. However, as the volume of data increases, manual data collection
and processing become a challenge, because this approach is time-consuming and costly for
healthcare systems. In addition, unstructured information such as clinical notes is very
often written as free text, which is unsuitable for direct analysis. Artificial
intelligence is very promising in this respect and is expected to change medicine rapidly
in the coming years. Because it automates these processes, it makes it possible to extract
data quickly and reliably for further processing. The results of its use can be readily
extended to different healthcare systems, expanding the knowledge produced, improving
diagnostic and therapeutic accuracy, and ultimately benefiting health services. Collecting
vast amounts of data from different sources without compromising patients' personal data
is a major challenge in modern science.
Electronically recorded clinical notes of patients hospitalized in the Cardiology wards of
tertiary hospitals will be collected retrospectively, together with additional files such
as the laboratory and imaging examinations related to each hospitalization. Given the size
of the participating clinics and the number of years during which hospital records have
been kept in electronic form, the sample is estimated at about 60,000 patient records. All
information that could potentially be used to identify a person, such as name, ID number,
postal code, place of residence, and occupation, will be deleted from these electronic
files. Only each patient's age will be recorded, not the exact date of birth, and only the
number of days of hospitalization, not the exact dates of admission and discharge. Thus,
the data cannot be attributed to a specific individual, as no additional information or
identifiers will be collected. After the files are anonymized, each patient's clinical
note will be linked to a unique key ("identifier"). The electronic file that maps each
"identifier" to the corresponding clinical note will be stored in a secure location within
the hospital's electronic systems.
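The de-identification rules above can be sketched as follows. This is a minimal
illustration only, assuming each hospitalization is available as a structured record; the
field names (`name`, `date_of_birth`, `source_record_id`, etc.) and the `deidentify` helper
are hypothetical, and removing names or other identifiers that appear inside the free text
of a note is not shown. Computing age at admission is also an assumption; the protocol only
states that age is kept instead of the date of birth.

```python
import secrets
from datetime import date

# Hypothetical field names for the identifiers listed in the protocol.
DIRECT_IDENTIFIERS = {"name", "id_number", "postal_code", "residence", "occupation"}
# Fields used only to derive age/length of stay or to re-link the record; never exported.
DERIVED_OR_LINK_FIELDS = {"date_of_birth", "admission_date", "discharge_date", "source_record_id"}


def age_at(birth: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def deidentify(record: dict) -> tuple[dict, dict]:
    """Return (anonymized record, key-file entry) for one hospitalization (hypothetical helper)."""
    identifier = secrets.token_hex(8)  # random key, unrelated to any patient attribute
    dropped = DIRECT_IDENTIFIERS | DERIVED_OR_LINK_FIELDS
    anonymized = {k: v for k, v in record.items() if k not in dropped}
    # Keep age at admission instead of the exact date of birth (assumed reference date).
    anonymized["age"] = age_at(record["date_of_birth"], record["admission_date"])
    # Keep the length of stay (discharge minus admission) instead of the exact dates.
    anonymized["days_hospitalized"] = (record["discharge_date"] - record["admission_date"]).days
    anonymized["identifier"] = identifier
    # The key entry links the identifier to the original record and is stored separately.
    key_entry = {"identifier": identifier, "source_record_id": record["source_record_id"]}
    return anonymized, key_entry


# Usage with an invented example record.
record = {
    "source_record_id": "REC-0001",
    "name": "Example Patient",
    "id_number": "X000000",
    "postal_code": "00000",
    "residence": "Example City",
    "occupation": "Teacher",
    "date_of_birth": date(1950, 6, 15),
    "admission_date": date(2020, 3, 1),
    "discharge_date": date(2020, 3, 8),
    "note": "Admitted with atrial fibrillation.",
}
anonymized, key = deidentify(record)
print(anonymized["age"], anonymized["days_hospitalized"])  # 69 7
```

Using a random token rather than, for example, a hash of the ID number means the identifier
cannot be recomputed from patient data; re-linking is possible only through the separately
stored key file.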
The fully anonymized files will first be analyzed manually to extract information into a
database containing all of the patients' clinical information, such as discharge
diagnoses, medications, treatment protocols, and laboratory and diagnostic tests. At the
same time, a sample (1/3) of the clinical notes will be analyzed to identify the keywords
or phrases associated with each diagnosis (for example, a diagnosis of atrial fibrillation
will probably be recorded as "atrial fibrillation", "AF", etc.). Using this dictionary of
keywords together with artificial intelligence and text-mining methods such as natural
language processing (NLP), automated extraction of data and diagnoses from these
electronic medical notes will be attempted. The reliability and accuracy of the
computational methods will be evaluated internally by comparing the data extracted
automatically with those recorded manually. In addition, their reliability and accuracy
will be evaluated externally by applying these methods to the remaining 2/3 of the
clinical notes, which will not have been used to associate keywords with specific
diagnoses.
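The keyword-dictionary step and its evaluation against the manually extracted data can be
sketched as below. This is a minimal rule-based illustration, not the study's NLP pipeline:
the `KEYWORDS` dictionary, the diagnosis labels, and the `compile_patterns`, `extract` and
`evaluate` helpers are hypothetical, and scoring with micro-averaged precision, recall and
F1 is an assumed choice of metric. The same `evaluate` call could be run on the third of the
notes used to build the dictionary (internal evaluation) and on the remaining two thirds
(external evaluation).

```python
import re

# Hypothetical dictionary: diagnosis label -> keywords/phrases found in the development sample.
KEYWORDS = {
    "atrial_fibrillation": ["atrial fibrillation", "AF"],
    "heart_failure": ["heart failure", "HF"],
}


def compile_patterns(keywords: dict[str, list[str]]) -> dict[str, re.Pattern]:
    """Build one case-insensitive regex per diagnosis, matching whole words only."""
    patterns = {}
    for diagnosis, terms in keywords.items():
        alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
        # Word boundaries stop short abbreviations such as "AF" matching inside "AFTER".
        patterns[diagnosis] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def extract(note: str, patterns: dict[str, re.Pattern]) -> set[str]:
    """Return the set of diagnosis labels whose keywords occur in the note."""
    return {dx for dx, pat in patterns.items() if pat.search(note)}


def evaluate(notes, manual_labels, patterns):
    """Micro-averaged precision, recall and F1 of automated vs manual diagnoses."""
    tp = fp = fn = 0
    for note, gold in zip(notes, manual_labels):
        predicted = extract(note, patterns)
        tp += len(predicted & gold)   # found automatically and manually
        fp += len(predicted - gold)   # found automatically only
        fn += len(gold - predicted)   # found manually only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage with invented notes and manual labels.
notes = [
    "History of paroxysmal AF, on anticoagulation.",
    "Admitted with decompensated heart failure.",
    "Chest pain; AFTER admission troponin negative.",
    "No evidence of AF on Holter monitoring.",
]
manual = [{"atrial_fibrillation"}, {"heart_failure"}, set(), set()]
patterns = compile_patterns(KEYWORDS)
precision, recall, f1 = evaluate(notes, manual, patterns)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=1.00 f1=0.80
```

Word boundaries keep "AF" from matching "AFTER", but plain keyword matching still counts the
negated mention "No evidence of AF" as a positive, which is the false positive that lowers
precision in this example; cases like this are where more elaborate NLP methods would be
needed.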
This study aims to be the first in Greece to analyze the usefulness of artificial
intelligence for the automated extraction and processing of unstructured clinical data
from patients' clinical notes. The results of this study are expected to have a positive
impact on:
- the automation of large-scale data analysis and processing procedures
- the rapid epidemiological recording and use of clinical data
- the early diagnosis of diseases
- the development of phenotypic profiles of patients who could benefit from targeted
  therapies
- the development of clinical decision support systems that provide information about
  the likely clinical course of patients after hospital discharge and assist medical
  decisions
- the development and validation of prognostic models for major cardiovascular diseases