Retrospective De-identified EEG and Clinical Database Creation:
The proposed database (UNMH EEG corpus) will be created in stages and designed to
increase in complexity and functionality given future funding and tool development. The
initial scope for this project includes the construction of a relational database in
which patient demographic data (medical records, EEG study number, date of birth) are
linked to the study number. The linking list will be kept for 6 years from the completion
of the study and will be stored as a paper form (source documentation) in a locked office
cabinet and as an electronic version on a password-protected computer in a locked office.
The remaining data collection will include algorithmically de-identified clinical
reports (e.g. progress notes), recording meta-data (e.g. montage configuration), and
de-identified clinical EEG labeled only with the study number and containing no PHI.
Future work beyond the scope of this pilot project will involve: annotating the EEG trace
with the timing and type of seizure (or artifact), extracting medication history from the
patient records, standardizing notes on treatment history and outcome, reviewing that
ePHI has been removed, and dissemination. A full patient assessment can also include MRI,
MEG and PET scans, and a final database should also include these valuable images. The
creation of the pilot
UNMH EEG corpus will focus on the subset of patients internally referred within UNMH for
whom an EEG was performed, a treatment was provided, and a follow-up assessment occurred.
These inclusion criteria will ensure that the minimum data are present to statistically
relate pre-treatment EEG with post-treatment prognosis.
Collection of De-identified Data Retrospectively:
Time Frame: 1) from the present back to August 8, 2007, when Nihon Kohden Neuroworkbench
was first used at UNM; 2) from the start of the study, each year the investigator will
add the previous year's de-identified data to the database, ending with the 2027 dataset,
which will be added in 2028. For example, in January to February 2023, the investigator
will add 2022 data to the database.
The investigator will generate a randomized de-identified study number for each patient
using a computer program. The investigator will create a secure table that links each
de-identified study number to the patient's PHI (medical record number, EEG study number,
date of birth). This table will be stored in the locked cabinet in the PIs' office. The
electronic version will be stored on an HSC password-protected computer, on an HSC IT
secured drive accessible only to the PIs and the study coordinator.
Once the study number is generated, all the de-identified data will be stored under the
study number so that no PHI is present in any of the research data.
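The study-number generation described above could be sketched as follows. This is a minimal illustration only, not the investigators' actual program: the function name, the `UNMH` prefix, and the six-digit format are assumptions made for the example. Random numbers are drawn with Python's `secrets` module so that a study number carries no information about the medical record number it replaces.

```python
import secrets


def generate_study_numbers(mrns, prefix="UNMH", digits=6):
    """Map each medical record number to a unique random study number.

    The prefix and digit count are illustrative assumptions; the protocol
    only states that the study number is randomized.
    """
    linking_table = {}
    used = set()
    for mrn in mrns:
        if mrn in linking_table:
            continue  # a patient keeps one study number across all records
        while True:
            candidate = f"{prefix}{secrets.randbelow(10 ** digits):0{digits}d}"
            if candidate not in used:  # redraw on the rare collision
                break
        used.add(candidate)
        linking_table[mrn] = candidate
    return linking_table
```

The returned dictionary corresponds to the secure linking table and would be kept separately from the research data. The study numbers are purely alphanumeric, so they could also serve as directory labels.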
The investigator will extract data only from patients who were 18 years or older at the
time the EEG was obtained. The investigator will exclude any vulnerable groups and their
information. Please see below for inclusion and exclusion criteria. Children under 18
years of age will be excluded. Because this retrospective data analysis involves no
informed consent or direct interaction with the patient, patients will be neither
screened nor targeted by ethnicity, race, or primary language. For the same reason,
Spanish-speaking patients will not be specifically excluded.
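The age criterion above depends on the patient's age on the date of the EEG, not on the date of extraction. A minimal sketch of that check is shown below; the function names are hypothetical, and the calculation would be run on identified data before de-identification, since it requires the date of birth.

```python
from datetime import date


def age_at(date_of_birth, eeg_date):
    """Return the age in completed years on the date the EEG was obtained."""
    years = eeg_date.year - date_of_birth.year
    # Subtract one year if the birthday had not yet occurred in the EEG year.
    if (eeg_date.month, eeg_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def is_eligible(date_of_birth, eeg_date, minimum_age=18):
    """Apply the protocol's age criterion (18 years or older at time of EEG)."""
    return age_at(date_of_birth, eeg_date) >= minimum_age
```

For example, a patient born on June 15, 2000 would be excluded for an EEG recorded on June 14, 2018 and included for one recorded on June 15, 2018.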
The investigator will de-identify all EEG data from the clinical EEG database (i.e.
Neuroworkbench of the Nihon Kohden EEG system) and import them to a password-protected
secure study server in the PI's HSC IT secured drive domain. No EEG video data will be
included, since video of a patient can easily reveal the patient's identity.
Clinical Information: each patient's Neurology notes (History and Physical, Neurology
Progress Note, Neurology Consultation Note, Neurology Clinic Note, Neurology Discharge
Summary), Neurodiagnostic Report of EEG results, Neuroimaging studies (brain MRI, brain
PET, brain MEG), and Patient's Medication List of Anti-seizure medication (ASM) will be
pulled. These clinical documents will be de-identified (removing all PHI) and linked to
the study number. After de-identification and linkage to the study number, the clinical
information will be stored on the password-protected secure study server with the
de-identified EEG data. While the investigators are developing an automated
de-identification method, they will extract and de-identify the data manually. Once the
automated process is established, the investigators will also perform a quality check by
comparing the manual and automated results.
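One possible form of the manual-versus-automated quality check is a word-level comparison of the two de-identified versions of the same note, so that any passage where they disagree can be reviewed by a person. The sketch below uses Python's standard `difflib` module. The function name and the bracketed placeholders in the example (`[NAME]`, `[DATE]`) are assumptions for illustration; the protocol does not specify how removed PHI is marked.

```python
import difflib


def compare_deidentification(manual_text, automated_text):
    """Compare manually and automatically de-identified versions of one note.

    Returns a similarity ratio between 0 and 1 and a list of
    (manual, automated) word spans where the two versions disagree.
    """
    manual_tokens = manual_text.split()
    automated_tokens = automated_text.split()
    matcher = difflib.SequenceMatcher(
        a=manual_tokens, b=automated_tokens, autojunk=False
    )
    differences = [
        (" ".join(manual_tokens[i1:i2]), " ".join(automated_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), differences
```

A note in which the manual version reads "seen on [DATE]" and the automated version still contains a literal date would be returned in the list of differences, flagging PHI that the automated method missed.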
Specifically, all data (the EEG-BIDS files, the SQL database, and the Excel sheet) will
be stored on an internal hard drive within a UNM HSC IT managed desktop PC, physically
located in Dr. Sam McKenzie's office in Rm 209A in RGFH. The PC is on the UNM HSC network
and the computer runs the UNM HSC mirror of Windows 10. The room is always locked and the
PC requires a password to log in.
The investigators will use the EEG-BIDS file format to store all data and organize the
database. This file format specifies a path structure tree with particular nomenclature
(Figure 1). Each patient is assigned a directory containing subdirectories for each
session and data modality. Non-identifying patient demographics and recording details
will be saved in two file types: a *.tsv file for data values and a *.json file for
descriptive metadata. De-identified EEG files will be downsampled to 250 Hz to save hard
drive space and saved in the European Data Format (EDF). Accompanying the EDF EEG file
will be a 'coordinates' file which specifies the location of anatomical landmarks used
for montage placement. Another 'events' file will contain annotations of events observed
by clinicians in the EEG. These data will be imported from the original Nihon Kohden
annotated dataset using the Python MNE toolbox [1].
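The file layout and conversion step described above might look like the following sketch. It is an illustration under stated assumptions, not the investigators' pipeline: the helper names and the `clinical` task label are hypothetical, and EDF export through MNE additionally requires an EDF writer package to be installed. The MNE calls used (`mne.io.read_raw_nihon`, `Raw.anonymize`, `Raw.resample`, `mne.export.export_raw`) are part of the public MNE-Python API.

```python
from pathlib import Path


def bids_eeg_path(root, subject, session, task="clinical",
                  suffix="eeg", extension=".edf"):
    """Build an EEG-BIDS style path for one recording.

    The task label "clinical" is an assumption made for this example.
    """
    for label in (subject, session, task):
        if not label.isalnum():
            raise ValueError(f"BIDS labels must be alphanumeric: {label!r}")
    stem = f"sub-{subject}_ses-{session}_task-{task}_{suffix}"
    return (Path(root) / f"sub-{subject}" / f"ses-{session}" / "eeg"
            / f"{stem}{extension}")


def convert_recording(source_file, destination, target_sfreq=250.0):
    """Read a Nihon Kohden recording, downsample it, and export it as EDF."""
    import mne  # imported lazily so the path helper works without MNE

    raw = mne.io.read_raw_nihon(source_file, preload=True)
    # Strips subject info from the header and resets the recording date.
    # Free-text annotations are not scrubbed and still need review.
    raw.anonymize()
    raw.resample(target_sfreq)
    destination.parent.mkdir(parents=True, exist_ok=True)
    mne.export.export_raw(destination, raw, fmt="edf", overwrite=True)
    return destination
```

With a study number of `UNMH000123` and session `01`, `bids_eeg_path` yields `sub-UNMH000123/ses-01/eeg/sub-UNMH000123_ses-01_task-clinical_eeg.edf` under the corpus root. Passing `suffix="events"` and `extension=".tsv"` gives the matching events file name.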
Within this file structure we will also save text files with de-identified clinical notes
imported from Cerner Millennium detailing medication, diagnosis, treatment, and
prognosis. Non-identifying patient data will additionally be stored in a SQL database
with a randomized patient identifier.
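A minimal sketch of how the SQL database could be organized is given below, using SQLite through Python's standard `sqlite3` module. The database engine, table names, and columns are all assumptions for illustration; the protocol specifies only that non-identifying data are stored in a SQL database keyed by the randomized patient identifier.

```python
import sqlite3

# Hypothetical schema: only the randomized study number identifies a patient.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patient (
    study_number TEXT PRIMARY KEY,
    age_at_eeg   INTEGER CHECK (age_at_eeg >= 18),
    sex          TEXT
);
CREATE TABLE IF NOT EXISTS eeg_session (
    session_id   INTEGER PRIMARY KEY,
    study_number TEXT NOT NULL REFERENCES patient(study_number),
    session      TEXT NOT NULL,
    edf_path     TEXT NOT NULL
);
"""


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    connection = sqlite3.connect(path)
    connection.execute("PRAGMA foreign_keys = ON")  # enforce the reference
    connection.executescript(SCHEMA)
    return connection
```

Storing the age at the time of EEG, not the date of birth, keeps the table free of PHI, and the `CHECK` constraint rejects any record that would violate the age criterion.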
An Excel sheet will store the random patient identification number (used in the EEG-BIDS
files and in the SQL database) and the corresponding patient identifying number for
subsequent re-identification if needed.