De-identified UNMH EEG Corpus Database Creation With Fully De-identified Clinical Information

Phase

N/A

Condition

Epilepsy

Epilepsy (Pediatric)

Treatment

N/A

Clinical Study ID

NCT05140265

21-351

Ages ≥ 18
All Genders

Study Summary

This proposal outlines the steps required to create a pilot database of EEG recordings and de-identified medical records from patients internally referred within the UNMH Comprehensive Epilepsy Center. The UNMH EEG Corpus would be the first database of its kind. Other public databases contain either patient EEG signals or medical records, but not both, and without both kinds of information it is impossible to relate pre-treatment neurobiomarkers to post-treatment prognosis. The database will also contain information that can improve seizure localization from scalp and intracranial EEG, along with the data needed to develop algorithms that forecast seizure activity. Such algorithms could ultimately lead to novel responsive neural stimulation procedures that suppress seizures before they begin.

Eligibility Criteria

Inclusion

Inclusion Criteria:

We will first screen patients using the UNMH EEG database (Nihon Kohden Neuroworkbench). After a patient meets all the inclusion and exclusion criteria, we will access Cerner PowerChart (the UNMH EMR) for the rest of the clinical information.
18 years old or older. If a patient's age is over 89, we will aggregate the age into a single "90 or older" category so that the patient cannot be identified. In addition, all EEG data will be de-identified and will show only the year in which the study was performed, not the exact study date, to reduce the risk of identification. Because we perform several thousand EEG studies per year, it would be almost impossible to identify a patient from the study year alone.
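The two de-identification rules above (top-coding ages over 89 and keeping only the study year) can be sketched as follows. This is a minimal illustration, not the study's actual tooling: the function names are hypothetical, and age is assumed to be computed in completed years on the date of the EEG.

```python
from datetime import date

# Ages above this threshold are reported only as one top-coded bucket.
MAX_REPORTED_AGE = 89


def age_on(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def deidentified_age(birth: date, eeg_date: date) -> str:
    """Exact age through 89; every older patient is reported as '90+'."""
    age = age_on(birth, eeg_date)
    return "90+" if age > MAX_REPORTED_AGE else str(age)


def deidentified_study_year(eeg_date: date) -> int:
    """Keep only the year the EEG was performed; drop month and day."""
    return eeg_date.year
```

For example, `deidentified_age(date(1930, 5, 1), date(2021, 1, 1))` returns "90+", because the patient is 90 on the EEG date.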

Exclusion

Exclusion Criteria:

Children under 18 years of age will be excluded.
Patients whose records cannot be matched between the EEG database and the EMR

Study Design

October 11, 2021

December 31, 2030

Study Description

Retrospective De-identified EEG and Clinical Database Creation:

The proposed database (UNMH EEG Corpus) will be created in stages and designed to increase in complexity and functionality as future funding and tool development allow. The initial scope of this project is the construction of a relational database that links patient data (medical record number, EEG study number, date of birth) to a study number. This linking list will be kept for 6 years after completion of the study, stored as a paper form (source documentation) in a locked office cabinet and as an electronic version on a password-protected computer in a locked office. The rest of the data collection will then include algorithmically de-identified clinical reports (e.g., progress notes), recording metadata (e.g., montage configuration), and de-identified clinical EEG labeled only with the study number and containing no PHI. Future work beyond the scope of this pilot project will involve annotating the EEG trace with the timing and type of each seizure (or artifact), extracting medication history from the patient records, standardizing notes on treatment history and outcome, reviewing the data to confirm that ePHI has been removed, and dissemination. A full patient assessment can also include MRI, MEG, and PET scans, and a final database should also include these valuable images. The pilot UNMH EEG Corpus will focus on the subset of patients internally referred within UNMH for whom an EEG was performed, a treatment was provided, and a follow-up assessment occurred. These inclusion criteria ensure that the minimum data needed to statistically relate pre-treatment EEG to post-treatment prognosis are present.

Collection of De-identified Data Retrospectively:

Time Frame: 1) retrospectively, from the present back to August 8, 2007, when Nihon Kohden Neuroworkbench was introduced at UNM; 2) from the start of the study, each year the investigator will add the previous year's de-identified data to the database. For example, in January to February 2023, the investigator will add 2022 data to the database. These annual additions will continue until 2028, with 2027 as the last dataset.
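The collection window and annual update schedule described above can be expressed as two small checks. This is a hypothetical sketch: the function names are illustrative, and it assumes that the first annual update runs in 2022 and adds 2021 data, which the protocol implies but does not state.

```python
from datetime import date
from typing import Optional

# From the protocol: Neuroworkbench start date and the last year of data.
WINDOW_START = date(2007, 8, 8)
LAST_DATA_YEAR = 2027
# Assumption: the first annual update runs in 2022 and adds 2021 data.
FIRST_UPDATE_YEAR = 2022


def in_collection_window(eeg_date: date) -> bool:
    """True if an EEG date falls between the Neuroworkbench start and end of 2027."""
    return WINDOW_START <= eeg_date and eeg_date.year <= LAST_DATA_YEAR


def data_year_for_update(run_date: date) -> Optional[int]:
    """Calendar year of data added by the annual update run on `run_date`.

    Each update adds the previous year's data; returns None outside the
    update schedule (before the first update or after 2027 data is added).
    """
    target = run_date.year - 1
    if run_date.year < FIRST_UPDATE_YEAR or target > LAST_DATA_YEAR:
        return None
    return target
```

Matching the example in the text, an update run in February 2023 adds 2022 data, and the final update in 2028 adds 2027 data.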

The investigator will generate randomized de-identified study numbers by computer program and will create a secure table linking each de-identified study number to the patient's PHI (medical record number, EEG study number, date of birth). This table will be stored in a locked cabinet in the PIs' office. The electronic version will be stored on a password-protected HSC computer, on an HSC IT secured drive accessible only to the PIs and the study coordinator.
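A minimal sketch of generating random study numbers and the secure linking table, assuming Python's standard `secrets` module as the source of randomness. The ID format ("S" plus 8 digits) and the choice of one study number per medical record number are hypothetical; the protocol specifies neither.

```python
import secrets
from typing import Dict, Iterable, Set, Tuple

# Hypothetical ID format: "S" followed by 8 random digits.
ID_PREFIX = "S"
ID_DIGITS = 8


def new_study_id(used: Set[str]) -> str:
    """Draw a random study number that has not been issued yet."""
    while True:
        candidate = f"{ID_PREFIX}{secrets.randbelow(10 ** ID_DIGITS):0{ID_DIGITS}d}"
        if candidate not in used:
            used.add(candidate)
            return candidate


def build_link_table(records: Iterable[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Map each patient (by MRN) to one random study number.

    records: (medical_record_number, eeg_study_number, date_of_birth) rows.
    Returns {study_id: {"mrn", "dob", "eeg_study_numbers"}}. This table holds
    PHI and belongs only in the secured linking file, never in research data.
    """
    used: Set[str] = set()
    by_mrn: Dict[str, str] = {}
    table: Dict[str, dict] = {}
    for mrn, eeg_no, dob in records:
        if mrn not in by_mrn:
            sid = new_study_id(used)
            by_mrn[mrn] = sid
            table[sid] = {"mrn": mrn, "dob": dob, "eeg_study_numbers": []}
        table[by_mrn[mrn]]["eeg_study_numbers"].append(eeg_no)
    return table
```

Keying by medical record number means a patient with several EEG studies keeps a single study number, so all of that patient's recordings stay linked in the research data without exposing the MRN.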

Once the study number is generated, all de-identified data will be stored under the study number so that no PHI is present in any of the research data.

The investigator will extract data only from patients who were 18 years or older at the time the EEG was obtained, and will exclude any vulnerable groups or information (see the inclusion and exclusion criteria). Children under 18 years of age will be excluded. Because this retrospective data analysis involves no informed consent and no direct interaction with patients, patients of any particular ethnicity, race, or primary language will be neither screened nor targeted. For the same reason, Spanish-speaking patients will not be specifically excluded.
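The age restriction above can be applied as a simple filter on screening rows. The `EegRecord` fields are hypothetical, and treating a February 29 birthday as reaching 18 on March 1 in a non-leap year is one convention among several, assumed here for illustration.

```python
from datetime import date
from typing import Iterable, List, NamedTuple


class EegRecord(NamedTuple):
    """Hypothetical screening row from the EEG database."""
    eeg_study_number: str
    date_of_birth: date
    eeg_date: date


def eighteenth_birthday(dob: date) -> date:
    """Date a patient turns 18 (Feb 29 births roll to Mar 1 in non-leap years)."""
    try:
        return dob.replace(year=dob.year + 18)
    except ValueError:  # Feb 29 birth and the target year is not a leap year
        return date(dob.year + 18, 3, 1)


def adults_only(records: Iterable[EegRecord]) -> List[EegRecord]:
    """Keep only EEGs recorded on or after the patient's 18th birthday."""
    return [r for r in records if r.eeg_date >= eighteenth_birthday(r.date_of_birth)]
```

Filtering on the EEG date rather than today's date matches the protocol's wording: a recording made at 17 stays excluded even if the patient is now an adult.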

The investigator will de-identify all EEG data from the clinical EEG database (i.e., Neuroworkbench of the Nihon Kohden EEG system) and import the data to a password-protected secure study server within the PI's HSC IT secured drive domain. No EEG video data will be collected, since video can easily identify the patient.

Clinical Information: each patient's neurology notes (History and Physical, Neurology Progress Note, Neurology Consultation Note, Neurology Clinic Note, Neurology Discharge Summary), Neurodiagnostic Reports of EEG results, neuroimaging studies (brain MRI, brain PET, brain MEG), and medication list of anti-seizure medications (ASMs) will be pulled. These clinical documents will be de-identified (all PHI removed) and linked to the study number. After de-identification and linkage, the clinical information will be stored on the password-protected secure study server together with the de-identified EEG data. While the investigators develop an automated de-identification method, they will extract and de-identify the data manually. Once the automated process is established, the investigators will also perform quality checks by comparing the manual and automated processes.
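The quality check described above could compare automated and manual de-identification span by span. The sketch below uses a hypothetical, deliberately small set of regular expressions (dates, phone numbers, MRN-labeled numbers) and scores the automated spans against manually marked spans with exact-match precision and recall. A real clinical de-identifier would need much broader coverage (names, addresses, and other identifiers), which is not shown here.

```python
import re
from typing import Dict, Set, Tuple

Span = Tuple[int, int]

# Hypothetical patterns only; real notes need far broader coverage.
PHI_PATTERNS: Dict[str, re.Pattern] = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def find_phi(text: str) -> Set[Span]:
    """Character spans flagged as PHI by the pattern list."""
    return {m.span() for p in PHI_PATTERNS.values() for m in p.finditer(text)}


def redact(text: str, spans: Set[Span]) -> str:
    """Replace flagged spans with [REDACTED], right to left.

    Assumes the spans do not overlap.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + "[REDACTED]" + text[end:]
    return text


def compare_to_manual(auto: Set[Span], manual: Set[Span]) -> Dict[str, float]:
    """Exact-span precision and recall of automated vs. manual de-identification."""
    hits = len(auto & manual)
    precision = hits / len(auto) if auto else 1.0
    recall = hits / len(manual) if manual else 1.0
    return {"precision": precision, "recall": recall}
```

Recall is the number that matters most here: a span marked by the manual reviewer but missed by the automated pass is PHI that would leak into the research data.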

Specifically, all data (the EEG-BIDS files, the SQL database, and the Excel sheet) will be stored on an internal hard drive within a UNM HSC IT-managed desktop PC, physically located in Dr. Sam McKenzie's office (Rm 209A, RGFH). The PC is on the UNM HSC network and runs the UNM HSC mirror of Windows 10. The room is always locked, and the PC requires a password login.

The investigators will use the EEG-BIDS file format to store all data and organize the database. This format specifies a directory tree with a particular naming convention (Figure 1). Each patient is assigned a directory containing subdirectories for each session and data modality. Non-identifying patient demographics and recording details will be saved in two file types: a *.tsv file for data values and a *.json file for descriptive metadata. De-identified EEG files will be downsampled to 250 Hz for hard drive storage efficiency and saved in European Data Format (EDF). Accompanying each EDF EEG file will be a 'coordinates' file that specifies the locations of the anatomical landmarks used for montage placement, and an 'events' file containing annotations of events observed by clinicians in the EEG. These annotations will be imported from the original Nihon Kohden annotated dataset using the Python MNE toolbox [1].
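A minimal sketch of the EEG-BIDS layout for one de-identified session, using the BIDS naming entities `sub-`, `ses-`, and `task-` and the standard `_events.tsv` columns `onset`, `duration`, and `trial_type`. The task label "clinical" and the helper names are hypothetical. The sidecar JSON here carries only `TaskName` and `SamplingFrequency`; the BIDS EEG specification requires additional fields (for example `EEGReference`, `PowerLineFrequency`, and `SoftwareFilters`) that would have to be filled in from the recording. In BIDS, patient demographics go in the dataset-level `participants.tsv` and `participants.json`, and anatomical landmarks go in `_coordsystem.json`.

```python
import csv
import io
import json
from pathlib import Path
from typing import Dict, Iterable, Tuple

TASK = "clinical"  # hypothetical task label
SFREQ_HZ = 250.0   # downsampled rate stated in the protocol


def bids_eeg_paths(root: Path, sub: str, ses: str) -> Dict[str, Path]:
    """Paths for one session; labels must be alphanumeric (e.g. the study number)."""
    eeg_dir = root / f"sub-{sub}" / f"ses-{ses}" / "eeg"
    stem = f"sub-{sub}_ses-{ses}_task-{TASK}"
    return {
        "eeg": eeg_dir / f"{stem}_eeg.edf",
        "sidecar": eeg_dir / f"{stem}_eeg.json",
        "events": eeg_dir / f"{stem}_events.tsv",
        # Electrode and landmark files carry no task entity in BIDS naming.
        "electrodes": eeg_dir / f"sub-{sub}_ses-{ses}_electrodes.tsv",
        "coordsystem": eeg_dir / f"sub-{sub}_ses-{ses}_coordsystem.json",
    }


def events_tsv(events: Iterable[Tuple[float, float, str]]) -> str:
    """Clinician annotations as BIDS events; onset and duration in seconds."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "duration", "trial_type"])
    for onset, duration, label in events:
        writer.writerow([f"{onset:.3f}", f"{duration:.3f}", label])
    return buf.getvalue()


def sidecar_json() -> str:
    """Partial *_eeg.json; the other REQUIRED BIDS fields must be added."""
    return json.dumps({"TaskName": TASK, "SamplingFrequency": SFREQ_HZ}, indent=2)


def write_session(root: Path, sub: str, ses: str,
                  events: Iterable[Tuple[float, float, str]]) -> Dict[str, Path]:
    """Create the session directory and write the events and sidecar files."""
    paths = bids_eeg_paths(root, sub, ses)
    paths["eeg"].parent.mkdir(parents=True, exist_ok=True)
    paths["events"].write_text(events_tsv(events))
    paths["sidecar"].write_text(sidecar_json())
    return paths
```

Using the randomized study number as the BIDS subject label keeps the directory tree itself free of PHI.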

Within this file structure we will also save text files with de-identified clinical notes imported from Cerner Millennium detailing medication, diagnosis, treatment, and prognosis. Non-identifying patient data will additionally be stored in a SQL database with a randomized patient identifier.
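A minimal SQLite sketch of the non-identifying SQL database, keyed by the randomized patient identifier. The table and column names are hypothetical, and SQLite is used only because it ships with Python; the protocol does not name a database engine. Foreign keys are enabled so that EEG sessions and notes cannot reference an unknown identifier.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS patient (
    study_id  TEXT PRIMARY KEY,   -- randomized identifier, never the MRN
    sex       TEXT,
    age_group TEXT                -- exact age, or '90+' when over 89
);
CREATE TABLE IF NOT EXISTS eeg_session (
    session_id INTEGER PRIMARY KEY,
    study_id   TEXT NOT NULL REFERENCES patient(study_id),
    study_year INTEGER NOT NULL,  -- year only, no month or day
    bids_path  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS clinical_note (
    note_id   INTEGER PRIMARY KEY,
    study_id  TEXT NOT NULL REFERENCES patient(study_id),
    note_type TEXT NOT NULL,      -- e.g. 'Neurology Progress Note'
    note_year INTEGER,
    body      TEXT NOT NULL       -- already de-identified text
);
"""


def open_corpus_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) the corpus database with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

A query such as `SELECT study_year, COUNT(*) FROM eeg_session GROUP BY study_year` then summarizes recordings per year without touching any identifier.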

An Excel sheet will store each random patient identification number (used in the EEG-BIDS files and in the SQL database) together with the corresponding patient identifying number, for subsequent re-identification if needed.

Connect with a study center

University of New Mexico Health Science
Albuquerque, New Mexico 87106
United States
Active - Recruiting
