Study procedure
The aim of the study is to externally validate a machine learning model predicting
colorectal anastomotic leakage. The prediction model to be externally validated was
developed on a prospective database. This database contained data on 2,483 colorectal cancer
patients who underwent a surgical procedure between January 2016 and April 2021 in 14
hospitals, both rural and academic, in four countries (the Netherlands, Italy, Belgium, and
Australia). Of these, 189 patients (7.6%) developed colorectal anastomotic leakage. The
models predicted the risk of colorectal anastomotic leakage intraoperatively, just prior to
the construction of the anastomosis, using a total of 31 variables. These variables comprise
both preoperatively available data and variables describing the intraoperative condition of
the patient. The models were internally validated using 10-fold cross-validation and
subsequently tested on a held-out 20% of the database that was not used for training. The
area under the receiver operating characteristic curve (AUROC) of the best-performing machine
learning model on the test set was 0.84, with a sensitivity of 0.86, a specificity of 0.78, a
positive predictive value of 0.24, and a negative predictive value of 0.99.
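The reported predictive values can be cross-checked against the reported sensitivity, specificity, and leakage rate. The sketch below is illustrative only: it assumes that the leakage rate in the test set equals the rate in the whole database (189 of 2,483), which the protocol does not state, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) derived from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption: test-set leakage rate equals the database-wide rate.
prevalence = 189 / 2483
ppv, npv = predictive_values(sensitivity=0.86, specificity=0.78, prevalence=prevalence)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under that assumption the calculation reproduces the reported values of 0.24 and 0.99, which illustrates why a low event rate yields a low positive predictive value even when sensitivity is high.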
There are no direct benefits or risks for patients participating in this prospective
simulation study. The study is non-interventional: the prediction models do not alter daily
practice and, in this phase, are not intended to be used as a diagnostic device.
Intraoperatively, just prior to the construction of the anastomosis, the prediction model
will predict the probability of anastomotic leakage using patient, tumor, and intraoperative
variables (listed in the Data dictionary section). SAS Viya is used for the development of
the machine learning model. During the prospective simulation study, the predicted scores are
available only to the principal and research investigators and are thus unknown to the
participating hospitals and operating surgeons, in order to prevent any influence on current
daily practice at this stage of the research. Thirty days postoperatively, data on the
occurrence of anastomotic leakage will be collected. AUROC, sensitivity, specificity, and
accuracy will then be calculated from the number of patients classified as true positive,
true negative, false positive, or false negative. Once a minimum of 100 events and 100
non-events has been reached, the external validation is complete and the final AUROC,
sensitivity, and specificity will be presented.
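As an illustration of how these performance measures follow from the predicted scores and the observed 30-day outcomes, a minimal sketch is given below. It is not the study's analysis code (the model is developed in SAS Viya). The toy data, the 0.5 decision threshold, and all function names are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN (1 = leakage, 0 = no leakage)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random event outscores a random
    non-event, with ties counted as one half."""
    events = [s for t, s in zip(y_true, scores) if t == 1]
    non_events = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


# Toy data for illustration only, not study data.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2, 0.5]
threshold = 0.5  # hypothetical cut-off, not taken from the protocol
y_pred = [1 if s >= threshold else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
area = auroc(y_true, scores)
print(counts, metrics, round(area, 3))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUROC is computed from the scores directly and is threshold-independent.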
Quality assurance plan, data checks, source data verification
Data will be handled confidentially and anonymously. Data will be pseudo-anonymized for the
principal investigator and the research investigators. Pseudo-anonymized data are entered in
a Castor database. A data dictionary with metadata describing the data is attached to the
original dataset. All participating hospitals have a Data Sharing Agreement to safely share
data of included patients with the principal investigator and the research investigators. A
data management plan will be created according to our institute's policies, with the
assistance of a data management expert, and in line with the Transparent Reporting of a
multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines. The
collected and generated data are clinical data extracted from the electronic health records
and comprise continuous, nominal, and dichotomous variables. Data will not be reused or
coupled to existing data. Informed consent of patients is necessary to predict the outcome
using the developed model. Privacy policies and laws are applicable to this project. The
project will also comply with all data protection principles as defined in the General Data
Protection Regulation. The anonymized dataset can be accessed via a Castor database. For
long-term storage, data will be saved in the Amsterdam University Medical Center repository
with the help of the research data management (RDM) department. The data will be retained for
five years after the project has ended.
Data dictionary
The following variables will be collected:
i. Patient and tumor characteristics: Age; sex; body mass index; American Society of
Anesthesiologists (ASA) classification; intoxications (smoking and/or alcohol consumption);
medical history of diabetes; steroid use (not nasal); hemoglobin; benign or malignant
disease. If there is malignant disease: TNM-stage; tumor distance from anal verge;
neoadjuvant treatment.
ii. Perioperative characteristics: Surgical procedure; surgical approach; conversion;
occurrence of intraoperative event (hypoxic events, hypercarbia, bradycardia, hypotension,
embolism, reanimation, more extensive resection than planned, serosa lesions, bladder and
ureteral injuries, intraoperative bleeding, splenectomy).
iii. Characteristics just prior to the creation of the anastomosis: Patient temperature; time
of antibiotic administration; administration of vasopressors; blood loss; O2 saturation; mean
arterial pressure; fluid administration; urine production; presence of fecal contamination;
subjective assessment of local perfusion; epidural analgesia; dosing movements; time from
incision until the creation of the anastomosis; intention to create stoma.
iv. Postoperative characteristics: Colorectal anastomotic leakage within 30 days; length of
hospital stay.
Standard operating procedures
Patients eligible for inclusion are identified at the first multidisciplinary team meeting.
The surgeon will inform eligible patients about the study and discuss it with them during the
preoperative consultation for surgery. Patients who agree to participate must give written
informed consent, which they may withdraw at any time.
Sample size calculation
In the participating hospitals, around 100 to 400 colorectal resections are performed
annually per hospital, with an approximate incidence of anastomotic leakage of 5 to 15%.
Multiple studies have shown that a minimum of 100 events and 100 non-events is an appropriate
sample size for external validation. With an expected total of 1,200 patients included
annually and a leakage percentage of around 10%, reaching 100 events is expected to take
approximately one to two years.
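The arithmetic behind this estimate can be sketched as follows. The calculation assumes a constant inclusion rate of 1,200 patients per year and evaluates the leakage rates of 5%, 10%, and 15% mentioned above. The function name is hypothetical, and the result is a rough planning figure rather than a formal sample size calculation.

```python
def years_to_target(annual_inclusions: int, leak_rate: float,
                    target_events: int = 100,
                    target_non_events: int = 100) -> float:
    """Years of inclusion needed to reach both the event and non-event targets."""
    events_per_year = annual_inclusions * leak_rate
    non_events_per_year = annual_inclusions * (1 - leak_rate)
    # Whichever target is reached last determines the duration.
    return max(target_events / events_per_year,
               target_non_events / non_events_per_year)


for rate in (0.05, 0.10, 0.15):
    duration = years_to_target(annual_inclusions=1200, leak_rate=rate)
    print(f"leakage rate {rate:.0%}: about {duration:.1f} years")
```

With these inputs the events, not the non-events, are the limiting factor: roughly 10 months at a 10% leakage rate and about 20 months at 5%, so the stated one to two years leaves a margin for a lower leakage rate or slower inclusion.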
Handling missing data
The machine learning model will make a prediction for patients for whom more than 80% of the
required data are available. Missing data are imputed using predictive mean matching with ten
iterations.
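The completeness rule can be illustrated with a short sketch. It assumes that "required data" refers to the 31 model variables and that every variable counts equally; the variable names are placeholders and the function names are hypothetical. The imputation step itself (predictive mean matching) is not shown.

```python
import math


def is_missing(value) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fraction_available(record: dict, required: list) -> float:
    """Share of required variables that have a value in this record."""
    present = sum(1 for name in required if not is_missing(record.get(name)))
    return present / len(required)


def eligible_for_prediction(record: dict, required: list,
                            threshold: float = 0.80) -> bool:
    """True if strictly more than `threshold` of the required data are available."""
    return fraction_available(record, required) > threshold


# Placeholder names for the 31 model variables (assumption for illustration).
required = [f"var_{i:02d}" for i in range(1, 32)]

record = {name: 1.0 for name in required}
for name in required[:6]:
    record[name] = None  # 25 of 31 variables remain available

print(eligible_for_prediction(record, required))
```

With 31 variables, "more than 80%" corresponds to at least 25 available variables; a record with 24 or fewer would not receive a prediction under this reading.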
Statistical analysis plan
The external validation will be performed on at least 100 events (anastomotic leakage) and
100 non-events (no anastomotic leakage). The machine learning model with the best predictive
performance in terms of AUROC will be used as the implementation model. The colorectal
anastomotic leakage rate will be compared using a multivariable logistic regression model.
All analyses will be carried out under the supervision of a clinical epidemiologist.