Study purpose and design
People with prior stroke are at high risk of incident adverse cardiovascular outcomes
including heart failure, atrial fibrillation (AF), recurrent stroke and vascular
cognitive impairment and dementia. However, there is a need to clarify the underlying
risk factors for these outcomes specific to post-stroke populations. Extensive research
has been conducted to identify individuals at high risk of cardiovascular disease through
the development of risk prediction models. This has led to the incorporation of risk
models for cardiovascular disease into guidelines for clinical practice with an aim to
improve patient-centred care and decision-making. Risk factors frequently incorporated in
such models include age, male sex, hypertension, cholesterol, smoking, and diabetes
mellitus. Although some risk prediction models have examined cardiovascular outcomes such
as AF in people post-stroke, further research is needed to refine these models and make
recommendations for implementation in clinical practice. Precise risk prediction models
for cardiovascular disease and cardiovascular-related complications in people post-stroke
are needed to target screening for conditions (such as AF) and to develop intervention
strategies specific to this population.
Quality assurance plan
Pseudonymised data, identified only by the unique, non-identifiable participant ID, will
be collected in an electronic case report form using Research Electronic Data Capture
(REDCap; https://www.project-redcap.org). The data entered into REDCap for the first 20
patients recruited at each site will be remotely checked for completeness. At selected
sites, the data entered into REDCap will also be checked against electronic medical
records and paper questionnaires.
Data checks
The data fields in REDCap have been set with predefined rules for range or consistency,
and an error message will be displayed whenever one of these rules is violated.
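As a rough illustration of what range and consistency rules of this kind do, the sketch
below applies two range checks and one date-ordering check to a single record. It is not
the study's REDCap configuration: the field names, the limits and the check_record helper
are all hypothetical and chosen only for the example.

```python
from datetime import date

# Hypothetical range rules (field name -> (minimum, maximum)); illustrative values only,
# not the limits configured in the study's REDCap project.
RANGE_RULES = {
    "age_years": (0, 120),
    "systolic_bp_mmhg": (50, 300),
}


def check_record(record: dict) -> list:
    """Return a list of error messages for one data-entry record."""
    errors = []

    # Range rules: a value must lie between its minimum and maximum (inclusive).
    for field, (low, high) in RANGE_RULES.items():
        value = record.get(field)
        if value is None:
            continue  # missing values are a completeness issue, not a range issue
        if not (low <= value <= high):
            errors.append(f"{field}={value} is outside the allowed range {low}-{high}")

    # Consistency rule (hypothetical): follow-up cannot precede the index stroke.
    stroke_date = record.get("stroke_date")
    follow_up_date = record.get("follow_up_date")
    if stroke_date and follow_up_date and follow_up_date < stroke_date:
        errors.append("follow_up_date is earlier than stroke_date")

    return errors


# Example: one out-of-range value and one inconsistent pair of dates.
example = {
    "age_years": 150,
    "systolic_bp_mmhg": 140,
    "stroke_date": date(2022, 5, 1),
    "follow_up_date": date(2022, 4, 1),
}
for message in check_record(example):
    print(message)
```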
Sample size assessment
As one of the main aims of the study is to examine post-stroke risk prediction models for
AF, incident AF will be used to determine the sample size. Post-stroke prevalence of AF
has been estimated at approximately 24%. With a conservative estimate of 15 cases
required per variable in the model, 195 cases would be appropriate for a 13-variable
model, 13 being the maximum number of variables included in previous AF prediction
models. At a prevalence of 24%, 815 participants would be expected to yield approximately
the 195 patients who develop AF required for the model. The study aims to recruit these
participants plus 20% extra to account for potential loss to follow-up, resulting in a
minimum of 978 participants.
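The arithmetic behind these figures can be reproduced with a short sketch. The function
names are illustrative and not part of the protocol. Percentages are passed as whole
numbers so that the rounding is exact. The strict minimum from 195 cases at 24% is 813
participants, which the protocol appears to round up to 815 before adding the 20%
allowance.

```python
import math


def required_events(events_per_variable: int, n_variables: int) -> int:
    """Cases needed under an events-per-variable rule."""
    return events_per_variable * n_variables


def participants_for_events(events: int, prevalence_percent: int) -> int:
    """Smallest cohort expected to yield `events` cases at the given prevalence."""
    return math.ceil(events * 100 / prevalence_percent)


def inflate_for_attrition(n_participants: int, extra_percent: int) -> int:
    """Recruitment target after adding an allowance for loss to follow-up."""
    return math.ceil(n_participants * (100 + extra_percent) / 100)


events = required_events(events_per_variable=15, n_variables=13)
minimum_cohort = participants_for_events(events, prevalence_percent=24)
target = inflate_for_attrition(815, extra_percent=20)  # 815 is the protocol's figure

print(events)          # 195
print(minimum_cohort)  # 813
print(target)          # 978
```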
Statistical analysis plan
All data collected will be quantitative. Data will be analysed by members of the research
team at the Liverpool Centre for Cardiovascular Science. Cox proportional hazards models
adjusted for potential confounding factors will be used to examine associations between
risk factors and cardiovascular outcomes and mortality. Risk models identified in
previous studies for AF and cardiovascular-related outcomes including cardiovascular
disease, physical function, cognitive impairment and dementia, quality-of-life, and
all-cause and cardiovascular mortality will be examined in the L-HARP stroke cohort.
Receiver operating characteristic curves will be constructed, and Harrell's C-indices
(analogous to the area under the curve for time-to-event data) will be estimated as a
measure of model performance and compared using the DeLong test.
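For readers unfamiliar with the concordance index, the following is a minimal sketch of
how Harrell's C can be computed from follow-up times, event indicators and predicted risk
scores. It is a simplified illustration, not the study's analysis code: the harrell_c
name is an assumption, pairs with tied follow-up times are ignored, and the DeLong
comparison is not shown.

```python
def harrell_c(times, events, risks):
    """Harrell's C for right-censored data (simplified: tied times are skipped).

    times  -- follow-up time for each participant
    events -- 1 if the outcome was observed, 0 if censored
    risks  -- predicted risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time ended in an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four participants, the third is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.1]))  # 1.0 (perfect ranking)
print(harrell_c(times, events, [0.9, 0.1, 0.5, 0.7]))  # 0.6
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking of
participants by time to event.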
In addition to traditional epidemiological approaches to risk prediction modelling,
machine learning methodologies will also be examined. Machine learning has been shown to
produce comparable results to traditional cardiovascular disease risk prediction scores,
but with advantages such as examining all available data in an unbiased manner, which
could lead to the discovery of new relationships in the data. As the sample is not very
large, traditional machine learning techniques including k-nearest neighbours, random
forests, and decision trees will be used rather than deep learning techniques, which are
usually applied to very large datasets. A subset of the data will be used to train the
model and the remainder will be used to evaluate it. The model derived from machine
learning will be compared with risk prediction models described in previous studies. The
accuracy, specificity, sensitivity, positive predictive value and negative predictive
value of the models will be compared.
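The five comparison metrics all derive from the same 2x2 confusion matrix. The sketch
below shows the calculation for binary labels (1 = outcome present, 0 = absent). The
classification_metrics name and the toy labels are illustrative assumptions, not study
data or the study's code.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


# Toy held-out set of ten participants (made-up labels for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Applying the same function to the held-out predictions of each model puts the machine
learning models and the previously published scores on a common footing.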