Despite the rarity of each of the 198 identified rare cancers, collectively they
represent 24% of all new cancer cases diagnosed each year in the EU28. Differences in
survival for rare cancers exist across European countries, suggesting inequalities in
healthcare. Rare cancers generally receive less scientific attention and financial
support than their more common counterparts. Generating clinical evidence is also
harder, because clinical trials are difficult to conduct with small numbers of patients
and because accessible data, including data from cancer registries, are scarce.
There is a wide consensus that to support clinical research on rare cancers, clinical
registries should be developed within networks specializing in rare cancers. In the EU, a
unique opportunity is provided by the European Reference Networks (ERNs). The ERNs are
virtual networks of selected institutions targeting rare conditions. Three ERNs are
dedicated to rare cancers: EuroBloodNet for rare haematological diseases, PaedCan for
paediatric cancers and EURACAN for rare adult solid cancers (https://euracan.eu/).
An EU-supported project, Starting an Adult Rare Tumour Registry (STARTER), began on
1 April 2020 to develop the EURACAN registry. The registry started with the rare head and
neck cancers: cancers of the nasal cavity and paranasal sinuses (incidence rate
0.5/100,000), nasopharynx (incidence rate 0.5/100,000), salivary glands (incidence rate
1.5/100,000) and middle ear (incidence rate 0.03/100,000), corresponding to 2,500, 2,500,
about 8,000 and about 200 new cases/year in Europe, respectively
(http://rarecarenet.istitutotumori.mi.it/analysis.php). Care for head and neck cancers is
complex, particularly for the rare ones: knowledge is limited and the diseases often
require a multidisciplinary approach. Moreover, while most head and neck cancers are
squamous cell carcinomas, salivary gland tumours include more than 20 distinct
histological subtypes. Thus, heterogeneity adds complexity to rarity.
Against this background, the EURACAN registry on rare head and neck cancers was set up
with the following objectives:
to help describe the natural history of rare head and neck cancers;
to evaluate factors that influence prognosis;
to assess treatment effectiveness;
to measure indicators of quality of care.
Furthermore, the registry aims to collect information, where available, on the storage of
biological samples at the premises of the participating healthcare providers (HCPs). This
will facilitate future studies on the biology of rare head and neck cancers.
The registry is designed to prospectively collect clinical data derived from diagnostic
tests and treatments performed by the HCP as part of patient management. Data collection
for the registry will not entail examinations, admissions to the HCP or appointments
beyond those normally provided. In other words, it will be an observational, real-world
registry.
The registry will exploit data available from:
national or regional registries/databases (DBs) dedicated to rare head and neck
cancers (i.e. nasopharynx; nasal cavity and paranasal sinuses; salivary gland; and
middle ear cancers);
HCP registries/DBs;
ad hoc data collection by HCPs.
The registry is federated: data remain stored with each data provider. At the local
level, data are pseudonymised.
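As an illustration of what local pseudonymisation can look like, the sketch below replaces a patient identifier with a keyed hash computed at the data provider. The protocol does not state which technique is used, so the function name, the key handling and the choice of HMAC-SHA256 are all assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a local patient identifier.

    Assumption: a keyed hash (HMAC-SHA256) with a secret that never leaves
    the data provider. The same identifier always maps to the same pseudonym,
    but the pseudonym cannot be recomputed without the key.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and identifier, for illustration only
site_key = b"replace-with-a-locally-held-secret"
pseudo_id = pseudonymise("HOSPITAL-000123", site_key)
print(pseudo_id)
```

Because the key stays with the data provider, only the provider can link a pseudonym back to a patient, which is what distinguishes pseudonymised from anonymised data.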
The Personal Health Train (PHT) enables data from multiple organizations to be analysed
without identifiable data leaving the organization. By keeping data at its source, no
copies of datasets are generated and/or shared with third parties. Vantage6 is the
open-source implementation of the PHT (https://www.vantage6.ai). Vantage6 relies on
"federated learning", which splits computations into (a) parts run at the station (local
HCP or registry) and (b) a central part. The stations share only the results of their
sub-computations with the central server.
If federated learning proves unfeasible, the data will, after quality validation, be
anonymised and sent to the coordination centre (i.e. National Cancer Institute of Milan
[Fondazione IRCCS Istituto Nazionale dei Tumori, Milan-INT]).
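The split between station and central computation can be sketched for a simple statistic such as a pooled mean and standard deviation. This is plain Python, not the Vantage6 API; the function names, the station labels and the ages are hypothetical and serve only to show which quantities leave a station.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PartialResult:
    """Aggregate shared by one station; it contains no patient-level rows."""
    n: int
    total: float
    total_sq: float


def station_part(ages):
    """(a) Runs at the station (local HCP or registry) on local data only."""
    return PartialResult(
        n=len(ages),
        total=sum(ages),
        total_sq=sum(a * a for a in ages),
    )


def central_part(partials):
    """(b) Runs centrally: combines the sub-computations into pooled estimates."""
    n = sum(p.n for p in partials)
    total = sum(p.total for p in partials)
    total_sq = sum(p.total_sq for p in partials)
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)  # sample variance
    return mean, sqrt(variance)


# Hypothetical ages at diagnosis held by three stations
station_data = {
    "station_A": [54, 61, 47, 70],
    "station_B": [66, 58, 73],
    "station_C": [49, 62, 55, 68, 71],
}

partials = [station_part(ages) for ages in station_data.values()]
pooled_mean, pooled_sd = central_part(partials)
print(f"pooled mean = {pooled_mean:.2f}, pooled SD = {pooled_sd:.2f}")
```

Only the three sums per station reach the central part, yet the pooled result equals what would be obtained from the combined patient-level data.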
Data analyses plan
The data analyses will include descriptive statistics showing the frequency and patterns
of patient and cancer variables, and analytical analyses investigating the association
between patient, disease and/or treatment characteristics and health outcomes.
Descriptive statistics will be used to reconstruct the natural history of rare head and
neck cancers (e.g. primary tumour growth rate and pattern, its metastatic dissemination,
growth of metastases, association with other diseases, etc.) and to report on quality of
care.
Multivariable Cox proportional hazards models, with hazard ratios (HRs) for all-cause or
cause-specific mortality, will be used to determine independent predictors of overall
survival, recurrence and second primary cancer. Variables to include in the multivariable
regression model will be selected based on the results of univariable analysis. Possible
confounding by other covariates will be evaluated using stratified analysis or
sensitivity analysis.
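For reference, the Cox proportional hazards model has the standard form below. The notation is generic textbook notation rather than the protocol's own: $h_0(t)$ is the unspecified baseline hazard, $x_1,\dots,x_p$ are the covariates and $\beta_1,\dots,\beta_p$ their coefficients.

```latex
h(t \mid x_1,\dots,x_p) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Here $\mathrm{HR}_j$ is the hazard ratio associated with a one-unit increase in $x_j$, holding the other covariates fixed.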
To assess treatment effectiveness, multivariable models, propensity score adjustment and
progression-free survival analyses will be used.
A high proportion of missing data threatens the validity of inferences and prognostic
models. Thus, a maximum of 10% of missing data will be allowed, and missing data will be
imputed using strategies such as unconditional or conditional mean imputation or
expectation-maximisation.
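The simplest of these strategies, unconditional mean imputation combined with the 10% ceiling, can be sketched as follows. The text does not say whether the ceiling applies per variable or to the whole dataset; the sketch assumes per variable, and the function names and example values are hypothetical.

```python
def missing_fraction(values):
    """Share of entries that are missing (encoded here as None)."""
    if not values:
        raise ValueError("no values supplied")
    return sum(v is None for v in values) / len(values)


def impute_unconditional_mean(values, max_missing=0.10):
    """Replace missing entries with the mean of the observed entries.

    Assumption: the 10% ceiling is checked per variable; a variable that
    exceeds it is rejected rather than imputed.
    """
    fraction = missing_fraction(values)
    if fraction > max_missing:
        raise ValueError(f"{fraction:.0%} missing exceeds the {max_missing:.0%} ceiling")
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical variable: 20 ages at diagnosis with one missing value (5%)
ages = [60, 70, None, 50] + [60] * 16
imputed = impute_unconditional_mean(ages)
print(imputed[2])
```

Conditional mean imputation and expectation-maximisation use the other variables to predict the missing value and are not shown here.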
Sample size
This is an observational clinical registry, which implies long-term data collection
lasting until all the registry objectives are met. Because the registry is observational,
there will be no sampling of any type: all patients who meet the eligibility criteria
will be included. Only HCPs that treat at least 100 cases a year across all rare head and
neck cancers are EURACAN members. As of April 2022, the registry is activating 10 HCPs
and, with time, envisages at least doubling the number of data providers. Considering
that we expect 6 centres in Italy, 2 centres in Germany, 1 centre in the Czech Republic,
1 centre in Spain, 2 centres in France (400 cases per year) and all rare head and neck
cancer cases in the Netherlands (300 patients yearly, based on incidence estimations), we
expect about 1700 patients with a rare head and neck cancer yearly.
Due to the observational nature of the registry, the sample size justification is based
on the precision of the estimates, expressed as the width of the two-sided 95% confidence
interval (CI): for a single proportion using the simple asymptotic method (categorical
variables) and for a mean using the normal distribution (continuous variables). Thus, for
example, for a categorical endpoint (e.g. a proportion), a sample size of 80 patients
(e.g. middle ear cases in 4 years) will achieve a maximum width of the 95% CI on
estimated proportions of 23.4% (i.e. estimated proportion +/- 11.7%). For continuous
endpoints, a sample size of 80 patients will achieve a maximum width of the 95% CI on
estimated means of 0.46 SD (i.e. estimated mean +/- 0.23 SD, where SD = standard
deviation).
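A precision calculation of this kind can be sketched with the textbook simple asymptotic (Wald) interval for a proportion and the normal-based interval for a mean. The function names are hypothetical, and the sketch uses no continuity correction or other adjustment. With n = 80 it returns roughly +/- 11.0 percentage points and +/- 0.22 SD, the same order as the figures quoted above; the small differences may stem from corrections or software settings not stated in the text.

```python
from math import sqrt
from statistics import NormalDist


def wald_half_width_proportion(n, p=0.5, level=0.95):
    """Half-width of the simple asymptotic (Wald) CI for a proportion.

    The width is largest at p = 0.5, which is why that value is the default.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sqrt(p * (1 - p) / n)


def normal_half_width_mean_in_sd(n, level=0.95):
    """Half-width of the normal-based CI for a mean, in units of the SD."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z / sqrt(n)


n = 80  # e.g. middle ear cases accrued over 4 years
hw_prop = wald_half_width_proportion(n)
hw_mean = normal_half_width_mean_in_sd(n)
print(f"proportion: +/- {100 * hw_prop:.1f} percentage points")
print(f"mean: +/- {hw_mean:.2f} SD")
```

Calling the functions with other values of n shows how quickly precision improves as cases accrue, since the half-width shrinks with the square root of the sample size.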
For the analytical questions, which involve several different outcomes and variables, it
is not possible to give a single summary sample size calculation. For this reason, ad hoc
analysis plans for each research question are envisioned.
Data quality checks
Data quality checks aim to assess whether data values are present, valid and believable,
i.e. they address completeness, validity and plausibility. Validity and plausibility
checks are embedded in the electronic case report form (CRF) in the form of alerts and
errors during data input. Additional checks are implemented in R. The R script containing
the checks is downloaded locally from an online instruction repository. The script
extracts all completed cases from Research Electronic Data Capture (REDCap), the IT
solution used for the registry CRF, stores a copy of the DB on the dedicated local server
and runs the checks locally. The results of these checks are summarized in two reports: a
summary report and an individual report. Through the Vantage6 software, the two reports
(not the data) will reach the registry coordination team (INT) to be monitored and
discussed with each data provider.
After the data providers have made their corrections, all checks will be re-run and the
quality reports reviewed by INT. This interaction with data providers will be repeated
until sufficient data quality is achieved. The DB with sufficient data quality will be
saved and used for the federated learning analysis. These checks will be performed
annually and will ensure high data quality within a federated DB.
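The three kinds of check and the two reports can be illustrated with a short sketch. The registry's checks are written in R; Python is used here only for illustration, and the field names, allowed values and age range are hypothetical rather than taken from the registry codebook.

```python
from collections import Counter

# Hypothetical fields and rules, for illustration only
REQUIRED_FIELDS = ("pseudo_id", "age_at_diagnosis", "site")
VALID_SITES = {
    "nasal cavity and paranasal sinuses",
    "nasopharynx",
    "salivary gland",
    "middle ear",
}
PLAUSIBLE_AGE = (0, 110)


def check_record(record):
    """Return a list of (check_type, message) issues for one case."""
    issues = []
    for field in REQUIRED_FIELDS:  # completeness: is the value present?
        if record.get(field) in (None, ""):
            issues.append(("completeness", f"{field} is missing"))
    site = record.get("site")  # validity: is the value an allowed code?
    if site not in (None, "") and site not in VALID_SITES:
        issues.append(("validity", f"unknown site: {site!r}"))
    age = record.get("age_at_diagnosis")  # plausibility: is it believable?
    if isinstance(age, (int, float)) and not PLAUSIBLE_AGE[0] <= age <= PLAUSIBLE_AGE[1]:
        issues.append(("plausibility", f"age_at_diagnosis out of range: {age}"))
    return issues


def build_reports(records):
    """Build the summary report and the individual (per-case) report."""
    individual = {}
    for i, record in enumerate(records):
        key = record.get("pseudo_id") or f"row {i}"
        individual[key] = check_record(record)
    by_type = Counter(kind for issues in individual.values() for kind, _ in issues)
    summary = {
        "n_records": len(records),
        "n_records_with_issues": sum(1 for issues in individual.values() if issues),
        "issues_by_type": dict(by_type),
    }
    return summary, individual


records = [
    {"pseudo_id": "a1", "age_at_diagnosis": 63, "site": "nasopharynx"},
    {"pseudo_id": "b2", "age_at_diagnosis": 140, "site": "salivary gland"},
    {"pseudo_id": "c3", "age_at_diagnosis": None, "site": "larynx"},
]
summary, individual = build_reports(records)
print(summary)
```

Both reports describe problems with the data rather than the data themselves, which is what allows them to be shared with the coordination team while the DB stays local.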
Data to be collected
Following the EURACAN registry objectives, data will be prospectively collected on
patient characteristics, exposures and outcomes. Patient characteristics are descriptive
patient data, such as demographics, lifestyle, medical history and health status. The
registry will not collect genetic data. Exposure data focus on the disease, devices,
procedures, treatments or services of interest. Outcome data describe patient outcomes
(e.g. survival, progression, progression-free survival, death). In addition, data on
potential confounders (e.g. comorbidity, functional status) will also be collected
(https://euracan.eu/research/starter/rare-head-and-neck-cancer-registry/#codebook).
Pitfalls
There is a risk of limited representativeness due to the hospital-based nature of the
registry and to the fact that the hospitals contributing to the registry are expert
centres for these rare cancers. Representativeness will be tested by comparing the
registry data with population-based data in terms of relevant variables (e.g. age, stage,
prognosis). Appropriate statistical methods (e.g. marginal structural models) will be
used to address time-varying treatments/confounders and confounding by indication
(selective prescribing), where present. Directed acyclic graphs can also help to identify
sources of bias and will be used to define the paths between covariates.
Completeness of clinical follow-up could be an issue, but active searches for patients'
vital status will be guaranteed.