A previous study found that, on average, gene expression in the active stage of lupus was more
than 1.5 times higher than in the inactive stage.
The sample size was calculated with PASS (Power Analysis and Sample Size) 15.0 software for a
two-sample t-test between groups, using a two-sided significance level of α = 0.05 and a
statistical power of 80% (β = 0.2). With a 3:1 allocation ratio between the groups, a total of
60 samples were required; allowing for a 20% dropout rate, the total number of cases was set at
76. For the second (validation) stage, the calculation was repeated with a 1:1 allocation ratio
and a statistical power of 90% (β = 0.1), all other conditions unchanged, and an additional 76
lupus patients were recruited for model validation. Patients were followed up at months 0, 1,
3, 6 and 12, with a window of ±7 days for each visit. Each visit covered medical history,
complete blood count, urinalysis, vital organ function tests and assessment with the BILAG
(British Isles Lupus Assessment Group) index. If a patient died, the date and cause of death
were recorded.
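The sample-size arithmetic above can be sketched as follows. This is a minimal illustration, not a reproduction of the PASS 15.0 calculation: it uses the normal approximation to the two-sample t-test (PASS works with the t distribution, so its numbers are typically slightly larger), and the standardized effect size `d` is a hypothetical input, because only the 1.5-fold difference is reported, not the standard deviation. The dropout step assumes the inflated total was rounded up to a multiple of 4 so that it splits 3:1 into whole patients; the function names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, ratio=1.0):
    """Normal-approximation sample size for a two-sided, two-sample comparison of means.

    d     -- standardized effect size (mean difference / common SD); a hypothetical input
    ratio -- allocation ratio k = n2 / n1 (k = 3 for a 3:1 design)
    Returns (n1, n2), rounded up to whole patients.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)          # z_{1-alpha/2} + z_{1-beta}
    n1 = math.ceil((1 + 1 / ratio) * z_sum ** 2 / d ** 2)
    return n1, math.ceil(ratio * n1)


def inflate_for_dropout(n_total, dropout, parts=1):
    """Divide by the expected retention rate, then round up to a multiple of `parts`
    (parts = 4 keeps a 3:1 split in whole patients). The 1e-9 tolerance stops
    floating-point noise from pushing an exact result up by one."""
    inflated = math.ceil(n_total / (1 - dropout) - 1e-9)
    return math.ceil(inflated / parts) * parts
```

For example, `inflate_for_dropout(60, 0.20, parts=4)` returns 76: 60 / 0.8 = 75, rounded up to the next multiple of 4.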
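The follow-up schedule (months 0, 1, 3, 6 and 12, each with a ±7-day window) can be expressed as target dates and acceptable ranges. A minimal sketch, assuming that "month" means a calendar month counted from the baseline visit, with the day clamped to the end of shorter months (for example, 31 January plus one month gives 28 February in a non-leap year); how months are counted is not specified, and all names here are hypothetical.

```python
import calendar
from datetime import date, timedelta

VISIT_MONTHS = (0, 1, 3, 6, 12)   # scheduled follow-up months
WINDOW = timedelta(days=7)        # allowed deviation for each visit


def add_months(d, months):
    """Same day `months` calendar months later, clamped to the end of a shorter month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def visit_windows(baseline):
    """Map each scheduled month to its (earliest, latest) acceptable visit date."""
    windows = {}
    for m in VISIT_MONTHS:
        target = add_months(baseline, m)
        windows[m] = (target - WINDOW, target + WINDOW)
    return windows


def in_window(baseline, month, actual):
    """True if an actual visit date falls inside the window for the given month."""
    earliest, latest = visit_windows(baseline)[month]
    return earliest <= actual <= latest
```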
The evaluation indicators cover the following four aspects. (1) Epidemiological questionnaire:
face-to-face interviews conducted by uniformly trained health professionals, covering the
participants' socio-demographic information (sex, date of birth, marital status, education
level, occupation, income level, height, weight, etc.), environmental exposures (smoking,
alcohol and other drug use, etc.), medical history (hypertension, diabetes, and liver, kidney
and other major organ diseases) and prior medical records. (2) Clinical features: clinical
manifestations (malar erythema, skin erythema, rash, oral ulcers, arthritis, and
gastrointestinal, neurological and hematological involvement) were observed and examined face
to face by rheumatology clinicians. (3) Laboratory and imaging tests: complete blood count,
urinalysis (urine protein, urine protein-to-creatinine ratio, etc.), biochemical indexes (total
bilirubin, creatinine, C-reactive protein, etc.), immunological tests (complement,
immunoglobulins, etc.), echocardiography, chest CT and other examinations. (4) Gene expression:
based on our previous multi-omics research, we selected gene sets that differ from those of
healthy individuals and change significantly before and after treatment (see the research basis
for details). RNA was extracted from patient samples and reverse transcribed into cDNA
(complementary DNA), and the expression levels of these genes at each time point were measured
with PCR (polymerase chain reaction) array technology.
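PCR array results are often summarized as relative expression with the comparative Ct (2^-ΔΔCt) method, and a fold change computed this way is the kind of quantity a threshold such as the 1.5-fold difference reported earlier would be compared against. The normalization method is not stated here, so the sketch below assumes the 2^-ΔΔCt method with a single reference (housekeeping) gene and roughly 100% amplification efficiency; the function name and Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_case, ref_case, target_control, ref_control):
    """Relative expression of a target gene by the 2^-ddCt method.

    Each argument is a list of Ct values (replicates). 'case' might be the
    active disease stage and 'control' the inactive stage. Assumes ~100%
    amplification efficiency for both the target and the reference gene.
    """
    d_ct_case = mean(target_case) - mean(ref_case)            # dCt in case samples
    d_ct_control = mean(target_control) - mean(ref_control)   # dCt in control samples
    dd_ct = d_ct_case - d_ct_control
    return 2 ** (-dd_ct)
```

A result of 4.0 would mean the target gene is expressed about four times higher in the case samples than in the control samples.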
Data are managed uniformly with a standardized electronic case report form. We previously
established the Jiangsu Provincial Lupus Research Database Entry System to store the data of
this study. Data entry and modification are performed by the researchers, and all data must be
traceable and consistent with the source documents. All observations and test results in the
trial must be recorded promptly, accurately, completely, clearly, in a standardized format and
truthfully. The data administrator (a statistician on the team) is responsible for reviewing
and managing the entered data. When questions arise about the data, the data administrator
sends queries to the researcher, who responds promptly; the data administrator may issue
further queries when necessary. All participants' information will be kept strictly
confidential, as will the research data.
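The query cycle between the data administrator and the researcher can be modeled as a small state machine with an audit trail, which also keeps every change traceable. This is a hypothetical sketch of the workflow described above, not the interface of the Jiangsu Provincial Lupus Research Database Entry System; the class, state and method names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataQuery:
    """A query on one entered field: open -> answered -> closed, or answered -> open again."""
    field_name: str
    question: str
    status: str = "open"
    history: list = field(default_factory=list)   # audit trail: (time, actor, action, text)

    def __post_init__(self):
        self._log("data administrator", "query", self.question)

    def _log(self, actor, action, text):
        self.history.append((datetime.now(), actor, action, text))

    def _require(self, expected):
        # Reject out-of-order actions, e.g. closing a query nobody has answered.
        if self.status != expected:
            raise ValueError(f"query is {self.status!r}, expected {expected!r}")

    def answer(self, researcher, response):
        self._require("open")
        self.status = "answered"
        self._log(researcher, "answer", response)

    def requery(self, admin, question):
        self._require("answered")
        self.status = "open"
        self._log(admin, "requery", question)

    def close(self, admin):
        self._require("answered")
        self.status = "closed"
        self._log(admin, "close", "")
```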
SAS (Statistical Analysis System) 9.4 and R software were used to process and analyze the data.
Data cleaning mainly comprised the following steps: a) covariates with more than 30% missing
values were excluded, and the remaining missing values were imputed with bagged trees; b) for
each pair of highly correlated variables (correlation coefficient > 0.9), the variable with more
missing values was excluded; c) variables with zero or near-zero variance were excluded, where,
as a rough rule, a variable was flagged when the number of distinct values was small relative
to the number of observations (below 10% in this study) and the ratio of the frequency of the
most common value to that of the second most common value exceeded 20; d) non-normally
distributed continuous variables were Box-Cox transformed.
Different machine learning algorithms were used for feature selection and model construction,
and the predictive ability of the resulting models was compared to identify the optimal model.
In addition to classical logistic regression, we used several methods commonly applied to
high-dimensional data: linear discriminant analysis, which models a linear relationship between
covariates and outcome, partial least squares regression, multivariate adaptive regression
splines (MARS) and elastic net (EN). Because many clinical features are nonlinearly associated
with clinical outcomes, we also built predictive models with k-nearest neighbors, adaptive
boosting (AdaBoost), support vector machines, random forests and neural networks. Collinearity
was assessed with the variance inflation factor.
Predictive ability was evaluated from multiple perspectives. The C-statistic was calculated to
evaluate the discrimination of each model, the integrated discrimination improvement (IDI) was
used to quantify the improvement after new variables were added to a model, and decision curves
were drawn to identify the model with the largest net benefit.
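Cleaning steps a) to c) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the SAS/R code used in the study: each column is a list with `None` for missing values, correlation is Pearson's r over complete pairs, the near-zero-variance rule uses the cut-offs from step c) (frequency ratio above 20, distinct values below 10% of observations), and the bagged-tree imputation in step a) is not reproduced. The filters run in the order missingness, near-zero variance, correlation, so that constant columns never enter the correlation calculation; all function names are hypothetical.

```python
import math
from collections import Counter


def missing_fraction(col):
    return sum(v is None for v in col) / len(col)


def near_zero_variance(col, freq_cut=20.0, unique_cut=0.10):
    """Step c): True for zero-variance or near-zero-variance columns."""
    values = [v for v in col if v is not None]
    counts = Counter(values).most_common()
    if len(counts) <= 1:                        # a single distinct value: zero variance
        return True
    freq_ratio = counts[0][1] / counts[1][1]    # most common vs second most common value
    pct_unique = len(counts) / len(values)      # distinct values relative to observations
    return freq_ratio > freq_cut and pct_unique < unique_cut


def pearson(x, y):
    """Pearson's r over the pairs where both values are present (0.0 if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def clean_columns(data, max_missing=0.30, max_corr=0.90):
    """Names of the columns kept after steps a) (exclusion only), c) and b)."""
    kept = [k for k in data if missing_fraction(data[k]) <= max_missing]
    kept = [k for k in kept if not near_zero_variance(data[k])]
    dropped = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(data[a], data[b])) > max_corr:
                # step b): drop the member of the pair with more missing values
                dropped.add(b if missing_fraction(data[b]) >= missing_fraction(data[a]) else a)
    return [k for k in kept if k not in dropped]
```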
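Step d) can be illustrated with a grid search for the Box-Cox parameter λ that maximizes the normal profile log-likelihood, ℓ(λ) = -(n/2) log σ̂²(λ) + (λ - 1) Σ log yᵢ, where σ̂²(λ) is the maximum-likelihood variance of the transformed values. A minimal sketch, assuming strictly positive data and a grid from -2 to 2 in steps of 0.1 (illustrative choices, not settings from this protocol); in practice a library routine such as R's `MASS::boxcox` or SciPy's `scipy.stats.boxcox` would normally be used.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def box_cox_loglik(y, lam):
    """Profile log-likelihood of lambda (up to a constant), assuming the
    transformed values are normal with maximum-likelihood variance."""
    z = box_cox(y, lam)
    n = len(z)
    m = sum(z) / n
    var = sum((v - m) ** 2 for v in z) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, lo=-2.0, hi=2.0, step=0.1):
    """Grid search for the lambda that maximizes the log-likelihood."""
    if min(y) <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    steps = round((hi - lo) / step)
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]  # rounding keeps 0.0 exact
    return max(grid, key=lambda lam: box_cox_loglik(y, lam))
```

For data that are log-normal in shape, the selected λ should be at or near 0, which corresponds to a log transform.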
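For collinearity, the variance inflation factor of predictor j is VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all other predictors; equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below uses that identity in plain Python, assuming complete numeric columns; the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two complete numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting of a small square matrix."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(columns):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    corr = [[1.0 if i == j else pearson(ci, cj) for j, cj in enumerate(columns)]
            for i, ci in enumerate(columns)]
    inv = inverse(corr)
    return [inv[j][j] for j in range(len(columns))]
```

With two predictors correlated at r = 0.8, both VIFs equal 1 / (1 - 0.64) ≈ 2.78; uncorrelated predictors have a VIF of 1.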
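For a binary outcome, the C-statistic is the proportion of (event, non-event) pairs in which the event has the higher predicted risk (ties count one half), which equals the area under the ROC curve. The integrated discrimination improvement (IDI), the index used to judge improvement after new variables are added, is the rise in mean predicted risk among events minus the rise among non-events when moving from the old model to the new one. A minimal sketch of both, assuming outcomes coded 0/1 and predicted probabilities from any of the fitted models; the function names are illustrative.

```python
def c_statistic(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            score += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return score / (len(events) * len(non_events))


def idi(y, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old."""
    def mean_risk(p, label):
        vals = [pi for yi, pi in zip(y, p) if yi == label]
        return sum(vals) / len(vals)
    gain_events = mean_risk(p_new, 1) - mean_risk(p_old, 1)
    gain_non_events = mean_risk(p_new, 0) - mean_risk(p_old, 0)
    return gain_events - gain_non_events
```

The pairwise loop is O(n²), which is fine at the sample sizes planned here; larger data sets would use a rank-based formula instead.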
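A decision curve plots net benefit against the threshold probability p_t at which a patient would be treated or flagged, with net benefit = TP/n - (FP/n) × p_t / (1 - p_t). A model's curve is usually compared with the "treat all" strategy and with "treat none", whose net benefit is 0, and the model with the highest curve over the clinically relevant thresholds is preferred. A minimal sketch, assuming outcomes coded 0/1, predicted probabilities from any fitted model, and that a risk equal to the threshold counts as positive; the threshold grid is left to the analyst and the function names are hypothetical.

```python
def net_benefit(y, p, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold (< 1)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all(y, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y, p, thresholds):
    """(threshold, model net benefit, treat-all net benefit) for each threshold."""
    return [(t, net_benefit(y, p, t), treat_all(y, t)) for t in thresholds]
```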
Before the trial began, the study staff explained the informed consent form to each participant
in plain language and obtained written informed consent from every participant who agreed to
take part voluntarily. Participants may refuse to participate or withdraw from the trial at any
time, and doing so will not affect their rights and interests in any way.