Abstract |
According to the World Health Organization, lung cancer is estimated to have the highest mortality
rate of any cancer worldwide. Lung cancer can be divided into two main categories: non-small cell lung
carcinoma (NSCLC) and small cell lung carcinoma (SCLC), with the former being the more prevalent type,
accounting for approximately 85% of cases. The majority of lung cancer cases are diagnosed only after
symptoms related to primary or metastatic disease appear. The progression of the disease is
typically described using five stages, from 0 to IV. The accurate staging of lung cancer is essential to
establishing a prognosis and selecting the optimal treatment. However, staging information is not
necessarily predictive of the disease progression or the response to treatment. Several studies have
investigated the relationship between image features and lung cancer. Radiomics refers to the
extraction of a large number of features from medical images with the intent of creating mineable
databases from radiological images. Image features can be used to reveal diagnostic, predictive, and
prognostic associations in cancer patients via correlations with outcome measures such as survival or
response to treatment. The rise of deep learning methods has also paved the way for the
extraction of high-dimensional deep features that could capture richer information about the cancer.
Furthermore, advances in transcriptomics have provided genome-wide information on gene structure
and gene function in order to reveal the mechanisms behind the biological processes of cancer.
In many cancer studies, the main outcome under assessment is the time to an event of interest. The
event might be the death of the patient, or the recurrence of the disease after successful treatment.
The modelling of time to event data is called survival analysis and it has been used in many areas,
including the biomedical, social, and engineering sciences. Outcome modelling can be used to
identify the prognostic signature of patients and to stratify them, according to their survival
time, into groups with different risks of experiencing the event. Several studies have used
single-source data, such as histologic, imaging, or molecular data, to investigate the survival of
cancer patients.
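As an illustration of how time-to-event predictions are commonly scored, the following is a minimal sketch of Harrell's concordance index (C-index), the metric reported later in this abstract. It is not taken from the thesis code: the function name, the handling of tied times, and the toy data are assumptions made only for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored data.

    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    risk_scores: predicted risk; a higher score means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped in this sketch
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # not comparable: the earlier patient was censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk was assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the thesis dataset).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.7, 0.2]
print(concordance_index(toy_times, toy_events, toy_risk))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk. In the toy data, four of the five comparable pairs are ordered correctly, which gives 0.8.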
This master's thesis aims to investigate the synergistic properties of multi-view data sources, such as
radiomics, transcriptomics, and deep features, in developing machine learning models for survival
analysis. The dataset comprised 211 computed tomography (CT) examinations, 130 RNA-seq
vectors (𝑃𝐺) and clinical data with histology, genomic, semantic, survival and disease recurrence
information. The intersection of the transcriptomic and imaging data was a subset of 115 patients, and
the survival cohort included 40 subjects. Two commonly used machine learning methods, random forest
and support vector machine, were examined for the classification of patients into low- and high-risk
groups. Two feature-fusion strategies were evaluated: combining all features, and combining only
radiomics and deep features. The proposed deep radiotranscriptomic analysis achieved a C-index of
0.77 ± 0.10 with the support vector machine (range 0.65 to 0.83) and of 0.74 ± 0.11 with the random
forest classifier (range 0.63 to 0.81). Deep radiotranscriptomic analysis outperformed the analysis
based only on radiomics and deep features, for which random forest reached a C-index of 0.68 ± 0.03
and support vector machine a C-index of 0.73 ± 0.07. The deep features that resulted in the best
predictions were mostly extracted from MobileNet, ResNet, DenseNet, and NASNet models.
Combining imaging information, in the form of radiomics and deep features, with histologic information,
in the form of transcriptomics, improved classification metrics such as the C-index and ranked the
patients more accurately according to their risk of experiencing the event.
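The multi-view setup described above, in which only patients present in every data source are retained and their features are combined, can be sketched as follows. The abstract does not state how the features were combined, so simple concatenation (early fusion) is assumed here. The function name, patient identifiers, and feature values are hypothetical and serve only as an illustration.

```python
def fuse_views(*views):
    """Early fusion of several feature views by concatenation.

    Each view is a dict mapping a patient ID to a list of feature values.
    Only patients present in every view are kept (the intersection), and
    their feature vectors are concatenated in the order the views are given.
    Returns the sorted shared patient IDs and the fused feature matrix.
    """
    if not views:
        raise ValueError("at least one view is required")
    shared = set(views[0])
    for view in views[1:]:
        shared &= set(view)  # keep only patients seen in all views so far
    patient_ids = sorted(shared)
    fused = [
        [value for view in views for value in view[pid]]
        for pid in patient_ids
    ]
    return patient_ids, fused


# Hypothetical toy views (not from the thesis dataset).
radiomics = {"P01": [0.1, 0.2], "P02": [0.3, 0.4], "P03": [0.5, 0.6]}
transcriptomics = {"P02": [5.0], "P03": [6.0], "P04": [7.0]}
deep_features = {"P01": [0.9, 0.9], "P02": [0.8, 0.7], "P03": [0.6, 0.5]}

ids_all, x_all = fuse_views(radiomics, transcriptomics, deep_features)
ids_img, x_img = fuse_views(radiomics, deep_features)
print(ids_all, len(x_all[0]))  # shared patients and fused vector length
print(ids_img, len(x_img[0]))
```

Calling the function with all three views mirrors the full radiotranscriptomic configuration, while passing only the two imaging views mirrors the radiomics-plus-deep-features configuration. The fused matrix could then be passed to a classifier such as a random forest or a support vector machine.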
Parts of this work are included in a publication that is currently under review,
entitled "Deep Radiotranscriptomics of Non-Small Cell Lung Carcinoma for Assessing High-Level
Clinical Outcomes using Multi-View Analysis", authored by Trivizakis Eleftherios, Koutroumpa
Nikoletta-Maria, Souglakos John, Karantanas Apostolos, Zervakis Michalis E., and Marias Kostas. Details
regarding the selected parameters and the complete source code of the analysis are provided online
at https://github.com/NikiKou/deep_radiotranscriptomics_survival_analysis.
|