A Review of Technological Progression from Radiomics to Breathomics for Early Detection of Lung Cancer

Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria
Covenant Applied Informatics and Communication Africa Center of Excellence (CApIC-ACE), Covenant University, Ota, Nigeria
HRA, Institute for Systems Science, Durban University of Technology, Durban, P.O. Box 1334, Durban, South Africa
Division of Oncology, Department of Surgery, Afe Babalola University, Ado-Ekiti, Nigeria
Department of Chemistry, Kwara State University, P.M.B. 1530, Malete, Kwara State, Nigeria
Department of Microbiology, University of Uyo, PMB 1017, Akwa-Ibom State, Uyo, Nigeria
KZN e-Skills CoLab, Durban University of Technology, Durban, South Africa
Department of Electrical and Electronic Engineering, Osun State University, Osogbo, Osun State, Nigeria
Department of Information and Communication Technology, Mangosuthu University of Technology, P.O. Box 12363 Jacobs, 4026 Durban, South Africa


Introduction
Lung Cancer (LC), also referred to as carcinoma of the lung or bronchogenic carcinoma, is an aggressive lung tumor caused by unrestrained growth of epithelial cells in the tissues of the lung, usually in the bronchi and the airways. The disease may spread to sites distant from the lungs and produce metastatic lesions in the brain, bone, liver or adrenal glands. Lung cancer patients often present with cough, hemoptysis, dyspnea, chest pain, non-resolving pneumonia, or signs of metastatic disease such as skeletal pain or neurological problems (Hu et al., 2016). Because the cancer is largely asymptomatic in its early stages, most cases are not diagnosed until the illness has advanced significantly and therapeutic treatment is no longer an option.
The International Agency for Research on Cancer (IARC) and the World Health Organization (WHO) estimate that lung cancer is the world's leading cause of cancer death and the 6th leading cause of mortality among all forms of cancer in Africa (Sanni et al., 2018). WHO also predicted that Chronic Obstructive Pulmonary Disease (COPD) will be the third leading cause of death worldwide by 2030. COPD has also been proposed as a contributing factor for lung cancer (Welniak et al., 2015). In addition, only 16% of lung cancer cases are detected while still localized, whereas 22% and 57% are detected at regional and distant stages, respectively. Although it is understood that timely diagnosis of lung cancer has a significant effect on survival rates, there is not yet a diagnostic process with proven value for early detection comparable to those available for other cancers such as breast and colorectal cancer (Herbst et al., 2018; Lanni et al., 2018).
Lung carcinoma is traditionally classified histologically into two key forms: Small Cell LC (SCLC) and Non-Small Cell LC (NSCLC). Mutations involving EGFR, KRAS and TP53 have also been used to subclassify lung cancers on a molecular level. SCLC, which mostly begins in the larger airways, namely the main and secondary bronchi, accounts for around 10-15% of lung cancers. It is the most severe form of lung cancer and develops faster than NSCLC, frequently metastasizing early in the disease to other parts of the body. By the onset of symptoms, most SCLC cases have widespread metastasis. The 5-year survival rate of SCLC (6%) is lower than that of NSCLC (21%) (Linning et al., 2019).
The most prevalent form of lung cancer is NSCLC, grouped on the basis of therapeutic outlook and the cancer cells' gene expression. Roughly 85% of lung cancer cases fall into this category. Adenocarcinoma (40%), squamous cell carcinoma (25-30%) and large cell carcinoma (10-15%) are the three major NSCLC subtypes.
Many patients diagnosed with NSCLC are at an advanced phase in which it is important to estimate the probability of survival and weigh the therapeutic options. Regrettably, at diagnosis about 70% of NSCLC tumors cannot be resected, even though resected tissue is important for the clinical analysis that discloses molecular details and thus informs the prognosis. Knowing the prognosis helps to assess whether pursuing a given procedure is worthwhile (Linning et al., 2019; Vavala and Novello, 2017).
If LC is identified early, at least 50% of patients will survive and remain free of relapse. The most significant causes of lung cancer are active tobacco smoking and exposure to secondhand smoke in nonsmokers. Other factors include asbestos exposure, automotive and factory air pollution, and contamination by arsenic, chromium, nickel, aromatic hydrocarbons and ethers (Hecht, 2012).
Early recognition of LC can be characterized in terms of lead time, length time and tumor selection biases. If the cancer is discovered at an indolent stage, the survival rate improves significantly. Chemotherapy and radiotherapy are usually used for SCLC, whereas NSCLC is usually treated with surgery. Imaging techniques such as Chest Radiograph (CRG), Computed Tomography (CT), Low-Dose CT (LDCT), Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) are the standard approaches for detecting lung cancer.
Nonetheless, all these procedures are expensive and protracted. They are also invasive and are insensitive to cancerous cells in their initial phase. Biopsy and autoantibody tests are other common ways of identifying lung cancer, but they are also expensive, time-consuming and require expert physicians. A non-invasive breath test holds significant promise as a means of detecting lung carcinoma (Antoniou et al., 2019). A diagnostic test that can identify lung cancer before it has spread would be effective in reducing lung cancer death rates. The earlier the detection, the higher the chances of successful treatment.
In this review, we focus on the evolution of existing approaches for early lung cancer detection. The resurgence of interest in early identification of LC and the medical imaging used have raised innumerable issues about this international epidemic. Each of the existing methods for LC detection has particular advantages and weaknesses, hence the continued focus on developing better approaches. The rest of this review is divided into two parts: the first presents the trends in LC diagnosis and detection, from customary methods to the most recent improvements in exhaled breath analysis, while the second presents the research opportunities and future directions in this area of research.

Conventional Imaging Approaches
The traditional methods of identification and characterization for LC include CRG, CT, LDCT, MRI and PET. The purpose of these techniques is to locate lesions in subjects who are at high risk of LC. While they are common strategies with a wide pool of professional users, they have drawbacks, which include a high false-positive rate, radiation exposure, high cost and poor sensitivity for the identification of cancer cells at early stages (Kovalchik et al., 2013; Kumar and Latte, 2019).
A key advancement in the traditional method for identifying and characterizing LC is radiomics, the use of advanced algorithms to interpret clinical images. Radiomics improves the precision of LC nodule detection and enables doctors and radiologists to analyze radiographic scans precisely.

Chest Radiograph
A chest radiograph is the traditional screening method for the detection of lesions and pulmonary nodules in patients with LC. It uses X-rays to generate lung images, but is associated with radiation exposure and poor image quality. The literature also reports that this approach has low sensitivity and does not help to reduce LC mortality (Khobragade et al., 2016).

Computed Tomography
CT is an imaging tool that uses sophisticated X-ray technology to assess the condition of the lungs and provides data on tumor features such as dimensions, classification and growth. In addition to staging the tumor, the 3D CT image allows examination of chest wall, diaphragm and mediastinum invasion. According to Latifi et al. (2015), 4D CT substantially improves LC control by enabling more accurate targeting of the radiation administered. The key disadvantages of CT remain harmful exposure to radiation and the detection of indolent tumors, which can lead to over-diagnosis and, in most cases, additional cost, anxiety and morbidity for patients (Kovalchik et al., 2013; Ohno et al., 2018; Patz et al., 2014).

Low-Dose Computed Tomography
Low-Dose Computed Tomography (LDCT) was introduced for lung imaging to overcome the radiation limitations of CT. LDCT acquires single-slice spiral CT images using Multidetector row CT (MDCT) scanners. Studies show that LDCT detects more lung nodules and cancer cells than a chest radiograph, and up to 90% of new LC cases were detected using LDCT. It uses an iterative image reconstruction approach, which is more suitable for the detection of LC in high-risk subjects. However, LDCT is limited in reducing LC mortality because it produces high false-positive rates, and the problem of excessive exposure to radiation persists (Journy et al., 2015; Marshall et al., 2017; NLSTRT, 2013).

Positron Emission Tomography/Computed Tomography
Positron Emission Tomography/Computed Tomography (PET/CT) is a fusion of two oncological visualization procedures, which combine structural data from CT with metabolic information from PET to visualize cancer cells in 3D or in two-dimensional slices, either individually or fused. F18-Fluorodeoxyglucose (F18-FDG) is the most widely used radiotracer; it enables the identification of glucose-consuming tumor cells and metastases, which include the majority of malignant pulmonary lesions (Chicklore et al., 2013; de Guevara Hernández, 2015). PET provides better sensitivity and precision for LC identification than CT. In cases of advanced NSCLC considered for stereotactic radiotherapy, PET plays a significant role in selecting subjects and specifying treatment sites (Mac Manus et al., 2013).

Magnetic Resonance Imaging
Magnetic Resonance Imaging (MRI) was developed to overcome the drawbacks of PET/CT in the investigation of lung malignancies and other diseases. It is a non-ionizing imaging tool that works on the principle of magnetic fields and radio waves. It is interfaced with a computer to produce comprehensive images of the structures within the chest and to determine the size, extent and degree of LC spread to adjacent structures (Hochhegger et al., 2011). Lung MRI faces technical hurdles because of the low proton density of lung tissue and its rapid signal decay. In addition, the quality of lung imaging in MRI depends on the patient's ability to follow breath-hold instructions (Hochhegger et al., 2011; Biederer et al., 2012).

Magnetic Induction Tomography
Magnetic Induction Tomography (MIT) is another imaging procedure. MIT is a tomography technique that uses a contactless method to map the electrical conductivity of tissue and to image the electromagnetic properties of an object using the eddy current effect. It can produce passive electrical 3D images for applications that include neuroimaging, cryosurgery tracking in radiology and metal flow modeling in metalworking processes. MIT can distinguish intramuscular fat from water-bearing fat-free tissue (Han et al., 2016). However, MIT imaging faces several issues, including computationally intensive imaging algorithms, hypothetical phantoms of the thoracic cavity, challenging clinical hardware systems and limited spatial resolution (Xiao et al., 2018).

Computer Aided Diagnosis (CAD) of Lung Cancer using Radiomics
Radiomics is the high-throughput extraction of quantitative features that converts images from traditional imaging methods into mineable data to support clinical decision-making (Gillies et al., 2016). The CAD approach was created to improve the precision of nodule detection and to enable doctors and radiologists to analyze radiographic scans precisely. Traditional computer-aided detection and diagnosis systems for LC are built on medical imaging, including CRG, CT, LDCT, MRI and PET. Such techniques are designed to identify tumors in subjects who are particularly prone to LC (Orozco et al., 2012). Before making final decisions, radiologists therefore use the system's output to ascertain the disease state, since the system is considered useful for flagging possible nodules. The key phases used in radiomics are segmentation, feature extraction, classification and nodule detection. Input scans are typically drawn from databases of radiographs with annotated lesions, such as the Lung Image Database Consortium (LIDC) (Demir and Yılmaz Çamurcu, 2015; Junior et al., 2018).
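The phases named above (segmentation, feature extraction, classification and nodule detection) can be illustrated with a minimal Python sketch. It is not the pipeline of any cited CAD system: the tiny synthetic "scan", the intensity threshold, the three features and the rule-based classifier with its cut-offs are all assumptions chosen only to make the flow of data visible.

```python
from collections import deque

def segment(image, threshold):
    """Segmentation: binary mask of pixels brighter than an assumed threshold."""
    return [[pixel > threshold for pixel in row] for row in image]

def connected_regions(mask):
    """Group foreground pixels into 4-connected regions (candidate lesions)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                queue, region = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

def extract_features(image, region):
    """Feature extraction: area, mean intensity and bounding-box fill ratio."""
    area = len(region)
    mean_intensity = sum(image[y][x] for y, x in region) / area
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    bbox_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return {"area": area, "mean_intensity": mean_intensity, "extent": area / bbox_area}

def classify(features, min_area=3, min_extent=0.6):
    """Classification: toy rule with hypothetical cut-offs, not a trained model."""
    if features["area"] >= min_area and features["extent"] >= min_extent:
        return "candidate nodule"
    return "background"

# Synthetic 4x6 "scan": one compact bright blob and one isolated bright pixel.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 7],
    [0, 0, 0, 0, 0, 0],
]

regions = connected_regions(segment(image, threshold=5))
results = [classify(extract_features(image, region)) for region in regions]
print(results)
```

In a real system the rule in `classify` would be replaced by a classifier trained on annotated scans, and the features would be far richer (shape, texture, intensity statistics); the sketch only shows how each phase feeds the next.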
Although radiomics approaches are well-established methods with a large network of knowledgeable operators, they have drawbacks. When analyzing ordinary computed tomography scans, doctors and radiographers find it difficult to distinguish cancerous nodules because the cross-section is complex. Radiologists must therefore spend extra effort to identify LC, and the risk of error is high.

Proteomics and Genomics
Proteomics is the systematic analysis of all the sets of proteins present in cells, chromosomes, or tissues. This technique models how the architecture of proteins is modified by disease. The protein makeup of a healthy cell differs from that of a cancerous cell, as proteins vary or change in ways that may influence the shape and functionality of the cell (Broodman et al., 2017). Proteomics and genomics are used together to link gene and protein expression in a specific subset of genes, cells or tissues and to discover potential molecular targets. Microarray technology enables the concurrent assessment of gene and protein expression in tumors. LC-related proteins are produced by LC tissues, and the immune system in turn appears to develop high-affinity auto-antibodies to these proteins.
Blood biomarkers include specific tumor-associated antigen proteins and auto-antibodies that may be observable 1 to 3 years prior to clinical diagnosis (Jain, 2016; Zhong et al., 2006). Prior studies also used algorithmic solutions for binary classification of LC biomarkers. According to Broodman et al. (2017), Carcinoembryonic Antigen (CEA), CYFRA 21-1 (cytokeratin 19 fragment), Neuron-Specific Enolase (NSE), Progastrin Releasing Peptide (ProGRP) and Squamous Cell Carcinoma Antigen (SCCA) are some commonly recognized and clinically used LC protein biomarkers. Other LC protein biomarkers include Cancer Antigen-125 (CA-125), Human Epididymis Protein 4 (HE4) and Pro-Surfactant Protein B (Pro-SFTPB) (Sin et al., 2013; Taguchi et al., 2013). Table 2 lists some of the LC biomarkers recognized using proteomics and genomics. For instance, earlier work by Adetiba and Olugbara (2015a) revealed that combinations of neural network ensembles with histograms of oriented gradients and affine transformations can be applied to LC detection and classification. In addition, the well-known Protein Pathway Array (PPA) method of analysis has been used (Li et al., 2013; Liu et al., 2014; Pass et al., 2013; Wang et al., 2011) to classify significant but low-abundance proteins and phosphoproteins in LC. Despite the specificity of the proteomics and genomics methods, the data available for evaluating how well these biomarkers discriminate between LC cases and controls are inadequate (Guida et al., 2018).

Breathomics
Breath analysis, also known as breathomics, is a cutting-edge tool for prompt detection of lung carcinoma. According to Boots et al. (2012), the aim of breathomics research is to identify patterns in Volatile Organic Compounds (VOCs) that characterize pathological metabolic activities in human beings. Breath science began approximately 50 years ago, when Pauling used gas chromatography to show that around 250 gases make up human breath (Pauling et al., 1971). The field is nevertheless still maturing, especially given current advances in applying artificial intelligence methodologies to analyze the signatures of VOC patterns in breath (Boots et al., 2012).
Breath analysis can be used in clinical analysis and environmental investigation because it is non-invasive. Its application has been explored across a wide spectrum of diseases. For instance, the breath of patients with lung gangrene or necrotizing pneumonia smells like a sewer, and a musty or fishy odor can suggest liver disease. Furthermore, the breath of untreated diabetic patients is often described as smelling of "rotten apples" because of the prevalence of acetone, while failing kidneys can be accompanied by a urine-like smell. It has thus been established in the literature that analyzing the breath prints of an LC patient offers several potential advantages over conventional medical tests (Ahmed et al., 2017; Shehada et al., 2016; Mertin, 2011):
i. Breath analysis has great potential for real-time diagnosis and tracking
ii. Measuring gas-phase analytes in a gas matrix (breath) is simpler than in more complex biological matrices
iii. Expiratory-breath research is uniquely informative, completely non-invasive and an adjunct or precursor to proteogenomics techniques
iv. Unlike other specimens, breath can be collected easily and as often as necessary
v. It is simple to acquire and painless for the subject, with no unwanted side effects
Breath is a mixture of inorganic gases, inert gases and a tiny fraction of VOCs. When breathing, several molecules are emitted into the air at concentrations ranging from parts per million (ppm) to parts per trillion (ppt) by volume. The VOCs in breath are produced by endogenous or exogenous processes, and their quantitative and qualitative profiles differ between individuals. O'Neill et al. (1988) claimed that 28 "fingerprints", which include lipid-peroxidation products, hexane, methylpentane, aniline and o-toluidine, are present in the breath of LC patients (Gordon et al., 1985; Sponring et al., 2010).
The distinction between the breath of LC patients and healthy participants is also confirmed by the identification of alkanes and benzene compounds among the 22 VOCs used for effective identification of LC (Phillips et al., 2003). To date, no fewer than 3000 VOCs have been reported in the breath of healthy volunteers, including isoprene, ethane, pentane and acetone.
VOCs arise from metabolic reactions and are therefore informative for clinical prognosis (Phillips et al., 2003; Mochalski et al., 2013; Rocco, 2018; Phillips et al., 1999; Horváth et al., 2005). Because LC cells have specific metabolic features, the chemistry of VOCs in the expiratory breath reflects cell metabolism within the body. These metabolic by-products are carried by the bloodstream to the lungs, where they are breathed out. Alterations in the metabolic processes of the body therefore produce distinctive VOC prints or signatures. Researchers have also indicated that, in studies of the exhaled breath of LC patients, the unique metabolism of the disease appears as breath "fingerprints" that reveal the presence of LC (Sponring et al., 2010; An et al., 2010; Jordan et al., 2010).

Chemical Analysis of Volatile Organic Compound
Several procedures for sampling, investigating and analyzing exhaled VOCs are described in the literature. Gas Chromatography (GC) is the main standard technique and is the one most used for chemical analysis of VOCs. Breath sampling is an important stage of the chemical analysis approach. First, the exhaled breath is collected and stored briefly. Following a desorption process, each VOC is separated by GC, which is typically coupled with mass spectrometry (GC-MS). The different VOCs are first separated according to their chemical characteristics, then ionized and sorted by their mass-to-charge (m/z) ratio.
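As a rough illustration of the final step, sorting ions by m/z, the Python sketch below computes the m/z of the singly charged molecular ion for four breath VOCs named in this review (isoprene, ethane, pentane and acetone). It is a simplification made for illustration: real GC-MS spectra are dominated by fragment ions and isotope peaks, which are ignored here, and the function and variable names are hypothetical. The atomic masses are standard monoisotopic values.

```python
# Monoisotopic masses (unified atomic mass units) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON_MASS = 0.00054858

def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())

def mass_to_charge(formula, charge=1):
    """m/z of the molecular cation: remove `charge` electrons, divide by charge."""
    return (neutral_mass(formula) - charge * ELECTRON_MASS) / charge

# Molecular formulas of the example VOCs.
vocs = {
    "acetone": {"C": 3, "H": 6, "O": 1},
    "isoprene": {"C": 5, "H": 8},
    "pentane": {"C": 5, "H": 12},
    "ethane": {"C": 2, "H": 6},
}

# A mass analyzer effectively orders ions along the m/z axis.
spectrum = sorted((mass_to_charge(formula), name) for name, formula in vocs.items())
for mz, name in spectrum:
    print(f"{name:9s} m/z = {mz:.4f}")
```

Because GC has already separated the compounds in time, each VOC reaches the mass spectrometer on its own, which is what makes the m/z pattern attributable to a single compound.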
The most common sampling method for breath collection uses Tedlar sampling bags because they are inexpensive, easy to handle and reusable. They are made from inert materials to prevent both diffusion and reactions between the compounds and the bag. Other sampling methods found in the literature include Mylar bags, cold trap systems, Bio-VOCs and gas-tight syringes (Callol-Sanchez et al., 2017; Nakhleh et al., 2017; Oguma et al., 2017; Rudnicka et al., 2011). Sampled VOCs are usually processed using different statistical methods. A few of the research papers on the analytical chemistry of VOCs in the respiratory tract are shown in Table 3.
However, these methods of LC detection are not readily available in health care facilities because of the sophistication of the equipment required. The instruments are bulky, expensive and available only in specialized laboratories. These procedures also require extensive sample preparation and rare skill sets to interpret the results, and they are time-consuming. Furthermore, breath VOC analysis at parts per billion by volume (ppbv) to parts per trillion by volume (pptv) with GC-MS based methods requires pre-concentration at the start of the procedure. This can amplify the signals of some VOCs while potentially missing others (Schmidt and Podmore, 2015; Van de Goor et al., 2018). Even though these techniques are effective for early identification of LC, they are not compact or flexible to use and, as such, cannot be used in hospitals or homes, that is, at the Point of Care (PoC).

Table 3 (excerpt)
Ligor et al. (2015)
  VOCs: Butane; 2-methyl-butane; 2-pentanone; 4-methyl octane; Propane; 2,4-dimethyl heptanes; Propene
  Analytical method: SPME/GC-MS
  Sampling: Tedlar bags
  Statistical method: PCA
Zou et al. (2014)
  VOCs: 5-(2-methylpropyl)nonane; 8-hexylpentadecane; 2,6,11-trimethyldodecane; 2,6-di-tert-butyl-4-methylphenol; Hexadecanal
  Analytical method: SPME/GC-MS
  Sampling: Tedlar bags
  Statistical method: Pearson's χ2 test
Fu et al. (2014) 2

Electronic Nose Based Volatile Organic Compound Analysis
The E-nose is a system that senses and classifies VOCs in exhaled breath using a sensor array. By using qualitative and pattern-based "breath print" techniques, this technology alleviates the shortcomings of the chemical analysis approach.
Fig. 1 compares the human olfactory system with the E-nose. As the top part of Fig. 1 shows, the E-nose consists of a sensing unit (the hardware) and a processing unit for automatic pattern recognition. The sensing unit is an array of sensors in which each sensor responds to a particular biomarker on the basis of its chemical properties and converts that response into a digital signal to generate the "breath print". The signature of each biomarker is then used to create a labeled signature database, which is used to train an automated pattern recognizer. Principal Component Analysis (PCA) is primarily used to extract the main variables, while the pattern recognition unit employs supervised classification algorithms such as k-Nearest Neighbors (k-NN), Support Vector Machine (SVM) and Artificial Neural Network (ANN) to classify the datasets (Estakhroyeh et al., 2018). As a breathomics procedure, the E-nose has strong prospects for promoting PoC-centered early detection of lung cancer. The technique is economical, non-invasive, highly portable and has a quick response time. Brief descriptions of some current E-nose applications are given in Table 4.
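The pattern recognition stage described above can be sketched with a small k-NN classifier in Python. This is a minimal sketch under stated assumptions, not the method of any cited study: the four-sensor array, the response values and the labels are synthetic, the function name is hypothetical, and the PCA step is omitted to keep the example dependency-free.

```python
import math
from collections import Counter

# Hypothetical labeled breath prints: normalized responses of a 4-sensor array.
# All values are synthetic and exist only to illustrate the workflow.
TRAINING = [
    ([0.82, 0.40, 0.65, 0.30], "LC"),
    ([0.78, 0.45, 0.70, 0.28], "LC"),
    ([0.85, 0.38, 0.60, 0.35], "LC"),
    ([0.30, 0.42, 0.25, 0.31], "healthy"),
    ([0.35, 0.39, 0.20, 0.33], "healthy"),
    ([0.28, 0.44, 0.22, 0.29], "healthy"),
]

def knn_predict(sample, training, k=3):
    """Label a breath print by majority vote among its k nearest neighbors."""
    # Euclidean distance between sensor-response vectors.
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two unseen (synthetic) breath prints.
print(knn_predict([0.80, 0.41, 0.66, 0.31], TRAINING))
print(knn_predict([0.31, 0.40, 0.23, 0.30], TRAINING))
```

With real sensor data, responses would normally be scaled and reduced (for example with PCA) before classification, and k would be chosen by cross-validation on a much larger labeled database.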
The E-nose supports early diagnosis. It is an affordable, fast, portable and non-invasive tool that can detect LC at an early stage at the PoC. As shown in Table 5, different researchers have established the practicability of using E-noses to detect health concerns and diseases in their primary phase. As Table 5 also shows, the majority of the E-noses developed by researchers depend on breath samples from Caucasians, and so may not correctly identify LC in Black patients and people of other races. In addition, the reported E-noses were trained, and their inferences run, off-device. The samples used to train the existing E-noses came from small populations of high-risk patients. Furthermore, the E-noses in the literature were largely developed with shallow machine learning algorithms, which could limit the accuracy, sensitivity and specificity needed for PoC application.
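For reference, accuracy, sensitivity and specificity follow directly from the counts of a confusion matrix. The Python sketch below uses their standard definitions; the counts are invented for illustration and do not come from any study cited here.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp: LC patients correctly flagged    fn: LC patients missed
    tn: controls correctly cleared       fp: controls wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),          # share of LC cases detected
        "specificity": tn / (tn + fp),          # share of controls cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical evaluation on 50 LC patients and 50 healthy controls.
metrics = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On small, high-risk cohorts such as those described above, these estimates carry wide uncertainty, which is one reason larger and more diverse training populations matter.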

Research Opportunities and Future Directions
This literature review shows that the metabolic alterations associated with LC can be captured from breath as VOC biomarkers, with clinical significance for early detection of LC. Chemical analysis and E-nose-based approaches are used to identify and analyse these VOCs.
Since the inception of this technology, many improvements have been made to the procedure and the sensing materials to create low-cost, efficient and handheld devices. Nevertheless, state-of-the-art results in E-nose research show that the device is largely application-specific rather than universal. It can detect and discriminate, in situ, the VOC production profiles of microbial infections and of genetic diseases (Phillips et al., 1999; Längkvist et al., 2013).
Furthermore, it has provided a plethora of benefits to a variety of commercial industries, including the agricultural, biomedical, cosmetics, environmental, food, manufacturing, military, pharmaceutical and regulatory sectors, as well as various scientific research fields. Further multi-disciplinary research on the technology should nevertheless lead to advances in attributes such as uniformity, consistency, sensitivity, specificity and accuracy. The VOC biomarker method for LC detection offers several advantages over customary imaging techniques: it enables early detection of malignant nodules at the PoC in a non-invasive and pain-free manner, with the added benefit of low cost. These advantages make the approach a more viable LC detection method, especially in developing countries.

Conclusion
This study has shown the resurgence of interest in early detection of LC and the trends over the years in the application of radiomics, proteomics/genomics and breathomics (E-nose) for LC detection. However, there have been only a few efforts to deploy the E-nose for early detection of LC at the PoC. With increasing innovation in sensing and detection technologies, machine learning and signal processing, an E-nose device can be used as a PoC tool that provides rapid data processing and real-time results. Thus, if VOC biomarkers are leveraged for early detection of LC, there could be a drastic reduction in the mortality rate of LC patients. A non-invasive, highly specific, sensitive, accurate, cheap, portable and easy-to-operate intelligent E-nose based on the breathomics approach has great potential for improving the medical outlook of LC.