EEG-based Processing and Classification Methodologies for Autism Spectrum Disorder: A Review

Abstract: Autism Spectrum Disorder is a lifelong neurodevelopmental condition which affects the social interaction, communication and behaviour of an individual. The symptoms are diverse, with different levels of severity. Recent studies have revealed that early intervention is highly effective for improving the condition. However, because no well-defined medical test for ASD is available, current diagnostic criteria are subjective, which makes early diagnosis challenging. Over the years, several objective measures utilizing abnormalities found in EEG signals and statistical analysis have been proposed. Machine learning based approaches provide more flexibility and have produced better results in ASD classification. This paper presents a survey of major EEG-based ASD classification approaches from 2010 to 2018 which adopt machine learning. The methodology is divided into four phases: EEG data collection, pre-processing, feature extraction and classification. This study explores the techniques and tools used for pre-processing, feature extraction and feature selection, as well as classification models and measures for evaluating them. We analyze the strengths and weaknesses of the techniques and tools. Further, this study summarizes the ASD classification approaches and discusses the existing challenges, limitations and future directions.


Introduction
Autism Spectrum Disorder (ASD) is a heterogeneous neurodevelopmental condition characterized by behavioural impairments in social interaction and communication, along with restricted and repetitive behaviours (APA, 2013). ASD is called a spectrum disorder as the symptoms and their severity are unique for each individual. Common symptoms include difficulty in understanding facial expressions, delayed speech and poor comprehension skills. The symptoms start to appear in early childhood, within the first three years. A recent report of the Centers for Disease Control and Prevention (CDC) identifies having siblings with ASD, having older parents and certain genetic conditions as general risk factors of ASD.
The motivation behind this survey is the lack of well-defined automated approaches for ASD diagnosis. In order to support studies on automated ASD classification, it is important to explore various techniques along with the diagnostic processes. This paper explores and analyzes the techniques for EEG pre-processing, feature extraction and classification that make it possible to automate the diagnostic process. Moreover, this paper identifies the existing limitations and challenges and suggests future research directions. Researchers and practitioners can therefore utilize the suggested techniques and address these limitations in future research.
The methodology of ASD diagnosis is divided into four phases: (1) EEG data collection, (2) pre-processing, (3) feature extraction and (4) classification using learning models. Under EEG data collection, we discuss EEG metadata and the challenges arising from its diversity. The pre-processing phase covers different techniques for noise removal, data transformation and popular EEG pre-processing tools. Commonly used EEG-based features for ASD classification, feature extraction techniques and feature selection techniques are discussed under the feature extraction phase. The classification phase presents different machine learning algorithms and different evaluation metrics.
Finally, the paper discusses the existing challenges, limitations and potential areas for future work.

Overview of the Current Diagnostic Criteria
The etiology of ASD is still under research and no well-defined medical test exists for its diagnosis. Current diagnostic criteria are behaviour dependent and rely on direct observation and standardized interviews (Newschaffer et al., 2007). They are based on the presence or absence of specific behaviours. These practices are generalized as a comprehensive developmental approach, where several characteristics of a child's development are evaluated. These characteristics include different levels of functioning, the child's developmental progress, genetic, family, medical and educational histories and the child's ability to apply skills in everyday life. DSM-IV-TR (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision), ADOS (Autism Diagnostic Observation Schedule), Autism Diagnostic Interview-Revised (ADI-R), the Diagnostic Interview for Social and Communication Disorders (DISCO) and the Developmental, Dimensional and Diagnostic Interview (3di) are some of the techniques used for clinical diagnosis. Among them, ADOS and ADI-R are considered the main standards (Reaven et al., 2008).
In addition to determining ASD or no-ASD, another key aspect is the autism severity rating. ADOS score is widely used for ASD severity measurement. Besides ADOS and ADI-R, several other scales including Childhood Autism Rating Scale (CARS), Gilliam Autism Rating Scale (GARS) and Autism Behaviour Checklist (ABC) also provide autism severity ratings (Gotham et al., 2009). Severity scores assist in providing specific individualized interventions rather than more general treatment plans. It would also help monitor the change in risk profiles as the child's development progresses and how the subject is responding to intervention.

Behaviour-Independent Diagnostic Practice
According to a recent CDC report, one in 59 children in the United States has been diagnosed with ASD (Baio et al., 2018). In 2010, it was calculated to be 1 in 68. Thus, it is evident that the prevalence of ASD is increasing over the years. ASD might not be a fatal disease, yet the daily activities of autistic people are extremely challenging. Even though ASD cannot be cured, the symptoms can be improved through proper individualized treatment. An early diagnosis would facilitate starting the medication, therapies and social skills training at an early age which enhances a child's response to treatment.
A significant challenge is that current clinical diagnostic practices are subjective and behaviour dependent. Current diagnostic procedures require input from a team of multi-disciplinary professionals. Besides, a complete profile of the child's abilities is required for an accurate diagnosis. Such comprehensive evaluations sometimes take several months or even years, delaying the diagnosis and the treatment. Also, current nosological systems and ASD severity measures work well for children above the age of three but are less accurate for children younger than two years of age.
Early diagnosis of ASD is difficult as the defining behaviours often become significant only after the first three years and routine well-baby check-ups do not contain simple, reliable measures to identify them. Early diagnosis of milder forms of ASD is even harder as the symptoms tend to overlap with several other diagnoses. Moreover, an early diagnosis needs to be re-evaluated because of rapid development at early ages and the impact of the intervention (Hollander et al., 2011). There also exists the problem of misdiagnosis (Mandell et al., 2007). The major causes of misdiagnosis are the diversity of ASD symptoms and their overlap with other diagnoses such as ADHD (Mayes et al., 2012).
The fact that etiology and developmental course are getting more diverse with time makes future diagnosis even more challenging. These challenges can be addressed by developing behaviour-independent diagnostic approaches which are simple, affordable and easy to implement in routine well-baby check-ups.

EEG as a Diagnostic Test
A behaviour-independent approach can be designed based on Electroencephalography (EEG). EEG records the electrical activity of the brain through electrodes attached to the scalp, capturing the electrical impulses of different frequencies that neurons use to communicate. EEG has long been studied to support medical diagnosis (Niedermeyer and da Silva, 2005). Abnormalities in EEG signals have been found to be reliable biomarkers for medical conditions such as epileptic seizures (Tzallas et al., 2009) and Alzheimer's disease (Jeong, 2004). In addition to diagnosis, novel approaches to facilitate treatment plans using EEG have also been proposed (Fan et al., 2015).
The literature reveals that two different types of EEG-based approaches have been proposed to diagnose ASD: (1) the comparison method and (2) the pattern recognition and classification approach (Hashemian and Pourghassem, 2014). In the first approach, EEG signal characteristics of typically developing individuals are compared with those of individuals with ASD. This paper focuses on the second approach, which adopts machine learning algorithms to analyse the EEG signal and classify ASD.

Phase 1: EEG Data Collection
Recording the EEG data is the first step in the classification methodology. Our focus is not on the technical details of EEG data collection but on the metadata. The metadata of EEG datasets plays a crucial role in deciding the processes carried out in the next phases of classification. The metadata of an EEG dataset generally includes details regarding the sampling frequency, number of electrodes, electrode locations, EEG montage, recording duration, the activities in which the subjects were involved while recording the data and data types. The EEG output is a relative value. The values are generated based on a reference point. The montage provides information about the point of reference. Different EEG montages include bipolar, common electrode reference, average reference, weighted average reference and Laplacian.
The datasets used in the related studies are unique and have diverse metadata. The file formats of the EEG data include, but are not limited to, BrainVision file formats (.vhdr, .vmrk, .eeg), European Data Format (.edf) and BioSemi data format. EEG signals were sampled at different frequencies: 128 Hz, 250 Hz, 256 Hz and 500 Hz. While the EEG signals were recorded, subjects were involved in different activities, such as blowing bubbles to control the subjects' attention, carrying out the ADOS assessment and keeping the subjects in a resting state.
EEG datasets with different numbers of channels and different electrode placement locations were also used. The International 10-20 system is an internationally recognized electrode placement standard. The placement of electrodes at the locations Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1 and O2 according to the International 10-20 system is shown in Fig. 1. One major limitation is that, because the EEG datasets are so diverse, each proposed approach becomes specific to its dataset. None of the studies has tested its approach on different datasets with varying metadata. Hence, it is challenging to measure how well the approaches generalize.

Phase 2: Pre-Processing
Overview of EEG Signal Pre-Processing
Data pre-processing is a crucial step for any machine learning based approach because real-world datasets contain incomplete, noisy and inconsistent data. Poor data quality will result in poor classification. According to Han et al. (2011), the major tasks in data pre-processing include data cleaning, data integration, data transformation, data reduction and data discretization. This paper emphasizes noise elimination techniques because of their significance in the context of classifying ASD. The noise in the EEG signal is induced by both non-physiological factors (the external environment) and physiological factors (originating from the subject being examined). Several external artefacts are discussed in (Tandle and Jog, 2015). The artefacts that depend on the subject are of three main types: electrooculogram (EOG), electromyogram (EMG) and cardiac activity. EOG is the noise generated by eye blinks and cornea movement, while EMG is the noise generated by muscle activity around the electrodes, specifically in the neck, face and scalp.

Independent Component Analysis
Independent Component Analysis (ICA) is a multivariate analysis which decomposes the original signal into a set of Independent Components (ICs). It separates the signals from different sources from a set of mixed signals. Two important assumptions are made in ICA: (1) the signals from different sources are independent of each other and (2) the independent components have non-Gaussian distributions. Artefact removal in EEG signals using ICA is a three-step process: (1) decomposing into ICs, (2) discarding standalone ICs and (3) concatenating the remaining ICs to form an artefact-free signal (Lai et al., 2018).
Popular EEG signal processing tools including EEGLAB provide functionalities to perform ICA (Delorme and Makeig, 2004). Even though multiple ICA algorithms exist, FastICA, Infomax and JADE are widely used (Azlan and Low, 2014). Several studies report second-order blind identification (SOBI), an ICA algorithm, as a successful technique to remove all types of artefacts from the EEG signal (Urigüen and Garcia-Zapirain, 2015). ICA has been used as a pre-processing technique for ASD classification in (Djemal et al., 2017). It has also been used in (Abdulhay et al., 2017) as a pre-processing step to detect abnormal EEG activities and neural connectivity in autistic individuals.
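The three-step ICA procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation used in any of the cited studies: it assumes a compact symmetric FastICA with a tanh non-linearity written in NumPy, a synthetic three-channel recording, and a hypothetical rule that treats the component with the highest excess kurtosis as the artefact (a blink-like transient). All function names and signal parameters are made up for the example.

```python
import numpy as np

def whiten(X):
    """Centre and whiten X (channels x samples); return whitened data and whitening matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    K = (eigvecs / np.sqrt(eigvals)).T          # D^(-1/2) E^T
    return K @ Xc, K

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W stay orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh non-linearity); returns the unmixing matrix."""
    Z, K = whiten(X)
    n = Z.shape[0]
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ K

def excess_kurtosis(S):
    Sc = S - S.mean(axis=1, keepdims=True)
    return (Sc ** 4).mean(axis=1) / (Sc ** 2).mean(axis=1) ** 2 - 3.0

def remove_artefact_ic(X):
    """Return the centred recording with the most 'spiky' IC removed (assumed artefact rule)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    unmixing = fast_ica(X)
    sources = unmixing @ Xc                      # (1) decompose into ICs
    bad = int(np.argmax(excess_kurtosis(sources)))
    sources[bad] = 0.0                           # (2) discard the artefact IC
    return np.linalg.inv(unmixing) @ sources     # (3) recombine the remaining ICs

# Synthetic demo: two "brain" sources plus a blink-like transient, mixed into 3 channels.
fs = 250.0
t = np.arange(2000) / fs
brain = np.vstack([np.sin(2 * np.pi * 10 * t), np.sign(np.sin(2 * np.pi * 3 * t))])
blink = 8.0 * sum(np.exp(-((t - c) ** 2) / 0.005) for c in (1.0, 3.5, 6.2))
mixing = np.array([[1.0, 0.4, 0.9], [0.5, 1.0, 0.6], [0.3, 0.6, 1.2]])
recorded = mixing @ np.vstack([brain, blink])
cleaned = remove_artefact_ic(recorded)
```

In practice, the choice of which components to discard is usually made by inspection or by a dedicated criterion; the kurtosis rule here is only a stand-in for that decision.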

Principal Component Analysis
Principal Component Analysis (PCA) converts a set of possibly correlated variables into a set of linearly uncorrelated variables using an orthogonal transformation. The linearly uncorrelated variables are called the principal components. The principal components are constructed in such a way that they maximize the variance and the i-th principal component is orthogonal to the (i-1)-th principal component. The principle behind using PCA as a denoising technique is that the principal components with relatively higher variance compared to the effect of the noise are relatively less noisy. Denoising techniques based on PCA have been presented in (Kang and Zhizeng, 2012; Turnip and Junaidi, 2014). However, the survey in (Urigüen and Garcia-Zapirain, 2015) reveals that recent works prefer ICA over PCA since artefacts are better modelled as independent components than as orthogonal components.
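The denoising principle described above can be illustrated with a short NumPy sketch. It assumes a hypothetical four-channel recording in which one shared source dominates the variance; the signal, channel gains and noise level are invented for the example, and the number of retained components (n_keep) is a free choice, not a value taken from the cited studies.

```python
import numpy as np

def pca_denoise(X, n_keep):
    """Project X (channels x samples) onto its n_keep highest-variance principal components."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Columns of U are the principal directions, ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_k = U[:, :n_keep]
    # Keep only the projection onto the retained directions, then restore the mean.
    return U_k @ (U_k.T @ Xc) + mean

# Synthetic demo: one 10 Hz source seen on four channels with different gains.
rng = np.random.default_rng(0)
t = np.arange(1000) / 250.0
source = np.sin(2 * np.pi * 10 * t)
gains = np.array([[1.0], [0.8], [0.6], [0.4]])
clean = gains * source
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
denoised = pca_denoise(noisy, n_keep=1)
```

Because the low-variance components that are dropped contain mostly noise in this setup, the reconstruction is closer to the clean signal than the noisy input is.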

Wavelet-based Analysis
A wavelet is a rapidly decaying oscillation with a zero-mean value. There are two types of wavelet transform: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). DWT has been frequently used for denoising signals. Denoising using DWT is a three-step process: (1) decompose, (2) discard and (3) reconstruct. Initially, the signal is filtered using a low-pass and a high-pass filter and the outputs are called approximation coefficients and detail coefficients, respectively. Signal decomposition using DWT is shown in Fig. 2. The high-frequency band (detail coefficients) contains most of the noise and useful information as well. The useful information needs to be preserved while removing the noise. A threshold value is chosen and the coefficients with magnitudes less than the threshold value are discarded. The signal is then reconstructed based on the new coefficients (inverse DWT). The low-pass subband is decomposed further at multiple levels for further analysis.
[Table 2: Pre-processing techniques used in the related studies. Rows: (Bosl et al., 2018); ASD classification using EEG and eye movement (Thapaliya et al., 2018); Classifying ASD using MS-ROM/I-FAST algorithm (Grossi et al., 2017); ASD diagnosis using DWT, Shannon entropy and ANN (Djemal et al., 2017); Wavelet-based ASD classification (Cheong et al., 2015); ASD diagnosis utilizing brain connectivity (Jamal et al., 2014); Fuzzy synchronization likelihood methodology for ASD diagnosis (Ahmadlou et al., 2012a); ASD diagnosis based on improved visibility graph fractality (Ahmadlou et al., 2012b); EEG as a biomarker for distinguishing ASD children (Bosl et al., 2011); Classification of ASD using fractal dimensions (Ahmadlou et al., 2010); Frequency 3D mapping and interchannel stability of EEG as indicators towards ASD diagnosis (Abdulhay et al., 2017); Diagnosing ASD utilizing EEG spectral coherence (Duffy and Als, 2012); ASDGenus: channel optimised classification using EEG (Haputhanthri et al., 2019).]
Table 1 states the five frequency bands and noise separated using DWT, as the initial step of noise removal.
In (Kumar et al., 2008) and (Zhou and Gotman, 2004), techniques based on wavelet transformation to denoise EEG signals have been proposed. The Daubechies wavelet was used in (Bosl et al., 2018; Djemal et al., 2017) and the Coifman wavelet was used in (Ahmadlou et al., 2012a) to perform DWT. CWT was used in (Jamal et al., 2014). However, in these studies, wavelets were used for signal decomposition instead of noise removal.
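The decompose, discard and reconstruct steps of DWT denoising can be sketched in a few lines. The example below is an assumption-laden illustration: it uses a single-level Haar wavelet (the simplest member of the Daubechies family) instead of the higher-order Daubechies or Coifman wavelets used in the cited studies, hard thresholding, a hand-picked threshold and a synthetic signal of even length.

```python
import math
import random

SCALE = 1.0 / math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient sets."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out

def dwt_denoise(signal, threshold):
    approx, detail = haar_dwt(signal)                               # (1) decompose
    detail = [d if abs(d) >= threshold else 0.0 for d in detail]    # (2) discard small details
    return haar_idwt(approx, detail)                                # (3) reconstruct

# Synthetic demo: a slow oscillation with additive Gaussian noise (made-up parameters).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * i / 128) for i in range(256)]
noisy = [c + rng.gauss(0.0, 0.1) for c in clean]
denoised = dwt_denoise(noisy, threshold=0.3)
```

A multi-level decomposition would apply haar_dwt again to the approximation coefficients, which is how the frequency sub-bands mentioned in the text are obtained.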

Visual Inspection
Manual noise removal using visual inspection is an easy and reliable approach. However, it is hard to perform when the dataset contains long-duration signals from many subjects. Visual inspection was used in (Thapaliya et al., 2018) as a pre-processing step in classifying ASD.
Table 2 summarizes different pre-processing techniques used to process the EEG signal. Even though the last two studies are not related to classifying ASD using machine learning algorithms, they have been included to introduce new techniques for noise removal, as noise filtering is independent of the application. The "X" symbol indicates techniques used for noise filtering and the "O" symbol indicates other pre-processing techniques used for data transformations. Even though DWT can be used for removing noise, the studies have used it primarily to decompose the signal into different frequency bands. In most of the studies, frequencies outside the range of the frequency bands were filtered out using band-pass filters. Band-pass filters are simple and easy to implement. I-FAST and Makoto's pre-processing pipeline combine several techniques for EEG signal pre-processing. Apart from the techniques discussed earlier, adaptive filtering, Fourier transform, the source component technique, multivariate regression and empirical mode decomposition have also been used for artefact removal. The source component technique is a combination of two approaches for artefact removal, based on brain electric source analysis and principal component analysis, proposed in (Lins et al., 1993; Berg and Scherg, 1991).
The studies in (Khatwani and Tiwari, 2013; Urigüen and Garcia-Zapirain, 2015; Lai et al., 2018) have presented surveys of denoising techniques. Khatwani and Tiwari (2013) have discussed denoising techniques based on PCA, ICA, wavelet and wavelet packet in their work. The effectiveness of these techniques was measured based on mean squared error (MSE), signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR). High SNR and PSNR values and low MSE values are indicators of less noisy signals. They conclude that the wavelet-based method produces better results based on the MSE, SNR and PSNR values calculated in different studies. Besides, the work in (Lai et al., 2018) has presented ICA and wavelet-based analysis that uses statistical analysis methods and additional artefact removal techniques.
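As an illustration of how simple a band-pass step can be, the sketch below zeroes all FFT bins outside a pass band. It is an idealized "brick-wall" filter written with NumPy, not the FIR or IIR designs that toolboxes typically provide, and the 0.5 to 45 Hz pass band, the 250 Hz sampling rate and the test signal are assumptions chosen for the example.

```python
import numpy as np

def fft_band_pass(signal, fs, low_hz, high_hz):
    """Zero every frequency component outside [low_hz, high_hz] and transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic demo: a 10 Hz rhythm contaminated by a DC offset and 60 Hz interference.
fs = 250.0
t = np.arange(500) / fs
rhythm = np.sin(2 * np.pi * 10 * t)
contaminated = rhythm + 0.5 + np.sin(2 * np.pi * 60 * t)
filtered = fft_band_pass(contaminated, fs, low_hz=0.5, high_hz=45.0)
```

The sharp cut-off used here can introduce ringing on real recordings, which is one reason practical pipelines prefer filters with a smooth transition band.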
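The three quality measures mentioned above can be computed as in the following sketch. The definitions are one common convention and are stated here as assumptions: SNR is taken as the ratio of clean-signal energy to residual energy in decibels, and PSNR uses the largest absolute value of the clean signal as the peak. The surveyed studies may use slightly different normalizations.

```python
import math

def mse(clean, denoised):
    """Mean squared error between the reference and the denoised signal."""
    return sum((c - d) ** 2 for c, d in zip(clean, denoised)) / len(clean)

def snr_db(clean, denoised):
    """Signal-to-noise ratio in dB: clean-signal energy over residual energy."""
    signal_energy = sum(c * c for c in clean)
    noise_energy = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    return 10.0 * math.log10(signal_energy / noise_energy)

def psnr_db(clean, denoised):
    """Peak signal-to-noise ratio in dB, with the peak taken from the clean signal."""
    peak = max(abs(c) for c in clean)
    return 10.0 * math.log10(peak ** 2 / mse(clean, denoised))

# Toy example: every sample of the denoised signal is off by 0.1.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -1.1, 1.1, -0.9]
scores = (mse(reference, estimate), snr_db(reference, estimate), psnr_db(reference, estimate))
```

For this toy input the MSE is about 0.01 and both SNR and PSNR are about 20 dB, consistent with the rule that lower MSE and higher SNR and PSNR indicate a cleaner signal.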
Urigüen and Garcia-Zapirain (2015) presented a detailed survey of denoising techniques in their work. Their study explores the noise removal techniques under the following major categories: linear regression methods, EOG correction methods, filtering methods, blind source separation (BSS) methods, source decomposition methods, the combination of different algorithms and other methods. ICA and PCA were categorized under BSS methods with several other techniques. Wavelets were categorized under source decomposition methods. Methods suitable for removing specific artefact types such as ocular artefacts, muscle artefacts, cardiac artefacts and mixed artefacts were also discussed. Their study concludes that the best technique for a given scenario should be chosen considering the type of EEG signal, artefacts that are present and the signal to contaminant ratio. There is no best technique which can be applied to all scenarios.

EEG Pre-Processing Tools
Several tools with user-friendly graphical user interface (GUI) have been developed to facilitate the analysis of EEG recordings. This section summarises some of the widely used tools.

EEGLAB
EEGLAB was initially developed as a MATLAB toolbox with a GUI to process EEG data (Delorme and Makeig, 2004). New tools and plugins for EEGLAB have been continuously developed over time, making it a versatile pre-processing tool. In (Delorme et al., 2011), the authors have summarized several pre-processing tools which can be integrated with EEGLAB. Some of the tools are EEGLAB STUDY Design, SIFT (source information flow toolbox), NFT (neuroelectromagnetic forward head modelling toolbox), BCILAB (brain-computer interface LAB) and ERICA (experimental real-time interactive control and analysis). These tools are freely available with a GUI/CLI (Command Line Interface) environment.
Recent versions of EEGLAB can process EEG, magnetoencephalography (MEG) and other electrophysiological data. Some of the useful features are a user-friendly GUI, the privilege for experienced MATLAB users to interact using MATLAB scripts, ability to handle multiple data formats, effective data visualization, ICA functionality, time/frequency transforms, continuous upgrades with new tools and plugins and availability of ample tutorials.

Brainstorm
Brainstorm is an open-source application for MEG/EEG analysis (Tadel et al., 2011). This application is intended to provide user-friendly tools to the scientific community. Hence, Brainstorm provides a rich and intuitive GUI. It is written using MATLAB scripts and Java, which makes it a portable, cross-platform software (a stand-alone version for users who do not own a MATLAB license is also available). End users without any programming knowledge can use the software easily as well. Besides, advanced users can interact using MATLAB scripts, as in EEGLAB. It is well documented, with ample support online. Apart from the inbuilt pre-processing pipeline, other tools such as EEGLAB can be used for pre-processing and the results can be imported. Brainstorm supports different file formats including Neuroscan (cnt, eeg, avg), Brainvision BrainAmp, EGI (raw), EEGLAB, Cartool and generic ASCII text files.

Overview of EEG Feature Extraction
After pre-processing the EEG signal, the next step is to extract features to train the learning model. Noise filtering techniques for EEG are generally independent of the application. We can use the same noise filtering techniques regardless of the considered disorder type. However, feature extraction techniques are often application specific. Depending on the features that we need to extract, the feature extraction techniques vary. In general practice, features which have a strong correlation with the target class are selected. If the root cause of ASD were known, features could be easily selected using the available background knowledge. Since the etiology of ASD is yet to be discovered, feature extraction is a trial-and-error process. Even though the etiology is unknown, several studies have focused on identifying abnormalities in the EEG signals of autistic individuals. Such abnormalities can be used as features in the classification task.

EEG-based Features for ASD Classification
Power, Hemispheric Asymmetry and Coherence
Wang et al. (2013) have reviewed abnormal power, abnormal hemispheric asymmetry and abnormal coherence in resting-state EEG. EEG power is further categorized into relative and absolute power. Relative power measures the activity in one band compared to other bands, while absolute power measures the activity in one band independent of the others. Their work has summarized the variations in absolute and relative powers of different frequency bands (delta, theta, alpha, beta and gamma) in different brain regions. They have identified a U-shaped profile where high-frequency bands (beta, gamma) and low-frequency bands (delta, theta) display excessive power while the middle-range frequency band (alpha) displays reduced power, as shown in Fig. 3.
Enhanced power in the delta and theta bands has been found in both relative and absolute powers in multiple regions. Similarly, the alpha band shows reduced power in both relative and absolute powers. However, excess power is seen in relative beta and absolute gamma only. Their work also highlights that, according to most of the existing literature, the left hemisphere exhibits greater power than the right hemisphere in ASD patients. Separate studies report the dominance of the left hemisphere over the right hemisphere in the delta, alpha and beta powers. Finally, the presence of weaker long-range coherence patterns has also been pointed out.
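The distinction between absolute and relative band power can be made concrete with a small NumPy sketch based on a plain periodogram. The band edges below follow one common convention and are an assumption, since the exact limits vary between studies; the power values are in arbitrary units and the two-tone test signal is synthetic.

```python
import numpy as np

# Assumed band limits in Hz (conventions differ between studies).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}

def band_powers(signal, fs):
    """Return (absolute, relative) power per band from a simple periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    # Absolute power: energy inside one band, independent of the other bands.
    absolute = {name: float(psd[(freqs >= lo) & (freqs < hi)].sum())
                for name, (lo, hi) in BANDS.items()}
    # Relative power: share of that band in the total power over all bands.
    total = sum(absolute.values())
    relative = {name: power / total for name, power in absolute.items()}
    return absolute, relative

# Synthetic demo: a theta component (6 Hz) with twice the amplitude of an alpha component (10 Hz).
fs = 250.0
t = np.arange(500) / fs
signal = 2.0 * np.sin(2 * np.pi * 6 * t) + 1.0 * np.sin(2 * np.pi * 10 * t)
absolute, relative = band_powers(signal, fs)
```

Since power scales with the square of the amplitude, the theta band holds about 80% of the total power and the alpha band about 20% in this example.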

Statistical Features
Standard deviation and mean are the commonly used statistical features. Statistical features were used in (Bosl et al., 2018;Cheong et al., 2015;Djemal et al., 2017;Thapaliya et al., 2018) to classify ASD.

Entropy
Entropy is one of the frequently used features in ASD classification. Entropy is a measure of the uncertainty of a random variable. If X is a discrete random variable, its entropy is calculated according to Equation 1:
$$H(X) = -\sum_{x} p(x) \log p(x) \qquad (1)$$
where p(x) is the probability mass function of X. There are many entropy-based methods such as sample entropy, Shannon entropy, multiscale entropy and modified multiscale entropy. Entropy has been used in (Bosl et al., 2018; Djemal et al., 2017; Thapaliya et al., 2018) for the diagnosis of ASD. Several EEG-based features for ASD classification including EEG rhythm, absolute and relative power, coherence, mu wave suppression, cordance and multiscale entropy have been discussed in (Hashemian and Pourghassem, 2014).
[Table 3: Feature extraction techniques used in the related studies. Rows: (Bosl et al., 2018); ASD classification using EEG and eye movement (Thapaliya et al., 2018); Classifying ASD using MS-ROM/I-FAST algorithm (Grossi et al., 2017); ASD diagnosis using DWT, Shannon entropy and ANN (Djemal et al., 2017); Wavelet-based ASD classification (Cheong et al., 2015); ASD diagnosis utilizing brain connectivity (Jamal et al., 2014); Fuzzy synchronization likelihood methodology for ASD diagnosis (Ahmadlou et al., 2012a); ASD diagnosis based on improved visibility graph fractality (Ahmadlou et al., 2012b); EEG as a biomarker for distinguishing ASD children (Bosl et al., 2011); Classification of ASD using fractal dimensions (Ahmadlou et al., 2010); ASDGenus: channel optimized classification using EEG (Haputhanthri et al., 2019).]
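A minimal sketch of Shannon entropy as an EEG feature is given below. It assumes a base-2 logarithm (entropy in bits) and estimates p(x) by binning the signal amplitudes into a fixed number of equal-width bins; the bin count and helper names are illustrative choices, and the cited studies may discretize the signal differently.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """H(X) = -sum p(x) log2 p(x), with p(x) estimated from symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def signal_entropy(signal, n_bins=8):
    """Discretize amplitudes into n_bins equal-width bins, then compute Shannon entropy."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0          # avoid a zero bin width for constant signals
    bins = [min(int((x - lo) / width), n_bins - 1) for x in signal]
    return shannon_entropy(bins)

# A constant signal carries no uncertainty; a signal spread evenly over all bins is maximal.
flat_entropy = signal_entropy([0.5] * 16)
ramp_entropy = signal_entropy(list(range(8)), n_bins=8)
```

With eight equally populated bins the entropy reaches its maximum of 3 bits, while the constant signal yields 0 bits.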

Feature Extraction Techniques
Feature extraction techniques are used to compute the selected features. However, there are techniques which are applied while pre-processing the signal to facilitate feature extraction, such as ICA, PCA, DWT and adaptive filtering. For instance, instead of calculating the standard deviation of the original signal, DWT can be applied to decompose the signal at multiple levels. The standard deviation can then be calculated for the decomposed signals. Most of these algorithms split the original signal into multiple components and they can also be used for noise filtering. These techniques only pre-process the signal to facilitate feature extraction but do not extract any features (Lakshmi et al., 2014; Azlan and Low, 2014).
Table 3 summarizes different techniques used for feature extraction in the related studies. Statistical feature extraction and entropy-based techniques are more common than other techniques. Standard deviation and mean are the common statistical features that are extracted. Among several entropy-based techniques, Shannon entropy, multiscale entropy and modified multiscale entropy have been used in the related studies. One noteworthy aspect is that, unlike pre-processing techniques, feature extraction techniques are sparsely distributed. Because of the unknown etiology, studies intend to discover new features which have strong correlations with ASD classification. Almost all the studies use a unique set of features and, as a result, a different set of feature extraction techniques.

Feature Selection Techniques
After the feature extraction phase, many features will often be available. For example, suppose the EEG dataset contains data from 128 channels and, after decomposing the signal into five frequency bands, standard deviation, mean and entropy are calculated. At the end of the process, 1920 features (128 channels × 5 frequency bands × 3 features) would be generated. Training a model with 1920 features requires a large number of training samples. However, in many of the previous studies, fewer than 100 samples were available. In addition, irrelevant features will negatively impact the classification. One challenge after feature extraction is to select the best features which contribute to the classification process. Feature selection reduces overfitting, improves accuracy and reduces training time. Some of the commonly used feature selection techniques are correlation-based feature selection (CFS), analysis of variance (ANOVA), PCA and the training with input selection and testing (TWIST) algorithm.
Different feature selection techniques used in related studies are summarized in Table 3. Here, RQA denotes Recurrence Quantitative Analysis and DFA indicates Detrended Fluctuation Analysis. ANOVA has been used in several related works by the same author. The feature selection techniques used are also unique to different studies. However, no strong justification is given for these choices; often the choice is based on which technique produces the best results.
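The feature-count arithmetic and an ANOVA-style ranking can be sketched as follows. The F statistic below is the standard one-way ANOVA statistic for a single feature measured in two groups; the feature names and values are invented toy data, and ranking by F score is only one of several ways in which ANOVA could be applied for feature selection.

```python
def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for one feature measured in two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = (sum((x - mean_a) ** 2 for x in group_a)
                 + sum((x - mean_b) ** 2 for x in group_b))
    df_between, df_within = 1, n_a + n_b - 2
    return (ss_between / df_between) / (ss_within / df_within)

# Size of the feature vector in the example from the text.
n_channels, n_bands, n_statistics = 128, 5, 3
n_features = n_channels * n_bands * n_statistics

# Hypothetical feature values: (values in ASD group, values in control group).
toy_features = {
    "informative": ([1.0, 1.2, 0.8, 1.1], [2.0, 2.1, 1.9, 2.2]),
    "uninformative": ([1.0, 2.0, 1.5, 1.8], [1.1, 1.9, 1.6, 1.7]),
}
f_scores = {name: anova_f(a, b) for name, (a, b) in toy_features.items()}
ranked = sorted(f_scores, key=f_scores.get, reverse=True)
```

Features whose group means differ strongly relative to their within-group spread receive a high F score and are kept; the rest can be dropped to reduce the dimensionality.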
There are no best features, best feature extraction or feature selection techniques. Often, it is a trial and error approach. Besides, since the etiology of ASD is unknown, there is a high possibility for discovering new features with a strong correlation to ASD classification. The best approach is to try different combinations of feature sets and techniques and select the one which produces the best results.

Introduction to Classification
The selected features from the feature extraction phase are fed as input to the fourth phase, which is the final phase in diagnosing ASD. In this section, we have summarized different machine learning algorithms which have been used frequently in the context of ASD classification and different techniques to evaluate the correctness of the trained model.
For the classification task, the dataset is divided into two mutually exclusive sets, one for training the model and the other one to test the model. Any machine learning based classifier functions in the following manner. Initially, a classification model is built based on the training data. Then its correctness is measured by applying the model on the test set. If the obtained accuracy is not satisfactory, the model will be retrained and retested. It is impossible to universally define an algorithm as the best fit for a specific problem. Finding a suitable algorithm is an empirical task.
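The hold-out procedure described above can be sketched with the standard library alone. The 70/30 split, the fixed random seed and the helper names are assumptions made for illustration; the surveyed studies use a variety of splitting and validation schemes.

```python
import random

def train_test_split(samples, labels, test_fraction=0.3, seed=0):
    """Shuffle the indices once and cut them into two mutually exclusive sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(test_fraction * len(samples))))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Ten placeholder subject identifiers with alternating class labels.
subject_ids = list(range(10))
subject_labels = [i % 2 for i in subject_ids]
(train_x, train_y), (test_x, test_y) = train_test_split(subject_ids, subject_labels)
```

A model would be fitted on train_x and train_y only, and its accuracy reported on the held-out test_x and test_y.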
In this section, our intention is not to provide an in-depth understanding of the learning algorithms but to give an abstract idea of the algorithms, their pros and cons and their applications in the context of ASD.

Support Vector Machine
The idea of support vector machine was introduced in the 1990s by Boser, Guyon and Vapnik. The original SVM is a supervised, non-probabilistic, binary classifier. It can classify only linearly separable data. Using the idea of kernels, SVM can classify data which are not linearly separable by mapping them to a higher dimensional space (Burges, 1998). SVM classifies the data points by constructing a hyperplane that separates the data points of available target classes as shown in Fig. 4.
Some of the advantages of using SVM are the ability to handle high dimensionality ($>10^6$ features), efficient memory usage and versatility (due to the ability to apply new kernels). However, if the number of features is greater than the number of training samples, SVM is prone to overfitting and may produce poor results.
Besides, SVM does not offer a direct probabilistic interpretation. Yet, the distance from the hyperplane can be used as an indirect measure of the probability. SVM was used in (Bosl et al., 2018;Jamal et al., 2014;Thapaliya et al., 2018) to classify ASD.
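As a minimal sketch of the decision rule described above, the snippet below classifies a sample by the side of the hyperplane on which it falls and reports its signed distance from the boundary. The weight vector `w`, the bias `b` and the sample values are hypothetical and are not taken from any of the reviewed studies; in practice they would be learned from training data.

```python
import math

def signed_distance(x, w, b):
    """Signed distance of sample x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if signed_distance(x, w, b) >= 0 else -1

# Hypothetical parameters of an already-trained linear SVM (illustration only).
w = [3.0, 4.0]
b = -5.0

sample = [3.0, 4.0]
distance = signed_distance(sample, w, b)  # (9 + 16 - 5) / 5
label = classify(sample, w, b)
```

A larger absolute distance can be read as higher confidence in the predicted class, which is the indirect probability measure mentioned above.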

Logistic Regression
Logistic regression has been used in the field of statistics since the 19th century. In machine learning, logistic regression is a popular algorithm for binary classification problems, such as classifying ASD and no-ASD (Dreiseitl and Ohno-Machado, 2002). When the model is trained, values for the weights and bias are learned. The core function used is a sigmoid function, so the output value will be in the range of 0 to 1. By setting a threshold value $T_0$, output values above $T_0$ are assigned to one class and output values below $T_0$ are assigned to the other class. In this context, the two classes are ASD and no-ASD. Logistic regression is simple, easy to implement and does not require extreme computational power. Authors of (Thapaliya et al., 2018; Grossi et al., 2017) have used logistic regression to diagnose ASD.
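The prediction step described above can be sketched as follows. The weights, bias and threshold are hypothetical placeholders for values that would be learned or tuned on real data.

```python
import math

def sigmoid(z):
    """Map any real value to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Sigmoid of the weighted sum of the features plus the bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Assign ASD when the output exceeds the threshold, no-ASD otherwise."""
    probability = predict_probability(features, weights, bias)
    return "ASD" if probability > threshold else "no-ASD"

# Hypothetical learned parameters (illustration only).
weights = [1.0, -1.0]
bias = 0.0

label_a = predict_label([2.0, 0.5], weights, bias)
label_b = predict_label([0.5, 2.0], weights, bias)
```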

Naïve Bayes
The Naïve Bayes classifier is often used as a baseline against which other algorithms are compared. It is based on Bayes' theorem and is considered naïve because of its class conditional independence assumption (Rish, 2001). Even though the assumption does not hold in many real-world problems, it produces reasonable, satisfactory results. Unlike SVM, it can predict the probability of a given sample belonging to a specific target class. The Naïve Bayes classifier requires a relatively small amount of training data and it is scalable, simple, easy to implement and fast. Among the proposed ASD classification approaches, Naïve Bayes has been used in (Thapaliya et al., 2018; Grossi et al., 2017).
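To illustrate how the independence assumption leads to class probabilities, the following sketch implements a Gaussian variant of Naïve Bayes. Modelling each feature with a per-class normal distribution is an assumption made here for illustration, as are the function names and the small variance floor; the reviewed studies do not state which variant they used.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior and the per-feature mean and variance of each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + 1e-9
            for c, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def predict_proba(model, x):
    """Posterior probability of each class, treating features as independent."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        log_scores[label] = math.log(prior) + log_likelihood
    top = max(log_scores.values())
    scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

# Hypothetical two-feature training data (illustration only).
samples = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.6]]
labels = ["no-ASD", "no-ASD", "no-ASD", "ASD", "ASD", "ASD"]
model = fit_gaussian_nb(samples, labels)
posterior = predict_proba(model, [1.0, 2.0])
```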

Random Forest
Random forest is an ensemble algorithm which builds multiple models and combines the results of each model to generate the overall result (Liaw and Wiener, 2002). It creates a collection of decision trees from randomized subsets of the training data, and during classification the results from the individual decision trees are combined into a single result. Building several models increases the accuracy of the result by reducing the effect of noise and other biases. However, a large number of decision trees will slow down the algorithm. In (Bosl et al., 2018; Grossi et al., 2017) the random forest technique has been used to classify ASD.

K-Nearest Neighbour (KNN)
Classification algorithms can be divided into lazy learners and eager learners. Lazy learners simply store the training data and do not build any models. They wait until a sample is provided for the classification. Eager learners construct a classification model using the training data and use the model for classification. Lazy learners are relatively slow during prediction. KNN is a lazy learning algorithm. Given a data sample, it would find K number of nearest neighbours from the training set and target class of the given sample will be decided based on the most common target class of the neighbours (Peterson, 2009). Among the proposed machine learning based ASD diagnosis approaches, KNN was used in (Bosl et al., 2018;Grossi et al., 2017).
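A minimal sketch of the lazy-learning behaviour described above is given below: no model is built, and all the work happens when a sample has to be classified. The Euclidean distance, the value of K and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, sample, k=3):
    """Classify a sample by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature training data (illustration only).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["no-ASD", "no-ASD", "no-ASD", "ASD", "ASD", "ASD"]

prediction_a = knn_predict(train_x, train_y, [0.2, 0.1], k=3)
prediction_b = knn_predict(train_x, train_y, [5.5, 5.5], k=3)
```

Because every prediction scans the whole training set, the cost of classification grows with the amount of stored data, which is why lazy learners are relatively slow at prediction time.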

Neural Networks
A single node in a neural network (Haykin, 2009) imitates a neuron in the human nervous system. A neural network consists of an input layer, one or more hidden layers and an output layer. Each layer consists of one or more nodes. A model of a neural network is shown in Fig. 5. Each node is a computational unit which calculates the weighted sum of inputs from the previous layer.

Fig. 5: Model of a neural network (input layer, hidden layers, output layer)

In order to add non-linearity, activation functions are introduced into the nodes. The weighted sums are fed as parameters to the activation functions, and the activation function decides the output of a node. Some of the common activation functions are ReLU (Rectified Linear Unit), sigmoid and linear functions. Given a sufficient number of training samples, neural networks can model most complex relationships. However, they require a considerably large amount of training data for learning. The majority of the proposed approaches use a neural network, including (Thapaliya et al., 2018; Ahmadlou et al., 2012a; Cheong et al., 2015) and (Djemal et al., 2017).
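The computation performed by each node, a weighted sum followed by an activation function, can be sketched as a forward pass through a small network. The layer sizes, weights and biases below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases, activation):
    """Compute the outputs of one layer; weights[j] holds the incoming weights of node j."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs, 2 hidden nodes, 1 output node.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, 1.0]]
output_biases = [0.0]

x = [2.0, 1.0]
hidden = layer_forward(x, hidden_weights, hidden_biases, relu)
output = layer_forward(hidden, output_weights, output_biases, sigmoid)
```

Training adjusts the weights and biases so that the output matches the target class; that step is omitted here.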
Different algorithms used for classification in the related studies are summarized in Table 4. As the table illustrates, the neural network has been the most frequently used classifier, followed by SVM. Compared to other techniques, discriminant analysis, sequential minimal optimization and k-contractive map have seldom been used. However, we cannot define one algorithm as the best since it depends on several factors. ASD classification being a medical application, interpretability of the decision is important. Algorithms such as decision trees generate classification models with better interpretability.
Models generated by algorithms such as SVM and neural networks are black boxes which are difficult to interpret. However, they can model complex relationships, unlike simpler methods such as decision trees and Naïve Bayes. Further, if sufficient data is not available, neural networks will not produce satisfactory results since they require a large amount of data to train the model. Similarly, not all algorithms can handle noisy data. It is standard practice to start with simpler models and move on to more complex models only if the results are not satisfactory, in order to avoid overfitting. If many samples are available, neural networks are likely to produce more accurate results.

Evaluation Techniques
Evaluating the learning model is an essential step in any classification task. Choosing unsuitable evaluation techniques and evaluation procedures can lead to biased and misleading results. Two popular evaluation techniques are the holdout method and the cross-validation method.

Holdout Method
It is widely known as the training-testing approach. In the holdout method, the dataset is randomly partitioned into a training set and a test set which are mutually exclusive. The rule of thumb is to allocate two-thirds of the data for training and one-third for testing. Random subsampling is a variation of the holdout method in which several iterations of training-testing are carried out and the overall accuracy is obtained by combining the accuracy of each iteration.
One drawback of this approach is that when there is not enough data, the produced accuracy values are not reliable. Besides, if the same training set is used for several iterations, there is a high tendency for overfitting, where the model classifies the training set well but performs poorly when classifying new samples.
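The holdout method and its random subsampling variation can be sketched as below. The `evaluate` argument is a hypothetical callback that trains a classifier on the training set and returns its accuracy on the test set; averaging is used here as one simple way of combining the per-iteration accuracies.

```python
import random

def holdout_split(samples, train_fraction=2 / 3, rng=None):
    """Randomly partition samples into mutually exclusive training and test sets."""
    rng = rng or random.Random()
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def random_subsampling(samples, evaluate, iterations=10, train_fraction=2 / 3, seed=0):
    """Repeat the holdout split several times and average the accuracies."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(iterations):
        train, test = holdout_split(samples, train_fraction, rng)
        accuracies.append(evaluate(train, test))
    return sum(accuracies) / len(accuracies)

# Example: split 30 hypothetical sample identifiers two-thirds / one-third.
train, test = holdout_split(list(range(30)), rng=random.Random(1))
```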

Cross-Validation
Cross-validation is very useful when only a limited number of data samples are available. In k-fold cross-validation, the dataset is divided into k partitions of approximately equal size. In each iteration, one partition is used for testing and all others are used for training. The overall accuracy is the number of correctly classified samples from all the iterations divided by the total number of samples. 10-fold cross-validation and leave-one-out cross-validation (only one sample is used for testing in each iteration) are commonly used k-fold cross-validation approaches.

Table 4: Classification algorithms and evaluation techniques used in the related studies. Rows: (Thapaliya et al., 2018); Classifying ASD using MS-ROM/I-FAST algorithm (Grossi et al., 2017); ASD diagnosis using DWT, Shannon entropy and ANN (Djemal et al., 2017); Wavelet-based ASD classification (Cheong et al., 2015); ASD diagnosis utilizing brain connectivity (Jamal et al., 2014); Fuzzy synchronization likelihood methodology for ASD diagnosis (Ahmadlou et al., 2012a); ASD diagnosis based on improved visibility graph fractality (Ahmadlou et al., 2012b); EEG as a biomarker for distinguishing ASD children (Bosl et al., 2011); Classification of ASD using fractal dimensions (Ahmadlou et al., 2010); ASDGenus: channel optimised classification using EEG (Haputhanthri et al., 2019)

Evaluation techniques used in the related works are also summarized in Table 4. Most recent studies carried out after 2017 have used cross-validation, while the holdout method was popular among the initial studies. Since the number of samples in the dataset is limited in most of the studies, using cross-validation would produce more reliable results. Further, compared to the holdout method, a larger fraction of the dataset can be used for training.

ASD Classification Approaches
Thapaliya et al. (2018) aim to identify ASD using a combination of EEG and eye movement data. They have also compared different machine learning classifiers. EEG data were recorded from 128 channels at a sampling rate of 500 Hz while subjects were watching videos. Among the data collected from 52 participants, data of 34 participants were used in the study. Since the scope is limited to EEG, eye movement metrics are not discussed in detail. In the pre-processing stage, Makoto's pre-processing pipeline was used together with visual inspection. For feature extraction, mean, standard deviation and entropy values were used. Fig. 6 shows the workflow of the classification process using EEG data.
The results were obtained after running the tests 200 times, except for DNN due to its computational cost. Here, the ratio between the training and test set was 80:20. Based on the results of 10x2 cross-validation, 100% accuracy was achieved for the combined dataset using the Naïve Bayes and Logistic Regression classifiers. Using only the eye movement data, Logistic Regression and DNN achieved 100% accuracy.
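The k-fold procedure described in the Cross-Validation subsection can be sketched as follows. This is a generic illustration rather than the protocol of any particular study; `train_and_predict` is a hypothetical callback that trains a classifier on the training partitions and returns predicted labels for the test partition.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, k, train_and_predict, seed=0):
    """Overall accuracy: correct predictions over all folds / total samples."""
    correct = 0
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(samples)

folds = k_fold_indices(10, 3)
```

Setting `k` equal to the number of samples gives leave-one-out cross-validation.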
A data-driven approach is followed by Bosl et al. (2018) to classify ASD subjects, as shown in Fig. 7. Unlike most of the other studies, this study used EEG data collected from a large sample of 188 participants. It includes 89 Low-Risk Controls (LRC) (among which 3 were diagnosed with ASD) and 99 High Risk for Autism (HRA) participants (among which 32 were diagnosed with ASD). The participants were between 3 and 36 months of age and were scheduled for several visits in that period. During data collection, bubbles were blown to control the child's behaviour. EEG data were recorded from either 64 or 128 channels, but only the channels in the International 10-20 system were used for the analysis.
They have extracted features using Sample Entropy, Detrended Fluctuation Analysis (DFA) and Recurrence Quantification Analysis (RQA). For each channel, nine features were generated: sample entropy, detrended fluctuation analysis, entropy derived from the recurrence plot, max line length, mean line length, recurrence rate, determinism, laminarity and trapping time. The features of interest were filtered using feature ranking methods (Recursive Feature Selection).
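As an illustration of one of these features, the sketch below computes sample entropy for a single channel. The embedding dimension `m = 2`, the tolerance of 0.2 times the standard deviation and the use of "less than or equal" when matching templates are common conventions assumed here; they are not reported settings of the reviewed study.

```python
import math
import statistics

def sample_entropy(signal, m=2, r_factor=0.2):
    """Sample entropy: -ln(A / B), where B and A count pairs of matching
    templates of length m and m + 1 under the Chebyshev distance."""
    n = len(signal)
    r = r_factor * statistics.pstdev(signal)

    def count_matches(length):
        # The same n - m starting points are used for both template lengths.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for this signal length and tolerance
    return -math.log(a / b)

regular = [1, 2] * 10              # perfectly periodic signal
value = sample_entropy(regular)
```

A perfectly regular signal yields a value of zero, while less predictable signals yield larger values. The pairwise comparison makes this sketch quadratic in the signal length, so it is suited to short segments only.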

For the classification of ASD or no-ASD, only the data from ASD and LRC subjects were used for training, with a leave-one-out cross-validation scheme. The HRA subjects (test set) were classified using data from the ASD and LRC subjects as the training set. SVM was used for classification. The distance from the hyperplane, which is used as the decision boundary in SVM, was used to calculate a severity score in the range of 1-10. Classification using SVM achieved 100% accuracy in distinguishing ASD subjects from the LRC subjects. However, when classifying HRA subjects, the classifier's accuracy decreased, as it was challenging for SVM to classify HRA subjects who were placed close to the decision boundary. Another prominent feature of this study is that the calculated severity scores had a strong correlation with the actual severity scores.
Multi-Scale Ranked Organizing Map coupled with Implicit Function as Squashing Time algorithm (MS-ROM/I-FAST) is an Artificial Neural Network based system with the capability to extract valuable features from EEG. Notably, it does not require any preliminary pre-processing. The algorithm was able to distinguish Mild Cognitive Impairment and/or Alzheimer's Disease with an accuracy of 94%-98%. The work in (Grossi et al., 2017) measured its effectiveness in identifying autistic people. Their work involves 25 participants: 15 with ASD (13 males and 2 females; 7-14 years of age; mean 10.4) and 10 typically developing individuals (4 males and 6 females; 7-12 years of age; mean 9.2).
The collected data were resting state EEG obtained while the participants were opening and closing their eyes. Data were collected for 3 minutes at a sampling rate of 256Hz based on the International 10-20 system. The structure of I-FAST is demonstrated in Fig. 8. It consists of 3 phases: squashing phase, noise elimination phase and classification phase. In normal practice, noise filtering is followed by feature extraction.
However, the I-FAST algorithm transforms the EEG channels into feature vectors first using MSE and MS-ROM in the unsupervised squashing phase. Then in the noise elimination phase, irrelevant features are considered as noise and are filtered. The outputs of the MS-ROM are fed into the TWIST algorithm (Buscema et al., 2013) to select the best features.
Finally, with the help of machine learning algorithms, the classification phase classifies the data. A novel algorithm, MS-ROM, based on the Self Organizing Map (SOM) neural network is presented. It consists of three steps: sampling, projection and ranking. In the sampling phase, EEG signals are sampled many times at different scales and, using SOM, the generated subsamples are projected into a two-dimensional grid. In the ranking phase, the generated grids are ranked based on cell frequency. Seven learning algorithms were used for the classification process: sine net neural networks (Sn), logistic regression (LR), sequential minimal optimization (SMO), K-NN, K-Contractive Map (K-CM), Naïve Bayes and Random Forest. This approach produced 100% accuracy consistently with the training-testing protocol (11 ASD and 6 control subjects for training and the rest for testing). With the leave-one-out protocol, the best results were produced by Random Forest with an accuracy of 92.8%, followed by K-Contractive Map and K-NN with an accuracy of 87.3%.
A Computer Aided Diagnosis (CAD) system for ASD diagnosis using DWT, Shannon entropy and Artificial Neural Network (ANN) was proposed in (Djemal et al., 2017). EEG data were recorded from 19 subjects: 9 autistic subjects (six males and three females) between 10 and 16 years of age and 10 typically developing males between 9 and 16 years of age. Data were recorded in a relaxed state from 16 channels based on the International 10-20 acquisition system, sampled at 256 Hz and filtered using a band-pass filter. To remove ocular artefacts, ICA was applied to the channels located close to the eyes (Fp1, Fp2, F7 and F8). Next, the signals were filtered using an elliptic band-pass filter and segmented into 10 minutes long segments. For better feature extraction, the EEG signal was decomposed into approximation and detail coefficients using DWT. A four-level DWT decomposition with the Daubechies-four (db4) wavelet was used, and the four detail coefficients (D1, D2, D3 and D4) and the approximation coefficient (A4) were calculated. Then five statistical features (mean, standard deviation, variance, skewness and kurtosis) and four entropy features (log energy, threshold entropy, Renyi entropy and Shannon entropy) were extracted from all the DWT coefficients and the original EEG signal, as demonstrated in Fig. 9. A two-layer Artificial Neural Network (ANN) was used for classification. Using 10-fold cross-validation, accuracy, sensitivity and specificity were measured.
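The three measures mentioned above can be computed from the counts of correct and incorrect predictions, as in the following sketch. The label names and the example predictions are hypothetical; ASD is treated as the positive class.

```python
def confusion_counts(true_labels, predicted_labels, positive="ASD"):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def evaluation_metrics(true_labels, predicted_labels, positive="ASD"):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate).
    Assumes both classes are present in true_labels."""
    tp, fp, tn, fn = confusion_counts(true_labels, predicted_labels, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical predictions pooled over all cross-validation folds.
truth = ["ASD"] * 4 + ["control"] * 4
predicted = ["ASD", "ASD", "ASD", "control"] + ["control"] * 4
metrics = evaluation_metrics(truth, predicted)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can hide poor detection of the smaller class.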
The classification was carried out in several stages. In stage one, statistical features and entropy features were used separately as inputs to the ANN, keeping the segment length fixed. After identifying standard deviation and Shannon entropy as the best features, further optimizations were carried out in the next stages. Tests were carried out to find the optimum segment length and frequency band (wavelet coefficient). Results obtained using overlapping and non-overlapping segments were also analysed. The best segment length was found to be 50 sec. Similarly, detail coefficients D1, D2, D3 and D4 produced the best accuracy of 98.9%. The test results for overlapping and non-overlapping segments revealed that 60 sec long segments with half-segment overlapping produce the best accuracy of 99.7%. The results conclude that the best approach for the CAD system is to extract standard deviation and Shannon entropy from the detail coefficients using 60 sec long half-overlapping segments.
Cheong et al. (2015) have proposed a classification technique based on DWT. The EEG dataset used in this research was recorded during stimulation with three tastes (salty, sour and sweet). Data were recorded from 30 ASD subjects between 3 and 10 years of age based on the International 10-20 system at a sampling rate of 500 Hz. They were identified with 3 levels of autism: 5 subjects with mild autism, 11 subjects with moderate autism and 14 subjects with severe autism. Only the channels related to the sense of taste (C3, C4 and Cz) were selected for analysis. Fig. 10 shows the process.
Noise filtering was performed using the voltage threshold method, and a band-pass filter with a pass band of 0.4 Hz to 60 Hz was applied. In the feature extraction phase, DWT was applied using db4 as the mother wavelet. Standard deviations of the alpha frequency band (8 Hz to 16 Hz) of the three channels for the three different tastes were calculated and used as inputs to the classification phase. A two-layer ANN was used for classification. Through trial and error, a data division of 65% for training, 10% for testing and 15% for validation was found to produce the best results, with an accuracy of 92.3% and a mean squared error of 0.0362. One significant feature of this methodology is the usage of a validation set. Other related studies only used training and test sets. When we adjust the model continuously based on the results obtained by evaluating the model on the test set, we would most likely end up overfitting the model to the test set. By using a validation set, the model can be evaluated for overfitting to the test set.
The authors of (Jamal et al., 2014) analyzed the functional connectivity of the brain using phase synchronization to find a reliable biomarker for diagnosing ASD. Studies suggest that inactivation of brain circuitry associated with face processing might be the cause of the challenges faced by autistic children in understanding facial expressions. Hence, the connectivity of the brain was explored in order to find differences between ASD and typically developing children during face perception. Data were collected from 24 subjects, 12 children with ASD between 6 and 13 years of age (average = 10.2) and 12 typically developing children between 6 and 13 years of age (average = 9.7), while performing face perception tasks. Data were obtained from 128 channels at a sampling rate of 250 Hz and filtered within the range of 0.5 Hz to 50 Hz using a band-pass filter. Fig. 11 shows the methodology proposed in the study. Continuous Wavelet Transform (CWT) was applied and phase synchronized states (synchrostates) were obtained.
Since obtaining synchrostates is a long procedure, we have omitted the details. The brain connectivity graph was built with the EEG electrodes as the nodes and the synchronization values between them as the weights of the edges. Modularity, transitivity, characteristic path length, global efficiency, radius and diameter of the brain connectivity graph were selected as features for the classification task. These six features were calculated for each of the three facial stimuli (fear, happy and neutral) with the minimum and maximum occurring states. Thus 36 features were obtained in total. Fisher's discriminant ratio was used for feature ranking. Nine different subsets of the features were created and used for classification separately. Discriminant analysis and SVM with a polynomial kernel were used for classification. When using all the min and max state features for all three stimuli, and when using all the max features for all three stimuli, classification using SVM with a second-order kernel produced the best accuracy of 94.7% with sensitivity 85.7% and specificity 100%.
Ahmadlou et al. (2012a) have proposed an approach which uses Fuzzy Synchronization Likelihood (Fuzzy SL). This approach analyses the functional connectivity of the brain of typically developing and autistic children using Fuzzy SL and diagnoses ASD based on that. An abstract workflow of their approach is demonstrated in Fig. 12. EEG data were collected from 18 subjects, 9 autistic children between 7 and 13 years of age (average = 10.8) and 9 typically developing children between 7 and 13 years of age (average = 11.1), according to the International 10-20 system at a sampling rate of 256 Hz.
Applying a Butterworth filter, the EEG was filtered within the range of 1-60 Hz and, using the wavelet transform, the signal was divided into 5 frequency bands: gamma, beta, alpha, theta and delta. The electrode locations were categorized into 7 regions: prefrontal, frontal, right temporal, left [...] subjects for testing) 100 times and obtaining the average, an accuracy of 95.5% was obtained with a variance of 1.2%. Since the number of subjects involved in the study is low, using cross-validation would have increased the reliability of the results and allowed more data to be used for training. In addition to the classification, the authors also claim that the measured regional Fuzzy SLs can be used in neurofeedback treatment.
Another study by the same authors, which diagnoses ASD using improved visibility graph (VG) fractality, is presented in (Ahmadlou et al., 2012b). The power of scale-freeness of VG (PSVG) and improved PSVG were evaluated in their study for effectiveness in classifying ASD. Visibility graphs convert a fractal time series to a scale-free graph characterized by $P(k) = k^{-r}$, where P is the probability distribution of the node degrees, k is the degree of a node and r is the power of scale-freeness. A scale-free graph is a graph whose degree distribution follows a power law. PSVG is the value of the slope when $\log_2 P(k)$ is plotted against $\log_2 k$. The same data used in their previous study (Ahmadlou et al., 2010) were used for this study and the same methodology as in the previous study was followed until wavelet decomposition. The details of their previous study will be discussed later in this section. The classification methodology is presented in Fig. 13.
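To make the construction concrete, the sketch below builds a natural visibility graph from a short time series and fits the slope of the log-log degree distribution. The natural visibility criterion and the least-squares fit are assumptions made here for illustration; the improved visibility graph of the reviewed study is not reproduced.

```python
import math
from collections import Counter

def visibility_graph_degrees(series):
    """Degree of every node in the natural visibility graph of a time series.
    Samples a and b are linked if all samples between them lie strictly
    below the straight line joining them."""
    n = len(series)
    degrees = [0] * n
    for a in range(n - 1):
        for b in range(a + 1, n):
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                degrees[a] += 1
                degrees[b] += 1
    return degrees

def log_log_slope(degrees):
    """Least-squares slope of log2 P(k) against log2 k."""
    counts = Counter(k for k in degrees if k > 0)
    if len(counts) < 2:
        raise ValueError("at least two distinct degrees are needed for a fit")
    total = sum(counts.values())
    xs = [math.log2(k) for k in counts]
    ys = [math.log2(counts[k] / total) for k in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

degrees = visibility_graph_degrees([1, 3, 2, 4])
slope = log_log_slope(degrees)
```

For a graph that follows $P(k) = k^{-r}$, the fitted slope equals $-r$. A four-sample series is far too short for a meaningful fit and is used only to show the mechanics.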
PSVG and improved PSVG values were calculated for all 5 sub-bands. Using ANOVA, features with p-values less than 0.01 were selected as inputs to the Enhanced Probabilistic Neural Network (EPNN). PSVG computed for the beta band and improved PSVG computed for the beta and alpha bands were selected. About 80% of the data were selected for training and 20% were used for testing. The classification was performed 200 times. Classification based on improved PSVG achieved an average accuracy of 95.5% with 1.7% variance, while classification based on PSVG achieved an average accuracy of 84.2% with 1.8% variance.
The diagnosis approach proposed in (Bosl et al., 2011) is one of the initial attempts which utilized analysis of EEG data to produce a biomarker for children at high risk for ASD. The goal of their study was to demonstrate that mMSE (modified multiscale entropy) can be used as a biomarker to distinguish typically developing children from children at high risk for ASD. The children with an older sibling diagnosed with ASD were categorized as high risk for ASD. The workflow of the approach is presented in Fig. 14.
Their study included 79 participants, of whom 46 were at high risk for ASD and 33 were controls. Similar to the other studies, the control subjects were defined on the basis that they have a typically developing older sibling and no family history of neurodevelopmental disorders. The participants were between 6 and 24 months of age. From some participants, data were collected multiple times at different ages. Those data were considered as independent datasets; hence, even though there were only 79 participants, a total of 143 sets of data were included in the study. EEG data were collected using a 64-channel Sensor Net System while bubbles were being blown. Signals were band-pass filtered at 0.1 to 100.0 Hz and sampled at a rate of 250 Hz. Out of the 2 minutes long recordings, only 20 seconds long continuous segments were used for the analysis. As the first step for calculating the mMSE values, coarse-grained series from scales 1 to 20 were computed for each channel. Then the entropy values were calculated using modified sample entropy (mSE). The entropy values calculated using mSE are more robust to noise and consistent with short time series. Finally, mMSE is defined as the series of mSE values computed over the coarse-grained series from scales 1 to 20. SVM, K-NN and Naïve Bayes algorithms were used for classification. The models were evaluated using 10-fold cross-validation. Unlike the other studies, boys and girls have been classified separately and as a unified complete set as well. Moreover, classification was performed separately for different age groups, at 6, 9, 12, 18 and 24 months of age. For the dataset combining both boys and girls, K-NN achieved the maximum accuracy of 90% for the 9 and 18 months age groups. For the boys, SVM produced 100% accuracy for the 9 months age group and for the girls, SVM produced the maximum accuracy of 80% for the 6 months age group.
Ahmadlou et al. (2010) have proposed a methodology based on fractality and a wavelet-chaos-neural network for diagnosis of ASD, as illustrated in Fig. 15. They introduced the idea of using Fractal Dimensions (FDs) as features. FD is a non-integer dimension which shows the degree of complexity and self-similarity of a signal. Eyes-closed EEG data were collected from 17 subjects, 9 ASD children (6 to 13 years old) and 8 typically developing children (7 to 13 years old). The International 10-20 standard was used for electrode placement and data were recorded from 19 channels at a sampling rate of 256 Hz. This dataset was used by the authors in (Ahmadlou et al., 2012b) as well. Applying band-pass filters, signals were filtered within 0-60 Hz and, using wavelet decomposition, the gamma, beta, alpha, theta and delta bands were obtained. After pre-processing the signal, Higuchi's Fractal Dimension (HFD) and Katz's Fractal Dimension (KFD) algorithms were used for FD computation of the EEG signals. Statistically significant FDs with a p-value less than 0.01 were selected using ANOVA. Three features were obtained and were used for classification using a two-layer Radial Basis Function Neural Network (RBFNN). 82% of the data were used for training and 18% were used for testing. The classification was performed 100 times using random subsampling. RBFNN produced results with 90% average accuracy and 0.15% variance.

The considered ASD diagnostic approaches were selected from recent studies, published between 2010 and 2018, that have applied machine learning approaches for ASD classification. We have explored the details of the EEG datasets, the techniques used, the methodology followed, the significant aspects and the results of each of the related studies.
We have compared the pre-processing, feature extraction and classification techniques used by each of the studies in identifying ASD subjects. Thus, researchers and practitioners can use this survey to understand the useful and effective techniques.

Small Training Sets
The datasets of most of the related studies contain data from fewer than 36 subjects. Although it is challenging to acquire EEG data of autistic subjects, from a statistical point of view results obtained from such small samples will be biased and less reliable. Thus, there is a limitation in building solid relationships using the available small datasets.

Less Real-World Practice
Most of the proposed approaches have not been tested in real-world applications. Many unexpected issues may arise when deploying an automated system in clinical practice.

Unavailability of a Benchmark Dataset
Although several ASD classification models have been proposed, there is a lack of a standard measure to compare the models. If a globally accessible benchmark dataset with an adequate amount of data existed, the models could be applied to it and the best model selected.

Dataset-Specific Classification Models
Each of the proposed models was trained and tested on specific EEG data. For instance, the equipment and infrastructure used to record data, electrode locations, number of channels, sampling rate and activities done by the subjects during the data collection are specific to a given study. The models were not tested on multiple EEG datasets with different properties. Thus, it is difficult to assess the effectiveness of those models in classifying other EEG data with varying metadata.

Limited access to data
In general, it is challenging to acquire and access personal health records or medical data due to ethical issues, health care policies and regulations. Thus, access to real data is limited in ASD diagnosis research.

Difficulty in Classifying Mild Forms of ASD
The severity of ASD varies from person to person. Many studies have reported difficulties in diagnosing milder forms of ASD compared to severe cases. When the predicted results are close to the decision boundary separating ASD and no-ASD, it is challenging to conclude the results with an acceptable level of confidence.

Unknown Etiology
A clear understanding of the relationship between the connectivity of neurons in different regions of the brain and ASD is yet to be discovered. Thus, it is challenging to design a classification framework. Researchers are forced to follow the empirical/trial and error approaches to overcome this barrier. If the etiology is clear, better features can be extracted and optimal classification models can be built.

ASD being a Spectrum Disorder
Unlike most of the typical disorders, identifying ASD or no-ASD is not entirely sufficient because ASD represents a combination of neurodevelopmental conditions including high functioning autism, Asperger's syndrome, pervasive developmental disorder and Rett syndrome. The type and severity of symptoms vary from person to person. Hence, in addition to classifying ASD, the learning model should calculate the severity and if possible, the specific type of disorder.

Future Research Directions

Predicting Severity Scores
The majority of the studies were aimed at classifying ASD, but generating severity scores similar to ADOS was explored by only a few. Developing an approach which could predict the severity of ASD and, if possible, identify the specific type of ASD would facilitate more individualized treatment.

Building a Generic Decision Support System
Another possible research direction is designing a generic decision support system which supports EEG data with different characteristics (differences in devices used for data collection, data types, sampling rate) and with a simple, user-friendly GUI to facilitate non-technical users. It can easily be deployed for real-world testing and if successful, can be adopted for general use.

Real-World Deployment of the Models
It is important to deploy an ASD diagnosis system in real-world clinical practice. The system can be used in parallel with the manual diagnosis process in order to verify its reliability and correctness.

Optimization Techniques
After achieving the goal of real-world deployment of the models, different measures to optimize performance, resource utilization and accuracy can be explored.

Integrating Different Types of Data
Along with EEG, a model can be built by integrating different data sources, including eye movement, functional Magnetic Resonance Imaging (fMRI) and thermal imaging. Combining EEG and eye movement data has already been reported to be effective for classifying ASD. A model based on different data sources is expected to be more flexible, robust, reliable and accurate.

Study Importance for the Future Researchers
Researchers who are involved in EEG-based ASD classification can use this study to obtain a detailed understanding of how the proposed classification approaches have evolved over the past decade. Moreover, this study helps to identify the techniques and features that have already been used and their effectiveness. Further, for clinical practitioners who are interested in developing a decision support system for ASD and using it in clinical diagnosis, this study will be helpful in selecting the most suitable approach based on the expected accuracy, the available resources and the complexity of the methodology.

Conclusion
ASD is a lifelong neurodevelopmental condition that requires early intervention. This paper has explored the related studies on ASD diagnostic approaches, discussed the applicability of the techniques and identified the limitations of current clinical diagnostic practices. Studies reveal that the prevalence of ASD is increasing every year. By identifying the shortcomings in current ASD diagnostic criteria, we have emphasized the need for behaviour-independent diagnostic approaches to facilitate early intervention. Dividing the classification methodology into four phases, this paper has discussed EEG data collection, pre-processing, feature extraction and classification. We have summarized the different techniques along with their strengths and weaknesses.
It is not possible to single out one technique as the best for a given phase, because each technique has its own advantages and disadvantages. A suitable technique needs to be chosen based on the requirements of the approach. However, some techniques generally produce satisfactory, though not optimal, results. For instance, the noise filtering technique SOBI is widely used to remove noise from EEG signals. Similarly, given sufficient training data, a neural network can classify the subjects with reasonable accuracy.
Further, we have discussed the diagnostic approaches proposed after 2010, describing the workflow of each methodology and its significant aspects. Even though most of the related studies have achieved accuracies close to 100%, only a few studies have calculated severity scores similar to ADOS. Additionally, a combination of psychophysiological data such as EEG, fMRI, eye movement data and thermal images can be considered for diagnosing ASD. Finally, we have presented the identified limitations, challenges and future research directions of ASD classification. Thus, researchers and practitioners can use this survey to facilitate their work.

Funding Information
This research is funded by the Senate Research Committee Grant SRC/LT/2019/18, University of Moratuwa, Sri Lanka.