An Overview of Research Challenges for Classification of Cardiotocogram Data

Cardiotocography (CTG) is a simultaneous recording of Fetal Heart Rate (FHR) and Uterine Contractions (UC). It is one of the most common diagnostic techniques to evaluate maternal and fetal well-being during pregnancy and before delivery. By observing the cardiotocography trace patterns, doctors can understand the state of the fetus. There are several signal processing and computer programming based techniques for interpreting typical cardiotocography data. This study presents a model based CTG data classification system using a supervised Artificial Neural Network (ANN), which classifies the CTG data based on its training data. The performance of the neural network based classification model has been compared with the most commonly used unsupervised clustering methods, Fuzzy C-mean and k-mean clustering. The results show that the supervised machine learning based classification approach performed significantly better than the compared unsupervised clustering methods. The traditional clustering methods could identify the Normal CTG patterns, but they were incapable of finding Suspicious and Pathologic patterns. The ANN based classifier was capable of identifying Normal, Suspicious and Pathologic conditions from the nature of the CTG data with very good accuracy.


INTRODUCTION
Data Mining (DM) and the technology of Knowledge Discovery from Data (KDD) have brought many new developments, methods and technologies in the recent decade. Also, the improved integration of techniques and the application of data mining techniques have contributed to the handling of new kinds of data types and applications. However, the field of data mining and its application in the medical domain is still young, so the possibilities for its application are still vast.
A major challenge in the medical domain is the extraction of comprehensible knowledge from medical diagnosis data such as CTG data. In this information age, the use of machine learning tools in medical diagnosis is increasing gradually. The effectiveness of classification and recognition systems has improved a great deal, helping medical experts in diagnosing diseases.

Cardiotocography (CTG)
Cardiotocography (CTG) is a simultaneous recording of Fetal Heart Rate (FHR) and Uterine Contractions (UC). It is one of the most common diagnostic techniques to evaluate maternal and fetal well-being during pregnancy and before delivery. FHR patterns are observed manually by obstetricians during the process of CTG analysis. For the last three decades, great interest has been paid to the fetal heart rate baseline and its frequency analysis (Nidhal et al., 2010).

Science Publications
JCS

Fetal Heart Rate (FHR) monitoring remains widely used as a method for detecting changes in fetal oxygenation that can occur during labor (Costa et al., 2009). Yet deaths and long-term disablement from intrapartum hypoxia remain an important cause of suffering for parents and families, even in industrialized countries. Confidential inquiries have highlighted that as many as 50% of these deaths could have been avoided because they were caused by non-recognition of abnormal FHR patterns, poor communication between staff, or delay in taking appropriate action (Costa et al., 2009). Computation and other data mining techniques can be used to analyze and classify the CTG data, to avoid human mistakes and to assist doctors in taking a decision.

Clustering and Classification
Clustering is a technique that has been widely studied and applied to many real-life applications. Many efficient algorithms, including the well known and widely applied k-means algorithm, have been devised to solve the clustering problem efficiently. Traditionally, clustering algorithms deal with a set of objects whose positions are accurately known. The goal is to find a way to divide the objects into clusters (Klimesova and Ocelikova, 2009).
The classification process may be applied in different areas of research and practice, e.g., agriculture, the military, medicine and remote sensing of the Earth. The classical classification techniques use a statistical approach, which typically assumes a multidimensional normal distribution of the experimental data set. Data classification may be supervised or unsupervised (Kriegel et al., 2009).
The supervised classification method requires the presence of a training data set, typically defined by an expert (the teacher). Each class of objects is characterised by basic statistical parameters (mean values vector, covariance matrix), which are computed from the training set. These parameters guide the discrimination process. Bayesian classifiers are typical representatives (Bayes classifier, Fisher, Wald sequential).
Unsupervised classification is also known as classification without the teacher (Kriegel et al., 2009). This classification uses, in most cases, the methods of cluster analysis (Mary and Kumar, 2012). The device that performs the function of classification is called a classifier.
The classifier is a system with several inputs that receive signals carrying information about the objects (Deng et al., 2010). On its output, the system generates information about the membership of the objects in a particular class.

Problem Definition
Cardiotocography (CTG), consisting of Fetal Heart Rate (FHR) and Tocographic (TOCO) measurements, is used to evaluate fetal well-being during delivery. Since 1970, many researchers have employed different methods from the fields of signal processing and computer programming to help doctors interpret the CTG trace pattern. They have supported doctors with interpretations in order to reach a satisfactory level of reliability, so as to act as a decision support system in obstetrics. Up to now, none of them has been adopted worldwide for everyday practice (Geijnt, 1996). There is currently no consensus on the best methodology for baseline estimation in computer analysis of cardiotocography. More than 30 years after the introduction of antepartum cardiotocography into clinical practice, the predictive capacity of the method remains controversial. From the review articles published on this subject, it was found that its reported sensitivity varies between 2 and 100% and its specificity between 37 and 100%. So, in this work, we evaluate some of the statistical, machine learning and data mining techniques for the classification of CTG data.
Classification can be viewed as a supervised learning scenario. Here, a training data set of records is accompanied by class labels. New data can be classified based on the training set by generating descriptions of the classes. In addition to the training set, there is also a test data set, which is used to determine the effectiveness of a classification. In principle, the popular neural network can be trained to recognize the data directly. However, even a simple network can be complex and difficult to train. Further, if the dimension of the input data is high, then the training process will consume a lot of time and the accuracy of classification will also vary with the increase of dimension in the training data (Kao et al., 2010). Generally, the techniques used in neural network systems will depend on the application of the system.
As means of data collection have become more capable, the need for non-linear modeling techniques has become more and more apparent. Traditional statistical methods rely on an assumption of linearity. However, since most of the data collected are the results of human behavior, and humans rarely behave linearly, methods that assume linear separability are ultimately doomed to failure. Furthermore, data collection streams are broadening. The number of variables of concern to modelers has increased by at least an order of magnitude. Traditional methods simply were not designed to work with one hundred or more variables.
In answer to this, the last decade has seen the emergence of neural networks as a means of non-linear modeling. These devices resulted from the efforts of a number of cognitive scientists to mimic learning and memory in the human brain. The back-propagation neural network in particular has proven successful in creating useful models from large masses of complex data. The algorithm has been successfully applied in a variety of settings, including direct marketing, intelligence and process control. Because of its pattern recognition nature, it has proven robust with respect to missing data and other data irregularities.

The Medical Background of Cardiotocograph (CTG)
A CTG is a record of the Fetal Heart Rate (FHR), measured either from a transducer on the abdomen or from a probe on the fetal scalp (Ayres-de-Campos et al., 2005). In addition to the fetal heart rate, another transducer measures the uterine contractions over the fundus. The CTG trace generally shows two lines. The upper line is a record of the fetal heart rate in beats per minute. The lower line is a recording of uterine contractions from the TOCO.

Baseline Rate
The baseline rate should be between 110 and 150 beats per minute (bpm) and is indicated by the FHR when stable (with accelerations and decelerations absent). It should be taken over a period of 5-10 min. The rate may change over a period of time, but normally it remains fairly constant.

Baseline Variability
Baseline variability is the amount, in beats per minute (bpm), by which the baseline varies.

Bradycardia
A baseline rate between 100 and 110 bpm is a suspicious bradycardia. A rate below 100 bpm is pathological. A steep, sustained decrease in rate is indicative of fetal distress and, if the cause cannot be reversed, the fetus should be delivered.

Tachycardia
A suspicious tachycardia is defined as a baseline rate between 150 and 170 bpm, whereas a pathological pattern is above 170 bpm. Tachycardias can be indicative of fever or fetal infection and occasionally fetal distress (with other abnormalities). An epidural may also induce a tachycardia in the fetus.
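The baseline thresholds quoted in this section can be collected into a small lookup. The following is only an illustrative sketch: the function name `classify_baseline` is hypothetical, and the handling of the exact boundary values (110 and 150 bpm treated as normal, 100 and 170 bpm treated as suspicious) is an assumption, since the text does not say which side the boundaries fall on.

```python
def classify_baseline(bpm: float) -> str:
    """Label a stable baseline FHR (in bpm) using the thresholds quoted in the text.

    Assumption: boundary values 110 and 150 count as normal,
    and 100 and 170 count as suspicious.
    """
    if bpm < 100 or bpm > 170:
        # Pathological bradycardia (< 100) or tachycardia (> 170)
        return "pathological"
    if bpm < 110 or bpm > 150:
        # Suspicious bradycardia (100-110) or tachycardia (150-170)
        return "suspicious"
    return "normal"


for rate in (95, 105, 130, 160, 175):
    print(rate, classify_baseline(rate))
```

Such a rule only covers the baseline rate; variability, accelerations and decelerations would have to be assessed separately.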

Baseline Variations
The short term variations in the baseline should be between 10 and 15 bpm (except during intervals of fetal sleep, which should be no longer than 60 min). Prolonged reduced variability along with other abnormalities may be indicative of fetal distress.

Accelerations
An acceleration is defined as a transient increase in heart rate of greater than 15 bpm for at least 15 sec. Two accelerations in 20 min are considered a reactive trace (Fig. 1). Accelerations are a good sign, as they show fetal responsiveness and the integrity of the mechanisms controlling the heart.
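As a rough illustration of this definition, the sketch below counts accelerations in a uniformly sampled FHR series. It assumes that a baseline value is already known and that the trace is sampled at a fixed period; the function name `count_accelerations` and its parameters are hypothetical and are not taken from the study.

```python
def count_accelerations(fhr, baseline, sample_period_s=1.0,
                        rise_bpm=15.0, min_duration_s=15.0):
    """Count runs where the FHR stays more than `rise_bpm` above `baseline`
    for at least `min_duration_s` seconds.

    Assumption: `fhr` is a list of bpm values sampled every `sample_period_s`
    seconds, so a run of n samples is treated as lasting n * sample_period_s.
    """
    count = 0
    run = 0  # number of consecutive samples above the threshold
    for value in fhr:
        if value > baseline + rise_bpm:
            run += 1
        else:
            if run * sample_period_s >= min_duration_s:
                count += 1
            run = 0
    # Close a run that is still open at the end of the trace
    if run * sample_period_s >= min_duration_s:
        count += 1
    return count


# Synthetic trace: one 20 s rise, one 5 s rise (too short), one 15 s rise
trace = [130] * 10 + [150] * 20 + [130] * 10 + [150] * 5 + [130] * 10 + [148] * 15
print(count_accelerations(trace, baseline=130))
```

Under the criterion above, a count of at least two within a 20 min window would correspond to a reactive trace.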

Decelerations
These may be either normal or pathological. They are normally perfectly benign. Late decelerations persist after the contraction has finished and suggest fetal distress. Variable decelerations vary in timing and shape with respect to each other and may be indicative of hypoxia or cord compression. A deceleration is a transient fall in the baseline rate of 15 bpm or more lasting more than 15 sec, and is classified into the following types.

Type 1 (Early)
Synchronous with the uterine contraction: the nadir of the heart rate trace corresponds to the peak of the uterine contraction. Uniform, repetitive, periodic slowing of the FHR with onset early in the contraction and return to baseline at the end of the contraction is usually due to fetal head compression and therefore occurs in first and second stage labour with descent of the head. It may be due to head compression, cord compression or early hypoxia. Management (Mx): check fetal pH if the pattern deteriorates or persists.

Type 2 (Late)
The nadir of the heart rate trace occurs after the peak of the uterine contraction. Uniform, repetitive slowing of the FHR with onset from mid to end of the contraction, nadir more than 20 sec after the peak of the contraction and ending after the contraction. The greater the lag time, the more serious the significance. The worst picture is of shallow late decelerations, loss of baseline irregularity and tachycardia. Mx: a fetal pH measurement is mandatory.

Type 3 (Variable)
The deceleration is unrelated to uterine contractions. Variable, repetitive, periodic slowing of the FHR with rapid onset and recovery. Time relationships with contraction cycles are variable and they may occur in isolation. Sometimes they resemble other types of deceleration patterns in timing and shape. If they appear consistently, fetal hypoxia is likely. Mx: check fetal pH if the pattern persists after turning the patient on her side (or if other adverse features are present).
A prolonged deceleration is a fall below 100 bpm for 3 min or below 80 bpm for 2 min.
A normal CTG is a good sign, but a poor CTG does not always suggest fetal distress. A more definitive diagnosis may be made from fetal blood sampling, but if this is not possible or there is an acute situation (such as a prolonged bradycardia), intervention may be indicated.

Fuzzy C-Means Clustering
Fuzzy C-Means (FCM) is a data clustering technique wherein each data point belongs to a cluster to some degree that is specified by a membership grade. This technique was originally introduced by Bezdek (1981) as an improvement on earlier clustering methods.
It provides a method that shows how to group data points that populate some multidimensional space into a specific number of different clusters. The Fuzzy c-means algorithm starts with an initial guess for the cluster centers, which are intended to mark the mean location of each cluster (Bezdek, 1980). The initial guess for these cluster centers is most likely incorrect. Additionally, the Fuzzy c-means algorithm assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades for each data point, the algorithm moves the cluster centers to the right location within a data set. This iteration is based on minimizing an objective function that represents the distance from any given data point to a cluster center, weighted by that data point's membership grade.
The idea of FCM is to use the weights that minimize the total weighted mean-square error. FCM allows each feature vector to belong to every cluster with a fuzzy truth value (between 0 and 1), which is given by these weights. The algorithm assigns a feature vector to a cluster according to the maximum weight of the feature vector over all clusters.
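For orientation, the objective function and update rules of FCM as they are commonly stated in the literature are given below. The notation is an assumption and may differ from that used in the original study: x_i are the N feature vectors, c_j the C cluster centers, w_ij the membership weight of x_i in cluster j and m > 1 the fuzzifier.

```latex
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} w_{ij}^{\,m} \, \lVert x_i - c_j \rVert^2,
\qquad \sum_{j=1}^{C} w_{ij} = 1

w_{ij} = \frac{1}{\displaystyle \sum_{k=1}^{C}
         \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}},
\qquad
c_j = \frac{\sum_{i=1}^{N} w_{ij}^{\,m} \, x_i}{\sum_{i=1}^{N} w_{ij}^{\,m}}
```

Alternating the two update rules until the weights stop changing decreases J_m, and the final hard assignment of x_i is the cluster j with the largest w_ij.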

K-Mean Clustering Algorithm
One of the most popular heuristics for solving the k-means problem is based on a simple iterative scheme for finding a locally optimal solution. This algorithm is often called the k-means algorithm. There are a number of variants of this algorithm. The k-means algorithm is very popular for data clustering.
K-means is a partition based clustering algorithm. Its goal is to partition the data D into K parts, such that there is little similarity across groups but great similarity within a group. More specifically, K-means aims to minimize the mean square error of each point in a cluster with respect to its cluster centroid.
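Formula for the square error:

$$SE = \sum_{i=1}^{k} \sum_{j=1}^{|c_i|} d(p_j, M_{c_i})^2$$

Where: k = the number of clusters, |c_i| = the number of elements in cluster c_i, p_j = the j-th element of cluster c_i, M_{c_i} = the mean for cluster c_i and d(p, M) = the distance between a point p and a cluster mean M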

Steps of K-Means Algorithm
The k-means algorithm is explained in the following steps. The algorithm normally converges in a few iterations, but each iteration takes considerably longer if the number of data points and the dimension of each data point are high (Chen et al., 2012):
Step 1: Choose k random points as the cluster centroids.
Step 2: For every point p in the data, assign it to the closest centroid. That is, compute d(p, M_ci) for all clusters and assign p to the cluster c* for which d(p, M_c*) <= d(p, M_ci) for every cluster ci.
Step 3: Recompute the center point of each cluster based on all points assigned to that cluster.
Step 4: Repeat steps 2 and 3 until there is convergence.
(Note: Convergence can mean repeating for a fixed number of times, or until |SE_new - SE_old| <= ε, where ε is some small constant, the meaning being that we stop the clustering if the new squared error objective is sufficiently close to the old SE.)
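The four steps can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation used in the study: the function names, the fixed random seed, the use of squared Euclidean distance and the decision to keep the old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, epsilon=1e-9, max_iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # Step 1: k random points
    previous_se = float("inf")
    labels, se = [], previous_se
    for _ in range(max_iterations):
        # Step 2: assign every point to its closest centroid
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
                  for p in points]
        # Step 3: recompute each centroid from the points assigned to it
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:                          # assumption: keep old centroid if empty
                centroids[i] = mean_point(members)
        # Step 4: stop when the squared error no longer changes noticeably
        se = sum(squared_distance(p, centroids[label])
                 for p, label in zip(points, labels))
        if abs(previous_se - se) <= epsilon:
            break
        previous_se = se
    return centroids, labels, se


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, se = k_means(data, k=2)
print(labels, se)
```

Because the result depends on the random initial centroids, practical use usually repeats the procedure with several seeds and keeps the run with the lowest squared error.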

ANN Based Classification
Classification can be viewed as a supervised learning scenario. Here, a training data set of records is accompanied by class labels. New data can be classified based on the training set by generating descriptions of the classes. In addition to the training set, there is also a test data set, which is used to determine the effectiveness of a classification. In principle, the popular neural network can be trained to recognize the data directly. However, even a simple network can be complex and difficult to train. Further, if the dimension of the input data is high, then the training process will consume a lot of time and the accuracy of classification will also vary with the increase of dimension in the training data. Generally, the techniques used in neural network systems will depend on the application of the system.

Structuring the Network
The number of layers and the number of processing elements per layer are important decisions. These parameters of a feed-forward, back-propagation topology are also the most ethereal: they are the "art" of the network designer. There is no quantifiable, best answer to the layout of the network for any particular application. There are only general rules picked up over time and followed by most researchers and engineers applying this architecture to their problems.
Rule One: As the complexity in the relationship between the input data and the desired output increases, the number of processing elements in the hidden layer should also increase.
Rule Two: If the process being modeled is separable into multiple stages, then additional hidden layer(s) may be required. If the process is not separable into stages, then additional layers may simply enable memorization of the training set and not a true general solution effective with other data.
Rule Three: The amount of training data available sets an upper bound for the number of processing elements in the hidden layer(s). To calculate this upper bound, take the number of cases in the training data set and divide that number by the sum of the number of nodes in the input and output layers of the network. Then divide that result again by a scaling factor between five and ten. Larger scaling factors are used for relatively less noisy data. If too many artificial neurons are used, the training set will be memorized. If that happens, generalization of the data will not occur, making the network useless on new data sets.
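Rule Three is a simple piece of arithmetic, sketched below. The function name `hidden_unit_upper_bound` and the example figures (2000 training cases, 21 inputs, 3 outputs) are hypothetical and chosen only to illustrate the calculation; they are not values reported in this study.

```python
import math


def hidden_unit_upper_bound(n_training_cases, n_inputs, n_outputs,
                            scaling_factor=5.0):
    """Upper bound on hidden processing elements according to Rule Three:
    cases / (input nodes + output nodes) / scaling factor, rounded down."""
    if not 5.0 <= scaling_factor <= 10.0:
        raise ValueError("the rule uses a scaling factor between five and ten")
    bound = n_training_cases / (n_inputs + n_outputs) / scaling_factor
    return math.floor(bound)


# Hypothetical example: 2000 cases, 21 input nodes, 3 output nodes
print(hidden_unit_upper_bound(2000, 21, 3, scaling_factor=5.0))
print(hidden_unit_upper_bound(2000, 21, 3, scaling_factor=10.0))
```

The result is only a ceiling; Rules One and Two still guide where, below that ceiling, the hidden layer size should sit.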
A single-layer network of S logsig neurons having R inputs is shown in Fig. 2, in full detail on the left and with a layer diagram on the right.
Feed-forward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1. On the other hand, if the outputs of a network are to be constrained (such as between 0 and 1), then the output layer should use a sigmoid transfer function.
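The effect of the output transfer function can be seen in a minimal forward pass. The sketch below assumes one hidden layer of log-sigmoid neurons and hand-picked weights; all names and values are illustrative and do not describe the network trained in this study.

```python
import math


def logsig(x):
    """Log-sigmoid transfer function, with output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    """Linear (identity) transfer function."""
    return x


def layer(inputs, weights, biases, transfer):
    """Output of one layer: transfer(W . inputs + b) for each neuron."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b, output_transfer=linear):
    hidden = layer(inputs, hidden_w, hidden_b, logsig)   # sigmoid hidden layer
    return layer(hidden, output_w, output_b, output_transfer)


# Hand-picked illustrative weights: 2 inputs, 2 hidden neurons, 1 output
hidden_w, hidden_b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
output_w, output_b = [[4.0, 4.0]], [0.0]

print(forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b))          # linear output
print(forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b, logsig))  # constrained output
```

With the linear output the result can exceed 1, whereas the sigmoid output always stays between 0 and 1, which suits class-membership style targets.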

The ANN Based CTG Data Classification System
Fig. 3 shows the ANN based CTG data classification system.

The Metrics Used for the Evaluation
Precision, recall and F-Score are computed for every (class, cluster) pair. The Rand index, in contrast, is a metric that considers all the classes and clusters as a whole.

Rand Index
The Rand index or Rand measure is a commonly used measure of the similarity between two data clusterings.
Given a set of n objects S = {O1, ..., On} and two data clusterings of S which we want to compare, X = {x1, ..., xR} and Y = {y1, ..., yS}, where the partitions within X and within Y are disjoint and their union is equal to S, we can compute the following values:
• a is the number of pairs of elements in S that are in the same partition in X and in the same partition in Y
• b is the number of pairs of elements in S that are not in the same partition in X and not in the same partition in Y
• c is the number of pairs of elements in S that are in the same partition in X and not in the same partition in Y
• d is the number of pairs of elements in S that are not in the same partition in X but are in the same partition in Y
The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the two data clusterings are exactly the same.
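The four counts can be obtained by enumerating all pairs of objects, after which the Rand index follows as the share of agreeing pairs, (a + b) / (a + b + c + d). The sketch below assumes the two clusterings are given as label lists of equal length; the function name `rand_index` is illustrative.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Rand index of two clusterings given as per-object label lists."""
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1          # together in X and together in Y
        elif not same_x and not same_y:
            b += 1          # separated in X and separated in Y
        elif same_x:
            c += 1          # together in X, separated in Y
        else:
            d += 1          # separated in X, together in Y
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least two objects are required")
    return (a + b) / total


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, different label names
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Note that the index depends only on which objects are grouped together, not on the label names, so a clustering can be compared directly against the class labels.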

Precision
Precision is calculated as the fraction of correct objects among those that the algorithm believes belong to the relevant class. It can be loosely equated to accuracy and roughly answers the question: "How many of the points in this cluster belong there, i.e., are correctly classified?"
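A per-pair computation of precision, together with recall and F-Score, might look as follows. Recall and F-Score are not defined in this text, so the sketch assumes their usual definitions (recall as the fraction of the class that falls in the cluster, F-Score as the harmonic mean of precision and recall); the function name and the example labels are hypothetical.

```python
def precision_recall_f(true_labels, cluster_labels, target_class, target_cluster):
    """Precision, recall and F-Score for one (class, cluster) pair."""
    in_cluster = sum(1 for c in cluster_labels if c == target_cluster)
    in_class = sum(1 for t in true_labels if t == target_class)
    both = sum(1 for t, c in zip(true_labels, cluster_labels)
               if t == target_class and c == target_cluster)
    precision = both / in_cluster if in_cluster else 0.0
    recall = both / in_class if in_class else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical labels: N = Normal, S = Suspicious, P = Pathologic
truth = ["N", "N", "N", "S", "P"]
clusters = [0, 0, 1, 0, 1]
print(precision_recall_f(truth, clusters, target_class="N", target_cluster=0))
```

Repeating the call for every (class, cluster) pair gives the full table of per-pair scores.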

DISCUSSION
The results in Tables 1-4 show that supervised machine learning based methods can be used for the classification of CTG data. We observed some training glitches in the case of suspicious records, which caused unexpectedly poor results when classifying the CTG data class "suspicious". The Fuzzy C-Mean algorithm even provided slightly better results for the "suspicious" category of CTG data. This should be noted when designing an improved algorithm for CTG data classification.

CONCLUSION
We have evaluated the performance of the three methods with respect to four different metrics. The performance of the neural network based classification model has been compared with the clustering methods Fuzzy C-mean and k-mean. According to the results, the supervised machine learning based classification approach performed significantly better than the compared unsupervised clustering methods. Even though the traditional clustering methods can identify the Normal CTG patterns, they were apparently incapable of predicting Suspicious and Pathologic patterns, so the traditional unsupervised methods provided very poor accuracy in predicting the different classes. It was found that the ANN based classifier was capable of identifying Normal, Suspicious and Pathologic conditions from the nature of the CTG data with very good accuracy. In terms of the Rand index, the ANN provided almost double the performance of the other two compared methods.
When we train the system with all the classes of samples, there is a chance that the trained system may be incapable of identifying suspicious records. That is why we obtain comparatively poor average performance when classifying suspicious records. This is a major weakness of the system and it should be overcome in a future design. One may address ways to improve the system so that it is properly trained on the different classes of CTG patterns. Future work may address hybrid models using statistical and machine learning techniques for improved classification accuracy.

Fig. 3. The ANN based CTG data classifier

Intuitively, one can think of a+b as the number of agreements between X and Y and c+d as the number of disagreements between X and Y. The Rand index, RI, then becomes:

$$RI = \frac{a + b}{a + b + c + d}$$

Fig. 4. The 3D projection of CTG data

Table 1. The average performance in terms of Rand Index

Table 2. The average performance of Fuzzy C-Mean clustering

Table 4. The average performance of ANN based classifier