A Descriptive Framework for Multidimensional Medical Data Mining and Representation

Problem statement: Association rule mining with fuzzy logic has been explored by researchers for effective data mining and classification. Approach: It was used to find all the rules in a transactional database that satisfy minimum support and minimum confidence constraints. Results: In this study, we propose a new rule mining technique using fuzzy logic for mining medical data, in order to understand and better serve the needs of multidimensional breast cancer data applications. Conclusion: The main objective of multidimensional medical data mining is to provide the end user with more useful and interesting patterns. The main contribution of this study is therefore the proposal and implementation of a fuzzy temporal association rule mining algorithm to classify and detect breast cancer from the dataset.


INTRODUCTION
A temporal association rule is a well-established data mining technique used to discover co-occurrences of items, mainly in temporal sequence data, where the data items in the database are usually recorded as binary data (present or not present). Many techniques in the literature aim to find association rules (with strong support and high confidence) in large datasets. For example, classical Association Rule Mining (ARM) (Mahafzah et al., 2009) deals with the relationships among the data items present in multidimensional databases.
Similarly, a few works focus on temporal data mining. Temporal rule mining is concerned with mining large sequential data sets. Sequential data mining (Vijayalakshmi and Mohan, 2010) deals with the mining of data that is ordered with respect to some index. The scope of temporal data mining (Alcalá-Fdez et al., 2009) extends beyond the standard forecast or control applications of time series analysis. Often, temporal data mining methods must be capable of analyzing data sets that are prohibitively large for conventional time series modeling techniques to handle efficiently.
Problem statement: Due to tremendous advances in biomedicine and bioinformatics, biological and clinical data is being mined at tremendous speed. Biological sequence data (Hu et al., 2009) stored in a data warehouse in the format of multidimensional temporal sequential data can thus be used for finding temporal patterns (Intan and Yenty, 2008). Moreover, due to the highly distributed, uncontrolled mining and use of a wide variety of biomedical data, the collection, analysis and semantic integration of such heterogeneous and widely distributed temporal sequence data has become an important task for the systematic and coordinated analysis of medical datasets (Khan et al., 2010). Biomedical data analysis and integration is very difficult owing to data complexity, distribution and volume. Most commercial data mining products provide a large number of modules and tools for performing various data mining tasks, but few provide intelligent assistance for the many important decisions that must be made during the mining process. Multi-objective association rule mining with minimum support and minimum confidence is suitable for data mining analysis (Qodmanan et al., 2011).
Traditional data analysis techniques cannot support huge and complex medical data sets. Newer techniques such as data mining can help in analyzing large and complex medical sequence data sets. Researchers may need data and knowledge discovered by other researchers, distributed in a multidimensional sequential data format. New systems are needed to manage, integrate and analyze large and complex medical data from data warehouses. Not only are the evaluation, estimation and analysis of data important, but providing intelligent assistance is equally important; most analysis products do not provide intelligent assistance in the decision-making process (Papageorgiou, 2011). Fuzzy association rules are suitable for multidimensional data analysis (Hong et al., 2009; Weng and Chen, 2010; Wu et al., 2010).
In this study, we propose a fuzzy temporal association rule mining algorithm (Reddy and Raju, 2009) for effective temporal data mining. These rules are further used for the classification of breast cancer data, and it has been found that accurate prediction of breast cancer is possible with the proposed algorithm.
A Bayesian network consists of a structural model and a set of conditional probabilities. The structural model is a directed graph in which nodes represent attributes and arcs represent attribute dependencies; the dependencies are quantified by conditional probabilities for each node given its parents. In naive Bayes, each attribute node has the class node as its only parent and no parents among the attribute nodes. Because the prior P(c) and the conditionals P(a_i | c) can be easily estimated from training examples, naive Bayes is easy to construct; it is the simplest form of Bayesian network (Khan et al., 2010). The conditional independence assumption in naive Bayes is rarely true in reality, which can harm its performance in applications with complex attribute dependencies; nevertheless, naive Bayes is a simple yet consistently well-performing probabilistic model. Data classification with naive Bayes (Khan et al., 2010) is the task of predicting the class of an instance from a set of attributes describing that instance, assuming that all attributes are conditionally independent given the class.
Predicting the class of an instance has also been done through utility-independent privacy-preserving data mining over vertically partitioned data (Poovammal and Ponnavaikko, 2009). It has been shown that the naïve Bayesian classifier is extremely effective in practice and difficult to improve upon.
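As a concrete illustration of the classification scheme described above, the following is a minimal naive Bayes sketch in Python; the function names and the add-one smoothing constant are our own illustrative choices, not part of the original system.

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Estimate the class prior P(c) and per-attribute counts from discrete data."""
    n = len(labels)
    class_counts = Counter(labels)
    cond = defaultdict(int)  # cond[(i, v, c)]: count of attribute i taking value v in class c
    for x, c in zip(examples, labels):
        for i, v in enumerate(x):
            cond[(i, v, c)] += 1
    priors = {c: class_counts[c] / n for c in class_counts}
    return priors, cond, class_counts

def classify(x, priors, cond, class_counts):
    """Return argmax_c P(c) * prod_i P(a_i | c), under conditional independence."""
    best_c, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            # add-one smoothing; the denominator assumes two values per attribute
            score *= (cond[(i, v, c)] + 1) / (class_counts[c] + 2)
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

The smoothing in `classify` is the same device discussed later for avoiding zero conditional probabilities.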
System architecture: The complete implementation architecture is given in Fig. 1, which includes the data collection, data analysis, data preprocessing and data classification modules, together with the association rule mining and knowledge-discovery output module and all internal components of the proposed work. The proposed ANOVA-T classification and Fuzzy-D discretization algorithms are also described in detail.

Data analysis:
Considering that an attribute X has a large number of values, the probability of a value P(X = x_i | C = c) from Eq. 2 can be infinitely small. Hence, probability density estimation is used, assuming that the values of X within class c are drawn from a normal (Gaussian) distribution:

P(X = x_i \mid C = c) = \frac{1}{\sqrt{2\pi}\,\sigma_c} \exp\!\left(-\frac{(x_i - \mu_c)^2}{2\sigma_c^2}\right)
where σ_c is the standard deviation and μ_c is the mean of the attribute values from the training set. The major problem with this approach is that if the attribute data does not follow a normal distribution, as is often the case with real-world data, the estimate can be unreliable. Other suggested methods include kernel density estimation, but since that approach incurs very high computational memory and time costs, it does not suit the simplicity of naive Bayes classification. When there are no instances for a given class label and attribute value, the conditional probability P(x|c) will be zero if frequency counts are used directly. To circumvent this problem, a typical approach is the Laplace m-estimate:

P(X = x_i \mid C = c) = \frac{n_{ci} + m\,P(X = x_i)}{n_c + m}

where n_ci is the number of instances satisfying both X = x_i and C = c, n_c is the number of instances satisfying C = c, m = 2 (a constant) and P(X = x_i) is estimated similarly to P(C = c) above.

Data discretization for preprocessing: Discretization is the process of transforming data containing a quantitative attribute so that the attribute in question is replaced by a qualitative one (Pedreschi et al., 2008). Data attributes are either numeric or categorical. While categorical attributes are discrete, numerical attributes are either discrete or continuous. Research shows that naive Bayes classification works best with discretized attributes, and discretization effectively approximates a continuous variable.
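The two estimates above translate directly into code. The following is a minimal sketch under the same assumptions (a normal within-class distribution and m = 2); the function names are illustrative.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density N(mu, sigma^2), used when attribute X is continuous."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def laplace_m_estimate(n_ci, n_c, p_xi, m=2):
    """Laplace m-estimate: (n_ci + m * P(X=x_i)) / (n_c + m).
    Returns a nonzero probability even when n_ci is 0."""
    return (n_ci + m * p_xi) / (n_c + m)
```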
The Minimum Description Length (MDL) discretization is an entropy-based heuristic given by Fayyad and Irani. The technique evaluates a candidate cut point between each successive pair of sorted values. For each candidate cut point, the data are discretized into two intervals and the class information entropy is calculated; the candidate providing the minimum entropy is chosen as the cut point. For a set of instances S, a feature A and a partition boundary T, the class information entropy of the partition induced by T is:

E(A, T; S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2)

For the given feature, the boundary T_min that minimizes the class information entropy over the possible partitions is selected as the binary discretization boundary. The method is then applied recursively to both partitions induced by T_min until the stopping criterion, the Minimum Description Length (MDL), is met. The MDL principle asserts that, to accept a partition T, the cost of encoding the partition and the classes of the instances in the intervals induced by T should be less than the cost of encoding the instances before the split. The partition is accepted only when:

\mathrm{Gain}(A, T; S) > \frac{\log_2(N - 1)}{N} + \frac{\Delta(A, T; S)}{N}

The term feature selection refers to algorithms that output a subset of the input feature set. One factor that affects classification algorithms is the quality of the data: if information is irrelevant or redundant, or the data is noisy and unreliable, knowledge discovery during training is more difficult. Regardless of whether a learner attempts to select features itself or ignores the issue, feature selection prior to learning can be beneficial. Reducing the dimensionality of the data reduces the size of the data set and allows it to be processed more effectively; in some cases classification accuracy can be improved. As a learning scheme, naive Bayes is simple, very robust with noisy data and easily implementable.
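The entropy-minimizing cut-point search described above can be sketched as follows; this is a single binary split (the recursive application and the MDL stopping test are omitted), and the function names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy Ent(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_binary_cut(values, labels):
    """Scan candidate cut points between successive distinct sorted values and
    return the cut minimizing the class information entropy E(A, T; S)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_e = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        e = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if e < best_e:
            best_cut, best_e = t, e
    return best_cut, best_e
```

On a perfectly separable feature, the chosen cut yields zero class information entropy.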
After the proposed statistical analysis with ANOVA-T classification, the attribute values shown in Fig. 2 still contain some irrelevant data to be removed.

Fuzzy-d discretization:
Fuzzy-D discretization is another proposed method. By using Fuzzy-D discretization we reduce the classification error, as shown in Fig. 2, and the estimate of p(a_i < X_i ≤ b_i | C = c) is obtained. Because of space limits, we present here only the version that, according to our experiments, best reduces the classification error. Fuzzy-D initially forms k equal-width intervals (a_i, b_i] (1 ≤ i ≤ k) using Equal Width Discretization (EWD). FD then estimates p(a_i < X_i ≤ b_i | C = c) from all training instances rather than only from instances whose value of X_i lies in (a_i, b_i]. The influence of a training instance with value v of X_i on (a_i, b_i] is assumed to be normal.

Pseudo code for fuzzy-d discretization:
• A training instance with value v of X_i influences the interval (a_i, b_i]. The influence is assumed to be normally distributed with mean equal to v and is proportional to:

P(v, \sigma, i) = \int_{a_i}^{b_i} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - v)^2}{2\sigma^2}}\, dx

where σ is a parameter of the algorithm, used to control the 'fuzziness' of the interval bounds.
• Equal Width Discretization (EWD) (Yang and Webb, 2002) divides the number line between v_min and v_max into k intervals of equal width. The intervals thus have width w = (v_max − v_min)/k and the cut points are at v_min + w, v_min + 2w, …, v_min + (k − 1)w. Here k is a user-predefined parameter, set to 10 in our experiments.
• Suppose there are n_c training instances with a known value for X_i and with class c, each with influence P(v_j, σ, i) on (a_i, b_i]. The Fuzzy-D probability estimate p(a_i < X_i ≤ b_i | C = c) is obtained by evaluating these influences over the n_c instances.

Fuzzy-T association rule mining: The proposed Fuzzy-T Association Rule Mining (FTARM) algorithm preserves privacy in data analysis.
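The two building blocks of the steps above (EWD cut points and the normal influence of an instance on an interval) can be sketched as follows; the function names are illustrative, and the integral is evaluated via the normal CDF using the error function.

```python
import math

def ewd_cut_points(v_min, v_max, k=10):
    """Equal Width Discretization: k intervals of width w = (v_max - v_min) / k,
    with interior cut points at v_min + w, v_min + 2w, ..., v_min + (k-1)w."""
    w = (v_max - v_min) / k
    return [v_min + j * w for j in range(1, k)]

def normal_influence(v, a, b, sigma):
    """Mass of a normal curve centred at v falling in (a, b]: the instance's
    influence P(v, sigma, i) on interval i = (a, b]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - v) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)
```

Larger σ spreads an instance's influence over more neighbouring intervals, i.e. fuzzier interval bounds.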
Our algorithm uses two phases in a partition approach to generate fuzzy association rules. The dataset is logically divided into p disjoint horizontal partitions P1, P2, …, Pp. Each partition is as large as can fit in available main memory. For ease of exposition, we assume that the partitions are equal-sized, though each partition could be of any arbitrary size.
We use the following notations:
• E = fuzzy dataset generated after pre-processing
• Set of partitions P = {P1, P2, …, Pp}
• td[it] = tid list of itemset it
• µ = fuzzy membership of any itemset
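Using the notation above, fuzzy support accumulation over horizontal partitions can be sketched as follows. This is our own minimal illustration, not the authors' two-phase algorithm: each partition is assumed to map a tid to its items' memberships, and an itemset's membership in a transaction is taken as the minimum membership of its items (a common t-norm choice).

```python
def fuzzy_support(partitions, itemset):
    """Accumulate the fuzzy support of `itemset` across horizontal partitions.
    Each partition maps tid -> {item: membership in [0, 1]}."""
    total, n = 0.0, 0
    for part in partitions:
        for tid, row in part.items():
            n += 1
            if all(item in row for item in itemset):
                # minimum membership of the items acts as the itemset's mu
                total += min(row[item] for item in itemset)
    return total / n if n else 0.0
```

An itemset is frequent when this value meets the minimum support threshold.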

RESULTS
In Table 1, the efficiency of the proposed ANOVA-T classification algorithm is compared with the existing ANOVA classification algorithm. In Table 2, the proposed Fuzzy-D discretization algorithm is analyzed by comparing its classification error rate with that of the existing algorithm. In Table 3, the performance of the proposed Fuzzy-T ARM algorithm is compared with that of the existing algorithm.
Figure 2 shows the performance analysis of the proposed method compared with the existing one. The performance of the proposed method is 5% higher than that of the existing method.
Figure 3 shows the performance analysis of the proposed FTARM method compared with the existing FARM method. The proposed method is faster than the existing method.
• Count[it] = cumulative µ of itemset it over all partitions in which it has been processed
• d = number of partitions (for any particular itemset it) that have been processed since the partition in which it was added
The byte-vector-like data structure represents the phase code. Each cell of the byte-vector stores the µ of the itemset for the tid corresponding to the cell index; thus, the i-th cell of the byte-vector contains the µ for the i-th tid. If a particular transaction does not contain the itemset under consideration, the corresponding cell is assigned a value of 0; when the byte-vector is initialized, each cell holds 0 by default.

Table 2: Fuzzy-D discretization - reduced classification error report

Bayesian networks are often used for classification problems, in which a learner attempts to construct a classifier from a given set of training examples with class labels. Assume that A1, A2, …, An are n attributes (corresponding to attribute nodes in a Bayesian network). An example E is represented by a vector <a1, a2, …, an>, where ai is the value of Ai. Let C represent the class variable (corresponding to the class node in a Bayesian network); we use c to denote the value that C takes and c(E) to denote the class of E. The classifier represented by a general Bayesian network is defined in Eq. 1:

c(E) = \arg\max_{c \in C} P(c)\,P(a_1, a_2, \ldots, a_n \mid c) \quad (1)

Assuming that all attributes are independent given the class (the conditional independence assumption), the resulting classifier is called naive Bayes (Khan et al., 2010).
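The byte-vector-like structure described earlier in this section can be sketched as follows; scaling a [0, 1] membership to a single byte is our own illustrative assumption, not a detail stated in the original design.

```python
def build_mu_vector(num_tids, memberships):
    """Byte-vector-like structure: cell i holds the (scaled) membership mu of
    the itemset in transaction i; 0 means the itemset is absent from that tid."""
    vec = bytearray(num_tids)  # every cell is initialized to 0 by default
    for tid, mu in memberships.items():
        vec[tid] = round(mu * 255)  # scale membership in [0, 1] to one byte
    return vec
```

Indexing by tid keeps lookups O(1) while storing one byte per transaction.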
Here N is the number of training instances, and c, c1 and c2 are the numbers of distinct classes present in S, S1 and S2, respectively. MDL-discretized datasets show good classification accuracy performance with naive Bayes.

Classification on ANOVA-T data selection: The proposed ANOVA-T statistical algorithm is used for classification. Feature selection is often an essential data preprocessing step prior to applying a classification algorithm such as the variance-based ANOVA-T.
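The variance-based selection underlying ANOVA-T can be illustrated with the standard one-way ANOVA F statistic, which ranks a feature by the ratio of between-class to within-class variance. This is a generic sketch of that statistic, not the authors' exact ANOVA-T procedure.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature, split into per-class groups:
    (between-group mean square) / (within-group mean square)."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Features whose class-conditional means are well separated relative to their spread score high and are retained for classification.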