An Interval Type-2 Fuzzy Association Rule Mining Approach to Pattern Discovery in Breast Cancer Dataset

In the literature, several methods used to analyze breast cancer datasets have failed to adequately handle the sharp boundary problem of quantitative attributes and to resolve the intra and inter uncertainties in breast cancer data analysis. In this study, an Interval Type-2 fuzzy association rule mining approach is proposed for pattern discovery in a breast cancer dataset. In the first part of the analysis, interval Type-2 fuzzification of the breast cancer dataset is carried out using the Hao and Mendel approach. In the second part, the FP-growth algorithm is adopted for associative pattern discovery from the fuzzified dataset produced in the first part. To define the intuitive words for breast cancer determinant factors and the expert data intervals, thirty (30) medical experts from specialized hospitals were consulted through a questionnaire polling method. The Jaccard similarity measure is used to establish the adequacy of the linguistic words defined by the experts. The analysis discovers association rules with a minimum number of symptoms at confidence values as high as 91%. It also identifies High Bare Nuclei and High Uniformity of Cell Shape as strong determinant factors for diagnosing breast cancer. The proposed approach performs better in terms of rules generated than traditional quantitative association rule mining: it eliminates redundant rules, reducing the number of generated rules by 39.5% and memory usage by 22.6%. The discovered rules are viable for building a comprehensive and compact expert-driven knowledge base for a breast cancer decision support or expert system.


Introduction
In today's information age, medical databases are growing rapidly, with a large number of quantitative attributes. Analyzing these data is crucial for enhancing medical decision making and management (Delgado et al., 2001). In data analysis research, medical case studies such as heart disease, diabetes and breast cancer are often considered because they combine imprecise causal knowledge, very large amounts of information and potentially life-threatening consequences of incorrect conclusions. Breast cancer is a dangerous disease that is inherently triggered by environmental factors that mutate genes encoding critical cell-regulatory proteins (Alison et al., 2001). Procedures for detecting cancer include self-examination, biopsy, ultrasound, magnetic resonance imaging and computed tomography. Breast cancer is a life-threatening disease; it is the leading form of cancer among women and has the second highest estimated death rate (Siegel et al., 2018). In 2018, about 266,120 women in the United States were estimated to be diagnosed with invasive breast cancer, and about 40,920 women were estimated to die from the disease (Siegel et al., 2018). Early detection of breast cancer has been confirmed to increase the survival rate and reduce the death rate (Ed-daoudy and Maalmi, 2020). According to Yeh et al. (2009), around 97% of women can survive for 5 years or more owing to earlier diagnosis and improved treatment. This was further supported by the American Cancer Society's 2018 report, which highlighted a decline of about 39% in the breast cancer death rate from 1989 to 2015 due to early detection. These statistics show that early detection of breast cancer is vital to reducing loss of life. However, early detection requires a reliable and accurate diagnostic procedure that can predict the risk even without a surgical biopsy.
Data mining is capable of extracting desirable, previously unknown and potentially useful patterns from existing medical datasets for determining disease risk. In the healthcare field, data mining has been widely used to discover patterns and relationships among attributes in medical datasets for the prognosis and diagnosis of diseases (Aalaei et al., 2016; Nilashi et al., 2017; Sakri et al., 2018). Breast cancer datasets are growing rapidly and contain a large number of quantitative attributes, mostly from medical laboratory tests (Delgado et al., 2001). Given this rapid growth, data mining approaches have been found appropriate for controlling the breast cancer menace (Ehtemam et al., 2017; Ed-daoudy and Maalmi, 2020). Association rule mining is therefore a valuable technique for discovering patterns for breast cancer prediction. Association Rule Mining (ARM) is used to discover hidden relationships and potential associations among the items or attributes of a large dataset (Chiu et al., 2012). ARM has been widely used for pattern discovery in the medical domain (Gupta et al., 2006; Serban and Moldovan, 2006; Umesh and Ramachandra, 2015) because it can manage large heterogeneous datasets and the generated rules are intuitive and easily understood by humans (Karabatak and Ince, 2009). It can also handle unreliable diagnostic tests and noise in training examples (Serban and Moldovan, 2006). However, because of the quantitative nature of breast cancer datasets and the limitations of quantitative association rule mining, it is essential to identify a technique that resolves the sharp boundary problem associated with quantitative value intervals (Kuok et al., 1998; Gyenesei, 2001; Oladipupo et al., 2010). Such a technique makes provision for intra and inter uncertainties among attribute values and so enhances the results of data mining analysis.
Fuzzy logic, now referred to as Type-1 fuzzy logic, was proposed by (Zadeh, 1975) to resolve subjectivity and intra uncertainty in decision making. Beyond this level of uncertainty, the uncertainty among different domain experts' opinions is another crucial issue that must be resolved to enhance medical decisions. This concern motivated the introduction of type-2 fuzzy sets in (Mendel, 2001), which make provision for both intra and inter uncertainties in analyzing quantitative datasets and defining fuzzy attribute value intervals. The inter uncertainty is attributed to diverse domain experts' opinions in defining fuzzy attribute value intervals, as "words mean different things to different people" (Mendel, 2007). Most recent studies on breast cancer data analysis focus on comparative analysis of predictive algorithms (Cesar et al., 2020; Mohammed et al., 2020; Vrigazova, 2020; Ak, 2020; Ahmed et al., 2020). Most of these studies did not address the uncertainties associated with quantitative clinical data before analyzing the dataset (Nguyen et al., 2015a). A few studies have used type-1 fuzzy parameters in breast cancer data mining to cater for intra uncertainty, among them (Nilashi et al., 2017; Jain and Abraham, 2004; Alharbi and Tchier, 2017). However, type-2 fuzzy sets, which can handle both intra and inter uncertainties, have yet to receive proper attention in association rule mining of breast cancer datasets. Therefore, this study introduces a hybrid of type-2 fuzzy sets and association rule mining for analyzing breast cancer datasets, with two distinct objectives: (i) construction of interval values for breast cancer determinant factors and fuzzification of the breast cancer dataset based on interval type-2 fuzzy parameters; and (ii) construction of a frequent pattern tree and extraction of useful patterns for breast cancer risk determination.
This analysis is expected to enhance the knowledge acquired from mining breast cancer data with domain experts' opinions, eliminate redundant rules, minimize the number of determinant symptoms and possibly identify strong determinant factors for diagnosing breast cancer. The patterns could be recommended for building a less subjective, compact and comprehensive decision support system for breast cancer risk determination. The remainder of this paper is organized as follows. Section 2 reviews the literature on interval type-2 fuzzy algorithms and related works. Section 3 discusses the methodology of the proposed approach and the dataset considered for the analysis. Section 4 presents the data analysis, results and discussion. Section 5 presents a comparative evaluation of the proposed interval type-2 fuzzy ARM and traditional quantitative ARM. Finally, Section 6 concludes the paper by highlighting the important findings of the proposed approach and outlining future work.

Literature Review
In this section, interval type-2 fuzzy set (IT2 FS) modeling approaches and related work are discussed.

Interval Type-2 Fuzzy Set (IT2 FS) Modeling Approaches
Capturing both intra and inter uncertainties, which limit the capability of Type-1 Fuzzy Sets (T1 FSs), requires a Computing With Words (CWW) methodology, because words mean different things to different people (Hao and Mendel, 2015). In the literature, there are few methods that define an IT2 FS model for a word from data collected about the word, especially from a set of domain experts. In Liu and Mendel (2008), an Interval Approach (IA) is proposed. This approach is divided into two sections: the data process section and the fuzzy process section. In the data process section, statistics are used to reduce the data intervals, while in the fuzzy process section, each data interval that was not removed by the statistical tests is mapped into a T1 FS. The Footprint Of Uncertainty (FOU) for the word is then obtained by bounding all of the T1 FSs from below and above. This approach is limited by FOUs that are too fat and wide, and the Lower Membership Functions (LMFs) of interior FOUs usually have very small heights.
To improve on the IA, the Enhanced Interval Approach (EIA) was proposed. The EIA has more steps and conditions for data interval reduction than the IA, which improves the effectiveness of the reduction process; it also incorporates improved procedures for computing the LMF. In comparison with the IA, the EIA was reported to converge in a mean-squared sense to a stable model as more data intervals are collected. In (Tahayori and Sadeghian, 2012), another method is proposed that calculates the median boundary range of the membership function associated with a word. Unlike the IA and EIA, this method does not directly map data interval uncertainties into Membership Function (MF) uncertainties, and it leads to normal IT2 FSs. Other methods have also been proposed (Pagola et al., 2013; Bilgin et al., 2012; Moharrer et al., 2013). These approaches introduce another level of uncertainty, called methodological uncertainty (Hao and Mendel, 2015). Because this uncertainty cannot be separated from linguistic uncertainty, any resulting FS model that uses such data does not represent a linguistic-uncertainty model (Hao and Mendel, 2015). Hao and Mendel (2015) proposed another approach, based on Computing With Words (CWW), to enhance the performance of the EIA. This approach (HMA) makes more use of the information in the data intervals, for example the overlap shared by all intervals. The HMA also has two stages: the first identifies a group of data intervals, and the second determines the common overlap of this group. An IT2 FS is then determined from the set of shorter data intervals that remain after the overlap is excluded. The HMA is unique among IT2 FS word models because the resulting IT2 FSs, both shoulder and interior FOUs, are normal.
This approach is therefore adopted in this study for modeling the breast cancer IT2 FSs.

Related Works
Nowadays, different methodologies, including data mining methods, have been extensively explored in medicine to extract useful patterns for diagnosis, prediction and decision making. This exploration includes breast cancer diagnosis and prediction for early detection of the disease (Ed-daoudy and Maalmi, 2020; Chaurasia et al., 2018). In He et al. (2012), association rule mining was used to explore patterns in Chinese medicine formulae for treating and preventing breast cancer. In Medjahed et al. (2013), K-nearest neighbour classifiers with different distance and classification rules were used for breast cancer diagnosis. A fuzzy analysis of breast cancer with fuzzy c-means and pattern recognition was carried out in (Muhic, 2013). In Nguyen et al. (2015b), breast cancer was classified using the fuzzy c-means clustering algorithm. Breast cancer classification using a deep belief network was carried out in (Abdel-Zaher and Eldeib, 2016). In Umesh and Ramachandra (2015), association rule mining was used to predict the recurrence of breast cancer on SEER breast cancer data. In Aalaei et al. (2016), feature selection using a genetic algorithm was proposed for breast cancer diagnosis and tested with an artificial neural network, a PS-classifier and a genetic algorithm classifier. In Vijaylakshmi and Priyadarshini (2016), the Particle Swarm Optimization technique was used to analyze a breast cancer dataset. A knowledge-based system for breast cancer prediction was built with the expectation maximization clustering algorithm and Classification and Regression Trees in (Nilashi et al., 2017). Ehtemam et al. (2017) used 64 data mining models in WEKA and MATLAB for the prognosis and early diagnosis of ductal and lobular breast cancer types. A hybrid computer-aided diagnosis system for predicting breast cancer recurrence using optimized ensemble learning was presented in (Mohebian et al., 2017). Sakri et al. (2018) implemented Naïve Bayes and K-nearest neighbour techniques with Particle Swarm Optimization feature selection to predict breast cancer recurrence. Many studies have carried out comparative analyses of different machine learning and data mining algorithms for predicting breast cancer (Cesar et al., 2020; Mohammed et al., 2020; Vrigazova, 2020; Ak, 2020; Ahmed et al., 2020).
Furthermore, researchers have identified fuzzy logic as a potent computational approach for handling uncertainty in clinical data and for providing intuitive results through linguistic rules (Nguyen et al., 2015a). Fuzzy logic is the preferred approach for enhancing the interpretability of discrete intervals in medical datasets (Delgado et al., 2001). It also provides a smooth transition from one fuzzy set to another, which handles the sharp boundary interval problem. The deficiency of quantitative association rule mining led to the introduction of Fuzzy Association Rule Mining (FARM) (Kuok et al., 1998; Gyenesei, 2001). The rules generated by FARM not only address the "sharp boundary problem" but also express associations more intuitively and understandably, and they minimize the number of rules generated (Oladipupo et al., 2010). In the literature, there have been instances of Type-1 fuzzy association rule mining in different domains (Chiu et al., 2012; Oladipupo et al., 2012; Ho et al., 2012; Arafah and Mukhlash, 2015; Kim et al., 2016). In these works, the fuzzy membership functions were modelled on the opinion of an individual over a repeated survey (intra-expert) (Mendel, 2007). Therefore, a type-1 fuzzy set applied to quantitative values can only handle intra uncertainty, which makes provision for only a minimal level of subjectivity. Also, a change in environmental and operating conditions can render a type-1 fuzzy set sub-optimal, and research has shown that interval type-2 fuzzy logic systems can outperform type-1 fuzzy logic systems. Hence, type-2 fuzzy sets are more appropriate for decision-making processes that must cater for both intra and inter uncertainties (Mendel, 2001). This is crucial in domains with life-threatening consequences, such as medicine, where a high level of accuracy is expected of decision support and expert systems.
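To make the FARM idea concrete, the sketch below computes fuzzy support and confidence over a few fuzzified records. It assumes the minimum t-norm for combining membership degrees (one common choice in the FARM literature; the product is another), and the item names and membership values are hypothetical, not taken from the breast cancer dataset.

```python
from typing import Dict, List, Sequence

# One fuzzified record: fuzzy item (attribute=term) -> membership degree in [0, 1].
Record = Dict[str, float]


def fuzzy_support(records: Sequence[Record], itemset: Sequence[str]) -> float:
    """Mean over records of the minimum membership of the items (min t-norm)."""
    total = sum(min(rec.get(item, 0.0) for item in itemset) for rec in records)
    return total / len(records)


def fuzzy_confidence(records: Sequence[Record], antecedent: Sequence[str],
                     consequent: Sequence[str]) -> float:
    """conf(X -> Y) = support(X together with Y) / support(X)."""
    sup_x = fuzzy_support(records, antecedent)
    if sup_x == 0.0:
        return 0.0
    return fuzzy_support(records, [*antecedent, *consequent]) / sup_x


# Hypothetical fuzzified records, for illustration only.
records: List[Record] = [
    {"BareNuclei=High": 0.9, "CellShape=High": 0.7, "Class=Malignant": 1.0},
    {"BareNuclei=High": 0.2, "CellShape=High": 0.1, "Class=Malignant": 0.0},
    {"BareNuclei=High": 0.6, "CellShape=High": 0.8, "Class=Malignant": 1.0},
]
print(round(fuzzy_confidence(records, ["BareNuclei=High"], ["Class=Malignant"]), 3))  # 0.882
```

Because a record contributes a partial degree rather than a 0/1 count, a value just across an interval boundary changes the support only slightly, which is how fuzzy items soften the sharp boundary problem.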
Type-2 fuzzy systems have been widely and successfully used to solve real-life problems in different fields (Nguyen et al., 2015a; Hosseini et al., 2010; Pimenta and Camargo, 2010; Sanz et al., 2010; Wu and Mendel, 2009; Chumklin et al., 2010; Abiyev, 2011; Castillo and Amador-Angulo, 2018; Castillo et al., 2016; Bobillo and Straccia, 2017; Rubio et al., 2017; Celik and Akvuz, 2018; Greenfield and Chiclana, 2018; Torres-Blanc et al., 2018). In Chen et al. (2015), fuzzy association rule mining with type-2 membership functions was investigated on a synthetic dataset and was reported to perform better in terms of the number of rules. In spite of these efforts, there is still a gap in the effective use of interval type-2 models with Frequent Pattern Growth (FP-Growth) to analyze breast cancer datasets.

Methodology
In this study, pattern discovery in the breast cancer dataset using Interval Type-2 fuzzy association rule mining is proposed. The proposed IT2 fuzzy association rule mining framework for breast cancer data analysis, shown in Fig. 1, consists of two sequential stages: data fuzzification and pattern discovery.

Data Description
In this study, the Wisconsin Breast Cancer dataset from the UCI machine learning repository was used because of its wide use in the literature. The dataset comprises 10 attributes and 699 instances used to differentiate malignant (cancerous) from benign (noncancerous) samples. The first nine attributes are the determinant factors for breast cancer, and the last attribute describes the patient's breast cancer status. The dataset attributes are shown in Table 1. These determinant factors are often chosen as the optimum set of factors because (1) they represent effective features that reduce redundancy in the feature space and (2) they provide a significant estimation of a high-dimensional classification function with the available finite number of training samples (Salama et al., 2012).

Data Fuzzification
This section gives a detailed description of the IT2 fuzzification process for the breast cancer dataset, which is crucial to this study. Although the process and some of its results were briefly examined in our previous study (Oladipupo et al., 2019), the process and the algorithms used are restated here to clarify the proposed approach.

Expert Data Interval
In this study, linguistic terms (words) were defined to elicit the views of medical experts on the determinant factors and on how these factors can lead to a malignant case of breast cancer. Thirty (30) medical experts from specialized hospitals were consulted to define the intuitive words for the factors and the corresponding data intervals. The experts' data intervals were collected using the form shown in Fig. 2. Three words, {Low, Medium, High}, were defined to express each determinant factor. These words were used in approximate reasoning by the medical experts so that diverse opinions could be gathered for each determinant factor. To ascertain the adequacy of the linguistic words defined by the experts, the Jaccard similarity measure was used to determine the similarities among the three words. This measure was chosen because of the drawbacks of the other five similarity measures for IT2 FSs (Wu and Mendel, 2008; Mendel and Wu, 2010). Afterwards, the data intervals for each word were collected from the medical experts using a questionnaire polling method. Experts were asked to give an interval (a "range") somewhere between 0 and 10 for each intuitive word defined for every determinant factor.
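The Jaccard similarity for IT2 FSs compares two words over a sampled domain using both their upper and lower membership functions. The sketch below assumes the Wu and Mendel (2008) form, the sum of min(UMF_A, UMF_B) + min(LMF_A, LMF_B) divided by the sum of the corresponding maxima, and uses hypothetical trapezoidal FOUs for Low, Medium and High on the 0 to 10 scale; the parameters are illustrative, not the values obtained in this study.

```python
from typing import Callable, List, Tuple

MF = Callable[[float], float]
IT2 = Tuple[MF, MF]  # (lower membership function, upper membership function)


def trapezoid(a: float, b: float, c: float, d: float) -> MF:
    """Normal trapezoidal MF with support [a, d] and plateau [b, c]."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def jaccard_it2(A: IT2, B: IT2, xs: List[float]) -> float:
    """Jaccard similarity of two IT2 FSs sampled at the points xs."""
    (la, ua), (lb, ub) = A, B
    num = sum(min(ua(x), ub(x)) + min(la(x), lb(x)) for x in xs)
    den = sum(max(ua(x), ub(x)) + max(la(x), lb(x)) for x in xs)
    return num / den


xs = [i / 100 for i in range(1001)]  # sample the 0-10 scale
# Hypothetical FOUs: (LMF, UMF) pairs, each LMF lying under its UMF.
low = (trapezoid(0, 0, 1.5, 3), trapezoid(0, 0, 2.5, 4.5))
medium = (trapezoid(4, 5, 5, 6), trapezoid(2.5, 4.5, 5.5, 7.5))
high = (trapezoid(7, 8.5, 10, 10), trapezoid(5.5, 7.5, 10, 10))

print(round(jaccard_it2(low, medium, xs), 3), jaccard_it2(low, high, xs))
```

With these FOUs, Low is partly similar to its neighbour Medium and not at all similar to High, which is the monotone decline the adequacy check looks for.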

Interval Type-2 Fuzzification Process
The interval type-2 fuzzification process was carried out using the Hao and Mendel Approach (HMA), which was simulated in MATLAB. The HMA was chosen to take advantage of its efficiency and effectiveness over other approaches and thereby enhance the quality of the fuzzification output (Hao and Mendel, 2015). The HMA is carried out in two sections: the data process section and the fuzzy set process section.

Section 1: Data Process Section
The data part takes the data intervals from the experts as input. It acts on the interval endpoints, starting with the n intervals collected from all subjects (the medical experts), and processes them in four stages.

Bad Data Processing
During this process, only intervals that satisfy Eq. (1) are accepted; the others are rejected, because some subjects (the experts) may not have taken the survey seriously. Here, a(i) is the lower endpoint and b(i) the upper endpoint of the interval supplied by the i-th medical expert. This step reduces the n intervals to nʹ intervals.
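Eq. (1) did not survive extraction. The sketch below assumes the bad-data test used in the Enhanced Interval Approach, on which the HMA data part builds: keep an interval only if 0 ≤ a(i) < b(i) ≤ 10 and b(i) − a(i) < 10, which also rejects the uninformative full-range answer [0, 10]. The function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def bad_data_filter(intervals: List[Interval], lo: float = 0.0, hi: float = 10.0) -> List[Interval]:
    """Keep intervals with lo <= a < b <= hi and b - a < hi - lo (assumed EIA-style test)."""
    return [(a, b) for a, b in intervals if lo <= a < b <= hi and (b - a) < (hi - lo)]


# Reversed, out-of-range and full-range answers are discarded.
print(bad_data_filter([(2, 5), (6, 3), (-1, 4), (0, 10), (7.5, 9)]))  # [(2, 5), (7.5, 9)]
```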

Outlier Processing
To remove outliers, Box-and-Whisker tests (Walpole et al., 2013) are carried out on the endpoints a(i) and b(i) of the remaining nʹ intervals, and only intervals that satisfy Eq. (2) are kept. This process reduces the nʹ intervals to nʹʹ intervals.
Subsequently, Q_L(0.25), Q_L(0.75) and IQR_L are calculated from the lengths L(i) = b(i) − a(i) of the remaining nʹʹ intervals, and only intervals that satisfy Eq. (4) are kept. This process reduces the nʹʹ intervals to mʹ intervals.
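The two-pass outlier test can be sketched as follows. It assumes the conventional Box-and-Whisker fences [Q(0.25) − 1.5·IQR, Q(0.75) + 1.5·IQR], applied first to the endpoints and then to the interval lengths, and uses Python's "inclusive" quartile method; other quartile conventions shift the fences slightly. The function names and the sample answers are hypothetical.

```python
import statistics
from typing import List, Tuple

Interval = Tuple[float, float]


def fences(values: List[float]) -> Tuple[float, float]:
    """Box-and-Whisker fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outlier_filter(intervals: List[Interval]) -> List[Interval]:
    # Pass 1: each endpoint is tested against the fences of its own distribution.
    a_lo, a_hi = fences([a for a, _ in intervals])
    b_lo, b_hi = fences([b for _, b in intervals])
    kept = [(a, b) for a, b in intervals if a_lo <= a <= a_hi and b_lo <= b <= b_hi]
    # Pass 2: the lengths L = b - a of the survivors are tested the same way.
    l_lo, l_hi = fences([b - a for a, b in kept])
    return [(a, b) for a, b in kept if l_lo <= b - a <= l_hi]


answers = [(2, 5), (2.5, 5.5), (3, 6), (2, 6), (3, 5), (2.5, 6), (9, 9.5)]
print(outlier_filter(answers))
```

Here (9, 9.5) fails the endpoint pass, and (2, 6) and (3, 5) fail the length pass because their lengths are unusually long and short relative to the rest.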

Tolerance-Limit Processing
This step keeps only the data intervals that lie within an acceptable two-sided tolerance limit. It is applied first to a(i) and b(i) and then to L(i) = b(i) − a(i), and only intervals that satisfy Equations (5) and (6) are kept. This reduces the mʹ intervals to mʹʹ intervals.
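The tolerance test keeps a value v when it lies in [m − k·s, m + k·s], where m and s are the sample mean and standard deviation of the surviving values. The factor k is normally read from statistical tables; as an assumption, the sketch below approximates it with Howe's formula and a Wilson-Hilferty chi-square quantile so that only the standard library is needed, using 95% coverage at 95% confidence (the setting assumed here for Equations (5) and (6)). The approximation is less accurate for very small samples, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

Interval = Tuple[float, float]


def tolerance_factor(n: int, coverage: float = 0.95, confidence: float = 0.95) -> float:
    """Approximate two-sided normal tolerance factor k (Howe's method)."""
    nd = statistics.NormalDist()
    z_cov = nd.inv_cdf((1 + coverage) / 2)
    df = n - 1
    # Wilson-Hilferty approximation of the lower (1 - confidence) chi-square quantile.
    z = nd.inv_cdf(1 - confidence)
    chi2 = df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3
    return z_cov * math.sqrt(df * (1 + 1 / n) / chi2)


def within_limits(values: List[float]) -> List[bool]:
    """Flag values inside [m - k*s, m + k*s] for the sample's own k."""
    m, s = statistics.mean(values), statistics.stdev(values)
    k = tolerance_factor(len(values))
    return [m - k * s <= v <= m + k * s for v in values]


def tolerance_filter(intervals: List[Interval]) -> List[Interval]:
    ok_a = within_limits([a for a, _ in intervals])
    ok_b = within_limits([b for _, b in intervals])
    kept = [iv for iv, ka, kb in zip(intervals, ok_a, ok_b) if ka and kb]
    ok_l = within_limits([b - a for a, b in kept])  # then the lengths L = b - a
    return [iv for iv, kl in zip(kept, ok_l) if kl]


print(round(tolerance_factor(30), 2))  # 2.55
```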

Reasonable-Interval Processing
This step eliminates data intervals that have no overlap, or too small an overlap, with the other data intervals, so as to establish the maxim that "words must mean similar things to different people". To do this, one computes the candidate values of ξ* and selects the one that satisfies m_a ≤ ξ* ≤ m_b,
where m_a and m_b are the mean values of the left and right endpoints of the surviving mʹʹ intervals. Only the intervals [a(i), b(i)] that satisfy the reasonable-interval condition with respect to ξ* are kept. At the end of the data process section, the original n data intervals have been reduced to a set of m data intervals, where m ≤ n.
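Because the ξ* equation and the keep condition did not survive extraction, the sketch below assumes the forms used in the IA/EIA literature: ξ* is the crossing point, between m_a and m_b, of two normal densities fitted to the left and right endpoints, and an interval is kept when 2m_a − ξ* ≤ a(i) < ξ* < b(i) ≤ 2m_b − ξ*. The closed form for ξ* follows from equating the two normal densities and solving the resulting quadratic; both standard deviations are assumed to be positive.

```python
import math
import statistics
from typing import List, Tuple

Interval = Tuple[float, float]


def xi_star(m_a: float, s_a: float, m_b: float, s_b: float) -> float:
    """Crossing point of N(m_a, s_a) and N(m_b, s_b) that lies in [m_a, m_b]."""
    if math.isclose(s_a, s_b):
        return (m_a + m_b) / 2  # equal spreads: the densities cross at the midpoint
    root = s_a * s_b * math.sqrt((m_a - m_b) ** 2 + 2 * (s_a**2 - s_b**2) * math.log(s_a / s_b))
    base = m_b * s_a**2 - m_a * s_b**2
    for xi in ((base + root) / (s_a**2 - s_b**2), (base - root) / (s_a**2 - s_b**2)):
        if m_a <= xi <= m_b:
            return xi
    raise ValueError("no crossing point between the endpoint means")


def reasonable_filter(intervals: List[Interval]) -> List[Interval]:
    a_vals = [a for a, _ in intervals]
    b_vals = [b for _, b in intervals]
    m_a, m_b = statistics.mean(a_vals), statistics.mean(b_vals)
    xi = xi_star(m_a, statistics.stdev(a_vals), m_b, statistics.stdev(b_vals))
    # Keep intervals that straddle xi* and whose endpoints are not too far out.
    return [(a, b) for a, b in intervals if 2 * m_a - xi <= a < xi < b <= 2 * m_b - xi]


print(reasonable_filter([(2, 5), (2.5, 5.5), (3, 6), (2.5, 6)]))
```

A useful consequence of this test is that every surviving interval contains ξ*, so the survivors always share a non-empty common overlap.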

Section 2: Fuzzy Set Process Section
The fuzzy set process section establishes the nature of the FOU (Left-shoulder, Right-shoulder or Interior) by computing the overlap of the intervals, removing the overlap from each of the original intervals and mapping the resulting set(s) of smaller intervals into the two parameters that define the respective FOU. The fuzzy set part of the HMA consists of four steps (Hao and Mendel, 2015):
1. Classify the FOU as one of the following: (1) Left-shoulder FOU, (2) Right-shoulder FOU or (3) Interior FOU. This is achieved by (1) computing the one-sided tolerance intervals for the endpoints and (2)
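The overlap-removal idea behind the fuzzy set part can be sketched for an interior FOU: the overlap shared by all surviving intervals is [max a(i), min b(i)], and removing it leaves a left piece and a right piece of each interval. This shows only that computation, not the classification step or the mapping of the smaller intervals to FOU parameters, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def common_overlap(intervals: List[Interval]) -> Optional[Interval]:
    """Intersection [max a, min b] of all intervals, or None if it is empty."""
    left = max(a for a, _ in intervals)
    right = min(b for _, b in intervals)
    return (left, right) if left <= right else None


def remove_overlap(intervals: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Split every interval into its part left of, and right of, the common overlap."""
    ov = common_overlap(intervals)
    if ov is None:
        raise ValueError("intervals share no common overlap")
    o_l, o_r = ov
    left_parts = [(a, o_l) for a, _ in intervals]
    right_parts = [(o_r, b) for _, b in intervals]
    return left_parts, right_parts


print(remove_overlap([(2, 5), (2.5, 5.5), (3, 6), (2.5, 6)]))
```

Because every interval that passes the reasonable-interval test straddles ξ*, the common overlap of the survivors is never empty, so the error branch only guards against misuse.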

Type Reduction and Defuzzification
The combined FOU is type-reduced by computing the centroid (a measure of uncertainty) of the IT2 FS with the Enhanced Karnik-Mendel (EKM) algorithm (Algorithms 1 and 2) (Wu and Mendel, 2009). The result is an interval-valued set, which is defuzzified by taking the average of the interval's two endpoints.
Algorithm 1: The EKM algorithm for computing y_l is:
1) Sort x_i (i = 1, 2, ..., N) in increasing order and call the sorted x_i by the same name, but now x_1 ≤ x_2 ≤ ... ≤ x_N. Match the weights w_i with their respective x_i and renumber them so that their indices correspond to the renumbered x_i.
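The full EKM procedure can be sketched as below, following the published Wu and Mendel (2009) steps: sort the samples, start from the switch point round(N/2.4) for y_l (round(N/1.7) for y_r), locate the new switch point from the current estimate, and update the numerator and denominator incrementally until the switch point stops moving. The sampled FOU is hypothetical (a symmetric triangular pair on the 0 to 10 scale), and the final line applies the endpoint averaging described above.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def _ekm_endpoint(x: Sequence[float], lmf: Sequence[float], umf: Sequence[float], left: bool) -> float:
    """EKM iteration for the left (y_l) or right (y_r) centroid endpoint."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # step 1: sort x_i with weights
    xs = [x[i] for i in order]
    lo = [lmf[i] for i in order]
    up = [umf[i] for i in order]
    n = len(xs)
    # y_l weights the first k points by the UMF and the rest by the LMF; y_r reverses this.
    head, tail = (up, lo) if left else (lo, up)
    k = min(max(round(n / 2.4) if left else round(n / 1.7), 1), n - 1)
    a = sum(xs[i] * head[i] for i in range(k)) + sum(xs[i] * tail[i] for i in range(k, n))
    b = sum(head[i] for i in range(k)) + sum(tail[i] for i in range(k, n))
    y = a / b
    for _ in range(n):  # safeguard cap; EKM usually stops after a few iterations
        k_new = min(max(bisect_right(xs, y), 1), n - 1)  # points with x <= y
        if k_new == k:
            break
        s = 1 if k_new > k else -1
        span = range(min(k, k_new), max(k, k_new))
        da = sum(xs[i] * (up[i] - lo[i]) for i in span)
        db = sum(up[i] - lo[i] for i in span)
        if left:
            a, b = a + s * da, b + s * db
        else:
            a, b = a - s * da, b - s * db
        y, k = a / b, k_new
    return y


def ekm_centroid(x: Sequence[float], lmf: Sequence[float], umf: Sequence[float]) -> Tuple[float, float]:
    return _ekm_endpoint(x, lmf, umf, True), _ekm_endpoint(x, lmf, umf, False)


xs = [i / 10 for i in range(101)]
umf_vals = [max(0.0, 1 - abs(x - 5) / 5) for x in xs]
lmf_vals = [max(0.0, 1 - abs(x - 5) / 2.5) for x in xs]
y_l, y_r = ekm_centroid(xs, lmf_vals, umf_vals)
crisp = (y_l + y_r) / 2  # defuzzified value: average of the centroid endpoints
```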

Pattern Discovery
Interval Type-2 fuzzy association rule mining was used for pattern discovery in the breast cancer dataset, with the FP-Growth algorithm adopted to discover all frequent patterns. The algorithm has two stages: the first generates frequent itemsets by constructing the compact FP-Tree structure, and the second discovers patterns from the frequent itemsets. The input to the fuzzy FP-Growth algorithm is the IT2 fuzzified dataset from the data preprocessing phase (see Zhao and Bhowmick (2003) for details of the FP-Growth algorithm). The algorithm was implemented in the Python programming language.
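The study's FP-Growth code is not reproduced here. As an illustration of the two stages, the sketch below is a generic recursive FP-Growth over transactions of items with plain support counts. It assumes each fuzzified record has already been turned into a set of linguistic items (for example "BareNuclei=High" plus the class label); how fuzzy degrees enter the tree in the authors' implementation is not specified here. The demonstration data are the classic nine-transaction textbook example, not breast cancer records.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple


class FPNode:
    """FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_tree(db: Sequence[Tuple[Sequence[str], int]], min_count: int):
    """Stage 1: build an FP-tree; return the header table and frequent item counts."""
    counts: Dict[str, int] = defaultdict(int)
    for items, cnt in db:
        for item in set(items):
            counts[item] += cnt
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = defaultdict(list)
    for items, cnt in db:
        # Insert frequent items in descending frequency order (ties broken by name).
        path = sorted((i for i in set(items) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += cnt
            node = child
    return header, frequent


def fp_growth(db, min_count: int, suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """Stage 2: mine every frequent itemset (as a frozenset) with its support count."""
    header, frequent = build_tree(db, min_count)
    patterns: Dict[FrozenSet[str], int] = {}
    for item, support in frequent.items():
        itemset = suffix | {item}
        patterns[itemset] = support
        # Conditional pattern base: the prefix path above every node holding `item`.
        cond_db = []
        for node in header[item]:
            prefix, parent = [], node.parent
            while parent is not None and parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                cond_db.append((prefix, node.count))
        patterns.update(fp_growth(cond_db, min_count, itemset))
    return patterns


TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
patterns = fp_growth([(t, 1) for t in TRANSACTIONS], min_count=2)
print(len(patterns))  # 13
```

Counting candidates through shared prefix paths, instead of rescanning the data for every candidate itemset, is what keeps FP-Growth compact in memory.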

Data Analysis, Result and Discussion
To achieve the objectives of this study, the breast cancer dataset was fuzzified with the interval type-2 fuzzy concept and analyzed with association rule mining. The strength of this analysis lies in handling inter and intra uncertainty in the breast cancer dataset through the interval type-2 fuzzy approach, which allows several medical experts with diverse opinions to be involved in the analysis. The medical experts engaged in this study defined data intervals for each of the three intuitive words (Low, Medium and High) established for each determinant factor, as in our previous study (Oladipupo et al., 2019). An instance (twenty-four records) of the data intervals described by the medical experts for the Clump Thickness linguistic terms is shown in Table 2.

Results Obtained from Breast Cancer Dataset Fuzzification and Discussion
The HM interval type-2 fuzzy set approach was used to model the inevitable uncertainties of the medical experts who gave their respective ranges for each of the breast cancer determinant factors. Some of the results in this section were presented in our previous work (Oladipupo et al., 2019); here, the complete results are presented systematically. The output of the HM interval type-2 algorithm (data and fuzzy set parts) is shown in Table 3. For each row, the last column of Table 3 gives the number of reliable data intervals that remained and were finally used to construct the footprint of uncertainty for each word. This establishes the maxim that "each word now means similar things to different people (medical experts)", in place of the initial maxim that "words mean different things to different people".
The similarities among the words, computed with the Jaccard similarity measure, are summarized in Table 4. The Jaccard similarity gives a realistic result, since the similarity declines monotonically as two words get further apart. This confirms the adequacy of the linguistic words defined by the experts. The Upper Membership Function (UMF) and Lower Membership Function (LMF) parameters for each word of the determinant factors, obtained in MATLAB, are shown in Fig. 3a-3i. The type-2 fuzzy set derived for each word and the values obtained after type reduction with the Enhanced Karnik-Mendel (EKM) approach are shown in Table 5. Finally, the crisp breast cancer data were fuzzified based on the IT2 fuzzy set membership expressions generated by the HM approach, so that the fuzzified dataset captures the intra and inter uncertainties of the medical experts on the determinant factors. A snapshot of the generated fuzzified dataset is shown in Fig. 4.
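Fuzzifying a crisp attribute value under an IT2 FS yields a membership interval [LMF(x), UMF(x)] for each word. The sketch below assumes trapezoidal LMFs and UMFs with hypothetical parameters (not the Table 5 values) and, as one possible way to obtain a single degree per word for mining, takes the midpoint of that interval; the exact membership expression used in this study is not reproduced here.

```python
from typing import Dict, Tuple

# (a, b, c, d) trapezoid: support [a, d], plateau [b, c]
Trap = Tuple[float, float, float, float]


def trap(x: float, p: Trap) -> float:
    a, b, c, d = p
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def it2_membership(x: float, lmf: Trap, umf: Trap) -> Tuple[float, float]:
    """Membership interval [LMF(x), UMF(x)] of a crisp value x."""
    return trap(x, lmf), trap(x, umf)


# Hypothetical (LMF, UMF) FOUs for the words of one determinant factor.
WORDS: Dict[str, Tuple[Trap, Trap]] = {
    "Low": ((0, 0, 1.5, 3), (0, 0, 2.5, 4.5)),
    "Medium": ((4, 5, 5, 6), (2.5, 4.5, 5.5, 7.5)),
    "High": ((7, 8.5, 10, 10), (5.5, 7.5, 10, 10)),
}


def fuzzify(x: float) -> Dict[str, float]:
    """Midpoint of each word's membership interval (an assumed collapse rule)."""
    out = {}
    for word, (lmf, umf) in WORDS.items():
        lo, up = it2_membership(x, lmf, umf)
        out[word] = (lo + up) / 2
    return out


print(fuzzify(7))  # {'Low': 0.0, 'Medium': 0.125, 'High': 0.375}
```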

Results Obtained from Pattern Discovery and Discussion
After the IT2 fuzzy dataset is generated, the FP-Growth algorithm takes it as input, constructs the compact FP-Tree structure and stores it in system memory. A snapshot of the system interface showing the frequent pattern tree is given in Fig. 4. The next step is rule generation from the pattern tree, in which the FP-Growth association rule mining algorithm is applied to uncover potentially useful patterns in the breast cancer IT2 fuzzy dataset. With the minimum support set to 20% and the minimum confidence to 0.2, the algorithm initially generated 350 rules. A snapshot of rule generation is shown in Fig. 5. To identify the interesting rules, only rules that determine malignant breast cancer with a confidence greater than or equal to 0.8 were considered. Of the 350 rules, 35 determine malignant breast cancer, and only 18 of these meet the minimum confidence of 0.8 (80%). These 18 interesting rules are shown in Table 6.
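The interesting-rule filter described above (consequent = malignant, confidence ≥ 0.8) can be sketched from a table of itemset supports, such as the output of an FP-Growth run. Confidence is taken as support(X with malignant) / support(X). The item names and support values below are hypothetical, chosen only to show one rule that passes the threshold comfortably, one that just passes and one that fails.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (antecedent, consequent, confidence)


def target_rules(supports: Dict[FrozenSet[str], float], target: str, min_conf: float) -> List[Rule]:
    """Rules X -> target with conf = sup(X plus target) / sup(X) >= min_conf."""
    rules = []
    for itemset, sup_xy in supports.items():
        if target not in itemset or len(itemset) < 2:
            continue
        antecedent = itemset - {target}
        sup_x = supports.get(antecedent)
        if sup_x:
            conf = sup_xy / sup_x
            if conf >= min_conf:
                rules.append((antecedent, target, conf))
    return sorted(rules, key=lambda r: -r[2])  # strongest rules first


# Hypothetical supports (fractions of records), for illustration only.
SUPPORTS = {
    frozenset({"BareNuclei=High"}): 0.30,
    frozenset({"CellShape=High"}): 0.25,
    frozenset({"ClumpThickness=Medium"}): 0.40,
    frozenset({"Class=Malignant"}): 0.35,
    frozenset({"BareNuclei=High", "Class=Malignant"}): 0.27,
    frozenset({"CellShape=High", "Class=Malignant"}): 0.22,
    frozenset({"ClumpThickness=Medium", "Class=Malignant"}): 0.20,
}
rules = target_rules(SUPPORTS, "Class=Malignant", min_conf=0.8)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", consequent, round(conf, 2))
```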

Evaluation of the Proposed Approach
An Interval Type-2 Fuzzy Association Rule Mining (IT2FARM) approach was explored in this study for pattern discovery in a breast cancer dataset. The Wisconsin breast cancer dataset from the UCI data repository, a test dataset consisting of 10 attributes and 699 instances used to distinguish malignant (cancerous) from benign (noncancerous) samples, was used for the mining process.
The proposed IT2FARM was compared with quantitative Association Rule Mining (ARM) (Gyenesei, 2001), implemented in RapidMiner, on the same breast cancer dataset. The comparison was based on the number of rules generated, the size on disk and the maximum number of conditions in the rule premise, as shown in Table 7. A confidence threshold of 0.2 was used. A snapshot of quantitative association rule mining in the RapidMiner analytical tool is shown in Fig. 6. One major observation from this evaluation is the number of rules generated. Building a comprehensive expert system requires a compact knowledge base with a minimal number of rules, to keep the knowledge base from becoming unwieldy (Meesad, 2001; Oladipupo et al., 2010). The proposed approach reduced the number of generated rules by 39.5%, an advantage of fuzzy parameters over the quantitative mining approach. This is consistent with (Oladipupo et al., 2010), where fuzzy logic was used to enhance knowledge discovery in coronary heart disease. Another advantage of IT2FARM is its lower disk usage. Overall, IT2FARM outperformed quantitative ARM on these criteria in analyzing the breast cancer dataset, which makes it suitable for building a comprehensive knowledge base for a breast cancer decision support system.
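The reported reductions are presumably relative to the quantitative ARM baseline. Under that assumption, for a measured quantity Q (the rule count or the size on disk from Table 7), the percentage reduction is:

```latex
\text{Reduction}\ (\%) = \frac{Q_{\text{ARM}} - Q_{\text{IT2FARM}}}{Q_{\text{ARM}}} \times 100
```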

Conclusion
In this study, an Interval Type-2 fuzzy association rule mining approach was proposed to extract interesting patterns from breast cancer data that correspond more intuitively to experts' perceptions. The analysis was divided into two stages: data fuzzification and pattern discovery. The Hao and Mendel approach was adopted for interval Type-2 data fuzzification, while the FP-growth algorithm was used for pattern discovery. The Wisconsin breast cancer dataset (699 instances and 11 attributes) was obtained from the UCI machine learning repository. Thirty (30) medical experts from specialized hospitals were consulted through a questionnaire polling method to define the intuitive words and data intervals for each attribute. The study generated rules with fewer symptoms for diagnosing breast cancer with confidence greater than 80%. It also identified High Uniformity of Cell Shape (10/18) and High Bare Nuclei (9/18) as two strong determinant factors for breast cancer. When benchmarked against traditional quantitative ARM, IT2FARM performed better, with reductions of 39.5% in generated rules and 22.6% in memory usage. This supports the effect of type-2 fuzzy sets in association rule mining reported in (Chen et al., 2015), owing to their ability to resolve intra and inter uncertainty within the dataset. The discovered rules are viable for building a comprehensive and compact expert-driven knowledge base for a breast cancer decision support or expert system. For future improvements, hybridizing interval type-2 fuzzy logic with other data mining techniques, such as rough sets, artificial neural networks and decision trees, is recommended for prediction purposes. Building an expert system with the patterns discovered by IT2 fuzzy association rule mining could also be considered in future work.