Is Single Scan based Restructuring Always a Suitable Approach to Handle Incremental Frequent Pattern Mining?

Corresponding Author: Shafiul Alom Ahmed, Department of Computer Science and Engineering, Tezpur University, Tezpur, India. Email: tezu.shafiul@gmail.com

Abstract: Incremental mining of frequent patterns has attracted the attention of researchers over the last two decades. Researchers have explored frequent pattern mining from incremental databases under the assumption that the complete database to be processed can be accommodated in the system's main memory even after the database is updated very frequently. The FP-tree-based approaches have drawn more interest because of their compact representation and their requirement of a minimum number of database scans. Researchers have developed a few FP-tree based methods to handle the incremental scenario by adjusting or restructuring the tree prefix paths. Although these approaches have managed to solve the re-computation problem by constructing a complete pattern tree data structure using only one database scan, restructuring the prefix paths for each transaction is a computationally costly task, leading to high tree construction time. If the FP-tree construction process is supported with suitable data structures, reconstructing the FP-tree from scratch may be less time consuming than the restructuring approaches in the incremental scenario. In this study, we propose a tree data structure called Improved Frequent Pattern tree (Improved FP-tree). The proposed Improved FP-tree construction algorithm substantially reduces tree construction time by using the node links maintained in the header table to manage the same item node list in the FP-tree. The experimental results emphasize the significance of the proposed Improved FP-tree construction algorithm over a few conventional incremental FP-tree construction algorithms with prefix path restructuring.


Introduction
In the 21st century, transactional databases are dynamic, and researchers have focused on finding the hidden knowledge in these incremental databases. Frequent pattern mining is one of the most widely used knowledge retrieval techniques of data mining. The problem of mining frequent itemsets was first brought to attention with the Apriori algorithm (Agrawal et al., 1993). Apriori is a level-wise computation that employs multiple database scans and generates an enormous number of candidate itemsets. Moreover, it applies a costly test-and-prune approach to discard the redundant and infrequent candidate itemsets and generate the complete set of frequent patterns. Later on, researchers made many attempts to propose efficient methods for mining frequent itemsets from large datasets by adopting the Apriori approach. However, most of the proposed approaches suffer from the same problems: multiple database scans, high computation time (candidate itemsets) and high space requirements.
To mitigate the multiple database scans and the generation of an enormous number of candidate itemsets, Han et al. (2000) came up with a prefix path tree-based approach called FP-Growth. FP-Growth handles the multiple scan problem by restricting the number of database scans to two. Moreover, it is capable of generating the complete set of frequent itemsets without generating any candidate itemset. Researchers have proposed many FP-tree based algorithms to improve the performance of FP-tree construction. However, those approaches primarily emphasize efficiently constructing the FP-tree and generating useful frequent patterns from static databases. In dynamic or incremental databases, database modifications cannot be directly reflected onto the FP-tree or the already generated frequent patterns. Reconstructing a fresh FP-tree and incrementally restructuring the FP-tree are two possible ways to deal with the incremental scenarios. If only a few transactions are frequently added to the database, it becomes computationally infeasible to reconstruct the FP-tree from scratch every time transactions are added. Therefore, researchers have concentrated on incrementally restructuring the FP-tree without reconstructing it from scratch. The main aim of the restructuring operation is to maintain the basic FP-tree structure and properties while reflecting the changes in the database directly onto the FP-tree. The restructuring is achieved by performing a sequence of costly swapping and merging operations on FP-tree prefix paths. Though incrementally restructuring the FP-tree requires only a single database scan, restructuring with swap and merge for each transaction becomes much costlier than constructing a fresh FP-tree if the number of newly added transactions is enormous.
The restructuring operation must be performed along all the prefix paths containing an item that is to be updated in the FP-tree. Therefore, if the updated database size and the dimensionality are huge, the incremental restructuring consumes a significant amount of time compared to constructing a fresh FP-tree. Depending on the database size and dimensionality, both approaches have their pros and cons.
Most importantly, the data structures have a significant influence on the performance of frequent itemset mining algorithms. The data structure used by the FP-Growth algorithm is a compact prefix tree named Frequent Pattern tree (FP-tree). An FP-tree consists of nodes, and each node contains a pair of values: an item and its count. The nodes containing the same item are maintained in a linked list. Therefore, every time a new node is created, the algorithm must traverse the whole same item list and add the newly created node at the rear of the list. The FP-tree data structure performs well on dense databases because many transactions share common prefix paths in the tree. However, for sparse databases with higher dimensions and high average transaction lengths, the FP-tree becomes very large. Traversing a long list every time a new node is inserted into the FP-tree then degrades the tree construction performance drastically. Depending on specific characteristics of databases, such as updated database size, dimensionality, average transaction length and density, a few approaches perform well, but they may not handle the incremental scenario feasibly in all cases. Therefore, in this study, we propose a new two-scan FP-tree construction algorithm named Improved FP-tree (IFP-tree), which builds the tree from scratch and manipulates the same item node links to handle the incremental scenarios. Instead of a linked list, we maintain the nodes containing the same item in a stack. The stack gives direct access to the top node, so a newly created node can be inserted at the top of the stack without traversing the whole stack. This saves a significant amount of time and hence improves the IFP-tree construction time significantly.
The strength of the IFP-tree construction algorithm is that, if the number of transactions newly added to the database is very high, it outperforms a few incremental restructuring algorithms as well as the conventional approach of constructing the FP-tree from scratch, irrespective of whether the database is dense or sparse. The experimental results are promising and establish the significance of the proposed IFP-tree.
The frequent pattern tree construction algorithm plays an essential role in frequent pattern mining. The proposed Improved FP-tree data structure can be used to efficiently mine frequent patterns and association rules from static and incremental datasets. Frequent patterns and association rules are used in different application domains such as market-basket analysis, risk analysis in commercial environments, disease factor analysis and patient survivability analysis.

Preliminaries
The main objective of frequent itemset mining (FIM) is to generate the frequently occurring patterns or itemsets from transactional databases that are useful and meet the user's criteria in decision making. The usefulness and interestingness of the generated patterns are gauged by some popular and widely used measures, which are discussed below.
Let D be a transactional database with a set of items I = {i1, i2, …, im} and a set of all transactions T = {t1, t2, …, tn}. Each transaction tj is a subset of I. A transaction tj is said to contain an itemset i if i is a subset of tj.

Definition 1. Support Count (σ)
Support count is the total number of transactions in the database that contain an itemset. Mathematically, the support count σ of an itemset P can be represented as:

$$\sigma(P) = \left|\{\, t_j \in T : P \subseteq t_j \,\}\right|$$

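Definition 2. Support (Supp)
Support of an itemset P can be defined as the percentage of transactions in the transactional database D that contain the itemset P. Mathematically, with σ(P) denoting the support count of P and n = |T| the total number of transactions, the support of an itemset P can be represented as:

$$\mathrm{Supp}(P) = \frac{\sigma(P)}{n} \times 100$$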

Definition 3. Frequent Pattern or Frequent Itemset
An itemset is said to be frequent if the support of the itemset is greater than or equal to the user-specified minimum support threshold minSupp. Formally, an itemset P is said to be frequent if it satisfies the following constraint:

$$\mathrm{Supp}(P) \geq \mathit{minSupp}$$
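The three definitions above can be checked on a small example. The following is a minimal sketch only; the function names and the five-transaction database `toy_db` are assumptions made for illustration and do not come from the paper.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted.issubset(t))


def support(transactions, itemset):
    """Support of `itemset` as a percentage of all transactions."""
    return 100.0 * support_count(transactions, itemset) / len(transactions)


def is_frequent(transactions, itemset, min_supp):
    """True if the support reaches the minimum support threshold (percent)."""
    return support(transactions, itemset) >= min_supp


# Hypothetical toy database, not a dataset from the paper.
toy_db = [{"b", "d"}, {"a", "b", "d"}, {"a", "c"}, {"d", "e"}, {"b", "d", "e"}]

count_bd = support_count(toy_db, {"b", "d"})
supp_bd = support(toy_db, {"b", "d"})
```

Here {b, d} occurs in three of the five transactions, so it is frequent for a 50% threshold, while {a, c} occurs only once and is not.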

Related Work
The problem of mining frequent itemsets from static databases was first addressed by Agrawal et al. with the Apriori algorithm (Agrawal et al., 1993). Several variants of Apriori based incremental algorithms have been proposed by researchers, for instance, FUP (Cheung et al., 1996), FUP-2 (Cheung et al., 1997), Border algorithm (Aumann et al., 1999), Modified borders (Das and Bhattacharyya, 2004), Update With Early Pruning (UWEP) (Ayan et al., 1999), DEMON (Ganti et al., 2001), Incremental Constrained APriori (ICAP) (Ayad, 2000), Maintaining Association Rules with Apriori Property (MAAP) (Zhou and Ezeife, 2001), Maximal Frequent Trend Pattern (MFTP) (Guirguis et al., 2006) and PRE-HU (Lin et al., 2014). Like Apriori, if the length of the maximum frequent itemset is K, these approaches require at least K scans over the database. Several limitations of the Apriori algorithm, such as multiple database scans and the generation of an enormous number of candidate itemsets, make it computationally infeasible for handling the incremental scenario of frequent pattern mining. Later on, Han et al. (2000) proposed an efficient approach, "FP-growth," using a prefix tree data structure called FP-tree. Researchers have extensively exploited the FP-tree of the FP-growth algorithm to mine frequent patterns, as it improves the mining performance compared to the candidate generation and pruning mechanism of Apriori with its multiple database scans. FP-growth requires only two database scans. Since the FP-tree depends on the user-defined minimum support threshold and contains information about only those items that satisfy it, it cannot be easily made compatible with the dynamic scenario. Reconstructing the FP-tree from scratch using two database scans every time new transactions are added to the database, or the minimum support changes, is not feasible. A significant number of approaches, viz. nonordfp (Rácz, 2004), IFP Growth (Lin et al., 2011), LP Growth (Pyun et al., 2014), Incremental FP Tree (Adnan et al., 2006b), Alternative FP Tree (Alhajj and Barker, 2008), DB-tree (Ezeife and Su, 2002), FUFP (Hong et al., 2008) and Pre-FUFP (Lin et al., 2009), have been proposed in the last two decades. However, most of these approaches suffer from the same problems as the FP-tree. Therefore, to avoid constructing a fresh FP-tree, researchers have developed a few new approaches, essentially improvements over the FP-tree, to generate frequent patterns from incremental databases efficiently. These incremental approaches take only one database scan and apply a sequence of split, swap and merge operations to construct the FP-tree incrementally. A few FP-tree based, single scan, incrementally restructuring approaches are discussed below.

FP-Tree based Incremental Approaches
Incremental Frequent Pattern Tree (Adnan et al., 2006a) does not reconstruct the FP-tree from scratch whenever the database changes. To achieve this, a complete FP-tree (assuming a minimum support threshold of one) is constructed. The complete tree reflects all the occurrences of items in the database. As the database gets updated, this algorithm incrementally updates the FP-tree without re-scanning the old database or reconstructing the FP-tree from scratch. The algorithm uses two primary operations, "shuffling" and "merging", to maintain the FP-tree structure. However, it scans the original database twice to construct the initial FP-tree and also scans the newly added set of transactions twice every time the database gets updated. The Fast Updated FP-tree (FUFP) (Hong et al., 2008) is an improvement over FP-tree based on the FUFP concept to mine incremental frequent patterns. This algorithm first divides the items into four groups based on whether the items are large or small in the old database and in the new transactions. Whenever these sets change, the header table and the FUFP-tree are updated accordingly. When a sufficiently large number of transactions are inserted, the entire tree needs to be reconstructed in a batch way. Pre-FUFP (Lin et al., 2009) is a modification of FUFP based on the concept of "pre-large" itemsets. This algorithm uses two threshold values, a lower support threshold and an upper support threshold, to define the pre-large itemsets. However, it requires an extra minimum support threshold as input. The CATS-tree (Cheung and Zaiane, 2003) is an extension of the FP-tree that improves storage compression so that it can be quickly adapted to mine frequent itemsets from incremental databases without candidate generation.
The CATS-tree (Cheung and Zaiane, 2003) requires only a single pass over the dataset for its construction, and each path from the root node to a leaf node represents a set of transactions. When the database gets updated, the new transactions are added at the root level. Then, the items of each transaction are compared with the child_nodes at each level to determine the items common to both the new transaction and the child_nodes. If there are common items, the transaction is merged with the nodes and the frequencies of the nodes are incremented. However, this algorithm consumes a considerable amount of time to update the tree when the database gets incremented. Adjusting FP-tree for Incremental Mining (AFPIM) (Koh and Shieh, 2004) is also an improvement over the FP-tree structure; it uses two support thresholds to mine incremental frequent itemsets. It is an $O(n^2)$ algorithm and it also requires an extra pre-minimum support. Later on, to solve the problems of AFPIM, Leung et al. (2005; 2007) developed a tree data structure called CANonical-order Tree (CanTree). However, to maintain the canonical order of the items, the ordering of the items in the CanTree must be unaffected even if the frequency of the items changes due to incremental database updates, which may not be possible in all real-life scenarios. Though CanTree does not require any prefix path adjustment or restructuring, it requires a huge memory space to store the database information. Afterward, Tanbeer et al. (2009) developed a tree data structure named CP-tree or Compact Pattern tree. The CP-tree is constructed using only one database scan. A pre-defined number of transactions (a slot) are inserted into the CP-tree one by one according to the pre-defined item order. After inserting a slot of transactions, if the frequency-dependent item order has changed up to a pre-defined degree, the CP-tree construction algorithm restructures the tree by adjusting and sorting the prefix paths.
Even if the CP-tree is restructured only periodically, the restructuring still requires a considerable amount of time. Researchers have proposed several FP-tree-based incremental approaches with both single and multiple database scans during the last two decades. A summary of a few incremental approaches found in the literature is presented in Table 1.

Improved FP-tree Construction
The Improved FP-tree construction algorithm presented in this study is based on the basic paradigm of the FP-tree construction algorithm of FP-growth (Han et al., 2000). Unlike the FP-tree, the Improved FP-tree is a complete tree, i.e., it stores all the database transactions without any information loss. The proposed Improved FP-Growth algorithm customizes the conventional FP-tree construction algorithm to enhance tree construction performance. The performance enhancement is achieved by maintaining each list of nodes containing the same item as a linked list implemented as a stack, instead of the simple linked list of the conventional FP-tree. The same item node-link from the header table points to the most recently inserted node, i.e., the top node of the stack. The stack implementation of the same item node list lets us access the top node directly, without traversing the whole list. This direct access eliminates the same item node list traversal for every new node inserted into the Improved FP-tree during its construction, which saves a significant amount of time. For example, let Node(X_top) be the top node of the same item list stack of item X, so the header table's same item node-link for item X points to Node(X_top). Whenever a new node Node(X_new) for item X is inserted, we can directly access the top node Node(X_top) through the header table's node-link, without traversing the whole list for item X. The same item link of Node(X_new) is then set to the existing top node Node(X_top). After that, the header table's node-link is updated to point to the recently added node Node(X_new). The complete step by step procedure for constructing the Improved FP-tree is illustrated in Algorithm 1 and Algorithm 2.
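The difference between the two ways of maintaining the same item node list can be sketched as follows. This is an illustrative sketch, not the authors' Algorithm 1 or Algorithm 2: the `Node` class, the plain dictionary used as header table and the function names `link_at_tail` and `link_at_top` are all assumptions.

```python
class Node:
    """Minimal tree node; only the same item node link matters here."""

    def __init__(self, item):
        self.item = item
        self.next_same = None  # link to the next node holding the same item


def link_at_tail(header, node):
    """Conventional style: walk the whole same item list, append at the rear."""
    cur = header.get(node.item)
    if cur is None:
        header[node.item] = node
        return
    while cur.next_same is not None:  # cost grows with the list length
        cur = cur.next_same
    cur.next_same = node


def link_at_top(header, node):
    """Stack style: the header link points at the top, so no traversal."""
    node.next_same = header.get(node.item)  # new node links to the old top
    header[node.item] = node                # header now points to the new node


# Three nodes for the same item X, linked in both styles.
stack_header = {}
first, second, third = Node("X"), Node("X"), Node("X")
for n in (first, second, third):
    link_at_top(stack_header, n)

list_header = {}
p, q, r = Node("X"), Node("X"), Node("X")
for n in (p, q, r):
    link_at_tail(list_header, n)
```

In the stack variant the header entry always points to the most recently inserted node, so each insertion touches exactly two links regardless of how many nodes already hold the item.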
The working principle of Improved FP-tree construction is illustrated in this section by considering a small transactional dataset D (Table 2) and a minimum support of 1.
The frequency counts are set to zero after the frequent items are inserted into the header table in frequency-descending order. After that, the root node of the Improved FP-tree is created. In the second database scan, the transactions of the database are read and inserted into the Improved FP-tree one by one. The steps required to insert the transactions of D into the tree are as follows:
(a) All items of the first transaction {b, d} are frequent, so they are sorted according to the order of the header table items and the resulting sorted transaction is {d, b}. Since the root node has no child branches yet, a node (Node(d:1)) containing item = 'd' and count = 1 is created and inserted as the first child_node of the root. Then the frequency count of item 'd' in the header …
(e) After sorting the fifth transaction as {d, b, e}, it can be seen in Fig. 4 that items 'd' and 'b' are already shared by a prefix path. Therefore, the node counts and the corresponding header table frequency counts are simply incremented by 1. Since Node(b:2) has no child_node, Node(e:1) is created for item 'e' and inserted as a child_node of Node(b:2). Thereafter, the header table frequency count, the header table node link and the same item node link of the newly created Node(e:1) are updated. The resultant Improved FP-tree is illustrated in Fig. 5.
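The two-scan construction of a complete tree with stack-style node links can be sketched in a few lines. This is a hedged illustration under stated assumptions, not a reproduction of Algorithm 1 and Algorithm 2: the `Node` class, the dictionary-based header table, the alphabetical tie-breaking and the database `toy_db` are all hypothetical, and `toy_db` is not the dataset of Table 2 (it only shares the first transaction {b, d} and the fifth transaction {b, d, e} with the walkthrough).

```python
from collections import Counter


class Node:
    """Tree node; next_same is the same item node link."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next_same = None


def build_complete_tree(transactions):
    # Scan 1: count items and fix a frequency-descending order
    # (ties broken alphabetically, an arbitrary choice for this sketch).
    freq = Counter(item for t in transactions for item in set(t))
    order = sorted(freq, key=lambda item: (-freq[item], item))
    rank = {item: r for r, item in enumerate(order)}

    # Header table: counts start at zero, node links start empty.
    header = {item: {"count": 0, "link": None} for item in order}
    root = Node(None, None)

    # Scan 2: insert every transaction along a shared prefix path.
    for t in transactions:
        node = root
        for item in sorted(set(t), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                # Push onto the same item stack: no list traversal.
                child.next_same = header[item]["link"]
                header[item]["link"] = child
            child.count += 1
            header[item]["count"] += 1
            node = child
    return root, header


# Hypothetical toy database (minimum support 1, so every item is kept).
toy_db = [["b", "d"], ["a", "d"], ["c", "d"], ["a", "b"], ["b", "d", "e"]]
root, header = build_complete_tree(toy_db)
```

With this toy input, the fifth transaction is sorted as {d, b, e}, reuses the existing prefix path d, b and creates a single new node for 'e' below the node for 'b', which mirrors step (e) above.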

Experimental Results Evaluation
In this section, we analyze the performance of the proposed Improved FP-tree construction. The significance of the Improved FP-tree algorithm is established by comparing its performance with the FP-tree, which is constructed using two database scans, and with two incrementally restructuring single scan-based approaches, viz. CP-tree and SSP-tree. Several experiments have been carried out to assess the effectiveness of the Improved FP-tree construction algorithm by considering the total tree construction time, the effect of updated database size and the effect of minimum support threshold change.

Experimental Environment and Datasets
All the tree construction and pattern growth algorithms are coded in C and run on Ubuntu-18.04.2 with 2.67 GHz CPU and 8 GB main memory. To assess the significance of the Improved FP-tree construction algorithm over other alternative tree construction algorithms, we have conducted experiments on both real and synthetic datasets, as well as dense and sparse datasets. The datasets presented in Table 5 are retrieved from the UCI Machine Learning Repository and FIMI Repository.

Complete Tree Construction Time
The initial experiment has been carried out to analyze the total time taken by the proposed tree construction algorithm to construct the Improved FP-tree. This time has been compared to the total tree construction time of FP-tree, CP-tree and SSP-tree for the different datasets mentioned in Table 5. As mentioned above, Improved FP-tree, CP-tree and SSP-tree are complete trees. That means all three tree data structures are independent of the user-defined minimum support threshold, and all the items appearing in a database are considered for constructing the trees, irrespective of their frequency counts. Hence, for a proper comparison, though the FP-tree construction algorithm takes two database scans, we have set the minimum support to 1 to construct the FP-tree. The SSP-tree construction algorithm constructs the tree in a single scan over the database and processes the database transactions one by one. For each transaction, the algorithm performs some restructuring operations based on the updated header table item counts to maintain the FP-tree properties, which consumes a significant amount of time. The CP-tree algorithm also constructs the tree in a single database scan. However, instead of restructuring for each transaction, it restructures periodically, i.e., after inserting a certain number of transactions (a slot), to improve the tree construction time. For this experiment, we have considered a slot size of 10K for CP-tree construction, so the CP-tree construction algorithm performs the restructuring after inserting every 10K transactions into the CP-tree. Table 6 depicts the total time taken by all the above-mentioned tree construction algorithms to construct or restructure the complete trees.
From Fig. 8, it can be observed that Improved FP-tree outperforms all the other three tree construction algorithms for both dense and sparse databases. However, for sparse databases, the advantage of the Improved FP-tree is more prominent. The number of items in a sparse database is very high compared to a dense database. The possibility of sharing tree prefix paths is therefore lower, which lengthens the same item lists and increases the tree size. As a result, every time a new node is inserted into the tree, the FP-tree construction algorithm has to traverse a long list of same item nodes. In addition to the same item list traversals, the CP-tree and SSP-tree construction algorithms also have to restructure the tree prefix paths. Table 7 presents the number of same item list traversals performed by the different algorithms to construct the trees for the different databases.
From Table 7, it can be observed that all three other algorithms perform several traversals over the same item lists, whereas the proposed Improved FP-tree construction algorithm does not perform a single traversal. Though it constructs the Improved FP-tree from scratch using two database scans, it neither performs any same item list traversal nor requires incremental restructuring of the Improved FP-tree to handle incremental scenarios.
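The gap in list traversal work can be made concrete with a small counting experiment. The sketch below is an assumption-laden toy, not a reproduction of the reported measurements: it counts how many existing nodes are visited when a given number of nodes for one item are linked at the rear of a list versus pushed on top of a stack, and the function names are hypothetical.

```python
def tail_append_visits(n_nodes):
    """Nodes visited when each new node is appended at the rear of the list."""
    head = None
    visits = 0
    for k in range(n_nodes):
        new = [k, None]          # [payload, next]
        if head is None:
            head = new
            continue
        cur = head
        visits += 1              # visit the head
        while cur[1] is not None:
            cur = cur[1]
            visits += 1          # visit every later node up to the rear
        cur[1] = new
    return visits


def top_push_visits(n_nodes):
    """Nodes visited when each new node becomes the new top of a stack."""
    top = None
    visits = 0
    for k in range(n_nodes):
        top = [k, top]           # link to the old top; nothing is traversed
    return visits


visits_list = tail_append_visits(1000)
visits_stack = top_push_visits(1000)
```

For a same item list that grows to L nodes, the rear-append variant visits L(L-1)/2 existing nodes in total, while the stack variant visits none, which is why longer lists in sparse databases widen the gap.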

Effect of Updated Database Size
To assess the performance of the Improved FP-tree construction algorithm with respect to database updates, i.e., in the incremental scenario, we have initially conducted experiments on one dense real-life database, "Connect-4", and one sparse synthetic database, "T40I10D100K". For the "Connect-4" database, we have first considered the first 15K transactions as input and executed the tree construction algorithm. Each time, the input database is extended with the next set of 15K transactions. Finally, the remaining 7557 transactions are added to the input. For a fair comparison, we have set the minimum support to 1 for FP-tree, i.e., if an item occurs at least once in the database, it is taken into account to construct the tree. Since the CP-tree performs the restructuring periodically, we have performed the restructuring after inserting every 2.5K transactions in this experiment. The time taken by the proposed Improved FP-tree construction algorithm and the other tree construction algorithms for the different-sized sets of transactions of "Connect-4" is illustrated in Fig. 9. From Fig. 9, it can be observed that in the incremental scenario, our proposed Improved FP-tree construction algorithm outperforms both the approach with reconstruction from scratch and the incrementally restructuring approaches. Dense databases contain a small number of attribute values or items. Therefore, the possibility of sharing items among transactions is very high, and a tree prefix path or a part of it is shared by multiple database transactions, which maximizes the compactness of the tree. Since most of the tree nodes are shared by multiple transactions, only a small number of tree nodes is created and the same item node lists remain relatively short. The same item node list is traversed whenever a new node is created and inserted into the tree.
Though the same item node lists are relatively shorter for dense databases, this does not prevent the FP-tree, SSP-tree and CP-tree from traversing the list whenever a new node is inserted into the tree. On the contrary, in our proposed Improved FP-tree construction algorithm, it is not required to traverse the whole same item node list every time a new node is inserted into the Improved FP-tree. The stack implementation of the same item node list avoids this traversal. As the same item node-link points to the most recently inserted node, i.e., the top node of the stack, we can directly access the current top node and insert the new node as the new top node. This saves a significant amount of time compared to the conventional FP-tree, SSP-tree and CP-tree. A sparse database can be considered the opposite of a dense database, i.e., the database contains a relatively large number of distinct items. Therefore, the possibility of sharing a prefix path is significantly lower than in a dense database, which increases the size (breadth) of the FP-tree and leads to relatively longer same item node lists. Therefore, in the case of a sparse database, the performance of the FP-tree, SSP-tree and CP-tree construction algorithms degrades drastically. For the sparse database "T40I10D100K", we have set the minimum support to 1 and the input database is extended by 20K transactions each time. Similarly, for CP-tree construction, we have performed the restructuring after inserting every 5K transactions. The time taken by the proposed Improved FP-tree construction algorithm and the conventional FP-tree, SSP-tree and CP-tree construction algorithms for the different-sized sets of transactions of the "T40I10D100K" database is illustrated in Fig. 10.
From Fig. 10, it can be observed that in the case of the sparse database, the advantage of the Improved FP-tree construction algorithm over the FP-tree, SSP-tree and CP-tree construction algorithms is much more prominent. For a sparse database, the same item node lists are relatively longer than for dense databases. Therefore, FP-tree, SSP-tree and CP-tree consume a considerable amount of time traversing those lists. Moreover, SSP-tree and CP-tree need to restructure the tree data structures. Fig. 10 shows that when the CP-tree is restructured after inserting every 5K transactions, SSP-tree and CP-tree take almost the same amount of time to construct the trees in the incremental scenario. If the CP-tree is restructured after inserting a smaller number of transactions, the number of CP-tree restructurings increases and the tree construction performance degrades. At some point, the CP-tree may require more time to restructure the tree than the SSP-tree. On the other hand, Improved FP-tree gains a large advantage in execution time over the other tree construction algorithms by avoiding the repetitive traversal of the same item node lists. Fig. 9 and 10 show that though the Improved FP-tree is reconstructed from scratch every time the database gets updated, it takes significantly less time to construct than the trees of the other tree construction algorithms.
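The update schedule used in these experiments, where the tree is rebuilt from scratch on a growing prefix of the database, can be sketched as follows. The helper names are hypothetical, and `build` stands for any tree construction routine. The Connect-4 total of 67,557 transactions is the commonly reported size of that dataset and is consistent with four 15K steps plus the remaining 7557 transactions; the number of 15K steps is an inference, not a figure stated above.

```python
def cumulative_sizes(total, step):
    """Sizes of the growing database after each batch of `step` transactions."""
    sizes = list(range(step, total, step))
    sizes.append(total)  # the last batch holds whatever remains
    return sizes


def rebuild_per_update(transactions, step, build):
    """Rebuild from scratch on each grown prefix; `build` is any tree builder."""
    return [build(transactions[:size])
            for size in cumulative_sizes(len(transactions), step)]


connect4_sizes = cumulative_sizes(67557, 15000)
t40_sizes = cumulative_sizes(100000, 20000)
```

Passing `len` as the `build` argument is a quick way to check which prefix sizes a run would process before plugging in a real construction routine.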

Effect of Minimum Support Threshold Change
SSP-tree, CP-tree and the proposed Improved FP-tree are complete trees, whereas the conventional FP-tree is not. In our experiment, however, we have also constructed the FP-tree with the minimum support set to 1. Therefore, like the complete trees, the FP-tree also maintains all the database transactions without any information loss. Hence, a change in the support threshold does not affect the performance of the tree construction algorithms. The significant advantage of constructing a complete tree is that it is required neither to reconstruct the tree from scratch nor to restructure the tree when the support threshold changes. Moreover, it enables generating frequent patterns for any minimum support threshold without modifying the tree data structure. However, the complete tree also has a significant disadvantage. If the minimum support threshold is very high, only a few database items will be frequent. The complete tree will then consume a considerable amount of physical memory to maintain less interesting or infrequent items, which unnecessarily increases the tree size.
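Because a complete tree keeps the count of every item, the set of frequent items for a new threshold can be read off the header table without touching the tree. The following minimal sketch illustrates this for single items only; the function name and the header counts are hypothetical and do not come from the experiments.

```python
def frequent_items(header_counts, n_transactions, min_supp_percent):
    """Items whose support (in percent) reaches the threshold."""
    return sorted(
        item
        for item, count in header_counts.items()
        # Integer comparison of count / n * 100 against the threshold.
        if count * 100 >= min_supp_percent * n_transactions
    )


# Hypothetical header table counts of a complete tree over 5 transactions.
counts = {"d": 4, "b": 3, "a": 2, "c": 1, "e": 1}

at_60 = frequent_items(counts, 5, 60)
at_40 = frequent_items(counts, 5, 40)
at_20 = frequent_items(counts, 5, 20)
```

The same stored counts answer all three thresholds, which is the flexibility described above. The items 'c' and 'e' illustrate the memory cost: they stay in the tree even when only a high threshold is ever used.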

Computational Complexity Analysis
The conventional FP-tree and the proposed Improved FP-tree construction algorithms can be divided into three phases: header table management, transaction sorting and transaction insertion. The prefix-path restructuring based SSP-tree and CP-tree construction algorithms require an additional phase, prefix-path restructuring, to maintain the FP-tree properties. The time complexity of each phase is analysed in turn. Let D be a dataset of T transactions containing N items, and let M be the size of the longest transaction, where 2 ≤ M ≤ N. Therefore, M is also the tree height.
FP-tree, SSP-tree and CP-tree traverse the whole same item list to insert a transaction item into the tree, so for these three algorithms the cost of linking a new node grows with the length of the same item list. On the other hand, the proposed Improved FP-tree can directly insert a node without traversing the same item node list (constant time). From the time complexity analysis of the tree construction algorithms, it can be observed that the proposed Improved FP-tree construction algorithm is faster than the other tree construction algorithms.
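The effect of the same item list traversal on the node-linking cost can be illustrated with a short derivation. This is a sketch under stated assumptions, not the per-phase expressions of the analysis above: L is a hypothetical symbol for the final length of one item's same item list, and one unit of cost is one visited node.

```latex
% Rear append: the k-th node of an item is linked after visiting the k-1 existing nodes.
\[
  C_{\text{list}}(L) \;=\; \sum_{k=1}^{L} (k-1) \;=\; \frac{L(L-1)}{2} \;=\; O(L^{2})
\]
% Stack push: each new node is linked through the header table in constant time.
\[
  C_{\text{stack}}(L) \;=\; \sum_{k=1}^{L} O(1) \;=\; O(L)
\]
```

Under these assumptions the linking cost per item drops from quadratic to linear in the list length, so the saving is largest when prefix sharing is low and the lists are long.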

Discussion
The requirement of two database scans and the dependency on the user-defined minimum support threshold make the conventional FP-tree technically infeasible for incremental frequent pattern mining. The FP-tree maintains only those items that satisfy the user-defined minimum support threshold. Items having a frequency count greater than or equal to the minimum support threshold are called frequent items, so the FP-tree excludes the infrequent items, which do not meet the threshold. Later on, if the database gets updated, a fresh FP-tree must be reconstructed from scratch, because the infrequent items excluded while constructing the FP-tree for the original database may become frequent after the update. To solve these problems, researchers came up with a new concept called the complete tree, which uses only a single database scan.

The main idea behind a complete tree is to maintain all the database information without any loss. Its most significant advantage is that the FP-tree need not be reconstructed from scratch when the database is updated or the support threshold changes. Whenever the database is updated, the tree can be updated incrementally by performing prefix-path restructuring operations. The existing approaches use split, swap and merge operations to reflect the database update in the complete tree. Most of the incremental approaches perform the restructuring before inserting each transaction. Since restructuring is very costly in terms of execution time, a few approaches restructure the tree only periodically to reduce the computation cost. Figure 8 shows that although the incremental restructuring approaches need only one database scan, they still require more time to construct the complete tree than the complete FP-tree. In the incremental scenario, however, Fig. 9 and 10 show that the incremental approaches take comparatively less time than the FP-tree. We therefore first analysed the impact of database scans on tree construction. Although the FP-tree performs two database scans, the time required to scan the database twice is nominal compared with the complete tree construction time. On this basis, we have proposed and demonstrated an effective tree data structure (Improved FP-tree) construction algorithm in this manuscript.
Improved FP-tree is a complete tree constructed using two database scans. It is an improvement over the conventional FP-tree data structure, hence the name. The main aim of our experiment is to minimize tree construction time, and the Improved FP-tree construction algorithm achieves a substantial gain in total tree construction time. The gain comes from maintaining the same-item list as a stack instead of a simple linked list. The stack implementation bypasses the traversal of the whole list and directly accesses the most recently inserted same-item node in the tree. A new node can be added directly as the new top of the same-item list, which removes the overhead of traversing the whole list every time a node is inserted into the Improved FP-tree and saves a significant amount of time. Figure 8 shows that Improved FP-tree outperforms the other three tree construction algorithms in complete tree construction. The Improved FP-tree construction algorithm builds the tree from scratch using two database scans over the updated database (original database plus newly added transactions). Even so, the experimental results in Fig. 9 and 10 show that it also performs well in the incremental scenario: it outperforms the conventional FP-tree, SSP-tree and CP-tree construction algorithms in runtime in all cases of incremental updates for both sparse and dense databases. However, when the updated database is very large, reconstructing the Improved FP-tree from scratch may not be a suitable way to handle the incremental scenario.
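The difference between the two ways of maintaining the same-item list can be sketched as follows. This is a minimal illustration, not the implementation evaluated in this study: the class names `Node` and `Tree`, the flag `stack_links` and the counter `link_steps` are hypothetical, transactions are assumed to arrive already sorted, and the traversal-based variant is assumed to walk the list from the header entry to its tail.

```python
class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}
        self.link = None  # next node in the same-item list


class Tree:
    def __init__(self, stack_links):
        self.root = Node(None, None)
        self.header = {}            # item -> first node of its same-item list
        self.stack_links = stack_links
        self.link_steps = 0         # hops spent walking same-item lists

    def _link(self, node):
        head = self.header.get(node.item)
        if head is None:
            self.header[node.item] = node
        elif self.stack_links:
            # Stack style: the new node becomes the top of the list.
            node.link = head
            self.header[node.item] = node
        else:
            # Traversal style: walk to the tail, then append.
            cur = head
            while cur.link is not None:
                cur = cur.link
                self.link_steps += 1
            cur.link = node

    def insert(self, transaction):
        node = self.root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self._link(child)
            child.count += 1
            node = child

    def support(self, item):
        # Sum the counts along the same-item list of the given item.
        total, cur = 0, self.header.get(item)
        while cur is not None:
            total += cur.count
            cur = cur.link
        return total


# Hypothetical toy data: item "x" gets a separate node on every path.
data = [["a", "x"], ["b", "x"], ["c", "x"], ["d", "x"]]
walked, stacked = Tree(stack_links=False), Tree(stack_links=True)
for t in data:
    walked.insert(t)
    stacked.insert(t)
print(walked.link_steps, stacked.link_steps, stacked.support("x"))
```

Both variants store the same nodes and report the same support for every item; only the order of the same-item list and the work spent linking a new node differ, which is why the stack-style list can replace the plain linked list without affecting later mining over the node links.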

Conclusion
As mentioned above, the FP-tree-based incremental frequent pattern mining approaches assume that the complete database to be processed can be accommodated in the system's main memory even when the database is updated very frequently. Most of the existing algorithms use at least two database scans to construct the FP-tree. A few methods use only a single scan over the newly added transactions and restructure the tree data structure to handle incremental databases. In this research work, we have proposed a two-scan based tree data structure called Improved FP-tree. The experimental results show that although the Improved FP-tree must be reconstructed from scratch whenever the database is updated, it still requires less time to construct the tree. The proposed Improved FP-tree construction algorithm outperforms the FP-tree, SSP-tree and CP-tree construction algorithms in incremental scenarios for dense and sparse databases. All the approaches, including the Improved FP-tree, are main memory dependent, and the restructuring approaches are not always a suitable way to construct the tree. They can be recommended only if a few transactions are added to the database. If the number of newly added transactions is very high, constructing the tree with the proposed Improved FP-tree construction algorithm will save a significant amount of time. The computational complexity analysis also shows that the proposed Improved FP-tree construction algorithm outperforms the other frequent pattern tree construction algorithms. The main limitation of the proposed Improved FP-tree is that it is main memory dependent: if the tree size exhausts the available main memory during construction, the algorithm will fail to construct the complete Improved FP-tree.
To address the main memory dependence of the proposed Improved FP-tree, in future work we will attempt to develop an approach that can efficiently mine frequent patterns from large-scale databases even if the tree data structure cannot be accommodated in the computer's main memory.