SWITCHING-ALGEBRAIC ANALYSIS OF RELATIONAL DATABASES

There is an established equivalence between relational database Functional Dependencies (FDs) and a fragment of switching algebra that is typically built of Horn clauses. This equivalence pertains to both the concepts and the procedures of the FD relational database domain and the switching-algebraic domain. This study is an exposition of the use of switching-algebraic tools in solving problems typically encountered in the analysis and design of relational databases. The switching-algebraic tools utilized include purely-algebraic techniques, purely-visual techniques employing the Karnaugh map and intermediary techniques employing the variable-entered Karnaugh map. The problems handled include: (a) the derivation of the closure of a Dependency Set (DS), (b) the derivation of the closure of a set of attributes, (c) the determination of all candidate keys and (d) the derivation of irredundant dependency sets equivalent to a given DS and consequently the determination of the minimal cover of such a set. A relatively large example illustrates the switching-algebraic approach and demonstrates its pedagogical and computational merits over the traditional approach.


INTRODUCTION
It has been known for decades that there is an equivalence between relational database Functional Dependencies (FDs) and a fragment of propositional logic (Delobel and Casey, 1973; Fagin, 1977; Sagiv et al., 1981; Fagin, 1982; Russomano and Bonnell, 1999; Zhang, 2009a; 2009b; 2010; YiShun and ChunHua, 2012). Typically, that fragment covers what is known as Horn clauses in switching theory (two-valued Boolean algebra). Table 1 identifies related concepts and procedures in the domains of relational databases and propositional logic or switching theory. The traditional approach for the analysis and design of relational databases is based on the heuristic application of rules of inference. We demonstrate herein that such analysis and design can be facilitated, made more efficient, rendered algorithmic in nature, extended to problems of larger sizes and equipped with insightful visualization through the utilization of well-established and readily available tools of switching algebra.

MATERIALS AND METHODS
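This study utilizes switching-algebraic tools in solving problems typically encountered in the analysis and design of relational databases.

Table 1. Related procedures in the relational-database and switching-algebraic domains: the application of Armstrong's rules of inference (relational databases) corresponds to the iterative-consensus procedure (switching algebra), and the heuristic procedure for closure derivation (relational databases) corresponds to the algorithmic derivation of the complete sum (switching algebra).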

JMSS
The switching-algebraic tools utilized include purely-algebraic techniques (Muroga, 1979; Brown, 1990; Gregg, 1998), purely-visual techniques employing the Karnaugh map (Muroga, 1979; Brown, 1990; Gregg, 1998; Rushdi, 1997) and intermediary or mixed techniques employing the Variable-Entered Karnaugh Map (VEKM) (Muroga, 1979; Rushdi, 1985; 1987; 1997; 2001a; 2001b; Rushdi and Al-Yahya, 2000; 2001a; 2001b; Rushdi and Albarakati, 2012; 2014; Rushdi and Amashah, 2011). The problems handled include: (a) the derivation of the closure of a Dependency Set (DS), (b) the derivation of the closure of a set of attributes, (c) the determination of all candidate keys and (d) the derivation of all irredundant dependency sets equivalent to a given DS and consequently the determination of the minimal cover of such a set. A single relatively large example is used to apply the switching-algebraic approach to each of these problems and to demonstrate its pedagogical and computational merits over the traditional approach.

The Derivation of the Closure of a Dependency Set (DS)
Starting with a set of functional dependencies $A_i \to C_i$, $1 \le i \le n$, that constitutes a set S, we view these dependencies as propositional implications that we denote by the same symbols ($A_i \to C_i$, $1 \le i \le n$), taking the liberty of a slight abuse of notation. According to the Modern Syllogistic Method (MSM) (Blake, 1938; Brown, 1990; Rushdi and Al-Shehri, 2002; Rushdi and Ba-Rukab, 2007; 2008a; 2008b; 2009; Rushdi and Baz, 2007), these implications reduce to the switching equations

$A_i \overline{C_i} = 0, \quad 1 \le i \le n,$ (1)

which subsequently reduce to the single equation

$g = \bigvee_{i=1}^{n} A_i \overline{C_i} = 0,$ (2)

which can be used to produce the equivalent result

$CS(g) = 0,$ (3)

where CS(g) stands for the complete sum of the function g. We derive CS(g) via any appropriate algorithm such as the Tison algorithm (Tison, 1967; Cutler et al., 1979; Brown, 1990; Rushdi and Al-Yahya, 2001; Rushdi and Albarakati, 2014), or the algorithm of VEKM folding (Rushdi and Al-Yahya, 2001). Each of the prime implicants (prime consequents) in CS(g) is interpreted as an equation of the form (1) and hence converted to a propositional implication or equivalently to a functional dependency that is a member of S+.
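As a small illustration of Equations (1) to (3), consider a hypothetical two-dependency set S = {A → B, B → C}, which is not part of the running example and is used here only to show the mechanics:

```latex
\begin{align*}
A \to B,\; B \to C \;&\Longrightarrow\; A\bar{B} = 0,\quad B\bar{C} = 0,\\
g &= A\bar{B} \vee B\bar{C} = 0,\\
CS(g) &= A\bar{B} \vee B\bar{C} \vee A\bar{C} = 0 .
\end{align*}
```

The consensus of $A\bar{B}$ and $B\bar{C}$ with respect to the bi-form variable B is the new prime implicant $A\bar{C}$, which reads back as the dependency A → C. In this small case, transitivity is obtained by consensus generation rather than by an explicit rule of inference.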

Example 1
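Consider the set of FDs described by (Dates, 2004):

S = {AB → C, C → A, BC → D, ACD → B, BE → C, CE → FA, CF → BD, D → EF}. (4)

We want to derive the closure S+ of the set S. We first transform the set S from the relational domain to the Boolean domain as

$g = AB\bar{C} \vee C\bar{A} \vee BC\bar{D} \vee ACD\bar{B} \vee BE\bar{C} \vee CE\bar{A} \vee CE\bar{F} \vee CF\bar{B} \vee CF\bar{D} \vee D\bar{E} \vee D\bar{F} = 0.$ (5)

Now, we derive CS(g) via the improved Tison algorithm (Rushdi and Al-Yahya, 2001) as detailed in Fig. 1. The final complete sum (after six iterations of consensi generation with respect to each bi-form variable, followed by absorptions of subsuming products) is

$CS(g) = AB\bar{C} \vee AB\bar{D} \vee AB\bar{E} \vee AB\bar{F} \vee BC\bar{D} \vee BC\bar{E} \vee BC\bar{F} \vee BD\bar{A} \vee BD\bar{C} \vee BE\bar{A} \vee BE\bar{C} \vee BE\bar{D} \vee BE\bar{F} \vee C\bar{A} \vee CD\bar{B} \vee CE\bar{B} \vee CE\bar{D} \vee CE\bar{F} \vee CF\bar{B} \vee CF\bar{D} \vee CF\bar{E} \vee D\bar{E} \vee D\bar{F}.$ (6)

Note that CS(g) consists of 23 prime implicants, each of which has a single complemented literal.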
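The derivation of the complete sum can be sketched in a few lines. The sketch below uses plain iterative consensus with absorption, not the improved Tison algorithm of Fig. 1, and all function names are hypothetical. The FD list is an assumption: it is taken here to be the eight-dependency set AB → C, C → A, BC → D, ACD → B, BE → C, CE → FA, CF → BD, D → EF, which is consistent with the numbered FDs and the 23 prime implicants reported in this example.

```python
from itertools import combinations

# Assumed dependency set S as (left side, right side); attributes are single letters.
S = [("AB", "C"), ("C", "A"), ("BC", "D"), ("ACD", "B"),
     ("BE", "C"), ("CE", "FA"), ("CF", "BD"), ("D", "EF")]


def fds_to_terms(fds):
    """Map each FD X -> Y to the products X * not(c), one per attribute c of Y.

    A product term is a pair (pos, neg) of frozensets of variable names.
    """
    return {(frozenset(x), frozenset(c)) for x, y in fds for c in y if c not in x}


def consensus(s, t):
    """Return the consensus of two terms, or None when it does not exist."""
    (p1, n1), (p2, n2) = s, t
    opposed = (p1 & n2) | (n1 & p2)
    if len(opposed) != 1:  # consensus needs exactly one opposition
        return None
    return ((p1 | p2) - opposed, (n1 | n2) - opposed)


def absorbs(s, t):
    """True when every literal of s also appears in t (s absorbs t)."""
    return s[0] <= t[0] and s[1] <= t[1]


def absorb(terms):
    """Delete every term that subsumes another term of the set."""
    return {t for t in terms if not any(s != t and absorbs(s, t) for s in terms)}


def complete_sum(terms):
    """Iterate consensus generation and absorption until nothing new appears."""
    terms = absorb(terms)
    changed = True
    while changed:
        changed = False
        for s, t in combinations(list(terms), 2):
            c = consensus(s, t)
            if c is not None and not any(absorbs(u, c) for u in terms):
                terms = absorb(terms | {c})
                changed = True
                break  # restart the scan over the updated set
    return terms


def as_fd(term):
    """Read a prime implicant X * not(c) back as the dependency X -> c."""
    pos, neg = term
    return "".join(sorted(pos)) + "->" + "".join(sorted(neg))


cs = complete_sum(fds_to_terms(S))
closure_fds = sorted(as_fd(t) for t in cs)
print(len(closure_fds), closure_fds)
```

Each printed string is a non-trivial dependency with an irreducible left side and a single consequent, which is how the prime implicants of CS(g) are interpreted as members of S+.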

The Derivation of the Closure of a Set of Attributes
Given a set Z of attributes of a relational variable (relvar) R and a set S of FDs that hold for R, the closure Z+ of Z under S is the set of all attributes A of R such that the FD {Z → A} is a member of S+ (i.e., such that the FD {Z → A} is implied by the FDs in S). Thanks to reflexivity or self-determination {A → A}, this definition agrees with the requirement that Z is a subset of Z+. If we express the given set of FDs as a single equation (g = 0) and then convert its LHS g into the complete-sum form CS(g), as we did earlier, then we can deduce the closure Z+ of Z as follows. Initially, any attribute in Z is added to Z+. Subsequently, a single pass is made through the set of prime implicants P_i whose disjunction constitutes CS(g). If the uncomplemented variables in P_i represent a subset of Z, then the attribute represented by the single complemented variable in P_i is added to Z+ (if it is not already there). If the closure Z+ of a set of attributes Z equals the total set of attributes of R, then Z constitutes a superkey. If such a set Z is irreducible, then this superkey is a candidate key.

Example 1 (Revisited)
Consider the relational variable (relvar) R with attributes A, B, C, D, E and F and the dependency set described by (4). Equation (6) can be utilized to deduce the closure of any subset of the set of attributes {A, B, C, D, E, F}. We can ascertain that {C}+ is given by

{C}+ = {A, C}, (8)

thanks to the property of self-determination {C → C} and the existence of the prime implicant $C\bar{A}$ in (6), which is equivalent to the dependency {C → A}. Similarly,

{D}+ = {D, E, F}, (9)

since the existence of the prime implicants $D\bar{E}$ and $D\bar{F}$ in (6) is equivalent to the dependencies {D → E, D → F}. Together with {D → D}, these dependencies result in (9). Likewise, the closure {B, C}+ equals {A, B, C, D, E, F} thanks to self-determination and the appearance of the prime implicants $C\bar{A}$, $BC\bar{D}$, $BC\bar{E}$ and $BC\bar{F}$ in (6). The result that

{C, D}+ = {A, B, C, D, E, F} (10)

can be accounted for similarly by noting that {C, D}+ is a superset of the union of {C}+ and {D}+ in (8) and (9) and including the effect of the prime implicant $CD\bar{B}$ (CD → B) in (6). This means that CD is a superkey for the FDs in (4). Since neither {C}+ nor {D}+ covers all the attributes, this superkey is irreducible (i.e., it is a candidate key).
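The single-pass closure procedure can be sketched as follows. The list of prime implicants is deliberately partial: it holds only the dependencies that this example names explicitly (C → A, D → E, D → F, BC → D, BC → E, BC → F, CD → B), which is enough for the four subsets discussed here but not for arbitrary subsets. Function and variable names are hypothetical.

```python
# Prime implicants of CS(g) named in this example, written as
# (uncomplemented variables, single complemented variable).
NAMED_PRIME_IMPLICANTS = [
    ("C", "A"), ("D", "E"), ("D", "F"),
    ("BC", "D"), ("BC", "E"), ("BC", "F"),
    ("CD", "B"),
]

ALL_ATTRIBUTES = set("ABCDEF")


def attribute_closure(z, prime_implicants):
    """Closure of the attribute set z by one pass over the prime implicants."""
    z = set(z)
    closure = set(z)  # self-determination: every attribute of z is in its closure
    for determinant, attribute in prime_implicants:
        # The test is against z itself, not against the growing closure:
        # a single pass is enough when the list is a complete sum.
        if set(determinant) <= z:
            closure.add(attribute)
    return closure


def is_superkey(z, prime_implicants, all_attributes=ALL_ATTRIBUTES):
    """A set of attributes is a superkey when its closure is the full attribute set."""
    return attribute_closure(z, prime_implicants) == set(all_attributes)


for subset in ("C", "D", "BC", "CD"):
    closure = attribute_closure(subset, NAMED_PRIME_IMPLICANTS)
    print(subset, sorted(closure), is_superkey(subset, NAMED_PRIME_IMPLICANTS))
```

Note that {B, C}+ picks up the attribute A through C → A, so BC determines all six attributes, in agreement with its later appearance among the candidate keys.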

The Determination of All Candidate Keys
Since a natural (albeit typically redundant) superkey is the conjunction of all attributes $X_i$, we might supplement the set of FDs by an extra dependency of the form

$\bigwedge_{i=1}^{n} X_i \to K,$ (11)

where K is an additional attribute that stands for "Key". Now, the function g in (5) is replaced by a function f given by

$f = g \vee \left(\bigwedge_{i=1}^{n} X_i\right)\bar{K}.$ (12)

The prime implicants in CS(f) that contain the literal $\bar{K}$, i.e., that are of the form

$\left(\bigwedge_{j \in J} X_j\right)\bar{K},$ (13)

express the dependency of K on the set J of attributes, and hence such sets J of attributes are superkeys. In fact, they are candidate keys since they are irredundant, because they correspond to prime implicants. Therefore, one can obtain all the candidate keys by obtaining the complete sum of the function f defined by (12). A candidate key corresponds to the uncomplemented literals in any prime implicant that contains the complemented literal $\bar{K}$. This scheme agrees with the procedure in Zhang (2009a). Now, since K is a mono-form variable, it plays no role in the consensus generation used in complete-sum derivation. Hence, we can dispense with it altogether and rewrite (12) with $\bar{K}$ deleted, i.e., in the form

$f = g \vee \bigwedge_{i=1}^{n} X_i.$ (14)

In this new scheme, the final result for CS(f) is given by Equation (15), in which the prime implicants $\bigwedge_{j \in J} X_j$ of solely uncomplemented literals are the candidate keys. This scheme agrees with the procedure in (Russomano and Bonnell, 1999). It could be enhanced if CS(g) is already available, for then we redefine f in (14) by the equivalent formula

$f = CS(g) \vee \bigwedge_{i=1}^{n} X_i.$ (16)

Equation (16) suggests a method employing the Tison algorithm incrementally (Kean and Tsiknis, 1990) for deriving the complete sum. In fact, we can restrict our attention to the consensi generated between terms without complemented literals (initially the single term $\bigwedge_{i=1}^{n} X_i$) and the various terms of CS(g) w.r.t. bi-form variables traversed one by one. The final set of these consensi is the set of all candidate keys. To implement the aforementioned method, let us use $G_k$ to denote the terms of CS(g) that contain variable number k complemented and $U_k$ to denote the disjunction of uncomplemented implicants of f at step k. At each step of the incremental Tison algorithm, $U_k$ is updated via Equation (17). Here, ABS(F) is an absorptive formula for the sum-of-products formula F, i.e., a formula obtained from F by successive deletion or absorption of terms subsuming other terms in F (Brown, 1990; Rushdi and Al-Yahya, 2001). The expression $U_k$ can be interpreted as a disjunction of superkeys. Its initial value is $\bigwedge_{i=1}^{n} X_i$ and its final value is the disjunction of all prime implicants of CS(f) with solely uncomplemented literals, which are the candidate keys.

Example 1 (Revisited)
We want to determine all candidate keys for the set of FDs given in (4). We construct a function f(A, B, C, D, E, F) which equals the function g in (5) (or its complete sum in (6)) ORed with the term ABCDEF, which is the conjunction or ANDing of all pertinent variables, all uncomplemented. Table 2 shows an implementation of the incremental Tison algorithm via (17). The disjunction of the prime implicants of CS(f) with uncomplemented literals is

$CD \vee CE \vee CF \vee AB \vee BC \vee BD \vee BE.$ (18)

This result means that there are seven candidate keys for the given set of FDs, namely, CD, CE, CF, AB, BC, BD and BE. It is straightforward to verify that each of these keys is indeed a candidate key by inspecting CS(g) in (6). For example, BC is a superkey due to the existence of the prime implicants $C\bar{A}$, $BC\bar{D}$, $BC\bar{E}$ and $BC\bar{F}$. Further, it is a candidate key since it is irreducible.
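The restriction of consensus generation to uncomplemented terms can be sketched as follows. This is a simplified variant, not the tabular update of (17): it forms consensi between the current superkeys and the terms of g itself, in no particular variable order, and applies absorption after each new consensus. The FD list is an assumption (the eight dependencies AB → C, C → A, BC → D, ACD → B, BE → C, CE → FA, CF → BD, D → EF, which reproduce the seven keys stated above), and the function names are hypothetical.

```python
ATTRIBUTES = "ABCDEF"

# Assumed dependency set as (left side, right side).
S = [("AB", "C"), ("C", "A"), ("BC", "D"), ("ACD", "B"),
     ("BE", "C"), ("CE", "FA"), ("CF", "BD"), ("D", "EF")]


def singleton_fds(fds):
    """Split X -> Y into X -> c; each pair stands for the product X * not(c)."""
    return [(frozenset(x), c) for x, y in fds for c in y if c not in x]


def candidate_keys(attributes, fds):
    """All candidate keys, by consensus of uncomplemented terms with terms of g."""
    horn_terms = singleton_fds(fds)
    keys = {frozenset(attributes)}  # initial superkey: all attributes ANDed
    changed = True
    while changed:
        changed = False
        for term in list(keys):
            for lhs, c in horn_terms:
                if c not in term:
                    continue  # no opposition on c, hence no consensus
                # Consensus of the uncomplemented term with X * not(c) w.r.t. c.
                new = (term - {c}) | lhs
                if any(k <= new for k in keys):
                    continue  # absorbed by a superkey already present
                # Keep the new superkey and delete the ones it absorbs.
                keys = {k for k in keys if not new <= k} | {new}
                changed = True
    return sorted("".join(sorted(k)) for k in keys)


print(candidate_keys(ATTRIBUTES, S))
```

Every term produced is a superkey, because the replaced attribute c is recoverable from the added left side, and the process stops only when no consensus yields a set that is not already covered by a smaller superkey.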

The Derivation of Irredundant Dependency Sets Equivalent to a Given DS
An Irredundant Disjunctive Form (IDF) for a switching function g is a disjunction of prime implicants such that removal of any of the prime implicants makes the remaining formula not express the original g (Muroga, 1979). This means that an IDF for g is a minimal sub-formula of CS(g) that covers g (Rushdi and Al-Yahya, 2002). The corresponding entity in the relational domain, namely, the Irredundant Dependency Set (IDS), is defined similarly (see, e.g., Dates, 2004), with an additional requirement that a functional dependency of multiple consequents be decomposed into several FDs of single consequents. For example, the FD A → BCD must be replaced by the FDs A → B, A → C and A → D, which map into the terms or products $A\bar{B}$, $A\bar{C}$ and $A\bar{D}$, which can fit into a disjunctive form. An irredundant disjunctive form (and correspondingly, an irredundant dependency set) is not necessarily unique.
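The decomposition into single consequents and the removal test can be sketched as follows. The sketch uses the ordinary textbook attribute-closure iteration in the relational domain rather than a map method, it checks only that no FD can be dropped (it does not check that each left side is irreducible), and the three-FD set used for the redundancy check is hypothetical. Function names are also hypothetical.

```python
def decompose(fds):
    """Replace every FD X -> Y by the single-consequent FDs X -> c."""
    return [(frozenset(x), c) for x, y in fds for c in y]


def closure(attributes, fds):
    """Textbook attribute closure: apply the FDs until nothing new is added."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, c in fds:
            if lhs <= result and c not in result:
                result.add(c)
                changed = True
    return result


def redundant_fds(fds):
    """The FDs that are implied by the remaining members of the set."""
    return [fd for i, fd in enumerate(fds)
            if fd[1] in closure(fd[0], fds[:i] + fds[i + 1:])]


def is_irredundant(fds):
    """True when removing any single FD changes the closure of the set."""
    return not redundant_fds(fds)


# The decomposition example from the text: A -> BCD becomes A -> B, A -> C, A -> D.
single = decompose([("A", "BCD")])

# Hypothetical set in which A -> C is implied by A -> B and B -> C.
toy = decompose([("A", "B"), ("B", "C"), ("A", "C")])
print(single, redundant_fds(toy), is_irredundant(toy[:2]))
```

Because an irredundant set is not necessarily unique, a test of this kind can only confirm that a given set is irredundant; it does not enumerate the alternatives.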
There are many algebraic, tabular or mapping methods for obtaining all the IDFs of a switching function (Muroga, 1979). Most of these methods are algorithms that use the complete sum as a starting point when it is available, or more generally act in a 2-step fashion by finding the complete sum first before proceeding to derive the IDFs. There are other heuristic methods for obtaining the IDFs directly, such as those employing the conventional Karnaugh map (Muroga, 1979; Brown, 1990; Gregg, 1998; Rushdi, 1997) or the Variable-Entered Karnaugh Map (VEKM) (Muroga, 1979; Rushdi, 1985; 1987; 1997; 2001a; 2001b; Rushdi and Al-Yahya, 2000; 2001a; 2001b; Rushdi and Albarakati, 2012; 2014; Rushdi and Amashah, 2011). These heuristic methods are not guaranteed to find all the IDFs, but they typically find most of them, including the best or minimal among them. We now apply a VEKM minimization procedure to our running example.

Example 1 (Revisited)
For comparison purposes, we present herein the traditional heuristic for obtaining the IDSs that are equivalent to the DS in (4). The first step is to rewrite the given set in (4) such that every FD has a singleton consequent or right side. We denote the resulting set as set I.

Table 2. Derivation of CS(f) in (16) via the incremental Tison method
So far, we have identified 5 irredundant sets (Set II to Set VI). However, the total number of irredundant sets is at least 16. As Fig. 2 indicates, FDs 1, 2, 10 and 11 are included in every identified irredundant set. These are supplemented by (a) either FD3 or FD3a, (b) either FD5 or FD5a, (c) either FD7 or FD7a and (d) either FD8 or a combination of FD9 and FD12. Referring to the original set (Set I), we note that both FD4 {ACD → B} and FD6 {CE → A} have disappeared entirely, being non-prime implicants (since they subsume the prime implicants $CD\bar{B}$ (CD → B) and $C\bar{A}$ (C → A), respectively). The result obtained in Fig. 2 can be restated to express the corresponding irredundant disjunctive forms as

$IDF = P_1 \vee P_2 \vee P_3\{P_{3a}\} \vee P_5\{P_{5a}\} \vee P_7\{P_{7a}\} \vee P_8\{P_9 \vee P_{12}\} \vee P_{10} \vee P_{11},$ (19)

where the prime implicants involved are drawn from the 23 prime implicants of g that appear in CS(g) in (6). The notation $P_3\{P_{3a}\}$ with curly brackets means that either prime implicant $P_3$ or prime implicant $P_{3a}$ is included in the IDF. Since Equation (19) has 4 such binary alternatives, it represents $2^4 = 16$ IDFs. Out of these, there are 8 minimal covers (the ones employing $P_8$ rather than its alternative $(P_9 \vee P_{12})$). However, due to the non-algorithmic nature of the current heuristic, we are not fully sure that we have exhausted all IDFs and hence all minimal covers.
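The counting argument can be checked by enumerating the four binary alternatives. The sketch below works with labels only (P1, P3a and so on stand for the prime implicants that correspond to FD1, FD3a and so on); it does not verify that each combination covers g, and "minimal" is taken to mean the fewest prime implicants. Names are hypothetical.

```python
from itertools import product

# Prime implicants present in every identified irredundant set.
FIXED = ["P1", "P2", "P10", "P11"]

# Each entry lists the two interchangeable choices of one binary alternative.
ALTERNATIVES = [
    (["P3"], ["P3a"]),
    (["P5"], ["P5a"]),
    (["P7"], ["P7a"]),
    (["P8"], ["P9", "P12"]),
]


def enumerate_idfs(fixed=FIXED, alternatives=ALTERNATIVES):
    """List every combination of choices, one choice per alternative."""
    return [fixed + [p for choice in combo for p in choice]
            for combo in product(*alternatives)]


idfs = enumerate_idfs()
fewest = min(len(idf) for idf in idfs)
minimal_covers = [idf for idf in idfs if len(idf) == fewest]

print(len(idfs), len(minimal_covers), fewest)
```

The forms that use P8 contain eight prime implicants each, while those that use P9 together with P12 contain nine, which is why only the former qualify as minimal covers.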
The result of (19) is now recovered via VEKM minimization in the switching domain. We use the Boole-Shannon expansion (Brown, 1990; Rushdi and Al-Yahya, 2001) to obtain the VEKM representation of the function g in (5) as shown in Fig. 3. An almost minimal s-o-p expression for the function g is given (Rushdi, 1987; Rushdi and Al-Yahya, 2000; 2001) by

$g = \bigvee_{r} P_r \, Co(P_r),$ (20)

where $P_r$ is a prime implicant of one or more of the subfunctions of the function g, i.e., it is a product that appears in at least one VEKM cell. Each $P_r$ is ANDed with its minimal s-o-p contribution to g, namely $Co(P_r)$. This contribution can be represented by a Conventional Karnaugh Map (CKM) directly deducible from the original VEKM according to heuristic rules stated in (Rushdi, 1987; Rushdi and Al-Yahya, 2000; 2001b). Note that $Co(P_r)$ is a function of the map variables of the VEKM only, while $P_r$ itself is a function of the entered variables of the VEKM only. In particular, the entered product $E\bar{F}$ adds $P_7 = CE\bar{F}$ if $P_{7a}$ is not added, and $P_8 = CF\bar{B}$ is added if $P_9$ and $P_{12}$ are not jointly added.

Fig. 10. Contribution of the entered product D, which adds $P_{12} = CD\bar{B}$ to join $P_9$ in replacing $P_8$

Fig. 11. Contribution of the entered product $\bar{D}F$, which adds $P_9 = CF\bar{D}$ to join $P_{12}$ in replacing $P_8$

In this way, the result of (19) is recovered via the VEKM heuristics.

DISCUSSION
This study presented a switching-algebraic analysis of four important and related problems of relational databases, namely, (a) the problem of deriving the closure of a set of functional dependencies, (b) the problem of deriving the closure of a set of attributes, (c) the problem of deriving all candidate keys and (d) the problem of deriving irredundant dependency sets equivalent to a set of functional dependencies and the determination of the minimal cover of such a set. Three of these four problems were shown to require the derivation of the complete sum (Blake canonical form) of certain switching functions obtained via an appropriate transformation of the relational functional dependencies. These three problems were solved by two updated versions of the Tison algorithm. The fourth problem could also have been solved by the derivation of a complete sum and the construction of a presence function (Petrick function) (Muroga, 1979; Rushdi and Al-Yahya, 2002), but it was solved herein via a heuristic utilizing the Variable-Entered Karnaugh Map (VEKM). All four problems were presented in ample detail and their methods of solution were demonstrated via the same relatively large example.

CONCLUSION
This study is a serious attempt to transform relational database concepts to the switching-algebraic domain and hence to utilize switching-algebraic procedures and concepts in relational databases. This attempt stresses the pedagogical advantages gained when one departs from the traditional database approach and utilizes pictorial tools of switching theory and digital design, such as the Variable-Entered Karnaugh Map (VEKM). The traditional database approach, adopted by almost all textbooks on database design, is based on the heuristic application of axioms and lemmas for the manipulation of functional dependencies. This study replaces the traditional heuristics in the relational domain by more powerful and insightful heuristics and algorithms in the switching domain.
An interesting topic for further research stems from the fact that the function dealt with in deriving the closure for dependency sets is a disjunction of terms derived from particular Horn clauses. Each of these terms is a product (ANDing) of a single complemented literal with some other uncomplemented literals. This feature should be studied with the hope of simplifying the algorithm that extracts all the prime implicants. A pertinent question in this respect is whether a linear representation of a switching function (e.g., Rushdi and Ghaleb, 2013; Rushdi and Alsogati, 2013) could provide any advantage over the current sum-of-products representation.
Another promising direction of potentially fruitful research is to utilize the switching-domain tools of this study in the implementation of the conceptual database design model advocated by Hegazi (2014).

ACKNOWLEDGEMENT
This article was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah. The authors, therefore, acknowledge with thanks DSR technical and financial support.