© 2010 Science Publications
Specialization of Recursive Predicates from Positive Examples Only

Problem statement: Given an overly general (definite) program P and its intended semantics Φ (the programmer's intentions), where P does not satisfy Φ, find a new version P' of P such that P' satisfies Φ. Approach: We proposed an approach for correcting overly general programs from positive examples by exploiting program synthesis techniques. The synthesized program, P', is a specialization of the original one, P. In contrast to previous approaches to logic program specialization, no negative examples are given as input; they are discovered by the algorithm itself. The specialization process is driven by the positive examples only. A method for refining logic programs into specialized versions was then proposed. Results: The proposed approach was able to correct overly general programs using positive examples. We showed that positive examples can also be used for inducing finite-state machines (success sequences) that model the correct program. The failing sequences could also be exploited by theorem provers to produce counter-examples, as in model checking, by composing the substitutions used for inducing failing sequences. Conclusion: The contribution of the study was mainly the use of specification predicates to specialize an overly general logic program.


INTRODUCTION
Our aim is to present a top-down approach for logic program specialization w.r.t. the intended specification, which is a first-order formula of the following form: ∀x ∃Y φ(x, Y) = ∀x ∃Y Γ(x, Y) ← ∆(x) (or Γ ← ∆ for short), where Γ and ∆ are conjunctions of atoms. The problem we are interested in can be stated as follows:
Given: An overly general (definite) program P = (E+ ∪ C), where E+ is a recursive sub-program defining the positive examples and C is the set of clauses defining the overly general predicates (i.e., the incorrect component of P), and the intended semantics φ for P (the programmer's intentions) with M(P) ⊭ φ, where M(P) denotes the least Herbrand model of P.
Find: A definite program D, called a specialization of C, such that M(D) ⊆ M(C) and M(P') |= φ where P' = (P\C) ∪ D.
The program P' is called a correct specialization of P w.r.t. E+ if M(P') ⊆ M(P), M(E+) ⊆ M(P') and, for any negative example e−, M(P') ⊭ e−.
Roughly, the approach takes an overly general program P and its intended semantics φ and tries to produce a program P' that is guaranteed to satisfy the specification and therefore does not require verification. We outline a top-down method for synthesizing a correct and consistent logic program P' that satisfies the given specification. Moreover, the negative examples E− correspond to ground atoms that are not deducible from P' and are automatically discovered during the specialization process.
For example, assume we are given the overly general program P = (E+ ∪ C) where:
E+ = { even(0) ←, even(s^2(n)) ← even(n) }
C = { len([], 0) ←, len([a | x], s(n)) ← len(x, n) }
and its intended specification:
φ: even(n) ← len(x, n)
which is supposed to establish the claim "if n is the length of list x, then n is even". For this specification P is false, as we discover while attempting to prove it: there are particular values of the list x that generate negative examples. This is the case when the number of elements in x is odd, and the negative examples even(s^(2k+1)(0)), k = 0,...,n, are then generated (s is the successor function). But with the specialized version D of C, the new program P' = (P\C) ∪ D satisfies φ up to renaming the predicate len to len2, which is defined by D. The new predicate len2 is called a specialization predicate of φ w.r.t. the predicate len. The proposed method consists in synthesizing D.
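The way negative examples arise in this introductory example can be illustrated with a small bounded check. The following is only a sketch, not the proof procedure of the paper: naturals s^k(0) are encoded as Python integers, lists as Python lists, and the bound `MAX_LEN` and the function names are assumptions introduced for illustration.

```python
# Bounded check of the specification  even(n) <- len(x, n)  against the
# program P = E+ ∪ C of the running example. The bound MAX_LEN and all
# function names are assumptions of this sketch, not part of the paper.

MAX_LEN = 6


def even(n: int) -> bool:
    """E+: even(0) <-  and  even(s(s(n))) <- even(n)."""
    if n == 0:
        return True
    if n >= 2:
        return even(n - 2)
    return False


def length(xs: list) -> int:
    """C: len([], 0) <-  and  len([a|x], s(n)) <- len(x, n)."""
    if not xs:
        return 0
    return 1 + length(xs[1:])


def negative_examples(max_len: int) -> list:
    """Atoms even(n) required by the specification but not derivable from E+."""
    found = []
    for size in range(max_len + 1):
        n = length(["a"] * size)
        if not even(n):
            found.append(f"even(s^{n}(0))")
    return found


print(negative_examples(MAX_LEN))
# ['even(s^1(0))', 'even(s^3(0))', 'even(s^5(0))']
```

Only lists of odd length produce atoms that E+ cannot derive, which matches the negative examples even(s^(2k+1)(0)) mentioned above.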
Throughout the study, Γ, ∆ and Λ denote conjunctions of atoms; φ denotes the intended specification (a first-order formula); A and B denote atoms and θ denotes a substitution. In all formulas, existentially quantified variables are distinguished from universal variables by giving them upper-case letters. A program is a set of definite clauses, denoted by calligraphic letters: P, Q, ....

MATERIALS AND METHODS
Let C be a conjunction of atoms. Then µ(C) = ∅ if C is the predicate true and the multi-set of the atoms of C otherwise.
Definition 1: A conjunction of atoms C_1 is a specialization of (or is syntactically less general than) a conjunction of atoms C_2 (denoted C_1 ≺ C_2) with a substitution θ iff µ(C_2 θ) ⊆ µ(C_1) (Flener and Deville, 1993).
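Definition 1 can be read as a multiset inclusion test once the substitution θ is given. The sketch below assumes a simplified encoding that is not from the paper: an atom is a tuple (predicate, arg, ...) whose arguments are plain strings (no nested terms), a conjunction is a list of atoms (the empty list stands for true), and θ is a dictionary from variable names to terms.

```python
from collections import Counter


def apply_subst(atom, theta):
    """Apply the substitution theta (variable name -> term) to a flat atom."""
    pred, *args = atom
    return (pred, *[theta.get(a, a) for a in args])


def is_specialization(c1, c2, theta):
    """Return True iff mu(c2 theta) is a sub-multiset of mu(c1)."""
    mu_c1 = Counter(c1)
    mu_c2_theta = Counter(apply_subst(a, theta) for a in c2)
    return all(mu_c1[atom] >= count for atom, count in mu_c2_theta.items())


# len(x, n), len2(x, n) is a specialization of len(y, m) with {y/x, m/n}.
c1 = [("len", "x", "n"), ("len2", "x", "n")]
c2 = [("len", "y", "m")]
print(is_specialization(c1, c2, {"y": "x", "m": "n"}))  # True
```

Because µ is a multiset, a conjunction that repeats an atom is only covered by a conjunction containing at least as many copies of that atom.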
The following definition expresses the relation of generality between two Horn clauses.
The following definition expresses the relation of generality between two logic programs.
Definition 3: Let P_1 and P_2 be two definite logic programs, {c_1,...,c_n} the set of clauses of P_1 and {d_1,...,d_m} the set of clauses of P_2. P_1 is a specialization of P_2 (denoted P_1 ≺ P_2) iff for all 1 ≤ i ≤ n, there exists 1 ≤ j ≤ m s.t. c_i ≺ d_j.
Definition 4 (Specialization predicate): Let P_1 and P_2 be two definite programs defining the predicates p_1 and p_2 respectively. If P_1 is a correct specialization of P_2, i.e., p_1 ≺ p_2, w.r.t. the intended specification φ, then p_1 is called a specialization predicate of p_2 w.r.t. φ.
Proposition 1: Let φ: Γ ← ∆, p_2 be the intended specification of the program P. If p_1 is a specialization predicate of φ with respect to the predicate p_2, then we have M(P ∪ P_1) |= ((Γ, ∆, p_2) ← p_1), where P_1 defines the predicate p_1.

Proposition 2:
If p_1 is a specialization predicate of φ with respect to the predicate p_2, then the two formulas (1) and (2) are equivalent:
Proof: It is easy to see that the formula (1) is equivalent to: Moreover, if p_1 ≺ p_2, (p_2 ← p_1) is a theorem. Therefore, the formula (3) is equivalent to:
For example, the formula (even(n) ← len(x, n), len2(x, n)) is equivalent to even(n) ← len2(x, n).

Notation 1:
Hereafter and for the sake of simplicity, the notation <φ|p> stands for <φ←p>.
In Definitions 5-8, we define the semantic calculus that, given P and its intended specification φ such that P is faulty w.r.t. φ, allows us to find P' such that P' satisfies φ.

Transformation rules:
The algorithm applies the transformation rules unfolding, folding and simplification (Sakurai and Motoda, 1988). Intuitively, unfolding is an extension of SLD-resolution and folding applies the induction hypotheses. Indeed, whereas an unfold step replaces a term that "matches" the conclusion of a definition in the program by the corresponding hypothesis, a folding step replaces a conjunction of atoms that matches the hypothesis of an induction hypothesis by the corresponding conclusion.
There are two kinds of unfolding rules: the negation as failure inference (nfi for short), which replaces a predicate call at the right-hand side by the corresponding body, and the definite clause inference (dci for short), which replaces a predicate call at the left-hand side by the corresponding body (Sakurai and Motoda, 1988).
To specialize the original program, it is vital to keep track of the substitutions in the specialization predicates, denoted IO (for Input Output). Therefore, each transformation rule is associated with a procedure for constructing the corresponding specialization predicates. The application of an unfolding rule on a formula φ_0 generates a finite set of formulas φ_i, i = 1,...,k, such that φ_0 follows from the φ_i's in the least Herbrand model of the program under consideration. Each formula φ_i is associated with a specialization predicate, as it can be an overly general clause, defined by a program denoted Q_R, where R is the applied rule. If φ_i is trivially true, its associated specialization predicate, IO_i, is set to true. If φ_i is trivially false (it covers only negative examples), then its associated specialization predicate is set to false and, in this case, no clause containing this predicate is included in the synthesized program D. The process is iterated until all newly generated formulas are trivial. The arguments of the input-output predicate IO_i are those that appear in the corresponding formula φ_i.
The folding rule (cutr for short) is necessary for synthesizing recursive predicates.
Definition 5 (negation as failure inference): Let P be a program, φ_0: Γ ← ∆, A a formula and C = {c_1,...,c_k} the set of clauses of P such that c_i: B_i ← ∆_i and suppose there is a substitution θ_i s.t. B_i θ_i = Aθ_i. Then the application of the rule of nfi on φ_0 w.r.t. the atom A yields a conjunction of k formulas:
Definition 6 (definite clause inference): Let P be a program, φ_0: Γ ← ∆ a formula and C = {c_1,...,c_k} the set of clauses of P such that c_i: B_i ← ∆_i and suppose there is a substitution θ_i s.t. B_i θ_i = Aθ_i. Then the application of the rule of dci on φ_0 w.r.t. the atom A yields a disjunction of k formulas:
where IO_i is the specialization predicate of φ_i w.r.t. the predicate A and Q_dci = {IO_0 θ_i ← IO_i}, i = 1,...,k. Unlike in Q_nfi, the substitution θ_i is an existential one (θ_i substitutes only existential variables of A) in Q_dci.
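To make the unfolding step of Definition 5 concrete, here is a minimal sketch of an nfi step on a formula head ← body. It is not the implementation used in the paper: the term encoding (variables as strings, compound terms and constants as tuples), the function names and the renaming scheme are assumptions of this sketch, the unifier omits the occurs check, and the bookkeeping of the specialization predicates IO_i is left out.

```python
def walk(t, s):
    """Follow variable bindings in the substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return a substitution extending s that unifies a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


def resolve(t, s):
    """Apply the substitution s exhaustively to the term t."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0], *[resolve(x, s) for x in t[1:]])


def rename(t, suffix):
    """Rename the variables of a clause apart by appending a suffix."""
    if isinstance(t, str):
        return t + suffix
    return (t[0], *[rename(x, suffix) for x in t[1:]])


def nfi(head, body, atom_index, clauses):
    """Unfold body[atom_index] of head <- body with every matching clause."""
    atom = body[atom_index]
    rest = body[:atom_index] + body[atom_index + 1:]
    resultants = []
    for i, (c_head, c_body) in enumerate(clauses, start=1):
        suffix = f"_{i}"
        s = unify(atom, rename(c_head, suffix), {})
        if s is None:
            continue
        new_body = rest + [rename(b, suffix) for b in c_body]
        resultants.append((resolve(head, s), [resolve(b, s) for b in new_body]))
    return resultants


ZERO = ("0",)
# Overly general clauses: odd(0) <-  and  odd(s(n)) <- odd(n).
C = [(("odd", ZERO), []), (("odd", ("s", "n")), [("odd", "n")])]
# Formula: even(s(n)) <- odd(n), unfolded upon odd(n).
for new_head, new_body in nfi(("even", ("s", "n")), [("odd", "n")], 0, C):
    print(new_head, "<-", new_body)
```

On this input the step yields two resultants, even(s(0)) ← and even(s(s(n_2))) ← odd(n_2), one per clause of the unfolded predicate.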
Definition 7 (folding rule or cutr): Let φ_0: Λ ← Π and φ_1: Γ ← ∆_1, ∆_2 be two formulas satisfying the following conditions: (i) φ_1 is obtained (directly or indirectly) from φ_0 by the rule of nfi, (ii) ∆_1 is an instance of Π, i.e., there is a substitution θ such that Πθ = ∆_1, (iii) for any local variable x in Π, xθ is a variable and does not occur other than in ∆_1 and (iv) θ replaces different local variables of Π with different local variables of ∆_1. Then replace φ_1 by φ_2: where IO_0, IO_1 and IO_2 are the associated specialization predicates of φ_0, φ_1 and φ_2 respectively. φ_0 thus plays the role of the induction hypothesis. This important rule allows us to synthesize recursive predicates. All the rules are partially correct (Demba et al., 2005). Substitutions of variables made during the specialization process are stored in the specialization predicates.

RESULTS
Let P be a logic program and φ its intended specification (intended to be true). If M(P) ⊭ φ, then P is buggy. Our goal is (i) to isolate the set of incorrect axioms of P w.r.t. φ, denoted by C, (ii) to synthesize a sub-program, denoted by D, from the proof attempt of P w.r.t. φ and (iii) to determine the program P' = (P\C) ∪ D, where D is a specialization of C and M(P') |= φ.
Suppose P = (E+ ∪ C) and let D be the program synthesized during the proof attempt of P w.r.t. φ. Suppose C = {c_1,...,c_n} and D = {d_1,...,d_m}. The specialization algorithm is given in Fig. 1.
Hereafter, clause numbers are added to the clause sequences of Fig. 2 and 3 in order to clarify which clauses have been resolved with.
Example 1: Consider the program P = (E+ ∪ C) where C expresses that any natural number is odd: It is clear that M(P) ⊭ φ; for example, we have even(0) and odd(0). Therefore, the program P, and especially the sub-program C, covers negative examples. To fix this problem, we need to specialize the predicate odd. To do that, assume IO_0 is a specialization predicate of odd associated to φ:
φ: even(s(n)) ← odd(n) | IO_0(n)
Note that we need to synthesize a definite program D defining the predicate IO_0 such that M(E+ ∪ D) |= φ with φ = even(s(n)) ← IO_0(n) and M(D) ⊆ M(C). D is initially empty.
The specialization process of C w.r.t. E+ proceeds as depicted in Fig. 2. The first step consists in unfolding φ upon the atom odd(n) using the rule of nfi to obtain the following resultants: The clause c_5 corresponds to a negative example as M(E+) ⊭ even(s(0)); its associated specialization predicate IO_1 is therefore set to false and the clause IO_0(0) ← IO_1() is not included in D: We apply again the rule of nfi on c_6 upon odd(n) to get: As c_9 is an instance of φ, to complete the proof we can apply the folding rule (cutr) to obtain:
c_10: even(s(n)) ← even(s(n)) | IO_6(n)
and the clause IO_5(n) ← IO_0(n), IO_6(n) is generated. The formula c_10 can be simplified to true; IO_6 is then set to true. The final program D is then: Following Tamaki and Sato (1984), we get the final version: and we have the correct specialization program P' = (P\C) ∪ D of P w.r.t. E+ and M(P') |= φ where φ: even(s(n)) ← IO_0(n) (according to Proposition 2).
Note that the clause c_3 is automatically removed and c_4 is refined by specialization. The success sequences of clauses that cover only the positive examples, E+, are of the form c_4 (c_4 c_2 c_4)* c_3, as represented in Fig. 2. Any other combination of clauses covers negative examples and thus leads to failure. From E+, we can induce the finite-state machine of Fig. 2, which corresponds to the sub-program D.
Fig. 2 can be interpreted as follows: the transition c_4 corresponds to the application of the rule of nfi while the transition c_2 corresponds to the application of the rule of dci. The loop means that recursive specialization is necessary, that is, the application of the folding rule (cutr) is needed to complete the process.
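The success sequences c4 (c4 c2 c4)* c3 describe a regular language, so membership can be checked mechanically. The sketch below uses a regular expression as a stand-in for the finite-state machine of Fig. 2; the function name and the space-separated encoding of clause sequences are assumptions made for illustration.

```python
import re

# Regular expression for the success sequences c4 (c4 c2 c4)* c3,
# with clause labels separated by single spaces.
SUCCESS = re.compile(r"c4(?: c4 c2 c4)* c3")


def is_success_sequence(clauses):
    """Return True iff the list of clause labels is a success sequence."""
    return SUCCESS.fullmatch(" ".join(clauses)) is not None


print(is_success_sequence(["c4", "c3"]))                    # True
print(is_success_sequence(["c4", "c4", "c2", "c4", "c3"]))  # True
print(is_success_sequence(["c4", "c4", "c3"]))              # False
```

Any sequence rejected by this check corresponds to a combination of clauses that covers negative examples.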
The approach can also be applied to more complex specifications, for example a specification with existential variables or an original program where the positive examples consist of different predicates, as in the following example, where E+ defines the predicates sup and nat, with nat(0) ← and nat(s(x)) ← nat(x), and C = { plus(0, x, x) ←, plus(s(x), y, s(z)) ← plus(x, y, z) }. Here sup(x, y) means that x > y, plus(x, y, z) means that z = x + y and nat(x) is true if x is a natural number. P does not satisfy its intended semantics φ for x = 0 and y = 0. Again, to fix the problem the predicate plus has to be specialized to IO_0 that is defined as follows: Surprisingly, D is a specialization of C and M(K ∪ D) |= (sup(z, x) ← IO_0(x, y, z)). Comparing C and D, we can say that the error was in the first clause of C, i.e., the underlined arguments. The correct program P' is then: The success sequences of clauses that cover only the positive examples, E+, are depicted in Fig. 3. Any other combination of clauses covers negative examples and thus leads to failure. From E+, we can induce the finite-state machine of Fig. 3, which corresponds to the sub-program D. The transitions c_3 and c_4 correspond to the application of the rule of nfi and the transitions c_5 and c_6 correspond to the application of the rule of dci. Again, to complete the process, the application of the folding rule is needed.
Fig. 3. Specialization of plus(x,y,Z) w.r.t. E+
Bostrom and Idestam-Almquist (1994; 1999) presented top-down approaches such as the divide-and-conquer, covering and SPECTRE algorithms for logic program specialization using unfolding and clause deletion rules. One of the limitations of those algorithms is that the divide-and-conquer algorithm does not work when specializing clauses that define recursive predicates and the SPECTRE algorithm cannot synthesize recursive specifications. A bottom-up approach has been proposed in (Kanamori and Seki, 1986; Ferri et al., 2001; Leuschel and Massart, 2003). Ballis (2005) claimed that his approach can be applied as a top-down or a bottom-up approach. All those approaches are driven by (a finite set of) positive and negative examples. It is also not clear how they handle cases where some positive examples are not included in the specification. Other works have been proposed to correct faulty specifications (Protzen, 1996; Monroy, 2000), and all deal with faulty universally quantified equations.

DISCUSSION
To guarantee that all positive examples are included in the original program, we have proposed to represent them not as a set of ground terms but as a recursive program, denoted E+. The intended specification we consider is not limited to Horn clauses but may be a first-order formula with universal and existential variables. The negative examples are not given as input but discovered during the proof process. Recursive predicates are synthesized, if needed.

CONCLUSION
We have presented a new way to specialize logic programs from positive examples only. With this approach recursive predicates can be obtained. We have shown that positive examples can be used for inducing finite-state machines (success sequences). The failing sequences could also be exploited by theorem provers to produce counter-examples, as in model checking, by composing the substitutions used for inducing failing sequences. The presented approach is implemented in OCaml and integrated into the interactive proof assistant SPES (Demba et al., 2005). The contribution of the study is mainly the use of specification predicates to specialize an overly general logic program.
The framework presented here has three major advantages: (i) The positive examples defined in E+ are guaranteed to be included both in the meaning of the original program and in that of the specialized version. Note that in (Ballis, 2005; Alpuente et al., 2001; Bostrom and Idestam-Almquist, 1999), E+ consists of a finite set of ground atoms and it is not clear how they handle cases where some positive examples are not included in the original program. (ii) The specialization process is performed according to the positive examples only; no negative examples are needed. (iii) It supports reasoning about specifications whose state spaces may be infinite.
However, more work is needed to guarantee the termination of the procedure. This problem is due to the fact that the procedure is based on theorem proving techniques.