Anti-unification is the process of constructing a generalization common to two given symbolic expressions. As in unification, several frameworks are distinguished depending on which expressions (also called terms) are allowed, and which expressions are considered equal. If variables representing functions are allowed in an expression, the process is called "higher-order anti-unification", otherwise "first-order anti-unification". If the generalization is required to have an instance literally equal to each input expression, the process is called "syntactical anti-unification", otherwise "E-anti-unification", or "anti-unification modulo theory".

An anti-unification algorithm should compute for given expressions a complete and minimal generalization set, that is, a set covering all generalizations and containing no redundant members, respectively. Depending on the framework, a complete and minimal generalization set may have one, finitely many, or possibly infinitely many members, or may not exist at all;[note 1] it cannot be empty, since a trivial generalization exists in any case. For first-order syntactical anti-unification, Gordon Plotkin[1][2] gave an algorithm that computes a complete and minimal singleton generalization set containing the so-called "least general generalization" (lgg).

Anti-unification should not be confused with dis-unification. The latter is the process of solving systems of inequations, that is, of finding values for the variables such that all given inequations are satisfied.[note 2] This task is quite different from finding generalizations.

Prerequisites

Formally, an anti-unification approach presupposes

  • An infinite set V of variables. For higher-order anti-unification, it is convenient to choose V disjoint from the set of lambda-term bound variables.
  • A set T of terms such that V ⊆ T. For first-order and higher-order anti-unification, T is usually the set of first-order terms (terms built from variable and function symbols) and lambda terms (terms containing some higher-order variables), respectively.
  • An equivalence relation ≡ on T, indicating which terms are considered equal. For higher-order anti-unification, usually t ≡ u if t and u are alpha equivalent. For first-order E-anti-unification, ≡ reflects the background knowledge about certain function symbols; for example, if ⊕ is considered commutative, t ≡ u if u results from t by swapping the arguments of ⊕ at some (possibly all) occurrences.[note 3] If there is no background knowledge at all, then only literally, or syntactically, identical terms are considered equal.

First-order term

Given a set V of variable symbols, a set C of constant symbols and sets Fn of n-ary function symbols, also called operator symbols, for each natural number n ≥ 1, the set of (unsorted first-order) terms T is recursively defined to be the smallest set with the following properties:[3]

  • every variable symbol is a term: V ⊆ T,
  • every constant symbol is a term: C ⊆ T,
  • from every n terms t1,...,tn, and every n-ary function symbol f ∈ Fn, a larger term f(t1,...,tn) can be built.

For example, if x ∈ V is a variable symbol, 1 ∈ C is a constant symbol, and add ∈ F2 is a binary function symbol, then x ∈ T, 1 ∈ T, and (hence) add(x,1) ∈ T by the first, second, and third term building rule, respectively. The latter term is usually written as x+1, using infix notation and the more common operator symbol + for convenience.
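The three term building rules can be sketched as a small membership check. This is only an illustration under an assumed encoding that is not part of the definition: variable and constant symbols are Python strings, and a compound term f(t1,...,tn) is the tuple ("f", t1, ..., tn). The particular sets V, C and F below are hypothetical sample data.

```python
# Hypothetical sample signature (an assumption for illustration only).
V = {"x", "y"}            # variable symbols
C = {"1"}                 # constant symbols
F = {2: {"add"}}          # F[n] is the set of n-ary function symbols, n >= 1


def is_term(t) -> bool:
    """Check whether t is a first-order term over V, C and F."""
    if isinstance(t, str):
        # Rules 1 and 2: every variable or constant symbol is a term.
        return t in V or t in C
    if isinstance(t, tuple) and len(t) >= 2:
        # Rule 3: an n-ary function symbol applied to n terms is a term.
        symbol, args = t[0], t[1:]
        return symbol in F.get(len(args), set()) and all(is_term(a) for a in args)
    return False


print(is_term(("add", "x", "1")))   # add(x,1) is a term
print(is_term(("add", "x")))        # wrong number of arguments for add
```

Keeping the arity in the key of F means that a symbol applied to the wrong number of arguments is rejected by the same lookup that recognizes the symbol.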

Higher-order term

Substitution

A substitution is a mapping σ: V → T from variables to terms; the notation {x1 ↦ t1, ..., xk ↦ tk} refers to a substitution mapping each variable xi to the term ti, for i = 1,...,k, and every other variable to itself. Applying that substitution to a term t is written in postfix notation as t{x1 ↦ t1, ..., xk ↦ tk}; it means to (simultaneously) replace every occurrence of each variable xi in the term t by ti. The result tσ of applying a substitution σ to a term t is called an instance of that term t. As a first-order example, applying the substitution {x ↦ h(a,y), z ↦ b} to the term

f(x, a, g(z), y) yields
f(h(a,y), a, g(b), y).
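A minimal sketch of substitution application follows. It assumes an encoding chosen here for illustration: atoms (variables and constants) are strings, a compound term f(t1,...,tn) is the tuple ("f", t1, ..., tn), and a substitution is a dictionary from variable names to terms. The function name apply_subst is hypothetical.

```python
def apply_subst(term, sigma):
    """Return the instance of `term` under the substitution `sigma`.

    All variables are replaced simultaneously: the terms inserted by
    sigma are not themselves searched for further replacements.
    """
    if isinstance(term, str):
        # A variable listed in sigma is replaced; any other atom maps to itself.
        return sigma.get(term, term)
    symbol, args = term[0], term[1:]
    return (symbol,) + tuple(apply_subst(a, sigma) for a in args)


# The example from the text: f(x, a, g(z), y) with x -> h(a,y), z -> b.
t = ("f", "x", "a", ("g", "z"), "y")
sigma = {"x": ("h", "a", "y"), "z": "b"}
print(apply_subst(t, sigma))
# ('f', ('h', 'a', 'y'), 'a', ('g', 'b'), 'y')
```

Because the replacement is done in a single pass, a substitution such as {x ↦ y, y ↦ x} swaps the two variables instead of collapsing them.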

Generalization, specialization

If a term t has an instance equivalent to a term u, that is, if tσ ≡ u for some substitution σ, then t is called more general than u, and u is called more special than, or subsumed by, t. For example, x ⊕ a is more general than a ⊕ b if ⊕ is commutative, since then (x ⊕ a){x ↦ b} = b ⊕ a ≡ a ⊕ b.

If ≡ is literal (syntactic) identity of terms, a term may be both more general and more special than another one only if both terms differ just in their variable names, not in their syntactic structure; such terms are called variants, or renamings of each other. For example, f(x1, a, g(z1), y1) is a variant of f(x2, a, g(z2), y2), since f(x1, a, g(z1), y1){x1 ↦ x2, y1 ↦ y2, z1 ↦ z2} = f(x2, a, g(z2), y2) and f(x2, a, g(z2), y2){x2 ↦ x1, y2 ↦ y1, z2 ↦ z1} = f(x1, a, g(z1), y1). However, f(x1, a, g(z1), y1) is not a variant of f(x2, a, g(x2), x2), since no substitution can transform the latter term into the former one, although {x1 ↦ x2, z1 ↦ x2, y1 ↦ x2} achieves the reverse direction. The latter term is hence properly more special than the former one.
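For syntactic equality, whether one term is more general than another can be decided by one-sided matching. The sketch below is an illustration under assumptions made here, not a prescribed method: atoms are strings, a compound term f(t1,...,tn) is the tuple ("f", t1, ..., tn), the set of variable names is passed explicitly, and the function names are hypothetical. The sample terms are f(x1, a, g(z1), y1), f(x2, a, g(z2), y2) and f(x2, a, g(x2), x2).

```python
def match(general, special, variables, sigma=None):
    """Return a substitution sigma with general*sigma == special, or None."""
    sigma = {} if sigma is None else sigma
    if isinstance(general, str):
        if general in variables:
            if general in sigma:
                # A variable must be mapped to the same term everywhere.
                return sigma if sigma[general] == special else None
            sigma[general] = special
            return sigma
        # A constant only matches itself.
        return sigma if general == special else None
    if (isinstance(special, str) or general[0] != special[0]
            or len(general) != len(special)):
        return None
    for g, s in zip(general[1:], special[1:]):
        if match(g, s, variables, sigma) is None:
            return None
    return sigma


def more_general(t, u, variables):
    """True if t has an instance syntactically equal to u."""
    return match(t, u, variables) is not None


def variants(t, u, variables):
    """True if t and u subsume each other, i.e. are renamings."""
    return more_general(t, u, variables) and more_general(u, t, variables)


VARS = {"x1", "y1", "z1", "x2", "y2", "z2"}
t1 = ("f", "x1", "a", ("g", "z1"), "y1")
t2 = ("f", "x2", "a", ("g", "z2"), "y2")
t3 = ("f", "x2", "a", ("g", "x2"), "x2")
print(variants(t1, t2, VARS))      # True
print(more_general(t1, t3, VARS))  # True
print(more_general(t3, t1, VARS))  # False
```

Variables occurring in the second argument are treated like constants during matching, which is what makes the check one-sided.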

A substitution σ is more special than, or subsumed by, a substitution τ if xσ is more special than xτ for each variable x. For example, {x ↦ f(u), y ↦ f(f(u))} is more special than {x ↦ z, y ↦ f(z)}, since f(u) and f(f(u)) are more special than z and f(z), respectively.

Anti-unification problem, generalization set

An anti-unification problem is a pair ⟨t1, t2⟩ of terms. A term t is a common generalization, or anti-unifier, of t1 and t2 if tσ1 ≡ t1 and tσ2 ≡ t2 for some substitutions σ1, σ2. For a given anti-unification problem, a set S of anti-unifiers is called complete if each generalization subsumes some term t ∈ S; the set S is called minimal if none of its members subsumes another one.

First-order syntactical anti-unification

The framework of first-order syntactical anti-unification is based on T being the set of first-order terms (over some given set V of variables, C of constants and Fn of n-ary function symbols) and on ≡ being syntactic equality. In this framework, each anti-unification problem ⟨t1, t2⟩ has a complete, and obviously minimal, singleton solution set {t}. Its member t is called the least general generalization (lgg) of the problem; it has an instance syntactically equal to t1 and another one syntactically equal to t2. Any common generalization of t1 and t2 subsumes t. The lgg is unique up to variants: if S1 and S2 are both complete and minimal solution sets of the same syntactical anti-unification problem, then S1 = {s1} and S2 = {s2} for some terms s1 and s2 that are renamings of each other.

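Plotkin[1][2] has given an algorithm to compute the lgg of two given terms. It presupposes an injective mapping φ: T × T → V, that is, a mapping assigning each pair s, t of terms its own variable φ(s,t), such that no two pairs share the same variable.[note 4] Writing s ⊔ t for the generalization of s and t still to be computed, the algorithm consists of two rules:

     f(s1,...,sn) ⊔ f(t1,...,tn)  ⇝  f(s1 ⊔ t1, ..., sn ⊔ tn)
     s ⊔ t  ⇝  φ(s,t)      if previous rule not applicable

For example, (0*0) ⊔ (4*4) ⇝ (0 ⊔ 4)*(0 ⊔ 4) ⇝ φ(0,4)*φ(0,4) ⇝ x*x; this least general generalization reflects the common property of both inputs of being square numbers.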
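The two rules can be sketched in a few lines. The code below is an illustrative reading of the algorithm, not Plotkin's original formulation, and rests on assumptions made here: atoms (variables and constants) are strings, a compound term f(t1,...,tn) is the tuple ("f", t1, ..., tn), identical atoms are kept as they are (which treats a constant like a function symbol without arguments), and the injective mapping is a dictionary filled on demand with hypothetical fresh names _g0, _g1, ... that are assumed not to occur in the inputs.

```python
def lgg(s, t, phi=None):
    """Least general generalization of two first-order terms (syntactic case)."""
    phi = {} if phi is None else phi
    # Rule 1: same function symbol and same arity, so descend into the arguments.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, phi) for a, b in zip(s[1:], t[1:]))
    # Identical atoms generalize to themselves (assumption stated above).
    if s == t:
        return s
    # Rule 2: different terms are replaced by the variable assigned to the pair.
    if (s, t) not in phi:
        phi[(s, t)] = f"_g{len(phi)}"   # fresh name, assumed unused in the inputs
    return phi[(s, t)]


# 0*0 and 4*4 generalize to a term with the same variable in both positions.
print(lgg(("*", "0", "0"), ("*", "4", "4")))   # ('*', '_g0', '_g0')
# The pairs (a, b) and (b, a) differ, so they receive different variables.
print(lgg(("f", "a", "b"), ("f", "b", "a")))   # ('f', '_g0', '_g1')
```

Sharing one dictionary across the whole recursion is what makes repeated pairs receive the same variable; with a fresh variable per position, the result for the first example would be the strictly more general term with two distinct variables.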

Plotkin used his algorithm to compute the "relative least general generalization (rlgg)" of two clause sets in first-order logic, which was the basis of the Golem approach to inductive logic programming.

First-order anti-unification modulo theory

  • Jacobsen, Erik (Jun 1991), Unification and Anti-Unification (PDF), Technical Report
  • Østvold, Bjarte M. (Apr 2004), A Functional Reconstruction of Anti-Unification (PDF), NR Note, vol. DART/04/04, Norwegian Computing Center
  • Boytcheva, Svetla; Markov, Zdravko (2002). "An Algorithm for Inducing Least Generalization Under Relative Implication". Proc. FLAIRS-02. AAAI. pp. 322–326.
  • Kutsia, Temur; Levy, Jordi; Villaret, Mateu (2014). "Anti-Unification for Unranked Terms and Hedges" (PDF). Journal of Automated Reasoning. 52 (2): 155–190. doi:10.1007/s10817-013-9285-6.

Equational theories

First-order sorted anti-unification

Nominal anti-unification

  • Baumgartner, Alexander; Kutsia, Temur; Levy, Jordi; Villaret, Mateu (Jun 2013). Nominal Anti-Unification. Proc. RTA 2015. Vol. 36 of LIPIcs. Schloss Dagstuhl, 57-73.

Applications

Higher-order anti-unification

Notes

  1. Complete generalization sets always exist, but it may be the case that every complete generalization set is non-minimal.
  2. Comon referred in 1986 to inequation-solving as "anti-unification", which nowadays has become quite unusual. Comon, Hubert (1986). "Sufficient Completeness, Term Rewriting Systems and 'Anti-Unification'". Proc. 8th International Conference on Automated Deduction. LNCS. Vol. 230. Springer. pp. 128–140.
  3. E.g. a ⊕ (b ⊕ f(x)) ≡ a ⊕ (f(x) ⊕ b) ≡ (b ⊕ f(x)) ⊕ a ≡ (f(x) ⊕ b) ⊕ a
  4. From a theoretical viewpoint, such a mapping exists, since both V and T × T are countably infinite sets; for practical purposes, φ can be built up as needed, remembering assigned mappings ⟨s, t, φ(s,t)⟩ in a hash table.

References

  1. Plotkin, Gordon D. (1970). Meltzer, B.; Michie, D. (eds.). "A Note on Inductive Generalization". Machine Intelligence. 5: 153–163.
  2. Plotkin, Gordon D. (1971). Meltzer, B.; Michie, D. (eds.). "A Further Note on Inductive Generalization". Machine Intelligence. 6: 101–124.
  3. C.C. Chang; H. Jerome Keisler (1977). A. Heyting; H.J. Keisler; A. Mostowski; A. Robinson; P. Suppes (eds.). Model Theory. Studies in Logic and the Foundations of Mathematics. Vol. 73. North Holland. Here: Sect. 1.3