# Finite model theory

Finite Model Theory (FMT) is a subarea of model theory (MT). MT is the branch of mathematical logic which deals with the relation between a formal language (syntax) and its interpretations (semantics). FMT is a restriction of MT to interpretations of finite structures, i.e. structures with a finite universe.

• Since many central theorems of MT do not hold when restricted to finite structures, FMT is quite different from MT in proof methods. Failing central results include the compactness theorem, Gödel's completeness theorem, and the method of ultraproducts for first-order logic.
• As MT is closely related to mathematical algebra, FMT became an "unusually effective"[1] instrument in computer science. In other words: "In the history of mathematical logic most interest has concentrated on infinite structures....Yet, the objects computers have and hold are always finite. To study computation we need a theory of finite structures."[2] Thus the main application areas are: descriptive complexity theory, database theory and formal language theory.
• FMT is mainly about discrimination of structures: can a given class of structures be described (up to isomorphy) in a given language, e.g. can all cyclic graphs be discriminated (from the non-cyclic ones) by a First-order (henceforth FO) sentence, i.e. is the property "cyclic" FO expressible.

## Basic Challenges

Basically FMT is about the discrimination of structures, i.e. can a set of structures be uniquely described in a certain language. We will see that this can be achieved in FO for a single structures always, for a finite set of structures sometimes and for a set containing infinite structures never.

### Characterisation of a Single Structure

Is a Language L expressive enough to describe a single finite Structure S uniquely (up to isomorphy)?

Single graphs (1) and (1') having common properties.

#### Problem

Given a structure like (1). This structure can be described by FO sentences like

1. every node has an edge to another node: $\forall_x \exists_y G(x, y).$
2. no node has an edge to itself: $\forall_{x,y} (G(x, y) \Rightarrow x \neq y).$
3. there is at least one node that is connected to all others: $\exists_x \forall_y (x \neq y \Rightarrow G(x, y)).$

Now do these properties describe the structure uniquely (up to isomorphy)? Obviously not since for structure (1') the above properties hold as well. Simply put, the question is, if one adds enough properties, is it possible that these properties (all together) describe exactly (1) and are valid (all together) for no other structure (up to isomorphy).

#### Approach

For a single finite structure this is always possible. The principle is quite simple (here for single binary Relations and without constants):

1. say that there are at least n elements: $\varphi_1 = \bigwedge_{i\ne j} \neg x_i = x_j$
2. say that there are at most n elements: $\varphi_2 = \forall_y \bigvee_{i} x_i = y$
3. state every element of the Relation R: $\varphi_3 = \bigwedge_{(a_i, a_j) \in R} R(x_i, x_j)$
4. state every non-element of the Relation R: $\varphi_4 = \bigwedge_{(a_i, a_j) \notin R} \neg R(x_i, x_j)$

all for the same tuple $x_1 .. x_n$, i.e. $\exists_{x_1} ... \exists_{x_n} (\varphi_1 \land \varphi_2 \land \varphi_3 \land \varphi_4)$.

#### Extension to a fixed Number of Structures

This can easily be extended for any fixed number of structures. For, say 2, structures a unique description can easily be obtained by disjunction of the single descriptions, i.e.

$\exists_{x_1} ... \exists_{x_n} (\varphi_1 \land \varphi_2 \land \varphi_3 \land \varphi_4) \lor \exists_{x_1} ... \exists_{x_n} (\varrho_1 \land \varrho_2 \land \varrho_3 \land \varrho_4).$

Writing down a sequence of, say 67, descriptions is easy in theory, but quite impractical. This is a well known problem from programming, where one would use a for loop from 1 to 67 instead. This isn't dealt with in depth here, but mentioned to show that there are more issues to a language than just its expressiveness.

#### Extension to an infinite Structure

This is, by definition, no issue of FMT, but for the sake of understanding we consider this case here briefly. Infinite structures, can never be discriminated in FO, i.e. for every infinite model a non-isomorphic one can be found, having exactly the same properties in FO.

The most famous example is probably Skolem's theorem: there is a countable non-standard model of arithmetic.

### Characterisation of a Class of Structures

Is a Language L expressive enough to describe exactly those finite structures that have certain property P in common (up to isomorphy)?

Set of up to n structures.

#### Problem

The descriptions so far had in common that they strictly define the number of elements of the universe. Unfortunately most of the interesting sets of structures are not restricted to a certain size, like all graphs that are trees, are connected or are acyclic. Thus to discriminate a finite number of structures is of special importance.

#### Approach

The next best thing to a general statement, that we cannot make here, is to give a methodology to differentiate between structures that can and cannot be discriminated.

1. The core idea is that whenever one wants to see if a Property P can be expressed in FO, one chooses Structures A and B, where A does have P and B doesn't. If for A and B the same FO sentences hold P cannot be expressed in FO (else it can). For short:

A ∈ P, B ∉ P and A ≡ B

where A ≡ B is A |= α ⇔ B |= α for all FO-sentences α, and P is the class of Structures with property P.

2. The actual methodology partitions the language, like FO, into countably many classes FO[m], such that for each m the above (core idea) has to be shown. That is:

A ∈ P, B ∉ P and A ≡m B

with a pair A, B for each m and α (in ≡) from FO[m].

3. A partition FO[m] can be obtained by the quantifier rank qr(α) of a FO formula α. It expresses the depth of quantifier nesting. For example for a formula in prenex normal form qr gives simply the total number of its quantifiers. So FO[m] is defined as all FO formulas α with qr(α) ≤ m.

4. Thus it all comes down to show A |= α ⇔ B |= α on the partitions FO[m]. The main approach here is to use the algebraic characterization provided by Ehrenfeucht-Fraisse Games. What they do is roughly, take a single partial isomorphism on A and B and extend it m times, in order to either prove or disprove A ≡m B, dependent on who wins the game.

#### Example

We want to show that the property that the size of an orderered structure A=(A, ≤) is even, can not be expressed in FO.

1. The idea is to pick A ∈ EVEN and B ∉ EVEN, where EVEN is the class of all structures of even size.

2. Therefore we start with 2 ordered structures A2 and B2 with universes A2 = {1, 2, 3, 4} and B2 = {1, 2, 3, 4, 5}. Obviously A2 ∈ EVEN and B2 ∉ EVEN.

3. Now we can show* that in a 2-move Ehrenfeucht-Fraisse Game (i.e. m = 2, what explains the subscrpts above) on A2 and B2 the duplicator always wins, and thus A2 and B2 cannot be discriminated in FO[2], i.e. A2 |= FO[2] ⇔ B2 |= FO[2].

4. Next we have to scale the structures up by increasing m. For example for m = 3 we must find an A3 and B3 s.t. the duplicator wins the game. This can be achieved by A3 = {1, ..., 8} and B3 = {1, ..., 9}. More general, we choose Am = {1, ..., 2m} and Bm = {1, ..., 2m+1} where we can prove that the duplicator always wins*. Thus EVEN on finite ordered structures cannot be expressed in FO, qed.

(*) Notice that the proof of the result of the Ehrenfeucht-Fraisse Game has been omitted, since it is not the main focus here.

↑Jump back a section

## Applications

### Database Theory

A substantial fragment of SQL (namely that which is effectively relational algebra) is based on first-order logic (more precisely can be translated in domain relational calculus by means of Codd's theorem), as the following example illustrates: Think of a database table "GIRLS" with the columns "FIRST_NAME" and "LAST_NAME". This corresponds to a binary relation, say G(f, l) on FIRST_NAME X LAST_NAME. The FO query G('Judy', l), which returns all the last names where the first name is 'Judy', would look in SQL like this:

select LAST_NAME
from GIRLS
where FIRST_NAME = 'Judy'


Notice, we assume here, that all last names appear only once.

Next we want to make a more complex statement. Therefore in addition to the "GIRLS" table we have a table "BOYS" also with the columns "FIRST_NAME" and "LAST_NAME". Now we want to query the last names of all the girls that have the same last name as at least one of the boys. In FO this is h( G(f, l) ∧ B(h, l) ). The corresponding SQL statement is:

select FIRST_NAME, LAST_NAME
from GIRLS
where LAST_NAME IN ( select LAST_NAME from BOYS )


Notice that in order to express the "∧" we introduced the new language element "IN" with a subsequent select statement. This makes the language more expressive for the price of higher difficulty to learn and implement. This is a common trade-off in formal language design. The way shown above ("IN") is by far not the only one to extend the language. An alternative way is e.g. to introduce a "JOIN" operator.

First-order logic however proved insufficient for some database applications, for instance because of its inability to express transitive closure (see recursive WITH for the SQL:1999 construct); various more expressive logics, like fixpoint logics, have been studied in finite model theory precisely because of their relevance to database theory and applications.

### Querying & Search

Narrative data contains no defined relations. Thus the logical structure of text search queries can be expressed in Propositional Logic, like in:

("Java" AND NOT "island") OR ("C#" AND NOT "music")


Notice that the challenges in full text search are different from database querying, like ranking of results.

↑Jump back a section

## History

1. Trakhtenbrot 1950: failure of completeness theorem in FO,
2. Scholz 1952: characterisation of spectra in FO,
3. Fagin 1974: the set of all properties expressible in existential second-order logic is precisely the complexity class NP,
4. Chandra, Harel 1979/ 80: fixed-point FO extension for db query languages capable of expressing transitive closure -> queries as central objects of FMT.
5. Immerman, Vardi 1982: fixed point logic over ordered structures captures PTIME -> descriptive complexity (... Immerman–Szelepcsényi theorem)
6. Ebbinghaus, Flum 1995: First comprehensive book "Finite Model Theory"
7. Abiteboul, Hull, Vianu 1995: Book "Foundations of Databases"
8. Immerman 1999: Book "Descriptive Complexity"
9. Kuper, Libkin, Paredaens 2000: Book "Constraint Databases"
10. Darmstadt 2005/ Aachen2006: first international workshops on "Algorithmic Model Theory"
↑Jump back a section

## Citations

1. ^ Fagin, Ronald (1993). Finite model theory-a personal perspective.
2. ^ Immerman, Neil (1999). Descriptive Complexity. New York: Springer-Verlag. p. 6. ISBN 0-387-98600-6.
↑Jump back a section

## References

↑Jump back a section