In syntactic analysis, a constituent is a word or a group of words that functions as a single unit within a hierarchical structure. The analysis of constituent structure is associated mainly with phrase structure grammars, although dependency grammars also allow sentence structure to be broken down into constituent parts. The constituent structure of sentences is identified using constituency tests. These tests manipulate some portion of a sentence, and the result provides clues about the immediate constituent structure of the sentence. Many constituents are phrases. A phrase is a sequence of one or more words (in some theories two or more) built around a head lexical item and working as a unit within a sentence. A word sequence is shown to be a phrase/constituent if it exhibits one or more of the behaviors discussed below.[1]


Constituency tests

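Constituency tests are diagnostics used to identify the constituent structure of sentences.[2] Numerous constituency tests are applied to English sentences, including topicalization, clefting, pseudoclefting, pro-form substitution, answer ellipsis, passivization, omission, and coordination.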

These tests are rough-and-ready tools that grammarians employ to reveal clues about syntactic structure. Caution is warranted when employing them, since they often deliver contradictory results. Some syntacticians even arrange the tests on a scale of reliability, with less reliable tests treated as useful for confirming constituency though not sufficient on their own.[3] Failing a single test does not mean that the unit is not a constituent, and conversely, passing a single test does not necessarily mean that the unit is a constituent. It is best to apply as many tests as possible to a given unit in order to establish or rule out its status as a constituent.

Topicalization (fronting)

Topicalization involves moving the test sequence to the front of the sentence. It is a simple movement operation:[4]

He is going to attend another course to improve his English.
To improve his English, he is going to attend another course.
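The movement itself is mechanical and can be sketched in a few lines of Python. The function name `front` and the token-index convention are illustrative assumptions, not part of any standard tool. The sketch only builds the candidate sentence; whether the result is acceptable still has to be judged by a speaker.

```python
def front(tokens, start, end):
    """Move tokens[start:end] to the front of the sentence.

    Only the word order changes; capitalization and punctuation are
    left untouched.
    """
    moved = tokens[start:end]
    rest = tokens[:start] + tokens[end:]
    return moved + rest


# The example sentence from the text, split into words.
sentence = "he is going to attend another course to improve his English".split()

# Words 7..10 are "to improve his English".
candidate = front(sentence, 7, 11)
print(" ".join(candidate))
```

The printed candidate corresponds to the fronted sentence above, apart from capitalization and the comma.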


Clefting

Clefting involves placing a sequence of words X within the structure beginning with It is/was: It was X that...[5]

She bought a pair of gloves with silk embroidery.
It was a pair of gloves with silk embroidery that she bought.


Pseudoclefting

Pseudoclefting (also preposing) is similar to clefting in that it puts emphasis on a certain phrase in a sentence. It involves inserting a sequence of words before is/are what or is/are who:[6]

She bought a pair of gloves with silk embroidery.
A pair of gloves with silk embroidery is what she bought.

Pro-form substitution (replacement)

Pro-form substitution, or replacement, involves replacing the test constituent with an appropriate pro-form (e.g. a pronoun). Substitution normally involves using a definite pro-form like it, he, there, here, etc. in place of a phrase or a clause. If such a change yields a grammatical sentence in which the general structure has not been altered, then the test sequence is a constituent:[7]

I don't know the man who is sleeping in the car.
*I don't know him who is sleeping in the car. (ungrammatical)
I don't know him.

The ungrammaticality of the first changed version and the grammaticality of the second demonstrate that the whole sequence the man who is sleeping in the car, and not just the man, is a constituent functioning as a unit.

Answer ellipsis (answer fragments, question test, standalone test)

The answer ellipsis test checks whether a sequence of words can stand alone as a reply to a question. It is often used to test the constituency of a verb phrase but can also be applied to other phrases:[8]

What did you do yesterday? - Worked on my new project.
What did you do yesterday? - *Worked on. (unacceptable, so worked on is not a constituent).

Linguists do not agree whether passing the answer ellipsis test is sufficient, though at a minimum they agree that it can help confirm the results of another constituency test.[citation needed]


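Passivization

Passivization involves changing an active sentence into a passive one, so that the object of the active sentence becomes the subject of the passive sentence: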

A car driving too fast nearly hit the little dog.
The little dog was nearly hit by a car driving too fast.

If passivization results in a grammatical sentence, the phrases that have been moved can be regarded as constituents.

Omission (deletion)

Omission checks whether a sequence of words can be omitted without affecting the grammaticality of the sentence. In most cases, local or temporal adverbials can be safely omitted and thus qualify as constituents.[10]

Fred relaxes at night on his couch.
Fred relaxes on his couch.
Fred relaxes at night.

Since they can be omitted, the prepositional phrases at night and on his couch are constituents.
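The shortened sentences for the omission test can be produced mechanically. The following Python sketch uses a hypothetical helper `omit` introduced here for illustration. It deletes a word sequence from a tokenized sentence and leaves the grammaticality judgment to the reader.

```python
def omit(tokens, phrase):
    """Return the sentence with the first occurrence of phrase removed."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return tokens[:i] + tokens[i + n:]
    raise ValueError("phrase not found in sentence")


sentence = "Fred relaxes at night on his couch".split()

# Drop each prepositional phrase in turn.
without_time = omit(sentence, "at night".split())
without_place = omit(sentence, "on his couch".split())

print(" ".join(without_time))
print(" ".join(without_place))
```

The two printed strings match the shortened sentences listed above.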


Coordination

The coordination test assumes that only constituents can be coordinated, i.e., joined by means of a coordinator such as and:[11]

He enjoys [writing sentences] and [reading them].
[He enjoys writing] and [she enjoys reading] sentences.
[He enjoys] but [she hates] writing sentences.

Because writing sentences and reading them are coordinated using and, one can conclude that they are constituents. The validity of the coordination test is challenged by additional data, however. The latter two sentences, which are instances of so-called right node raising, suggest that the bracketed sequences should be understood as constituents. Most grammars do not view a sequence such as He enjoys, to the exclusion of the VP writing sentences, as a constituent. Thus, while the coordination test is widely employed as a diagnostic for constituent structure, it faces major difficulties and is therefore perhaps the least reliable of all the tests mentioned.[12]

Constituency tests and disambiguation

Syntactic ambiguity characterizes sentences which can be interpreted in different ways depending solely on how one perceives syntactic connections between words and arranges them into phrases. Possible interpretations of the sentence They killed the man with a gun are:

'The man was shot.'
'The man who was killed had a gun with him.'

The ambiguity of this sentence results from two possible arrangements into constituents:

They killed [the man] [with a gun].
They killed [the man with a gun].

In the first sentence, with a gun is an independent constituent with instrumental meaning. In the second sentence, it is embedded in the noun phrase the man with a gun and is modifying the noun man. The autonomy of the unit with a gun in the first interpretation can be tested by the answer ellipsis test:

How did they kill the man? - With a gun.

However, the same test can be used to show that the man with a gun in the second sentence should be treated as a unit:

Who(m) did they kill? - The man with a gun.

The ability of constituency tests to disambiguate certain sentences in this manner bears witness to their utility. Most if not all syntacticians employ constituency tests in some form or another to arrive at the structures that they assign to sentences.
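The two bracketings of They killed the man with a gun can also be compared mechanically. The Python sketch below reads the square-bracket notation used above and lists the word sequences each analysis marks as a unit. The function name `bracketed_units` is an assumption made for this illustration.

```python
def bracketed_units(sentence):
    """Return the word sequences enclosed in [ ... ], including nested ones."""
    units = []
    open_starts = []  # positions just after each unmatched "["
    for i, ch in enumerate(sentence):
        if ch == "[":
            open_starts.append(i + 1)
        elif ch == "]":
            start = open_starts.pop()
            # Strip any inner brackets and normalize whitespace.
            text = sentence[start:i].replace("[", "").replace("]", "")
            units.append(" ".join(text.split()))
    return units


instrument_reading = bracketed_units("They killed [the man] [with a gun].")
modifier_reading = bracketed_units("They killed [the man with a gun].")

print(instrument_reading)
print(modifier_reading)
```

Under these assumptions, with a gun is listed as a unit only for the first bracketing and the man with a gun only for the second, matching the answer fragments given above.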

Competing theories

Alternative theoretical approaches to syntax make different assumptions regarding what is considered a constituent. In mainstream phrase structure grammar (and its derivatives), individual words are constituents in and of themselves as well as being parts of other constituents, whereas in dependency grammar,[13] certain core words in each phrase are not constituents by themselves, but only members of a phrasal constituent. The following trees show the same sentence in two different theoretical representations, with a phrase structure representation on the left and a dependency grammar representation on the right. In both trees, a constituent is understood to be the entire tree or any labelled subtree (a node plus all the nodes dominated by that node). Note that words like killed and with, for instance, form subtrees (and are considered constituents) in the phrase structure representation but not in the dependency structure representation.[14]
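The subtree definition can be made concrete in code. The Python sketch below encodes one assumed analysis of They killed the man with a gun (the instrumental reading) in each formalism and collects the word string of every complete subtree. The node labels, the head assignments, and the function names are illustrative assumptions; other analyses within either framework are possible.

```python
# Phrase structure tree as nested tuples: (label, child, ...); words are strings.
PS_TREE = ("S",
           ("NP", "They"),
           ("VP",
            ("V", "killed"),
            ("NP", ("D", "the"), ("N", "man")),
            ("PP", ("P", "with"), ("NP", ("D", "a"), ("N", "gun")))))

# Dependency tree: HEADS[i] is the index of word i's head; None marks the root.
WORDS = ["They", "killed", "the", "man", "with", "a", "gun"]
HEADS = [1, None, 3, 1, 1, 6, 4]


def ps_words(node):
    """Return the words dominated by a phrase structure node, in order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(ps_words(child))
    return words


def ps_constituents(node):
    """Collect the word string of every labelled subtree."""
    if isinstance(node, str):
        return set()
    found = {" ".join(ps_words(node))}
    for child in node[1:]:
        found |= ps_constituents(child)
    return found


def dep_constituents(words, heads):
    """Collect the word string of every complete subtree (a word plus all its dependents)."""
    children = {i: [] for i in range(len(words))}
    for i, head in enumerate(heads):
        if head is not None:
            children[head].append(i)

    def subtree(i):
        nodes = [i]
        for child in children[i]:
            nodes.extend(subtree(child))
        return sorted(nodes)

    return {" ".join(words[j] for j in subtree(i)) for i in range(len(words))}


ps = ps_constituents(PS_TREE)
dep = dep_constituents(WORDS, HEADS)
print(sorted(ps - dep))  # constituents only in the phrase structure analysis
print(sorted(ps & dep))  # constituents shared by both analyses
```

With these trees, killed and with appear only in the phrase structure set, because in the dependency tree each of them heads a larger subtree, while with a gun is a constituent in both.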




