Ideal division of a disambiguation page


The purpose of disambiguation pages is for readers to find their target article with as little reading as possible. How many sections, then, should a dab page have, and how long should those sections be?

Suppose we have a dab page with a total of t entries, which we can divide into n sections. Section headers average a words in length, and entries average b words in length. We want to find n that results in the fewest words having to be read, on average.

Questionable assumptions

  1. The disambiguation page will be divided into equal-sized sections, with no sub-sections.
  2. Readers will first read section headers until they find the one they want, then read entries in that section until they find the one they want.
  3. Each entry is equally likely to be the one the reader is looking for. The position of the desired section, and of the desired entry within that section, are random.
  4. Section names and entries are clear and unambiguous. Once a reader reads a section name or entry, they know with 100% certainty whether it is what they want or not.

How questionable are these assumptions?

  1. This is not a very realistic assumption, but it serves as a workable average, and the effect of differently sized sections on n is not large.
  2. This is a good assumption.
  3. This is a good assumption.
  4. The strength of this assumption depends on how well subject areas are selected, and how well headers and entries are written, but it should be near 100%.

Solve


Given n sections, the average reader will have to read (n+1)/2 headers to find the one they want. They will then have to read ((t/n)+1)/2 entries to find the one they want. Thus, the average number of words that must be read is w = a*((n+1)/2) + b*(((t/n)+1)/2). To find the value of n that minimizes w, we take the derivative of w with respect to n and see where it equals 0.

The derivative of w with respect to n is a/2 - bt/(2n^2). Setting this expression equal to zero and rearranging, we find n = sqrt((b/a)*t).
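
Written out step by step, the derivation can be sketched as follows. This treats n as a continuous positive real number, which is an added assumption (a real page needs a whole number of sections), and it includes a second-derivative check that the stationary point is a minimum rather than a maximum.

```latex
\begin{align*}
w(n) &= \frac{a(n+1)}{2} + \frac{b}{2}\left(\frac{t}{n} + 1\right) \\
\frac{dw}{dn} &= \frac{a}{2} - \frac{bt}{2n^{2}} \\
\frac{dw}{dn} = 0 \;&\Longrightarrow\; n^{2} = \frac{bt}{a} \;\Longrightarrow\; n = \sqrt{\frac{bt}{a}} \\
\frac{d^{2}w}{dn^{2}} &= \frac{bt}{n^{3}} > 0 \quad \text{for } n > 0
\end{align*}
```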

Let's plug in some realistic numbers:

  • Section headers average a = 3 words in length
  • Entries average b = 10 words in length

Now n = sqrt(10/3)*sqrt(t) ~ 1.8*sqrt(t)

Suppose our disambiguation page has 30 entries. In that case, n = sqrt((10/3)*30) = 10. If we divide the dab page into n = 10 sections of 3 entries each, the reader will have to read an average of w = 3*(11/2) + 10*(4/2) = 36.5 words.
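
The arithmetic for this example can be checked with a short script. This is a minimal sketch of the model above, not a definitive tool: the function names (`average_words`, `optimal_sections`, `best_integer_sections`) are hypothetical, and the brute-force search over whole numbers of sections is an added sanity check, since the formula treats n as continuous.

```python
import math


def average_words(n, t, a, b):
    """Average words read with n equal sections, under the assumptions above."""
    return a * (n + 1) / 2 + b * (t / n + 1) / 2


def optimal_sections(t, a, b):
    """Continuous minimizer of average_words: n = sqrt(b * t / a)."""
    return math.sqrt(b * t / a)


def best_integer_sections(t, a, b):
    """Whole number of sections (1..t) with the lowest average word count."""
    return min(range(1, t + 1), key=lambda n: average_words(n, t, a, b))


# Values from the example: 30 entries, 3-word headers, 10-word entries.
t, a, b = 30, 3, 10
n_star = optimal_sections(t, a, b)
print(n_star, average_words(n_star, t, a, b))  # 10.0 36.5
print(best_integer_sections(t, a, b))          # 10
```

Under these numbers the continuous optimum happens to be a whole number, so the formula and the brute-force search agree. For other values of t, a, and b, the integer search picks the best whole number of sections directly.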