Purpose
This template helps field the details of users' parameter=value deployments on the wiki for any inline template usage.[1]
This version of the search engine, Cirrus Search, offers regular expression searches. Here is the advantage:
- hastemplate:"Convert" insource:"|xx|" prefix:: finds 3123 articles, but
- hastemplate:"Convert" insource:/\{\{ *[Cc]onvert *\|[^}]*\|xx\|/ prefix:: finds the 45 you really wanted, the ones having the xx inside the template call.
This template instills some regexp-search best practices:
- Always filter a regexp search. Never run a bare regexp search. This template creates a search link, but unlike {{search link}}, this template pre-builds filters and the more arcane elements of the regexp necessary to target a pattern inside a template wikitext. Here you need only enter the template name, and start focusing on the search "pattern".
- Start in a small search domain before running it on the wider wiki. This template defaults the search domain to one page in order to create a small footprint, because only a few regex searches are technically able to run at a time against the database. It minimizes your footprint, and guarantees that your search link will never run an untested regexp on 61,811,496 pages, even if someone's default search would let them do that.
- Develop the query with the target data in view for study. By default you start with this template in an ad hoc sandbox, the edit box of a page that already contains a sample of the target. Regular expressions are formal logic, so these little computer programs usually contain mistakes at first, and those mistakes are very easy to discover by running a quick test. It is therefore characteristic of regex that they are rapidly developed around a small set of test data, rather than slowly debugged against the large data set they are designed for.
{{Regex}} also employs these practices, but not specifically for template calls.
With this template developers can 1) generate lists of sub-optimal or non-preferred template usage,[2] and 2) achieve template feature parity and avoid the need for backward-compatible code. They can do this by directly removing unwanted template usage from the wikitext. Robo-edits can change a feature or add a new feature in lock step with a new version of a template. WP:AWB is such a robo-editor; it can also do safe regexp searches and is a complete alternative, but you'd have to download it first.
Arguments
|template= or {{{1}}} | template name. Defaults to "Template usage". It is also the first unnamed parameter. |
|pattern= or {{{2}}} | a regexp search pattern. Targets the inside of all occurrences of the template in wikitext, that is, after the first pipe and before the closing curly bracket: {{Val|9999|ul=m/s|fmt=commas}}. Always use {{!}} for `|'. You may use {{=}} for `=' at any time, and must when using the unnamed form. See "About CirrusSearch" below for more details about types of queries. |
|prefix= or {{{3}}} | search domain. Has the usual prefix: meaning, plus accepts a namespace number, or n for the current namespace (or `{{ns:1}}:', etc.). For all of mainspace use : or 0 (zero). To search only mainspace articles that start with letter(s), assign that to prefix. To search another namespace that starts with letter(s), spell out the namespace (or use `{{ns:1}}:letter(s)', etc.). Defaults to its current page. |
|label= or {{{4}}} | search link label. It is the fourth "unnamed" parameter, so if you enter the first three directly (unnamed), you can also enter a link label directly. |
{{Template | parameters | can direct template behavior.}}
- "Named" parameters use | name = indirect value | passing in 'indirect value'.
- "Unnamed" parameters use | direct value | passing in ' direct value ' (with outer spaces.)
Procedure
Namespace plus pagename equals fullpagename.
The procedure here is an iterative, read-evaluate-modify cycle.
- Find an existing fullpagename with the template instances you are interested in targeting. Or create one yourself, and save it to the database so the query will find it.
- Open the wikitext. Enter the template name and a regex pattern. (A prefix will be added later.)
- Show Preview.
- Click the newly rendered search link. Note the bold text in each match, the query (centered), and the count (off to the right).
- Go back in your browser to the edit box. (Or don't go back, you may want to modify the query on the search results page.)
- Modify the regexp in the edit box. Cycle.
- Enter a prefix. Start with a namespace. You can then reduce the number of results by adding the first letter(s) of pagenames onto the namespace.
Then you might need to run each alias (name) the template might have.
Step 6 is the core provision of this template. Caveat emptor: if you change the target, you'll have to save and purge, but not if you just change the pattern.
This template offers the addition of the search link label, but defaults to showing the regexp.
Currently there is no way to share a {{tlusage}} search link if you want it to search more than one namespace. The workaround is one tlusage per namespace, or to copy the regexp from a tlusage results page query to a {{search link}} template, which offers the setting of namespaces, and all. Currently choosing a namespace is not mandatory there, but if you don't choose a namespace there, be aware of possible inconsistencies: the search domain will be different every time it runs, depending on the current user's current search domain. You can set it and forget it at Special:Search Advanced.
Examples and sandbox
As an ad hoc sandbox, you can show the wikitext of a section like this, already saved in the database, with template calls on it, modify some patterns, do a Show Preview, and see what matches when you click on the newly formed "search the database" link, all quite safely, and without changing a thing in the database.
The template calls that produce "1 ft/s, 2 sq ft, 3 m/s, 4 m*s-2, 5 ft.s-2, 6 °C/J, and 7 J/C" appear in the wikitext of this section like this:
- {{val|1|ul=ft/s|fmt = commas}}
- {{val|2|u=ft2}}
- {{val|3|u=m/s| fmt =commas }}
- {{val|4|u=m*s-2}}
- {{val|5|u=ft.s-2}}
- {{val|6|u=C/J}}
- {{val|7|ul=J/C}} → 7 J/C
Note how the above targets are |numbered|, then click on these links.
Query | Transcluding {{tlusage}} produces a search link | Answer |
---|---|---|
Q1. Does this page employ template Val? | {{search link|hastemplate:"val" prefix:Template:Template usage}} → hastemplate:"val" prefix:Template:Template usage | A. Yes, because its title shows on the search results. |
Q2. Does this page use Val's fmt parameter? | {{tlusage|val|fmt }} | A. Look for 1 and 3 in the search results in bold text. |
Q3. Which calls to Val on this page use u=ft OR ul=ft? (a one-letter difference) | {{tlusage|val|pattern=ul?=ft}} | A. Look for 1, 2, and 5 in bold text. |
Q4. And of these, which also use fmt=commas after that? | {{tlusage|val|pattern=ul?=ft.*commas}} | A. No context shown, but article title is shown. Half a bug? |
Which use one space before commas? | {{tlusage|val|. commas}} | A. 1 but not 3. |
Q5. Which use either ul?=ft OR fmt=commas? | {{tlusage|val|pattern=(ul?=ft{{!}}co)}} | A. 1, 2, 3, and 5. |
Q6. Which use ft or m, in |u= or |ul= ? | {{tlusage|val|pattern=ul?=(ft{{!}}m)}} | A. 1, 2, 3, 4, and 5. |
Q7. Which use . or * in the unit code? | {{tlusage|val|pattern=u.+(\.{{!}}\*) }} | A. 4 and 5. |
Which use a pipe? | {{tlusage|val|\{{!}} }} | A. All of them. |
Q8. Which use / or - within the |u= or |ul= parameter? | {{tlusage|val|pattern=ul?=[^{{!}}}]+(\/{{!}}-)}} | A. 1, 3, 4, 5, 6, and 7. |
Q9. Where is Val used in the template namespace with u or ul? | {{tlre|val|pattern=ul?=|prefix=10}} → hastemplate:"val" insource:/\{\{ *[Vv]al *\|[^}]*ul/ prefix:Template: | A. In the 15 or so articles listed. (Uses the {{tlre}} shortcut.) |
Q10. Which articles employ {{Convert}}'s "and(-)" option? | {{tlre|Convert|Articles using {{tlf|Convert}}'s "and(-)" option.|pattern=and\(-\)|prefix = 0|}} → hastemplate:"Convert" insource:/\{\{ *[Cc]onvert *\|[^}]*and\(-\)/ prefix:: | A. Only two. |
In Q2, notice how the MediaWiki software ignores the spaces around parameters, but how in Q4 the same MediaWiki software processes the spaces inside parameters. Q2 might have been solved with a plain insource:val fmt search because "fmt" and "val" are whole words, and fmt is rarely seen apart from inside Val. How about hastemplate:val insource:fmt?
Also see the more general examples for the regex of CirrusSearch.
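The answers to the numbered queries can be cross-checked offline. What follows is a minimal sketch in Python, assuming that Python's `re` module agrees with CirrusSearch on the small subset of regex syntax used in these examples. It is a different engine, so this is a sanity check and not a substitute for running the search. The names `CALLS`, `PREFIX`, and `matching_calls` are made up for illustration, and `PREFIX` is copied from the query shown for Q9. In Python the pipe is written directly as `|` rather than `{{!}}`.

```python
import re

# The seven sample calls from this section, keyed by their |number|.
CALLS = {
    1: "{{val|1|ul=ft/s|fmt = commas}}",
    2: "{{val|2|u=ft2}}",
    3: "{{val|3|u=m/s| fmt =commas }}",
    4: "{{val|4|u=m*s-2}}",
    5: "{{val|5|u=ft.s-2}}",
    6: "{{val|6|u=C/J}}",
    7: "{{val|7|ul=J/C}}",
}

# Pre-built part of the regexp, as it appears in the Q9 query (an assumption
# about what the template generates, taken from the displayed query).
PREFIX = r"\{\{ *[Vv]al *\|[^}]*"


def matching_calls(pattern):
    """Return the numbers of the sample calls that PREFIX + pattern matches."""
    regex = re.compile(PREFIX + pattern)
    return [n for n, call in CALLS.items() if regex.search(call)]


print(matching_calls(r"ul?=ft"))            # Q3: [1, 2, 5]
print(matching_calls(r"(ul?=ft|co)"))       # Q5: [1, 2, 3, 5]
print(matching_calls(r"ul?=[^|}]+(\/|-)"))  # Q8: [1, 3, 4, 5, 6, 7]
```

Testing each call as its own string keeps a greedy `.*` from running across two neighbouring calls, which is something to watch for when the same pattern is run against a whole page.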
About CirrusSearch
These powerful (but expensive) CirrusSearch search results could not be obtained with the previous Lucene-search parameters. Regexp searches are restricted on the server, so this template reduces the regex search footprint by using the hastemplate: filter every time, and further restricts the search domain to a namespace at most, by using the prefix: filter. The prefix: filter can also narrow a namespace to only those page names that start with given letters.
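The shape of such a filtered query can be sketched as a small function. This is an illustration in Python, not the template's actual source: the function name `build_query` and its arguments are hypothetical, and the query shape is inferred from the queries displayed in the examples on this page.

```python
def build_query(template, pattern, prefix=":"):
    """Assemble a filtered regexp query in the shape shown in the examples.

    template: template name, e.g. "Convert"
    pattern:  regexp to find inside the template call
    prefix:   search domain; ":" stands for all of mainspace
    """
    first, rest = template[0], template[1:]
    # The first letter of the template name is matched in either case.
    name = "[" + first.upper() + first.lower() + "]" + rest
    # Opening braces, the name, the first pipe, then anything but "}".
    regex = r"\{\{ *" + name + r" *\|[^}]*" + pattern
    return 'hastemplate:"' + template + '" insource:/' + regex + "/ prefix:" + prefix


print(build_query("Convert", r"and\(-\)"))
```

The hastemplate: and prefix: filters come first in the sketch for the same reason the text gives: they narrow the set of pages before the expensive insource:/regexp/ part has to run.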
Parameters insource and hastemplate
Here are some notes on the CirrusSearch features of hastemplate and insource.
Hastemplate finds what is deployed:
- hastemplate will not count a template when only its subtemplate is called
- hastemplate will not count templates inside comments
- hastemplate will not count templates inside nowiki tags
- hastemplate will count templates inside parser functions and other templates, as long as the template is wrapped with double curly braces.
Hastemplate is case-insensitive.
Insource has a dual role:
- insource:"quotes-delimited arguments" finds only whole, alphanumeric words, adjacent to one another in that sequence in the wikitext, treating every non-alphanumeric character between them as if it were whitespace. For example, insource:"M S" matches m/s, as do insource:"M-S" and insource:"m=s"; they all have two arguments, and what matched is shown in bold.
- Plain insource:word1 word2 has one argument, word1. The words after word1 are treated normally: they're all ANDed as whole words (never as pieces or patterns) OR their word stems, anywhere in the wikitext of the page, and in any sequence; and the match is not shown in bold. (Intitle acts the same way around the "quotes" syntax.)
- Insource:/slash delimited argument/ finds everything, even comments. It only ever has one argument. What matched is shown in bold text.
- Insource:/regexp/ finds everything, even pieces and parts, conveying no notion of "words", but only that of a character in an adjacent position to another character in a sequence.
- Insource:/regexp/ requires you to use \/ for any slash character in the pattern for an obvious reason. It also requires you to "backslash-escape" other metacharacters for various other reasons.
For insource:, spaces are not allowed after the colon; it's insource:" or insource:/, for good reasons.
Insource "with quotes" is a safe and sufficient way to find many kinds of template usage. Say the target string is {{Val|9999|ul=AU|fmt=commas}}:
- insource:"val 9999 ul AU fmt commas" → match
- hastemplate: val insource:"9999 ul" → match
- hastemplate: val insource:"999" → no match
- hastemplate: val insource:"fmt commas" → match
- hastemplate: val insource:"ul AU" → match
- hastemplate: val insource:"ul au" → match
- hastemplate: val insource:fmt → match
In some cases there might be disadvantages: the insource:"quotes version" is case-insensitive and blind to non-alphanumeric characters. In other cases it is an advantage to have more search results than intended. For thorough precision, use /regex/.
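The quoted form can be modelled roughly in a few lines. The following Python sketch assumes only the behaviour described above (whole words, adjacency, case-insensitivity, and every non-alphanumeric character treated as whitespace). It is not the real search analyzer, and the names `words` and `quoted_insource_matches` are hypothetical.

```python
import re


def words(text):
    """Split text into lowercase alphanumeric words; everything else separates words."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]


def quoted_insource_matches(phrase, wikitext):
    """Return True if the phrase's words occur as adjacent whole words in the wikitext."""
    needle = words(phrase)
    haystack = words(wikitext)
    if not needle:
        return False
    n = len(needle)
    # Slide a window of n words over the page and compare word for word.
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


target = "{{Val|9999|ul=AU|fmt=commas}}"
print(quoted_insource_matches("9999 ul", target))  # True
print(quoted_insource_matches("999", target))      # False: 999 is only part of a word
print(quoted_insource_matches("ul au", target))    # True: case is ignored
```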
About regex
This covers enough regex to get started using this template to answer any question about wikitext contents on the wiki. Regex are about using metacharacters to create patterns that match any literal characters. The pattern you give will match a target, character by character. To make some positions match with multiple possibilities, metacharacters are needed, and they are from the same keyboard characters that are also in the wikitext.
Metacharacters
The left curly bracket is a metacharacter, so a regexp pattern intended to match a template in the wikitext must "escape" any opening curly bracket: write \{ to match "{" in the target. All target text (all wikitext) is literal text, but we can backslash "escape" the regex metacharacters \. \? \+ \* \{{!}} \{ \[ \] \( \) \" \\ \# \@ \< \~ when we refer to them as literal characters in the wikitext we are interested in mining. (Notice the backslash-escape of the already template-escaped pipe character in order to find a literal pipe character in the wikitext.) Search will ignore the backslash wherever it is meaningless or unnecessary: \n matches n, and so on. So although you don't need to backslash-escape & or > or }, it is safe to do so. An unnecessary backslash will not cause your pattern to fail, but using any of these characters literally will: [ ] . * + ? | { ( ) " \ # @ < ~
- [0-9] will match any digit, [a-y] any lowercase letter except z, [zZ] any z (and so on). So square brackets mean "character class".
- Dot . will match a newline, or any character in the targeted position.
The number of sequential digits or characters these symbols match is expressed by following it with a quantifying metacharacter: * means zero or more, + means one or more, and ? means zero or one of the character it follows. The number of times it matches can also be given in a range: a{2}, a{2,}, and a{2,5} match exactly 2, 2 or more, or 2 to 5 a's. So curly brackets mean "quantifier".
- The parentheses are a grouping mechanism, so we can quantify more than just the previous character, and so we can make boundaries for a set of alternative matches. (See alternation below.)
- The quotation marks are an escape mechanism, like square brackets or the backslash.
- The angle brackets stand for numerals, not digits. Say <5-799> to match 5–799, in one to three positions. Compare this with the alternative: [0-9]{1,3} could match ones, tens, or hundreds, as 0-999 or 00-999 or 000-999.
- Tilde ~ looks ahead and negates the next character.[failed verification] In other words, if the pattern matches in this position, then un-match it if the next character is the character that follows the ~.
The other metacharacters offered by CirrusSearch[failed verification] may be helpful in some cases: complement ~, interval <3-5559>, intersection &, and any string @.
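The character classes and quantifiers described in this section are common to most regex engines, so they can be tried out locally. A minimal sketch in Python follows, with Python's `re` module assumed as a stand-in for the wiki search. Python treats ~, <, & and @ as ordinary characters, so the CirrusSearch-specific operators are left out. `full_matches` is a hypothetical helper.

```python
import re


def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


print(full_matches(r"a{2,5}", ["a", "aa", "aaaaa", "aaaaaa"]))  # ['aa', 'aaaaa']
print(full_matches(r"[0-9]{1,3}", ["7", "042", "1234"]))        # ['7', '042']
print(full_matches(r"[a-y]+", ["lay", "lazy"]))                 # ['lay']
print(full_matches(r"ul?", ["u", "ul", "ull"]))                 # ['u', 'ul']
print(full_matches(r"m\*s", ["m*s", "ms"]))                     # ['m*s']
```

The last line shows the backslash escape: without it, `m*s` would mean "zero or more m, then s" and would match `ms` instead.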
Character classes
A character class is enclosed in [square brackets]. It means these characters, "literal characters", plural. It means "literal", and so normally you don't have to escape a metacharacter in a character class; they're already square-brackets escaped. The /slash delimiters/ mean we must of course escape any slash character, even inside a character class. No other character in a character class always needs escaping; but because ] and - have special meaning (metacharacter) to a character class, they must be escaped sometimes. Those two are literal if they are the first character, but anywhere else they must be escaped, and only the backslash-escape works as the escape mechanism in a character class.
A character class can serve to escape metacharacters, so [-|*\/.{\]] or []|*\/.{\-] means "either a dash OR pipe OR star OR slash OR dot OR left curly bracket OR a right square bracket". So [][.?+*|\/{}()\-] or [-[.?+*|\/{}()\]] works to find all the metacharacters in the wikitext, all of them except the backslash. Neither [\] nor [\\] allows us to OR a literal backslash. To OR a backslash character, there's alternation with the pattern \\ to handle that case. (See below.)
A character class understands the "inverse" of itself, [^abc] is "not a or b or c". A character class stands for a single character in a targeted position, so it's not really an inverse of a set, but rather a NOT of a character.
Alternation
Finally, alternation is a class of regex that contains alternative possibilities for a match, say an AA or a BB, or a CC:
- "AA" OR "BB" OR "CC" in Boolean logic
- AA|BB|CC in a standard, MediaWiki CirrusSearch, regexp
- (AA{{!}}BB{{!}}CC) where it is used within a larger regexp.
We need to replace the pipe character with {{!}} so that the "pipe" for the regexp won't confuse this template (or any other template). We need the parentheses at times because alternation has the lowest precedence and so spans the longest pattern it can; the parentheses define that boundary. It's a boundary you don't have to make if an alternation is the entire regexp pattern. In our case the |pattern= you supply is situated at the end of a longer, pre-built regexp.
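Why the parentheses matter at the end of a pre-built regexp can be seen in a short experiment. This is a sketch in Python, assuming Python's `re` treats alternation the same way as the wiki search does. The helper `expand` and the `prefix` string are illustrative: `expand` stands in for what the template engine does with {{!}} and {{=}}, and `prefix` is copied from the queries shown in the examples.

```python
import re


def expand(pattern):
    """Replace the template-safe {{!}} and {{=}} with the characters they stand for."""
    return pattern.replace("{{!}}", "|").replace("{{=}}", "=")


prefix = r"\{\{ *[Vv]al *\|[^}]*"
grouped = prefix + expand("ul?=(ft{{!}}m)")
ungrouped = prefix + expand("ul?=ft{{!}}m")

text = "plain m, no template here"
print(bool(re.search(grouped, text)))    # False: the whole prefix is still required
print(bool(re.search(ungrouped, text)))  # True: the bare alternative "m" matches alone
```

Without the parentheses the regexp splits into "everything up to ft" OR "m", so any page containing the letter m would match.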
About this template
The wiki regex is pretty straightforward. Characters stand for themselves unless they are metacharacters. If they are metacharacters they are escaped if outside of a character class. Use one of three escape mechanisms: ".", \., or [.], where the dot is now a literal dot in the wikitext, not the metacharacter.
First, this template takes its arguments named or unnamed. If you use the unnamed one, you can give regexp patterns that start or end with a space. If you use the named one, you must, additionally, "escape" any outer space. (To escape is explained elsewhere.)
The regexp targets the area after the initial pipe and before the first closing curly bracket, {{Val|9999|ul=m/s|fmt=commas}}. This pattern portion is expanded /[Vv]al\|[^}]*{{{pattern}}}\}/.
This template could construct the pattern \{[Nn]ame.?\|[^}]*{{{pattern}}}, where pattern is the value you give. That regexp means
- pattern follows any number (*) of characters that are "not (^) a right curly bracket"; in other words it will precede a right curly bracket.
- The template Name follows a left curly bracket, and is case-insensitive in its first letter.
- A pipe \| (\{{!}}) follows the name, but makes allowance for one possible character in between, the dot.
- The dot . can match any character, including the "zero or one" (?) newline characters that will match the case where the initial pipe is put on its own line, such as how the citation and infobox templates are often transcluded (or "called").
This template cannot make that pattern with the .? because in general there are many template names that only differ by the last letter (such as the tl family of template names). But to match the particular case where the template's first parameter starts after a newline you have to match that newline with a dot. You can modify the query and add that .? for searches for Infobox and Cite templates. Because ? counts zero as a match, it will also work where the pipe is on the same line.
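The trade-off with .? can be seen on three small strings. The sketch below uses Python's `re` as a stand-in. In Python the dot only matches a newline when `re.DOTALL` is set, so that flag is used here to imitate the behaviour this page describes for the wiki search. The sample strings are made up.

```python
import re

one_line = "{{tl|x}}"
own_line = "{{tl\n|x}}"   # first pipe on its own line
lookalike = "{{tlx|y}}"   # a different template, one letter longer

strict = re.compile(r"\{[Tt]l\|")
# re.DOTALL lets "." match a newline too, imitating the search behaviour described above.
loose = re.compile(r"\{[Tt]l.?\|", re.DOTALL)

for text in (one_line, own_line, lookalike):
    print(repr(text), bool(strict.search(text)), bool(loose.search(text)))
```

The strict pattern misses the call whose pipe starts a new line, while the loose pattern finds it but also matches the one-letter-longer template name. That false positive is the reason given above for leaving .? out of the generated pattern.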
See also
Notes
- ^ Some templates, like Infobox and Cite, are usually written with one line per parameter. These are possible to find using regexp, but this feature is not yet available for this template.
- ^ These will propagate themselves as their presence tempts editors who copy other template calls that they see. These errors are caused by haste, or by poor or misunderstood template documentation.