Scantag is a Yapperbot task that runs at low priority, scanning every page on Wikipedia and adding the appropriate maintenance tags wherever configured patterns match.

This is especially useful for tagging articles that have broken templates, as they do not show up as transclusions of the template. It can also be used for a number of other things: any issue that needs a maintenance template, and that can be detected through a pattern search of the body of the article, is potentially a candidate for use here.

What's currently running?

To see the raw details of the currently running rules, take a look at the JSON file that configures them.

How can I request a Scantag pattern?

Add a request on the talk page for a new pattern. In your request, you should explain:

  • What pattern you want added (if you know regex, please provide a regex pattern; if you don't, please describe it as carefully as you can, so that someone can craft one for you)
  • Why you want this pattern to be scanned
  • What you want the found articles to be tagged with
  • That you understand that this will not happen immediately

Scantag rules may not go live for a long time, as the bot only rereads the rules once it has finished scanning the entire corpus of Wikipedia pages. You should not expect your rules to start scanning for at least a week, and probably longer, after you make your request.

Either the bot operator, Naypta (talk · contribs), or any administrator who is comfortable doing so, may add rules to the bot.

Instructions for admins

Scantag rules can be modified by any administrator, as they are stored in Yapperbot's user JSON pages. However, as these rules will be applied to many, many pages, it is very important that they are accurate. To that end, any administrator modifying Scantag rules should first ensure that they are completely comfortable with doing so. If you have any doubts, do not modify the live rules.

Scantag rules

Scantag rules can be tested by modifying the sandbox JSON page. A Scantag rule is made up of the following components:

"Regex to match (remember, this has to be fully JSON escaped, not just a valid regex, otherwise it will not work)": {
    "task": "Brief description of task",
	"example": "Example of something that would be tagged by the task",
	"noTagIf": "A regex which, if it matches against the page, will cause the page to be ignored. Usually used to avoid tagging pages that already contain maintenance tags. Use boolean false to always tag; be careful with this! Like the key regex, must be JSON escaped as well as valid regex.",
	"prefix": "Something to prefix the articles that the task finds with, with $ signs escaped with an additional sign (i.e. $ in output should read $$); each regex capture group is available as `${n}`, replacing n with the one-indexed number of the capture group",
	"suffix": "Same as prefix, but appends to the article rather than prepending",
	"detected": "Describes what was detected and why it's doing something; should come after the word 'detected', and potentially have other detected aspects after it separated with semicolons",
	"testpage": "The page name of a page on which the matching will be tested. When the sandbox is updated, Yapperbot will run Scantag's sandbox rules twice (so that the NoTagIf rule can be tested) over this page. Must be prefixed 'User:Yapperbot/Scantag.sandbox/tests/'."
}
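
The requirement that the key regex and noTagIf be "fully JSON escaped" mostly comes down to doubling backslashes and escaping double quotes. As a minimal sketch, and assuming you have Python to hand, the standard json module can do the escaping for you. The function name json_escape_regex and the sample pattern are illustrative only and are not part of Scantag.

```python
import json

def json_escape_regex(raw_regex: str) -> str:
    """Return the regex as a quoted JSON string, ready to paste into the rules page."""
    # json.dumps doubles each backslash, escapes double quotes, and adds the
    # surrounding quotes, so the result is a valid JSON string literal.
    return json.dumps(raw_regex)

# Hypothetical pattern: a "cite web" template call with no closing braces.
raw = r"\{\{cite web\|[^}]*$"
print(json_escape_regex(raw))  # "\\{\\{cite web\\|[^}]*$"

# Round trip: parsing the JSON string gives back the original regex unchanged.
assert json.loads(json_escape_regex(raw)) == raw
```

If the round trip does not return the regex you started with, the string you pasted into the JSON page is not the pattern you meant to write.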

prefix, suffix and testpage are optional; all other keys are required.
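
Before saving a change to the sandbox, it can help to check a draft locally against the requirements above. The following is a rough sketch only: it assumes the rules page is a single JSON object mapping each regex to its rule, and it uses Python's re module purely as a sanity check, which may accept or reject patterns differently from the regex engine the bot actually uses. The function name check_rules is hypothetical.

```python
import json
import re

REQUIRED = {"task", "example", "noTagIf", "detected"}
OPTIONAL = {"prefix", "suffix", "testpage"}
TESTPAGE_PREFIX = "User:Yapperbot/Scantag.sandbox/tests/"

def check_rules(json_text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were spotted."""
    problems = []
    rules = json.loads(json_text)  # raises ValueError if the JSON itself is invalid
    for pattern, rule in rules.items():
        label = f"rule {pattern!r}"
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"{label}: key is not a valid regex ({exc})")
        missing = REQUIRED - set(rule)
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
        unknown = set(rule) - REQUIRED - OPTIONAL
        if unknown:
            problems.append(f"{label}: unknown keys {sorted(unknown)}")
        if "noTagIf" in rule:
            no_tag_if = rule["noTagIf"]
            if isinstance(no_tag_if, str):
                try:
                    re.compile(no_tag_if)
                except re.error as exc:
                    problems.append(f"{label}: noTagIf is not a valid regex ({exc})")
            elif no_tag_if is not False:
                problems.append(f"{label}: noTagIf must be a regex string or false")
        testpage = rule.get("testpage")
        if testpage is not None and not str(testpage).startswith(TESTPAGE_PREFIX):
            problems.append(f"{label}: testpage must start with {TESTPAGE_PREFIX!r}")
    return problems
```

A clean result from a check like this does not replace the sandbox run; it only catches the most mechanical mistakes before the bot sees them.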

The value of prefix is assumed to have the same precedence for MOS:ORDER as a maintenance template. The value of suffix is simply appended to the end of the article.
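
To make the prefix and suffix placeholder syntax concrete, here is a small sketch of how `${n}` and `$$` could be expanded against a match. It is an illustration of the syntax as described above, not the bot's own code; the function name expand, the sample pattern and the sample template text are all hypothetical, and Python's re module stands in for whatever regex engine the bot uses.

```python
import re

# Matches either an escaped dollar sign ($$) or a numbered placeholder (${n}).
_TOKEN = re.compile(r"\$\$|\$\{(\d+)\}")

def expand(template: str, match: re.Match) -> str:
    """Expand ${n} to capture group n (one-indexed) and $$ to a literal $."""
    def replace(token: re.Match) -> str:
        if token.group(0) == "$$":
            return "$"
        # A group that did not participate in the match expands to nothing.
        # A group number that does not exist raises IndexError.
        value = match.group(int(token.group(1)))
        return value if value is not None else ""
    return _TOKEN.sub(replace, template)

# Hypothetical rule: capture the name of a template that is opened in the text.
m = re.search(r"\{\{(cite \w+)\|", "Text {{cite web|url=x")
print(expand("{{Example tag|name=${1}|cost=$$5}}", m))
# {{Example tag|name=cite web|cost=$5}}
```

Note that `$${1}` therefore produces the literal text `${1}` rather than the contents of the first capture group.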

Rule sandbox and test pages

Once you have modified the sandbox JSON page, within five minutes, Yapperbot (talk · contribs) should update the sandbox report page, which contains information explaining each of the rules that Scantag has been given in the sandbox. If you set a testpage parameter in the Scantag rule, Yapperbot will also have run the rule over that page twice. If you see two runs, rather than just one, in the page history linked (click "Up-to-date"), this means that your noTagIf regex is not matching the result of prefix or suffix. This is bad; it means that the prefix and/or suffix will be added to matching pages every time the bot runs, not just the first time the bot spots the issue. Correct your noTagIf regex if you see this happening.
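
The two-run check can also be approximated locally before you touch the sandbox. The sketch below assumes a simplified model in which the prefix is prepended and the suffix appended as plain text, without capture group expansion or MOS:ORDER placement; apply_rule and is_idempotent are hypothetical names, and {{Example tag}} is a stand-in for a real maintenance template.

```python
import re

def apply_rule(text, pattern, no_tag_if, prefix="", suffix=""):
    """Tag the text once if the pattern matches and noTagIf does not."""
    if no_tag_if is not False and re.search(no_tag_if, text):
        return text  # already tagged, or otherwise excluded
    if not re.search(pattern, text):
        return text  # nothing to tag
    return prefix + text + suffix

def is_idempotent(text, pattern, no_tag_if, prefix="", suffix=""):
    """Run the rule twice; a second change means noTagIf misses the tag."""
    first = apply_rule(text, pattern, no_tag_if, prefix, suffix)
    second = apply_rule(first, pattern, no_tag_if, prefix, suffix)
    return first == second

PAGE = "Intro <ref>broken"
PATTERN = r"<ref>[^<]*$"
PREFIX = "{{Example tag}}\n"

# noTagIf matches the prefix that was added: the second run leaves the page alone.
print(is_idempotent(PAGE, PATTERN, r"\{\{Example tag\}\}", prefix=PREFIX))  # True

# noTagIf does not match the prefix: the page would be tagged on every run.
print(is_idempotent(PAGE, PATTERN, r"\{\{Other tag\}\}", prefix=PREFIX))  # False
```

A False result here corresponds to seeing two runs in the test page history.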

Modifying the sandbox JSON page regenerates the sandbox report automatically within five minutes. If you instead modify the test pages, or any other part of the system, you can force a sandbox refresh manually by removing the {{/ts}} template from the top of the sandbox report page.

Pushing rules live

Never push rules live if you have not first tested them in the sandbox, even if a trusted user wrote them.

You are strongly advised to consult at least Naypta (talk · contribs) or one other sysop before making a rule live.

Once you have tested the rules you set up in the sandbox, and you are satisfied that they are working correctly, you can add the sandbox rules to the production JSON file. Note that, because the bot runs over the entire contents of the article namespace, it may take a long time before it finishes its current run, and restarts with the new rules; consequently, the lead time for the rules to take effect may be long.