User:PerfektesChaos/js/WikiSyntaxTextMod/flow/tag

Tags

The second step of syntax polishing standardizes tags such as <tag>, as well as comments, and detects errors.

Scope

Tags are given one uniform appearance. Human authors should not be confused by varying formatting styles, and bots and scripts can identify structures reliably and simply.

Only well-known elements are processed:

a applet area audio b base bdi big blockquote body br button center code command dfn div em embed font form frame frameset gallery h1 h2 h3 h4 h5 h6 head hiddentext hiero hr html i iframe imagemap img includeonly input inputbox isindex kbd layer link map math meta noinclude nowiki object onlyinclude option pages poem pre rb rbc ref references rp rt rtc ruby s samp score script select small source span strike strong style sub sup syntaxhighlight templatedata textarea timeline title tt u wbr xml

Comments are considered here, too.

All unknown tags will be ignored.
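
The restriction to known elements can be illustrated with a minimal sketch. It is not the script's actual code: the function name `lowercaseKnownTags` and the shortened set `KNOWN_TAGS` are assumptions made for this example. Known tag names are lowercased and unknown tags are left exactly as found.

```javascript
// Hypothetical excerpt of the known-element list; the full list is given above.
const KNOWN_TAGS = new Set(["b", "br", "div", "nowiki", "ref", "references", "sup"]);

// Lowercase the name of every known tag; leave unknown tags untouched.
function lowercaseKnownTags(wikitext) {
  // Matches "<" or "</" followed by a tag name made of letters and digits.
  return wikitext.replace(/<(\/?)([A-Za-z][A-Za-z0-9]*)\b/g, (match, slash, name) => {
    const lower = name.toLowerCase();
    return KNOWN_TAGS.has(lower) ? "<" + slash + lower : match;
  });
}

console.log(lowercaseKnownTags("x<SUP>2</Sup> <Foo>bar</Foo><BR />"));
// prints: x<sup>2</sup> <Foo>bar</Foo><br />
```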

Formatting

The following format is expected after polishing:

  • A known tag opened by < is to be closed by > and no other < or > is permitted inside.
  • There is no whitespace directly after the opening < or directly before the closing >.
  • All known tags as enumerated above consist of lowercase letters only.
  • If a backslash \ is detected just after < or just before >, a typing mistake is assumed and the backslash is turned into a regular slash.
  • An end tag is written in compact notation: </sup>.
  • A unary tag (like <references />) is written with exactly one space between name (or attribute) and slash.
  • Elements that HTML permits only in unary form (br, hr and wbr) are forced into a unary tag, wherever and whichever kind of slash is present.
  • Empty elements (like <nowiki></nowiki> and <references></references>) will be turned into one unary tag.
    • If there is only whitespace (spaces or line breaks) between the tags, they are regarded as empty, too. <pre>\n</pre> does have a visible effect, but it is not meaningful except for the Whitespace language. However, <syntaxhighlight> keeps any content unchanged. In other cases an empty tag pair is to be filled with some content.
    • For <div></div> an exception is made.
  • All attribute names are turned into lowercase letters.
  • Every attribute is permitted only once; multiple occurrences cause an error message.
  • Attribute assignments are written as attr="Val" in compact notation:
    • Whitespace around the equal sign will be removed.
    • The value is enclosed in quotation marks ".
    • If a " has been identified inside the value, the apostrophe ' is kept as delimiter.
    • Quotation mark and apostrophe are not expected to occur together within one value in a wikitext; in that case a syntax error (missing delimiter) is assumed, triggering an error message.
    • < or > enclosed in quotation marks are not accepted.
    • Leading and trailing whitespace within the value enclosed by quotation marks will be removed.
    • Assignments of empty values are invalid and cause an error message. This does not apply to occasional single attributes without an equal sign (which are quite rare).
  • Before and after an attribute assignment there is exactly one space.
    • In case of multi-line tags line breaks are kept.
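
The attribute rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's implementation: `normalizeAttributes` is a hypothetical name, errors are reported by throwing, and line breaks in multi-line tags are collapsed to single spaces for brevity, whereas the script keeps them.

```javascript
// Normalize the attribute part of a tag, e.g. ` CLASS = 'wide' ` -> `class="wide"`.
// Hypothetical helper; throws an Error where the text above mentions an error message.
function normalizeAttributes(raw) {
  // name, optionally followed by "=" and a quoted or unquoted value
  const pattern = /([^\s=]+)(\s*=\s*("[^"]*"|'[^']*'|[^\s]*))?/g;
  const seen = new Set();
  const parts = [];
  for (const m of raw.matchAll(pattern)) {
    const name = m[1].toLowerCase();
    if (seen.has(name)) throw new Error("duplicate attribute: " + name);
    seen.add(name);
    if (m[2] === undefined) {      // single attribute without equal sign
      parts.push(name);
      continue;
    }
    let value = m[3];
    const first = value[0];
    // Strip a matching pair of delimiters, if present.
    if (value.length >= 2 && (first === '"' || first === "'") && value.endsWith(first)) {
      value = value.slice(1, -1);
    }
    value = value.trim();
    if (value === "") throw new Error("empty value for attribute: " + name);
    if (/[<>]/.test(value)) throw new Error("< or > in value of attribute: " + name);
    const hasQuote = value.includes('"');
    const hasApostrophe = value.includes("'");
    if (hasQuote && hasApostrophe) {
      throw new Error("missing delimiter suspected in attribute: " + name);
    }
    const delimiter = hasQuote ? "'" : '"';
    parts.push(name + "=" + delimiter + value + delimiter);
  }
  return parts.join(" ");          // exactly one space between assignments
}

console.log(normalizeAttributes("  CLASS = 'wide'   name=  \" a b \" compact"));
// prints: class="wide" name="a b" compact
```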

Nesting

Associated opening and closing tags are identified.

Correct nesting is checked; if end tags are missing or superfluous at some level, an error message is issued.

Some elements are processed immediately from opening until closing tag.
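
A nesting check of this kind is commonly done with a stack. The following is a minimal sketch assuming tags have already been normalized to lowercase; `checkNesting` is a hypothetical name, and it does not restrict itself to the known elements or treat immediately processed elements specially.

```javascript
// Report missing and superfluous end tags; returns a list of error messages.
function checkNesting(wikitext) {
  // "<" or "</", tag name, optional attributes, optional "/" of a unary tag, ">"
  const tagPattern = /<(\/?)([a-z][a-z0-9]*)(?:\s[^<>]*?)?(\/?)>/g;
  const stack = [];   // names of currently open elements, innermost last
  const errors = [];
  for (const m of wikitext.matchAll(tagPattern)) {
    const [, closing, name, unary] = m;
    if (unary) continue;              // unary tags do not open a level
    if (!closing) {
      stack.push(name);
      continue;
    }
    const index = stack.lastIndexOf(name);
    if (index === -1) {
      errors.push("superfluous end tag </" + name + ">");
      continue;
    }
    // Everything opened after the matching start tag was never closed.
    const unclosed = stack.splice(index).slice(1).reverse();
    for (const open of unclosed) errors.push("missing end tag </" + open + ">");
  }
  for (const open of stack.reverse()) errors.push("missing end tag </" + open + ">");
  return errors;
}

console.log(checkNesting("<div><b>x</div>"));
// prints: [ 'missing end tag </b>' ]
```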

Content analysis

  • nowiki ranges and some (unary) elements will be protected immediately after regions which are commented out.
  • syntaxhighlight areas will be protected next and entirely.
    • If possible (keyword "syntaxhighlight" not within the range), the obsolete source is turned into syntaxhighlight. By the way, the strike tag is standardized as <s>.
  • For security reasons, HTML elements with URL links pointing outside the wiki projects (like <a href= or <img src=) are blocked in the generated HTML page. Within wikitext the script deactivates them by transforming the leading < into &lt;, which yields the same visual appearance.
  • If typographical tags that are meaningful only as a pair are found in unary form (like <b />, <em />, <i />, <span /> etc.), a certain bad habit is assumed and they are turned into <nowiki />. Parameters would be pointless and are removed.
  • Where <br /> uses the CSS property style="clear:… or contains the non-standard clear=…, only the block element <div /> is suitable, and br is transformed accordingly. Non-standard forms in <div /> are interpreted, and a proper style="clear:both" etc. is assigned according to the intention.
    • In order to ensure valid HTML <div … /> is written as empty <div …></div>.[1]
  • If a mandatory attribute assignment is missing or an attribute is not permitted, an error message is shown.
    • For the elements gallery, ref and references, only well-known parameters are tolerated.
  • If the kind of element suggests more specific processing, whitespace formatting, syntax analysis or possibly content protection, this is done or scheduled.
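
Two of the transformations listed above can be sketched with regular expressions. Both function names and the shortened list `BINARY_ONLY` are assumptions for this example; the real script may proceed quite differently.

```javascript
// Turn the leading "<" of <a href=...> and <img src=...> into "&lt;".
function deactivateLinkElements(wikitext) {
  return wikitext.replace(/<(?=(?:a\s+href|img\s+src)\s*=)/gi, "&lt;");
}

// Hypothetical excerpt of typographical elements that only make sense as a pair.
const BINARY_ONLY = ["b", "em", "i", "span"];

// Replace unary typographical tags such as <b /> or <span class="x" /> by
// <nowiki />, dropping any parameters.
function replaceUnaryTypography(wikitext) {
  const names = BINARY_ONLY.join("|");
  const pattern = new RegExp("<(?:" + names + ")(?:\\s[^<>]*?)?\\s*/>", "g");
  return wikitext.replace(pattern, "<nowiki />");
}

console.log(deactivateLinkElements('see <a href="http://example.org">x</a>'));
// prints: see &lt;a href="http://example.org">x</a>
console.log(replaceUnaryTypography('a<b />c<br />d<span class="x" />'));
// prints: a<nowiki />c<br />d<nowiki />
```

Note that <br /> is left alone by the second function, since br is legitimately unary.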

Comments

  • For each comment start <!-- the matching end --> is searched. If the end cannot be found, or a space is detected within the comment start, an error message is displayed.
  • A comment may be subject to a user-defined comment modification.
  • All comments will be protected against any further searching and replacement.
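
The comment search can be sketched as below. This is a simplified illustration: `findComments` is a hypothetical name, and reading "space within the beginning" as the forms `<! --` and `<!- -` is an assumption. The returned ranges could then be excluded from any later search and replace.

```javascript
// Locate comments; returns their [start, end) ranges plus error messages.
function findComments(wikitext) {
  const ranges = [];
  const errors = [];
  // Assumed forms of a comment start broken by whitespace: "<! --" and "<!- -".
  for (const m of wikitext.matchAll(/<!\s+--|<!-\s+-/g)) {
    errors.push("space within comment start at " + m.index);
  }
  let pos = 0;
  while (true) {
    const start = wikitext.indexOf("<!--", pos);
    if (start === -1) break;
    const end = wikitext.indexOf("-->", start + 4);
    if (end === -1) {
      errors.push("comment end not found for start at " + start);
      break;
    }
    ranges.push([start, end + 3]);  // end index is exclusive
    pos = end + 3;
  }
  return { ranges, errors };
}

console.log(findComments("a<!-- x -->b"));
// prints: { ranges: [ [ 1, 11 ] ], errors: [] }
```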

Remarks

  1. ^ The inner tags of wikisyntax are not kept in the HTML document and may be provided as unary XML.