User:Aoziwe/sandbox/converter

Terminology

Dimension

Dimension - the physical equivalence of a group of units

Dimensions currently configured are:

  • 1 - distance, eg, metre
  • 2 - area, eg, square metre
  • 3 - volume, eg, cubic metre
  • v - volume, eg, litre
  • a - area, eg, hectare
  • al - lineal area, eg, metre square
  • vl - lineal volume, eg, metre cubed
  • m - mass, eg, kilogram
  • p - pressure, eg, pascal
  • t - time, eg, hour
  • T - temperature, eg, degree centigrade
  • ) - arc, eg, radian
  • e - energy, eg, joule
  • P - power, eg, watt
  • V - electric potential, eg, volt
  • c - electric charge, eg, coulomb
  • F - force, eg, newton
  • f - frequency, eg, hertz

Unit

Unit - anything that defines the nature of a measure.

For example, miles per hour, metre, degree centigrade, gallon, cubic centimetres per square kilometre.

/List of initial units

Core unit

Core unit - the single unit in a dimension against which every other unit in that dimension is defined by ratio.

For example, metre.

Prime unit

Prime unit - the singular form of a unit, to which that unit's singular, plural, and abbreviated forms are referenced.

For example, kilometre is the prime unit for kilometre, kilometres, and km.

Base unit

Base unit - the unit to which a multidimensional unit is referenced.

For example, "<baseunit> square" or "cubic <baseunit>".

Multidimensional unit

Multidimensional unit - any prefixed unit or suffixed unit.

A prefixed unit:
"23 sq m" == "23 square metres" == "23 x m x m".

A suffixed unit:
"23 m sq" == "23 metres square/d" == "(23 metres) square/d" == "23 x m x 23 x m" == "529 x m x m".

Prefixed unit

Prefixed unit - a unit of the form "square metre".

Suffixed unit

Suffixed unit - a unit of the form "miles square".
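
The difference between a prefixed unit and a suffixed unit is only whether the value is raised to the power along with the unit. A minimal Python sketch of that arithmetic follows. The function name and the affix-to-power table are illustrative assumptions, not the converter's own code.

```python
# Hypothetical table mapping the recognised affixes to a power.
AFFIX_POWER = {"sq": 2, "square": 2, "squared": 2, "cu": 3, "cubed": 3, "cubic": 3}


def multidimensional_value(value, affix, position):
    """Return the value expressed in base units raised to the affix power."""
    power = AFFIX_POWER[affix]
    if position == "prefix":  # "23 sq m": only the unit is squared
        return value
    if position == "suffix":  # "23 m sq": the whole measure is squared
        return value ** power
    raise ValueError(f"unknown affix position: {position!r}")


print(multidimensional_value(23, "sq", "prefix"))  # 23, ie, 23 x m x m
print(multidimensional_value(23, "sq", "suffix"))  # 529, ie, 529 x m x m
```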

Affix

Affix - either a prefix or a suffix.

Recognised affixes are "sq", "square", "squared", "cu", "cubed", "cubic".

This is table driven and can be easily reduced or extended.

Antefix unit

An antefix unit is one where the unit is placed immediately before the value in a measure.

For example, $100.

Antefix units cannot be either superscripted or subscripted.

The parser will not accept any lexical item between the antefix unit and a value for the pair to be treated as a measure.

Postfix unit

Most measures are postfix units, where the unit is placed after the value in a measure.

For example, five miles, 23¢, 45 hectares, 100 dollars.

The parser will accept any white space, depending on the setting of the "wrap" parameter, between a value and a postfix unit for it to be treated as a measure.

The parser will accept a single alpha lexical item, dependent on the setting of the "adjective" parameter, and with appropriate white space, dependent on the setting of the "wrap" parameter, between a value and a postfix unit for it to be treated as a measure.

Subscripted unit

For example, btu<sub>39°F</sub>, displayed as btu39°F.

These are treated like any other unit. They just happen to have an extended single lexical construct for that unit, which includes the "<sub>" and "</sub>" tags.

Antefix units cannot be subscripted.

Superscripted unit

For example, km<sup>2</sup>, displayed as km2.

A measure with a superscripted unit, for example "23 m<sup>2</sup>", is treated mathematically as:
"23 m<sup>2</sup>" == "23 (m<sup>2</sup>)" == "23 x m x m" == "23 sq m" == "23 square metres"
That is, "23 m<sup>2</sup>" is treated as a prefixed unit.

Compared to suffixed units:
"23 m sq" == "23 metres square/d" == "(23 metres) square/d" == "23 x m x 23 x m" == "529 x m x m" == "529 square metres" == "529 m<sup>2</sup>".

Antefix units cannot be superscripted.

Articulated unit

Articulated unit - a unit in the form "an inch", "a mile", "of an acre" preceded by a value.

Compound unit

Compound unit - any unit in the form "<unita> per <unitb>" where unita and unitb are any unit except a compound unit.

For example, metres per second, tons per square inch, gallons per acre, K/millisecond, square kilometres per week.

Also in the form "<unita>/<unitb>".

Hybrid unit

Hybrid unit - any unit in the form "<unita> <unitb>" where unita and unitb are any unit except a compound unit or a hybrid unit.

For example, kilowatt hours, newton metres.

NOT YET IMPLEMENTED

Quantity unit

Quantity unit - a unit having a leading quantity embedded as part of the unit distinct from a value in the measure.

These only apply to the denominator of a compound unit.

For example, "8.6 litres per 100 kilometres".

The quantity can be a decimal value but cannot be in engineering notation format or scientific notation format, and cannot be negative.
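
As an illustration of how the embedded quantity relates to the value, the following Python sketch splits a quantity-unit measure and scales the value to a single denominator unit. The regular expression and function name are assumptions made for this example. The pattern only accepts an unsigned decimal quantity, in line with the restriction above.

```python
import re

# Value, numerator unit, "per", unsigned decimal quantity, denominator unit.
_QUANTITY_MEASURE = re.compile(
    r"^(-?\d+(?:\.\d+)?)\s+(.+?)\s+per\s+(\d+(?:\.\d+)?)\s+(.+)$"
)


def split_quantity_measure(text):
    """Return (value per single denominator unit, numerator, denominator)."""
    match = _QUANTITY_MEASURE.match(text)
    if match is None:
        raise ValueError(f"not a quantity-unit measure: {text!r}")
    value, numerator, quantity, denominator = match.groups()
    if float(quantity) == 0:
        raise ValueError("the embedded quantity cannot be zero")
    return float(value) / float(quantity), numerator, denominator


# 8.6 litres per 100 kilometres is about 0.086 litres per kilometre.
print(split_quantity_measure("8.6 litres per 100 kilometres"))
```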

Value

Value - the quantity applied to a unit, eg, 23, fifty-five, 3/4, -18, 45.67, three fifths, 98,234, an, 4.567e-3, 6.789x10<sup>3</sup> (displayed as 6.789x103).

Values can be worded or numeric but not a combination of the two, ie, not "fifty 6"; such a value must be written as either fifty-six or 56. The maximum worded value is currently 999,999,999,999,999. This is easily extended.

Values can be positive or zero. Only numeric values can be negative.

If an articulated unit is not preceded by a value, the parser instead tries to treat the articulation as a value, ie, "a" = 1.

Not all worded fraction denominators, eg, thirteenths, are configured, but they are table driven and can be easily extended.

Engineering notation on input is not checked for an exponent that is a multiple of three, for example, "4.567e-4" will be accepted by the parser as 0.0004567.

Scientific notation is accepted on input as markup, that is, "6.789x10<sup>3</sup>" will be accepted by the parser as 6,789. The parser will also accept caret formatted numbers. For example, "12.34x10^2" will be accepted by the parser as 1,234.
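
The numeric input forms described above can be sketched in Python as follows. This is only an approximation of the behaviour described, under the assumption that commas are thousands separators. The function name and regular expression are hypothetical, and worded values and fractions are not handled.

```python
import re

# Mantissa, then "x10" with the exponent given as <sup> markup or after a caret.
_SCIENTIFIC = re.compile(
    r"([+-]?\d+(?:\.\d+)?)x10(?:<sup>([+-]?\d+)</sup>|\^([+-]?\d+))"
)


def parse_numeric(text):
    """Parse a plain, e-notation, or x10 scientific-notation numeric value."""
    cleaned = text.replace(",", "")  # "98,234" -> "98234"
    match = _SCIENTIFIC.fullmatch(cleaned)
    if match:
        exponent = match.group(2) or match.group(3)
        # Build an e-notation string so that float() does the scaling.
        return float(f"{match.group(1)}e{exponent}")
    # float() accepts any exponent, not only multiples of three.
    return float(cleaned)


print(parse_numeric("4.567e-4"))             # 0.0004567
print(parse_numeric("6.789x10<sup>3</sup>"))  # 6789.0
print(parse_numeric("12.34x10^2"))           # 1234.0
```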

Measure

Measure - the combination of a value and a unit.

For example, 23 miles, 6', three tenths of an acre, 100 °C, 100 miles per hour, 14 psi

Measure sequence

Measure sequence - a sequence of measures forming one overall measure.

For example, 34 miles 16 yards 2 inches, four hundred gallons 3 pints, 6' 4 3/4", or a single measure, eg, 9 ton.

A measure sequence is converted to a single measure by the converter.

The converter will warn of incompatible unit dimensions but will still convert (into a nonsense value), eg, "3 gallons 2 inches".

Measure sequence elements can be in any order of value or unit scale and can be repeated. Neither the parser nor the converter objects to non commonsensical combinations, eg, "3 miles 4 inches" is the same as "4 inches 3 miles", and "70,000 inches half a mile" and "1/3 mile 1/3 mile" are both accepted.

The first unit in a measure sequence is used as the basis for conversions and for checking dimensional consistency of other measure elements in the sequence.

If the last unit in a measure sequence is a suffixed unit or a compound unit then that suffix and/or the denominator of the compound unit will be applied to the whole measure sequence if it is valid to do so for the dimension of each measure sequence element.

For example, "2 miles 4 yards square" is processed 'mathematically' as (2 miles + 4 yards) square, and "3 miles 2 inches per hour" as (3 miles + 2 inches) per hour. However, "2 square miles 4 yards square" will be processed as 2 square miles + 4 yards square, and "3 miles per minute 2 inches per hour" as 3 miles per minute + 2 inches per hour.
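
The basic reduction of a measure sequence to a single value can be sketched as below. The unit table is a hypothetical stand-in for the converter's own tables, with the metre and the litre assumed as core units. The sketch only covers summing and the dimension warning, not the suffix or denominator handling.

```python
# Hypothetical unit table: prime unit -> (dimension code, ratio to core unit).
UNITS = {
    "mile": ("1", 1609.344),
    "yard": ("1", 0.9144),
    "inch": ("1", 0.0254),
    "gallon": ("v", 4.54609),  # imperial gallon, in litres
}


def combine(sequence):
    """Sum (value, unit) pairs into core units; return (total, warnings)."""
    first_dimension = UNITS[sequence[0][1]][0]  # the first unit sets the basis
    total = 0.0
    warnings = []
    for value, unit in sequence:
        dimension, ratio = UNITS[unit]
        if dimension != first_dimension:
            # Warn, but still add the value, as the converter is described to.
            warnings.append(
                f"{unit}: dimension {dimension!r} does not match {first_dimension!r}"
            )
        total += value * ratio
    return total, warnings


total, warnings = combine([(34, "mile"), (16, "yard"), (2, "inch")])
print(round(total, 4), warnings)  # metres, and an empty warning list
print(combine([(3, "gallon"), (2, "inch")])[1])  # one dimension warning
```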

Conjunction sequence

Conjunction sequence - a sequence of values separated by conjunctions followed by one unit.

For example, "1 or 3 pounds", "4 and 7 square inches", "45 - 56 miles per hour", "a hundred and twenty-six to three thousand and two and three quarter tons".

The converter will convert each value separately unless it can reduce the conjunction.

Currently configured conjunctions are "to", "by", "and", ",", "-", "or".

Conjunction reductions are currently only supported for possible area and volume conjunctions.

For example, "3 by 7 and 4 by 9 metres" will be reduced by the converter to the to-unit version of "21 and 36 square metres".

Reduction can be controlled using the "style" parameter.
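
A rough Python sketch of the "by" reduction described above is given here. It assumes the values and conjunctions have already been separated by the parser. The function name and return shape are invented for the example, and how the converter treats a mix of reduced and unreduced values is not specified in this document.

```python
from math import prod


def reduce_by(values, conjunctions):
    """Collapse runs of values joined by "by" into their products.

    Returns (reduced values, remaining conjunctions, power of each value),
    where power 2 means an area and power 3 means a volume.
    """
    groups = [[values[0]]]
    remaining = []
    for conjunction, value in zip(conjunctions, values[1:]):
        if conjunction == "by":
            groups[-1].append(value)
        else:
            remaining.append(conjunction)
            groups.append([value])
    if any(len(group) > 3 for group in groups):
        # Longer than N by N by N has no area or volume meaning: leave as is.
        return list(values), list(conjunctions), [1] * len(values)
    return [prod(g) for g in groups], remaining, [len(g) for g in groups]


# "3 by 7 and 4 by 9 metres" -> "21 and 36 square metres"
print(reduce_by([3, 7, 4, 9], ["by", "and", "by"]))
# ([21, 36], ['and'], [2, 2])
```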

Conjuncted measure sequence

Conjuncted measure sequence - a sequence of measure sequences separated by conjunctions.

For example, "4 miles 33 yards to 5 miles 7 furlongs".

There is no special processing currently supported for conjuncted measure sequences. Each measure sequence is processed separately, eg, "7 miles 60 chains by 5 furlong 6 inches" will not be reduced.

Conversion

Conversion - the conversion of a value to another value based on the corresponding convert from-unit and convert to-unit.

Normal conversion is to a default core unit of the same or a compatible dimension, eg, "miles per hour" to "metres per second".

There is a pre configured conversion order, which provides more useful common conversions.

The converted to-unit can be parameter driven and override the pre configured conversion order, by defining a user conversion order.

Unit inversion can be done for compound units, eg, conversion from "miles per gallon" to "litres per kilometre", converting to the inverted dimension, by defining a user conversion order, eg, "miles per gallon->litres per kilometre".

Parser

The parser is relatively forgiving in regard to white spacing, ie, normal spaces, tabs, and new lines.

The parser will typically treat "twenty-six", "twenty -six", "twenty- six", "twenty - six" all as the same, where the spaces, if any, adjacent to the "-" can be any one or more white space.

The parser can be set to treat newlines as non white space, ie, to break its parse of a lexical structure at a new line and start a new lexical structure, using the "wrap" parameter.

Warnings

The parameter processor, the parser, and the converter will all produce warnings about input they cannot process or input open to more than one interpretation.

Some textual context will be provided to give some indication of the location in the input text being processed.

Warnings are displayed to the standard output device, or can be sent to a file using the warnfile parameter.

Conversion order

A conversion order defines the convert to-units for either a particular convert from-unit or a whole dimension. If defined for a particular from-unit it is also applied for that unit's plural and abbreviations.

There are two pre configured conversion orders for each dimension, one for metric units and one for non metric units. These are table driven and can be easily changed. Metric or non metric is selected using the "standardout" parameter.

A conversion order consists of a "from-unit" or a "dimension", and one or more "to-units".

If more than one to-unit is specified then the converter will try each to-unit in order until it reaches a to-unit for which the next to-unit would make the value less than one, or until it reaches the last to-unit.

For example, a conversion order for the distance dimension consisting of "millimetre->metre->kilometre" would result in a measure of 0.12345 metres being output as "123.45 millimetres" and a measure of 12,345 metres being output as "12.345 kilometres".
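
The selection rule can be sketched in Python as below, assuming the value has already been converted to the core unit and that each to-unit has a known ratio to the core unit. The function name and the ratio table are hypothetical.

```python
def pick_to_unit(value_in_core, order, ratios):
    """Walk a conversion order (smallest to biggest) and pick the to-unit.

    Stops at the first to-unit for which the next to-unit would make the
    value less than one, otherwise uses the last to-unit in the order.
    """
    for current, following in zip(order, order[1:]):
        if abs(value_in_core) / ratios[following] < 1:
            return current, value_in_core / ratios[current]
    last = order[-1]
    return last, value_in_core / ratios[last]


# Hypothetical ratios: core units (metres) per unit.
RATIOS = {"millimetre": 0.001, "metre": 1.0, "kilometre": 1000.0}
ORDER = ["millimetre", "metre", "kilometre"]

print(pick_to_unit(0.12345, ORDER, RATIOS))  # millimetre, about 123.45
print(pick_to_unit(12345, ORDER, RATIOS))    # kilometre, about 12.345
```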

The pre configured conversion orders can be overridden using the "userconvorder" parameter.

Parameters

Parameters may be set either on the command line or in a parameter file.

Parameters and their values are case sensitive.

The parameter file is set using the "paramfile" parameter.

Unrecognised parameters are reported.

Invalid parameter values are not reported. The parameter will instead take its default value.

paramfile

function - sets the name of a file containing parameters.

Parameter files can be nested, ie, a parameter file can include the paramfile parameter. A parameter file is processed when the paramfile parameter is read, ie, before any parameters following the paramfile parameter. There can be zero, one, or more parameter files specified.

format - "paramfile=<value>"

optional

values: no default

<value> - any valid path and file name. There is no soft error detection and the program will abort if there is any issue.

CAUTION - recursion is not checked for and will result in an infinite loop.
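
The processing order of nested parameter files can be illustrated with the following Python sketch. It assumes one "name=value" parameter per line and takes a caller-supplied function to read a file's lines, both of which are assumptions for the example. Like the behaviour described above, it has no recursion check.

```python
def load_params(lines, read_file):
    """Flatten parameters, expanding each paramfile at the point it is read."""
    params = []
    for line in lines:
        name, _, value = line.partition("=")
        if name == "paramfile":
            # Process the nested file before any parameters that follow it.
            params.extend(load_params(read_file(value), read_file))
        else:
            params.append((name, value))
    return params


# Hypothetical in-memory "files" standing in for real parameter files.
FILES = {
    "outer.txt": ["decimalpoints=3", "paramfile=inner.txt", "abbreviation=yes"],
    "inner.txt": ["commaseparate=yes"],
}

print(load_params(["paramfile=outer.txt", "wrap=yes"], FILES.__getitem__))
```

The output lists decimalpoints, commaseparate, abbreviation, and wrap in that order, showing that the inner file is read in the middle of the outer one.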

negativesequence

function - defines the behaviour of parsing and conversion of a measure sequence when the first measure in a measure sequence has a negative value.

format - "negativesequence=<value>"

optional

values:

default - "split"

"split" - if the first measure value is negative then any further negative values in the sequence will cause the parser to split the sequence at that point, and each part will be converted separately. If no further negative values are found then the converter will negate the total value of the measure sequence. The measure sequence "-3 miles 45 yards 2 inches" is treated 'mathematically' as -(3 miles + 45 yards + 2 inches).

"warn" - the parser will treat the sequence as one sequence. If the first measure value is negative then any further negative values in the sequence will cause the converter to subtract them from the total value and output a warning that further negative values were found. The total value is then still negated. The measure sequence "-3 miles -45 yards 2 inches" is treated 'mathematically' as -(3 miles - 45 yards + 2 inches).

"element" - the parser will treat the sequence as one sequence, and the converter will treat each measure element in the sequence according to its own sign, adding it to or subtracting it from the total value of the measure sequence. This allows full unit arithmetic (addition and subtraction) if you wish. The measure sequence "-3 miles -45 yards +2 inches" is treated 'mathematically' as (-3 miles -45 yards +2 inches).

<any other value> - as per default

Measure sequences starting with a positive value are always treated elementally.
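
The three settings can be compared with a small Python sketch. It assumes every element has already been converted to one common unit, so only the signs matter. The function name is hypothetical, and the handling of each split part as its own negative sequence is an interpretation of the description above, not confirmed behaviour.

```python
def sequence_totals(values, mode="split"):
    """Return the total(s) for signed element values in one common unit."""
    if not values:
        return []
    if values[0] >= 0 or mode == "element":
        return [sum(values)]  # each element keeps its own sign
    if mode == "warn":
        # -(|first| + rest): later negatives are subtracted inside the brackets.
        return [-(abs(values[0]) + sum(values[1:]))]
    # "split", and any other value: start a new part at each later negative.
    parts = [[values[0]]]
    for value in values[1:]:
        if value < 0:
            parts.append([value])
        else:
            parts[-1].append(value)
    return [-(abs(part[0]) + sum(part[1:])) for part in parts]


print(sequence_totals([-3, 45, 2], "split"))     # [-50]
print(sequence_totals([-3, -45, 2], "split"))    # [-3, -47]
print(sequence_totals([-3, -45, 2], "warn"))     # [40]
print(sequence_totals([-3, -45, 2], "element"))  # [-46]
```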

standardout

function - defines the default unit system of converted to-units output by the converter.

format - "standardout=<value>"

optional

values:

default - "metric"

"metric" - converted to-units by default will be metric units defined by the pre configured metric conversion order.

"nonmetric" - converted to-units by default will be nonmetric units defined by the pre configured nonmetric conversion order.

<any other value> - as per default

userconvorder

function - defines a convert from-unit to a convert to-unit mapping to be used by the converter.

This will override the relevant pre configured conversion order for the from-unit or dimension, and it ignores any standardout parameter for the from-unit.

format - "<from-unit>-><to-unit>-><bigger to-unit>-> . . <biggest to-unit>"

At least one <to-unit> must be provided

eg: "inch->centimetre->decametre"

eg: "hectare->square furlong->miles square"

format - "*<dimension>-><to-unit>-><bigger to-unit>-> . . <biggest to-unit>"

At least one <to-unit> must be provided

NOT YET IMPLEMENTED

optional

values:

no default

<from-unit> - any valid unit, including plurals and abbreviations

<to-unit> - any valid unit, including plurals and abbreviations

<dimension> - any valid dimension

known deficiencies:

Non standard abbreviations cannot be specified in a user conversion order parameter unless pre configured as a specific abbreviation.

decimalpoints

function - sets the number of decimal places to be used in any output numeric values.

format - "decimalpoints=<value>"

optional

values:

default - 2

<any valid non negative integer> - will be the number of decimal places output in values.

There is no soft error detection and the program will abort if there is any issue.

See also the "enotation" parameter and the "snotation" parameter.
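
The effect of this parameter on plain decimal output, together with the "commaseparate" parameter, can be approximated with Python's format mini-language. This is a sketch only. The function name is hypothetical, and Python's rounding may not match the converter's in every case.

```python
def format_value(value, decimalpoints=2, commaseparate="no"):
    """Format a number with a fixed number of digits after the decimal point."""
    if commaseparate == "yes":
        spec = f",.{decimalpoints}f"  # commas every three digits
    else:
        spec = f".{decimalpoints}f"
    return format(value, spec)


print(format_value(1234567.891, 2, "yes"))  # 1,234,567.89
print(format_value(1234.5))                 # 1234.50
```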

abbreviation

function - sets whether or not converted to-units are output in their abbreviated form.

format - "abbreviation=<value>"

optional

values:

default - "no"

"yes" - a converted to-unit will be output in its abbreviated form.

<any other value> - a converted to-unit will be output in its full singular or plural form.

commaseparate

function - sets whether or not converted values are output with comma separators every three digits to the left of the decimal point.

format - "commaseparate=<value>"

optional

values:

default - "no"

"yes" - a converted value will be output with commas separating every three digits.

<any other value> - a converted value will be output with no comma separators.

allowtraildecpt

function - sets whether the stop at the end of a numeric item is treated as a trailing decimal point or as a full stop.

That is, whether "123." is treated like "123.0" or like "123" followed by a full stop.

format - "allowtraildecpt=<value>"

optional

values:

default - "no"

"yes" - a trailing stop at the end of a numeric value will be treated as a decimal point where that is a valid reading.

<any other value> - a trailing stop at the end of a numeric value will be treated as a full stop.

measurebrackets

function - sets whether or not brackets, "(" . . ")", around a measure sequence are kept.

format - "measurebrackets=<value>"

optional

values:

default - "remove"

"keep" - brackets are kept

<any other value> - brackets are removed from around a parser recognised measure sequence or conjunction sequence.

infile

function - defines the input file of text to be processed.

format - "infile=<value>"

format - "<value>" - if a parameter with no value is provided it is treated as the input file name if it is the first such parameter and infile has not already been set

mandatory

values:

no default

<value> - any valid path and file name.

There is no soft error detection and the program will abort if there is any issue.

existingconv

function - defines whether or not existing conversions are kept or replaced.

format - "existingconv=<value>"

optional

values:

default - "replace"

"keep" - conversions already existing after a measure sequence or conjunction sequence are kept.

"replace" - bracketed conversions already existing after a measure sequence or conjunction sequence are replaced.

<any other value> - already existing conversions are replaced with a newly generated conversion.

quotedtext

function - determines how the converter processes measures found between pairs of double quotes.

format - "quotedtext=<value>"

optional

values:

default - "ignore"

"ignore" - the converter will ignore all text between pairs of double quotes.

"convert" - the converter will process the text between pairs of double quotes as though the double quotes were not in place, and insert or replace conversions.

"popup" - the converter will process the text between pairs of double quotes as though the double quotes were not in place, and insert or replace conversions, but not in line, instead the converter will insert a templated on hover popup with the conversion as the popup.

wnumreplace

function - determines whether or not worded numbers, eg, twenty-one, are replaced with their numeric equivalent.

This parameter is ignored for text between pairs of double quotes whether or not they are part of a measure sequence or a conjunction sequence.

format - "wnumreplace=<value>"

optional

values:

default - "no"

"all" - replace all worded numbers with their numeric equivalent.

"yes" - same as "all".

"measures" - only replace worded numbers which are part of a measure sequence or conjunction sequence.

"no" - do not replace worded numbers with their numeric equivalent.

wnumreplacethresh

function - sets the minimum value for which worded numbers will be converted to numeric equivalents.

format - "wnumreplacethresh=<value>"

optional

values:

default - 100

<any valid numeric value> - worded numbers below this value will not be replaced with their numeric equivalents.

wrap

function - determines whether or not a new line will delineate lexical structures.

Formatting is preserved in both cases.

format - "wrap=<value>"

optional

values:

default - "no"

"yes" - new lines will split up lexical structures.

For example, the parser will treat:

100 miles 25 yards
65 chains 4 links

as two separate measure sequences and the converter will output two measures.

<any other value> - new lines are treated as ordinary white space.

For example, the parser will treat the above the same as:

100 miles 25 yards 65 chains 4 links

ie, one measure sequence and the converter will output one measure.

With "wrap=yes",
100
yards
is not the same as
100 yards.
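
The two behaviours can be sketched in Python as a pre-parse step that decides where lexical structures may break. This is a simplification: the function name is hypothetical, and the sketch collapses white space for clarity, whereas the converter preserves formatting in its output.

```python
def lexical_chunks(text, wrap="no"):
    """Split text into the chunks the parser would scan independently."""
    if wrap == "yes":
        # Each non-empty line is parsed on its own.
        return [" ".join(line.split()) for line in text.splitlines() if line.strip()]
    # New lines are ordinary white space, so everything is one chunk.
    return [" ".join(text.split())]


text = "100 miles 25 yards\n65 chains 4 links"
print(lexical_chunks(text, "yes"))  # two chunks, so two measure sequences
print(lexical_chunks(text, "no"))   # one chunk, so one measure sequence
```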

measurereplace

function - determines the behaviour of the parser and converter in regard to existing measure sequences and conjunction sequences.

Does not affect worded numbers, see "wnumreplace" parameter.

This parameter is ignored for text between pairs of double quotes whether or not they are part of a measure sequence or conjunction sequence.

format - "measurereplace=<value>"

optional

values:

default - "no"

"format" - existing measure values are replaced according to the settings of parameters "commaseparate", "decimalpoints", "enotation" and "snotation", the unit is left as is.

"metric" - replace all existing measures with a metric version.

NOT YET IMPLEMENTED

"nonmetric" - replace all existing measures with a non metric version.

NOT YET IMPLEMENTED

<any other value> - existing measures are left as is.

outfile

function - defines the output file for converted text.

CAUTION - there is no overwrite avoidance.

format - "outfile=<value>"

optional

values:

default - standard output io (the screen)

<value> - any valid path and file name.

There is no soft error detection and the program will abort if there is any issue.

print

function - prints out internal data content.

Print out occurs at the end of processing.

Included are pre configured data, user parameter provided data, and data generated during text processing.

format - "print=<value>"

optional

values:

no default

"units" - will print out all units, their plurals, and their abbreviations, and offsets if any.

"dimensions" - will print out all dimensions, their encoding, and core unit.

"convorders" - will print out all conversion orders.

Settings are cumulative, ie, more than one value can be set.

Output will go to standard output (the screen) or as per the setting of the warnfile parameter.


style

function - determines the behaviour of the converter in regard to reducible conjunctions.

format - "style=<value>"

values:

default - "reduce"

"reduce" - where possible and supported, conjunctions will be reduced in style by interpreting the presumed meaning of the conjunctions.

<any other value> - no reduction will occur.

Currently only the conjunction "by" is supported for reduction: for area, ie, N by N, and for volume, ie, N by N by N, both for a unit of distance.

mode

function - determines the type of output for processed text.

format - "mode=<value>"

values:

default - "convert"

"convert" - output will include conversions of measure sequences and conjunction sequences.

"template" - output will include wiki templates for measure conversions to be processed later.

NOT YET IMPLEMENTED

<any other value> - as per default.

warnfile

function - determines the output file for warnings.

CAUTION - there is no overwrite avoidance.

format - "warnfile=<value>"

optional

values:

default - standard output io (the screen)

<value> - any valid path and file name.

There is no soft error detection and the program will abort if there is any issue.

adjective

function - sets the lexical value which may appear between the value and the unit of a measure.

format - "adjective=<value>"

optional

values:

no default

<value> - any alpha item with no included white space. The parser will allow one such item to occur between the value and the unit. An adjectivised measure will always be put into its own measure sequence.

Settings are cumulative, ie, more than one value can be set.

This only applies to postfix units.

enotation

function - enables engineering notation for output.

For example, 123.45e3, for the value 123,450.

format - "enotation=<value>"

optional

values:

no default

"e" - numbers will be output in engineering notation format using a lower case "e".

"E" - numbers will be output in engineering notation format using an upper case "E".

"yes" - numbers will be output in engineering notation format using a lower case "e".

<any other value> - numbers will be output using standard decimal notation or as per the setting of the "snotation" parameter.

See also, the "decimalpoints" parameter.

snotation

function - enables scientific notation for output.

For example, 1.2345x10<sup>5</sup>, for the value 123,450.

format - "snotation=<value>"

optional

values:

no default

"yes" - numbers will be output in marked up scientific notation format, eg, "6.789x10<sup>3</sup>" for the value 6,789.

<any other value> - numbers will be output using standard decimal notation or as per the setting of the "enotation" parameter.

See also, the "decimalpoints" parameter.
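
The output formats selected by the "enotation" and "snotation" parameters can be sketched in Python as below. The function names are hypothetical, and the decimal module is used here only to find the exponent without floating point error. One known gap in this sketch: a mantissa that rounds up to the next power of ten, eg, 999.999 to two decimal places, is not renormalised.

```python
from decimal import Decimal


def engineering(value, decimalpoints=2, marker="e"):
    """Format with an exponent that is a multiple of three, eg, 123.45e3."""
    number = Decimal(str(value))
    exponent = (number.adjusted() // 3) * 3 if number else 0
    mantissa = number.scaleb(-exponent)
    return f"{mantissa:.{decimalpoints}f}{marker}{exponent}"


def scientific(value, decimalpoints=2):
    """Format as marked up scientific notation, eg, 6.789x10<sup>3</sup>."""
    number = Decimal(str(value))
    exponent = number.adjusted() if number else 0
    mantissa = number.scaleb(-exponent)
    return f"{mantissa:.{decimalpoints}f}x10<sup>{exponent}</sup>"


print(engineering(123450))               # 123.45e3
print(engineering(123450, marker="E"))   # 123.45E3
print(scientific(6789, decimalpoints=3))  # 6.789x10<sup>3</sup>
```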