User:Cscott/Ideas/Improved for-loops for Lua

What for-loops are missing


At the March 2024 MediaWiki Engineering Offsite, User:MatmaRex proposed a lightning talk titled "What the for-each loop is missing". Without spoiling his talk too much: his observations boil down to two features that programmers commonly have to implement by hand on top of standard for-loops:

1. Some way to detect whether you are at the first or last iteration of the loop. For example:

function totuple(list)
    local s = ''
    for _,item in ipairs(list) with first, last do
        if first then
            s = s .. '('
        end
        s = s .. tostring(item)
        if not last then
            s = s .. ", "
        else
            s = s .. ')'
        end
    end
    return s
end

2. Code which executes "only if the loop iterated at least once" or "only if the loop never executed". For example:

function max(list)
    local result = nil
    for _,item in ipairs(list) with first, last do
        if first or item > result then
            result = item
        end
    then
        -- only if loop executed at least once
        return result
    else
        -- if loop never executed (no items in list)
        error("maximum of zero length list")
    end
end
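For comparison, here is a sketch of how the two examples above might be written in standard Lua today. It relies on ipairs supplying an index and on the # length operator, so it assumes list is a proper sequence with no nil holes. A generic iterator provides neither an index nor a length, so in general there is no such shortcut for the last-iteration check.

```lua
-- Standard Lua (no extended syntax). Assumes `list` is a proper sequence
-- (no nil holes), so that ipairs visits exactly #list elements.

-- Feature 1: first/last detection, recomputed by hand from the index.
function totuple(list)
    local n = #list
    local s = ''
    for i, item in ipairs(list) do
        local first, last = (i == 1), (i == n)
        if first then
            s = s .. '('
        end
        s = s .. tostring(item)
        if not last then
            s = s .. ', '
        else
            s = s .. ')'
        end
    end
    return s
end

-- Feature 2: "did the loop run at least once?", tracked with a flag.
function max(list)
    local result = nil
    local ran = false
    for _, item in ipairs(list) do
        if not ran or item > result then
            result = item
        end
        ran = true
    end
    if ran then
        return result
    else
        error("maximum of zero length list")
    end
end
```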

Implementation in Lua


MatmaRex presented his proposal in a language-independent way, and partial implementations exist for various languages. For example, Python 3 has an else clause for for-loops which executes "only if the loop completes normally" (that is, without break). A custom PHP iterator that provides the first-iteration and last-iteration booleans has also been written. Since I had a Lua grammar and interpreter handy, I decided to take a shot at a Lua implementation of the full proposal.

As shown in the examples above, there are two additions to the Lua grammar. First, an optional with <Id>, <Id> clause is added to both the for-in and for-num productions. The first Id names a boolean local variable which is true during the first iteration, and the second Id names a boolean local variable which is true during the final iteration. These variables can be named anything; most of my examples name them first and last for clarity, but with nested for-loops you may well want outer_first and inner_first, or first1 and first2, and so on.
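For numeric loops the "last" flag can be computed arithmetically, because the limit and step are known before the loop starts. The sketch below illustrates this with a hypothetical callback-based helper, fornum_with, which is not part of mlua and not necessarily how mlua implements it. It assumes integer bounds and a nonzero step.

```lua
-- Hypothetical desugaring (not mlua's actual implementation) of
--   for i = start, limit, step with first, last do body(i, first, last) end
-- Numeric loops can compute "last" arithmetically, with no lookahead.
-- Assumes integer bounds and a nonzero step; step defaults to 1.
function fornum_with(start, limit, step, body)
    step = step or 1
    for i = start, limit, step do
        local first = (i == start)
        local next_i = i + step
        local last
        if step > 0 then
            last = next_i > limit
        else
            last = next_i < limit
        end
        body(i, first, last)
    end
end
```

Since nothing extra is requested from an iterator, the numeric case does not reorder any side effects; the for-in case is different.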

For simplicity and clarity I've chosen to make both the "first" and "last" identifiers mandatory; there is no way to ask for only "first" or only "last". This has some runtime implications in the for-in case: with Lua's implementation of iterators/generators, we can't determine whether we are on the last element without actually requesting the next one. Thus, when a with clause is present, a for-in loop always executes "one iteration behind"; that is, it requests element N+1 before executing the loop body with element N. In some corner cases involving user-implemented generator functions (for example, ones with side effects) this behavior is observable. Consider this example, adapted from the Lua manual's description of how ipairs is implemented:

function iter(a, i)
    i = i + 1
    local v = a[i]
    if v then
        print(i)
        return i, v
    end
end

function myipairs(a)
    return iter, a, 0
end

for i,v in myipairs({"one", "two", "three"}) with first, last do
    print(v)
end

Without the with first, last clause, this for-in loop prints:

1
one
2
two
3
three

But when the with first, last clause is added, it prints:

1
2
one
3
two
three
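The reordering follows from the lookahead. The sketch below desugars a for-in loop with a with first, last clause into standard Lua using a hypothetical helper, forin_with, that takes the loop body as a callback; it is not mlua's actual code. To keep it short it tracks only two iteration variables and does not model break.

```lua
-- Hypothetical desugaring (a sketch, not mlua's actual code) of
--   for k, v in f, s, ctrl with first, last do body(k, v, first, last) end
-- Only two iteration variables are tracked, and `break` is not modelled.
function forin_with(body, f, s, ctrl)
    local k, v = f(s, ctrl)              -- request element 1
    if k == nil then
        return                           -- zero iterations: body never runs
    end
    local first = true
    while true do
        local nk, nv = f(s, k)           -- request element N+1 ...
        local last = (nk == nil)
        body(k, v, first, last)          -- ... then run the body for element N
        if last then
            break
        end
        k, v, first = nk, nv, false
    end
end
```

With the iter and myipairs definitions above, forin_with(function(i, v) print(v) end, myipairs({"one", "two", "three"})) prints 1, 2, one, 3, two, three, the same order as the with first, last loop.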

It would be possible to add with first (without the , last) as an alternative production; when only the "first" boolean is needed, the loop would not have to execute one iteration behind. I haven't done that in this implementation.
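To see why a first-only variant needs no lookahead, here is a sketch of how it could be desugared, again using a hypothetical callback-based helper (forin_with_first, not part of mlua). The only addition to an ordinary generic for-loop is a flag that is cleared after the first body execution, so elements are requested in the usual order.

```lua
-- Hypothetical desugaring of a first-only variant,
--   for k, v in f, s, ctrl with first do body(k, v, first) end
-- Knowing "first" needs no lookahead, so elements are requested in the
-- same order as in an ordinary for-in loop.
function forin_with_first(body, f, s, ctrl)
    local first = true
    local k, v = f(s, ctrl)
    while k ~= nil do
        body(k, v, first)
        first = false
        k, v = f(s, k)
    end
end
```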

The second grammar feature adds optional then and else clauses to the for-in and for-num loops. This raises a number of design questions. Which local variables should be visible in the scope of the then and else blocks, and what should their values be? How should the then and else blocks behave when break is used in the loop? (Lua does not have a continue statement.) Finally, our choice of break behavior sometimes makes it desirable to combine the "more than zero iterations" (then) and "zero iterations" (else) cases; how should this be done?

I made the following choices:

1. In then blocks, the iteration variables are visible and are reset to the values they had on the last iteration of the loop. (Any writes to these variables in the do block are discarded.) In else blocks, the iteration variables are not defined; they would not have a useful value in that case anyway. This makes the following example, adapted from Python, work in Lua as well:

for _,item in ipairs(list) do
    print(item)
    item = nil
then
    -- note that 'item' is still bound here, and reset to the final item
    print("Final item is:", item)
end

If a with clause is present, neither of its local variables is defined in the then or else block.

2. When a break statement is executed, both the then and else blocks are skipped. This refines their semantics: they are executed only on normal completion (no break) of the loop, after a nonzero (then) or zero (else) number of iterations. This makes the following example, adapted from Python, work in Lua as well:

-- Primality testing
for n = 2, 10 do
    for x = 2, n-1 do
        if n % x == 0 then
            print(n, "equals", x, "*", n/x)
            break
        end
    then
        -- executes only if loop completed normally (i.e., no break)
        print(n, "is a prime number")
    end
end

3. If a then block is present without an else block, the then block is executed on any normal completion of the loop, even one with zero iterations. This matches the common use case for a lone then block, as in the example above: when n is 2, the inner loop for x = 2, 1 runs zero times, yet 2 must still be reported as prime. You can think of this as duplicating the then block and using it as the else block as well, but note that (unlike in an ordinary else block) the loop variables are in scope in the then block; if the loop did not execute, they are all nil. It could be argued that this case deserves a different grammatical marker, perhaps a single keyword such as thenelse, but we've opted to keep the implementation simple.
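The three rules can be modelled in standard Lua with a hypothetical helper, for_then_else, sketched below. It is not mlua's implementation: it takes the body and the then/else blocks as callbacks, tracks at most two iteration variables, and, because a callback cannot execute break, treats a body that returns true as having broken out of the loop.

```lua
-- Hypothetical helper (not mlua's implementation) modelling the three rules
-- for a generic for-loop with then/else blocks and up to two iteration
-- variables. Because a callback cannot `break`, `body` returns true instead.
function for_then_else(body, then_block, else_block, f, s, ctrl)
    local count = 0
    local last_k, last_v              -- values produced for the final iteration
    local k, v = f(s, ctrl)
    while k ~= nil do
        count = count + 1
        last_k, last_v = k, v         -- rule 1: body gets copies, so its writes
                                      -- cannot change what then_block sees
        if body(k, v) then
            return                    -- rule 2: break skips then and else
        end
        k, v = f(s, k)
    end
    if count > 0 or else_block == nil then
        -- rule 3: with no else block, the then block also runs after zero
        -- iterations, and the iteration variables are nil
        if then_block then
            then_block(last_k, last_v)
        end
    else
        else_block()                  -- zero iterations: no iteration variables
    end
end
```

The iterator triple comes last in the argument list so that a call such as for_then_else(body, then_fn, nil, ipairs(t)) passes all three values returned by ipairs.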


The final grammar, using the LPegRex grammar formalism, looks like:

[==[
ForNum        <== `for` Id `=` @expr @`,` @expr ((`,` @expr) / $false) (ForWith / $false) @ForBody
ForIn         <== `for` @idlist `in` @exprlist (ForWith / $false) @ForBody
ForWith       <== `with` @Id @`,` @Id
ForBody       <== `do` Block (`then` Block / $false) (`else` Block / $false) @`end`
]==]

Using this syntax in Scribunto


The Lua grammar and interpreter are written to be compatible with Scribunto and can be used on-wiki. One caveat is that Scribunto syntax-checks Lua code stored in the Module namespace, so Lua code using with, for-then, or for-else can't be saved in that namespace. However, we can parse and execute modules from other namespaces; my examples use Lua code stored in my user namespace.

To execute code using mlua, which supports this extended for-loop syntax, replace {{#invoke: with {{#invoke:User:Cscott/mlua|invokeUser| in your wikitext. mlua's invoke method defaults to the Module namespace, as Scribunto's #invoke does; because our extended syntax "is not syntactically-correct Lua" and cannot be saved there, we need to use invokeUser, which can execute code from the User (or another) namespace. The arguments after invokeUser are the title of the module and then the name of the function within that module, just as with #invoke.

Live examples using mlua:

  • Source code: /example1
  • Executing ...example1|max|1|2|42: 42
  • Executing ...example1|isprime|<N>:
    • 41 = true
    • 42 = false
  • Executing ...example1|bignum_digits_to_string (see below):
    • 0
    • 1203

This works in French as well (use mlua|invokeFr):

  • Source code: /example1/fr
  • Executing ...example1/fr|max|1|2|42: 42
  • Executing ...example1/fr|estPremier|<N>:
    • 41 = true
    • 42 = false
  • Executing ...example1/fr|chiffresÀChaîner (see below):
    • 0
    • 1203

Additional examples

function bignum_digits_to_string(digit_list)
    local s = ''
    for _,d in ipairs(digit_list) do
        s = s .. tostring(d)
    else
        -- only if there are no digits in the digit list
        s = '0'
    end
    return s
end