Talk:Compiler-compiler/Archive 1

Latest comment: 4 years ago by Steamerandy in topic Terminology Clarity issue
Archive 1

Info for 'citation needed'

"The first compiler-compiler to use that name was written by Tony Brooker in 1960 and was used to create compilers for the Atlas computer at the University of Manchester, including the Atlas Autocode compiler.[citation needed]" - Here's an article on how one version of the Atlas Autocode compiler was written using the Compiler Compiler: https://academic.oup.com/comjnl/article/9/4/350/390181

While I'm here I feel obliged to comment on some of the ramblings below.

A true compiler-compiler is what is nowadays called an 'extensible language'. Brooker and Irons were working in similar areas and interacted at conferences etc., so it's not a coincidence that there are similarities in their work. Brooker coined the term compiler-compiler but Irons preferred the term extensible language. Systems such as Yacc are *not* compiler-compilers - they're parser generators. The Brooker compiler-compiler is effectively a very simple compiler that is modified on the fly to add new syntax to the underlying system and which, having made these modifications, goes on in the same execution to compile a source program in the defined language.

If by 'metacompiler' you mean a compiler engine that has no built-in input language, but will compile a program for any language if given a configuration file that defines that language, then yes, that would be a compiler-compiler. Yacc is *not* a compiler-compiler - or a metacompiler - because it doesn't compile. It just generates a parser.

However this stuff about metacompilers being defined as compilers that compile themselves is utter nonsense. That's just bootstrapping and has nothing to do with compiler-compilers.

The later derivative of Atlas Autocode, "Imp", also had a compiler-compiler facility that was extremely similar in use to the Brooker compiler-compiler. In the description of the Imp extensible language facility, however, it is described as a macro facility, because it was implemented as a separate phase from the compiler itself; the end result is identical to what would have been done with code such as was built into the compiler-compiler. It is documented at https://gtoal.com/history.dcs.ed.ac.uk/archive/languages/imp/ImpMacros.pdf with an example file at https://gtoal.com/history.dcs.ed.ac.uk/archive/languages/imp/macros.imp This style of language extension using syntax-directed macros (i.e. macros which require the same parser as used in the compiler itself) was also implemented in Robert Rae's WonderPop, the Dec10 implementation of Pop2. It too could reasonably be described as a compiler-compiler, as the language extension was built in (and I've implemented a simple Pascal using WonderPop macros, so I know first-hand it was a proper extensible language).

Oh - and a little throwaway detail that I think has been overlooked - the parsing mechanism used by Tony Brooker in the compiler-compiler, and Atlas Autocode, and later in every Imp compiler ... was what is now called a Parsing Expression Grammar (PEG). It wasn't invented by Bryan Ford in 2004, it was invented by Tony Brooker in the late 1950s. You won't find a reference to that online because the term wasn't invented until the 2000s, but if you know enough about parsers to actually read and work on any of the ones I mentioned above, it's obvious that the two technologies are identical.
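
To make the claimed equivalence concrete: in both systems each rule is a recursive procedure that tries its alternatives in order and commits to the first that succeeds, which is exactly PEG ordered choice. A minimal sketch in Python (my own illustration; this is neither Brooker's nor Ford's notation):

# Minimal PEG-style recognizer: each rule is a function taking a
# position and returning the new position on success, or None on failure.
def lit(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def seq(*rules):                    # sequence: all parts must succeed
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None         # whole sequence fails
        return pos
    return rule

def choice(*rules):                 # PEG ordered choice: first match wins
    def rule(text, pos):
        for r in rules:
            p = r(text, pos)
            if p is not None:
                return p            # commit to the first success
        return None
    return rule

# a tiny grammar:  greeting <- "hi" / "hello"
greeting = choice(lit("hi"), lit("hello"))
print(greeting("hello world", 0))   # -> 5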

Terminology Clarity issue

There is a great amount of terminology confusion, mostly because as compiled languages developed, so did engineering and occupational professions. A lot was developed before computer science existed as a separate curriculum. To really understand some of the questions here, like the difference between compiler-compilers, metacompilers, and parser generators, one has to look at computer language development history.

Today I must say that metacompiler, as defined in metacompiler documents, is simply a compiler that takes a metalanguage as input. That is, a metacompiler is a metalanguage compiler. META II, TREEMETA and CWIC's SYNTAX language are metalanguages. Their compilers produced executable code that parses a language and transforms it into abstract syntax trees or stack machine code.

Is yacc a metacompiler? Yes: it takes as input a metalanguage.

META II, TREEMETA, and CWIC are true compiler compilers, as they provide features for transforming the recognized source into an executable program.

There's no requirement that metacompilers produce compilers. An early metacompiler developed from META II at IBM took boolean equations and produced executable code to simulate them.

It is too late to fix this. I hope this helps with the clarity problem discussed below.

I think they should not have been combined. Steamerandy (talk) 20:58, 24 April 2020 (UTC)

Clarity

This article is a perfect example of an explanation written by and for people who already know what the subject is in detail. I am a tech writer specializing in software engineering and even though I have more than just a few notions of what this is about I am struggling to make sense of this. What it is missing is real world anchors - e.g. "A compiler-compiler or compiler generator is a tool" should read "A compiler-compiler or compiler generator is a computer program" - there are many more examples throughout the text that show that the author assumes that not only are all the readers of the article software engineers but that they all work in the same field. Some readers may be well informed onlookers; others may have no knowledge whatsoever of software engineering but have enough intelligence to understand what compiler-compilers do. Those people are not being catered for in this article. Tolken (talk) 05:57, 28 May 2010 (UTC)

Contradictory

Main description:

The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in BNF) of a programming language, and whose generated output is the source code of a parser often used as a component of a compiler.

What was the earliest parser generator? Reference needed!

The earliest Compiler-Compilers I can find referenced are:

"A SYNTAX DIRECTED COMPILER FOR ALGOL 60" Edgar T. Irons, Communications of the ACM Volume 4 Issue 1, Jan. 1961.

"A General Translation Program for Phrase Structure Languages" Brooker and Morris, Journal of the ACM (JACM) JACM Homepage archive Volume 9 Issue 1, Jan. 1962

"A compiler-building system developed by Brooker and Morris" Brooker and Morris, Communications of the ACM Volume 7 Issue 7, July 1964

"A general-purpose table-driven compiler" Stephen Warshall and Robert M. Shapiro , AFIPS '64 (Spring) Proceedings of the April 21-23, 1964, spring joint computer conference Found no early parser generators. Maybe just not used back then. The earliest ACM topic is in 1971

Also: The compiler-compiler, a seminal idea first presented at a British Computer Society Conference in July 1960 by Brooker and Morris.

As far as I know, usable stand-alone parser generators, separate from compiler compiler systems, didn't appear until yacc in the 1970s.

There were notable compiler compilers developed before parser generators existed separately.

This misconception highlights a problem. Computers existed before computer science existed. Computer terminology existed before computer science. Very few colleges in the late sixties offered computer science. Computer technology was studied under engineering. Programming was taught by math and data processing departments. That is when I was in college. Computer science was just beginning at universities. A few, like Stanford and MIT, were pioneers. ACM members were leading the technology progress. Academic computer science has just ignored what came beforehand. Steamerandy (talk) 12:17, 10th and 20th of September 2018 (UTC) Update of earlier post from me --Steamerandy (talk) 20:04, 1 November 2014 (UTC)--

I think it might be useful to merge this article (Compiler-compiler) with the article on Parsers (subsection "parser generator"). (comment by Netizen 2003-10-31)

Compiler-Compiler vs. Metacompiler

Are "metacompiler" and "compiler-compiler" synonyms? I'm not sure, but I believe that a metacompiler is a selfcompiler, i.e. it can compile itself, so it's not equal to a "compiler-compiler". --surueña 10:14, 2005 Jun 18 (UTC)

there is no such term as a "selfcompiler" [1], [2]
All compilers can compile themselves, since their grammar is deterministic/complete. By definition of the term "meta-" (read Gödel, Escher, Bach for a thorough explanation) a meta-compiler is a compiler-compiler and vice versa. yacc is a meta-compiler. A meta-meta-compiler would be a compiler that creates compilers that compile something - dynamic/on-the-fly creation of compiler modules from the BNF/ontological syntax of a language/ontology included within a file, which in turn is used to compile the actual file.
in short: Message, within a message, within the bottle. Project2501a 11:05, 18 Jun 2005 (UTC)
Yes, it seems that "selfcompiler" is the name of a specific compiler, but you know what I mean. But not every compiler can compile itself:
  • Some specialized compilers (e.g. a lot of C compilers) implement only a part of the language, so small that they cannot compile themselves.
  • A great bunch of compilers are implemented in another language. For example, when there is no compiler for a given language (every language has a first compiler, and of course it cannot be implemented in the same language), and for those languages which have bad performance (the compiler would be very slow, so it is written in another language)
In short, a compiler that can compile itself is a property worth noting, and I knew that "type" of compiler as a metacompiler. But reading some pages it seems that somebody has another completely different meaning for metacompiling [3]. Do you know any source with a good definition? In my opinion, we shouldn't say that metacompiler is a synonym of compiler-compiler until we find more info about it. --surueña 12:34, 2005 Jun 18 (UTC)
hokay, are we talking computer science here or software engineering? "some specialized compilers" are the exception to the rule, afaik. that's software engineering: a specific implementation of a compiler/set of bnf instructions. if we are talking computer science, then this is a non-issue: all compilers can compile themselves.
when there is no compiler for a given language that's what a compiler-compiler/metacompiler is. it's a compiler that will give you your first compiler. it's like a candy-dispensing machine, only you insert BNF for coins.
the page you mention still talks about metacompiling. using Forth to build a meta-compiler, basically to use forth to generate applications in meta mode. Good source with good definition? Knuth! and Hofstadter! and yes, a metacompiler is a synonym of a compiler-compiler ever since 1969, when yacc came around.
what's your major, anyway? Project2501a 12:58, 18 Jun 2005 (UTC)
I don't understand your point. Actually I don't know whether the percentage of compilers that can build themselves is high or low, but it is a fact that there are compilers that cannot compile themselves. I don't understand the distinction between CS and SE, but the fact is not every compiler can compile itself. Anyway, if you are sure that "metacompiler" is a synonym of "compiler-compiler" we should put it in the article, with those references. Hope this helps --surueña 08:28, 2005 Jun 20 (UTC)
it's not a matter of percentages, it's a matter of compiler theory. Compiler theory is part of computer science. if there are some compilers out there in the field that do not compile themselves due to management/engineering decisions, it's a matter of applied computer science, aka software engineering. Matter of fact is, the set of BNF instructions that makes up any given computer language can always compile itself. Try it with pen and paper. Computer science theory is about the data, as the computer science article says, not about the computer. something like organising information theory into patterns. Project2501a 12:18, 20 Jun 2005 (UTC)
OK, I understand your point now, thanks for the explanation. But although in theory every compiler can compile itself, in practice this isn't true, and no Computer Science book can change that... :-) I'll try to find the name given to that type of compiler... --surueña 13:21, 2005 Jun 20 (UTC)
well, yeah, see, an article on compilers of compilers/metacompilers/however you want to call it falls under computer science, not software engineering, so, in this case, the practice is irrelevant, a note in the article, at best. Project2501a

Actually, hmmm...

Request to move to metacompiler

{{move|metacompiler}}


Zanaq 5 July 2005 21:19 (UTC) Removed the request to move because:
  • The consensus seems to be that a metacompiler and a compiler-compiler might not be the same.
  • A Google search reveals "compiler-compiler" is the more common term: "Compiler Compiler" gets 114,000 hits, metacompiler 763.
  • I personally have heard compiler-compiler occasionally, but not metacompiler.


Consensus has nothing to do with this, it's a matter of definition, look up your compiler theory book. So, metacompiler shows up fewer times than compiler-compiler. I will admit that the request to move to metacompiler was a bad idea. compiler-compiler is more popular. but still, char *compiler_compiler; char *metacompiler; compiler_compiler = metacompiler; Project2501a 5 July 2005 22:16 (UTC)

This has been hashed out a lot here. However metacompiler and compiler compiler are not quite the same. Not all compilers are able to compile themselves.
In all metacompiler documentation I have found the description: "A metacompiler is a computer program that takes as input source code written in a metalanguage." Except the FORTH fringe, who have the idea it means a self-compiler. However, using the metalanguage-compiler definition, FORTH would be a metacompiler. FORTH is simply a language of defined words. Programming in FORTH is defining new words, adding to or extending the language. So FORTH is a programming metalanguage.
The first metacompiler could not compile itself. It took as input a metalanguage specifying language constructs and produced a program that generated random strings conforming to the metalanguage specification.
All documented metacompilers can take as input a program written in a metalanguage and produce a running program. I think this is the difference that separates them from compiler compilers. yacc screwed up the terminology by calling itself a compiler compiler. It's a matter of computer science history. Some new author using terminology differently shouldn't make the terms concrete. These terms should first be explained as found in the majority of cases. I would think that the ACM would be one of the best sources for reference.
Originally metacompiler and compiler compiler were the same. When yacc came out that changed. It would have been better to call yacc a metacompiler, keeping compiler compiler to mean exactly what it states. TREE-META is more powerful than yacc. In 1968 it produced compilers that output assembly code. In 1970 CWIC produced optimizing compilers that output IBM 360 binary machine code. SLIC in 1974 was producing compilers that could target different processors. Compilers it produced ran on a DEC-SYSTEM-10 and generated DEC-10 and TI990 machine code.
What I have found is that all programs documented as a metacompiler take as input a program whose source code is written in a metalanguage. The metalanguage program is compiled into executable code. If the metalanguage program is a complete specification of a programming language and its translation to machine code, the result is a compiler for the specified language.
Compiler compiler has degraded to include parser generators and other such things. Of course a metacompiler could produce a parser module.
Not all compilers can compile themselves. A logic equation compiler that produces an electronic circuit, for instance.
Steamerandy (talk) 12:37, 13 November 2015 (UTC)

I do not know if this topic is still active.

I do not know where the idea that any compiler can compile itself comes from. But it just isn't true. Some compilers, for example, produce instructions to be run on specialized machines. Those machines, not being general-purpose computers, have no way to run a compiler. I have used a graphical language that emulates an analog computer. It cannot be written in its own language. A compiler that allows you to define computer hardware and outputs wiring instructions and board placement has no chance of compiling itself.

A META II spinoff was used to create a compiler that took as input boolean equations and produced code to emulate them, for testing electronic circuit board designs that were output from the boolean equations. Steamerandy (talk) 06:45, 9 July 2019 (UTC)

METACOMPILER example

The history section looks a bit confusing, is there any example of a real compiler-compiler?

I am posting this for use here. I developed SLIC, which was based on work done by Val Schorre.

TREE-META and META II are compiler-compilers / translator systems that document themselves as META "language" compilers. That is, they were not called metacompilers by their creators. I know from attending ACM L.A. SIGPLAN meetings that they were being referred to as metacompilers.

I feel that the term metacompiler came out of its common use when talking about the compilers of these early META "languages".

We had several META "languages": META I, META II, META III, META 5, etc. Of course the languages had compilers. We had a compiler for the META II language, a META III compiler, etc. So I believe that metacompiler is a portmanteau of META "language x" compiler. So metacompiler is a generic term for a compiler whose input is a programming metalanguage. Since metalanguages are languages for talking about a language or languages, a metacompiler is a compiler that takes as input a programming metalanguage. There is an early metalanguage compiler that was based on Schorre's META II. Only it took as input boolean formulas and compiled them into boolean logic simulations used in circuitry design. So no, a metacompiler does not necessarily have to compile itself. But they do generally take as input a programming language designed for describing the analysis and translation of programming languages.

I have a real problem with the claim that it is computer science that dictates the meaning of terminology. It's kind of the other way around. Computer science tries to put meaningful labels on specific topics.

Such an idea that computer science dictates anything runs afoul of the concept of science. Science is not like religious dogma. Even if old technology is obsolete, its terminology must be understood as to what it meant at the time. Terminology changes can very much affect the understanding of subjects. The terms compiler-compiler and metacompiler have long been misused. There is no possibility of straightening this mess out without a general consensus. The oldest definition of metacompiler makes the most sense: a compiler that takes a metalanguage as input. The term compiler compiler meant a compiler that is specifically used to write compilers (not parser generators). Then yacc, a parser generator, called itself a compiler compiler. So the main reason for the confusion can be directly blamed on the yacc developers calling yacc a compiler compiler when it is less than half a compiler. It was a bullshit move by the authors of yacc. Where is the code generation? There isn't any. It is a bald-faced lie that yacc is a compiler compiler as the term was being used at that time. But now we are stuck with it.

CWIC (Compiler for Writing and Implementing Compilers) is a compiler compiler developed by Val Schorre and Erwin Book at System Development Corporation; it also documents itself as a META "language" compiler.

CWIC produced binary IBM 360 executable code. It was a further development by Val Schorre, the developer of META II.

The syntax language was similar to TREE-META's, producing a tree structure, but very different in that generator procedures were called to process the trees. The generator language was far more powerful, having several unique features.

A simple arithmetic expression example in the syntax language.

EXPR = TERM $(('+':ADD | '-':SUB) TERM !2);
TERM = FACTOR $(('*':MPY  | '/':DIV) FACTOR !2);
FACTOR = ('(' EXPR ')' | NUMBER | VAR )
         ('^' FACTOR:POW!2|.EMPTY);

The syntax language is composed of rules that specify the valid syntax of input. CWIC maintains two stacks: one containing nodes and the other the parse tree. The ":" operator pushes a node onto the node stack, and the "!" operator combines the top node stack entry with the given number of entries from the parse stack into a list, pushing the resulting tree onto the parse stack.

The EXPR rule defines an expression to be a TERM followed by a series of zero or more TERMs separated by + or - operators. The "$" operator specifies that zero or more of the following may be present.

The expression "a + 2*(b+c) + D" would be turned into a tree representation as

ADD[ADD[a,MPY[2,ADD[b,c]]],D] or in list form:
[ADD,[ADD,a,[MPY,2,[ADD,b,c]]],D]
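
For what it's worth, the two-stack mechanics can be sketched in a few lines of Python (my own illustration, not CWIC code); hand-simulating the parse of a + 2*(b+c) + D reproduces the tree above:

# Illustrative model of CWIC's two stacks. ':' pushes onto the node
# stack; '!n' pops one node plus n parse stack entries and pushes the
# resulting tree, in list form, back onto the parse stack.
node_stack, parse_stack = [], []

def node(name):                          # the ':' operator
    node_stack.append(name)

def tree(n):                             # the '!n' operator
    branches = parse_stack[-n:]
    del parse_stack[-n:]
    parse_stack.append([node_stack.pop()] + branches)

# hand-simulated parse of  a + 2*(b+c) + D
parse_stack.append('a')                               # term: a
node('ADD')                                           # '+' seen
parse_stack.append('2'); node('MPY')                  # factor 2, '*' seen
parse_stack.append('b'); node('ADD')                  # '(' b '+'
parse_stack.append('c'); tree(2)                      # c  -> ADD[b,c]
tree(2)                                               # -> MPY[2,ADD[b,c]]
tree(2)                                               # -> ADD[a,MPY[...]]
node('ADD'); parse_stack.append('D'); tree(2)         # '+' D
print(parse_stack[0])
# -> ['ADD', ['ADD', 'a', ['MPY', '2', ['ADD', 'b', 'c']]], 'D']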

CWIC provides three types of rules: syntax, token, and character class.

A digit character class rule for example:

DIGIT: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';

A simple integer number token rule:

NUMBER .. DIGIT $DIGIT MAKENUM[];

MAKENUM[] is a call to a library generator function that takes the parsed token in the token buffer, converts it to a numeric value, and puts it on the parse stack. In the absence of a generator function such as MAKENUM, the token buffer would simply be entered into the symbol table and placed on the parse stack.
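
Purely as an illustration (Python, my notation, not CWIC's), the NUMBER rule amounts to something like:

# Rough equivalent of  NUMBER .. DIGIT $DIGIT MAKENUM[];
# Returns (value, new_position) on success, (None, position) on failure.
def number(text, pos):
    start = pos
    while pos < len(text) and text[pos].isdigit():   # DIGIT $DIGIT
        pos += 1
    if pos == start:
        return None, pos                   # no digit: the rule fails
    return int(text[start:pos]), pos       # MAKENUM[]: convert token buffer

print(number("42+x", 0))                   # -> (42, 2)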

The GENERATOR language was an algorithmic procedural language that emitted 360 machine code.

The CWIC generator language provided pattern matched arguments. For example:

gen(ADD[gen(x),gen(y)]) => {...}
   (SUB[gen(x),gen(y)]) => {...}
    ...

   integer(x) => return x;
   variable(x) => return x;

A generator consisted of its name (gen above) and pattern/production pairs.

 (<pattern>) => <production>

The pattern could be simply a variable list or, as in the above, a specific pattern match. The pattern validated the input and could break it apart. The unique feature of a function call in a pattern is that the function is called on the argument at whose position it appears, and its parameters, if any, are bound to the values it returns.
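
In modern terms the dispatch is close to structural pattern matching. A loose Python analogy (mine; nothing here is CWIC's real syntax or API, and it needs Python 3.10+ for match):

# Each case plays the role of one (pattern) => production pair; the
# recursive gen(...) calls inside a pattern correspond to recursing on
# that branch of the tree and binding the result.
def gen(t):
    match t:
        case ['ADD', x, y]:
            return f"{gen(x)} {gen(y)} ADD"
        case ['SUB', x, y]:
            return f"{gen(x)} {gen(y)} SUB"
        case int(n):                        # integer(x) => return x;
            return str(n)
        case str(v):                        # variable(x) => return x;
            return v

print(gen(['ADD', 'a', ['SUB', 2, 'b']]))   # -> a 2 b SUB ADD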


In 1969 I started implementation of a metacompiler system called SLIC, developed at Cerritos College in Norwalk, Calif., on a DEC-SYSTEM-10.

It consisted of several special purpose languages. The syntax language, much like CWIC's and TREEMETA's, was a rule-based language producing tree or list structures. Though I now use "formula" instead of "rule", hopefully avoiding confusion with production grammar rewrite rules. Formulas are in part reductive grammar rules combined with directed transform operators, which in early parser languages produced stack machine code. In later versions they produce abstract syntax trees that are passed to generator functions programmed in a LISP 2 dialect. CWIC's LISP 2 based generator language produces 8-bit byte code instructions targeting IBM 360 computers. SLIC's GENERATOR language instead generates PSEUDO instructions, which are programmed procedures that emit machine code. MACHINE instructions are defined in the MACHOP language: a MACHOP defines an instruction's assembly, from its mnemonic to binary code.

The generator language emitted PSEUDO code instructions to named section lists. A section was simply a data structure that defined the section parameters and a list containing the emitted pseudo instructions. A call to flush a section would sequentially process the section's pseudo code list, calling the pseudo code procedures. The execution of the pseudo code procedures output relocatable machine code.

The MACHOP defined a target machine instruction or instruction class. A MACHOP could define a single instruction using its mnemonic name, or a set of instructions having the same binary format using a vectored entry. In a vectored entry the MACHOP name was instead a variable preceded by a "#" character, and that variable was set to the opcode value. A list of opcode mnemonic/value pairs followed the instruction definition.

Machine code was defined in a bit addressable memory space.

An example of a vectored entry MACHOP:

.MACHOP	#OP;
 .MORG 8: H(16): $/8;
  H(8): OP;

  #AAA     0b00110111;
  #AAS     0b00111111;
  #CBW     0b10011000;
  #CLC     0b11111000;
  #CLD     0b11111100;
  #CLI     0b11111010;
  #CMC     0b11110101;
  #CWD     0b10011001;
  #DAA     0b00100111;
  #DAS     0b00101111;
  #HLT     0b11110100;
  #INT3    0b11001100;
  #INTO    0b11001110;
  #IRET    0b11001111;
  #LAHF    0b10011111;
  #LEAVE   0b11001001;
  #NOP     0b10010000;
  #POPA    0b01100001;
  #POPF    0b10011101;
  #PUSHA   0b01100000;
  #PUSHF   0b10011100;
  #SAHF    0b10011110;
  #STC     0b11111001;
  #STD     0b11111101;
  #STI     0b11111011;
  #XLAT    0b11010111;

The above is a very simple instruction consisting of only an 8-bit opcode. #OP specifies a vectored entry point. OP is a variable that contains the opcode binary value. The line:

  #AAA     0b00110111;

is an AAA opcode entry. OP would contain the binary value 0b00110111.

The .MORG operation is a modular org. The first operand, 8, specifies aligning the plant location to an 8-bit boundary.

 .MORG 8: H(16): $/8;

H(16) specifies listing output of a 16-bit field, $/8, in hex format. $ is the plant location in the bit-addressable memory space.

 H(8): OP;

The above outputs OP to the output code file. H(8) specifies listing output, in hex, of the 8-bit opcode in OP. Assembly listing output is optional.
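
To make the bit-addressable planting idea concrete, here is a toy emitter in Python (my reconstruction of the idea, not SLIC code):

# $ is a bit position in the emitted stream; .MORG n aligns it, and a
# field of width w plants the low w bits of a value, high bit first.
bits = []                                     # the emitted bit stream

def morg(n):                                  # .MORG n : align plant location
    while len(bits) % n:
        bits.append(0)

def plant(value, width):                      # an H(width): value  field
    for i in reversed(range(width)):
        bits.append((value >> i) & 1)

morg(8)
plant(0b00110111, 8)                          # the AAA opcode from above
print(hex(int("".join(map(str, bits)), 2)))   # -> 0x37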


Another MACHOP example, of a more complex instruction (judging by the opcodes, an 8086 MOV):

 .MORG 8: H(16): $/8;
  if isnumber(source) then {
    if isreg(dest) then {
      +H(4):0b1011;
        (1): if size:(dest) = 8 then 0 else 1;
        (3): dest;}
    else {
      if !frame:(dest) and segment:(dest) != curSegment then
        +H(8): overide(dest)

        +H(7):0b1100011;
          (1): if size:(dest) = 8 then 0 else 1;
        +H(2): mod(dest);
          (3): 0;
          (3): R_M(dest);
        +H(szoff): offset:(dest/8);}
    +H(size:(dest)):  source}


  else if isreg(source) && isreg(dest) then {
    +H(7):0b1000100;
      (1): if size:(dest) = 8 then 0 else 1;
    +H(2):0b11;
      (3):source;
      (3):dest;}
  else if isreg(source) then {
    +H(7):0b1000100;
      (1): if size:(dest) = 8 then 0 else 1;
    +H(2): mod(dest);
      (3):source;
      (3):R_M(dest);}
  else if isreg(dest) then {
    +H(7):0b1000101;
      (1): if size:(source) = 8 then 0 else 1;
    +H(2): mod(source);
      (3):source;
      (3):R_M(source);}
  else if issegreg(dest) then {
    +H(8):0b10001110;
      (2): mod(source);
      (3): issegreg(dest);
      (3): R_M(source);
    if !isreg(source);
      +H(szoff): offset:(source/8);}
  else {
    +H(8):0b10001100;
      (2): mod(dest);
      (3): issegreg(source);
      (3): R_M(dest);
    if !isreg(source);
      +H(szoff): offset:(dest/8);}

Many optimizations can be done in the generator language. To provide optimization on the sequential instructions, an ISO language was planned but so far not implemented. The ISO language was planned to operate on the pseudo instruction lists.

The main thing SLIC did was to separate the target machine code generation from the semantics processing. The pseudo instructions were designed with the language being compiled in mind. A pseudo instruction was much like an assembly macro, e.g. the pseudo code for a function call:

; Output a procedure call instruction to the object file
PSEUDO call proc {

; if a 'far proc' use a far call
; if a 'proc' use a near call
; if a memory referance use memory size to deduce far or near
;                       call needed
; Some of this could be done in the call and farcall MACHOP's

   if TypeAttr:(proc) == farproc
     if segment:(proc) == CurrentSegment {
        ^pushreg cs			; Simulate far call
        ^call    proc}
     else
        ^farcall proc;
   else if TypeAttr:(proc) == proc
        ^call    proc;
   else if SizeAttr(proc) == 2
        ^call    proc;
   else
        ^farcall proc}

The basic concept was to decouple code production from the target machine architecture, making it a simple matter to target different machines by simply redefining the PSEUDO code in terms of the target machine's instructions.
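
A skeletal model of the section/flush mechanism described above, in Python (my own reconstruction; the names are hypothetical, not SLIC's):

# Pseudo instructions are appended to named section lists as deferred
# calls; flushing a section runs them in order, and each one emits
# target code through the MACHOP layer (faked here with print).
sections = {"code": []}

def emit(section, op, *args):               # plant a pseudo instruction
    sections[section].append((op, args))

def flush(section):                         # run the section's pseudo list
    for op, args in sections[section]:
        op(*args)
    sections[section].clear()

def call(proc):                             # a tiny 'call' pseudo op; a real
    print("CALL", proc)                     # one would pick near/far MACHOPs

emit("code", call, "main")                  # "main" is a made-up example
flush("code")                               # -> CALL main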

In 1972 SLIC was used to program a COBOL compiler running on a DEC-SYSTEM-10, generating code for a TI990 minicomputer.

Development time for the COBOL compiler was under 4 months for myself and one other programmer.

Edit of previous post from 2014. Steamerandy (talk) 13:20, 10 September 2018 (UTC) Steamerandy (talk) 21:40, 7 August 2019 (UTC)

CWIC

I added a section about CWIC to the "metacompiler" article. (We might want to consider merging "Metacompiler" and "Compiler-compiler" -- does anybody know how to propose that?)

The problem I'm seeing is that CWIC is probably too small in the great scheme of things to deserve its own article, but adding it as a section to "metacompiler" means it takes up too much of the article, compared with other material. I'm not sure what the "right" thing to do is.

I came into the CWIC project (we called it "The CWIC Mafia" because Erwin Book always wore a suit and looked a little bit like a mafioso) shortly after the initial development, and I basically became the maintenance engineer while Erwin, Val, and Steve worked on direct and indirect applications of the language. The Generators Language compiler was a masterpiece of passing the buck, aka "recursion".

One thing that has bothered me is that the work done on code generation in CWIC seems to have been lost to posterity. YACC, Bison, and friends concentrate on the easy problem: parsing, but offer no help with code generation. CWIC had many features that aided code generation -- the pattern matching, the <...> operator that emitted code, and operators that automatically handled repetitive sections of the tree.

If you used a pattern like (...$x...)=>, then "$x" was a list of things, and in the right hand side of the production you could write something like:

  .do (... x ...)

This meant to repeatedly execute the code in parens, assigning successive elements of the list $x to "x" in each iteration. Or you could write $(...x...), which would generate a new list, with whatever transformation you specified on each element of $x. Similarly,

  $&(...x...) was true only if the parenthesized expression was true for every element of $x
  $|(...x...) was true if the expression was true for any element
  $+(...x...) was the sum, and $*(...x...) was the product.
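
In modern terms these list operators map directly onto comprehension and reduction forms; roughly (Python, my paraphrase, not CWIC syntax):

import math

xs = [1, 2, 3]                      # a matched $x list
print([x * x for x in xs])          # $(...x...) : per-element transform
print(all(x > 0 for x in xs))       # $&(...x...): true for every element
print(any(x > 2 for x in xs))       # $|(...x...): true for any element
print(sum(xs))                      # $+(...x...): the sum
print(math.prod(xs))                # $*(...x...): the product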

So CWIC was a significant advance on parser-only metalanguages. What it lacked (and AFAIK no metacompiler has tackled since) was assistance in maintaining the symbol dictionary. Of course, every "atom" that got scanned turned into a symbol that you could assign Lisp-style attributes to, but by now we should have meta-compilers that assist the compiler writer in at least the following two problems:

  1. The syntax says there's supposed to be an ID here. There should be operators to check that (a) the ID has been previously defined (or not defined), and (b) the ID represents a specific type of name (e.g., a variable name, a function name, etc.)
  2. Block-structured languages: you should be able to say "start a new block", which allows re-defining already defined variable names (and even function names); see the scope-stack sketch below.
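
A minimal sketch of the block-structure half of this, in Python (hypothetical names, not any metacompiler's actual API):

# A scoped symbol table as a stack of dictionaries. Entering a block
# pushes a scope; lookup searches from the innermost scope outward.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                    # the global scope

    def push(self):                           # "start a new block"
        self.scopes.append({})

    def pop(self):                            # leave the block
        return self.scopes.pop()

    def define(self, name, attrs):            # may shadow an outer name
        self.scopes[-1][name] = attrs

    def lookup(self, name):                   # None = "not previously defined"
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

syms = SymbolTable()
syms.define("x", {"kind": "variable"})
syms.push(); syms.define("x", {"kind": "function"})
print(syms.lookup("x")["kind"])               # -> function (inner block wins)
syms.pop(); print(syms.lookup("x")["kind"])   # -> variable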

Bgoldnyxnet (talk) 18:27, 5 June 2013 (UTC)

This article should probably be moved to Parser generator, as that seems to be the main focus. Compiler-compiler can then be made into a disambiguation page to both Parser generator and Metacompiler.
You might also want to have a look at attribute grammars, they add a semantic layer to parser generators to e.g. maintain symbol tables. —Ruud 20:02, 5 June 2013 (UTC)
Thank you, Ruud, I'll look at attribute grammar. Bgoldnyxnet (talk) 04:55, 7 November 2014 (UTC)
I thought CWIC had a symbol table stack. SLIC has a symbol table stack. Built-in generators are used to push and pop the symbol table stack. GETSYMTBL returns the top stack entry. It can be assigned to a symbol attribute, for example. Scoped local variables are handled that way. The debugger needed scope info. I am fairly sure CWIC had those functions. Block-structured languages were handled by CWIC according to Erwin Book. POPSYMTBL also returned the popped symbol table object.
Bgoldnyxnet, you may be a bit rusty on CWIC symbol tables. Remember MAKEINT[], MAKESTR[], MAKEOCT[], etc.? Those interceding functions stopped the default symbol table entry.
Early Schorre metacompilers all produced code. META II generated stack machine code. TREE-META added unparse rules and the tree-making constructs : and [ ]. : followed by a node name is exactly the same as in CWIC. The [ ] enclosed a number and created a tree with the specified number of branches, just like the ! operator in CWIC. TREE-META did not have the power of CWIC, but was an intermediate step, creating a tree and using unparse rules, precursors to CWIC's generators. One could describe CWIC's generators as named groups of unparse rules having LISP 2 actions. I do not find Schorre mentioned in the TREE-META documents.
I would say that CWIC should have its own article. Compared to TREE-META or META II, there is a lot more to CWIC. CWIC had features that the earlier ones did not have: 1) symbol table; 2) linkable binary output; 3) token-making rules instead of the built-in functions .ID, .NUM, .STR; 4) programmed backtracking, the backtrack alternative operator; 5) the much more powerful LISP 2 generator language action code, and more powerful unparse rules calling generators and returning results to named local variables; etc. Steamerandy (talk) 09:17, 10 November 2015 (UTC)

Merge

As was stated in #CWIC, we don't need two (or more) pages on the same topic:

I would pick the name of the first public occurrence. Thus, not "Language workbench". Ushkin N (talk) 05:22, 23 May 2016 (UTC)

Support merge as the topics seem to be identical. Given the early 1960s META II, Metacompiler seems to be the better name. I note that the lede for that page includes the statement that a metacompiler is "a subset of a specialized class of compiler writing tools called compiler-compilers."; however, I can't find examples of compiler-compilers that would not be described as metacompilers. The contents of Compiler-compiler seem easier to digest, so material from that page should be used over the more arcane text at metacompiler. 13:35, 27 February 2018 (UTC) by Klbrain
When working on implementing this, I have changed the merge direction, given the 'subset' argument above, as I'm no longer confident in my own interpretation; I'll assume that the text in the lede is correct. {{merge done}} Klbrain (talk) 07:57, 7 April 2018 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Compiler-compiler. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 21:11, 11 January 2018 (UTC)

Metacompiler

I wrote SLIC (System of Languages for Implementing Compilers) based on CWIC (Compiler for Writing and Implementing Compilers). I learned of CWIC at a SIGPLAN meeting in the late 1960s. I attended most of the L.A. SIGPLAN meetings between 1967 and 1973.

I think that metacompiler is a portmanteau of meta-language compiler. It originated out of those SIGPLAN meetings. There were discussions of the META X languages: META II, META III, META 5, etc. So in talking about the META II compiler or the META X compiler, it became metacompiler. A metacompiler is a compiler that takes as input a metalanguage. METACOMPILER, I think, became commonly used around 1968, around the time of TREEMETA's release. The CWIC ACM paper I think best explains metacompiler: a compiler that takes as input a metalanguage. That is really the best definition of metacompiler.

A compiler compiler was originally a compiler that actually produced a compiler. Then yacc screwed it up. Up until the time of yacc, compiler compilers were complete compiler-producing systems.

If these terms were not so mixed up by yacc calling itself a compiler compiler, we would not have this terminology problem today.

There are metacompilers that do not produce anything you would associate with a compiler. One early metacompiler based on META II took as input logic formulas and produced a program that emulated them.

The original 1968 TREEMETA paper contains a historical overview of earlier metacompilers.

Metacompilers are in use today taking as input logic equations and producing IC manufacturing instructions.

Those IC metacompilers came from that early metacompiler based on META II. It was described in the TREEMETA 1968 document's metacompiler history section.

Merging compiler compiler and metacompiler I think was a mistake. Steamerandy (talk) 17:26, 19 May 2018 (UTC)

A big difference between metacompilers and parser generators is that metacompilers that describe themselves as such all take as input reductive analytical grammar formulas.

That is, they are top-down parsers starting with a top driving formula:

program = $((declaration
            | .EOF .STOP)
           \ (ERRORX["?Error?"]
              $(-';' (.ANY | .STOP)) ';'));

That formula calls other formulas, which in turn may call other formulas. The formulas perform a top-down (reductive) analysis of the input source. That is, we are programming a top-down, reductive analysis. META II described itself as compiling syntax equations into recursive subroutines. That was then, when most programs were coded in assembly. Today we would just say they are recursive functions. Take for example an arithmetic expression parser programmed in META II:

expr = term $('+' term .OUT(' ADD')/'-' term .OUT(' SUB'));
term = factor $('*' factor .OUT(' MPY')/'/' factor .OUT(' DIV'));
factor = .ID .OUT(' LD  ' *1)/.NUMBER .OUT(' LDL ' *1)/'(' expr ')';

META II produced stack machine code. Looking at these as functions: expr first calls term, term calls factor, and factor calls .ID. .ID and .NUMBER are built-in token recognizers. The expression 2*(a-5) would generate:

LDL 2
LD  a
LDL 5
SUB
MPY

In factor we call .ID, which fails, so the next alternative .NUMBER succeeds and outputs the LDL 2. There is a parse stack in META II: .ID and .NUMBER on success push the recognized string onto the *stack (star stack). The *1 in the .OUT calls pops the top *stack string and outputs it.
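
Read as ordinary recursive functions, the three equations come out something like the following Python (my transliteration, for illustration only; the real META II compiled to its own virtual machine):

# Transliteration of the three META II equations into recursive
# functions over a token list, emitting stack machine code to 'out'.
import re

def parse(src):
    toks = re.findall(r"[A-Za-z]\w*|\d+|.", src.replace(" ", ""))
    out = []

    def expr(pos):
        pos = term(pos)
        while pos < len(toks) and toks[pos] in "+-":
            op, pos = toks[pos], term(pos + 1)
            out.append("ADD" if op == "+" else "SUB")
        return pos

    def term(pos):
        pos = factor(pos)
        while pos < len(toks) and toks[pos] in "*/":
            op, pos = toks[pos], factor(pos + 1)
            out.append("MPY" if op == "*" else "DIV")
        return pos

    def factor(pos):
        if toks[pos].isidentifier():              # .ID
            out.append("LD  " + toks[pos]); return pos + 1
        if toks[pos].isdigit():                   # .NUMBER
            out.append("LDL " + toks[pos]); return pos + 1
        return expr(pos + 1) + 1                  # '(' expr ')'

    expr(0)
    return out

print(parse("2*(a-5)"))  # -> ['LDL 2', 'LD  a', 'LDL 5', 'SUB', 'MPY']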

In CWIC we have a similar modus operandi. We have the *stack, or parse stack. A node stack is added to the programming model. The :<node id> operation creates a node object and pushes it onto the node stack. !<number> pops the top node stack item, combines it with the top <number> of *stack items to create a list representation of a tree, and pushes that onto the *stack. An arithmetic expression programmed in the CWIC SYNTAX language:

 expr = term $(('+':ADD|'-':SUB) term!2);
 term = factor $(('*':MPY|'/':DIV) factor!2);
 factor = (id|number|'('expr')')('^' factor:EXP!2|.EMPTY);

Parsing the expression 2*a^(x+5)+8 with the above:

 expr calls term
  term calls factor
   factor calls id
    id {fails}
   factor calls number
    number succeeds 2 is pushed onto the *stack
   factor: '^' fails, alternative .EMPTY succeeds
  term matches '*', pushes MPY on node stack
   factor calls id
    id matches a
   factor matches '^' and calls factor
    factor matches '(' calls expr
     expr calls term
      term calls factor
       factor call id
        id succeeds matching x pushed on *stack
       factor fails to match '^' .EMPTY succeeds
      term fails to recognize '*' or '/' success
     expr recognizing '+' calls term
      term calls factor
       factor matches 5 '^' fails .EMPTY succeeds
      term fails to match a '+' or '-' terminates loop returns success
     returning to expr !2 creates the tree ADD[x,5] continuing the 
     $ loop fails to match '+' or '-' and expr returns success
    factor matches ')' fails '^' returns success
   factor !2 creates an EXP[a,ADD[x,5]] tree
  term !2 creates MPY[2,EXP[a,ADD[x,5]]] tree
 expr matches '+' pushes ADD node calls term
  term calls factor
    factor recognizes the number, fails the '^' test
  term fails to find a '*' OR '/' returns success
 expr !2 creates the ADD[MPY[2,EXP[a,ADD[x,5]]],8] tree

Hope I got that right, hard to keep track on a small phone screen.

Basically you are programming a recursive descent parser that does not have a single drawback claimed by academia today. But its efficiency depends on how one programs the parser. Steamerandy (talk) 02:03, 20 May 2018 (UTC)

Has Thom Cheatham been written out of history?

Thomas Edward Cheatham Jr., a leader in the field, is nowhere mentioned. Shmuel (Seymour J.) Metz Username:Chatul (talk) 21:47, 12 July 2018 (UTC)