Why programming language
This paper was written in 2000. My views on some aspects of this text may have changed since then, but I have left it unedited in the hope that editing is not strictly necessary.
What and Why?
Why is a procedural programming language derived from Forth. The language got its name by accident. "Why not a Compiler?" was originally the name of the compiler project, chosen when the language itself was not yet complete, but it soon became the name of the language as well ("Why not a Programming Language?").
Basic concept of Why
Like Forth, Why is stack-oriented. All data being processed is pushed onto a stack. Every subroutine (a word, in Forth terms) pops some of the top elements from the stack, performs some action, and then pushes the results back. For example, the addition "2 + 3" is written in Forth (and in Why, too) as "2 3 +": the constants "2" and "3" perform "push 2, push 3", and "+" invokes the arithmetic addition, which operates on the two top stack elements. When this operation is complete, the top of the stack contains the result, "5".
Like Forth, Why has two stacks: the arithmetic stack (astack for short), used for arithmetic operations, and the return stack (rstack), which serves, as in all procedural languages, as storage for return addresses, function arguments, etc. Arithmetic operations in Why can use either stack (the xstacks statement swaps the two stack pointers), but doing so is not recommended programming style. In fact, this operation is needed only for compatibility with external routines written in other languages (such as operating system API calls), because they use the return stack for passing arguments.
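The evaluation of "2 3 +" described above can be illustrated with a minimal Python sketch. It models only the arithmetic stack; the function name evaluate and the choice of integer division are assumptions made for this illustration, not part of Why or its compiler.

```python
def evaluate(source):
    """Evaluate a space-separated postfix expression of integers.

    Returns the arithmetic stack (a Python list, top element last).
    """
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,  # assumption: integer division
    }
    stack = []
    for token in source.split():
        if token in operations:
            b = stack.pop()  # top element
            a = stack.pop()  # element next to the top
            stack.append(operations[token](a, b))
        else:
            stack.append(int(token))  # a constant pushes itself
    return stack


print(evaluate("2 3 +"))  # the stack now holds the single value 5
```

Each constant only pushes, and each operation pops its operands and pushes its result, which is all the sketch needs to reproduce the behaviour described in the text.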
A Why procedure consists of instructions separated by spaces and/or comments. Every instruction is either a data element (which pushes that data onto the stack), an operation, or a procedure call. The simplest program, the traditional "Hello, World!" from the famous book by Kernighan and Ritchie, looks like
"Hello, World!" Puts
(you will probably find the notation somewhat familiar if you know Forth). The string constant or, more precisely, the memory address of that string constant is pushed onto the stack, then the Puts routine is called. Puts pops the address from the stack and displays the string. Afterwards, the stack is exactly as it was before.
A Why program or library (in fact, any translation unit is treated as a library) is a set of procedure or function definitions. There is no syntactic difference between procedures and functions: functions are simply procedures that return results by pushing them onto the stack. In Why, a function can return any number of result values (the number can even vary from call to call). Such a feature is hard to even imagine in Algol derivatives like C or Pascal.
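To make the idea of functions that return several results concrete, here is a small Python sketch in which a stack is modelled as a list. The routines div_mod and digits are hypothetical examples invented for this illustration; they are not part of any Why library.

```python
def div_mod(stack):
    """Pop a divisor and a dividend, push quotient and remainder."""
    divisor = stack.pop()
    dividend = stack.pop()
    stack.append(dividend // divisor)  # first result
    stack.append(dividend % divisor)   # second result


def digits(stack):
    """Pop a non-negative integer and push each of its decimal digits.

    The number of results depends on the argument, so it varies
    from call to call.
    """
    number = stack.pop()
    for character in str(number):
        stack.append(int(character))


stack = [17, 5]
div_mod(stack)
print(stack)  # quotient and remainder are both left on the stack

stack = [2000]
digits(stack)
print(stack)  # four results for this particular argument
```

Because a result is just whatever the routine leaves on the stack, nothing in the calling convention fixes the number of results in advance.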
A program is a library that contains a definition of the Main procedure. Why is case-insensitive, so you can write "main" or "MAIN" if you prefer.
Control flow structures
All Why control structures have a very simple syntax. In fact, all of them are inline subroutines that operate on the stack. For example, consider a simple loop with a counter:
<ToLoopStatement> ::= "to" <Sequence> "loop"
As you can see, the grammar doesn't contain anything that looks like a counter. "To" and "loop" are in fact two separate operations. The "to" word compares the top stack element with the element next to the top and performs a conditional jump to the first instruction after the "loop" word. The condition is true if the top stack element is greater than the one next to the top. "Loop" performs an unconditional jump back to the "to" word. All other operations, such as initializing and incrementing the loop counter, are left to the programmer. Just to save two drops, the Why compiler pops the two top elements immediately after "loop". Consider a to-loop statement that displays "Hello again" five times (not so witty, just an example):
5 1 to
"Hello again" Puts
++ loop
The top stack element inside the loop body represents the counter value. "++" is the arithmetic operation that increases the counter by 1. You don't have to use exactly this operation; you can, for example, use the sequence "2 +" to increase the counter by 2 (the compiler doesn't care, because the language grammar doesn't specify it). The two "drop" operations implied just after the "loop" keyword are needed to remove the counter and loop limit values from the stack.
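The behaviour of "to" and "loop" described above can be simulated with a short Python sketch. It is only a model of the stated semantics, not the code the Why compiler generates; the names run_to_loop, hello_body and output are assumptions introduced for the illustration.

```python
def run_to_loop(limit, start, body):
    """Simulate '<limit> <start> to <body> loop' on a list-based stack."""
    stack = [limit, start]  # e.g. "5 1": the limit first, the counter on top
    while True:
        # "to": leave the loop if the top element (the counter) is
        # greater than the element next to the top (the limit)
        if stack[-1] > stack[-2]:
            break
        body(stack)  # the body is responsible for updating the counter
        # "loop": unconditional jump back to "to"
    # the two implied drops right after "loop"
    stack.pop()
    stack.pop()
    return stack


output = []


def hello_body(stack):
    output.append("Hello again")  # stands in for: "Hello again" Puts
    stack[-1] += 1                # stands in for: ++


remaining = run_to_loop(5, 1, hello_body)
print(len(output), remaining)  # five greetings, and an empty stack afterwards
```

Replacing the increment in the body with an addition of 2 makes the same loop run for the counter values 1, 3 and 5 only, which is the freedom the text refers to.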
Besides the "to-loop" construct, the language supports a similar "downto-loop" (which differs in its exit condition), the "begin-until" and "begin-while-repeat" loops and, of course, the branch statement "if-[else-]then". As in Forth, either the "if" branch or the "else" branch is executed, depending on the value of the top stack element, and execution then continues from the statement following the "then" keyword:
"All goes right"
"Opps! A bug in the compiler."
The compiler doesn't check any types and doesn't assume anything about the stack except in trivial situations. The language is not type-safe by design, so the following code example compiles (and even executes) successfully:
"String constant" PutNumber
There's no way to specify a type for the arguments of the PutNumber routine. In fact, the compiler doesn't even know whether "String constant" is an argument for PutNumber. As a result, you are free to pass any number of arguments of any type to your own sophisticated routines, in the manner of printf in C or writeln in Pascal.
The current version of the Why compiler performs single-pass optimization of the target machine code. First, the top few elements of the arithmetic stack are kept in CPU registers, which significantly reduces access time. Second, the compiler keeps track of the values of those elements when they are constant.
Such a feature requires a more intelligent code generator, but it can greatly improve the efficiency of the resulting code. For example, the expression "7 3 * 1 - 10 /" results in the same output code as "2". Furthermore, compile-time constant evaluation makes it possible to detect and eliminate dead code (for example, "1 5 to ... loop" or "0 if ... then", whose bodies can never execute).
Rebuilding the compiler
The compiler is written in Pascal using object-oriented techniques. The runtime libraries are implemented in Why (most of the code) and assembly language (a few low-level routines).
To rebuild the compiler you need Turbo Pascal 7.x. The author used Turbo Pascal 7.1, © 1997 Borland International. You may find other compilers that compile the code without modification, but I doubt that life is that simple, as the source is about 5.5 thousand lines long. I have never tried other compilers, though.
Translated by Alexey Yakovlev