Computing Systems CS134b, Winter 2005: Programming languages and compilers
This document describes the metalanguage used to define the OCaml syntax. First we give a grammar for the metalanguage; thus the metalanguage is used to describe itself. A prose description follows, together with a discussion of common usage and of how various aspects are to be interpreted.
Grammar =
    Rules

Rules =
    Rules NL+ Rule
    Rule

Rule =
    Symbol Separator NL Alternative*
    Symbol Separator { Character_set_lexeme } NL
    Symbol Separator Special_character_lexeme NL

Alternative* =
    Alternative+
    Empty

Alternative+ =
    Alternative+ Alternative
    Alternative

Alternative =
    Symbol+ NL

Symbol+ =
    Symbol+ Symbol
    Symbol

Symbol =
    Lexeme

Separator =
    Lexeme

Character_set_lexeme =
    Lexeme

Special_character_lexeme =
    Lexeme

Empty =

Lexical_units include:
    Lexeme
    {
    }
    NL
    White_space(ignored)

Lexeme =
    Printable_character+

Printable_character+ =
    Printable_character+ Printable_character
    Printable_character

Printable_character =
    Letter
    Digit
    Other_printable

Letter =
    Uppercase_letter
    Lowercase_letter

Uppercase_letter any_of: { ABCDEFGHIJKLMNOPQRSTUVWXYZ }

Lowercase_letter any_of: { abcdefghijklmnopqrstuvwxyz }

Digit any_of: { 0123456789 }

Other_printable any_of: { ,./;'[]-=\`<>?:"{}!@#$%^&*()_+|~ }

NL+ =
    NL+ NL
    NL

NL =
    CR LF
    LF CR
    CR
    LF

White_space(ignored) =
    NULL
    Blank
    Tab

NULL \= (ASCII:0)

Blank \= (ASCII:32)

Tab \= (ASCII:9)

CR \= (ASCII:13)

LF \= (ASCII:10)
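The lexical level of the grammar above (NL, White_space(ignored), the { and } units, and Lexeme) can be sketched in OCaml as a small tokenizer. This is a minimal illustration, not part of the original definition: the names `token` and `tokenize` are hypothetical, and it assumes that a { or } standing alone between whitespace is the brace lexical unit, while braces inside a longer run of printable characters remain part of that Lexeme (the grammar does not spell this resolution out). Any byte that is neither printable, ignored whitespace, nor part of an NL is treated as an error, which is also an assumption.

```ocaml
(* Hypothetical token type for the metalanguage's lexical units. *)
type token =
  | Lexeme of string
  | Lbrace          (* a lone "{" *)
  | Rbrace          (* a lone "}" *)
  | Nl              (* CR LF, LF CR, CR or LF *)

(* White_space(ignored): NULL, Blank and Tab. *)
let is_ignored c = c = '\000' || c = ' ' || c = '\t'

(* Printable_character: ASCII 33 ('!') through 126 ('~'). *)
let is_printable c = c > ' ' && c <= '~'

let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      (* The two-character NL forms are tried before the one-character ones. *)
      | '\r' when i + 1 < n && s.[i + 1] = '\n' -> go (i + 2) (Nl :: acc)
      | '\n' when i + 1 < n && s.[i + 1] = '\r' -> go (i + 2) (Nl :: acc)
      | '\r' | '\n' -> go (i + 1) (Nl :: acc)
      | c when is_ignored c -> go (i + 1) acc
      | c when is_printable c ->
        (* A Lexeme is a maximal run of printable characters. *)
        let j = ref i in
        while !j < n && is_printable s.[!j] do incr j done;
        let tok =
          match String.sub s i (!j - i) with
          | "{" -> Lbrace
          | "}" -> Rbrace
          | lx -> Lexeme lx
        in
        go !j (tok :: acc)
      | c -> failwith (Printf.sprintf "tokenize: unexpected byte %d" (Char.code c))
  in
  go 0 []
```

Under this reading, the Other_printable rule tokenizes as the Lexeme `Other_printable`, the Separator `any_of:`, a lone `{`, one long Lexeme containing the embedded `{}`, and a lone `}`.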
A Grammar is a sequence of Rule's, with one or more blank lines (NL) between the Rule's. Most Rule's are written with one or more alternative expansions for the nonterminal symbol (the first symbol in the first line of the Rule), with each Alternative being written by itself on a single line.
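The layout just described (blank lines between Rule's, a header line naming the non-terminal, then one Alternative per line) can be sketched as an OCaml function that splits grammar text into Rule's. This is a hypothetical illustration: `rule` and `parse_rules` are invented names, it assumes LF line endings, it treats only blanks and tabs as ignorable whitespace, and it keeps the Separator as part of the header line.

```ocaml
(* Hypothetical representation: a rule's header line (left-hand side plus
   Separator, and anything after it) and its Alternative lines. *)
type rule = { header : string; alternatives : string list }

(* Treat tabs as blanks and collapse runs of blanks, so indentation and
   spacing inside a line do not matter (White_space is ignored). *)
let normalize (line : string) : string =
  String.map (fun c -> if c = '\t' then ' ' else c) line
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")
  |> String.concat " "

(* Split grammar text into rules. Assumes LF line endings. A line that is
   empty after normalization ends the current rule; within a rule the first
   line is the header and each later line is one Alternative. *)
let parse_rules (text : string) : rule list =
  let flush current acc =
    match List.rev current with
    | [] -> acc
    | header :: alternatives -> { header; alternatives } :: acc
  in
  let rec go lines current acc =
    match lines with
    | [] -> List.rev (flush current acc)
    | line :: rest ->
      let line = normalize line in
      if line = "" then go rest [] (flush current acc)
      else go rest (line :: current) acc
  in
  go (String.split_on_char '\n' text) [] []
```

Single-line rules such as `Digit any_of: { 0123456789 }` come out with no alternatives, just like `Empty =`; telling those cases apart is left to whatever inspects the header.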
{ =
Lexeme
} =
Lexeme
and then any sequence of non-blank characters could be used for the type of Rule that uses { and }.
There are three kinds of Rule's that may be expressed, or four if the case with no Alternatives is counted separately. Most Rule's are written with one or more Alternative lines following the first line of the Rule, which gives the left-hand-side non-terminal; these are the usual kind of context-free grammar rules. The special case of no Alternatives (as exemplified here by Empty) means a non-terminal that may be generated from "nothing". Some texts on formal languages refer to this case as "epsilon", sometimes written out and sometimes shown by the Greek letter ε. The remaining two kinds are single-line Rule's in which the Separator is followed either by a set of characters enclosed in { and } (as in Digit) or by a Special_character_lexeme (as in NULL).
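One way to picture these Rule forms is as an OCaml variant, with the no-Alternatives case given its own constructor. This is a hypothetical sketch: the type and function names are invented, and it assumes the header line has already been split into lexemes, with a lone `{` or `}` kept as its own element.

```ocaml
(* Hypothetical model of the Rule forms in the grammar above. *)
type rule_kind =
  | Ordinary of string list list  (* Alternative+: each Alternative is Symbol+ *)
  | Epsilon                       (* no Alternatives: derives "nothing" *)
  | Char_set of string            (* Separator { Character_set_lexeme } *)
  | Special of string             (* Separator Special_character_lexeme *)

type rule = { lhs : string; separator : string; kind : rule_kind }

(* [header] is the first line of the rule split into lexemes; [alts] holds
   the later lines, each split into its symbols. *)
let classify (header : string list) (alts : string list list) : rule =
  match header, alts with
  | [ lhs; separator ], [] -> { lhs; separator; kind = Epsilon }
  | [ lhs; separator ], _ -> { lhs; separator; kind = Ordinary alts }
  | [ lhs; separator; "{"; set; "}" ], [] -> { lhs; separator; kind = Char_set set }
  | [ lhs; separator; special ], [] -> { lhs; separator; kind = Special special }
  | _ -> invalid_arg "classify: not one of the Rule forms"
```

Because Character_set_lexeme and Special_character_lexeme are each a single Lexeme, a fixed-length header pattern is enough to distinguish the forms.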
The first non-terminal of the grammar is always taken to be the goal symbol of the language being defined. As a matter of good form, the metalanguage mandates that the goal non-terminal not be referenced in any Alternative in the grammar.
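The goal-symbol convention and the good-form requirement are easy to check mechanically. Below is a minimal OCaml sketch over a hypothetical in-memory grammar (each rule is its left-hand side paired with its Alternatives, each a list of symbols); the names `grammar`, `goal` and `goal_unreferenced` are invented for illustration.

```ocaml
(* Hypothetical grammar representation: (lhs, alternatives) pairs in order. *)
type grammar = (string * string list list) list

(* The goal symbol is the left-hand side of the first rule. *)
let goal (g : grammar) : string option =
  match g with
  | [] -> None
  | (lhs, _) :: _ -> Some lhs

(* Good form: the goal symbol appears in no Alternative of any rule. *)
let goal_unreferenced (g : grammar) : bool =
  match goal g with
  | None -> true
  | Some s -> not (List.exists (fun (_, alts) -> List.exists (List.mem s) alts) g)
```

The metalanguage's own grammar passes this check: Grammar is the goal and never appears in an Alternative, while the self-referencing Rules is not the first rule.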
The 'while' terminal symbol is identified as itself and not as an Identifier, even though it matches the grammatical definition of Identifier. If it is desired (which it isn't in most cases) that keywords also be allowed as Identifier's, that may be accomplished by something along these lines:
Including the 'while' terminal symbol as an alternative for Identifier would allow it to be generated as an Identifier, even though both 'while' and Identifier are lexical units. Incidentally, notice the use of '...::=' as a separator to emphasize that the set of alternatives given is not exhaustive (normally the alternatives of a Rule are taken to be exhaustive).
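A common way for a lexer to give 'while' its own identity is to match the Identifier pattern first and then consult a keyword table. The OCaml sketch below illustrates that technique under stated assumptions: the keyword list is a hypothetical, abbreviated one (not OCaml's full set), the input word is assumed to already match the Identifier pattern, and the optional `also` parameter is an invented stand-in for adding 'while' as an extra Alternative of Identifier.

```ocaml
type token =
  | Keyword of string
  | Ident of string

(* Hypothetical, abbreviated keyword table. *)
let keywords = [ "while"; "do"; "done"; "let"; "in" ]

(* The word is assumed to match the Identifier pattern already; the keyword
   table decides which terminal it becomes. *)
let classify_word (w : string) : token =
  if List.mem w keywords then Keyword w else Ident w

(* Parser-side Identifier: by default only Ident tokens qualify. Passing
   ~also:["while"] mimics listing 'while' as an additional alternative. *)
let as_identifier ?(also = []) (t : token) : string option =
  match t with
  | Ident s -> Some s
  | Keyword k when List.mem k also -> Some k
  | Keyword _ -> None
```

With the default, `as_identifier (classify_word "while")` yields `None`, while `as_identifier ~also:["while"] (classify_word "while")` yields `Some "while"`; the lexer itself is unchanged in both cases, and only the parser-level notion of Identifier is widened.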