chapter 2 edits

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43760 91177308-0d34-0410-b5e6-96231b3b80d8
Chris Lattner 2007-11-06 07:16:22 +00:00
parent 4134c2821f
commit cde1d9db3f


@@ -45,7 +45,7 @@
with LLVM</a>" tutorial. This chapter shows you how to use the <a with LLVM</a>" tutorial. This chapter shows you how to use the <a
href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
href="http://en.wikipedia.org/wiki/Parsing">parser</a> for href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
our Kaleidoscope language and build an <a our Kaleidoscope language. Once we have a parser, we'll define and build an <a
href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
Tree</a> (AST).</p> Tree</a> (AST).</p>
@@ -53,7 +53,7 @@ Tree</a> (AST).</p>
href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
Parsing</a> and <a href= Parsing</a> and <a href=
"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence "http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
Parsing</a> to parse the Kaleidoscope language (the later for binary expression Parsing</a> to parse the Kaleidoscope language (the latter for binary expression
and the former for everything else). Before we get to parsing though, let's talk and the former for everything else). Before we get to parsing though, let's talk
about the output of the parser: the Abstract Syntax Tree.</p> about the output of the parser: the Abstract Syntax Tree.</p>
@@ -144,7 +144,8 @@ themselves:</p>
<div class="doc_code"> <div class="doc_code">
<pre> <pre>
/// PrototypeAST - This class represents the "prototype" for a function, /// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator. /// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST { class PrototypeAST {
std::string Name; std::string Name;
std::vector&lt;std::string&gt; Args; std::vector&lt;std::string&gt; Args;
@@ -165,9 +166,9 @@ public:
</div> </div>
<p>In Kaleidoscope, functions are typed with just a count of their arguments. <p>In Kaleidoscope, functions are typed with just a count of their arguments.
Since all values are double precision floating point, this fact doesn't need to Since all values are double precision floating point, the type of each argument
be captured anywhere. In a more aggressive and realistic language, the doesn't need to be stored anywhere. In a more aggressive and realistic
"ExprAST" class would probably have a type field.</p> language, the "ExprAST" class would probably have a type field.</p>
<p>With this scaffolding, we can now talk about parsing expressions and function <p>With this scaffolding, we can now talk about parsing expressions and function
bodies in Kaleidoscope.</p> bodies in Kaleidoscope.</p>
@@ -213,10 +214,6 @@ us to look one token ahead at what the lexer is returning. Every function in
our parser will assume that CurTok is the current token that needs to be our parser will assume that CurTok is the current token that needs to be
parsed.</p> parsed.</p>
<p>Again, we define these with global variables; it would be better design to
wrap the entire parser in a class and use instance variables for these.
</p>
<div class="doc_code"> <div class="doc_code">
<pre> <pre>
@@ -293,7 +290,7 @@ static ExprAST *ParseParenExpr() {
<p>This function illustrates a number of interesting things about the parser: <p>This function illustrates a number of interesting things about the parser:
1) it shows how we use the Error routines. When called, this function expects 1) it shows how we use the Error routines. When called, this function expects
that the current token is a '(' token, but after parsing the subexpression, it that the current token is a '(' token, but after parsing the subexpression, it
is possible that there is not a ')' waiting. For example, if the user types in is possible that there is no ')' waiting. For example, if the user types in
"(4 x" instead of "(4)", the parser should emit an error. Because errors can "(4 x" instead of "(4)", the parser should emit an error. Because errors can
occur, the parser needs a way to indicate that they happened: in our parser, we occur, the parser needs a way to indicate that they happened: in our parser, we
return null on an error.</p> return null on an error.</p>
@@ -357,10 +354,11 @@ either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
</p> </p>
<p>Now that we have all of our simple expression parsing logic in place, we can <p>Now that we have all of our simple expression parsing logic in place, we can
define a helper function to wrap them up in a class. We call this class of define a helper function to wrap it together into one entry-point. We call this
expressions "primary" expressions, for reasons that will become more clear class of expressions "primary" expressions, for reasons that will become more
later. In order to parse a primary expression, we need to determine what sort clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to
of expression it is:</p> parse an arbitrary primary expression, we need to determine what sort of
specific expression it is:</p>
<div class="doc_code"> <div class="doc_code">
<pre> <pre>
@@ -438,12 +436,13 @@ int main() {
</div> </div>
<p>For the basic form of Kaleidoscope, we will only support 4 binary operators <p>For the basic form of Kaleidoscope, we will only support 4 binary operators
(this can obviously be extended by you, the reader). The (this can obviously be extended by you, our brave and intrepid reader). The
<tt>GetTokPrecedence</tt> function returns the precedence for the current token, <tt>GetTokPrecedence</tt> function returns the precedence for the current token,
or -1 if the token is not a binary operator. Having a map makes it easy to add or -1 if the token is not a binary operator. Having a map makes it easy to add
new operators and makes it clear that the algorithm doesn't depend on the new operators and makes it clear that the algorithm doesn't depend on the
specific operators involved, but it would be easy enough to eliminate the map specific operators involved, but it would be easy enough to eliminate the map
and do the comparisons in the <tt>GetTokPrecedence</tt> function.</p> and do the comparisons in the <tt>GetTokPrecedence</tt> function (or just use
a fixed-size array).</p>
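The trade-off described above can be sketched roughly as follows. This is only an illustration, not the tutorial's code: the operator set and precedence values are assumed, the function names <tt>GetTokPrecedenceMap</tt>, <tt>GetTokPrecedenceArray</tt> and <tt>CheckLookupsAgree</tt> are hypothetical, the token is passed as a parameter instead of being read from the global <tt>CurTok</tt>, and the map initializer assumes a C++11 compiler.

```cpp
#include <cassert>
#include <map>

// Illustrative precedence table (assumed values): a larger number binds
// tighter. Any four operators would do; the algorithm does not care which.
static std::map<char, int> BinopPrecedence = {
    {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

// Map-based lookup: returns the precedence of Tok, or -1 if Tok is not a
// known binary operator.
static int GetTokPrecedenceMap(int Tok) {
  if (Tok < 0 || Tok > 127)
    return -1; // not an ASCII character token
  std::map<char, int>::const_iterator It =
      BinopPrecedence.find(static_cast<char>(Tok));
  if (It == BinopPrecedence.end() || It->second <= 0)
    return -1;
  return It->second;
}

// Fixed-size array alternative: one slot per ASCII character, where 0 means
// "not a binary operator".
static int GetTokPrecedenceArray(int Tok) {
  static const struct Table {
    int Prec[128];
    Table() : Prec() { // Prec() zero-initializes every slot
      Prec['<'] = 10;
      Prec['+'] = 20;
      Prec['-'] = 20;
      Prec['*'] = 40;
    }
  } T;
  if (Tok < 0 || Tok > 127)
    return -1;
  return T.Prec[Tok] > 0 ? T.Prec[Tok] : -1;
}

// Sanity check: both lookups should agree for every ASCII character.
static void CheckLookupsAgree() {
  for (int C = 0; C < 128; ++C)
    assert(GetTokPrecedenceMap(C) == GetTokPrecedenceArray(C));
}
```

The map version is easier to extend at run time (for example, if new operators are registered later), while the array version avoids a tree lookup for every token.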
<p>With the helper above defined, we can now start parsing binary expressions. <p>With the helper above defined, we can now start parsing binary expressions.
The basic idea of operator precedence parsing is to break down an expression The basic idea of operator precedence parsing is to break down an expression
@@ -578,8 +577,8 @@ context):</p>
// the pending operator take RHS as its LHS. // the pending operator take RHS as its LHS.
int NextPrec = GetTokPrecedence(); int NextPrec = GetTokPrecedence();
if (TokPrec &lt; NextPrec) { if (TokPrec &lt; NextPrec) {
RHS = ParseBinOpRHS(TokPrec+1, RHS); <b>RHS = ParseBinOpRHS(TokPrec+1, RHS);
if (RHS == 0) return 0; if (RHS == 0) return 0;</b>
} }
// Merge LHS/RHS. // Merge LHS/RHS.
LHS = new BinaryExprAST(BinOp, LHS, RHS); LHS = new BinaryExprAST(BinOp, LHS, RHS);
@@ -600,6 +599,8 @@ of the '+' expression.</p>
<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed. <p>Finally, on the next iteration of the while loop, the "+g" piece is parsed.
and added to the AST. With this little bit of code (14 non-trivial lines), we and added to the AST. With this little bit of code (14 non-trivial lines), we
correctly handle fully general binary expression parsing in a very elegant way. correctly handle fully general binary expression parsing in a very elegant way.
This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
running through it with a few tough examples to see how it works.
</p> </p>
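For readers who want to run through a few examples by hand, here is a minimal, self-contained sketch of the same loop. It is not the tutorial's parser: the <tt>MiniParser</tt> struct and the <tt>Parenthesize</tt> helper are hypothetical, operands are assumed to be single letters, the precedence values are assumed, and instead of building AST nodes it returns a fully parenthesized string (an empty string stands in for the null error return).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical mini-parser: single-letter operands, single-character
// operators, and a parenthesized string in place of an AST.
struct MiniParser {
  std::string Src;
  std::size_t Pos;
  explicit MiniParser(const std::string &S) : Src(S), Pos(0) {}

  // Current character, or 0 at end of input.
  char Cur() const { return Pos < Src.size() ? Src[Pos] : 0; }

  // Precedence of the current character, or -1 if it is not an operator.
  int CurPrec() const {
    switch (Cur()) {
    case '<': return 10;
    case '+': case '-': return 20;
    case '*': return 40;
    default: return -1;
    }
  }

  // primary ::= letter. Returns "" on error.
  std::string ParsePrimary() {
    if (!std::isalpha(static_cast<unsigned char>(Cur())))
      return "";
    return std::string(1, Src[Pos++]);
  }

  // Consumes [binop, primary] pairs whose operator precedence is at least
  // ExprPrec, folding them into LHS.
  std::string ParseBinOpRHS(int ExprPrec, std::string LHS) {
    assert(!LHS.empty() && "caller must pass an already-parsed LHS");
    while (true) {
      int TokPrec = CurPrec();
      // Not an operator, or it binds less tightly than we may consume.
      if (TokPrec < ExprPrec)
        return LHS;
      char BinOp = Src[Pos++];
      std::string RHS = ParsePrimary();
      if (RHS.empty())
        return "";
      // If the next operator binds tighter, let it take RHS as its LHS.
      if (TokPrec < CurPrec()) {
        RHS = ParseBinOpRHS(TokPrec + 1, RHS);
        if (RHS.empty())
          return "";
      }
      // Merge LHS/RHS.
      LHS = "(" + LHS + BinOp + RHS + ")";
    }
  }

  std::string ParseExpression() {
    std::string LHS = ParsePrimary();
    if (LHS.empty())
      return "";
    return ParseBinOpRHS(0, LHS);
  }
};

// Convenience wrapper: parse S and return its parenthesized form.
static std::string Parenthesize(const std::string &S) {
  return MiniParser(S).ParseExpression();
}
```

Calling <tt>Parenthesize("a+b*c*d+g")</tt> makes the grouping visible: the recursive call with <tt>TokPrec+1</tt> swallows the whole <tt>b*c*d</tt> run before the outer loop merges it with <tt>a</tt> and then picks up <tt>+g</tt>.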
<p>This wraps up handling of expressions. At this point, we can point the <p>This wraps up handling of expressions. At this point, we can point the
@@ -616,7 +617,7 @@ handle function definitions etc.</p>
<div class="doc_text"> <div class="doc_text">
<p> <p>
The first basic thing missing is that of function prototypes. In Kaleidoscope, The next thing missing is handling of function prototypes. In Kaleidoscope,
these are used both for 'extern' function declarations as well as function body these are used both for 'extern' function declarations as well as function body
definitions. The code to do this is straight-forward and not very interesting definitions. The code to do this is straight-forward and not very interesting
(once you've survived expressions): (once you've survived expressions):
@@ -636,6 +637,7 @@ static PrototypeAST *ParsePrototype() {
if (CurTok != '(') if (CurTok != '(')
return ErrorP("Expected '(' in prototype"); return ErrorP("Expected '(' in prototype");
// Read the list of argument names.
std::vector&lt;std::string&gt; ArgNames; std::vector&lt;std::string&gt; ArgNames;
while (getNextToken() == tok_identifier) while (getNextToken() == tok_identifier)
ArgNames.push_back(IdentifierStr); ArgNames.push_back(IdentifierStr);
@@ -750,25 +752,26 @@ type "4+5;" and the parser will know you are done.</p>
<div class="doc_text"> <div class="doc_text">
<p>With just under 400 lines of commented code, we fully defined our minimal <p>With just under 400 lines of commented code (240 lines of non-comment,
language, including a lexer, parser and AST builder. With this done, the non-blank code), we fully defined our minimal language, including a lexer,
executable will validate code and tell us if it is gramatically invalid. For parser and AST builder. With this done, the executable will validate
Kaleidoscope code and tell us if it is gramatically invalid. For
example, here is a sample interaction:</p> example, here is a sample interaction:</p>
<div class="doc_code"> <div class="doc_code">
<pre> <pre>
$ ./a.out $ <b>./a.out</b>
ready&gt; def foo(x y) x+foo(y, 4.0); ready&gt; <b>def foo(x y) x+foo(y, 4.0);</b>
ready&gt; Parsed a function definition. Parsed a function definition.
ready&gt; def foo(x y) x+y y; ready&gt; <b>def foo(x y) x+y y;</b>
ready&gt; Parsed a function definition. Parsed a function definition.
ready&gt; Parsed a top-level expr Parsed a top-level expr
ready&gt; def foo(x y) x+y ); ready&gt; <b>def foo(x y) x+y );</b>
ready&gt; Parsed a function definition. Parsed a function definition.
ready&gt; Error: unknown token when expecting an expression Error: unknown token when expecting an expression
ready&gt; extern sin(a); ready&gt; <b>extern sin(a);</b>
ready&gt; Parsed an extern ready&gt; Parsed an extern
ready&gt; ^D ready&gt; <b>^D</b>
$ $
</pre> </pre>
</div> </div>
@@ -794,7 +797,7 @@ course). To build this, just compile with:</p>
<div class="doc_code"> <div class="doc_code">
<pre> <pre>
# Compile # Compile
g++ -g toy.cpp g++ -g -O3 toy.cpp
# Run # Run
./a.out ./a.out
</pre> </pre>
@@ -919,7 +922,8 @@ public:
}; };
/// PrototypeAST - This class represents the "prototype" for a function, /// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator. /// which captures its name, and its argument names (thus implicitly the number
/// of arguments the function takes).
class PrototypeAST { class PrototypeAST {
std::string Name; std::string Name;
std::vector&lt;std::string&gt; Args; std::vector&lt;std::string&gt; Args;