From 1a36bd9b908be2913aa1d3e1c56fc641a907f5e9 Mon Sep 17 00:00:00 2001
From: Chris Lattner
Welcome to Chapter 4 of the "Implementing a language
-with LLVM" tutorial. Parts 1-3 described the implementation of a simple
-language and included support for generating LLVM IR. This chapter describes
+with LLVM" tutorial. Chapters 1-3 described the implementation of a simple
+language and added support for generating LLVM IR. This chapter describes
two new techniques: adding optimizer support to your language, and adding JIT
compiler support. This shows how to get nice efficient code for your
language.
-Well, that was easy. :) In practice, we recommend always using
+ Well, that was easy :). In practice, we recommend always using
LLVMFoldingBuilder when generating code like this. It has no
"syntactic overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
@@ -166,7 +166,8 @@ at link time, this can be a substantial portion of the whole program). It also
supports and includes "per-function" passes which just operate on a single
function at a time, without looking at other functions. For more information
on passes and how they get run, see the How
-to Write a Pass document.
For Kaleidoscope, we are currently generating functions on the fly, one at
a time, as the user types them in. We aren't shooting for the ultimate
@@ -212,7 +213,7 @@ add a set of optimizations to run. The code looks like this:
that we're not going to take advantage of here, so I won't dive into what it is
all about.
-The meat of the matter is the definition of the "OurFPM". It
+The meat of the matter is the definition of "OurFPM". It
requires a pointer to the Module (through the ModuleProvider)
to construct itself. Once it is set up, we use a series of "add" calls to add
a bunch of LLVM passes. The first pass is basically boilerplate, it adds a pass
@@ -222,10 +223,10 @@ which we will get to in the next section.
In this case, we choose to add 4 optimization passes. The passes we chose
here are a pretty standard set of "cleanup" optimizations that are useful for
-a wide variety of code. I won't delve into what they do, but believe that they
-are a good starting place.
+a wide variety of code. I won't delve into what they do, but believe me that
+they are a good starting place :).
-Once the passmanager, is set up, we need to make use of it. We do this by
+Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
FunctionAST::Codegen), but before it is returned to the client:
@@ -238,8 +239,8 @@ running it after our newly created function is constructed (in
   // Validate the generated code, checking for consistency.
   verifyFunction(*TheFunction);

-  // Optimize the function.
-  TheFPM->run(*TheFunction);
+  // Optimize the function.
+  TheFPM->run(*TheFunction);

   return TheFunction;
 }
@@ -265,7 +266,7 @@ entry:
As expected, we now get our nicely optimized code, saving a floating point
-add from the program.
+add instruction from every execution of this function.
LLVM provides a wide variety of optimizations that can be used in certain
circumstances. Some documentation about the various
@@ -286,15 +287,15 @@ executing it!
-Once the code is available in LLVM IR form a wide variety of tools can be
+Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did above),
you can dump it out in textual or binary forms, you can compile the code to an
assembly file (.s) for some target, or you can JIT compile it. The nice thing
-about the LLVM IR representation is that it is the common currency between many
-different parts of the compiler.
+about the LLVM IR representation is that it is the "common currency" between
+many different parts of the compiler.
-In this chapter, we'll add JIT compiler support to our interpreter. The
+In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter function
bodies as they do now, but immediately evaluate the top-level expressions they
type in. For example, if they type in "1 + 2;", we should evaluate and print
@@ -306,12 +307,12 @@ by adding a global variable and a call in main:
-static ExecutionEngine *TheExecutionEngine;
+static ExecutionEngine *TheExecutionEngine;
 ...
 int main() {
   ..
-  // Create the JIT.
-  TheExecutionEngine = ExecutionEngine::create(TheModule);
+  // Create the JIT.
+  TheExecutionEngine = ExecutionEngine::create(TheModule);
   ..
 }
@@ -337,13 +338,13 @@ static void HandleTopLevelExpression() {
     if (Function *LF = F->Codegen()) {
       LF->dump();  // Dump the function for exposition purposes.

-      // JIT the function, returning a function pointer.
+      // JIT the function, returning a function pointer.
       void *FPtr = TheExecutionEngine->getPointerToFunction(LF);

       // Cast it to the right type (takes no arguments, returns a double) so we
       // can call it as a native function.
       double (*FP)() = (double (*)())FPtr;
-      fprintf(stderr, "Evaluated to %f\n", FP());
+      fprintf(stderr, "Evaluated to %f\n", FP());
     }
What actually happened here is that the anonymous function is JIT'd when
requested. When the Kaleidoscope app calls through the function pointer that is
returned, the anonymous function starts executing. It ends up
-making the call for the "testfunc" function, and ends up in a stub that invokes
+making the call to the "testfunc" function, and ends up in a stub that invokes
the JIT, lazily, on testfunc. Once the JIT finishes lazily compiling testfunc,
-it returns and the code reexecutes the call.
+it returns and the code re-executes the call.
In summary, the JIT will lazily JIT code on the fly as it is needed. The
JIT provides a number of other more advanced interfaces for things like freeing
@@ -445,11 +446,13 @@ ready> foo(4.0);
-Whoa, how does the JIT know about sin and cos? The answer is simple: in this
+Whoa, how does the JIT know about sin and cos? The answer is surprisingly
+simple: in this
example, the JIT started execution of a function and got to a function call. It
realized that the function was not yet JIT compiled and invoked the standard set
of routines to resolve the function. In this case, there is no body defined
-for the function, so the JIT ended up calling "dlsym("sin")" on itself.
+for the function, so the JIT ended up calling "dlsym("sin")" on the
+Kaleidoscope process itself.
Since "sin" is defined within the JIT's address space, it simply patches up
calls in the module to call the libm version of sin directly.
@@ -479,7 +482,7 @@ double putchard(double X) {
Now we can produce simple output to the console by using things like:
"extern putchard(x); putchard(120);", which prints a lowercase 'x' on
-the console (120 is the ASCII code for 'x'). Similar code could be used to
+the console (120 is the ASCII code for 'x'). Similar code could be used to
implement file I/O, console input, and many other capabilities in
Kaleidoscope.