Emscripten: An LLVM-to-JavaScript Compiler
Alon Zakai
Mozilla
azakai@mozilla.com

Abstract

JavaScript is the standard language of the web, supported on essentially all web browsers. Despite efforts to allow other languages to be run as well, none have come close to being universally available on all browsers, which severely limits their usefulness on the web. However, there are good reasons why allowing other languages would be beneficial, including reusing existing code and allowing developers to use their languages of choice.

We present Emscripten, an LLVM-to-JavaScript compiler. Emscripten compiles LLVM assembly code into standard JavaScript, which opens up two avenues for running code written in other languages on the web: (1) compile a language directly into LLVM, and then compile that into JavaScript using Emscripten, or (2) compile a language’s entire runtime (typically written in C or C++) into JavaScript using Emscripten, and use the compiled runtime to run code written in that language. Examples of languages that can use the first approach are C and C++, as compilers exist for them into LLVM. An example of a language that can use the second approach is Python, and Emscripten has been used to compile CPython (the standard C implementation of Python) into JavaScript, allowing standard Python code to be run on the web, which was not previously possible.

Emscripten itself is written in JavaScript (to enable various dynamic compilation techniques), and is available under the MIT license (a permissive open source license), at http://www.emscripten.org. As an LLVM-to-JavaScript compiler, the challenges in designing Emscripten are somewhat the reverse of the norm: one must go from a low-level assembly into a high-level language, and recreate parts of the original high-level structure of the code that were lost in the compilation to low-level LLVM. We detail the algorithms used in Emscripten to deal with those challenges.

© 2011 Alon Zakai. License: Creative Commons Attribution-ShareAlike (CC BY-SA), http://creativecommons.org/licenses/by-sa/3.0/

1 Introduction

Since the mid 1990’s, JavaScript has been present in most web browsers (sometimes with minor variations and under slightly different names, e.g., JScript in Internet Explorer), and today it is well-supported on essentially all web browsers, from desktop browsers like Internet Explorer, Firefox, Chrome and Safari, to mobile browsers on smartphones and tablets. Together with HTML and CSS, JavaScript is the standards-based foundation of the web.

Running other programming languages on the web has been suggested many times, and browser plugins have allowed doing so, e.g., via the Java and Flash plugins. However, plugins must be manually installed and do not integrate in a perfect way with the outside HTML. Perhaps more problematic is that they cannot run at all on some platforms, for example, Java and Flash cannot run on iOS devices such as the iPhone and
iPad. For those reasons, JavaScript remains the primary programming language of the web.

There are, however, justifiable motivations for running code from other programming languages on the web, for example, if one has a large amount of existing code already written in another language, or if one simply has a strong preference for another language (and perhaps is more productive in it).

As a consequence, there have been some efforts to compile languages into JavaScript. Since JavaScript is present in essentially all web browsers, by compiling one’s language of choice into JavaScript, one can still generate content that will run practically everywhere. Examples of this approach include the Google Web Toolkit, which compiles Java into JavaScript; Pyjamas, which compiles Python into JavaScript; and rumors have it that projects in Oracle and Microsoft, respectively, allow running JVM and CLR bytecode in JavaScript (which would allow running the languages of the JVM and CLR, respectively).

In this paper we present another project along those lines: Emscripten, which compiles LLVM assembly into JavaScript. LLVM (Low Level Virtual Machine) is a compiler project primarily focused on C, C++ and Objective-C. It compiles those languages through a frontend (the main ones of which are Clang and LLVM-GCC) into the LLVM intermediary representation (which can be machine-readable bitcode, or human-readable assembly), and then passes it through a backend which generates actual machine code for a particular architecture. Emscripten plays the role of a backend which targets JavaScript.

By using Emscripten, potentially many languages can be run on the web, using one of the following methods:

• Compile code in a language recognized by one of the existing LLVM frontends into LLVM, and then compile that into JavaScript using Emscripten. Frontends for various languages exist, including many of the most popular programming languages such as C and C++, and also various new and emerging languages (e.g., Rust).

• Compile the runtime used to parse and execute code in a particular language into LLVM, then compile that into JavaScript using Emscripten. It is then possible to run code in that runtime on the web. This is a useful approach if a language’s runtime is written in a language for which an LLVM frontend exists, but the language itself has no such frontend. For example, no currently-supported frontend exists for Python, however it is possible to compile CPython – the standard implementation of Python, written in C – into JavaScript, and run Python code on that (see Subsection X.Y).

From a technical standpoint, the main challenges in designing and implementing Emscripten are that it compiles a low-level language – LLVM assembly – into a high-level one – JavaScript. This is somewhat the reverse of the usual situation one is in when building a compiler, and leads to some unique difficulties. For example, to get good performance in JavaScript one must use natural JavaScript code flow structures, like loops and ifs, but those structures do not exist in LLVM assembly (instead, what is present there is essentially ‘flat’ code with goto commands). Emscripten must therefore reconstruct a high-level representation from the low-level data it receives.

In theory that issue could have been avoided by compiling a higher-level language into JavaScript. For example, if compiling Java into JavaScript (as the Google Web Toolkit does), then one can benefit from the fact that Java’s loops, ifs and so forth generally have a very direct parallel in JavaScript (of course the downside is that this approach yields a compiler only for Java). Compiling LLVM into JavaScript is less straightforward, but we will see later that it is possible to reconstruct a substantial part of the high-level structure of the original code.

We conclude this introduction with a list of this paper’s main contributions:

• We describe Emscripten itself, during which we detail its approach in compiling LLVM into JavaScript.
• We give details of Emscripten’s ‘Relooper’ algorithm, which generates high-level loop structures from low-level branching data. We are unaware of related results in the literature.

In addition, the following are the main contributions of Emscripten itself, that to our knowledge were not previously possible:

• It allows compiling a very large subset of C and C++ code into JavaScript, which can then be run on the web.

• By compiling their runtimes, it allows running languages such as Python on the web.

The remainder of this paper is structured as follows. In Section 2 we describe, from a high level, the approach taken to compiling LLVM assembly into JavaScript. In Section 3 we describe the workings of Emscripten on a lower, more concrete level. In Section 4 we give an overview of some uses of Emscripten. In Section 5 we summarize and give directions for future work on Emscripten and uses of it.

2 Compilation Approach

Let us begin by considering what the challenge is, when we want to compile something into JavaScript. Assume we are given the following simple example of a C program, which we want to compile into JavaScript:

    #include <stdio.h>
    int main()
    {
      int sum = 0;
      for (int i = 1; i < 100; i++)
        sum += i;
      printf("1+...+100=%d\n", sum);
      return 0;
    }

This program calculates the sum of the integers from 1 to 100. When compiled by Clang, the generated LLVM assembly code includes the following:

    @.str = private constant [14 x i8] c"1+...+100=%d\0A\00"

    define i32 @main() {
      %1 = alloca i32, align 4
      %sum = alloca i32, align 4
      %i = alloca i32, align 4
      store i32 0, i32* %1
      store i32 0, i32* %sum, align 4
      store i32 1, i32* %i, align 4
      br label %2

    ; <label>:2
      %3 = load i32* %i, align 4
      %4 = icmp slt i32 %3, 100
      br i1 %4, label %5, label %12

    ; <label>:5
      %6 = load i32* %i, align 4
      %7 = load i32* %sum, align 4
      %8 = add nsw i32 %7, %6
      store i32 %8, i32* %sum, align 4
      br label %9

    ; <label>:9
      %10 = load i32* %i, align 4
      %11 = add nsw i32 %10, 1
      store i32 %11, i32* %i, align 4
      br label %2

    ; <label>:12
      %13 = load i32* %sum, align 4
      %14 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 %13)
      ret i32 0
    }

At first glance, this may look more difficult to translate into JavaScript than the original C++. However, compiling C++ in general would require writing code to handle preprocessing, classes, templates, and all the idiosyncrasies and complexities of C++. LLVM assembly, while more verbose in this example, is lower-level and simpler to work on. It also has the benefit we mentioned earlier, which is one of the
main goals of Emscripten, that many languages can be compiled into LLVM.

A detailed overview of LLVM assembly is beyond our scope here. Briefly, though, the example assembly above can easily be seen to define a function main(), then allocate some values on the stack (alloca), then load and store various values (load and store). We do not have the high-level code structure as we had in C++, with a loop, instead we have code ‘fragments’, each with a label, and code flow moves from one to another by branch (br) instructions. (Label 2 is the condition check in the loop; label 5 is the body, label 9 is the increment, and label 12 is the final part of the function, outside of the loop). Conditional branches can depend on calculations, for example the results of comparing two values (icmp). Other numerical operations include addition (add). Finally, printf is called (call). The challenge, then, is to convert this and things like it into JavaScript.

In general, Emscripten’s approach is to translate each line of LLVM assembly into JavaScript, 1 for 1, into ‘normal’ JavaScript as much as possible. So, for example, an add operation becomes a normal JavaScript addition, a function call becomes a JavaScript function call, etc. This 1 to 1 translation generates JavaScript that resembles assembly code, for example, the LLVM assembly shown before for main() would be compiled into the following:

    function _main() {
      var __stackBase__ = STACKTOP;
      STACKTOP += 12;
      var __label__ = -1;
      while(1) switch(__label__) {
        case -1:
          var $1 = __stackBase__;
          var $sum = __stackBase__+4;
          var $i = __stackBase__+8;
          HEAP[$1] = 0;
          HEAP[$sum] = 0;
          HEAP[$i] = 0;
          __label__ = 0; break;
        case 0:
          var $3 = HEAP[$i];
          var $4 = $3 < 100;
          if ($4) { __label__ = 1; break; }
          else { __label__ = 2; break; }
        case 1:
          var $6 = HEAP[$i];
          var $7 = HEAP[$sum];
          var $8 = $7 + $6;
          HEAP[$sum] = $8;
          __label__ = 3; break;
        case 3:
          var $10 = HEAP[$i];
          var $11 = $10 + 1;
          HEAP[$i] = $11;
          __label__ = 0; break;
        case 2:
          var $13 = HEAP[$sum];
          var $14 = _printf(__str, $13);
          STACKTOP = __stackBase__;
          return 0;
      }
    }

Some things to take notice of:

• A switch-in-a-loop construction is used in order to let the flow of execution move between fragments of code in an arbitrary manner: We set __label__ to the (numerical representation of the) label of the fragment we want to reach, and do a break, which leads to the proper fragment being reached. Inside each fragment, every line of code corresponds to a line of LLVM assembly, generally in a very straightforward manner.

• Memory is implemented by HEAP, a JavaScript array. Reading from memory is a read from that array, and writing to memory is a write. STACKTOP is the current position of the stack. (Note that we allocate 4 memory locations for 32-bit integers on the stack, but only write to 1 of them. See the Load-Store Consistency subsection below for why.)

• LLVM assembly functions become JavaScript functions, and function calls are normal JavaScript function calls. In general, we attempt to generate as ‘normal’ JavaScript as possible.
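The switch-in-a-loop construction can also be tried out in isolation. The following is a hand-written sketch of the pattern, not actual Emscripten output: the function sumTo, its parameter n, and the simplified variable names are assumptions made for illustration, while HEAP and STACKTOP play the same roles as described above.

```javascript
// Hand-written illustration of the switch-in-a-loop pattern.
// This is NOT Emscripten output; sumTo and its variable names are hypothetical.
var HEAP = [];      // memory is a plain JavaScript array
var STACKTOP = 0;   // current position of the stack inside HEAP

function sumTo(n) {
  var stackBase = STACKTOP;
  STACKTOP += 8;             // reserve 4 locations each for sum and i
  var sum = stackBase;       // "address" of sum
  var i = stackBase + 4;     // "address" of i
  var label = 0;             // which code fragment runs next
  while (1) switch (label) {
    case 0:                  // entry fragment: initialize both variables
      HEAP[sum] = 0;
      HEAP[i] = 1;
      label = 1; break;
    case 1:                  // loop condition
      if (HEAP[i] <= n) { label = 2; break; }
      else { label = 3; break; }
    case 2:                  // loop body and increment
      HEAP[sum] = HEAP[sum] + HEAP[i];
      HEAP[i] = HEAP[i] + 1;
      label = 1; break;
    case 3:                  // exit fragment: restore the stack and return
      var result = HEAP[sum];
      STACKTOP = stackBase;
      return result;
  }
}
```

Each break leaves only the switch, so the enclosing while(1) immediately dispatches on the new value of label; that is what lets control move between fragments in an arbitrary order.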
2.1 Load-Store Consistency (LSC)

We saw before that Emscripten’s memory usage allocates the usual number of bytes on the stack for variables (4 bytes for a 32-bit integer, etc.). However, we only wrote values into the first location, which appeared odd. We will now see the reason for that.

To get there, we must first step back, and note that Emscripten does not aim to achieve perfect compatibility with all possible LLVM assembly (and correspondingly, with all possible C or C++ code, etc.); instead, Emscripten targets a subset of LLVM assembly code, which is portable and does not make crucial assumptions about the underlying CPU architecture on which the code is meant to run. That subset is meant to encompass the vast majority of real-world code that would be compiled into LLVM, while also being compilable into very performant JavaScript.

More specifically, Emscripten assumes that the LLVM assembly code it is compiling has Load-Store Consistency (LSC), which is the requirement that loads from and stores to a specific memory address will use the same type. Normal C and C++ code generally does so: If x is a variable containing a 32-bit floating point number, then both loads and stores of x will be of 32-bit floating point values, and not 16-bit unsigned integers or anything else. (Note that even if we write something like

    float x = 5

then the compiler will assign a 32-bit float with the value of 5 to x, and not an integer.)

To see why this is important for performance, consider the following C code fragment, which does not have LSC:

    int x = 12345;
    [...]
    printf("first byte: %d\n", *((char*)&x));

Assuming an architecture with more than 8 bits, this code will read the first byte of x. (It might, for example, be used to detect the endianness of the CPU.) To compile this into JavaScript in a way that will run properly, we must do more than a single operation for either the read or the write, for example we could do this:

    var x_value = 12345;
    var x_addr = stackAlloc(4);
    HEAP[x_addr] = (x_value >> 0) & 255;
    HEAP[x_addr+1] = (x_value >> 8) & 255;
    HEAP[x_addr+2] = (x_value >> 16) & 255;
    HEAP[x_addr+3] = (x_value >> 24) & 255;
    [...]
    printf("first byte: %d\n", HEAP[x_addr]);

Here we allocate space for the value of x on the stack, and store that address in x_addr. The stack itself is part of the ‘memory space’, which is the array HEAP. In order for the read on the final line to give the proper value, we must go to the effort of doing 4 store operations, each of the value of a particular byte. In other words, HEAP is an array of bytes, and for each store into memory, we must deconstruct the value into bytes. [1] Alternatively, we can store the value in a single operation, and deconstruct into bytes as we load. This will be faster in some cases and slower in others, but is still more overhead than we would like, generally speaking – for if the code does have LSC, then we can translate that code fragment into the far more optimal

    var x_value = 12345;
    var x_addr = stackAlloc(4);
    HEAP[x_addr] = x_value;
    [...]
    printf("first byte: %d\n", HEAP[x_addr]);

(Note that even this can be optimized even more – we can store x in a normal JavaScript variable. For now though we are just clarifying why it is useful to assume we are compiling code that has LSC – doing so lets us generate shorter and more natural JavaScript.)

[1] Note that we can use JavaScript typed arrays with a shared memory buffer, which would work as expected, assuming (1) we are running in a JavaScript engine which supports typed arrays, and (2) we are running on a CPU with the same architecture as we expect. This is therefore dangerous as the generated code may run differently on different JavaScript engines and different CPUs. Emscripten currently has optional experimental support for typed arrays.

In practice the vast majority of C and C++ code does have LSC. Exceptions do exist, however, for example:
• Code that detects CPU features like endianness, the behavior of floats, etc. In general such code can be disabled before running it through Emscripten, as it is not needed.

• memset and related functions typically work on values of one kind, regardless of the underlying values. For example, memset may write 64-bit values on a 64-bit CPU since that is usually faster than writing individual bytes. This tends to not be a problem, as with memset the most common case is setting to 0, and with memcpy, the values end up copied properly anyhow.

• Even LSC-obeying C or C++ code may turn into LLVM assembly that does not, after being optimized. For example, when storing two 32-bit integer constants into adjoining locations in a structure, the optimizer may generate a single 64-bit store of an appropriate constant. In other words, optimization can generate nonportable code, which runs faster on the current CPU, but nowhere else. Emscripten currently assumes that optimizations of this form are not being used.

In practice it may be hard to know if code has LSC or not, and requiring a time-consuming code audit is obviously impractical. Emscripten therefore has automatic tools to detect violations of LSC, see SAFE_HEAP in Subsection X.Y.

Note that it is somewhat wasteful to allocate 4 memory locations for a 32-bit integer, and use only one of them. It is possible to change that behavior with the QUANTUM_SIZE parameter to Emscripten, however, the difficulty is that LLVM assembly has hardcoded values that depend on the usual memory sizes being used. For example, memcpy can be called with the normal size each value would have, and not a single memory address each as Emscripten would prefer. We are looking into modifications to LLVM itself to remedy that.

2.2 Performance

For the example program from before (which sums the integers from 1 to 100), the generated code was fairly complicated and cumbersome, and unsurprisingly it performs quite poorly. There are two main reasons for that: First, that the code is simply unoptimized – there are many variables declared when fewer could suffice, for example, and second, that the code does not use ‘normal’ JavaScript, which JavaScript engines are optimized for – it stores all variables in an array (not normal JavaScript variables), and it controls the flow of execution using a switch-in-a-loop, not normal JavaScript loops and ifs.

Emscripten’s approach to generating fast-performing code is as follows. Emscripten doesn’t do any optimization that can otherwise be done by additional tools: LLVM can be used to perform optimizations before Emscripten, and the Closure Compiler can perform optimizations on the generated JavaScript afterwards. Those tools will perform standard useful optimizations like removing unneeded variables, dead code, etc. That leaves two major optimizations that are left for Emscripten to perform:

• Variable nativization: Convert variables that are on the stack into native JavaScript variables. In general, a variable will be nativized unless it is used outside that function, e.g., if its address is taken and stored somewhere or passed to another function. When optimizing, Emscripten tries to nativize as many variables as possible.

• ‘Relooping’: Recreate high-level loop and if structures from the low-level labels and branches that appear in LLVM assembly. We detail the algorithm Emscripten uses for this purpose in Section X.

When run with Emscripten’s optimizations, the above code looks like this:

    function _main() {
      var __label__;
      var $1;
      var $sum;
      var $i;
      $1 = 0;
      $sum = 0;
      $i = 0;
      $2$2: while(1) { // $2
        var $3 = $i;
        var $4 = $3 < 100;
        if (!($4)) { __label__ = 2; break $2$2; }
        var $6 = $i;
        var $7 = $sum;
        var $8 = $7 + $6;
        $sum = $8;
        var $10 = $i;
        var $11 = $10 + 1;
        $i = $11;
        __label__ = 0; continue $2$2;
      }
      var $13 = $sum;
      var $14 = _printf(__str, $13);
      return 0;
    }

If in addition the Closure Compiler is run on that output, we get

    function K() {
      var a, b;
      b = a = 0;
      a:for(;;) {
        if(!(b < 100)) {
          break a
        }
        a += b;
        b += 1;
      }
      _printf(J, a);
      return 0;
    }

which is fairly close to the original C++ (the difference, namely having the loop’s condition inside the loop instead of inside the for() expression at the top of the original loop, is not important to performance). Thus, it is possible to recreate the original high-level structure of the code that was compiled into LLVM assembly, despite that structure not being explicitly available to Emscripten.

We will see performance measurements in Section 4.

2.3 Dealing with Real-World Code

As mentioned above, in practice it may be hard to know if code has LSC or not, and other difficulties may arise as well. Accordingly, Emscripten has a set of tools to help detect if compiled code has certain problems. These are mainly compile-time options that lead to additional runtime checks. One then runs the code and sees the warnings or errors generated. The debugging tools include:

• SAFE_HEAP: This option adds runtime checks for LSC. It will raise an exception if LSC does not hold, as well as for related issues like reading from memory before a value was written (somewhat similarly to tools like Valgrind). If such a problem occurs, possible solutions are to ignore the issue (if it has no actual consequences), or to alter the source code. In some cases, such nonportable code can be avoided in a simple manner by changing the configuration under which the code is built (see the examples in Section X).

• CHECK_OVERFLOWS: This option adds runtime checks for overflows in integer operations, which are an issue since all JavaScript numbers are doubles, hence simple translation of LLVM numerical operations (addition, multiplication, etc.) into JavaScript ones may lead to different results than expected. CHECK_OVERFLOWS will add runtime checks for actual overflows of this sort, that is, whether the result of an operation is a value too large for the type it is defined as. Code that has such overflows can enable CORRECT_OVERFLOWS, which fixes overflows at runtime. The cost, however, is decreased speed due to the correction operations (which perform a bitwise AND to force the value into the proper range). Since such overflows are rare, it is possible to use this option to find where they occur, and modify the original code in a suitable way (see the examples in Section).
3 Emscripten’s Architecture

In the previous section we saw a general overview of Emscripten’s approach to compiling LLVM assembly into JavaScript. We will now go into more detail on how Emscripten implements that approach and its architecture.

Emscripten is written in JavaScript. A main reason for that decision was to simplify sharing code between the compiler and the runtime, and to enable various dynamic compilation techniques. Two simple examples: (1) The compiler can create JavaScript objects that represent constants in the assembly code, and convert them to a string using JSON.stringify() in a convenient manner, and (2) The compiler can simplify numerical operations by simply eval()ing the code (so “1+2” would become “3”, etc.). In both examples, writing Emscripten is made simpler by having the exact same environment during compilation as the executing code will have.

Emscripten has three main phases:

• The intertyper, which converts from LLVM assembly into Emscripten’s internal representation.

• The analyzer, which inspects the internal representation and generates various useful information for the final phase, including type and variable information, stack usage analysis, optional data for optimizations (variable nativization and relooping), etc.

• The jsifier, which does the final conversion of the internal representation plus additional analyzed data into JavaScript.

3.1 The Runtime Environment

Code generated from Emscripten is meant to run in a JavaScript engine, typically in a web browser. This has implications for the kind of runtime environment we can generate for it, for example, there is no direct access to the local filesystem.

Emscripten comes with a partial implementation of a C library, mostly written from scratch in JavaScript, with parts compiled from the newlib C library. Some aspects of the runtime environment, as implemented in that C library, are:

• Files to be read must be ‘preloaded’ in JavaScript. They can then be accessed using the usual C library methods (fopen, fread, etc.). Files that are written are cached, and can be read by JavaScript later. While limiting, this approach can often be sufficient for many purposes.

• Emscripten allows writing pixel data to an HTML5 canvas element, using a subset of the SDL API. That is, one can write an application in C or C++ using SDL, and that same application can be compiled normally and run locally, or compiled using Emscripten and run on the web. See the raytracing demo at http://syntensity.com/static/raytrace.html.

• sbrk is implemented using the HEAP array which was mentioned previously. This allows a normal malloc implementation written in C to be compiled to JavaScript.

3.2 The Relooper: Recreating high-level loop structures

The Relooper is among the most complicated components in Emscripten. It receives a ‘soup of labels’, which is a set of labeled fragments of code – for brevity we call such fragments simply ‘labels’ – each ending with a branch operation (either a simple branch, a conditional branch, or a switch), and the goal is to generate normal high-level JavaScript code flow structures such as loops and ifs.

For example, the LLVM assembly on page X has the following label structure:

              /-----------\
              |           |
              V           |
    ENTRY --> 2 --> 5 --> 9
              |
              V
              12
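The label structure above can be written down as a small graph and queried for reachability, which is the kind of information needed to decide whether a label sits inside a loop. The following is only a sketch under assumed names (branches, reachableFrom, canReturnTo); it does not reflect Emscripten’s internal representation.

```javascript
// Hypothetical representation: each label maps to the labels it can branch to.
var branches = {
  'ENTRY': ['2'],
  '2': ['5', '12'],
  '5': ['9'],
  '9': ['2'],
  '12': []
};

// Collect every label reachable from `start` by following one or more branches.
function reachableFrom(graph, start) {
  var seen = {};
  var pending = graph[start].slice();
  while (pending.length > 0) {
    var label = pending.pop();
    if (seen[label]) continue;   // already visited
    seen[label] = true;
    pending = pending.concat(graph[label]);
  }
  return Object.keys(seen);
}

// A label is part of a loop if some path of branches leads back to it.
function canReturnTo(graph, label) {
  return reachableFrom(graph, label).indexOf(label) !== -1;
}
```

For this graph, canReturnTo(branches, '2') is true because of the branch from 9 back to 2, while canReturnTo(branches, '12') is false, which matches the intuition that labels 2, 5 and 9 form the loop and label 12 comes after it.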
In this simple example, it is fairly straightforward to see that a natural way to implement it using normal loop structures is

    ENTRY
    while (true) do
      2
      if (condition) break
      5
      9
    12

In general though, this is not always easy or even possible – there may not be a reasonable high-level loop structure corresponding to the low-level one, if for example the original C code relied heavily on goto instructions. In practice, however, almost all real-world C and C++ code tends to be amenable to loop recreation.

Emscripten’s Relooper takes as input a ‘soup of labels’ as described above, and generates a structured set of code ‘blocks’, which are each a set of labels, with some logical structure, of one of the following types:

• Simple block: A block with one internal label and a Next block, which the internal label branches to. The block is later translated simply into the code for that label, and the Next block appears right after it.

• Loop: A block that represents an infinite loop, comprised of two internal sub-blocks:

  – Inner: A block of labels that will appear inside the loop, i.e., when execution reaches the end of that block, flow will return to the beginning. Typically a loop will contain a conditional break defining where it is exited. When we exit, we reach the Next block, below.

  – Next: A block of labels that will appear just outside the loop, in other words, that will be reached when the loop is exited.

• Multiple: A block that represents a divergence into several possible branches, that eventually rejoin. A Multiple block can implement an ‘if’, an ‘if-else’, a ‘switch’, etc. It is comprised of

  – Handled blocks: A set of blocks to which execution can enter. When we reach the multiple block, we check which of them should execute, and go there. When execution of that block is complete, or if none of the handled blocks was selected for execution, we proceed to the Next block, below.

  – Next: A block of labels that will appear just outside this one, in other words, that will be reached after code flow exits the Handled blocks, above.

Remember that we have a __label__ variable that helps control the flow of execution: Whenever we enter a block with more than one entry, we set __label__ before we branch into it, and we check its value when we enter that block. So, for example, when we create a Loop block, its Next block can have multiple entries – any label to which we branch out from the loop. By creating a Multiple block after the loop, we can enter the proper label when the loop is exited. Having a __label__ variable does add some overhead, but it greatly simplifies the problem that the Relooper needs to solve. Of course, it is possible to optimize away writes and reads to __label__ in many cases.

Emscripten uses the following recursive algorithm for generating blocks from the soup of labels:

• Receive a set of labels and which of them are entry points. We wish to create a block comprised of all those labels.

• Calculate, for each label, which other labels it can reach, i.e., which labels we are able to reach if we start at the current label and follow one of the possible paths of branching.

• If we have a single entry, and cannot return to it from any other label, then create a Simple block, with the entry as its internal label, and the Next block comprised of all the other labels. The entries for the Next block are the entries to which the internal label can branch.
• If we can return to all of the entries, return a Loop block, whose Inner block is comprised of all labels that can reach one of the entries, and whose Next block is comprised of all the others. The entry labels for the current block become entry labels for the Inner block (note that they must be in the Inner block, as each one can reach itself). The Next block’s entry labels are all the labels in the Next block that can be reached by the Inner block.

• If we have more than one entry, try to create a Multiple block: For each entry, find all the labels it reaches that cannot be reached by any other entry. If at least one entry has such labels, return a Multiple block, whose Handled blocks are blocks for those labels, and whose Next block is all the rest. Entry labels for those two blocks become entries of the new block they are now part of. We may have additional entry labels in the Next block, for each entry in the Next block that can be reached from the Handled ones.

• We must be able to return to at least one of the entries (see proof below), so create a Loop block as described above.

Note that we first create a loop only if we must, then try to create a multiple, then create a loop if we can. We could have slightly simplified this in various ways. The algorithm as presented above has given overall better results in practice, in terms of the ‘niceness’ of the shape of the generated code, both subjectively and in some benchmarks (however, the benchmarks are limited, and we cannot be sure the result will remain true for all possible inputs to the Relooper).

Additional details of the algorithm include ‘fixing’ branch instructions accordingly. For example, when we create a Loop block, then all branch instructions to outside of the loop are converted into break commands, and all branch instructions to the beginning of the loop are converted into continue commands. Those commands are then ignored when called recursively to generate the Inner block (that is, the break and continue commands are guaranteed, by the semantics of JavaScript, to get us to where we need to go – they do not need any further consideration for them to work properly).

Emscripten also does an additional pass after running the Relooper algorithm which has been described. The Relooper is guaranteed to produce valid output (see below). The second pass takes that valid output and optimizes it, by making minor changes such as removing continue commands that occur at the very end of loops (where they are not needed), etc. In other words, the first pass focuses on generating high-level code flow structures that are correct, while the second pass simplifies and optimizes that structure.

We now turn to an analysis of the Relooper algorithm. It is straightforward to see that the output of the algorithm, assuming it completes successfully – that is, that it finishes in finite time, and does not run into an error in the last part (where it is claimed that if we reach it we can return to at least one of the entry labels) – is correct in the sense of code execution being carried out as in the original data. We will now prove that the algorithm must in fact complete successfully.

First, note that if we successfully create a block, then we simplify the remaining problem, where the ‘complexity’ of the problem for our purposes now is the number of labels plus the number of branching operations:

• This is trivial for Simple blocks (since we now have a Next block which is strictly smaller).

• It is true for Loop blocks simply by removing branching operations (there must be a branching back to an entry, which becomes a continue).

• For Multiple blocks, if the Next block is non-empty then we have split into strictly smaller blocks (in number of labels) than before. If the Next block is empty, then since we built the Multiple block from a set of labels with more than one entry, then the Handled blocks are strictly smaller than the
current one. The remaining case is when we have a single entry.

The remaining issue is whether we can reach a situation where we fail to create a block due to an error, that is, that the claim in the last part does not hold. For that to occur, we must not be able to return to any of the entries (or else we would create a Loop block). But since that is so, we can, at minimum, create a Multiple block with entries for all the current entries, since the entry labels themselves cannot be reached, contradicting the assumption that we cannot create a block, and concluding the proof.

We have not, of course, proven that the shape of the blocks is optimal in any sense. However, even if it is possible to optimize them, the Relooper already gives a very substantial speedup due to the move from the switch-in-a-loop construction to more natural JavaScript code flow structures. TODO: A little data here.

4 Example Uses

Emscripten has been run successfully on several real-world codebases, including the following: TODO: Performance data

4.1 CPython

Other ways to run Python on web: pyjamas/pyjs, old pypy-js backend, ?? also IronPython on Silverlight and Jython in Java. Limitations etc.

CPython is the standard implementation of the Python programming language, written in C. Compiling it with Emscripten is straightforward, except for needing to change some #defines, without which CPython creates platform-specific assembly code.

Compilation using llvm-gcc generates approximately 27.9MB of LLVM assembly. After running Emscripten and the Closure Compiler, the generated JavaScript code is approximately 2.76MB in size. For comparison, a native binary version of CPython is approximately 2.28MB in size, so the two differ by only 21%.

The CORRECT_OVERFLOWS option in Emscripten is necessary for proper operation of the generated code, as Python's hashing code relies on integer overflows behaving normally.

The demo can be seen live at http://www.syntensity.com/static/python.html. Potential uses include ... etc.

4.2 Bullet Physics Engine

Mention other bullet->js manual port, via Java

4.3 Lua

Mention other lua->js compilers

5 Summary

We presented Emscripten, an LLVM-to-JavaScript compiler, which opens up numerous opportunities for running code written in languages other than JavaScript on the web, including some not previously possible. Emscripten can be used to, among other things, compile real-world C and C++ code and run that on the web. In turn, by compiling the runtimes of languages implemented in C and C++, we can run them on the web as well. Examples were shown for Python and Lua.

One of the main tasks for future work in Emscripten is to broaden its standard library. Emscripten currently includes portions of libc and other standard C libraries, implemented in JavaScript. Portions of existing libc implementations that are themselves written in C can also be compiled into JavaScript using Emscripten, but in general the difficulty is in creating a suitable runtime environment on the web. For example, there is no accessible filesystem, nor normal system calls and so forth. Some of those features can be implemented in JavaScript; in particular, new HTML features like the File API should help.

Another important task is to support multithreading. Emscripten currently assumes the code being compiled is single-threaded, since JavaScript does not have support for multithreading (Web Workers allow multiprocessing,
but they do not have shared state, so implementing threads with them is not trivial). However, it would be possible to emulate multithreading in a single thread. One approach could be to not generate native JavaScript control flow structures, and instead to use a switch-in-a-loop for the entire program. Code can then be added in the loop to switch every so often between 'threads', while maintaining their state and so forth.

A third important task is to improve the speed of generated code. An optimal situation would be for code compiled by Emscripten to run at or near the speed of native code. In that respect it is worth comparing Emscripten to Portable Native Client (PNaCl), a project in development which aims to allow an LLVM-like format to be distributed and run securely on the web, with speed comparable to native code.

Both Emscripten and PNaCl allow running compiled native code on the web, Emscripten by compiling that code into JavaScript, and PNaCl by compiling it into an LLVM-like format, which is then run in a special PNaCl runtime. One major difference is that Emscripten's generated code can run on all web browsers, since it is standard JavaScript, while PNaCl's generated code requires the PNaCl runtime to be installed; another is that JavaScript engines do not yet run code at near-native speeds, while PNaCl does. To summarize this comparison, Emscripten's approach allows the code to be run in more places, while PNaCl's allows the code to run faster.

However, improvements in JavaScript engines may narrow the speed gap. In particular, for purposes of Emscripten we do not need to care about all JavaScript, but only the kind generated by Emscripten. Such code is implicitly statically typed, that is, types are not mixed, despite JavaScript in general allowing assigning, e.g., an integer to a variable and later a floating point value or even an object to that same variable. Implicitly statically typed code can be statically analyzed and converted into machine code that has no runtime type checks at all. While such static analysis can be time-consuming, there are practical ways for achieving similar results quickly, such as tracing and local type inference, which would help on such code very significantly, and are already in use or being worked on in mainstream JavaScript engines (e.g., SpiderMonkey).

The limit of such an approach is to perform static analysis on an entire program compiled by Emscripten, generating highly-optimized machine code from that. As evidence of the potential in such an approach, the PyPy project can compile RPython – something very close to implicitly statically typed Python – into C, which can then be compiled and run at native speed. We may in the future see JavaScript engines perform such static compilation, when the code they run is implicitly statically typed, which would allow Emscripten's generated code to run at native speeds as well. While not trivial, such an approach is possible, and if accomplished, would mean that the combination of Emscripten and suitable JavaScript engines will let people write code in their languages of choice, and run it at native speeds on the web.

Finally, we conclude with another avenue for optimization. Assume that we are compiling a C or C++ runtime of a language into JavaScript, and that the runtime uses JIT compilation to generate machine code. Typically code generators for JITs are written for the main CPU architectures, today x86, x86_64 and ARM. However, it would be possible for a JIT to generate JavaScript instead. Thus, the runtime would be compiled using Emscripten, and it would pass the generated JavaScript to eval. In this scenario, JavaScript is used as a low-level intermediate representation in the runtime, and the final conversion to machine code is left to the underlying JavaScript engine. This approach can potentially allow languages that greatly benefit from a JIT (such as Java, Lua, etc.) to be run on the web efficiently.
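The idea of a JIT that emits JavaScript and hands it to eval can be illustrated with a minimal sketch. It assumes a hypothetical stack-based bytecode: the opcodes push, arg, add and mul, and the function name jitCompile, are invented here for illustration and are not part of Emscripten or of any real language runtime. Instead of generating machine code, the toy code generator builds JavaScript source text and evaluates it, leaving the final conversion to machine code to the JavaScript engine.

```javascript
// Hypothetical bytecode: an array of [opcode, operand] pairs for a stack machine.
// jitCompile (an illustrative name) translates it into JavaScript source text
// and passes that text to eval, so the host engine produces the machine code.
function jitCompile(bytecode, numArgs) {
  const stack = []; // compile-time stack of JavaScript expression strings
  for (const [op, operand] of bytecode) {
    switch (op) {
      case 'push': // push a numeric constant
        stack.push(String(Number(operand)));
        break;
      case 'arg': // push the value of argument number `operand`
        if (operand < 0 || operand >= numArgs) throw new Error('bad argument index');
        stack.push('a' + operand);
        break;
      case 'add':
      case 'mul': {
        if (stack.length < 2) throw new Error('stack underflow');
        const right = stack.pop();
        const left = stack.pop();
        stack.push('(' + left + (op === 'add' ? ' + ' : ' * ') + right + ')');
        break;
      }
      default:
        throw new Error('unknown opcode: ' + op);
    }
  }
  if (stack.length !== 1) throw new Error('malformed bytecode');
  const params = [];
  for (let i = 0; i < numArgs; i++) params.push('a' + i);
  // The generated code is ordinary JavaScript; eval returns a callable function.
  const source = '(function(' + params.join(', ') + ') { return ' + stack[0] + '; })';
  return { source: source, run: eval(source) };
}

// Bytecode for (a0 + 2) * a1
const compiled = jitCompile(
  [['arg', 0], ['push', 2], ['add', null], ['arg', 1], ['mul', null]], 2);
```

Here compiled.run(3, 4) evaluates (3 + 2) * 4 and returns 20. A real runtime would emit far more elaborate code, but the division of labor would be the same: the compiled runtime produces JavaScript text, and the browser's engine is responsible for making it fast.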