Comparison of Erlang Runtime System and Java Virtual Machine

Page created by Julio Mason

Science

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Comparison of Erlang Runtime
System and Java Virtual Machine
Tõnis Pool
University of Tartu
tonis.pool@gmail.com
Supervised by Oleg Batrashev

Abstract

This report gives a high level overview of the Erlang Runtime System (ERTS) and the Java Virtual Machine
(JVM), comparing the two in terms of overall architecture, memory layout, parallelism/concurrency and
runtime optimisations. More specifically I’ll look at the HotSpot JVM provided by Oracle and the default
BEAM implementation open sourced by Ericsson.

1. Introduction why it was chosen for comparison.
The JVM works by executing bytecode state-
here are many different virtual machines

T
ments from a class file, generated by compiling
in existence nowadays packing a grow- the java source code. This indirection is what
ing number of clever tricks to make them gives programmers the ability to compile their
run as fast as possible. The JVM is perhaps the source code once and then execute it on differ-
most known virtual machine and one that’s ent platforms that have JVM implementations.
received couple of decades’ worth of improve-
ments and optimisations.
1.2. ERTS
Somewhat similarly to Java, Erlang is a
general-purpose concurrent, garbage-collected Unlike the JVM, Erlang doesn’t have a formal
programming language that nowadays runs on specification for it’s runtime nor for the pro-
a virtual machine and has been around even gramming language. This makes it difficult
longer than Java. The whole runtime system to- to have various implementations, as they all
gether with the language, interpreter, memory have to derive the semantics from the de facto
handling is called the Erlang Runtime System, BEAM implementation[1]. But similarly to Java
but the virtual machine is often referred to also ERTS works by executing an intermediate rep-
as the BEAM. resentation of erlang source code, also known
as BEAM code. As mentioned there’s no speci-
fication for the generated instructions, though
1.1. JVM
a few unofficial documents are available[2].
The term Java Virtual Machine specifies 3 differ- Quite unlike the Java programming lan-
ent notions - a specification of an abstract stack guage Erlang is a functional programming lan-
machine, a implementation of that specification guage based on the actor model, which dates
and a running process. The specification makes back to 1973[3]. The actor model specifies the
sure that different implementations are inter- actor as the unit of concurrency. In erlang ac-
changeable, but on other hand isn’t very strict tors are called processes. Actors can only com-
about areas such as memory layout, garbage municate with each other by sending messages,
collection algorithms and instruction optimisa- but are otherwise independent, which allows
tions, allowing for many different approaches. to run them in parallel with each other[4].
The HotSpot VM provided by Oracle is the Because of this distinction, that the Er-
most widely used implementation, which is lang language focuses specifically on the actor

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

model, comparing the JVM with ERTS can be                       2.       Memory Layout
seen as comparing apples and carrots, because
the ERTS is heavily geared and optimised for         2.1.   JVM
this specific view of the world, where the JVM
                                                     Figure 1 shows the general architecture of the
has no such prejudices.
                                                     HotSpot VM when looked upon as memory
                                                     areas[5].

                                Figure 1: Simplified JVM memory layout

    There’s a shared heap, where all objects         called Eden, two smaller Survivor and Tenured
and arrays live and then there’s a non heap          spaces. This optimisation stems from the ob-
area designated for metadata about the classes       servation that most objects die young, thus it
loaded and a Code Cache, which is used for           makes sense to put them into a separate area
compilation and storage of methods that have         and garbage collect that smaller area more of-
been compiled to native code by the JIT com-         ten than the whole heap[6].
piler.
                                                         The JVM has many different Garbage Col-
    Every thread in the JVM has a program
                                                     lection (GC) algorithms embedded, which cater
counter, which holds the address of the current
                                                     to different needs of the application. In gen-
instruction (if it’s not native), a stack, which
                                                     eral to collect garbage from the shared heap
holds frames for each executing method and a
                                                     a stop-the-world pause is needed, where all
native stack. Every thread in JVM is mapped
                                                     application threads are halted. Different GC
one-to-one to an operating system (OS) level
                                                     strategies optimise either for throughput - total
thread, which is then scheduled and managed
                                                     GC overhead should be low, but long pauses
by the OS.
                                                     are allowed - or latency - each stop-the-world
   Heap on the HotSpot VM consists of a              pause should be as short as possible. Based on
three distinct areas (also known as generations)     different heuristics (like what kind of machine

2

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

it’s running on) the JVM will choose a strategy, give it some latency goals for example[7].
but users can also enforce some strategy and

Figure 2: Simplified ERTS memory layout

2.2. ERTS element that is still referenced, starting with
the root set[8].
Figure 2 shows the general architecture of the The per process heap architecture has many
Erlang Runtime System when looked upon as benefits, one of which is that because each pro-
memory areas[8]. The major difference here cess has it’s own heap and there are presum-
is that instead of having a single shared heap, ably numerous process’s, each heaps’ root set
each process in Erlang has it’s own. The same can be very small and can be garbage collected
memory area is also used for the stack of the quickly in isolation, without affecting other
process, where the two are growing towards running processes. Meaning there aren’t any
each other. Such a setup makes it very cheap to global stop-the-world pauses.
check wether a process is running out of heap When a process dies all it’s memory can be
or stack space. deallocated right away, which means processes
Besides a heap and a stack each process that are short lived or don’t generate much
has a process control block, which is used to garbage, may never go through a single GC
hold various metadata about the process and cycle. The above (+ true preemptive schedul-
a message queue. Binary data that is larger ing, that is discussed later) gives Erlang the
than 64 bytes is kept in a separate area, so that marketed soft real-time capabilities[9], where
different processes could simply refer to it by there aren’t global pauses in the application,
a pointer. which often happen on the JVM.
Similarly to the JVM ERTS uses a genera-
tional garbage collection, meaning the heap is 3. Parallelism and Concurrency
divided into a young and old generation. The
GC strategy used is a copying collector (except Now for the tricky bit. Erlang is marketed
for the binary area, which is garbage collected as a tool for building massively scalable soft
by reference counting), meaning it will allocate real-time systems with requirements on high
another memory area and then copy over each availability[9]. It’s often used in 99.999% up-

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

time guaranteed systems[10]. Java on the other As mentioned in the introduction, Erlang
hand is a more general-purpose programming uses the actor model, where the only means
language, used in fields such as high frequency to share state between concurrent and/or par-
trading to embedded devices. allel actors is message passing. Furthermore,
because each process has its own heap, there
3.1. Shared state isn’t any shared state by design (except for the
binary and a few other Erlang specific memory
In 1971 Edsger Dijkstra posed and solved the areas). Removing shared state and relying only
now famous the Dining Philosophers Problem on message passing has the potential to make
about concurrency. His solution involved a concurrency easy again[13].
construct commonly known as a semaphore,
which restrict access to a shared resource[11]. "Erlang is to locks what Java is to
Dijkstra’s solution is often referred to as shared pointers[11]"
state concurrency, which is the programming
model Java uses. Arguably message passing concurrency
The JVM has a single heap and sharing state model is on a higher abstraction level than
between concurrent parties is done via locks using locks. Though because it’s a higher ab-
that guarantee mutual exclusion to a specific straction it also removes some of the hardships
memory area. This means that the burden of of using locks and is easier to use, because we
guaranteeing correct access to shared variables don’t have to guarantee the correctness of the
lies on the programmer, who must guard the program in terms of concurrency anymore, the
critical sections by locks. However this is noto- runtime does that for us.
riously difficult to do, leading to widely spread Similarly the message passing style of con-
belief that concurrency is hard[12]. currency can be successfully modeled with
There are numerous reasons why shared locks on the JVM. The Akka framework is es-
state between threads brings troubles: sentially doing just this and brings the actor
model to the JVM[14]. Using the Akka frame-
1. Deadlocks - occurs when two threads work on the JVM makes the experience very
both need resources A and B. First thread similar to Erlang with the added benefit of
locks A first and then tries to get B, ability to use the much larger Java ecosystem.
whereas the second thread operates in
the opposite order. This results in a situa-
3.2. Message Passing
tion where both threads have locked one
resource and wait for the other thread to Because of the heap per process architecture
release the other resource. in ERTS, message passing is done by coping
the message from one process heap to another.
2. Race conditions - occurs when the pro-
This can be somewhat surprising as it would
grammer hasn’t correctly guaranteed the
be much cheaper to simply pass a pointer to
order of execution between two threads.
the message from one process to another, but
Which means that the correctness of the
it would require some shared memory area for
program depends on the OS scheduler,
the processes.
which is not deterministic, resulting in
ERTS has actually experimented with both
hard to reproduce errors.
a shared and a hybrid heap architectures in the
3. Spending too much time in the critical past[10], but after multithreading support was
section - occurs when a thread after ac- added to the BEAM the implementations were
quiring a shared resources holds it for too broken and later removed from ERTS[15]. A
long, effectively making all other inter- hybrid heap architecture is one where each pro-
ested parties wait and do nothing during cess still has a separate heap, but messages that
that time. are to be sent to other processes are allocated

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

in a shared memory area to allow fast message The effect is that Erlang is one of a few
passing. Experimentation showed that such languages that actually does preemptive mul-
an approach has promise, but as stated, the titasking. The reduction count of 2000, which
implementation was left unfinished[15]. is sub 1ms, is quite low and forces many small
The original reason for going with the copy- context switches between the Erlang processes.
ing strategy was that the destination process But this also ensures that an Erlang system
might be on another machine. If we are pass- tends to degrade in a graceful manner when
ing messages as pointers among processes on loaded with more work[21].
a single machine, but doing something else for
processes on different machines, means that Schedulers will also balance work between
the error handling code will be more difficult each other. The strategy can be configured to ei-
and has to be handled separately. The underly- ther balancing load out as evenly as possible or
ing reason was to make the semantics of local using the least amount of schedulers (putting
and remote failures the same[16]. the other ones to sleep, conserving power for
Furthermore, the "free lunch" for perfor- example)[20].
mance is long over, meaning processors aren’t
getting faster, instead multicore processors are The important note here is that blocking a
standard[17]. Providing a coherent picture of process doesn’t block a scheduler, it will sim-
memory to all cores is increasingly expensive ply immediately start running the next process.
and one can think of your cpu as a distributed As Erlang processes are not OS level threads
system[18]. Whether it’s hidden from the pro- they are more lightweight, which is the main
grammer or not, the cpu will have to do some reason ERTS can run hundreds of thousands of
copying when a cache miss occurs for example. processes. Erlang processes use a dynamically
When data sharing times between cores con- allocated stack, that starts off much smaller
tinue to rise it makes sense to do the expensive than Java threads. By default an Erlang process
operation of copying immediately[16]. will use around 2.5KB[22] on a 64bit machine,
wheres as a Java thread starts off with a 1024KB
stack on a 64bit machine.
3.3. Lightweight threads
As mentioned before, the JVM maps its threads There is a library for Java called Quasar,
one-to-one to OS level threads, which are then that tries to bring the same lightweight threads
scheduled by the OS scheduler, but in Erlang to the JVM as are Erlang processes. In order
a process is simply a separate memory area to really pull it off, Quasar needs to instru-
and are mapped n-to-m to OS level threads. ment your code to save the execution state and
Since 2006 ERTS supports true multithreading restore it when that lightweight thread starts
through Symmetric Multi Processing (SMP). running again[23]. Though instrumentation
With SMP ERTS starts with the same number has many challenges that add complexity to
of schedulers as there are cores[19]. the program instead of removing it.
Each scheduler has a run queue that con-
tains runnable processes. A process runs until The same applies for the JVM actor frame-
it tries to receive a message, but the mailbox work Akka mentioned earlier. They can work
was empty, or it runs out of reductions. The very well, if the programmers take care to fol-
meaning of reductions in ERTS is not clearly low some principles. As Akka actors run on
defined, but they should represent "units of OS Threads an actor that randomly blocks will
work" and are roughly equivalent to function block the entire thread. Without fundamental
calls. Each process that starts running initially changes in how the JVM works, one cannot
has 2000 reductions, and when it rans out is guarantee that an arbitrary piece of code will
put at the end of the run queue[20]. not block[24].

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

4. Runtime Optimisations JVM, because it can keep track of the shortcuts
it has made and if any of them become invalid,
4.1. JIT compiler for example a method does throw an excep-
tion, then the JIT compiler will deoptimize the
When looking at the JVM and comparing it
method and compile it again without the op-
with other Virtual Machines then we cannot
timisation or run it as interpreted. The HiPE
forget to mention the just-in-tim (JIT) com-
compiler however cannot make such shortcuts,
piler that HotSpot has, which has the largest
because it hasn’t got the ability to change gears
effect on performance of the JVM[25]. As men-
during runtime, the once compiled code has to
tioned in the introduction, both Erlang and
be correct and work in all circumstances.
Java source code are not directly compiled to
However a true just-in-time compiler for
executable binaries or native code. Instead they
ERTS is in development and called BEAMJIT.
rely on the runtime to execute the statements in
Similarly to the HotSpot JIT compiler it uses
the intermediate language ( bytecode for java
tracing to decide which methods should be
and so called BEAM code for Erlang).
compiled to native code and which not. It has
Purely interpreting commands sequentially
shown increased performance in some bench-
is slower simply because of having to translate
marks, but the current implementation doesn’t
commands over and over again into the ma-
do any Erlang language specific optimisations,
chines native instructions, not to mention vari-
which leaves out the best part of having a true
ous optimisations that good compilers do[25].
just-in-time compiler[27].
JIT compiler in the HotSpot works by keeping
track of what methods are "hot" (called often)
and then optimising and compiling them to 5. Summary
native code. This has the benefit that effort is
not spent to compile/optimise methods that As we’ve seen the JVM and ERTS are quite
don’t execute or do so rarely. different underneath. Namely the difference
stems from the fact that Erlang takes the Actor
4.2. HIPE and BEAMJIT model as it’s basis and has built it’s runtime
around that. Whether that’s good or bad is
ERTS comes with a ahead-of-time compiler up to debate and most likely depends on the
called HiPE (High Performance Erlang). It is problem at hand. Java and the JVM provide
sometimes called also a just-in-time compiler, enough tools to retrofit any concurrency model
but in reality it’s more of a ahead-of-time com- out there, but retrofitting anything won’t be the
piler, because the user has to choose which same as taking it into the initial design. That
functions or modules are compiled into native said there’s a wonderful quote from the one of
code. It doesn’t do it automatically during the creators of Erlang:
runtime for most used functions[26].
Compiling ahead of time also strips away "Erlang accidentally has the right
many possibilities for optimisation that the properties to exploit multi-core ar-
HotSpot JIT compiler does. Essentially the chitectures - not by design but by
HotSpot JIT compiler is able to gamble the accident" - Joe Armstrong[16]
system into better performance by doing short-
cuts. For example the HotSpot JIT compiler
assumes that a method never throws an excep- References
tion or that most methods are not overloaded
and links directly to a specific callsite instead [1] E. Stenman, “Vm tuning, know your
of traversing the class hierarchy each time to engine - part ii: the beam.” https://www.
find the most specific method[25]. youtube.com/watch?v=RcyN2yS5PRU,
These optimisations are possible on the July 2013.

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

 [2] “Beam file format.” https://synrc.              [13] J.    Armstrong,   “Concurrency   is
     com/publications/cat/Functional%                     easy.”   http://armstrongonsoftware.
     20Languages/Erlang/BEAM.pdf,   May                   blogspot.com/2006/08/
     2012.                                                concurrency-is-easy.html,    August
                                                          2006.
 [3] C. Hewitt, P. Bishop, and R. Steiger, “A
     universal modular actor formalism for ar-       [14] “http://akka.io/.” http://akka.io/.
     tificial intelligence,” in Proceedings of the
                                                     [15] R. Carlsson, “[erlang-questions] why are
     3rd International Joint Conference on Artifi-
                                                          messages between processes copied?.”
     cial Intelligence, IJCAI’73, (San Francisco,
                                                          http://erlang.org/pipermail/
     CA, USA), pp. 235–245, Morgan Kauf-
                                                          erlang-questions/2012-February/
     mann Publishers Inc., 1973.
                                                          064613.html, February 2012.
 [4] F. Hebert, Learn You Some Erlang for Great      [16] J. Armstrong, “[erlang-questions] why
     Good!: A Beginner’s Guide. No Starch Press           are messages between processes copied?.”
     Series, No Starch Press, 2013.                       http://erlang.org/pipermail/
                                                          erlang-questions/2012-February/
 [5] J. Bloom, “JVM Internals.” http://blog.
                                                          064617.html, February 2012.
     jamesdbloom.com/JVMInternals.html,
     November 2013.                                  [17] H. Sutter, “The free lunch is over: A
                                                          fundamental turn toward concurrency
 [6] B. Goetz, “Java theory and practice:
                                                          in software.” http://www.gotw.ca/
     Garbage collection in the HotSpot JVM.”
                                                          publications/concurrency-ddj.htm,
     http://www.ibm.com/developerworks/
                                                          March 2005.
     library/j-jtp11253/, November 2003.
                                                     [18] A. Baumann, S. Peter, A. Schüpbach,
 [7] “Memory Management in the Java                       A. Singhania, T. Roscoe, P. Barham, and
     HotSpotTM Virtual Machine,” tech. rep.,              R. Isaacs, “Your computer is already
     Sun Microsystems, 04 2006.                           a distributed system. why isn’t your
                                                          os?.” https://www.usenix.org/legacy/
 [8] E. Stenman, “Erlang engine tuning: Part
                                                          event/hotos09/tech/full_papers/
     1 - know your engine.” http://www.
                                                          baumann/baumann_html/index.html,
     youtube.com/watch?v=QbzH0L_0pxI,
                                                          March 2009.
     April 2013.
                                                     [19] K. Lundin, “Inside the erlang vm.”
 [9] “http://www.erlang.org/.” http://www.
                                                          http://www.erlang.org/euc/08/euc_
     erlang.org/.
                                                          smp.pdf, November 2008.
[10] E. Johansson, K. Sagonas, and J. Wilhelms-      [20] L. Larsson, “Understanding the er-
     son, “Heap architectures for concurrent              lang scheduler.” https://vimeo.com/
     languages using message passing,” in Pro-            86106434, February 2014.
     ceedings of the 3rd International Symposium
     on Memory Management, ISMM ’02, (New            [21] J.   L.    Andersen,     “How    er-
     York, NY, USA), pp. 88–99, ACM, 2002.                lang    does  scheduling.”  http://
                                                          jlouisramblings.blogspot.com/2013/
[11] S. Akhmechet, “Erlang style concurrency.”            01/how-erlang-does-scheduling.html,
     http://www.defmacro.org/ramblings/                   January 2013.
     concurrency.html, August 2006.
                                                     [22] “Efficiency guide 8.1 creation of an erlang
[12] D. V. Subramaniam, “What makes concur-               process.” http://www.erlang.org/doc/
     rency hard,” Healthy Code, July 2014.                efficiency_guide/processes.html.

                                                                                                   7

Comparison of Erlang Runtime System and Java Virtual Machine • May 2015

[23] R. Pressler, “Jvmls 2014: Lightweight       [26] K. F. Sagonas, M. Pettersson, R. Carlsson,
     threads.” http://medianetwork.oracle.            P. Gustafsson, and T. Lindahl, “All you
     com/video/player/3731008736001, Au-              wanted to know about the hipe compiler:
     gust 2014.                                       (but might have been afraid to ask).,” in
                                                      Erlang Workshop (B. Däcker and T. Arts,
[24] C. Moon, “Actors, green threads and csp          eds.), pp. 36–42, ACM, 2003.
     on the jvm – no, you can’t have a pony.”
     http://www.boundary.com/blog/2014/          [27] F. Drejhammar and L. Rasmusson,
     09/no-you-cant-have-a-pony/, March               “Beamjit: A just-in-time compiling runtime
     2014.                                            for erlang,” in Proceedings of the Thirteenth
                                                      ACM SIGPLAN Workshop on Erlang, Er-
[25] S. Oaks, Java Performance: The Definitive        lang ’14, (New York, NY, USA), pp. 61–72,
     Guide. O’Reilly Media, 2014.                     ACM, 2014.

8

You can also read