<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>Performance – Wren</title>
<link rel="stylesheet" type="text/css" href="style.css" />
<link href='//fonts.googleapis.com/css?family=Source+Sans+Pro:400,700,400italic,700italic|Source+Code+Pro:400|Lato:400|Sanchez:400italic,400' rel='stylesheet' type='text/css'>
<!-- Tell mobile browsers we're optimized for them and they don't need to crop
the viewport. -->
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/>
</head>
<body id="top">
<header>
<div class="page">
<div class="main-column">
<h1><a href="index.html">wren</a></h1>
<h2>a classy little scripting language</h2>
</div>
</div>
</header>
<div class="page">
<nav>
<ul>
<li><a href="getting-started.html">Getting Started</a></li>
</ul>
<section>
<h2>language</h2>
<ul>
<li><a href="syntax.html">Syntax</a></li>
<li><a href="expressions.html">Expressions</a></li>
<li><a href="variables.html">Variables</a></li>
<li><a href="control-flow.html">Control Flow</a></li>
<li><a href="error-handling.html">Error Handling</a></li>
</ul>
</section>
<section>
<h2>types</h2>
<ul>
<li><a href="values.html">Values</a></li>
<li><a href="classes.html">Classes</a></li>
<li><a href="fibers.html">Fibers</a></li>
<li><a href="functions.html">Functions</a></li>
<li><a href="lists.html">Lists</a></li>
<li><a href="maps.html">Maps</a></li>
</ul>
</section>
<section>
<h2>reference</h2>
<ul>
<li><a href="core-library.html">Core Library</a></li>
<li><a href="embedding-api.html">Embedding API</a></li>
<li><a href="performance.html">Performance</a></li>
<li><a href="contributing.html">Contributing</a></li>
<li><a href="qa.html">Q &amp; A</a></li>
</ul>
</section>
</nav>
<main>
<h1>Performance</h1>
<p>Even though most benchmarks aren't worth the pixels they're printed on, people
seem to like them, so here are a few:</p>
<h3>Method Call</h3>
<table class="chart">
<tr>
<th>wren</th><td><div class="chart-bar wren" style="width: 99%;">5100 </div></td>
</tr>
<tr>
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 87%;">4441 </div></td>
</tr>
<tr>
<th>ruby</th><td><div class="chart-bar" style="width: 56%;">2868 </div></td>
</tr>
<tr>
<th>lua</th><td><div class="chart-bar" style="width: 34%;">1742 </div></td>
</tr>
<tr>
<th>python3</th><td><div class="chart-bar" style="width: 17%;">884 </div></td>
</tr>
<tr>
<th>python</th><td><div class="chart-bar" style="width: 15%;">779 </div></td>
</tr>
</table>
<h3>DeltaBlue</h3>
<table class="chart">
<tr>
<th>wren</th><td><div class="chart-bar wren" style="width: 99%;">7006 </div></td>
</tr>
<tr>
<th>python3</th><td><div class="chart-bar" style="width: 33%;">2333 </div></td>
</tr>
<tr>
<th>python</th><td><div class="chart-bar" style="width: 30%;">2141 </div></td>
</tr>
</table>
<h3>Binary Trees</h3>
<table class="chart">
<tr>
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 99%;">6165 </div></td>
</tr>
<tr>
<th>wren</th><td><div class="chart-bar wren" style="width: 54%;">3338 </div></td>
</tr>
<tr>
<th>ruby</th><td><div class="chart-bar" style="width: 43%;">2685 </div></td>
</tr>
<tr>
<th>python3</th><td><div class="chart-bar" style="width: 31%;">1952 </div></td>
</tr>
<tr>
<th>lua</th><td><div class="chart-bar" style="width: 22%;">1409 </div></td>
</tr>
<tr>
<th>python</th><td><div class="chart-bar" style="width: 21%;">1340 </div></td>
</tr>
</table>
<h3>Recursive Fibonacci</h3>
<table class="chart">
<tr>
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 99%;">7061 </div></td>
</tr>
<tr>
<th>ruby</th><td><div class="chart-bar" style="width: 43%;">3100 </div></td>
</tr>
<tr>
<th>lua</th><td><div class="chart-bar" style="width: 40%;">2860 </div></td>
</tr>
<tr>
<th>wren</th><td><div class="chart-bar wren" style="width: 34%;">2410 </div></td>
</tr>
<tr>
<th>python</th><td><div class="chart-bar" style="width: 17%;">1253 </div></td>
</tr>
<tr>
<th>python3</th><td><div class="chart-bar" style="width: 17%;">1252 </div></td>
</tr>
</table>
<p><strong>Longer bars are better.</strong> The score is the inverse of the running time, so if
one language's score is twice another's, that language is twice as fast. Each
benchmark is run ten times and the best time is kept. The timing covers only
the benchmarked code itself, not interpreter startup.</p>
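<p>As a rough sketch of that scoring rule: take the best of the measured times and
invert it. The function names and the <code>SCORE_SCALE</code> constant below are
assumptions made for illustration; the text above only says the score is the
inverse of the running time, and any positive scale gives the same ranking.</p>

```c
#include <stddef.h>

/* Hypothetical scale factor: it only moves the scores into a readable range.
   Any positive constant preserves the ratios between languages. */
#define SCORE_SCALE 1000.0

/* Returns the smallest (best) of `count` measured run times, in seconds.
   Assumes count is at least 1. */
double best_time(const double *times, size_t count) {
  double best = times[0];
  for (size_t i = 1; i < count; i++) {
    if (times[i] < best) best = times[i];
  }
  return best;
}

/* Converts a running time into a score where a bigger number means faster. */
double score_from_time(double seconds) {
  return SCORE_SCALE / seconds;
}
```

<p>With this definition, halving the running time doubles the score, which is why
the bars can be compared as ratios.</p>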
<p>These were run on my MacBook Pro 2.3 GHz Intel Core i7 with 16 GB of 1,600 MHz
DDR3 RAM. Tested against Lua 5.2.3, LuaJIT 2.0.2, Python 2.7.5, Python 3.3.4,
and Ruby 2.0.0p247. LuaJIT is run with the JIT <em>disabled</em> (i.e. in bytecode
interpreter mode) since I want to support platforms where JIT-compilation is
disallowed. LuaJIT with the JIT enabled is <em>much</em> faster than all of the other
languages benchmarked, including Wren, because Mike Pall is a robot from the
future.</p>
<p>The benchmark harness and programs are
<a href="https://github.com/munificent/wren/tree/master/benchmark">here</a>.</p>
<h2>Why is Wren fast? <a href="#why-is-wren-fast" name="why-is-wren-fast" class="header-anchor">#</a></h2>
<p>Languages come in four rough performance buckets, from slowest to fastest:</p>
<ol>
<li>
<p>Tree-walk interpreters: Ruby 1.8.7 and earlier, Io, that
interpreter you wrote for a class in college.</p>
</li>
<li>
<p>Bytecode interpreters: CPython,
Ruby 1.9 and later, Lua, early JavaScript VMs.</p>
</li>
<li>
<p>JIT compiled dynamically typed languages: Modern JavaScript VMs,
LuaJIT, PyPy, some Lisp/Scheme implementations.</p>
</li>
<li>
<p>Statically typed languages: C, C++, Java, C#, Haskell, etc.</p>
</li>
</ol>
<p>Most languages in the first bucket aren't suitable for production use. (Servers
are one exception, because you can throw more hardware at a slow language
there.) Languages in the second bucket are fast enough for many use cases, even
on client hardware, as the success of the listed languages shows. Languages in
the third bucket are quite fast, but their implementations are breathtakingly
complex, often rivaling that of compilers for statically typed languages.</p>
<p>Wren is in the second bucket. If you want a simple implementation that's fast
enough for real use, this is the sweet spot. In addition, Wren has a few tricks
up its sleeve:</p>
<h3>A compact value representation <a href="#a-compact-value-representation" name="a-compact-value-representation" class="header-anchor">#</a></h3>
<p>A core piece of a dynamic language implementation is the data structure used
for variables. It needs to be able to store (or reference) a value of any type,
while also being as compact as possible. Wren uses a technique called <em><a href="http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations">NaN
tagging</a></em> for this.</p>
<p>All values are stored internally in Wren as small, eight-byte double-precision
floats. Since that is also Wren's number type, arithmetic needs no conversion
before the "raw" number can be accessed: a value holding a number <em>is</em> a
valid double. This keeps arithmetic fast.</p>
<p>To store values of other types, it turns out there are a ton of unused bits in
a NaN double. You can stuff a pointer to a heap-allocated object in there, with
room left over for special values like <code>true</code>, <code>false</code>, and <code>null</code>. This means
numbers, bools, and null are unboxed. It also means an entire value is only
eight bytes, the native word size on 64-bit machines. Smaller = faster when you
take into account CPU caching and the cost of passing values around.</p>
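<p>A minimal sketch of the idea, not Wren's actual source: the constant names, tag
numbers, and helper functions below are made up for illustration, and the sketch
assumes heap pointers fit in the low 50 bits, which holds on common 64-bit
platforms. Any bit pattern that is not a quiet NaN is a plain double; quiet NaNs
with the sign bit set carry a pointer; a few other quiet NaN patterns stand for
the singleton values.</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;  /* every value is exactly eight bytes */

/* All exponent bits plus the top two mantissa bits set: a quiet NaN pattern
   that ordinary arithmetic is not expected to produce. */
#define QNAN     ((uint64_t)0x7ffc000000000000)
#define SIGN_BIT ((uint64_t)0x8000000000000000)

/* Hypothetical tags for the singleton values, stored in the low bits. */
#define NULL_VAL  ((Value)(QNAN | 1))
#define FALSE_VAL ((Value)(QNAN | 2))
#define TRUE_VAL  ((Value)(QNAN | 3))

/* A number is stored as its own bit pattern, so no boxing is needed. */
static inline Value num_to_value(double n) {
  Value v;
  memcpy(&v, &n, sizeof v);
  return v;
}

static inline double value_to_num(Value v) {
  double n;
  memcpy(&n, &v, sizeof n);
  return n;
}

/* Anything that does not have all the quiet NaN bits set is a number. */
static inline bool is_num(Value v) { return (v & QNAN) != QNAN; }

/* Object pointers are quiet NaNs with the sign bit set. */
static inline bool is_obj(Value v) {
  return (v & (QNAN | SIGN_BIT)) == (QNAN | SIGN_BIT);
}

static inline Value obj_to_value(void *ptr) {
  return SIGN_BIT | QNAN | (uint64_t)(uintptr_t)ptr;
}

static inline void *value_to_obj(Value v) {
  return (void *)(uintptr_t)(v & ~(SIGN_BIT | QNAN));
}
```

<p>Checking <code>is_num</code> is a single mask and compare, and reading the number back
is just a reinterpretation of the same eight bytes.</p>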
<h3>Fixed object layout <a href="#fixed-object-layout" name="fixed-object-layout" class="header-anchor">#</a></h3>
<p>Most dynamic languages treat objects as loose bags of named properties. You can
freely add and remove properties from an object after you've created it.
Languages like Lua and JavaScript don't even have a well-defined concept of a
"type" of object.</p>
<p>Wren is strictly class-based. Every object is an instance of a class. Classes
in turn have a well-defined declarative syntax, and cannot be imperatively
modified. In addition, fields in Wren are private to the class: they can
only be accessed from methods defined directly on that class.</p>
<p>Put all of that together and it means you can determine at <em>compile</em> time
exactly how many fields an object has and what they are. In other languages,
when you create an object, you allocate some initial memory for it, but that
may have to be reallocated multiple times as fields are added and the object
grows. Wren just does a single allocation up front for exactly the right number
of fields.</p>
<p>Likewise, when you access a field in other languages, the interpreter has to
look it up by name in a hash table in the object, and then maybe walk its
inheritance chain if it can't find it. It must do this every time since fields
may be added freely. In Wren, field access is just accessing a slot in the
instance by an offset known at compile time: it's just a little pointer
arithmetic.</p>
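<p>Here is a minimal sketch of what a fixed layout buys you. The struct and
function names are hypothetical, and fields are plain doubles to keep the
example short. The instance is allocated once with room for exactly the number
of fields its class declares, and a field access is an index into that array.</p>

```c
#include <stdlib.h>

typedef struct {
  int num_fields;  /* known once the class definition has been compiled */
} Class;

typedef struct {
  Class *cls;
  double fields[];  /* flexible array member, sized once at allocation */
} Instance;

/* One allocation up front for exactly the right number of fields. */
Instance *instance_new(Class *cls) {
  size_t size = sizeof(Instance) + sizeof(double) * (size_t)cls->num_fields;
  Instance *obj = malloc(size);
  if (obj == NULL) return NULL;
  obj->cls = cls;
  for (int i = 0; i < cls->num_fields; i++) obj->fields[i] = 0.0;
  return obj;
}

/* `slot` is a constant the compiler worked out from the class definition,
   so no name lookup or hashing happens at run time. */
static inline double load_field(const Instance *obj, int slot) {
  return obj->fields[slot];
}

static inline void store_field(Instance *obj, int slot, double value) {
  obj->fields[slot] = value;
}
```

<p>A class with fields <code>_x</code> and <code>_y</code> would compile accesses to them into
slots 0 and 1, for example <code>store_field(point, 0, 3.0)</code>.</p>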
<h3>Copy-down inheritance <a href="#copy-down-inheritance" name="copy-down-inheritance" class="header-anchor">#</a></h3>
<p>When you call a method on an object, the method must be located. It could be
defined directly on the object's class, or it may be inherited from some
superclass. This means that in the worst case, you may have to walk the
inheritance chain to find it.</p>
<p>Advanced implementations do very smart things to optimize this, but it's made
more difficult by the mutable nature of the underlying language: if you can add
new methods to existing classes freely or change the inheritance hierarchy, the
lookup for a given method may actually change over time. You have to check for
that, which costs CPU cycles.</p>
<p>Wren's inheritance hierarchy is static and fixed at class definition time. This
means that we can copy down all inherited methods into the subclass when it's
created since we know those will never change. Method dispatch then just
requires locating the method in the class of the receiver.</p>
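<p>The following sketch shows the copy-down step under some simplifying
assumptions: methods are identified by small integer symbols, every class has a
fixed-size table, and the names (<code>class_init</code>, <code>class_define</code>,
<code>class_find</code>, <code>MAX_METHODS</code>) are invented for this example. The
two sample methods exist only so there is something to look up.</p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 8  /* hypothetical fixed table size */

typedef int (*Method)(int arg);

typedef struct Class {
  const struct Class *superclass;
  Method methods[MAX_METHODS];  /* indexed by method symbol, NULL if absent */
} Class;

/* Creates a class. Inherited methods are copied into the new table once,
   which is safe only because the superclass can never change afterwards. */
void class_init(Class *cls, const Class *superclass) {
  cls->superclass = superclass;
  if (superclass != NULL) {
    memcpy(cls->methods, superclass->methods, sizeof cls->methods);
  } else {
    for (int i = 0; i < MAX_METHODS; i++) cls->methods[i] = NULL;
  }
}

/* Defining a method on a subclass overwrites the copied entry (an override). */
void class_define(Class *cls, int symbol, Method method) {
  cls->methods[symbol] = method;
}

/* Dispatch looks only at the receiver's class; it never walks the chain. */
Method class_find(const Class *cls, int symbol) {
  return cls->methods[symbol];
}

/* Sample methods for the usage example. */
static int twice(int x) { return x * 2; }
static int negate(int x) { return -x; }
```

<p>The trade-off is a little extra memory per class in exchange for a lookup that
costs the same no matter how deep the hierarchy is.</p>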
<h3>Computed gotos <a href="#computed-gotos" name="computed-gotos" class="header-anchor">#</a></h3>
<p>On compilers that support it, Wren's core bytecode interpreter loop uses
something called <a href="http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/"><em>computed gotos</em></a>. The hot core of a bytecode
interpreter is effectively a giant <code>switch</code> on the instruction being executed.</p>
<p>Doing that using an actual <code>switch</code> confounds the CPU's <a href="http://en.wikipedia.org/wiki/Branch_predictor">branch
predictor</a>: there is basically a single branch point for the entire
interpreter. That quickly saturates the predictor and it just gets confused and
fails to predict anything, which leads to more CPU stalls and pipeline flushes.</p>
<p>Using computed gotos gives you a separate branch point at the end of each
instruction. Each gets its own branch prediction, which often succeeds since
some instruction pairs are more common than others. In my rough testing, this
makes a 5-10% performance difference.</p>
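<p>To make that concrete, here is a toy stack machine that dispatches with
computed gotos. It relies on the "labels as values" extension (<code>&&label</code>
and <code>goto *</code>) that GCC and Clang provide, so it is not standard C. The
opcode set and the <code>run</code> function are invented for this sketch and are not
Wren's instruction set; there is no bounds checking on the stack.</p>

```c
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN };

/* Runs a bytecode program and returns the value on top of the stack.
   OP_PUSH is followed by a one-byte operand. */
int run(const uint8_t *code) {
  /* One label address per opcode, in the same order as the enum. */
  static void *const dispatch[] = { &&op_push, &&op_add, &&op_mul, &&op_return };

  int stack[16];
  int *sp = stack;
  const uint8_t *ip = code;

/* Each use of DISPATCH() is its own indirect jump, so each instruction
   handler gets a separate branch point for the predictor to learn. */
#define DISPATCH() goto *dispatch[*ip++]

  DISPATCH();

op_push:
  *sp++ = *ip++;
  DISPATCH();

op_add:
  sp--;
  sp[-1] += sp[0];
  DISPATCH();

op_mul:
  sp--;
  sp[-1] *= sp[0];
  DISPATCH();

op_return:
  return sp[-1];

#undef DISPATCH
}
```

<p>A <code>switch</code>-based loop would instead funnel every instruction back through
one shared jump at the top of the loop.</p>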
<h3>A single-pass compiler <a href="#a-single-pass-compiler" name="a-single-pass-compiler" class="header-anchor">#</a></h3>
<p>Compile time is a relatively small component of a language's performance: code
only has to be compiled once but a given line of code may be run many times.
However, fast compilation helps with <em>startup</em> speed, the time it takes to
get anything up and running. For that, Wren's compiler is quite fast.</p>
<p>It's modeled after Lua's compiler. Instead of tokenizing and then parsing to
create a bunch of AST structures which are then consumed and deallocated by
later phases, it emits code directly during parsing. This means it does minimal
memory allocation during a parse and has very little overhead.</p>
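<p>A tiny illustration of emitting code during parsing: the recursive-descent
compiler below handles only single digits, <code>+</code>, and <code>*</code>, and
writes stack-machine bytecode as it goes without ever building a tree. The
grammar, opcode names, and <code>Compiler</code> struct are all assumptions for the
sake of the example, and error handling is left out.</p>

```c
#include <stddef.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN };

typedef struct {
  const char *src;   /* next unread character of the source */
  uint8_t code[64];  /* bytecode emitted so far */
  size_t count;
} Compiler;

/* Appends one byte of bytecode, silently dropping it if the buffer is full. */
static void emit(Compiler *c, uint8_t byte) {
  if (c->count < sizeof c->code) c->code[c->count++] = byte;
}

/* factor := digit. Emits the push as soon as the digit is read. */
static void factor(Compiler *c) {
  if (*c->src >= '0' && *c->src <= '9') {
    emit(c, OP_PUSH);
    emit(c, (uint8_t)(*c->src++ - '0'));
  }
}

/* term := factor ('*' factor)* */
static void term(Compiler *c) {
  factor(c);
  while (*c->src == '*') {
    c->src++;
    factor(c);
    emit(c, OP_MUL);  /* both operands are already on the stack */
  }
}

/* expression := term ('+' term)* */
static void expression(Compiler *c) {
  term(c);
  while (*c->src == '+') {
    c->src++;
    term(c);
    emit(c, OP_ADD);
  }
}

/* Compiles `source` into c->code and returns the number of bytes emitted. */
size_t compile(const char *source, Compiler *c) {
  c->src = source;
  c->count = 0;
  expression(c);
  emit(c, OP_RETURN);
  return c->count;
}
```

<p>The only memory the compiler touches is the output buffer and the C call stack,
which is the property the paragraph above describes.</p>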
<h2>Why don't other languages do this? <a href="#why-don't-other-languages-do-this" name="why-don't-other-languages-do-this" class="header-anchor">#</a></h2>
<p>Most of Wren's performance comes from language design decisions. While it's
dynamically <em>typed</em> and <em>dispatched</em>, classes are relatively statically
<em>defined</em>. That makes a lot of things much easier. Other languages have a much
more mutable object model, and cannot change that without breaking lots of
existing code.</p>
<p>Wren's closest sibling, by far, is Lua. Lua is more dynamic than Wren, which
makes its job harder. Lua also tries very hard to be compatible across a wide
range of hardware and compilers. If you have a C89 compiler for it, odds are
very good that you can run Lua on it.</p>
<p>Wren cares about compatibility, but it requires C99 and IEEE double-precision
floats. That may exclude some edge-case hardware, but makes things like NaN
tagging, computed gotos, and some other tricks possible.</p>
</main>
</div>
<footer>
<div class="page">
<div class="main-column">
<p>Wren lives <a href="https://github.com/munificent/wren">on GitHub</a> — Made with ❤ by <a href="http://journal.stuffwithstuff.com/">Bob Nystrom</a>.</p>
</div>
</div>
</footer>
</body>
</html>