<!DOCTYPE html>
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
|
|
<title>Performance – Wren</title>
|
|
<script type="application/javascript" src="prism.js" data-manual></script>
|
|
<script type="application/javascript" src="codejar.js"></script>
|
|
<script type="application/javascript" src="wren.js"></script>
|
|
<link rel="stylesheet" type="text/css" href="prism.css" />
|
|
<link rel="stylesheet" type="text/css" href="style.css" />
|
|
<link href='//fonts.googleapis.com/css?family=Source+Sans+Pro:400,700,400italic,700italic|Source+Code+Pro:400|Lato:400|Sanchez:400italic,400' rel='stylesheet' type='text/css'>
|
|
<!-- Tell mobile browsers we're optimized for them and they don't need to crop
|
|
the viewport. -->
|
|
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/>
|
|
</head>
|
|
<body id="top">
|
|
<header>
|
|
<div class="page">
|
|
<div class="main-column">
|
|
<h1><a href="./">wren</a></h1>
|
|
<h2>a classy little scripting language</h2>
|
|
</div>
|
|
</div>
|
|
</header>
|
|
<div class="page">
|
|
<nav class="big">
|
|
<a href="./"><img src="./wren.svg" class="logo"></a>
|
|
<ul>
|
|
<li><a href="getting-started.html">Getting Started</a></li>
|
|
<li><a href="contributing.html">Contributing</a></li>
|
|
<li><a href="blog">Blog</a></li>
|
|
<li><a href="try">Try it!</a></li>
|
|
</ul>
|
|
<section>
|
|
<h2>guides</h2>
|
|
<ul>
|
|
<li><a href="syntax.html">Syntax</a></li>
|
|
<li><a href="values.html">Values</a></li>
|
|
<li><a href="lists.html">Lists</a></li>
|
|
<li><a href="maps.html">Maps</a></li>
|
|
<li><a href="method-calls.html">Method Calls</a></li>
|
|
<li><a href="control-flow.html">Control Flow</a></li>
|
|
<li><a href="variables.html">Variables</a></li>
|
|
<li><a href="functions.html">Functions</a></li>
|
|
<li><a href="classes.html">Classes</a></li>
|
|
<li><a href="concurrency.html">Concurrency</a></li>
|
|
<li><a href="error-handling.html">Error Handling</a></li>
|
|
<li><a href="modularity.html">Modularity</a></li>
|
|
</ul>
|
|
</section>
|
|
<section>
|
|
<h2>API docs</h2>
|
|
<ul>
|
|
<li><a href="modules">Modules</a></li>
|
|
</ul>
|
|
</section>
|
|
<section>
|
|
<h2>reference</h2>
|
|
<ul>
|
|
<li><a href="cli">Wren CLI</a></li>
|
|
<li><a href="embedding">Embedding</a></li>
|
|
<li><a href="performance.html">Performance</a></li>
|
|
<li><a href="qa.html">Q &amp; A</a></li>
|
|
</ul>
|
|
</section>
|
|
</nav>
|
|
<nav class="small">
|
|
<table>
|
|
<tr>
|
|
<div><a href="getting-started.html">Getting Started</a></div>
|
|
<div><a href="contributing.html">Contributing</a></div>
|
|
<div><a href="blog">Blog</a></div>
|
|
<div><a href="try">Try it!</a></div>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2"><h2>guides</h2></td>
|
|
<td><h2>reference</h2></td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
<ul>
|
|
<li><a href="syntax.html">Syntax</a></li>
|
|
<li><a href="values.html">Values</a></li>
|
|
<li><a href="lists.html">Lists</a></li>
|
|
<li><a href="maps.html">Maps</a></li>
|
|
<li><a href="method-calls.html">Method Calls</a></li>
|
|
<li><a href="control-flow.html">Control Flow</a></li>
|
|
</ul>
|
|
</td>
|
|
<td>
|
|
<ul>
|
|
<li><a href="variables.html">Variables</a></li>
|
|
<li><a href="functions.html">Functions</a></li>
|
|
<li><a href="classes.html">Classes</a></li>
|
|
<li><a href="concurrency.html">Concurrency</a></li>
|
|
<li><a href="error-handling.html">Error Handling</a></li>
|
|
<li><a href="modularity.html">Modularity</a></li>
|
|
</ul>
|
|
</td>
|
|
<td>
|
|
<ul>
|
|
<li><a href="modules">API/Modules</a></li>
|
|
<li><a href="embedding">Embedding</a></li>
|
|
<li><a href="performance.html">Performance</a></li>
|
|
<li><a href="qa.html">Q &amp; A</a></li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</nav>
|
|
<main>
|
|
<h2>Performance</h2>
|
|
<p>Even though most benchmarks aren’t worth the pixels they’re printed on, people
|
|
seem to like them, so here are a few:</p>
|
|
<h3>Method Call</h3>
|
|
|
|
<table class="chart">
|
|
<tr>
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 14%;">0.12s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 18%;">0.16s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>ruby</th><td><div class="chart-bar" style="width: 23%;">0.20s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>lua</th><td><div class="chart-bar" style="width: 41%;">0.35s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python3</th><td><div class="chart-bar" style="width: 91%;">0.78s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python</th><td><div class="chart-bar" style="width: 100%;">0.85s </div></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<h3>DeltaBlue</h3>
|
|
|
|
<table class="chart">
|
|
<tr>
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 22%;">0.13s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python3</th><td><div class="chart-bar" style="width: 83%;">0.48s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python</th><td><div class="chart-bar" style="width: 100%;">0.57s </div></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<h3>Binary Trees</h3>
|
|
|
|
<table class="chart">
|
|
<tr>
|
|
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 20%;">0.11s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 41%;">0.22s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>ruby</th><td><div class="chart-bar" style="width: 46%;">0.24s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python</th><td><div class="chart-bar" style="width: 71%;">0.37s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python3</th><td><div class="chart-bar" style="width: 73%;">0.38s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>lua</th><td><div class="chart-bar" style="width: 100%;">0.52s </div></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<h3>Recursive Fibonacci</h3>
|
|
|
|
<table class="chart">
|
|
<tr>
|
|
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 17%;">0.10s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 35%;">0.20s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>ruby</th><td><div class="chart-bar" style="width: 39%;">0.22s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>lua</th><td><div class="chart-bar" style="width: 49%;">0.28s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python</th><td><div class="chart-bar" style="width: 90%;">0.51s </div></td>
|
|
</tr>
|
|
<tr>
|
|
<th>python3</th><td><div class="chart-bar" style="width: 100%;">0.57s </div></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><strong>Shorter bars are better.</strong> Each benchmark is run ten times and the best time
|
|
is kept. It only measures the time taken to execute the benchmarked code
|
|
itself, not interpreter startup.</p>
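<p>The "run it several times and keep the best" idea can be sketched in a few
lines of C. This is only an illustration, not the actual harness: the names
<code>bestTime</code> and <code>sampleWorkload</code> are made up here, and it
times with the standard <code>clock()</code> function, which measures CPU time
at a fairly coarse resolution.</p>

```c
#include <time.h>

// Hypothetical stand-in for a benchmark body.
void sampleWorkload(void) {
  volatile unsigned long sum = 0;
  for (unsigned long i = 0; i < 100000; i++) sum += i;
}

// Runs `benchmark` `trials` times and returns the smallest CPU time seen,
// in seconds. Returns a negative number if `trials` is zero or less.
double bestTime(void (*benchmark)(void), int trials) {
  double best = -1.0;
  for (int i = 0; i < trials; i++) {
    clock_t start = clock();
    benchmark();
    double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

    // Keep the fastest run; slower runs are mostly noise from the machine.
    if (best < 0.0 || elapsed < best) best = elapsed;
  }
  return best;
}
```

<p>Because the clock is started right before the call and stopped right after
it, anything that happens earlier (such as starting the interpreter) is left
out of the measurement.</p>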
|
|
<p>These were run on my MacBook Pro 2.3 GHz Intel Core i7 with 16 GB of 1,600 MHz
|
|
DDR3 RAM. Tested against Lua 5.2.3, LuaJIT 2.0.2, Python 2.7.5, Python 3.3.4,
|
|
and Ruby 2.0.0p247. LuaJIT is run with the JIT <em>disabled</em> (i.e. in bytecode
|
|
interpreter mode) since I want to support platforms where JIT-compilation is
|
|
disallowed. LuaJIT with the JIT enabled is <em>much</em> faster than all of the other
|
|
languages benchmarked, including Wren, because Mike Pall is a robot from the
|
|
future.</p>
|
|
<p>The benchmark harness and programs are
|
|
<a href="https://github.com/wren-lang/wren/tree/main/test/benchmark">here</a>.</p>
|
|
<h2>Why is Wren fast? <a href="#why-is-wren-fast" name="why-is-wren-fast" class="header-anchor">#</a></h2>
|
|
<p>Languages come in four rough performance buckets, from slowest to fastest:</p>
|
|
<ol>
|
|
<li>
|
|
<p>Tree-walk interpreters: Ruby 1.8.7 and earlier, Io, that
|
|
interpreter you wrote for a class in college.</p>
|
|
</li>
|
|
<li>
|
|
<p>Bytecode interpreters: CPython,
|
|
Ruby 1.9 and later, Lua, early JavaScript VMs.</p>
|
|
</li>
|
|
<li>
|
|
<p>JIT-compiled dynamically typed languages: modern JavaScript VMs,
|
|
LuaJIT, PyPy, some Lisp/Scheme implementations.</p>
|
|
</li>
|
|
<li>
|
|
<p>Statically typed languages: C, C++, Java, C#, Haskell, etc.</p>
|
|
</li>
|
|
</ol>
|
|
<p>Most languages in the first bucket aren’t suitable for production use. (Servers
|
|
are one exception, because you can throw more hardware at a slow language
|
|
there.) Languages in the second bucket are fast enough for many use cases, even
|
|
on client hardware, as the success of the listed languages shows. Languages in
|
|
the third bucket are quite fast, but their implementations are breathtakingly
|
|
complex, often rivaling that of compilers for statically typed languages.</p>
|
|
<p>Wren is in the second bucket. If you want a simple implementation that’s fast
|
|
enough for real use, this is the sweet spot. In addition, Wren has a few tricks
|
|
up its sleeve:</p>
|
|
<h3>A compact value representation <a href="#a-compact-value-representation" name="a-compact-value-representation" class="header-anchor">#</a></h3>
|
|
<p>A core piece of a dynamic language implementation is the data structure used
|
|
for variables. It needs to be able to store (or reference) a value of any type,
|
|
while also being as compact as possible. Wren uses a technique called <em><a href="http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations">NaN
|
|
tagging</a></em> for this.</p>
|
|
<p>All values are stored internally in Wren as small, eight-byte double-precision
|
|
floats. Since that is also Wren’s number type, in order to do arithmetic, no
|
|
conversion is needed before the “raw” number can be accessed: a value holding a
|
|
number <em>is</em> a valid double. This keeps arithmetic fast.</p>
|
|
<p>To store values of other types, it turns out there are a ton of unused bits in a
|
|
NaN double. You can stuff in a pointer to a heap-allocated object, with room left
|
|
over for special values like <code>true</code>, <code>false</code>, and <code>null</code>. This means numbers,
|
|
bools, and null are unboxed. It also means an entire value is only eight bytes,
|
|
the native word size on 64-bit machines. Smaller = faster when you take into
|
|
account CPU caching and the cost of passing values around.</p>
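<p>To make that concrete, here is a minimal sketch of NaN tagging in C. It is an
illustration under stated assumptions rather than Wren’s actual source: the
names and the exact tag values are chosen for this example, and it assumes
IEEE 754 doubles and object pointers that fit in the low 50 bits of a word
(true of typical 64-bit user-space addresses).</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Every value is a single 64-bit word. Numbers are stored as their raw
// double bits; everything else lives in the payload of a quiet NaN.
typedef uint64_t Value;

#define SIGN_BIT ((uint64_t)1 << 63)
// All eleven exponent bits set, plus the top two mantissa bits.
#define QNAN     ((uint64_t)0x7ffc000000000000)

// Singletons: a quiet NaN with a small tag in the low bits (assumed tags).
#define NULL_VAL  ((Value)(QNAN | 1))
#define FALSE_VAL ((Value)(QNAN | 2))
#define TRUE_VAL  ((Value)(QNAN | 3))

// Reinterpret the bits; memcpy avoids undefined behavior from type punning.
static inline Value numToValue(double num) {
  Value value;
  memcpy(&value, &num, sizeof value);
  return value;
}

static inline double valueToNum(Value value) {
  double num;
  memcpy(&num, &value, sizeof num);
  return num;
}

// Anything that doesn't have all of the QNAN bits set is an ordinary double.
static inline bool isNum(Value value) {
  return (value & QNAN) != QNAN;
}

// Objects are marked by setting the sign bit on top of the QNAN bits.
static inline bool isObj(Value value) {
  return (value & (QNAN | SIGN_BIT)) == (QNAN | SIGN_BIT);
}

static inline Value objToValue(void* ptr) {
  return SIGN_BIT | QNAN | (uint64_t)(uintptr_t)ptr;
}

static inline void* valueToObj(Value value) {
  return (void*)(uintptr_t)(value & ~(SIGN_BIT | QNAN));
}
```

<p>Checking whether a value is a number is one mask and one compare, and using
it as a number needs no unpacking at all.</p>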
|
|
<h3>Fixed object layout <a href="#fixed-object-layout" name="fixed-object-layout" class="header-anchor">#</a></h3>
|
|
<p>Most dynamic languages treat objects as loose bags of named properties. You can
|
|
freely add and remove properties from an object after you’ve created it.
|
|
Languages like Lua and JavaScript don’t even have a well-defined concept of a
|
|
“type” of object.</p>
|
|
<p>Wren is strictly class-based. Every object is an instance of a class. Classes
|
|
in turn have a well-defined declarative syntax, and cannot be imperatively
|
|
modified. In addition, fields in Wren are private to the class—they can
|
|
only be accessed from methods defined directly on that class.</p>
|
|
<p>Put all of that together and it means you can determine at <em>compile</em> time
|
|
exactly how many fields an object has and what they are. In other languages,
|
|
when you create an object, you allocate some initial memory for it, but that
|
|
may have to be reallocated multiple times as fields are added and the object
|
|
grows. Wren just does a single allocation up front for exactly the right number
|
|
of fields.</p>
|
|
<p>Likewise, when you access a field in other languages, the interpreter has to
|
|
look it up by name in a hash table in the object, and then maybe walk its
|
|
inheritance chain if it can’t find it. It must do this every time since fields
|
|
may be added freely. In Wren, field access is just accessing a slot in the
|
|
instance by an offset known at compile time: it’s just a little pointer arithmetic.</p>
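<p>A rough sketch of what that looks like in C follows. The struct and function
names are hypothetical and <code>Value</code> is simplified to a plain
<code>double</code>; the point is only that the field count comes from the
class, the instance is allocated once, and a field access is an array index.</p>

```c
#include <stddef.h>
#include <stdlib.h>

typedef double Value;  // simplified stand-in for the VM's value type

typedef struct {
  int numFields;  // fixed once the class has been compiled
} Class;

typedef struct {
  Class* cls;
  Value fields[];  // C99 flexible array member: fields are stored inline
} Instance;

// A single allocation sized for exactly the fields the class declares.
Instance* newInstance(Class* cls) {
  size_t size = sizeof(Instance) + (size_t)cls->numFields * sizeof(Value);
  Instance* instance = malloc(size);
  if (instance == NULL) return NULL;

  instance->cls = cls;
  for (int i = 0; i < cls->numFields; i++) instance->fields[i] = 0;
  return instance;
}

// The compiler has already turned the field name into a slot index, so
// there is no name lookup at runtime.
static inline Value loadField(const Instance* instance, int slot) {
  return instance->fields[slot];
}

static inline void storeField(Instance* instance, int slot, Value value) {
  instance->fields[slot] = value;
}
```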
|
|
<h3>Copy-down inheritance <a href="#copy-down-inheritance" name="copy-down-inheritance" class="header-anchor">#</a></h3>
|
|
<p>When you call a method on an object, the method must be located. It could be
|
|
defined directly on the object’s class, or the class may inherit it from some
|
|
superclass. This means that in the worst case, you may have to walk the
|
|
inheritance chain to find it.</p>
|
|
<p>Advanced implementations do very smart things to optimize this, but it’s made
|
|
more difficult by the mutable nature of the underlying language: if you can add
|
|
new methods to existing classes freely or change the inheritance hierarchy, the
|
|
lookup for a given method may actually change over time. You have to check for
|
|
that, which costs CPU cycles.</p>
|
|
<p>Wren’s inheritance hierarchy is static and fixed at class definition time. This
|
|
means that we can copy down all inherited methods into the subclass when it’s
|
|
created since we know those will never change. Method dispatch then just
|
|
requires locating the method in the class of the receiver.</p>
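<p>Here is a small sketch of the idea in C. It assumes each class holds a flat
table of method pointers indexed by a method symbol; the fixed table size, the
<code>Method</code> type, and the two sample methods are all made up for this
example.</p>

```c
#include <stddef.h>

#define MAX_METHODS 8  // hypothetical fixed table size for this sketch

typedef int (*Method)(int arg);

typedef struct Class {
  struct Class* superclass;
  Method methods[MAX_METHODS];  // indexed by method symbol; NULL if absent
} Class;

// Sample methods used to fill the tables.
int twice(int x) { return 2 * x; }
int negate(int x) { return -x; }

// Called once, when the subclass is created: copy every inherited method
// into the subclass's own table. Methods the subclass defines afterwards
// simply overwrite their slots.
void bindSuperclass(Class* subclass, Class* superclass) {
  subclass->superclass = superclass;
  for (int i = 0; i < MAX_METHODS; i++) {
    subclass->methods[i] = superclass->methods[i];
  }
}

// Dispatch only ever looks at the receiver's class. It never walks up the
// superclass chain.
Method findMethod(const Class* cls, int symbol) {
  return cls->methods[symbol];
}
```

<p>The trade-off is a little extra memory per class in exchange for a lookup
that costs the same no matter how deep the hierarchy is.</p>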
|
|
<h3>Method signatures <a href="#method-signatures" name="method-signatures" class="header-anchor">#</a></h3>
|
|
<p>Wren supports overloading by arity using its concept of <a href="method-calls.html#signature">signatures</a>. This makes
|
|
the language more expressive, but also faster. When a method is called, we look
|
|
it up on the receiver’s class. If we succeed in finding it, we also know it has
|
|
the right number of parameters.</p>
|
|
<p>This lets Wren avoid the extra checking most languages need to do at runtime to
|
|
handle too few or too many arguments being passed to a method. In Wren, it’s not
|
|
<em>syntactically</em> possible to call a method with the wrong number of arguments.</p>
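<p>One way to picture this: the number of arguments at the call site is baked
into the name being looked up. The sketch below builds a signature string such
as <code>add(_,_)</code> from a method name and an arity and searches a table
for it. The table layout and function names are hypothetical, and the linear
string search is only there to keep the example short.</p>

```c
#include <stddef.h>
#include <string.h>

// Writes "name(_,_)" into `out`, with one underscore per argument.
// Returns the length written, or -1 if `out` is too small.
int signatureToString(const char* name, int arity, char* out, size_t size) {
  size_t nameLength = strlen(name);
  size_t params = arity > 0 ? (size_t)(2 * arity - 1) : 0;
  if (nameLength + params + 3 > size) return -1;  // "(" + ")" + terminator

  memcpy(out, name, nameLength);
  size_t pos = nameLength;
  out[pos++] = '(';
  for (int i = 0; i < arity; i++) {
    if (i > 0) out[pos++] = ',';
    out[pos++] = '_';
  }
  out[pos++] = ')';
  out[pos] = '\0';
  return (int)pos;
}

typedef struct {
  const char* signature;  // e.g. "add(_,_)"
  int id;                 // stand-in for the method body
} MethodEntry;

// Finds the method whose full signature matches. A hit guarantees that the
// arity is right, so no separate argument-count check is needed.
const MethodEntry* findMethod(const MethodEntry* table, int count,
                              const char* name, int arity) {
  char signature[64];
  if (signatureToString(name, arity, signature, sizeof signature) < 0) {
    return NULL;
  }
  for (int i = 0; i < count; i++) {
    if (strcmp(table[i].signature, signature) == 0) return &table[i];
  }
  return NULL;
}
```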
|
|
<h3>Computed gotos <a href="#computed-gotos" name="computed-gotos" class="header-anchor">#</a></h3>
|
|
<p>On compilers that support it, Wren’s core bytecode interpreter loop uses
|
|
something called <a href="http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/"><em>computed gotos</em></a>. The hot core of a bytecode
|
|
interpreter is effectively a giant <code>switch</code> on the instruction being executed.</p>
|
|
<p>Doing that using an actual <code>switch</code> confounds the CPU’s <a href="http://en.wikipedia.org/wiki/Branch_predictor">branch
|
|
predictor</a>—there is basically a single branch point for the entire
|
|
interpreter. That quickly saturates the predictor and it just gets confused and
|
|
fails to predict anything, which leads to more CPU stalls and pipeline flushes.</p>
|
|
<p>Using computed gotos gives you a separate branch point at the end of each
|
|
instruction. Each gets its own branch prediction, which often succeeds since
|
|
some instruction pairs are more common than others. In my rough testing, this
|
|
makes a 5-10% performance difference.</p>
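<p>For the curious, here is a toy interpreter loop written with computed gotos.
It is a sketch, not Wren’s interpreter: the four opcodes are invented for the
example, there is no stack overflow checking, and it relies on the "labels as
values" extension (<code>&amp;&amp;label</code> and <code>goto *</code>) that
GCC and Clang provide.</p>

```c
#include <stdint.h>

typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN } Opcode;

// Runs a tiny stack-based bytecode program and returns the value left on
// top of the stack by OP_RETURN.
int run(const uint8_t* code) {
  // One label address per opcode, in the same order as the enum.
  static void* dispatch[] = { &&op_push, &&op_add, &&op_mul, &&op_return };

  int stack[16];
  int* sp = stack;
  const uint8_t* ip = code;

  // Every instruction ends with its own indirect jump instead of looping
  // back to one shared switch.
  #define DISPATCH() goto *dispatch[*ip++]

  DISPATCH();

op_push:
  *sp++ = *ip++;  // the operand is the byte after the opcode
  DISPATCH();

op_add:
  sp--;
  sp[-1] += sp[0];
  DISPATCH();

op_mul:
  sp--;
  sp[-1] *= sp[0];
  DISPATCH();

op_return:
  return sp[-1];

  #undef DISPATCH
}
```

<p>A portable build would fall back to a plain <code>switch</code> inside a
loop when the extension isn’t available.</p>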
|
|
<h3>A single-pass compiler <a href="#a-single-pass-compiler" name="a-single-pass-compiler" class="header-anchor">#</a></h3>
|
|
<p>Compile time is a relatively small component of a language’s performance: code
|
|
only has to be compiled once but a given line of code may be run many times.
|
|
However, fast compilation helps with <em>startup</em> speed—the time it takes to
|
|
get anything up and running. For that, Wren’s compiler is quite fast.</p>
|
|
<p>It’s modeled after Lua’s compiler. Instead of tokenizing and then parsing to
|
|
create a bunch of AST structures which are then consumed and deallocated by
|
|
later phases, it emits code directly during parsing. This means it does minimal
|
|
memory allocation during a parse and has very little overhead.</p>
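<p>A tiny example shows the shape of it. The sketch below compiles expressions
made of single digits, <code>+</code>, and <code>*</code> straight to bytecode
while parsing, without building a tree. The grammar, opcodes, and names are
invented for illustration and it does no error reporting.</p>

```c
#include <ctype.h>
#include <stdint.h>

typedef enum { OP_CONST, OP_ADD, OP_MUL } Opcode;

typedef struct {
  const char* src;   // next unread character of the source
  uint8_t code[64];  // bytecode emitted so far
  int count;
} Compiler;

static void emit(Compiler* c, uint8_t byte) {
  if (c->count < (int)sizeof c->code) c->code[c->count++] = byte;
}

// number := digit   (single digits only, to keep the sketch short)
static void number(Compiler* c) {
  if (isdigit((unsigned char)*c->src)) {
    emit(c, OP_CONST);
    emit(c, (uint8_t)(*c->src++ - '0'));
  }
}

// term := number ('*' number)*
static void term(Compiler* c) {
  number(c);
  while (*c->src == '*') {
    c->src++;
    number(c);
    emit(c, OP_MUL);  // emitted as soon as both operands are compiled
  }
}

// expression := term ('+' term)*
static void expression(Compiler* c) {
  term(c);
  while (*c->src == '+') {
    c->src++;
    term(c);
    emit(c, OP_ADD);
  }
}

// Compiles `source` in a single pass and returns the number of bytes emitted.
int compile(Compiler* c, const char* source) {
  c->src = source;
  c->count = 0;
  expression(c);
  return c->count;
}
```

<p>The only state is the position in the source and the output buffer, so there
is nothing to allocate or free along the way.</p>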
|
|
<h2>Why don’t other languages do this? <a href="#why-don't-other-languages-do-this" name="why-don't-other-languages-do-this" class="header-anchor">#</a></h2>
|
|
<p>Most of Wren’s performance comes from language design decisions. While it’s
|
|
dynamically <em>typed</em> and <em>dispatched</em>, classes are relatively statically
|
|
<em>defined</em>. That makes a lot of things much easier. Other languages have a much
|
|
more mutable object model, and cannot change that without breaking lots of
|
|
existing code.</p>
|
|
<p>Wren’s closest sibling, by far, is Lua. Lua is more dynamic than Wren, which
|
|
makes its job harder. Lua also tries very hard to be compatible across a wide
|
|
range of hardware and compilers. If you have a C89 compiler for a platform, odds are
|
|
very good that you can run Lua on it.</p>
|
|
<p>Wren cares about compatibility, but it requires C99 or C++98 and IEEE double
|
|
precision floats. That may exclude some edge-case hardware, but makes things
|
|
like NaN tagging, computed gotos, and some other tricks possible.</p>
|
|
<script src="script.js"></script>
|
|
</main>
|
|
</div>
|
|
<footer>
|
|
<div class="page">
|
|
<div class="main-column">
|
|
<p>Wren lives
|
|
<a href="https://github.com/wren-lang/wren">on GitHub</a>
|
|
— Made with ❤ by
|
|
<a href="http://journal.stuffwithstuff.com/">Bob Nystrom</a> and
|
|
<a href="https://github.com/wren-lang/wren/blob/main/AUTHORS">friends</a>.
|
|
</p>
|
|
</div>
|
|
</div>
|
|
</footer>
|
|
</body>
|
|
</html>
|