2013-12-10 21:51:58 -08:00
|
|
|
^title Performance
|
|
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Even though most benchmarks aren't worth the pixels they're printed on, people
|
|
|
|
|
seem to like them, so here's a few:
|
2014-04-20 21:04:41 -07:00
|
|
|
|
|
|
|
|
<h3>Method Call</h3>
|
|
|
|
|
<table class="chart">
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 11%;">0.15s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 17%;">0.23s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>ruby</th><td><div class="chart-bar" style="width: 26%;">0.36s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>lua</th><td><div class="chart-bar" style="width: 43%;">0.58s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python3</th><td><div class="chart-bar" style="width: 88%;">1.18s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python</th><td><div class="chart-bar" style="width: 100%;">1.33s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
</table>
|
|
|
|
|
<h3>DeltaBlue</h3>
|
|
|
|
|
<table class="chart">
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 27%;">0.13s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python3</th><td><div class="chart-bar" style="width: 94%;">0.44s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python</th><td><div class="chart-bar" style="width: 100%;">0.47s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
</table>
|
|
|
|
|
<h3>Binary Trees</h3>
|
|
|
|
|
<table class="chart">
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 20%;">0.16s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 39%;">0.29s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>ruby</th><td><div class="chart-bar" style="width: 49%;">0.37s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python3</th><td><div class="chart-bar" style="width: 68%;">0.52s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>lua</th><td><div class="chart-bar" style="width: 97%;">0.73s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python</th><td><div class="chart-bar" style="width: 100%;">0.75s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
</table>
|
|
|
|
|
<h3>Recursive Fibonacci</h3>
|
|
|
|
|
<table class="chart">
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>luajit (-joff)</th><td><div class="chart-bar" style="width: 17%;">0.15s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>wren</th><td><div class="chart-bar wren" style="width: 38%;">0.31s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>ruby</th><td><div class="chart-bar" style="width: 40%;">0.33s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>lua</th><td><div class="chart-bar" style="width: 43%;">0.35s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python3</th><td><div class="chart-bar" style="width: 96%;">0.78s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
<tr>
|
2015-01-16 17:51:10 -08:00
|
|
|
<th>python</th><td><div class="chart-bar" style="width: 100%;">0.81s </div></td>
|
2014-04-20 21:04:41 -07:00
|
|
|
</tr>
|
|
|
|
|
</table>
|
|
|
|
|
|
2015-01-16 17:51:10 -08:00
|
|
|
**Shorter bars are better.** Each benchmark is run ten times and the best time
|
|
|
|
|
is kept. It only measures the time taken to execute the benchmarked code
|
|
|
|
|
itself, not interpreter startup.
|
2014-04-20 21:04:41 -07:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
These were run on my MacBook Pro 2.3 GHz Intel Core i7 with 16 GB of 1,600 MHz
|
|
|
|
|
DDR3 RAM. Tested against Lua 5.2.3, LuaJIT 2.0.2, Python 2.7.5, Python 3.3.4,
|
|
|
|
|
ruby 2.0.0p247. LuaJIT is run with the JIT *disabled* (i.e. in bytecode
|
|
|
|
|
interpreter mode) since I want to support platforms where JIT-compilation is
|
|
|
|
|
disallowed. LuaJIT with the JIT enabled is *much* faster than all of the other
|
|
|
|
|
languages benchmarked, including Wren, because Mike Pall is a robot from the
|
|
|
|
|
future.
|
2014-04-20 21:04:41 -07:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
The benchmark harness and programs are
|
2015-03-14 12:45:56 -07:00
|
|
|
[here](https://github.com/munificent/wren/tree/master/test/benchmark).
|
2014-04-20 21:04:41 -07:00
|
|
|
|
|
|
|
|
## Why is Wren fast?
|
|
|
|
|
|
2013-12-10 21:51:58 -08:00
|
|
|
Languages come in four rough performance buckets, from slowest to fastest:
|
|
|
|
|
|
|
|
|
|
1. Tree-walk interpreters: Ruby 1.8.7 and earlier, Io, that
|
|
|
|
|
interpreter you wrote for a class in college.
|
|
|
|
|
|
|
|
|
|
2. Bytecode interpreters: CPython,
|
|
|
|
|
Ruby 1.9 and later, Lua, early JavaScript VMs.
|
|
|
|
|
|
2014-04-14 21:23:46 -07:00
|
|
|
3. JIT compiled dynamically typed languages: Modern JavaScript VMs,
|
2013-12-10 21:51:58 -08:00
|
|
|
LuaJIT, PyPy, some Lisp/Scheme implementations.
|
|
|
|
|
|
2014-04-14 21:23:46 -07:00
|
|
|
4. Statically typed languages: C, C++, Java, C#, Haskell, etc.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Most languages in the first bucket aren't suitable for production use. (Servers
|
|
|
|
|
are one exception, because you can throw more hardware at a slow language
|
|
|
|
|
there.) Languages in the second bucket are fast enough for many use cases, even
|
|
|
|
|
on client hardware, as the success of the listed languages shows. Languages in
|
|
|
|
|
the third bucket are quite fast, but their implementations are breathtakingly
|
|
|
|
|
complex, often rivaling that of compilers for statically-typed languages.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Wren is in the second bucket. If you want a simple implementation that's fast
|
|
|
|
|
enough for real use, this is the sweet spot. In addition, Wren has a few tricks
|
|
|
|
|
up its sleeve:
|
2013-12-10 21:51:58 -08:00
|
|
|
|
|
|
|
|
### A compact value representation
|
|
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
A core piece of a dynamic language implementation is the data structure used
|
|
|
|
|
for variables. It needs to be able to store (or reference) a value of any type,
|
|
|
|
|
while also being as compact as possible. Wren uses a technique called *[NaN
|
|
|
|
|
tagging][]* for this.
|
2013-12-17 09:39:26 -08:00
|
|
|
|
|
|
|
|
[nan tagging]: http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
All values are stored internally in Wren as small, eight-byte double-precision
|
|
|
|
|
floats. Since that is also Wren's number type, in order to do arithmetic, no
|
|
|
|
|
conversion is needed before the "raw" number can be accessed: a value holding a
|
|
|
|
|
number *is* a valid double. This keeps arithmetic fast.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
To store values of other types, it turns out there's a ton of unused bits in a
|
|
|
|
|
NaN double. You can stuff a pointer for heap-allocated objects, with room left
|
|
|
|
|
over for special values like `true`, `false`, and `null`. This means numbers,
|
|
|
|
|
bools, and null are unboxed. It also means an entire value is only eight bytes,
|
|
|
|
|
the native word size on 64-bit machines. Smaller = faster when you take into
|
|
|
|
|
account CPU caching and the cost of passing values around.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
|
|
|
|
### Fixed object layout
|
|
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Most dynamic languages treat objects as loose bags of named properties. You can
|
|
|
|
|
freely add and remove properties from an object after you've created it.
|
|
|
|
|
Languages like Lua and JavaScript don't even have a well-defined concept of a
|
|
|
|
|
"type" of object.
|
|
|
|
|
|
|
|
|
|
Wren is strictly class-based. Every object is an instance of a class. Classes
|
|
|
|
|
in turn have a well-defined declarative syntax, and cannot be imperatively
|
|
|
|
|
modified. In addition, fields in Wren are private to the class—they can
|
|
|
|
|
only be accessed from methods defined directly on that class.
|
|
|
|
|
|
|
|
|
|
Put all of that together and it means you can determine at *compile* time
|
|
|
|
|
exactly how many fields an object has and what they are. In other languages,
|
|
|
|
|
when you create an object, you allocate some initial memory for it, but that
|
|
|
|
|
may have to be reallocated multiple times as fields are added and the object
|
|
|
|
|
grows. Wren just does a single allocation up front for exactly the right number
|
|
|
|
|
of fields.
|
|
|
|
|
|
|
|
|
|
Likewise, when you access a field in other languages, the interpreter has to
|
|
|
|
|
look it up by name in a hash table in the object, and then maybe walk its
|
|
|
|
|
inheritance chain if it can't find it. It must do this every time since fields
|
|
|
|
|
may be added freely. In Wren, field access is just accessing a slot in the
|
|
|
|
|
instance by an offset known at compile time: it's just adding a few pointers.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2014-02-16 19:28:40 -06:00
|
|
|
### Copy-down inheritance
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
When you call a method on an object, the method must be located. It could be
|
|
|
|
|
defined directly on the object's class, or it may be inheriting it from some
|
|
|
|
|
superclass. This means that in the worst case, you may have to walk the
|
|
|
|
|
inheritance chain to find it.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Advanced implementations do very smart things to optimize this, but it's made
|
|
|
|
|
more difficult by the mutable nature of the underlying language: if you can add
|
|
|
|
|
new methods to existing classes freely or change the inheritance hierarchy, the
|
|
|
|
|
lookup for a given method may actually change over time. You have to check for
|
|
|
|
|
that which costs CPU cycles.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Wren's inheritance hierarchy is static and fixed at class definition time. This
|
|
|
|
|
means that we can copy down all inherited methods in the subclass when it's
|
|
|
|
|
created since we know those will never change. Method dispatch then just
|
|
|
|
|
requires locating the method in the class of the receiver.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
|
|
|
|
### Computed gotos
|
|
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
On compilers that support it, Wren's core bytecode interpreter loop uses
|
|
|
|
|
something called [*computed gotos*][goto]. The hot core of a bytecode
|
|
|
|
|
interpreter is effectively a giant `switch` on the instruction being executed.
|
|
|
|
|
|
|
|
|
|
[goto]: http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/
|
|
|
|
|
|
|
|
|
|
Doing that using an actual `switch` confounds the CPU's [branch
|
|
|
|
|
predictor][]—there is basically a single branch point for the entire
|
|
|
|
|
interpreter. That quickly saturates the predictor and it just gets confused and
|
|
|
|
|
fails to predict anything, which leads to more CPU stalls and pipeline flushes.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
[branch predictor]: http://en.wikipedia.org/wiki/Branch_predictor
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Using computed gotos gives you a separate branch point at the end of each
|
|
|
|
|
instruction. Each gets its own branch prediction, which often succeeds since
|
|
|
|
|
some instruction pairs are more common than others. In my rough testing, this
|
|
|
|
|
makes a 5-10% performance difference.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
|
|
|
|
### A single-pass compiler
|
|
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Compile time is a relatively small component of a language's performance: code
|
|
|
|
|
only has to be compiled once but a given line of code may be run many times.
|
|
|
|
|
However, fast compilation helps with *startup* speed—the time it takes to
|
|
|
|
|
get anything up and running. For that, Wren's compiler is quite fast.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
It's modeled after Lua's compiler. Instead of tokenizing and then parsing to
|
|
|
|
|
create a bunch of AST structures which are then consumed and deallocated by
|
|
|
|
|
later phases, it emits code directly during parsing. This means it does minimal
|
|
|
|
|
memory allocation during a parse and has very little overhead.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
|
|
|
|
## Why don't other languages do this?
|
|
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Most of Wren's performance comes from language design decisions. While it's
|
|
|
|
|
dynamically *typed* and *dispatched*, classes are relatively statically
|
|
|
|
|
*defined*. That makes a lot of things much easier. Other languages have a much
|
|
|
|
|
more mutable object model, and cannot change that without breaking lots of
|
|
|
|
|
existing code.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-01-03 23:27:02 -08:00
|
|
|
Wren's closest sibling, by far, is Lua. Lua is more dynamic than Wren which
|
|
|
|
|
makes its job harder. Lua also tries very hard to be compatible across a wide
|
|
|
|
|
range of hardware and compilers. If you have a C89 compiler for it, odds are
|
|
|
|
|
very good that you can run Lua on it.
|
2013-12-10 21:51:58 -08:00
|
|
|
|
2015-09-22 21:19:38 -07:00
|
|
|
Wren cares about compatibility, but it requires C99 or C++98 and IEEE double
|
|
|
|
|
precision floats. That may exclude some edge case hardware, but makes things
|
|
|
|
|
like NaN tagging, computed gotos, and some other tricks possible.
|
2015-01-16 17:51:10 -08:00
|
|
|
|
|
|
|
|
<script src="script.js"></script>
|