Punctuation choices
@@ -28,7 +28,7 @@ A core piece of a dynamic language implementation is the data structure used for
[nan tagging]: http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations
-All values are stored internally in Wren as small, eight byte double-precision floats. Since that is also Wren's number type, in order to do arithmetic, no conversion is needed before the "raw" number can be accessed: a value holding a number *is* a valid double. This keeps arithmetic fast.
+All values are stored internally in Wren as small, eight-byte double-precision floats. Since that is also Wren's number type, in order to do arithmetic, no conversion is needed before the "raw" number can be accessed: a value holding a number *is* a valid double. This keeps arithmetic fast.
To store values of other types, it turns out there's a ton of unused bits in a "NaN" double. There's room in there for a pointer as well as some other stuff. For simple values like `true`, `false`, and `null`, Wren uses special bit patterns and stores them directly in the value. For other objects like strings that are heap allocated, Wren stores the pointer in there.
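A minimal C sketch of the NaN-tagging idea in the paragraph above. It is not taken from Wren's source: the `Value` typedef, the `QNAN` and `SIGN_BIT` masks, the tag numbers, and all helper names are assumptions chosen for illustration. It also assumes a 64-bit platform where object pointers fit in the low 48 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Every value is one 64-bit word (hypothetical typedef). */
typedef uint64_t Value;

/* Exponent bits all set plus the top two mantissa bits: a quiet NaN
   pattern that ordinary arithmetic results do not produce. */
#define QNAN     ((uint64_t)0x7ffc000000000000)
/* The sign bit marks "the payload is a heap pointer". */
#define SIGN_BIT ((uint64_t)0x8000000000000000)

/* Small tags stored in the low bits for the singleton values. */
#define NULL_VAL  ((Value)(QNAN | 1))
#define FALSE_VAL ((Value)(QNAN | 2))
#define TRUE_VAL  ((Value)(QNAN | 3))

/* A number is stored as its own bit pattern; memcpy only reinterprets. */
static inline Value num_to_value(double n) {
  Value v;
  memcpy(&v, &n, sizeof v);
  return v;
}

static inline double value_to_num(Value v) {
  double n;
  memcpy(&n, &v, sizeof n);
  return n;
}

/* Anything that does not carry the full QNAN mask is a plain double. */
static inline bool is_num(Value v) { return (v & QNAN) != QNAN; }

/* Objects carry both the QNAN mask and the sign bit. */
static inline bool is_obj(Value v) {
  return (v & (QNAN | SIGN_BIT)) == (QNAN | SIGN_BIT);
}

/* Pack a pointer into the payload bits (assumes it fits in 48 bits). */
static inline Value obj_to_value(void *ptr) {
  return SIGN_BIT | QNAN | (uint64_t)(uintptr_t)ptr;
}

static inline void *value_to_obj(Value v) {
  assert(is_obj(v));
  return (void *)(uintptr_t)(v & ~(SIGN_BIT | QNAN));
}
```

With this layout, arithmetic on numbers needs only the `is_num` check followed by a bit-for-bit reinterpretation; no unboxing or allocation is involved.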
@@ -44,11 +44,11 @@ Put all of that together and it means you can determine at *compile* time exactl
Likewise, when you access a field in other languages, the interpreter has to look it up by name in a hash table in the object, and then maybe walk its inheritance chain if it can't find it. It must do this every time since fields may be added freely. In Wren, field access is just accessing a slot in the instance by an offset known at compile time: it's just adding a few pointers.
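To illustrate the slot-based field access described above, here is a rough C sketch. The `Instance` layout, the `SLOT_X`/`SLOT_Y` constants, and the helper names are hypothetical; the real VM stores tagged values rather than plain doubles. The point is only that a field read compiles down to an indexed load, with no hash lookup by name.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instance layout: a field count followed by the slots. */
typedef struct {
  int num_fields;
  double fields[]; /* flexible array member, one slot per field */
} Instance;

/* Slot numbers a compiler could assign to two fields of a class.
   They are fixed when the class is compiled. */
enum { SLOT_X = 0, SLOT_Y = 1 };

static Instance *instance_new(int num_fields) {
  Instance *inst =
      malloc(sizeof(Instance) + sizeof(double) * (size_t)num_fields);
  if (inst == NULL) abort();
  inst->num_fields = num_fields;
  for (int i = 0; i < num_fields; i++) inst->fields[i] = 0.0;
  return inst;
}

/* Field access is an array index at a compile-time-known offset. */
static inline double load_field(const Instance *inst, int slot) {
  assert(slot >= 0 && slot < inst->num_fields);
  return inst->fields[slot];
}

static inline void store_field(Instance *inst, int slot, double value) {
  assert(slot >= 0 && slot < inst->num_fields);
  inst->fields[slot] = value;
}
```

A call such as `store_field(point, SLOT_Y, 4.5)` touches exactly one memory location; the bounds assertions are there for the sketch and would not be needed when the compiler has already validated the slot.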
-### Copy down inheritance
+### Copy-down inheritance
When you call a method on an object, the method must be located. It could be defined directly on the object's class, or it may be inheriting it from some superclass. This means that in the worst case, you may have to walk the inheritance chain to find it.
-Advanced implementations do very smart things to optimize this, but it's made more difficult by the mutable nature of the underlying language: if you can add new methods to existing classes freely or change the inheritance hierarchy, the lookup for a given method may actually change over time. You have to add guards to check for that, which cost CPU cycles.
+Advanced implementations do very smart things to optimize this, but it's made more difficult by the mutable nature of the underlying language; if you can add new methods to existing classes freely or change the inheritance hierarchy, the lookup for a given method may actually change over time. You have to add guards to check for that, which cost CPU cycles.
Wren's inheritance hierarchy is static and fixed at class declaration time. This means that we can copy down all inherited methods in the subclass when it's created since we know those will never change. That means method dispatch just requires locating the method in the class of the receiver.
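The copy-down step can be sketched in C roughly as follows. This is an illustration under assumptions, not the VM's code: the fixed-size `methods` table, the `Method` signature, the integer method symbols, and the sample methods are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_METHODS 8

/* Hypothetical method signature, kept trivial for the sketch. */
typedef int (*Method)(int arg);

typedef struct Class {
  const struct Class *superclass;
  Method methods[MAX_METHODS]; /* indexed by method symbol */
} Class;

/* Create a class. Because the hierarchy can never change afterwards,
   every inherited method is copied into the new class up front. */
static void class_init(Class *cls, const Class *superclass) {
  cls->superclass = superclass;
  for (int i = 0; i < MAX_METHODS; i++) {
    cls->methods[i] = (superclass != NULL) ? superclass->methods[i] : NULL;
  }
}

/* Defining a method on a class overwrites the copied-down entry. */
static void class_bind(Class *cls, int symbol, Method method) {
  assert(symbol >= 0 && symbol < MAX_METHODS);
  cls->methods[symbol] = method;
}

/* Dispatch looks only at the receiver's class: no chain walk. */
static Method class_find(const Class *cls, int symbol) {
  assert(symbol >= 0 && symbol < MAX_METHODS);
  return cls->methods[symbol];
}

/* Sample methods used to exercise the tables. */
static int twice(int x) { return 2 * x; }
static int negate(int x) { return -x; }
static int identity(int x) { return x; }
```

The trade-off is a little extra memory per class in exchange for a lookup that is a single indexed load, regardless of how deep the inheritance chain is.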
@@ -56,7 +56,7 @@ Wren's inheritance hierarchy is static and fixed at class declaration time. This
On compilers that support it, Wren's core bytecode interpreter loop will use something called [*computed gotos*](http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/). The hot core of a bytecode interpreter is effectively a giant `switch` on the instruction being executed.
-Doing that using an actual `switch` wreaks havoc with the CPU's branch predictor: there is basically a single branch point for the entire interpreter. That quickly saturates the predictor and it just gets confused and fails to predict anything, which leads to more CPU stalls and pipeline flushes.
+Doing that using an actual `switch` wreaks havoc with the CPU's branch predictor; there is basically a single branch point for the entire interpreter. That quickly saturates the predictor and it just gets confused and fails to predict anything, which leads to more CPU stalls and pipeline flushes.
Using computed gotos gives you a separate branch point at the end of each instruction. Each gets its own branch prediction, which will often succeed since some instruction pairs are more common than others. In my rough testing, this made a 5-10% performance difference.
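For readers who have not seen the technique, here is a toy stack interpreter in C that dispatches with computed gotos when the compiler offers the GCC "labels as values" extension, and falls back to a plain `switch` otherwise. The four opcodes, the `run` function, and the fixed 64-entry stack are invented for this sketch and are unrelated to Wren's real instruction set.

```c
#include <assert.h>

/* Toy instruction set (hypothetical). OP_PUSH takes one operand. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } Opcode;

/* Runs a program and returns the value left on top of the stack.
   No overflow or underflow checks: the sketch trusts its input. */
static int run(const int *code) {
  int stack[64];
  int sp = 0;
  const int *ip = code;

#if defined(__GNUC__)
  /* One label address per opcode, in the same order as the enum. */
  static void *const dispatch[] = {&&op_push, &&op_add, &&op_mul, &&op_halt};
  /* Each instruction ends with its own indirect jump, so each gets
     its own branch-prediction slot. */
#define DISPATCH() goto *dispatch[*ip++]

  DISPATCH();
op_push:
  stack[sp++] = *ip++;
  DISPATCH();
op_add:
  sp--;
  stack[sp - 1] += stack[sp];
  DISPATCH();
op_mul:
  sp--;
  stack[sp - 1] *= stack[sp];
  DISPATCH();
op_halt:
  return stack[sp - 1];

#undef DISPATCH
#else
  /* Portable fallback: a single shared branch point. */
  for (;;) {
    switch (*ip++) {
      case OP_PUSH: stack[sp++] = *ip++; break;
      case OP_ADD: sp--; stack[sp - 1] += stack[sp]; break;
      case OP_MUL: sp--; stack[sp - 1] *= stack[sp]; break;
      case OP_HALT: return stack[sp - 1];
    }
  }
#endif
}
```

Both branches implement the same semantics; only the shape of the dispatch differs, which is what the branch predictor sees.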