# Smarter Imports

Here's a proposal for improving how imported modules are identified and found
to hopefully help us start growing an ecosystem of reusable Wren code. Please
do [let me know][list] what you think!

[list]: https://groups.google.com/forum/#!forum/wren-lang

## Motivation

As [others][210] [have][325] [noted][346], the way imports work in Wren,
particularly how the CLI resolves them, makes it much too hard to reuse code.
This proposal aims to improve that. It doesn't intend to fix *everything* about
imports and the module system, but should leave the door open for later
improvements.

[210]: https://github.com/munificent/wren/issues/210
[325]: https://github.com/munificent/wren/issues/325
[346]: https://github.com/munificent/wren/issues/346

### Relative imports

Today, it's hard to reuse your own code unless you literally dump everything in
a single directory. Say you have:

```text
script_a.wren
useful_stuff/
  script_b.wren
  thing_1.wren
  thing_2.wren
```

`script_a.wren` and `script_b.wren` are both scripts you can run directly from
the CLI. They would both like to use `thing_1.wren`, which in turn imports
`thing_2.wren`. What does `thing_1.wren` look like? If you do:

```scala
// thing_1.wren
import "thing_2"
```

Then it works fine if you run `script_b.wren` from the `useful_stuff/`
directory. But if you try to run `script_a.wren` from the top-level directory,
then it looks for `thing_2.wren` *there* and fails to find it. If you change the
import to:

```scala
// thing_1.wren
import "useful_stuff/thing_2"
```

Then `script_a.wren` works, but now `script_b.wren` is broken. The problem is
that all imports are treated as relative to the directory containing the
*initial script* you run. That means you can't reuse modules from scripts that
live in different directories.

In this example, it feels like imports should be treated as relative to the
file that contains the import statement. Often you want to specify, "Here is
*where* this other module is, relative to where *I* am."

### Logical imports

If we make imports relative, is that enough? Should *all* imports be relative? I
don't think so. First of all, some modules are not even on the file system.
There is no relative path that will take you to "random", since it's built into
the VM itself. Likewise, "io" is baked into the CLI.

Today, when you write:

```scala
import "io"
```

You aren't saying *where* that module should be found, you're saying *what*
module you want. Assuming we get a package manager at some point, these kinds of
"logical" imports will be common. So I want these too.

If you look at other languages' package managers, you'll often find that a
single package offers a number of separate libraries you can use. So I also
want to support logical imports that contain a path too: the import would say
both *what* package to look in and *where* in that package to look.

### Only logical imports?

Given some kind of package-y import syntax, could we get rid of relative imports
and use those for everything? You'd treat your own program like it was itself
some kind of package and anything you wanted to import in it you'd import
relative to your app's root directory.

The problem is that the "root directory" for your program's "package" isn't
well-defined. We could say it's always the same directory as the script you're
running, but that's probably too limiting. You may want to run scripts that live
in subdirectories.

We could walk up the parent directories looking for some kind of "manifest" file
that declares "the root of the package is here", but that seems like a lot of
hassle if you just want to create a couple of text files and start getting some
code running. So, for your own programs, I think it's nice to still support
"pure" relative imports.

### Ambiguity?

OK, so we want both relative imports and logical imports. Can we use the same
syntax for both? We could allow, say:

```scala
import "a/b"
```

And the semantics would be:

1. Look for a module "a/b.wren" relative to the file containing the import. If
   found, use it.

2. Otherwise, look inside some "package" directory for a package named "a" and
   a module named "b.wren" inside it. If found, use that.

3. Otherwise, look for a built-in module named "a".

This is pretty much how things work now, but I don't think it's a good idea.
Relative imports will tend to be short, often single words like "utils".
Assuming we get a healthy package ecosystem at some point, the chances of one of
those colliding with a logical import name are high.

Also, when reading code, I think it's important to be able to easily tell "this
import is from my own program" without having to know the names of all of the
files and directories in the program.

## Proposal

OK, so here are my goals:

1. A way to import a module relative to the one containing the import.
2. A way to import a module from some named logical package, possibly at a
   specific path within that package.
3. Distinct syntaxes for each of these.

I tried a few different ideas, and my favorite is:

### Relative imports

Relative imports use the existing syntax:

```scala
// Relative path.
import "ast/expr"
```

This looks for the file `ast/expr.wren` relative to the directory containing the
module that has this import statement in it.

You can also walk out of directories if you need to import a module in a parent
folder:

```scala
import "../../other/stuff"
```

### Logical imports

If you want to import a module from some named logical entity, you use an
*unquoted* identifier:

```scala
import random
```

Being unquoted means the names must be valid Wren identifiers and can't be
reserved words. I think that's OK. It would confuse the hell out of people if
you had a library named "if". I think the above *looks* nice, and the fact that
it's not quoted sends a signal (to me at least) that the name is a "what" more
than a "where".

If you want to import a specific module within a logical entity, you can have a
series of slash-separated identifiers after the name:

```scala
import wrenalyzer/ast/expr
```

This imports module "ast/expr" from "wrenalyzer".

## Implementation

That's the proposed syntax and basic semantics. The way we actually implement it
is tricky because Wren is both a standalone interpreter you can run on the
command line and an embedded scripting language. We have to figure out what goes
into the VM and what lives in the CLI, and the interface between the two.

### VM

As usual, I want to keep the VM minimal and free of policy. We do need to add
support for the new unquoted syntax. The more significant change is to the API
the VM uses to talk to the host app when a module is imported. The VM doesn't
know how to actually load modules. When it executes an import statement, it
calls:

```c
char* loadModuleFn(WrenVM* vm, const char* name);
```

The VM tells the host app the import string and the host app returns the code.
In order to distinguish relative imports (quoted) from an identical unquoted
name and path, we need to pass in an extra bit to tell the host whether there
were quotes or not.

The more challenging change (and the reason I didn't support them when I first
added imports to Wren) is relative imports. There are two tricky parts:

First, the host app doesn't have enough context to resolve a relative import.
Right now, the VM only passes in the import string. It doesn't say which module
*contains* that import string, so the host has no way of knowing what that
import should be relative *to*.

That's easy to fix. We have the VM pass in the name of the module that contains
the import.

The harder problem is **canonicalization**. When you import the same module
twice, the VM ensures it is only executed once and both places use the same
module data. This is important to ensure you don't get confusing things like
duplicate static state or other weird side effects.

To do that, the VM needs to be able to tell when two imports refer to the "same"
module. Right now, it uses the import string itself. If two imports use the same
string, they are the same module.

With relative imports, that is no longer valid. Consider:

```text
script_a.wren
useful_stuff/
  thing_1.wren
  thing_2.wren
```

Now imagine those files contain:

```scala
// script_a.wren
import "useful_stuff/thing_1"
import "useful_stuff/thing_2"

// useful_stuff/thing_1.wren
import "thing_2"

// useful_stuff/thing_2.wren
// Stuff...
```

Both `script_a.wren` and `thing_1.wren` import `thing_2.wren`, but the import
*strings* are different. The VM needs to be able to figure out that those two
imports refer to the same module. I don't want path manipulation logic in the
VM, so it will delegate to the host app for that as well.

Given the import string and the name of the module containing it, the host app
produces a "fully-qualified" or "canonical" name for the imported module. It is
*that* resulting string that the VM uses to tell if two imports resolve to the
same module. (It's also the string it uses in things like stack traces.)

This means importing becomes a three-stage process:

1. First the VM asks the host to resolve an import. It gives it the (previously
   resolved) name of the module containing the import, the import string, and
   whether or not it was quoted. The host app returns a canonical string for
   that import.

2. The VM checks to see if a module with that canonical name has already been
   imported. If so, it reuses that and it's done.

3. Otherwise, it circles back and asks the host for the source of the module
   with that given canonical name. It compiles and executes that and goes from
   there.

So we add a new callback to the embedding API. Something like:

```c
char* resolveModuleFn(WrenVM* vm,
    // Canonical name of the module containing the import.
    const char* importer,

    // The import string.
    const char* path,

    // Whether the path name was quoted.
    bool isQuoted);
```

The VM invokes this for step one above. The other two steps are the existing
loading logic but now using the canonicalized string.
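
The three steps can be sketched in C roughly as follows. This is only an illustration under assumptions: `ToyVM`, `importModule`, `demoResolve`, and `demoLoad` are hypothetical names, not part of Wren's API. The module table is a fixed-size array, and the stand-in resolver just joins the importer's directory with the import path (it ignores `isQuoted` and does no normalization).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODULES 32

// Hypothetical host callbacks, loosely modeled on the proposed embedding API.
typedef char* (*ResolveFn)(const char* importer, const char* path,
                           bool isQuoted);
typedef const char* (*LoadFn)(const char* canonicalName);

// A toy stand-in for the VM's table of already-imported modules.
typedef struct {
  ResolveFn resolve;
  LoadFn load;
  char* loaded[MAX_MODULES];  // Canonical names, owned by the table.
  int count;
} ToyVM;

// Returns true if the module was loaded now, false if it was reused (or if
// resolving or loading failed).
static bool importModule(ToyVM* vm, const char* importer, const char* path,
                         bool isQuoted) {
  // 1. Ask the host for the canonical name of the import.
  char* name = vm->resolve(importer, path, isQuoted);
  if (name == NULL) return false;

  // 2. Reuse the module if that canonical name was already imported.
  for (int i = 0; i < vm->count; i++) {
    if (strcmp(vm->loaded[i], name) == 0) {
      free(name);
      return false;
    }
  }

  // 3. Otherwise ask the host for the source by canonical name. A real VM
  //    would compile and execute it here.
  const char* source = vm->load(name);
  if (source == NULL || vm->count == MAX_MODULES) {
    free(name);
    return false;
  }
  vm->loaded[vm->count++] = name;
  return true;
}

// Stand-in resolver: joins the importer's directory with the import path.
static char* demoResolve(const char* importer, const char* path,
                         bool isQuoted) {
  (void)isQuoted;
  const char* slash = strrchr(importer, '/');
  size_t dirLength = slash == NULL ? 0 : (size_t)(slash - importer) + 1;
  char* name = malloc(dirLength + strlen(path) + 1);
  if (name == NULL) return NULL;
  memcpy(name, importer, dirLength);
  strcpy(name + dirLength, path);
  return name;
}

// Stand-in loader: pretends every module has the same source.
static const char* demoLoad(const char* canonicalName) {
  (void)canonicalName;
  return "// Stuff...";
}
```

With these stand-ins, importing "useful_stuff/thing_2" from "script_a" and "thing_2" from "useful_stuff/thing_1" both resolve to "useful_stuff/thing_2", so the second `importModule()` call reports a reuse instead of loading the module again.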

### CLI

All of the policy lives over in the CLI (or in your app if you are embedding the
VM). You are free to use whatever canonicalization policy makes sense for you.
For the CLI, and for the policy described in the Motivation section, it's
something like this:

* Imports are slash-separated paths. Resolving a relative path is normal path
  joining relative to the directory containing the importing module. So if
  you're importing "a/b" from "c/d" (which is a file named "d.wren" in a
  directory "c"), then the canonical name is "c/a/b" and the file is
  "c/a/b.wren".

  ".." and "." are allowed and are normalized. So these imports all resolve
  to the same module:

  ```scala
  import "a/b/c"
  import "a/./b/./c"
  import "a/d/../b/c"
  ```

* If an import is quoted, the path is considered relative to the importing
  module's path, and is in the same package as the importing module.

  So, if the current file is "a/b/c.wren" in package "foo" then these are
  equivalent:

  ```scala
  import "d/e"
  import foo/a/b/d/e
  ```

* If an import is unquoted, the first identifier is the logical "package"
  containing the module, and the remaining components are the path within that
  package. The canonicalized string is the logical name, a colon, then the
  resolved full path to the import (without the ".wren" file extension).
  So if you import:

  ```scala
  import wrenalyzer/ast/expr
  ```

  The canonical name is "wrenalyzer:ast/expr".

* If an import is a single unquoted name, the CLI implicitly uses the name as
  the module to look for within that package. These are equivalent:

  ```scala
  import foo
  import foo/foo
  ```

  We could use some default name like "module" instead of the package name,
  similar to Python, but I think this is actually a little more usable in
  practice. If you're hacking on a bunch of packages at the same time, it's
  annoying if every tab in your text editor just says "module.wren".

* The canonicalized string for the main script or a module imported using a
  relative path from the main script is just the normalized file path,
  probably relative to the working directory.

* Since colon is used to separate the name from path, path components with
  colons are not allowed.
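
As a rough illustration of the policy in the list above, here is a minimal C sketch of a resolver. It is not the CLI's actual code: `normalizePath` and `resolveImport` are hypothetical names, buffers are fixed-size, and absolute paths are not handled. It assumes the importer's canonical name is either a plain path ("c/d") or a "package:path" string ("foo:a/b/c").

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: collapses "." and ".." components of a slash-separated
// relative path in place. A ".." with nothing left to pop is kept as-is.
static void normalizePath(char* path) {
  char copy[256];
  const char* parts[64];
  int count = 0;

  snprintf(copy, sizeof copy, "%s", path);
  for (char* part = strtok(copy, "/"); part != NULL;
       part = strtok(NULL, "/")) {
    if (strcmp(part, ".") == 0) continue;
    if (strcmp(part, "..") == 0 && count > 0 &&
        strcmp(parts[count - 1], "..") != 0) {
      count--;  // ".." cancels the previous component.
    } else if (count < 64) {
      parts[count++] = part;
    }
  }

  // The result is never longer than the input, so it fits in `path`.
  path[0] = '\0';
  for (int i = 0; i < count; i++) {
    if (i > 0) strcat(path, "/");
    strcat(path, parts[i]);
  }
}

// Writes the canonical name for an import into `out`.
static void resolveImport(const char* importer, const char* path,
                          bool isQuoted, char* out, size_t cap) {
  char package[64] = "";
  char rest[256];

  if (isQuoted) {
    // Relative: stay in the importer's package and directory.
    const char* colon = strchr(importer, ':');
    const char* file = importer;
    if (colon != NULL) {
      snprintf(package, sizeof package, "%.*s", (int)(colon - importer),
               importer);
      file = colon + 1;
    }
    const char* slash = strrchr(file, '/');
    int dirLength = slash == NULL ? 0 : (int)(slash - file) + 1;
    snprintf(rest, sizeof rest, "%.*s%s", dirLength, file, path);
  } else {
    // Logical: the first component is the package, the rest is the path.
    const char* slash = strchr(path, '/');
    if (slash == NULL) {
      // "import foo" is shorthand for "import foo/foo".
      snprintf(package, sizeof package, "%s", path);
      snprintf(rest, sizeof rest, "%s", path);
    } else {
      snprintf(package, sizeof package, "%.*s", (int)(slash - path), path);
      snprintf(rest, sizeof rest, "%s", slash + 1);
    }
  }

  normalizePath(rest);
  if (package[0] != '\0') {
    snprintf(out, cap, "%s:%s", package, rest);
  } else {
    snprintf(out, cap, "%s", rest);
  }
}
```

Under these assumptions, a quoted "a/b" imported from "c/d" resolves to "c/a/b", a quoted "d/e" imported from "foo:a/b/c" resolves to "foo:a/b/d/e", and an unquoted "foo" resolves to "foo:foo".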

### Finding logical imports

The last remaining piece is how the CLI physically locates logical imports. If
you write:

```scala
import foo
```

Where does it look for "foo"? Of course, if "foo" is built into the VM like
"random", then that's easy. Likewise, if it's built into the CLI like "io",
that's easy too.

Otherwise, it will try to find it on the file system. We don't have a package
manager yet, so we need some kind of simple policy so you can "hand-author" the
layout a package manager would produce. Borrowing from Node, the basic idea is
pretty simple.

To find a logical import, the CLI starts in the directory that contains the main
script (not the directory containing the module doing the import), and looks for
a directory named "wren_modules". If not found there, it starts walking up
parent directories until it finds one. If it does, it looks for the logical
import inside there. So, if you import "foo", it will try to find
"wren_modules/foo/foo.wren".

Once it finds a "wren_modules" directory, it uses that one directory for all
logical imports. You can't scatter stuff across multiple "wren_modules" folders
at different levels of the hierarchy. If it can't find a "wren_modules"
directory, or it can't find the requested module inside the directory, the
import fails.
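
The lookup described above might be sketched in C like this. Everything here is an assumption made for illustration: `findModulesDir`, `logicalModulePath`, `DirExistsFn`, and `demoDirExists` are hypothetical names, the main script's directory is assumed to be an absolute slash-separated path with no trailing slash, and the directory check is a callback so the walk can be tried without touching the disk (a real CLI would query the file system). Mapping "package:path" to "wren_modules/package/path.wren" generalizes the "foo" example and is likewise an assumption.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical predicate that reports whether a directory exists.
typedef bool (*DirExistsFn)(const char* path);

// Walks up from `scriptDir` until a "wren_modules" directory is found.
// On success, writes its path into `out` and returns true.
static bool findModulesDir(const char* scriptDir, DirExistsFn dirExists,
                           char* out, size_t cap) {
  char dir[512];
  snprintf(dir, sizeof dir, "%s", scriptDir);

  for (;;) {
    snprintf(out, cap, "%s/wren_modules", dir);
    if (dirExists(out)) return true;

    // Drop the last path component to move to the parent directory.
    char* slash = strrchr(dir, '/');
    if (slash == NULL) break;
    *slash = '\0';
  }

  out[0] = '\0';
  return false;
}

// Maps a canonical name like "foo:foo" to a file inside `modulesDir`.
static bool logicalModulePath(const char* modulesDir, const char* canonical,
                              char* out, size_t cap) {
  const char* colon = strchr(canonical, ':');
  if (colon == NULL) return false;
  snprintf(out, cap, "%s/%.*s/%s.wren", modulesDir,
           (int)(colon - canonical), canonical, colon + 1);
  return true;
}

// Stand-in for the file system: only one "wren_modules" directory exists.
static bool demoDirExists(const char* path) {
  return strcmp(path, "/home/me/wren_modules") == 0;
}
```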

This means that to reuse someone else's Wren "package" (or your own for that
matter), you can just stick a "wren_modules" directory next to the main script
for your app or in some parent directory. Inside that "wren_modules" directory,
copy in the package you want to reuse. If that package in turn uses other
packages, copy those into the *same* "wren_modules" directory. In other words,
the transitive dependencies get flattened. This is important to handle shared
dependencies between packages without duplication.

You only need to worry about all of this if you actually have logical imports.
If you just have a couple of files that import each other, you can use straight
relative imports and everything just works.

## Migration

OK, that's the plan. How do we get there? I've started hacking on the
implementation a little and, so far, it seems straightforward. Honestly, it will
probably take less time than I spent writing this up.

The tricky part is that this is a breaking change. All of your existing quoted
import strings will mean something different. We definitely *can* and will make
breaking changes in Wren, so that's OK, but I'd like to minimize the pain. Right
now, Wren is at version 0.1.0. I'll probably consider the commit right before I
start landing this to be the "official" 0.1.0 release and then the import
changes will land in "0.2.0". I'll work in a branch off master until everything
looks solid and then merge it in.

If you have existing Wren code that you run on the CLI and that contains
imports, you'll probably need to tweak them.

If you are hosting Wren in your own app, the imports are fine since your app
has control over how they resolve. But you will have to fix your app a little
since the import embedding API is going to change to deal with canonicalization.
I think I can make it so that if you don't provide a canonicalization callback,
then the original import string is treated as the canonical string and you
fall back to the current behavior.
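
That fallback could look something like the following C sketch. The names `CanonicalizeFn`, `canonicalNameFor`, and `hostResolve` are hypothetical and only illustrate the idea; the real embedding API may be shaped differently.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical shape of the optional canonicalization callback.
typedef const char* (*CanonicalizeFn)(const char* importer, const char* path,
                                      bool isQuoted);

// Returns the canonical name to use for an import. With no callback, the
// import string itself is treated as the canonical name, which matches the
// current behavior.
static const char* canonicalNameFor(CanonicalizeFn resolve,
                                    const char* importer, const char* path,
                                    bool isQuoted) {
  if (resolve == NULL) return path;
  return resolve(importer, path, isQuoted);
}

// Stand-in host callback, used only to show the non-fallback branch.
static const char* hostResolve(const char* importer, const char* path,
                               bool isQuoted) {
  (void)importer;
  (void)path;
  (void)isQuoted;
  return "resolved/by/host";
}
```

An embedder that passes `NULL` keeps today's semantics, while one that supplies a callback takes full control of canonicalization.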

## Alternatives

Having both quoted and unquoted import strings is a little funny, but it's the
best I could come up with. For what it's worth, I [borrowed it from
Racket][racket].

[racket]: https://docs.racket-lang.org/guide/module-basics.html

I considered a couple of other ideas which are potentially on the table if
most of you don't dig the main proposal:

### Node-style

In Node, [all imports are quoted][node]. To distinguish between relative and
logical imports, relative imports always start with "./". In Wren, it would be:

[node]: https://nodejs.org/api/modules.html

```scala
import "./something/relative"
import "logical/thing"
```

This is simpler than the main proposal since there are no syntax changes and we
don't need to push the "was quoted?" bit through the embedding API. But I find
the "./" pretty unintuitive especially if you're not steeped in the UNIX
tradition. Even if you are, it's weird that you *need* to use "./" when it means
nothing to the filesystem.
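
Under this alternative, the host would classify an import purely from its string. A minimal C sketch, where `isRelativeImport` is a hypothetical name and treating a leading "../" as relative too is an assumption borrowed from how Node behaves:

```c
#include <stdbool.h>
#include <string.h>

// Node-style classification: an import is relative only if it starts with
// "./" (or "../", by assumption). Everything else is treated as logical.
static bool isRelativeImport(const char* path) {
  return strncmp(path, "./", 2) == 0 || strncmp(path, "../", 3) == 0;
}
```

Here "./something/relative" is relative and "logical/thing" is logical, with no extra bit needed from the VM.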

### Unquoted identifiers

The other idea I had was to allow both an unquoted identifier and a quoted
path, like:

```scala
import wrenalyzer "ast/expr"
```

The unquoted name is the logical part, the package name. The quoted part is
the path within that logical package. If you omit the unquoted name, it's a
straight relative import. If you have a name but no path, it desugars to use
the name as the path.

This is a little more complex because we have to pass around the name and path
separately between the VM and the host app during canonicalization. If we want
the canonicalized form to keep those separate as well, then the way we keep
track of previously-loaded modules needs to get more complex too. Likewise the
way we show stack traces, etc.

The main proposal gloms everything into a single string using ":" to separate
the logical name part from the path. That's a little arbitrary, but it keeps
the VM a good bit simpler and means the idea of there being a "package name" is
pure host app policy.