From 9f79dfcdac466c1198cecedd12fc857d6e62bc36 Mon Sep 17 00:00:00 2001 From: Bob Nystrom Date: Tue, 31 Oct 2017 07:49:40 -0700 Subject: [PATCH] Proposal for smarter imports. --- doc/rfc/0001-smarter-imports.md | 475 ++++++++++++++++++++++++++++++++ 1 file changed, 475 insertions(+) create mode 100644 doc/rfc/0001-smarter-imports.md diff --git a/doc/rfc/0001-smarter-imports.md b/doc/rfc/0001-smarter-imports.md new file mode 100644 index 00000000..31141453 --- /dev/null +++ b/doc/rfc/0001-smarter-imports.md @@ -0,0 +1,475 @@ +# Smarter Imports + +Here's a proposal for improving how imported modules are identified and found +to hopefully help us start growing an ecosystem of reusable Wren code. Please +do [let me know][list] what you think! + +[list]: https://groups.google.com/forum/#!forum/wren-lang + +## Motivation + +As [others][210] [have][325] [noted][346], the way imports work in Wren, +particularly how the CLI resolves them, makes it much too hard to reuse code. +This proposal aims to improve that. It doesn't intend to fix *everything* about +imports and the module system, but should leave the door open for later +improvements. + +[210]: https://github.com/munificent/wren/issues/210 +[325]: https://github.com/munificent/wren/issues/325 +[346]: https://github.com/munificent/wren/issues/346 + +### Relative imports + +Today, it's hard to reuse your own code unless you literally dump everything in +a single directory. Say you have: + +```text +script_a.wren +useful_stuff/ + script_b.wren + thing_1.wren + thing_2.wren +``` + +`script_a.wren` and `script_b.wren` are both scripts you can run directly from +the CLI. They would both like to use `thing_1.wren`, which in turn imports +`thing_2.wren`. What does `thing_1.wren` look like? If you do: + +```scala +// thing_1.wren +import "thing_2" +``` + +Then it works fine if you run `script_b.wren` from the `useful_stuff/` +directory. But if you try to run `script_a.wren` from the top level directory, +then it looks for `thing_2.wren` *there* and fails to find it. If you change the +import to: + +```scala +// thing_1.wren +import "useful_stuff/thing_2" +``` + +Then `script_a.wren` works, but now `script_b.wren` is broken. The problem is +that all imports are treated as relative to the directory containing the +*initial script* you run. That means you can't reuse modules from scripts that +live in different directories. + +In this example, if feels like imports should be treated as relative to the +file that contains the import statement. Often you want to specify, "Here is +*where* this other module is, relative to where *I* am." + +### Logical imports + +If we make imports relative, is that enough? Should *all* imports be relative? I +don't think so. First of all, some modules are not even on the file system. +There is no relative path that will take you to "random" — it's built into the +VM itself. Likewise, "io" is baked into the CLI. + +Today, when you write: + +```scala +import "io" +``` + +You aren't saying *where* that module should be found, you're saying *what* +module you want. Assuming we get a package manager at some point, these kinds of +"logical" imports will be common. So I want these too. + +If you look at other langauges' package managers, you'll find many times a +single package offers a number of separate libraries you can use. So I also +want to support logical imports that contain a path too — the import would say +both *what* package to look in and *where* in that package to look. + +### Only logical imports? + +Given some kind of package-y import syntax, could we get rid of relative imports +and use those for everything? You'd treat your own program like it was itself +some kind of package and anything you wanted to import in it you'd import +relative to your app's root directory. + +The problem is that the "root directory" for your program's "package" isn't +well-defined. We could say it's always the same directory as the script you're +running, but that's probably too limiting. You may want to run scripts that live +in subdirectories. + +We could walk up the parent directories looking for some kind of "manifest" file +that declares "the root of the package is here", but that seems like a lot of +hassle if you just want to create a couple of text files and start getting some +code running. So, for your own programs, I think it's nice to still support +"pure" relative imports. + +### Ambiguity? + +OK, so we want both relative imports and logical imports. Can we use the same +syntax for both? We could allow, say: + +```scala +import "a/b" +``` + +And the semantics would be: + +1. Look for a module "a/b.wren" relative to the file containing the import. If + found, use it. + +2. Otherwise, look inside some "package" directory for a package named "a" and + a module named "b.wren" inside it. If found use that. + +3. Otherwise, look for a built in module named "a". + +This is pretty much how things work now, but I don't think it's a good idea. +Relative imports will tend to be short — often single words like "utils". +Assuming we get a healthy package ecosystem at some point, the chances of one of +those colliding with a logical import name are high. + +Also, when reading code, I think it's important to be able to easily tell "this +import is from my own program" without having to know the names of all of the +files and directories in the program. + +## Proposal + +OK, so here's my goals: + +1. A way to import a module relative to the one containing the import. +2. A way to import a module from some named logical package, possibly at a + specific path within that package. +3. Distinct syntaxes for each of these. + +I tried a few different ideas, and my favorite is: + +### Relative imports + +Relative imports use the existing syntax: + +```scala +// Relative path. +import "ast/expr" +``` + +This looks for the file `ast/expr.wren` relative to the directory containing the +module that has this import statement in it. + +You can also walk out of directories if you need to import a module in a parent +folder: + +```scala +import "../../other/stuff" +``` + +### Logical imports + +If you want to import a module from some named logical entity, you use an +*unquoted* identifier: + +```scala +import random +``` + +Being unquoted means the names must be valid Wren identifiers and can't be +reserved words. I think that's OK. It would confuse the hell out of people if +you had a library named "if". I think the above *looks* nice, and the fact that +it's not quoted sends a signal (to me at least) that the name is a "what" more +than a "where". + +If you want to import a specific module within a logical entity, you can have a +series of slash-separate identifiers after the name: + +```scala +import wrenalyzer/ast/expr +``` + +This imports module "ast/expr" from "wrenalyzer". + +## Implementation + +That's the proposed syntax and basic semantics. The way we actually implement it +is tricky because Wren is both a standalone interpreter you can run on the +command line and an embedded scripting language. We have to figure out what goes +into the VM and what lives in the CLI, and the interface between the two. + +### VM + +As usual, I want to keep the VM minimal and free of policy. We do need to add +support for the new unquoted syntax. The more significant change is to the API +the VM uses to talk to the host app when a module is imported. The VM doesn't +know how to actually load modules. When it executes an import statement, it +calls: + +```c +char* loadModuleFn(WrenVM* vm, const char* name); +``` + +The VM tells the host app the import string and the host app returns the code. +In order to distinguish relative imports (quoted) from an identical unquoted +name and path, we need to pass in an extra to bit to tell the host whether there +were quotes or not. + +The more challenging change (and the reason I didn't support them when I first +added imports to Wren) is relative imports. There are two tricky parts: + +First, the host app doesn't have enough context to resolve a relative import. +Right now, the VM only passes in the import string. It doesn't tell which module +*contains* that import string, so the host has no way of knowing what that +import should be relative *to*. + +That's easy to fix. We have the VM pass in the name of the module that contains +the import. + +The harder problem is **canonicalization**. When you import the same module +twice, the VM ensures it is only executed once and both places use the same +module data. This is important to ensure you don't get confusing things like +duplicate static state or other weird side effects. + +To do that, the VM needs to be able to tell when two imports refer to the "same" +module. Right now, it uses the import string itself. If two imports use the same +string, they are the same module. + +With relative imports, that is no longer valid. Consider: + +```text +script_a.wren +useful_stuff/ + thing_1.wren + thing_2.wren +``` + +Now imagine those files contain: + +```scala +// script_a.wren +import "useful_stuff/thing_1" +import "useful_stuff/thing_2" + +// useful_stuff/thing_1.wren +import "thing_2" + +// useful_stuff/thing_2.wren +// Stuff... +``` + +Both `script_a.wren` and `thing_1` import `thing_2`, but the import *strings* +are different. The VM needs to be able to figure out that those two imports +refer to the same module. I don't want path manipulation logic in the VM, so it +will delegate to the host app for that as well. + +Given the import string and the name of the module containing it, the host app +produces a "fully-qualified" or "canonical" name for the imported module. It is +*that* resulting string that the VM uses to tell if two imports resolve to the +same module. (It's also the string it uses in things like stack traces.) + +This means importing becomes a three stage process: + +1. First the VM asks the host to resolve an import. It gives it the (previously + resolved) name of the module containing the import, the imports string, and + whether or not it was quoted. The host app returns a canonical string for + that import. + +2. The VM checks to see if a module with that canonical name has already been + imported. If so, it reuses that and its done. + +3. Otherwise, it circles back and asks the host for the source of the module + with that given canonical name. It compiles and executes that and goes from + there. + +So we add a new callback to the embedding API. Something like: + +```c +char* resolveModuleFn(WrenVM* vm, + // Canonical name of the module containing the import. + const char* importer, + + // The import string. + const char* path, + + // Whether the path name was quoted. + bool isQuoted); +``` + +The VM invokes this for step one above. The other two steps are the existing +loading logic but now using the canonicalized string. + +### CLI + +All of the policy lives over in the CLI (or in your app if you are embedding the +VM). You are free to use whatever canonicalization policy makes sense for you. +For the CLI, and for the policy described up in motivation, it's something like +this: + +* Imports are slash-separated paths. Resolving a relative path is normal path + joining relative to the directory containing the import. So if you're + importing "a/b" from "c/d" (which is a file named "d.wren" in a directory + "c"), then the canonical name is "c/a/b" and the file is "c/a/b.wren". + + ".." and "." are allowed and are normalized. So these imports all resolve + to the same module: + + ```scala + import "a/b/c" + import "a/./b/./c" + import "a/d/../b/c" + ``` + +* If an import is quoted, the path is considered relative to the importing + module's path, and is in the same package as the importing module. + + So, if the current file is "a/b/c.wren" in package "foo" then these are + equivalent: + + ```scala + import "d/e" + import foo/a/b/d/e + ``` + +* If an import is unquoted, the first identifier is the logical "package" + containing the module, and the remaining components are the path within that + package. The canonicalized string is the logical name, a colon, then the + resolved full path to the import (without the ".wren" file extension). + So if you import: + + ```scala + import wrenalyzer/ast/expr + ``` + + The canonical name is "wrenalyzer:ast/expr". + +* If an import is a single unquoted name, the CLI implicitly uses the name as + the module to look for within that package. These are equivalent: + + ```scala + import foo + import foo/foo + ``` + + We could use some default name like "module" instead of the package name, + similar to Python, but I think this is actually a little more usable in + practice. If you're hacking on a bunch of packages at the same time, it's + annoying if every tab in your text editor just says "module.wren". + +* The canonicalized string for the main script or a module imported using a + relative path from the main script is just the normalized file path, + probably relative to the working directory. + +* Since colon is used to separate the name from path, path components with + colons are not allowed. + +### Finding logical imports + +The last remaining piece is how the CLI physically locates logical imports. If +you write: + +```scala +import foo +``` + +Where does it look for "foo"? Of course, if "foo" is built into the VM like +"random", then that's easy. Likewise, if it's built into the CLI like "io", +that's easy too. + +Otherwise, it will try to find it on the file system. We don't have a package +manager yet, so we need some kind of simple policy so you can "hand-author" the +layout a package manager would produce. Borrowing from Node, the basic idea is +pretty simple. + +To find a logical import, the CLI starts in the directory that contains the main +script (not the directory containing the module doing the import), and looks for +a directory named "wren_modules". If not found there, it starts walking up +parent directories until it finds one. If it does, it looks for the logical +import inside there. So, if you import "foo", it will try to find +"wren_modules/foo/foo.wren". + +Once it finds a "wren_modules" directory, it uses that one directory for all +logical imports. You can't scatter stuff across multiple "wren_modules" folders +at different levels of the hierarchy. If it can't find a "wren_modules" +directory, or it can't find the requested module inside the directory, the +import fails. + +This means that to reuse someone else's Wren "package" (or your own for that +matter), you can just stick a "wren_modules" directory next to the main script +for your app or in some parent directory. Inside that "wren_modules" directory, +copy in the package you want to reuse. If that package in turn uses other +packages, copy those into the *same* "wren_modules" directory. In other words, +the transitive dependencies get flattened. This is important to handle shared +dependencies between packages without duplication. + +You only need to worry about all of this if you actually have logical imports. +If you just have a couple of files that import each other, you can use straight +relative imports and everything just works. + +## Migration + +OK, that's the plan. How do we get there? I've start hacking on the +implementation a little and, so far, it seems straightforward. Honestly, it will +probably take less time than I spent writing this up. + +The tricky part is that this is a breaking change. All of your existing quoted +import strings will mean something different. We definitely *can* and will make +breaking changes in Wren, so that's OK, but I'd like to minimize the pain. Right +now, Wren is currently at version 0.1.0. I'll probably consider the commit right +before I start landing this to be the "official" 0.1.0 release and then the +import changes will land in "0.2.0". I'll work in a branch off master until +everything looks solid and then merge it in. + +If you have existing Wren code that you run on the CLI and that contains +imports, you'll probably need to tweak them. + +If you are hosting Wren in your own app, the imports are fine since your app +has control over how they resolve. But you will have to fix your app a little +since the import embedding API is going to change to deal with canonicalization. +I think I can make it so that if you don't provide a canonicalization callback, +then the original import string is treated as the canonical string and you +fall back to the current behavior. + +## Alternatives + +Having both quoted and unquoted import strings is a little funny, but it's the +best I could come up with. For what it's worth, I [borrowed it from +Racket][racket]. + +[racket]: https://docs.racket-lang.org/guide/module-basics.html + +I considered a couple of other ideas which are potentially on the table if +most of you don't dig the main proposal: + +### Node-style + +In Node, [all imports are quoted][node]. To distinguish between relative and +logical imports, relative imports always start with "./". In Wren, it would be: + +[node]: https://nodejs.org/api/modules.html + +```scala +import "./something/relative" +import "logical/thing" +``` + +This is simpler than the main proposal since there are no syntax changes and we +don't need to push the "was quoted?" bit through the embedding API. But I find +the "./" pretty unintuitive especially if you're not steeped in the UNIX +tradition. Even if you are, it's weird that you *need* to use "./" when it means +nothing to the filesystem. + +### Unquoted identifiers + +The other idea I had was to allow both an unquoted identifier and a quoted +path, like: + +```scala +import wrenalyzer "ast/expr" +``` + +The unquoted name is the logical part — the package name. The quoted part is +the path within that logical package. If you omit the unquoted name, it's a +straight relative import. If you have a name but no path, it's desugars to use +the name as the path. + +This is a little more complex because we have to pass around the name and path +separately between the VM and the host app during canonicalization. If we want +the canonicalized form to keep those separate as well, then the way we keep +track of previously-loaded modules needs to get more complex too. Likewise the +way we show stack traces, etc. + +The main proposal gloms everything into a single string using ":" to separate +the logical name part from the path. That's a little arbitrary, but it keeps +the VM a good bit simpler and means the idea of there being a "package name" is +pure host app policy.