Proposal for smarter imports.

2017-10-31 07:49:40 -07:00
parent 88ff8f90ef
commit 9f79dfcdac
1 changed files with 475 additions and 0 deletions
--- a/doc/rfc/0001-smarter-imports.md
+++ b/doc/rfc/0001-smarter-imports.md
@@ -0,0 +1,475 @@
+# Smarter Imports
+
+Here's a proposal for improving how imported modules are identified and found
+to hopefully help us start growing an ecosystem of reusable Wren code. Please
+do [let me know][list] what you think!
+
+[list]: https://groups.google.com/forum/#!forum/wren-lang
+
+## Motivation
+
+As [others][210] [have][325] [noted][346], the way imports work in Wren,
+particularly how the CLI resolves them, makes it much too hard to reuse code.
+This proposal aims to improve that. It doesn't intend to fix *everything* about
+imports and the module system, but should leave the door open for later
+improvements.
+
+[210]: https://github.com/munificent/wren/issues/210
+[325]: https://github.com/munificent/wren/issues/325
+[346]: https://github.com/munificent/wren/issues/346
+
+### Relative imports
+
+Today, it's hard to reuse your own code unless you literally dump everything in
+a single directory. Say you have:
+
+```text
+script_a.wren
+useful_stuff/
+  script_b.wren
+  thing_1.wren
+  thing_2.wren
+```
+
+`script_a.wren` and `script_b.wren` are both scripts you can run directly from
+the CLI. They would both like to use `thing_1.wren`, which in turn imports
+`thing_2.wren`. What does `thing_1.wren` look like? If you do:
+
+```scala
+// thing_1.wren
+import "thing_2"
+```
+
+Then it works fine if you run `script_b.wren` from the `useful_stuff/`
+directory. But if you try to run `script_a.wren` from the top level directory,
+then it looks for `thing_2.wren` *there* and fails to find it. If you change the
+import to:
+
+```scala
+// thing_1.wren
+import "useful_stuff/thing_2"
+```
+
+Then `script_a.wren` works, but now `script_b.wren` is broken. The problem is
+that all imports are treated as relative to the directory containing the
+*initial script* you run. That means you can't reuse modules from scripts that
+live in different directories.
+
+In this example, it feels like imports should be treated as relative to the
+file that contains the import statement. Often you want to specify, "Here is
+*where* this other module is, relative to where *I* am."
+
+### Logical imports
+
+If we make imports relative, is that enough? Should *all* imports be relative? I
+don't think so. First of all, some modules are not even on the file system.
+There is no relative path that will take you to "random" — it's built into the
+VM itself. Likewise, "io" is baked into the CLI.
+
+Today, when you write:
+
+```scala
+import "io"
+```
+
+You aren't saying *where* that module should be found, you're saying *what*
+module you want. Assuming we get a package manager at some point, these kinds of
+"logical" imports will be common. So I want these too.
+
+If you look at other languages' package managers, you'll find many times a
+single package offers a number of separate libraries you can use. So I also
+want to support logical imports that contain a path too — the import would say
+both *what* package to look in and *where* in that package to look.
+
+### Only logical imports?
+
+Given some kind of package-y import syntax, could we get rid of relative imports
+and use those for everything? You'd treat your own program like it was itself
+some kind of package and anything you wanted to import in it you'd import
+relative to your app's root directory.
+
+The problem is that the "root directory" for your program's "package" isn't
+well-defined. We could say it's always the same directory as the script you're
+running, but that's probably too limiting. You may want to run scripts that live
+in subdirectories.
+
+We could walk up the parent directories looking for some kind of "manifest" file
+that declares "the root of the package is here", but that seems like a lot of
+hassle if you just want to create a couple of text files and start getting some
+code running. So, for your own programs, I think it's nice to still support
+"pure" relative imports.
+
+### Ambiguity?
+
+OK, so we want both relative imports and logical imports. Can we use the same
+syntax for both? We could allow, say:
+
+```scala
+import "a/b"
+```
+
+And the semantics would be:
+
+1.  Look for a module "a/b.wren" relative to the file containing the import. If
+    found, use it.
+
+2.  Otherwise, look inside some "package" directory for a package named "a" and
+    a module named "b.wren" inside it. If found, use that.
+
+3.  Otherwise, look for a built in module named "a".
+
+This is pretty much how things work now, but I don't think it's a good idea.
+Relative imports will tend to be short — often single words like "utils".
+Assuming we get a healthy package ecosystem at some point, the chances of one of
+those colliding with a logical import name are high.
+
+Also, when reading code, I think it's important to be able to easily tell "this
+import is from my own program" without having to know the names of all of the
+files and directories in the program.
+
+## Proposal
+
+OK, so here are my goals:
+
+1.  A way to import a module relative to the one containing the import.
+2.  A way to import a module from some named logical package, possibly at a
+    specific path within that package.
+3.  Distinct syntaxes for each of these.
+
+I tried a few different ideas, and my favorite is:
+
+### Relative imports
+
+Relative imports use the existing syntax:
+
+```scala
+// Relative path.
+import "ast/expr"
+```
+
+This looks for the file `ast/expr.wren` relative to the directory containing the
+module that has this import statement in it.
+
+You can also walk out of directories if you need to import a module in a parent
+folder:
+
+```scala
+import "../../other/stuff"
+```
+
+### Logical imports
+
+If you want to import a module from some named logical entity, you use an
+*unquoted* identifier:
+
+```scala
+import random
+```
+
+Being unquoted means the names must be valid Wren identifiers and can't be
+reserved words. I think that's OK. It would confuse the hell out of people if
+you had a library named "if". I think the above *looks* nice, and the fact that
+it's not quoted sends a signal (to me at least) that the name is a "what" more
+than a "where".
+
+If you want to import a specific module within a logical entity, you can have a
+series of slash-separated identifiers after the name:
+
+```scala
+import wrenalyzer/ast/expr
+```
+
+This imports module "ast/expr" from "wrenalyzer".
+
+## Implementation
+
+That's the proposed syntax and basic semantics. The way we actually implement it
+is tricky because Wren is both a standalone interpreter you can run on the
+command line and an embedded scripting language. We have to figure out what goes
+into the VM and what lives in the CLI, and the interface between the two.
+
+### VM
+
+As usual, I want to keep the VM minimal and free of policy. We do need to add
+support for the new unquoted syntax. The more significant change is to the API
+the VM uses to talk to the host app when a module is imported. The VM doesn't
+know how to actually load modules. When it executes an import statement, it
+calls:
+
+```c
+char* loadModuleFn(WrenVM* vm, const char* name);
+```
+
+The VM tells the host app the import string and the host app returns the code.
+In order to distinguish relative imports (quoted) from an identical unquoted
+name and path, we need to pass in an extra bit to tell the host whether there
+were quotes or not.
+
+The more challenging change (and the reason I didn't support them when I first
+added imports to Wren) is relative imports. There are two tricky parts:
+
+First, the host app doesn't have enough context to resolve a relative import.
+Right now, the VM only passes in the import string. It doesn't tell which module
+*contains* that import string, so the host has no way of knowing what that
+import should be relative *to*.
+
+That's easy to fix. We have the VM pass in the name of the module that contains
+the import.
+
+The harder problem is **canonicalization**. When you import the same module
+twice, the VM ensures it is only executed once and both places use the same
+module data. This is important to ensure you don't get confusing things like
+duplicate static state or other weird side effects.
+
+To do that, the VM needs to be able to tell when two imports refer to the "same"
+module. Right now, it uses the import string itself. If two imports use the same
+string, they are the same module.
+
+With relative imports, that is no longer valid. Consider:
+
+```text
+script_a.wren
+useful_stuff/
+  thing_1.wren
+  thing_2.wren
+```
+
+Now imagine those files contain:
+
+```scala
+// script_a.wren
+import "useful_stuff/thing_1"
+import "useful_stuff/thing_2"
+
+// useful_stuff/thing_1.wren
+import "thing_2"
+
+// useful_stuff/thing_2.wren
+// Stuff...
+```
+
+Both `script_a.wren` and `thing_1` import `thing_2`, but the import *strings*
+are different. The VM needs to be able to figure out that those two imports
+refer to the same module. I don't want path manipulation logic in the VM, so it
+will delegate to the host app for that as well.
+
+Given the import string and the name of the module containing it, the host app
+produces a "fully-qualified" or "canonical" name for the imported module. It is
+*that* resulting string that the VM uses to tell if two imports resolve to the
+same module. (It's also the string it uses in things like stack traces.)
+
+This means importing becomes a three stage process:
+
+1.  First the VM asks the host to resolve an import. It gives it the (previously
+    resolved) name of the module containing the import, the import string, and
+    whether or not it was quoted. The host app returns a canonical string for
+    that import.
+
+2.  The VM checks to see if a module with that canonical name has already been
+    imported. If so, it reuses that and it's done.
+
+3.  Otherwise, it circles back and asks the host for the source of the module
+    with that given canonical name. It compiles and executes that and goes from
+    there.
+
+So we add a new callback to the embedding API. Something like:
+
+```c
+char* resolveModuleFn(WrenVM* vm,
+    // Canonical name of the module containing the import.
+    const char* importer,
+
+    // The import string.
+    const char* path,
+
+    // Whether the path name was quoted.
+    bool isQuoted);
+```
+
+The VM invokes this for step one above. The other two steps are the existing
+loading logic but now using the canonicalized string.
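The three stages above can be sketched as a toy import routine. This is only an illustration under stated assumptions: `ToyVM`, `toyImport`, `demoResolve`, and `demoLoad` are hypothetical names, the callbacks drop the `WrenVM*` parameter for brevity, and the `NULL` resolver fallback is the optional behavior floated in the Migration section, not a settled API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 64

// Hypothetical host callbacks. The real ones would also take the WrenVM*.
typedef const char* (*ToyResolveFn)(const char* importer, const char* path,
                                    bool isQuoted);
typedef const char* (*ToyLoadFn)(const char* canonicalName);

typedef struct
{
  ToyResolveFn resolve;              // NULL means "no canonicalization".
  ToyLoadFn load;
  const char* loaded[MAX_MODULES];   // Canonical names imported so far.
  int loadedCount;
  int executions;                    // How many module bodies were "run".
} ToyVM;

bool toyImport(ToyVM* vm, const char* importer, const char* path,
               bool isQuoted)
{
  // Stage 1: the host turns the import into a canonical name. Without a
  // resolver, fall back to the raw import string.
  const char* name = path;
  if (vm->resolve != NULL) name = vm->resolve(importer, path, isQuoted);
  if (name == NULL) return false;

  // Stage 2: a module with this canonical name is reused, not re-executed.
  for (int i = 0; i < vm->loadedCount; i++)
  {
    if (strcmp(vm->loaded[i], name) == 0) return true;
  }

  // Stage 3: ask the host for the source under the canonical name.
  const char* source = vm->load(name);
  if (source == NULL) return false;
  assert(vm->loadedCount < MAX_MODULES);
  vm->loaded[vm->loadedCount++] = name;   // Toy: the name must outlive the VM.
  vm->executions++;                       // A real VM compiles and runs it.
  return true;
}

// Stand-in host for the script_a / useful_stuff example tree above.
const char* demoResolve(const char* importer, const char* path, bool isQuoted)
{
  (void)isQuoted;
  if (strcmp(importer, "useful_stuff/thing_1") == 0 &&
      strcmp(path, "thing_2") == 0)
  {
    return "useful_stuff/thing_2";
  }
  return path;
}

const char* demoLoad(const char* canonicalName)
{
  (void)canonicalName;
  return "// Stuff...";
}
```

With `demoResolve` installed, importing `"useful_stuff/thing_2"` from `script_a` and `"thing_2"` from `useful_stuff/thing_1` lands on one canonical name, so the module body runs once. With `resolve` set to `NULL`, the two strings stay distinct and the toy VM treats them as different modules, which is the current behavior.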
+
+### CLI
+
+All of the policy lives over in the CLI (or in your app if you are embedding the
+VM). You are free to use whatever canonicalization policy makes sense for you.
+For the CLI, and for the policy described up in motivation, it's something like
+this:
+
+*   Imports are slash-separated paths. Resolving a relative path is normal path
+    joining relative to the directory containing the import. So if you're
+    importing "a/b" from "c/d" (which is a file named "d.wren" in a directory
+    "c"), then the canonical name is "c/a/b" and the file is "c/a/b.wren".
+
+    ".." and "." are allowed and are normalized. So these imports all resolve
+    to the same module:
+
+    ```scala
+    import "a/b/c"
+    import "a/./b/./c"
+    import "a/d/../b/c"
+    ```
+
+*   If an import is quoted, the path is considered relative to the importing
+    module's path, and is in the same package as the importing module.
+
+    So, if the current file is "a/b/c.wren" in package "foo" then these are
+    equivalent:
+
+    ```scala
+    import "d/e"
+    import foo/a/b/d/e
+    ```
+
+*   If an import is unquoted, the first identifier is the logical "package"
+    containing the module, and the remaining components are the path within that
+    package. The canonicalized string is the logical name, a colon, then the
+    resolved full path to the import (without the ".wren" file extension).
+    So if you import:
+
+    ```scala
+    import wrenalyzer/ast/expr
+    ```
+
+    The canonical name is "wrenalyzer:ast/expr".
+
+*   If an import is a single unquoted name, the CLI implicitly uses the name as
+    the module to look for within that package. These are equivalent:
+
+    ```scala
+    import foo
+    import foo/foo
+    ```
+
+    We could use some default name like "module" instead of the package name,
+    similar to Python, but I think this is actually a little more usable in
+    practice. If you're hacking on a bunch of packages at the same time, it's
+    annoying if every tab in your text editor just says "module.wren".
+
+*   The canonicalized string for the main script or a module imported using a
+    relative path from the main script is just the normalized file path,
+    probably relative to the working directory.
+
+*   Since colon is used to separate the name from path, path components with
+    colons are not allowed.
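A rough sketch of how a host might implement the policy in the list above. `normalizePath` and `canonicalize` are hypothetical names, the caller is assumed to free the returned string, and the sketch does not reject colons inside path components or guard against ".." walking out of a package.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Returns a newly allocated copy of `path` with "." and empty components
// dropped and each "x/.." pair folded away. Leading ".." components stay.
static char* normalizePath(const char* path)
{
  size_t length = strlen(path);
  char* out = malloc(length + 1);
  size_t* starts = malloc((length + 1) * sizeof(size_t));
  assert(out != NULL && starts != NULL);
  size_t outLength = 0;   // Characters written to `out` so far.
  size_t depth = 0;       // Foldable components currently in `out`.

  for (const char* p = path; *p != '\0';)
  {
    size_t n = strcspn(p, "/");   // Length of the next component.
    bool isDot = n == 1 && p[0] == '.';
    bool isDotDot = n == 2 && p[0] == '.' && p[1] == '.';
    if (isDotDot && depth > 0)
    {
      outLength = starts[--depth];   // Drop the component before "..".
    }
    else if (n > 0 && !isDot)
    {
      if (!isDotDot) starts[depth++] = outLength;
      if (outLength > 0) out[outLength++] = '/';
      memcpy(out + outLength, p, n);
      outLength += n;
    }
    p += n;
    if (*p == '/') p++;
  }
  out[outLength] = '\0';
  free(starts);
  return out;
}

// Hypothetical CLI policy: builds "package:path" or a plain "path".
char* canonicalize(const char* importer, const char* path, bool isQuoted)
{
  const char* package;    // Start of the package name, if any.
  size_t packageLength;   // 0 for modules that live outside any package.
  char* joined = malloc(strlen(importer) + strlen(path) + 2);
  assert(joined != NULL);

  if (isQuoted)
  {
    // Relative: same package as the importer, joined onto its directory.
    const char* colon = strchr(importer, ':');
    package = importer;
    packageLength = colon != NULL ? (size_t)(colon - importer) : 0;
    const char* importerPath = colon != NULL ? colon + 1 : importer;
    const char* slash = strrchr(importerPath, '/');
    int dirLength = slash != NULL ? (int)(slash - importerPath) : 0;
    sprintf(joined, "%.*s/%s", dirLength, importerPath, path);
  }
  else
  {
    // Logical: the first identifier is the package. A bare `foo` has no
    // slash, so the whole name doubles as the path, giving "foo:foo".
    package = path;
    packageLength = strcspn(path, "/");
    strcpy(joined,
           path[packageLength] == '/' ? path + packageLength + 1 : path);
  }

  char* normalized = normalizePath(joined);
  free(joined);
  size_t size = packageLength + strlen(normalized) + 2;
  char* result = malloc(size);
  assert(result != NULL);
  if (packageLength > 0)
  {
    snprintf(result, size, "%.*s:%s", (int)packageLength, package, normalized);
  }
  else
  {
    snprintf(result, size, "%s", normalized);
  }
  free(normalized);
  return result;
}
```

Under these assumptions, `canonicalize("c/d", "a/b", true)` yields `"c/a/b"`, `canonicalize("foo:a/b/c", "d/e", true)` yields `"foo:a/b/d/e"`, and the unquoted `wrenalyzer/ast/expr` yields `"wrenalyzer:ast/expr"`, matching the cases in the list.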
+
+### Finding logical imports
+
+The last remaining piece is how the CLI physically locates logical imports. If
+you write:
+
+```scala
+import foo
+```
+
+Where does it look for "foo"? Of course, if "foo" is built into the VM like
+"random", then that's easy. Likewise, if it's built into the CLI like "io",
+that's easy too.
+
+Otherwise, it will try to find it on the file system. We don't have a package
+manager yet, so we need some kind of simple policy so you can "hand-author" the
+layout a package manager would produce. Borrowing from Node, the basic idea is
+pretty simple.
+
+To find a logical import, the CLI starts in the directory that contains the main
+script (not the directory containing the module doing the import), and looks for
+a directory named "wren_modules". If not found there, it starts walking up
+parent directories until it finds one. If it does, it looks for the logical
+import inside there. So, if you import "foo", it will try to find
+"wren_modules/foo/foo.wren".
+
+Once it finds a "wren_modules" directory, it uses that one directory for all
+logical imports. You can't scatter stuff across multiple "wren_modules" folders
+at different levels of the hierarchy. If it can't find a "wren_modules"
+directory, or it can't find the requested module inside the directory, the
+import fails.
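The directory walk described above could look roughly like this. `findWrenModules` and the `DirExistsFn` hook are hypothetical, the hook stands in for a real filesystem check so the sketch stays self-contained, and `scriptDir` is assumed to be an absolute path using "/" separators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical hook so the search can run without touching a real disk.
typedef bool (*DirExistsFn)(const char* path);

// Writes the nearest "wren_modules" directory at or above `scriptDir` into
// `out`. Returns false if none exists or `out` is too small.
bool findWrenModules(const char* scriptDir, DirExistsFn dirExists,
                     char* out, size_t outSize)
{
  assert(scriptDir[0] == '/');   // Assumption: absolute path.
  size_t length = strlen(scriptDir);
  for (;;)
  {
    int written = snprintf(out, outSize, "%.*s/wren_modules",
                           (int)length, scriptDir);
    if (written < 0 || (size_t)written >= outSize) return false;
    if (dirExists(out)) return true;
    if (length == 0) return false;   // The root was just checked.

    // Step up to the parent by cutting back to the previous slash.
    while (scriptDir[length - 1] != '/') length--;
    length--;
  }
}

// Stand-in for a real directory check, used only for this example.
bool fakeDirExists(const char* path)
{
  return strcmp(path, "/home/me/wren_modules") == 0;
}
```

Starting from `/home/me/app/src`, the search tries `/home/me/app/src/wren_modules`, then `/home/me/app/wren_modules`, and stops at `/home/me/wren_modules`. A host would run this once for the main script and reuse the result for every logical import, since only one "wren_modules" directory is ever consulted.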
+
+This means that to reuse someone else's Wren "package" (or your own for that
+matter), you can just stick a "wren_modules" directory next to the main script
+for your app or in some parent directory. Inside that "wren_modules" directory,
+copy in the package you want to reuse. If that package in turn uses other
+packages, copy those into the *same* "wren_modules" directory. In other words,
+the transitive dependencies get flattened. This is important to handle shared
+dependencies between packages without duplication.
+
+You only need to worry about all of this if you actually have logical imports.
+If you just have a couple of files that import each other, you can use straight
+relative imports and everything just works.
+
+## Migration
+
+OK, that's the plan. How do we get there? I've started hacking on the
+implementation a little and, so far, it seems straightforward. Honestly, it will
+probably take less time than I spent writing this up.
+
+The tricky part is that this is a breaking change. All of your existing quoted
+import strings will mean something different. We definitely *can* and will make
+breaking changes in Wren, so that's OK, but I'd like to minimize the pain. Right
+now, Wren is at version 0.1.0. I'll probably consider the commit right
+before I start landing this to be the "official" 0.1.0 release and then the
+import changes will land in "0.2.0". I'll work in a branch off master until
+everything looks solid and then merge it in.
+
+If you have existing Wren code that you run on the CLI and that contains
+imports, you'll probably need to tweak them.
+
+If you are hosting Wren in your own app, the imports are fine since your app
+has control over how they resolve. But you will have to fix your app a little
+since the import embedding API is going to change to deal with canonicalization.
+I think I can make it so that if you don't provide a canonicalization callback,
+then the original import string is treated as the canonical string and you
+fall back to the current behavior.
+
+## Alternatives
+
+Having both quoted and unquoted import strings is a little funny, but it's the
+best I could come up with. For what it's worth, I [borrowed it from
+Racket][racket].
+
+[racket]: https://docs.racket-lang.org/guide/module-basics.html
+
+I considered a couple of other ideas which are potentially on the table if
+most of you don't dig the main proposal:
+
+### Node-style
+
+In Node, [all imports are quoted][node]. To distinguish between relative and
+logical imports, relative imports always start with "./". In Wren, it would be:
+
+[node]: https://nodejs.org/api/modules.html
+
+```scala
+import "./something/relative"
+import "logical/thing"
+```
+
+This is simpler than the main proposal since there are no syntax changes and we
+don't need to push the "was quoted?" bit through the embedding API. But I find
+the "./" pretty unintuitive especially if you're not steeped in the UNIX
+tradition. Even if you are, it's weird that you *need* to use "./" when it means
+nothing to the filesystem.
+
+### Unquoted identifiers
+
+The other idea I had was to allow both an unquoted identifier and a quoted
+path, like:
+
+```scala
+import wrenalyzer "ast/expr"
+```
+
+The unquoted name is the logical part — the package name. The quoted part is
+the path within that logical package. If you omit the unquoted name, it's a
+straight relative import. If you have a name but no path, it desugars to use
+the name as the path.
+
+This is a little more complex because we have to pass around the name and path
+separately between the VM and the host app during canonicalization. If we want
+the canonicalized form to keep those separate as well, then the way we keep
+track of previously-loaded modules needs to get more complex too. Likewise the
+way we show stack traces, etc.
+
+The main proposal gloms everything into a single string using ":" to separate
+the logical name part from the path. That's a little arbitrary, but it keeps
+the VM a good bit simpler and means the idea of there being a "package name" is
+pure host app policy.