Copy edit the string docs.

2026-01-11 06:08:41 +01:00 · 2015-09-12 09:42:31 -07:00
parent fe143644b3
commit 92c2b2d5e0
1 changed file with 19 additions and 17 deletions
--- a/doc/site/core/string.markdown
+++ b/doc/site/core/string.markdown
@@ -2,46 +2,49 @@
 ^category core

 A string is an immutable array of bytes. Strings usually store text, in which
-case it will be the UTF-8 encoding of the text's code points. But you can put
-any kind of byte values in there you want, including null bytes or invalid UTF-8
-sequences.
+case the bytes are the UTF-8 encoding of the text's code points. But you can put
+any kind of byte values in there you want, including null bytes or invalid
+UTF-8.

 There are a few ways to think of a string:

 * As a searchable chunk of text composed of a sequence of textual code points.

-* As an iterable sequence of numeric code points.
+* As an iterable sequence of code point numbers.

 * As a flat array of directly indexable bytes.

 All of those are useful for some problems, so the string API supports all three.
-The first one is the most common, so that's what most methods directly on the
-string class cater towards.
+The first one is the most common, so that's what methods directly on the string
+class cater to.

 In UTF-8, a single Unicode code point&mdash;very roughly a single
-"character"&mdash; may be encoded as one or more bytes. This means you can't
+"character"&mdash;may encode to one or more bytes. This means you can't
 efficiently index by code point. There's no way to jump directly to, say, the
-fifth code unit in a string without walking the string from the beginning and
+fifth code point in a string without walking the string from the beginning and
 counting them as you go.

-Because counting code units is relatively slow, the indexes passed to string
+Because counting code points is relatively slow, the indexes passed to string
 methods are *byte* offsets, not *code point* offsets. When you do:

    :::dart
    someString[3]

-That means "get the code unit starting at *byte* three", not "get the third
-code unit in the string". This sounds scary, but keep in mind that the methods
-on string *return* byte indices too. So, for example, this does what you want:
+That means "get the code point starting at *byte* three", not "get the third
+code point in the string". This sounds scary, but keep in mind that the methods
+on strings *return* byte indexes too. So, for example, this does what you want:

    :::dart
    var metalBand = "Fäcëhämmër"
    var hPosition = metalBand.indexOf("h")
    IO.print(metalBand[hPosition]) // "h"

-If you want to work with a string as a sequence numeric code points, call the `codePoints` getter. It returns a [Sequence](sequence.html) that will decide UTF-8 and iterate over the code points, returning each as a number.
+If you want to work with a string as a sequence of numeric code points, call the
+`codePoints` getter. It returns a [Sequence](sequence.html) that decodes UTF-8
+and iterates over the code points, returning each as a number.

-If you want to get at the raw bytes, call `bytes`. This returns a Sequence that ignores any UTF-8 encoding and works directly at the byte level.
+If you want to get at the raw bytes, call `bytes`. This returns a Sequence that
+ignores any UTF-8 encoding and works directly at the byte level.

 ## Static Methods

@@ -156,7 +159,7 @@ Check if the string is not equal to `other`.

 ### **[**index**]** operator

-Returns a string containing the code unit starting at byte `index`.
+Returns a string containing the code point starting at byte `index`.

    :::dart
    IO.print("ʕ•ᴥ•ʔ"[5]) // "ᴥ".
@@ -165,8 +168,7 @@ Since `ʕ` is two bytes in UTF-8 and `•` is three, the fifth byte points to th
 bear's nose.

 If `index` points into the middle of a UTF-8 sequence or at otherwise invalid
-UTF-8, this returns a one-byte string containing the value of the byte at that
-index:
+UTF-8, this returns a one-byte string containing the byte at that index:

    :::dart
    IO.print("I ♥ NY"[3]) // One-byte string whose value is 153.
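
Review note: the byte arithmetic behind the three doc examples can be checked outside Wren. The sketch below is plain Python, not the Wren string API. The helper `code_point_at` is a hypothetical name that only mimics the documented subscript behavior (return the code point starting at a byte offset, or the single byte there if the offset is not the start of a valid UTF-8 sequence). It also assumes the accented letters in the band name are precomposed (NFC) code points.

```python
def code_point_at(data: bytes, index: int) -> bytes:
    """Hypothetical stand-in for the documented subscript behavior.

    Returns the UTF-8 sequence starting at byte `index`, or the single
    byte at `index` if that byte does not begin a valid sequence.
    """
    first = data[index]
    if first < 0x80:
        length = 1                      # ASCII
    elif 0xC2 <= first <= 0xDF:
        length = 2                      # two-byte lead
    elif 0xE0 <= first <= 0xEF:
        length = 3                      # three-byte lead
    elif 0xF0 <= first <= 0xF4:
        length = 4                      # four-byte lead
    else:
        return data[index:index + 1]    # continuation or invalid lead byte
    chunk = data[index:index + length]
    try:
        chunk.decode("utf-8")           # rejects truncated or malformed sequences
    except UnicodeDecodeError:
        return data[index:index + 1]
    return chunk


# Escapes are used so the accented letters are definitely precomposed.
metal_band = "F\u00e4c\u00ebh\u00e4mm\u00ebr"   # "Fäcëhämmër"
band_bytes = metal_band.encode("utf-8")

h_byte_offset = band_bytes.index(b"h")   # byte offset, like the doc's indexOf
h_code_point_offset = metal_band.index("h")  # code point offset, for contrast

bear = "\u0295\u2022\u1d25\u2022\u0294".encode("utf-8")   # "ʕ•ᴥ•ʔ"
nose = code_point_at(bear, 5)            # 2 bytes + 3 bytes precede the nose

heart = "I \u2665 NY".encode("utf-8")    # "I ♥ NY"
middle_of_heart = code_point_at(heart, 3)  # second byte of the three-byte heart
```

Here the byte offset of `h` is 6 while its code point offset is 4, which is why the docs stress that indexes are byte offsets and that a byte offset returned by one method can safely be fed back into the subscript.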