<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>String Class &ndash; Wren</title>
<link rel="stylesheet" type="text/css" href="../../style.css" />
<link href='//fonts.googleapis.com/css?family=Source+Sans+Pro:400,700,400italic,700italic|Source+Code+Pro:400|Lato:400|Sanchez:400italic,400' rel='stylesheet' type='text/css'>
<!-- Tell mobile browsers we're optimized for them and they don't need to crop
     the viewport. -->
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/>
</head>
<body id="top" class="module">
<header>
  <div class="page">
    <div class="main-column">
      <h1><a href="../../">wren</a></h1>
      <h2>a classy little scripting language</h2>
    </div>
  </div>
</header>
<div class="page">
  <nav class="big">
    <ul>
      <li><a href="../">Modules</a></li>
      <li><a href="./">core</a></li>
    </ul>
    <section>
      <h2>core classes</h2>
      <ul>
        <li><a href="bool.html">Bool</a></li>
        <li><a href="class.html">Class</a></li>
        <li><a href="fiber.html">Fiber</a></li>
        <li><a href="fn.html">Fn</a></li>
        <li><a href="list.html">List</a></li>
        <li><a href="map.html">Map</a></li>
        <li><a href="null.html">Null</a></li>
        <li><a href="num.html">Num</a></li>
        <li><a href="object.html">Object</a></li>
        <li><a href="range.html">Range</a></li>
        <li><a href="sequence.html">Sequence</a></li>
        <li><a href="string.html">String</a></li>
        <li><a href="system.html">System</a></li>
      </ul>
    </section>
  </nav>
  <nav class="small">
    <table>
      <tr>
        <td><a href="../">Modules</a></td>
        <td><a href="./">core</a></td>
      </tr>
      <tr>
        <td colspan="2"><h2>core classes</h2></td>
      </tr>
      <tr>
        <td>
          <ul>
            <li><a href="bool.html">Bool</a></li>
            <li><a href="class.html">Class</a></li>
            <li><a href="fiber.html">Fiber</a></li>
            <li><a href="fn.html">Fn</a></li>
            <li><a href="list.html">List</a></li>
            <li><a href="map.html">Map</a></li>
            <li><a href="null.html">Null</a></li>
          </ul>
        </td>
        <td>
          <ul>
            <li><a href="num.html">Num</a></li>
            <li><a href="object.html">Object</a></li>
            <li><a href="range.html">Range</a></li>
            <li><a href="sequence.html">Sequence</a></li>
            <li><a href="string.html">String</a></li>
            <li><a href="system.html">System</a></li>
          </ul>
        </td>
      </tr>
    </table>
  </nav>
  <main>
    <h1>String Class</h1>
    <p>A string is an immutable array of bytes. Strings usually store text, in which
case the bytes are the UTF-8 encoding of the text&rsquo;s code points. But you can put
any kind of byte values in there you want, including null bytes or invalid
UTF-8. </p>
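<p>For instance, a string literal can contain a null byte written with the <code>\0</code> escape. It is stored like any other byte:</p>

```wren
// "\0" is Wren's escape for a null byte; it occupies one byte like any other.
var withNull = "a\0b"
System.print(withNull.bytes.count) //> 3
System.print(withNull.bytes[1])    //> 0
```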
<p>There are a few ways to think of a string: </p>
<ul>
<li>
<p>As a searchable chunk of text composed of a sequence of textual code points. </p>
</li>
<li>
<p>As an iterable sequence of code point numbers. </p>
</li>
<li>
<p>As a flat array of directly indexable bytes. </p>
</li>
</ul>
<p>All of those are useful for some problems, so the string API supports all three.
The first one is the most common, so that&rsquo;s what methods directly on the string
class cater to. </p>
<p>In UTF-8, a single Unicode code point&mdash;very roughly a single
&ldquo;character&rdquo;&mdash;may encode to one or more bytes. This means you can&rsquo;t
efficiently index by code point. There&rsquo;s no way to jump directly to, say, the
fifth code point in a string without walking the string from the beginning and
counting them as you go. </p>
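<p>To make that cost concrete, here is a sketch of finding where the <em>n</em>th code point begins. <code>byteOffsetOf</code> is a hypothetical helper, not part of the core library. It drives the string's own iterator (see <code>iterate</code> below), which yields the byte offset of each code point, so it must visit every earlier code point first.</p>

```wren
// Hypothetical helper (not in the core library): returns the byte offset at
// which the nth code point (counting from 0) begins, or -1 if the string has
// fewer code points than that. It has to walk the string from the start.
var byteOffsetOf = Fn.new {|string, n|
  var seen = 0
  var iterator = null
  while (iterator = string.iterate(iterator)) {
    if (seen == n) return iterator
    seen = seen + 1
  }
  return -1
}

System.print(byteOffsetOf.call("Fäcëhämmër", 4)) //> 6
```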
<p>Because counting code points is relatively slow, the indexes passed to string
methods are <em>byte</em> offsets, not <em>code point</em> offsets. When you do: </p>
<div class="codehilite"><pre><span></span>someString[3]
</pre></div>


<p>That means &ldquo;get the code point starting at <em>byte</em> three&rdquo;, not &ldquo;get the third
code point in the string&rdquo;. This sounds scary, but keep in mind that the methods
on strings <em>return</em> byte indexes too. So, for example, this does what you want: </p>
<div class="codehilite"><pre><span></span>var metalBand = &quot;Fäcëhämmër&quot;
var hPosition = metalBand.indexOf(&quot;h&quot;)
System.print(metalBand[hPosition]) //&gt; h
</pre></div>
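<p>Printing the intermediate values shows the difference. <code>h</code> is the fifth code point, but it starts at byte 6 because <code>ä</code> and <code>ë</code> each take two bytes in UTF-8:</p>

```wren
var metalBand = "Fäcëhämmër"
System.print(metalBand.indexOf("h")) //> 6
System.print(metalBand.count)        //> 10 (code points)
System.print(metalBand.bytes.count)  //> 14 (bytes)
```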


<p>If you want to work with a string as a sequence of numeric code points, call the
<code>codePoints</code> getter. It returns a <a href="sequence.html">Sequence</a> that decodes UTF-8
and iterates over the code points, returning each as a number. </p>
<p>If you want to get at the raw bytes, call <code>bytes</code>. This returns a Sequence that
ignores any UTF-8 encoding and works directly at the byte level. </p>
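<p>Here is one short string viewed all three ways. The outputs assume <code>é</code> is written as the single precomposed code point U+00E9:</p>

```wren
var s = "né"

// As text: iterating yields single-code-point strings.
for (c in s) System.print(c) // prints n, then é

// As numeric code points.
System.print(s.codePoints.toList) //> [110, 233]

// As raw bytes: "é" encodes to two bytes in UTF-8.
System.print(s.bytes.toList) //> [110, 195, 169]
```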
<h2>Static Methods <a href="#static-methods" name="static-methods" class="header-anchor">#</a></h2>
<h3>String.<strong>fromCodePoint</strong>(codePoint) <a href="#string.fromcodepoint(codepoint)" name="string.fromcodepoint(codepoint)" class="header-anchor">#</a></h3>
<p>Creates a new string containing the UTF-8 encoding of <code>codePoint</code>. </p>
<div class="codehilite"><pre><span></span>String.fromCodePoint(8225) //&gt; ‡
</pre></div>


<p>It is a runtime error if <code>codePoint</code> is not an integer between <code>0</code> and
<code>0x10ffff</code>, inclusive. </p>
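<p><code>codePoints</code> yields the same numbers that <code>fromCodePoint</code> accepts, so the two round-trip. The runtime error can be observed by making the call inside a fiber and running it with the fiber's <code>try</code> method; the exact error text is not shown here.</p>

```wren
var cp = "‡".codePoints[0]
System.print(cp)                       //> 8225
System.print(String.fromCodePoint(cp)) //> ‡

// 0x110000 is one past the largest code point, so this aborts the fiber.
var fiber = Fiber.new {
  var s = String.fromCodePoint(0x110000)
}
var error = fiber.try()
System.print(error != null) //> true
```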
<h2>Methods <a href="#methods" name="methods" class="header-anchor">#</a></h2>
<h3><strong>bytes</strong> <a href="#bytes" name="bytes" class="header-anchor">#</a></h3>
<p>Gets a <a href="sequence.html"><code>Sequence</code></a> that can be used to access the raw bytes of
the string and ignore any UTF-8 encoding. In addition to the normal sequence
methods, the returned object also has a subscript operator that can be used to
directly index bytes. </p>
<div class="codehilite"><pre><span></span>System.print(&quot;hello&quot;.bytes[1]) //&gt; 101 (for &quot;e&quot;)
</pre></div>


<p>The <code>count</code> method on the returned sequence returns the number of bytes in the
string. Unlike <code>count</code> on the string itself, it does not have to iterate over
the string, and runs in constant time instead. </p>
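<p>Every byte of a multi-byte UTF-8 sequence is 128 or greater, so the byte view makes an ASCII check straightforward. <code>isAscii</code> is a hypothetical helper built on the sequence <code>all</code> method:</p>

```wren
// Hypothetical helper: true if every byte is below 128 (7-bit ASCII).
var isAscii = Fn.new {|string| string.bytes.all {|byte| byte < 128 } }

System.print(isAscii.call("hello"))      //> true
System.print(isAscii.call("Fäcëhämmër")) //> false
```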
<h3><strong>codePoints</strong> <a href="#codepoints" name="codepoints" class="header-anchor">#</a></h3>
<p>Gets a <a href="sequence.html"><code>Sequence</code></a> that can be used to access the UTF-8-decoded
code points of the string <em>as numbers</em>. Iteration and subscripting work similarly
to the string itself. The difference is that instead of returning
single-character strings, this returns the numeric code point values. </p>
<div class="codehilite"><pre><span></span>var string = &quot;(ᵔᴥᵔ)&quot;
System.print(string.codePoints[0]) //&gt; 40 (for &quot;(&quot;)
System.print(string.codePoints[4]) //&gt; 7461 (for &quot;ᴥ&quot;)
</pre></div>


<p>If the byte at the subscripted <code>index</code> does not begin a valid UTF-8 sequence, or the end of the
string is reached before the sequence is complete, the subscript returns <code>-1</code>. </p>
<div class="codehilite"><pre><span></span>var string = &quot;(ᵔᴥᵔ)&quot;
System.print(string.codePoints[2]) //&gt; -1 (in the middle of &quot;ᵔ&quot;)
</pre></div>
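<p>For valid UTF-8, mapping the code points back through <code>String.fromCodePoint</code> rebuilds the original string, which shows that this view loses nothing:</p>

```wren
var string = "(ᵔᴥᵔ)"
// Decode to numbers, then turn each number back into a one-code-point string.
var chars = string.codePoints.map {|cp| String.fromCodePoint(cp) }
var rebuilt = chars.join("")
System.print(rebuilt == string) //> true
```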


<h3><strong>contains</strong>(other) <a href="#contains(other)" name="contains(other)" class="header-anchor">#</a></h3>
<p>Checks if <code>other</code> is a substring of the string. </p>
<p>It is a runtime error if <code>other</code> is not a string. </p>
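<p>Multi-byte code points can be searched for like any other text:</p>

```wren
var metalBand = "Fäcëhämmër"
System.print(metalBand.contains("hä")) //> true
System.print(metalBand.contains("x"))  //> false
```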
<h3><strong>count</strong> <a href="#count" name="count" class="header-anchor">#</a></h3>
<p>Returns the number of code points in the string. Since UTF-8 is a
variable-length encoding, this requires iterating over the entire string, which
is relatively slow. </p>
<p>If the string contains bytes that are invalid UTF-8, each byte adds one to the
count as well. </p>
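<p>Comparing <code>count</code> with the constant-time byte count on the same string:</p>

```wren
var bear = "ʕ•ᴥ•ʔ"
System.print(bear.count)       //> 5 (code points, found by walking the string)
System.print(bear.bytes.count) //> 13 (bytes: 2 + 3 + 3 + 3 + 2)
```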
<h3><strong>endsWith</strong>(suffix) <a href="#endswith(suffix)" name="endswith(suffix)" class="header-anchor">#</a></h3>
<p>Checks if the string ends with <code>suffix</code>. </p>
<p>It is a runtime error if <code>suffix</code> is not a string. </p>
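<p>A short example:</p>

```wren
var fileName = "notes.txt"
System.print(fileName.endsWith(".txt")) //> true
System.print(fileName.endsWith(".md"))  //> false
```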
<h3><strong>indexOf</strong>(search) <a href="#indexof(search)" name="indexof(search)" class="header-anchor">#</a></h3>
<p>Returns the byte index at which the first occurrence of <code>search</code> begins, or <code>-1</code> if
<code>search</code> was not found. </p>
<p>It is a runtime error if <code>search</code> is not a string. </p>
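<p>The returned index counts bytes, so each two-byte <code>ä</code> or <code>ë</code> before a match shifts it by two:</p>

```wren
var metalBand = "Fäcëhämmër"
System.print(metalBand.indexOf("ë")) //> 4
System.print(metalBand.indexOf("m")) //> 9
System.print(metalBand.indexOf("z")) //> -1
```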
<h3><strong>indexOf</strong>(search, start) <a href="#indexof(search,-start)" name="indexof(search,-start)" class="header-anchor">#</a></h3>
<p>Returns the byte index at which the first occurrence of <code>search</code> begins, searching from byte
offset <code>start</code>, or <code>-1</code> if <code>search</code> was not found. The start can be
negative to count backwards from the end of the string. </p>
<p>It is a runtime error if <code>search</code> is not a string or <code>start</code> is not an integer
index within the string&rsquo;s byte length. </p>
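<p>Because both the result and <code>start</code> are byte offsets, a search can resume just past the previous match. <code>allIndexesOf</code> is a hypothetical helper; it stops before <code>start</code> would reach the end of the string, since <code>start</code> must be a valid index:</p>

```wren
// Hypothetical helper (not part of the core library): returns the byte
// offsets of every non-overlapping occurrence of search in string.
// Assumes search is a non-empty string.
var allIndexesOf = Fn.new {|string, search|
  var result = []
  var index = string.indexOf(search)
  while (index != -1) {
    result.add(index)
    // Resume just past this match, unless that would run off the end.
    var next = index + search.bytes.count
    if (next >= string.bytes.count) break
    index = string.indexOf(search, next)
  }
  return result
}

System.print(allIndexesOf.call("abc abc abc", "bc")) //> [1, 5, 9]
System.print(allIndexesOf.call("Fäcëhämmër", "ä"))   //> [1, 7]
```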
<h3><strong>split</strong>(separator) <a href="#split(separator)" name="split(separator)" class="header-anchor">#</a></h3>
<p>Splits the string at each occurrence of <code>separator</code> and returns a list of one or more strings. </p>
<div class="codehilite"><pre><span></span>var string = &quot;abc abc abc&quot;
System.print(string.split(&quot; &quot;)) //&gt; [abc, abc, abc]
</pre></div>


<p>It is a runtime error if <code>separator</code> is not a string or is an empty string. </p>
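<p>Two edge cases: adjacent separators produce an empty string between them, and a string that does not contain the separator comes back as a one-element list:</p>

```wren
var parts = "a,b,,c".split(",")
System.print(parts.count)      //> 4
System.print(parts[2] == "")   //> true
System.print("abc".split(",")) //> [abc]
```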
<h3><strong>replace</strong>(old, swap) <a href="#replace(old,-swap)" name="replace(old,-swap)" class="header-anchor">#</a></h3>
<p>Returns a new string with all occurrences of <code>old</code> replaced with <code>swap</code>. </p>
<div class="codehilite"><pre><span></span>var string = &quot;abc abc abc&quot;
System.print(string.replace(&quot; &quot;, &quot;&quot;)) //&gt; abcabcabc
</pre></div>
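<p>Since strings are immutable, <code>replace</code> leaves the original untouched, and calls can be chained:</p>

```wren
var band = "Fäcëhämmër"
var plain = band.replace("ä", "a").replace("ë", "e")
System.print(plain) //> Facehammer
System.print(band)  //> Fäcëhämmër
```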


<h3><strong>iterate</strong>(iterator), <strong>iteratorValue</strong>(iterator) <a href="#iterate(iterator),-iteratorvalue(iterator)" name="iterate(iterator),-iteratorvalue(iterator)" class="header-anchor">#</a></h3>
<p>Implements the <a href="../../control-flow.html#the-iterator-protocol">iterator protocol</a>
for iterating over the <em>code points</em> in the string: </p>
<div class="codehilite"><pre><span></span>var codePoints = []
for (c in &quot;(ᵔᴥᵔ)&quot;) {
  codePoints.add(c)
}

System.print(codePoints) //&gt; [(, ᵔ, ᴥ, ᵔ, )]
</pre></div>


<p>If the string contains any bytes that are not valid UTF-8, this iterates over
those too, one byte at a time. </p>
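<p>Calling the protocol methods by hand, the way a <code>for</code> loop does, shows that the iterator values are the byte offsets where each code point begins:</p>

```wren
var string = "(ᵔᴥᵔ)"
var offsets = []
var iterator = null
while (iterator = string.iterate(iterator)) {
  offsets.add(iterator)
  System.print(string.iteratorValue(iterator))
}
System.print(offsets) //> [0, 1, 4, 7, 10]
```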
<h3><strong>startsWith</strong>(prefix) <a href="#startswith(prefix)" name="startswith(prefix)" class="header-anchor">#</a></h3>
<p>Checks if the string starts with <code>prefix</code>. </p>
<p>It is a runtime error if <code>prefix</code> is not a string. </p>
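<p>A short example:</p>

```wren
var arg = "--verbose"
System.print(arg.startsWith("--")) //> true
System.print(arg.startsWith("-v")) //> false
```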
<h3><strong>+</strong>(other) operator <a href="#+(other)-operator" name="+(other)-operator" class="header-anchor">#</a></h3>
<p>Returns a new string that concatenates this string and <code>other</code>. </p>
<p>It is a runtime error if <code>other</code> is not a string. </p>
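<p>To concatenate a non-string value, convert it with <code>toString</code> first:</p>

```wren
var lives = 3
// "Lives: " + lives on its own would be a runtime error.
System.print("Lives: " + lives.toString) //> Lives: 3
```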
<h3><strong>==</strong>(other) operator <a href="#==(other)-operator" name="==(other)-operator" class="header-anchor">#</a></h3>
<p>Checks if the string is equal to <code>other</code>. </p>
<h3><strong>!=</strong>(other) operator <a href="#=(other)-operator" name="=(other)-operator" class="header-anchor">#</a></h3>
<p>Checks if the string is not equal to <code>other</code>. </p>
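<p>Equality compares contents, so two separately built strings with the same bytes are equal:</p>

```wren
var a = "abc"
var b = "ab" + "c"
System.print(a == b)     //> true
System.print(a != "abd") //> true
```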
<h3><strong>[</strong>index<strong>]</strong> operator <a href="#[index]-operator" name="[index]-operator" class="header-anchor">#</a></h3>
<p>Returns a string containing the code point starting at byte <code>index</code>. </p>
<div class="codehilite"><pre><span></span>System.print(&quot;ʕ•ᴥ•ʔ&quot;[5]) //&gt; ᴥ
</pre></div>


<p>Since <code>ʕ</code> is two bytes in UTF-8 and <code>•</code> is three, byte offset 5 is where the
bear&rsquo;s nose begins. </p>
<p>If <code>index</code> points into the middle of a UTF-8 sequence or at otherwise invalid
UTF-8, this returns a one-byte string containing the byte at that index: </p>
<div class="codehilite"><pre><span></span>System.print(&quot;I ♥ NY&quot;[3]) //&gt; (one-byte string [153])
</pre></div>


<p>It is a runtime error if <code>index</code> is greater than or equal to the number of bytes in the
string. </p>
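<p>For <code>"abc"</code> the valid offsets are 0 through 2. A sketch of hitting the bounds check, caught with the fiber's <code>try</code> method (the exact message is not shown):</p>

```wren
System.print("abc"[2]) //> c

// Offset 3 is one past the last byte, so this aborts the fiber.
var fiber = Fiber.new {
  var c = "abc"[3]
}
var error = fiber.try()
System.print(error != null) //> true
```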
  </main>
</div>
<footer>
  <div class="page">
    <div class="main-column">
    <p>Wren lives
      <a href="https://github.com/munificent/wren">on GitHub</a>
      &mdash; Made with &#x2764; by
      <a href="http://journal.stuffwithstuff.com/">Bob Nystrom</a> and
      <a href="https://github.com/munificent/wren/blob/master/AUTHORS">friends</a>.
    </p>
    </div>
  </div>
</footer>
</body>
</html>