Rationalize string lengths.

The .count getter on String returns the number of code points. That's O(n) in the length of the string, but it's consistent with the rest of the main string API. If you want the number of bytes, use "string".bytes.count. Updated the docs. Fixes 68. Woo!
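
For illustration, here is a minimal sketch of the difference, reusing the string from the tests in this commit. The expected values assume the behavior described above (code points for .count, raw bytes for .bytes.count). The byte total is read off the 0-14 byte index in the test comment, not from a run of the interpreter.

```wren
// .count walks the string and counts code points.
IO.print("søméஃthîng".count)        // expect: 10

// .bytes.count counts the underlying UTF-8 bytes.
IO.print("søméஃthîng".bytes.count)  // expect: 15

// Pure ASCII strings give the same answer either way.
IO.print("a string".count)          // expect: 8
IO.print("a string".bytes.count)    // expect: 8
```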
2015-09-11 21:33:26 -07:00
parent c0b5ec9f15
commit fe143644b3
4 changed files with 78 additions and 30 deletions
--- a/test/core/string/count.wren
+++ b/test/core/string/count.wren
@@ -6,3 +6,13 @@ IO.print("\0".count)  // expect: 1
 IO.print("a\0b".count)  // expect: 3
 IO.print("\0c".count)  // expect: 2
 IO.print(("a\0b" + "\0c").count)  // expect: 5
+
+// Treats a UTF-8 sequence as a single item.
+//
+// Bytes:           11111
+//        012345678901234
+// Chars: sø mé ஃ  thî ng
+IO.print("søméஃthîng".count) // expect: 10
+
+// Counts invalid UTF-8 one byte at a time.
+IO.print("\xefok\xf7".count) // expect: 4
--- a/test/core/string_code_point_sequence/count.wren
+++ b/test/core/string_code_point_sequence/count.wren
@@ -0,0 +1,18 @@
+IO.print("".codePoints.count)   // expect: 0
+IO.print("a string".codePoints.count) // expect: 8
+
+// 8-bit clean.
+IO.print("\0".codePoints.count)  // expect: 1
+IO.print("a\0b".codePoints.count)  // expect: 3
+IO.print("\0c".codePoints.count)  // expect: 2
+IO.print(("a\0b" + "\0c").codePoints.count)  // expect: 5
+
+// Treats a UTF-8 sequence as a single item.
+//
+// Bytes:           11111
+//        012345678901234
+// Chars: sø mé ஃ  thî ng
+IO.print("søméஃthîng".codePoints.count) // expect: 10
+
+// Counts invalid UTF-8 one byte at a time.
+IO.print("\xefok\xf7".codePoints.count) // expect: 4