Unicode 9 updates (#70)

* Updates for Unicode 9.0.0 TR29 changes

- New rules GB10 and GB12/GB13 are used, respectively, to combine
  emoji-zwj sequences and to force a grapheme break after every pair of
  RI (regional indicator) codepoints. Unfortunately, this breaks the
  statelessness of grapheme-boundary determination. Deal with this by
  ignoring the problem in utf8proc_grapheme_break and by hacking in a
  special case in decompose.

- ZWJ moved to its own boundclass; update what is now GB9 accordingly.

- Add comments to indicate which rule a given case implements

- The number of bound classes no longer fits in 4 bits; expand the field
  to 8 bits and reorganize fields

* Import Unicode 9 data

* Update the grapheme break API to expose a state override

* Bump MAJOR version
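Why the regional indicator rules break statelessness can be seen in a small sketch. It models only GB12/GB13 and treats every other pair as a break (GB999); the functions `is_ri`, `ri_break` and `count_clusters` are hypothetical illustrations, not utf8proc's API, and the caller-owned `pending_ri` flag is only an assumption about what a "state override" might carry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Regional indicator symbols A..Z occupy U+1F1E6..U+1F1FF. */
static bool is_ri(uint32_t cp) {
    return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

/* Hypothetical helper (not utf8proc's API): is there a grapheme break
 * between c1 and c2? Only GB12/GB13 are modelled; every other pair breaks.
 * *pending_ri is caller-owned state: true when c1 is a regional indicator
 * still waiting for its partner. */
static bool ri_break(uint32_t c1, uint32_t c2, bool *pending_ri) {
    if (*pending_ri && is_ri(c1) && is_ri(c2)) {
        *pending_ri = false;  /* c2 completes the pair: no break */
        return false;
    }
    *pending_ri = is_ri(c2);  /* an RI after a break starts a new pair */
    return true;
}

/* Counts grapheme clusters under the simplified rules above. */
static size_t count_clusters(const uint32_t *cps, size_t n) {
    if (n == 0) return 0;
    bool pending_ri = is_ri(cps[0]);  /* seed state from the first codepoint */
    size_t clusters = 1;
    for (size_t i = 1; i < n; i++) {
        if (ri_break(cps[i - 1], cps[i], &pending_ri)) clusters++;
    }
    return clusters;
}
```

Given only the pair (U+1F1F8, U+1F1EC), a pairwise function cannot tell whether U+1F1F8 completed a flag or started one; the single `pending_ri` bit carried across calls is exactly the information a stateless interface lacks.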
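The bit-width arithmetic behind the field expansion can be checked at compile time. The enum below lists the 18 Grapheme_Cluster_Break values defined by Unicode 9 TR29; its names and ordering are hypothetical rather than utf8proc's own constants, and the library's real enum may carry extra entries. `property_sketch` and `truncate_to_4_bits` are likewise illustrative only.

```c
#include <stdint.h>

/* Hypothetical enum: the 18 Grapheme_Cluster_Break values of Unicode 9
 * TR29. Names and order are illustrative, not utf8proc's constants. */
typedef enum {
    GCB_OTHER, GCB_CR, GCB_LF, GCB_CONTROL, GCB_EXTEND, GCB_ZWJ,
    GCB_REGIONAL_INDICATOR, GCB_PREPEND, GCB_SPACINGMARK,
    GCB_L, GCB_V, GCB_T, GCB_LV, GCB_LVT,
    GCB_E_BASE, GCB_E_MODIFIER, GCB_GLUE_AFTER_ZWJ, GCB_E_BASE_GAZ,
    GCB_COUNT /* number of classes, not a class itself */
} gcb_class;

/* 4 bits hold at most 16 distinct values; 8 bits hold 256. */
_Static_assert(GCB_COUNT > (1 << 4), "classes no longer fit in 4 bits");
_Static_assert(GCB_COUNT <= (1 << 8), "classes fit in 8 bits");

/* Hypothetical property record with the class widened to 8 bits. */
typedef struct {
    unsigned boundclass : 8;
} property_sketch;

/* What a 4-bit field would keep of a class value: only the low 4 bits. */
static unsigned truncate_to_4_bits(gcb_class c) {
    return (unsigned)c & 0xFu;
}
```

With a 4-bit field, the last class (value 17) would be stored as 1 and collide with CR, which is why the width has to grow.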
Keno Fischer
2016-06-28 16:04:25 -04:00
committed by Steven G. Johnson
parent 3d0576a9b9
commit 41c6b23aab
7 changed files with 11517 additions and 11113 deletions

@@ -2,6 +2,6 @@ include/
 include/utf8proc.h
 lib/
 lib/libutf8proc.a
-lib/libutf8proc.so -> libutf8proc.so.2.0.1
-lib/libutf8proc.so.2 -> libutf8proc.so.2.0.1
-lib/libutf8proc.so.2.0.1
+lib/libutf8proc.so -> libutf8proc.so.3.0.0
+lib/libutf8proc.so.3 -> libutf8proc.so.3.0.0
+lib/libutf8proc.so.3.0.0