Unicode 9 updates (#70)

* Updates for Unicode 9.0.0 TR29 Changes

- New rules GB10 and GB12/GB13 combine emoji-zwj sequences and force a
  grapheme break after every two RI codepoints, respectively.
  Unfortunately, this breaks the statelessness of grapheme-boundary
  determination. Deal with this by ignoring the problem in
  utf8proc_grapheme_break and by hacking a special case into decompose.

- ZWJ moved to its own boundclass; update what is now GB9 accordingly.

- Add comments to indicate which rule a given case implements.

- The number of bound classes no longer fits in 4 bits; expand the
  field to 8 bits and reorganize fields.

* Import Unicode 9 data

* Update Grapheme break API to expose state override

* Bump MAJOR version
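
To illustrate why the RI rules cannot be decided from two adjacent
codepoints alone, here is a minimal sketch in C. It is not the code from
this commit: `is_regional_indicator` and `ri_break_between` are
hypothetical names, and the sketch models only the Regional Indicator
pairing (every other pair is simply reported as a break). The caller
keeps an integer state, initialized to 0, that counts how many RI
codepoints the current run contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Regional Indicator symbols are U+1F1E6..U+1F1FF. */
static bool is_regional_indicator(int32_t cp) {
    return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

/*
 * Hypothetical stateful check, modelling only the RI pairing rules.
 * *ri_count is the number of RI codepoints in the current run *before*
 * `prev`; the caller initializes it to 0 and passes it to every call.
 * Returns true if a grapheme break is permitted between prev and next.
 * Pairs that do not consist of two RIs are reported as breaks, which is
 * a simplification made for this sketch.
 */
static bool ri_break_between(int32_t prev, int32_t next, int *ri_count) {
    assert(ri_count != NULL);

    if (!is_regional_indicator(prev) || !is_regional_indicator(next)) {
        *ri_count = 0;   /* the RI run (if any) ends here */
        return true;
    }

    *ri_count += 1;      /* prev extends the current RI run */

    /* An odd count means prev is the first half of a pair: do not break.
     * An even count means a pair was just completed: break. */
    return (*ri_count % 2) == 0;
}
```

Feeding four consecutive RI codepoints through this function with one
shared state yields no break, break, no break, so the run splits into
two pairs. A stateless two-codepoint check sees "RI followed by RI" in
all three positions and cannot tell them apart.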
Keno Fischer
2016-06-28 16:04:25 -04:00
committed by Steven G. Johnson
parent 3d0576a9b9
commit 41c6b23aab
7 changed files with 11517 additions and 11113 deletions

@@ -19,9 +19,9 @@ UCFLAGS = $(CFLAGS) $(PICFLAG) $(C99FLAG) $(WCFLAGS) -DUTF8PROC_EXPORTS
 # not API compatibility: MAJOR should be incremented whenever *binary*
 # compatibility is broken, even if the API is backward-compatible
 # Be sure to also update these in MANIFEST and CMakeLists.txt!
-MAJOR=2
+MAJOR=3
 MINOR=0
-PATCH=1
+PATCH=0
 OS := $(shell uname)
 ifeq ($(OS),Darwin) # MacOS X