
Steven G. Johnson 3c4929495a

unicode 15 support (#247)

2022-10-24 23:18:17 -04:00


utf8proc release history

Version 2.8.0-alpha

Unicode 15 support (#247).

Version 2.7.0

2021-12-16

Unicode 14 support (#233).
Support GNUInstallDirs in CMake build (#159).
CMake build now installs a pkg-config file (#224).
Various build and portability improvements.

Version 2.6.1

2020-12-15

Bugfix in utf8proc_grapheme_break_stateful for NULL state argument, which also broke utf8proc_grapheme_break.

Version 2.6

2020-11-23

New utf8proc_islower and utf8proc_isupper functions (#196).
Bugfix for manual calls to grapheme_break_extended for initial characters (#205).
Various build and portability improvements.

Version 2.5

2020-03-27

Unicode 13 support (#179).
No longer report zero width for category Sk (#167).
cmake support improvements (#173).

Version 2.4

2019-05-10

Unicode 12.1 support (#156).
New -DUTF8PROC_INSTALL=No option for cmake builds to disable installation (#152).
Better make support for HP-UX (#154).
Fixed incorrect UTF8PROC_VERSION_MINOR version number in header and bumped shared-library version.

Version 2.3

2019-03-30

Unicode 12 support (#148).
New function utf8proc_unicode_version to return the supported Unicode version (#151).
Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 (#150).
Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters (#149).
Various build-system improvements (#141, #142, #147).

Version 2.2

2018-07-24

Unicode 11 support (#132 and #140).
utf8proc_NFKC_Casefold convenience function for NFKC_Casefold normalization (#133).
UTF8PROC_STRIPNA option to strip unassigned codepoints (#133).
Support building static libraries on Windows (callers need to #define UTF8PROC_STATIC) (#123).
cmake fix to avoid defining UTF8PROC_EXPORTS globally (#121).
toupper of ß (U+00DF) now yields ẞ (U+1E9E) (#134), similar to musl; case-folding still yields the standard "ss" mapping.
utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and for unassigned/PUA codepoints (#135).

Version 2.1.1

2018-04-27

Fixed composition bug (#128).
Minor build fixes (#94, #99, #113, #125).

Version 2.1

2016-12-26:

New functions utf8proc_map_custom and utf8proc_decompose_custom to allow user-supplied transformations of codepoints, in conjunction with other transformations (#89).
New function utf8proc_normalize_utf32 to apply normalizations directly to UTF-32 data (not just UTF-8) (#88).
Fixed stack overflow that could occur due to incorrect definition of UINT16_MAX with some compilers (#84).
Fixed conflict with stdbool.h in Visual Studio (#90).
Updated font metrics to use Unifont 9.0.04.

Version 2.0.2

2016-07-27:

Move -Wmissing-prototypes warning flag from Makefile to .travis.yml since MSVC does not understand this flag and it is occasionally useful to build using MSVC through the Makefile (#79).
Use a different variable name for a nested loop in bench/bench.c, and declare it in a C89 way rather than inside the for to avoid "error: 'for' loop initial declarations are only allowed in C99 mode" (#80).

Version 2.0.1

2016-07-13:

Bug fix in utf8proc_grapheme_break_stateful (#77).
Tests now use versioned Unicode files, so they will no longer break when a new version of Unicode is released (#78).

Version 2.0

2016-07-13:

Updated for Unicode 9.0 (#70).
New utf8proc_grapheme_break_stateful to handle the complicated grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break is still provided, but may incorrectly identify grapheme breaks in some Unicode-9 sequences.
Smaller Unicode tables (#62, #68). This required changes in the utf8proc_property_t structure, which breaks backward compatibility if you access this struct directly. The functions in the API remain backward-compatible, however.
Buffer overrun fix (#66).

Version 1.3.1

2015-11-02:

Do not export symbol for internal function unsafe_encode_char() (#55).
Install relative symbolic links for shared libraries (#58).
Enable and fix compiler warnings (#55, #58).
Add missing files to make clean (#58).

Version 1.3

2015-07-06:

Updated for Unicode 8.0 (#45).
New utf8proc_tolower and utf8proc_toupper functions, portable replacements for towlower and towupper in the C library (#40).
Don't treat Unicode "non-characters" as invalid, and improved validity checking in general (#35).
Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t, to avoid collisions with other libraries (#32).
Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
Fix build breakage in the benchmark routines.
More fine-grained Makefile variables (PICFLAG etcetera), so that compilation flags can be selectively overridden, and in particular so that CFLAGS can be changed without accidentally eliminating necessary flags like -fPIC and -std=c99 (#43).
Updated character-width tables based on Unifont 8.0.01 (#51) and the Unicode 8 character categories (#47).

Version 1.2

2015-03-28:

Updated for Unicode 7.0 (#6).
New function utf8proc_grapheme_break(c1,c2) that returns whether there is a grapheme break between c1 and c2 (#20).
New function utf8proc_charwidth(c) that returns the number of column positions required to display c; essentially a portable replacement for wcwidth(c) (#27).
New function utf8proc_category(c) that returns the Unicode category of c (as one of the constants UTF8PROC_CATEGORY_xx). Also, a function utf8proc_category_string(c) that returns the Unicode category of c as a two-character string.
cmake script CMakeLists.txt, in addition to Makefile, for easier compilation on Windows (#28).
Various Makefile improvements: a make check target to perform tests (#13), make install, a rule to automate updating the Unicode tables, etcetera.
The shared library is now versioned (e.g. has a soname on GNU/Linux) (#24).
C++/MSVC compatibility (#17).
Most #defined constants are now enums (#29).
New preprocessor constants UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time detection of the API version.
Doxygen-formatted documentation (#29).
The Ruby and PostgreSQL plugins have been removed due to lack of testing (#22).
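The version macros added in 1.2 allow feature tests at compile time. A minimal sketch follows, assuming a compiler that supports `__has_include` (a GCC/Clang extension, standardized in C23) so that it still builds where utf8proc is not installed. `HAVE_UTF8PROC_UNICODE_VERSION` and `get_compiled_utf8proc_version` are hypothetical names, not part of utf8proc; the 2.3 cutoff comes from the 2.3 entry, which introduced `utf8proc_unicode_version`.

```c
/* Include utf8proc.h only if it can be found. __has_include is a
   GCC/Clang extension (standardized in C23), so probe for it first. */
#if defined(__has_include)
#  if __has_include(<utf8proc.h>)
#    include <utf8proc.h>
#  endif
#endif

/* Hypothetical feature macro: utf8proc_unicode_version() first shipped in
   2.3 according to the release notes. Some headers carried an incorrect
   UTF8PROC_VERSION_MINOR (fixed in 2.4), so treat this as approximate. */
#if defined(UTF8PROC_VERSION_MAJOR) && defined(UTF8PROC_VERSION_MINOR) && \
    (UTF8PROC_VERSION_MAJOR > 2 || \
     (UTF8PROC_VERSION_MAJOR == 2 && UTF8PROC_VERSION_MINOR >= 3))
#  define HAVE_UTF8PROC_UNICODE_VERSION 1
#else
#  define HAVE_UTF8PROC_UNICODE_VERSION 0
#endif

/* Hypothetical helper: report the header version this file was compiled
   against. Returns 1 if the version macros were found, else 0 (and zeros). */
int get_compiled_utf8proc_version(int *major, int *minor, int *patch)
{
#if defined(UTF8PROC_VERSION_MAJOR) && defined(UTF8PROC_VERSION_MINOR) && \
    defined(UTF8PROC_VERSION_PATCH)
    *major = UTF8PROC_VERSION_MAJOR;
    *minor = UTF8PROC_VERSION_MINOR;
    *patch = UTF8PROC_VERSION_PATCH;
    return 1;
#else
    *major = *minor = *patch = 0;
    return 0;
#endif
}
```

Only macros are consulted, so the check costs nothing at run time and never requires linking against the library.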

Version 1.1.6

2013-11-27:

PostgreSQL 9.2 and 9.3 compatibility (lowercase c language name)

Version 1.1.5

2009-08-20:

Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for ruby1.9 compatibility (and #define them if they do not already exist)

2009-10-02:

Patches for compatibility with Microsoft Visual Studio

2009-10-08:

Fixes to make utf8proc usable in C++ programs

2009-10-16:

Release of version 1.1.5

Version 1.1.4

2009-06-14:

replaced C++ style comments for compatibility reasons
added typecasts to suppress compiler warnings
removed redundant source files for ruby-gemfile generation

2009-08-19:

Changed copyright notice for Public Software Group e. V.
Minor changes in the README file

Version 1.1.3

2008-10-04:

Added a function utf8proc_version returning a string containing the version number of the library.
Included a target libutf8proc.dylib for MacOSX.

2009-05-01:

PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)

Version 1.1.2

2007-07-25:

Fixed a serious bug in the data file generator that caused characters to be treated incorrectly when stripping default ignorable characters or calculating grapheme cluster boundaries.

Version 1.1.1

2007-06-25:

Added a new PostgreSQL function unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).

2007-07-22:

Changed license from BSD to MIT style.
Added a new function utf8proc_codepoint_valid to the C library.
Changed compiler flags in Makefile from -g -O0 to -O2
The ruby script, which was used to build the utf8proc_data.c file, is now included in the distribution.
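utf8proc_codepoint_valid reports whether an integer is an acceptable code point. The standalone sketch below is not utf8proc's source; it assumes that "valid" means a Unicode scalar value (0 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF), which agrees with the 1.3 entry stating that non-characters are no longer treated as invalid. The name codepoint_valid_sketch is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a code point validity check, written from the
   Unicode definition of a scalar value. Non-characters such as U+FFFE and
   U+FFFF are accepted; only out-of-range values and surrogates are rejected. */
bool codepoint_valid_sketch(int32_t c)
{
    if (c < 0 || c > 0x10FFFF)
        return false;   /* negative, or beyond the Unicode code space */
    if (c >= 0xD800 && c <= 0xDFFF)
        return false;   /* UTF-16 surrogate halves are not scalar values */
    return true;
}
```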

Version 1.0.3

2007-03-16:

Fixed a bug in the ruby library that caused an error when splitting an empty string at grapheme cluster boundaries (method String#utf8chars).

Version 1.0.2

2006-09-21:

included a check in Integer#utf8 that raises an exception if the given code point is invalid because it is too high (this check was previously missing)

2006-12-26:

added support for PostgreSQL version 8.2

Version 1.0.1

2006-09-20:

included a gem file for the ruby version of the library

Release of version 1.0.1

Version 1.0

2006-09-17:

added the LUMP option, which lumps certain characters together (see lump.md) (also used for the PostgreSQL unifold function)
added the STRIPMARK option, which strips marking characters (or marks of composed characters)
deprecated ruby method String#char_ary in favour of String#utf8chars

Version 0.3

2006-07-18:

changed normalization from NFC to NFKC for postgresql unifold function

2006-08-04:

added support to mark the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
added the ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
added NLF2LF transformation in postgresql unifold function
added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization is performed (unlike previous versions)
using integer constants rather than C-strings for character properties
fixed (hopefully) a problem with the ruby library on Mac OS X, which occurred when compiler optimization was switched on
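Because the byte 0xFF never occurs in well-formed UTF-8, the CHARBOUND marker lets a caller split the output into grapheme clusters with a plain byte scan. The sketch below assumes buf and len come from utf8proc_map called with UTF8PROC_CHARBOUND, and that exactly one 0xFF precedes each cluster; cluster_span and split_charbound are hypothetical names, and the test bytes are hand-written rather than captured from the library.

```c
#include <stddef.h>

/* One grapheme cluster: byte offset and length within the buffer. */
typedef struct {
    size_t start;
    size_t len;
} cluster_span;

/* Split a CHARBOUND-marked buffer into clusters (assumed format: 0xFF
   before each cluster). Writes at most max spans and returns the total
   number of clusters found, which may exceed max. */
size_t split_charbound(const unsigned char *buf, size_t len,
                       cluster_span *spans, size_t max)
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        if (buf[i] != 0xFF) {   /* skip any bytes before the first marker */
            i++;
            continue;
        }
        size_t start = ++i;     /* cluster text begins after the marker */
        while (i < len && buf[i] != 0xFF)
            i++;
        if (count < max) {
            spans[count].start = start;
            spans[count].len = i - start;
        }
        count++;
    }
    return count;
}
```

Recording offsets instead of copying keeps the scan allocation-free; callers that need separate NUL-terminated clusters can copy each span themselves.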

Version 0.2

2006-06-05:

changed behaviour of PostgreSQL function to return NULL in case of invalid input, rather than raising an exceptional condition
improved efficiency of PostgreSQL function (no transformation to C string is done)

2006-06-20:

added -fpic compiler flag in Makefile
fixed bug in the C code for the ruby library (usage of non-existent function)

Version 0.1

2006-06-02: initial release of version 0.1
