C library for processing UTF-8 and UTF-32 data
Justin Lecher 3a6fc5b2a2 Enhance build process
* Allow optimization flags in CFLAGS to be overwritten
* Use Uppercase CC and CFLAGS
* Create all soname symlinks

Signed-off-by: Justin Lecher <jlec@gentoo.org>
2015-05-29 16:34:24 +02:00
bench build bench/bench for make check, to lessen the chance that it bitrots again 2015-03-28 14:47:29 -04:00
data Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
test Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
.gitignore Fix #26: use doxygen for generating API docs 2015-03-21 21:23:02 -04:00
.travis.yml fix #2: add charwidth function 2015-03-12 12:10:19 -04:00
appveyor.yml Run appveyor also on release branches 2015-04-04 21:30:29 -07:00
CMakeLists.txt Temporary fix for getting VERSION and SOVERSION into cmake 2015-03-09 16:27:40 -07:00
Doxyfile Fix #26: use doxygen for generating API docs 2015-03-21 21:23:02 -04:00
LICENSE.md updated NEWS etc. for 1.2 release 2015-03-28 09:10:00 -04:00
lump.md Minimal cmake build script 2015-03-08 17:30:09 -07:00
Makefile Enhance build process 2015-05-29 16:34:24 +02:00
NEWS.md updated NEWS etc. for 1.2 release 2015-03-28 09:10:00 -04:00
README.md fix link 2015-03-07 19:13:37 -05:00
utf8proc_data.c Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
utf8proc.c Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
utf8proc.h Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
utils.cmake Minimal cmake build script 2015-03-08 17:30:09 -07:00

utf8proc


utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects.

(utf8proc is used for basic Unicode support in the Julia language, and the Julia developers became involved because they wanted to add Unicode 7 support and other features.)

(The original utf8proc package also includes Ruby and PostgreSQL plug-ins. We removed those from utf8proc in order to focus exclusively on the C library for the time being, but plan to add them back in or release them as separate packages.)

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.

Quick Start

To compile the C library, run make.

General Information

After a successful build, the C library is in this directory: libutf8proc.a (the static library) and libutf8proc.so (the dynamic library).

The supported Unicode version is 7.0.0.

For Unicode normalizations, the following options are used:

  • Normalization Form C: STABLE, COMPOSE
  • Normalization Form D: STABLE, DECOMPOSE
  • Normalization Form KC: STABLE, COMPOSE, COMPAT
  • Normalization Form KD: STABLE, DECOMPOSE, COMPAT
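The list above amounts to a small lookup table from normalization form to option set. The sketch below illustrates that mapping only: the `OPT_*` constants and `options_for_form` are hypothetical stand-ins named after the options in the list, with arbitrary bit values, and are not the library's own definitions (those live in utf8proc.h).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for the option names listed above.
 * The real constants are defined in utf8proc.h; these values are arbitrary. */
enum {
    OPT_STABLE    = 1 << 0,
    OPT_COMPAT    = 1 << 1,
    OPT_COMPOSE   = 1 << 2,
    OPT_DECOMPOSE = 1 << 3
};

/* Returns the option set for form "C", "D", "KC", or "KD",
 * or -1 if the form name is not recognized. */
int options_for_form(const char *form)
{
    static const struct { const char *name; int options; } table[] = {
        { "C",  OPT_STABLE | OPT_COMPOSE },
        { "D",  OPT_STABLE | OPT_DECOMPOSE },
        { "KC", OPT_STABLE | OPT_COMPOSE | OPT_COMPAT },
        { "KD", OPT_STABLE | OPT_DECOMPOSE | OPT_COMPAT },
    };
    size_t i;

    assert(form != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(form, table[i].name) == 0)
            return table[i].options;
    }
    return -1;
}
```

Every form includes STABLE, the K forms add COMPAT, and COMPOSE and DECOMPOSE never appear together.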

C Library

The documentation for the C library is found in the utf8proc.h header file. utf8proc_map is the function you will most likely use for mapping UTF-8 strings, unless you want to allocate memory yourself.

To Do

See the GitHub issues list.

Contact

Bug reports, feature requests, and other queries can be filed at the utf8proc issues page on GitHub.