Commit Graph

52 Commits

Author SHA1 Message Date
Tim Gates
6f7d73071a
docs: fix simple typo, encounted -> encountered (#201)
There is a small typo in utf8proc.h.

Should read `encountered` rather than `encounted`.
2020-10-09 08:30:50 -04:00
Steven G. Johnson
5622a0a51b
add islower/isupper functions (#196)
* add islower/isupper functions

* added test

* more tests + bugfix

* Makefile fix

* rm iscase test on make clean
2020-08-25 16:42:59 -04:00
Steven G. Johnson
2bb7d884b5 version bump to 2.5 2020-03-27 17:22:21 -04:00
Steven G. Johnson
e6fba4aa8c update header file comments (closes #157) 2019-05-14 10:53:55 -04:00
Steven G. Johnson
5c632c5742 NEWS for 2.4, updated version numbers (which I forgot in 2.3, grrr) 2019-05-10 21:24:14 -04:00
Steven G. Johnson
abf81603ba
add utf8proc_unicode_version (#151) 2019-03-30 16:31:02 -04:00
Steven G. Johnson
3637d51855 doc clarification (closing #110) 2019-03-30 16:04:14 -04:00
Steven G. Johnson
6a659a5843 doc fixes, don't export stdint and limits.h values UINT16_MAX and SSIZE_MAX 2018-07-24 13:32:42 -04:00
Steven G. Johnson
e0295be467 Merge branch 'master' of https://github.com/JuliaLang/utf8proc 2018-07-24 13:25:51 -04:00
Steven G. Johnson
60a2398184 copyright year updates 2018-07-24 13:20:49 -04:00
Steven G. Johnson
d4a58cfec5
update data and algorithms for Unicode 11 (#140) 2018-07-24 13:18:48 -04:00
Steven G. Johnson
8639450134 NEWS for upcoming 2.2 release, version bump 2018-05-02 08:23:40 -04:00
Steven G. Johnson
bdc8b9e4b2
Case folding fixes (#133)
* Fixes allowing for “Full” folding and NFKC_CaseFold compliance.

* Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive.
* Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF.

* Document the changes to UTF8PROC_IGNORE in header.

* Add NFKC_CF helper function with documentation.

* restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test

* success message

* test that IGNORE does not strip NA

* data update

* NFKC_Casefold shouldn't strip NA
2018-05-02 08:15:02 -04:00
past-due
48949bd3eb Static library support improvements (#123)
* `#define UTF8PROC_STATIC` to disable DLLEXPORT

* [CMake] Automatically define UTF8PROC_STATIC if BUILD_SHARED_LIBS is off

* [Makefile] Support additional UTF8PROC_DEFINES, which can be used to specify flags like `-DUTF8PROC_STATIC`
2018-04-29 21:37:12 -04:00
Steven G. Johnson
d688ac1226
version bump to 2.1.1 (#131) 2018-04-27 09:58:34 -04:00
Christopher Baker
2a2f97e193 Update documentation to reflect Unicode 9.0.0. (#107)
This makes the inline documentation match the README.
2017-06-08 09:29:54 -07:00
Árpád Goretity
31a8788886 removed inclusion of non-portable header file (#94) 2017-01-14 08:12:29 -05:00
Steven G. Johnson
4ac3154acc whoops 2016-12-11 16:18:52 -05:00
Steven G. Johnson
78f336addd use ptrdiff_t rather than ssize_t, as ssize_t is non-standard (it is POSIX, not C) 2016-12-11 16:17:11 -05:00
Steven G. Johnson
59334e4499 use stdbool.h and inttypes.h in MSVC 2013 and later, and use more C99-compatible definitions of false and true earlier (fix #90) 2016-12-11 07:16:48 -05:00
Steven G. Johnson
b4621f43c3 new utf8proc_map_custom for hooking in user-defined custom mappings (#89)
* new utf8proc_map_custom for hooking in user-defined custom mappings

* whoops, add test program

* NEWS, version bump for 2.1

* change test functions to static so that gcc doesn't complain about missing prototypes
2016-11-30 10:40:26 -05:00
Steven G. Johnson
f5567f306a typo in docstrings 2016-11-29 13:49:03 -05:00
Michael Drake
70bbed8626 Tlsa/ucs4 normalize (#88)
* Split codepoint sequence normalisation out into separate function.

This creates utf8proc_normalize_utf32() which takes and returns
a UTF-32 string, applying the following options:

- UTF8PROC_NLF2LS
- UTF8PROC_NLF2PS
- UTF8PROC_NLF2LF
- UTF8PROC_STRIPCC
- UTF8PROC_COMPOSE
- UTF8PROC_STABLE

The utf8proc_reencode() function has been updated to call the
new utf8proc_normalize_utf32().

* Update code documentation: utf8proc_reencode handles UTF8PROC_CHARBOUND.
2016-11-21 09:22:39 -05:00
Jakub Vít
caef918abd Change definition of UINT16_MAX macro (#84)
Change UINT16_MAX from `~(utf8proc_uint16_t)0` to fixed value `65535U` to prevent weird behaviour in complex expressions.
2016-09-04 14:44:38 -04:00
Tony Kelman
8e3174f334 NEWS and version numbers for 2.0.2 (#81)
* Add NEWS.md items for #79 and #80

* Prepare version numbers for 2.0.2

* Also update API version to 2.0.2
2016-07-27 07:58:49 -04:00
Steven G. Johnson
f0bf106569 NEWS and version bump for 2.0.1 release, to come out shortly 2016-07-13 12:39:05 -04:00
Keno Fischer
c0a1ff81fc Walk back ABI breaking changes (#76) 2016-07-13 10:41:13 -04:00
Benito van der Zander
eeebf70bcf Smaller tables (#68)
* convert sequences to utf-16 (saves 25kb)

* store sequence length in properties instead of using -1 termination (saves 10kb)

* cache index for slightly faster data creation

* store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time

* change combination array data type to uint16 (saves 40kb)

* merge 1st and 2nd comb index (saves 50kb)

* kill empty prefix/suffix in combination array (saves 50kb)

* there was no need to have a separate combination start array, it can be merged in a single array

* some fixes

* mark the table as const again

* and regen
2016-07-12 11:51:50 -04:00
Keno Fischer
41c6b23aab Unicode 9 updates (#70)
* Updates for Unicode 9.0.0 TR29 Changes

- New rules GB10 and GB12/GB13 are used, respectively, to combine
  emoji-zwj sequences and to force grapheme breaks every two RI
  codepoints. Unfortunately this breaks statelessness of
  grapheme-boundary determination. Deal with this by ignoring the
  problem in utf8proc_grapheme_break, and by hacking in a special
  case in decompose.

- ZWJ moved to its own boundclass, update what is now GB9 accordingly.

- Add comments to indicate which rule a given case implements

- The number of bound classes now exceeds 4 bits; expand to 8 and
  reorganize fields

* Import Unicode 9 data

* Update Grapheme break API to expose state override

* Bump MAJOR version
2016-06-28 16:04:25 -04:00
Michaël Meyer
26436c9775 Reduce the size of the binary.
Use integers instead of pointers in Unicode tables. Saves 226 kb / 716 kb in the
compiled library.
2015-12-09 19:55:48 +01:00
Steven G. Johnson
6d4d7a9acf update Unicode version in header-file comment 2015-11-01 08:36:04 -05:00
Steven G. Johnson
fd20b184dd update copyright statements to list recent contributors and year 2015-11-01 08:34:01 -05:00
Steven G. Johnson
d75985cf09 bump API/ABI version to 1.3, add NEWS 2015-05-29 23:07:29 -04:00
Steven G. Johnson
a8fb4b1772 add toupper/tolower functions (for JuliaLang/julia#11471) 2015-05-29 22:00:30 -04:00
Scott Paul Jones
6249e6b8b1 Fix #34 handle 66 Unicode non-characters, also improve performance and surrogate handling 2015-05-29 19:50:03 +02:00
Tony Kelman
0a818c7003 Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
Tony Kelman
ad27722923 Use a new typedef utf8proc_ssize_t to avoid define collisions with MSVC
2015-04-05 20:06:13 -07:00
Steven G. Johnson
a1c429a45b rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent conflicts with other header files that define DLLEXPORT 2015-03-30 11:05:51 -04:00
Steven G. Johnson
41287a1116 more documentation English and formatting cleanups 2015-03-27 14:05:57 -04:00
Steven G. Johnson
2f8469c3cc some documentation improvements 2015-03-27 13:37:59 -04:00
Steven G. Johnson
11d2ece545 indentation consistency 2015-03-27 12:49:16 -04:00
Steven G. Johnson
c851c67888 put the API version as #defines in the header file (as discussed in #30) 2015-03-27 12:35:41 -04:00
Steven G. Johnson
32c605cfa7 mainpage dox tweaks 2015-03-23 11:06:19 -04:00
Jonas Fonseca
03a4e8854a Fix #26: use doxygen for generating API docs 2015-03-21 21:23:02 -04:00
Steven G. Johnson
dad0cbdcab update NEWS for 1.2-dev 2015-03-12 14:29:33 -04:00
Steven G. Johnson
3822984606 remove requirement that get_property and decompose_char argument be in range 0x0 to 0x10ffff 2015-03-12 14:17:27 -04:00
Steven G. Johnson
a4c84d2063 fix #2: add charwidth function 2015-03-12 12:10:19 -04:00
Tony Kelman
a8b688c734 Minimal cmake build script
move flags for MSVC

rename lump.txt to lump.md, add data/*.txt to .gitignore
2015-03-08 17:30:09 -07:00
Steven G. Johnson
402883c78e rename back to utf8proc now that we are taking over maintenance 2015-03-06 12:43:37 -05:00
Steven G. Johnson
2c4e520a17 utf8proc.h -> mojibake.h (closes #10) 2014-07-18 14:28:17 -04:00