utf8proc

Author	SHA1	Message	Date
Randy	93a88b4310	OSS-Fuzz integration updates (#219) * fix build * CIFuzz integration * update fuzzer * undo changes to build * ossfuzz.sh: fix copy path	2021-02-04 12:59:39 -05:00
Randy	c17ea5dfef	OSS-Fuzz initial integration (#216) * add fuzz target * update fuzzer * add fuzzer to build with basic entry point * add build script * cleanup * build fuzz target using cmake in oss-fuzz env * ossfuzz.sh add newline * update build	2021-01-29 13:54:58 -05:00
Mike Glorioso	610730f231	Fix Sign-Conversion warnings in library and test code (#214) * JuliaStrings#169 turn on sign-conversion warnings Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com> * JuliaStrings#169 fix sign-conversion warnings for utf8proc.c: fix sign-conversion warnings for utf8proc_iterate: uc requires at most 21 bits to identify a unicode codepoint, so there is no need for it to be unsigned; multiple locations use, modify, or store uc with a signed value; the only exception is line 137, where uc is compared with an unsigned value. fix sign-conversion warnings for utf8proc_tolower, utf8proc_toupper, utf8proc_totitle: all three methods have sign conversion warnings when calling seqindex_decode_index; seqindex_decode_index uses the passed value as an index to an array, utf8proc_sequences; as utf8proc_sequences is hard-coded and smaller than 2^31 - 1, we can safely cast to unsigned. fix sign-conversion warnings for utf8proc_decompose_char: lines with this warning use the defined function utf8proc_decompose_lump; in the function, a hardcoded unsigned value (1<<12) is complemented then cast as a signed value; as the intent is to remove the 12th bit flag from options, a signed value, an explicit cast is safe. fix sign-conversion warnings for utf8proc_map_custom: result is declared as signed, but is only expected to contain values between 0 and 4; sizeof returns an unsigned value; result must be cast to unsigned. Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com> * JuliaStrings#169 fix sign-conversion warnings for test/*: fix sign-conversion warnings for test/tests.c encode: change type for d to match return value of utf8proc_encode_char. fix sign-conversion warnings for test/graphemetest.c checkline: si, i, and j are unsigned size types; utf8proc_map and utf8proc_iterate accept and return signed size types; utf8proc_map treats negative strlen values as 0; the strlen used by the test must be similarly limited; utf8proc_iterate treats negative strlen values as 4, which will be less than the unsigned size; fix unused-but-set-variable warning by checking the glen value. fix sign-conversion warnings for test/case.c main: the if block ensures that tested codepoint fits in wint_t, but needs to include u and l as well; c, u, and l can be safely cast to wint_t. fix sign-conversion warnings for test/iterate.c: all values used for len are below 8, so an explicit cast is safe; updated types for more portable test code. fix sign-conversion warnings for test/printproperty.c main: change type of c to signed to resolve all sign-conversion warnings; replace sscanf(... &c) with sscanf(... &x) followed by explicit sign conversion. Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>	2021-01-14 12:59:49 -05:00
Steven G. Johnson	8239639e3f	fix NULL args in grapheme_break_stateful	2020-12-15 15:26:56 -05:00
Steven G. Johnson	0643a64479	Fix grapheme breaks on string-initial (#205) * Fix extended emoji + zwj combo * Patch initial repeated regional flags and extended+zwj emoji * Merge conditions for setting breaks bt region * updated fix * perform tests for both utf8proc_map and manual calls to utf8proc_grapheme_break_stateful * consolidate tests Co-authored-by: Thomas Marks <marksta@umich.edu>	2020-11-23 14:10:29 -05:00
Steven G. Johnson	5622a0a51b	add islower/isupper functions (#196) * add islower/isupper functions * added test * more tests + bugfix * Makefile fix * rm iscase test on make clean	2020-08-25 16:42:59 -04:00
Andreas-Schniertshauer	e51f416e0c	Fix memory leaks in tests case.c and misc.c (#189) * Add: tests to CMakeLists.txt * Disable compilation of charwidth, graphemetest and normtest because of missing getline * Refactoring: UTF8PROC_ENABLE_TESTING default Off, move tests that don't compile on windows to NOT MSVC section, add testing to appveyor.yml * Add: testing to travis * Changed: flag to WIN32 because MinGW has the same problem as MSVC * Commented out graphemetest and normtest because they fail. * Re-added: graphemetest and normtest added missing data to the path of the text files. * Fix: last commit was partly wrong, normtest failed. * Commented out graphemetest and normtest because they fail, because in CMakeLists is missing building of data. * Add: mingw_static, mingw_shared, msvc_shared, msvc_static to ignore list * Add: prefix utf8proc. to tests * Fix: memory leaks in tests case.c and misc.c forgot to call free after calling utf8proc_NFKC_Casefold Co-authored-by: Andreas-Schniertshauer <Andreas-Schniertshauer@users.noreply.github.com>	2020-03-30 07:51:44 -04:00
Steven G. Johnson	c6858e955c	use unsigned char more consistently, silence -Wextra compiler warnings (#188)	2020-03-29 10:44:42 -04:00
Steven G. Johnson	243875b456	fixes	2020-03-29 09:35:32 -04:00
Steven G. Johnson	11bb3d9dc7	fix grapheme test to work on unmodified data file	2020-03-29 08:53:11 -04:00
Steven G. Johnson	02fb59136d	silence warning (closes #184)	2020-03-28 14:00:30 -04:00
Steven G. Johnson	6fff5f32bb	compile more tests on Windows (#183) * compile more tests on Windows * still disable charwidth tests * silence warnings on MSVC about sscanf * whoops * silence warning	2020-03-28 10:00:18 -04:00
Steven G. Johnson	5f15b515e1	simplifications	2020-03-28 09:42:29 -04:00
Steven G. Johnson	d588d7097c	portable getline replacement (closes #182)	2020-03-28 09:36:58 -04:00
Steven G. Johnson	abf81603ba	add utf8proc_unicode_version (#151)	2019-03-30 16:31:02 -04:00
Steven G. Johnson	4603e00cfc	fix CHARBOUND option for non-characters (#149)	2019-03-30 15:22:25 -04:00
Steven G. Johnson	d4a58cfec5	update data and algorithms for Unicode 11 (#140)	2018-07-24 13:18:48 -04:00
Steven G. Johnson	02f4e1890c	charwidth=1 for soft hyphen and unassigned codepoints (#135) * use width=1 for soft hyphen and for unassigned/PUA codepoints * don't count unassigned codepoints when comparing with system wcwidth * more tests * indentation fixes * NEWS for 135 * remove special-casing for arabic control characters affecting a span of numbers, which are sometimes zero-width and sometimes not * regenerate	2018-07-24 10:45:02 -04:00
Steven G. Johnson	d81308faba	uppercase mapping ß (U+00df) to ẞ (U+1E9E) (#134) * uppercase(0x00df) = 0x1e9e * tests for titlecase and u+00df uppercase * NEWS, another test	2018-05-02 14:18:26 -04:00
Steven G. Johnson	bdc8b9e4b2	Case folding fixes (#133) * Fixes allowing for “Full” folding and NFKC_CaseFold compliance. * Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive. * Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF. * Document the changes to UTF8PROC_IGNORE in header. * Add NFKC_CF helper function with documentation. * restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test * success message * test that IGNORE does not strip NA * data update * NFKC_Casefold shouldn't strip NA	2018-05-02 08:15:02 -04:00
Steven G. Johnson	ba042cf728	missing return code, success message in test/misc.c	2018-04-27 09:10:38 -04:00
Steven G. Johnson	d050c4636a	make internal function static	2018-04-27 08:57:54 -04:00
Steven G. Johnson	53d7968055	added test for #128	2018-04-27 08:46:44 -04:00
Steven G. Johnson	b4621f43c3	new utf8proc_map_custom for hooking in user-defined custom mappings (#89) * new utf8proc_map_custom for hooking in user-defined custom mappings * whoops, add test program * NEWS, version bump for 2.1 * change test functions to static so that gcc doesn't complain about missing prototypes	2016-11-30 10:40:26 -05:00
Benito van der Zander	eeebf70bcf	Smaller tables (#68) * convert sequences to utf-16 (saves 25kb) * store sequence length in properties instead using -1 termination (saves 10kb) * cache index for slightly faster data creation * store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time * change combination array data type to uint16 (saves 40kb) * merge 1st and 2nd comb index (saves 50kb) * kill empty prefix/suffix in combination array (saves 50kb) * there was no need to have a separate combination start array, it can be merged in a single array * some fixes * mark the table as const again * and regen	2016-07-12 11:51:50 -04:00
Michaël Meyer	1f17487aa9	Fix overrun	2016-02-04 04:06:28 +01:00
Peter Colberg	4b16193a25	Fix sscanf argument type for format %x	2015-10-30 15:27:18 -04:00
Peter Colberg	14b57791d8	Fix missing static declarations for internal functions	2015-10-30 15:24:34 -04:00
Peter Colberg	6acc41dfe9	Fix implicit function declarations	2015-10-30 15:22:09 -04:00
Peter Colberg	548497a398	Move common test functions to separate module This resolves warnings for missing function prototypes.	2015-10-30 15:13:48 -04:00
Peter Colberg	09360de186	Do not export internal unsafe_encode_char()	2015-10-29 00:45:39 -04:00
Steven G. Johnson	6a7f92da64	fix #46 (make sure symbol-like codepoints have nonzero width even if they aren't in Unifont)	2015-06-24 14:07:15 -04:00
Steven G. Johnson	a8fb4b1772	add toupper/tolower functions (for JuliaLang/julia#11471)	2015-05-29 22:00:30 -04:00
ScottPJones	6a229a6776	Add tests for valid codepoints and iterate function	2015-05-29 20:11:10 +02:00
Scott Paul Jones	6249e6b8b1	Fix #34 handle 66 Unicode non-characters, also improve performance and surrogate handling	2015-05-29 19:50:03 +02:00
Tony Kelman	0a818c7003	Prefix other C99 typedefs with utf8proc_	2015-04-06 22:36:33 -07:00
Tony Kelman	ad27722923	Use a new typedef utf8proc_ssize_t to avoid define collisions with MSVC	2015-04-05 20:06:13 -07:00
Steven G. Johnson	c851c67888	put the API version as #defines in the header file (as discussed in #30)	2015-03-27 12:35:41 -04:00
Steven G. Johnson	a4c84d2063	fix #2: add charwidth function	2015-03-12 12:10:19 -04:00
Steven G. Johnson	90721f2d39	directory cleanup: move tests and data into subdirectories	2015-03-06 17:36:08 -05:00

40 Commits