41 Commits

Author SHA1 Message Date
Randy
39dbf507d7
fuzz: limit input length (#238)
Longer inputs can lead to timeouts on oss-fuzz
2022-05-05 21:49:11 -04:00
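A minimal sketch of the idea behind this commit, assuming a libFuzzer-style entry point: oversized inputs are rejected before any work is done, so a single long input cannot cause a timeout. `MAX_FUZZ_INPUT_LEN` and `process_input` are hypothetical names, and the cap value is illustrative; the limit actually chosen in the commit is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, chosen only for illustration. */
#define MAX_FUZZ_INPUT_LEN 4096

/* Stand-in for the code under test: it only counts the calls it receives
   and checks that it never sees an oversized input. */
static size_t processed_calls = 0;

static void process_input(const uint8_t *data, size_t size) {
    (void)data;
    assert(size <= MAX_FUZZ_INPUT_LEN);
    processed_calls++;
}

/* libFuzzer entry point: return early on inputs above the cap. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size > MAX_FUZZ_INPUT_LEN)
        return 0; /* skipped without touching the data */
    process_input(data, size);
    return 0;
}
```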
Randy
93a88b4310
OSS-Fuzz integration updates (#219)
* fix build

* CIFuzz integration

* update fuzzer

* undo changes to build

* ossfuzz.sh: fix copy path
2021-02-04 12:59:39 -05:00
Randy
c17ea5dfef
OSS-Fuzz initial integration (#216)
* add fuzz target

* update fuzzer

* add fuzzer to build with basic entry point

* add build script

* cleanup

* build fuzz target using cmake in oss-fuzz env

* ossfuzz.sh add newline

* update build
2021-01-29 13:54:58 -05:00
Mike Glorioso
610730f231
Fix Sign-Conversion warnings in library and test code (#214)
* JuliaStrings#169 turn on sign-conversion warnings

Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>

* JuliaStrings#169 fix sign-conversion warnings for utf8proc.c

fix sign-conversion warnings for utf8proc_iterate
uc requires at most 21 bits to identify a Unicode codepoint, so there is no need for it to be unsigned
multiple locations use, modify, or store uc with a signed value
the only exception is line 137 where uc is compared with an unsigned value

fix sign-conversion warnings for utf8proc_tolower, utf8proc_toupper, utf8proc_totitle
all three methods have sign conversion warnings when calling seqindex_decode_index
seqindex_decode_index uses the passed value as an index to an array utf8proc_sequences
as utf8proc_sequences is hard-coded and smaller than 2^31 - 1 we can safely cast to unsigned

fix sign-conversion warnings for utf8proc_decompose_char
lines with this warning use the defined function utf8proc_decompose_lump
in the function, a hardcoded unsigned value (1<<12) is complemented then cast as a signed value
as the intent is to remove the 12th bit flag from options, a signed value, an explicit cast is safe

fix sign-conversion warnings for utf8proc_map_custom
result is declared as signed, but is only expected to contain values between 0 and 4
sizeof returns an unsigned value. result must be cast to unsigned

Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>

* JuliaStrings#169 fix sign-conversion warnings for test/*

fix sign-conversion warnings for test/tests.c encode
change type for d to match return value of utf8proc_encode_char

fix sign-conversion warnings for test/graphemetest.c checkline
si, i, and j are unsigned size types, utf8proc_map and utf8proc_iterate accept and return signed size types
utf8proc_map treats negative strlen values as 0. the strlen used by the test must be similarly limited
utf8proc_iterate treats negative strlen values as 4 which will be less than the unsigned size
fix unused-but-set-variable warning by checking the glen value

fix sign-conversion warnings for test/case.c main
the if block ensures that tested codepoint fits in wint_t, but needs to include u and l as well
c, u, and l can be safely cast to wint_t

fix sign-conversion warnings for test/iterate.c
all values used for len are below 8, so an explicit cast is safe
updated types for more portable test code

fix sign-conversion warnings for test/printproperty.c main
change type of c to signed to resolve all sign-conversion warnings.
replace sscanf(... &c) with sscanf(... &x) followed by explicit sign conversion

Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>
2021-01-14 12:59:49 -05:00
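The flag-clearing case described in this commit (complementing a hardcoded `1<<12` and applying it to a signed options value) can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `FLAG_LUMP` and `clear_lump` are hypothetical names, and `int32_t` stands in for whatever signed type the library uses for its options.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the (1<<12) option flag mentioned above. */
#define FLAG_LUMP (1u << 12)

/* Clear the flag from a signed options word. The bitwise work is done in
   unsigned arithmetic and converted back with an explicit cast, so
   -Wsign-conversion has no implicit conversion to report. */
static int32_t clear_lump(int32_t options) {
    int32_t result = (int32_t)((uint32_t)options & ~FLAG_LUMP);
    assert((result & (int32_t)FLAG_LUMP) == 0); /* flag bit is now clear */
    return result;
}
```

For example, `clear_lump(0x1003)` yields `0x0003`, and a value without the flag set passes through unchanged.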
Steven G. Johnson
8239639e3f fix NULL args in grapheme_break_stateful 2020-12-15 15:26:56 -05:00
Steven G. Johnson
0643a64479
Fix grapheme breaks on string-initial (#205)
* Fix extended emoji + zwj combo

* Patch initial repeated regional flags and extended+zwj emoji

* Merge conditions for setting breaks bt region

* updated fix

* perform tests for both utf8proc_map and manual calls to utf8proc_grapheme_break_stateful

* consolidate tests

Co-authored-by: Thomas Marks <marksta@umich.edu>
2020-11-23 14:10:29 -05:00
Steven G. Johnson
5622a0a51b
add islower/isupper functions (#196)
* add islower/isupper functions

* added test

* more tests + bugfix

* Makefile fix

* rm iscase test on make clean
2020-08-25 16:42:59 -04:00
Andreas-Schniertshauer
e51f416e0c
Fix memory leaks in tests case.c and misc.c (#189)
* Add: tests to CMakeLists.txt

* Disable compilation of charwidth, graphemetest and normtest because of missing getline

* Refactoring: UTF8PROC_ENABLE_TESTING default Off, move tests that don't compile on windows to NOT MSVC section, add testing to appveyor.yml

* Add: testing to travis

* Changed: flag to WIN32 because MinGW has the same problem as MSVC

* Commented out graphemetest and normtest because they fail.

* Re-added: graphemetest and normtest; added missing data to the path of the text files.

* Fix: last commit was partly wrong; normtest failed.

* Commented out graphemetest and normtest because they fail, because building of the data is missing in CMakeLists.

* Add: mingw_static, mingw_shared, msvc_shared, msvc_static to ignore list

* Add: prefix utf8proc. to tests

* Fix: memory leaks in tests case.c and misc.c (forgot to call free after calling utf8proc_NFKC_Casefold)

Co-authored-by: Andreas-Schniertshauer <Andreas-Schniertshauer@users.noreply.github.com>
2020-03-30 07:51:44 -04:00
Steven G. Johnson
c6858e955c
use unsigned char more consistently, silence -Wextra compiler warnings (#188) 2020-03-29 10:44:42 -04:00
Steven G. Johnson
243875b456 fixes 2020-03-29 09:35:32 -04:00
Steven G. Johnson
11bb3d9dc7 fix grapheme test to work on unmodified data file 2020-03-29 08:53:11 -04:00
Steven G. Johnson
02fb59136d silence warning (closes #184) 2020-03-28 14:00:30 -04:00
Steven G. Johnson
6fff5f32bb
compile more tests on Windows (#183)
* compile more tests on Windows

* still disable charwidth tests

* silence warnings on MSVC about sscanf

* whoops

* silence warning
2020-03-28 10:00:18 -04:00
Steven G. Johnson
5f15b515e1 simplifications 2020-03-28 09:42:29 -04:00
Steven G. Johnson
d588d7097c portable getline replacement (closes #182) 2020-03-28 09:36:58 -04:00
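As a rough illustration of what a portable `getline` replacement can look like, the sketch below reads one line into a caller-supplied buffer using only standard C (`getc`), so it does not depend on the POSIX `getline` that is missing on Windows. The name `read_line` and its signature are assumptions made for this example; the function actually added by the commit is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read one line from f into buf (capacity cap, which must be at least 1).
   The trailing newline is consumed but not stored. At most cap - 1
   characters are stored, and buf is always NUL-terminated.
   Returns the number of characters stored; callers can use feof(f) to
   tell an empty line apart from end of file. */
static size_t read_line(char *buf, size_t cap, FILE *f) {
    size_t n = 0;
    int c;
    assert(buf != NULL && cap > 0);
    while (n + 1 < cap && (c = getc(f)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

A fixed-capacity buffer keeps the helper free of allocation, at the cost of splitting lines longer than `cap - 1` characters across successive calls.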
Steven G. Johnson
abf81603ba
add utf8proc_unicode_version (#151) 2019-03-30 16:31:02 -04:00
Steven G. Johnson
4603e00cfc
fix CHARBOUND option for non-characters (#149) 2019-03-30 15:22:25 -04:00
Steven G. Johnson
d4a58cfec5
update data and algorithms for Unicode 11 (#140) 2018-07-24 13:18:48 -04:00
Steven G. Johnson
02f4e1890c
charwidth=1 for soft hyphen and unassigned codepoints (#135)
* use width=1 for soft hyphen and for unassigned/PUA codepoints

* don't count unassigned codepoints when comparing with system wcwidth

* more tests

* indentation fixes

* NEWS for 135

* remove special-casing for arabic control characters affecting a span of numbers, which are sometimes zero-width and sometimes not

* regenerate
2018-07-24 10:45:02 -04:00
Steven G. Johnson
d81308faba
uppercase mapping ß (U+00df) to ẞ (U+1E9E) (#134)
* uppercase(0x00df) = 0x1e9e

* tests for titlecase and u+00df uppercase

* NEWS, another test
2018-05-02 14:18:26 -04:00
Steven G. Johnson
bdc8b9e4b2
Case folding fixes (#133)
* Fixes allowing for “Full” folding and NFKC_CaseFold compliance.

* Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive.
* Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF.

* Document the changes to UTF8PROC_IGNORE in header.

* Add NFKC_CF helper function with documentation.

* restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test

* success message

* test that IGNORE does not strip NA

* data update

* NFKC_Casefold shouldn't strip NA
2018-05-02 08:15:02 -04:00
Steven G. Johnson
ba042cf728 missing return code, success message in test/misc.c 2018-04-27 09:10:38 -04:00
Steven G. Johnson
d050c4636a make internal function static 2018-04-27 08:57:54 -04:00
Steven G. Johnson
53d7968055 added test for #128 2018-04-27 08:46:44 -04:00
Steven G. Johnson
b4621f43c3 new utf8proc_map_custom for hooking in user-defined custom mappings (#89)
* new utf8proc_map_custom for hooking in user-defined custom mappings

* whoops, add test program

* NEWS, version bump for 2.1

* change test functions to static so that gcc doesn't complain about missing prototypes
2016-11-30 10:40:26 -05:00
Benito van der Zander
eeebf70bcf Smaller tables (#68)
* convert sequences to utf-16 (saves 25kb)

* store sequence length in properties instead of using -1 termination (saves 10kb)

* cache index for slightly faster data creation

* store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time

* change combination array data type to uint16 (saves 40kb)

* merge 1st and 2nd comb index (saves 50kb)

* kill empty prefix/suffix in combination array (saves 50kb)

* there was no need to have a separate combination start array, it can be merged in a single array

* some fixes

* mark the table as const again

* and regen
2016-07-12 11:51:50 -04:00
Michaël Meyer
1f17487aa9 Fix overrun 2016-02-04 04:06:28 +01:00
Peter Colberg
4b16193a25 Fix sscanf argument type for format %x 2015-10-30 15:27:18 -04:00
Peter Colberg
14b57791d8 Fix missing static declarations for internal functions 2015-10-30 15:24:34 -04:00
Peter Colberg
6acc41dfe9 Fix implicit function declarations 2015-10-30 15:22:09 -04:00
Peter Colberg
548497a398 Move common test functions to separate module
This resolves warnings for missing function prototypes.
2015-10-30 15:13:48 -04:00
Peter Colberg
09360de186 Do not export internal unsafe_encode_char() 2015-10-29 00:45:39 -04:00
Steven G. Johnson
6a7f92da64 fix #46 (make sure symbol-like codepoints have nonzero width even if they aren't in Unifont) 2015-06-24 14:07:15 -04:00
Steven G. Johnson
a8fb4b1772 add toupper/tolower functions (for JuliaLang/julia#11471) 2015-05-29 22:00:30 -04:00
ScottPJones
6a229a6776 Add tests for valid codepoints and iterate function 2015-05-29 20:11:10 +02:00
Scott Paul Jones
6249e6b8b1 Fix #34 handle 66 Unicode non-characters, also improve performance and surrogate handling 2015-05-29 19:50:03 +02:00
Tony Kelman
0a818c7003 Prefix other C99 typedefs with utf8proc_ 2015-04-06 22:36:33 -07:00
Tony Kelman
ad27722923 Use a new typedef utf8proc_ssize_t to avoid define collisions with MSVC
2015-04-05 20:06:13 -07:00
Steven G. Johnson
c851c67888 put the API version as #defines in the header file (as discussed in #30) 2015-03-27 12:35:41 -04:00
Steven G. Johnson
a4c84d2063 fix #2: add charwidth function 2015-03-12 12:10:19 -04:00
Steven G. Johnson
90721f2d39 directory cleanup: move tests and data into subdirectories 2015-03-06 17:36:08 -05:00