* JuliaStrings#169 turn on sign-conversion warnings
Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>
* JuliaStrings#169 fix sign-conversion warnings for utf8proc.c
fix sign-conversion warnings for utf8proc_iterate
uc requires at most 21 bits to identify a Unicode codepoint, so there is no need for it to be unsigned
multiple locations use, modify, or store uc with a signed value
the only exception is line 137 where uc is compared with an unsigned value
fix sign-conversion warnings for utf8proc_tolower, utf8proc_toupper, utf8proc_totitle
all three methods have sign conversion warnings when calling seqindex_decode_index
seqindex_decode_index uses the passed value as an index to an array utf8proc_sequences
as utf8proc_sequences is hard-coded and smaller than 2^31 - 1 we can safely cast to unsigned
fix sign-conversion warnings for utf8proc_decompose_char
lines with this warning use the defined function utf8proc_decompose_lump
in the function, a hardcoded unsigned value (1<<12) is complemented then cast as a signed value
as the intent is to remove the 12th bit flag from options, a signed value, an explicit cast is safe
fix sign-conversion warnings for utf8proc_map_custom
result is declared as signed, but is only expected to contain values between 0 and 4
sizeof returns an unsigned value. result must be cast to unsigned
Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>
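
The two cast patterns described in this commit (clearing a flag bit in a signed option word, and comparing a signed result against an unsigned size) can be sketched roughly as follows. This is an illustration only, not the code from utf8proc.c: `SKETCH_FLAG_LUMP`, `sketch_clear_lump`, and `sketch_fits` are hypothetical names, and the conversion back to `int` assumes a common two's-complement target.

```c
#include <stddef.h>

/* Hypothetical flag occupying bit 12, standing in for the lump option. */
#define SKETCH_FLAG_LUMP (1 << 12)

/* Clear the flag from a signed option word. The complement is taken on an
   unsigned operand and the result is cast back explicitly, so there is no
   implicit sign conversion left for -Wsign-conversion to report.
   Assumption: converting the out-of-range unsigned value to int wraps,
   as it does on common two's-complement compilers. */
static int sketch_clear_lump(int options) {
    return options & (int)~(unsigned int)SKETCH_FLAG_LUMP;
}

/* Check that a signed length fits in a buffer of buf_size bytes.
   The negative case is rejected first, so the cast to size_t
   cannot change the value. */
static int sketch_fits(int result, size_t buf_size) {
    return result >= 0 && (size_t)result <= buf_size;
}
```

Checking the sign before casting keeps the cast value-preserving, which is the property the commit message relies on when it calls the casts safe.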
* JuliaStrings#169 fix sign-conversion warnings for test/*
fix sign-conversion warnings for test/tests.c encode
change type for d to match return value of utf8proc_encode_char
fix sign-conversion warnings for test/graphemetest.c checkline
si, i, and j are unsigned size types, utf8proc_map and utf8proc_iterate accept and return signed size types
utf8proc_map treats negative strlen values as 0. the strlen used by the test must be similarly limited
utf8proc_iterate treats negative strlen values as 4 which will be less than the unsigned size
fix unused-but-set-variable warning by checking the glen value
fix sign-conversion warnings for test/case.c main
the if block ensures that tested codepoint fits in wint_t, but needs to include u and l as well
c, u, and l can be safely cast to wint_t
fix sign-conversion warnings for test/iterate.c
all values used for len are below 8, so an explicit cast is safe
updated types for more portable test code
fix sign-conversion warnings for test/printproperty.c main
change type of c to signed to resolve all sign-conversion warnings.
replace sscanf(... &c) with sscanf(... &x) followed by explicit sign conversion
Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>
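
The sscanf change for test/printproperty.c can be sketched as below. This is a minimal illustration under assumptions, not the test's actual code: the function name `sketch_parse_codepoint` is hypothetical, and the range check against 0x10FFFF is an addition made here so that the cast is provably value-preserving.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse a hexadecimal code point into a signed 32-bit value.
   Returns 1 on success and 0 on failure. */
static int sketch_parse_codepoint(const char *text, int32_t *out) {
    unsigned int x;

    /* %x requires a pointer to unsigned int, so scan into x
       rather than directly into the signed destination. */
    if (sscanf(text, "%x", &x) != 1)
        return 0;

    /* Added assumption: reject values beyond the Unicode range. */
    if (x > 0x10FFFF)
        return 0;

    /* Explicit sign conversion; the value is known to fit. */
    *out = (int32_t)x;
    return 1;
}
```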
* Fix extended emoji + zwj combo
* Patch initial repeated regional flags and extended+zwj emoji
* Merge conditions for setting breaks bt region
* updated fix
* perform tests for both utf8proc_map and manual calls to utf8proc_grapheme_break_stateful
* consolidate tests
Co-authored-by: Thomas Marks <marksta@umich.edu>
* Add: tests to CMakeLists.txt
* Disable compilation of charwidth, graphemetest and normtest because of missing getline
* Refactoring: UTF8PROC_ENABLE_TESTING default Off, move tests that don't compile on windows to NOT MSVC section, add testing to appveyor.yml
* Add: testing to travis
* Changed: flag to WIN32 because MinGW has the same problem as MSVC
* Commented out graphemetest and normtest because they fail.
* Re-added: graphemetest and normtest added missing data to the path of the text files.
* Fix: last commit was partly wrong; normtest failed.
* Commented out graphemetest and normtest because they fail: CMakeLists is missing the step that builds the data.
* Add: mingw_static, mingw_shared, msvc_shared, msvc_static to ignore list
* Add: prefix utf8proc. to tests
* Fix: memory leaks in tests case.c and misc.c forgot to call free after calling utf8proc_NFKC_Casefold
Co-authored-by: Andreas-Schniertshauer <Andreas-Schniertshauer@users.noreply.github.com>
* use width=1 for soft hyphen and for unassigned/PUA codepoints
* don't count unassigned codepoints when comparing with system wcwidth
* more tests
* indentation fixes
* NEWS for 135
* remove special-casing for arabic control characters affecting a span of numbers, which are sometimes zero-width and sometimes not
* regenerate
* Fixes allowing for “Full” folding and NFKC_CaseFold compliance.
* Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive.
* Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF.
* Document the changes to UTF8PROC_IGNORE in header.
* Add NFKC_CF helper function with documentation.
* restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test
* success message
* test that IGNORE does not strip NA
* data update
* NFKC_Casefold shouldn't strip NA
* new utf8proc_map_custom for hooking in user-defined custom mappings
* whoops, add test program
* NEWS, version bump for 2.1
* change test functions to static so that gcc doesn't complain about missing prototypes
* convert sequences to utf-16 (saves 25kb)
* store sequence length in properties instead of using -1 termination (saves 10kb)
* cache index for slightly faster data creation
* store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time
* change combination array data type to uint16 (saves 40kb)
* merge 1st and 2nd comb index (saves 50kb)
* kill empty prefix/suffix in combination array (saves 50kb)
* there was no need to have a separate combination start array, it can be merged in a single array
* some fixes
* mark the table as const again
* and regen
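
Storing the sequences as UTF-16, as described in the entries above, means a reader has to recombine surrogate pairs when it pulls a codepoint out of the array. A minimal sketch of that decoding step, assuming well-formed UTF-16 input; `sketch_decode_utf16` is a hypothetical name and this is not the library's own decoder:

```c
#include <stdint.h>

/* Decode one codepoint from a UTF-16 sequence and advance *entry
   past the code units that were consumed.
   Assumption: the input is well-formed, so a high surrogate is
   always followed by a low surrogate. */
static int32_t sketch_decode_utf16(const uint16_t **entry) {
    int32_t cp = **entry;

    if ((cp & 0xFC00) == 0xD800) {
        /* High surrogate: take the low 10 bits of each unit and
           add the offset for the supplementary planes. */
        *entry += 1;
        cp = ((cp & 0x03FF) << 10) | (**entry & 0x03FF);
        cp += 0x10000;
    }

    *entry += 1;
    return cp;
}
```

Codepoints in the Basic Multilingual Plane take one 16-bit unit instead of a 32-bit slot, which is where a size saving of this kind would come from.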