markdown and other cosmetic updates
parent c0f2b512a0
commit 0d7224a6d8
10
.gitignore
vendored
Normal file
@@ -0,0 +1,10 @@
+*.tar.gz
+*.exe
+*.dll
+*.do
+*.o
+*.so
+*.a
+*.dll
+*.dylib
+*.dSYM
@@ -1,5 +1,13 @@
+== libutf8proc license ==
 
-Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany
+**libutf8proc** is a lightly updated version of the **utf8proc**
+library by Jan Behrens and the rest of the Public Software Group, who
+deserve nearly all of the credit for this library. Like utf8proc,
+whose copyright and license statements are reproduced below, all new
+work on the libutf8proc library is licensed under the [MIT "expat"
+license](http://opensource.org/licenses/MIT):
+
+*Copyright © 2014 by Steven G. Johnson.*
 
 Permission is hereby granted, free of charge, to any person obtaining a
 copy of this software and associated documentation files (the "Software"),
@@ -19,14 +27,37 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 DEALINGS IN THE SOFTWARE.
 
+== Original utf8proc license ==
+
+*Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany*
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the "Software"),
+to deal in the Software without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
+
+== Unicode data license ==
 
 This software distribution contains derived data from a modified version of
 the Unicode data files. The following license applies to that data:
 
-COPYRIGHT AND PERMISSION NOTICE
+**COPYRIGHT AND PERMISSION NOTICE**
 
-Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
-under the Terms of Use in http://www.unicode.org/copyright.html.
+*Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
+under the Terms of Use in http://www.unicode.org/copyright.html.*
 
 Permission is hereby granted, free of charge, to any person obtaining a
 copy of the Unicode data files and any associated documentation (the "Data
@@ -57,8 +88,6 @@ not be used in advertising or otherwise to promote the sale, use or other
 dealings in these Data Files or Software without prior written
 authorization of the copyright holder.
 
-
 Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
 registered in some jurisdictions. All other trademarks and registered
 trademarks mentioned herein are the property of their respective owners.
-
41
Makefile
@@ -9,20 +9,12 @@ cc = $(CC) $(cflags)
 
 # meta targets
 
+all: c-library
+
 c-library: libutf8proc.a libutf8proc.so
 
-ruby-library: ruby/utf8proc_native.so
-
-pgsql-library: pgsql/utf8proc_pgsql.so
-
-all: c-library ruby-library ruby-gem pgsql-library
-
-clean::
+clean:
 	rm -f utf8proc.o libutf8proc.a libutf8proc.so
-	cd ruby/ && test -e Makefile && (make clean && rm -f Makefile) || true
-	rm -Rf ruby/gem/lib ruby/gem/ext
-	rm -f ruby/gem/utf8proc-*.gem
-	cd pgsql/ && make clean
 
 # real targets
 
@@ -39,30 +31,3 @@ libutf8proc.so: utf8proc.o
 
 libutf8proc.dylib: utf8proc.o
 	$(cc) -dynamiclib -o $@ $^ -install_name $(libdir)/$@
-
-ruby/Makefile: ruby/extconf.rb
-	cd ruby && ruby extconf.rb
-
-ruby/utf8proc_native.so: utf8proc.h utf8proc.c utf8proc_data.c \
-ruby/utf8proc_native.c ruby/Makefile
-	cd ruby && make
-
-ruby/gem/lib/utf8proc.rb: ruby/utf8proc.rb
-	test -e ruby/gem/lib || mkdir ruby/gem/lib
-	cp ruby/utf8proc.rb ruby/gem/lib/
-
-ruby/gem/ext/extconf.rb: ruby/extconf.rb
-	test -e ruby/gem/ext || mkdir ruby/gem/ext
-	cp ruby/extconf.rb ruby/gem/ext/
-
-ruby/gem/ext/utf8proc_native.c: utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c
-	test -e ruby/gem/ext || mkdir ruby/gem/ext
-	cat utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | grep -v '#include "utf8proc.h"' | grep -v '#include "utf8proc_data.c"' | grep -v '#include "../utf8proc.c"' > ruby/gem/ext/utf8proc_native.c
-
-ruby-gem:: ruby/gem/lib/utf8proc.rb ruby/gem/ext/extconf.rb ruby/gem/ext/utf8proc_native.c
-	cd ruby/gem && gem build utf8proc.gemspec
-
-pgsql/utf8proc_pgsql.so: utf8proc.h utf8proc.c utf8proc_data.c \
-pgsql/utf8proc_pgsql.c
-	cd pgsql && make
-
63
README
@@ -1,63 +0,0 @@
-
-Please read the LICENSE file, which is shipping with this software.
-
-
-*** QUICK START ***
-
-For compilation of the C library call "make c-library", for compilation of
-the ruby library call "make ruby-library" and for compilation of the
-PostgreSQL extension call "make pgsql-library".
-
-For ruby you can also create a gem-file by calling "make ruby-gem".
-
-"make all" can be used to build everything, but both ruby and PostgreSQL
-installations are required in this case.
-
-
-*** GENERAL INFORMATION ***
-
-The C library is found in this directory after successful compilation and
-is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of
-the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
-subdirectory "ruby/". If you chose to create a gem-file it is placed in the
-"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so"
-and resides in the "pgsql/" directory.
-
-Both the ruby library and the PostgreSQL extension are built as stand-alone
-libraries and are therefore not dependent the dynamic version of the
-C library files, but this behaviour might change in future releases.
-
-The Unicode version being supported is 5.0.0.
-Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
-version 5.0.0 had not been available at the time of implementation.
-
-For Unicode normalizations, the following options have to be used:
-Normalization Form C: STABLE, COMPOSE
-Normalization Form D: STABLE, DECOMPOSE
-Normalization Form KC: STABLE, COMPOSE, COMPAT
-Normalization Form KD: STABLE, DECOMPOSE, COMPAT
-
-
-*** C LIBRARY ***
-
-The documentation for the C library is found in the utf8proc.h header file.
-"utf8proc_map" is most likely function you will be using for mapping UTF-8
-strings, unless you want to allocate memory yourself.
-
-
-*** TODO ***
-
-- detect stable code points and process segments independently in order to
-save memory
-- do a quick check before normalizing strings to optimize speed
-- support stream processing
-
-
-*** CONTACT ***
-
-If you find any bugs or experience difficulties in compiling this software,
-please contact us:
-
-Project page: http://www.public-software-group.org/utf8proc
-
-
68
README.md
Normal file
@@ -0,0 +1,68 @@
+== libutf8proc ==
+
+The [libutf8proc package](https://github.com/JuliaLang/libutf8proc) is
+a lightly updated fork of the [utf8proc
+library](http://www.public-software-group.org/utf8proc) from Jan
+Behrens and the rest of the [Public Software
+Group](http://www.public-software-group.org/), who deserve *nearly all
+of the credit* for this package: a small, clean C library that
+provides Unicode normalization, case-folding, and other operations for
+data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).
+
+The reason for this fork is that utf8proc is used for basic Unicode
+support in the [Julia language](http://julialang.org/) and the Julia
+developers wanted Unicode 7 support and other features, but the
+Public Software Group currently does not seem to have the resources
+necessary to update utf8proc. We hope that the fork can be merged
+back into the mainline utf8proc package before too long.
+
+(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
+We removed those from libutf8proc in order to focus exclusively on the C
+library for the time being. We will strive to keep API changes to a minimum,
+so libutf8proc should still be usable with the old plug-in code.)
+
+Like utf8proc, the libutf8proc package is licensed under the
+free/open-source [MIT "expat"
+license](http://opensource.org/licenses/MIT) (plus certain Unicode
+data governed by the similarly permissive [Unicode data
+license](http://www.unicode.org/copyright.html#Exhibit1)); please see
+the included `LICENSE.md` file for more detailed information.
+
+=== Quick Start ===
+
+For compilation of the C library run `make`.
+
+=== General Information ===
+
+The C library is found in this directory after successful compilation
+and is named `libutf8proc.a` (for the static library) and
+`libutf8proc.so` (for the dynamic library).
+
+The Unicode version being supported is 5.0.0.
+*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as
+version 5.0.0 had not been available at the time of implementation.
+
+For Unicode normalizations, the following options are used:
+
+* Normalization Form C: `STABLE`, `COMPOSE`
+* Normalization Form D: `STABLE`, `DECOMPOSE`
+* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
+* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`
+
+=== C Library ===
+
+The documentation for the C library is found in the `utf8proc.h` header file.
+`utf8proc_map` is the function you will most likely be using for mapping UTF-8
+strings, unless you want to allocate memory yourself.
+
+=== To Do ===
+
+* detect stable code points and process segments independently in order to save memory
+* do a quick check before normalizing strings to optimize speed
+* support stream processing
+
+=== Contact ===
+
+Bug reports, feature requests, and other queries can be filed at
+the [libutf8proc page on Github](https://github.com/JuliaLang/libutf8proc).
+
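Note on the normalization options listed in README.md: each normalization form corresponds to a combination of option flags. The mapping can be sketched in C as below. This is only an illustration under stated assumptions: the `OPT_*` names and their numeric values are placeholders, and `normalization_options` is a hypothetical helper that is not part of the library. Real code would include `utf8proc.h` and use its `UTF8PROC_STABLE`, `UTF8PROC_COMPOSE`, `UTF8PROC_DECOMPOSE`, and `UTF8PROC_COMPAT` constants instead.

```c
#include <assert.h>
#include <string.h>

/* Placeholder flag values, for illustration only. In real code, include
 * utf8proc.h and use its UTF8PROC_* constants rather than these. */
enum {
    OPT_STABLE    = 1 << 1,
    OPT_COMPAT    = 1 << 2,
    OPT_COMPOSE   = 1 << 3,
    OPT_DECOMPOSE = 1 << 4
};

/* Hypothetical helper: return the option mask for a normalization form
 * name ("C", "D", "KC" or "KD"), or -1 if the name is not recognized. */
int normalization_options(const char *form)
{
    assert(form != NULL);  /* a NULL form name is a caller error */

    if (strcmp(form, "C") == 0)
        return OPT_STABLE | OPT_COMPOSE;
    if (strcmp(form, "D") == 0)
        return OPT_STABLE | OPT_DECOMPOSE;
    if (strcmp(form, "KC") == 0)
        return OPT_STABLE | OPT_COMPOSE | OPT_COMPAT;
    if (strcmp(form, "KD") == 0)
        return OPT_STABLE | OPT_DECOMPOSE | OPT_COMPAT;
    return -1;
}
```

With the real constants, a mask built this way is what would be passed as the options argument of `utf8proc_map`; the header file remains the authoritative reference for the exact signature and flag values.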