markdown and other cosmetic updates

Steven G. Johnson 2014-07-15 16:04:36 -04:00
parent c0f2b512a0
commit 0d7224a6d8
5 changed files with 116 additions and 107 deletions

.gitignore (new file; 10 lines added)

@@ -0,0 +1,10 @@
*.tar.gz
*.exe
*.dll
*.do
*.o
*.so
*.a
*.dll
*.dylib
*.dSYM

@@ -1,5 +1,13 @@
== libutf8proc license ==
Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany
**libutf8proc** is a lightly updated version of the **utf8proc**
library by Jan Behrens and the rest of the Public Software Group, who
deserve nearly all of the credit for this library. Like utf8proc,
whose copyright and license statements are reproduced below, all new
work on the libutf8proc library is licensed under the [MIT "expat"
license](http://opensource.org/licenses/MIT):
*Copyright © 2014 by Steven G. Johnson.*
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
@@ -19,14 +27,37 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
== Original utf8proc license ==
*Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany*
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
== Unicode data license ==
This software distribution contains derived data from a modified version of
the Unicode data files. The following license applies to that data:
COPYRIGHT AND PERMISSION NOTICE
Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
under the Terms of Use in http://www.unicode.org/copyright.html.
**COPYRIGHT AND PERMISSION NOTICE**
*Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
under the Terms of Use in http://www.unicode.org/copyright.html.*
Permission is hereby granted, free of charge, to any person obtaining a
copy of the Unicode data files and any associated documentation (the "Data
@@ -57,8 +88,6 @@ not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written
authorization of the copyright holder.
Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
registered in some jurisdictions. All other trademarks and registered
trademarks mentioned herein are the property of their respective owners.

@@ -9,20 +9,12 @@ cc = $(CC) $(cflags)
# meta targets
all: c-library
c-library: libutf8proc.a libutf8proc.so
ruby-library: ruby/utf8proc_native.so
pgsql-library: pgsql/utf8proc_pgsql.so
all: c-library ruby-library ruby-gem pgsql-library
clean::
clean:
rm -f utf8proc.o libutf8proc.a libutf8proc.so
cd ruby/ && test -e Makefile && (make clean && rm -f Makefile) || true
rm -Rf ruby/gem/lib ruby/gem/ext
rm -f ruby/gem/utf8proc-*.gem
cd pgsql/ && make clean
# real targets
@@ -39,30 +31,3 @@ libutf8proc.so: utf8proc.o
libutf8proc.dylib: utf8proc.o
$(cc) -dynamiclib -o $@ $^ -install_name $(libdir)/$@
ruby/Makefile: ruby/extconf.rb
cd ruby && ruby extconf.rb
ruby/utf8proc_native.so: utf8proc.h utf8proc.c utf8proc_data.c \
ruby/utf8proc_native.c ruby/Makefile
cd ruby && make
ruby/gem/lib/utf8proc.rb: ruby/utf8proc.rb
test -e ruby/gem/lib || mkdir ruby/gem/lib
cp ruby/utf8proc.rb ruby/gem/lib/
ruby/gem/ext/extconf.rb: ruby/extconf.rb
test -e ruby/gem/ext || mkdir ruby/gem/ext
cp ruby/extconf.rb ruby/gem/ext/
ruby/gem/ext/utf8proc_native.c: utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c
test -e ruby/gem/ext || mkdir ruby/gem/ext
cat utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | grep -v '#include "utf8proc.h"' | grep -v '#include "utf8proc_data.c"' | grep -v '#include "../utf8proc.c"' > ruby/gem/ext/utf8proc_native.c
ruby-gem:: ruby/gem/lib/utf8proc.rb ruby/gem/ext/extconf.rb ruby/gem/ext/utf8proc_native.c
cd ruby/gem && gem build utf8proc.gemspec
pgsql/utf8proc_pgsql.so: utf8proc.h utf8proc.c utf8proc_data.c \
pgsql/utf8proc_pgsql.c
cd pgsql && make

README (deleted; 63 lines removed)

@@ -1,63 +0,0 @@
Please read the LICENSE file, which is shipping with this software.
*** QUICK START ***
For compilation of the C library call "make c-library", for compilation of
the ruby library call "make ruby-library" and for compilation of the
PostgreSQL extension call "make pgsql-library".
For ruby you can also create a gem-file by calling "make ruby-gem".
"make all" can be used to build everything, but both ruby and PostgreSQL
installations are required in this case.
*** GENERAL INFORMATION ***
The C library is found in this directory after successful compilation and
is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of
the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
subdirectory "ruby/". If you chose to create a gem-file it is placed in the
"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so"
and resides in the "pgsql/" directory.
Both the ruby library and the PostgreSQL extension are built as stand-alone
libraries and are therefore not dependent on the dynamic version of the
C library files, but this behaviour might change in future releases.
The Unicode version being supported is 5.0.0.
Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
version 5.0.0 had not been available at the time of implementation.
For Unicode normalizations, the following options have to be used:
Normalization Form C: STABLE, COMPOSE
Normalization Form D: STABLE, DECOMPOSE
Normalization Form KC: STABLE, COMPOSE, COMPAT
Normalization Form KD: STABLE, DECOMPOSE, COMPAT
*** C LIBRARY ***
The documentation for the C library is found in the utf8proc.h header file.
"utf8proc_map" is most likely the function you will be using for mapping UTF-8
strings, unless you want to allocate memory yourself.
*** TODO ***
- detect stable code points and process segments independently in order to
save memory
- do a quick check before normalizing strings to optimize speed
- support stream processing
*** CONTACT ***
If you find any bugs or experience difficulties in compiling this software,
please contact us:
Project page: http://www.public-software-group.org/utf8proc

README.md (new file; 68 lines added)

@@ -0,0 +1,68 @@
== libutf8proc ==
The [libutf8proc package](https://github.com/JuliaLang/libutf8proc) is
a lightly updated fork of the [utf8proc
library](http://www.public-software-group.org/utf8proc) from Jan
Behrens and the rest of the [Public Software
Group](http://www.public-software-group.org/), who deserve *nearly all
of the credit* for this package: a small, clean C library that
provides Unicode normalization, case-folding, and other operations for
data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).
The reason for this fork is that utf8proc is used for basic Unicode
support in the [Julia language](http://julialang.org/) and the Julia
developers wanted Unicode 7 support and other features, but the
Public Software Group currently does not seem to have the resources
necessary to update utf8proc. We hope that the fork can be merged
back into the mainline utf8proc package before too long.
(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
We removed those from libutf8proc in order to focus exclusively on the C
library for the time being. We will strive to keep API changes to a minimum,
so libutf8proc should still be usable with the old plug-in code.)
Like utf8proc, the libutf8proc package is licensed under the
free/open-source [MIT "expat"
license](http://opensource.org/licenses/MIT) (plus certain Unicode
data governed by the similarly permissive [Unicode data
license](http://www.unicode.org/copyright.html#Exhibit1)); please see
the included `LICENSE.md` file for more detailed information.
=== Quick Start ===
To compile the C library, run `make`.
=== General Information ===
After successful compilation, the C library is found in this directory
as `libutf8proc.a` (the static library) and `libutf8proc.so` (the
dynamic library).
The supported Unicode version is 5.0.0.
*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, because
version 5.0.0 was not yet available at the time of implementation.
To obtain the standard Unicode normalization forms, use the following options:
* Normalization Form C: `STABLE`, `COMPOSE`
* Normalization Form D: `STABLE`, `DECOMPOSE`
* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`
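To make the difference between the `COMPOSE` and `DECOMPOSE` options above concrete, here is a deliberately tiny, self-contained C sketch that handles exactly one character: U+00E9 (é), whose canonical decomposition is U+0065 U+0301 (e followed by a combining acute accent). The names `toy_nfd` and `toy_nfc` are hypothetical and are not part of the utf8proc API. Real normalization (as performed by `utf8proc_map`) is driven by full Unicode data tables, reorders combining marks, and also honors the `STABLE` and `COMPAT` options, none of which this toy attempts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy sketch only: handles the single canonical pair U+00E9 <-> U+0065 U+0301.
   Real normalization (e.g. utf8proc_map) uses full Unicode data tables. */
static const char PRECOMPOSED[] = "\xC3\xA9";  /* U+00E9 encoded as UTF-8 */
static const char DECOMPOSED[] = "e\xCC\x81";  /* U+0065 U+0301 encoded as UTF-8 */

/* Copy `src` into `dst` (capacity `cap` bytes, including the terminating NUL),
   replacing every occurrence of `from` with `to`.
   Returns 0 on success, or -1 if `dst` is too small. */
static int replace_all(const char *src, const char *from, const char *to,
                       char *dst, size_t cap)
{
    size_t from_len, to_len, n = 0;

    assert(src != NULL && from != NULL && to != NULL && dst != NULL);
    from_len = strlen(from);
    to_len = strlen(to);

    if (cap == 0)
        return -1;
    while (*src != '\0') {
        if (strncmp(src, from, from_len) == 0) {
            if (n + to_len >= cap)  /* no room for `to` plus the NUL */
                return -1;
            memcpy(dst + n, to, to_len);
            n += to_len;
            src += from_len;
        } else {
            if (n + 1 >= cap)       /* no room for one byte plus the NUL */
                return -1;
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return 0;
}

/* Toy "decompose": precomposed é -> e + combining acute accent. */
int toy_nfd(const char *src, char *dst, size_t cap)
{
    return replace_all(src, PRECOMPOSED, DECOMPOSED, dst, cap);
}

/* Toy "compose": e + combining acute accent -> precomposed é. */
int toy_nfc(const char *src, char *dst, size_t cap)
{
    return replace_all(src, DECOMPOSED, PRECOMPOSED, dst, cap);
}
```

For example, `toy_nfd("caf\xC3\xA9", buf, sizeof buf)` rewrites the two-byte precomposed é as the three bytes `e\xCC\x81`, and `toy_nfc` reverses that. Both functions return -1 rather than overflowing when the output buffer is too small, since a decomposed string can be longer than its input.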
=== C Library ===
The documentation for the C library is found in the `utf8proc.h` header file.
`utf8proc_map` is the function you will most likely use for mapping UTF-8
strings, unless you want to allocate memory yourself.
=== To Do ===
* detect stable code points and process segments independently in order to save memory
* do a quick check before normalizing strings to optimize speed
* support stream processing
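The second To Do item refers to checking, before normalizing, whether a string can be left unchanged. The full quick-check algorithm described in Unicode Standard Annex #15 is beyond this note, but its simplest special case is easy to sketch: text consisting only of ASCII bytes is already in all four normalization forms (NFC, NFD, NFKC, NFKD). The helper below, `is_ascii_only`, is hypothetical and not part of libutf8proc; it only illustrates the idea and says nothing about how the library itself may eventually implement a quick check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the utf8proc API.
   Returns true if the first `len` bytes of `s` are all ASCII (< 0x80).
   ASCII text is unchanged by NFC, NFD, NFKC, and NFKD, so a caller that
   only normalizes (no case folding or other mapping options) could skip
   the normalization call for such input. */
bool is_ascii_only(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x80)
            return false;
    }
    return true;
}
```

The check is deliberately conservative: any non-ASCII byte sends the string down the normal path, even if that string also happens to be normalized already.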
=== Contact ===
Bug reports, feature requests, and other queries can be filed at
the [libutf8proc page on GitHub](https://github.com/JuliaLang/libutf8proc).