### Test cases (new, changed, and potentially affected features)
@ -1,53 +0,0 @@
@ -14,63 +14,77 @@ if (defined(ohos_lite)) {
config("harfbuzz_config") {
include_dirs = [ "//third_party/harfbuzz/src" ]
include_dirs = [ "${target_gen_dir}/harfbuzz-2.8.2/src" ]
harfbuzz_source = [
action("harfbuzz_action") {
script = "//third_party/harfbuzz/install.py"
outputs = [
inputs = [ "//third_party/harfbuzz/harfbuzz-2.8.2.tar.xz" ]
harfbuzz_path = rebase_path("${target_gen_dir}", root_build_dir)
harfbuzz_source_path = rebase_path("//third_party/harfbuzz", root_build_dir)
args = [
if (defined(ohos_lite)) {
lite_library("harfbuzz") {
output_name = "harfbuzz"
sources = harfbuzz_source
sources = get_target_outputs(":harfbuzz_action")
deps = [ ":harfbuzz_action" ]
public_configs = [ ":harfbuzz_config" ]
if (defined(board_toolchain_type) && board_toolchain_type == "iccarm") {
target_type = "static_library"
@ -91,11 +105,12 @@ if (defined(ohos_lite)) {
} else {
ohos_static_library("harfbuzz_static") {
output_dir = "${root_out_dir}/thirdparty/harfbuzz"
output_name = "harfbuzz"
sources = harfbuzz_source
include_dirs = [ "src\base" ]
sources = get_target_outputs(":harfbuzz_action")
deps = [ ":harfbuzz_action" ]
include_dirs = [ "${target_gen_dir}/harfbuzz-2.8.2/src" ]
defines = [ "HAVE_PTHREAD = 1" ]
public_configs = [ ":harfbuzz_config" ]
part_name = "harfbuzz"
subsystem_name = "thirdparty"
@ -1,15 +0,0 @@
This is HarfBuzz, a text shaping library.
For bug reports, mailing list, and other information please visit:
For license information, see https://github.com/harfbuzz/harfbuzz/blob/master/COPYING
For build information, see https://github.com/harfbuzz/harfbuzz/blob/master/BUILD.md
For custom configurations, see https://github.com/harfbuzz/harfbuzz/blob/master/CONFIG.md
For test execution, see https://github.com/harfbuzz/harfbuzz/blob/master/TESTING.md
Documentation: https://harfbuzz.github.io
@ -3,7 +3,7 @@
"Name" : "harfbuzz",
"License" : "MIT License",
"License File" : "COPYING",
"Version Number" : "2.8.1",
"Version Number" : "2.8.2",
"Owner" : "liyujia4@huawei.com",
"Upstream URL" : "https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.1",
"Description" : "HarfBuzz is a text shaping engine. It primarily supports OpenType, but also Apple Advanced Typography. HarfBuzz is used in Android, Chrome, ChromeOS, Firefox, GNOME, GTK+, KDE, LibreOffice, OpenJDK, PlayStation, Qt, XeTeX, and other places."
@ -1,33 +0,0 @@
[](https://github.com/harfbuzz/harfbuzz/workflows/linux-ci/badge.svg)
[](https://circleci.com/gh/harfbuzz/harfbuzz/tree/master)
[](https://oss-fuzz-build-logs.storage.googleapis.com/index.html)
[](https://scan.coverity.com/projects/behdad-harfbuzz)
[](https://app.codacy.com/app/behdad/harfbuzz)
[](https://codecov.io/gh/harfbuzz/harfbuzz)
[](https://coveralls.io/r/harfbuzz/harfbuzz)
[](https://repology.org/project/harfbuzz/versions)
[ABI Tracker](http://abi-laboratory.pro/tracker/timeline/harfbuzz/)
This is HarfBuzz, a text shaping library.
For bug reports, mailing list, and other information please visit:
For license information, see [COPYING](COPYING).
For build information, see [BUILD.md](BUILD.md).
For custom configurations, see [CONFIG.md](CONFIG.md).
For test execution, see [TESTING.md](TESTING.md).
Documentation: https://harfbuzz.github.io
<summary>Packaging status of HarfBuzz</summary>
[](https://repology.org/project/harfbuzz/versions)
@ -1,47 +0,0 @@
HarfBuzz development uses Uniscribe, Microsoft's widely used and well-tested
shaper, as a more-or-less reference OpenType implementation. This matters
especially where the OpenType specification is, or was, unclear. To get access
to Uniscribe on Linux/macOS, these steps are recommended:
You want to follow the 32bit instructions. The 64bit equivalents are included
for reference.
1. Install Wine.
- Fedora: `dnf install wine`.
2. Install `mingw-w64` compiler.
- Fedora, 32bit: `dnf install mingw32-gcc-c++`
- Fedora, 64bit: `dnf install mingw64-gcc-c++`
- Debian: `apt install g++-mingw-w64`
- Mac: `brew install mingw-w64`
3. If you have drunk the `meson` koolaid, look at `.ci/build-win32.sh` to see how to
invoke `meson` now, or just run that script. Otherwise, here's how to use the
old trusty autotools instead:
a) Install dependencies.
- Fedora, 32bit: `dnf install mingw32-glib2 mingw32-cairo mingw32-freetype`
- Fedora, 64bit: `dnf install mingw64-glib2 mingw64-cairo mingw64-freetype`
b) Configure:
- `NOCONFIGURE=1 ./autogen.sh && mkdir winbuild && cd winbuild`
- 32bit: `../mingw-configure.sh i686`
- 64bit: `../mingw-configure.sh x86_64`
Now you can run `hb-shape` with `(cd winbuild/util && wine hb-shape.exe)`,
but if you would like to shape with the Microsoft Uniscribe:
4. Copy a 32bit version of `usp10.dll` from `C:\Windows\SysWOW64\usp10.dll` of your
Windows installation (assuming you have a 64-bit installation, otherwise
`C:\Windows\System32\usp10.dll`), making sure that it is not a DirectWrite proxy
([for more info](https://en.wikipedia.org/wiki/Uniscribe)).
As a rule of thumb, your `usp10.dll` should be larger than 500 KB; a smaller one
is designed to work with DirectWrite, and Wine cannot work with the original DirectWrite.
You want a Uniscribe from Windows 7 or older.
Put the DLL in the folder where you are going to run the next command.
5. `WINEDLLOVERRIDES="usp10=n" wine hb-shape.exe fontname.ttf -u 0061,0062,0063 --shaper=uniscribe`
(`0061,0062,0063` means `abc`, use test/shaping/hb-unicode-decode to generate ones you need)
@ -1,44 +0,0 @@
HarfBuzz release walk-through checklist:
1. Open gitk and review changes since last release.
* `git diff $(git describe | sed 's/-.*//').. src/*.h` prints all public API changes.
Document them in NEWS. All API and API semantic changes should be clearly
marked as API additions, API changes, or API deletions. Document
deprecations. Ensure all new API / deprecations are listed correctly in
docs/harfbuzz-sections.txt. If release added new API, add entry for new
API index at the end of docs/harfbuzz-docs.xml.
If there's a backward-incompatible API change (including deletions for API
used anywhere), that's a release blocker. Do NOT release.
2. Based on severity of changes, decide whether it's a minor or micro release
number bump.
3. Search for REPLACEME on the repository and replace it with the chosen version
for the release.
4. Make sure you have correct date and new version at the top of NEWS file.
5. Bump version in line 3 of meson.build and configure.ac.
Do a `meson test -Cbuild` so it both checks the tests and updates
hb-version.h (use `git diff` to see if it is really updated).
6. Commit NEWS, meson.build, configure.ac, and src/hb-version.h, as well as any REPLACEME
changes you made. The commit message is simply the release number, e.g. "1.4.7".
7. Do a `meson dist -Cbuild` that runs the tests against the latest committed changes.
If it doesn't pass, something fishy is going on; reset the repo and start over.
8. Tag the release and sign it, e.g. "git tag -s 1.4.7 -m 1.4.7". Enter your
GPG password.
9. Build win32 bundle. See [README.mingw.md](README.mingw.md).
10. Push the commit and tag out: "git push --follow-tags".
11. Go to GitHub release page [here](https://github.com/harfbuzz/harfbuzz/releases),
edit the tag, upload win32 bundle and NEWS entry and save.
No need to upload a source tarball, as we rely on GitHub's automatic tar.gz generation.
@ -1,55 +0,0 @@
## Build and Test
meson build
ninja -Cbuild
meson test -Cbuild
### Debug with GDB
meson test -Cbuild --gdb testname
## Build and Run
Depending on what area you are working in, change or add `HB_DEBUG_<whatever>`.
The values are defined in `hb-debug.hh`.
CPPFLAGS='-DHB_DEBUG_SUBSET=100' meson setup build --reconfigure
meson test -C build
### Run tests with asan
meson setup build -Db_sanitize=address --reconfigure
meson compile -C build
meson test -C build
### Enable Debug Logging
CPPFLAGS=-DHB_DEBUG_SUBSET=100 meson build --reconfigure
ninja -C build
## Test with the Fuzzer
CXXFLAGS="-fsanitize=address,fuzzer-no-link" meson fuzzbuild --default-library=static -Dfuzzer_ldflags="-fsanitize=address,fuzzer" -Dexperimental_api=true
ninja -Cfuzzbuild test/fuzzing/hb-{shape,draw,subset,set}-fuzzer
fuzzbuild/test/fuzzing/hb-subset-fuzzer test/fuzzing/fonts
## Profiling
meson build --reconfigure
meson compile -C build
@ -1,7 +0,0 @@
Bradley Grainger
Kenichi Ishibashi
Ivan Kuckir <https://photopea.com/>
Ryan Lortie
Jeff Muizelaar
suzuki toshiya
Philip Withnall
@ -1,484 +0,0 @@
@ -1,123 +0,0 @@
@ -1,188 +0,0 @@
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
<book id="index">
<title>HarfBuzz Manual</title>
<graphic fileref="HarfBuzz.png" format="PNG" align="center"/>
HarfBuzz is a text shaping library. Using the HarfBuzz library allows
programs to convert a sequence of Unicode input into
properly formatted and positioned glyph output—for any writing
system and language.
The canonical source-code tree is available at
<ulink url="https://github.com/harfbuzz/harfbuzz">github.com/harfbuzz/harfbuzz</ulink>.
See <xref linkend="download" endterm="download.title"/> for
release tarballs.
<part id="user-manual">
<title>User's manual</title>
<xi:include href="usermanual-what-is-harfbuzz.xml"/>
<xi:include href="usermanual-install-harfbuzz.xml"/>
<xi:include href="usermanual-getting-started.xml"/>
<xi:include href="usermanual-shaping-concepts.xml"/>
<xi:include href="usermanual-object-model.xml"/>
<xi:include href="usermanual-buffers-language-script-and-direction.xml"/>
<xi:include href="usermanual-fonts-and-faces.xml"/>
<xi:include href="usermanual-opentype-features.xml"/>
<xi:include href="usermanual-clusters.xml"/>
<xi:include href="usermanual-utilities.xml"/>
<xi:include href="usermanual-integration.xml"/>
<part id="reference-manual">
This document is for HarfBuzz &version;.
<!--The latest version of this documentation can be found on-line at
<ulink role="online-location" url="http://[SERVER]/libharfbuzz/index.html">http://[SERVER]/libharfbuzz/</ulink>.-->
<title>Reference manual</title>
<chapter id="core-api">
<title>Core API</title>
<xi:include href="xml/hb-blob.xml"/>
<xi:include href="xml/hb-buffer.xml"/>
<xi:include href="xml/hb-common.xml"/>
<xi:include href="xml/hb-deprecated.xml"/>
<xi:include href="xml/hb-face.xml"/>
<xi:include href="xml/hb-font.xml"/>
<xi:include href="xml/hb-map.xml"/>
<xi:include href="xml/hb-set.xml"/>
<xi:include href="xml/hb-shape-plan.xml"/>
<xi:include href="xml/hb-shape.xml"/>
<xi:include href="xml/hb-unicode.xml"/>
<xi:include href="xml/hb-version.xml"/>
<chapter id="opentype-api">
<title>OpenType API</title>
<xi:include href="xml/hb-ot-color.xml"/>
<xi:include href="xml/hb-ot-font.xml"/>
<xi:include href="xml/hb-ot-layout.xml"/>
<xi:include href="xml/hb-ot-math.xml"/>
<xi:include href="xml/hb-ot-meta.xml"/>
<xi:include href="xml/hb-ot-metrics.xml"/>
<xi:include href="xml/hb-ot-name.xml"/>
<xi:include href="xml/hb-ot-shape.xml"/>
<xi:include href="xml/hb-ot-var.xml"/>
<chapter id="apple-advanced-typography-api">
<title>Apple Advanced Typography API</title>
<xi:include href="xml/hb-aat-layout.xml"/>
<chapter id="integration-api">
<title>Integration API</title>
<xi:include href="xml/hb-coretext.xml"/>
<xi:include href="xml/hb-ft.xml"/>
<xi:include href="xml/hb-glib.xml"/>
<xi:include href="xml/hb-graphite2.xml"/>
<xi:include href="xml/hb-icu.xml"/>
<xi:include href="xml/hb-uniscribe.xml"/>
<xi:include href="xml/hb-gdi.xml"/>
<xi:include href="xml/hb-directwrite.xml"/>
<!--chapter id="object-tree">
<title>Object Hierarchy</title>
<xi:include href="xml/tree_index.sgml"/>
<index id="api-index-full"><title>API Index</title><xi:include href="xml/api-index-full.xml"><xi:fallback /></xi:include></index>
<index id="deprecated-api-index" role="deprecated"><title>Index of deprecated API</title><xi:include href="xml/api-index-deprecated.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-7-3" role="2.7.3"><title>Index of new symbols in 2.7.3</title><xi:include href="xml/api-index-2.7.3.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-6-8" role="2.6.8"><title>Index of new symbols in 2.6.8</title><xi:include href="xml/api-index-2.6.8.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-6-5" role="2.6.5"><title>Index of new symbols in 2.6.5</title><xi:include href="xml/api-index-2.6.5.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-6-3" role="2.6.3"><title>Index of new symbols in 2.6.3</title><xi:include href="xml/api-index-2.6.3.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-6-0" role="2.6.0"><title>Index of new symbols in 2.6.0</title><xi:include href="xml/api-index-2.6.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-5-0" role="2.5.0"><title>Index of new symbols in 2.5.0</title><xi:include href="xml/api-index-2.5.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-4-0" role="2.4.0"><title>Index of new symbols in 2.4.0</title><xi:include href="xml/api-index-2.4.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-3-0" role="2.3.0"><title>Index of new symbols in 2.3.0</title><xi:include href="xml/api-index-2.3.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-2-0" role="2.2.0"><title>Index of new symbols in 2.2.0</title><xi:include href="xml/api-index-2.2.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-1-0" role="2.1.0"><title>Index of new symbols in 2.1.0</title><xi:include href="xml/api-index-2.1.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-2-0-0" role="2.0.0"><title>Index of new symbols in 2.0.0</title><xi:include href="xml/api-index-2.0.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-9-0" role="1.9.0"><title>Index of new symbols in 1.9.0</title><xi:include href="xml/api-index-1.9.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-8-6" role="1.8.6"><title>Index of new symbols in 1.8.6</title><xi:include href="xml/api-index-1.8.6.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-8-5" role="1.8.5"><title>Index of new symbols in 1.8.5</title><xi:include href="xml/api-index-1.8.5.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-8-1" role="1.8.1"><title>Index of new symbols in 1.8.1</title><xi:include href="xml/api-index-1.8.1.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-8-0" role="1.8.0"><title>Index of new symbols in 1.8.0</title><xi:include href="xml/api-index-1.8.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-7-7" role="1.7.7"><title>Index of new symbols in 1.7.7</title><xi:include href="xml/api-index-1.7.7.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-7-5" role="1.7.5"><title>Index of new symbols in 1.7.5</title><xi:include href="xml/api-index-1.7.5.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-7-2" role="1.7.2"><title>Index of new symbols in 1.7.2</title><xi:include href="xml/api-index-1.7.2.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-6-0" role="1.6.0"><title>Index of new symbols in 1.6.0</title><xi:include href="xml/api-index-1.6.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-5-0" role="1.5.0"><title>Index of new symbols in 1.5.0</title><xi:include href="xml/api-index-1.5.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-4-3" role="1.4.3"><title>Index of new symbols in 1.4.3</title><xi:include href="xml/api-index-1.4.3.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-4-2" role="1.4.2"><title>Index of new symbols in 1.4.2</title><xi:include href="xml/api-index-1.4.2.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-4-0" role="1.4.0"><title>Index of new symbols in 1.4.0</title><xi:include href="xml/api-index-1.4.0.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-3-3" role="1.3.3"><title>Index of new symbols in 1.3.3</title><xi:include href="xml/api-index-1.3.3.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-2-3" role="1.2.3"><title>Index of new symbols in 1.2.3</title><xi:include href="xml/api-index-1.2.3.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-1-3" role="1.1.3"><title>Index of new symbols in 1.1.3</title><xi:include href="xml/api-index-1.1.3.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-1-2" role="1.1.2"><title>Index of new symbols in 1.1.2</title><xi:include href="xml/api-index-1.1.2.xml"><xi:fallback /></xi:include></index>
<index id="api-index-1-0-5" role="1.0.5"><title>Index of new symbols in 1.0.5</title><xi:include href="xml/api-index-1.0.5.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-42" role="0.9.42"><title>Index of new symbols in 0.9.42</title><xi:include href="xml/api-index-0.9.42.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-41" role="0.9.41"><title>Index of new symbols in 0.9.41</title><xi:include href="xml/api-index-0.9.41.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-39" role="0.9.39"><title>Index of new symbols in 0.9.39</title><xi:include href="xml/api-index-0.9.39.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-38" role="0.9.38"><title>Index of new symbols in 0.9.38</title><xi:include href="xml/api-index-0.9.38.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-33" role="0.9.33"><title>Index of new symbols in 0.9.33</title><xi:include href="xml/api-index-0.9.33.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-31" role="0.9.31"><title>Index of new symbols in 0.9.31</title><xi:include href="xml/api-index-0.9.31.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-30" role="0.9.30"><title>Index of new symbols in 0.9.30</title><xi:include href="xml/api-index-0.9.30.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-28" role="0.9.28"><title>Index of new symbols in 0.9.28</title><xi:include href="xml/api-index-0.9.28.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-26" role="0.9.26"><title>Index of new symbols in 0.9.26</title><xi:include href="xml/api-index-0.9.26.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-22" role="0.9.22"><title>Index of new symbols in 0.9.22</title><xi:include href="xml/api-index-0.9.22.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-21" role="0.9.21"><title>Index of new symbols in 0.9.21</title><xi:include href="xml/api-index-0.9.21.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-20" role="0.9.20"><title>Index of new symbols in 0.9.20</title><xi:include href="xml/api-index-0.9.20.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-11" role="0.9.11"><title>Index of new symbols in 0.9.11</title><xi:include href="xml/api-index-0.9.11.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-10" role="0.9.10"><title>Index of new symbols in 0.9.10</title><xi:include href="xml/api-index-0.9.10.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-8" role="0.9.8"><title>Index of new symbols in 0.9.8</title><xi:include href="xml/api-index-0.9.8.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-7" role="0.9.7"><title>Index of new symbols in 0.9.7</title><xi:include href="xml/api-index-0.9.7.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-5" role="0.9.5"><title>Index of new symbols in 0.9.5</title><xi:include href="xml/api-index-0.9.5.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-9-2" role="0.9.2"><title>Index of new symbols in 0.9.2</title><xi:include href="xml/api-index-0.9.2.xml"><xi:fallback /></xi:include></index>
<index id="api-index-0-6-0" role="0.6.0"><title>Index of new symbols in 0.6.0</title><xi:include href="xml/api-index-0.6.0.xml"><xi:fallback /></xi:include></index>
<xi:include href="xml/annotation-glossary.xml"><xi:fallback /></xi:include>
The current HarfBuzz codebase is versioned 2.x.x and is stable
and under active maintenance. This is what is used in latest
versions of Firefox, GNOME, ChromeOS, Chrome, LibreOffice,
XeTeX, Android, and KDE, among other places.
Prior to 2012, the original HarfBuzz codebase (which, these days, is
referred to as <emphasis>harfbuzz-old</emphasis>) was derived from code
in <ulink url="http://freetype.org/">FreeType</ulink>,
<ulink url="http://pango.org/">Pango</ulink>, and
<ulink url="http://qt-project.org/">Qt</ulink>.
It is <emphasis>not</emphasis> actively developed or maintained, and is
extremely buggy. All users of harfbuzz-old are encouraged to switch over
to the new HarfBuzz as soon as possible.
To make this distinction clearer in discussions, the current HarfBuzz
codebase is sometimes referred to as <emphasis>harfbuzz-ng</emphasis>.
For reference purposes, the harfbuzz-old source tree is archived
<ulink url="http://cgit.freedesktop.org/harfbuzz.old/">here</ulink>.
There are no release tarballs of harfbuzz-old whatsoever.
@ -1,705 +0,0 @@
@ -1,412 +0,0 @@
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
<chapter id="buffers-language-script-and-direction">
<title>Buffers, language, script and direction</title>
The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
buffer. In this chapter, we'll look at how to set up a buffer with
the text that we want and how to customize the properties of the
buffer. We'll also look at a piece of lower-level machinery that
you will need to understand before proceeding: the functions that
HarfBuzz uses to retrieve Unicode information.
After shaping is complete, HarfBuzz puts its output back
into the buffer. But getting that output requires setting up a
face and a font first, so we will look at that in the next chapter
instead of here.
<section id="creating-and-destroying-buffers">
<title>Creating and destroying buffers</title>
As we saw in our <emphasis>Getting Started</emphasis> example, a
buffer is created and
initialized with <function>hb_buffer_create()</function>. This
produces a new, empty buffer object, instantiated with some
default values and ready to accept your Unicode strings.
HarfBuzz manages the memory of objects (such as buffers) that it
creates, so you don't have to. When you have finished working on
a buffer, you can call <function>hb_buffer_destroy()</function>:
<programlisting language="C">
hb_buffer_t *buf = hb_buffer_create();
hb_buffer_destroy(buf);
This will destroy the object and free its associated memory -
unless some other part of the program holds a reference to this
buffer. If you acquire a HarfBuzz buffer from another subsystem
and want to ensure that it is not garbage collected by someone
else destroying it, you should increase its reference count:
<programlisting language="C">
void somefunc(hb_buffer_t *buf) {
buf = hb_buffer_reference(buf);
And then decrease it once you're done with it:
<programlisting language="C">
hb_buffer_destroy(buf);
}
While we are on the subject of reference-counting buffers, it is
worth noting that an individual buffer can only meaningfully be
used by one thread at a time.
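
The ownership rule described above can be illustrated with a small, self-contained model. This is only a sketch: `toy_buffer`, `toy_live_buffers` and the `toy_*` functions are hypothetical names invented for this example and are not part of HarfBuzz. They mimic the documented behavior of `hb_buffer_create()`, `hb_buffer_reference()` and `hb_buffer_destroy()`, not HarfBuzz's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted object; NOT HarfBuzz's real type. */
typedef struct toy_buffer {
    int ref_count; /* number of live references to this object */
} toy_buffer;

/* Count of allocated toy buffers, so the effect of destroy is observable. */
int toy_live_buffers = 0;

/* Modeled on hb_buffer_create(): the caller receives the first reference. */
toy_buffer *toy_buffer_create(void) {
    toy_buffer *buf = malloc(sizeof *buf);
    if (buf == NULL) return NULL;
    buf->ref_count = 1;
    toy_live_buffers++;
    return buf;
}

/* Modeled on hb_buffer_reference(): take one more reference. */
toy_buffer *toy_buffer_reference(toy_buffer *buf) {
    if (buf != NULL) buf->ref_count++;
    return buf;
}

/* Modeled on hb_buffer_destroy(): drop one reference, free on the last. */
void toy_buffer_destroy(toy_buffer *buf) {
    if (buf == NULL) return;
    assert(buf->ref_count > 0);
    if (--buf->ref_count == 0) {
        free(buf);
        toy_live_buffers--;
    }
}
```

In this model, a function that receives a buffer from elsewhere calls `toy_buffer_reference()` on entry and `toy_buffer_destroy()` when done; the memory is released only when the original owner has also called `toy_buffer_destroy()`.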
To throw away all the data in your buffer and start from scratch,
call <function>hb_buffer_reset(buf)</function>. If you want to
throw away the string in the buffer but keep the options, you can
instead call <function>hb_buffer_clear_contents(buf)</function>.
<section id="adding-text-to-the-buffer">
<title>Adding text to the buffer</title>
Now we have a brand new HarfBuzz buffer. Let's start filling it
with text! From HarfBuzz's perspective, a buffer is just a stream
of Unicode code points, but your input string is probably in one of
the standard Unicode character encodings (UTF-8, UTF-16, or
UTF-32). HarfBuzz provides convenience functions that accept
each of these encodings:
<function>hb_buffer_add_utf8()</function>,
<function>hb_buffer_add_utf16()</function>, and
<function>hb_buffer_add_utf32()</function>. Other than the
character encoding they accept, they function identically.
You can add UTF-8 text to a buffer by passing in the text array,
the array's length, an offset into the array for the first
character to add, and the length of the segment to add:
<programlisting language="C">
hb_buffer_add_utf8 (hb_buffer_t *buf,
const char *text,
int text_length,
unsigned int item_offset,
int item_length)
So, in practice, you can say:
<programlisting language="C">
hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
This will append your new characters to
<parameter>buf</parameter>, not replace its existing
contents. Also, note that you can use <literal>-1</literal> in
place of the first instance of <function>strlen(text)</function>
if your text array is NUL-terminated. Similarly, you can also use
<literal>-1</literal> as the final argument if you want to add the text's full length.
Whatever start <parameter>item_offset</parameter> and
<parameter>item_length</parameter> you provide, HarfBuzz will also
attempt to grab the five characters <emphasis>before</emphasis>
the offset point and the five characters
<emphasis>after</emphasis> the designated end. These are the
before and after "context" segments, which are used internally
for HarfBuzz to make shaping decisions. They will not be part of
the final output, but they ensure that HarfBuzz's
script-specific shaping operations are correct. If there are
fewer than five characters available for the before or after
contexts, HarfBuzz will just grab what is there.
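
To make the clipping rule concrete, here is a minimal sketch of how the before and after context ranges could be computed. It assumes one character per array index (as with UTF-32 input) and uses hypothetical names (`toy_range`, `toy_pre_context`, `toy_post_context`, `TOY_CONTEXT_LENGTH`) that are not part of the HarfBuzz API; it models the behavior described above rather than HarfBuzz's own code.

```c
#include <assert.h>
#include <stddef.h>

/* Per the text: up to five characters of context on each side. */
#define TOY_CONTEXT_LENGTH 5

/* Half-open range [start, end) of character indices. */
typedef struct toy_range {
    size_t start;
    size_t end;
} toy_range;

/* Context available before an item that starts at item_offset. */
toy_range toy_pre_context(size_t item_offset) {
    toy_range r;
    r.end = item_offset;
    r.start = item_offset > TOY_CONTEXT_LENGTH
                  ? item_offset - TOY_CONTEXT_LENGTH
                  : 0; /* fewer than five available: take what is there */
    return r;
}

/* Context available after an item of item_length characters. */
toy_range toy_post_context(size_t text_length, size_t item_offset,
                           size_t item_length) {
    assert(item_offset <= text_length);
    assert(item_length <= text_length - item_offset);
    toy_range r;
    r.start = item_offset + item_length;
    size_t remaining = text_length - r.start;
    r.end = r.start +
            (remaining < TOY_CONTEXT_LENGTH ? remaining : TOY_CONTEXT_LENGTH);
    return r;
}
```

For a 20-character text with an item at offset 8 of length 4, this model yields a before-context of indices 3 to 7 and an after-context of indices 12 to 16; near either end of the text the ranges simply shrink.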
For longer text runs, such as full paragraphs, it might be
tempting to only add smaller sub-segments to a buffer and
shape them in piecemeal fashion. Generally, this is not a good
idea, however, because a lot of shaping decisions are
dependent on this context information. For example, in Arabic
and other connected scripts, HarfBuzz needs to know the code
points before and after each character in order to correctly
determine which glyph to return.
The safest approach is to add all of the text available (even
if your text contains a mix of scripts, directions, languages
and fonts), then use <parameter>item_offset</parameter> and
<parameter>item_length</parameter> to indicate which characters you
want shaped (which must all have the same script, direction,
language and font), so that HarfBuzz has access to any context.
You can also add Unicode code points directly with
<function>hb_buffer_add_codepoints()</function>. The arguments
to this function are the same as those for the UTF
encodings. But it is particularly important to note that
HarfBuzz does not do validity checking on the text that is added
to a buffer. Invalid code points will be replaced, but it is up
to you to do any deep-sanity checking necessary.
<section id="setting-buffer-properties">
<title>Setting buffer properties</title>
Buffers containing input characters still need several
properties set before HarfBuzz can shape their text correctly.
Initially, all buffers are set to the
<literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
type. After adding text, the buffer should be set to
<literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
indicates that it contains un-shaped input
characters. After shaping, the buffer will have the
<literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
<function>hb_buffer_add_utf8()</function> and the
other UTF functions set the content type of their buffer
automatically. But if you are reusing a buffer you may want to
check its state with
<function>hb_buffer_get_content_type(buffer)</function>. If
necessary you can set the content type with
<programlisting language="C">
hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
to prepare for shaping.
Buffers also need to carry information about the script,
language, and text direction of their contents. You can set
these properties individually:
<programlisting language="C">
hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
hb_buffer_set_language(buf, hb_language_from_string("en", -1));
However, since these properties are often repeated for
multiple text runs, you can also save them in a
<literal>hb_segment_properties_t</literal> for reuse:
<programlisting language="C">
hb_segment_properties_t savedprops;
hb_buffer_get_segment_properties (buf, &savedprops);
hb_buffer_set_segment_properties (buf2, &savedprops);
HarfBuzz also provides getter functions to retrieve a buffer's
direction, script, and language properties individually.
HarfBuzz recognizes four text directions in
<type>hb_direction_t</type>: left-to-right
(<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),
top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
bottom-to-top (<literal>HB_DIRECTION_BTT</literal>). For the
script property, HarfBuzz uses identifiers based on the
<ulink url="https://unicode.org/iso15924/">ISO 15924
standard</ulink>. For languages, HarfBuzz uses tags based on the
<ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
Helper functions are provided to convert character strings into
the necessary script and language tag types.
Two additional buffer properties to be aware of are the
"invisible glyph" and the replacement code point. The
replacement code point is inserted into buffer output in place of
any invalid code points encountered in the input. By default, it
is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
point, <literal>U+FFFD</literal> "�". You can change this with
<programlisting language="C">
hb_buffer_set_replacement_codepoint(buf, replacement);
passing in the replacement Unicode code point as the
<parameter>replacement</parameter> parameter.
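
As an illustration of what replacing invalid code points can look like, the sketch below sanitizes an array of code points before it is handed to a shaper. The notion of "invalid" used here (values above U+10FFFF and UTF-16 surrogates) is an assumption based on the Unicode definition of a scalar value, and `toy_is_valid_codepoint` and `toy_replace_invalid` are hypothetical helpers, not HarfBuzz functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* U+FFFD REPLACEMENT CHARACTER, the default named in the text. */
#define TOY_REPLACEMENT_DEFAULT 0xFFFDu

/* A Unicode scalar value is at most U+10FFFF and is not a UTF-16
   surrogate (U+D800..U+DFFF). */
int toy_is_valid_codepoint(uint32_t cp) {
    return cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

/* Replace every invalid entry in place; return how many were replaced. */
size_t toy_replace_invalid(uint32_t *text, size_t length,
                           uint32_t replacement) {
    assert(text != NULL || length == 0);
    size_t replaced = 0;
    for (size_t i = 0; i < length; i++) {
        if (!toy_is_valid_codepoint(text[i])) {
            text[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

A pre-pass like this is one way to do the additional sanity checking that the text leaves to the caller; passing a different `replacement` value mirrors what `hb_buffer_set_replacement_codepoint()` configures on a real buffer.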
The invisible glyph is used to replace all output glyphs that
are invisible. By default, the standard space character
<literal>U+0020</literal> is used; you can replace this (for
example, when using a font that provides script-specific
spaces) with
<programlisting language="C">
hb_buffer_set_invisible_glyph(buf, replacement_glyph);
Do note that in the <parameter>replacement_glyph</parameter>
parameter, you must provide the glyph ID of the replacement you
wish to use, not the Unicode code point.
HarfBuzz supports a few additional flags you might want to set
on your buffer under certain circumstances. The
<literal>HB_BUFFER_FLAG_BOT</literal> and
<literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
that the buffer represents the beginning or end (respectively)
of a text element (such as a paragraph or other block). Knowing
this allows HarfBuzz to apply certain contextual font features
when shaping, such as initial or final variants in connected
scripts.
<literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
tells HarfBuzz not to hide glyphs with the
<literal>Default_Ignorable</literal> property in Unicode. This
property designates control characters and other non-printing
code points, such as joiners and variation selectors. Normally
HarfBuzz replaces them in the output buffer with zero-width
space glyphs (using the "invisible glyph" property discussed
above); setting this flag causes them to be printed, which can
be helpful for troubleshooting.
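Conversely, setting the
<literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
tells HarfBuzz to remove <literal>Default_Ignorable</literal>
glyphs from the output buffer entirely. Finally, setting the
<literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
flag tells HarfBuzz not to insert the dotted-circle glyph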
(<literal>U+25CC</literal>, "◌"), which is normally
inserted into buffer output when broken character sequences are
encountered (such as combining marks that are not attached to a
base character).
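
The beginning-of-text and end-of-text flags matter most when a paragraph is shaped as several runs. The sketch below shows one way to decide which flags each run should carry. The `TOY_FLAG_*` constants and `toy_flags_for_run` are hypothetical and their numeric values are illustrative assumptions; in real code the corresponding `HB_BUFFER_FLAG_*` values would be combined and passed to `hb_buffer_set_flags()`.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit flags; names mirror the text, values are assumptions. */
enum {
    TOY_FLAG_DEFAULT = 0x0,
    TOY_FLAG_BOT     = 0x1, /* run starts the paragraph */
    TOY_FLAG_EOT     = 0x2  /* run ends the paragraph */
};

/* Flags for run number run_index (0-based) out of run_count runs that
   together make up one paragraph. */
unsigned int toy_flags_for_run(size_t run_index, size_t run_count) {
    assert(run_index < run_count);
    unsigned int flags = TOY_FLAG_DEFAULT;
    if (run_index == 0) flags |= TOY_FLAG_BOT;
    if (run_index + 1 == run_count) flags |= TOY_FLAG_EOT;
    return flags;
}
```

A paragraph shaped as a single run gets both flags, while interior runs of a longer paragraph get neither, so contextual features tied to the start or end of text are applied only where they belong.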
<section id="customizing-unicode-functions">
<title>Customizing Unicode functions</title>
HarfBuzz requires some simple functions for accessing
information from the Unicode Character Database (such as the
<literal>General_Category</literal> (gc) and
<literal>Script</literal> (sc) properties) that is useful
for shaping, as well as some useful operations like composing and
decomposing code points.
HarfBuzz includes its own internal, lightweight set of Unicode
functions. At build time, it is also possible to compile support
for some other options, such as the Unicode functions provided
by GLib or the International Components for Unicode (ICU)
library. Generally, this option is only of interest for client
programs that have specific integration requirements or that do
a significant amount of customization.
If your program has access to other Unicode functions, however,
such as through a system library or application framework, you
might prefer to use those instead of the built-in
options. HarfBuzz supports this by implementing its Unicode
functions as a set of virtual methods that you can replace —
without otherwise affecting HarfBuzz's functionality.
The Unicode functions are specified in a structure called
<literal>unicode_funcs</literal> which is attached to each
buffer. But even though <literal>unicode_funcs</literal> is
associated with a <type>hb_buffer_t</type>, the functions
themselves are called by other HarfBuzz APIs that access
buffers, so it would be unwise for you to hook different
functions into different buffers.
In addition, you can mark your <literal>unicode_funcs</literal>
as immutable by calling
<function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
This is especially useful if your code is a
library or framework that will have its own client programs. By
marking your Unicode function choices as immutable, you prevent
your own client programs from changing the
<literal>unicode_funcs</literal> configuration and introducing
inconsistencies and errors downstream.
You can retrieve the Unicode-functions configuration for
your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
<programlisting language="C">
hb_unicode_funcs_t *ufunctions;
ufunctions = hb_buffer_get_unicode_funcs(buf);
The current version of <literal>unicode_funcs</literal> uses six functions:
returns the Canonical Combining Class of a code point.
returns the General Category (gc) of a code point.
<function>hb_unicode_mirroring_func_t</function>: returns
the Mirroring Glyph code point (for bi-directional
replacement) of a code point.
<function>hb_unicode_script_func_t</function>: returns the
Script (sc) property of a code point.
<function>hb_unicode_compose_func_t</function>: returns the
canonical composition of a sequence of two code points.
<function>hb_unicode_decompose_func_t</function>: returns
the canonical decomposition of a code point.
Note, however, that future HarfBuzz releases may alter this set.
Each Unicode function has a corresponding setter, with which you
can assign a callback to your replacement function. For example,
to replace
<function>hb_unicode_general_category_func_t</function>, you can call
<programlisting language="C">
hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);
Virtualizing this set of Unicode functions is primarily intended
to improve portability. There is no need for every client
program to make the effort to replace the default options, so if
you are unsure, do not feel any pressure to customize
@ -1,697 +0,0 @@
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
<chapter id="clusters">
<section id="clusters-and-shaping">
<title>Clusters and shaping</title>
In text shaping, a <emphasis>cluster</emphasis> is a sequence of
characters that needs to be treated as a single, indivisible
unit. A single letter or symbol can be a cluster of its
own. Other clusters correspond to longer subsequences of the
input code points — such as a ligature or conjunct form
— and require the shaper to ensure that the cluster is not
broken during the shaping process.
A cluster is distinct from a <emphasis>grapheme</emphasis>,
which is the smallest unit of meaning in a writing system or script.
The definitions of the two terms are similar. However, clusters
are only relevant for script shaping and glyph layout. In
contrast, graphemes are a property of the underlying script, and
are of interest when client programs implement orthographic
or linguistic functionality.
For example, two individual letters are often two separate
graphemes. When two letters form a ligature, however, they
combine into a single glyph. They are then part of the same
cluster and are treated as a unit by the shaping engine —
even though the two original, underlying letters remain separate
graphemes.
HarfBuzz is concerned with clusters, <emphasis>not</emphasis>
with graphemes — although client programs using HarfBuzz
may still care about graphemes for other reasons from time to time.
During the shaping process, there are several shaping operations
that may merge adjacent characters (for example, when two code
points form a ligature or a conjunct form and are replaced by a
single glyph) or split one character into several (for example,
when decomposing a code point through the
<literal>ccmp</literal> feature). Operations like these alter
clusters; HarfBuzz tracks the changes to ensure that no clusters
get lost or broken during shaping.
HarfBuzz records cluster information independently from how
shaping operations affect the individual glyphs returned in an
output buffer. Consequently, a client program using HarfBuzz can
utilize the cluster information to implement features such as:
Correctly positioning the cursor within a shaped text run,
even when characters have formed ligatures, composed or
decomposed, reordered, or undergone other shaping operations.
Correctly highlighting a text selection that includes some,
but not all, of the characters in a word.
Applying text attributes (such as color or underlining) to
part, but not all, of a word.
Generating output document formats (such as PDF) with
embedded text that can be fully extracted.
Determining the mapping between input characters and output
glyphs, such as which glyphs are ligatures.
Performing line-breaking, justification, and other
line-level or paragraph-level operations that must be done
after shaping is complete, but which require examining
character-level properties.
<section id="working-with-harfbuzz-clusters">
<title>Working with HarfBuzz clusters</title>
When you add text to a HarfBuzz buffer, each code point must be
assigned a <emphasis>cluster value</emphasis>.
This cluster value is an arbitrary number; HarfBuzz uses it only
to distinguish between clusters. Many client programs will use
the index of each code point in the input text stream as the
cluster value. This is for the sake of convenience; the actual
value does not matter.
Some of the shaping operations performed by HarfBuzz —
such as reordering, composition, decomposition, and substitution
— may alter the cluster values of some characters. The
final cluster values in the buffer at the end of the shaping
process will indicate to client programs which subsequences of
glyphs represent a cluster and, therefore, must not be separated.
In addition, client programs can query the final cluster values
to discern other potentially important information about the
glyphs in the output buffer (such as whether or not a ligature
was formed).
For example, if the initial sequence of cluster values was:
and the final sequence of cluster values is:
then there are two clusters in the output buffer: the first
cluster includes the first two glyphs, and the second cluster
includes the third and fourth glyphs. It is also evident that a
ligature or conjunct has been formed, because there are fewer
glyphs in the output buffer (four) than there were code points
in the input buffer (five).
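The counting argument above can be sketched in a few lines of plain C. The function name <literal>count_clusters</literal> is hypothetical, and the sketch assumes that glyphs of the same cluster are adjacent in the output. With assumed final cluster values of 0,0,3,3 (chosen only for illustration), it reports two clusters.

```c
#include <stddef.h>

/* Counts the clusters in a shaped run by counting the positions where
 * the cluster value changes from one glyph to the next. */
size_t
count_clusters (const unsigned int *clusters, size_t glyph_count)
{
  size_t count = 0;

  for (size_t i = 0; i < glyph_count; i++)
    if (i == 0 || clusters[i] != clusters[i - 1])
      count++;
  return count;
}
```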
Although client programs using HarfBuzz are free to assign
initial cluster values in any manner they choose to, HarfBuzz
does offer some useful guarantees if the cluster values are
assigned in a monotonic (either non-decreasing or non-increasing)
order.
For buffers in the left-to-right (LTR)
or top-to-bottom (TTB) text flow direction,
HarfBuzz will preserve the monotonic property: client programs
are guaranteed that monotonically increasing initial cluster
values will be returned as monotonically increasing final
cluster values.
For buffers in the right-to-left (RTL)
or bottom-to-top (BTT) text flow direction,
the directionality of the buffer itself is reversed for final
output as a matter of design. Therefore, HarfBuzz inverts the
monotonic property: client programs are guaranteed that
monotonically increasing initial cluster values will be
returned as monotonically <emphasis>decreasing</emphasis> final
cluster values.
Client programs can adjust how HarfBuzz handles clusters during
shaping by setting the
<literal>cluster_level</literal> of the
buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
clustering support for this property:
<para><emphasis>Level 0</emphasis> is the default and
reproduces the behavior of the old HarfBuzz library.
The distinguishing feature of level 0 behavior is that, at
the beginning of processing the buffer, all code points that
are categorized as <emphasis>marks</emphasis>,
<emphasis>modifier symbols</emphasis>, or
<emphasis>Emoji extended pictographic</emphasis> modifiers,
as well as the <emphasis>Zero Width Joiner</emphasis> and
<emphasis>Zero Width Non-Joiner</emphasis> code points, are
assigned the cluster value of the closest preceding code
point from a <emphasis>different</emphasis> category.
In essence, whenever a base character is followed by a mark
character or a sequence of mark characters, those marks are
reassigned to the same initial cluster value as the base
character. This reassignment is referred to as
"merging" the affected clusters. This behavior is based on
the Grapheme Cluster Boundary specification in <ulink
Technical Report 29</ulink>.
Client programs can specify level 0 behavior for a buffer by
setting its <literal>cluster_level</literal> to
<emphasis>Level 1</emphasis> tweaks the old behavior
slightly to produce better results. Therefore, level 1
clustering is recommended for code that is not required to
implement backward compatibility with the old HarfBuzz.
Level 1 differs from level 0 by not merging the
clusters of marks and other modifier code points with the
preceding "base" code point's cluster. By preserving the
separate cluster values of these marks and modifier code
points, script shapers can perform additional operations
that might lead to improved results (for example, reordering
a sequence of marks).
Client programs can specify level 1 behavior for a buffer by
setting its <literal>cluster_level</literal> to
<emphasis>Level 2</emphasis> differs significantly in how it
treats cluster values. In level 2, HarfBuzz never merges
clusters.
This difference can be seen most clearly when HarfBuzz processes
ligature substitutions and glyph decompositions. In level 0
and level 1, ligatures and glyph decomposition both involve
merging clusters; in level 2, neither of these operations
triggers a merge.
Client programs can specify level 2 behavior for a buffer by
setting its <literal>cluster_level</literal> to
As mentioned earlier, client programs using HarfBuzz often
assign initial cluster values in a buffer by reusing the indices
of the code points in the input text. This gives a sequence of
cluster values that is monotonically increasing (for example,
It is not <emphasis>required</emphasis> that the cluster values
in a buffer be monotonically increasing. However, if the initial
cluster values in a buffer are monotonic and the buffer is
configured to use cluster level 0 or 1, then HarfBuzz
guarantees that the final cluster values in the shaped buffer
will also be monotonic. No such guarantee is made for cluster
level 2.
In levels 0 and 1, HarfBuzz implements the following conceptual
model for cluster values:
<itemizedlist spacing="compact">
If the sequence of input cluster values is monotonic, the
sequence of cluster values will remain monotonic.
Each cluster value represents a single cluster.
Each cluster contains one or more glyphs and one or more
characters.
In practice, this model offers several benefits. Assuming that
the initial cluster values were monotonically increasing
and distinct before shaping began, then, in the final output:
<itemizedlist spacing="compact">
All adjacent glyphs having the same final cluster
value belong to the same cluster.
Each character belongs to the cluster that has the highest
cluster value <emphasis>not larger than</emphasis> its
initial cluster value.
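The second rule can be turned into a small lookup. The sketch below is plain C and does not call HarfBuzz; the function name <literal>cluster_for_character</literal> is hypothetical. Given the final cluster values of the output glyphs, it returns the cluster that owns a character with a given initial cluster value.

```c
#include <stddef.h>

/* Returns the final cluster value that owns the character whose initial
 * cluster value was `initial`, or -1 if no cluster qualifies. */
long
cluster_for_character (const unsigned int *final_clusters, size_t glyph_count,
                       unsigned int initial)
{
  long best = -1;

  for (size_t i = 0; i < glyph_count; i++)
    if (final_clusters[i] <= initial && (long) final_clusters[i] > best)
      best = (long) final_clusters[i];
  return best;
}
```

Because the function scans every glyph, it works whether the final values are in increasing or decreasing order.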
<section id="a-clustering-example-for-levels-0-and-1">
<title>A clustering example for levels 0 and 1</title>
The basic shaping operations affect clusters in a predictable
manner when using level 0 or level 1:
When two or more clusters <emphasis>merge</emphasis>, the
resulting merged cluster takes as its cluster value the
<emphasis>minimum</emphasis> of the incoming cluster values.
When a cluster <emphasis>decomposes</emphasis>, all of the
resulting child clusters inherit as their cluster value the
cluster value of the parent cluster.
When a character is <emphasis>reordered</emphasis>, the
reordered character and all clusters that the character
moves past as part of the reordering are merged into one cluster.
The functionality, guarantees, and benefits of level 0 and level
1 behavior can be seen with some examples. First, let us examine
what happens with cluster values when shaping involves cluster
merging with ligatures and decomposition.
Let's say we start with the following character sequence (top row) and
initial cluster values (bottom row):
During shaping, HarfBuzz maps these characters to glyphs from
the font. For simplicity, let us assume that each character maps
to the corresponding, identical-looking glyph:
Now if, for example, <literal>B</literal> and <literal>C</literal>
form a ligature, then the clusters to which they belong
"merge". This merged cluster takes for its cluster
value the minimum of all the cluster values of the clusters that
went into the ligature. In this case, we get:
0,1 ,3,4
because 1 is the minimum of the set {1,2}, which were the
cluster values of <literal>B</literal> and
Next, let us say that the <literal>BC</literal> ligature glyph
decomposes into three components, and <literal>D</literal> also
decomposes into two components. Whenever a cluster decomposes,
its components each inherit the cluster value of their parent:
0,1 ,1 ,1 ,3 ,3 ,4
Next, if <literal>BC2</literal> and <literal>D0</literal> form a
ligature, then their clusters (cluster values 1 and 3) merge into
<literal>min(1,3) = 1</literal>:
0,1 ,1 ,1 ,1 ,4
Note that the entirety of cluster 3 merges into cluster 1, not
just the <literal>D0</literal> glyph. This reflects the fact
that the cluster <emphasis>must</emphasis> be treated as an
indivisible unit.
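The sequence of merges and decompositions above can be reproduced with a small model in plain C. This is only a sketch of the rules as described, not HarfBuzz's implementation: the type <literal>run_t</literal> and the functions <literal>ligate</literal> and <literal>decompose</literal> are hypothetical, the run is limited to 16 glyphs with no bounds checking, and the merge assumes monotonically increasing cluster values.

```c
#include <stddef.h>
#include <string.h>

#define MAX_GLYPHS 16

typedef struct {
  unsigned int cluster[MAX_GLYPHS];  /* one cluster value per glyph */
  size_t       len;                  /* number of glyphs in the run */
} run_t;

/* Replaces glyphs [start, start + count) with one ligature glyph. Every
 * glyph that shares a cluster with a component joins the merged cluster,
 * which takes the minimum of the incoming cluster values. */
void
ligate (run_t *run, size_t start, size_t count)
{
  unsigned int lo = run->cluster[start], hi = run->cluster[start];

  for (size_t i = start; i < start + count; i++) {
    if (run->cluster[i] < lo) lo = run->cluster[i];
    if (run->cluster[i] > hi) hi = run->cluster[i];
  }
  /* Merge whole clusters, not just the ligated glyphs. */
  for (size_t i = 0; i < run->len; i++)
    if (run->cluster[i] >= lo && run->cluster[i] <= hi)
      run->cluster[i] = lo;
  /* Collapse the components into a single glyph. */
  memmove (&run->cluster[start + 1], &run->cluster[start + count],
           (run->len - start - count) * sizeof run->cluster[0]);
  run->len -= count - 1;
}

/* Replaces the glyph at `index` with `parts` glyphs that all inherit
 * the parent's cluster value. */
void
decompose (run_t *run, size_t index, size_t parts)
{
  memmove (&run->cluster[index + parts], &run->cluster[index + 1],
           (run->len - index - 1) * sizeof run->cluster[0]);
  for (size_t i = 1; i < parts; i++)
    run->cluster[index + i] = run->cluster[index];
  run->len += parts - 1;
}
```

Starting from the cluster values 0,1,2,3,4, calling <literal>ligate (&amp;run, 1, 2)</literal>, <literal>decompose (&amp;run, 1, 3)</literal>, <literal>decompose (&amp;run, 4, 2)</literal>, and <literal>ligate (&amp;run, 3, 2)</literal> in that order walks through the same steps as the example.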
At this point, cluster 1 means: the character sequence
<literal>BCD</literal> is represented by glyphs
<literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
further.
<section id="reordering-in-levels-0-and-1">
<title>Reordering in levels 0 and 1</title>
Another common operation in the more complex shapers is glyph
reordering. In order to maintain a monotonic cluster sequence
when glyph reordering takes place, HarfBuzz merges the clusters
of everything in the reordering sequence.
For example, let us again start with the character sequence (top
row) and initial cluster values (bottom row):
If <literal>D</literal> is reordered to the position immediately
before <literal>B</literal>, then HarfBuzz merges the
<literal>B</literal>, <literal>C</literal>, and
<literal>D</literal> clusters — all the clusters between
the final position of the reordered glyph and its original
position. This means that we get:
as the final cluster sequence.
Merging this many clusters is not ideal, but it is the only
sensible way for HarfBuzz to maintain the guarantee that the
sequence of cluster values remains monotonic and to retain the
true relationship between glyphs and characters.
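The effect of a reordering move on cluster values can be modeled as follows. This is a simplified sketch in plain C, assuming that every glyph starts in its own cluster; the function name <literal>reorder_backward</literal> is hypothetical. Because every affected glyph ends up with the same cluster value, the model only needs to update the cluster array.

```c
#include <stddef.h>

/* Models moving the glyph at `from` to the earlier position `to`: the
 * moved glyph and every glyph it passes are merged into one cluster,
 * which takes the minimum of their cluster values. */
void
reorder_backward (unsigned int *cluster, size_t from, size_t to)
{
  unsigned int min = cluster[from];

  for (size_t i = to; i < from; i++)
    if (cluster[i] < min)
      min = cluster[i];

  /* After the move, positions to..from hold the moved glyph followed by
   * the glyphs it passed; all of them share the merged cluster value. */
  for (size_t i = to; i <= from; i++)
    cluster[i] = min;
}
```

With initial cluster values 0,1,2,3,4, moving the glyph at index 3 to index 1 yields 0,1,1,1,4, which follows from taking the minimum of the merged values 1, 2, and 3.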
<section id="the-distinction-between-levels-0-and-1">
<title>The distinction between levels 0 and 1</title>
The preceding examples demonstrate the main effects of using
cluster levels 0 and 1. The only difference between the two
levels is this: in level 0, at the very beginning of the shaping
process, HarfBuzz merges the cluster of each base character
with the clusters of all Unicode marks (combining or not) and
modifiers that follow it.
For example, let us start with the following character sequence
(top row) and accompanying initial cluster values (bottom row):
0,1 ,2
The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
using cluster level 0 on this sequence, then the
<literal>A</literal> and <literal>acute</literal> clusters will
merge, and the result will become:
0,0 ,2
This merger is performed before any other script-shaping
logic.
This initial cluster merging is the default behavior of the
Windows shaping engine, and the old HarfBuzz codebase copied
that behavior to maintain compatibility. Consequently, it has
remained the default behavior in the new HarfBuzz codebase.
But this initial cluster-merging behavior makes it impossible
for client programs to implement some features (such as to
color diacritic marks differently from their base
characters). That is why, in level 1, HarfBuzz does not perform
the initial merging step.
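The initial merging step of level 0 can be pictured with the following sketch in plain C. It is a simplified model, not HarfBuzz's code: the function name is hypothetical, and the caller is assumed to supply an <literal>is_mark</literal> array that says which code points are marks or modifiers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Level 0 pre-pass: every mark takes the cluster value of the code
 * point before it, so a base and its trailing marks share one cluster. */
void
merge_marks_into_bases (unsigned int *cluster, const bool *is_mark, size_t len)
{
  for (size_t i = 1; i < len; i++)
    if (is_mark[i])
      cluster[i] = cluster[i - 1];
}
```

Under level 1, this pass is simply skipped, so a mark keeps the cluster value it was given when it was added to the buffer.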
For client programs that rely on HarfBuzz cluster values to
perform cursor positioning, level 0 is more convenient. But
relying on cluster boundaries for cursor positioning is wrong: cursor
positions should be determined based on Unicode grapheme
boundaries, not on shaping-cluster boundaries. As such, using
level 1 clustering behavior is recommended.
One final facet of levels 0 and 1 is worth noting. HarfBuzz
currently does not allow any
<emphasis>multiple-substitution</emphasis> GSUB lookups to
replace a glyph with zero glyphs (in other words, to delete a
glyph).
But, in some other situations, glyphs can be deleted. In
those cases, if the glyph being deleted is the last glyph of its
cluster, HarfBuzz makes sure to merge the deleted glyph's
cluster with a neighboring cluster.
This is done primarily to make sure that the starting cluster of the
text always has the cluster index pointing to the start of the text
for the run; more than one client program currently relies on this
guarantee.
Incidentally, Apple's CoreText does something different to
maintain the same promise: it inserts a glyph with id 65535 at
the beginning of the glyph string if the glyph corresponding to
the first character in the run was deleted. HarfBuzz might do
something similar in the future.
<section id="level-2">
<title>Level 2</title>
HarfBuzz's level 2 cluster behavior uses a significantly
different model than that of level 0 and level 1.
The level 2 behavior is easy to describe, but it may be
difficult to understand in practical terms. In brief, level 2
performs no merging of clusters whatsoever.
This means that there is no initial base-and-mark merging step
(as is done in level 0), and it means that reordering moves and
ligature substitutions do not trigger a cluster merge.
Only one shaping operation directly affects clusters when using
level 2:
When a cluster <emphasis>decomposes</emphasis>, all of the
resulting child clusters inherit as their cluster value the
cluster value of the parent cluster.
When glyphs do form a ligature (or when some other feature
substitutes multiple glyphs with one glyph) the cluster value
of the first glyph is retained as the cluster value for the
resulting ligature.
This occurrence sounds similar to a cluster merge, but it is
different. In particular, no subsequent characters —
including marks and modifiers — are affected. They retain
their previous cluster values.
Level 2 cluster behavior is ultimately less complex than level 0
or level 1, but there are several cases for which processing
cluster values produced at level 2 may be tricky.
<section id="ligatures-with-combining-marks-in-level-2">
<title>Ligatures with combining marks in level 2</title>
The first example of how HarfBuzz's level 2 cluster behavior
can be tricky is when the text to be shaped includes combining
marks attached to ligatures.
Let us start with an input sequence with the following
characters (top row) and initial cluster values (bottom row):
0,1 ,2,3 ,4,5
If the sequence <literal>A,B,C</literal> forms a ligature,
then these are the cluster values HarfBuzz will return under
the various cluster levels:
Level 0:
0 ,0 ,0 ,0
Level 1:
0 ,0 ,0 ,5
Level 2:
0 ,1 ,3 ,5
Making sense of the level 2 result is the hardest for a client
program, because there is nothing in the cluster values that
indicates that <literal>B</literal> and <literal>C</literal>
formed a ligature with <literal>A</literal>.
In contrast, the "merged" cluster values of the mark glyphs
that are seen in the level 0 and level 1 output are evidence
that a ligature substitution took place.
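The three results can be reproduced with a simplified model in plain C. The sketch assumes that the input alternates base characters and combining marks (so the marks sit at the odd positions), that all of the bases join one ligature, and that the first code point is a base with the lowest cluster value. The function name <literal>ligate_with_marks</literal> is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: all base characters form one ligature, and the
 * marks are emitted after it in their original order. Writes the output
 * cluster values to `out` and returns the number of output glyphs. */
size_t
ligate_with_marks (const unsigned int *cluster, const bool *is_mark,
                   size_t len, int level, unsigned int *out)
{
  size_t last_base = 0, n = 0;

  for (size_t i = 0; i < len; i++)
    if (!is_mark[i])
      last_base = i;

  out[n++] = cluster[0];                 /* the ligature glyph */
  for (size_t i = 1; i < len; i++) {
    if (!is_mark[i])
      continue;                          /* absorbed into the ligature */
    if (level == 0 || (level == 1 && i < last_base))
      out[n++] = cluster[0];             /* merged with the ligature */
    else
      out[n++] = cluster[i];             /* keeps its own cluster value */
  }
  return n;
}
```

In this model, level 0 merges every mark, level 1 merges only the marks that lie between the ligature's components, and level 2 merges nothing.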
<section id="reordering-in-level-2">
<title>Reordering in level 2</title>
Another example of how HarfBuzz's level 2 cluster behavior
can be tricky is when glyphs reorder. Consider an input sequence
with the following characters (top row) and initial cluster
values (bottom row):
Now imagine <literal>D</literal> moves before
<literal>B</literal> in a reordering operation. The cluster
values will then be:
Next, if <literal>D</literal> forms a ligature with
<literal>B</literal>, the output is:
0,3 ,2,4
However, in a different scenario, in which the shaping rules
of the script instead caused <literal>A</literal> and
<literal>B</literal> to form a ligature
<emphasis>before</emphasis> the <literal>D</literal> reordered, the
result would be:
0 ,3,2,4
There is no way for a client program to differentiate between
these two scenarios based on the cluster values
alone. Consequently, client programs that use level 2 might
need to undertake additional work in order to manage cursor
positioning, text attributes, or other desired features.
<section id="other-considerations-in-level-2">
<title>Other considerations in level 2</title>
There may be other problems encountered with ligatures under
level 2, such as if the direction of the text is forced to
the opposite of its natural direction (for example, Arabic text
that is forced into left-to-right directionality). But,
generally speaking, these other scenarios are minor corner
cases that are too obscure for most client programs to need to
worry about.
<chapter id="fonts-and-faces">
<title>Fonts, faces, and output</title>
In the previous chapter, we saw how to set up a buffer and fill
it with text as Unicode code points. In order to shape this
buffer text with HarfBuzz, you will also need a font
object.
HarfBuzz provides abstractions to help you cache and reuse the
heavier parts of working with binary fonts, so we will look at
how to do that. We will also look at how to work with the
FreeType font-rendering library and at how you can customize
HarfBuzz to work with other libraries.
Finally, we will look at how to work with OpenType variable
fonts, the latest update to the OpenType font format, and at
some other recent additions to OpenType.
<section id="fonts-and-faces-objects">
<title>Font and face objects</title>
The outcome of shaping a run of text depends on the contents of
a specific font file (such as the substitutions and positioning
moves in the 'GSUB' and 'GPOS' tables), so HarfBuzz makes
accessing those internals fast.
An <type>hb_face_t</type> represents a <emphasis>face</emphasis>
in HarfBuzz. This data type is a wrapper around an
<type>hb_blob_t</type> blob that holds the contents of a binary
font file. Since HarfBuzz supports TrueType Collections and
OpenType Collections (each of which can include multiple
typefaces), a HarfBuzz face also requires an index number
specifying which typeface in the file you want to use. Most of
the font files you will encounter in the wild include just a
single face, however, so most of the time you would pass in
<literal>0</literal> as the index when you create a face:
On its own, a face object is not quite ready to use for
shaping. The typeface must be set to a specific point size in
order for some details (such as hinting) to work. In addition,
if the font file in question is an OpenType Variable Font, then
you may need to specify one or more variation-axis settings (or a
named instance) in order to get the output you need.
In HarfBuzz, you do this by creating a <emphasis>font</emphasis>
object from your face.
Font objects also have the advantage of being considerably
lighter-weight than face objects (remember that a face contains
the contents of a binary font file mapped into memory). As a
result, you can cache and reuse a font object, but you could
also create a new one for each additional size you needed.
Creating new fonts incurs some additional overhead, of course,
but whether or not it is excessive is your call in the end. In
contrast, face objects are substantially larger, and you really
should cache them and reuse them whenever possible.
You can create a font object from a face object:
After creating a font, there are a few properties you should
set. Many fonts enable and disable hints based on the size they
are used at, so setting this is important for font
objects. <function>hb_font_set_ppem(font, x_ppem,
y_ppem)</function> sets the pixels-per-EM value of the font. You
can also set the point size of the font with
<function>hb_font_set_ptem(font, ptem)</function>. HarfBuzz uses the
industry standard 72 points per inch.
HarfBuzz lets you specify the degree of subpixel precision you want
through a scaling factor. You can set horizontal and
vertical scaling factors on the
font by calling <function>hb_font_set_scale(font, x_scale,
y_scale)</function>.
There may be times when you are handed a font object and need to
access the face object that it comes from. For that, you can call
You can also create a font object from an existing font object
using the <function>hb_font_create_sub_font()</function>
function. This creates a child font object that is initialized
with the same attributes as its parent; it can be used to
quickly set up a new font for the purpose of overriding a specific
font-functions method.
All face objects and font objects are lifecycle-managed by
HarfBuzz. After creating a face, you increase its reference
count with <function>hb_face_reference(face)</function> and
decrease it with
<function>hb_face_destroy(face)</function>. Likewise, you
increase the reference count on a font with
<function>hb_font_reference(font)</function> and decrease it
with <function>hb_font_destroy(font)</function>.
You can also attach user data to face objects and font objects.
<section id="fonts-and-faces-custom-functions">
<title>Customizing font functions</title>
During shaping, HarfBuzz frequently needs to query font objects
to get at the contents and parameters of the glyphs in a font
file. It includes a built-in set of functions that is tailored
to working with OpenType fonts. However, as was the case with
Unicode functions in the buffers chapter, HarfBuzz also wants to
make it easy for you to assign a substitute set of font
functions if you are developing a program to work with a library
or platform that provides its own font functions.
Therefore, the HarfBuzz API defines a set of virtual
methods for accessing font-object properties, and you can
replace the defaults with your own selections without
interfering with the shaping process. Each font object in
HarfBuzz includes a structure called
<literal>font_funcs</literal> that serves as a vtable for the
font object. The virtual methods in
<literal>font_funcs</literal> are:
<function>hb_font_get_font_h_extents_func_t</function>: returns
the extents of the font for horizontal text.
<function>hb_font_get_font_v_extents_func_t</function>: returns
the extents of the font for vertical text.
<function>hb_font_get_nominal_glyph_func_t</function>: returns
the font's nominal glyph for a given code point.
<function>hb_font_get_variation_glyph_func_t</function>: returns
the font's glyph for a given code point when it is followed by a
given Variation Selector.
<function>hb_font_get_nominal_glyphs_func_t</function>: returns
the font's nominal glyphs for a series of code points.
<function>hb_font_get_glyph_advance_func_t</function>: returns
the advance for a glyph.
<function>hb_font_get_glyph_h_advance_func_t</function>: returns
the advance for a glyph for horizontal text.
the advance for a glyph for vertical text.
<function>hb_font_get_glyph_advances_func_t</function>: returns
the advances for a series of glyphs.
<function>hb_font_get_glyph_h_advances_func_t</function>: returns
the advances for a series of glyphs for horizontal text.
<function>hb_font_get_glyph_v_advances_func_t</function>: returns
the advances for a series of glyphs for vertical text.
<function>hb_font_get_glyph_origin_func_t</function>: returns
the origin coordinates of a glyph.
<function>hb_font_get_glyph_h_origin_func_t</function>: returns
the origin coordinates of a glyph for horizontal text.
<function>hb_font_get_glyph_v_origin_func_t</function>: returns
the origin coordinates of a glyph for vertical text.
<function>hb_font_get_glyph_extents_func_t</function>: returns
the extents for a glyph.
returns the coordinates of a specific contour point from a glyph.
<function>hb_font_get_glyph_name_func_t</function>: returns the
name of a glyph (from its glyph index).
<function>hb_font_get_glyph_from_name_func_t</function>: returns
the glyph index that corresponds to a given glyph name.
You can create new font-functions by calling
The individual methods can each be set with their own setter
function, such as
func, user_data, destroy)</function>.
Font-functions structures can be reused for multiple font
objects, and can be reference counted with
<function>hb_font_funcs_reference()</function> and
<function>hb_font_funcs_destroy()</function>. Just like other
objects in HarfBuzz, you can set user-data for each
font-functions structure and assign a destroy callback for
it.
You can also mark a font-functions structure as immutable,
with <function>hb_font_funcs_make_immutable()</function>. This
is especially useful if your code is a library or framework that
will have its own client programs. By marking your
font-functions structures as immutable, you prevent your client
programs from changing the configuration and introducing
inconsistencies and errors downstream.
To override only some functions while using the default implementation
for the others, you will need to create a sub-font. By default, the
sub-font uses the font functions of its parent except for the functions
that were explicitly set. The following code will override only the
<function>hb_font_get_nominal_glyph_func_t</function> for the sub-font:
<section id="fonts-and-faces-native-opentype">
<title>Font objects and HarfBuzz's native OpenType implementation</title>
By default, whenever HarfBuzz creates a font object, it will
configure the font to use a built-in set of font functions that
supports contemporary OpenType font internals. If you want to
work with OpenType or TrueType fonts, you should be able to use
these functions without difficulty.
Many of the methods in the font-functions structure deal with
the fundamental properties of glyphs that are required for
shaping text: extents (the maximums and minimums on each axis),
origins (the <literal>(0,0)</literal> coordinate point which
glyphs are drawn in reference to), and advances (the amount that
the cursor needs to be moved after drawing each glyph, including
any empty space for the glyph's side bearings).
As you can see in the list of functions, there are separate "horizontal"
and "vertical" variants depending on whether the text is set in
the horizontal or vertical direction. For some scripts, fonts
that are designed to support text set horizontally or vertically (for
example, in Japanese) may include metrics for both text
directions. When fonts don't include this information, HarfBuzz
does its best to transform what the font provides.
In addition to the direction-specific functions, HarfBuzz
provides some higher-level functions for fetching information
like extents and advances for a glyph. If you call
then you can provide any <type>hb_direction_t</type> as the
<parameter>direction</parameter> parameter, and HarfBuzz will
use the correct function variant for the text direction. There
are similar higher-level versions of the functions for fetching
extents, origin coordinates, and contour-point
coordinates. There are also addition and subtraction functions
for moving points with respect to the origin.
There are also methods for fetching the glyph ID that
corresponds to a Unicode code point (possibly when followed by a
variation-selector code point), fetching the glyph name from the
font, and fetching the glyph ID that corresponds to a glyph name
you already have.
HarfBuzz also provides functions for converting between glyph
names and string
variables. <function>hb_font_glyph_to_string(font, glyph, s,
size)</function> retrieves the name for the glyph ID
<parameter>glyph</parameter> from the font object. It generates a
generic name of the form <literal>gidDDD</literal> (where DDD is
the glyph index) if there is no name for the glyph in the
font. The <function>hb_font_glyph_from_string(font, s, len,
glyph)</function> function takes an input string <parameter>s</parameter>
and looks for a glyph with that name in the font, returning its
glyph ID in the <parameter>glyph</parameter>
output parameter. It automatically parses
<literal>gidDDD</literal> and <literal>uniUUUU</literal> strings.
<section id="fonts-and-faces-variable">
<title>Working with OpenType Variable Fonts</title>
If you are working with OpenType Variable Fonts, there are a few
additional functions you should use to specify the
variation-axis settings of your font object. Without doing so,
your variable font's font object can still be used, but only at
the default setting for every axis (which, of course, is
sometimes what you want, but does not cover general usage).
HarfBuzz manages variation settings in the
<type>hb_variation_t</type> data type, which holds a <property>tag</property> for the
variation-axis identifier tag and a <property>value</property> for its
setting. You can retrieve the list of variation axes in a font
binary from the face object (not from a font object, notably) by
calling <function>hb_ot_var_get_axis_count(face)</function> to
find the number of axes, then using
<function>hb_ot_var_get_axis_infos()</function> to collect the
axis structures:
For each axis returned in the array, you can access the
identifier in its <property>tag</property>. HarfBuzz also has
tag definitions predefined for the five standard axes specified
in OpenType (<literal>ital</literal> for italic,
<literal>opsz</literal> for optical size,
<literal>slnt</literal> for slant, <literal>wdth</literal> for
width, and <literal>wght</literal> for weight). Each axis also
has a <property>min_value</property>, a
<property>default_value</property>, and a <property>max_value</property>.
To set your font object's variation settings, you call the
<function>hb_font_set_variations()</function> function with an
array of <type>hb_variation_t</type> variation settings. Let's
say our font has weight and width axes. We need to specify each
of the axes by tag and assign a value on the axis:
That should give us a slightly condensed font ("normal" on the
<literal>wdth</literal> axis is 100) at a noticeably bolder
weight ("regular" is 400 on the <literal>wght</literal> axis).
In practice, though, you should always check that the value you
want to set on the axis is within the
range actually implemented in the font's variation axis. After
all, a font might only provide lighter-than-regular weights, and
setting a heavier value on the <literal>wght</literal> axis will
not change that.
Once your variation settings are specified on your font object,
however, shaping with a variable font is just like shaping a
static font.
<chapter id="getting-started">
<title>Getting started with HarfBuzz</title>
<section id="an-overview-of-the-harfbuzz-shaping-api">
<title>An overview of the HarfBuzz shaping API</title>
The core of the HarfBuzz shaping API is the function
<function>hb_shape()</function>. This function takes a font, a
buffer containing a string of Unicode codepoints and
(optionally) a list of font features as its input. It replaces
the codepoints in the buffer with the corresponding glyphs from
the font, correctly ordered and positioned, and with any of the
optional font features applied.
In addition to holding the pre-shaping input (the Unicode
codepoints that comprise the input string) and the post-shaping
output (the glyphs and positions), a HarfBuzz buffer has several
properties that affect shaping. The most important are the
text-flow direction (e.g., left-to-right, right-to-left,
top-to-bottom, or bottom-to-top), the script tag, and the
language tag.
For input string buffers, flags are available to denote when the
buffer represents the beginning or end of a paragraph, to
indicate whether or not to visibly render Unicode <literal>Default
Ignorable</literal> codepoints, and to modify the cluster-merging
behavior for the buffer. For shaped output buffers, the
individual X and Y offsets and <literal>advances</literal>
(the logical dimensions) of each glyph are
accessible. HarfBuzz also flags glyphs as
<literal>UNSAFE_TO_BREAK</literal> if breaking the string at
that glyph (e.g., in a line-breaking or hyphenation process)
would require re-shaping the text.
HarfBuzz also provides methods to compare the contents of
buffers, join buffers, normalize buffer contents, and handle
invalid codepoints, as well as to determine the state of a
buffer (e.g., input codepoints or output glyphs). Buffer
lifecycles are managed and all buffers are reference-counted.
Although the default <function>hb_shape()</function> function is
sufficient for most use cases, a variant is also provided that
lets you specify which of HarfBuzz's shapers to use on a buffer.
HarfBuzz can read TrueType fonts, TrueType collections, OpenType
fonts, and OpenType collections. Functions are provided to query
font objects about metrics, Unicode coverage, available tables and
features, and variation selectors. Individual glyphs can also be
queried for metrics, variations, and glyph names. OpenType
variable fonts are supported, and HarfBuzz allows you to set
variation-axis coordinates on font objects.
HarfBuzz provides glue code to integrate with various other
libraries, including FreeType, GObject, and CoreText. Support
for integrating with Uniscribe and DirectWrite is experimental
at present.
<section id="terminology">
<?dbfo list-presentation="blocks"?>
In text shaping, a <emphasis>script</emphasis> is a
writing system: a set of symbols, rules, and conventions
that is used to represent a language or multiple languages.
In general computing lingo, the word "script" can also
be used to mean an executable program (usually one
written in a human-readable programming language). For
the sake of clarity, HarfBuzz documents will always use
more specific terminology when referring to this
meaning, such as "Python script" or "shell script." In
all other instances, "script" refers to a writing system.
For developers using HarfBuzz, it is important to note
the distinction between a script and a language. Most
scripts are used to write a variety of different
languages, and many languages may be written in more
than one script.
In HarfBuzz, a <emphasis>shaper</emphasis> is a
handler for a specific script-shaping model. HarfBuzz
implements separate shapers for Indic, Arabic, Thai and
Lao, Khmer, Myanmar, Tibetan, Hangul, Hebrew, the
Universal Shaping Engine (USE), and a default shaper for
non-complex scripts.
In text shaping, a <emphasis>cluster</emphasis> is a
sequence of codepoints that must be treated as an
indivisible unit. Clusters can include code-point
sequences that form a ligature or base-and-mark
sequences. Tracking and preserving clusters is important
when shaping operations might separate or reorder
code points.
HarfBuzz provides three cluster
<emphasis>levels</emphasis> that implement different
approaches to the problem of preserving clusters during
shaping operations.
In linguistics, a <emphasis>grapheme</emphasis> is one
of the indivisible units that make up a writing system or
script. Often, graphemes are individual symbols (letters,
numbers, punctuation marks, logograms, etc.) but,
depending on the writing system, a particular grapheme
might correspond to a sequence of several Unicode code points.
In practice, HarfBuzz and other text-shaping engines
are not generally concerned with graphemes. However, it
is important for developers using HarfBuzz to recognize
that there is a difference between graphemes and shaping
clusters (see above). The two concepts may overlap
frequently, but there is no guarantee that they will be identical.
In linguistics, a <emphasis>syllable</emphasis> is
a sequence of sounds that makes up a building block of a
particular language. Every language has its own set of
rules describing what constitutes a valid syllable.
For text-shaping purposes, the various definitions of
"syllable" are important because script-specific shaping
operations may be applied at the syllable level. For
example, a reordering rule might specify that a vowel
mark be reordered to the beginning of the syllable.
Syllables will consist of one or more Unicode code
points. The definition of a syllable for a particular
writing system might correspond to how HarfBuzz
identifies clusters (see above) for the same writing
system. However, it is important for developers using
HarfBuzz to recognize that there is a difference between
syllables and shaping clusters. The two concepts may
overlap frequently, but there is no guarantee that they
will be identical.
<section id="a-simple-shaping-example">
<title>A simple shaping example</title>
Below is the simplest HarfBuzz shaping example possible.
<orderedlist numeration="arabic">
Create a buffer and put your text in it.
<orderedlist numeration="arabic">
<listitem override="2">
Set the script, language and direction of the buffer.
<orderedlist numeration="arabic">
<listitem override="3">
Create a face and a font from a font file.
<orderedlist numeration="arabic">
<listitem override="4">
hb_shape(font, buf, NULL, 0);
<orderedlist numeration="arabic">
<listitem override="5">
Get the glyph and position information.
<orderedlist numeration="arabic">
<listitem override="6">
Iterate over each glyph.
<orderedlist numeration="arabic">
<listitem override="7">
Tidy up.
<sect1 id="glyph-information">
<title>Glyph information</title>
<sect2 id="names-and-numbers">
<title>Names and numbers</title>
<chapter id="install-harfbuzz">
<title>Installing HarfBuzz</title>
<section id="download">
<title id="download.title">Downloading HarfBuzz</title>
The HarfBuzz source code is hosted at <ulink
Tarball releases and Win32 binary bundles (which include the
libharfbuzz DLL, hb-view.exe, hb-shape.exe, and all
dependencies) of HarfBuzz can be downloaded from <ulink
Release notes are posted with each new release to provide an
overview of the changes. The project <ulink url="https://github.com/harfbuzz/harfbuzz/issues">tracks bug
reports and other issues</ulink> on GitHub. Discussion and
questions are welcome on the <ulink
mailing list</ulink>.
The API included in the <filename
class='headerfile'>hb.h</filename> file will not change in a
compatibility-breaking way in any release. However, other,
peripheral headers are more likely to go through minor
modifications. We will do our best to never change APIs in an
incompatible way. We will <emphasis>never</emphasis> break the ABI.
<section id="building">
<title>Building HarfBuzz</title>
<section id="building.linux">
<title>Building on Linux</title>
<emphasis>(1)</emphasis> To build HarfBuzz on Linux, you must first install the
development packages for FreeType, Cairo, and GLib. The exact
commands required for this step will vary depending on
the Linux distribution you use.
For example, on an Ubuntu or Debian system, you would run:
<emphasis>(2)</emphasis> The next step depends on whether you
are building from the source in a downloaded release tarball or
from the source directly from the git repository.
<emphasis>(2)(a)</emphasis> If you downloaded the HarfBuzz
source code in a tarball, you can now extract the source.
From a shell in the top-level directory of the extracted source
code, you can run <command>meson build</command> followed by
<command>meson compile -C build</command> as with any other standard package.
This should leave you with a shared
library in the <filename>src/</filename> directory, and a few
utility programs including <command>hb-view</command> and
<command>hb-shape</command> under the <filename>util/</filename> directory.
<emphasis>(2)(b)</emphasis> If you are building from the source in the HarfBuzz git
repository, rather than installing from a downloaded tarball
release, then you must install one more auxiliary tool before you
can build for the first time: <package>pkg-config</package>.
On Ubuntu or Debian, run:
With <package>pkg-config</package> installed, you can now run
<command>meson build</command> then
<command>meson compile -C build</command> to build HarfBuzz.
<section id="building.windows">
<title>Building on Windows</title>
<ulink url="https://mesonbuild.com/Getting-meson.html">Install meson</ulink>
and run (from the console) <command>meson build</command> (by default
bundled dependencies are not built, <command>--wrap-mode=default</command>
overrides this), then <command>meson compile -C build</command> to
build HarfBuzz.
<section id="building.macos">
<title>Building on macOS</title>
There are two ways to build HarfBuzz on Mac systems: MacPorts
and Homebrew. The process is similar to the process used on a
Linux system.
<emphasis>(1)</emphasis> You must first install the
development packages for FreeType, Cairo, and GLib. If you are
using MacPorts, you should run:
<emphasis>(2)</emphasis> The next step depends on whether you are building from the
source in a downloaded release tarball or from the source directly
from the git repository.
<emphasis>(2)(a)</emphasis> If you are installing HarfBuzz
from a downloaded tarball release, extract the tarball and
open a Terminal in the extracted source-code directory. Run:
<emphasis>(2)(b)</emphasis> Alternatively, if you are building
HarfBuzz from the source in the HarfBuzz git repository, then
you must install several build-time dependencies before
<para>If you are
<programlisting><command>meson build</command></programlisting>
<emphasis>(3)</emphasis> You can now build HarfBuzz (on either
a MacPorts or a Homebrew system) by running:
<section id="configuration">
<title>Configuration options</title>
The instructions in the "Building HarfBuzz" section will build
the source code under its default configuration. If needed,
the following additional configuration options are available.
<?dbfo list-presentation="blocks"?>
Use <ulink url="https://developer.gnome.org/glib/">GLib</ulink>. <emphasis>(Default = auto)</emphasis>
This option enables or disables usage of the GLib
library. The default setting is to check for the
presence of GLib and, if it is found, build with
GLib support. GLib is native to GNU/Linux systems but is
available on other operating systems as well.
Use <ulink url="https://developer.gnome.org/gobject/stable/">GObject</ulink>. <emphasis>(Default = no)</emphasis>
This option enables or disables usage of the GObject
library. The default setting is to check for the
presence of GObject and, if it is found, build with
GObject support. GObject is native to GNU/Linux systems but is
available on other operating systems as well.
Use <ulink url="https://cairographics.org/">Cairo</ulink>. <emphasis>(Default = auto)</emphasis>
This option enables or disables usage of the Cairo
graphics-rendering library. The default setting is to
check for the presence of Cairo and, if it is found,
build with Cairo support.
Note: Cairo is used only by the HarfBuzz
command-line utilities, and not by the HarfBuzz library.
Use the <ulink url="http://site.icu-project.org/home">ICU</ulink> library. <emphasis>(Default = auto)</emphasis>
This option enables or disables usage of the
<emphasis>International Components for
Unicode</emphasis> (ICU) library, which provides access
to Unicode Character Database (UCD) properties as well
as normalization and conversion functions. The default
setting is to check for the presence of ICU and, if it
is found, build with ICU support.
Use the <ulink url="http://graphite.sil.org/">Graphite2</ulink> library. <emphasis>(Default = no)</emphasis>
This option enables or disables usage of the Graphite2
library, which provides support for the Graphite shaping
Use the <ulink url="https://www.freetype.org/">FreeType</ulink> library. <emphasis>(Default = auto)</emphasis>
This option enables or disables usage of the FreeType
font-rendering library. The default setting is to check for the
presence of FreeType and, if it is found, build with
FreeType support.
Use the <ulink
library (experimental). <emphasis>(Default = no)</emphasis>
This option enables or disables usage of the Uniscribe
font-rendering library. Uniscribe is available on
Windows systems. Uniscribe support is used only for
testing purposes and does not need to be enabled for
HarfBuzz to run on Windows systems.
Use the <ulink url="https://docs.microsoft.com/en-us/windows/desktop/directwrite/direct-write-portal">DirectWrite</ulink> library (experimental). <emphasis>(Default = no)</emphasis>
This option enables or disables usage of the DirectWrite
font-rendering library. DirectWrite is available on
Windows systems. DirectWrite support is used only for
testing purposes and does not need to be enabled for
HarfBuzz to run on Windows systems.
Use the <ulink url="https://developer.apple.com/documentation/coretext">CoreText</ulink> library. <emphasis>(Default = no)</emphasis>
This option enables or disables usage of the CoreText
library. CoreText is available on macOS and iOS systems.
Use <ulink url="https://www.gtk.org/gtk-doc/">GTK-Doc</ulink>. <emphasis>(Default = no)</emphasis>
This option enables the building of the documentation.
<chapter id="integration">
<title>Platform Integration Guide</title>
HarfBuzz was first developed for use with the GNOME and GTK
software stack commonly found in desktop Linux
distributions. Nevertheless, it can be used on other operating
systems and platforms, from iOS and macOS to Windows. It can also
be used with other application frameworks and components, such as
Android, Qt, or application-specific widget libraries.
This chapter will look at how HarfBuzz fits into a typical
text-rendering pipeline, and will discuss the APIs available to
integrate HarfBuzz with contemporary Linux, Mac, and Windows
software. It will also show how HarfBuzz integrates with popular
external libraries like FreeType and International Components for
Unicode (ICU) and describe the HarfBuzz language bindings for
On a GNOME system, HarfBuzz is designed to tie in with several
other common system libraries. The most common architecture uses
Pango at the layer directly "above" HarfBuzz; Pango is responsible
for text segmentation and for ensuring that each input
<type>hb_buffer_t</type> passed to HarfBuzz for shaping contains
Unicode code points that share the same segment properties
(namely, direction, language, and script, but also higher-level
properties like the active font, font style, and so on).
The layer directly "below" HarfBuzz is typically FreeType, which
is used to rasterize glyph outlines at the necessary optical size,
hinting settings, and pixel resolution. FreeType provides APIs for
accessing font and face information, so HarfBuzz includes
functions to create <type>hb_face_t</type> and
<type>hb_font_t</type> objects directly from FreeType
objects. HarfBuzz can use FreeType's built-in functions for
<structfield>font_funcs</structfield> vtable in an <type>hb_font_t</type>.
FreeType's output is bitmaps of the rasterized glyphs; on a
typical Linux system these will then be drawn by a graphics
library like Cairo, but those details are beyond HarfBuzz's
control. On the other hand, at the top end of the stack, Pango is
part of the larger GNOME framework, and HarfBuzz does include APIs
for working with key components of GNOME's higher-level libraries
— most notably GLib.
For other operating systems or application frameworks, the
critical integration points are where HarfBuzz gets font and face
information about the font used for shaping and where HarfBuzz
gets Unicode data about the input-buffer code points.
The font and face information is necessary for text shaping
because HarfBuzz needs to retrieve the glyph indices for
particular code points, and to know the extents and advances of
glyphs. Note that, in an OpenType variable font, both of those
types of information can change with different variation-axis
The Unicode information is necessary for shaping because the
properties of a code point (such as its General Category (gc),
Canonical Combining Class (ccc), and decomposition) can directly
impact the shaping moves that HarfBuzz performs.
<section id="integration-glib">
<title>GNOME integration, GLib, and GObject</title>
As mentioned in the preceding section, HarfBuzz offers
integration APIs to help client programs using the
GNOME and GTK framework commonly found in desktop Linux
GLib is the main utility library for GNOME applications. It
provides basic data types and conversions, file abstractions,
string manipulation, and macros, as well as facilities like
memory allocation and the main event loop.
Where text shaping is concerned, GLib provides several utilities
that HarfBuzz can take advantage of, including a set of
Unicode-data functions and a data type for script
information. Both are useful when working with HarfBuzz
buffers. To make use of them, you will need to include the
<filename>hb-glib.h</filename> header file.
GLib's <ulink
manipulation API</ulink> includes all the functionality
necessary to retrieve Unicode data for the
<structfield>unicode_funcs</structfield> structure of a HarfBuzz
The function <function>hb_glib_get_unicode_funcs()</function>
sets up a <type>hb_unicode_funcs_t</type> structure configured
with the GLib Unicode functions and returns a pointer to it.
You can attach this Unicode-functions structure to your buffer,
and it will be ready for use with GLib:
For script information, GLib uses the
<type>GUnicodeScript</type> type. Like HarfBuzz's own
<type>hb_script_t</type>, this data type is an enumeration
of Unicode scripts, but text segments passed in from GLib code
will be tagged with a <type>GUnicodeScript</type>. Therefore,
when setting the script property on a <type>hb_buffer_t</type>,
you will need to convert between the <type>GUnicodeScript</type>
of the input provided by GLib and HarfBuzz's
<type>hb_script_t</type> type.
The <function>hb_glib_script_to_script()</function> function
takes a <type>GUnicodeScript</type> script identifier as its
sole argument and returns the corresponding <type>hb_script_t</type>.
The <function>hb_glib_script_from_script()</function> function does the
reverse, taking an <type>hb_script_t</type> and returning the
<type>GUnicodeScript</type> identifier for GLib.
Finally, GLib also provides a reference-counted object type called <ulink
that is used for accessing raw memory segments with the benefits
of GLib's lifecycle management. HarfBuzz provides a
<function>hb_glib_blob_create()</function> function that lets
you create an <type>hb_blob_t</type> directly from a
<type>GBytes</type> object. This function takes only the
<type>GBytes</type> object as its input; HarfBuzz registers the
GLib <function>destroy</function> callback automatically.
The GNOME platform also features an object system called
GObject. For HarfBuzz, the main advantage of GObject is a
feature called <ulink
Introspection</ulink>. This is a middleware facility that can be
used to generate language bindings for C libraries. HarfBuzz uses it
to build its Python bindings, which we will look at in a separate section.
<section id="integration-freetype">
<title>FreeType integration</title>
FreeType is the free-software font-rendering engine included in
desktop Linux distributions, Android, ChromeOS, iOS, and multiple Unix
operating systems, and used by cross-platform programs like
Chrome, Java, and GhostScript. When the two are used together, HarfBuzz
performs shaping on Unicode text segments, outputting the glyph
IDs that FreeType should rasterize from the active font as well
as the positions at which those glyphs should be drawn.
HarfBuzz provides integration points with FreeType at the
face-object and font-object level and for the font-functions
virtual-method structure of a font object. To use the
FreeType-integration API, include the
<filename>hb-ft.h</filename> header.
In a typical client program, you will create your
<type>hb_face_t</type> face object and <type>hb_font_t</type>
font object from a FreeType <type>FT_Face</type>. HarfBuzz
provides a suite of functions for doing this.
In the most common case, you will want to use
<function>hb_ft_font_create_referenced()</function>, which
creates both an <type>hb_face_t</type> face object and
<type>hb_font_t</type> font object (linked to that face object),
and provides lifecycle management.
It is important to note,
though, that while HarfBuzz makes a distinction between its face and
font objects, FreeType's <type>FT_Face</type> does not. After
you create your <type>FT_Face</type>, you must set its size
parameter using <function>FT_Set_Char_Size()</function>, because
an <type>hb_font_t</type> is defined as an instance of an
<type>hb_face_t</type> with size specified.
<function>hb_ft_font_create_referenced()</function> is
the recommended function for creating an <type>hb_font_t</type> font
object. This function calls <function>FT_Reference_Face()</function>
before using the <type>FT_Face</type> and calls
<function>FT_Done_Face()</function> when it is finished using the
<type>FT_Face</type>. Consequently, your client program does not need
to worry about destroying the <type>FT_Face</type> while HarfBuzz
is still using it.
Although <function>hb_ft_font_create_referenced()</function> is
the recommended function, there is another variant for client code
where special circumstances make it necessary. The simpler
version of the function is <function>hb_ft_font_create()</function>,
which takes an <type>FT_Face</type> and an optional destroy callback
as its arguments. Because <function>hb_ft_font_create()</function>
does not offer lifecycle management, however, your client code will
be responsible for tracking references to the <type>FT_Face</type>
objects and destroying them when they are no longer needed. If you
do not have a valid reason for doing this, use
After you have created your font object from your
<type>FT_Face</type>, you can set or retrieve the
<structfield>load_flags</structfield> of the
<type>FT_Face</type> through the <type>hb_font_t</type>
object. HarfBuzz provides
<function>hb_ft_font_set_load_flags()</function> and
<function>hb_ft_font_get_load_flags()</function> for this
purpose. The ability to set the
<structfield>load_flags</structfield> through the font object
could be useful for enabling or disabling hinting, for example,
or to activate vertical layout.
HarfBuzz also provides a utility function called
<function>hb_ft_font_has_changed()</function> that you should
call whenever you have altered the properties of your underlying
<type>FT_Face</type>, as well as a
<function>hb_ft_get_face()</function> that you can call on an
<type>hb_font_t</type> font object to fetch its underlying <type>FT_Face</type>.
With an <type>hb_face_t</type> and <type>hb_font_t</type> both linked
to your <type>FT_Face</type>, you will typically also want to
use FreeType for the <structfield>font_funcs</structfield>
vtable of your <type>hb_font_t</type>. As a reminder, this
font-functions structure is the set of methods that HarfBuzz
will use to fetch important information from the font, such as
the advances and extents of individual glyphs.
All you need to do is call
As we noted above, an <type>hb_font_t</type> is derived from an
<type>hb_face_t</type> with size (and, perhaps, other
parameters, such as variation-axis coordinates)
specified. Consequently, you can reuse an <type>hb_face_t</type>
with several <type>hb_font_t</type> objects, and HarfBuzz
provides functions to simplify this.
The <function>hb_ft_face_create_referenced()</function>
function creates just an <type>hb_face_t</type> from a FreeType
<type>FT_Face</type> and, as with
<function>hb_ft_font_create_referenced()</function> above,
provides lifecycle management for the <type>FT_Face</type>.
Similarly, there is an <function>hb_ft_face_create()</function>
function variant that does not provide the lifecycle-management
feature. As with the font-object case, if you use this version
of the function, it will be your client code's responsibility
to track usage of the <type>FT_Face</type> objects.
A third variant of this function is
<function>hb_ft_face_create_cached()</function>, which is the
same as <function>hb_ft_face_create()</function> except that it
also uses the <structfield>generic</structfield> field of the
<type>FT_Face</type> structure to save a pointer to the newly
created <type>hb_face_t</type>. Subsequently, function calls
that pass the same <type>FT_Face</type> will get the same
<type>hb_face_t</type> returned — and the
<type>hb_face_t</type> will be correctly reference
counted. Still, as with
<function>hb_ft_face_create()</function>, your client code must
track references to the <type>FT_Face</type> itself, and destroy
it when it is unneeded.
<section id="integration-uniscribe">
<title>Uniscribe integration</title>
If your client program is running on Windows, HarfBuzz offers
an additional API that can help integrate with Microsoft's
Uniscribe engine and the Windows GDI.
Overall, the Uniscribe API covers a broader set of typographic
layout functions than HarfBuzz implements, but HarfBuzz's
shaping API can serve as a drop-in replacement for Uniscribe's shaping
functionality. In fact, one of HarfBuzz's design goals is to
accurately reproduce the same output for shaping a given text
segment that Uniscribe produces — even to the point of
duplicating known shaping bugs or deviations from the
specification — so you can be confident that your users'
documents with their existing fonts will not be affected adversely by
switching to HarfBuzz.
At a basic level, HarfBuzz's <function>hb_shape()</function>
function replaces both the <ulink url=""><function>ScriptShape()</function></ulink>
and <ulink
functions from Uniscribe.
However, whereas <function>ScriptShape()</function> returns the
glyphs and clusters for a shaped sequence and
<function>ScriptPlace()</function> returns the advances and
offsets for those glyphs, <function>hb_shape()</function>
handles both. After <function>hb_shape()</function> shapes a
buffer, the output glyph IDs and cluster IDs are returned as
an array of <structname>hb_glyph_info_t</structname> structures, and the
glyph advances and offsets are returned as an array of
<structname>hb_glyph_position_t</structname> structures.
Your client program only needs to ensure that it converts
correctly between HarfBuzz's low-level data types (such as
<type>hb_position_t</type>) and Windows's corresponding types
(such as <type>GOFFSET</type> and <type>ABC</type>). Be sure you
read the <xref linkend="buffers-language-script-and-direction"
chapter for a full explanation of how HarfBuzz input buffers are
used, and see <xref linkend="shaping-buffer-output" /> for the
details of what <function>hb_shape()</function> returns in the
output buffer when shaping is complete.
Although <function>hb_shape()</function> itself is functionally
equivalent to Uniscribe's shaping routines, there are two
additional HarfBuzz functions you may want to use to integrate
the libraries in your code. Both are used to link HarfBuzz font
objects to the equivalent Windows structures.
The <function>hb_uniscribe_font_get_logfontw()</function>
function takes a <type>hb_font_t</type> font object and returns
a pointer to the <type>LOGFONTW</type>
"logical font" that corresponds to it. A <type>LOGFONTW</type>
structure holds font-wide attributes, including metrics, size,
and style information.
In Uniscribe's model, the <type>SCRIPT_CACHE</type> holds the
device context, including the logical font that the shaping
functions apply.
The <function>hb_uniscribe_font_get_hfont()</function> function
also takes a <type>hb_font_t</type> font object, but it returns
an <type>HFONT</type> — a handle to the underlying logical
font — instead.
<type>LOGFONTW</type>s and <type>HFONT</type>s are both needed
by other Uniscribe functions.
As a final note, you may notice a reference to an optional
<literal>uniscribe</literal> shaper back-end in the <xref
linkend="configuration" /> section of the HarfBuzz manual. This
option is not a Uniscribe-integration facility.
Instead, it is an internal code path used in the
<command>hb-shape</command> command-line utility, which hands
shaping functionality over to Uniscribe entirely, when run on a
Windows system. That allows testing HarfBuzz's native output
against the Uniscribe engine, for tracking compatibility and debugging.
Because this back-end is only used when testing HarfBuzz
functionality, it is disabled by default when building the
HarfBuzz binaries.
<section id="integration-coretext">
<title>Core Text integration</title>
If your client program is running on macOS or iOS, HarfBuzz offers
an additional API that can help integrate with Apple's
Core Text engine and the underlying Core Graphics
framework. HarfBuzz does not attempt to offer the same
drop-in-replacement functionality for Core Text that it strives
for with Uniscribe on Windows, but you can still use HarfBuzz
to perform text shaping in native macOS and iOS applications.
Note, though, that if your interest is just in using fonts that
contain Apple Advanced Typography (AAT) features, then you do
not need to add Core Text integration. HarfBuzz natively
supports AAT features and will shape AAT fonts (on any platform)
automatically, without requiring additional work on your
part. This includes support for AAT-specific TrueType tables
such as <literal>mort</literal>, <literal>morx</literal>, and
<literal>kerx</literal>, which AAT fonts use instead of
<literal>GSUB</literal> and <literal>GPOS</literal>.
On a macOS or iOS system, the primary integration points offered
by HarfBuzz are for face objects and font objects.
The Apple APIs offer a pair of data structures that map well to
HarfBuzz's face and font objects. The Core Graphics API, which
is slightly lower-level than Core Text, provides
<ulink url="https://developer.apple.com/documentation/coregraphics/cgfontref"><type>CGFontRef</type></ulink>, which enables access to typeface
properties, but does not include size information. Core Text's
<ulink url="https://developer.apple.com/documentation/coretext/ctfont-q6r"><type>CTFontRef</type></ulink> is analogous to a HarfBuzz font object,
with all of the properties required to render text at a specific
size and configuration.
Consequently, a HarfBuzz <type>hb_font_t</type> font object can
be hooked up to a Core Text <type>CTFontRef</type>, and a HarfBuzz
<type>hb_face_t</type> face object can be hooked up to a
<type>CGFontRef</type>.
You can create a <type>hb_face_t</type> from a
<type>CGFontRef</type> by calling
<function>hb_coretext_face_create()</function>. Subsequently,
you can retrieve the <type>CGFontRef</type> from a
<type>hb_face_t</type> with <function>hb_coretext_face_get_cg_font()</function>.
Likewise, you create a <type>hb_font_t</type> from a
<type>CTFontRef</type> by calling
<function>hb_coretext_font_create()</function>, and you can
fetch the associated <type>CTFontRef</type> from a
<type>hb_font_t</type> font object with
HarfBuzz also offers a <function>hb_font_set_ptem()</function>
function that you can use to set the nominal point size on any
<type>hb_font_t</type> font object. Core Text uses this value to
implement optical scaling.
When integrating your client code with Core Text, it is
important to recognize that Core Text <literal>points</literal>
are not typographic points (standardized at 72 per inch) as the
term is used elsewhere in OpenType. Instead, Core Text points
are CSS points, which are standardized at 96 per inch.
HarfBuzz's font functions take this distinction into account,
but it can be an easy detail to miss in cross-platform code.
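The unit difference described above comes down to a single scale factor. The following is only a sketch of that arithmetic: <literal>model_coretext_size_from_ptem()</literal> is a hypothetical helper name, not a HarfBuzz or Core Text function, and the only facts it relies on are the 72-per-inch and 96-per-inch figures given in the text.

```c
/* Hypothetical helper (not part of HarfBuzz or Core Text): converts a
 * nominal size in typographic points (72 per inch) into the
 * 96-per-inch units that the text above describes for Core Text. */
float model_coretext_size_from_ptem(float ptem)
{
    /* Multiply before dividing so common sizes stay exact in float. */
    return ptem * 96.0f / 72.0f;
}
```

Under these assumptions, a 12-point nominal size corresponds to a size of 16 on the Core Text side.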
As a final note, you may notice a reference to an optional
<literal>coretext</literal> shaper back-end in the <xref
linkend="configuration" /> section of the HarfBuzz manual. This
option is not a Core Text-integration facility.
Instead, it is an internal code path used in the
<command>hb-shape</command> command-line utility, which hands
shaping functionality over to Core Text entirely, when run on a
macOS system. That allows testing HarfBuzz's native output
against the Core Text engine, for tracking compatibility and debugging.
Because this back-end is only used when testing HarfBuzz
functionality, it is disabled by default when building the
HarfBuzz binaries.
<section id="integration-icu">
<title>ICU integration</title>
Although HarfBuzz includes its own Unicode-data functions, it
also provides integration APIs for using the International
Components for Unicode (ICU) library as a source of Unicode data
on any supported platform.
The principal integration point with ICU is the
<type>hb_unicode_funcs_t</type> Unicode-functions structure
attached to a buffer. This structure holds the virtual methods
used for retrieving Unicode character properties, such as
General Category, Script, Combining Class, decomposition
mappings, and mirroring information.
To use ICU in your client program, you need to call
<function>hb_icu_get_unicode_funcs()</function>, which creates a
Unicode-functions structure populated with the ICU function for
each included method. Subsequently, you can attach the
Unicode-functions structure to your buffer,
and ICU will be used for Unicode-data access.
HarfBuzz also supplies a pair of functions
(<function>hb_icu_script_from_script()</function> and
<function>hb_icu_script_to_script()</function>) for converting
between ICU's and HarfBuzz's internal enumerations of Unicode
scripts. The <function>hb_icu_script_from_script()</function>
function converts from a HarfBuzz <type>hb_script_t</type> to an
ICU <type>UScriptCode</type>. The
<function>hb_icu_script_to_script()</function> function does the
reverse: converting from a <type>UScriptCode</type> identifier
to a <type>hb_script_t</type>.
By default, HarfBuzz's ICU support is built as a separate shared
library (<filename class="libraryfile">libharfbuzz-icu.so</filename>)
when compiling HarfBuzz from source. This allows client programs
that do not need ICU to link against HarfBuzz without unnecessarily
adding ICU as a dependency. You can also build HarfBuzz with ICU
support built directly into the main HarfBuzz shared library
(<filename class="libraryfile">libharfbuzz.so</filename>),
by specifying the <literal>--with-icu=builtin</literal>
compile-time option.
<section id="integration-python">
<title>Python bindings</title>
As noted in the <xref linkend="integration-glib" /> section,
HarfBuzz uses a feature called GObject
Introspection (GI) to provide bindings for Python.
At compile time, the GI scanner analyzes the HarfBuzz C source
and builds metadata objects connecting the language bindings to
the C library. Your Python code can then use the HarfBuzz binary
through its Python interface.
HarfBuzz's Python bindings support Python 2 and Python 3. To use
them, you will need to have the <literal>pygobject</literal>
package installed. Then you should import
<literal>HarfBuzz</literal> from
Do note, however, that the Python API is subject to change
without advance notice. GI allows the bindings to be
automatically updated, which is one of its advantages, but you
may need to update your Python code.
@ -1,258 +0,0 @@
<chapter id="object-model">
<title>The HarfBuzz object model</title>
<section id="object-model-intro">
<title>An overview of data types in HarfBuzz</title>
HarfBuzz features two kinds of data types: non-opaque,
pass-by-value types and opaque, heap-allocated types. This kind
of separation is common in C libraries that have to provide
API/ABI compatibility (almost) indefinitely.
<emphasis>Value types:</emphasis> The non-opaque, pass-by-value
types include integer types, enums, and small structs. Exposing
a struct in the public API makes it impossible to expand the
struct in the future. As such, exposing structs is reserved for
cases where it’s extremely inefficient to do otherwise.
In HarfBuzz, several structs, like <literal>hb_glyph_info_t</literal> and
<literal>hb_glyph_position_t</literal>, fall into that efficiency-sensitive
category and are non-opaque.
For all non-opaque structs where future extensibility may be
necessary, reserved members are included to hold space for
possible future members. As such, it’s important to provide
<function>equal()</function> and <function>hash()</function>
methods for such structs, allowing users of the API to
deal effectively with the type without having to
adapt their code to future changes.
Important value types provided by HarfBuzz include the structs
for working with Unicode code points, glyphs, and tags for font
tables and features, as well as the enums for many Unicode and
OpenType properties.
<section id="object-model-object-types">
<title>Objects in HarfBuzz</title>
<emphasis>Object types:</emphasis> Opaque struct types are used
for what HarfBuzz loosely calls "objects." This doesn’t have
much to do with the terminology from object-oriented programming
(OOP), although some of the concepts are similar.
In HarfBuzz, all object types provide certain
lifecycle-management APIs. Objects are reference-counted, and
constructed with various <function>create()</function> methods, referenced via
<function>reference()</function>, and dereferenced using
<function>destroy()</function>.
After construction, each object's properties are accessible only
through the setter and getter functions described in the API
Reference manual.
Key object types provided by HarfBuzz include:
<itemizedlist spacing="compact">
<emphasis>blobs</emphasis>, which act as low-level wrappers around binary
data. Blobs are typically used to hold the contents of a
binary font file.
<emphasis>faces</emphasis>, which represent typefaces from a
font file, but without specific parameters (such as size) set.
<emphasis>fonts</emphasis>, which represent instances of a
face with all of their parameters specified.
<emphasis>buffers</emphasis>, which hold Unicode code points
for characters (before shaping) and the shaped glyph output
(after shaping).
<emphasis>shape plans</emphasis>, which store the settings
that HarfBuzz will use when shaping a particular text
segment. Shape plans are not generally used by client
programs directly, but as we will see in a later chapter,
they are still valuable to understand.
<section id="object-model-lifecycle">
<title>Object lifecycle management</title>
Each object type in HarfBuzz provides a
<function>create()</function> method. Some object types provide
additional variants of <function>create()</function> to handle
special cases or to speed up common tasks; those variants are
documented in the API reference. For example,
<function>hb_blob_create_from_file()</function> constructs a new
blob directly from the contents of a file.
All objects are created with an initial reference count of
<literal>1</literal>. Client programs can increase the reference
count on an object by calling its
<function>reference()</function> method. Whenever a client
program is finished with an object, it should call its
corresponding <function>destroy()</function> method. The destroy
method will decrease the reference count on the object and,
whenever the reference count reaches zero, it will also destroy
the object and free all of the associated memory.
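The lifecycle described above can be modeled in a few lines of plain C. This is a minimal sketch, not HarfBuzz's implementation: the <literal>model_object_*</literal> names and the struct layout are stand-ins invented for illustration, and the counter here is a plain integer rather than anything thread-safe.

```c
#include <stdlib.h>

/* Stand-in object type for illustration only; real HarfBuzz objects are
 * opaque and do not expose their reference count as a field. */
typedef struct {
    int ref_count;
    int payload;
} model_object_t;

/* Creates an object with an initial reference count of 1. */
model_object_t *model_object_create(int payload)
{
    model_object_t *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL; /* simplification made by this model */
    obj->ref_count = 1;
    obj->payload = payload;
    return obj;
}

/* Adds one reference and returns the same pointer. */
model_object_t *model_object_reference(model_object_t *obj)
{
    if (obj != NULL)
        obj->ref_count++;
    return obj;
}

/* Drops one reference. Returns 1 if this call freed the object and 0
 * otherwise; the return value exists only to make the model observable. */
int model_object_destroy(model_object_t *obj)
{
    if (obj == NULL)
        return 0;
    if (--obj->ref_count > 0)
        return 0;
    free(obj);
    return 1;
}
```

The pattern to take away is that every <function>create()</function> or <function>reference()</function> call is balanced by exactly one <function>destroy()</function> call, and only the last one releases the memory.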
All of HarfBuzz's object-lifecycle-management APIs are
thread-safe (unless you compiled HarfBuzz from source with the
<literal>HB_NO_MT</literal> configuration flag), even when the
object as a whole is not thread-safe.
It is also permissible to <function>reference()</function> or to
<function>destroy()</function> the <literal>NULL</literal> value.
Some objects are thread-safe after they have been constructed
and set up. The general pattern is to
<function>create()</function> the object, make a few
<function>set_*()</function> calls to set up the
object, and then use it without further modification.
To ensure that such an object is not modified, client programs
can explicitly mark an object as immutable. HarfBuzz provides
<function>make_immutable()</function> methods to mark an object
as immutable and <function>is_immutable()</function> methods to
test whether or not an object is immutable. Attempts to use
setter functions on immutable objects will fail silently; see the API
Reference manual for specifics.
Note also that there are no "make mutable" methods. If client
programs need to alter an object previously marked as immutable,
they will need to make a duplicate of the original.
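As a rough illustration of the immutability pattern, the sketch below models an object whose setter silently does nothing once the object has been marked immutable. All <literal>model_font_*</literal> names are hypothetical stand-ins; the real HarfBuzz functions and their exact behavior are documented in the API reference.

```c
#include <stdbool.h>

/* Stand-in type for illustration; not a HarfBuzz structure. */
typedef struct {
    bool immutable;
    int  scale;
} model_font_t;

/* Setter: silently ignored once the object is immutable. */
void model_font_set_scale(model_font_t *font, int scale)
{
    if (font->immutable)
        return;
    font->scale = scale;
}

void model_font_make_immutable(model_font_t *font)
{
    font->immutable = true;
}

bool model_font_is_immutable(const model_font_t *font)
{
    return font->immutable;
}

/* There is no "make mutable"; a caller that needs changes works on a
 * mutable duplicate instead (how duplication works is an assumption
 * of this model). */
model_font_t model_font_duplicate(const model_font_t *font)
{
    model_font_t copy = *font;
    copy.immutable = false;
    return copy;
}
```

Because a failed setter call gives no error, client code that needs to know whether a change took effect should test for immutability first.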
Finally, object constructors (and, indeed, as much of the
shaping API as possible) will never return
<literal>NULL</literal>. Instead, if there is an allocation
error, each constructor will return an “empty” object singleton.
These empty-object singletons are inert and safe (although
typically useless) to pass around. This design choice avoids
having to check for <literal>NULL</literal> pointers all
throughout the code.
In addition, this “empty” object singleton can also be accessed
using the <function>get_empty()</function> method of the object
type in question.
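The following sketch shows one way the empty-singleton idea can work. It is a model under stated assumptions, not HarfBuzz code: the <literal>model_blob_*</literal> names are invented, and the allocator is passed in as a parameter purely so that an allocation failure can be simulated.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature, injected so that failure can be simulated. */
typedef void *(*model_alloc_fn)(size_t size);

/* Stand-in type for illustration; not the real hb_blob_t. */
typedef struct {
    size_t length;
} model_blob_t;

/* The single shared "empty" object. */
static model_blob_t model_empty_blob = { 0 };

model_blob_t *model_blob_get_empty(void)
{
    return &model_empty_blob;
}

/* An allocator that always fails, for demonstration. */
void *model_failing_alloc(size_t size)
{
    (void) size;
    return NULL;
}

/* Never returns NULL: on allocation failure the caller receives the
 * empty singleton instead. The allocator must be malloc-compatible,
 * because model_blob_destroy() releases memory with free(). */
model_blob_t *model_blob_create(size_t length, model_alloc_fn alloc)
{
    model_blob_t *blob = alloc(sizeof *blob);
    if (blob == NULL)
        return model_blob_get_empty();
    blob->length = length;
    return blob;
}

size_t model_blob_get_length(const model_blob_t *blob)
{
    return blob->length;
}

/* Destroying the singleton (or NULL) is a harmless no-op. */
void model_blob_destroy(model_blob_t *blob)
{
    if (blob == NULL || blob == &model_empty_blob)
        return;
    free(blob);
}
```

Callers can use the returned pointer without a <literal>NULL</literal> check; at worst they operate on an inert object of length zero.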
<section id="object-model-user-data">
<title>User data</title>
To better integrate with client programs, HarfBuzz's objects
offer a "user data" mechanism that can be used to attach
arbitrary data to the object. User-data attachment can be
useful for tying the lifecycles of various pieces of data
together, or for creating language bindings.
Each object type has a <function>set_user_data()</function>
method and a <function>get_user_data()</function> method. The
<function>set_user_data()</function> methods take a client-provided
<literal>key</literal> and a pointer,
<literal>user_data</literal>, pointing to the data itself. Once
the key-data pair has been attached to the object, the
<function>get_user_data()</function> method can be called with
the key, returning the <literal>user_data</literal> pointer.
The <function>set_user_data()</function> methods also support an
optional <function>destroy</function> callback. Client programs
can set the <function>destroy</function> callback and receive
notification from HarfBuzz whenever the object is destructed.
Finally, each <function>set_user_data()</function> method allows
the client program to set a <literal>replace</literal> Boolean
indicating whether or not the function call should replace any
existing <literal>user_data</literal>
associated with the specified key.
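To make the roles of the key, the <literal>destroy</literal> callback, and the <literal>replace</literal> flag concrete, here is a small model of a user-data store. Everything named <literal>ud_*</literal> is hypothetical. The sketch assumes that keys are compared by address, that a replaced entry's callback runs at the moment of replacement, and that a fixed-size table is enough; none of these details are stated in the text above.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*ud_destroy_fn)(void *data);

typedef struct {
    const void   *key;     /* compared by address (assumption) */
    void         *data;
    ud_destroy_fn destroy; /* optional; may be NULL */
} ud_entry_t;

#define UD_MAX_ENTRIES 4

/* Stand-in for an object that carries user data. */
typedef struct {
    ud_entry_t entries[UD_MAX_ENTRIES];
    size_t     count;
} ud_object_t;

/* Counts callback invocations so the behavior is observable. */
int ud_destroy_calls = 0;

void ud_count_destroy(void *data)
{
    (void) data;
    ud_destroy_calls++;
}

/* Attaches data under key. If the key already exists, the call fails
 * unless replace is true, in which case the old entry's destroy
 * callback runs before the entry is overwritten. */
bool ud_set(ud_object_t *obj, const void *key, void *data,
            ud_destroy_fn destroy, bool replace)
{
    for (size_t i = 0; i < obj->count; i++) {
        ud_entry_t *e = &obj->entries[i];
        if (e->key != key)
            continue;
        if (!replace)
            return false;
        if (e->destroy != NULL)
            e->destroy(e->data);
        e->data = data;
        e->destroy = destroy;
        return true;
    }
    if (obj->count == UD_MAX_ENTRIES)
        return false;
    obj->entries[obj->count].key = key;
    obj->entries[obj->count].data = data;
    obj->entries[obj->count].destroy = destroy;
    obj->count++;
    return true;
}

/* Returns the data attached under key, or NULL if there is none. */
void *ud_get(const ud_object_t *obj, const void *key)
{
    for (size_t i = 0; i < obj->count; i++)
        if (obj->entries[i].key == key)
            return obj->entries[i].data;
    return NULL;
}

/* Models object destruction: every remaining callback is invoked. */
void ud_object_finish(ud_object_t *obj)
{
    for (size_t i = 0; i < obj->count; i++)
        if (obj->entries[i].destroy != NULL)
            obj->entries[i].destroy(obj->entries[i].data);
    obj->count = 0;
}
```

The <literal>destroy</literal> callback is what lets the lifetime of the attached data follow the lifetime of the object it is attached to.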
<section id="object-model-blobs">
While most of HarfBuzz's object types are specific to the
shaping process, <emphasis>blobs</emphasis> are somewhat different.
Blobs are an abstraction designed to negotiate lifecycle and
permissions for raw pieces of data. For example, when you load
the raw font data into memory and want to pass it to HarfBuzz,
you do so in a <literal>hb_blob_t</literal> wrapper.
This allows you to take advantage of HarfBuzz's
reference-counting and <function>destroy</function>
callbacks. If you allocated the memory for the data using
<function>malloc()</function>, you would create the blob using
@ -1,336 +0,0 @@
<chapter id="shaping-and-shape-plans">
<title>Shaping and shape plans</title>
Once you have your face and font objects configured as desired and
your input buffer is filled with the characters you need to shape,
all you need to do is call <function>hb_shape()</function>.
HarfBuzz will return the shaped version of the text in the same
buffer that you provided, but it will be in output mode. At that
point, you can iterate through the glyphs in the buffer, drawing
each one at the specified position or handing them off to the
appropriate graphics library.
For the most part, HarfBuzz's shaping step is straightforward from
the outside. But that doesn't mean there will never be cases where
you want to look under the hood and see what is happening on the
inside. HarfBuzz provides facilities for doing that, too.
<section id="shaping-buffer-output">
<title>Shaping and buffer output</title>
The <function>hb_shape()</function> function call takes four arguments: the font
object to use, the buffer of characters to shape, an array of
user-specified features to apply, and the length of that feature
array. The feature array can be NULL, so for the sake of
simplicity we will start with that case.
Internally, HarfBuzz looks at the tables of the font file to
determine where glyph classes, substitutions, and positioning
are defined, using that information to decide which
<emphasis>shaper</emphasis> to use (<literal>ot</literal> for
OpenType fonts, <literal>aat</literal> for Apple Advanced
Typography fonts, and so on). It also looks at the direction,
script, and language properties of the segment to figure out
which script-specific shaping model is needed (at least, in
shapers that support multiple options).
If a font has a GDEF table, then that is used for
glyph classes; if not, HarfBuzz will fall back to Unicode
categorization by code point. If a font has an AAT <literal>morx</literal> table,
then it is used for substitutions; if not, but there is a GSUB
table, then the GSUB table is used. If the font has an AAT
<literal>kerx</literal> table, then it is used for positioning; if not, but
there is a GPOS table, then the GPOS table is used. If neither
table is found, but there is a <literal>kern</literal> table, then HarfBuzz will
use the <literal>kern</literal> table. If there is no <literal>kerx</literal>, no GPOS, and no
<literal>kern</literal>, HarfBuzz will fall back to positioning marks itself.
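The table-selection rules in the preceding paragraph amount to three short priority chains. The sketch below restates them as plain C over a set of Boolean flags. The type and function names are illustrative only and are not part of HarfBuzz, and the "no substitution table" result is an assumption for a case the text does not cover.

```c
#include <stdbool.h>

/* Which tables a font contains; a stand-in for real table lookups. */
typedef struct {
    bool gdef, morx, gsub, kerx, gpos, kern;
} model_font_tables_t;

typedef enum { CLASSES_GDEF, CLASSES_UNICODE_FALLBACK } model_classes_t;
typedef enum { SUBST_MORX, SUBST_GSUB, SUBST_NONE } model_subst_t;
typedef enum { POS_KERX, POS_GPOS, POS_KERN, POS_FALLBACK_MARKS } model_pos_t;

/* Glyph classes: GDEF if present, otherwise Unicode categorization. */
model_classes_t model_choose_classes(const model_font_tables_t *t)
{
    return t->gdef ? CLASSES_GDEF : CLASSES_UNICODE_FALLBACK;
}

/* Substitutions: morx takes priority over GSUB. */
model_subst_t model_choose_subst(const model_font_tables_t *t)
{
    if (t->morx) return SUBST_MORX;
    if (t->gsub) return SUBST_GSUB;
    return SUBST_NONE; /* case not described in the text (assumption) */
}

/* Positioning: kerx, then GPOS, then kern, then fallback mark
 * positioning. */
model_pos_t model_choose_pos(const model_font_tables_t *t)
{
    if (t->kerx) return POS_KERX;
    if (t->gpos) return POS_GPOS;
    if (t->kern) return POS_KERN;
    return POS_FALLBACK_MARKS;
}
```

Note that a font carrying both AAT and OpenType tables resolves to the AAT tables under these rules.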
With a well-behaved OpenType font, you expect GDEF, GSUB, and
GPOS tables to all be applied. HarfBuzz implements the
script-specific shaping models in internal functions, rather
than in the public API.
The algorithms
used for complex scripts can be quite involved; HarfBuzz tries
to be compatible with the OpenType Layout specification
and, wherever there is any ambiguity, HarfBuzz attempts to replicate the
output of Microsoft's Uniscribe engine. See the Microsoft
Typography pages for more detail.
In general, though, all that you need to know is that
<function>hb_shape()</function> returns the results of shaping
in the same buffer that you provided. The buffer's content type
will now be set to
<literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal>, indicating
that it contains shaped output, rather than input text. You can
now extract the glyph information and positioning arrays.
The glyph information array holds a <type>hb_glyph_info_t</type>
for each output glyph, which has two fields:
<parameter>codepoint</parameter> and
<parameter>cluster</parameter>. Whereas, in the input buffer,
the <parameter>codepoint</parameter> field contained the Unicode
code point, it now contains the glyph ID of the corresponding
glyph in the font. The <parameter>cluster</parameter> field is
an integer that you can use to help identify when shaping has
reordered, split, or combined code points; we will say more
about that in the next chapter.
The glyph positions array holds a corresponding
<type>hb_glyph_position_t</type> for each output glyph,
containing four fields: <parameter>x_advance</parameter>,
<parameter>y_advance</parameter>,
<parameter>x_offset</parameter>, and
<parameter>y_offset</parameter>. The advances tell you how far
you need to move the drawing point after drawing this glyph,
depending on whether you are setting horizontal text (in which
case you will have x advances) or vertical text (for which you
will have y advances). The x and y offsets tell you where to
move to start drawing the glyph; usually you will have both
an x and a y offset, regardless of the text direction.
Most of the time, you will rely on a font-rendering library or
other graphics library to do the actual drawing of glyphs, so
you will need to iterate through the glyphs in the buffer and
pass the corresponding values off.
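The loop that turns advances and offsets into drawing positions can be sketched as follows. The struct here is a stand-in that mirrors the four fields described above; it is not the real <type>hb_glyph_position_t</type>, and <literal>model_place_glyphs()</literal> is a hypothetical helper. Units are whatever the font's scale defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the four position fields described above. */
typedef struct {
    int32_t x_advance, y_advance, x_offset, y_offset;
} model_glyph_position_t;

typedef struct {
    int32_t x, y;
} model_point_t;

/* For each glyph, records where it should be drawn (the current pen
 * position plus the glyph's offsets) and then moves the pen by the
 * glyph's advances. The final pen position is stored in *pen_out. */
void model_place_glyphs(const model_glyph_position_t *pos, size_t count,
                        model_point_t *origins, model_point_t *pen_out)
{
    int32_t pen_x = 0, pen_y = 0;

    for (size_t i = 0; i < count; i++) {
        origins[i].x = pen_x + pos[i].x_offset;
        origins[i].y = pen_y + pos[i].y_offset;
        pen_x += pos[i].x_advance;
        pen_y += pos[i].y_advance;
    }
    pen_out->x = pen_x;
    pen_out->y = pen_y;
}
```

The offsets shift only the glyph being drawn; they never accumulate into the pen position, which is why a zero-advance mark can sit on top of the preceding base glyph.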
<section id="shaping-opentype-features">
<title>OpenType features</title>
OpenType features enable fonts to include smart behavior,
implemented as "lookup" rules stored in the GSUB and GPOS
tables. The OpenType specification defines a long list of
standard features that fonts can use for these behaviors; each
feature has a four-character reserved name and a well-defined
semantic meaning.
Some OpenType features are defined for the purpose of supporting
complex-script shaping, and are automatically activated, but
only when a buffer's script property is set to a script that the
feature supports.
Other features are more generic and can apply to several (or
any) script, and shaping engines are expected to implement
them. By default, HarfBuzz activates several of these features
on every text run. They include <literal>abvm</literal>,
<literal>blwm</literal>, <literal>ccmp</literal>,
<literal>locl</literal>, <literal>mark</literal>,
<literal>mkmk</literal>, and <literal>rlig</literal>.
In addition, if the text direction is horizontal, HarfBuzz
also applies the <literal>calt</literal>,
<literal>clig</literal>, <literal>curs</literal>,
<literal>dist</literal>, <literal>kern</literal>,
<literal>liga</literal>, and <literal>rclt</literal> features.
Additionally, when HarfBuzz encounters a fraction slash
(<literal>U+2044</literal>), it looks backward and forward for decimal
digits (Unicode General Category = Nd), and enables features
<literal>numr</literal> on the sequence before the fraction slash,
<literal>dnom</literal> on the sequence after the fraction slash,
and <literal>frac</literal> on the whole sequence including the fraction
slash.
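The backward-and-forward scan around a fraction slash can be sketched like this. It is a simplified model rather than HarfBuzz's code: the names are hypothetical, and the digit test accepts only ASCII digits, whereas the text above refers to the full Unicode Nd category.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplification: ASCII digits only, not the whole Nd category. */
bool model_is_decimal_digit(uint32_t cp)
{
    return cp >= 0x30u && cp <= 0x39u;
}

typedef struct {
    size_t start; /* index of the first numerator digit */
    size_t slash; /* index of U+2044 */
    size_t end;   /* one past the last denominator digit */
} model_fraction_t;

/* Finds the first fraction slash at or after index `from` and reports
 * the digit runs on either side of it. Under the rules in the text,
 * numr would cover [start, slash), dnom would cover (slash, end), and
 * frac would cover [start, end). */
bool model_find_fraction(const uint32_t *text, size_t len, size_t from,
                         model_fraction_t *out)
{
    for (size_t i = from; i < len; i++) {
        if (text[i] != 0x2044u)
            continue;

        size_t start = i;
        size_t end = i + 1;
        while (start > 0 && model_is_decimal_digit(text[start - 1]))
            start--;
        while (end < len && model_is_decimal_digit(text[end]))
            end++;

        out->start = start;
        out->slash = i;
        out->end = end;
        return true;
    }
    return false;
}
```

For the code points of "1 23⁄456x", this sketch reports the numerator "23" and the denominator "456", leaving the earlier "1" and the trailing "x" outside the fraction.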
Some script-specific shaping models
(see <xref linkend="opentype-shaping-models" />) disable some of the
features listed above:
Hangul: <literal>calt</literal>
Indic: <literal>liga</literal>
Khmer: <literal>liga</literal>
If the text direction is vertical, HarfBuzz applies
the <literal>vert</literal> feature by default.
Still other features are designed to be purely optional and left
up to the application or the end user to enable or disable as desired.
You can adjust the set of features that HarfBuzz applies to a
buffer by supplying an array of <type>hb_feature_t</type>
features as the third argument to
<function>hb_shape()</function>. For a simple case, let's just
enable the <literal>dlig</literal> feature, which turns on any
"discretionary" ligatures in the font.
<literal>HB_FEATURE_GLOBAL_START</literal> and
<literal>HB_FEATURE_GLOBAL_END</literal> are macros we can use
to indicate that the features will be applied to the entire
buffer. We could also have used a literal <literal>0</literal>
for the start and a <literal>-1</literal> to indicate the end of
the buffer (or have selected other start and end positions, if needed).
When we pass the <varname>userfeatures</varname> array to
<function>hb_shape()</function>, any discretionary ligature
substitutions from our font that match the text in our buffer
will be performed.
Just like we enabled the <literal>dlig</literal> feature by
setting its <parameter>value</parameter> to
<literal>1</literal>, you would disable a feature by setting its
<parameter>value</parameter> to <literal>0</literal>. Some
features can take other <parameter>value</parameter> settings;
be sure you read the full specification of each feature tag to
understand what it does and how to control it.
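To show how a feature setting carries a tag, a value, and a start and end position, here is a self-contained model. The <literal>model_*</literal> names are stand-ins rather than the HarfBuzz definitions. The global start and end values follow the literal <literal>0</literal> and <literal>-1</literal> mentioned above, while treating the range as including the start and excluding the end is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packs four characters into a 32-bit tag, first character in the
 * most significant byte. */
#define MODEL_TAG(a, b, c, d)                   \
    ((uint32_t)(((uint32_t)(uint8_t)(a) << 24) | \
                ((uint32_t)(uint8_t)(b) << 16) | \
                ((uint32_t)(uint8_t)(c) << 8)  | \
                 (uint32_t)(uint8_t)(d)))

/* Mirror the literal 0 and -1 described in the text. */
#define MODEL_FEATURE_GLOBAL_START 0u
#define MODEL_FEATURE_GLOBAL_END   ((unsigned int) -1)

/* Stand-in for a feature setting. */
typedef struct {
    uint32_t     tag;
    uint32_t     value; /* 1 enables, 0 disables */
    unsigned int start;
    unsigned int end;
} model_feature_t;

/* True if the setting applies at the given position. The half-open
 * range [start, end) is an assumption of this sketch. */
bool model_feature_covers(const model_feature_t *feature,
                          unsigned int position)
{
    return position >= feature->start && position < feature->end;
}
```

A setting built with the global start and end covers the whole buffer, while explicit positions restrict it to part of the text, which is how a feature can be turned off for just a few characters.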
<section id="shaping-shaper-selection">
<title>Shaper selection</title>
The basic version of <function>hb_shape()</function> determines
its shaping strategy based on examining the capabilities of the
font file. OpenType font tables cause HarfBuzz to try the
<literal>ot</literal> shaper, while AAT font tables cause HarfBuzz to try the
<literal>aat</literal> shaper.
In the real world, however, a font might include some unusual
mix of tables, or one of the tables might simply be broken for
the script you need to shape. So, sometimes, you might not
want to rely on HarfBuzz's process for deciding what to do, and
just tell <function>hb_shape()</function> what you want it to try.
<function>hb_shape_full()</function> is an alternate shaping
function that lets you supply a list of shapers for HarfBuzz to
try, in order, when shaping your buffer. For example, if you
have determined that HarfBuzz's attempts to work around broken
tables give you better results than the AAT shaper itself does,
you might move the AAT shaper to the end of your list of
preferences and call <function>hb_shape_full()</function>.
You may also want to call
<function>hb_shape_list_shapers()</function> to get a list of
the shapers that were built at compile time in your copy of HarfBuzz.
<section id="shaping-plans-and-caching">
<title>Plans and caching</title>
Internally, HarfBuzz uses a structure called a shape plan to
track its decisions about how to shape the contents of a
buffer. The <function>hb_shape()</function> function builds up the shape plan by
examining segment properties and by inspecting the contents of
the font.
This process can involve some decision-making and
trade-offs — for example, HarfBuzz inspects the GSUB and GPOS
lookups for the script and language tags set on the segment
properties, but it falls back on the lookups under the
<literal>DFLT</literal> tag (and sometimes other common tags)
if there are actually no lookups for the tag requested.
HarfBuzz also includes some work-arounds for
handling well-known older font conventions that do not follow
OpenType or Unicode specifications, for buggy system fonts, and for
peculiarities of Microsoft Uniscribe. All of that means that a
shape plan, while not something that you should edit directly in
client code, still might be an object that you want to
inspect. Furthermore, if resources are tight, you might want to
cache the shape plan that HarfBuzz builds for your buffer and
font, so that you do not have to rebuild it for every shaping call.
You can create a cacheable shape plan with
<function>hb_shape_plan_create_cached(face, props,
user_features, num_user_features, shaper_list)</function>, where
<parameter>face</parameter> is a face object (not a font object,
notably), <parameter>props</parameter> is an
<parameter>user_features</parameter> is an array of
<type>hb_feature_t</type>s (with length
<parameter>num_user_features</parameter>), and
<parameter>shaper_list</parameter> is a list of shapers to try.
Shape plans are objects in HarfBuzz, so there are
reference-counting functions and user-data attachment functions
you can
use. <function>hb_shape_plan_reference(shape_plan)</function>
increases the reference count on a shape plan, while
<function>hb_shape_plan_destroy(shape_plan)</function> decreases
the reference count, destroying the shape plan when the last
reference is dropped.
You can attach user data to a shape plan (with a key) using the
<function>hb_shape_plan_set_user_data()</function>
function, optionally supplying a <function>destroy</function>
callback to use. You can then fetch the user data attached to a
shape plan with
<function>hb_shape_plan_get_user_data(shape_plan, key)</function>.
@ -1,375 +0,0 @@
<chapter id="shaping-concepts">
<title>Shaping concepts</title>
<section id="text-shaping-concepts">
<title>Text shaping</title>
Text shaping is the process of transforming a sequence of Unicode
codepoints that represent individual characters (letters,
diacritics, tone marks, numbers, symbols, etc.) into the
orthographically and linguistically correct two-dimensional layout
of glyph shapes taken from a specified font.
For some writing systems (or <emphasis>scripts</emphasis>) and
languages, the process is simple, requiring the shaper to do
little more than advance the horizontal position forward by the
correct amount for each successive glyph.
But, for <emphasis>complex scripts</emphasis>, any combination of
several shaping operations may be required, and the rules for how
and when they are applied vary from script to script. HarfBuzz and
other shaping engines implement these rules.
The exact rules and necessary operations for a particular script
constitute a shaping <emphasis>model</emphasis>. OpenType
specifies a set of shaping models that covers all of
Unicode. Other shaping models are available, however, including
Graphite and Apple Advanced Typography (AAT).
<section id="complex-scripts">
<title>Complex scripts</title>
In text-shaping terminology, scripts are generally classified as
either <emphasis>complex</emphasis> or <emphasis>non-complex</emphasis>.
Complex scripts are those for which transforming the input
sequence into the final layout requires some combination of
operations—such as context-dependent substitutions,
context-dependent mark positioning, glyph-to-glyph joining,
glyph reordering, or glyph stacking.
In some complex scripts, the shaping rules require that a text
run be divided into syllables before the operations can be
applied. Other complex scripts may apply shaping operations over
entire words or over the entire text run, with no subdivision.
Non-complex scripts, by definition, do not require these
operations. However, correctly shaping a text run in a
non-complex script may still involve Unicode normalization,
ligature substitutions, mark positioning, kerning, and applying
other font features. The key difference is that a text run in a
non-complex script can be processed sequentially and in the same
order as the input sequence of Unicode codepoints, without
requiring an analysis stage.
<section id="shaping-operations">
<title>Shaping operations</title>
Shaping a complex-script text run involves transforming the
input sequence of Unicode codepoints with some combination of
operations that is specified in the shaping model for the script.
The specific conditions that trigger a given operation for a
text run vary from script to script, as do the order that the
operations are performed in and which codepoints are
affected. However, the same general set of shaping operations is
common to all of the complex-script shaping models.
A <emphasis>reordering</emphasis> operation moves a glyph
from its original ("logical") position in the sequence to
some other ("visual") position.
The shaping model for a given complex script might involve
more than one reordering step.
A <emphasis>joining</emphasis> operation replaces a glyph
with an alternate form that is designed to connect with one
or more of the adjacent glyphs in the sequence.
A contextual <emphasis>substitution</emphasis> operation
replaces either a single glyph or a subsequence of several
glyphs with an alternate glyph. This substitution is
performed when the original glyph or subsequence of glyphs
occurs in a specified position with respect to the
surrounding sequence. For example, one substitution might be
performed only when the target glyph is the first glyph in
the sequence, while another substitution is performed only
when a different target glyph occurs immediately after a
particular string pattern.
The shaping model for a given complex script might involve
multiple contextual-substitution operations, each applying
to different target glyphs and patterns, and which are
performed in separate steps.
A contextual <emphasis>positioning</emphasis> operation
moves the horizontal and/or vertical position of a
glyph. This positioning move is performed when the glyph
occurs in a specified position with respect to the
surrounding sequence.
Many contextual positioning operations are used to place
<emphasis>mark</emphasis> glyphs (such as diacritics, vowel
signs, and tone markers) with respect to
<emphasis>base</emphasis> glyphs. However, some complex
scripts may use contextual positioning operations to
correctly place base glyphs as well, such as
when the script uses <emphasis>stacking</emphasis> characters.
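As a rough illustration of a reordering operation, the sketch below moves a Devanagari pre-base vowel sign in front of the character that precedes it in logical order. The names PRE_BASE and reorder are hypothetical and exist only for this example. A real shaping model reorders relative to the whole syllable and checks many more conditions.

```python
# Toy reordering step: logical order in, visual order out.
# PRE_BASE is a hypothetical, deliberately tiny table for this sketch.
PRE_BASE = {"\u093f"}  # DEVANAGARI VOWEL SIGN I, drawn to the left of its base


def reorder(text):
    """Return the characters of text with pre-base signs moved one slot left."""
    out = list(text)
    for i in range(1, len(out)):
        if out[i] in PRE_BASE:
            # Swap the sign with the character stored before it.
            out[i - 1], out[i] = out[i], out[i - 1]
    return out


logical = "\u0915\u093f"  # KA followed by VOWEL SIGN I
visual = reorder(logical)
print([f"U+{ord(ch):04X}" for ch in visual])
```

Characters that are not in the table keep their logical position, so a sequence such as KA followed by VOWEL SIGN AA passes through unchanged.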
<section id="unicode-character-categories">
<title>Unicode character categories</title>
Shaping models are typically specified with respect to how
scripts are defined in the Unicode standard.
Every codepoint in the Unicode Character Database (UCD) is
assigned a <emphasis>Unicode General Category</emphasis> (UGC),
which provides the most fundamental information about the
codepoint: whether the codepoint represents a
<emphasis>Letter</emphasis>, a <emphasis>Mark</emphasis>, a
<emphasis>Number</emphasis>, <emphasis>Punctuation</emphasis>, a
<emphasis>Symbol</emphasis>, a <emphasis>Separator</emphasis>,
or something else (<emphasis>Other</emphasis>).
These UGC properties are "Major" categories. Each codepoint is
further assigned to a "minor" category within its Major
category, such as "Letter, uppercase" (<literal>Lu</literal>) or
"Letter, modifier" (<literal>Lm</literal>).
Shaping models are concerned primarily with Letter and Mark
codepoints. The minor categories of Mark codepoints are
particularly important for shaping. Marks can be nonspacing
(<literal>Mn</literal>), spacing combining
(<literal>Mc</literal>), or enclosing (<literal>Me</literal>).
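The General Category values described above can be inspected with Python's standard unicodedata module. This is only a small sketch for exploring the categories; the helper name describe is made up for the example and is not part of HarfBuzz.

```python
import unicodedata


def describe(ch):
    """Return (major, minor) General Category codes, e.g. ('L', 'Lu')."""
    minor = unicodedata.category(ch)
    return minor[0], minor


# One sample each: uppercase letter, modifier letter, and the three Mark types.
for ch in ("A", "\u02b0", "\u0301", "\u093e", "\u20dd"):
    major, minor = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: major={major} minor={minor}")
```

The first letter of each two-letter code is the Major category, which is why the helper simply takes the first character of the value returned by unicodedata.category().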
In addition to the UGC property, codepoints in the Indic and
Southeast Asian scripts are also assigned
<emphasis>Unicode Indic Syllabic Category</emphasis> (UISC) and
<emphasis>Unicode Indic Positional Category</emphasis> (UIPC)
properties that provide more detailed information needed for shaping.
The UISC property sub-categorizes Letters and Marks according to
common script-shaping behaviors. For example, UISC distinguishes
between consonant letters, vowel letters, and vowel marks. The
UIPC property sub-categorizes Mark codepoints by the relative visual
position that they occupy (above, below, right, left, or in
multiple positions).
Some complex scripts require that the text run be split into
syllables. What constitutes a valid syllable in these
scripts is specified in regular expressions, formed from the
Letter and Mark codepoints, that take the UISC and UIPC
properties into account.
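The idea of describing syllables with a regular expression over character classes can be sketched as follows. The class table and the pattern are hypothetical simplifications covering only a few Devanagari ranges (consonants, the virama, and dependent vowel signs). They are not the expressions used by any actual shaping model.

```python
import re


def char_class(ch):
    """Map a character to a one-letter class (toy table for this sketch)."""
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"  # consonant letter
    if cp == 0x094D:
        return "H"  # virama (halant)
    if 0x093E <= cp <= 0x094C:
        return "M"  # dependent vowel sign
    return "X"      # anything else


# Zero or more consonant+virama pairs, a final consonant, an optional vowel
# sign; any character that does not fit becomes a one-character "syllable".
SYLLABLE = re.compile(r"(?:CH)*CM?|.")


def syllables(text):
    """Split text into syllables by matching the pattern on the class string."""
    classes = "".join(char_class(ch) for ch in text)
    return [text[m.start():m.end()] for m in SYLLABLE.finditer(classes)]


print(syllables("\u0915\u094d\u0937\u093f"))  # KA, virama, SSA, vowel sign I
```

Because every character maps to exactly one class letter, match offsets in the class string can be used directly as offsets into the original text.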
<section id="text-runs">
<title>Text runs</title>
Real-world text usually contains codepoints from a mixture of
different Unicode scripts (including punctuation, numbers, symbols,
white-space characters, and other codepoints that do not belong
to any script). Real-world text may also be marked up with
formatting that changes font properties (including the font,
font style, and font size).
For shaping purposes, all real-world text streams must be first
segmented into runs that have a uniform set of properties.
In particular, shaping models always assume that every codepoint
in a text run has the same <emphasis>direction</emphasis>,
<emphasis>script</emphasis> tag, and
<emphasis>language</emphasis> tag.
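A minimal sketch of run segmentation by script is shown below. Python's standard library does not expose the Unicode Script property, so the hypothetical helper rough_script guesses the script from the first word of a letter's character name. That shortcut is an assumption made for brevity and would not be adequate in a real layout engine. Characters that are not letters (spaces, digits, punctuation, marks) are attached to the run in progress.

```python
import unicodedata


def rough_script(ch):
    """Guess a script from the character name; None for non-letters."""
    if unicodedata.category(ch).startswith("L"):
        return unicodedata.name(ch, "UNKNOWN").split()[0]
    return None


def split_runs(text):
    """Split text into (script, substring) runs with a uniform script."""
    runs = []  # each entry is [script, substring]
    for ch in text:
        script = rough_script(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            runs[-1][1] += ch
            if runs[-1][0] is None:
                runs[-1][0] = script  # first letter decides the run's script
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


for script, chunk in split_runs("abc \u05d0\u05d1\u05d2 def"):
    print(script, repr(chunk))
```

A full implementation would also split on direction, language, and font changes, since each run must be uniform in all of those properties.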
<section id="opentype-shaping-models">
<title>OpenType shaping models</title>
OpenType provides shaping models for the following scripts:
The <emphasis>default</emphasis> shaping model handles all
non-complex scripts, and may also be used as a fallback for
handling unrecognized scripts.
The <emphasis>Indic</emphasis> shaping model handles the Indic
scripts Bengali, Devanagari, Gujarati, Gurmukhi, Kannada,
Malayalam, Oriya, Tamil, Telugu, and Sinhala.
The Indic shaping model was revised significantly in
2005. To denote the change, a new set of <emphasis>script
tags</emphasis> was assigned for Bengali, Devanagari,
Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and
Telugu. For the sake of clarity, the term "Indic2" is
sometimes used to refer to the current, revised shaping model.
The <emphasis>Arabic</emphasis> shaping model supports
Arabic, Mongolian, N'Ko, Syriac, and several other connected
or cursive scripts.
The <emphasis>Thai/Lao</emphasis> shaping model supports
the Thai and Lao scripts.
The <emphasis>Khmer</emphasis> shaping model supports the
Khmer script.
The <emphasis>Myanmar</emphasis> shaping model supports the
Myanmar (or Burmese) script.
The <emphasis>Tibetan</emphasis> shaping model supports the
Tibetan script.
The <emphasis>Hangul</emphasis> shaping model supports the
Hangul script.
The <emphasis>Hebrew</emphasis> shaping model supports the
Hebrew script.
The <emphasis>Universal Shaping Engine</emphasis> (USE)
shaping model supports complex scripts not covered by one of
the above, script-specific shaping models, including
Javanese, Balinese, Buginese, Batak, Chakma, Lepcha, Modi,
Phags-pa, Tagalog, Siddham, Sundanese, Tai Le, Tai Tham, Tai
Viet, and many others.
Text runs that do not fall under one of the above shaping
models may still require processing by a shaping engine. Of
particular note is <emphasis>Emoji</emphasis> shaping, which
may involve variation-selector sequences and glyph
substitution. Emoji shaping is handled by the default
shaping model.
<section id="graphite-shaping">
<title>Graphite shaping</title>
In contrast to OpenType shaping, Graphite shaping does not
specify a predefined set of shaping models or a set of supported scripts.
Instead, each Graphite font contains a complete set of rules that
implement the required shaping model for the intended
script. These rules include finite-state machines to match
sequences of codepoints to the shaping operations to perform.
Graphite shaping can perform the same shaping operations used in
OpenType shaping, as well as other functions that have not been
defined for OpenType shaping.
<section id="aat-shaping">
<title>AAT shaping</title>
In contrast to OpenType shaping, AAT shaping does not specify a
predefined set of shaping models or a set of supported scripts.
Instead, each AAT font includes a complete set of rules that
implement the desired shaping model for the intended
script. These rules include finite-state machines to match glyph
sequences and the shaping operations to perform.
Notably, AAT shaping rules are expressed for glyphs in the font,
not for Unicode codepoints. AAT shaping can perform the same
shaping operations used in OpenType shaping, as well as other
functions that have not been defined for OpenType shaping.
@ -1,218 +0,0 @@
<chapter id="utilities">
HarfBuzz includes several auxiliary components in addition to the
main APIs. These include a set of command-line tools and a set of
lower-level APIs for common data types that may be of interest to
client programs.
<section id="utilities-command-line-tools">
<title>Command-line tools</title>
HarfBuzz includes three command-line tools:
<command>hb-shape</command>, <command>hb-view</command>, and
<command>hb-subset</command>. They can be used to examine
HarfBuzz's functionality, debug font binaries, or explore the
various shaping models and features from a terminal.
<section id="utilities-command-line-hbshape">
<emphasis><command>hb-shape</command></emphasis> allows you to run HarfBuzz's
<function>hb_shape()</function> function on an input string and
to examine the outcome, in human-readable form, as terminal
output. <command>hb-shape</command> does
<emphasis>not</emphasis> render the results of the shaping call
into displayed text (you can use <command>hb-view</command>, below, for
that). Instead, it prints out the final glyph indices and
positions, taking all shaping operations into account, as if the
input string were a HarfBuzz input buffer.
You can specify the font to be used for shaping and, with
command-line options, you can add various aspects of the
internal state to the output that is sent to the terminal. The
general format is
Output options exist to omit elements from the default
output, and to include additional data, such as Unicode
code-point values, glyph extents, glyph flags, or interim
shaping results.
Output can also be redirected to a file, or input read from a
file. Additional options allow you to enable or disable
specific font features, to set variation-font axis values, to
alter the language, script, direction, and clustering settings
used, to enable sanity checks, or to change which shaping engine is used.
For a complete explanation of the options available, run
<command>hb-shape</command> <parameter>--help</parameter>.
<section id="utilities-command-line-hbview">
<emphasis><command>hb-view</command></emphasis> allows you to
see the shaped output of an input string in rendered
form. Like <command>hb-shape</command>,
<command>hb-view</command> takes a font file and a text string
as its arguments:
As with <command>hb-shape</command>, a lengthy set of options
is available, with which you can enable or disable
specific font features, set variation-font axis values,
alter the language, script, direction, and clustering settings
used, enable sanity checks, or change which shaping engine is used.
You can also set the foreground and background colors used for
the output, independently control the width of all four
margins, alter the line spacing, and annotate the output image.
In general, <command>hb-view</command> is a quick way to
verify that the output of HarfBuzz's shaping operation looks
correct for a given text-and-font combination, but you may
want to use <command>hb-shape</command> to figure out exactly
why something does not appear as expected.
<section id="utilities-command-line-hbsubset">
<emphasis><command>hb-subset</command></emphasis> allows you
to generate a subset of a given font, with a limited set of
supported characters, features, and variation settings.
By default, you provide an input font and an input text string
as the arguments to <command>hb-subset</command>, and it will
generate a font that covers the input text exactly like the
input font does, but includes no other characters or features.
subsetted font and to specify a list of variation-axis settings.
<section id="utilities-common-types-apis">
<title>Common data types and APIs</title>
HarfBuzz includes several APIs for working with general-purpose
data that you may find convenient to leverage in your own
software. They include set operations and integer-to-integer
mapping operations.
HarfBuzz uses set operations for internal bookkeeping, such as
when it collects all of the glyph IDs covered by a particular
font feature. You can also use the set API to build sets, add
and remove elements, test whether or not sets contain particular
elements, or compute the unions, intersections, or differences
between sets.
All set elements are integers (specifically,
<type>hb_codepoint_t</type> 32-bit unsigned ints), and there are
functions for fetching the minimum and maximum element from a
set. The set API also includes some functions that might not
be part of a generic set facility, such as the ability to add a
contiguous range of integer elements to a set in bulk, and the
ability to fetch the next-smallest or next-largest element.
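The set operations described here can be mimicked with Python's built-in set type, which may help when reasoning about what the C API is expected to return. The helpers add_range, next_larger, and next_smaller are hypothetical names for this sketch. HarfBuzz's own functions (for instance hb_set_add_range() and hb_set_next()) use C calling conventions and are not called here.

```python
def add_range(items, first, last):
    """Add every integer from first to last, both ends included."""
    items.update(range(first, last + 1))


def next_larger(items, value):
    """Smallest element greater than value, or None if there is none."""
    candidates = [x for x in items if x > value]
    return min(candidates) if candidates else None


def next_smaller(items, value):
    """Largest element less than value, or None if there is none."""
    candidates = [x for x in items if x < value]
    return max(candidates) if candidates else None


glyphs = set()
add_range(glyphs, 10, 14)  # bulk insert of a contiguous range
glyphs.add(40)

print(min(glyphs), max(glyphs))
print(next_larger(glyphs, 14), next_smaller(glyphs, 40))
```

The linear scans keep the sketch short. A production set for 32-bit codepoints would use a more compact, ordered representation.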
The HarfBuzz set API includes some conveniences as well. All
sets are lifecycle-managed, just like other HarfBuzz
objects. You increase the reference count on a set with
<function>hb_set_reference()</function> and decrease it with
<function>hb_set_destroy()</function>. You can also attach
user data to a set, just like you can to blobs, buffers, faces,
fonts, and other objects, and set destroy callbacks.
HarfBuzz also provides an API for keeping track of
integer-to-integer mappings. As with the set API, each integer is
stored as an unsigned 32-bit <type>hb_codepoint_t</type>
element. Maps, like other objects, are reference counted with
reference and destroy functions, and you can attach user data to
them. The mapping operations include adding integer-to-integer
key:value pairs to the map, deleting them from it, testing for the
presence of a key, fetching the population of the map, and so on.
There are several other internal HarfBuzz facilities that are
exposed publicly and which you may want to take advantage of
while processing text. HarfBuzz uses a common
<type>hb_tag_t</type> for a variety of OpenType tag identifiers (for
scripts, languages, font features, table names, variation-axis
names, and more), and provides functions for converting strings
to tags and vice-versa.
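An OpenType tag is four ASCII bytes, which fit in a single 32-bit integer with the first character in the most significant byte. The sketch below shows that packing in Python. The function names are invented for the example, and padding short strings with spaces is an assumption based on the usual OpenType convention rather than a statement about every HarfBuzz code path.

```python
def tag_from_string(text):
    """Pack up to four ASCII characters into a 32-bit big-endian integer."""
    data = text.encode("ascii")[:4].ljust(4, b" ")  # pad short tags with spaces
    return int.from_bytes(data, "big")


def tag_to_string(tag):
    """Unpack a 32-bit tag value back into its four-character string."""
    return tag.to_bytes(4, "big").decode("ascii")


latn = tag_from_string("latn")
print(hex(latn), tag_to_string(latn))
```

Storing tags as integers makes comparisons and table lookups cheap, which is presumably why a single integer type is shared by script, language, feature, and table tags.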
Finally, HarfBuzz also includes data types for Booleans, bit
masks, and other simple types.
@ -1,442 +0,0 @@
<chapter id="what-is-harfbuzz">
<title>What is HarfBuzz?</title>
HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
give HarfBuzz a font and a string containing a sequence of Unicode
codepoints, HarfBuzz selects and positions the corresponding
glyphs from the font, applying all of the necessary layout rules
and font features. HarfBuzz then returns the string to you in the
form that is correctly arranged for the language and writing system.
HarfBuzz can properly shape all of the world's major writing
systems. It runs on all major operating systems and software
platforms and it supports the major font formats in use today.
<section id="what-is-text-shaping">
<title>What is text shaping?</title>
Text shaping is the process of translating a string of character
codes (such as Unicode codepoints) into a properly arranged
sequence of glyphs that can be rendered onto a screen or into
final output form for inclusion in a document.
The shaping process is dependent on the input string, the active
font, the script (or writing system) that the string is in, and
the language that the string is in.
Modern software systems generally only deal with strings in the
Unicode encoding scheme (although legacy systems and documents may
involve other encodings).
There are several font formats that a program might
encounter, each of which has a set of standard text-shaping rules.
<para>The dominant format is <ulink
url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
various scripts from around the world. These shaping models depend on
the font incorporating certain features as
<emphasis>lookups</emphasis> in its <literal>GSUB</literal>
and <literal>GPOS</literal> tables.
Alternatively, OpenType fonts can include shaping features for
the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
TrueType fonts can also include OpenType shaping
features. Alternatively, TrueType fonts can also include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
Advanced Typography</ulink> (AAT) tables to implement shaping
support. AAT fonts are generally only found on macOS and iOS systems.
Text strings will usually be tagged with a script and language
tag that provide the context needed to perform text shaping
correctly. The necessary script and language
tags are defined by OpenType.
<section id="why-do-i-need-a-shaping-engine">
<title>Why do I need a shaping engine?</title>
Text shaping is an integral part of preparing text for
display. Before a Unicode sequence can be rendered, the
codepoints in the sequence must be mapped to the corresponding
glyphs provided in the font, and those glyphs must be positioned
correctly relative to each other. For many of the scripts
supported in Unicode, these steps involve script-specific layout
rules, including complex joining, reordering, and positioning
behavior. Implementing these rules is the job of the shaping engine.
Text shaping is a fairly low-level operation. HarfBuzz is
used directly by text-handling libraries like <ulink
url="https://www.pango.org/">Pango</ulink>, as well as by the layout
engines in Firefox, LibreOffice, and Chromium. Unless you are
<emphasis>writing</emphasis> one of these layout engines
yourself, you will probably not need to use HarfBuzz: normally,
a layout engine, toolkit, or other library will turn text into
glyphs for you.
However, if you <emphasis>are</emphasis> writing a layout engine
or graphics library yourself, then you will need to perform text
shaping, and this is where HarfBuzz can help you.
Here are some specific scenarios where a text-shaping engine
like HarfBuzz helps you:
OpenType fonts contain a set of glyphs (that is, shapes
to represent the letters, numbers, punctuation marks, and
all other symbols), which are indexed by a <literal>glyph ID</literal>.
A particular glyph ID within the font does not necessarily
correlate to a predictable Unicode codepoint. For instance,
some fonts have the letter "a" as glyph ID 1, but
many others do not. In order to retrieve the right glyph
from the font to display "a", you need to consult
the table inside the font (the <literal>cmap</literal>
table) that maps Unicode codepoints to glyph IDs. In other
words, <emphasis>text shaping turns codepoints into glyph IDs</emphasis>.
Many OpenType fonts contain ligatures: combinations of
characters that are rendered as a single unit. For instance,
it is common for the "f, i" letter
sequence to appear in print as the single ligature glyph "ﬁ".
Whether you should render an "f, i" sequence
as <literal>ﬁ</literal> or as "fi" does not
depend on the input text. Instead, it depends on whether
or not the font includes an "ﬁ" glyph and on the
level of ligature application you wish to perform. The font
and the amount of ligature application used are under your
control. In other words, <emphasis>text shaping involves
querying the font's ligature tables and determining what
substitutions should be made</emphasis>.
While ligatures like "ﬁ" are optional typographic
refinements, some languages <emphasis>require</emphasis> certain
substitutions to be made in order to display text correctly.
For example, in Tamil, when the letter "TTA" (ட)
is followed by the vowel sign "U" (ு), the pair
must be replaced by the single glyph "டு". The
sequence of Unicode characters "ட,ு" needs to be
substituted with a single "டு" glyph from the font.
But "டு" does not have a Unicode codepoint. To
find this glyph, you need to consult the table inside
the font (the <literal>GSUB</literal> table) that contains
substitution information. In other words, <emphasis>text shaping
chooses the correct glyph for a sequence of characters</emphasis>.
Similarly, an Arabic character can have up to four different variants
corresponding to the different positions it might appear in
within a sequence. Inside a font, there will be separate
glyphs for the initial, medial, final, and isolated forms of
each letter, each at a different glyph ID.
Unicode only assigns one codepoint per character, so a
Unicode string will not tell you which glyph variant to use
for each character. To decide, you need to analyze the whole
string and determine the appropriate glyph for each character
based on its position. In other words, <emphasis>text
shaping chooses the correct form of the letter by its
position and returns the correct glyph from the font</emphasis>.
Other languages involve marks and accents that need to be
rendered in specific positions relative to a base character. For
instance, the Moldovan language includes the Cyrillic letter
"zhe" (ж) with a breve accent, like so: "ӂ".
Some fonts will provide this character as a single
zhe-with-breve glyph, but other fonts will not and, instead,
will expect the rendering engine to form the character by
superimposing the separate "ж" and "˘" glyphs.
But exactly where you should draw the breve depends on the
height and width of the preceding zhe glyph. To find the
right position, you need to consult the table inside
the font (the <literal>GPOS</literal> table) that contains
positioning information.
In other words, <emphasis>text shaping tells you whether you
have a precomposed glyph within your font or if you need to
compose a glyph yourself out of combining marks—and,
if so, where to position those marks.</emphasis>
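The first two scenarios in the list above (mapping codepoints to glyph IDs, then applying ligature substitution) can be sketched with two small lookup tables. The glyph IDs and the CMAP and LIGATURES tables are entirely hypothetical stand-ins for a font's cmap and GSUB data. A real font stores these in binary tables with far richer lookup types.

```python
# Hypothetical font data for this sketch only.
CMAP = {ord("f"): 10, ord("i"): 11, ord("n"): 12}  # codepoint -> glyph ID
LIGATURES = {(10, 11): 20}                         # (f, i) -> fi ligature glyph
NOTDEF = 0                                         # glyph used for unmapped codepoints


def shape(text, use_ligatures=True):
    """Map codepoints to glyph IDs, then optionally substitute ligatures."""
    glyphs = [CMAP.get(ord(ch), NOTDEF) for ch in text]
    if not use_ligatures:
        return glyphs
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])  # two input glyphs become one
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(shape("fin"))
print(shape("fin", use_ligatures=False))
```

The use_ligatures switch mirrors the point made above: whether the substitution happens depends on the font and on the caller's settings, not on the input text.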
If tasks like these are something that you need to do, then you
need a text shaping engine. You could use Uniscribe if you are
writing Windows software; you could use CoreText on macOS; or
you could use HarfBuzz.
In the rest of this manual, the text will assume that the reader
is the implementor of a text-layout engine.
<section id="what-does-harfbuzz-do">
<title>What does HarfBuzz do?</title>
HarfBuzz provides text shaping through a cross-platform
C API that accepts sequences of Unicode codepoints as input. Currently,
the following OpenType shaping models are supported:
Indic (covering Devanagari, Bengali, Gujarati,
Gurmukhi, Kannada, Malayalam, Oriya, Tamil, Telugu, and Sinhala)
Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
Thai and Lao
The Universal Shaping Engine or <emphasis>USE</emphasis>
(covering complex scripts not covered by the above shaping models)
A default shaping model for non-complex scripts
(covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
and many others)
Emoji (including emoji modifier sequences, flag sequences,
and ZWJ sequences)
In addition to OpenType shaping, HarfBuzz supports the latest
version of Graphite shaping (the "Graphite 2" model) and AAT shaping.
HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
collections (.ttc), and OpenType fonts (.otf, including those
fonts that contain TrueType-style outlines and those that
contain PostScript CFF or CFF2 outlines).
HarfBuzz is designed and tested to run on top of the FreeType
font renderer. It can run on Linux, Android, Windows, macOS, and
iOS systems.
In addition to its core shaping functionality, HarfBuzz provides
functions for accessing other font features, including optional
GSUB and GPOS OpenType features, as well as
all color-font formats (<literal>CBDT</literal>,
<literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
<literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
also includes a font-subsetting feature. HarfBuzz can perform
some low-level math-shaping operations, although it does not
currently perform full shaping for mathematical typesetting.
A suite of command-line utilities is also provided in the
source-code tree, designed to help users test and debug
HarfBuzz's features on real-world fonts and input.
<section id="what-harfbuzz-doesnt-do">
<title>What HarfBuzz doesn't do</title>
HarfBuzz will take a Unicode string, shape it, and give you the
information required to lay it out correctly on a single
horizontal (or vertical) line using the font provided. That is the
extent of HarfBuzz's responsibility.
It is important to note that if you are implementing a complete
text-layout engine you may have other responsibilities that
HarfBuzz will <emphasis>not</emphasis> help you with. For example:
HarfBuzz won't help you with bidirectionality. If you want to
lay out text that includes a mix of Hebrew and English, you
will need to ensure that each buffer provided to HarfBuzz
has all of its characters in the same order and that the
directionality of the buffer is set correctly. This may mean
segmenting the text before it is placed into HarfBuzz buffers. In
other words, the user will hit the keys in the following order:
A B C [space] ג ב א [space] D E F
but will expect to see in the output:
This reordering is called <emphasis>bidi processing</emphasis>
("bidi" is short for bidirectional), and there's an
algorithm as an annex to the Unicode Standard which tells you how
to process a string of mixed directionality.
Before sending your string to HarfBuzz, you may need to apply the
bidi algorithm to it. Libraries such as <ulink
url="http://icu-project.org/">ICU</ulink> and <ulink
url="http://fribidi.org/">fribidi</ulink> can do this for you.
HarfBuzz won't help you with text that contains different font
properties. For instance, if you have the string "a
<emphasis>huge</emphasis> breakfast", and you expect
"huge" to be italic, then you will need to send three
strings to HarfBuzz: <literal>a</literal>, in your Roman font;
<literal>huge</literal> using your italic font; and
<literal>breakfast</literal> using your Roman font again.
Similarly, if you change the font, font size, script,
language, or direction within your string, then you will
need to shape each run independently and output them
independently. HarfBuzz expects to shape a run of characters
that all share the same properties.
HarfBuzz won't help you with line breaking, hyphenation, or
justification. As mentioned above, HarfBuzz lays out the string
along a <emphasis>single line</emphasis> of, notionally,
infinite length. If you want to find out where the potential
word, sentence and line break points are in your text, you
could use the ICU library's break iterator functions.
HarfBuzz can tell you how wide a shaped piece of text is, which is
useful input to a justification algorithm, but it knows nothing
about paragraphs, lines or line lengths. Nor will it adjust the
space between words to fit them proportionally into a line.
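Returning to the bidirectionality point in the list above, the sketch below shows one small ingredient of bidi processing: guessing a paragraph's base direction from its first strongly directional character, using the bidirectional classes exposed by Python's unicodedata module. The function name base_direction is made up for this example, and the sketch ignores directional isolates and every later stage of the Unicode bidirectional algorithm. A library such as ICU or fribidi should be used for real text.

```python
import unicodedata


def base_direction(text):
    """Return 'ltr' or 'rtl' based on the first strong character in text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
    return "ltr"  # no strong character found: fall back to left-to-right


print(base_direction("abc \u05d0\u05d1\u05d2"))
print(base_direction("\u05d0\u05d1\u05d2 abc"))
```

Once the text has been split into directional runs, each run can be placed in its own HarfBuzz buffer with the matching direction set.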
If you are a layout-engine implementor, HarfBuzz will help you with the
interface between your text and your font, and that's something
that you'll need. What you then do with the glyphs that your font
returns is up to you.
<section id="why-is-it-called-harfbuzz">
<title>Why is it called HarfBuzz?</title>
HarfBuzz began its life as text-shaping code within the FreeType
project (and you will see references to the FreeType authors
within the source code copyright declarations), but was then
extracted out to its own project. This project is maintained by
Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
shaping engine for OpenType fonts—"HarfBuzz" is
the Persian for "open type".
@ -1 +0,0 @@
@ -0,0 +1,113 @@
Name: harfbuzz
Version: 2.8.2
Release: 4
Summary: A text shaping engine
License: MIT
URL: https://harfbuzz.github.io/what-is-harfbuzz.html
Source0: https://github.com/harfbuzz/harfbuzz/releases/download/2.8.2/%{name}-%{version}.tar.xz
Patch0001: backport-CVE-2022-33068.patch
Patch0002: backport-0001-CVE-2023-25193.patch
Patch0003: backport-0002-CVE-2023-25193.patch
BuildRequires: gcc-c++ freetype-devel cairo-devel glib2-devel graphite2-devel
BuildRequires: gtk-doc libicu-devel gobject-introspection-devel
Provides: harfbuzz-icu
Obsoletes: harfbuzz-icu
HarfBuzz is a text-shaping engine. If you give HarfBuzz a font and a string
containing a sequence of Unicode codepoints, HarfBuzz selects and positions
the corresponding glyphs from the font, applying all of the necessary layout
rules and font features. HarfBuzz then returns the string to you in the form
that is correctly arranged for the language and writing system.
%package devel
Summary: The development environment for %{name}
Requires: %{name} = %{version}-%{release}
%description devel
Header files and libraries for building an extension library for %{name}.
%autosetup -n %{name}-%{version} -p1
%configure --disable-static --with-graphite2 --with-gobject --enable-introspection
make %{?_smp_mflags}
make check
make install DESTDIR=$RPM_BUILD_ROOT INSTALL="install -p"
%license COPYING
%dir %{_libdir}/girepository-1.0
%files devel
%dir %{_datadir}/gir-1.0
%files help
* Wed Feb 15 2023 zhouwenpei <zhouwenpei1@h-partners.com> - 2.8.2-4
- fix CVE-2023-25193
* Thu Jul 14 2022 zhouwenpei <zhouwenpei1@h-partners.com> - 2.8.2-3
- fix CVE-2022-33068
* Tue May 24 2022 loong_C <loong_c@yeah.net> - 2.8.2-2
- fix spec changelog date
* Fri Dec 03 2021 liuyumeng <liuyumeng5@huawei.com> - 2.8.2-1
- update to harfbuzz-2.8.2-1
* Mon Jul 05 2021 wangkerong <wangkerong@huawei.com> - 2.8.1-2
- enable make check
* Fri Jun 25 2021 wangkerong <wangkerong@huawei.com> - 2.8.1-1
- update to 2.8.1
* Thu Jan 28 2021 zhanzhimin <zhanzhimin@huawei.com> - 2.7.4-1
- update to 2.7.4
* Thu Sep 10 2020 chengguipeng <chengguipeng1@huawei.com> - 2.6.8-3
- Type:bugfix
- DESC:modify source0 url
* Wed Jul 29 2020 hanhui <hanhui15@huawei.com> - 2.6.8-2
- modify HarfBuzz-0.0.gir patch
* Tue Jul 21 2020 hanhui <hanhui15@huawei.com> - 2.6.8-1
- Update to 2.6.8
* Mon Jun 15 2020 hanhui <hanhui15@huawei.com> - 2.6.1-1
- Update to 2.6.1
* Mon Aug 26 2019 openEuler Buildteam <buildteam@openeuler.org> - 1.8.7-2
- Package Init
@ -0,0 +1,476 @@
build_summary = {
{'prefix': get_option('prefix'),
'bindir': get_option('bindir'),
'libdir': get_option('libdir'),
'includedir': get_option('includedir'),
'datadir': get_option('datadir'),
'Unicode callbacks (you want at least one)':
{'Builtin': true,
'Glib': conf.get('HAVE_GLIB', 0) == 1,
'ICU': conf.get('HAVE_ICU', 0) == 1,
'Font callbacks (the more the merrier)':
{'FreeType': conf.get('HAVE_FREETYPE', 0) == 1,
'Dependencies used for command-line utilities':
{'Cairo': conf.get('HAVE_CAIRO', 0) == 1,
'Chafa': conf.get('HAVE_CHAFA', 0) == 1,
'Additional shapers':
{'Graphite2': conf.get('HAVE_GRAPHITE2', 0) == 1,
'Platform shapers (not normally needed)':
{'CoreText': conf.get('HAVE_CORETEXT', 0) == 1,
'DirectWrite': conf.get('HAVE_DIRECTWRITE', 0) == 1,
'GDI/Uniscribe': (conf.get('HAVE_GDI', 0) == 1) and (conf.get('HAVE_UNISCRIBE', 0) == 1),
'Other features':
{'Documentation': conf.get('HAVE_GTK_DOC', 0) == 1,
'GObject bindings': conf.get('HAVE_GOBJECT', 0) == 1,
'Introspection': conf.get('HAVE_INTROSPECTION', 0) == 1,
'Experimental APIs': conf.get('HB_EXPERIMENTAL_API', 0) == 1,
{'Tests': get_option('tests').enabled(),
'Benchmark': get_option('benchmark').enabled(),
if meson.version().version_compare('>=0.53')
foreach section_title, section : build_summary
summary(section, bool_yn: true, section: section_title)
summary = ['']
foreach section_title, section : build_summary
summary += ' @0@:'.format(section_title)
foreach feature, value : section
summary += ' @0@:'.format(feature)
summary += ' @0@'.format(value)
summary += ''
