String Guide
============

Most of the Mozilla code uses a C++ class hierarchy to pass string data,
rather than using raw pointers. This guide documents the string classes which
are visible to code within the Mozilla codebase (code which is linked into
``libxul``).

Introduction
------------

The string classes are a library of C++ classes which are used to manage
buffers of wide (16-bit) and narrow (8-bit) character strings. The headers
and implementation are in the `xpcom/string
<https://searchfox.org/mozilla-central/source/xpcom/string>`_ directory. All
strings are stored as a single contiguous buffer of characters.

The 8-bit and 16-bit string classes have completely separate base classes,
but share the same APIs. As a result, you cannot assign an 8-bit string to a
16-bit string without some kind of conversion helper class or routine. For
the purpose of this document, we will refer to the 16-bit string classes in
class documentation. Every 16-bit class has an equivalent 8-bit class:

===================== ======================
Wide                  Narrow
===================== ======================
``nsAString``         ``nsACString``
``nsString``          ``nsCString``
``nsAutoString``      ``nsAutoCString``
``nsDependentString`` ``nsDependentCString``
===================== ======================

The string classes distinguish, as part of the type hierarchy, between
strings that must have a null-terminator at the end of their buffer
(``ns[C]String``) and strings that are not required to have a null-terminator
(``nsA[C]String``). ``nsA[C]String`` is the base of the string classes (since
it imposes fewer requirements) and ``ns[C]String`` is a class derived from it.
Functions taking strings as parameters should generally take one of these
four types.

In order to avoid unnecessary copying of string data (which can have
significant performance cost), the string classes support different ownership
models. All string classes support the following three ownership models
dynamically:

* reference counted, copy-on-write, buffers (the default)

* adopted buffers (a buffer that the string class owns, but is not reference
  counted, because it came from somewhere else)

* dependent buffers, that is, an underlying buffer that the string class does
  not own, but that the caller that constructed the string guarantees will
  outlive the string instance

Auto strings will prefer reference counting an existing reference-counted
buffer over their stack buffer, but will otherwise use their stack buffer for
anything that will fit in it.

There are a number of additional string classes:

* Classes which exist primarily as constructors for the other types,
  particularly ``nsDependent[C]String`` and ``nsDependent[C]Substring``. These
  types are really just convenient notation for constructing an
  ``nsA[C]String`` with a non-default ownership mode; they should not be
  thought of as different types.

* ``nsLiteral[C]String`` which should rarely be constructed explicitly but
  usually through the ``""_ns`` and ``u""_ns`` user-defined string literals.
  ``nsLiteral[C]String`` is trivially constructible and destructible, and
  therefore does not emit construction/destruction code when stored in statics,
  as opposed to the other string classes.

The Major String Classes
------------------------

The list below describes the main base classes. Once you are familiar with
them, see the appendix describing What Class to Use When.

* **nsAString**/**nsACString**: the abstract base class for all strings. It
  provides an API for assignment, individual character access, basic
  manipulation of characters in the string, and string comparison. This class
  corresponds to the XPIDL ``AString`` or ``ACString`` parameter types.
  ``nsA[C]String`` is not necessarily null-terminated.

* **nsString**/**nsCString**: builds on ``nsA[C]String`` by guaranteeing a
  null-terminated storage. This allows for a method (``.get()``) to access the
  underlying character buffer.

The remainder of the string classes inherit from either ``nsA[C]String`` or
``ns[C]String``. Thus, every string class is compatible with ``nsA[C]String``.

.. note::

    In code which is generic over string width, ``nsA[C]String`` is sometimes
    known as ``nsTSubstring<CharT>``. ``nsAString`` is a type alias for
    ``nsTSubstring<char16_t>``, and ``nsACString`` is a type alias for
    ``nsTSubstring<char>``.

.. note::

    The type ``nsLiteral[C]String`` technically does not inherit from
    ``nsA[C]String``, but instead inherits from ``nsStringRepr<CharT>``. This
    allows the type to not generate destructors when stored in static
    storage.

    It can be implicitly coerced to ``const ns[C]String&`` (though can never
    be accessed mutably) and generally acts as-if it was a subclass of
    ``ns[C]String`` in most cases.

Since every string derives from ``nsAString`` (or ``nsACString``), they all
share a simple API. Common read-only methods include:

* ``.Length()`` - the number of code units (bytes for 8-bit string classes and ``char16_t`` for 16-bit string classes) in the string.
* ``.IsEmpty()`` - the fastest way of determining if the string has any value. Use this instead of testing ``string.Length() == 0``.
* ``.Equals(string)`` - ``true`` if the given string has the same value as the current string. Approximately the same as ``operator==``.

Common methods that modify the string:

* ``.Assign(string)`` - Assigns a new value to the string. Approximately the same as ``operator=``.
* ``.Append(string)`` - Appends a value to the string.
* ``.Insert(string, position)`` - Inserts the given string before the code unit at position.
* ``.Truncate(length)`` - shortens the string to the given length.

More complete documentation can be found in the `Class Reference`_.

As function parameters
~~~~~~~~~~~~~~~~~~~~~~

In general, use ``nsA[C]String`` references to pass strings across modules. For example:

.. code-block:: c++

    // when passing a string to a method, use const nsAString&
    nsFoo::PrintString(const nsAString& str);

    // when getting a string from a method, use nsAString&
    nsFoo::GetString(nsAString& result);

The Concrete Classes - which classes to use when
------------------------------------------------

The concrete classes are for use in code that actually needs to store string
data. The most common uses of the concrete classes are as local variables,
and members in classes or structs.

.. digraph:: concreteclasses

    node [shape=rectangle]

    "nsA[C]String" -> "ns[C]String";
    "ns[C]String" -> "nsDependent[C]String";
    "nsA[C]String" -> "nsDependent[C]Substring";
    "nsA[C]String" -> "ns[C]SubstringTuple";
    "ns[C]String" -> "nsAuto[C]StringN";
    "ns[C]String" -> "nsLiteral[C]String" [style=dashed];
    "nsAuto[C]StringN" -> "nsPromiseFlat[C]String";
    "nsAuto[C]StringN" -> "nsPrintfCString";

The following is a list of the most common concrete classes. Once you are
familiar with them, see the appendix describing What Class to Use When.

* ``ns[C]String`` - a null-terminated string whose buffer is allocated on the
  heap. Destroys its buffer when the string object goes away.

* ``nsAuto[C]String`` - derived from ``nsString``, a string which owns a 64
  code unit buffer in the same storage space as the string itself. If a string
  shorter than 64 code units is assigned to an ``nsAutoString``, then no extra
  storage will be allocated. For larger strings, a new buffer is allocated on
  the heap.

  If you want a number other than 64, use the templated types ``nsAutoStringN``
  / ``nsAutoCStringN``. (``nsAutoString`` and ``nsAutoCString`` are just
  typedefs for ``nsAutoStringN<64>`` and ``nsAutoCStringN<64>``, respectively.)

* ``nsDependent[C]String`` - derived from ``nsString``, this string does not
  own its buffer. It is useful for converting a raw string pointer (``const
  char16_t*`` or ``const char*``) into a class of type ``nsAString``. Note that
  you must null-terminate buffers used by ``nsDependentString``. If you
  don't want to or can't null-terminate the buffer, use
  ``nsDependentSubstring``.

* ``nsPrintfCString`` - derived from ``nsCString``, this string behaves like an
  ``nsAutoCString``. The constructor takes parameters which allow it to
  construct an 8-bit string from a printf-style format string and parameter
  list.

There are also a number of concrete classes that are created as a side-effect
of helper routines, etc. You should avoid direct use of these classes. Let
the string library create the class for you.

* ``ns[C]SubstringTuple`` - created via string concatenation
* ``nsDependent[C]Substring`` - created through ``Substring()``
* ``nsPromiseFlat[C]String`` - created through ``PromiseFlatString()``
* ``nsLiteral[C]String`` - created through the ``""_ns`` and ``u""_ns`` user-defined literals

Of course, there are times when it is necessary to reference these string
classes in your code, but as a general rule they should be avoided.

Iterators
---------

Because Mozilla strings are always a single buffer, iteration over the
characters in the string is done using raw pointers:

.. code-block:: c++

    /**
     * Find whether there is a tab character in `data`
     */
    bool HasTab(const nsAString& data) {
      const char16_t* cur = data.BeginReading();
      const char16_t* end = data.EndReading();

      for (; cur < end; ++cur) {
        if (char16_t('\t') == *cur) {
          return true;
        }
      }
      return false;
    }

Note that ``end`` points to the character after the end of the string buffer.
It should never be dereferenced.

Writing to a mutable string is also simple:

.. code-block:: c++

    /**
     * Replace every tab character in `data` with a space.
     */
    void ReplaceTabs(nsAString& data) {
      char16_t* cur = data.BeginWriting();
      char16_t* end = data.EndWriting();

      for (; cur < end; ++cur) {
        if (char16_t('\t') == *cur) {
          *cur = char16_t(' ');
        }
      }
    }

You may change the length of a string via ``SetLength()``. Note that
iterators become invalid after changing the length of a string. If a string
buffer becomes smaller while writing it, use ``SetLength`` to inform the
string class of the new size:

.. code-block:: c++

    /**
     * Remove every tab character from `data`
     */
    void RemoveTabs(nsAString& data) {
      int len = data.Length();
      char16_t* cur = data.BeginWriting();
      char16_t* end = data.EndWriting();

      while (cur < end) {
        if (char16_t('\t') == *cur) {
          len -= 1;
          end -= 1;
          if (cur < end) {
            // Shift the remaining code units down over the tab.
            memmove(cur, cur + 1, (end - cur) * sizeof(char16_t));
          }
        } else {
          cur += 1;
        }
      }

      data.SetLength(len);
    }

Note that using ``BeginWriting()`` to make a string longer is not OK.
``BeginWriting()`` must not be used to write past the logical length of the
string indicated by ``EndWriting()`` or ``Length()``. Calling
``SetCapacity()`` before ``BeginWriting()`` does not affect what the previous
sentence says. To make the string longer, call ``SetLength()`` before
``BeginWriting()`` or use the ``BulkWrite()`` API described below.

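The ``RemoveTabs()`` example calls ``memmove()`` once per tab, which can become
quadratic for tab-heavy input. A single pass with separate read and write
cursors avoids that. The following is only a sketch of the idea: it uses the
standard ``std::u16string`` as a stand-in for ``nsAString`` so that it compiles
outside Gecko, ``RemoveTabsSinglePass`` is a hypothetical name, and
``resize()`` plays the role that ``SetLength()`` plays above.

```cpp
#include <cstddef>
#include <string>

// Remove every tab from `data` in one pass.
// `read` visits each code unit once; `write` marks where the next kept
// code unit goes. Neither cursor moves past the original logical length.
void RemoveTabsSinglePass(std::u16string& data) {
  char16_t* begin = &data[0];
  char16_t* write = begin;
  const char16_t* read = begin;
  const char16_t* end = begin + data.size();

  for (; read < end; ++read) {
    if (*read != u'\t') {
      *write++ = *read;
    }
  }

  // Shrink only after all writes are done; the raw pointers must not be
  // used after the length changes.
  data.resize(static_cast<std::size_t>(write - begin));
}
```

As in the Gecko examples, the buffer is only ever made shorter here; the raw
pointers are never used to grow the string.
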
Bulk Write
----------

``BulkWrite()`` allows capacity-aware cache-friendly low-level writes to the
string's buffer.

Capacity-aware means that the caller is made aware of how the
caller-requested buffer capacity was rounded up to mozjemalloc buckets. This
is useful when initially requesting best-case buffer size without yet knowing
the true size need. If the data that actually needs to be written is larger
than the best-case estimate but still fits within the rounded-up capacity,
there is no need to reallocate despite requesting the best-case capacity.

Cache-friendly means that the zero terminator for C compatibility is written
after the new content of the string has been written, so the result is a
forward-only linear write access pattern instead of a non-linear
back-and-forth sequence resulting from using ``SetLength()`` followed by
``BeginWriting()``.

Low-level means that writing via a raw pointer is possible as with
``BeginWriting()``.

``BulkWrite()`` takes three arguments: the new capacity (which may be rounded
up), the number of code units at the beginning of the string to preserve
(typically the old logical length), and a boolean indicating whether
reallocating a smaller buffer is OK if the requested capacity would fit in a
buffer that's smaller than the current one. It returns a ``mozilla::Result``
which contains either a usable ``mozilla::BulkWriteHandle<T>`` (where ``T`` is
the string's ``char_type``) or an ``nsresult`` explaining why none can be had
(presumably OOM).

The actual writes are performed through the returned
``mozilla::BulkWriteHandle<T>``. You must not access the string except via this
handle until you call ``Finish()`` on the handle in the success case or you let
the handle go out of scope without calling ``Finish()`` in the failure case, in
which case the destructor of the handle puts the string in a mostly harmless but
consistent state (containing a single REPLACEMENT CHARACTER if a capacity
greater than 0 was requested, or in the ``char`` case if the three-byte UTF-8
representation of the REPLACEMENT CHARACTER doesn't fit, an ASCII SUBSTITUTE).

``mozilla::BulkWriteHandle<T>`` autoconverts to a writable
``mozilla::Span<T>`` and also provides explicit access to itself as ``Span``
(``AsSpan()``) or via component accessors named consistently with those on
``Span``: ``Elements()`` and ``Length()``. (The latter is not the logical
length of the string but the writable length of the buffer.) The buffer
exposed via these methods includes the prefix that you may have requested to
be preserved. It's up to you to skip past it so as to not overwrite it.

If there's a need to request a different capacity before you are ready to
call ``Finish()``, you can call ``RestartBulkWrite()`` on the handle. It
takes three arguments that match the three arguments of
``BulkWrite()``. It returns ``mozilla::Result<mozilla::Ok, nsresult>`` to
indicate success or OOM. Calling ``RestartBulkWrite()`` invalidates any
previously obtained span, raw pointer, or length.

Once you are done writing, call ``Finish()``. It takes two arguments: the new
logical length of the string (which must not exceed the capacity returned by
the ``Length()`` method of the handle) and a boolean indicating whether it's
OK to attempt to reallocate a smaller buffer in case a smaller mozjemalloc
bucket could accommodate the new logical length.

Helper Classes and Functions
----------------------------

Converting Cocoa strings
~~~~~~~~~~~~~~~~~~~~~~~~

Use ``mozilla::CopyCocoaStringToXPCOMString()`` in
``mozilla/MacStringHelpers.h`` to convert Cocoa strings to XPCOM strings.

Searching strings - looking for substrings, characters, etc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``nsReadableUtils.h`` header provides helper methods for searching in strings.

.. code-block:: c++

    bool FindInReadable(const nsAString& pattern,
                        nsAString::const_iterator start, nsAString::const_iterator end,
                        nsStringComparator& aComparator = nsDefaultStringComparator());

To use this, ``start`` and ``end`` should point to the beginning and end of a
string that you would like to search. If the search string is found,
``start`` and ``end`` will be adjusted to point to the beginning and end of
the found pattern. The return value is ``true`` or ``false``, indicating
whether or not the string was found.

An example:

.. code-block:: c++

    const nsAString& str = GetSomeString();
    nsAString::const_iterator start, end;

    str.BeginReading(start);
    str.EndReading(end);

    constexpr auto valuePrefix = u"value="_ns;

    if (FindInReadable(valuePrefix, start, end)) {
      // end now points to the character after the pattern
      valueStart = end;
    }

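The in/out iterator contract described for ``FindInReadable()`` can be modelled
with the standard library. This is a sketch under stated assumptions, not the
Gecko implementation: ``FindInRange`` and ``ConstIter`` are hypothetical names,
``std::u16string`` stands in for ``nsAString``, and the sketch chooses to leave
both iterators untouched when nothing is found.

```cpp
#include <algorithm>
#include <string>

using ConstIter = std::u16string::const_iterator;

// Search [start, end) for `pattern`. On success, narrow the range so that
// `start` points at the first code unit of the match and `end` points one
// past its last code unit, then return true. On failure, return false and
// leave both iterators as they were (a choice made by this sketch).
bool FindInRange(const std::u16string& pattern, ConstIter& start,
                 ConstIter& end) {
  ConstIter match = std::search(start, end, pattern.begin(), pattern.end());
  if (match == end && !pattern.empty()) {
    return false;
  }
  start = match;
  end = match + static_cast<ConstIter::difference_type>(pattern.size());
  return true;
}
```

After a successful call, everything between the adjusted ``end`` and the end of
the original string is the text that follows the pattern, which is how the
example above extracts the value after ``value=``.
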
Checking for Memory Allocation failure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Like other types in Gecko, the string classes use infallible memory
allocation by default, so you do not need to check for success when
allocating/resizing "normal" strings.

Most functions that modify strings (``Assign()``, ``SetLength()``, etc.) also
have an overload that takes a ``mozilla::fallible_t`` parameter. These
overloads return ``false`` instead of aborting if allocation fails. Use them
when creating/allocating strings which may be very large, and which the
program could recover from if the allocation fails.

Substrings (string fragments)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is very simple to refer to a substring of an existing string without
actually allocating new space and copying the characters into that substring.
``Substring()`` is the preferred method to create a reference to such a
string.

.. code-block:: c++

    void ProcessString(const nsAString& str) {
      const nsAString& firstFive = Substring(str, 0, 5); // from index 0, length 5
      // firstFive is now a string representing the first 5 characters
    }

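For readers more familiar with the standard library, the closest analogue of a
dependent substring is a ``std::u16string_view``: it borrows the original
buffer instead of copying it. The sketch below only illustrates that borrowing
relationship. ``SubstringView`` is a hypothetical helper, not a Gecko API, and
clamping out-of-range arguments is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Return a non-owning view of up to `length` code units of `str`, starting
// at index `start`. The view points into `str`'s buffer, so `str` must
// outlive the returned view and must not be modified while it is in use.
std::u16string_view SubstringView(const std::u16string& str, std::size_t start,
                                  std::size_t length) {
  std::u16string_view whole(str);
  if (start > whole.size()) {
    start = whole.size();  // clamp the start index to the end of the string
  }
  return whole.substr(start, length);  // substr() clamps the length itself
}
```

The same lifetime rule applies to the result of ``Substring()``: the
substring does not own its characters, so it is only valid while the string it
refers to is alive and unchanged.
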
Unicode Conversion
------------------

Strings can be stored in two basic formats: 8-bit code unit (byte/``char``)
strings, or 16-bit code unit (``char16_t``) strings. Any string class with a
capital "C" in the classname contains 8-bit bytes. These classes include
``nsCString``, ``nsDependentCString``, and so forth. Any string class without
the "C" contains 16-bit code units.

An 8-bit string can be in one of many character encodings while a 16-bit
string is always in potentially-invalid UTF-16. (You can make a 16-bit string
guaranteed-valid UTF-16 by passing it to ``EnsureUTF16Validity()``.) The most
common encodings are:

* ASCII - 7-bit encoding for basic English-only strings. Each ASCII value
  is stored in exactly one byte in the array with the most-significant 8th bit
  set to zero.

* `UCS2 <http://www.unicode.org/glossary/#UCS_2>`_ - 16-bit encoding for a
  subset of Unicode, `BMP <http://www.unicode.org/glossary/#BMP>`_. The Unicode
  value of a character stored in UCS2 is stored in exactly one 16-bit
  ``char16_t`` in a string class.

* `UTF-8 <http://www.faqs.org/rfcs/rfc3629.html>`_ - 8-bit encoding for
  Unicode characters. Each Unicode character is stored in up to 4 bytes in a
  string class. UTF-8 is capable of representing the entire Unicode character
  repertoire, and it efficiently maps to `UTF-32
  <http://www.unicode.org/glossary/#UTF_32>`_. (Gtk and Rust natively use
  UTF-8.)

* `UTF-16 <http://www.unicode.org/glossary/#UTF_16>`_ - 16-bit encoding for
  Unicode storage, backwards compatible with UCS2. The Unicode value of a
  character stored in UTF-16 may require one or two 16-bit ``char16_t`` in a
  string class. The contents of ``nsAString`` always has to be regarded as in
  this encoding instead of UCS2. UTF-16 is capable of representing the entire
  Unicode character repertoire, and it efficiently maps to UTF-32. (Win32 W
  APIs and Mac OS X natively use UTF-16.)

* Latin1 - 8-bit encoding for the first 256 Unicode code points. Used for
  HTTP headers and for size-optimized storage in text node and SpiderMonkey
  strings. Latin1 converts to UTF-16 by zero-extending each byte to a 16-bit
  code unit. Note that this kind of "Latin1" is not available for encoding
  HTML, CSS, JS, etc. Specifying ``charset=latin1`` means the same as
  ``charset=windows-1252``. Windows-1252 is a similar but different encoding
  used for interchange.

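The storage costs in the list above follow directly from the code point value.
The helpers below are a small illustrative sketch in standard C++; the names
``Utf8Length``, ``Utf16Length`` and ``FitsInLatin1`` are hypothetical and are
not part of the string library. They assume the argument is a Unicode scalar
value (at most U+10FFFF and not a surrogate).

```cpp
#include <cstdint>

// Number of 8-bit code units (bytes) needed to encode `codePoint` in UTF-8.
constexpr int Utf8Length(std::uint32_t codePoint) {
  return codePoint < 0x80      ? 1
         : codePoint < 0x800   ? 2
         : codePoint < 0x10000 ? 3
                               : 4;
}

// Number of 16-bit code units needed in UTF-16: one inside the BMP,
// otherwise two (a surrogate pair).
constexpr int Utf16Length(std::uint32_t codePoint) {
  return codePoint < 0x10000 ? 1 : 2;
}

// True if the code point is one of the first 256 code points, that is,
// representable in Latin1 as described above.
constexpr bool FitsInLatin1(std::uint32_t codePoint) {
  return codePoint <= 0xFF;
}
```

For example, an ASCII letter takes one byte in UTF-8 but two bytes (one code
unit) in UTF-16, while a code point outside the BMP takes four bytes in either
encoding.
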
In addition, there exist multiple other (legacy) encodings. The Web-relevant
ones are defined in the `Encoding Standard <https://encoding.spec.whatwg.org/>`_.
Conversions from these encodings to
UTF-8 and UTF-16 are provided by `mozilla::Encoding
<https://searchfox.org/mozilla-central/source/intl/Encoding.h#109>`_.
Additionally, on Windows there are some rare cases (e.g. drag&drop) where it's
necessary to call a system API with data encoded in the Windows
locale-dependent legacy encoding instead of UTF-16. In those rare cases, use
``MultiByteToWideChar``/``WideCharToMultiByte`` from kernel32.dll. Do not use
``iconv`` on \*nix. We only support UTF-8-encoded file paths on \*nix, non-path
Gtk strings are always UTF-8 and Cocoa and Java strings are always UTF-16.

When working with existing code, it is important to examine the current usage
of the strings that you are manipulating, to determine the correct conversion
mechanism.

When writing new code, it can be confusing to know which storage class and
encoding is the most appropriate. There is no single answer to this question,
but the important points are:

* **Surprisingly many strings are very often just ASCII.** ASCII is a subset of
  UTF-8 and is, therefore, efficient to represent as UTF-8. Representing ASCII
  as UTF-16 is bad both for memory usage and cache locality.

* **Rust strongly prefers UTF-8.** If your C++ code is interacting with Rust
  code, using UTF-8 in ``nsACString`` and merely validating it when converting
  to Rust strings is more efficient than using ``nsAString`` on the C++ side.

* **Networking code prefers 8-bit strings.** Networking code tends to use 8-bit
  strings: either with UTF-8 or Latin1 (byte value is the Unicode scalar value)
  semantics.

* **JS and DOM prefer UTF-16.** Most Gecko code uses UTF-16 for compatibility
  with JS strings and DOM strings, which are potentially-invalid UTF-16.
  However, both DOM text nodes and JS strings store strings that only contain
  code points below U+0100 as Latin1 (byte value is the Unicode scalar value).

* **Windows and Cocoa use UTF-16.** Windows system APIs take UTF-16. Cocoa
  ``NSString`` is UTF-16.

* **Gtk uses UTF-8.** Gtk APIs take UTF-8 for non-file paths. In the Gecko
  case, we support only UTF-8 file paths outside Windows, so all Gtk strings
  are UTF-8 for our purposes though file paths received from Gtk may not be
  valid UTF-8.

To assist with ASCII, Latin1, UTF-8, and UTF-16 conversions, there are some
helper methods and classes. Some of these classes look like functions,
because they are most often used as temporary objects on the stack.

Short zero-terminated ASCII strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have a short zero-terminated string that you are certain is always
ASCII, use these special-case methods instead of the conversions described in
the later sections.

* If you are assigning an ASCII literal to an ``nsACString``, use
  ``AssignLiteral()``.
* If you are assigning a literal to an ``nsAString``, use ``AssignLiteral()``
  and make the literal a ``u""`` literal. If the literal has to be a ``""``
  literal (as opposed to ``u""``) and is ASCII, still use ``AssignLiteral()``,
  but be aware that this involves a run-time inflation.
* If you are assigning a zero-terminated ASCII string that's not a literal from
  the compiler's point of view at the call site and you don't know the length
  of the string either (e.g. because it was looked up from an array of literals
  of varying lengths), use ``AssignASCII()``.

UTF-8 / UTF-16 conversion
~~~~~~~~~~~~~~~~~~~~~~~~~

.. cpp:function:: NS_ConvertUTF8toUTF16(const nsACString&)

    An ``nsAutoString`` subclass that converts a UTF-8 encoded ``nsACString``
    or ``const char*`` to a 16-bit UTF-16 string. If you need a ``const
    char16_t*`` buffer, you can use the ``.get()`` method. For example:

    .. code-block:: c++

        /* signature: void HandleUnicodeString(const nsAString& str); */
        object->HandleUnicodeString(NS_ConvertUTF8toUTF16(utf8String));

        /* signature: void HandleUnicodeBuffer(const char16_t* str); */
        object->HandleUnicodeBuffer(NS_ConvertUTF8toUTF16(utf8String).get());

.. cpp:function:: NS_ConvertUTF16toUTF8(const nsAString&)

    An ``nsAutoCString`` which converts a 16-bit UTF-16 string (``nsAString``)
    to a UTF-8 encoded string. As above, you can use ``.get()`` to access a
    ``const char*`` buffer.

    .. code-block:: c++

        /* signature: void HandleUTF8String(const nsACString& str); */
        object->HandleUTF8String(NS_ConvertUTF16toUTF8(utf16String));

        /* signature: void HandleUTF8Buffer(const char* str); */
        object->HandleUTF8Buffer(NS_ConvertUTF16toUTF8(utf16String).get());

.. cpp:function:: CopyUTF8toUTF16(const nsACString&, nsAString&)

    Converts and copies:

    .. code-block:: c++

        // return a UTF-16 value
        void Foo::GetUnicodeValue(nsAString& result) {
          CopyUTF8toUTF16(mLocalUTF8Value, result);
        }

.. cpp:function:: AppendUTF8toUTF16(const nsACString&, nsAString&)

    Converts and appends:

    .. code-block:: c++

        // return a UTF-16 value
        void Foo::GetUnicodeValue(nsAString& result) {
          result.AssignLiteral("prefix:");
          AppendUTF8toUTF16(mLocalUTF8Value, result);
        }

.. cpp:function:: CopyUTF16toUTF8(const nsAString&, nsACString&)

    Converts and copies:

    .. code-block:: c++

        // return a UTF-8 value
        void Foo::GetUTF8Value(nsACString& result) {
          CopyUTF16toUTF8(mLocalUTF16Value, result);
        }

.. cpp:function:: AppendUTF16toUTF8(const nsAString&, nsACString&)

    Converts and appends:

    .. code-block:: c++

        // return a UTF-8 value
        void Foo::GetUTF8Value(nsACString& result) {
          result.AssignLiteral("prefix:");
          AppendUTF16toUTF8(mLocalUTF16Value, result);
        }

Latin1 / UTF-16 Conversion
~~~~~~~~~~~~~~~~~~~~~~~~~~

The following should only be used when you can guarantee that the original
string is ASCII or Latin1 (in the sense that the byte value is the Unicode
scalar value; not in the windows-1252 sense). These helpers are very similar
to the UTF-8 / UTF-16 conversion helpers above.

UTF-16 to Latin1 converters
```````````````````````````

These converters are **very dangerous** because they **lose information**
during the conversion process. You should **avoid UTF-16 to Latin1
conversions** unless your strings are guaranteed to be Latin1 or ASCII. (In
the future, these conversions may start asserting in debug builds that their
input is in the permissible range.) If the input is actually in the Latin1
range, each 16-bit code unit is narrowed to an 8-bit byte by removing the
high half. Unicode code points above U+00FF result in garbage whose nature
must not be relied upon. (In the future the nature of the garbage will be CPU
architecture-dependent.) If you want to ``printf()`` something and don't care
what happens to non-ASCII, please convert to UTF-8 instead.

.. cpp:function:: NS_LossyConvertUTF16toASCII(const nsAString&)

    An ``nsAutoCString`` which holds a temporary buffer containing the Latin1
    value of the string.

.. cpp:function:: void LossyCopyUTF16toASCII(Span<const char16_t>, nsACString&)

    Does an in-place conversion from UTF-16 into a Latin1 string object.

.. cpp:function:: void LossyAppendUTF16toASCII(Span<const char16_t>, nsACString&)

    Appends a UTF-16 string to a Latin1 string.

Latin1 to UTF-16 converters
```````````````````````````

These converters are very dangerous because they will **produce wrong results
for non-ASCII UTF-8 or windows-1252 input**, yielding a meaningless UTF-16
string. You should **avoid ASCII to UTF-16 conversions** unless your strings
are guaranteed to be ASCII or Latin1 in the sense of the byte value being the
Unicode scalar value. Every byte is zero-extended into a 16-bit code unit.

It is correct to use these on most HTTP header values, but **it's always
wrong to use these on HTTP response bodies!** (Use ``mozilla::Encoding`` to
deal with response bodies.)

.. cpp:function:: NS_ConvertASCIItoUTF16(const nsACString&)

    An ``nsAutoString`` which holds a temporary buffer containing the value of
    the Latin1 to UTF-16 conversion.

.. cpp:function:: void CopyASCIItoUTF16(Span<const char>, nsAString&)

    Does an in-place conversion from Latin1 to UTF-16.

.. cpp:function:: void AppendASCIItoUTF16(Span<const char>, nsAString&)

    Appends a Latin1 string to a UTF-16 string.

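To make the zero-extension and narrowing rules concrete, here is a minimal
sketch in standard C++. It is not the Gecko implementation: ``InflateLatin1``
and ``NarrowToLatin1`` are hypothetical names, and ``std::string`` and
``std::u16string`` stand in for ``nsACString`` and ``nsAString``. The assertion
in the narrowing direction is a choice made for this sketch.

```cpp
#include <cassert>
#include <string>

// Latin1 -> UTF-16: zero-extend every byte to a 16-bit code unit.
std::u16string InflateLatin1(const std::string& latin1) {
  std::u16string utf16;
  utf16.reserve(latin1.size());
  for (char c : latin1) {
    // Go through unsigned char so that bytes >= 0x80 are not sign-extended.
    utf16.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
  }
  return utf16;
}

// UTF-16 -> Latin1: keep the low byte of each code unit. This is only
// meaningful when every code unit is <= 0xFF, so the sketch asserts it.
std::string NarrowToLatin1(const std::u16string& utf16) {
  std::string latin1;
  latin1.reserve(utf16.size());
  for (char16_t u : utf16) {
    assert(u <= 0xFF && "input must be in the Latin1 range");
    latin1.push_back(static_cast<char>(static_cast<unsigned char>(u)));
  }
  return latin1;
}
```

Feeding UTF-8 to the inflating direction shows why these helpers are dangerous:
the two UTF-8 bytes ``0xC3 0xA9`` (which encode U+00E9) become the two
unrelated code units U+00C3 and U+00A9 instead of the single code unit U+00E9.
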
Comparing ns*Strings with C strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can compare ``ns*Strings`` with C strings by converting the ``ns*String``
to a C string, or by comparing directly against a C String.

.. cpp:function:: bool nsAString::EqualsASCII(const char*)

    Compares with an ASCII C string.

.. cpp:function:: bool nsAString::EqualsLiteral(...)

    Compares with a string literal.

Common Patterns
|
||
---------------
|
||
|
||
Literal Strings
|
||
~~~~~~~~~~~~~~~
|
||
|
||
A literal string is a raw string value that is written in some C++ code. For
|
||
example, in the statement ``printf("Hello World\n");`` the value ``"Hello
|
||
World\n"`` is a literal string. It is often necessary to insert literal
|
||
string values when an ``nsAString`` or ``nsACString`` is required. Two
|
||
user-defined literals are provided that implicitly convert to ``const
|
||
nsString&`` resp. ``const nsCString&``:
|
||
|
||
* ``""_ns`` for 8-bit literals, converting implicitly to ``const nsCString&``
|
||
* ``u""_ns`` for 16-bit literals, converting implicitly to ``const nsString&``
|
||
|
||
The benefits of the user-defined literals may seem unclear, given that
|
||
``nsDependentCString`` will also wrap a string value in an ``nsCString``. The
|
||
advantage of the user-defined literals is twofold.
|
||
|
||
* The length of these strings is calculated at compile time, so the string does
|
||
not need to be scanned at runtime to determine its length.
|
||
|
||
* Literal strings live for the lifetime of the binary, and can be moved between
|
||
the ``ns[C]String`` classes without being copied or freed.
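The first benefit can be sketched with a small user-defined literal of our own. ``LiteralView`` and ``_lit`` below are hypothetical, simplified analogues invented for this example (they are not the Mozilla ``_ns`` literal or ``nsLiteralCString``). The point is only that the compiler passes the length to the literal operator, so no ``strlen`` call is needed at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical analogue of a literal string class: a pointer to static
// storage plus a length. It never owns or frees the characters.
struct LiteralView {
  const char* mData;
  std::size_t mLength;
};

// Hypothetical user-defined literal modelled on ""_ns. The compiler
// supplies aLength, so the string is never scanned at run time.
constexpr LiteralView operator""_lit(const char* aData, std::size_t aLength) {
  return LiteralView{aData, aLength};
}

constexpr LiteralView kGreeting = "Hello World\n"_lit;

// Checked by the compiler: 11 visible characters plus the newline.
static_assert(kGreeting.mLength == 12, "length is known at compile time");
```

Because the length comes from the compiler rather than from a scan for the terminator, a literal with an embedded null character keeps its full length as well.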
|
||
|
||
Here are some examples of proper usage of the literals (both standard and
|
||
user-defined):
|
||
|
||
.. code-block:: c++
|
||
|
||
// call Init(const nsLiteralString&) - enforces that it's only called with literals
|
||
Init(u"start value"_ns);
|
||
|
||
// call Init(const nsAString&)
|
||
Init(u"start value"_ns);
|
||
|
||
// call Init(const nsACString&)
|
||
Init("start value"_ns);
|
||
|
||
In case a literal is defined via a macro, you can just convert it to
|
||
``nsLiteralString`` or ``nsLiteralCString`` using their constructor. You
|
||
could consider not using a macro at all but a named ``constexpr`` constant
|
||
instead.
|
||
|
||
In some cases, an 8-bit literal is defined via a macro, either within code or
|
||
from the environment, but it can't be changed or is used both as an 8-bit and
|
||
a 16-bit string. In these cases, you can use the
|
||
``NS_LITERAL_STRING_FROM_CSTRING`` macro to construct a ``nsLiteralString``
|
||
and do the conversion at compile-time.
|
||
|
||
String Concatenation
|
||
~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Strings can be concatenated together using the + operator. The resulting
|
||
string is a ``const nsSubstringTuple`` object. The resulting object can be
|
||
treated and referenced similarly to a ``nsAString`` object. Concatenation *does
|
||
not copy the substrings*. The strings are only copied when the concatenation
|
||
is assigned into another string object. The ``nsSubstringTuple`` object holds
|
||
pointers to the original strings. Therefore, the ``nsSubstringTuple`` object is
|
||
dependent on all of its substrings, meaning that their lifetime must be at
|
||
least as long as the ``nsSubstringTuple`` object.
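The deferred-copy behaviour can be modelled in a few lines of standard C++17. ``ConcatView`` is a hypothetical, two-fragment simplification written for this guide and is not how ``nsSubstringTuple`` is actually implemented. It stores views of its operands and copies characters only when asked to flatten.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical two-fragment analogue of nsSubstringTuple. It holds views
// (pointer + length) of its operands, so the operands must outlive it.
class ConcatView {
 public:
  ConcatView(std::u16string_view aHead, std::u16string_view aTail)
      : mHead(aHead), mTail(aTail) {}

  // Computed from the fragments; no characters are touched.
  std::size_t Length() const { return mHead.size() + mTail.size(); }

  // The only place where characters are copied: one allocation of the
  // exact size, then one copy per fragment.
  std::u16string Flatten() const {
    std::u16string result;
    result.reserve(Length());
    result.append(mHead);
    result.append(mTail);
    return result;
  }

 private:
  std::u16string_view mHead;
  std::u16string_view mTail;
};
```

As with the real tuple, a ``ConcatView`` that outlives either operand is left holding dangling pointers, which is why such objects should only be used as short-lived temporaries.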
|
||
|
||
For example, you can use the value of two strings and pass their
|
||
concatenation on to another function which takes a ``const nsAString&``:
|
||
|
||
.. code-block:: c++
|
||
|
||
void HandleTwoStrings(const nsAString& one, const nsAString& two) {
|
||
// call HandleString(const nsAString&)
|
||
HandleString(one + two);
|
||
}
|
||
|
||
NOTE: The two strings are implicitly combined into a temporary ``nsString``
|
||
in this case, and the temporary string is passed into ``HandleString``. If
|
||
``HandleString`` assigns its input into another ``nsString``, then the string
|
||
buffer will be shared in this case, negating the cost of the intermediate
|
||
temporary. You can concatenate N strings and store the result in a temporary
|
||
variable:
|
||
|
||
.. code-block:: c++
|
||
|
||
constexpr auto start = u"start "_ns;
|
||
constexpr auto middle = u"middle "_ns;
|
||
constexpr auto end = u"end"_ns;
|
||
// the right-hand side is a tuple of 3 dependent fragments; they are copied
// exactly once, when the tuple is assigned into combinedString
|
||
nsString combinedString = start + middle + end;
|
||
|
||
// call void HandleString(const nsAString&);
|
||
HandleString(combinedString);
|
||
|
||
It is safe to concatenate user-defined literals because the temporary
|
||
``nsLiteral[C]String`` objects will live as long as the temporary
|
||
concatenation object (of type ``nsSubstringTuple``).
|
||
|
||
.. code-block:: c++
|
||
|
||
// call HandlePage(const nsAString&);
|
||
// safe because the concatenated-string will live as long as its substrings
|
||
HandlePage(u"start "_ns + u"end"_ns);
|
||
|
||
Local Variables
|
||
~~~~~~~~~~~~~~~
|
||
|
||
Local variables within a function are usually stored on the stack. The
|
||
``nsAutoString``/``nsAutoCString`` classes are subclasses of the
|
||
``nsString``/``nsCString`` classes. They own a 64-character buffer allocated
|
||
in the same storage space as the string itself. If the ``nsAutoString`` is
|
||
allocated on the stack, then it has at its disposal a 64-character stack
|
||
buffer. This allows the implementation to avoid allocating extra memory when
|
||
dealing with small strings. ``nsAutoStringN``/``nsAutoCStringN`` are more
|
||
general alternatives that let you choose the number of characters in the
|
||
inline buffer.
|
||
|
||
.. code-block:: c++
|
||
|
||
...
|
||
nsAutoString value;
|
||
GetValue(value); // if the result is less than 64 code units,
|
||
// then this just saved us an allocation
|
||
...
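The inline-buffer idea can be sketched as follows. ``InlineString`` is a hypothetical, heavily simplified class written for this example. It is not the ``nsAutoCStringN`` implementation and omits sharing, appending and most of the real API. It only shows how a string can fall back to the heap when its value does not fit in the storage embedded in the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical analogue of nsAutoCStringN<N>: N characters of storage live
// inside the object, and the heap is used only for longer values.
template <std::size_t N>
class InlineString {
 public:
  // aData must not point into this string's own buffer.
  void Assign(const char* aData, std::size_t aLength) {
    char* dest = mInline;
    if (aLength + 1 > N) {
      // Value plus terminator does not fit inline: allocate.
      mHeap = std::make_unique<char[]>(aLength + 1);
      dest = mHeap.get();
    } else {
      // Fits inline: release any earlier heap buffer.
      mHeap.reset();
    }
    std::memcpy(dest, aData, aLength);
    dest[aLength] = '\0';
    mLength = aLength;
  }

  const char* get() const { return mHeap ? mHeap.get() : mInline; }
  std::size_t Length() const { return mLength; }
  bool UsesInlineBuffer() const { return !mHeap; }

 private:
  char mInline[N] = {0};
  std::unique_ptr<char[]> mHeap;
  std::size_t mLength = 0;
};
```

With ``N`` set to 64, values of up to 63 characters plus the terminator stay inside the object, which is the case the comment in the example above refers to.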
|
||
|
||
Member Variables
|
||
~~~~~~~~~~~~~~~~
|
||
|
||
In general, you should use the concrete classes ``nsString`` and
|
||
``nsCString`` for member variables.
|
||
|
||
.. code-block:: c++
|
||
|
||
class Foo {
|
||
...
|
||
// these store UTF-8 and UTF-16 values respectively
|
||
nsCString mLocalName;
|
||
nsString mTitle;
|
||
};
|
||
|
||
A common incorrect pattern is to use ``nsAutoString``/``nsAutoCString``
|
||
for member variables. As described in `Local Variables`_, these classes have
|
||
a built-in buffer that makes them very large. This means that if you include
|
||
them in a class, they bloat the class by 64 bytes (``nsAutoCString``) or 128
|
||
bytes (``nsAutoString``).
|
||
|
||
|
||
Raw Character Pointers
|
||
~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
``PromiseFlatString()`` and ``PromiseFlatCString()`` can be used to create a
|
||
temporary string which holds a null-terminated buffer containing the same
|
||
value as the source string. ``PromiseFlatString()`` will create a temporary
|
||
buffer if necessary. This is most often used in order to pass an
|
||
``nsAString`` to an API which requires a null-terminated string.
|
||
|
||
In the following example, an ``nsAString`` is combined with a literal string,
|
||
and the result is passed to an API which requires a simple character buffer.
|
||
|
||
.. code-block:: c++
|
||
|
||
// Modify the URL and pass to AddPage(const char16_t* url)
|
||
void AddModifiedPage(const nsAString& url) {
|
||
constexpr auto httpPrefix = u"http://"_ns;
|
||
const nsAString& modifiedURL = httpPrefix + url;
|
||
|
||
// creates a temporary buffer
|
||
AddPage(PromiseFlatString(modifiedURL).get());
|
||
}
|
||
|
||
``PromiseFlatString()`` is smart when handed a string that is already
|
||
null-terminated. It avoids creating the temporary buffer in such cases.
|
||
|
||
.. code-block:: c++
|
||
|
||
// Modify the URL and pass to AddPage(const char16_t* url)
|
||
void AddModifiedPage(const nsAString& url, bool addPrefix) {
|
||
if (addPrefix) {
|
||
// MUST create a temporary buffer - string is multi-fragmented
|
||
constexpr auto httpPrefix = u"http://"_ns;
|
||
AddPage(PromiseFlatString(httpPrefix + url).get());
|
||
} else {
|
||
// MIGHT create a temporary buffer, does a runtime check
|
||
AddPage(PromiseFlatString(url).get());
|
||
}
|
||
}
|
||
|
||
.. note::
|
||
|
||
It is **not** possible to efficiently transfer ownership of a string
|
||
class' internal buffer into an owned ``char*`` which can be safely
|
||
freed by other components due to the COW optimization.
|
||
|
||
If working with a legacy API which requires malloced ``char*`` buffers,
|
||
prefer using ``ToNewUnicode``, ``ToNewCString`` or ``ToNewUTF8String``
|
||
over ``strdup`` to create owned ``char*`` pointers.
|
||
|
||
``printf`` and a UTF-16 string
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
For debugging, it's useful to ``printf`` a UTF-16 string (``nsString``,
|
||
``nsAutoString``, etc). To do this usually requires converting it to an 8-bit
|
||
string, because that's what ``printf`` expects. Use:
|
||
|
||
.. code-block:: c++
|
||
|
||
printf("%s\n", NS_ConvertUTF16toUTF8(yourString).get());
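For readers curious about what such a conversion involves, here is a minimal sketch of a UTF-16 to UTF-8 encoder in standard C++17. ``Utf16ToUtf8`` is a hypothetical function written for this example, not the code behind ``NS_ConvertUTF16toUTF8``. As an assumption of the sketch, unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for a UTF-16 to UTF-8 conversion helper.
inline std::string Utf16ToUtf8(std::u16string_view aSource) {
  std::string out;
  for (std::size_t i = 0; i < aSource.size(); ++i) {
    std::uint32_t cp = aSource[i];
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < aSource.size() &&
        aSource[i + 1] >= 0xDC00 && aSource[i + 1] <= 0xDFFF) {
      // Valid surrogate pair: combine into one supplementary code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (aSource[i + 1] - 0xDC00);
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      // Unpaired surrogate: substitute U+FFFD (assumption of this sketch).
      cp = 0xFFFD;
    }
    if (cp < 0x80) {  // 1 byte
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {  // 2 bytes
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {  // 3 bytes
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {  // 4 bytes
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

The result can be handed to ``printf("%s\n", ...)`` through ``c_str()``, which mirrors the ``.get()`` call in the example above.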
|
||
|
||
Sequence of appends without reallocating
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
``SetCapacity()`` allows you to give the string a hint of the future string
|
||
length caused by a sequence of appends (excluding appends that convert
|
||
between UTF-16 and UTF-8 in either direction) in order to avoid multiple
|
||
allocations during the sequence of appends. However, the other
|
||
allocation-avoidance features of XPCOM strings interact badly with
|
||
``SetCapacity()`` making it something of a footgun.
|
||
|
||
``SetCapacity()`` is appropriate to use before a sequence of multiple
|
||
operations from the following list (without operations that are not on the
|
||
list between the ``SetCapacity()`` call and operations from the list):
|
||
|
||
* ``Append()``
|
||
* ``AppendASCII()``
|
||
* ``AppendLiteral()``
|
||
* ``AppendPrintf()``
|
||
* ``AppendInt()``
|
||
* ``AppendFloat()``
|
||
* ``LossyAppendUTF16toASCII()``
|
||
* ``AppendASCIItoUTF16()``
|
||
|
||
**DO NOT** call ``SetCapacity()`` if the subsequent operations on the string
|
||
do not meet the criteria above. Operations that undo the benefits of
|
||
``SetCapacity()`` include but are not limited to:
|
||
|
||
* ``SetLength()``
|
||
* ``Truncate()``
|
||
* ``Assign()``
|
||
* ``AssignLiteral()``
|
||
* ``Adopt()``
|
||
* ``CopyASCIItoUTF16()``
|
||
* ``LossyCopyUTF16toASCII()``
|
||
* ``AppendUTF16toUTF8()``
|
||
* ``AppendUTF8toUTF16()``
|
||
* ``CopyUTF16toUTF8()``
|
||
* ``CopyUTF8toUTF16()``
|
||
|
||
If your string is an ``nsAuto[C]String`` and you are calling
|
||
``SetCapacity()`` with a constant ``N``, please instead declare the string as
|
||
``nsAuto[C]StringN<N+1>`` without calling ``SetCapacity()`` (while being
|
||
mindful of not using such a large ``N`` as to overflow the run-time stack).
|
||
|
||
There is no need to include room for the null terminator: it is the job of
|
||
the string class.
|
||
|
||
Note: Calling ``SetCapacity()`` does not give you permission to use the
|
||
pointer obtained from ``BeginWriting()`` to write past the current length (as
|
||
returned by ``Length()``) of the string. Please use either ``BulkWrite()`` or
|
||
``SetLength()`` instead.
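The intended usage pattern has a close analogue in the C++ standard library, sketched below. This is an analogy only: ``std::string::reserve()`` stands in for ``SetCapacity()``, and ``AppendsWithoutRegrowth`` is a hypothetical helper written for this example. The XPCOM classes have additional behaviours (such as buffer sharing) that the sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reserves room for the whole result once, then performs only appends.
// Returns true if the capacity never changed during the appends, meaning
// no further buffer growth was needed.
inline bool AppendsWithoutRegrowth(std::size_t aPieces) {
  const std::string piece = "0123456789abcdef";

  std::string out;
  out.reserve(piece.size() * aPieces);  // analogue of SetCapacity()
  const std::size_t capacityAfterReserve = out.capacity();

  for (std::size_t i = 0; i < aPieces; ++i) {
    out.append(piece);  // analogue of Append()
  }

  return out.capacity() == capacityAfterReserve &&
         out.size() == piece.size() * aPieces;
}
```

Note that the sketch only ever appends after reserving. It never writes through a raw pointer beyond the current length, which matches the restriction on ``BeginWriting()`` described above.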
|
||
|
||
.. _stringguide.xpidl:
|
||
|
||
XPIDL
|
||
-----
|
||
|
||
The string library is also available through IDL. By declaring attributes and
|
||
methods using the specially defined IDL types, string classes are used as
|
||
parameters to the corresponding methods.
|
||
|
||
XPIDL String types
|
||
~~~~~~~~~~~~~~~~~~
|
||
|
||
The C++ signatures follow the abstract-type convention described above, such
|
||
that all method parameters are based on the abstract classes. The following
|
||
table describes the purpose of each string type in IDL.
|
||
|
||
+-----------------+----------------+----------------------------------------------------------------------------------+
|
||
| XPIDL Type | C++ Type | Purpose |
|
||
+=================+================+==================================================================================+
|
||
| ``string`` | ``char*`` | Raw character pointer to ASCII (7-bit) string, no string classes used. |
|
||
| | | |
|
||
| | | High bit is not guaranteed across XPConnect boundaries. |
|
||
+-----------------+----------------+----------------------------------------------------------------------------------+
|
||
| ``wstring`` | ``char16_t*`` | Raw character pointer to UTF-16 string, no string classes used. |
|
||
+-----------------+----------------+----------------------------------------------------------------------------------+
|
||
| ``AString`` | ``nsAString`` | UTF-16 string. |
|
||
+-----------------+----------------+----------------------------------------------------------------------------------+
|
||
| ``ACString`` | ``nsACString`` | 8-bit string. All bits are preserved across XPConnect boundaries. |
|
||
+-----------------+----------------+----------------------------------------------------------------------------------+
|
||
| ``AUTF8String`` | ``nsACString`` | UTF-8 string. |
|
||
| | | |
|
||
| | | Converted to UTF-16 as necessary when value is used across XPConnect boundaries. |
|
||
+-----------------+----------------+----------------------------------------------------------------------------------+
|
||
|
||
Callers should prefer using the string classes ``AString``, ``ACString`` and
|
||
``AUTF8String`` over the raw pointer types ``string`` and ``wstring`` in
|
||
almost all situations.
|
||
|
||
C++ Signatures
|
||
~~~~~~~~~~~~~~
|
||
|
||
In XPIDL, ``in`` parameters are read-only, and the C++ signatures for
|
||
``*String`` parameters follow the above guidelines by using ``const
|
||
nsAString&`` for these parameters. ``out`` and ``inout`` parameters are
|
||
defined simply as ``nsAString&`` so that the callee can write to them.
|
||
|
||
.. code-block::
|
||
|
||
interface nsIFoo : nsISupports {
|
||
attribute AString utf16String;
|
||
AUTF8String getValue(in ACString key);
|
||
};
|
||
|
||
.. code-block:: c++
|
||
|
||
class nsIFoo : public nsISupports {
|
||
NS_IMETHOD GetUtf16String(nsAString& aResult) = 0;
|
||
NS_IMETHOD SetUtf16String(const nsAString& aValue) = 0;
|
||
NS_IMETHOD GetValue(const nsACString& aKey, nsACString& aResult) = 0;
|
||
};
|
||
|
||
In the above example, ``utf16String`` is treated as a UTF-16 string. The
|
||
implementation of ``GetUtf16String()`` will use ``aResult.Assign`` to
|
||
"return" the value. In ``SetUtf16String()`` the value of the string can be
|
||
used through a variety of methods including `Iterators`_,
|
||
``PromiseFlatString``, and assignment to other strings.
|
||
|
||
In ``GetValue()``, the first parameter, ``aKey``, is treated as a raw
|
||
sequence of 8-bit values. Any non-ASCII characters in ``aKey`` will be
|
||
preserved when crossing XPConnect boundaries. The implementation of
|
||
``GetValue()`` will assign a UTF-8 encoded 8-bit string into ``aResult``. If
|
||
this method is called across XPConnect boundaries, such as from a script,
|
||
then the result will be decoded from UTF-8 into UTF-16 and used as a Unicode
|
||
value.
|
||
|
||
String Guidelines
|
||
-----------------
|
||
|
||
Follow these simple rules in your code to keep your fellow developers,
|
||
reviewers, and users happy.
|
||
|
||
* Use the most abstract string class that you can. Usually this is:
|
||
* ``nsAString`` for function parameters
|
||
* ``nsString`` for member variables
|
||
* ``nsAutoString`` for local (stack-based) variables
|
||
* Use the ``""_ns`` and ``u""_ns`` user-defined literals to represent literal strings (e.g. ``"foo"_ns``) as nsAString-compatible objects.
|
||
* Use string concatenation (i.e. the "+" operator) when combining strings.
|
||
* Use ``nsDependentString`` when you have a raw character pointer that you need to convert to an nsAString-compatible string.
|
||
* Use ``Substring()`` to extract fragments of existing strings.
|
||
* Use `iterators`_ to parse and extract string fragments.
|
||
|
||
Class Reference
|
||
---------------
|
||
|
||
.. cpp:class:: template<T> nsTSubstring<T>
|
||
|
||
.. note::
|
||
|
||
The ``nsTSubstring<char_type>`` class is usually written as
|
||
``nsAString`` or ``nsACString``.
|
||
|
||
.. cpp:function:: size_type Length() const
|
||
|
||
.. cpp:function:: bool IsEmpty() const
|
||
|
||
.. cpp:function:: bool IsVoid() const
|
||
|
||
.. cpp:function:: const char_type* BeginReading() const
|
||
|
||
.. cpp:function:: const char_type* EndReading() const
|
||
|
||
.. cpp:function:: bool Equals(const self_type&, comparator_type = ...) const
|
||
|
||
.. cpp:function:: char_type First() const
|
||
|
||
.. cpp:function:: char_type Last() const
|
||
|
||
.. cpp:function:: size_type CountChar(char_type) const
|
||
|
||
.. cpp:function:: int32_t FindChar(char_type, index_type aOffset = 0) const
|
||
|
||
.. cpp:function:: void Assign(const self_type&)
|
||
|
||
.. cpp:function:: void Append(const self_type&)
|
||
|
||
.. cpp:function:: void Insert(const self_type&, index_type aPos)
|
||
|
||
.. cpp:function:: void Cut(index_type aCutStart, size_type aCutLength)
|
||
|
||
.. cpp:function:: void Replace(index_type aCutStart, size_type aCutLength, const self_type& aStr)
|
||
|
||
.. cpp:function:: void Truncate(size_type aLength)
|
||
|
||
.. cpp:function:: void SetIsVoid(bool)
|
||
|
||
Marks the string as void (null). XPConnect and WebIDL will convert void nsAStrings to
|
||
JavaScript ``null``.
|
||
|
||
.. cpp:function:: char_type* BeginWriting()
|
||
|
||
.. cpp:function:: char_type* EndWriting()
|
||
|
||
.. cpp:function:: void SetCapacity(size_type)
|
||
|
||
Inform the string of the buffer size needed before a sequence of calls
|
||
to ``Append()`` or converting appends that convert between UTF-16 and
|
||
Latin1 in either direction. (Don't use if you use appends that
|
||
convert between UTF-16 and UTF-8 in either direction.) Calling this
|
||
method does not give you permission to use ``BeginWriting()`` to
|
||
write past the logical length of the string. Use ``SetLength()`` or
|
||
``BulkWrite()`` as appropriate.
|
||
|
||
.. cpp:function:: void SetLength(size_type)
|
||
|
||
.. cpp:function:: Result<BulkWriteHandle<char_type>, nsresult> BulkWrite(size_type aCapacity, size_type aPrefixToPreserve, bool aAllowShrinking)
|
||
|
||
|
||
Original Document Information
|
||
-----------------------------
|
||
|
||
This document was originally hosted on MDN as part of the XPCOM guide.
|
||
|
||
* Author: `Alec Flett <mailto:alecf@flett.org>`_
|
||
* Copyright Information: Portions of this content are © 1998–2007 by individual mozilla.org contributors; content available under a Creative Commons license.
|
||
* Thanks to David Baron for `actual docs <http://dbaron.org/mozilla/coding-practices>`_,
|
||
* Peter Annema for lots of direction
|
||
* Myk Melez for some more docs
|
||
* David Bradley for a diagram
|
||
* Revised by Darin Fisher for Mozilla 1.7
|
||
* Revised by Jungshik Shin to clarify character encoding issues
|
||
* Migrated to in-tree documentation by Nika Layzell
|