<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
|
|
<html>
|
|
<head>
|
|
<title>an incomplete guide to mozilla/string</title>
|
|
|
|
<link rel="stylesheet" href="http://www.mozilla.org/projects/string/string-guide.css" title="remote stylesheet" type="text/css">
|
|
<link rel="alternate stylesheet" href="string-guide.css" title="local stylesheet" type="text/css">
|
|
</head>
|
|
<body>
|
|
<!-- ----|---------|---------|---------|---------|---------|---------|---------| -->
|
|
<!-- ...............................................................Front Matter -->
|
|
<h1>an incomplete guide to <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/string/">mozilla/string</a></h1>
|
|
<h1><font color="red">This document is now deprecated in favor of <a href="http://www.mozilla.org/projects/xpcom/string-guide.html">The new string guide</a>.</font></h1>
|
|
<div class="author-note">
|
|
<p>by <a href="http://ScottCollins.net/">Scott Collins</a><!-- /p -->
|
|
<p>last modified 8 April 2001<!-- /p -->
|
|
</div>
|
|
|
|
<div class="abstract">
|
|
<p>
|
|
<h1>Abstract</h1>
|
|
This document <span class="LXRSHORTDESC">provides
|
|
an <a href="#users_guide">introduction</a> to the design and use of the string classes in mozilla,
|
|
<a href="#implementors_guide">detailed information</a> on their implementation and how one may extend them,
|
|
and <a href="#faq">answers</a> to frequently asked questions about strings</span>.
|
|
</p>
|
|
</div>
|
|
|
|
|
|
|
|
<h2><a name="contents">contents</a></h2>
|
|
|
|
<div class="contents">
|
|
<ul>
|
|
<li><a href="#users_guide" >user's guide</a></li>
|
|
<li><a href="#implementors_guide">implementor's guide</a></li>
|
|
<li><a href="#faq" >frequently asked questions</a></li>
|
|
</ul>
|
|
</div>
|
|
|
|
<p>
|
|
Please direct all comments, requests, and contributions to,
|
|
in order of preference,
|
|
the tracking bug <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=70076">#70076</a> for this document,
|
|
the author <a class="exact-uri" href="mailto:scc@mozilla.org?subject=string-guide">scc@mozilla.org</a>, and/or
|
|
the newsgroup <a class="exact-uri" href="news:netscape.public.mozilla.xpcom">news:netscape.public.mozilla.xpcom</a>
|
|
(should there be a strings newsgroup?)
|
|
</p>
|
|
|
|
<div class="author-note">
|
|
<p>
|
|
A note to potential editors:
|
|
don't even <strong>consider</strong> modifying this document with an HTML editor.
|
|
That would destroy the internal formatting,
|
|
and make patches unmanageable.
|
|
</p>
|
|
</div>
|
|
|
|
|
|
|
|
|
|
<!-- ...............................................................User's Guide -->
|
|
<hr>
|
|
<h1><a name="users_guide">user's guide</a></h1>
|
|
|
|
<div class="author-note">
|
|
<p>
|
|
Strings in mozilla are a world apart from <span class="code">char*</span>s.
|
|
If you don't know why they are different,
|
|
this section is the place for you to start.
|
|
If you're already familiar with the hierarchy of string classes in mozilla,
|
|
then you might want to skip ahead to the <a href="#implementors_guide">implementor's guide</a>
|
|
or the <a href="#faq">FAQ</a>.
|
|
</p>
|
|
</div>
|
|
|
|
<div class="contents">
|
|
<ul>
|
|
<li><a href="#users_guide_introduction">introduction</a></li>
|
|
<li><a href="#users_guide_how_to" >using the string classes correctly; using the correct string class</a></li>
|
|
<li><a href="#users_guide_iterators" >using string iterators</a></li>
|
|
<li><a href="#users_guide_summary" >summary</a></li>
|
|
</ul>
|
|
</div>
|
|
|
|
<h2><a name="users_guide_introduction">introduction</a></h2>
|
|
<h3>what is and what isn't a string?</h3>
|
|
<p>
|
|
A string is an opaque container holding a linear sequence of characters, possibly of zero length.
Understanding the implications of this statement is the foundation for understanding all of mozilla's string classes.
|
|
</p>
|
|
|
|
<h3>readable and writable</h3>
|
|
<h3>dependent strings</h3>
|
|
<h3>flat strings</h3>
|
|
<h3>encoding</h3>
|
|
<h3>sharing</h3>
|
|
|
|
<h2><a name="users_guide_how_to">using the string classes correctly; using the correct string class</a></h2>
|
|
<h3>basic string operations</h3>
|
|
<h4>comparison</h4>
|
|
<h4>concatenation</h4>
|
|
<h4>substrings</h4>
|
|
<h4>find and replace</h4>
|
|
<h3>conversions</h3>
|
|
<h4>calling a function that expects a different kind of string</h4>
|
|
<h4>converting between string classes</h4>
|
|
<h4>converting between encodings</h4>
|
|
<h3>selecting the right string class</h3>
|
|
<h4>user string classes</h4>
|
|
<h4>selecting the right string class for a parameter</h4>
|
|
<h4>selecting the right string class for a local variable</h4>
|
|
<h4>selecting the right string class for a member variable</h4>
|
|
<h4>selecting the right string class for a return value</h4>
|
|
<h4>selecting the right string class in IDL</h4>
|
|
<h3>don'ts</h3>
|
|
|
|
<h2><a name="users_guide_iterators">using string iterators</a></h2>
|
|
<h3>what is an iterator?</h3>
|
|
<h3>reading iterators and writing iterators</h3>
|
|
<h3>`chunky' iterating for efficiency</h3>
|
|
<h3><span class="code">copy_string</span>, character sources and sinks</h3>
|
|
<h3>encoding conversion iterators</h3>
|
|
|
|
<h2><a name="users_guide_summary">summary</a></h2>
|
|
|
|
|
|
<!-- ........................................................Implementor's Guide -->
|
|
<hr>
|
|
<h1><a name="implementors_guide">implementor's guide</a></h1>
|
|
|
|
<div class="author-note">
|
|
<p>
|
|
|
|
</p>
|
|
</div>
|
|
|
|
<div class="contents">
|
|
<ul>
|
|
<!-- li></li -->
|
|
</ul>
|
|
</div>
|
|
|
|
|
|
|
|
<!-- ........................................................................FAQ -->
|
|
<hr>
|
|
<h1><a name="faq">frequently asked questions</a></h1>
|
|
|
|
<div class="author-note">
|
|
</div>
|
|
|
|
<div class="contents">
|
|
<ul>
|
|
<!--
|
|
<li>
|
|
I have a wide string, i.e., an instance of a class derived from <span class="code">nsAString</span>
|
|
<ul>
|
|
<li>I want a pointer to the characters</li>
|
|
<li>I want a narrow string</li>
|
|
<li>I want to <span class="code">printf</span> it</li>
|
|
</ul>
|
|
</li>
|
|
<li>
|
|
I have a <span class="code">PRUnichar*</span>
|
|
<ul>
|
|
<li>I want a wide string</li>
<li>I want a narrow string</li>
|
|
<li>I want to <span class="code">printf</span> it</li>
|
|
</ul>
|
|
</li>
|
|
<li>
|
|
I have a narrow string, i.e., an instance of a class derived from <span class="code">nsACString</span>
|
|
<ul>
|
|
<li>I want a pointer to the characters</li>
|
|
<li>I want a wide string</li>
|
|
<li>I want to <span class="code">printf</span> it</li>
|
|
</ul>
|
|
</li>
|
|
<li>
|
|
I have a <span class="code">char*</span>
|
|
<ul>
|
|
<li>I want a wide string</li>
<li>I want a narrow string</li>
|
|
</ul>
|
|
</li>
|
|
<li>
|
|
I have a literal character sequence, e.g., <span class="code">"Hello, World!\n"</span>
|
|
<ul>
|
|
<li>I want a wide string</li>
<li>I want a narrow string</li>
|
|
</ul>
|
|
</li>
|
|
<li>What's the best way to return a string?</li>
|
|
<li>How can I get a pointer to the characters in a string?</li>
|
|
<li>How can I <span class="code">printf</span> a string?</li>
|
|
-->
</ul>
|
|
</div>
|
|
|
|
|
|
<table class="chart">
|
|
<tr>
|
|
<th></th>
|
|
<th colspan="5">you have some <span class="code">char</span>s</th>
|
|
</tr>
|
|
<tr>
|
|
<th>you want</th>
|
|
<th><span class="code">'x'</span></th>
|
|
<th><span class="code">char c</span></th>
|
|
<th><span class="code">"foo"</span></th>
|
|
<th><span class="code">char* cp</span></th>
|
|
<th><span class="code">nsACString& cs</span></th>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">char</span></th>
|
|
<td colspan="2">.</td>
|
|
<!-- "foo" --> <td><span class="code">[]</span></td>
|
|
<!-- char* cp --> <td><span class="code">[]</span></td>
|
|
<!-- nsACString& cs --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">PRUnichar</span></th>
|
|
<!-- 'x' --> <td><span class="code">PRUnichar('x')</span></td>
|
|
<!-- char c --> <td><span class="code">PRUnichar(c)</span></td>
|
|
<td colspan="3"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_extract_a_character">extract a character</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">char*</span></th>
|
|
<!-- 'x' --> <td><span class="code">&</span></td>
|
|
<!-- char c --> <td><span class="code">&</span></td>
|
|
<!-- "foo" --> <td><span class="code">&</span></td>
|
|
<!-- char* cp --> <td>.</td>
|
|
<!-- nsACString& cs --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">PRUnichar*</span></th>
|
|
<td colspan="5"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">nsACString</span></th>
|
|
<!-- 'x' --> <td><span class="code">NS_LITERAL_CSTRING("x")</span></td>
|
|
<!-- char c --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
|
|
<!-- "foo" --> <td><span class="code">NS_LITERAL_CSTRING("foo")</span></td>
|
|
<!-- char* cp --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
|
|
<!-- nsACString& cs --> <td>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">nsAString</span></th>
|
|
<!-- 'x' --> <td><span class="code">NS_LITERAL_STRING("x")</span></td>
|
|
<!-- char c --> <td><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
|
|
<!-- "foo" --> <td><span class="code">NS_LITERAL_STRING("foo")</span></td>
|
|
<td colspan="2"><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label">to call <span class="code">printf</span></th>
|
|
<td colspan="4">.</td>
|
|
<!-- nsACString& cs --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<table class="chart">
|
|
<tr>
|
|
<th></th>
|
|
<th colspan="3">you have some <span class="code">PRUnichar</span>s</th>
|
|
</tr>
|
|
<tr>
|
|
<th>you want</th>
|
|
<th><span class="code">PRUnichar w</span></th>
|
|
<th><span class="code">PRUnichar* wp</span></th>
|
|
<th><span class="code">nsAString& s</span></th>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">char</span></th>
|
|
<!-- PRUnichar w --> <td></td>
|
|
<!-- PRUnichar* wp --> <td></td>
|
|
<!-- nsAString& s --> <td></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">PRUnichar</span></th>
|
|
<!-- PRUnichar w --> <td></td>
|
|
<!-- PRUnichar* wp --> <td><span class="code">[]</span></td>
|
|
<!-- nsAString& s --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">char*</span></th>
|
|
<!-- PRUnichar w --> <td></td>
|
|
<!-- PRUnichar* wp --> <td></td>
|
|
<!-- nsAString& s --> <td></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">PRUnichar*</span></th>
|
|
<!-- PRUnichar w --> <td><span class="code">&</span></td>
|
|
<!-- PRUnichar* wp --> <td></td>
|
|
<!-- nsAString& s --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">nsACString</span></th>
|
|
<!-- PRUnichar w --> <td></td>
|
|
<!-- PRUnichar* wp --> <td></td>
|
|
<!-- nsAString& s --> <td></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label"><span class="code">nsAString</span></th>
|
|
<!-- PRUnichar w --> <td></td>
|
|
<!-- PRUnichar* wp --> <td></td>
|
|
<!-- nsAString& s --> <td></td>
|
|
</tr>
|
|
<tr>
|
|
<th class="row-label">to call <span class="code">printf</span></th>
|
|
<!-- PRUnichar w --> <td></td>
|
|
<!-- PRUnichar* wp --> <td></td>
|
|
<!-- nsAString& s --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<div class="faq">
|
|
<dl>
|
|
<dt>
|
|
is there any string doc?
|
|
</dt>
|
|
<dd>
|
|
Yes, you're soaking in it!
|
|
</dd>
|
|
|
|
|
|
|
|
<!-- getting a pointer -->
|
|
<dt>
|
|
<a name="faq_how_to_get_a_pointer">I have a string, how do I get a pointer to the characters?</a>
|
|
</dt>
|
|
<dd>
|
|
You want to avoid this situation.
|
|
In your own interfaces, prefer string types over raw pointers.
|
|
Any interface that wants to process a string using a single pointer is making two expensive assumptions.
|
|
First, that the string is stored in one contiguous hunk; and
|
|
second, that the string is zero-terminated.
|
|
If this isn't the case,
|
|
then to get a pointer, storage must be allocated and the entire string must be copied to it and zero-terminated.
|
|
You may not be able to avoid needing a pointer when interacting with system calls.
|
|
</dd>
|
|
<dd>
|
|
Some string classes guarantee that they are `flat'.
|
|
That is, that their data is stored in one contiguous zero-terminated hunk.
|
|
This <strong>does not</strong> imply that there are no embedded nulls. Caveat emptor.
|
|
All strings that explicitly promise flatness
|
|
inherit from the class <span class="code">nsAFlatString</span>
|
|
or <span class="code">nsAFlatCString</span>
|
|
and can produce a constant pointer to their data with the <span class="code">get()</span> member function.
|
|
Even strings that don't explicitly promise to be flat
|
|
may happen to be flat.
|
|
The helper function <span class="code">PromiseFlatString</span> will produce
|
|
a <span class="code">const</span> dependent string that is guaranteed to be flat.
|
|
If you use this on a string that already happens to be flat,
|
|
the result is simply a reference through to that string.
|
|
Otherwise,
|
|
<span class="code">PromiseFlatString</span> does the work to allocate, copy, terminate, and manage
|
|
a temporary flat string.
|
|
Since the result of <span class="code">PromiseFlatString</span> is a temporary,
|
|
you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives.
|
|
</dd>
|
|
<dd>
|
|
<div class="source-code">
|
|
<pre>
/* I have a string, how do I get a pointer to the characters? */

extern void EvilNarrowOSFunction( const char* ); // evil OS routines that want a pointer
extern void EvilWideOSFunction( const PRUnichar* );

void func( const nsAString& aString, const nsACString& aCString )
{
  EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").<span class="notice">get()</span> );
  // literal strings are flat already (as are |nsString|s, et al), just use |.get()|

  EvilWideOSFunction( <span class="notice">PromiseFlatString(</span>aString<span class="notice">).get()</span> );
  // for strings that don't explicitly guarantee flatness, use |PromiseFlatString|


  // beware holding the pointer for longer than the life of the promise
  <span class="warning">const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles
  EvilWideOSFunction(wp);</span>

  // if you really need to use the pointer from |PromiseFlatString| in more than one expression...
  const nsAFlatString& flat = <span class="notice">PromiseFlatString(</span>aString<span class="notice">)</span>;
  EvilWideOSFunction(flat.<span class="notice">get()</span>);
  SomeOtherFunction(flat.<span class="notice">get()</span>);

  // similarly for |char| strings
  EvilNarrowOSFunction( <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span> );
}
</pre>
|
|
</div>
|
|
</dd>
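<dd>
<p>
The lifetime rule above is ordinary C++ and not peculiar to mozilla strings.
As a minimal sketch, assuming <span class="code">std::string</span> as a stand-in for a flat string,
with hypothetical helpers <span class="code">MakeFlatCopy</span> and <span class="code">LengthSeenByOS</span> that are not part of mozilla/string,
the two safe patterns might look like this:
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a helper that returns a flat, zero-terminated
// copy by value (i.e., a temporary), loosely analogous to |PromiseFlatString|.
std::string MakeFlatCopy( const char* aFirstChunk, const char* aSecondChunk )
{
  assert(aFirstChunk && aSecondChunk);
  return std::string(aFirstChunk) + aSecondChunk;
}

// Hypothetical "OS routine" that only understands a zero-terminated pointer.
std::size_t LengthSeenByOS( const char* aPointer )
{
  return std::strlen(aPointer);
}

// GOOD: the temporary lives until the end of the full expression,
// so the pointer stays valid for the duration of the call.
std::size_t UseWithinOneExpression()
{
  return LengthSeenByOS( MakeFlatCopy("Hello, ", "World!").c_str() );
}

// GOOD: binding the temporary to a |const| reference extends its lifetime
// to the end of the enclosing scope, so the pointer can be used repeatedly.
std::size_t UseAcrossSeveralExpressions()
{
  const std::string& flat = MakeFlatCopy("Hello, ", "World!");
  std::size_t first  = LengthSeenByOS(flat.c_str());
  std::size_t second = LengthSeenByOS(flat.c_str());
  return first + second;
}
```

<p>
What the sketch deliberately leaves out is the bad case:
storing the result of <span class="code">.c_str()</span> from an unnamed temporary and using it in a later statement.
</p>
</dd>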
|
|
|
|
|
|
|
|
<!-- extracting a character -->
|
|
<dt>
|
|
<a name="faq_how_to_extract_a_character">How do I get a particular character out of a string?</a>
|
|
</dt>
|
|
<dd>
|
|
Flat strings provide <span class="code">operator[]</span> and <span class="code">CharAt()</span>.
|
|
All strings provide <span class="code">First()</span>, <span class="code">Last()</span>, and access with iterators.
|
|
<strong>Don't</strong> promise a string flat just to do character indexing.
|
|
Prefer, instead, to get an iterator and <span class="code">advance</span> it to the position you care about.
|
|
</dd>
|
|
<dd>
|
|
<div class="source-code">
|
|
<pre>
/* How do I get a particular character out of a string? */

PRUnichar Get5thCharacterOf( const nsAString& aString )
{
  if ( aString.Length() >= 5 )
  {
    nsAString::const_iterator iter;
    aString.BeginReading(iter); // make |iter| point to the beginning of |aString|
    iter.advance(4);            // the 5th character is 4 positions past the 1st
    return *iter;
  }

  return PRUnichar(0);
}
</pre>
|
|
</div>
|
|
</dd>
|
|
<dd>
|
|
Using iterators isn't as bad as the example above makes it feel.
|
|
The typical use is for advancing through a string, examining many characters.
|
|
</dd>
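<dd>
<p>
To give a feel for that typical use, here is a minimal sketch that walks a string once and examines every character.
It assumes <span class="code">std::u16string</span> as a stand-in for a wide mozilla string;
<span class="code">CountOccurrences</span> is a hypothetical name, and mozilla's own iterators are obtained differently
(e.g., with <span class="code">BeginReading</span>, as shown above).
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many times |aChar| occurs in |aString| by walking it once
// with a pair of iterators.  |std::u16string| is only a stand-in here
// for a wide mozilla string.
std::size_t CountOccurrences( const std::u16string& aString, char16_t aChar )
{
  std::size_t count = 0;

  std::u16string::const_iterator iter = aString.begin();
  std::u16string::const_iterator end  = aString.end();
  for ( ; iter != end; ++iter )
  {
    if ( *iter == aChar )
      ++count;
  }

  assert(count <= aString.size()); // can't find more matches than characters
  return count;
}
```
</dd>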
|
|
|
|
|
|
|
|
<!-- how to convert encoding -->
|
|
<dt>
|
|
<a name="faq_how_to_convert_encoding">How do I convert from one encoding to another?</a>
|
|
</dt>
|
|
<dd>
|
|
</dd>
|
|
|
|
|
|
|
|
<!-- how to make a string -->
|
|
<dt>
|
|
<a name="faq_how_to_make_a_string">How do I create a string?</a>
|
|
</dt>
|
|
<dd>
|
|
</dd>
|
|
|
|
|
|
<!-- how to return a string -->
|
|
<dt>
|
|
What is the best way to return a string?
|
|
</dt>
|
|
<dd>
|
|
<p>
|
|
There are several reasonable ways to produce a string result from a function.
|
|
If you are already holding the answer as a sharable string,
|
|
you can simply return that string (pass-by-value).
|
|
Otherwise,
|
|
the most efficient and flexible way to return a string is
|
|
to assign your result into a non-<span class="code">const</span> reference parameter.
|
|
Don't bother to create a sharable string from scratch with your generated result.
|
|
</p>
|
|
<p>
|
|
Why?
|
|
The two things you want to minimize in string manipulation are,
|
|
in order of importance,
|
|
heap allocation, and
|
|
moving characters around.
|
|
</p>
|
|
</dd>
|
|
<dd>
|
|
<div class="source-code">
|
|
<pre>
/* What is the best way to return a string? */

class foo
{
  public:
    // ...
    void GetShortName( nsAString& aResult ) const;
    nsCommonString GetFullName() const;

  private:
    nsCommonString mFullName;

    const PRUnichar* mShortName;
    PRUint32 mShortNameLength;

};

nsCommonString
foo::GetFullName() const
{
  return mFullName;
}

void
foo::GetShortName( nsAString& aResult ) const
{
  aResult = DependentString(mShortName, mShortNameLength);
}
</pre>
|
|
</div>
|
|
</dd>
|
|
|
|
|
|
<dt>
|
|
<a name="faq_how_to_call_printf">How do I <span class="code">printf</span> a string, e.g., for debugging?</a>
|
|
</dt>
|
|
<dd>
|
|
If your string is already narrow, you just have to worry about <a href="#faq_how_to_get_a_pointer">making it flat, and then getting a pointer</a>.
|
|
</dd>
|
|
<dd>
|
|
If your string happens to be wide,
|
|
you'll need to convert it before you can <span class="code">printf</span> something reasonable.
|
|
If it's just for debugging,
|
|
you probably wouldn't care if something odd was printed in the case of a UCS2 character that didn't have
|
|
an ASCII equivalent.
|
|
The simplest thing in this case is to make a temporary conversion using <span class="code">NS_ConvertUCS2toUTF8</span>.
|
|
The result is conveniently flat already, so getting the pointer is simple.
|
|
Remember not to hold onto the pointer you get out of this beyond the lifetime of the temporary.
|
|
</dd>
|
|
<dd>
|
|
<div class="source-code">
|
|
<pre>
/* How do I |printf| a string? */


void PrintSomeStrings( const nsAString& aString, const PRUnichar* aKey, const nsACString& aCString )
{
  // |printf|ing a narrow string is easy
  printf("%s\n", <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span>); // GOOD

  // the simplest way to get a |printf|-able |const char*| out of a string
  printf("%s\n", <span class="notice">NS_ConvertUCS2toUTF8(</span>aKey<span class="notice">).get()</span>); // GOOD

  // works just as well with a formal wide string type...
  printf("%s\n", <span class="notice">NS_ConvertUCS2toUTF8(</span>aString<span class="notice">).get()</span>);


  // But don't hold onto the pointer longer than the lifetime of the temporary!
  <span class="warning">const char* cstring = NS_ConvertUCS2toUTF8(aKey).get(); // BAD! |cstring| is dangling
  printf("%s\n", cstring);</span>
}
</pre>
|
|
</div>
|
|
</dd>
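<dd>
<p>
If you are curious what such a conversion has to do, the following is a minimal sketch, not mozilla's implementation.
It assumes <span class="code">std::u16string</span> and <span class="code">std::string</span> as stand-ins for wide and narrow strings,
treats every 16-bit unit as one UCS2 character (surrogate pairs are not combined),
and uses the hypothetical names <span class="code">ConvertUCS2ToUTF8Sketch</span> and <span class="code">PrintWide</span>.
</p>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a UCS2-to-UTF8 conversion.  Each 16-bit unit
// becomes one, two, or three bytes depending on its value.
std::string ConvertUCS2ToUTF8Sketch( const std::u16string& aWide )
{
  std::string narrow;
  narrow.reserve(aWide.size());

  for ( char16_t unit : aWide )
  {
    const unsigned int c = unit;
    if ( c < 0x80 )
    {
      narrow.push_back(static_cast<char>(c));
    }
    else if ( c < 0x800 )
    {
      narrow.push_back(static_cast<char>(0xC0 | (c >> 6)));
      narrow.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
    else
    {
      narrow.push_back(static_cast<char>(0xE0 | (c >> 12)));
      narrow.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
      narrow.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }

  assert(narrow.size() >= aWide.size()); // every unit yields at least one byte
  return narrow;
}

void PrintWide( const std::u16string& aWide )
{
  // the temporary |std::string| lives until the end of this full expression
  std::printf("%s\n", ConvertUCS2ToUTF8Sketch(aWide).c_str());
}
```
</dd>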
|
|
|
|
</dl>
|
|
|
|
<p>
|
|
Here are the email answers I have yet to format into the FAQ.
|
|
Some of the URLs may be out-dated or moved.
|
|
The messages are in order from oldest to newest.
|
|
</p>
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 13 Apr 2000 19:41:47 -0400
|
|
</pre>
|
|
|
|
<p>Encoding Wars
|
|
|
|
<p>This message is all about strings and the various encodings that might
|
|
be used to interpret their contents, the ramifications of that, and
|
|
where we're heading. The point of this message is to say what we're
|
|
currently thinking, and get feedback. I apologize in advance for the
|
|
rambling, and for the fact that this message may accidentally mix
|
|
discussion of how things <strong>are</strong> and how they will be.
|
|
|
|
<p>There are many different possible encodings. Three in common use in
|
|
the Mozilla source base are: ASCII, UCS2, and UTF8. In ASCII, every
|
|
character fits in 7-bits and is typically stored in an 8-bit byte. We
|
|
usually represent ASCII strings with <span class="code">nsCString</span>s, <span class="code">nsXPIDLCString</span>s,
|
|
or <span class="code">char</span> string literals. In UCS2, characters occupy 16 bits each.
|
|
We usually represent UCS2 strings as <span class="code">nsString</span>s, etc., i.e., two-byte
|
|
or `wide' strings. UTF8 is a multi-byte encoding. A character might
|
|
occupy one, two, or three bytes (for any character UCS2 can represent). It is easiest to store and
|
|
manipulate such a string within a single-byte or `narrow' string
|
|
implementation.
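<p>A minimal sketch of the size relationship between the two encodings, assuming
<span class="code">std::u16string</span> as a stand-in for a UCS2 string. The names <span class="code">UTF8BytesFor</span> and
<span class="code">UTF8LengthOf</span> are hypothetical and not part of any mozilla interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes UTF8 needs for one UCS2 character.
std::size_t UTF8BytesFor( char16_t aChar )
{
  if ( aChar < 0x80 )
    return 1; // the ASCII range is stored unchanged
  if ( aChar < 0x800 )
    return 2;
  return 3;   // everything else UCS2 can represent
}

// Number of bytes needed to hold |aWide| re-encoded as UTF8.
std::size_t UTF8LengthOf( const std::u16string& aWide )
{
  std::size_t total = 0;
  for ( char16_t c : aWide )
    total += UTF8BytesFor(c);

  // between one and three bytes per character
  assert(total >= aWide.size() && total <= 3 * aWide.size());
  return total;
}
```

<p>For pure ASCII text the UTF8 form is half the size of the UCS2 form; in the worst case it is half again as large.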
|
|
|
|
<p>None of our current string implementations know the encoding of the
|
|
data they hold at any given moment. An <span class="code">nsCString</span> might legitimately
|
|
hold data encoded in ASCII, UTF8, or even EBCDIC for that matter.
|
|
|
|
<p>Operations that convert from one encoding to another, or operations
|
|
that are encoding sensitive (e.g., <span class="code">to_upper</span>), rightly belong in
|
|
i18n. The fact that our current string interfaces automatically and
|
|
implicitly convert between wide and narrow strings is actually the
|
|
source of many errors in two particular categories: (1) unintended
|
|
extra work, (2) mistaken re-encoding, e.g., accidentally `converting'
|
|
a UTF8 string to UCS2 by pretending the UTF8 string is ASCII and then
|
|
padding with <span class="code">'\0'</span>s.
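<p>A minimal sketch of the second failure mode, assuming <span class="code">std::string</span> and
<span class="code">std::u16string</span> as stand-ins for narrow and wide strings. The helper names
(<span class="code">InflateBytes</span>, <span class="code">IsASCII</span>, <span class="code">InflateASCII</span>) are hypothetical.

```cpp
#include <cassert>
#include <string>

// Copies each byte into its own 16-bit unit, zero-filling the high byte.
// This is what an implicit narrow-to-wide `conversion' amounts to.
std::u16string InflateBytes( const std::string& aNarrow )
{
  std::u16string wide;
  wide.reserve(aNarrow.size());
  for ( char byte : aNarrow )
    wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(byte)));
  return wide;
}

// True if every byte is in the 7-bit ASCII range.
bool IsASCII( const std::string& aNarrow )
{
  for ( char byte : aNarrow )
    if ( static_cast<unsigned char>(byte) >= 0x80 )
      return false;
  return true;
}

// Inflation is only a correct conversion when the input really is ASCII.
std::u16string InflateASCII( const std::string& aNarrow )
{
  assert(IsASCII(aNarrow));
  return InflateBytes(aNarrow);
}
```

<p>Feed <span class="code">InflateBytes</span> the two UTF8 bytes <span class="code">"\xC3\xA9"</span> (the single character U+00E9)
and you get two wide characters, U+00C3 and U+00A9, rather than the one you meant.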
|
|
|
|
<p>We've known these were bad for a long time, and have been trying to
|
|
find the right way to fix them. The current thinking is to just bite
|
|
the bullet and eliminate implicit conversions. That has interesting
|
|
ramifications.
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void foo( const nsString& aUCS2string );
|
|
|
|
foo("hello"); // works! constructs a temporary |nsString| by
|
|
// converting the ASCII literal with padding.
|
|
// Note: this requires an allocation
|
|
</pre>
|
|
</div>
|
|
|
|
<p>We've always hated this form, though, since it requires a heap
allocation. In current code, we recommend
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
foo( nsAutoString("hello") );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>which still copy/converts, but at least it probably doesn't need to do
|
|
a heap allocation. In the best of all worlds, no conversion, copying,
|
|
or allocation would be necessary. To do that, you would need to be
|
|
able to directly specify a UCS2 string, e.g., with the <span class="code">L"hello"</span>
|
|
notation, and wrap that in an interface that just held a pointer.
|
|
E.g., something like
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void foo( const nsAReadableString& aUCS2string );
|
|
|
|
foo( nsLiteralString(L"hello") );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>There are problems with this example, however. The <span class="code">L</span> notation
|
|
specifically makes objects that are arrays of <span class="code">wchar_t</span>, which under
|
|
GCC is a 4-byte element. This leads to incompatibility with JS, and
|
|
the annoyance of possibly bloated storage (I'm sort of minimizing the
|
|
situation here. It's worse than I make it sound). More about tricks
|
|
to get around this in a bit, but first, let me talk about what to do
|
|
in the meantime while we're just getting rid of implicit constructors.
|
|
Initially to get around this problem (what problem? The problem that
|
|
<span class="code">foo("hello")</span> stopped compiling on my machine when I threw the
|
|
switch) I made a routine called <span class="code">NS_ConvertToString</span> which looked like
|
|
this
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
inline
|
|
nsAutoString
|
|
NS_ConvertToString( const char* anASCIIstring )
|
|
{
|
|
nsAutoString aUCS2string;
|
|
aUCS2string.AssignWithConversion(anASCIIstring);
|
|
return aUCS2string;
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Which lets me write
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
foo( NS_ConvertToString("hello") );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This was <strong>OK</strong>, but in discussion there were concerns about performance
|
|
on machines that didn't <span class="code">inline</span> well, and issues about naming. In
|
|
that meeting we came up with an alternate naming strategy that we
|
|
think has room for growth and an implementation more likely to be
|
|
efficient on every platform. The implementation is to define a new
|
|
class that derives from <span class="code">nsAutoString</span>, but allows construction from a
|
|
<span class="code">char*</span>
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
class NS_ConvertASCIItoUCS2 : public nsAutoString
|
|
{
|
|
public:
|
|
NS_ConvertASCIItoUCS2( const char* );
|
|
// ...
|
|
};
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Which gives identical (though renamed) notation for calling <span class="code">foo</span>:
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
foo( NS_ConvertASCIItoUCS2("hello") );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>It looks like a function call to an explicit encoding conversion. It
|
|
acts like a function call to an explicit encoding conversion. It <strong>is</strong>
|
|
a function call to an explicit encoding conversion. We think that
|
|
this naming pattern has room for growth. In the meeting, we concluded
|
|
that the best representation for encoding conversions is a family of
|
|
functions, and <span class="code">NS_ConvertASCIItoUCS2</span> fits right in. We think that
|
|
XPCOM probably can't live without the ASCII to UCS2 conversion (though
|
|
as explicit as possible) but that all others rightly belong in i18n
|
|
land.
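<p>A minimal sketch of the same pattern in standard C++, assuming <span class="code">std::u16string</span> as a stand-in for
<span class="code">nsAutoString</span>. The names <span class="code">ConvertASCIItoUCS2Sketch</span> and <span class="code">CountUnits</span>
are hypothetical; this is not the real <span class="code">NS_ConvertASCIItoUCS2</span>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical analogue of |NS_ConvertASCIItoUCS2|: a string class whose
// only job is to be constructed, explicitly, from ASCII.  It is meant to be
// used as a short-lived temporary, so deriving from |std::u16string|
// (which has no virtual destructor) is tolerable in this sketch.
class ConvertASCIItoUCS2Sketch : public std::u16string
{
  public:
    explicit ConvertASCIItoUCS2Sketch( const char* aASCII )
    {
      assert(aASCII);
      for ( const char* p = aASCII; *p; ++p )
      {
        assert(static_cast<unsigned char>(*p) < 0x80); // ASCII only
        push_back(static_cast<char16_t>(*p));
      }
    }
};

// A function that only accepts wide strings, like |foo| above.
std::size_t CountUnits( const std::u16string& aWide )
{
  return aWide.size();
}
```

<p>With this in place, <span class="code">CountUnits( ConvertASCIItoUCS2Sketch("hello") )</span> compiles and the conversion is visible at
the call site, while <span class="code">CountUnits("hello")</span> does not compile at all.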
|
|
|
|
<p>You can probably deduce from the clues in <span class="code">NS_ConvertToString</span>, above,
|
|
that constructors weren't the only thing that became explicit.
|
|
Assignment, appending, comparison, et al, got renamed so that when
|
|
assigning, appending, or comparing to a value in a different encoding
|
|
the `WithConversion' form must be used. E.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsString aUCS2string;
|
|
nsCString anASCIIstring;
|
|
// ...
|
|
|
|
aUCS2string += anASCIIstring; // Currently legal, but not for long
|
|
aUCS2string.Append(anASCIIstring); // same
|
|
|
|
aUCS2string.AppendWithConversion(anASCIIstring); // the new way
|
|
|
|
if ( aUCS2string == anASCIIstring ) // Sorry, this is going away too
|
|
// ...
|
|
|
|
if ( aUCS2string.EqualsWithConversion(anASCIIstring) )
|
|
// ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Yes, it's long and annoying. Just like the extra work you were
|
|
implicitly asking to have done, perhaps incorrectly. There are other
|
|
reasons to rename these functions. When <span class="code">nsString</span> and <span class="code">nsCString</span>
each defined a ton of, e.g., <span class="code">Append</span>s, there was no problem, because
|
|
nobody wanted to override <span class="code">Append</span>. Now, with strings inheriting from
|
|
abstract base classes we immediately run into the problem that
|
|
overriding and overloading don't mix very well in C++. Because of a
|
|
feature of C++ called name hiding, it is problematic to override only
|
|
a single signature of a name overloaded in a base class. The base
|
|
<span class="code">nsAWritableString</span> provides several <span class="code">Append</span>s, all for objects of
|
|
(hopefully) the same encoding. <span class="code">nsString</span> can't easily add a bunch of
|
|
new <span class="code">Append</span>s (the converting ones) without running face first into
|
|
the name hiding problem. The discussion of the fix for this is mostly
|
|
unrelated to encoding issues, so I'll defer it to another post.
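<p>A minimal sketch of name hiding itself, using hypothetical classes (<span class="code">WritableBase</span>,
<span class="code">HidingString</span>, <span class="code">UsingString</span>) rather than the real string hierarchy. The
using-declaration shown is one standard C++ remedy; it is an assumption for illustration and not necessarily the fix chosen for mozilla's strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class with two overloads of |Append|.
class WritableBase
{
  public:
    virtual ~WritableBase() { }
    virtual void Append( char16_t ) { mLog += "base-wide;"; }
    virtual void Append( char )     { mLog += "base-narrow;"; }

    std::string mLog; // records which overload actually ran
};

// Overrides a single signature.  Through this class the name |Append| now
// refers only to this one function; the |char| overload is hidden.
class HidingString : public WritableBase
{
  public:
    void Append( char16_t ) override { mLog += "derived-wide;"; }
};

// Same override, but a using-declaration re-exposes the base overloads.
class UsingString : public WritableBase
{
  public:
    using WritableBase::Append;
    void Append( char16_t ) override { mLog += "derived-wide;"; }
};

void DemonstrateNameHiding()
{
  HidingString hiding;
  hiding.Append('a');   // silently widened: only |Append(char16_t)| is visible
  assert(hiding.mLog == "derived-wide;");

  UsingString exposing;
  exposing.Append('a'); // exact match on the re-exposed |Append(char)|
  assert(exposing.mLog == "base-narrow;");
}
```

<p>Note that the hidden overload doesn't produce an error here. The narrow character is quietly converted and routed to the wide
<span class="code">Append</span>, which is exactly the kind of implicit conversion we are trying to stamp out.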
|
|
|
|
<p>In hindsight, after the meeting, it seemed clear that all the
|
|
`WithConversion' forms would be better named
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
xxxConvertingASCIItoUCS2
|
|
xxxConvertingUCS2toASCII
|
|
</pre>
|
|
</div>
|
|
|
|
<p>however, the <strong>real</strong> goal (probably) is to move most such conversions
|
|
into i18n. Just bringing attention to the previously implicit
|
|
conversions is a good first step. Renaming these conversions as just
|
|
suggested is probably the right thing to do, though it sort of
|
|
validates them, which I'm not sure we really want. This is a decision
|
|
we need to discuss further.
|
|
|
|
<p>Now, back to the string literal problem above. One possible solution
|
|
is to use a macro. Imagine
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_LITERAL_STRING("Hello")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>which on a machine where the <span class="code">L</span> trick works, turns into
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsLiteralString(L"Hello")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>but on a machine where there is trouble, turns into something less
|
|
appealing, but more likely to work, like
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_ConvertASCIItoUCS2("Hello")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Another solution is to add a compilation step that fixes <span class="code">L</span> strings
|
|
on bad platforms to be non-<span class="code">L</span> strings, but padded with <span class="code">\0</span>s. E.g.,
|
|
<span class="code">L"Hello"</span> gets preprocessed into <span class="code">"\000H\000e\000l\000l\000o\000"</span>.
|
|
This solution is more annoying to the developer, where the prior
|
|
solution is more annoying during the runtime.
|
|
|
|
<p>Before we go to too much trouble on this specific feature, we will
|
|
probably want to do more measurement to see just how much and how
|
|
often we are converting constant literal strings, and why.
|
|
|
|
|
|
<p>I'm currently ripping through the tree fixing things to use the
|
|
`WithConversion' forms where appropriate. I was also converting
|
|
things to use <span class="code">NS_ConvertToString</span> where appropriate; unless I get
|
|
talked out of it, I want to switch midstream to
|
|
<span class="code">NS_ConvertASCIItoUCS2</span>, then go back and fix up the
|
|
<span class="code">NS_ConvertToString</span> instances later. I've set things up so I can
|
|
check in as I go. After all these conversions have been done, I'll be
|
|
able to throw the switch (what switch? NEW_STRING_APIS) which will
|
|
make <span class="code">nsString</span> inherit from <span class="code">nsAWritableString</span>, etc. and allow us to
|
|
start exploiting these other opportunities (e.g., for literal strings,
|
|
shared strings, etc. See
|
|
<a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=28221">http://bugzilla.mozilla.org/show_bug.cgi?id=28221</a> for details and
|
|
reasoning.)
|
|
|
|
<p>I guess I'm expecting comments on:
|
|
|
|
<ul>
|
|
<li>how really annoying this whole topic is
|
|
<li>how bad <span class="code">L"xxx"</span> is
|
|
<li>whether to move forward with <span class="code">NS_ConvertASCIItoUCS2</span>
|
|
<li>whether we should move to xxxConvertingASCIItoUCS2 etc. instead
of `WithConversion'
|
|
<li>arguments about where encoding conversions should live
|
|
<li>arguments about whether going between 1 and 2 byte storage is an
|
|
encoding conversion
|
|
<li>questions about stuff I didn't mention or didn't explain well
|
|
<li>pointing out stuff I'm just plain wrong about, or things I forgot
|
|
<li>etc
|
|
</ul>
|
|
|
|
<p>So as not to jumble the discussion, I'll be separately posting other
|
|
requests for comments about specific features of the design of the new
|
|
string hierarchy.
|
|
|
|
<p>I hope this helps keep everybody filled in on what we're thinking and
|
|
able to point out what we're forgetting or screwing up :-)
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Wed, 19 Apr 2000 21:12:47 -0400
|
|
Subject: more string info
|
|
</pre>
|
|
|
|
<p> <a class="exact-uri" href="news://news.mozilla.org/scc-705460.16423913042000@news.mozilla.org">news://news.mozilla.org/scc-705460.16423913042000@news.mozilla.org</a>
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Fri, 26 May 2000 15:31:37 -0400
|
|
Subject: Re: Question on ==
|
|
</pre>
|
|
|
|
<p>I would prefer you compare with <span class="code">Equals</span> (which should really be named
|
|
<span class="code">IsEqualTo</span>) rather than <span class="code">operator==()</span> because of this:
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
char* a;
|
|
char* b;
|
|
|
|
// ...
|
|
|
|
if ( a == b )
|
|
// ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Comparing two raw `string' pointers doesn't compare the characters
|
|
they point to, but instead compares the bits of the pointers. For
|
|
this reason, I may eventually make comparison of a string with a
|
|
pointer using operators just go away.
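<p>A minimal, self-contained illustration of the pitfall, using plain C strings rather than the Mozilla classes (the helper names are made up for this sketch):

```cpp
#include <cassert>
#include <cstring>

// Two distinct buffers that hold the same characters.
static const char first[]  = "hello";
static const char second[] = "hello";

// operator== on raw pointers compares addresses only.
inline bool same_address(const char* a, const char* b)
{
    return a == b;
}

// strcmp walks the characters, which is what a caller usually means.
inline bool same_characters(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}
```

<p>Here <span class="code">same_address(first, second)</span> is false even though <span class="code">same_characters(first, second)</span> is true, which is exactly the surprise a named <span class="code">Equals</span> avoids.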
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Wed, 14 Jun 2000 14:38:55 -0400
|
|
Subject: Re: Fix to XprtDefs.h
|
|
</pre>
|
|
|
|
<p>Yes, we're aware that turning off <span class="code">wchar_t</span> support makes <span class="code">wchar_t</span> be
|
|
a synonym for <span class="code">unsigned short</span> under Metrowerks. We know that the
|
|
current version of VC++ also makes these types equivalent. In theory,
|
|
though, the types are distinct even when they are the same size and
|
|
shape. By using real <span class="code">wchar_t</span> support, we are forced to recognize
|
|
the distinction and navigate it appropriately with <span class="code">reinterpret_cast</span>
|
|
(via <span class="code">NS_REINTERPRET_CAST</span>). The win here is that we aren't caught by
|
|
compiler changes that suddenly make some set of compilers compliant
|
|
and therefore break our code. We will add an autoconf test that lets
|
|
UNIX compilers opt in to our string scheme when they have an
|
|
appropriately shaped <span class="code">wchar_t</span>. If these happen to be compliant
|
|
compilers, all will be well. If they aren't, the casts don't hurt,
|
|
because they are type correct. We are writing our code to meet the
|
|
standard as we move forward.
|
|
|
|
<p>The win for us is realized by the following macros:
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
#ifdef HAVE_CPP_2BYTE_WCHAR_T
|
|
#define NS_LITERAL_STRING(s) nsLiteralString(L##s, \
|
|
(sizeof(L##s)/sizeof(wchar_t))-1)
|
|
#else
|
|
#define NS_LITERAL_STRING(s) NS_ConvertASCIItoUCS2(s, \
|
|
sizeof(s)-1)
|
|
#endif
|
|
</pre>
|
|
</div>
|
|
|
|
<p>An <span class="code">nsLiteralString</span> points directly to the literal characters. No
|
|
copying, no conversion, and the length calculation happens at compile
|
|
time. This has turned out to be as large a savings as 15% of code
|
|
space and 8% of data space, net, in our string test harness. It's
|
|
faster as well, again by eliminating the copying, conversion, and
|
|
length calculation. We don't know yet what those numbers translate
|
|
into in our real code base, but we have high hopes.
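<p>The compile-time length trick can be tried in isolation. The sketch below uses a hypothetical <span class="code">LiteralView</span> struct and <span class="code">LITERAL_VIEW</span> macro as stand-ins for <span class="code">nsLiteralString</span> and <span class="code">NS_LITERAL_STRING</span>; it is not the real class, only the same shape:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for nsLiteralString: a non-owning pointer and a length.
struct LiteralView
{
    const wchar_t* data;    // points straight at the literal's storage
    std::size_t    length;  // character count, excluding the terminator
};

inline LiteralView make_literal_view(const wchar_t* aData, std::size_t aLength)
{
    LiteralView v = { aData, aLength };
    return v;
}

// Paste L onto the quoted literal and let sizeof compute the length.
// Only meaningful around a quoted literal, never around a pointer variable.
#define LITERAL_VIEW(s) \
    make_literal_view(L##s, (sizeof(L##s) / sizeof(wchar_t)) - 1)
```

<p>Nothing is copied or converted: <span class="code">LITERAL_VIEW("Hello")</span> yields a pointer to the literal itself and a length of 5 that the compiler computed.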
|
|
|
|
<p>I don't want to be in the position to ask you to change your code. I
|
|
don't think it's appropriate for me to do so. The AIM application
|
|
that is your client is our client as well. They need to resolve this
|
|
difference between us in whatever way they think best. That may mean
|
|
asking you if changing your apis is the right thing to do. Or it may
|
|
mean applying the casts. Our code-base and yours, Justin, are more
|
|
like cousins. I don't think you should have to change just to conform
|
|
to us. You may think my arguments for using real <span class="code">wchar_t</span> have
|
|
merit, and adopt similar usage just because you agree; but I think the
|
|
only obligation you have is to follow the technical solution you think
|
|
is right for your code.
|
|
|
|
<p>If you decide to make this api change, it will mean shipping a new
|
|
binary (on Mac) for your library to clients who want to switch over to
|
|
the new api (since the name mangling will be different, and therefore,
|
|
the link requirements will change).
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 15 Jun 2000 19:36:55 -0400
|
|
Subject: Re: Checkin approval for bug 32336
|
|
</pre>
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
S.Equals(NS_LITERAL_STRING("bar"), PR_TRUE, 3)
|
|
</pre>
|
|
</div>
|
|
|
|
<p>doesn't compile because there is no three parameter form for <span class="code">Equals</span>.
|
|
For all definitions of <span class="code">Equals</span> on strings, see "nsAReadableString.h":
|
|
|
|
<p><a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
|
|
|
|
<p>There is an <span class="code">EqualsWithConversion</span> that takes three parameters.
|
|
|
|
<p> <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731</a>
|
|
|
|
<p>It is ``EqualsWithConversion'' because it admits the possibility of an
|
|
encoding specific transformation, in this case to provide
|
|
case-insensitive comparison. This also wouldn't compile, however,
|
|
since, at the moment, an <span class="code">nsLiteralString</span> doesn't provide an operator
|
|
to produce a <span class="code">const PRUnichar*</span> (though perhaps it should), and it
|
|
doesn't satisfy the other interfaces that match this call, e.g., a
|
|
<span class="code">const nsString&</span>.
|
|
|
|
<p>Perhaps I need to move case-insensitive comparison up out of
|
|
<span class="code">nsString</span> into a global encoding specific transformations and
|
|
algorithms file (which was on its way anyway, as Waterson knows); this
|
|
use is one bit of evidence to support this. In the short term, this
|
|
can be fixed (if we think the current behavior is wrong) by providing
|
|
<span class="code">operator const CharT*() const</span> on literal string.
|
|
|
|
<p>If you can live without case-folding, the earlier form is preferred
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
S == NS_LITERAL_STRING("bar")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>if you can't, then one of the fixes I mentioned is in order.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 15 Jun 2000 19:47:12 -0400
|
|
Subject: Re: [Fwd: how to use nsString ?]
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>I see these same examples time and again in the embedding
|
|
>samples/docs, but I can't compile them.
|
|
</pre>
|
|
|
|
<p>Apologies. Documentation mentioning strings is getting out of date.
|
|
Here are some specific answers.
|
|
|
|
|
|
<pre class="email-quote">
|
|
>nsString URLString("http://www.mozilla.org");
|
|
</pre>
|
|
|
|
<p>...is now perhaps best expressed as
|
|
|
|
<div class="source-code">
<pre>
nsString URLString( NS_LITERAL_STRING("http://www.mozilla.org") );
</pre>
</div>
|
|
|
|
<p>since an <span class="code">nsString</span> is a sequence of 2-byte wide characters, and the
|
|
routines that implicitly convert 1-byte sequences (like the literal
|
|
sequence you specified, "http:...") are now gone.
|
|
|
|
<p>Up until not too long ago, one would have had to say
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsString URLString;
|
|
URLString.AssignWithConversion("http://www.mozilla.org");
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The <span class="code">NS_LITERAL_STRING</span> construction is new machinery that has the
|
|
potential to make many operations much more efficient.
|
|
|
|
<pre class="email-quote">
|
|
>nsString URLString;
|
|
>URLString.SetString("www.mozilla.org");
|
|
</pre>
|
|
|
|
<p><span class="code">SetString</span> was a synonym for <span class="code">Assign</span> or assignment with
|
|
<span class="code">operator=()</span>; it too went away. The equivalent is the second
|
|
example I gave above, that is, the one with <span class="code">AssignWithConversion</span>.
|
|
|
|
<p><span class="code">Assign</span> still exists. <span class="code">AssignWithConversion</span> takes on that
|
|
functionality for assignments that require encoding transformations
|
|
(e.g., from ASCII to UCS2). <span class="code">SetString</span> is gone, since it was always
|
|
a synonym for <span class="code">Assign</span>.
|
|
|
|
<p>Learn more about the general APIs for strings that we are trying to
|
|
move to by examining
|
|
|
|
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
|
|
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 15 Jun 2000 21:26:51 -0400
|
|
Subject: Re: Checkin approval for bug 32336
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>I *need* the count attribute, because I need to compare only the first
|
|
>chars (that's inherent to the logic).
|
|
</pre>
|
|
|
|
<p>This is what substrings are for. In that case, you could use
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
Substring(S, 0, 3) == NS_LITERAL_STRING("bar")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>As for case-folding, it's best if you can case-fold everything up
|
|
front, instead of doing it repeatedly. I'll have to get back to you
|
|
on a general solution to that problem, or what my schedule for getting
|
|
it checked in would be. I'm sorry, I know that's not what you needed
|
|
to hear. If the source string is an <span class="code">nsString</span>, you can continue to
|
|
exploit its implementation of these routines, e.g., <span class="code">ToLower</span> all
|
|
up-front.
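<p>The two ideas here, comparing only a leading substring and folding case once up front, can be sketched with <span class="code">std::string</span> standing in for the Mozilla string classes. Both function names are invented for this example, and the folding handles ASCII letters only:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Lower-case ASCII letters once, up front, so later comparisons can be exact.
std::string fold_ascii_lower(std::string s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if ('A' <= s[i] && s[i] <= 'Z')
            s[i] = static_cast<char>(s[i] - 'A' + 'a');
    return s;
}

// Compare only the first |prefix.size()| characters of |s|, in the spirit of
// Substring(S, 0, 3) == NS_LITERAL_STRING("bar").
bool begins_with(const std::string& s, const std::string& prefix)
{
    return s.size() >= prefix.size()
        && s.compare(0, prefix.size(), prefix) == 0;
}
```

<p>Folding the source once and then calling <span class="code">begins_with</span> as often as needed avoids repeating the case conversion on every comparison.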
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Mon, 19 Jun 2000 14:23:47 -0400
|
|
Subject: Re: string fu
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>It seems less convenient to have to first check path.IsEmpty, and
|
|
>then if false get path.Last and test it.
|
|
</pre>
|
|
|
|
<p>What would you prefer? That extracting a character not in the string
|
|
always return <span class="code">CharT(0)</span>? Can't do it for two reasons: (1) <span class="code">0</span> may be
|
|
a valid character in a particular encoding, so it can't be used in
|
|
general as a ``no character at that position'' marker; and (2) I can't
|
|
control what an individual string implementation does when asked to
|
|
get an out-of-bounds fragment, it's explicitly undefined. That means
|
|
the result of <span class="code">CharAt</span> is explicitly undefined for indexes outside the
|
|
defined contents of the string. As a debugging convenience, I have
|
|
made this assert, but it has always been the case that retrieving such
|
|
a character had undefined results ... even in [the old] code.
|
|
|
|
<p>OK, you might say, well at least let me ask for a character that is
|
|
only off the end by one. E.g., <span class="code">Last</span> of an empty string. Reason (1)
|
|
from above still applies. How bad is it to say, for the case you gave
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
PRBool needsDelim = PR_FALSE;
|
|
if ( !path.IsEmpty() )
|
|
{
|
|
PRUnichar last = path.Last();
|
|
needsDelim = !(last == '/' || last == '\\');
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>In general, you probably want to opt out of a whole lot of work when
|
|
the source string is empty. It is slightly less convenient, but it
|
|
doesn't tie us to a bunch of implementation specific mojo.
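<p>The same check can be exercised outside the tree. This sketch uses <span class="code">std::string</span> in place of the Mozilla classes, and <span class="code">needs_delimiter</span> is a name chosen only for the example:

```cpp
#include <cassert>
#include <string>

// Decide whether a path needs a trailing delimiter, looking at the last
// character only after confirming that there is one.
bool needs_delimiter(const std::string& path)
{
    if (path.empty())
        return false;  // nothing to inspect, so opt out early

    const char last = path[path.size() - 1];
    return !(last == '/' || last == '\\');
}
```

<p>The empty case never touches a character outside the string, so no sentinel value such as <span class="code">'\0'</span> is needed.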
|
|
|
|
|
|
<pre class="email-quote">
|
|
>Can we fix GetUnicode in this case?
|
|
</pre>
|
|
|
|
<p>This is an annoying property of auto strings, i.e., that they always
have an allocated buffer. I'm happy to fix this bug; however, be
|
|
aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts of [the old]
|
|
implementation that we don't want to support. They are not part of
|
|
the abstract interface. We will keep them no longer than we have to.
|
|
They don't support our multi-fragment paradigm. People who require a
|
|
contiguous hunk of characters in the future, and are unwilling to
|
|
switch over to chunky-iterators, may be forced to copy the string to
|
|
their own buffer. There will be an implementation of narrow character
|
|
string that guarantees contiguous allocation and a zero-terminator,
|
|
much as <span class="code">nsCString</span> does now, for compatibility with platform uses,
|
|
but this won't be the default string class.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Mon, 19 Jun 2000 17:22:31 -0400
|
|
</pre>
|
|
|
|
<p>Clarifying String Semantics
|
|
|
|
<p>Recently, I added an assert to the string operations that extract
|
|
characters, namely <span class="code">First()</span>, <span class="code">Last()</span>, <span class="code">CharAt()</span>, and
|
|
<span class="code">operator[]()</span>. This assert fires when any of these routines are used
|
|
to access a character outside the defined contents of the string. For
|
|
<span class="code">First()</span> and <span class="code">Last()</span> that means whenever they are applied to an
|
|
empty string. For <span class="code">CharAt()</span> and <span class="code">operator[]()</span>, that means whenever
|
|
they are used to access an index outside the range of
|
|
<span class="code">0</span>..<span class="code">Length()-1</span>. There have been some complaints; however, the
result was always undefined. What follows is extracted from an email
exchange between me and Warren on this topic. I hope it clarifies
string semantics.
|
|
|
|
<p>Warren writes:
|
|
<pre class="email-quote">
|
|
>I hit your funky CharAt assertion tonight in this piece of code:
|
|
|
|
>NS_IMETHODIMP
|
|
>nsIOService::ResolveRelativePath(
|
|
> const char *relativePath,
|
|
> const char* basePath,
|
|
> char **result )
|
|
> {
|
|
> nsCAutoString name;
|
|
> nsCAutoString path(basePath);
|
|
>
|
|
> PRUnichar last = path.Last();
|
|
> PRBool needsDelim = !(last == '/' || last == '\\' || last ==
|
|
> '\0');
|
|
> ...
|
|
|
|
>where basePath is null. It seems less convenient to have to first
|
|
>check path.IsEmpty, and then if false get path.Last and test it.
|
|
</pre>
|
|
|
|
<p>I replied:
|
|
<pre class="email-quote">
|
|
>What would you prefer? That extracting a character not in the
|
|
>string always return <span class="code">CharT(0)</span>? Can't do it for two reasons:
|
|
>(1) <span class="code">0</span> may be a valid character in a particular encoding, so it
|
|
>can't be used in general as a ``no character at that position''
|
|
>marker; and (2) I can't control what an individual string
|
|
>implementation does when asked to get an out-of-bounds fragment,
|
|
>it's explicitly undefined. That means the result of <span class="code">CharAt</span> is
|
|
>explicitly undefined for indexes outside the defined contents of
|
|
>the string. As a debugging convenience, I have made this assert,
|
|
>but it has always been the case that retrieving such a character
|
|
>had undefined results ... even in [the old] code.
|
|
|
|
>OK, you might say, well at least let me ask for a character that
|
|
>is only off the end by one. E.g., <span class="code">Last</span> of an empty string.
|
|
>Reason (1) from above still applies. How bad is it to say, for the
|
|
>case you gave
|
|
|
|
> PRBool needsDelim = PR_FALSE;
|
|
> if ( !path.IsEmpty() )
|
|
> {
|
|
> PRUnichar last = path.Last();
|
|
> needsDelim = !(last == '/' || last == '\\');
|
|
> }
|
|
|
|
>In general, you probably want to opt out of a whole lot of work
|
|
>when the source string is empty. It is slightly less convenient,
|
|
>but it doesn't tie us to a bunch of implementation specific mojo.
|
|
</pre>
|
|
|
|
<p>Warren also asks:
|
|
<pre class="email-quote">
|
|
>Here's another issue, perhaps more serious. If I say this:
|
|
|
|
> foo(const PRUnichar* s) {
|
|
> nsAutoString str(s);
|
|
> bar(str.get());
|
|
> }
|
|
|
|
>where s is null, bar will get passed a zero-length PRUnichar
|
|
>sequence instead of null. This makes it so that you can't just
|
|
>test for the argument == null. You have to nsCRT::strlen(arg) == 0
|
|
>which is much less efficient. Can we fix GetUnicode in this case?
|
|
</pre>
|
|
|
|
<p>And I reply:
|
|
<pre class="email-quote">
|
|
>This is an annoying property of auto strings, e.g., that they
|
|
>always have an allocated buffer. I'm happy to fix this bug,
|
|
>however, be aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts
|
|
>of [the old] implementation that we don't want to support. They
|
|
>are not part of the abstract interface. We will keep them no
|
|
>longer than we have to. They don't support our multi-fragment
|
|
>paradigm. People who require a contiguous hunk of characters in
|
|
>the future, and are unwilling to switch over to chunky-iterators,
|
|
>may be forced to copy the string to their own buffer. There will
|
|
>be an implementation of narrow character string that guarantees
|
|
>contiguous allocation and a zero-terminator, much as <span class="code">nsCString</span>
|
|
>does now, for compatibility with platform uses, but this won't be
|
|
>the default string class.
|
|
</pre>
|
|
|
|
<p>In a later message, Chris Waterson asks a related question
|
|
<pre class="email-quote">
|
|
>scc: should we add <span class="code">operator PRUnichar*()</span> to
|
|
>NS_ConvertASCIItoUCS2?
|
|
</pre>
|
|
|
|
<p>And I reply:
|
|
<pre class="email-quote">
|
|
>It seems reasonable. A lot more reasonable than forcing people to
|
|
>call <span class="code">GetUnicode()</span>. I alluded to platform specific classes in an
|
|
>earlier message to warren that you were cc'd on, Chris. I imagine
|
|
>that the <span class="code">...Convert...</span> routines would be required to produce
|
|
>contiguous allocation 0-terminated strings (though the as yet
|
|
>unimplemented <span class="code">...Copy...</span> forms, of course, wouldn't). So <span class="code">operator
|
|
>const PRUnichar*() const</span> makes perfect sense to me here.
|
|
</pre>
|
|
|
|
<p>Hope this makes sense,
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Tue, 20 Jun 2000 04:05:31 -0400
|
|
Subject: Re: NS_LITERAL_STRING is broken
|
|
</pre>
|
|
|
|
<p>The behavior you describe sounds exactly like when you say
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
const char* foobar = "foobar";
|
|
|
|
... NS_LITERAL_STRING(foobar).get() ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>because in this case, the thing passed in is a <span class="code">const char*</span>.
|
|
<span class="code">NS_LITERAL_STRING</span> is not meant to be used in this way. It is only
|
|
meant to be used around a <span class="code">"</span> delimited string. The type of such is
|
|
<span class="code">const char[N]</span> where N is the number of characters in the string + 1
|
|
for the zero terminator it helpfully adds. <span class="code">sizeof</span> such a type is
|
|
<span class="code">N</span>.
|
|
|
|
<p>Are you sure you had the actual string as an argument, as in your
|
|
example to me? Or could the actual code have been like my sample,
|
|
above?
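<p>The difference is easy to reproduce with a cut-down macro. <span class="code">LENGTH_BY_SIZEOF</span> below is a hypothetical stand-in that keeps only the <span class="code">sizeof</span> part of <span class="code">NS_LITERAL_STRING</span>:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro with the same length trick as NS_LITERAL_STRING.
#define LENGTH_BY_SIZEOF(s) (sizeof(s) - 1)

// Around a quoted literal the operand is const char[7], so the result is 6.
inline std::size_t length_of_literal()
{
    return LENGTH_BY_SIZEOF("foobar");
}

// Around a pointer variable the operand is const char*, so the result
// depends on the size of a pointer and has nothing to do with the string.
inline std::size_t length_via_pointer()
{
    const char* foobar = "foobar";
    return LENGTH_BY_SIZEOF(foobar);
}
```

<p>With a pointer argument the macro still compiles, which is why the mistake shows up as a wrong length at run time rather than as a build error.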
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 29 Jun 2000 13:35:10 -0400
|
|
Subject: Re: a fix
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
> + if (Length() == 0) { return nsnull; }
|
|
</pre>
|
|
|
|
|
|
<p>Dave,
|
|
|
|
<p>please read
|
|
|
|
<a class="exact-uri" href="news://news.mozilla.org/scc-314ABF.14261619062000@news.mozilla.org">news://news.mozilla.org/scc-314ABF.14261619062000@news.mozilla.org</a>
|
|
|
|
<p>It's just plain wrong to let people try to index into a string outside
|
|
its defined contents. I can't just return <span class="code">'\0'</span> or <span class="code">PRUnichar('\0')</span>
|
|
there as that <strong>could</strong> be a legal value to have somewhere in your
|
|
string for some encodings ... and the encoding is not specified. So
|
|
your patch has the basic problem of defeating my plan to stop people
|
|
from doing this bad thing.
|
|
|
|
<p>The second problem with your patch is that you use the symbolic
|
|
constant <span class="code">nsnull</span>, which is ostensibly a pointer value; <span class="code">Last</span> returns
|
|
a character. <span class="code">nsnull</span> is not appropriate for that purpose. In fact,
|
|
C++ gurus pretty much eschew the use of symbolic constants for <span class="code">0</span>.
|
|
<span class="code">NULL</span> is to be avoided. <span class="code">nsnull</span> is wrong-headed in that it presumes
|
|
we could have some <strong>other</strong> application specific value for <span class="code">NULL</span>. We
|
|
can't, it would never work. It's just wasted brain-print. Always use
|
|
<span class="code">0</span> for these situations, and if you want to communicate the fact that
|
|
something is a pointer type, either use a comment or a
|
|
(construction-style) cast, like so (graded examples from worst to
|
|
best):
|
|
|
|
<ul>
|
|
<li>F: FindChildByNameWithHint("Chuck", nsnull);
|
|
|
|
<li>D: FindChildByNameWithHint("Chuck", NULL);
|
|
|
|
<li>C: FindChildByNameWithHint("Chuck", /* Child* */ 0);
|
|
|
|
<li>B: typedef Child* Child_ptr;
|
|
FindChildByNameWithHint("Chuck", Child_ptr(0));
|
|
|
|
<li>A: FindChildByNameWithHint("Chuck", 0);
|
|
</ul>
|
|
|
|
<p>Don't let this discourage you; keep up the good work :-)
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Tue, 8 Aug 2000 23:47:16 -0400
|
|
Subject: Re: nsWritingIterator?
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>Can you give me any pointers to examples, or docs, or just some
|
|
>general advice?
|
|
</pre>
|
|
|
|
<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>
|
|
|
|
<p>does this help?
|
|
|
|
<p>I can personally walk you through any specific scenario you need.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Wed, 9 Aug 2000 02:35:03 -0400
|
|
Subject: Re: nsWritingIterator?
|
|
</pre>
|
|
|
|
<p>You got it right... it's <span class="code">nsWritingIterator<CharT></span> for whichever
|
|
character type you care about, either <span class="code">char</span> or <span class="code">PRUnichar</span>. You
|
|
<strong>can</strong> use this iterator like a character pointer ... that is, you can
|
|
dereference it, assign into its dereference, etc. It is more
|
|
efficient, though, to directly address a particular range of
|
|
characters around where it points by asking it for its actual
|
|
character pointer with <span class="code">get</span>, and knowing that there are
|
|
<span class="code">size_forward()</span> characters available ahead of that pointer and
|
|
<span class="code">size_backward()</span> characters available behind it. After examining
|
|
those characters by hand, you can advance the iterator beyond the
|
|
characters you have examined (and possibly into the next chunk, should
|
|
one exist) by adding into it (with +=) the count of the characters you
|
|
have processed.
|
|
|
|
<p>Here are three examples of running through a string and modifying some
|
|
of the characters in it. All use <span class="code">nsWritingIterator</span>s.
|
|
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
// inefficient, but works in a pinch:
|
|
// iterators can hide all details of chunks by acting like
|
|
// a raw character pointer
|
|
|
|
nsWritingIterator<PRUnichar> s = S.BeginWriting();
|
|
nsWritingIterator<PRUnichar> done_with_string = S.EndWriting();
|
|
|
|
// for each character in the string |S|
|
|
while ( s != done_with_string )
  {
      // if the character is lower case, capitalize it
    if ( 'a' <= *s && *s <= 'z' )
      *s = *s - 'a' + 'A';

      // advance to the next character
    ++s;
  }
|
|
|
|
|
|
|
|
|
|
// efficient
|
|
// iterators provide a mechanism by which you can process
|
|
// a chunk-at-a-time
|
|
|
|
nsWritingIterator<PRUnichar> iter = S.BeginWriting();
|
|
nsWritingIterator<PRUnichar> done_with_string = S.EndWriting();
|
|
|
|
// for each chunk of the string
|
|
while ( iter != done_with_string )
|
|
{
|
|
size_t N = iter.size_forward(); // # of chars in this chunk
|
|
PRUnichar* s = iter.get();
|
|
PRUnichar* done_with_chunk = s + N;
|
|
|
|
// for each character in this chunk
|
|
for ( ; s < done_with_chunk; ++s )
|
|
{
|
|
// if the character is lower case, capitalize it
|
|
if ( 'a' <= *s && *s <= 'z' )
|
|
*s = *s - 'a' + 'A';
|
|
}
|
|
|
|
// advance the iterator past characters
|
|
// we examined (and into the next chunk, if any)
|
|
iter += N;
|
|
}
|
|
|
|
|
|
|
|
// elegant
|
|
// pull your transformation into a `sink', and |copy_string|
|
|
// will efficiently pump any kind of string into it
|
|
|
|
struct Capitalize
|
|
{
|
|
// inline
|
|
PRUint32
|
|
write( PRUnichar* s, PRUint32 N )
|
|
// processes one chunk, called repeatedly by |copy_string|
|
|
{
|
|
PRUnichar* done_with_chunk = s + N;
|
|
|
|
// for each character in this chunk
|
|
for ( ; s < done_with_chunk; ++s )
|
|
{
|
|
// if the character is lower case, capitalize it
|
|
if ( 'a' <= *s && *s <= 'z' )
|
|
*s = *s - 'a' + 'A';
|
|
}

// report how many characters of this chunk were processed
return N;
}
|
|
};
|
|
|
|
copy_string(S.BeginWriting(), S.EndWriting(), Capitalize());
|
|
</pre>
|
|
</div>
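<p>For readers without a tree to build against, here is a self-contained model of the chunk-at-a-time and sink ideas. <span class="code">ChunkedString</span>, <span class="code">CapitalizeSink</span> and <span class="code">for_each_chunk</span> are all hypothetical stand-ins built on the standard library; they only imitate the shape of a multi-fragment string and of <span class="code">copy_string</span>:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a multi-fragment string: each element is one
// contiguous chunk of characters.
struct ChunkedString
{
    std::vector<std::string> chunks;
};

// A sink in the style of |Capitalize| above: handed one chunk at a time.
struct CapitalizeSink
{
    std::size_t write(char* s, std::size_t n)
    {
        for (char* end = s + n; s < end; ++s)
            if ('a' <= *s && *s <= 'z')
                *s = static_cast<char>(*s - 'a' + 'A');
        return n;  // number of characters consumed
    }
};

// Pump every chunk through the sink. The per-character loop lives inside
// the sink, so there is one call per chunk rather than one per character.
template <class Sink>
void for_each_chunk(ChunkedString& str, Sink sink)
{
    for (std::size_t i = 0; i < str.chunks.size(); ++i)
        if (!str.chunks[i].empty())
            sink.write(&str.chunks[i][0], str.chunks[i].size());
}
```

<p>Calling <span class="code">for_each_chunk(s, CapitalizeSink())</span> upper-cases every fragment in place without the caller ever seeing fragment boundaries.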
|
|
|
|
|
|
|
|
<p>Does this show it better?
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 17 Aug 2000 18:23:22 -0400
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>I tried looking at the string header files but they
|
|
>are awfully complicated.
|
|
</pre>
|
|
|
|
<p>I'll explain things in a little <strong>more</strong> detail than you need, so
|
|
that some of the stuff you see in these headers will make more sense.
|
|
I'll also answer your questions out of order.
|
|
|
|
<p>First: the string hierarchy looks like this
|
|
|
|
<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_hierarchy.gif">http://ScottCollins.net/Journal/discussion/string_hierarchy.gif</a>
|
|
|
|
<p>The two most important headers are:
|
|
|
|
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
|
|
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
|
|
|
|
<p>These abstract classes, <span class="code">nsAReadable[C]String</span>, and
|
|
<span class="code">nsAWritable[C]String</span> are typically what you will want to use in the
|
|
interfaces of new code. If you write a piece of code that takes a
|
|
string for input, consider, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void consumes_a_string( const nsAReadableString& aInput );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>If you write a piece of code that modifies a string, consider
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void modifies_a_string( nsAWritableString& aResult );
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<p>When creating your own classes, member strings will typically be
|
|
<span class="code">nsString</span>s. When you can't avoid creating a short string that you
|
|
need only temporarily during a function, you will typically use
|
|
<span class="code">nsAutoString</span>. When someone passes you a raw pointer, or a raw
|
|
pointer and a length, representing a buffer of characters that you may
|
|
examine, but won't own, you can treat it like a string by wrapping it
|
|
in an <span class="code">nsLiteralString</span>, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void
|
|
reads_a_buffer( const PRUnichar* aInput, PRUint32 aInputLength )
|
|
{
|
|
nsLiteralString input(aInput, aInputLength);
|
|
// doesn't allocate or copy
|
|
|
|
// ...
|
|
}
|
|
</pre>
|
|
</div>
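<p>The "examine but don't own" idea can be shown with a tiny view type. <span class="code">BufferView</span> and <span class="code">count_char</span> below are hypothetical and use <span class="code">char</span> for brevity; they are a sketch of the wrapping pattern, not of <span class="code">nsLiteralString</span> itself:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view: it records where the caller's buffer is and
// how long it is, and nothing more.
struct BufferView
{
    const char* data;
    std::size_t length;

    BufferView(const char* aData, std::size_t aLength)
        : data(aData), length(aLength) {}

    char CharAt(std::size_t i) const { return data[i]; }
};

// Count occurrences of |c| without allocating or copying the input.
std::size_t count_char(const char* aInput, std::size_t aInputLength, char c)
{
    BufferView input(aInput, aInputLength);

    std::size_t hits = 0;
    for (std::size_t i = 0; i < input.length; ++i)
        if (input.CharAt(i) == c)
            ++hits;
    return hits;
}
```

<p>Because the view holds only a pointer and a length, it is valid only while the caller's buffer is; it must not outlive the call.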
|
|
|
|
<p>You will use <span class="code">nsLiteralString</span> around quoted constant strings as well,
|
|
though typically through the <span class="code">NS_LITERAL_STRING</span> macro, to avoid doing
|
|
a length calculation
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_LITERAL_STRING("x")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>expands to
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsLiteralString(L"x", (sizeof(L"x")/sizeof(PRUnichar) - 1))
|
|
</pre>
|
|
</div>
|
|
|
|
<p>if <span class="code">L</span> notation works as needed on your platform.
|
|
|
|
<p>Those are the basics. Now on to your questions:
|
|
|
|
|
|
<pre class="email-quote">
|
|
>For example this won't compile. [...]
|
|
|
|
>str1 += L"abc " + str2 + L"def";
|
|
</pre>
|
|
|
|
|
|
<p><span class="code">L"abc "</span> makes an object that is a <span class="code">const wchar_t[5]</span>, and none of
|
|
the string code knows about <span class="code">wchar_t</span>. The main reason is that
|
|
<span class="code">wchar_t</span> is not necessarily the right size (it can be 4 bytes under
|
|
gcc). If you wrap these constant expressions in <span class="code">NS_LITERAL_STRING</span>,
|
|
as described above, you should get the right thing, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
str1 += NS_LITERAL_STRING("abc ") + str2 + NS_LITERAL_STRING("def");
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<pre class="email-quote">
|
|
>Another one is:
|
|
>function(const PRUnichar *foo);
|
|
>call function(L"abc " + str2);
|
|
|
|
>It won't create a temporary nsString.
|
|
</pre>
|
|
|
|
<p>This one, I have a quick and easy explanation for. If <span class="code">function</span> was
|
|
declared like this
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
function( const nsAReadableString& )
|
|
</pre>
|
|
</div>
|
|
|
|
<p>then, no problem, since an <span class="code">nsPromiseConcatenation</span> (which was the
|
|
result of adding those two things together) <strong>is</strong> a readable string.
|
|
No other objects need to be created; no copying needs to be performed.
|
|
|
|
<p>In all cases, we want the creation of <span class="code">nsString</span>s et al, to be
|
|
<span class="code">explicit</span>, since creation is unbelievably expensive, requiring heap
|
|
allocation, locks, copying, etc.
|
|
|
|
<p>I hope this answers both your posts,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 17 Aug 2000 20:57:08 -0400
|
|
Subject: re our conversation
|
|
</pre>
|
|
|
|
<div class="source-code">
<pre>
return ToNewUnicode( nsLiteralCString(buffer) );
</pre>
</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Fri, 18 Aug 2000 02:52:45 -0400
|
|
Subject: Re: More questions and new string API
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>1) How do I return a static string?
|
|
|
|
>const nsAReadableString& foo() {return NS_LITERAL_STRING("x");}
|
|
>errors on taking the address of a temporary variable.
|
|
</pre>
|
|
|
|
<p>Unfortunately, <span class="code">NS_LITERAL_STRING</span>'s definition is not particularly
|
|
amenable to this use. Instead, you would have to say something like
|
|
this:
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
const nsAReadableString&
foo()
{
#ifdef HAVE_CPP_2BYTE_WCHAR_T
  static nsLiteralString static_foo(L"x", 1);
#else
  static nsLiteralString static_foo;
  static PRBool initialized = PR_FALSE;
  if ( !initialized )
  {
    static_foo.AssignWithConversion("x", 1);
    initialized = PR_TRUE;
  }
#endif
  return static_foo;
}
|
|
</pre>
|
|
</div>
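<p>The same pattern, a function-local static that is built once and then returned by
reference, can be sketched in plain standard C++. This is only an analogy under stated
assumptions: <span class="code">widen</span> and <span class="code">static_x</span> are hypothetical names, and
<span class="code">std::u16string</span> stands in for the Mozilla string types.

```cpp
#include <cassert>
#include <string>

// Converts an ASCII literal to UTF-16 one character at a time, standing in
// for the "assign with conversion" branch.
std::u16string widen(const char* ascii) {
    assert(ascii);
    std::u16string out;
    for (const char* p = ascii; *p; ++p) {
        assert(static_cast<unsigned char>(*p) < 0x80);  // ASCII only
        out.push_back(static_cast<char16_t>(*p));
    }
    return out;
}

// Returns a reference to a string that lives until the program exits.
// The static is constructed on the first call only, so the conversion
// cost is paid once and the returned reference never dangles.
const std::u16string& static_x() {
    static const std::u16string value = widen("x");
    return value;
}
```

<p>Every call returns a reference to the same object, which is what the caller of a
"static string" function needs.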
|
|
|
|
|
|
<pre class="email-quote">
|
|
>2) I'm using these with the STL library in an XPCOM component.
|
|
>What type should I use with map? This doesn't work...
|
|
|
|
>typedef map<const nsAReadableString&, myType*> mapStringMyType;
|
|
>mapStringMyType foo;
|
|
>foo.find(nsAReadableString); - I want to find on a ReadableString
|
|
</pre>
|
|
|
|
<p>I don't know what errors you are getting; but it probably doesn't work
|
|
because a reference isn't an assignable type. This is just a guess.
|
|
You may need to use
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
map<const nsAReadableString*, myType*>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>If you actually want the map to manage ownership of the keys, then
|
|
you'll want to use a concrete type, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
map<nsString, myType*>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>or perhaps
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
map<nsSharedStringPtr, myType*>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Or maybe there's something else wrong. Send me the error messages.
|
|
If you end up using a pointer, then of course you'll have to supply a
|
|
comparison function to the <span class="code">map</span> template. You won't be satisfied
|
|
with the default comparison of pointers :-) Sorry I couldn't answer
|
|
this one more completely.
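<p>For the pointer-keyed variant, the comparison function could look something like the
following sketch. It uses <span class="code">std::u16string</span> as a stand-in for the string classes, and
<span class="code">ComparePointees</span>, <span class="code">StringPtrMap</span> and <span class="code">contains</span> are
hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Orders keys by the strings they point to, not by their addresses.
struct ComparePointees {
    bool operator()(const std::u16string* a, const std::u16string* b) const {
        assert(a && b);  // null keys are not supported in this sketch
        return *a < *b;
    }
};

// The map does not own its keys: each pointed-to string must outlive
// the entry that refers to it.
typedef std::map<const std::u16string*, int, ComparePointees> StringPtrMap;

// Looks up by content: a pointer to any equal string finds the entry.
bool contains(const StringPtrMap& m, const std::u16string& key) {
    return m.find(&key) != m.end();
}
```

<p>With the default pointer comparison, a lookup through a different string object holding
the same characters would miss; with the comparator above it succeeds.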
|
|
|
|
|
|
<pre class="email-quote">
|
|
>3) How do a get a raw PRUnichar pointer out of nsAReadableString
|
|
>when I need to call something that wants 'unsigned short *'?
|
|
</pre>
|
|
|
|
<p>The problem with this scenario is that an <span class="code">nsAReadableString</span> doesn't
|
|
promise that all its data is contiguous, nor that it is
|
|
zero-terminated, which is what I suspect you want in this case. If
|
|
the function you want to call can take {pointer, length} tuples, and
|
|
can consume the string in hunks without zero termination ... then you
|
|
can use <span class="code">copy_string</span> to pump the string into your function, see
|
|
|
|
<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>
|
|
|
|
<p>If not, and you absolutely have to have a contiguous zero-terminated
|
|
buffer, then there is a new facility (part of the DOMAPI branch) that
|
|
does what you need. It's not checked in on the trunk; it should
|
|
be in early next week. It is <span class="code">nsPromiseFlatString</span>. This class
|
|
promises a contiguous zero-terminated buffer; and has an <span class="code">operator
|
|
PRUnichar*</span> to produce a pointer to that buffer automatically. If the
|
|
underlying class <strong>is</strong> one that happens to be a single fragment and
|
|
zero-terminated, then, like <span class="code">nsPromiseSubstring</span> and
|
|
<span class="code">nsPromiseConcatenation</span>, this class merely holds a reference into the
|
|
original data. If, however, the underlying string is multi-fragment
|
|
or not zero-terminated, then <span class="code">nsPromiseFlatString</span> allocates a
|
|
contiguous buffer of appropriate size and copies the fragmented string
|
|
data to it. So given
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void ReadBuffer( PRUnichar* );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>You can call this as efficiently as possible with an arbitrary string
|
|
like so
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
ReadBuffer( nsPromiseFlatString(aString) );
|
|
</pre>
|
|
</div>
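<p>The "refer when already flat, copy only when necessary" behavior can be sketched in plain
standard C++ as follows. This is not the real <span class="code">nsPromiseFlatString</span>;
<span class="code">Fragments</span>, <span class="code">FlatPromise</span> and <span class="code">length_by_scanning</span> are
hypothetical names, and a vector of <span class="code">std::u16string</span> pieces is assumed as the model
of a multi-fragment string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fragmented string: the text is the concatenation of its pieces.
typedef std::vector<std::u16string> Fragments;

// Promises a contiguous, zero-terminated buffer for as long as this object
// (and the source it may refer to) is alive.
class FlatPromise {
public:
    explicit FlatPromise(const Fragments& source) : borrowed_(0) {
        if (source.size() == 1) {
            borrowed_ = &source[0];      // already flat: just refer to it
        } else {
            for (std::size_t i = 0; i < source.size(); ++i)
                copy_ += source[i];      // multi-fragment: copy exactly once
        }
    }
    const char16_t* get() const {
        return borrowed_ ? borrowed_->c_str() : copy_.c_str();
    }
    bool copied() const { return borrowed_ == 0; }

private:
    const std::u16string* borrowed_;
    std::u16string copy_;
};

// A consumer that insists on a bare zero-terminated pointer.
std::size_t length_by_scanning(const char16_t* p) {
    assert(p);
    std::size_t n = 0;
    while (p[n]) ++n;
    return n;
}
```

<p>A call such as <span class="code">length_by_scanning(FlatPromise(text).get())</span> pays for a copy only
when <span class="code">text</span> has more than one fragment.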
|
|
|
|
|
|
<p>If the function you are calling needs to take ownership of the buffer
|
|
you hand it, then you will probably call <span class="code">ToNewUnicode</span> like so
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
void ConsumeBuffer( PRUnichar* );
|
|
|
|
ConsumeBuffer( ToNewUnicode(aString) );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The global function <span class="code">ToNewUnicode</span> is declared in "nsReadableUtils.h",
|
|
and was only recently added to the build. It is currently being used
|
|
in the DOMAPI branch. It is part of the build, but the file
|
|
"dlldeps.c" in XPCOM may need to be modified to ensure it is exported
|
|
on your platform if you are building the tip.
|
|
|
|
<p>Needless to say, you want to avoid functions that require bare
|
|
pointers for several reasons: (a) they typically assume
|
|
zero-termination, which is not guaranteed by the normal encodings; (b)
|
|
they require contiguous allocation, which may not be possible; (c)
|
|
they scan for the end of the string, at linear cost (if the encoding
|
|
makes it possible at all), when the length could be known in advance.
|
|
If you have to do it, the above mechanisms work, but be aware of the
|
|
cost and the potential need to copy.
|
|
|
|
|
|
<pre class="email-quote">
|
|
>4) How do I declare a local variable to hold a nsAReadableString?
|
|
>and a member variable?
|
|
</pre>
|
|
|
|
<p><span class="code">nsAReadableString</span> is an abstract type. So you can't have a concrete
|
|
instance of it. All strings in the hierarchy are readable strings.
|
|
If you just want a reference to a readable string, you can say, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
struct foo
{
  const nsAReadableString& mString;
  // ...

  foo( const nsAReadableString& aString ) : mString(aString) { }
};
|
|
</pre>
|
|
</div>
|
|
|
|
<p>...similarly with pointers; but I suspect you are looking for
|
|
something more concrete. An <span class="code">nsString</span> is an <span class="code">nsAReadableString</span>, and
|
|
is the typical thing you want as a member variable. An <span class="code">nsAutoString</span>
|
|
is also an <span class="code">nsAReadableString</span> and is typically what you would use for
|
|
a short (in length) temporary (in lifetime) local variable, as I
|
|
mentioned in my previous post.
|
|
|
|
|
|
<pre class="email-quote">
|
|
>5) If I call a function that returns a PRUnichar* and I want t
|
|
>use it as a nsAReadableString should I wrap it in a
|
|
>nsLiteralString?
|
|
</pre>
|
|
|
|
<p>Yes, though remember, an <span class="code">nsLiteralString</span> assumes the lifetime of the
|
|
underlying data is under someone else's control. If the called
|
|
function gives you a buffer that you need to <span class="code">delete</span>, you will have
|
|
to manage that yourself. Currently, people often use <span class="code">nsXPIDLString</span>
|
|
to handle that. XPIDL strings are <strong>not</strong> part of the hierarchy. They
|
|
are only used as a sort of string-<span class="code">auto_ptr</span>. However, I'm
|
|
integrating their functionality into <span class="code">nsString</span>. There is no problem
|
|
in wrapping the same pointer in both as two separate local variables,
|
|
one to give you the readable interface, and one to manage the
|
|
lifetime.
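<p>The "two wrappers around one pointer" idea can be sketched in plain standard C++. The
names <span class="code">FreeDeleter</span>, <span class="code">OwnedBuffer</span>, <span class="code">View</span>,
<span class="code">make_greeting</span> and <span class="code">greeting_length</span> are hypothetical, and the sketch
assumes the called function allocates with <span class="code">malloc</span>; with XPCOM the matching
deallocator would be different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Deleter that matches the allocator used by the hypothetical callee below.
struct FreeDeleter {
    void operator()(char16_t* p) const { std::free(p); }
};
typedef std::unique_ptr<char16_t, FreeDeleter> OwnedBuffer;

// Hypothetical callee: returns a malloc'ed, zero-terminated buffer that
// the caller is responsible for freeing.
char16_t* make_greeting() {
    static const char16_t text[] = u"hello";
    char16_t* p = static_cast<char16_t*>(std::malloc(sizeof(text)));
    assert(p);
    std::memcpy(p, text, sizeof(text));
    return p;
}

// A non-owning view: pointer plus length, for reading only.
struct View {
    const char16_t* data;
    std::size_t length;
};

std::size_t greeting_length() {
    // One local manages the lifetime ...
    OwnedBuffer owner(make_greeting());
    // ... and a second local gives a readable interface to the same pointer.
    View view = { owner.get(), std::char_traits<char16_t>::length(owner.get()) };
    return view.length;
}   // |owner| frees the buffer here; |view| must not be used afterwards
```

<p>The view never frees anything and the owner never interprets the data, so the two
wrappers do not conflict.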
|
|
|
|
<p>If it's OK with you, I'd like to post this reply (including your
|
|
quoted questions) to n.p.m.xpcom and also put a copy near the string
|
|
iterator discussion I provided a link to above, so that other people
|
|
with similar questions can see these answers.
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Sun, 3 Sep 2000 03:52:17 -0400
|
|
</pre>
|
|
|
|
<p>In article <8nu9m2$eo14@secnews.netscape.com>, "Jon Smirl"
|
|
<jonsmirl@mediaone.com> wrote:
|
|
|
|
<pre class="email-quote">
> I have the new strings up and running in my app. They work as
> advertised and
> I haven't found any bugs. Thanks for the good job in designing and
> implementing them. Here's are a summary of issues I've encountered
> so far...
</pre>
|
|
|
|
<p>Thanks, and I appreciate your comments and insights.
|
|
|
|
|
|
<pre class="email-quote">
> 1) Should there be a nsSegmentedString derived from nsString instead
> of building segment support into nsString? None of my strings are
> segmented but
> I keep executing code that is supports it. nsPromiseFlatString would
> be trivial in the non-segmented case.
</pre>
|
|
|
|
<p>The general case is that a string does not promise to have contiguous
|
|
data. A specific case is that, for some implementations, it does.
|
|
You couldn't do it the other way around, because a segmented string
|
|
couldn't satisfy all the promises of a flat string. However, through
|
|
the use of chunky iterators, operating on strings that happen to be
|
|
flat is very efficient. In fact, <span class="code">nsPromiseFlatString</span> is trivial in
|
|
the non-segmented case. In addition, I'll be adding an abstract flat
|
|
class into the hierarchy, which will present additional interface ...
|
|
in your local routines where you actually have declared a concrete
|
|
string instance that happens to be flat, the compiler will give you
|
|
the benefit of using the flat specific routines (e.g., a substring
|
|
object over a flat string is simpler than the general purpose
|
|
substring). I need to be cautious about this, though, since I don't
|
|
automatically want people propagating the flat type through their
|
|
interfaces. That would put us in the same boat we're in right now ...
|
|
where routines only work on a specific kind of string, which denies
|
|
other parts of the code the opportunity to use an implementation
|
|
beneficial to its specific needs, and typically for no good reason.
|
|
|
|
<pre class="email-quote">
> 2) Should nsAWritableString have a way to get the buffer and then
> return it?
> I need to get the buffer to pass it to OS calls. I'm doing this now
> by passing around nsStrings instead of the interface. If I just use
> the interface I encur an extra copy since I have to use a temporary
> buffer.
</pre>
|
|
|
|
<p>A specific string implementation could promise this, but in general, a
|
|
writable could not. After all, a writable doesn't even guarantee
|
|
contiguous storage. To some degree, this is what
|
|
<span class="code">nsPromiseFlatString</span> is for. However, this is a readable promise
|
|
only. It will also be the case that <span class="code">ns[C]String</span>s, in the very near
future, will be able to just assume ownership of an arbitrary buffer
|
|
allocated on the free store with the XPCOM allocators ... getting one
|
|
to give up its buffer, on the other hand, presents some problems. Do
|
|
you have a lot of places where the system writes into your string
|
|
buffer space? Or do you have a lot of system routines that return you
|
|
new buffers? I can imagine using <span class="code">nsPromiseFlatString</span> for this, but
|
|
what happens when the OS alters the underlying data? If the promise
|
|
had generated that flat data on behalf of a multi-fragment string,
|
|
should it now put the changes back? It's possible to do, I just want
|
|
to know if it's correct to allow this situation to happen.
|
|
|
|
|
|
|
|
<pre class="email-quote">
> 3) There needs to be a NS_LITERAL_CHAR() to go along with
> NS_LITERAL_STRING().
</pre>
|
|
|
|
<p>OK.
|
|
|
|
|
|
|
|
<pre class="email-quote">
> Having NS_LITERAL_STRING() all over the code clutters
> it up and makes it hard to tell what the code is doing, could we
> have a standard short alias for this?
</pre>
|
|
|
|
<p>Yes, I'll try to think of something ... perhaps <span class="code">NS_LSTR</span>?
|
|
|
|
|
|
<pre class="email-quote">
> 4) nsLiteralString should support n.ToInteger(&error);
</pre>
|
|
|
|
<p><span class="code">ToInteger</span> is actually a bad interface. It's only good if your
|
|
entire string is the number; this encourages you to edit your string
|
|
until it is one, or perhaps copy the numeric part to another string.
|
|
Better if you just <span class="code">sscanf</span> a string (don't know if I can provide
|
|
that in the general case, but I'm thinking about it), or else use
|
|
regular C++ extractors (which wouldn't be too hard for me to
|
|
provide), or else I could give you a <span class="code">ToInteger</span> that works on a pair
|
|
of iterators, extracting the integer from the digits between them.
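<p>An iterator-pair integer extractor of the kind described could look roughly like this
sketch in plain standard C++. The names <span class="code">to_integer</span> and
<span class="code">value_after_equals</span> are hypothetical, it handles only unsigned decimal digits,
and it is not an existing Mozilla interface.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Parses the decimal digits at the front of [first, last) and leaves |first|
// on the first character that is not a digit. Returns false, leaving |result|
// untouched, if there were no digits at all.
template <typename Iterator>
bool to_integer(Iterator& first, Iterator last, int& result) {
    Iterator start = first;
    int value = 0;
    while (first != last && *first >= u'0' && *first <= u'9') {
        assert(value <= (INT_MAX - 9) / 10);  // overflow is not handled here
        value = value * 10 + static_cast<int>(*first - u'0');
        ++first;
    }
    if (first == start) return false;
    result = value;
    return true;
}

// Extracts the number that follows the first '=' in |s|, or -1 if none.
// The string is neither edited nor copied to isolate the digits.
int value_after_equals(const std::u16string& s) {
    std::u16string::const_iterator it = s.begin();
    while (it != s.end() && *it != u'=') ++it;
    if (it == s.end()) return -1;
    ++it;  // step past '='
    int n = 0;
    return to_integer(it, s.end(), n) ? n : -1;
}
```

<p>Because the extractor works on a range, the number can sit anywhere inside a longer
string, which is exactly the case the whole-string interface handles badly.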
|
|
|
|
<pre class="email-quote">
> 5) There should be a global define for an interface to a readonly
> empty string.
</pre>
|
|
|
|
<p>Yes, there will be.
|
|
|
|
|
|
<pre class="email-quote">
> 6) Something is wrong with concatenation....
</pre>
|
|
|
|
<p>Hopefully I've fixed this now.
|
|
|
|
|
|
|
|
<pre class="email-quote">
> 8) A forward definition is missing in the h files
</pre>
|
|
|
|
<p>I'll check it out.
|
|
|
|
|
|
|
|
<p>My understanding is that you have already found the answers to your
|
|
other questions.
|
|
|
|
<p>I hope this helps,
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Wed, 20 Sep 2000 17:32:13 -0400
|
|
Subject: Re: how to free an nsString::ToNewCString
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>What's the current approved way to free an nsString::ToNewCString?
|
|
</pre>
|
|
|
|
<p><span class="code">nsMemory::Free</span>
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
|
|
<p>You use several <span class="code">NS_ConvertASCIItoUCS2("...").get()</span>; these should be

<div class="source-code">
<pre>
NS_LITERAL_STRING("...").get()
</pre>
</div>
|
|
|
|
<p>Don't do this to the very first case where you aren't wrapping an actual literal string.
|
|
The first instance should exploit <span class="code">NS_LITERAL_STRING</span> technology as well,
|
|
around the initial declarations of the strings ... probably want to do this with
|
|
<span class="code">NS_NAMED_LITERAL_STRING</span>.
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 12 Oct 2000 00:57:28 -0400
|
|
Subject: string answers
|
|
</pre>
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsresult
DoSomething( nsAWritableString& answer )
{
  nsresult rv = NS_OK;  // initialized so the |else| path returns a defined value

  nsXPIDLString registry_data;
  Fetch("key", getter_Shares(registry_data));

  nsLiteralString path(not_my_string);

  PRInt32 first_colon = path.FindChar(PRUnichar(':'));
  if ( first_colon != -1 )
  {
      // convert ... extract path from |path|
    nsCOMPtr<nsILocalFile> localFile( do_CreateInstance(CID, &rv) );
    if ( localFile )
    {
      localFile->SetPersistentDescriptor(NS_ConvertUCS2toUTF8(path));

      nsXPIDLString converted_path;
      localFile->GetUnicodePath(getter_Copies(converted_path));
      answer = converted_path.get();
    }
  }
  else
  {
    answer = path;
  }

  return rv;
}
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 12 Oct 2000 02:03:49 -0400
|
|
Subject: Re: and the answer is ...
|
|
</pre>
|
|
|
|
<p>You can see from the line of code that you're on, that this should
|
|
have been fine. <span class="code">nsMemory::Alloc</span> would be asked to allocate a 1 byte
|
|
object. But it failed trying to allocate that. Which suggests that
|
|
the allocator was busy and non-reentrant and the debugger tried to
|
|
misuse it. Yes?
|
|
|
|
<p>Of course, this doesn't solve your problem. Perhaps we need to go
|
|
back to the idea of a function that returns a pointer to the first
|
|
hunk of the string.
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
const char*
debug_string( const nsAReadableCString& aCString )
{
  nsReadingIterator<char> iter;
  aCString.BeginReading(iter);
  return aCString.IsEmpty() ? "" : iter.get();
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This code should work regardless of what the allocator is doing. The
|
|
downsides are (a) it only returns the first hunk of the string, in the
|
|
case of a multi-fragment string; and (b) that hunk <strong>might</strong> not be
|
|
zero-terminated.
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 12 Oct 2000 08:30:32 -0400
|
|
Subject: Re: Self healing the cache :-)
|
|
</pre>
|
|
|
|
<p>At 3:04 PM -0400 10/11/00, Mike Shaver wrote:
|
|
<pre class="email-quote">
|
|
>NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID);
|
|
</pre>
|
|
|
|
<p>Macro ugliness makes <span class="code">NS_LITERAL_STRING</span> inappropriate for use over
|
|
other macros. In other words:
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_LITERAL_STRING("foo")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>is <strong>good</strong>.
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
#define FOO "foo"
|
|
NS_LITERAL_STRING(FOO)
|
|
</pre>
|
|
</div>
|
|
|
|
<p>is <strong>bad</strong>. Why? Because it turns into
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsLiteralString(LFOO, sizeof(LFOO)...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>and there is no <span class="code">LFOO</span>. Sorry. If you have to do this to a
|
|
macro-ized string, do the magic by hand, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsLiteralString(FOO, sizeof(FOO)/sizeof(PRUnichar) - 1)
|
|
</pre>
|
|
</div>
|
|
|
|
<p>or else if you don't care that <span class="code">nsLiteralString</span> will scan for the
|
|
length, just say
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsLiteralString(FOO)
|
|
</pre>
|
|
</div>
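<p>For what it's worth, the standard C++ preprocessor offers a general way around the
pasting problem: an extra level of macro indirection lets the argument expand before the
prefix is pasted on. The sketch below shows that technique with the C++11 <span class="code">u</span>
prefix. The macro names <span class="code">PASTE_U</span>, <span class="code">WIDE_U</span> and
<span class="code">LITERAL_LENGTH</span> are hypothetical; this is not the definition of
<span class="code">NS_LITERAL_STRING</span>, and it assumes a compiler with
<span class="code">char16_t</span> literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#define FOO "foo"

// One level: ## pastes the *unexpanded* argument, so PASTE_U(FOO)
// would become the unknown identifier uFOO.
#define PASTE_U(s) u##s

// Two levels: the argument of WIDE_U is fully macro-expanded before it is
// handed to PASTE_U, so WIDE_U(FOO) becomes u"foo".
#define WIDE_U(s) PASTE_U(s)

// Length in characters of a string literal, excluding the terminating zero.
#define LITERAL_LENGTH(lit) (sizeof(lit) / sizeof((lit)[0]) - 1)

// Builds a UTF-16 string from the macro-ized literal with the length
// computed at compile time, so nothing is scanned at run time.
std::u16string foo_as_utf16() {
    return std::u16string(WIDE_U(FOO), LITERAL_LENGTH(WIDE_U(FOO)));
}
```

<p>Whether a given set of string macros can be layered this way depends on how they are
defined on each platform, so treat this as an illustration of the preprocessor rule rather
than a drop-in fix.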
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 12 Oct 2000 08:36:14 -0400
|
|
Subject: Re: Self healing the cache :-)
|
|
</pre>
|
|
|
|
<p>Actually, I'm not even sure you can do it by hand, since you didn't
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
#define FOO L"foo"
|
|
</pre>
|
|
</div>
|
|
|
|
<p>and <strong>can't</strong> do that cross-platform. The other way around this is to
|
|
define a global instead of a macro, that is, instead of saying
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
#define FOO "foo"
|
|
</pre>
|
|
</div>
|
|
|
|
<p>at the top of your file, say
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_NAMED_LITERAL_STRING(FOO, "foo")
|
|
</pre>
|
|
</div>
|
|
|
|
<p>or else, if the macro was used only in one spot ... perhaps you could
|
|
just eliminate the macro in favor of <span class="code">NS_NAMED_LITERAL</span> in situ.
|
|
|
|
<p>Arghh. In this case, you may be stuck with the extra work of
|
|
<span class="code">AssignWithConversion</span>.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Sun, 3 Dec 2000 16:38:07 -0400
|
|
Subject: Re: another copy_string question
|
|
</pre>
|
|
|
|
<pre class="email-quote">
|
|
>Is there a way to tell, inside the write() sink, if one is in the
|
|
>final hunk? I need to do some special processing at the end.
|
|
</pre>
|
|
|
|
<p>No, there isn't. But you could move such special processing into the
|
|
destructor of the sink. Remember, the sink is passed by reference, so
|
|
you can exactly control its lifetime.
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
{
  MySink sink;
  nsReadingIterator<PRUnichar> sourceStart = aStr.BeginReading();
  nsReadingIterator<PRUnichar> sourceEnd = aStr.EndReading();
  copy_string(sourceStart, sourceEnd, sink);
    // |sink| destructor executed here
}
|
|
</pre>
|
|
</div>
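<p>A sink that defers its end-of-input work to its destructor might look like this sketch
in plain standard C++. <span class="code">LineCountingSink</span>, <span class="code">copy_hunks</span> and
<span class="code">count_lines</span> are hypothetical names; <span class="code">copy_hunks</span> is only a
stand-in that feeds hunks to <span class="code">write</span>, not the real
<span class="code">copy_string</span>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink: receives the text one contiguous hunk at a time.
// It cannot tell which hunk is the last, so the end-of-input work
// (counting a final unterminated line) happens in the destructor.
class LineCountingSink {
public:
    explicit LineCountingSink(std::size_t& result)
        : result_(result), lines_(0), pending_(false) {}

    ~LineCountingSink() {
        if (pending_) ++lines_;  // last line had no trailing newline
        result_ = lines_;
    }

    void write(const char* hunk, std::size_t length) {
        assert(hunk || length == 0);
        for (std::size_t i = 0; i < length; ++i) {
            if (hunk[i] == '\n') { ++lines_; pending_ = false; }
            else pending_ = true;
        }
    }

private:
    std::size_t& result_;
    std::size_t lines_;
    bool pending_;
};

// Feeds every hunk to the sink, which is passed by reference.
template <typename Sink>
void copy_hunks(const std::vector<std::string>& hunks, Sink& sink) {
    for (std::size_t i = 0; i < hunks.size(); ++i)
        sink.write(hunks[i].data(), hunks[i].size());
}

std::size_t count_lines(const std::vector<std::string>& hunks) {
    std::size_t lines = 0;
    {
        LineCountingSink sink(lines);
        copy_hunks(hunks, sink);
    }   // |sink| destructor runs here and stores the total
    return lines;
}
```

<p>The inner braces matter: they fix the point at which the destructor runs, so the result
is complete before it is read.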
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Fri, 15 Dec 2000 20:02:08 -0400
|
|
Subject: fragment of code
|
|
</pre>
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsPromiseFlatString flatKey(aReadable);
|
|
|
|
flatKey.get()
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Tue, 16 Jan 2001 16:47:37 -0400
|
|
Subject: Re: a few string questions...
|
|
</pre>
|
|
|
|
<pre class="email-quote">
>I've accumulated a few questions I've been wanting to ask you, mostly
>about string stuff. Nothing urgent, but I want to ask them before I
>forget. So here goes...:
>
>1) Is it acceptable to use nsLiteralCString or nsLiteralString on
>something that's not a literal? This can be useful in some places,
>for example, to convert a char* to PRUnichar*:
>
>PRUnichar* new = ToNewUnicode(nsLiteralCString(myCharPtr));
</pre>
|
|
|
|
<p>This is explicitly allowed. That's why I'm proposing to change the
|
|
names of those classes to <span class="code">nsLocal[C]String</span>.
|
|
|
|
|
|
<pre class="email-quote">
>2) Should nsString2x.h and nsString2x.cpp go away? They look like a
>never-completed rewrite or something...
</pre>
|
|
|
|
<p>Yes. They should go away. They are uncompleted [old] bullshit,
|
|
exactly as you diagnosed.
|
|
|
|
<p>I'll look into the other two questions.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Thu, 1 Feb 2001 15:12:41 -0400
|
|
Subject: Re: [Fwd: bad string, bad string]
|
|
</pre>
|
|
|
|
<p>We've been removing implicit conversion operators because they
<strong>always</strong> lead to trouble. Usually they make it harder to pick the
right function when overloading is involved. In the past they have
led to huge performance suckage: the implicit operator made us pick the
wrong function, so we ended up doing conversions we didn't need.
|
|
|
|
<p>It's borderline when the class implements something that is <strong>so</strong>
|
|
close, as with a guaranteed flat string or an <span class="code">nsCOMPtr</span> ... but the
|
|
general recommendation is to avoid implicit conversions.
|
|
|
|
<p>See bug #53057.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Tue, 6 Feb 2001 18:52:23 -0400
|
|
Subject: seeking review for bug #57087
|
|
</pre>
|
|
|
|
<p> bug:
|
|
<a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=57087">http://bugzilla.mozilla.org/show_bug.cgi?id=57087</a>
|
|
|
|
patch:
|
|
<a class="exact-uri" href="http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576">http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576</a>
|
|
|
|
<p>This patch is supposed to add the ability to define very long literal
|
|
strings more easily by breaking lines, e.g.,
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_MULTILINE_LITERAL( NS_L("This is the start of a very long line")
|
|
NS_L(" which actually continues across")
|
|
NS_L(" a couple more.") )
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The main danger in this scheme is callers who omit the inner <span class="code">NS_L</span>
|
|
wrapping. Though I believe this will be caught at compile time as the
|
|
wrong type initializer.
|
|
|
|
<p>Seeking input from everybody, and waterson in particular.
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Wed, 14 Feb 2001 16:09:10 -0400
|
|
Subject: Re: Question...
|
|
</pre>
|
|
|
|
<p>There are some utilities in "xpcom/ds/nsReadableUtils.h". In
|
|
particular, if you want to get back a new heap-allocated ASCII string
|
|
with the minimal work, you would say
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
PRUnichar* sourceChars = ...;
|
|
|
|
char* destChars = ToNewCString(nsLiteralString(sourceChars));
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<p>It's more efficient if you happen to already know the length. If you
don't, don't bother counting; that's what I'll do in the constructor
for <span class="code">nsLiteralString</span>. If you do, then call like this
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
destChars = ToNewCString( nsLiteralString(sourceChars, length) );
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Other routines in that file will help you if, for instance, you wanted
|
|
to translate into a buffer you had already allocated.
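<p>The shape of such a routine, and the rule that the result must be released with the
deallocator that matches its allocator, can be sketched in plain standard C++. Here
<span class="code">to_new_cstring</span> is a hypothetical stand-in that uses <span class="code">malloc</span> and
<span class="code">free</span>, and replacing non-ASCII characters with '?' is an assumption of this
sketch, not a statement about what the real utilities do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Returns a new malloc'ed, zero-terminated narrow copy of |length| characters.
// Characters outside ASCII are replaced with '?'. The caller must release the
// result with std::free, the deallocator that matches std::malloc.
char* to_new_cstring(const char16_t* source, std::size_t length) {
    assert(source || length == 0);
    char* result = static_cast<char*>(std::malloc(length + 1));
    if (!result) return 0;
    for (std::size_t i = 0; i < length; ++i)
        result[i] = source[i] < 0x80 ? static_cast<char>(source[i]) : '?';
    result[length] = '\0';
    return result;
}

// Overload for when the length is unknown: scans once for the terminator,
// which is the linear cost the known-length overload avoids.
char* to_new_cstring(const char16_t* source) {
    assert(source);
    std::size_t length = 0;
    while (source[length]) ++length;
    return to_new_cstring(source, length);
}
```

<p>Passing a length also allows converting just a prefix of a longer buffer, since the
routine never looks for a terminator in that case.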
|
|
|
|
<p>Hope this helps,
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Fri, 23 Feb 2001 03:12:58 -0400
|
|
Subject: string snippet
|
|
</pre>
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
nsCString aInput;

nsReadingIterator<char> search_start;
aInput.BeginReading(search_start);

nsReadingIterator<char> search_end;
aInput.EndReading(search_end);

if ( FindCharInReadable(':', search_start, search_end) )
{
  ++search_start;
  return ToNewCString( Substring(aInput, search_start, search_end) );
}
|
|
</pre>
|
|
</div>
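<p>For readers more familiar with the standard library, the same find-then-take-the-rest
pattern can be sketched with <span class="code">std::find</span> over a pair of iterators. The function
name <span class="code">after_first</span> is hypothetical, and <span class="code">std::string</span> stands in for
<span class="code">nsCString</span>.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Returns the text after the first |separator| in |input|, or an empty
// string if the separator does not occur.
std::string after_first(const std::string& input, char separator) {
    std::string::const_iterator search_start = input.begin();
    std::string::const_iterator search_end = input.end();

    // Advance |search_start| to the separator, if there is one.
    search_start = std::find(search_start, search_end, separator);
    if (search_start == search_end) return std::string();

    ++search_start;  // step past the separator itself
    return std::string(search_start, search_end);
}
```

<p>As in the snippet above, the iterator that was used for searching doubles as the start
of the substring once it has been moved past the character that was found.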
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Wed, 7 Mar 2001 19:44:08 -0400
|
|
Subject: string help
|
|
</pre>
|
|
|
|
<p>Here you go, Mike:
|
|
|
|
<a class="exact-uri" href="http://scottcollins.net/journal/discussion/mjudge-scratch.cpp">http://scottcollins.net/journal/discussion/mjudge-scratch.cpp</a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Fri, 9 Mar 2001 20:56:07 -0400
|
|
Subject: Re: string assertions
|
|
</pre>
|
|
|
|
<p>If you get an iterator into a string and you advance it all the way to
|
|
the end of the string, and then <strong>keep</strong> trying to advance it, you hit
|
|
this assert. This could happen, for example, if you tried to copy 10
|
|
characters out of a 9 character string. I've tried to make this
|
|
impossible to get to. As far as I know, all my routines trim requests
|
|
in advance of manipulating iterators. When you see this, you should
|
|
get the stack. That will take you right to the bad spot.
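<p>"Trimming a request in advance" can be sketched like this in plain standard C++. The
function <span class="code">copy_range</span> is hypothetical and <span class="code">std::u16string</span> stands in
for the string classes; the point is only that the count is clamped before any iterator
moves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Copies up to |count| characters starting at |start|. The request is
// trimmed to what the string actually holds, so no iterator is ever
// advanced past the end.
std::u16string copy_range(const std::u16string& source,
                          std::size_t start, std::size_t count) {
    start = std::min(start, source.size());
    count = std::min(count, source.size() - start);

    std::u16string::const_iterator first = source.begin() + start;
    std::u16string::const_iterator last = first + count;
    assert(last <= source.end());  // holds by construction after trimming
    return std::u16string(first, last);
}
```

<p>Asking this routine for 10 characters of a 9 character string simply yields the 9 that
exist.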
|
|
|
|
|
|
|
|
|
|
|
|
<hr>
|
|
<pre>
|
|
Date: Sat, 31 Mar 2001 11:04:03 -0400
|
|
Subject: Re: Sun bustage and string advice
|
|
</pre>
|
|
|
|
<p>You do know you are comparing two pointers now? It seems unlikely
|
|
those two pointers would ever be the same pointer. You probably want
|
|
to say something like
|
|
|
|
<div class="source-code">
|
|
<pre>
|
|
NS_LITERAL_STRING("foo").Equals(aTopic) // or
|
|
|
|
NS_LITERAL_STRING("foo") == nsLiteralString(aTopic)
|
|
</pre>
|
|
</div>
|
|
|
|
<p>...so that you compare the <strong>contents</strong> of two strings. Right now,
|
|
you're just testing to see if two pointers both point to the same
|
|
location in memory. A lot of people make this mistake. I would like
|
|
to make it obvious to people that comparing two pointers does not
|
|
compare strings. Can you tell me what gave you that impression so
|
|
that I can figure out how to better educate people not to do this? By
|
|
the way, it's not that I don't <strong>want</strong> to make this compare two
|
|
strings; it's that in C++, you can't override operations for built-in
|
|
types. And pointers are built-in types. So I can't make
|
|
<span class="code">operator==(const PRUnichar*, const PRUnichar*)</span> do anything different
|
|
than it already does, which is the same thing it does for any other
|
|
pointer.
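<p>The difference is easy to demonstrate in plain standard C++. In this sketch
<span class="code">same_contents</span> and <span class="code">same_pointer</span> are hypothetical helper names, and
<span class="code">char16_t</span> stands in for <span class="code">PRUnichar</span>.

```cpp
#include <cassert>

// Compares the characters of two zero-terminated UTF-16 strings.
bool same_contents(const char16_t* a, const char16_t* b) {
    assert(a && b);
    while (*a && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Compares only the addresses, which is all that == does on bare pointers.
bool same_pointer(const char16_t* a, const char16_t* b) {
    return a == b;
}
```

<p>Two separate buffers that both hold "foo" compare equal by contents but not by pointer,
which is why the comparison has to go through a string object or a helper function.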
|
|
|
|
|
|
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<!-- .................................................................End Matter -->
|
|
|
|
|
|
|
|
</body>
|
|
</html>
|