// gecko-dev/parser/html/nsHtml5SpeculativeLoad.h

/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

#ifndef nsHtml5SpeculativeLoad_h
#define nsHtml5SpeculativeLoad_h

#include "nsString.h"
#include "nsContentUtils.h"
#include "nsHtml5DocumentMode.h"
#include "nsHtml5String.h"
#include "ReferrerInfo.h"

class nsHtml5TreeOpExecutor;

enum eHtml5SpeculativeLoad {
  eSpeculativeLoadUninitialized,
  eSpeculativeLoadBase,
  eSpeculativeLoadCSP,
  eSpeculativeLoadMetaReferrer,
  eSpeculativeLoadImage,
  eSpeculativeLoadOpenPicture,
  eSpeculativeLoadEndPicture,
  eSpeculativeLoadPictureSource,
  eSpeculativeLoadScript,
  eSpeculativeLoadScriptFromHead,
  eSpeculativeLoadNoModuleScript,
  eSpeculativeLoadNoModuleScriptFromHead,
  eSpeculativeLoadStyle,
  eSpeculativeLoadManifest,
  eSpeculativeLoadSetDocumentCharset,
  eSpeculativeLoadSetDocumentMode,
  eSpeculativeLoadPreconnect,
  eSpeculativeLoadFont,
  eSpeculativeLoadFetch,
  eSpeculativeLoadMaybeComplainAboutCharset
};

class nsHtml5SpeculativeLoad {
  using Encoding = mozilla::Encoding;
  template <typename T>
  using NotNull = mozilla::NotNull<T>;

 public:
  nsHtml5SpeculativeLoad();
  ~nsHtml5SpeculativeLoad();

  inline void InitBase(nsHtml5String aUrl) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadBase;
    aUrl.ToString(mUrlOrSizes);
  }

  inline void InitMetaCSP(nsHtml5String aCSP) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadCSP;
    nsString csp;  // Not Auto, because using it to hold nsStringBuffer*
    aCSP.ToString(csp);
    mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity.Assign(
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(csp));
  }
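`InitMetaCSP`, like several other `Init*` methods here, trims attribute values with `nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>`. As a rough, Gecko-independent sketch, assuming that predicate matches HTML's ASCII whitespace (TAB, LF, FF, CR and SPACE), the same leading/trailing trim over a `std::u16string` could look like this. `IsHtmlWhitespace` and `TrimHtmlWhitespace` are hypothetical names, not Gecko APIs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: assumes HTML "ASCII whitespace" is
// TAB, LF, FF, CR and SPACE.
inline bool IsHtmlWhitespace(char16_t aChar) {
  return aChar == u'\t' || aChar == u'\n' || aChar == u'\f' ||
         aChar == u'\r' || aChar == u' ';
}

// Hypothetical helper: returns aValue without leading or trailing
// HTML whitespace; interior whitespace is left untouched.
inline std::u16string TrimHtmlWhitespace(const std::u16string& aValue) {
  std::size_t start = 0;
  std::size_t end = aValue.size();
  while (start < end && IsHtmlWhitespace(aValue[start])) {
    ++start;
  }
  while (end > start && IsHtmlWhitespace(aValue[end - 1])) {
    --end;
  }
  return aValue.substr(start, end - start);
}
```

Under this assumption, U+00A0 NO-BREAK SPACE is not HTML whitespace, so it survives the trim.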

  inline void InitMetaReferrerPolicy(nsHtml5String aReferrerPolicy) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadMetaReferrer;
    nsString
        referrerPolicy;  // Not Auto, because using it to hold nsStringBuffer*
    aReferrerPolicy.ToString(referrerPolicy);
    mReferrerPolicyOrIntegrity.Assign(
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(
            referrerPolicy));
  }

  inline void InitImage(nsHtml5String aUrl, nsHtml5String aCrossOrigin,
                        nsHtml5String aMedia, nsHtml5String aReferrerPolicy,
                        nsHtml5String aSrcset, nsHtml5String aSizes,
                        bool aLinkPreload) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadImage;
    aUrl.ToString(mUrlOrSizes);
    aCrossOrigin.ToString(mCrossOrigin);
    aMedia.ToString(mMedia);
    nsString
        referrerPolicy;  // Not Auto, because using it to hold nsStringBuffer*
    aReferrerPolicy.ToString(referrerPolicy);
    mReferrerPolicyOrIntegrity.Assign(
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(
            referrerPolicy));
    aSrcset.ToString(mCharsetOrSrcset);
    aSizes.ToString(
        mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity);
    mIsLinkPreload = aLinkPreload;
    mInitTimestamp = mozilla::TimeStamp::Now();
  }

  inline void InitFont(nsHtml5String aUrl, nsHtml5String aCrossOrigin,
                       nsHtml5String aMedia, nsHtml5String aReferrerPolicy) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadFont;
    aUrl.ToString(mUrlOrSizes);
    aCrossOrigin.ToString(mCrossOrigin);
    aMedia.ToString(mMedia);
    nsString
        referrerPolicy;  // Not Auto, because using it to hold nsStringBuffer*
    aReferrerPolicy.ToString(referrerPolicy);
    mReferrerPolicyOrIntegrity.Assign(
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(
            referrerPolicy));
    // This can only be triggered by <link rel=preload as=font>.
    mIsLinkPreload = true;
  }

  inline void InitFetch(nsHtml5String aUrl, nsHtml5String aCrossOrigin,
                        nsHtml5String aMedia, nsHtml5String aReferrerPolicy) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadFetch;
    aUrl.ToString(mUrlOrSizes);
    aCrossOrigin.ToString(mCrossOrigin);
    aMedia.ToString(mMedia);
    nsString
        referrerPolicy;  // Not Auto, because using it to hold nsStringBuffer*
    aReferrerPolicy.ToString(referrerPolicy);
    mReferrerPolicyOrIntegrity.Assign(
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(
            referrerPolicy));
    // This method can only be triggered by <link rel=preload as=fetch>,
    // hence this operation is always a preload.
    mIsLinkPreload = true;
  }

  // <picture> elements have multiple <source> nodes followed by an <img>,
  // where we use the first valid source, which may be the img. Because we
  // can't determine validity at this point without parsing CSS and getting
  // main thread state, we push preload operations when a picture is opened
  // and when it is closed, so that the target of the preload ops can
  // determine which picture, and at which nesting level, each source/img
  // from the main preloading code belongs to.
  inline void InitOpenPicture() {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadOpenPicture;
  }

  inline void InitEndPicture() {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadEndPicture;
  }
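The picture comment only says that the consumer of these ops can recover picture membership and nesting depth from the bracketing open/end ops. A minimal sketch of such a consumer, over a simplified op list, keeps a stack of open pictures. `OpKind`, `Op`, `Placement` and `AnnotateNesting` are hypothetical illustrative names; the real handling behind `Perform()` is not shown in this header and may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified op kinds mirroring the picture-related op codes.
enum class OpKind { OpenPicture, EndPicture, PictureSource, Image };

struct Op {
  OpKind kind;
  std::string url;  // srcset/src value; empty for OpenPicture/EndPicture
};

struct Placement {
  std::string url;
  std::size_t pictureId;  // 0 means "not inside any <picture>"
  std::size_t depth;      // number of enclosing open <picture> ops
};

// Walks the ops in queue order, keeping a stack of open picture ids so that
// each source/img is attributed to its innermost enclosing picture.
std::vector<Placement> AnnotateNesting(const std::vector<Op>& aOps) {
  std::vector<Placement> result;
  std::vector<std::size_t> openPictures;
  std::size_t nextId = 1;
  for (const Op& op : aOps) {
    switch (op.kind) {
      case OpKind::OpenPicture:
        openPictures.push_back(nextId++);
        break;
      case OpKind::EndPicture:
        if (!openPictures.empty()) {
          openPictures.pop_back();
        }
        break;
      case OpKind::PictureSource:
      case OpKind::Image:
        result.push_back({op.url,
                          openPictures.empty() ? 0 : openPictures.back(),
                          openPictures.size()});
        break;
    }
  }
  return result;
}
```

Choosing the winning candidate within each picture still needs main-thread state (CSS media evaluation, type support), which is why, per the comment, the parser side only records this structure.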

  inline void InitPictureSource(nsHtml5String aSrcset, nsHtml5String aSizes,
                                nsHtml5String aType, nsHtml5String aMedia) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadPictureSource;
    aSrcset.ToString(mCharsetOrSrcset);
    aSizes.ToString(mUrlOrSizes);
    aType.ToString(
        mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity);
    aMedia.ToString(mMedia);
  }

  inline void InitScript(nsHtml5String aUrl, nsHtml5String aCharset,
                         nsHtml5String aType, nsHtml5String aCrossOrigin,
                         nsHtml5String aMedia, nsHtml5String aIntegrity,
                         nsHtml5String aReferrerPolicy, bool aParserInHead,
                         bool aAsync, bool aDefer, bool aNoModule,
                         bool aLinkPreload) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    if (aNoModule) {
      mOpCode = aParserInHead ? eSpeculativeLoadNoModuleScriptFromHead
                              : eSpeculativeLoadNoModuleScript;
    } else {
      mOpCode = aParserInHead ? eSpeculativeLoadScriptFromHead
                              : eSpeculativeLoadScript;
    }
    aUrl.ToString(mUrlOrSizes);
    aCharset.ToString(mCharsetOrSrcset);
    aType.ToString(
        mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity);
    aCrossOrigin.ToString(mCrossOrigin);
    aMedia.ToString(mMedia);
    aIntegrity.ToString(mReferrerPolicyOrIntegrity);
    nsAutoString referrerPolicy;
    aReferrerPolicy.ToString(referrerPolicy);
    referrerPolicy =
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(
            referrerPolicy);
    mScriptReferrerPolicy =
        mozilla::dom::ReferrerInfo::ReferrerPolicyAttributeFromString(
            referrerPolicy);
    mIsAsync = aAsync;
    mIsDefer = aDefer;
    mIsLinkPreload = aLinkPreload;
  }

  inline void InitImportStyle(nsString&& aUrl) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadStyle;
    mUrlOrSizes = std::move(aUrl);
    mCharsetOrSrcset.SetIsVoid(true);
    mCrossOrigin.SetIsVoid(true);
    mMedia.SetIsVoid(true);
    mReferrerPolicyOrIntegrity.SetIsVoid(true);
    mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity.SetIsVoid(
        true);
  }

  inline void InitStyle(nsHtml5String aUrl, nsHtml5String aCharset,
                        nsHtml5String aCrossOrigin, nsHtml5String aMedia,
                        nsHtml5String aReferrerPolicy, nsHtml5String aIntegrity,
                        bool aLinkPreload) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadStyle;
    aUrl.ToString(mUrlOrSizes);
    aCharset.ToString(mCharsetOrSrcset);
    aCrossOrigin.ToString(mCrossOrigin);
    aMedia.ToString(mMedia);
    nsString
        referrerPolicy;  // Not Auto, because using it to hold nsStringBuffer*
    aReferrerPolicy.ToString(referrerPolicy);
    mReferrerPolicyOrIntegrity.Assign(
        nsContentUtils::TrimWhitespace<nsContentUtils::IsHTMLWhitespace>(
            referrerPolicy));
    aIntegrity.ToString(
        mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity);
    mIsLinkPreload = aLinkPreload;
  }

  /**
   * "Speculative" manifest loads aren't truly speculative: if a manifest
   * gets loaded, we are committed to it. There can never be a <script>
   * before the manifest, so the situation of having to undo a manifest due
   * to document.write() never arises. The reason why a parser
   * thread-discovered manifest gets loaded via the speculative load queue
   * as opposed to the tree operation queue is that the manifest must get
   * processed before any actual speculative loads such as scripts. Thus,
   * manifests seen by the parser thread have to maintain the queue order
   * relative to true speculative loads. See bug 541079.
   */
  inline void InitManifest(nsHtml5String aUrl) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadManifest;
    aUrl.ToString(mUrlOrSizes);
  }

  /**
   * We communicate the encoding change via the speculative operation
   * queue in order to act upon it as soon as possible, and so that
   * speculative loads generated after an encoding change make use of the
   * new encoding.
   */
  inline void InitSetDocumentCharset(NotNull<const Encoding*> aEncoding,
                                     int32_t aCharsetSource,
                                     bool aCommitEncodingSpeculation) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadSetDocumentCharset;
    mCharsetOrSrcset.~nsString();
    mEncoding = aEncoding;
    mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity.Assign(
        (char16_t)aCharsetSource);
    mCommitEncodingSpeculation = aCommitEncodingSpeculation;
  }
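`InitSetDocumentCharset` (and `InitMaybeComplainAboutCharset`) end the lifetime of `mCharsetOrSrcset` with an explicit destructor call before using the other member of the anonymous union declared further down (`mEncoding`). A union containing a member with a non-trivial destructor does not manage that member's lifetime by itself, so the owning class has to track which member is active. The sketch below shows that pattern with `std::string` standing in for `nsString`; `CharsetOrEncoding`, `FakeEncoding` and the `mHoldsEncoding` tag are hypothetical, and it assumes (rather than shows) that the real destructor likewise skips destroying the string for these op codes, using `mOpCode` as the tag.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an encoding object; only its address matters.
struct FakeEncoding {
  const char* name;
};

// Sketch: a union shares storage between a non-trivial string and a raw
// pointer, and a tag records which member is currently alive.
class CharsetOrEncoding {
 public:
  // The string member starts out active, so construct it explicitly.
  CharsetOrEncoding() : mCharset() {}

  ~CharsetOrEncoding() {
    // Only the string member has a destructor that must run.
    if (!mHoldsEncoding) {
      mCharset.~basic_string();
    }
  }

  CharsetOrEncoding(const CharsetOrEncoding&) = delete;
  CharsetOrEncoding& operator=(const CharsetOrEncoding&) = delete;

  // Precondition (not enforced in this sketch): the string is still active.
  void SetCharset(std::string aCharset) { mCharset = std::move(aCharset); }

  void SetEncoding(const FakeEncoding* aEncoding) {
    if (!mHoldsEncoding) {
      mCharset.~basic_string();  // end the string's lifetime first
      mHoldsEncoding = true;
    }
    mEncoding = aEncoding;  // the pointer member is now the active one
  }

  bool HoldsEncoding() const { return mHoldsEncoding; }
  const FakeEncoding* GetEncoding() const {
    return mHoldsEncoding ? mEncoding : nullptr;
  }
  // Only meaningful while !HoldsEncoding().
  const std::string& Charset() const { return mCharset; }

 private:
  bool mHoldsEncoding = false;
  union {
    std::string mCharset;
    const FakeEncoding* mEncoding;
  };
};
```

The class deletes copying for the same reason `nsHtml5SpeculativeLoad` does: a compiler-generated copy could not know which union member to copy.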

  inline void InitMaybeComplainAboutCharset(const char* aMsgId, bool aError,
                                            int32_t aLineNumber) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadMaybeComplainAboutCharset;
    mCharsetOrSrcset.~nsString();
    mMsgId = aMsgId;
    mIsError = aError;
    // Transport a 32-bit integer as two 16-bit code units of a string
    // in order to avoid adding an integer field to the object.
    // See https://bugzilla.mozilla.org/show_bug.cgi?id=1733043 for a better
    // eventual approach.
    char16_t high = (char16_t)(((uint32_t)aLineNumber) >> 16);
    char16_t low = (char16_t)(((uint32_t)aLineNumber) & 0xFFFF);
    mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity.Assign(high);
    mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity.Append(low);
  }
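`InitMaybeComplainAboutCharset` carries the 32-bit line number through a string field as two 16-bit code units, high half first. A standalone sketch of that encoding and its inverse, using `std::u16string` in place of `nsString`, follows. `PackLineNumber` and `UnpackLineNumber` are hypothetical helpers; the decoding side in Gecko is not part of this header.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: splits a 32-bit line number into two UTF-16 code
// units, high half first, the same way InitMaybeComplainAboutCharset does.
inline std::u16string PackLineNumber(int32_t aLineNumber) {
  uint32_t bits = static_cast<uint32_t>(aLineNumber);
  std::u16string packed;
  packed.push_back(static_cast<char16_t>(bits >> 16));
  packed.push_back(static_cast<char16_t>(bits & 0xFFFF));
  return packed;
}

// Hypothetical inverse: reassembles the 32-bit value from the two units.
inline int32_t UnpackLineNumber(const std::u16string& aPacked) {
  uint32_t bits = (static_cast<uint32_t>(aPacked[0]) << 16) |
                  static_cast<uint32_t>(aPacked[1]);
  return static_cast<int32_t>(bits);
}
```

The code units may be arbitrary 16-bit values, including lone surrogates, which is harmless as long as the string is treated as an opaque carrier and never as text. Round-tripping negative values relies on the `uint32_t` to `int32_t` conversion wrapping, which C++20 guarantees and mainstream compilers also do in earlier modes.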

  /**
   * Speculative document mode setting isn't really speculative. Once it
   * happens, we are committed to it. However, this information needs to
   * travel in the speculative load queue so that it is available before
   * the speculatively loaded style sheets are parsed.
   */
  inline void InitSetDocumentMode(nsHtml5DocumentMode aMode) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadSetDocumentMode;
    mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity.Assign(
        (char16_t)aMode);
  }

  inline void InitPreconnect(nsHtml5String aUrl, nsHtml5String aCrossOrigin) {
    MOZ_ASSERT(mOpCode == eSpeculativeLoadUninitialized,
               "Trying to reinitialize a speculative load!");
    mOpCode = eSpeculativeLoadPreconnect;
    aUrl.ToString(mUrlOrSizes);
    aCrossOrigin.ToString(mCrossOrigin);
  }

  void Perform(nsHtml5TreeOpExecutor* aExecutor);

 private:
  nsHtml5SpeculativeLoad(const nsHtml5SpeculativeLoad&) = delete;
  nsHtml5SpeculativeLoad& operator=(const nsHtml5SpeculativeLoad&) = delete;

  eHtml5SpeculativeLoad mOpCode;

  /**
   * Whether the referring element has the async attribute.
   */
  bool mIsAsync;

  /**
   * Whether the referring element has the defer attribute.
   */
  bool mIsDefer;

  /**
   * True if and only if this is a speculative load initiated by encountering
   * a <link rel="preload"> tag. Passed to the handling loader as an
   * indication to raise the priority.
   */
  bool mIsLinkPreload;

  /**
   * Whether the charset complaint is an error.
   */
  bool mIsError;

  /**
   * Whether setting the document encoding also involves committing to an
   * encoding speculation.
   */
  bool mCommitEncodingSpeculation;

  /**
   * If mOpCode is eSpeculativeLoadPictureSource, this is the value of the
   * "sizes" attribute, or a void string if the attribute is not set. For
   * other op codes, it is empty or the URL.
   */
  nsString mUrlOrSizes;

  /**
   * If mOpCode is eSpeculativeLoad[NoModule]Script[FromHead], this is the
   * value of the "integrity" attribute, or a void string if the attribute is
   * not set. For other op codes, it is empty or the referrer policy.
   */
  nsString mReferrerPolicyOrIntegrity;

  /**
   * If mOpCode is eSpeculativeLoadStyle or
   * eSpeculativeLoad[NoModule]Script[FromHead], this is the value of the
   * "charset" attribute. For eSpeculativeLoadSetDocumentCharset, the union
   * member mEncoding is used instead and holds the encoding that the
   * document's charset is being set to. If mOpCode is eSpeculativeLoadImage
   * or eSpeculativeLoadPictureSource, this is the value of the "srcset"
   * attribute. If the attribute is not set, this will be a void string.
   * Otherwise it's empty.
* For eSpeculativeLoadMaybeComplainAboutCharset mMsgId is used.
*/
union {
nsString mCharsetOrSrcset;
const Encoding* mEncoding;
const char* mMsgId;
};
/**
* If mOpCode is eSpeculativeLoadSetDocumentCharset, this is a
* one-character string whose single character's code point is to be
* interpreted as a charset source integer. If mOpCode is
* eSpeculativeLoadSetDocumentMode, this is a one-character string whose
* single character's code point is to be interpreted as an
* nsHtml5DocumentMode. If mOpCode is eSpeculativeLoadCSP, this is a meta
* element's CSP value. If mOpCode is eSpeculativeLoadImage, this is the
* value of the "sizes" attribute. If the attribute is not set, this will
* be a void string. If mOpCode is eSpeculativeLoadStyle, this
* is the value of the "integrity" attribute. If the attribute is not set,
* this will be a void string. Otherwise, it is empty or holds the value of
* the "type" attribute (or of the referrer policy).
*/
nsString mTypeOrCharsetSourceOrDocumentModeOrMetaCSPOrSizesOrIntegrity;
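// Illustrative sketch only (hypothetical local names, not copied from this
// class's initializers): one way to pack a small integer, such as a charset
// source or an nsHtml5DocumentMode, into the one-character string described
// above and read it back, using nsString's Assign(char16_t) and First():
//
//   nsString packed;
//   packed.Assign(char16_t(value));              // code point == value
//   int32_t unpacked = int32_t(packed.First());  // recover the integer
//
// This assumes the value fits in 16 bits and that the string is non-empty
// when First() is called.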
/**
* If mOpCode is eSpeculativeLoadImage or eSpeculativeLoadScript[FromHead]
* or eSpeculativeLoadPreconnect or eSpeculativeLoadStyle, this is the value of
* the "crossorigin" attribute. If the attribute is not set, this will be a
* void string.
*/
nsString mCrossOrigin;
/**
* If mOpCode is eSpeculativeLoadPictureSource or eSpeculativeLoadStyle or
* Fetch or Image or Media or Script, this is the value of the relevant "media"
* attribute of the <link rel="preload"> or <link rel="stylesheet">. If the
* attribute is not set, or the preload didn't originate from a <link>, this
* will be a void string.
*/
nsString mMedia;
/**
* If mOpCode is eSpeculativeLoadScript[FromHead], this represents the value
* of the "referrerpolicy" attribute. This field holds one of the
* mozilla::dom::ReferrerPolicy enum values.
*/
mozilla::dom::ReferrerPolicy mScriptReferrerPolicy;
mozilla::TimeStamp mInitTimestamp;
};
#endif // nsHtml5SpeculativeLoad_h