After some investigation, I was able to find a theoretical codepath
which could lead to the "missing initial frame browsing context" error:
1. Two iframes are created for the same origin, and begin process
switching.
2. The first iframe finishes process switching, but for some reason
(e.g. being in shutdown) the call to `LaunchSubprocessResolve`
errors.
3. The second callback is called and also calls LaunchSubprocessResolve,
which this time returns `true` due to it previously having been
called.
4. The BrowserParent is created in the new content process despite
`InitInternal()` never having been finished, and therefore the
ContentParent never becoming subscribed to the BrowsingContextGroup.
To fix this, I made 2 changes:
1. Abort from process switching if the target process which we're going
to be creating a BrowserParent in `IsDead()`, and
2. Track the return value from `LaunchSubprocessResolve`, so we return
`false` if it is called a second time after a failed content process
launch.
I'm not confident that this is the cause of the crashes, as I was unable
to reproduce the issue.
Differential Revision: https://phabricator.services.mozilla.com/D123548
This will, for example, make it possible to behave differently for a normal navigation,
a reload navigation, a history navigation, and a pushstate navigation.
Differential Revision: https://phabricator.services.mozilla.com/D122532
Looks like the code below does similar thing already for iframes, so this is adding InternalSetRequestedIndex
call only for the top level case.
This is based on the same assumptions as bug 1697905, but unfortunately testing this is still
super hard.
Differential Revision: https://phabricator.services.mozilla.com/D122355
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
To support more cases, change this value to more general name and use a count instead, if the count is larger than zero, then we would not suspend the page.
In addition, this value now can be set in any processes (but still for the top level only), which is different from before where we would only set the value from the chrome process.
Differential Revision: https://phabricator.services.mozilla.com/D119837
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671
This accomplishes 2 things:
1. Allows us to directly fetch the layersId of the process that is
autoscrolling, which avoids having to fetch it in AutoScrollChild and pass it
around. This fixes autoscrolling out-of-process frames with Fission enabled.
2. Makes it easier to handle autoscrolling of in-process documents, since that
can't happen through PBrowser.
Differential Revision: https://phabricator.services.mozilla.com/D120766
This accomplishes 2 things:
1. Allows us to directly fetch the layersId of the process that is
autoscrolling, which avoids having to fetch it in AutoScrollChild and pass it
around. This fixes autoscrolling out-of-process frames with Fission enabled.
2. Makes it easier to handle autoscrolling of in-process documents, since that
can't happen through PBrowser.
Differential Revision: https://phabricator.services.mozilla.com/D120766
Instead of initializing on first use - which can race with workers - do the
initialization in nsLayoutStatics::Initialize(). Also make `sInShutdown` an
Atomic since it is accessed without locks.
Differential Revision: https://phabricator.services.mozilla.com/D121143
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Differential Revision: https://phabricator.services.mozilla.com/D120674
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
1. The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch.
2. Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier.
3. Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement.
4. ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging.
5. Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Differential Revision: https://phabricator.services.mozilla.com/D120673
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Differential Revision: https://phabricator.services.mozilla.com/D120671