AnimatedFrameDiscardingQueue subclasses AnimationFrameBuffer to allow a
cleaner abstraction over the behaviour change when we cross the
threshold of too high a memory footprint for an animated image. The next
patch will build on top of this to provide an abstraction to reuse the
discarded frames.
Differential Revision: https://phabricator.services.mozilla.com/D7515
This patch makes AnimationSurfaceProvider use the new abstractions
provided by AnimationFrameBuffer and AnimationFrameRetainedBuffer to
provide storage and lifetime management of decoders and the produced
frames. We initially start out with an implementation that will just
keep every frame forever, like our historical behaviour. The next patch
will add support for discarding.
Differential Revision: https://phabricator.services.mozilla.com/D7514
In the next few patches, AnimationFrameBuffer will be reworked to split
the discarding behaviour from the retaining behaviour. Once implemented
as separate classes, it will allow easier reuse of the discarding code
for recycling.
Differential Revision: https://phabricator.services.mozilla.com/D7513
The owner for the decoder may implement IDecoderFrameRecycler to allow
the decoder to request a recycled frame instead of allocating a new one.
If none are available, it will fallback to allocating a new frame.
Not only may the IDecoderFrameRecycler not have any frames available for
recycling, the recycled frame itself may still be in use by other
entities outside of imagelib. Additionally it may still be required by
BlendAnimationFilter to restore the previous frame's data. It may even
be the same frame as to restore. In the worst case, we will simply
choose to allocate an entirely new frame, just like before.
When we allocate a new frame, that means the old frame we tried to
recycle will be taken out of circulation and not reused again,
regardless of why it failed.
Differential Revision: https://phabricator.services.mozilla.com/D7512
Beyond the necessary reinitialization methods, we need to protect
ourselves from recycling a frame that some other entity in the browser
is still using. Generally speaking the animated surface will only be
used in imgFrame::Draw since we don't layerize animated images, which
will be safe. However with OMTP or blob recordings, we could retain a
reference to the surface outside the current stack context. Additional
if something calls RasterImage::GetImageContainer(AtSize) or
RasterImage::GetFrame(AtSize), it may also have a reference to the
surface for an indetermine period of time.
As such, if an imgFrame is a candidate for recycling, it will wrap
imgFrame::mLockedSurface in a RecyclingSourceSurface. Its job is to
track how many consumers there are still of the surface, so that after
we advance the animation, the decoder will know if there are still
outstanding consumers.
If the surface is still in use, it will block for a finite period of
time (the refresh interval) before giving up on reclaiming the surface,
and will allocate a new surface. The old surface can then remain in
circulation for as long as necessary without further blocking the
animation progression, since we stop recycling that surface/imgFrame.
Differential Revision: https://phabricator.services.mozilla.com/D7511
Since imgFrame::Draw will limit the drawing to only look at pixels that
we have written to and posted an invalidation for, there is no need to
hold the monitor while doing so. By taking the most expensive operation
outside the lock, we will minimize our chances of hitting contention
with the decoder thread.
A later part in this series will require that a surface be freed outside
the lock because it may end up reacquiring it. In addition to the
contention win, this change should facilitate that.
Differential Revision: https://phabricator.services.mozilla.com/D7510
If we discard a frame during decoding, e.g. due to an error, then we
don't want to take that frame into account for the first frame refresh
area. We should also be handling partial frames here (the dirty rect
needs to encompass the rows that did not get written with actual pixel
data). The only place that can have the necessary information is at the
end rather than at the beginning.
Differential Revision: https://phabricator.services.mozilla.com/D7509
The clear rect and the recycle rect can overlap, and depending on the
size of the clear rect, it could be a significant amount of data that
needs to be copied from the restore frame. This patch minimizes the
copying for a row which contains both the recycle rect and the clear
rect.
Differential Revision: https://phabricator.services.mozilla.com/D7508
Given an invalidation rect, called the recycle rect, which represents
the area which may have changed between the current frame and the frame
we are recycling, we can not only reuse the buffer itself to avoid an
allocation and free, we can also avoid copying pixel data from the
restore frame which is already set.
Differential Revision: https://phabricator.services.mozilla.com/D7507
When blending full frames off the main thread, FrameAnimator no longer
requires access to the raw data of the frame to advance the animation.
Now we only request a RawAccessFrameRef for the current/next frames when
we have discovered that we need to do blending on the main thread.
In addition to avoiding the mutex overhead of RawAccessFrameRef, this
will also facilitate potentially optimizing the surfaces for the
DrawTarget for individual animated image frames.
Differential Revision: https://phabricator.services.mozilla.com/D7506
We were marking them used even if only a decode was requested.
This can cause us to hold extra decoded copies of the image around because we have a tendency to request decode at the intrinsic size.
Calls to do_QueryInterface to a base class can be replaced by a static
cast, which is faster.
Differential Revision: https://phabricator.services.mozilla.com/D7224
--HG--
extra : moz-landing-system : lando
If class A is derived from class B, then an instance of class A can be
converted to B via a static cast, so a slower QI is not needed.
Differential Revision: https://phabricator.services.mozilla.com/D6861
--HG--
extra : moz-landing-system : lando
The lack of clarity over which functions initiate observing and which don't
is a headache since it makes it hard to reason about what's going on. This
rename makes it explicit in the function names.
Differential Revision: https://phabricator.services.mozilla.com/D7187
--HG--
extra : rebase_source : 1f2f86423a9bee7843533c09b3ea78416b233bcd
extra : amend_source : a89125d6a3b7b75a4056c4d600de74a5386ac4ff
If we do not pass the high quality scaling flag than the resulting surface will be marked as cannot substitute, which is not accurate, so we don't want.
The only place that actually tries to be smart about the size is nsImageFrame::MaybeDecodeForPredictedSize. All other cases just ask for the intrinsic size.
The two most likely cases are that there are no decoded copies of the image, or there is one decoded (or in progress) copy of the image.
In the first case we will request decode at the instrinsic size, and then if we draw at a different size that draw will request the proper size. This doesn't change with this patch.
In the second case there is a decoded copy already available, this is likely from a draw call on the image, and that is the surface size that we want. So we save a decode. If we are actually drawing the image at two different sizes the second size will be slightly delayed, but we have the wrongly sized copy of the image that we can draw until then. This seems like a good tradeoff to avoid always decoding an instrinic size copy of images.
By delegating responsibility for shared surfaces reporting to imagelib,
we can cross reference the GPU shared surfaces cache with the local
surface cache in a content process (or the main process). This will
allow us to identify entries that are in the GPU cache but not in the
content/main process cache, and aid in debugging memory leaks. This
functionality is pref'd off by default behind image.mem.debug-reporting.
Additionally, we want to report every entry that was mapped into the
compositor process, in the compositor process memory report. This will
give us a sense of how much of our resident memory is consumed by mapped
images in absence of the more detailed cross referencing above.
At present, surface providers roll up all of their individual surfaces
into a single reporting unit. Specifically this means animated image
frames are all reported as a block. This patch removes that
consolidation and reports every frame as its own SurfaceMemoryReport.
This is important because each frame may have its own external image ID,
and we want to cross reference that with what we expect from the GPU
shared surfaces cache.
By delegating responsibility for shared surfaces reporting to imagelib,
we can cross reference the GPU shared surfaces cache with the local
surface cache in a content process (or the main process). This will
allow us to identify entries that are in the GPU cache but not in the
content/main process cache, and aid in debugging memory leaks. This
functionality is pref'd off by default behind image.mem.debug-reporting.
Additionally, we want to report every entry that was mapped into the
compositor process, in the compositor process memory report. This will
give us a sense of how much of our resident memory is consumed by mapped
images in absence of the more detailed cross referencing above.
At present, surface providers roll up all of their individual surfaces
into a single reporting unit. Specifically this means animated image
frames are all reported as a block. This patch removes that
consolidation and reports every frame as its own SurfaceMemoryReport.
This is important because each frame may have its own external image ID,
and we want to cross reference that with what we expect from the GPU
shared surfaces cache.
If FLAG_HIGH_QUALITY_SCALING is used, we should use
SurfaceCache::LookupBestMatch just like how it is done in RasterImage.
This may provide an alternative size at which we should rasterize the
SVG instead of the requested size. Since SurfaceCache imposes a maximum
size for which it will permit rasterized SVGs, we should also bypass the
cache entirely if we are well above that and simply draw directly to the
draw target in such cases.
With WebRender, it is somewhat more complicated. We will now return
NOT_SUPPORTED if the size is too big, and this should trigger fallback
to blob images. This should only produce drawing commands for the
relevant region and save us the high cost of rasterized a very large
surface on the main thread, which at the same time, looking as crisp as
a user would expect.
There is one main difference between raster images and vector images
with respect to factor of 2 scaling. Vector images may be scaled
infinitely and so we need to extend factor of 2 scaling to permit
growing instead of just shrinking. Also, we don't want to scale
infinitely, so we should configure a maximum size limit. This size limit
will apply even outside of factor of 2 scaling, and so the caller
(VectorImage) will need to be careful to take this into account.
If FLAG_HIGH_QUALITY_SCALING is used, we should use
SurfaceCache::LookupBestMatch just like how it is done in RasterImage.
This may provide an alternative size at which we should rasterize the
SVG instead of the requested size. Since SurfaceCache imposes a maximum
size for which it will permit rasterized SVGs, we should also bypass the
cache entirely if we are well above that and simply draw directly to the
draw target in such cases.
With WebRender, it is somewhat more complicated. We will now return
NOT_SUPPORTED if the size is too big, and this should trigger fallback
to blob images. This should only produce drawing commands for the
relevant region and save us the high cost of rasterized a very large
surface on the main thread, which at the same time, looking as crisp as
a user would expect.
There is one main difference between raster images and vector images
with respect to factor of 2 scaling. Vector images may be scaled
infinitely and so we need to extend factor of 2 scaling to permit
growing instead of just shrinking. Also, we don't want to scale
infinitely, so we should configure a maximum size limit. This size limit
will apply even outside of factor of 2 scaling, and so the caller
(VectorImage) will need to be careful to take this into account.
DecoderFlags::BLEND_ANIMATION will cause the decoder to inject the
BlendAnimationFilter from the previous patch into the SurfacePipe filter
chain. All frames produced by this decoder will be complete, and
should be equivalent to the result outputted by FrameAnimator.
This new SurfaceFilter can be added to a SurfacePipe to perform the
blending of a previous frame with the current partial frame, for an
animated image. This functionality is currently provided by
FrameAnimator and must be performed each time we want to advance the
displayed frame, all on the main thread. Moving this to SurfacePipe
allows us to do the same operation once per frame decode, and on a
decoder thread.
This should reduce the cost of a refresh tick since advancing animated
images is reduced to merely checking if the frame is available. Also, if
the image is below the discard frames threshold (to save memory), then
we will also save CPU due to only blending once at decode.