* Move most CI to GitHub Actions
* Build sdist
* Build manylinux1 wheels with libyaml ext (also tested with 2010 and 2014)
* Build macOS x86_64 wheels with libyaml ext
* Windows wheel builds remain on AppVeyor until we drop 2.7 support in 6.0
* Smoke tests of all post-build artifacts
* Add PEP517/518 build declaration (pyproject.toml with setuptools backend)
* Fully move build to setuptools
* Drop Python 3.5 support
* Declare Python 3.9 support
* Update PyPI metadata now that setuptools lets it flow through
Co-authored-by: Matt Davis <mrd@redhat.com>
* Prevents arbitrary code execution during python/object/new constructor
In FullLoader, the python/object/new constructor, implemented by
construct_python_object_apply, supports setting the state of a
deserialized instance through the set_python_instance_state method.
After the state is set, some operations are performed on the instance
to complete its initialization. However, an attacker can set the
instance's state in such a way that arbitrary code is executed
by the FullLoader.
This patch tries to block such attacks in FullLoader by preventing
set_python_instance_state from setting arbitrary properties. It
implements a blacklist that includes the `extend` method (called by
construct_python_object_apply) and all special methods (e.g. __set__,
__setitem__, etc.).
Users who need special attributes to be set in the state of a
deserialized object can still do so through the UnsafeLoader, which
however should not be used on untrusted input. Alternatively, they can
subclass FullLoader and redefine `get_state_keys_blacklist()` to
extend/replace the list of blacklisted keys, passing the subclassed
loader to yaml.load.
* Make sure python/object/new constructor does not set some properties
* Add test to show how to subclass FullLoader with new blacklist
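The blacklist idea described above can be sketched without PyYAML. This is a minimal illustration, not the library's code: `StateKeyGuard`, `check_state_key`, `set_instance_state`, `StricterGuard` and `Point` are hypothetical names, and the two patterns are an assumption based on the description (the `extend` method plus all special methods).

```python
import re


class StateKeyGuard:
    """Toy stand-in for a loader that filters state keys (hypothetical)."""

    def get_state_keys_blacklist(self):
        # Assumed patterns: 'extend' plus every dunder name
        # such as __set__ or __setitem__.
        return [r'^extend$', r'^__.*__$']

    def check_state_key(self, key):
        # Reject any key that matches one of the blacklisted patterns.
        pattern = re.compile('(' + '|'.join(self.get_state_keys_blacklist()) + ')')
        if pattern.match(key):
            raise ValueError('blacklisted key %r in instance state' % key)

    def set_instance_state(self, instance, state):
        # Validate every key before it is written to the instance.
        for key, value in state.items():
            self.check_state_key(key)
            setattr(instance, key, value)


class StricterGuard(StateKeyGuard):
    # A subclass can extend the list, here by also blocking private names.
    def get_state_keys_blacklist(self):
        return super().get_state_keys_blacklist() + [r'^_']


class Point:
    pass
```

With this sketch, `StateKeyGuard().set_instance_state(Point(), {'x': 1})` succeeds, while a state containing `__setitem__` or `extend` raises `ValueError`.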
* On load, now use aware datetimes if possible
When loading data, a timestamp with an ISO "+HH:MM" UTC offset is converted to UTC, but the resulting datetime carries no timezone information. This change adds that timezone information to the datetime objects.
Importantly, this addresses a Django warning (and potential error) that appears when using YAML fixtures in a timezone-aware project. It was raised as a Django issue (https://code.djangoproject.com/ticket/18867), but subsequently closed because the Django devs felt that this is a PyYAML problem.
* Create timezone-aware datetime in timezone from data
* Create timezone-aware datetime in timezone from data for python2
* Define better timezone implementation for python2
* Handle timezone "Z" for python 3
* Handle timezone "Z" for python 2
* Fix code structure for Python 3
Call datetime.datetime constructor once at return.
* Fix code structure for Python 2
Call datetime.datetime constructor once at return.
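A rough sketch of the resulting behavior on Python 3, using only the standard library. The `TIMESTAMP` pattern and `parse_timestamp` are hypothetical and much simpler than a full YAML timestamp parser; the point is only that the offset (or "Z") becomes a `tzinfo` and that `datetime.datetime` is constructed once, at return.

```python
import datetime
import re

# Simplified, hypothetical timestamp pattern (no fractional seconds).
TIMESTAMP = re.compile(
    r'^(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)'
    r'[Tt ](?P<hour>\d\d):(?P<minute>\d\d):(?P<second>\d\d)'
    r'(?:(?P<tz>Z)|(?P<sign>[-+])(?P<tz_hour>\d\d)(?::(?P<tz_minute>\d\d))?)?$'
)


def parse_timestamp(text):
    match = TIMESTAMP.match(text)
    if match is None:
        raise ValueError('not a timestamp: %r' % text)
    values = match.groupdict()

    tzinfo = None  # stays None (naive) when the data has no offset
    if values['tz']:
        # "Z" means UTC.
        tzinfo = datetime.timezone.utc
    elif values['sign']:
        delta = datetime.timedelta(
            hours=int(values['tz_hour']),
            minutes=int(values['tz_minute'] or 0),
        )
        if values['sign'] == '-':
            delta = -delta
        tzinfo = datetime.timezone(delta)

    # Single constructor call at return.
    return datetime.datetime(
        int(values['year']), int(values['month']), int(values['day']),
        int(values['hour']), int(values['minute']), int(values['second']),
        tzinfo=tzinfo,
    )
```

For example, `parse_timestamp('2001-12-14T21:59:43+05:30')` yields an aware datetime whose `utcoffset()` is five and a half hours, and it compares equal to 16:29:43 UTC on the same day.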
* remove some unused imports
as suggested by lgtm
https://lgtm.com/projects/g/yaml/pyyaml/
* add back import * from nodes
* remove also sys import
* remove mkpath import
* increase size of index, line, and column fields
* use size_t instead of unsigned long long
* better test infrastructure for test for large file
* only run large file test when env var is set
* fix review comments regarding env vars
* fix missing import on python 3
* force all tests in CI
This patch fixes the handling of inputs with Unicode code points over
0xFFFF when running on a Python 2 that does not have UCS-4 support
(which certain distributions still ship, e.g. macOS).
When Python is compiled without UCS-4 support, it uses UCS-2. In this
situation, non-BMP Unicode characters, which have code points over
0xFFFF, are represented as surrogate pairs. For example, if we take
u'\U0001f3d4', it will be represented as the surrogate pair
u'\ud83c\udfd4'. This can be seen by running, for example:
[i for i in u'\U0001f3d4']
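The mapping from a non-BMP code point to its surrogate pair follows the standard UTF-16 arithmetic. The sketch below reproduces it on Python 3, where strings are always indexed by code point; `to_surrogate_pair` is a hypothetical helper name used only for illustration.

```python
def to_surrogate_pair(char):
    """Return the (high, low) UTF-16 surrogate code units for one character."""
    code_point = ord(char)
    if code_point <= 0xFFFF:
        raise ValueError('BMP characters are not encoded as surrogate pairs')
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair('\U0001f3d4')
assert (high, low) == (0xD83C, 0xDFD4)
```

The result matches the pair u'\ud83c\udfd4' quoted above.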
In PyYAML, the reader uses a function `check_printable` to validate
inputs, making sure that they only contain printable characters. Prior
to this patch, on UCS-2 builds, it incorrectly identified surrogate
pairs as non-printable.
It would be fairly natural to write a regular expression that captures
strings that contain only *printable* characters, as opposed to
*non-printable* characters (as identified by the old code, so not
excluding surrogate pairs):
PRINTABLE = re.compile(u'^[\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]*$')
Adding support for surrogate pairs to this would be straightforward,
adding the option of having a surrogate high followed by a surrogate low
(`[\uD800-\uDBFF][\uDC00-\uDFFF]`):
PRINTABLE = re.compile(u'^(?:[\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$')
Then, this regex could be used as follows:
def check_printable(self, data):
    if not self.PRINTABLE.match(data):
        raise ReaderError(...)
However, matching printable strings, rather than searching for
non-printable characters as the code currently does, would have the
disadvantage of not identifying the culprit character (we wouldn't get
the position and the actual non-printable character from a lack of a
regex match).
Instead, we can modify the NON_PRINTABLE regex to allow legal surrogate
pairs. We do this by removing surrogate pairs from the existing
character set and adding the following options for illegal uses of
surrogate code points:
- Surrogate low that doesn't follow a surrogate high (either a surrogate
low at the start of a string, or a surrogate low that follows a
character that's not a surrogate high):
(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]
- Surrogate high that isn't followed by a surrogate low (either a
surrogate high at the end of a string, or a surrogate high that is
followed by a character that's not a surrogate low):
[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)
The behavior of this modified regex should match the one that is used
when Python is built with UCS-4 support.
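The combined pattern can be tried out on Python 3 by writing surrogates as explicit code units, which simulates what a UCS-2 build sees. This is a reconstruction under assumptions: the first alternative (the printable set with the surrogate range no longer excluded) is inferred from the description above, and `find_non_printable` is a hypothetical helper rather than the reader's actual method.

```python
import re

NON_PRINTABLE = re.compile(
    # Anything outside the printable set; surrogates are no longer
    # rejected here (assumed form of the adjusted character set).
    r'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uFFFD]'
    # Surrogate low that does not follow a surrogate high.
    r'|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]'
    # Surrogate high that is not followed by a surrogate low.
    r'|[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)'
)


def find_non_printable(data):
    """Return (position, matched text) of the first offender, or None."""
    match = NON_PRINTABLE.search(data)
    if match is None:
        return None
    # The matched text may include the neighbouring character that
    # made the surrogate illegal.
    return match.start(), match.group()
```

A legal pair such as `'\ud83c\udfd4'` gives `None`, while a lone `'\ud83c'` or a low surrogate after an ordinary character is reported together with its position, which is the information a plain PRINTABLE match could not provide.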
It helps people to use `safe_load` if they discover the library.
It's more secure if `safe_load()` is used by default, and `load()` is used only when it's necessary (and the developer knows what it does).
* centralized error handling on native commands
* ensure that errors from native commands will fail build
* use image-included Python 3.8
* drop Python 3.4 wheel builds
When someone writes a subclass of the YAMLObject class, the constructors
will now be added to all 3 (non-safe) loaders.
Furthermore, we support the class variable `yaml_loader` being a list,
offering more control of which loaders are affected.
To support safe_load in your custom class you could add this:
yaml_loader = yaml.SafeLoader
or, to keep the default loaders as well:
yaml_loader = yaml.YAMLObject.yaml_loader
yaml_loader.append(yaml.SafeLoader)
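How a `yaml_loader` that is either a single loader or a list might be handled can be sketched with a toy metaclass. All class names here (`ToyBaseLoader`, `ToyObjectMeta`, `ToyObject`, and so on) are hypothetical stand-ins, and the registration logic is an assumption about the general approach, not a copy of the library's implementation.

```python
class ToyBaseLoader:
    """Hypothetical loader base: keeps a per-class tag -> constructor map."""
    constructors = {}

    @classmethod
    def add_constructor(cls, tag, constructor):
        # Copy on first write so sibling loaders do not share one dict.
        if 'constructors' not in cls.__dict__:
            cls.constructors = dict(cls.constructors)
        cls.constructors[tag] = constructor


class ToySafeLoader(ToyBaseLoader):
    pass


class ToyLoader(ToyBaseLoader):
    pass


class ToyFullLoader(ToyBaseLoader):
    pass


class ToyUnsafeLoader(ToyBaseLoader):
    pass


class ToyObjectMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        tag = namespace.get('yaml_tag')
        if tag is None:
            return
        # Accept either one loader class or a list of them.
        loaders = cls.yaml_loader
        if not isinstance(loaders, list):
            loaders = [loaders]
        for loader in loaders:
            loader.add_constructor(tag, cls.from_yaml)


class ToyObject(metaclass=ToyObjectMeta):
    # Default: all three non-safe loaders.
    yaml_loader = [ToyLoader, ToyFullLoader, ToyUnsafeLoader]
    yaml_tag = None

    @classmethod
    def from_yaml(cls, loader, node):
        return cls()


class Monster(ToyObject):
    yaml_tag = '!Monster'


class SafeMonster(ToyObject):
    yaml_tag = '!SafeMonster'
    yaml_loader = ToySafeLoader
```

In this sketch `Monster` is registered on the three non-safe toy loaders only, while `SafeMonster` is registered on `ToySafeLoader` only.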
* Fix logic for quoting special characters
* Remove has_ucs4 from condition
on systems with `sys.maxunicode <= 0xffff` the comparison
(u'\U00010000' <= ch < u'\U0010ffff') can't be true anyway I think
* builds Windows wheels against a specified libyaml repo/refspec for many Python versions
* since we don't have multiple Appveyor workers, it's faster/more convenient to run them serially
* not all paths sufficient for general CI usage yet; still needs manual inspection/testing of output
* various hacks to quiet warning noise during build on old Pythons