Commit Graph

295 Commits

Author SHA1 Message Date
Thomas Smith
23120f886b Update tox.ini
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
2021-09-22 15:52:05 +02:00
Thom Smith
28af8b24e6 Remove 3.5 from Appveyor 2021-09-22 15:52:05 +02:00
Thom Smith
00be495c78 Simplify python_requires 2021-09-22 15:52:05 +02:00
Thom Smith
f5bd5607ef Remove 2.7 from CI 2021-09-22 15:52:05 +02:00
Thom Smith
f20947ae25 Move code from lib3 to lib 2021-09-22 15:52:05 +02:00
Thom Smith
dc0c4c1441 Remove 2.7 support 2021-09-22 15:52:05 +02:00
Thom Smith
e5f6dadf86 Remove py35 from tox.ini 2021-09-22 15:52:05 +02:00
Matt Davis
7f35bb5bf2 Explode multiarch matrix for ML2014/aarch64/s390x 2021-09-22 15:52:05 +02:00
Ingy döt Net
702b1767bf Updated the content of the README.md file 2021-09-02 09:23:19 -07:00
Tim Hoffmann
99d27e78e8 Change README format to Markdown 2021-09-02 09:23:19 -07:00
Ingy döt Net
ee37f4653c 5.4.1 release 2021-01-20 16:40:50 -05:00
Matt Davis
2b37f155d4 Fix stub compat with older pyyaml versions that may unwittingly load it 2021-01-20 16:39:29 -05:00
Ingy döt Net
58d0cb7ee0 5.4 release 2021-01-19 14:07:59 -05:00
Anish Athalye
a60f7a19c0 Fix compatibility with Jython
This patch was taken from
https://github.com/yaml/pyyaml/issues/369#issuecomment-571596545,
authored by Pekka Klärck <peke@iki.fi>.

In short, Jython doesn't support lone surrogates, so importing yaml (and
in particular, loading `reader.py`) caused a UnicodeDecodeError. This
patch works around this through a clever use of `eval` to defer
evaluation of the string containing the lone surrogates, only doing it
on non-Jython platforms.

This is only done in `lib/yaml/reader.py` and not `lib3/yaml/reader.py`
because Jython does not support Python 3.

With this patch, Jython's behavior with respect to Unicode code points
over 0xFFFF becomes as it was before
0716ae21a1. It still does not pass all the
unit tests on Jython (passes 1275, fails 3, errors on 1); all the
failing tests are related to unicode. Still, this is better than simply
crashing upon `import yaml`.

With this patch, all tests continue to pass on Python 2 / Python 3.
2021-01-13 17:51:32 -05:00
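The deferral trick described in a60f7a19c0 can be sketched roughly as follows. This is an illustration in Python 3 syntax, not the patch itself: the function name `lone_surrogate_class` and its `platform` parameter are hypothetical, and returning `None` on Jython is an assumption made only to keep the example short. The point is that the literal containing lone surrogates sits inside a raw string, so it is only parsed when `eval` runs.

```python
import sys


def lone_surrogate_class(platform=sys.platform):
    """Return a regex character class for surrogates, or None on Jython.

    Hypothetical helper: the name and the None fallback are assumptions.
    """
    if platform.startswith("java"):
        # Jython reports a platform string starting with "java"; this branch
        # never evaluates a literal that contains lone surrogates.
        return None
    # The raw string holds the *source text* of a string literal. The
    # surrogate escapes are only interpreted when eval() parses it here.
    return eval(r"'[\uD800-\uDFFF]'")
```

On a non-Jython platform the call returns a five-character string whose second and fourth characters are the code points U+D800 and U+DFFF.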
Matt Davis
ee98abd7d7 Run CI on PR base branch changes 2021-01-13 16:58:40 -05:00
Ovv
ddf20330be constructor.timezone: __copy__ & __deepcopy__
close #387
2021-01-13 16:58:40 -05:00
Phil Sphicas
fc914d52c4 Avoid repeatedly appending to yaml_implicit_resolvers
Repeated calls to `resolve` can experience performance degradation, if
`add_implicit_resolver` has been called with `first=None` (to add an
implicit resolver with an unspecified first character).

For example, every time `foo` is encountered, the "wildcard implicit
resolvers" (with `first=None`) will be appended to the list of implicit
resolvers for strings starting with `f`, which will normally be the
resolver for booleans. The list `yaml_implicit_resolvers['f']` will keep
getting longer. The same behavior applies for any first-letter matches
with existing implicit resolvers.

This change avoids unintentionally mutating the lists in the class-level
dict `yaml_implicit_resolvers` by looping through a temporary copy.

Fixes: #439
2021-01-13 16:58:40 -05:00
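The growth described in fc914d52c4 can be reproduced with a toy lookup table. This is a minimal sketch, assuming the mutation comes from an in-place `+=` on the list fetched from the dict; the function names and the table contents are hypothetical and stand in for the class-level `yaml_implicit_resolvers`.

```python
def resolve_buggy(table, value):
    """Collect resolvers for value, mutating the shared list by accident."""
    resolvers = table.get(value[0], [])
    # "+=" extends the list object stored in the table, so every call
    # appends the wildcard resolvers again.
    resolvers += table.get(None, [])
    return resolvers


def resolve_fixed(table, value):
    """Collect resolvers for value without touching the shared lists."""
    resolvers = table.get(value[0], [])
    wildcard = table.get(None, [])
    # "+" builds a temporary list, leaving the table unchanged.
    return resolvers + wildcard


def demo():
    """Return the length of the 'f' list after three lookups of 'foo'."""
    buggy = {"f": ["bool"], None: ["wildcard"]}
    fixed = {"f": ["bool"], None: ["wildcard"]}
    for _ in range(3):
        resolve_buggy(buggy, "foo")
        resolve_fixed(fixed, "foo")
    return len(buggy["f"]), len(fixed["f"])
```

With the buggy variant the list under `'f'` gains one entry per lookup, while the fixed variant leaves it at its original length.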
Ingy döt Net
a001f27825 Fix for CVE-2020-14343
Per suggestion https://github.com/yaml/pyyaml/issues/420#issuecomment-663888344
move a few constructors from full_load to unsafe_load.
2021-01-13 16:58:40 -05:00
Ingy döt Net
fe15062414 Add 3.9 to appveyor file for completeness' sake
Are we done with appveyor now?
Can we just remove this file?
2021-01-13 16:58:40 -05:00
Ingy döt Net
1e1c7fb7c0 Add a newline character to end of pyproject.toml
Is this TOML file actually needed?

I'd prefer to remove it since it does so little, and stands out so
prominently.
2021-01-13 16:58:40 -05:00
Ingy döt Net
0b6b7d6171 Start sentences and phrases with capital letters
End sentences with periods.
2021-01-13 16:58:40 -05:00
Ingy döt Net
c97691596e Shell code improvements 2021-01-13 16:58:40 -05:00
Ingy döt Net
d6572c3a80 Remove unneeded quotes 2021-01-13 16:58:40 -05:00
Ingy döt Net
c5fb909798 Use long forms for docker run options 2021-01-13 16:58:40 -05:00
Ingy döt Net
492bcbaa13 Better (non)use of literal form scalars 2021-01-13 16:58:40 -05:00
Ingy döt Net
c851ff7ead Replace ${{ x }} with ${{x}}
Spaces in the syntax make it harder to reason about whether there will be
spaces in the rendering or not.
2021-01-13 16:58:40 -05:00
Ingy döt Net
13c7aec48d Reduce long lines and adjust blank lines for clarity 2021-01-13 16:58:40 -05:00
Ingy döt Net
219fe65b66 Don't overindent sequences in maps 2021-01-13 16:58:40 -05:00
Ingy döt Net
6a19fd77a0 Rename ci.yml to YAML preferred ci.yaml 2021-01-13 16:58:40 -05:00
Ingy döt Net
4927e75d99 Add py39 to tox.ini envlist 2021-01-13 16:58:40 -05:00
Brad Solomon
89f608599d Build modernization (GHA, wheels, setuptools) (#407)
* Move most CI to GitHub Actions
* Build sdist
* Build manylinux1 wheels with libyaml ext (also tested with 2010 and 2014)
* Build MacOS x86_64 wheels with libyaml ext
* Windows wheel builds remain on AppVeyor until we drop 2.7 support in 6.0
* Smoke tests of all post-build artifacts
* Add PEP517/518 build declaration (pyproject.toml with setuptools backend)
* Fully move build to setuptools
* Drop Python 3.5 support
* Declare Python 3.9 support
* Update PyPI metadata now that setuptools lets it flow through

Co-authored-by: Matt Davis <mrd@redhat.com>
2021-01-13 16:58:40 -05:00
Tina Müller
3effceca2c Update list of maintainers
Remove myself
2020-04-01 00:57:16 +02:00
ossdev07
d0d660d035 Add ARM64 jobs in Travis-CI (#366) 2020-03-19 19:49:38 +01:00
Tina Müller
538b5c93f7 Update announcement.msg 2020-03-18 14:09:19 -07:00
Ingy döt Net
8a01c99c63 Move test files back into tests/data/ 2020-03-18 21:58:22 +01:00
Tina Müller
91bca4b856 Update version to 5.3.1 2020-03-17 20:52:26 +01:00
Riccardo Schirone
5080ba5133 Prevents arbitrary code execution during python/object/new constructor (#386)
* Prevents arbitrary code execution during python/object/new constructor

In FullLoader python/object/new constructor, implemented by
construct_python_object_apply, has support for setting the state of a
deserialized instance through the set_python_instance_state method.
After setting the state, some operations are performed on the instance
to complete its initialization, however it is possible for an attacker
to set the instance's state in such a way that arbitrary code is executed
by the FullLoader.

This patch tries to block such attacks in FullLoader by preventing
set_python_instance_state from setting arbitrary properties. It
implements a blacklist that includes `extend` method (called by
construct_python_object_apply) and all special methods (e.g. __set__,
__setitem__, etc.).

Users who need special attributes being set in the state of a
deserialized object can still do it through the UnsafeLoader, which
however should not be used on untrusted input. Additionally, they can
subclass FullLoader and redefine `get_state_keys_blacklist()` to
extend/replace the list of blacklisted keys, passing the subclassed
loader to yaml.load.

* Make sure python/object/new constructor does not set some properties

* Add test to show how to subclass FullLoader with new blacklist
2020-03-17 19:09:55 +01:00
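The key blacklist described in 5080ba5133 can be sketched as a standalone toy. Only `get_state_keys_blacklist()` is named in the commit message; `ToyLoader`, `StateKeyError`, `check_state_key`, `set_instance_state`, and the `^secret$` entry are hypothetical names for illustration, and the two patterns are an assumption based on the message's mention of `extend` and special methods.

```python
import re


class StateKeyError(ValueError):
    """Hypothetical error raised when a state key is blacklisted."""


class ToyLoader:
    def get_state_keys_blacklist(self):
        # Assumed patterns: the `extend` method and all dunder names.
        return ["^extend$", "^__.*__$"]

    def check_state_key(self, key):
        pattern = re.compile("(" + "|".join(self.get_state_keys_blacklist()) + ")")
        if pattern.match(key):
            raise StateKeyError("blacklisted key %r in instance state" % key)

    def set_instance_state(self, instance, state):
        # Validate every key before touching the instance.
        for key in state:
            self.check_state_key(key)
        instance.__dict__.update(state)


class StricterLoader(ToyLoader):
    def get_state_keys_blacklist(self):
        # Extend the inherited list, as the commit message suggests
        # subclasses may do; "^secret$" is a made-up example entry.
        return super().get_state_keys_blacklist() + ["^secret$"]
```

Ordinary attribute names pass through, while `extend`, any `__dunder__` key, and (for the subclass) `secret` are rejected before the instance is modified.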
Tina Müller
2f463cf5b0 Update announcement.msg 2020-01-06 21:13:22 +01:00
Tina Müller
377092fb2e Changes for 5.3 2020-01-06 20:37:50 +01:00
Tina Müller
69b025a9f3 Changes for 5.3b1 2019-12-21 22:49:24 +01:00
Tina Müller (tinita)
4fcdcdbf60 Add tests for timezone (#363)
After #163, this adds some test data to check if the datetime objects
return the correct timezone
2019-12-20 20:38:46 +01:00
Mattijs Ugen
96d65f3de1 Create timezone-aware datetimes when parsed as such (#163)
* On load, now use aware datetimes if possible

On loading data, if timestamps have an ISO "+HH:MM" UTC offset then the resultant datetime is converted to UTC. This change adds that timezone information to the datetime objects.

Importantly, this addresses a Django warning (and potential error) that appears when using YAML fixtures in a timezone-aware project. It was raised as a Django issue (https://code.djangoproject.com/ticket/18867), but subsequently closed because the Django devs felt that this is a PyYAML problem.

* Create timezone-aware datetime in timezone from data

* Create timezone-aware datetime in timezone from data for python2

* Define better timezone implementation for python2

* Handle timezone "Z" for python 3

* Handle timezone "Z" for python 2

* Fix code structure for Python 3

Call datetime.datetime constructor once at return.

* Fix code structure for Python 2

Call datetime.datetime constructor once at return.
2019-12-20 20:38:46 +01:00
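The idea behind 96d65f3de1, keeping the offset on the resulting object rather than discarding it, can be sketched with the standard library alone. This is a simplified illustration for Python 3, not PyYAML's constructor: `parse_timestamp`, the `OFFSET` pattern, and the fixed `YYYY-MM-DDTHH:MM:SS` prefix are assumptions made for the example.

```python
import datetime
import re

# Assumed suffix shape: a sign, one or two hour digits, optional ":MM".
OFFSET = re.compile(r"(?P<sign>[+-])(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?")


def parse_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS' followed by '', 'Z', or '+HH:MM'."""
    base, suffix = text[:19], text[19:]
    naive = datetime.datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    if not suffix:
        # No offset given: stay naive.
        return naive
    if suffix == "Z":
        return naive.replace(tzinfo=datetime.timezone.utc)
    match = OFFSET.fullmatch(suffix)
    if match is None:
        raise ValueError("unrecognised UTC offset: %r" % suffix)
    delta = datetime.timedelta(
        hours=int(match["hour"]), minutes=int(match["minute"] or 0)
    )
    if match["sign"] == "-":
        delta = -delta
    # Attach the offset instead of converting to UTC and dropping it.
    return naive.replace(tzinfo=datetime.timezone(delta))
```

A timestamp without a suffix stays naive, while `Z` and explicit offsets produce aware datetimes that compare correctly against UTC values.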
Tina Müller
49b354896e tox.ini: passenv = PYYAML_TEST_GROUP 2019-12-20 20:38:46 +01:00
Frédéric Chapoton
36fdf0c486 remove some unused imports (#260)
* remove some unused imports

as suggested by lgtm

https://lgtm.com/projects/g/yaml/pyyaml/

* add back import * from nodes

* remove also sys import

* remove mkpath import
2019-12-20 20:38:46 +01:00
Dwight Guth
e1ffe1afaa increase size of index, line, and column fields (#310)
* increase size of index, line, and column fields

* use size_t instead of unsigned long long

* better test infrastructure for test for large file

* only run large file test when env var is set

* fix review comments regarding env vars

* fix missing import on python 3

* force all tests in CI
2019-12-20 20:38:46 +01:00
Hugo van Kemenade
f1ab37df44 Fix for Python 3.10 (#329) 2019-12-20 20:38:46 +01:00
Jon Dufresne
252b4fe54e Document that PyYAML is implemented with Cython (#244) 2019-12-20 20:38:46 +01:00
Tina Müller (tinita)
d137e82ad1 Use full_load in yaml-highlight example (#359) 2019-12-20 20:38:46 +01:00
Tina Müller
a826f546c2 Enable certain unicode tests when maxunicode not > 0xffff
They were disabled in d6cbff6620

After #351 the tests are working again
2019-12-20 20:38:46 +01:00
Anish Athalye
0716ae21a1 Fix reader for Unicode code points over 0xFFFF (#351)
This patch fixes the handling of inputs with Unicode code points over
0xFFFF when running on a Python 2 that does not have UCS-4 support
(which certain distributions still ship, e.g. macOS).

When Python is compiled without UCS-4 support, it uses UCS-2. In this
situation, non-BMP Unicode characters, which have code points over
0xFFFF, are represented as surrogate pairs. For example, if we take
u'\U0001f3d4', it will be represented as the surrogate pair
u'\ud83c\udfd4'. This can be seen by running, for example:

    [i for i in u'\U0001f3d4']

In PyYAML, the reader uses a function `check_printable` to validate
inputs, making sure that they only contain printable characters. Prior
to this patch, on UCS-2 builds, it incorrectly identified surrogate
pairs as non-printable.

It would be fairly natural to write a regular expression that captures
strings that contain only *printable* characters, as opposed to
*non-printable* characters (as identified by the old code, so not
excluding surrogate pairs):

    PRINTABLE = re.compile(u'^[\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]*$')

Adding support for surrogate pairs to this would be straightforward,
adding the option of having a surrogate high followed by a surrogate low
(`[\uD800-\uDBFF][\uDC00-\uDFFF]`):

    PRINTABLE = re.compile(u'^(?:[\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$')

Then, this regex could be used as follows:

    def check_printable(self, data):
        if not self.PRINTABLE.match(data):
            raise ReaderError(...)

However, matching printable strings, rather than searching for
non-printable characters as the code currently does, would have the
disadvantage of not identifying the culprit character (we wouldn't get
the position and the actual non-printable character from a lack of a
regex match).

Instead, we can modify the NON_PRINTABLE regex to allow legal surrogate
pairs. We do this by removing surrogate pairs from the existing
character set and adding the following options for illegal uses of
surrogate code points:

- Surrogate low that doesn't follow a surrogate high (either a surrogate
  low at the start of a string, or a surrogate low that follows a
  character that's not a surrogate high):

    (?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]

- Surrogate high that isn't followed by a surrogate low (either a
  surrogate high at the end of a string, or a surrogate high that is
  followed by a character that's not a surrogate low):

    [\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)

The behavior of this modified regex should match the one that is used
when Python is built with UCS-4 support.
2019-12-20 20:38:46 +01:00
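The modified NON_PRINTABLE approach from 0716ae21a1 can be tried out in Python 3 by writing surrogates explicitly, which imitates how a UCS-2 build stores non-BMP characters. This is a sketch assembled from the fragments quoted in the message: the first character class is one reading of "removing surrogate pairs from the existing character set", and `first_illegal` is a hypothetical helper. Because Python 3 strings are not UCS-2, a real astral code point such as `'\U0001f3d4'` would be flagged by this pattern, so the example only uses explicit surrogate escapes.

```python
import re

NON_PRINTABLE = re.compile(
    # Characters outside the printable set; surrogates are no longer
    # rejected here because the range \xA0-\uFFFD covers them.
    r"[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uFFFD]"
    # A surrogate low that does not follow a surrogate high.
    r"|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]"
    # A surrogate high that is not followed by a surrogate low.
    r"|[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)"
)


def first_illegal(data):
    """Return the start index of the first illegal match, or None."""
    match = NON_PRINTABLE.search(data)
    return None if match is None else match.start()
```

Unlike matching against a PRINTABLE pattern, searching this way yields a match object, so the position of the offending input is available for an error message.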