120 Commits

Author SHA1 Message Date
Anish Athalye
0716ae21a1 Fix reader for Unicode code points over 0xFFFF (#351)
This patch fixes the handling of inputs with Unicode code points over
0xFFFF when running on a Python 2 that does not have UCS-4 support
(which certain distributions still ship, e.g. macOS).

When Python is compiled without UCS-4 support, it uses UCS-2. In this
situation, non-BMP Unicode characters, which have code points over
0xFFFF, are represented as surrogate pairs. For example, if we take
u'\U0001f3d4', it will be represented as the surrogate pair
u'\ud83c\udfd4'. This can be seen by running, for example:

    [i for i in u'\U0001f3d4']
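
The pair shown above follows from the UTF-16 encoding arithmetic. A minimal sketch, assuming a hypothetical helper `to_surrogate_pair` written only for illustration (it is not part of PyYAML):

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration; not part of PyYAML.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value to split in two halves
    high = 0xD800 + (offset >> 10)     # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits select the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F3D4)
```

For `0x1F3D4` this gives `0xD83C` and `0xDFD4`, matching `u'\ud83c\udfd4'`.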

In PyYAML, the reader uses a function `check_printable` to validate
inputs, making sure that they only contain printable characters. Prior
to this patch, on UCS-2 builds, it incorrectly identified surrogate
pairs as non-printable.

It would be fairly natural to write a regular expression that captures
strings that contain only *printable* characters, as opposed to
*non-printable* characters (as identified by the old code, so not
excluding surrogate pairs):

    PRINTABLE = re.compile(u'^[\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]*$')

Adding support for surrogate pairs to this would be straightforward,
adding the option of having a surrogate high followed by a surrogate low
(`[\uD800-\uDBFF][\uDC00-\uDFFF]`):

    PRINTABLE = re.compile(u'^(?:[\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$')

Then, this regex could be used as follows:

    def check_printable(self, data):
        if not self.PRINTABLE.match(data):
            raise ReaderError(...)

However, matching printable strings, rather than searching for
non-printable characters as the code currently does, would have the
disadvantage of not identifying the culprit character (we wouldn't get
the position and the actual non-printable character from a lack of a
regex match).

Instead, we can modify the NON_PRINTABLE regex to allow legal surrogate
pairs. We do this by removing surrogate pairs from the existing
character set and adding the following options for illegal uses of
surrogate code points:

- Surrogate low that doesn't follow a surrogate high (either a surrogate
  low at the start of a string, or a surrogate low that follows a
  character that's not a surrogate high):

    (?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]

- Surrogate high that isn't followed by a surrogate low (either a
  surrogate high at the end of a string, or a surrogate high that is
  followed by a character that's not a surrogate low):

    [\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)

The behavior of this modified regex should match the one that is used
when Python is built with UCS-4 support.
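
The approach described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the pattern is assembled from the fragments quoted in this message, `find_non_printable` is a hypothetical helper, and the strings are treated as sequences of UTF-16 code units, the way a UCS-2 build sees them.

```python
import re

# Sketch of a NON_PRINTABLE pattern; assembled for illustration only.
NON_PRINTABLE = re.compile(
    # Any character outside the printable set; surrogate code points
    # fall inside \xA0-\uFFFD, so this alternative does not reject them.
    u'[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uFFFD]'
    # A low surrogate at the start or after a non-high-surrogate character.
    u'|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]'
    # A high surrogate at the end or before a non-low-surrogate character.
    u'|[\uD800-\uDBFF](?:[^\uDC00-\uDFFF]|$)'
)


def find_non_printable(data):
    """Return (position, matched text) for the first offending span, or None.

    Hypothetical helper; shows why searching keeps the culprit's location.
    """
    match = NON_PRINTABLE.search(data)
    if match is None:
        return None
    return match.start(), match.group()
```

Because the second alternative consumes the preceding character, the reported span for a stray low surrogate starts one position before the surrogate itself; a lookbehind would avoid that, at the cost of departing from the pattern fragments given here.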
2019-12-20 20:38:46 +01:00
Tina Müller (tinita)
03b378d039 Allow add_multi_constructor with None (#358)
Loader.add_multi_constructor(None, myconstructor)

Also add test for add_multi_constructor('!', ...) etc.

See issue #317
2019-12-07 22:40:48 +01:00
Filip Salomonsson
5a0cfab86f Fix handling of __slots__ (#161) 2019-12-07 22:34:23 +01:00
Tim Gates
eb459f842f Fix up small typo
Replace `intendation` with `indentation`.
2019-12-04 00:31:05 +01:00
Sergey Fursov
e21af4a092 Use is instead of equality for comparing with None 2019-12-04 00:04:05 +01:00
David Kao
de11e43d52 fix typos and stylistic nit 2019-12-03 23:58:55 +01:00
Tina Müller
a5c2a043a2 Version 5.2 2019-12-02 21:13:24 +01:00
Matt Davis
3f3c373f50 bump version to 5.2b1 2019-11-25 23:39:55 +01:00
Tina Müller
8c5e47fe62 Move constructor for object/apply to Unsafe 2019-11-20 20:48:47 +01:00
Tina Müller
4a31b16b04 Change default loader for add_implicit_resolver, add_path_resolver
If the Loader parameter is not given, add the resolver to
all three loaders
2019-11-18 12:28:20 +01:00
Ingy döt Net
a5394c04a2 Add custom constructors to multiple loaders
When someone writes a subclass of the YAMLObject class, the constructors
will now be added to all 3 (non-safe) loaders.

Furthermore, we support the class variable `yaml_loader` being a list,
offering more control of which loaders are affected.

To support safe_load in your custom class you could add this:

    yaml_loader = yaml.SafeLoader

or, to extend the existing list:

    yaml_loader = yaml.YAMLObject.yaml_loader
    yaml_loader.append(yaml.SafeLoader)
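
Accepting either a single loader or a list of loaders can be sketched as below. Every name here (`LoaderSketch`, `FullSketch`, `SafeSketch`, `register_on_loaders`) is hypothetical and stands in for the real classes; this only illustrates the normalization step, not PyYAML's implementation.

```python
class LoaderSketch(object):
    """Hypothetical loader keeping a tag -> constructor table per class."""
    constructors = {}

    @classmethod
    def add_constructor(cls, tag, constructor):
        # Copy the inherited table on first write so that sibling
        # classes do not end up sharing one dictionary.
        if 'constructors' not in cls.__dict__:
            cls.constructors = cls.constructors.copy()
        cls.constructors[tag] = constructor


class FullSketch(LoaderSketch):
    pass


class SafeSketch(LoaderSketch):
    pass


def register_on_loaders(yaml_loader, tag, constructor):
    """Register a constructor on one loader class or on each in a list."""
    loaders = yaml_loader if isinstance(yaml_loader, list) else [yaml_loader]
    for loader in loaders:
        loader.add_constructor(tag, constructor)
    return loaders


# A list registers on every loader it names.
register_on_loaders([FullSketch, SafeSketch], '!point', dict)
# A single class still works as before.
register_on_loaders(SafeSketch, '!only-safe', list)
```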
2019-11-18 11:59:54 +01:00
Tina Müller (tinita)
8d7a78003a Change default loader for yaml.add_constructor (#287)
* Change default loader for yaml.add_constructor

If the Loader parameter is not given, add constructor to
all three loaders
2019-11-18 11:59:54 +01:00
Tina Müller (tinita)
31f2279252 Fix logic for quoting special characters (#276)
* Fix logic for quoting special characters

* Remove has_ucs4 from condition

on systems with `sys.maxunicode <= 0xffff` the comparison
(u'\U00010000' <= ch < u'\U0010ffff') can't be true anyway I think
2019-11-18 11:59:54 +01:00
Matt Davis
0f64cbfa54 changes for 5.1.2 release 2019-07-30 18:21:30 -07:00
Matt Davis
5986257f9f changes for 5.1.1 release 2019-06-06 15:14:10 -07:00
Ingy döt Net
e471e86bf6 Updates for 5.1 release 2019-03-13 08:45:34 -07:00
Tina Müller
507a464ce6 Make default_flow_style=False 2019-03-08 09:09:48 -08:00
Tina Müller
07c88c6c1b Allow to turn off sorting keys in Dumper 2019-03-08 09:09:48 -08:00
Ingy döt Net
0cedb2a069 Deprecate/warn usage of yaml.load(input)
The `load` and `load_all` methods will issue a warning when they are
called without the 'Loader=' parameter. The warning will point to a URL
that is always up to date with the latest information on the usage of
`load`.

There are several ways to stop the warning:

* Use `full_load(input)` - sugar for `yaml.load(input, FullLoader)`
  * FullLoader is the new safe but complete loader class
* Use `safe_load(input)` - sugar for `yaml.load(input, SafeLoader)`
  * Make sure your input YAML consists of the 'safe' subset
* Use `unsafe_load(input)` - sugar for `yaml.load(input, UnsafeLoader)`
  * Only do this if your input YAML is trusted
* Use `yaml.load(input, Loader=yaml.<loader>)`
  * Or shorter `yaml.load(input, yaml.<loader>)`
  * Where '<loader>' can be:
    * FullLoader - safe, complete Python YAML loading
    * SafeLoader - safe, partial Python YAML loading
    * UnsafeLoader - more explicit name for the old, unsafe 'Loader' class
* yaml.warnings({'YAMLLoadWarning': False})
  * Use this when you use third party modules that use `yaml.load(input)`
  * Only do this if input is trusted

The above `load()` expressions all have `load_all()` counterparts.

You can get the original unsafe behavior with:
* `yaml.unsafe_load(input)`
* `yaml.load(input, Loader=yaml.UnsafeLoader)`

In a future release, `yaml.load(input)` will raise an exception.
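
The warn-when-`Loader`-is-missing behavior can be sketched with the standard `warnings` module. This is a minimal illustration under assumptions: `load_sketch` and `LoadWarningSketch` are hypothetical names, and the function does not parse YAML at all.

```python
import warnings


class LoadWarningSketch(RuntimeWarning):
    """Hypothetical warning category used only in this sketch."""


def load_sketch(stream, Loader=None):
    """Stand-in for a load() entry point; returns the loader it would use."""
    if Loader is None:
        # No explicit loader: warn the caller, then fall back to a default.
        warnings.warn(
            "calling load_sketch() without Loader=... is deprecated",
            LoadWarningSketch,
            stacklevel=2,
        )
        Loader = "default-loader"  # placeholder value, not a real loader class
    return Loader, stream
```

Passing `Loader` explicitly skips the warning, which is the same remedy the list above describes for `yaml.load`.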

The new loader called FullLoader is almost as complete as
Loader/UnsafeLoader, but it avoids all known code execution
paths. It is the preferred YAML loader, and the current default for
`yaml.load(input)` when you get the warning.

Here are some of the exploits that can be triggered with UnsafeLoader
but not with FullLoader:
```
python -c 'import os, yaml; yaml.full_load("!!python/object/new:os.system [echo EXPLOIT!]")'
python -c 'import yaml; print yaml.full_load("!!python/object/new:abs [-5]")'
python -c 'import yaml; yaml.full_load("!!python/object/new:eval [exit(5)]")' ; echo $?
python -c 'import yaml; yaml.full_load("!!python/object/new:exit [5]")' ; echo $?
```
2019-03-08 09:09:48 -08:00
Ingy döt Net
ccc40f3e2b Reverting https://github.com/yaml/pyyaml/pull/74
Revert "Make pyyaml safe by default."

This reverts commit bbcf95fa05.
This reverts commit 7b68405c81.
This reverts commit 517e83e805.
2018-06-30 15:46:56 -07:00
Alex Gaynor
d3eb7daf88 Changes for 4.1 release 2018-06-26 15:08:15 -07:00
Ingy döt Net
4c2e993321 Changes for 4.01 release
This is the first release under new maintainership. A bunch of things
involving resource URLs and copyright details needed updating; in
addition to the normal version and changelog updates.
2018-06-24 17:08:57 -06:00
Tina Müller
f6049c8cd6 Support escaped slash in double quotes "\/"
YAML 1.2 JSON compat
2018-06-24 22:15:31 +02:00
Jon Dufresne
801288d796 Remove commented out Psyco code
From the Psyco website:

> 12 March 2012
>
> Psyco is unmaintained and dead. Please look at PyPy for the
> state-of-the-art in JIT compilers for Python.

http://psyco.sourceforge.net/
2018-04-11 10:02:31 -07:00
Ingy döt Net
0f2afdea77 Revert PR #150 per @asomov
and also explicitly return None if no tokens exist.

Also add a comment to show this.

This 'None' behavior should be tested at some point.
2018-04-10 16:51:43 -07:00
Andrey Somov
a02d17a027 Remove redundant code in Scanner.peek_token() 2018-03-28 10:07:27 +02:00
Alex Gaynor
517e83e805 wtf, how did this typo happen 2017-08-26 10:26:01 -05:00
Alex Gaynor
7b68405c81 Make pyyaml safe by default.
Change yaml.load/yaml.dump to be yaml.safe_load/yaml.safe_dump, introduced yaml.danger_dump/yaml.danger_load, and the same for various other classes.

(python2 only at this moment)

Refs #5
2017-08-26 10:26:01 -05:00
Jakub Wilk
d856c206fd Fix typos 2017-08-08 06:05:28 -05:00
Timofei Bondarev
ef744d8609 Improve RepresenterError creation 2017-08-08 06:02:01 -05:00
Peter Murphy
cf1c86cb86 First attack at pyyaml does not support literals in unicode over codepoint 0xffff #25 2017-05-08 16:39:26 +10:00
Daniel Beer
c5b135fe39 Allow colon in a plain scalar in a flow context (#45)
* Allow colon in a plain scalar in a flow context

* Restore behavior of flow mapping with empty value
2017-02-08 13:50:53 -06:00
Kirill Simonov
37be8e0c17 Merged in scorphus/pyyaml (pull request #9)
scanner: use infinitive verb after auxiliary word could
2016-08-25 22:20:32 -05:00
Kirill Simonov
153a194e86 Adding an implicit resolver to a derived loader should not affect the base loader (fixes issue #57). 2016-08-25 17:42:41 -05:00
Kirill Simonov
f10d92f87b Fixed comparison to () (closes #64). 2016-08-25 16:27:19 -05:00
Kirill Simonov
d737907354 Fixed comparison to None warning (closes issue #64). 2016-08-25 15:55:09 -05:00
Kirill Simonov
53b4c075f6 Bumped the version number. 2016-06-15 19:26:06 -05:00
Pablo Santiago Blum de Aguiar
2c225b29fc scanner: use infinitive verb after auxiliary word could
Could, as well as should, shall, must, may, can, might, etc.
are auxiliary words. After an auxiliary word should come an
infinitive verb.
2015-04-04 13:25:24 -03:00
Kirill Simonov
a0c99023a5 Removed invalid simple key assertion. 2014-11-28 11:53:36 -06:00
Kirill Simonov
96ee4cbfcc Bumped the version number. 2014-03-26 19:34:36 -05:00
Kirill Simonov
644385bed3 Dropped support for Python 2.3 and 2.4. 2011-05-30 04:19:04 +00:00
Kirill Simonov
b1c7014863 Updated the changelog and bumped the version number. 2011-05-30 03:28:15 +00:00
Kirill Simonov
7e1b5fae0b Clear cyclic references in the parser and the emitter to avoid extra GC calls. 2011-05-30 02:51:30 +00:00
Kirill Simonov
b3c9435637 Preparing the next release. 2009-08-30 00:07:20 +00:00
Kirill Simonov
335c34455d Fixed a problem with a scanner error not detected when no line break at the end of the stream. 2009-08-29 22:12:45 +00:00
Kirill Simonov
51fd5cbfdb Fixed a typo in docstring. 2009-08-29 21:33:36 +00:00
Kirill Simonov
fa14e18b38 Fixed emitting of invalid BOM for UTF-16. 2009-08-29 20:59:56 +00:00
Kirill Simonov
6483cb73c7 Fixed a bug where folded scalar emitter did not respect the preferred line width (Thanks ingy for the report and the patch). 2009-03-28 12:49:11 +00:00
Kirill Simonov
08a55b972b Added a workaround against #116 (Thanks Andrey Somov). 2009-02-23 19:17:29 +00:00
Kirill Simonov
6f51a53a6b Fixed a typo in the attribute name (Thanks ingy). 2008-12-30 20:23:49 +00:00