pyhocon icon indicating copy to clipboard operation
pyhocon copied to clipboard

Wrong control characters escaping

Open dtarakanov1 opened this issue 6 years ago • 4 comments

Action. Save a configuration with a sting containing '\\r' or '\\n'. Expected result. The saved configuration contains strings with '\\r' or '\\n'. Actual result. The saved configuration contains strings with '\r' or '\n' instead.

Details. Save a configuration.

>>> import pyhocon
>>> config = pyhocon.ConfigTree()
>>> config['path'] = r'c:\some\path'
>>> config['path']
'c:\\some\\path'
>>> with open('test.conf', 'w') as f:
...     f.write(pyhocon.HOCONConverter.to_hocon(config, indent=4))
...
21
>>> config = pyhocon.ConfigFactory.parse_file('test.conf')
>>> config['path']
'c:\\some\\path'

The result is as expected. Now save a configuration with a string containing '\\r'.

>>> config['path'] = r'c:\result\dir'
>>> config['path']
'c:\\result\\dir'
>>> with open('test.conf', 'w') as f:
...     f.write(pyhocon.HOCONConverter.to_hocon(config, indent=4))
...
22
>>> config = pyhocon.ConfigFactory.parse_file('test.conf')
>>> config['path']
'c:\result\\dir'

Expected c:\\result\\dir. Save a configuration with a string containing '\\n'.

>>> config['path'] = r'c:\next\dir'
>>> config['path']
'c:\\next\\dir'
>>> with open('test.conf', 'w') as f:
...     f.write(pyhocon.HOCONConverter.to_hocon(config, indent=4))
...
20
>>> config = pyhocon.ConfigFactory.parse_file('test.conf')
>>> config['path']
'c:\next\\dir'

Expected c:\\next\\dir.

dtarakanov1 avatar Jul 22 '19 09:07 dtarakanov1

Python string literals interpret the \ as the start of an escape sequence. Characters \n and \r represent newlines and carriage returns, respectively. It will not render the value without escaping it (hence why when you created config['path'] you had to use the r raw string literal prefix). You can use / instead (i.e. c:/result/dir which will still work on Windows).

dsynkov avatar Aug 12 '19 00:08 dsynkov

I cannot agree with you. JSON escape sequences are intended exactly to take care of such cases. So if you put any value into a config tree, you must get exactly this value back.

Deserialized | Python             | Serialized (expected)   | Serialized (actual)
---------------------------------------------------------------------------------
c:\next\dir  | 'c:\\next\\dir' or | c:\\next\\dir           | c:\next\dir
             | r'c:\next\dir'     |                         |
---------------------------------------------------------------------------------
c:           | 'c:\next\\dir'     | c:\next\\dir            | c:\next\dir
next\dir     |                    |                         |

You can easily see that the serialized version must be exactly the same as Python non-raw representation.

dtarakanov1 avatar Aug 21 '19 08:08 dtarakanov1

In the converter.py not all escape sequences handled. The full list of short escape sequences is '"\/\b\f\n\r\t'.

dtarakanov1 avatar Aug 21 '19 09:08 dtarakanov1

There are more observations.

Every character in the range U+0000 ... U+001F must be escaped like "\u0000" ... "\u001f" except the mentioned special cases that may be escaped in the short form like "\n".

The test

# -*- encoding: utf-8 -*-

from pyhocon import ConfigTree
from pyhocon.converter import HOCONConverter


def to_hocon(obj):
    return HOCONConverter.to_hocon(ConfigTree(obj), compact=True)


class TestConverter(object):
    def test_escape_control_characters(self):
        assert r'a = "\u0000"' == to_hocon({'a': '\x00'})

fails

test_converter.py F
tests\test_converter.py:11 (TestConverter.test_escape_control_characters)
a = "\x00" != a = "\\u0000"

Expected :a = "\\u0000"
Actual   :a = "\x00"

See the PYHOCON and JSON specifications for reference.

dtarakanov1 avatar Sep 13 '19 08:09 dtarakanov1