String literals as tag values.
It should be possible to use string literals as tag values. A string literal is a series of characters enclosed in double quotes (example: tag_name="tag value"). String literals can contain spaces but can't contain newline or carriage return characters (otherwise RESP-formatted output would be a mess). Escape sequences should be used to represent certain characters.
| Escape sequence | Description |
|---|---|
| \\ | Backslash |
| \" | Double quote |
| \r | Carriage return |
| \n | New line |
| \xHEX | Arbitrary hexadecimal value |
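
To make the table concrete, here is a rough sketch of a decoder for such literals. It is not Akumuli code: the function name `parse_string_literal` is made up for illustration, and it assumes that `\xHEX` means exactly two hex digits, which the proposal above does not specify.

```python
import string


def parse_string_literal(text: str) -> str:
    """Decode a double-quoted tag value using the escapes from the table."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("string literal must be enclosed in double quotes")
    body = text[1:-1]
    simple = {"\\": "\\", '"': '"', "r": "\r", "n": "\n"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in "\r\n":
            # Raw CR/LF would break line-oriented RESP output.
            raise ValueError("raw CR/LF is not allowed inside a literal")
        if ch == '"':
            raise ValueError("unescaped double quote inside a literal")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        if code in simple:
            out.append(simple[code])
            i += 2
        elif code == "x":
            # Assumption: exactly two hex digits follow \x.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("\\x must be followed by two hex digits")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            raise ValueError(f"unknown escape sequence \\{code}")
    return "".join(out)
```

With this sketch, `parse_string_literal(r'"a\x3Db"')` yields `a=b`, and a literal containing a raw newline is rejected.
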
How about escaping strange characters like \0? I'm planning to export data to Grafana, which typically accepts JSON data, so I think JSON-compatible escaping would be nice. RFC 4627 (JSON), section 2.5 (Strings), says:
- MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
- MAY be escaped: anything :)
- Alternatively: C-style two-character escapes are acceptable (\r, \n, \t, ...)
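
For reference, the rules listed above are what a stock JSON encoder already does, so a quick way to see JSON-compatible escaping in action is Python's standard `json` module. This is only an illustration of the escaping rules, not a proposal for how Akumuli should implement them; the helper names `to_json_literal` and `from_json_literal` are hypothetical.

```python
import json


def to_json_literal(value: str) -> str:
    """Encode a tag value as a JSON string literal (quotes included)."""
    # json.dumps escapes the quotation mark, the backslash and all
    # control characters U+0000..U+001F, as required by the RFC.
    return json.dumps(value)


def from_json_literal(literal: str) -> str:
    """Decode a JSON string literal back into the raw tag value."""
    value = json.loads(literal)
    if not isinstance(value, str):
        raise ValueError("expected a JSON string literal")
    return value
```

For example, `to_json_literal("a\0b")` produces `"a\u0000b"`, and `from_json_literal(r'"host\u0032"')` gives back `host2`. Note that JSON has no `\xHEX` escape, so a strict JSON decoder rejects `"\x3D"`; a JSON-compatible scheme would have to use the `\u003D` form.
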
Another point to discuss: Should the quotes+escaping be stored in the DB, or only used for interfacing with the outside world? I think the backend currently doesn't like \0 characters.
It should be possible to use both escaped and unescaped tag values in queries; something like tag="host\u0032" should work the same way as tag=host2.
Some characters (space, tab, =, \0) can't be stored in the string pool (SeriesMatcher class) because of the regexp-based search. To find the series that match a query, Akumuli uses regular expressions like this one: "cpu(?:\s\w+=\w+)* (?:\s\w+=\w+)\s hash=\w+ (?:\s\w+=\w+)". When used with SeriesMatcher, this regexp finds all series names that start with the metric "cpu" and have a "hash" tag. Because of all this, I think tag values should be stored without quotes, but some symbols in tag values should always be escaped (= should become \x3D or \u003D).
It is also possible to rewrite SeriesMatcher and replace the regexp search with a normal inverted index. In that case only unescaped tag values would need to be stored in memory.
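
The point about always-escaped symbols can be illustrated with a small sketch. Everything here is an assumption made for illustration, not Akumuli's actual code: the names `escape_for_storage` and `has_tag_pattern` are hypothetical, the reserved set is the one listed above plus the backslash itself (added so the escaping stays unambiguous), and the pattern is a simplified one that uses `[^\s=]+` instead of `\w+` for values, since escaped values contain backslashes.

```python
import re

# Characters that would confuse a whitespace/'=' based series-name scan.
# The backslash is included so that escaped output can be decoded unambiguously.
RESERVED = {" ", "\t", "=", "\0", "\\"}


def escape_for_storage(value: str) -> str:
    """Replace reserved characters with \\xHH before storing a tag value."""
    return "".join(
        f"\\x{ord(ch):02X}" if ch in RESERVED else ch
        for ch in value
    )


def has_tag_pattern(metric: str, tag: str) -> re.Pattern:
    """Build a regexp matching series of `metric` that carry `tag`."""
    pair = r"(?:\s\w+=[^\s=]+)"  # one " name=value" pair
    return re.compile(
        rf"{re.escape(metric)}{pair}*\s{re.escape(tag)}=[^\s=]+{pair}*"
    )


# A series name built from escaped values stays searchable...
stored = "cpu host=" + escape_for_storage("web 1") + " hash=" + escape_for_storage("a=b")
# ...while the same name with raw values breaks the pair structure.
raw = "cpu host=web 1 hash=a=b"

pattern = has_tag_pattern("cpu", "hash")
found_stored = pattern.fullmatch(stored) is not None
found_raw = pattern.fullmatch(raw) is not None
```

Here `stored` is `cpu host=web\x201 hash=a\x3Db` and matches the pattern, whereas `raw` does not match because the space and the extra `=` break the name=value structure.
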
On Mon, Jan 18, 2016 at 3:07 AM Claudius Zingerli wrote: "How about escaping strange characters like \0? I'm planning to export data to Grafana which typically accepts JSON data so I think a JSON-compatible escaping would be nice."
I think it's a good idea.
Cheers, Evgeny