Contemplating on String

Open: happy-barney opened this issue 2 years ago • 6 comments

Motivation / Goal

Nowadays, a common use of Perl programs is as some kind of backend. A backend usually needs 3 types of checks (a better word here would be contract):

  • description of API I/O (mostly checks corresponding to JSON Schema (OpenAPI) or XML Schema (XML over HTTP, XML-RPC, SOAP))
  • description of internal representation
  • description of storage representation (usually SQL)

It would be nice to have Perl checks / contracts specified in such a way that external descriptions can be generated directly from the Perl definition.

Example: (syntax symbolic)

# declare Bar => String [ min_length => 3, max_length => 16 ];
sub operation_handler :returns (Bar) { ... }

say Bar->to_openapi;
# <...>:
#   type: string
#   maxLength: 16
#   minLength: 3

say Bar->to_xsd;
# <xs:simpleType>
#   <xs:restriction base="xs:string">
#    <xs:maxLength value="16"/>
#    <xs:minLength value="3"/>
#   </xs:restriction>
# </xs:simpleType>

String variants

restrictions

Typical string restrictions (I prefer XML Schema's term, facet) are:

  • min-length
  • max-length
  • pattern

These restrictions are supported by both JSON Schema and XML Schema, though neither supports Perl regexes: JSON Schema patterns use ECMA-262 regular expressions and XML Schema uses its own regex dialect.

It would be nice to support named restrictions, e.g.:

Str [ min_length => 10 ];
Str [ min_length (10) ];
Str :min_length (10);
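To make named facets concrete, here is a minimal plain-Perl sketch (not Oshun syntax) that evaluates min_length, max_length and pattern against a value. The facet table and violated_facets() are hypothetical names for illustration only. The pattern is anchored explicitly with \A...\z, since XML Schema patterns are implicitly anchored while Perl (and JSON Schema) patterns are not.

```perl
use strict;
use warnings;

# Hypothetical facet table: restriction name => predicate(value, argument).
# Names mirror min_length / max_length / pattern above; not an Oshun API.
my %facet = (
    min_length => sub { my ($v, $n)  = @_; length($v) >= $n },
    max_length => sub { my ($v, $n)  = @_; length($v) <= $n },
    pattern    => sub { my ($v, $re) = @_; $v =~ $re },
);

# Returns the names of violated facets; an empty list means the value passes.
sub violated_facets {
    my ($value, @restrictions) = @_;
    my @failed;
    while (my ($name, $arg) = splice @restrictions, 0, 2) {
        my $test = $facet{$name} or die "unknown facet '$name'\n";
        push @failed, $name unless $test->($value, $arg);
    }
    return @failed;
}

# Roughly: Str [ min_length => 3, max_length => 16, pattern => ... ]
my @bar = (min_length => 3, max_length => 16, pattern => qr/\A\w+\z/);
print join(',', violated_facets('ab', @bar)), "\n";    # prints "min_length"
```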

binary vs text

It would be nice to be able to declare whether a value is a generic binary string or a text string, e.g.:

  • Str - string treated as text (UTF-8)
  • Binary - generic binary string

XML schema

  • supported by the dedicated type base64Binary

JSON schema

  • supported by the string type with the property contentEncoding: base64
  • also supports a content type (contentMediaType)

It would be nice to be able to specify the content encoding and the related implicit coercions to/from the internal encoding:

Binary :encoding (base64);
Binary :encoding (uuencode);
Binary :encoding (deflate);
Str :encoding (Latin-2);
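A minimal sketch of the implicit coercions for two of the encodings above, assuming a hypothetical table keyed by encoding name and using only the core modules MIME::Base64 and Encode. base64 maps to raw octets and Latin-2 (ISO-8859-2) maps to Perl text; uuencode and deflate are left out.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical coercion table: external encoding => { from => ..., to => ... }.
# "from" turns the external form into the internal value, "to" goes back.
my %coercion = (
    # Binary :encoding (base64) - internal value is raw octets
    base64 => {
        from => sub { my ($v) = @_; decode_base64($v) },
        to   => sub { my ($v) = @_; encode_base64($v, '') },   # '' = no line breaks
    },
    # Str :encoding (Latin-2) - internal value is Perl text (characters)
    'Latin-2' => {
        from => sub { my ($v) = @_; decode('iso-8859-2', $v) },
        to   => sub { my ($v) = @_; encode('iso-8859-2', $v) },
    },
);

sub from_external {
    my ($encoding, $value) = @_;
    my $c = $coercion{$encoding} or die "no coercion for '$encoding'\n";
    return $c->{from}->($value);
}

sub to_external {
    my ($encoding, $value) = @_;
    my $c = $coercion{$encoding} or die "no coercion for '$encoding'\n";
    return $c->{to}->($value);
}

my $octets = from_external(base64 => 'AAEC');       # "\x00\x01\x02"
my $text   = from_external('Latin-2' => "\xB9");    # "\x{161}" (s with caron)
```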

documentation

It would be nice to be able to attach a description to a check, e.g.:

Str :abstract (This is abstract);
Str :abstract_uri (https://...);

common derived checks (subtypes)

URI

XML schema

  • built-in type anyURI

JSON schema

  • string with format, one of
    • uri
    • uri-reference
    • iri
    • iri-reference

Although it is easy to write a subcheck using a pattern restriction, IMHO it would be handy to provide built-in checks:

  • URI
    • URL
    • URN
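A rough sketch of the URI / URL / URN split as plain-Perl predicates (hypothetical names). The regexes only follow the general shape of RFC 3986 (scheme ":" rest) and RFC 8141 ("urn:" NID ":" NSS); treating a URL as "a non-URN URI with a //authority part" is a simplifying assumption, and none of this is a complete validator.

```perl
use strict;
use warnings;

# Rough shape checks only; real built-in checks need the full grammars.
my $scheme = qr/[A-Za-z][A-Za-z0-9+.\-]*/;

sub is_uri {
    my ($s) = @_;
    return defined $s && $s =~ /\A$scheme:\S+\z/;
}

sub is_urn {
    my ($s) = @_;
    # NID: 2-32 letters/digits/hyphens, starting and ending with a letter or digit
    return defined $s && $s =~ /\Aurn:[A-Za-z0-9][A-Za-z0-9\-]{0,30}[A-Za-z0-9]:\S+\z/i;
}

sub is_url {
    my ($s) = @_;
    # Simplifying assumption: a URL is a non-URN URI with a "//authority" part.
    return is_uri($s) && !is_urn($s) && $s =~ m{\A$scheme://[^/?#\s]+};
}
```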

Date / time

XML schema

  • date
  • dateTime
  • duration
  • gDay
  • gMonth
  • gMonthDay
  • gYear
  • gYearMonth
  • time

JSON schema

  • date-time
  • date
  • time
  • duration

It would also be nice to provide date/time checks with possible encodings:

  • strict ISO 8601
  • relaxed variant allowing space as date-time separator (default?)
  • miscellaneous national formats
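One way the strict vs. relaxed distinction could look, as a plain-Perl sketch with hypothetical sub names. It checks only the shape YYYY-MM-DDThh:mm:ss (optional fraction and zone), so calendar validity such as February 30 is out of scope.

```perl
use strict;
use warnings;

# Shape-only pieces (no calendar validation such as Feb 30).
my $date = qr/\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])/;
my $time = qr/(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:\.\d+)?/;
my $zone = qr/(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)?/;    # optional zone

# strict ISO 8601: only "T" separates date and time
sub is_datetime_strict {
    my ($s) = @_;
    return $s =~ /\A${date}T${time}${zone}\z/;
}

# relaxed variant: a space is accepted as the separator too
sub is_datetime_relaxed {
    my ($s) = @_;
    return $s =~ /\A${date}[T ]${time}${zone}\z/;
}
```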

Values represented by these checks may be dual-valued, once there is a good enough implementation of a datetime object.
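As a rough stand-in for the dual-valued idea, core Scalar::Util::dualvar can pair a numeric value (here epoch seconds, via core Time::Local::timegm) with the original string. This is only an assumption about what "dual valued" could mean: dualvar holds a number and a string, not a datetime object, and iso_to_dual() is a hypothetical helper.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);
use Time::Local qw(timegm);

# Hypothetical helper; accepts only "YYYY-MM-DDThh:mm:ssZ".
# Numeric value = epoch seconds (UTC), string value = the ISO 8601 text.
sub iso_to_dual {
    my ($iso) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $iso =~ /\A(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z\z/
        or die "not a strict UTC date-time: $iso\n";
    # timegm takes (sec, min, hour, mday, month 0-11, year)
    return dualvar(timegm($s, $mi, $h, $d, $mo - 1, $y), $iso);
}

my $when = iso_to_dual('1970-01-02T00:00:00Z');
printf "%s is %d seconds after the epoch\n", $when, $when;
# prints "1970-01-02T00:00:00Z is 86400 seconds after the epoch"
```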

other useful checks

  • UUID (JSON schema: uuid)
  • Identifier (XML schema: token / ID / Name)
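Both checks in this list are simple enough to sketch as plain-Perl predicates (hypothetical names). The UUID check accepts the canonical 8-4-4-4-12 hex form; the xs:token check follows that type's lexical rules (no CR, LF or tab, no leading or trailing space, no runs of two or more spaces). XML Schema's ID and Name need the full XML name grammar and are not covered.

```perl
use strict;
use warnings;

# UUID: canonical 8-4-4-4-12 hexadecimal form (JSON Schema format "uuid").
sub is_uuid {
    my ($s) = @_;
    return $s =~ /\A[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\z/i;
}

# xs:token: no CR, LF or tab, no leading or trailing space,
# and no run of two or more spaces.
sub is_xs_token {
    my ($s) = @_;
    return $s !~ /[\r\n\t]|\A | \z|  /;
}
```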

happy-barney, Jun 04 '23

I think introspection would be awesome. We'll definitely want to think of something like that post-MVP, but it has MVP impacts we should be aware of now. You wrote:

# declare Bar => String [ min_length => 3, max_length => 16 ];
sub operation_handler :returns (Bar) { ... }

say Bar->to_openapi;

Bar is not a sub in your namespace (we have tons of checks, so exporting them as subroutines would be disastrous).

So we would need something that could introspect a check to get the data you want. However, it would not produce XSD, OpenAPI definitions, or anything like that. Instead, it would just return a data structure, or an AST, and the consumer can write the custom transformation code they want.
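A minimal sketch of what such a data structure and a consumer-side transformation might look like. The hash layout for Bar, the facet-to-keyword table and the to_openapi() / to_xsd() subs are all hypothetical, written by the consumer rather than provided by the checks themselves.

```perl
use strict;
use warnings;

# Hypothetical introspection result for Bar: plain data, not a real
# Oshun / Data::Checks format.
my $bar = {
    name   => 'Bar',
    base   => 'string',
    facets => { min_length => 3, max_length => 16 },
};

# Consumer-side mapping; these two keywords are spelled the same
# in JSON Schema / OpenAPI and in XML Schema.
my %keyword = (min_length => 'minLength', max_length => 'maxLength');

sub to_openapi {
    my ($check) = @_;
    my %schema = (type => $check->{base});
    for my $f (keys %{ $check->{facets} }) {
        my $k = $keyword{$f} // die "no mapping for facet '$f'\n";
        $schema{$k} = $check->{facets}{$f};
    }
    return \%schema;
}

sub to_xsd {
    my ($check) = @_;
    my $facets = '';
    for my $f (sort keys %{ $check->{facets} }) {
        my $k = $keyword{$f} // die "no mapping for facet '$f'\n";
        $facets .= sprintf qq(    <xs:%s value="%s"/>\n), $k, $check->{facets}{$f};
    }
    return sprintf qq(<xs:simpleType name="%s">\n  <xs:restriction base="xs:%s">\n%s  </xs:restriction>\n</xs:simpleType>\n),
        $check->{name}, $check->{base}, $facets;
}

print to_xsd($bar);
```

Keeping the transformation on the consumer side means other I/O formats only need another mapping table, not changes to the checks.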

@tobyink has considered similar ideas in this discussion, but that was when I was considering releasing Data::Checks as a module.

Ovid avatar Jun 04 '23 07:06 Ovid

Please note the "syntax symbolic" remark on that example. The code is meant only to describe the behaviour, not to be actual code. I'm aware that this should also be pluggable somehow; there are tons of other I/O protocols (known or not yet known).

happy-barney avatar Jun 04 '23 07:06 happy-barney

For what it's worth, Types::XSD supports all of the above, and I definitely plan on adding some kind of methods to "export" Type::Tiny types as Data::Check checks. Somehow.

tobyink avatar Jul 11 '23 15:07 tobyink

@tobyink I know, I tried to use it :-)

IMHO it would be a good exercise to write these specialized checks using Data::Check syntax.

happy-barney avatar Jul 11 '23 17:07 happy-barney

I feel it is important for Perl to internally track the intention of whether a string is octets/raw or text/characters, much as, as of 5.36.0, it tracks whether or not a scalar is intended to be a boolean. I realize that for legacy compatibility reasons we'd likely need at least 3 options in the general case: definitely text, definitely raw, and don't-know, and it would be nice to be able to eliminate occurrences of the last one where possible. But any time an IO is done with an explicit encoding, for example, we should know for sure: if it is raw, the result is known octets, and if it is e.g. UTF-8 or lots of others, it is characters. Also, any strings derived from string literals in Perl source are definitely text.

Combined with this internal notion, we should then have routines like builtin::is_text() and builtin::is_raw() or such as the reliable way for a program to assess what kind of thing it was given in a general context, similar to the existing builtin::is_bool() or whatever, and one can stop testing the utf8 flag or doing encoding tests etc. to determine this. The encoding tests would still be relevant but in a different context, which is to take a string internally considered raw and convert it to text if applicable, say if we want that conversion to be a separate step from the actual IO.
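The point that the utf8 flag cannot serve as a text/raw marker can be shown with core Perl alone. builtin::is_text() and builtin::is_raw() are proposals, not existing functions, so this sketch only uses utf8::is_utf8() and Encode::encode() to show two equal text strings with different flags, and a raw string whose flag matches one of them.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same four characters, c a f e-acute, stored two different ways.
my $native   = "caf\x{e9}";     # code points < 256: stored as single bytes, flag off
my $upgraded = $native;
utf8::upgrade($upgraded);       # same characters, stored internally as UTF-8, flag on

# Octets produced by an encoder: "raw" data, also with the flag off.
my $raw = encode('UTF-8', $native);    # 5 bytes: c a f \xC3 \xA9

printf "native: flag=%d len=%d\n", utf8::is_utf8($native) ? 1 : 0, length $native;
printf "upgraded: flag=%d len=%d eq=%d\n",
    utf8::is_utf8($upgraded) ? 1 : 0, length $upgraded, $native eq $upgraded ? 1 : 0;
printf "raw: flag=%d len=%d\n", utf8::is_utf8($raw) ? 1 : 0, length $raw;
# prints:
# native: flag=0 len=4
# upgraded: flag=1 len=4 eq=1
# raw: flag=0 len=5
```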

duncand avatar Jul 18 '23 00:07 duncand

Pydantic https://docs.pydantic.dev/latest/why/#json-schema also has similar ideas with respect to JSON Schema / OpenAPI.

zmughal avatar Jul 18 '23 18:07 zmughal