colander Add data URLs to the URL validator

Colander provides a colander.url regex that validates only http/s and ftp/s schemes:

https://github.com/Pylons/colander/blob/048fb24eeb6c3df21831413943dbf89d7b5776e4/src/colander/__init__.py#L610-L629

A data URL would be rejected as invalid. Any interest in adding support for data URLs? Or should this be handled by the user by using the colander.Any() validator with two separate Regex validators?

Also, it would be nice if the documentation said “A Regex validator which…” (minor nit, though).

Jun 30 '22 02:06 jenstroeger

I think the schemes are too different to try to smoosh data URLs into the existing colander.url validator.

I think a new validator, colander.dataurl, would be useful, along with a documented recipe for combining colander.url and colander.dataurl in colander.Any(). That way users could enter either a data URL or a regular URL into a single input and have its value validated.
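The recipe idea boils down to "accept the value if either pattern matches". Here is a minimal stdlib-only sketch of that logic. The names SIMPLE_URL_REGEX, SIMPLE_DATA_URL_REGEX, and is_url_or_data_url are hypothetical, and the first pattern is a deliberately simplified stand-in, not colander's actual URL regex. In colander itself the same idea would presumably be expressed by passing the two validators to colander.Any().

```python
import re

# Hypothetical, simplified stand-in for a regular URL pattern (http/s and
# ftp/s only). This is NOT colander's actual URL regex.
SIMPLE_URL_REGEX = re.compile(r'^(?:https?|ftps?)://[^\s/?#]+\S*$', re.IGNORECASE)

# Hypothetical minimal data URL pattern: "data:", optional metadata,
# a comma, then the data itself.
SIMPLE_DATA_URL_REGEX = re.compile(r'^data:[^,]*,.*$', re.IGNORECASE | re.DOTALL)


def is_url_or_data_url(value):
    """Return True if either pattern matches ("any validator passes")."""
    return any(
        pattern.match(value) is not None
        for pattern in (SIMPLE_URL_REGEX, SIMPLE_DATA_URL_REGEX)
    )
```

With this, a single input field could accept both "https://example.com/logo.png" and "data:text/plain;base64,SGVsbG8=", while still rejecting other schemes such as mailto:.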

For the nit, specifically which bit of the documentation would you change? IOW, original text to changed text, to provide some context.

Jun 30 '22 03:06 stevepiercy

@stevepiercy PR welcome I assume? 😉

> For the nit, specifically which bit of the documentation would you change? IOW, original text to changed text, to provide some context.

Here I’d change “A validator which…” to “A Regex validator which…” to make it more explicit that colander.url is an instance of colander.Regex() with a preset URL regex. A similar edit might apply to other validators built on Regex(). I say that because I had to read through the source to understand what colander.url actually is.

Jun 30 '22 03:06 jenstroeger

I think for both bits, a separate PR for each would be good, one for the nit corrections and the other with the new validator and recipe in the docs. Thank you!

Jun 30 '22 04:06 stevepiercy

@stevepiercy sounds good.

Now that I have looked a bit more into the Colander code, I’m tempted to go from

DATA_URL_REGEX = (
    # data: (required)
    r'^data:'
    # optional mime type
    r'([^;]*)?'
    # optional base64 identifier
    r'(;base64)?'
    # actual data follows the comma
    r',(.*)$'
)

data_url = Regex(DATA_URL_REGEX, msg=_('Must be a data URL'), flags=re.IGNORECASE)  # re.ASCII only in Py3

to a more thorough approach by subclassing Regex (like the Email validator does). That’s because the simple approach above would accept invalid MIME types and invalid Base64-encoded data, and I think we want to make sure those cases are covered too. Are you OK with a more complex validator?
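To illustrate the difference, here is a rough stdlib-only sketch of what a stricter check might look like. It is not the validator from any PR: MIME_TYPE_REGEX and is_strict_data_url are hypothetical names, the MIME type pattern is a simplification, and media type parameters and percent-encoded data are not validated.

```python
import base64
import binascii
import re

# The simple pattern from above, compiled here for comparison.
DATA_URL_REGEX = re.compile(r'^data:([^;]*)?(;base64)?,(.*)$', re.IGNORECASE)

# Hypothetical, simplified "type/subtype" check; the full media type grammar
# is more involved than this.
MIME_TYPE_REGEX = re.compile(
    r'[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*', re.IGNORECASE
)


def is_strict_data_url(value):
    """Sketch: check the MIME type and, if flagged, the Base64 payload."""
    if value[:5].lower() != 'data:':
        return False
    # Everything before the first comma is metadata, the rest is data.
    header, comma, data = value[5:].partition(',')
    if not comma:
        return False
    parts = header.split(';')
    mime_type = parts[0]
    is_base64 = len(parts) > 1 and parts[-1].lower() == 'base64'
    # The MIME type is optional, but must look valid when present.
    if mime_type and MIME_TYPE_REGEX.fullmatch(mime_type) is None:
        return False
    if is_base64:
        try:
            # validate=True rejects characters outside the Base64 alphabet.
            base64.b64decode(data, validate=True)
        except (binascii.Error, ValueError):
            return False
    return True
```

For example, DATA_URL_REGEX matches "data:not a mime type;base64,@@@", whereas is_strict_data_url rejects it because of both the MIME type and the payload.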

Also, how would you like me to go about the locale/ catalogs: update the .pot file only? I could contribute to the German translation, but wouldn’t be of much help with the rest (although I could probably copy the translations for "Must be a URL" and add "data" in).

Jun 30 '22 13:06 jenstroeger

Thorough sounds good to me.

The word "data" could appear before or after "URL" depending on the language, so I think the plan to insert "data" might add some confusion. Google Translate or a similar service might be of use, but who knows how well it handles technical language? For translations you are not confident about, I would follow a previous pattern.

https://github.com/Pylons/colander/commit/365b905137574dd8b04b22e5779080fd97c869cd#diff-4f15ab11bf63f7ef85baa6dbec941e93a8b7b1e79fa79bb9c80a78b97b421b66R56-R58

...and create a new issue stating that translations for "Must be a data URL" are incomplete, listing the languages that still need a fix. Then send an email to [email protected] and I can follow up with a tweet for a call for contributions.

Jun 30 '22 20:06 stevepiercy

@stevepiercy please see PR https://github.com/Pylons/colander/pull/348

Jul 01 '22 00:07 jenstroeger