commonmark icon indicating copy to clipboard operation
commonmark copied to clipboard

Config doesn't unexpectedly doesn't support option.

Open Danack opened this issue 3 years ago • 4 comments

Version(s) affected

2.3.5

Description

The GithubFlavoredMarkdownExtension includes multiple extensions including the DisallowedRawHtmlExtension extension.

Whether the config array allows a 'disallowed_raw_html' entry depends on whether the DisallowedRawHtmlExtension was added directly, or via the GithubFlavoredMarkdownExtension

I was expecting the config allowed to be the same, no matter how the extensions were added.

How to reproduce

The code below has a if (false) that allows switching between adding extensions individually or just adding the github extension.

Mardown:

### Code of conduct

Short version, any behaviour that prevents people from being able to use this site to do "useful" work is very likely to see someone suspended. 

<hr/>

Here is a brief summary of the Code of Conduct in musical form:

<iframe width="560" height="315" src="https://www.youtube.com/embed/HivxFBB87-Y" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Code:

<?php

use League\CommonMark\Extension\Autolink\AutolinkExtension;
use League\CommonMark\Extension\DisallowedRawHtml\DisallowedRawHtmlExtension;
use League\CommonMark\Extension\GithubFlavoredMarkdownExtension;
use League\CommonMark\Extension\Strikethrough\StrikethroughExtension;
use League\CommonMark\Extension\Table\TableExtension;
use League\CommonMark\Extension\TaskList\TaskListExtension;
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\HeadingPermalink\HeadingPermalinkExtension;
use League\CommonMark\Normalizer\SlugNormalizer;
use League\CommonMark\MarkdownConverter;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;

        $config = [
            'heading_permalink' => [
                'html_class' => 'heading-permalink',
                'id_prefix' => 'user-content',
                'insert' => 'after',
                'title' => 'Permalink',
                'symbol' => "\u{00A0}\u{00A0}🔗",
            ],
            'slug_normalizer' => [
                // ... other options here ...
                'instance' => new SlugNormalizer(),
            ],
            'html_input' => 'allow',
            'disallowed_raw_html' => [
                'disallowed_tags' => [
                    'title', 'textarea', 'style', 'xmp',
                    //'iframe',
                    'noembed', 'noframes', 'script', 'plaintext'
                ],
            ],
        ];

        $environment = new Environment($config);
        $environment->addExtension(new CommonMarkCoreExtension());
        $environment->addExtension(new HeadingPermalinkExtension());

        if (false) {
            // This works
            $environment->addExtension(new AutolinkExtension());
            $environment->addExtension(new DisallowedRawHtmlExtension());
            $environment->addExtension(new StrikethroughExtension());
            $environment->addExtension(new TableExtension());
            $environment->addExtension(new TaskListExtension());
        }
        else {
            // This errors "Unexpected item 'disallowed_raw_html'."
            $environment->addExtension(new GithubFlavoredMarkdownExtension());
        }

        $converter = new MarkdownConverter($environment);

        return $converter->convert($markdown)->getContent();

Possible solution

No response

Additional context

No response

Did this project help you today? Did it make you happy in any way?

Did it make you happy in any way?

Actually a little of the opposite....from the code above:

        $converter = new MarkdownConverter($environment);

        return $converter->convert($markdown)->getContent();

It would be better if the config was checked for being valid in the MarkdownConverter constructor, rather than when we get to the convert step.

Though as this is a personal opinion, I don't think it would be appropriate for me to open an issue for it. But as you were asking about feelings....

Danack avatar Jul 31 '22 21:07 Danack

Yikes! Looks like the lazy environment initialization approach doesn't play nicely with extensions that register extensions that register config schemas. There's a quick and dirty fix I could implement that would solve this for the GFM extension, but any third-party extensions would have the same problem.

A proper solution might require some breaking changes - I'll play with this and see if I can find a non-BC-breaking way to address this in v2.

colinodell avatar Aug 06 '22 15:08 colinodell

Sorry for being a bit of a downer in the comment.

I encountered this the same day my server was compromised, and apparently I need to rebuild from scratch, as the versions of ubuntu have changed how they manage ssh keys....and my script no worky.

A proper solution might require some breaking changes

No rush for me as a I found a workaround....but it is super surprising.

If you're making a breaking change, it would be nice to have config errors happen where the config is set. Also....I think maybe doing the config in a different way might make it easier to be forward compatible with config changes.

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new HeadingPermalinkExtension());

// Make it possible for people to get the default config as plain array
$config = $environment->getDefaultConfig()->toArray();

// Manipulate config here
// Though merging arrays in PHP is not always the most intuitive thing.

// errors in config happen here
$converter = new MarkdownConverter($environment, $config); 

Currently the config needs to be passed in to the initial object created, which means you can't then add the extensions, and then retrieve the default config. Being able to do that would be easier than having to go away and read the documentation.

Being able to get the default config as an array rather than as:

object(League\Config\ReadOnlyConfiguration)#119 (1) { ["config":"League\Config\ReadOnlyConfiguration":private]=> object(League\Config\Configuration)#105 (5) { ["userConfig":"League\Config\Configuration":private]=> object(Dflydev\DotAccessData\Data)#118 (1) { ["data":protected]=> array(4) { ["heading_permalink"]=> array(5) { ["html_class"]=> string(17) "heading-permalink" ["id_prefix"]=> string(12) "user-content" ["insert"]=> string(5) "after" ["title"]=> string(9) "Permalink" ["symbol"]=> string(8) "  🔗" } ["slug_normalizer"]=> array(1) { ["instance"]=> object(League\CommonMark\Normalizer\SlugNormalizer)#101 (1) { ["defaultMaxLength":"League\CommonMark\Normalizer\SlugNormalizer":private]=> int(255) } } ["html_input"]=> string(5) "allow" ["disallowed_raw_html"]=> array(1) { ["disallowed_tags"]=> array(8) { [0]=> string(5) "title" [1]=> string(8) "textarea" [2]=> string(5) "style" [3]=> string(3) "xmp" [4]=> string(7) "noembed" [5]=> string(8) "noframes" [6]=> string(6) "script" [7]=> string(9) "plaintext" } } } } ["configSchemas":"League\Config\Configuration":private]=> array(7) { ["html_input"]=> object(Nette\Schema\Elements\AnyOf)#106 (7) { ["set":"Nette\Schema\Elements\AnyOf":private]=> 
etc.
etc.

might make life easier for users, at the small price of more work for the maintainer....

But obviously that might be too big a change for this version.

Danack avatar Aug 06 '22 15:08 Danack

Wall of text incoming... sorry, I have lots of different thoughts here!

it would be nice to have config errors happen where the config is set

The challenge here is that the Environment is not usable unless/until it has the configuration, because some extensions use that to dynamically add or exclude parsers, renderers, event listeners, etc. And then multiple objects (like the MarkdownParser and HtmlRenderer) need to share that same fully-initialized Environment.

So we could do something like this (not my favorite):

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new HeadingPermalinkExtension());

// Un-deprecate and/or rename this existing method for v3
$environment->mergeConfig($config);

Or we could use some kind of builder object:

$builder = new EnvironmentBuilder();
$builder->addExtension(new CommonMarkCoreExtension());
$builder->addExtension(new HeadingPermalinkExtension());

$environment = $builder->build($config);

I think I like that second option best because:

  • We can fully separate the creation of the Environment from its usage, unlike the current class which mingles both responsibilities together
  • If the MarkdownConverter constructor takes in an EnvironmentBuilder then we can call ->build($config) under-the-hood, hiding some of this complexity from the user.

This would help remove the lazy/delayed config validation, but unfortunately this alone won't solve the original issue you experienced.

The current challenge is that extensions are supposed to register their config schemas before adding things to the environment, but extensions can add other extensions after all the schemas were supposed to be registered.

After lots of experimentation, I've come up with two possible solutions:

Solution 1: Delay validation of the extension's schema until it's used

We currently delay config validation until any config option is read. That validation is performed using the entire config array against all currently-registered schemas. But as noted above, if any config value is read prior to an extension adding other extensions that register schemas, this fails.

We could keep this delayed config validation approach, but only validate the schema the current key belongs to. This would solve this particular issue with no BC breaks, and only requires behavioral tweaks to the underlying league/config library.

It would also have the nice side effect of allowing users to set configuration options for extensions that may or may not be used. For example, code like this would no longer cause errors:

$config = [
  'heading_permalink' => [
      'html_class' => 'heading-permalink',
      'id_prefix' => 'user-content',
      'insert' => 'after',
      'title' => 'Permalink',
      'symbol' => "\u{00A0}\u{00A0}🔗",
  ],
  'slug_normalizer' => [
      // ... other options here ...
      'instance' => new SlugNormalizer(),
  ],
  'html_input' => 'allow',
  'disallowed_raw_html' => [
      'disallowed_tags' => [
          'title', 'textarea', 'style', 'xmp',
          //'iframe',
          'noembed', 'noframes', 'script', 'plaintext'
      ],
  ],
];

$environment = new Environment($config);
$environment->addExtension(new CommonMarkCoreExtension());

// Simulate a user using different combinations of extensions without needing to also change the configuration items accordingly:
if (random_int(0, 1) === 1) {
  $environment->addExtension(new HeadingPermalinkExtension());
}
if (random_int(0, 1) === 1) {
  $environment->addExtension(new DisallowedRawHtmlExtension());
}

$converter = new MarkdownConverter($environment);

return $converter->convert($markdown)->getContent();

But it's basically the exact opposite of this:

it would be nice to have config errors happen where the config is set.

Solution 2: Prevent extensions from adding other extensions

Basically, prevent the addExtension() method from being called within ExtensionInterface::register(). This change alone would address your issue, but it also means GithubFlavoredMarkdownExtension would have to look like this:

final class GithubFlavoredMarkdownExtension implements ConfigurableExtensionInterface
{
    public function configureSchema(ConfigurationBuilderInterface $builder): void
    {
        (new DisallowedRawHtmlExtension())->configureSchema($builder);
        (new TableExtension())->configureSchema($builder);
    }
    
    public function register(EnvironmentBuilderInterface $environment): void
    {
        (new AutolinkExtension())->register($environment);
        (new DisallowedRawHtmlExtension())->register($environment);
        (new StrikethroughExtension())->register($environment);
        (new TableExtension())->register($environment);
        (new TaskListExtension())->register($environment);
    }
}

Not great, not terrible.

Alternatively, we could add a new interface that would allow an extension to add extensions separately from other things; perhaps like this:

final class GithubFlavoredMarkdownExtension implements ExtensionWithDependenciesInterface
{
    public function registerDependencies(ExtensionAdderInterface $dependencies): void
    {
        $dependencies->add(new AutolinkExtension());
        $dependencies->add(new DisallowedRawHtmlExtension());
        $dependencies->add(new StrikethroughExtension());
        $dependencies->add(new TableExtension());
        $dependencies->add(new TaskListExtension());
    }
}

But either way, this would require some BC breaks, so we couldn't address this in 2.x.


Anyway, that's pretty much where I'm at right now. I need a few days to think through these different options, but if you have any thoughts on them I'd love to hear them!

And if you don't mind, could you expand on this thought:

Being able to get the default config as an array

What are you looking to do with that default config? How would this help you? Why doesn't the current approach work well for you? I want to better understand this as I personally haven't run into this need and want to get more insights here :)

Thanks!

colinodell avatar Aug 13 '22 16:08 colinodell

It's too warm for me to think much....but to answer one bit:

How would this help you?

So I spent ages trying to find the typo that I "must" have made in the config, and was going back and forth between the code and the documentation, growing more confused every minute.

Being able to get the default config from the library, and then passing it back in, unmodified:

$builder = new EnvironmentBuilder();
// whatever.

$config = $builder->getConfig();
$environment = $builder->build($config);

would have made finding the problem easier, probably. Or at least in my imagination.

What are you looking to do with that default config? How would this help you?

Also, although the documentation is comprehensive, it can still be quicker, and easier to discover what config settings are by getting the default config first, rather than reading a long webpage.

There is also a use-case for being able to detect any new config settings, or changes in default.

$config = $foo->getConfig();
$config['setting_that_needs_changing'] = 'bar';
assert(sha1(json_encode($config)) === 'd0be2dc421be4fcd0172e5afceea3970e2f3d940');

If that ever happens to fail, you know a new config setting has appeared, or a default has changed, and a programmer needs to take a look at what has changed.

Danack avatar Aug 13 '22 23:08 Danack

For now, I've implemented "Solution 1: Delay validation of the extension's schema until it's used" via league/config v1.2.0. Upgrading to that version should fix this issue for you.

colinodell avatar Dec 11 '22 20:12 colinodell