php-parser icon indicating copy to clipboard operation
php-parser copied to clipboard

Script tags & AST Strructure / Introducing Breaking Changes

Open ichiriac opened this issue 7 years ago • 11 comments

The A from AST stands for Abstract, meaning it does not matters about source code formatting. This can result to removing/loosing some extra informations, and that makes impossible to transform AST back to the original source.

The reason why AST is like that is because it's purpose is to be an intermediate machine language/structure, it's used by a VM or JIT machine in order to generate atomic machine instructions.

But as php-parser is mainly used to edit PHP sources, and the next major release introduce anyway breaking changes that the good time to decide the AST orientations.

If I choose to keep on AST the same structure as the document, the script tag nodes may also be included as an AST node in order to be able to scan something like this :

<?php /** HELLO **/ ?>
<?php /** WORLD **/ ?>

@czosel & @evilebottnawi - do you see any other missing use case where the formatting is lost ?

ichiriac avatar Aug 05 '18 10:08 ichiriac

@ichiriac I've been working on this exact issue yesterday in the prettier plugin - my approach is to parse the tokens and add new just comment nodes to the AST (https://github.com/prettier/plugin-php/pull/513). For the plugin, it would of course be great if we wouldn't have to do this anymore! I'm not sure in what other contexts this PHP parser is used, and if introducing the new AST nodes could case problems in these use-cases?

There is one more case for which we're looking at the AST tokens right now:

<?php echo 1; ?>
<?php echo 2; ?>
// same AST as
<?php echo 1; ?><?php echo 2; ?>

(see https://github.com/prettier/plugin-php/pull/503) If you're thinking of optimizing the AST output for re-printing the source code (like we do in the prettier plugin), an extra inline node would certainly help the prettier plugin as well.

czosel avatar Aug 05 '18 11:08 czosel

Here the problem comes from the way interprets newlines after closing tag (another example of AST oriented for execution). The first "\n" is stripped from the output, a sort of authors choice in order to break html output with extra new lines.

Actually php-parser is used to :

  • retrieve informations about the code structure
  • update / generate code from the AST structure
  • reformat the code
  • transpile to JS by converting AST nodes

In principle sticking to the document structure should not be a problem. The reason PHP engine avoids extra information is because the have only one use case, but it's not the case here.

What I plan to do is to introduce a script node (with children array nodes) and an output node with raw html output.

Here some use cases :

<?php
  function foo() {
     $var  = true;
     ?>OUTPUT <?
     for(;;) { 
       ?> HTML<%
    }
  }
?>
Another output <?= foo(); ?>

As nesting script nodes breaks AST struction, only open_tag at the document root would be represented :

Program {
   children: [
      Script : {
        short: false,
        children: [
            Function: {
               children: [
                  Assing: { ... },
                  Output: { ... nextShortTag: true },
                  For: { .. Output { nextAspTag: true } .. },
               ]
            }
        ]
      },
      Output: {},
      EchoScript: {
         expr: { Call foo() }
      }
   ]
}

Do you see any other kind of missing informations ? (/cc @chris-l)

ichiriac avatar Aug 05 '18 20:08 ichiriac

This looks really good - I think we'd have it much easier to format files with lots of inline HTML. /cc @mgrip @evilebottnawi

czosel avatar Aug 06 '18 07:08 czosel

Looks good :+1:

alexander-akait avatar Aug 07 '18 08:08 alexander-akait

@ichiriac I thoroughly studied your assumption, it seems to me that it will not solve the problem. New kind of node can introduce new problems.

I prefer add option pseudoInlineNode: true (maybe better name) and doesn't eat ?>\n<?php. Also inline node can contains information about tag type (type: asp, standard, etc). Why? We already known what each inline node should be wrapper in php tags (?>Content of inline node<?php) and that's enough.

I think it's not difficult to implement.

alexander-akait avatar Aug 10 '18 17:08 alexander-akait

I'm agree that creating new nodes does not solves directly your problem, and just fixing the behavior of the inline node would be enough, but here I'm looking for being more close to the document structure, and without a new script tag it would be analog to reducing & simplifying the structure (and in some cases loosing information).

So in principle no more implied syntax choices, every structure should have it's own node - keeping the document semantics clean.

If the ouput corresponds to the raw HTML output, so the script should have its own node and include inside its childs as it adds a new tree level. In principle at the document root you should only have script and/or output nodes, and the output node (that's just the inline node renamed) could now have a raw '\n' and a value (that could be empty in your example).

Here we speak about the script tags but as I will digg into nodes it could maybe then be applied to another structures (I don't see any other right now)

I think that's the right moment to introduce this kind of change as the parser is not yet released and I do not want to end up next with a v4 because of changing tags and breaking backwards compatibility.

ichiriac avatar Aug 10 '18 20:08 ichiriac

#171 related : removal of parenthesis node (replaced with parenthesizedExpression property)

ichiriac avatar Aug 17 '18 15:08 ichiriac

@ichiriac maybe it is good idea introduce PHPExpressionContainer (or maybe better name) when we found <?php ?>/<?php (looks same for JSX https://astexplorer.net/#/gist/31725f76a8a3a6c33da02d098f4aa3ac/fe432a1b03023d56693a2e26e8f3edaaca15b8e9). It is allow to detect when we should print <?php and ?>, use group for all content and detect indent for this.

Now we print ?> before inline and <?php after inline (it is very dirty solution and doesn't work as expected in some cases), Also please look on innerComments, it is allow to print comments in empty expression:

<?php
// Comment
?>

alexander-akait avatar Oct 09 '18 10:10 alexander-akait

@ichiriac it think right now it is not high priority, better stabilize parser, then implement better ast for inline

alexander-akait avatar Jan 18 '19 15:01 alexander-akait

related to https://github.com/prettier/plugin-php/issues/517

<!DOCTYPE html>
<html lang="de">
<head>
    <?php /* ups */
    if (1==1)
    {
        echo 'FOO';
    } ?>
</head>
<body>
</body>
</html>

NB : The /* ups */ comment would be attached on the script node instead of the inline - I'll try a proposal implementation soon as it's a blocking point for my release

ichiriac avatar Feb 16 '19 10:02 ichiriac

@ichiriac we use own algorithm to attach comment, it is simple comment location should fit in new script node ast location

alexander-akait avatar Feb 18 '19 10:02 alexander-akait