fparser Comment hierarchy issue

Hello,

I am walking an AST, looking for specific comments in the tree. I encountered some unexpected behavior regarding the hierarchy of the comments. This impacts the printer too.

My sample program looks like this, with 3 comments outside the do loop, and one comment inside the loop.

program test
    integer :: arg1
    integer :: iterator
    !comment_out 1
    arg1 = 10
    !comment_out 2
    do iterator = 0,arg1
        !comment_in
        print *, iterator
    end do
    !comment_out 3
end program test

I applied the following script to expose the unexpected behavior:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

ARG_FORTRAN_FILE_INPUT_PATH = "test_fail.f90"
ARG_FORTRAN_STANDARD = 'f2008'

from fparser.common.readfortran import FortranFileReader
from fparser.two.parser import ParserFactory
from fparser.common.readfortran import FortranFileReader
from fparser.two.utils import walk
from fparser.two.Fortran2003 import Comment

print("## Parsing input")
reader = FortranFileReader("{0}".format(ARG_FORTRAN_FILE_INPUT_PATH), ignore_comments=False)
f_parser = ParserFactory().create(std=ARG_FORTRAN_STANDARD)
parse_tree = f_parser(reader)
print("## Visiting AST")
cmtlist = walk(parse_tree, (Comment))
for cmt in cmtlist:
    if cmt.items[0]:
        print("## ---" + cmt.items[0])
        print(type(cmt.parent))
        print(cmt.parent)
print("## Done")

and got the following output:

## Parsing input
## Visiting AST
## ---!comment_out 1
<class 'fparser.two.Fortran2003.Implicit_Part'>
!comment_out 1
## ---!comment_out 2
<class 'fparser.two.Fortran2003.Block_Nonlabel_Do_Construct'>
!comment_out 2
  DO iterator = 0, arg1
  !comment_in
  PRINT *, iterator
END DO
## ---!comment_in
<class 'fparser.two.Fortran2003.Block_Nonlabel_Do_Construct'>
!comment_out 2
  DO iterator = 0, arg1
  !comment_in
  PRINT *, iterator
END DO
## ---!comment_out 3
<class 'fparser.two.Fortran2003.Execution_Part'>
arg1 = 10
!comment_out 2
  DO iterator = 0, arg1
  !comment_in
  PRINT *, iterator
END DO
!comment_out 3
## Done

We can observe that the comment_out 2 is considered as a children of the do loop, whereas I expect it to be at the same level as comment_out 3, that is as part of the execution part.

I've run this with fparser2 version 0.0.16.

We can also observe that the fparser2 script fparser2 --std=f2008 test_fail.f90 prints a wrong indentation when parsing this sample program:

PROGRAM test
  INTEGER :: arg1
  INTEGER :: iterator
  !comment_out 1
  arg1 = 10
  !comment_out 2
    DO iterator = 0, arg1
    !comment_in
    PRINT *, iterator
  END DO
  !comment_out 3
END PROGRAM test

Best.

Aug 03 '22 16:08 antoine-morvan

Thanks for the report and reproducer. We've just hit this problem in a different context so I'm keen to take a look. Just struggling for time at the moment as it's the summer holidays.

Aug 09 '22 08:08 arporter

The problem is in the BlockBase.match method which consumes any preceding comments, includes and directives before moving on to process the loop (https://github.com/stfc/fparser/blob/3ede17259e8854e9fd3976b80324a95bf60ca107/src/fparser/two/utils.py#L590-L595). Do @rupertford or @reuterbal know whether this is because we want to associate directives with the loop? Or is it something that was there originally?

Aug 11 '22 07:08 arporter

To answer my own question, git blame shows me that it was like that 4 years ago (albeit, it only handled comments back then). The question is, if I change this, am I going to break the directives support?

Aug 11 '22 07:08 arporter

If I remember correctly, #358 fixes this.

Aug 11 '22 07:08 rupertford

FWIW: I don't think it is because of directives but just an artifact of the way it was implemented? In fact, we explicitly deal with this association of comments when we convert to our IR and don't mind seeing the need for that being gone;-)

Aug 11 '22 07:08 reuterbal