Comment hierarchy issue
Hello,
I am walking an AST, looking for specific comments in the tree. I encountered some unexpected behavior regarding the hierarchy of the comments. This impacts the printer too.
My sample program looks like this, with 3 comments outside the do loop, and one comment inside the loop.
program test
  integer :: arg1
  integer :: iterator
  !comment_out 1
  arg1 = 10
  !comment_out 2
  do iterator = 0,arg1
    !comment_in
    print *, iterator
  end do
  !comment_out 3
end program test
I applied the following script to expose the unexpected behavior:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from fparser.common.readfortran import FortranFileReader
from fparser.two.parser import ParserFactory
from fparser.two.utils import walk
from fparser.two.Fortran2003 import Comment

ARG_FORTRAN_FILE_INPUT_PATH = "test_fail.f90"
ARG_FORTRAN_STANDARD = 'f2008'

print("## Parsing input")
# ignore_comments=False keeps comments in the parse tree
reader = FortranFileReader(ARG_FORTRAN_FILE_INPUT_PATH, ignore_comments=False)
f_parser = ParserFactory().create(std=ARG_FORTRAN_STANDARD)
parse_tree = f_parser(reader)

print("## Visiting AST")
cmtlist = walk(parse_tree, Comment)
for cmt in cmtlist:
    # Skip comment nodes with empty text
    if cmt.items[0]:
        print("## ---" + cmt.items[0])
        print(type(cmt.parent))
        print(cmt.parent)
print("## Done")
and got the following output:
## Parsing input
## Visiting AST
## ---!comment_out 1
<class 'fparser.two.Fortran2003.Implicit_Part'>
!comment_out 1
## ---!comment_out 2
<class 'fparser.two.Fortran2003.Block_Nonlabel_Do_Construct'>
!comment_out 2
DO iterator = 0, arg1
!comment_in
PRINT *, iterator
END DO
## ---!comment_in
<class 'fparser.two.Fortran2003.Block_Nonlabel_Do_Construct'>
!comment_out 2
DO iterator = 0, arg1
!comment_in
PRINT *, iterator
END DO
## ---!comment_out 3
<class 'fparser.two.Fortran2003.Execution_Part'>
arg1 = 10
!comment_out 2
DO iterator = 0, arg1
!comment_in
PRINT *, iterator
END DO
!comment_out 3
## Done
We can observe that comment_out 2 is treated as a child of the do loop, whereas I expected it to be at the same level as comment_out 3, that is, a direct child of the execution part.
I've run this with fparser2 version 0.0.16.
The fparser2 command-line script, run as fparser2 --std=f2008 test_fail.f90, also prints incorrect indentation for this sample program:
PROGRAM test
INTEGER :: arg1
INTEGER :: iterator
!comment_out 1
arg1 = 10
!comment_out 2
DO iterator = 0, arg1
!comment_in
PRINT *, iterator
END DO
!comment_out 3
END PROGRAM test
Best.
Thanks for the report and reproducer. We've just hit this problem in a different context so I'm keen to take a look. Just struggling for time at the moment as it's the summer holidays.
The problem is in the BlockBase.match method which consumes any preceding comments, includes and directives before moving on to process the loop (https://github.com/stfc/fparser/blob/3ede17259e8854e9fd3976b80324a95bf60ca107/src/fparser/two/utils.py#L590-L595). Do @rupertford or @reuterbal know whether this is because we want to associate directives with the loop? Or is it something that was there originally?
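To make the mechanism concrete, here is a toy model of a block matcher that swallows leading comments before it looks for the opening statement. This is not fparser's code: `parse_block` and `parse_execution_part` are hypothetical names, lines are plain strings, and a block is a nested list. It only illustrates how greedy consumption attaches a preceding comment to the loop.

```python
def parse_block(lines, start):
    """Toy block matcher (hypothetical, not fparser's BlockBase.match).

    Returns (block_children, next_index), or (None, start) if no block
    starts at this position.
    """
    children = []
    i = start
    # Greedy step: comments before the opening statement are collected
    # into the block's own children.
    while i < len(lines) and lines[i].startswith("!"):
        children.append(lines[i])
        i += 1
    if i >= len(lines) or not lines[i].startswith("do "):
        return None, start  # not a block, so nothing is consumed
    # Collect everything up to and including the closing statement.
    while i < len(lines):
        children.append(lines[i])
        i += 1
        if children[-1] == "end do":
            break
    return children, i


def parse_execution_part(lines):
    """Try the block matcher at each position; otherwise keep the line."""
    result, i = [], 0
    while i < len(lines):
        block, nxt = parse_block(lines, i)
        if block is not None:
            result.append(block)
            i = nxt
        else:
            result.append(lines[i])
            i += 1
    return result


source = [
    "arg1 = 10",
    "!comment_out 2",
    "do iterator = 0,arg1",
    "!comment_in",
    "print *, iterator",
    "end do",
    "!comment_out 3",
]
tree = parse_execution_part(source)
```

In this model `tree[1]` is the loop and its first child is the comment_out 2 line, while comment_out 3 stays at the top level, which mirrors the hierarchy reported above.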
To answer my own question, git blame shows me that it was like that 4 years ago (albeit, it only handled comments back then). The question is, if I change this, am I going to break the directives support?
If I remember correctly, #358 fixes this.
FWIW: I don't think it is because of directives; it looks like just an artifact of the way it was implemented. In fact, we explicitly deal with this association of comments when we convert to our IR, and we wouldn't mind seeing the need for that go away ;-)
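For anyone who needs a workaround in the meantime, the re-association mentioned above could look roughly like the sketch below. It uses a hypothetical stand-in `Node` class and made-up `kind` labels rather than fparser's node classes or anyone's actual IR conversion, so treat it as an outline of the idea: comments that precede a construct's opening statement are moved up to the enclosing node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not an fparser class)."""
    kind: str                      # e.g. "comment", "do_construct", "do_stmt"
    text: str = ""
    children: list = field(default_factory=list)


def hoist_leading_comments(parent):
    """Move comments that precede a loop's opening statement up to `parent`."""
    new_children = []
    for child in parent.children:
        if child.kind == "do_construct":
            # Count the run of comments before the first non-comment child.
            n = 0
            while n < len(child.children) and child.children[n].kind == "comment":
                n += 1
            new_children.extend(child.children[:n])
            child.children = child.children[n:]
            hoist_leading_comments(child)  # handle nested loops
        new_children.append(child)
    parent.children = new_children


loop = Node("do_construct", children=[
    Node("comment", "!comment_out 2"),
    Node("do_stmt", "DO iterator = 0, arg1"),
    Node("comment", "!comment_in"),
    Node("stmt", "PRINT *, iterator"),
    Node("end_do_stmt", "END DO"),
])
exec_part = Node("execution_part", children=[
    Node("stmt", "arg1 = 10"),
    loop,
    Node("comment", "!comment_out 3"),
])
hoist_leading_comments(exec_part)
```

After the call, comment_out 2 sits in the execution part directly before the loop, and comment_in is still inside the loop because it follows the opening statement.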