ctpg

Would it be possible to remove the lexer scanner?

Open 95833 opened this issue 3 years ago • 2 comments

I am writing a grammar using another parser library, and I find the lexer-scanner unnatural. When we define a token, we usually give it a name that carries some semantics, such as VARIABLE, STRING, INT, FLOAT, or BOOL. This is unnatural because the lexer should not carry any information about semantics. It might be more suitable to use names like LITTLE_CHAR_SET, CHARS_SET_WITH_QUOTES, and DIGIT_SET in place of VARIABLE, STRING, and INT, but obviously these names are too verbose. This may seem unimportant, but when I define a grammar I always have to make a tradeoff between a natural but complex grammar and a simple but incoherent one, because places that use the same token often have different semantics.

So I am wondering whether we could get a natural grammar definition by removing the lexer-scanner and replacing lexer tokens with inline regexes. That made me think of your lib, and I feel the idea would suit it because it is able to address the lexer priority problem.

95833 avatar Oct 09 '22 09:10 95833

The problem is that the parser is supposed to be a constexpr object. This is the whole idea behind the library.

Now there are some problems:

  • I need to calculate the size of a finite automaton table to construct a lexer, so...
  • I need the sizes of all regexes at compile time
  • I would like to allow inline terms, but only if they are expressed as literals, like say "[0-9]"_r

For char_term and string_term it is easy. For the regex term I found a way, but it requires the C++20 standard:

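#include <algorithm> // std::ranges::copy
#include <cstddef>   // std::size_t

// Wraps a string literal so it can be used as a non-type template parameter (C++20).
template<std::size_t N>
struct regex
{
    constexpr regex(const char (&str)[N])
    {
        std::ranges::copy(str, array);
    }

    char array[N];
};

// String literal operator template: the literal itself becomes the template argument.
template<regex a>
constexpr auto operator ""_r()
{
    return a;
}

int main()
{
    constexpr auto expr = "[0-9]"_r;
    return 0;
}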

Of course I could allow inlining them like this: regex_term("[0-9]"), but this seemed too verbose and the grammar looked ugly.

peter-winter avatar Oct 13 '22 15:10 peter-winter

The goal of inlining is to resolve lexer matching priority as part of the syntax parsing process. I don't know whether it can be realized, or how to realize it. The writing style, on the other hand, is not very important.
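One possible reading of this idea, sketched at run time with std::regex rather than as a constexpr parser: instead of one global lexer with a fixed priority order, only the terms that the parser expects in its current state are tried against the input. The names term, match_expected, and usage_example are hypothetical and exist only for this illustration. This is not how ctpg works, and it says nothing about whether the approach fits a compile-time table.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical term: a name plus an inline pattern.
struct term
{
    std::string name;
    std::regex pattern;
};

// Tries only the terms expected in the current parser state, anchored at the
// start of the input. Returns the name of the longest non-empty match, or an
// empty string if nothing matches.
std::string match_expected(std::string_view input, const std::vector<term>& expected)
{
    std::string best;
    std::size_t best_len = 0;
    for (const auto& t : expected)
    {
        std::cmatch m;
        const bool found = std::regex_search(
            input.data(), input.data() + input.size(), m, t.pattern,
            std::regex_constants::match_continuous);
        if (found && static_cast<std::size_t>(m.length(0)) > best_len)
        {
            best_len = static_cast<std::size_t>(m.length(0));
            best = t.name;
        }
    }
    return best;
}

// Usage: the same input "if" resolves differently depending on what the
// parser expects, so no global priority between the two terms is needed.
inline void usage_example()
{
    const term kw_if{"kw_if", std::regex("if")};
    const term identifier{"identifier", std::regex("[a-z]+")};

    assert(match_expected("if", {kw_if}) == "kw_if");
    assert(match_expected("if", {identifier}) == "identifier");
    assert(match_expected("42", {identifier}).empty());
}
```

The design choice here is that ambiguity between terms only has to be resolved within the set expected at one point in the grammar, which is usually much smaller than the full token set.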

95833 avatar Oct 13 '22 16:10 95833