latexify_py icon indicating copy to clipboard operation
latexify_py copied to clipboard

Generates algorithmic environment from algpseudocode

Open ZibingZhang opened this issue 3 years ago • 14 comments

Overview

Generates algorithmic environment from algpseudocode.

Details

Creates a new class algorithm_codegen that should generate $\LaTeX$ code in the following form:

Example

    def collatz(n):
        iterations = 0
        while n > 1:
            if n % 2 == 0:
                n = n / 2
            else:
                n = 3 * n + 1
            iterations = iterations + 1
        return iterations
image
\begin{algorithmic} ... \end{algorithmic}

Things to take into consideration:

  • The configs won't be the same for this and function_codegen, at least for the time being. Namely use_signature will be disregarded if set.
  • For the time being, this functionality is exposed through get_latex(style=Style.ALGORITHM, but there should be a way to use it as @latexify.algorithm. But what should the return type be? str? LatexifiedFunction?
  • There are multiple algorithmic environments, how should be specify which ones to target? Which ones should be supported?

References

Described by this issue: https://github.com/google/latexify_py/issues/57

Blocked by

None

ZibingZhang avatar Dec 06 '22 20:12 ZibingZhang

@odashi let me know what you think about this approach. Looking for a bit of early feedback before continuing.

ZibingZhang avatar Dec 06 '22 22:12 ZibingZhang

@ZibingZhang Could you open the pull request instead of leaving it as a draft? I'm using PR's status to manage my taskboard and sometimes draft PRs are overlooked.

odashi avatar Dec 07 '22 04:12 odashi

Thanks! I will take a look at the code later today. Some comments about the description:

else n % 2 == 1:

This may be a typo.

@latexify.algorithm

If we provide this function, it must provide a way to correctly print the expression onto Jupyter. Since algorithm environmets are not supported by Jupyter (MathJax), we need to simulate it by ourselves.

This means that the algorithm version of "LatexifiedFunction" needs two codegens for __str__ and _repr_latex_, but this is far beyond the scope of this PR IMO.

Which environment

algorithmic is fine I think. It is okay to start considering to support other envs when they are actually requested.

At this point, I guessed the enum value should be ALGORITHMIC rather than ALGORITHM to avoid confusion in the future.

odashi avatar Dec 07 '22 05:12 odashi

used internally only when we need to delegate the codegen logic to it

I'm not sure if this is possible, due to the recursive nature of ASTs. If you look at the first commit I had something like

def visit_SomeNode(node):
    return self._function_codegen.visit(node)

But something of this nature won't work since recursive calls will end up in the FunctionCodegen methods instead of back in the AlgorithmCodegen methods.

I think to avoid the weird is, relationship, they both need to inherit from some private base class which implements the codegen of basic expressions. I'm not sure if we can use composition here, at least I'm not able to think of a way to get it working.

ZibingZhang avatar Dec 07 '22 06:12 ZibingZhang

I'm not sure if this is possible

It should work as far as the subtree processing is completely delegated. Since both NodeVisitor and our FunctionCodegen are stateless and don't change the given AST, we can invoke visit as a usual function.

Unfortunately there is no private inheritance in Python, and many techniques available in other languages (such as C++) can't be used.

odashi avatar Dec 07 '22 07:12 odashi

It should work as far as the subtree processing is completely delegated. Since both NodeVisitor and our FunctionCodegen are stateless and don't change the given AST, we can invoke visit as a usual function.

Consider visit_Attribute:

class ACodegen:
    def visit_Attribute(self, node: ast.Attribute) -> str:
        vstr = self.visit(node.value)
        astr = self._identifier_converter.convert(node.attr)[0]
        return vstr + "." + astr
    
    def visit_If(self, node: ast.If) -> str:
        raise Error("Unsupported")

class BCodegen:
    def visit_Attribute(self, node: ast.Attribute) -> str:
        return self._a_codegen.visit(node)

    def visit_If(self, node: ast.If) -> str:
        return "..."

If we have some structure like:

{ "attribute": { "value": { "if": "..." } }

The call stack would look like

BCodegen.visit_Attribute 
-> ACodegen.visit_Attribute
-> ACodegen.visit_If
-> Error

whereas the desired behavior would be

BCodegen.visit_Attribute 
-> ACodegen.visit_Attribute
-> BCodegen.visit_If

Basically any visit_Node function with a self.visit wouldn't work properly when delegated to because self refers to the delegated class.

edit: I guess this could be solved like so?

class BCodegen:
    def visit_Attribute(self, node: ast.Attribute) -> str:
        return ACodegen.visit_Attribute(self, node)

but at that point you may as well just use inheritance, since you're relying on ACodegen and BCodegen having the same type, and instance / class variables.

ZibingZhang avatar Dec 07 '22 08:12 ZibingZhang

Before continuing discussion, we need to understand how the Python AST is constructed. Overall, all ASTs look like as follows:

[mod subtree] -> [stmt subtree] -> [expr subtree]

where stmt subtree never generate mod and expr never generate stmt. (the official guidance)

In FunctionCodegen and AlgorithmCodegen, we need to handle stmt and expr to generate the final LaTeX, but once the codegen walks into expr, the subroutine processes only expr and it never meet any stmt branch in this recursion. That means we can completely separate expr processing into another library, and this is why I proposed ExpressionCodegen in #57.

In this PR, AlgorithmCodegen need to implement every visitor for stmt nodes by itself, but can delegate every processing for expr to another. Since the current codebase implements every rule in FunctionCodegen, we can simply delegate everything to it, and we don't need additional information to invoke expr visitors in FunctionCodegen.

odashi avatar Dec 07 '22 09:12 odashi

The contrast between stmt and expr is important I guess, because codegen rules to control the overall style (the appearance of the overall algorithm) is basically implemented by only stmt rules.

odashi avatar Dec 07 '22 09:12 odashi

That makes sense ty for explaining!

The only issue I can maybe see down the line if AlgorithmCodegen decides that an expression should take one form but FunctionCodegen decides it should take another, but I can't think of any examples off the top of my head, so maybe it's a non-issue.

ZibingZhang avatar Dec 07 '22 09:12 ZibingZhang

visit_comprehension has to be in ExpressionCodegen for some reason (didn't really look into it), but it's an ast.stmt.

Do you think it's possible to merge this early somehow (i.e. without completing AlgorithmicCodegen). Either by

  1. merging as is (following review & edits) and following up with a PR to expand upon the algorithmic codegen while adding tests
  2. splitting this PR into 2, first part just factoring out ExpressionCodegen, next part the AlgorithmicCodegen implementation

Just worried about trying to deal with future merge conflicts, since this change is changing a fundamental part of the pipeline, and is touching a bunch of files.

ZibingZhang avatar Dec 07 '22 15:12 ZibingZhang

but it's an ast.stmt

I think it's AST, not stmt. This is because the class is used to aggregate a set of rules (in this case, a sub-expression of a comprehension in some other expr), and appears only within a limited context. This kind of subtree can't be treated as an expr in the AST, e.g. a + <comprehension> is illegal.

We should be able to treat this kind of subtrees safely as a part of either stmt or expr because they shouldn't violate the entire rule of the AST: stmt -> expr. In this case it is okay to assume that this is expr.

odashi avatar Dec 08 '22 00:12 odashi

ExpressionCodegen should be separated into other pull request since this is an independent feature.

odashi avatar Dec 08 '22 00:12 odashi

ExpressionCodegen should be separated into other pull request since this is an independent feature.

Sounds good. Will tackle this within the next few days!

ZibingZhang avatar Dec 08 '22 00:12 ZibingZhang

Thanks! It looks your repository breaks the blame history when splitting FunctionCodegen. Since it makes debugging hard, make sure that the history is preserved appropriately: https://stackoverflow.com/questions/3887736/keep-git-history-when-splitting-a-file

odashi avatar Dec 08 '22 00:12 odashi