Generates algorithmic environment from algpseudocode
Overview
Generates algorithmic environment from algpseudocode.
Details
Creates a new class algorithm_codegen that should generate $\LaTeX$ code in the following form:
Example
def collatz(n):
iterations = 0
while n > 1:
if n % 2 == 0:
n = n / 2
else:
n = 3 * n + 1
iterations = iterations + 1
return iterations
\begin{algorithmic} ... \end{algorithmic}
Things to take into consideration:
- The configs won't be the same for this and
function_codegen, at least for the time being. Namelyuse_signaturewill be disregarded if set. - For the time being, this functionality is exposed through
get_latex(style=Style.ALGORITHM, but there should be a way to use it as@latexify.algorithm. But what should the return type be?str?LatexifiedFunction? - There are multiple algorithmic environments, how should be specify which ones to target? Which ones should be supported?
References
Described by this issue: https://github.com/google/latexify_py/issues/57
Blocked by
None
@odashi let me know what you think about this approach. Looking for a bit of early feedback before continuing.
@ZibingZhang Could you open the pull request instead of leaving it as a draft? I'm using PR's status to manage my taskboard and sometimes draft PRs are overlooked.
Thanks! I will take a look at the code later today. Some comments about the description:
else n % 2 == 1:
This may be a typo.
@latexify.algorithm
If we provide this function, it must provide a way to correctly print the expression onto Jupyter. Since algorithm environmets are not supported by Jupyter (MathJax), we need to simulate it by ourselves.
This means that the algorithm version of "LatexifiedFunction" needs two codegens for __str__ and _repr_latex_, but this is far beyond the scope of this PR IMO.
Which environment
algorithmic is fine I think. It is okay to start considering to support other envs when they are actually requested.
At this point, I guessed the enum value should be ALGORITHMIC rather than ALGORITHM to avoid confusion in the future.
used internally only when we need to delegate the codegen logic to it
I'm not sure if this is possible, due to the recursive nature of ASTs. If you look at the first commit I had something like
def visit_SomeNode(node):
return self._function_codegen.visit(node)
But something of this nature won't work since recursive calls will end up in the FunctionCodegen methods instead of back in the AlgorithmCodegen methods.
I think to avoid the weird is, relationship, they both need to inherit from some private base class which implements the codegen of basic expressions. I'm not sure if we can use composition here, at least I'm not able to think of a way to get it working.
I'm not sure if this is possible
It should work as far as the subtree processing is completely delegated. Since both NodeVisitor and our FunctionCodegen are stateless and don't change the given AST, we can invoke visit as a usual function.
Unfortunately there is no private inheritance in Python, and many techniques available in other languages (such as C++) can't be used.
It should work as far as the subtree processing is completely delegated. Since both
NodeVisitorand ourFunctionCodegenare stateless and don't change the given AST, we can invokevisitas a usual function.
Consider visit_Attribute:
class ACodegen:
def visit_Attribute(self, node: ast.Attribute) -> str:
vstr = self.visit(node.value)
astr = self._identifier_converter.convert(node.attr)[0]
return vstr + "." + astr
def visit_If(self, node: ast.If) -> str:
raise Error("Unsupported")
class BCodegen:
def visit_Attribute(self, node: ast.Attribute) -> str:
return self._a_codegen.visit(node)
def visit_If(self, node: ast.If) -> str:
return "..."
If we have some structure like:
{ "attribute": { "value": { "if": "..." } }
The call stack would look like
BCodegen.visit_Attribute
-> ACodegen.visit_Attribute
-> ACodegen.visit_If
-> Error
whereas the desired behavior would be
BCodegen.visit_Attribute
-> ACodegen.visit_Attribute
-> BCodegen.visit_If
Basically any visit_Node function with a self.visit wouldn't work properly when delegated to because self refers to the delegated class.
edit: I guess this could be solved like so?
class BCodegen:
def visit_Attribute(self, node: ast.Attribute) -> str:
return ACodegen.visit_Attribute(self, node)
but at that point you may as well just use inheritance, since you're relying on ACodegen and BCodegen having the same type, and instance / class variables.
Before continuing discussion, we need to understand how the Python AST is constructed. Overall, all ASTs look like as follows:
[mod subtree] -> [stmt subtree] -> [expr subtree]
where stmt subtree never generate mod and expr never generate stmt. (the official guidance)
In FunctionCodegen and AlgorithmCodegen, we need to handle stmt and expr to generate the final LaTeX, but once the codegen walks into expr, the subroutine processes only expr and it never meet any stmt branch in this recursion. That means we can completely separate expr processing into another library, and this is why I proposed ExpressionCodegen in #57.
In this PR, AlgorithmCodegen need to implement every visitor for stmt nodes by itself, but can delegate every processing for expr to another. Since the current codebase implements every rule in FunctionCodegen, we can simply delegate everything to it, and we don't need additional information to invoke expr visitors in FunctionCodegen.
The contrast between stmt and expr is important I guess, because codegen rules to control the overall style (the appearance of the overall algorithm) is basically implemented by only stmt rules.
That makes sense ty for explaining!
The only issue I can maybe see down the line if AlgorithmCodegen decides that an expression should take one form but FunctionCodegen decides it should take another, but I can't think of any examples off the top of my head, so maybe it's a non-issue.
visit_comprehension has to be in ExpressionCodegen for some reason (didn't really look into it), but it's an ast.stmt.
Do you think it's possible to merge this early somehow (i.e. without completing AlgorithmicCodegen). Either by
- merging as is (following review & edits) and following up with a PR to expand upon the algorithmic codegen while adding tests
- splitting this PR into 2, first part just factoring out
ExpressionCodegen, next part theAlgorithmicCodegenimplementation
Just worried about trying to deal with future merge conflicts, since this change is changing a fundamental part of the pipeline, and is touching a bunch of files.
but it's an ast.stmt
I think it's AST, not stmt. This is because the class is used to aggregate a set of rules (in this case, a sub-expression of a comprehension in some other expr), and appears only within a limited context. This kind of subtree can't be treated as an expr in the AST, e.g. a + <comprehension> is illegal.
We should be able to treat this kind of subtrees safely as a part of either stmt or expr because they shouldn't violate the entire rule of the AST: stmt -> expr. In this case it is okay to assume that this is expr.
ExpressionCodegen should be separated into other pull request since this is an independent feature.
ExpressionCodegen should be separated into other pull request since this is an independent feature.
Sounds good. Will tackle this within the next few days!
Thanks! It looks your repository breaks the blame history when splitting FunctionCodegen. Since it makes debugging hard, make sure that the history is preserved appropriately: https://stackoverflow.com/questions/3887736/keep-git-history-when-splitting-a-file