MatchIfTrue lambdas used in decorator matchers in VisitorBasedCodemodCommand hit pickle errors on large codebases
In a subclass of VisitorBasedCodemodCommand, I have a @matchers.visit decorator whose matcher uses a MatchIfTrue lambda. It throws a lambda pickling error when the codemod is run on a sufficiently large codebase. Running it on one file at a time is fine, because the multiprocessing work doesn't kick in.
This probably deserves a callout in the MatchIfTrue docs recommending named functions instead of lambdas. I'm not sure what other solutions are available.
The command in question:
import argparse

import libcst as cst
import libcst.matchers as m
from libcst.codemod import VisitorBasedCodemodCommand


def func_def_has_request_param_matcher() -> m.FunctionDef:
    # Matches any function definition with a parameter named "request".
    return m.FunctionDef(
        params=m.Parameters(
            params=m.MatchIfTrue(
                # This lambda is what fails to pickle under multiprocessing.
                lambda params: any(p.name.value == "request" for p in params)
            ),
        )
    )


class ReproCommand(VisitorBasedCodemodCommand):
    DESCRIPTION = (
        "Do cool thing."
    )

    @staticmethod
    def add_args(_arg_parser: argparse.ArgumentParser) -> None:
        return

    @m.visit(func_def_has_request_param_matcher())
    def _mark_func_def_includes_request_param(self, node: cst.FunctionDef) -> None:
        self.context.scratch["cool-scratch"] = node
The error in question:
$ python -m libcst.tool codemod commands.repro.ReproCommand src
Calculating full-repo metadata...
Executing codemod...
Traceback (most recent call last):
File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 839, in
Inlining the matcher creation gets a slightly different error:
$ python -m libcst.tool codemod commands.repro.ReproCommand src
Calculating full-repo metadata...
Executing codemod...
Traceback (most recent call last):
File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/jeff.hodges/.asdf/installs/python/3.8.13/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/jeff.hodges/src/github.com/color/color/local/virtualenv3/lib/python3.8/site-packages/libcst/tool.py", line 839, in
(Tangentially, if there's a better way to do this "match if any Node in this Sequence matches this pattern", I'm down to be told what that is)
Hmm, I suppose the entire visitor/codemod needs to be picklable when it runs against multiple files, but the matchers that accept callbacks definitely encourage lambdas, so I'd support a PR to add a callout in their docs. The two problematic matchers are MatchIfTrue and MatchMetadataIfTrue.
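For anyone landing here later, the underlying limitation can be reproduced with the standard library alone, no libcst involved. This is only a minimal sketch: request_in_names and can_pickle are hypothetical helpers made up for the illustration, and the predicate works on plain strings rather than cst.Param nodes.

```python
import pickle


def request_in_names(names):
    # A module-level function is pickled by reference (module + qualified name),
    # so a worker process can look it up again by importing the module.
    return any(name == "request" for name in names)


def can_pickle(obj):
    # Report whether the standard pickle module can serialize obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


named_ok = can_pickle(request_in_names)
# A lambda has no importable name ("<lambda>"), so the by-reference lookup fails.
lambda_ok = can_pickle(lambda names: any(n == "request" for n in names))
print(named_ok, lambda_ok)
```

The same reasoning would apply to any callable stored on a matcher that has to cross a process boundary.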
(Tangentially, if there's a better way to do this "match if any Node in this Sequence matches this pattern", I'm down to be told what that is)
I think your solution is the cleanest way to express this, but occasionally I find myself writing
m.Parameters(
    params=[m.ZeroOrMore(), m.Param(name=m.Name(value="request")), m.ZeroOrMore()],
)
Honestly, I much prefer your ZeroOrMore version. The lambdas get uglier when you want to match more than a simple string, involving calls out to matchers.matches and such.
The pre+post ZeroOrMore() for matching sequences in that fashion is a real nugget - it would be worth highlighting this recipe in the docs unless it's already there somewhere and I missed it!
MatchIfTrue will work if you create a regular named function and use that instead of a lambda.
Something like this:
def request_in_params(params):
    return any(p.name.value == "request" for p in params)

...

m.MatchIfTrue(request_in_params)
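If the predicate needs a parameter (say, the name to look for), one option that should stay picklable is functools.partial over a module-level function, instead of a closure or lambda. A rough sketch under assumptions: any_param_named is a hypothetical helper, and it takes plain strings here so the snippet runs without libcst. In a real matcher the callable would receive the sequence of Param nodes and compare p.name.value.

```python
import pickle
from functools import partial


def any_param_named(target, names):
    # Hypothetical module-level predicate; "names" stands in for the parameter list.
    return any(name == target for name in names)


# partial objects pickle as long as the wrapped function and bound arguments do.
has_request = partial(any_param_named, "request")

# Round-trip through pickle, roughly what happens when work is sent to a worker process.
restored = pickle.loads(pickle.dumps(has_request))
print(restored(["self", "request"]), restored(["self", "other"]))
```

Passing has_request where the lambda used to go would be the analogous change in the codemod.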
Also, you could pass jobs=1 so that pickle isn't used for multiprocessing (but it will be slower).