lean simp only is quadratic in the size of the output term

Prerequisites

[X] Put an X between the brackets on this line if you have done all of the following:
- Checked that your issue isn't already filed.
  - Specifically, check out the wishlist, open RFCs, or feature requests.
- Reduced the issue to a self-contained, reproducible test case.

Description

universes u v w ℓ

axiom f : ℕ → ℕ
axiom g : ℕ → ℕ

@[simp]
def comp_pow {A : Type u} (f : A → A) : ℕ → A → A
| 0 x := x
| (nat.succ n) x := f (comp_pow n x)

@[simp] def goal100 : Σ' P, P = (∀ x, comp_pow f 100 x = comp_pow g 100 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal200 : Σ' P, P = (∀ x, comp_pow f 200 x = comp_pow g 200 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal300 : Σ' P, P = (∀ x, comp_pow f 300 x = comp_pow g 300 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal400 : Σ' P, P = (∀ x, comp_pow f 400 x = comp_pow g 400 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal500 : Σ' P, P = (∀ x, comp_pow f 500 x = comp_pow g 500 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal600 : Σ' P, P = (∀ x, comp_pow f 600 x = comp_pow g 600 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal700 : Σ' P, P = (∀ x, comp_pow f 700 x = comp_pow g 700 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal800 : Σ' P, P = (∀ x, comp_pow f 800 x = comp_pow g 800 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal900 : Σ' P, P = (∀ x, comp_pow f 900 x = comp_pow g 900 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

@[simp] def goal1000 : Σ' P, P = (∀ x, comp_pow f 1000 x = comp_pow g 1000 x) :=
by { eapply psigma.mk, simp only [comp_pow] }

running lean --profile (Lean (version 3.4.2, commit cbd2b6686ddb, Release)) on this gives the times

n	time (s)
100	0.033
200	0.055
300	0.081
400	0.115
500	0.153
600	0.193
700	0.251
800	0.356
900	0.358
1000	0.46

which is quadratic:

time of simp only vs size (1)

Steps to Reproduce

See above

Expected behavior: Linear performance of reduction

Actual behavior: Quadratic performance of reduction

Reproduces how often: 100% of the time

Versions

Lean (version 3.4.2, commit cbd2b6686ddb, Release)

$ uname -a
Linux jgross-Leopard-WS 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

Additional Information

The profile for the biggest example is

parsing took 3.56ms
elaboration: tactic compilation took 3.17ms
elaboration: tactic execution took 395ms
num. allocated objects:  220
num. allocated closures: 214
  394ms    99.7%   tactic.interactive.simp_core
  394ms    99.7%   _interaction._lambda_2
  394ms    99.7%   tactic.istep
  394ms    99.7%   scope_trace
  394ms    99.7%   tactic.istep._lambda_1
  394ms    99.7%   tactic.interactive.propagate_tags
  394ms    99.7%   tactic.step
  394ms    99.7%   tactic.solve1
  252ms    63.8%   tactic.interactive.simp_core_aux
  251ms    63.5%   interaction_monad_orelse
  251ms    63.5%   tactic.interactive.simp_core_aux._lambda_5
  251ms    63.5%   tactic.simp_target
  228ms    57.7%   tactic.simplify
  142ms    35.9%   tactic.try_core
  142ms    35.9%   tactic.apply_core
  142ms    35.9%   _private.1416197847.relation_tactic._lambda_1
  142ms    35.9%   tactic.try
  142ms    35.9%   relation_tactic
   23ms     5.8%   tactic.replace_target
   12ms     3.0%   tactic.mk_eq_mpr
   10ms     2.5%   tactic.exact
    1ms     0.3%   tactic.mk_app
    1ms     0.3%   tactic.mk_id_eq
elaboration of goal1000 took 460ms
type checking of goal1000 took 1.3ms
compilation of goal1000 took 1.67ms
decl post-processing of goal1000 took 3.76ms

Jul 21 '20 20:07 JasonGross

I don't think this should be considered a bug. The term is traversed completely at every pass of simp and every pass meets only one opportunity for rewrite. I pointed out one way in https://github.com/leanprover-community/mathlib/issues/3500#issuecomment-662087225 to reduce the number of passes. Another possibility is to increase the number of opportunity to rewrite at each pass with lemmas of this shape

lemma comp_pow_bit0 : comp_pow f (bit0 n) x = comp_pow f n (comp_pow f n x) := ...

The number of passes should be logarithmic and the amount of work should be roughly linear. That's no my favorite solution for this specific problem but it's something worth considering when speeding up your rewrites.

Jul 22 '20 13:07 cipher1024

simp recurses a linear number of times on these examples, and the time within the simp tactic seems linear even out to 4000:

[4000]   1628ms    46.2%   tactic.simplify
[2000]    830ms    59.6%   tactic.simplify

It is the apply following simp that blows up:

[4000]  516ms    37.0%   tactic.apply_core
[2000]  114ms    27.5%   tactic.apply_core

Here apply is called by relation_tactic following simplification. It successfully applies eq.refl, but in the process the proof term (which has quadratic tree size) will be traversed, and without sufficiently precise caching to avoid scaling in the tree size.

Note: you need to bump https://github.com/leanprover-community/lean/blob/master/src/library/abstract_context_cache.cpp#L19 for the larger examples.

Aug 01 '20 21:08 dselsam