spox icon indicating copy to clipboard operation
spox copied to clipboard

Naming algorithm has quadratic time complexity

Open cbourjau opened this issue 1 year ago • 2 comments

The current naming algorithm is defined as:

  def enum(self, base: str, suffix: str = "_{}") -> str:
      """Find an unused name by enumerating the pattern ``base + suffix.format(i)`` through `i = 0, 1, ...`"""
      i = 0
      while (name := f"{base}{suffix.format(i)}") in self:
          i += 1
      return name

This function becomes very expensive in (contrived) graphs where the same base occurs very often (1000+ times). We may consider simply having a single index across all nodes and edges.

cbourjau avatar Feb 28 '24 15:02 cbourjau

Ah yes, the original (ah, so very old by now!) build code did take care of this. IIRC, it used to do something like this:

key = (base, suffix)
i = self.memo_enum.setdefault(key, 0)
while (name := f"{base}{suffix.format(i)}") in self:
    i += 1
self.memo_enum[key] = i
return name

To be honest, there's no real use for different suffixes I think, but the above should be robust to them. This doesn't increase space complexity as we are already keeping track of all the names.

But maybe this isn't the simplest. Here's some other solutions:

  • Use the common index, like you say. That's probably fine.
  • Get rid of base altogether (depends if you think the improved generated code 'quality' is worth it)
  • Get rid of the custom suffixes so finding the next free index isn't as weird. Technically with bad suffixes it's possible for this code to loop forever...

jbachurski avatar Feb 28 '24 18:02 jbachurski

I think your code snippet is perfectly fine! Thanks!

cbourjau avatar Feb 29 '24 09:02 cbourjau