libsemigroups icon indicating copy to clipboard operation
libsemigroups copied to clipboard

Add AhoCorasick

Open Joseph-Edwards opened this issue 1 year ago • 4 comments

This PR includes the AhoCorasick class in a state ready for V3. In particular:

  • [X] split into hpp/tpp
  • [X] use of no_checks functions
  • [X] TODOs and FIXMEs reviewed
  • [X] Rule of 5
  • [X] asan/usan/tsan
  • [ ] doc
  • [ ] Python bindings
  • [ ] iwyu

Joseph-Edwards avatar May 21 '24 15:05 Joseph-Edwards

Presently, there are a few things that I'm not sure about when writing the doc. Specifically:

  • Which functions need \complexity
  • Should descriptions be "This function calculates ..." or "Calculate ..."? Is this true for both brief descriptions and long descriptions?
  • How do you document a "checks" version of a function, assuming that the no_checks version has documentation already?
  • When should you use \c vs backticks vs \ref?
  • Should the return description always be "A value of type X"?

Any insight would be much appreciated!

Joseph-Edwards avatar May 21 '24 15:05 Joseph-Edwards

Correction will be happy to merge when the CI passes :)

james-d-mitchell avatar May 21 '24 15:05 james-d-mitchell

Thanks for the review @james-d-mitchell. Will finish up the docs, do the Python bindings then un-draft-ify the PR.

Joseph-Edwards avatar May 21 '24 16:05 Joseph-Edwards

  • Which functions need \complexity

Any where you can more or less straightforwardly determine what the complexity is.

  • Should descriptions be "This function calculates ..." or "Calculate ..."? Is this true for both brief descriptions and long descriptions?

I find that often in the brief docs it's too long to write "This function calculates..." or equivalent, so often just remove it. In the detailed description, it's probably better to write something like "This function ...".

  • How do you document a "checks" version of a function, assuming that the no_checks version has documentation already?

By describing when it throws, or maybe more accurately by describing when libsemigroups throws (through LIBSEMIGROUPS_EXCEPTION).

  • When should you use \c vs backticks vs \ref?

\ref is for reference only, doxygen claims to automatically run word that has an underscore, "::", or a capital letter through its cross referencing, so \ref shouldn't be necessary unless you're trying to link to a single word like size. If doxygen complains that it can't find the \ref, then sometimes this is cured by giving the fully qualified name instead libsemigroups::AhoCorasick::size for example.

IIRC \c only works on the next word, so if you want to put something that has a space in it in code (like x == 0), then you have to use back ticks.

  • Should the return description always be "A value of type X"?

I've written that in lots of places, because the detail description often say what is returned already, and it's a little less repetitive. In the python bindings these returns statements are useless, so maybe it's better to just briefly repeat what is returned the rank of the transformation, or something.

Hope this helps!

james-d-mitchell avatar May 21 '24 16:05 james-d-mitchell