rust icon indicating copy to clipboard operation
rust copied to clipboard

Add a fourth approach to the rust version of the Bob problem

Open ArturAssisComp opened this issue 10 months ago • 10 comments

The fourth approach added to the benchmark is the finite state machine approach. It is faster because it loops through the message chars only once.

ArturAssisComp avatar Mar 09 '25 20:03 ArturAssisComp

If this approach is approved, I can add more details to the approaches folder.

ArturAssisComp avatar Mar 09 '25 21:03 ArturAssisComp

I like this solution, it seems like it's interesting and creative. I would be interested to see a full approach with it :+1:

I would suggest that for the approach you write you should imagine your audience is not familiar with state machines, since Bob is an easy-level exercise. In your shoes I would try to find a good beginner-friendly outside resource that explains state machines as a "Learn more" type of link.

ellnix avatar Mar 09 '25 21:03 ellnix

This doesn't make any sense at all. I had a fifth idea and added that to the benchmarks, which significantly slowed down the results of the other functions. I think the builtin bencher might just be trash, no wonder its unstable. We could try to setup divan or criterion, those should give more reliable results. But not today for me 🙂

senekor avatar Mar 09 '25 23:03 senekor

Ok, I will try criterion for best results. If you can add your approach here for me to test it.

ArturAssisComp avatar Mar 11 '25 15:03 ArturAssisComp

Ok, I will fix the requests.

ArturAssisComp avatar Mar 16 '25 00:03 ArturAssisComp

I used Criterion and found what was happening. Basically, the approaches that use message == message.to_uppercase() take benefit from the following optimization in some cases: image

Initially, I found it very strange that the approaches that transformed all the message to uppercase (creating a new String) and then compared them using == were faster than an approach that loops through all the chars once and does not perform very heavy operations on each iteration. But I forgot that the compiler and the std lib do a good job on optimizations and that iterating over chars is not so cheap as iterating over the bytes of the string.

One of the optimizations (the one on the image above) makes the operation to_uppercase much faster on pure ascii strings, that is why the state machine approach is slower in its worst cases (when it needs to loop through all the string to find the answer). For tests that have non-ascii chars, it tends to perform better.

Maybe it is possible to optimize the state machine approach to deal with ascii chars, but I did not have time to try.

Results

  • For YELL and YELL QUESTION: one of the state machine's worst case because it needs to loop through all the string to find the answer. Thus, it performed worse than the other approaches for lore ipsum with only ascii chars but, surprisingly, performed better for lore ipsum with ascii and non-ascii chars. The explanation is that the other approaches could not take advantage of the ascii optimization.
  • For normal question and whatever: state machine performed better, but that happened because the lorem ipsum has lower case chars distributed along the string, including the end of the string, which makes it almost O(1) for finding the answer. It is possible to build a worst case situation for the state machine in which all the chars are uppercase except the first, making it necessary to loop through all the chars. In that situation, we would have something close to the YELL and YELL QUESTION.
  • For the empty answer: For that specific situation, all the approaches are optimized, but the state-machine performed a little bit worse. That was because it used trim instead of trim_end, generating a little bit more computation.

ArturAssisComp avatar Mar 16 '25 00:03 ArturAssisComp

I updated the content.md for the benchmark. I found that maybe this exercise is not the best for introducing the concept of state machines, thus, I decided to keep the state machine approach only in the benchmark. Maybe an exercise for building a tokenizer or something like that would be better to introduce this concept.

ArturAssisComp avatar Apr 28 '25 19:04 ArturAssisComp

Looks like you committed the output directory of criterion, please clean up the diff.

senekor avatar Apr 28 '25 21:04 senekor

yes, the idea was to provide the report (the output of criterion), given that it takes a lot of time to reproduce the result. But I will clean it.

ArturAssisComp avatar Apr 28 '25 21:04 ArturAssisComp

Results cleaned and content updated.

ArturAssisComp avatar Apr 28 '25 22:04 ArturAssisComp