Add a fourth approach to the Rust version of the Bob problem
The fourth approach added to the benchmark is the finite state machine approach. It is faster because it loops through the message chars only once.
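To make the idea concrete, here is a minimal single-pass sketch of what a state-machine-style solution to Bob can look like. This is my own illustration, not necessarily the code in the PR: the state is just a few flags updated while scanning the chars once (the function name `reply` and the flag names are mine).

```rust
// Single-pass sketch: track the minimal state needed to answer Bob
// (saw a letter, saw a lowercase letter, last non-whitespace char),
// then decide the response from that state.
pub fn reply(message: &str) -> &str {
    let mut has_letter = false;    // at least one alphabetic char seen
    let mut has_lowercase = false; // a lowercase letter rules out a yell
    let mut last = None;           // last non-whitespace char seen

    for c in message.chars() {
        if c.is_alphabetic() {
            has_letter = true;
            if c.is_lowercase() {
                has_lowercase = true;
            }
        }
        if !c.is_whitespace() {
            last = Some(c);
        }
    }

    let yelling = has_letter && !has_lowercase;
    let question = last == Some('?');
    match (last, yelling, question) {
        (None, _, _) => "Fine. Be that way.",
        (_, true, true) => "Calm down, I know what I'm doing!",
        (_, true, false) => "Whoa, chill out!",
        (_, false, true) => "Sure.",
        _ => "Whatever.",
    }
}
```

Note that, unlike the `message == message.to_uppercase()` approaches, this never allocates a new `String`.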
If this approach is approved, I can add more details to the approaches folder.
I like this solution; it seems interesting and creative. I would be interested to see a full approach for it :+1:
I would suggest that, for the approach write-up, you imagine your audience is not familiar with state machines, since Bob is an easy-level exercise. In your shoes, I would try to find a good beginner-friendly outside resource that explains state machines and add it as a "Learn more" type of link.
This doesn't make any sense at all. I had a fifth idea and added that to the benchmarks, which significantly slowed down the results of the other functions. I think the builtin bencher might just be trash, no wonder it's unstable. We could try to set up divan or criterion, which should give more reliable results. But not today for me 🙂
Ok, I will try Criterion for best results. Could you add your approach here so I can test it?
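For reference, switching to Criterion only needs a small harness. This is a generic setup sketch, not the PR's actual bench file: the crate name `bob`, the function `reply`, and the input string are placeholders.

```rust
// benches/bob.rs — Criterion harness sketch. Assumes in Cargo.toml:
//
//   [dev-dependencies]
//   criterion = "0.5"
//
//   [[bench]]
//   name = "bob"
//   harness = false
//
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_reply(c: &mut Criterion) {
    // black_box keeps the optimizer from constant-folding the input away.
    let yelled = "LOREM IPSUM DOLOR SIT AMET!";
    c.bench_function("reply/yell", |b| b.iter(|| bob::reply(black_box(yelled))));
}

criterion_group!(benches, bench_reply);
criterion_main!(benches);
```

Run with `cargo bench`; Criterion writes its HTML report under `target/criterion/`.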
Ok, I will fix the requests.
I used Criterion and found what was happening. Basically, the approaches that use `message == message.to_uppercase()` benefit from the following optimization in some cases:
Initially, I found it very strange that the approaches that transform the whole message to uppercase (creating a new `String`) and then compare with `==` were faster than an approach that loops through all the chars once without doing heavy work on each iteration. But I forgot that the compiler and the std lib do a good job on optimizations, and that iterating over chars is not as cheap as iterating over the bytes of the string.
One of the optimizations (the one in the image above) makes `to_uppercase` much faster on pure-ASCII strings, which is why the state machine approach is slower in its worst cases (when it needs to loop through the whole string to find the answer). For tests that have non-ASCII chars, it tends to perform better.
Maybe it is possible to optimize the state machine approach to handle ASCII strings specially, but I did not have time to try.
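One possible shape for that optimization, sketched under my own naming (this `is_yell` helper is hypothetical, not code from the PR): branch once on `str::is_ascii`, and on the fast path scan raw bytes instead of decoding UTF-8 chars.

```rust
// Hypothetical ASCII specialization of the yell check: pure-ASCII input can
// be scanned byte-by-byte, skipping UTF-8 decoding entirely.
fn is_yell(message: &str) -> bool {
    if message.is_ascii() {
        // Fast path: byte-wise scan.
        let bytes = message.as_bytes();
        bytes.iter().any(u8::is_ascii_alphabetic)
            && !bytes.iter().any(u8::is_ascii_lowercase)
    } else {
        // General path: char-wise scan for non-ASCII input.
        message.chars().any(char::is_alphabetic)
            && !message.chars().any(char::is_lowercase)
    }
}
```

The same branch-on-`is_ascii` idea could be applied inside the single-pass loop rather than in a separate helper.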
Results
- For YELL and YELL QUESTION: one of the state machine's worst cases, because it needs to loop through the whole string to find the answer. Thus, it performed worse than the other approaches for lorem ipsum with only ASCII chars but, surprisingly, performed better for lorem ipsum with both ASCII and non-ASCII chars. The explanation is that the other approaches could not take advantage of the ASCII optimization there.
- For normal question and whatever: the state machine performed better, but that happened because the lorem ipsum has lowercase chars distributed along the string, including at the end, which makes finding the answer almost O(1). It is possible to build a worst case for the state machine in which all the chars are uppercase except the first, making it necessary to loop through all the chars. In that situation, we would see something close to the YELL and YELL QUESTION results.
- For the empty answer: all the approaches are optimized for that specific situation, but the state machine performed a little bit worse. That was because it used `trim` instead of `trim_end`, generating a little bit more computation.
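To illustrate the `trim` vs. `trim_end` point with hypothetical helpers (the names `is_silence` and `is_question` are mine): stripping only trailing whitespace is enough both to detect silence and to find a trailing `?`, so the extra scan over leading whitespace that `trim` performs is wasted work for Bob.

```rust
// trim_end() only strips trailing whitespace, which is all Bob needs:
// a whitespace-only message still becomes empty, and the '?' check only
// looks at the end of the string anyway.
fn is_silence(message: &str) -> bool {
    message.trim_end().is_empty()
}

fn is_question(message: &str) -> bool {
    message.trim_end().ends_with('?')
}
```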
I updated the content.md for the benchmark. I found that this exercise is maybe not the best one for introducing the concept of state machines, so I decided to keep the state machine approach only in the benchmark. An exercise about building a tokenizer or something similar would be a better fit for introducing the concept.
Looks like you committed the output directory of criterion, please clean up the diff.
Yes, the idea was to provide the report (the output of Criterion), given that it takes a lot of time to reproduce the results. But I will clean it up.
Results cleaned and content updated.