Profile the tortoise.cpp GPT-2 module and the tortoise-tts GPT-2 module, and try to improve the tortoise.cpp module's performance
This task is blocked on the GPT-2 forward pass test being added (https://github.com/balisujohn/tortoise.cpp/issues/5), since performance changes could introduce regressions.
The task is as follows:
measure the runtime of the tortoise.cpp autoregressive model all the way from inputs to the full sequence of tokens and last-layer latents being generated (as checked by the test in https://github.com/balisujohn/tortoise.cpp/issues/5), and measure the time tortoise-tts takes to produce the corresponding batch of 4 token sequences and final-layer latents.
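One low-friction way to get the tortoise.cpp number is to wrap the autoregressive call in a wall-clock timer. The sketch below is standalone and not tortoise.cpp code: the `run_autoregressive` lambda is a placeholder for whatever function in tortoise.cpp actually drives the pass from inputs to tokens and last-layer latents (ggml's `ggml_time_us()` would work just as well if you prefer to stay within its utilities).

```cpp
// Minimal wall-clock timing harness (standalone; not tortoise.cpp code).
// The lambda below is a placeholder for the real tortoise.cpp call that
// runs the autoregressive pass from inputs to tokens + last-layer latents.
#include <chrono>
#include <cstdio>
#include <functional>

// Run `fn` once and return the elapsed wall-clock time in milliseconds.
static double time_ms(const std::function<void()> & fn) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // Placeholder workload; replace with the actual autoregressive forward pass.
    auto run_autoregressive = [] {
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; ++i) { x += i * 0.5; }
    };

    // Time a few repetitions so warm-up effects (weight loading, caches) are visible.
    for (int run = 0; run < 3; ++run) {
        std::printf("run %d: %.2f ms\n", run, time_ms(run_autoregressive));
    }
    return 0;
}
```

For the tortoise-tts side, timing the equivalent batch-of-4 autoregressive call with the same start/stop pattern keeps the two measurements comparable; just make sure both sides have produced the full token sequences and final-layer latents before the timer stops.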
Then, try improving the efficiency of the tortoise.cpp forward pass. Some suggestions are as follows:
- try removing seemingly redundant ops
- change ops to in-place where possible (see the ggml sketch after this list)
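To make the in-place suggestion concrete, here is a standalone ggml example (not tortoise.cpp code) showing `ggml_add_inplace` writing its result into the first operand's buffer instead of allocating a new result tensor in the graph context. The helpers used here (`ggml_new_graph`, `ggml_graph_compute_with_ctx`, `ggml_set_f32`) exist in recent ggml revisions but may differ in the version vendored by tortoise.cpp, so treat this as a sketch. In-place ops are only safe when the overwritten operand is not read again later in the graph.

```cpp
// Standalone illustration of swapping ggml_add for ggml_add_inplace.
// Not tortoise.cpp code; helper names may differ by ggml revision.
#include "ggml.h"
#include <cstdio>

int main() {
    ggml_init_params params = { /*mem_size*/ 16u * 1024 * 1024, /*mem_buffer*/ nullptr, /*no_alloc*/ false };
    ggml_context * ctx = ggml_init(params);

    ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);

    // ggml_add(ctx, a, b) would allocate a fresh result tensor in ctx;
    // the in-place variant writes the result into a's existing buffer.
    ggml_tensor * c = ggml_add_inplace(ctx, a, b);

    ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads*/ 1);

    std::printf("c[0] = %.1f (shares a's buffer: %s)\n",
                ggml_get_f32_1d(c, 0), c->data == a->data ? "yes" : "no");

    ggml_free(ctx);
    return 0;
}
```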
Feel free to ask questions here or in the Discord.