MOUGEL Sébastien
MOUGEL Sébastien
instead of opening a file for each request... maybe you should try to put it into memory ? But : if the file content change... You need to refresh your...
@arwyn the only way to verify these assertions is to perform some benchmarks... Disk Access are VERY VERY costly... There is a factor 1000 in terms of latency and throughput...
Hello @mw66 I was thinking about that today... Forcing the model to predict 2 tokens at a time... But remember that the model generate only one token during the forward...