LLMLingua

Experiments with Alphanumeric Entities

Open • jasonngap1 opened this issue 1 year ago • 3 comments

Hi there! Thank you for the wonderful work; it has greatly reduced the memory overhead and inference time for my use case. However, I noticed that when the context contains alphanumeric entities such as car plate numbers and ID numbers, prompt compression sometimes corrupts them.

Have you conducted any experiments with such entities, or are there any optimisation techniques you can share that mitigate this issue? I am currently using prompt compression with the default parameters.

jasonngap1 • Feb 21 '24 18:02
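Before choosing a mitigation, it can help to measure how often entities are actually corrupted on your own prompts. The following is a minimal, library-agnostic sketch, not part of LLMLingua: it extracts alphanumeric entities from the original and the compressed text and reports which ones did not survive verbatim. The `ENTITY_RE` pattern (a run of 5+ ASCII letters and digits containing both) is an assumption about what plate and ID numbers look like; adjust it to your formats.

```python
import re

# Assumed entity pattern: 5+ ASCII letters/digits containing at least one
# letter and one digit, e.g. plate "SGX1234A" or ID "S1234567D".
ENTITY_RE = re.compile(r"\b(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[A-Za-z])[A-Za-z0-9]{5,}\b")


def find_entities(text: str) -> list[str]:
    """Return the alphanumeric entities in `text`, in order of appearance."""
    return ENTITY_RE.findall(text)


def entity_survival(original: str, compressed: str) -> tuple[float, list[str]]:
    """Return (fraction of original entities kept verbatim, entities lost or altered)."""
    entities = find_entities(original)
    if not entities:
        return 1.0, []
    kept = set(find_entities(compressed))
    lost = [e for e in entities if e not in kept]
    return 1 - len(lost) / len(entities), lost


original = "Car SGX1234A was seen near block 42; owner ID S1234567D."
compressed = "Car SGX1234 seen; owner S1234567D."  # plate lost its final letter
print(entity_survival(original, compressed))  # (0.5, ['SGX1234A'])
```

Running this over a batch of prompts at a few compression settings gives a rough entity survival rate to compare.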

Hi @jasonngap1,

Thank you for your interest in our method. We acknowledge that sensitive entities can be lost during compression, and we will release new work that addresses this problem. In the meantime, if you need to manually specify entities that must be preserved, you can find the relevant code in the develop branch.

iofu728 • Feb 26 '24 07:02
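One generic way to get a similar effect to manually preserving entities is to never hand those spans to the compressor at all. The sketch below illustrates that technique; it is not the develop-branch feature mentioned above. It splits the prompt around entities matched by an assumed `ENTITY_RE` pattern, compresses only the text in between, and reassembles the result. `toy_compress` is a hypothetical stand-in that drops a few filler words; in practice you would pass a function wrapping your actual compressor call.

```python
import re
from typing import Callable

# Assumed entity pattern: 5+ ASCII letters/digits containing both a letter
# and a digit, e.g. plate "SGX1234A" or ID "S1234567D".
ENTITY_RE = re.compile(r"\b(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[A-Za-z])[A-Za-z0-9]{5,}\b")


def compress_preserving_entities(text: str, compress: Callable[[str], str]) -> str:
    """Run `compress` only on the text between entities; entities are copied verbatim."""
    pieces: list[str] = []
    last = 0
    for match in ENTITY_RE.finditer(text):
        gap = text[last:match.start()]
        if gap.strip():
            pieces.append(compress(gap).strip())
        pieces.append(match.group(0))  # the entity is never passed to the compressor
        last = match.end()
    tail = text[last:]
    if tail.strip():
        pieces.append(compress(tail).strip())
    # Drop fragments the compressor removed entirely and rejoin with single spaces.
    return " ".join(p for p in pieces if p)


def toy_compress(segment: str) -> str:
    """Hypothetical stand-in compressor for the demo: drops a few filler words."""
    filler = {"the", "a", "was", "near", "of", "by"}
    return " ".join(w for w in segment.split() if w.lower() not in filler)


prompt = "The car SGX1234A was parked near the gate; the owner ID is S1234567D"
print(compress_preserving_entities(prompt, toy_compress))
# car SGX1234A parked gate; owner ID is S1234567D
```

Because the compressor only ever sees the fragments between entities, the entities cannot be dropped or altered, but each fragment is compressed with less surrounding context, and punctuation attached to an entity may come back separated by a space. Masking entities with placeholders and instructing the compressor to keep those placeholders is an alternative, if your compressor exposes such an option.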

Thank you for working on this issue. Could you please tell me where exactly in the develop branch I can manually specify the entities to preserve?

jasonngap1 • Feb 27 '24 09:02

Hi @jasonngap1, you can follow the scripts to use this feature.

iofu728 • Mar 01 '24 10:03