Be robust to invalid utf-8 characters in task db

Open bergercookie opened this issue 3 years ago • 4 comments

Sometimes the taskwarrior pending.data file contains invalid or non-printable characters, for example an emoji that was added to a task description but that Python cannot parse properly. In these cases, tasklib should not crash; it should ignore the offending characters and keep parsing the results of the command.
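To illustrate the difference between strict and lenient decoding, here is a minimal Python sketch. It is not tasklib's actual code: the helper name `decode_lenient` and the sample bytes are hypothetical, and it only shows how the standard `errors="replace"` handler keeps decoding where the default strict mode raises.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for any undecodable byte."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical command output; the byte 0xff never occurs in valid UTF-8.
raw = b"buy milk \xff"

try:
    raw.decode("utf-8")  # default is errors="strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lenient = decode_lenient(raw)
print(strict_failed, repr(lenient))
```

The trade-off is that the replaced character is lost, so the decoded text no longer matches what is stored on disk.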

bergercookie avatar Jun 06 '22 09:06 bergercookie

I'm not entirely sure that ignoring the error is the way to go, because the records obtained from taskwarrior would then be inconsistent with the data actually stored. In fact, taskwarrior probably shouldn't be producing undecodable text at all, since it doesn't support arbitrary binary data.

Here is a small reproducer which tries to store the character '🦀':

# add the rustlang crab emoji as task annotation from shell prompt:
task 1 annotate -- $'\U0001f980'

After this, tasklib will crash when reading the task database. This may actually be a bug in taskwarrior, as the character doesn't seem to get encoded correctly: after the above annotation, task 1 edit shows the annotation as two 16-bit characters, 0xd83e and 0xdd80, rather than the single character 0x1f980. However, if this character is manually inserted with task 1 edit and saved, then both taskwarrior and tasklib seem to handle it just fine.
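The two 16-bit values are exactly the UTF-16 surrogate pair for U+1F980, which the Python sketch below checks. The helper name `utf16_surrogates` is hypothetical, and the sketch only demonstrates the arithmetic and Python's behavior with surrogates; how taskwarrior ends up storing them is an assumption not confirmed here.

```python
from typing import Tuple


def utf16_surrogates(ch: str) -> Tuple[int, int]:
    """Return the (high, low) UTF-16 surrogates for a character above U+FFFF."""
    offset = ord(ch) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


crab = "\U0001f980"
high, low = utf16_surrogates(crab)
print(hex(high), hex(low))

# A lone surrogate is not a valid Unicode scalar value,
# so strict UTF-8 encoding refuses it.
try:
    chr(high).encode("utf-8")
    lone_surrogate_ok = True
except UnicodeEncodeError:
    lone_surrogate_ok = False

# Recombine the pair by round-tripping through UTF-16.
recombined = (
    (chr(high) + chr(low))
    .encode("utf-16", "surrogatepass")
    .decode("utf-16")
)
```

If the stored data really does contain the two surrogates as separate characters, a recombination step like the last one could recover the original emoji instead of dropping it.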

@bergercookie did you get the character into the database specifically by using task annotate?

smemsh avatar Mar 13 '24 12:03 smemsh

Note that the latest development version of taskwarrior seems to store this fine, see GothenburgBitFactory/taskwarrior#3286

smemsh avatar Mar 14 '24 06:03 smemsh