
[next-major] Remove metadata conversion from python-libzim

Open benoit74 opened this issue 1 year ago • 6 comments

When adding metadata, python-libzim converts some types.

This should be removed because:

  • python-libzim is meant to be only a thin wrapper around the C++ libzim
  • not all types are handled, which is confusing (for instance, we missed the conversion of the tags list to a string)

This would need a major release since it is a breaking change.

benoit74 avatar Feb 08 '24 10:02 benoit74

Sir, is this issue still open?

siddheshwar-9897 avatar Feb 23 '25 12:02 siddheshwar-9897


As per the issue, the current implementation in python-libzim automatically converts certain types of metadata (such as datetime.date and datetime.datetime objects) when they are added. The conversion includes:

  • Converting dates to a YYYY-MM-DD string and encoding in UTF-8.
  • Automatically encoding strings in UTF-8.

This leads to two main problems:

  • Inconsistent Type Handling: Not all types are handled consistently, such as lists (e.g., tags) not being converted to strings.
  • Unnecessary Conversion: The conversion is not required, as libzim itself is likely capable of handling the raw metadata types.

Suggested Solution: I propose we remove the automatic conversion logic entirely, which would simplify the code and avoid the inconsistencies. Specifically:

  • Remove date conversion: Stop automatically converting datetime.date and datetime.datetime objects into strings.
  • Remove string encoding: Stop automatically encoding strings into UTF-8 bytes.
  • Let libzim handle types directly: Allow libzim to handle the metadata content as it expects, letting users handle any necessary conversions on their own.
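If the conversion is removed, callers would have to do it themselves before passing metadata on. Below is a minimal sketch of what that could look like. It is not python-libzim code and does not call the library: `to_metadata_bytes` is a hypothetical helper name, and joining tags with `;` is an assumption about the expected tags format.

```python
import datetime


def to_metadata_bytes(value):
    """Explicitly convert a metadata value to UTF-8 bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already raw bytes: pass through untouched.
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    # datetime.datetime is a subclass of datetime.date, so this covers both.
    if isinstance(value, datetime.date):
        return value.strftime("%Y-%m-%d").encode("utf-8")
    if isinstance(value, (list, tuple)):
        # Assumption: a tags list is stored as one ";"-separated string.
        return ";".join(value).encode("utf-8")
    # Fail loudly instead of silently skipping unhandled types.
    raise TypeError(f"unsupported metadata type: {type(value).__name__}")


metadata = {
    "Title": "Example",
    "Date": datetime.date(2024, 2, 8),
    "Tags": ["wikipedia", "science"],
}
converted = {name: to_metadata_bytes(value) for name, value in metadata.items()}
```

Raising `TypeError` for anything unhandled keeps the behaviour explicit, which is the opposite of the partial, silent conversion the issue complains about.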

**This will:**

  • Make python-libzim a more straightforward wrapper that doesn’t enforce unnecessary transformations.
  • Avoid confusion or issues with inconsistent conversions (like missing list-to-string handling).
  • Give users more flexibility and control over their metadata, ensuring they're working with the exact data format they need.

Since this would be a breaking change (as it removes the automatic conversion behavior), I suggest we plan for a major release and update the documentation accordingly to inform users of the change.

Let me know your thoughts or if you'd like to discuss further.

Best regards, Siddheshwar Kadm

siddheshwar-9897 avatar Feb 23 '25 13:02 siddheshwar-9897

I'm sorry, but I won't read your comment. This is way too long/verbose and uses way too much emphasis for something as straightforward as removing a few lines of code.

Propose a PR, but please be sharp.

benoit74 avatar Feb 23 '25 20:02 benoit74

I believe we have a PR already that's ready and awaits that next major

rgaudin avatar Feb 24 '25 08:02 rgaudin

> I believe we have a PR already that's ready and awaits that next major

Indeed! @aryanA101a's PR is ready

benoit74 avatar Feb 24 '25 09:02 benoit74

Cannot assign aryanA101a so assigning myself so that it is clear

benoit74 avatar Feb 24 '25 09:02 benoit74