unicodedata2 Add old tables

It would be useful to be able to refer to tables for the previous versions of Unicode.

https://github.com/jquast/wcwidth/pull/23 is attempting to do that.

It would also be very helpful to facilitate Python-based analysis of the changes in Unicode data.

It seems the build infrastructure of unicodedata2 is perfect for that.

In order to avoid forcing all users to install all data, perhaps a separate PyPI package name could be used for the 'all unicodedata versions' edition of this.

Oct 02 '19 04:10 jayvdb

Sorry for being so late to the conversation, but I am interested in joining forces :)

And I do have time carved out for this in the coming weeks. As far as wcwidth goes, the PR is to clean up the build infrastructure, so its needs are very similar to those of unicodedata2's build infrastructure. I'll be sure to sit down and study unicodedata2 before I go any further with the https://github.com/jquast/wcwidth/pull/23 changes.

Mar 01 '20 02:03 jquast

If there is one thing both of our packages could use, it would be "well-structured Unicode data": the TXT files well parsed and annotated, with the copyrights, dates, and comments if possible, maybe just as some JSON or TOML data files.

If a CLI utility existed that helped navigate, fetch, and parse the archive of Unicode text files and emit data blobs, it could be listed in each project's requirements-dev.txt and used for our respective code generation. This CLI app would be based, roughly, on the class UnicodeData from the unicodedata/2.py files.
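A minimal sketch of the parsing step such a tool might perform, assuming input in the semicolon-separated `UnicodeData.txt` layout. The function name `parse_unicode_data`, the `U+XXXX` key format, and the choice to keep only the first four fields are assumptions made for illustration; nothing here comes from unicodedata2 or wcwidth.

```python
import json

# Leading UnicodeData.txt fields after the code point; later fields are ignored here.
FIELDS = ("name", "category", "combining", "bidirectional")


def parse_unicode_data(text):
    """Map 'U+XXXX' keys to a dict of the first few UnicodeData.txt fields."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(";")
        key = "U+{:04X}".format(int(parts[0], 16))
        table[key] = dict(zip(FIELDS, parts[1:5]))
    return table


# Two records in the UnicodeData.txt layout, inlined so the sketch needs no download.
SAMPLE = (
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041\n"
)

table = parse_unicode_data(SAMPLE)
blob = json.dumps(table, indent=2, sort_keys=True)
print(blob)
```

A real tool would also need to fetch the files per version and carry along the copyright and date headers; that part is left out here.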

@jayvdb: analyzing changes by version through unicodedata2 would require a large number of API calls into the resulting C module, for which we would have to design and maintain a new multi-version API, and then organize the return values into structured data to compare. Phew! I think the CLI utility I propose would be better for any difference analysis: the data structures it outputs could be compared immediately, without any further transformation.
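To illustrate why plain data structures make the comparison cheap, here is a rough sketch that diffs two per-version tables held as dicts. The helper `diff_tables` and the keys `cp1`, `cp2`, `cp3` are hypothetical, and the values are toy placeholders, not data from any real Unicode release.

```python
def diff_tables(old, new):
    """Report keys added, removed, or changed between two version tables."""
    old_keys, new_keys = set(old), set(new)
    changed = {
        key: (old[key], new[key])
        for key in sorted(old_keys & new_keys)
        if old[key] != new[key]
    }
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": changed,
    }


# Toy placeholder tables for illustration only; not real Unicode data.
old = {"cp1": {"category": "Lu"}, "cp2": {"category": "Cn"}}
new = {
    "cp1": {"category": "Lu"},
    "cp2": {"category": "Ll"},
    "cp3": {"category": "Ll"},
}

result = diff_tables(old, new)
print(result)
```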

Mar 24 '20 05:03 jquast

@jquast, what if unicodedata2 had a "set unicode version" function that switched the tables between versions?

The caller would then extract the info they needed from one version, then switch and repeat with the other version.
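A rough sketch of that switch-and-repeat pattern, using a pure-Python stand-in. unicodedata2 does not provide this today: the class `VersionedTables`, its `set_unicode_version` and `category` methods, the helper `snapshot`, and the toy tables are all hypothetical.

```python
class VersionedTables:
    """Hypothetical stand-in for a module holding tables for several versions."""

    def __init__(self, tables):
        self._tables = tables
        self._active = None

    def set_unicode_version(self, version):
        """Switch every later lookup to the tables of the given version."""
        if version not in self._tables:
            raise ValueError("no tables for version {!r}".format(version))
        self._active = version

    def category(self, key):
        if self._active is None:
            raise RuntimeError("call set_unicode_version() first")
        # "Cn" (unassigned) is used when the active version has no entry.
        return self._tables[self._active].get(key, "Cn")


def snapshot(db, version, keys):
    """Switch to one version, then extract the values the caller needs."""
    db.set_unicode_version(version)
    return {key: db.category(key) for key in keys}


# Toy placeholder tables; not data from any real Unicode release.
db = VersionedTables({"old": {"cp1": "Lu"}, "new": {"cp1": "Lu", "cp2": "Ll"}})
keys = ["cp1", "cp2"]

before = snapshot(db, "old", keys)
after = snapshot(db, "new", keys)
changed = {k: (before[k], after[k]) for k in keys if before[k] != after[k]}
print(changed)
```

One thing the sketch makes visible: the active version is shared state, so anything else using the same tables would see the switch as well.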

Mar 24 '20 08:03 jayvdb