Restructure ip_country

Open dnwiebe opened this issue 1 year ago • 0 comments

Background

At the moment, all the ip_country logic is contained in the ip_country crate. This seems to make logical sense, but it represents a potential problem.

The reason for the masq_lib crate is to get all our dependencies flowing in one direction: masq_lib doesn't depend on anything else in the codebase, and everything else in the codebase (if necessary) depends on masq_lib and on nothing else in the codebase. This keeps us from developing circular dependencies.

However, at the moment there's a direct dependency from node to ip_country. That shouldn't be.

ip_country consists of two kinds of production code: the kind that serializes the CSV file from DBIP into the source file containing the IPv4 and IPv6 compressed datastores, and the kind that uses that data for daily Node operations. The first kind of code is outside the Node, and should be run only by developers and only during the build process--and then only if the compressed datastores don't already exist. The second kind of code is shipped with the Node and run by users. This is a fairly clear-cut distinction.

However, the tests are a bit trickier, because tests of the deserialize code use the serialize logic to create something to deserialize, and tests of the serialize code use the deserialize logic to assert on the serialization.

Task

Find a reasonable way to arrange the ip_country code so that dependencies link only from subprojects to masq_lib, but all the existing ip_country tests still run--or are replaced by tests that test the same things.

Suggestions

The generated dbip_country.rs file should not be part of the GitHub repo. Whenever it's needed, build scripts should download the CSV from DBIP and generate the file: however, this should only happen when it's necessary. We don't need to spend extra build time regenerating a file that will turn out exactly the same as the one it's replacing, and we shouldn't pester DBIP with large-volume downloads any more often than necessary.
If the absence of dbip_country.rs causes compilation problems, we can put a version of dbip_country.rs in the GitHub repository that contains no data, but only functions that panic when called, where the panic message explains how to generate the code containing the real data that will overwrite it.
It's not clear yet whether downloading 25MB from DBIP every time an Actions job runs is un-neighborly or not. If it turns out to be, we should investigate installing a regular (every six months? Three?) download from DBIP to some datastore we control (S3?) and have our Actions jobs and local builds download from that instead.

Aug 26 '24 12:08 dnwiebe