
[Feature Request] Integrate IPinfo's IP-to-Country ASN database.

Open abdullahdevrel opened this issue 1 year ago • 4 comments

Hi,

I work for IPinfo, but I have been using Goatcounter for my personal projects for several years and have been exploring self-hosting it recently.

I would like to request the integration of the IPinfo IP to Country or IP to Country ASN/ISP database into Goatcounter. I believe that, in terms of development philosophy, IPinfo’s free IP database is a good fit for Goatcounter. There are technical benefits as well.

Goatcounter specific benefits

Binary distribution issues and "MaxMind®️'s EULA"

I have not made much progress self-hosting it yet, but I believe the binary includes MaxMind’s country database, which creates a tricky situation. As far as I know, they do not allow redistribution of their databases, even the free ones. Their EULA requires users to download their own copy of the database using their own access tokens:

  • https://www.maxmind.com/en/geolite2/eula
  • https://www.maxmind.com/en/geolite2-commercial-redistribution

The value proposition of IPinfo's database is that it is simply CC-BY-SA 4.0. You can do whatever you want with it as long as you give attribution. Commercial usage is allowed as well. Librespeed is using our data by packaging it directly in the repo: https://github.com/librespeed/speedtest/issues/641#issuecomment-2254375165

ASN/ISP data

You have mentioned that city-level data is too granular, so maybe you can add the ASN/ISP data from the IP to Country ASN database as an additional data source. The ASN/ISP detection is based on network routing data.

Our country-level data, even though free, is a zero-compromise, fully accurate database. We support daily updates and do not apply range clustering. It is a pure subset of our IP geolocation database: it omits the more granular location information and provides only country-level data.

General Technical benefits

The database has the following features:

  • It includes country and ASN information in the same database.
  • It is updated daily, with zero compromise to accuracy. There is no range clustering, and the database provides full accuracy.
  • The data granularity reaches individual IP level.
  • The database comes in MMDB database format.
  • It is licensed under CC-BY-SA 4.0, permitting commercial usage.
  • Available file formats include: CSV, MMDB, JSON
  • The data is tabular and unnested, making it very easy to use. The dataset includes both IPv4 and IPv6 in a single file.

Database schema

| Field Name | Example | Data Type | Description |
|---|---|---|---|
| start_ip | 1.0.16.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.0.31.255 | TEXT | Ending IP address of an IP address range |
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |
| continent | AS | TEXT | Continent code of the country |
| continent_name | Asia | TEXT | Name of the continent |
| asn | AS2519 | TEXT | Autonomous System Number |
| as_name | ARTERIA Networks Corporation | TEXT | Name of the AS (Autonomous System) organization |
| as_domain | arteria-net.com | TEXT | Official domain or website of the AS organization |
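
To illustrate what the flat schema means in practice, here is a minimal sketch of a range lookup over the CSV form of the data, using only the Python standard library. The CSV header order, the `CountryASNRow` and `RangeIndex` names, and the single sample row (copied from the table above) are assumptions for illustration; a real file has many more rows and may carry additional columns.

```python
import bisect
import csv
import io
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryASNRow:
    """One row of the flat schema; field names follow the table above."""
    start_ip: str
    end_ip: str
    country: str
    country_name: str
    continent: str
    continent_name: str
    asn: str
    as_name: str
    as_domain: str


# Hypothetical CSV excerpt: the header order is assumed, the row is the
# example from the schema table.
SAMPLE_CSV = """start_ip,end_ip,country,country_name,continent,continent_name,asn,as_name,as_domain
1.0.16.0,1.0.31.255,JP,Japan,AS,Asia,AS2519,ARTERIA Networks Corporation,arteria-net.com
"""


def _key(ip):
    """Sort key that keeps IPv4 and IPv6 apart: (version, integer value)."""
    addr = ipaddress.ip_address(ip)
    return (addr.version, int(addr))


def load_rows(text):
    """Parse CSV text into rows; assumes the columns match the dataclass."""
    return [CountryASNRow(**r) for r in csv.DictReader(io.StringIO(text))]


class RangeIndex:
    """Binary search over non-overlapping [start_ip, end_ip] ranges."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda r: _key(r.start_ip))
        self.starts = [_key(r.start_ip) for r in self.rows]

    def lookup(self, ip):
        key = _key(ip)
        # Find the last range whose start is <= the address.
        i = bisect.bisect_right(self.starts, key) - 1
        if i < 0:
            return None
        row = self.rows[i]
        return row if key <= _key(row.end_ip) else None


index = RangeIndex(load_rows(SAMPLE_CSV))
hit = index.lookup("1.0.20.1")
print(hit.country if hit else "")
```

The MMDB file would normally be queried through an MMDB reader instead; the sketch only shows that each record is a single flat row with no nested lookups.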

Documentation: https://ipinfo.io/developers/ip-to-country-asn-database

Samples are available here: https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country%20ASN

The database can be downloaded simply by accessing the storage URI with an access token.

curl -L "https://ipinfo.io/data/free/country_asn.mmdb?token=<YOUR_TOKEN>" -o country_asn.mmdb

My apologies for the wall of text. Let me know what you think. Thank you!

abdullahdevrel avatar Aug 31 '24 09:08 abdullahdevrel

I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

Your databases seem way larger; "IP to Country Database" is ~38M. That's far too large to include in the GoatCounter binary. The "Geolite countries" is ~3.7M. I don't know why it's so much larger? People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

arp242 avatar Aug 31 '24 14:08 arp242

Thank you for reviewing the request.

> I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

The challenge is that they explicitly have a commercial distribution license for these free databases, so I am not sure what the consequences of this are, to be honest. I am not sure if those Linux distros have their own licensing terms with them that permit the distribution like that.

> Your databases seem way larger; "IP to Country Database" is ~38M.

That is because our database provides full accuracy. The accuracy extends down to the individual IP level, even for a country database. When you download an IP database, compromise happens in two ways: with infrequent updates and range clustering. However, because we are providing full accuracy, the resulting database is larger.

Another idea: since the database can be downloaded directly via a URI, users could fetch it during installation. That would remove the need to package a database inside the binary in the first place. The same download mechanism could also handle database updates.
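
A minimal sketch of that download-and-update idea, assuming a daily refresh interval and the download URL quoted earlier in this thread. The function names (`download_url`, `needs_refresh`, `refresh`) and the temporary-file naming are hypothetical, and this is not how GoatCounter itself handles its database.

```python
import os
import time
import urllib.parse
import urllib.request

# URL as quoted in the issue; the token is supplied by the user.
BASE_URL = "https://ipinfo.io/data/free/country_asn.mmdb"
ONE_DAY = 24 * 60 * 60  # assumed refresh interval, matching "updated daily"


def download_url(token):
    """Build the download URL with the access token as a query parameter."""
    return BASE_URL + "?" + urllib.parse.urlencode({"token": token})


def needs_refresh(path, max_age_seconds=ONE_DAY, now=None):
    """True if the local file is missing or older than max_age_seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age_seconds


def refresh(path, token):
    """Download a fresh copy if needed; returns True when a download ran."""
    if not needs_refresh(path):
        return False
    tmp = path + ".tmp"
    urllib.request.urlretrieve(download_url(token), tmp)
    # Swap the file into place so readers never see a partial download.
    os.replace(tmp, path)
    return True
```

Calling `refresh("country_asn.mmdb", token)` at install time and again from a scheduled job would cover both the first download and later updates.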

> People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

On a cursory view, it seems like the lookup mechanism is not database agnostic, but I could be wrong. There are structural differences between our database and MaxMind's (https://ipinfo.io/blog/migrating-from-maxmind-to-ipinfo/). Mainly:

  • We have the location built in, while they provide the geoname_id and a complementary geoname database
  • Our database structure is flat/tabular, while they opt for a nested database structure.

Let me know what you think.

abdullahdevrel avatar Aug 31 '24 15:08 abdullahdevrel

I want GoatCounter to be a "Just Works" binary without external dependencies, so people can easily self-host with a minimum of fuss. Dealing with GeoIP database downloads rather goes against that.

I don't mind providing compatibility with it, but I don't think it will be the default if it's so much larger.


However, if I try to use it, it errors out with:

maxminddb: cannot unmarshal EU into type struct { Names map[string]string "maxminddb:\"names\""; Code string "maxminddb:\"code\""; GeoNameID uint "maxminddb:\"geoname_id\"" }

So I guess the database structure is different.

I don't want to "migrate to" anything, I want to be compatible with both. I don't understand why you don't just provide a "Maxmind-compatible database" as an option.

Going from country = maxmind_data['country']['iso_code'] to country = ipinfo_data['country'] is a silly change and it doesn't really matter all that much which one is used. Maybe one is marginally better, but not at least providing a compatible database is rather lacking in pragmatism.
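
A minimal sketch of what being compatible with both layouts could look like, assuming records arrive as plain dictionaries: nested `country.iso_code` and `continent.code` for MaxMind-style records (the key names come from the snippet and the error message above), and flat `country` and `continent` strings for IPinfo-style records. The helper names are hypothetical.

```python
def _flat_or_nested(record, field, nested_key):
    """Read a code from either a flat string or a nested mapping."""
    value = (record or {}).get(field)
    if isinstance(value, dict):   # MaxMind-style nested record
        return value.get(nested_key, "")
    if isinstance(value, str):    # IPinfo-style flat record
        return value
    return ""                     # field missing or of an unexpected type


def country_code(record):
    return _flat_or_nested(record, "country", "iso_code")


def continent_code(record):
    return _flat_or_nested(record, "continent", "code")


maxmind_data = {"country": {"iso_code": "JP"}, "continent": {"code": "AS"}}
ipinfo_data = {"country": "JP", "continent": "AS"}
print(country_code(maxmind_data), country_code(ipinfo_data))
```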

arp242 avatar Aug 31 '24 15:08 arp242

Thank you for reviewing. I understand that MaxMind's database is deeply integrated into the project and that adopting ours would require some engineering investment. We have tried our best to provide the simplest and best data out there. The ease of use and the quality of the data usually justify the engineering investment needed to adopt it.

Due to the unpredictable nature of MaxMind's database structure, you have to wrap every call to get a value in switch/case statements. In our case, if we do not have the data, we simply return an empty string. In my personal opinion, making a drop-in, MaxMind-compatible database would essentially be a compromise, as it requires creating a nested version of the database, which would increase its size.

abdullahdevrel avatar Aug 31 '24 16:08 abdullahdevrel

@arp242 We launched a free API service with unlimited requests, based on the IP to Country ASN database that I mentioned.

  • An infinite number of API requests against our IPinfo Lite database (renamed from IP to Country ASN database).
  • Free to use
  • Full accuracy and daily updates
  • CC BY SA 4.0 (Only attribution required, commercial permissive and no EULA)

You mentioned size being an issue with our full accuracy database, so we created this API specifically to support your project and a few similar projects that were interested in the IP to Country ASN database but found its size to be a problem. An API-based solution removes the size issue entirely.

curl "https://api.ipinfo.io/lite/8.8.8.8?token=$TOKEN"
{
  "ip": "8.8.8.8",
  "asn": "AS15169",
  "as_name": "Google LLC",
  "as_domain": "google.com",
  "country_code": "US",
  "country": "United States",
  "continent_code": "NA",
  "continent": "North America"
}
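
As a rough sketch of consuming that response, the snippet below parses the JSON shown above into a small typed record. The `LiteResult` and `parse_lite_response` names are hypothetical, and defaulting missing fields to an empty string is an assumption rather than documented API behavior. No network call is made; the sample body is the one printed above.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LiteResult:
    """Fields as they appear in the sample response above."""
    ip: str
    asn: str
    as_name: str
    as_domain: str
    country_code: str
    country: str
    continent_code: str
    continent: str


def parse_lite_response(body):
    """Decode a JSON body; absent keys become "" (an assumption)."""
    data = json.loads(body)
    return LiteResult(**{f.name: data.get(f.name, "") for f in fields(LiteResult)})


SAMPLE_BODY = """
{
  "ip": "8.8.8.8",
  "asn": "AS15169",
  "as_name": "Google LLC",
  "as_domain": "google.com",
  "country_code": "US",
  "country": "United States",
  "continent_code": "NA",
  "continent": "North America"
}
"""

result = parse_lite_response(SAMPLE_BODY)
print(result.country_code, result.asn)
```

Note that the API response uses `country_code` for the ISO code and `country` for the name, whereas the database schema earlier in this thread uses `country` for the code.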

We built a fresh API system from scratch to support projects like yours with the ability to handle unlimited requests.

I integrated Goatcounter into a new project. Aside from the size issues, one problem is that the current database returns a lot of empty values for city-level information.

Image

So, I think you might as well choose our high-accuracy country data instead. Please let me know your thoughts.

abdullahdevrel avatar Apr 25 '25 17:04 abdullahdevrel

> I integrated Goatcounter into a new project. Aside from the size issues, one problem is that the current database returns a lot of empty values for city-level information.

I think that's probably just because the database is somewhat old; on goatcounter.com I was actually updating the wrong .mmdb file so it used a database from 2023 up until a week or two ago >_< It seems a lot better now. I'll set up a job to automatically update this.

A REST API is not a particularly appealing option IMHO. It's even more work to implement and a lot slower than just querying a local database. It would be okay for a small site, but especially for goatcounter.com it would probably be prohibitively slow (just due to network latency etc.)

arp242 avatar Jun 02 '25 09:06 arp242

Thanks @arp242 for reviewing the request. We are trying our best to be as helpful as possible to Open Source Projects, so we created a whole API system from scratch that is designed to support billions of requests per day (if not per hour).

Tricky situation 🤔 You know the value proposition we have and the limitations the project has.

  • Local Database: higher accuracy (actually the best-in-class geolocation data), but at the cost of a larger download size.
  • API: minimal maintenance, as there is no local database to maintain and update (avoiding the 2023 database issue you mentioned), and an unlimited number of queries, but it is an API.

We wanted to make sure that from our end, we are doing everything we can to support open source projects like GoatCounter, but I totally understand the challenges and limitations. Ping me if you end up checking out the IPinfo Lite service for Goatcounter or any other project you have in mind. Thanks and keep up the good work!

abdullahdevrel avatar Jun 02 '25 13:06 abdullahdevrel