New SDMX TypeProvider
Since several data sources based on the SDMX standard have emerged recently, it
would be useful to have a type provider supporting such data sources.
The following describes the current status of the effort to create an
SDMX TypeProvider. It is open to ideas and suggestions.
I am very much looking forward to feedback from the FSharp.Data
community on whether an SDMX type provider implementation would be a
good fit for FSharp.Data.
There are many details to cover, so the following lists only the simplest examples and provides references below for anyone interested in further details.
Motivation
The amount of data available over SDMX is growing, and the standard is a good fit for the type provider approach.
The goal
Implement an SdmxProvider that supports the simplest cases as
a first step.
Background
SDMX (Statistical Data and Metadata eXchange) provides a standardized way of
exposing statistical databases as a web service, which provides all
necessary metadata and extensive ways of querying the data.
Currently, there are multiple implementations of the SDMX standard that can
be accessed publicly:
- WorldBank
- European Central Bank
- Eurostat
- Statistics Estonia
- See Implementation References
Specification and WorldBank example
For simplicity, let's recall the already familiar WorldBank TypeProvider from FSharp.Data
and replicate the same scenario using SDMX. Say we want to query annual agricultural land data for Germany.
WorldBank Provider
let wb = WorldBankData.GetDataContext()
let data = wb.Countries.Germany.Indicators.``Agricultural land (sq. km)``
SDMX Specification
The following steps describe how the same data can be queried using the SDMX REST API.
Everything starts from wsEntryPoint, which in the case of WorldBank is
- https://api.worldbank.org/v2/sdmx/rest/
There are two major parts to this process, metadata and data retrieval.
Metadata
- Retrieve all dataflows - https://api.worldbank.org/v2/sdmx/rest/dataflow/all/all/latest/
  - We choose WDI - World Development Indicators
- Retrieve all WDI related metadata and datastructure information - https://api.worldbank.org/v2/sdmx/rest/datastructure/WB/WDI/1.0/?references=children
  - The previous step exposes information about the existing data dimensions. In this case, there are 3 dimensions.
    - Frequency - [Annual, Monthly, Quarterly, ...]
    - Series - [List of Indicators ... ]
    - Reference Area - [List of countries and regions .. ]
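The two metadata requests can be sketched in F# as simple URL builders. This is only an illustration: the function names `trimSlash`, `dataflowsUrl` and `dataStructureUrl` are hypothetical and not part of any existing provider. The path segments are taken from the URLs listed above.

```fsharp
// Hypothetical helpers; names are illustrative, not an existing API.

/// Removes trailing slashes so path segments can be appended uniformly.
let trimSlash (entryPoint: string) = entryPoint.TrimEnd('/')

/// URL listing all dataflows (latest version, all agencies).
let dataflowsUrl (entryPoint: string) =
    sprintf "%s/dataflow/all/all/latest/" (trimSlash entryPoint)

/// URL of a data structure definition, including its child references
/// (dimensions and their code lists).
let dataStructureUrl (entryPoint: string) (agency: string) (id: string) (version: string) =
    sprintf "%s/datastructure/%s/%s/%s/?references=children" (trimSlash entryPoint) agency id version

let wbEntryPoint = "https://api.worldbank.org/v2/sdmx/rest/"
printfn "%s" (dataflowsUrl wbEntryPoint)
printfn "%s" (dataStructureUrl wbEntryPoint "WB" "WDI" "1.0")
```

With the WorldBank entry point and the WDI structure (agency WB, version 1.0), the two printed lines reproduce the URLs given in the steps above.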
Data
Dimension information is used to create a query (key). We are looking for annual agricultural land data in Germany. To create such a key, we build a sequence of dimension identifiers separated by dots (ordering matters).
- A - Annual
- AG_LND_AGRI_K2 - Agricultural land (sq. km)
- DEU - Germany
Data query(key): A.AG_LND_AGRI_K2.DEU
Finally, data is retrieved using the URL: https://api.worldbank.org/v2/sdmx/rest/data/WDI/A.AG_LND_AGRI_K2.DEU/
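The key and URL construction described above can be sketched in F# as follows. The helper names `buildKey` and `buildDataUrl` are hypothetical and only illustrate the string manipulation involved. The sketch covers just the simplest case of one value per dimension.

```fsharp
// Hypothetical helpers; names are illustrative, not an existing API.

/// Joins one identifier per dimension into an SDMX data key.
/// The order must match the dimension order of the data structure
/// (here: Frequency, Series, Reference Area).
let buildKey (dimensionValues: string list) : string =
    String.concat "." dimensionValues

/// Builds the data URL for a dataflow and key under a REST entry point.
let buildDataUrl (entryPoint: string) (dataflow: string) (key: string) : string =
    sprintf "%s/data/%s/%s/" (entryPoint.TrimEnd('/')) dataflow key

let key = buildKey [ "A"; "AG_LND_AGRI_K2"; "DEU" ]
let url = buildDataUrl "https://api.worldbank.org/v2/sdmx/rest/" "WDI" key

printfn "%s" key
printfn "%s" url
```

For the example values this yields the key A.AG_LND_AGRI_K2.DEU and the data URL shown above.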
SDMX Provider
Querying the same data from WorldBank using SdmxProvider would look like the following:
type wb = SdmxProvider<"https://api.worldbank.org/v2/sdmx/rest/">
let data = wb.``World Development Indicators``.Annual.``Agricultural land (sq. km)``.Germany
Navigation using . (dots) should allow interaction on multiple levels. Initializing the TypeProvider will require some initial configuration or static parameters:
- Protocol: Http or Https
- EntryPoint: Rest API entry point URL
- Credentials: In case the API is not publicly available
Foreseen issues
- SDMX supports complex data, e.g. it is possible to choose multiple values from a single dimension (multiple countries or indicators). This will require some design decisions.
- How to expose the SDMX ?queryparams that are used for additional filtering in the type provider?
- Intermittent runtime errors
  - The provider needs to have a mechanism for retrying to fetch the data.
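One possible shape for the retry mechanism is sketched below. This is an assumption about how it could look, not a description of the prototype: the `retry` function, its fixed delay, and the `flakyFetch` stand-in for an HTTP request are all hypothetical. A real implementation would probably retry only on specific transient failures and might use an increasing delay.

```fsharp
open System.Threading

/// Runs `action`; on any exception waits `delayMs` milliseconds and tries again,
/// up to `maxAttempts` attempts in total. The last failure is propagated.
let rec retry (maxAttempts: int) (delayMs: int) (action: unit -> 'T) : 'T =
    try
        action ()
    with _ when maxAttempts > 1 ->
        Thread.Sleep delayMs
        retry (maxAttempts - 1) delayMs action

// Stand-in for a data request that fails twice before succeeding.
let calls = ref 0
let flakyFetch () =
    calls.Value <- calls.Value + 1
    if calls.Value < 3 then failwith "transient error"
    "payload"

let result = retry 5 10 flakyFetch
printfn "%s after %d calls" result calls.Value
```

Keeping the retry logic in a generic wrapper like this would let the same policy be applied to both metadata and data requests.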
Additional features to be included:
- Paging
- Lazy Fetching
- Async
- other optimizations
References
- http://sdmx.org
- SDMX Technical Specifications
- Section 7 – Web Services Guidelines version 2.1
- Web Service API Cheat Sheet
- I am using a wiki page to collect related and useful resources.
- Reference to fork https://github.com/demonno/FSharp.Data
Comments, ideas, and suggestions are welcome. Thanks.
It would be nice to be able to replace the WorldBank provider, which is very specific, with something like this that would generalize to other data sources, and I think an SDMX provider would fit nicely into FSharp.Data.
Bumping this issue; this would make it much easier to create data science examples, since the amount of data provided has grown significantly since this was created. Any implementation tips would be appreciated.
A working prototype implementation is in the https://github.com/demonno/FSharp.Data fork. We'll try to finally create a pull request based on that work. There is support for SDMX protocol version 2.1. Some SDMX sources offer only the SDMX 2.0 protocol, and that part is not yet implemented. How the proposed solution works is described here: https://digikogu.taltech.ee/en/Item/47d2c178-2681-4aa5-9e25-23868a21c29b
@juhan no need to implement 2.0; sdmx 3.0 is being released this year as well. Most places will move to a more modern version shortly.