Proper database schema
The current database schema consists only of a timestamp together with the city and the full JSON dump for this city and timestamp. This results in many problems, mainly bad performance and a waste of space (never changing attributes, e.g. lot coordinates, are saved each time even though they never change). A new database scheme should have the following properties:
- save static data (e.g. coordinates, total size...) only once
- allow location based access (see https://github.com/offenesdresden/ParkAPI/issues/130)
- allow city based access to provide the current API
This is my concept of how the new database could look like:

@kiliankoe @hbruch what are your thoughts on this?
<?xml version="1.0" encoding="utf-8" ?>
<!-- SQL XML created by WWW SQL Designer, https://github.com/ondras/wwwsqldesigner/ -->
<!-- Active URL: http://127.0.0.1:8080/ -->
<sql>
<datatypes db="mysql">
<group label="Numeric" color="rgb(238,238,170)">
<type label="Integer" length="0" sql="INTEGER" quote=""/>
<type label="TINYINT" length="0" sql="TINYINT" quote=""/>
<type label="SMALLINT" length="0" sql="SMALLINT" quote=""/>
<type label="MEDIUMINT" length="0" sql="MEDIUMINT" quote=""/>
<type label="INT" length="0" sql="INT" quote=""/>
<type label="BIGINT" length="0" sql="BIGINT" quote=""/>
<type label="Decimal" length="1" sql="DECIMAL" re="DEC" quote=""/>
<type label="Single precision" length="0" sql="FLOAT" quote=""/>
<type label="Double precision" length="0" sql="DOUBLE" re="DOUBLE" quote=""/>
</group>
<group label="Character" color="rgb(255,200,200)">
<type label="Char" length="1" sql="CHAR" quote="'"/>
<type label="Varchar" length="1" sql="VARCHAR" quote="'"/>
<type label="Text" length="0" sql="MEDIUMTEXT" re="TEXT" quote="'"/>
<type label="Binary" length="1" sql="BINARY" quote="'"/>
<type label="Varbinary" length="1" sql="VARBINARY" quote="'"/>
<type label="BLOB" length="0" sql="BLOB" re="BLOB" quote="'"/>
</group>
<group label="Date & Time" color="rgb(200,255,200)">
<type label="Date" length="0" sql="DATE" quote="'"/>
<type label="Time" length="0" sql="TIME" quote="'"/>
<type label="Datetime" length="0" sql="DATETIME" quote="'"/>
<type label="Year" length="0" sql="YEAR" quote=""/>
<type label="Timestamp" length="0" sql="TIMESTAMP" quote="'"/>
</group>
<group label="Miscellaneous" color="rgb(200,200,255)">
<type label="ENUM" length="1" sql="ENUM" quote=""/>
<type label="SET" length="1" sql="SET" quote=""/>
<type label="Bit" length="0" sql="bit" quote=""/>
</group>
</datatypes><table x="512" y="221" name="Pools">
<row name="Name" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
</row>
<row name="Source" null="0" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
</row>
<row name="Provider" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
<default>NULL</default></row>
<row name="URL" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
<default>NULL</default></row>
<row name="License" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
<default>NULL</default></row>
<key type="PRIMARY" name="Name">
<part>Name</part>
</key>
</table>
<table x="664" y="224" name="Lots">
<row name="ID" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
</row>
<row name="Pool" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
<relation table="Pools" row="Name" />
</row>
<row name="Type" null="0" autoincrement="0">
<datatype>ENUM</datatype>
</row>
<row name="Name" null="0" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
</row>
<row name="Address" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
</row>
<row name="Latitute" null="1" autoincrement="0">
<datatype>FLOAT</datatype>
<default>NULL</default></row>
<row name="Longitute" null="1" autoincrement="0">
<datatype>FLOAT</datatype>
<default>NULL</default></row>
<key type="PRIMARY" name="Lot_ID">
<part>ID</part>
</key>
</table>
<table x="801" y="226" name="Occupancies">
<row name="Timestamp" null="0" autoincrement="0">
<datatype>TIMESTAMP</datatype>
</row>
<row name="Lot" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
<relation table="Lots" row="ID" />
</row>
<row name="Total" null="0" autoincrement="0">
<datatype>BIGINT</datatype>
<default>0</default></row>
<row name="Free" null="0" autoincrement="0">
<datatype>BIGINT</datatype>
<default>0</default></row>
<row name="State" null="0" autoincrement="0">
<datatype>ENUM</datatype>
</row>
<key type="PRIMARY" name="Live_Key">
<part>Timestamp</part>
<part>Lot</part>
</key>
</table>
</sql>
Looks good 👍
A couple minor things:
- I'd suggest adding a table for datasources with their attribution info, source data and possibly an optional location as well? They don't necessarily have to match the pools, do they? Or will we just have pools for each datasource and additional pools with no further metadata? Would that require marking one pool as the canonical pool for a lot if one means to access the source metadata?
- Lots should have a bit more metadata like a detail URL, pricing info (just as a string), opening hours (same) and possibly more address fields like region (optional), city and country? Also I think it would make sense to add a field for additional information (possibly again using postgres' json type) to support those things we've wanted to cover in the past, but overlooked. Stuff like entrance height if something like that is available. Apps would basically just show a list of key-value pairs if anything is available. Oh and there's typos in latitude and longitude, just fyi.
- I think it would make sense to split up the timestamp for occupancies into two values, one for when the data was downloaded/fetched and one for when the data was updated (in case that information is available - which it mostly is).
Or will we just have pools for each datasource and additional pools with no further metadata? Would that require marking one pool as the canonical pool for a lot if one means to access the source metadata?
A pool is a data source. Pools are not meant to be reflected in the public API (except for the current legacy one).
Lots should have a bit more metadata like a detail URL, pricing info (just as a string), opening hours (same) and possibly more address fields like region (optional), city and country? Also I think it would make sense to add a field for additional information (possibly again using postgres' json type) to support those things we've wanted to cover in the past, but overlooked.
I was thinking about that. How easy/hard is it to add additional (optional) fields later to a table? Instead of adding fields in case we might need them and adding a json field that then serves as a dump for anything we forgot I'd like to add fields as we need them and alter the database accordingly. These fields all have to be optional (otherwise old data would break).
If pools are not meant to be publicly accessed, how would one see the license information?
Also I don't think it's feasible to add lot fields for every possible bit of data. There are many things to specifically add support for, but there'll always be things that are only relevant for a few lots. Having a fallback for that seems like a good solution imho.
If pools are not meant to be publicly accessed, how would one see the license information?
This data would be added per lot.
Ah, that makes sense 👍 But querying by pool would be possible, right?