Aristotle Project
Given a USGS ShakeMap ID, get the corresponding rupture, exposure, vulnerability functions and GMPEs, and perform a risk calculation (see https://docs.google.com/document/d/1mS2S7yOohiJjiEqL85E65k2XbOj7xEAuUo4NaAFHuco).
Geolocation by country can be done via the files in https://www.geoboundaries.org/globalDownloads.html
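A minimal sketch of the country lookup, using only the standard library. It assumes a geoBoundaries ADM0 GeoJSON file (e.g. the global CGAZ file) in which each feature stores its 3-letter ISO code in the `shapeGroup` property; the property name, the function names and the brute-force scan are illustrative assumptions, not existing engine code.

```python
import json


def point_in_ring(lon, lat, ring):
    """Ray casting: True if (lon, lat) lies inside the closed ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Toggle each time a ray going east from the point crosses an edge
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon):
    """polygon = [exterior_ring, hole_1, hole_2, ...] as in GeoJSON."""
    if not point_in_ring(lon, lat, polygon[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in polygon[1:])


def country_of(lon, lat, geojson, code_field="shapeGroup"):
    """Country code of the first feature containing the point, or None."""
    for feature in geojson["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            continue
        if any(point_in_polygon(lon, lat, poly) for poly in polygons):
            return feature["properties"].get(code_field)
    return None


def load_boundaries(path):
    """Read a geoBoundaries GeoJSON file (the path is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Scanning every polygon is fine for a handful of points (an epicentre, a few stations); for the whole exposure a spatial index would be the obvious next step.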
The difficulty here is to collect the world exposure, the world vulnerability functions/taxonomy mappings and the world GMPEs from dozens of repositories, and to fix all the inconsistencies. Here is a list of inconsistencies:
- [x] the hazard countries (https://github.com/GEMScienceTools/oq-mbtk/blob/master/openquake/ghm/mosaic.py#L130) are inconsistent with the risk countries, i.e. there are different or wrong 3-letter country codes
- [x] the risk repositories have inconsistencies; for instance, `/home/risk/global_risk_model/North_America` contains an empty directory `Exposure/Exposure/Disaggregated/`, unlike the other regions
- [x] the regions in the risk mosaic are different from the regions in the hazard mosaic
- [x] /home/risk/global_risk_model/Europe/Hazard/ is empty, so I cannot get gmmLT.xml
- [x] in many cases (e.g. Southeast_Asia, which contains IDN, PHL and SEA) there are multiple choices for gmmLT.xml
- [x] the site_model.csv files of the mosaic, when collected together, have duplicated points
- [x] the exposure XML files contain `<field oq="residents" input="OCCUPANTS_PER_ASSET" />` in the fieldmap, however in the CSV files the name is `OCCUPANTS_PER_ASSET_AVERAGE`.
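Two of the checks listed above (duplicated points in the collected site models, and fieldmap inputs missing from the CSV header) can be sketched with the standard library. The helper names, the rounding tolerance and the `lon`/`lat` column names are assumptions for illustration; the XML check simply looks at every `field` element, whatever its parent tag.

```python
import csv
import xml.etree.ElementTree as ET


def read_site_model(path):
    """Read a site_model.csv file into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def dedupe_site_model(rows, ndigits=5):
    """Keep the first row for each (lon, lat) rounded to ndigits decimals."""
    seen = set()
    unique = []
    for row in rows:
        key = (round(float(row["lon"]), ndigits), round(float(row["lat"]), ndigits))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_fieldmap_inputs(exposure_xml, csv_header):
    """Return the fieldmap 'input' names that are not columns of the CSV."""
    root = ET.fromstring(exposure_xml)
    header = set(csv_header)
    inputs = [
        el.get("input")
        for el in root.iter()
        # strip a possible XML namespace, e.g. '{http://...}field' -> 'field'
        if el.tag.split("}")[-1] == "field" and el.get("input")
    ]
    return [name for name in inputs if name not in header]
```

Keeping the first occurrence makes the result depend on the order in which the regional files are concatenated; if the duplicated points carry different Vs30 values, that choice should be made explicitly.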
NB: the risk regions are:
- Africa
- Caribbean_Central_America
- Central_Asia
- East_Asia
- Europe
- Middle_East
- North_America
- North_Asia
- Oceania
- South_America
- South_Asia
- Southeast_Asia
USGS ruptures (like https://earthquake.usgs.gov/product/shakemap/us70006sj8/atlas/1594403794805/download/rupture.json) have the following format:
```json
{
  "type": "FeatureCollection",
  "metadata": {
    "reference": "Origin",
    "id": "us70006sj8",
    "network": "USGS National Earthquake Information Center, PDE",
    "netid": "us",
    "productcode": "us70006sj8",
    "time": "2019-12-30T17:18:57.000000Z",
    "lat": 35.5909,
    "lon": 74.6280,
    "depth": 13.8,
    "mag": 5.6,
    "locstring": "34km NW of Idgah, Pakistan",
    "mech": "ALL",
    "rake": 0
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "rupture type": "rupture extent"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [ 74.6280, 35.5909, 13.8 ]
      }
    }
  ]
}
```
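A minimal sketch of downloading such a file and turning its `metadata` block into a plain dictionary. The output keys (`lon`, `lat`, `dep`, `mag`, `rake`, ...) and the `is_point_source` flag are assumptions for illustration, not necessarily the keys the engine's `rupture_dict` uses.

```python
import json
import urllib.request


def download_json(url, timeout=30):
    """Fetch a JSON document (e.g. a ShakeMap rupture.json) and decode it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rupture_dict_from_usgs(data):
    """Extract hypocentre, magnitude, rake and time from a rupture.json dict.

    The key names of the returned dict are hypothetical.
    """
    md = data["metadata"]
    return {
        "usgs_id": md["id"],
        "lon": float(md["lon"]),
        "lat": float(md["lat"]),
        "dep": float(md["depth"]),
        "mag": float(md["mag"]),
        "rake": float(md["rake"]),
        "time": md["time"],  # UTC timestamp, e.g. "2019-12-30T17:18:57.000000Z"
        # Only 'Point' geometries means no finite-fault geometry is available yet
        "is_point_source": all(
            f["geometry"]["type"] == "Point" for f in data["features"]
        ),
    }
```

Usage would be `rupture_dict_from_usgs(download_json(url))`; keeping the download separate from the conversion makes the conversion testable on saved files.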
- [x] The first step is to add a function to download such files and convert them into a `rupture_dict`
- [x] The second step is to add the `rupture_dict` parameter to the `job.ini`
- [x] The third step is to generate planar ruptures from `rupture_dict` by using the code in IPT
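The third step can be illustrated with a stand-in sketch; this is not the IPT code. It assumes the Wells and Coppersmith (1994) all-slip-type area relation, a given aspect ratio (length/width), the Aki-Richards convention (fault dips to the right of the strike direction), a hypocentre at the centre of the plane, and a flat-earth conversion from km to degrees. If the plane would cut the surface, it is shifted down-dip so that the top edge lies at depth 0.

```python
import math

KM_PER_DEG = 6371.0 * math.pi / 180.0  # ~111.19 km per degree of latitude


def wells_coppersmith_area(mag):
    """Rupture area in km^2, Wells & Coppersmith (1994), all slip types."""
    return 10.0 ** (-3.49 + 0.91 * mag)


def planar_rupture(lon, lat, dep, mag, strike, dip, aspect_ratio=1.0):
    """Four corners (lon, lat, depth in km) of a rectangular rupture."""
    if not 0 < dip <= 90:
        raise ValueError("dip must be in (0, 90]")
    width = math.sqrt(wells_coppersmith_area(mag) / aspect_ratio)
    length = aspect_ratio * width
    s, d = math.radians(strike), math.radians(dip)
    along = (math.sin(s), math.cos(s))      # (east, north) unit vector along strike
    downdip = (math.cos(s), -math.sin(s))   # horizontal unit vector towards the dip
    # Top edge depth, clipped at the surface
    ztop = max(0.0, dep - 0.5 * width * math.sin(d))
    zbot = ztop + width * math.sin(d)
    # Horizontal distance from the hypocentre up-dip to the top edge
    back = (dep - ztop) * math.cos(d) / math.sin(d)
    top_e, top_n = -back * downdip[0], -back * downdip[1]
    half_len, horiz_w = 0.5 * length, width * math.cos(d)

    def to_lonlat(east, north, depth):
        """Convert km offsets from the hypocentre into (lon, lat, depth)."""
        return (lon + east / (KM_PER_DEG * math.cos(math.radians(lat))),
                lat + north / KM_PER_DEG, depth)

    def corner(sign, bottom):
        east = top_e + sign * half_len * along[0]
        north = top_n + sign * half_len * along[1]
        if bottom:
            east += horiz_w * downdip[0]
            north += horiz_w * downdip[1]
        return to_lonlat(east, north, zbot if bottom else ztop)

    return {"topLeft": corner(-1, False), "topRight": corner(+1, False),
            "bottomRight": corner(+1, True), "bottomLeft": corner(-1, True)}
```

For the us70006sj8 example above (M5.6, depth 13.8 km) with a hypothetical vertical north-striking plane, `planar_rupture(74.628, 35.5909, 13.8, 5.6, 0.0, 90.0)` gives a square of roughly 6.4 km per side, with its top edge at about 10.6 km depth.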
Sometimes the USGS also provides .json files with geometries that can be converted to OpenQuake ruptures, as in this notebook: https://github.com/gem/earthquake-scenarios/blob/main/src/2_1_rupture_usgs_json_to_oq_xml.ipynb
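A rough sketch of that conversion, not a copy of the notebook. It assumes that finite-fault geometries come as GeoJSON `MultiPolygon` features whose exterior rings are `[p0, p1, p2, p3, p0]` with `p0-p1` as the top edge and `p3` below `p0`; that ordering should be verified against real USGS files. Each quadrilateral becomes a NRML-style `<planarSurface>` element with `topLeft`, `topRight`, `bottomLeft` and `bottomRight` children.

```python
import xml.etree.ElementTree as ET

# Assumed order of the first four points of each exterior ring
RING_ORDER = ("topLeft", "topRight", "bottomRight", "bottomLeft")
# Child order used when writing a <planarSurface>
NRML_ORDER = ("topLeft", "topRight", "bottomLeft", "bottomRight")


def quads_from_usgs(data):
    """Yield one {corner_name: [lon, lat, depth]} dict per polygon."""
    for feature in data["features"]:
        geom = feature["geometry"]
        if geom["type"] != "MultiPolygon":
            continue
        for polygon in geom["coordinates"]:
            ring = polygon[0][:4]  # exterior ring without the closing point
            yield dict(zip(RING_ORDER, ring))


def planar_surface_xml(quad):
    """Build a <planarSurface> element from one quadrilateral."""
    surface = ET.Element("planarSurface")
    for tag in NRML_ORDER:
        lon, lat, depth = quad[tag]
        ET.SubElement(surface, tag, lon=str(lon), lat=str(lat), depth=str(depth))
    return surface
```

A multi-segment rupture would wrap several such surfaces in a single rupture element; that wrapper and the NRML namespace are left out here.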
We also need:
- [x] a file with famous ruptures (taken from the earthquake-scenarios repo) to be used for running tests
- [x] a file with the taxonomy mapping to use for each country
- [x] store the vulnerability functions for the whole world
- [x] store the taxonomy mappings for the whole world
See also https://gempad.openquake.org/p/2023-12-21-Aristotle-xkddghsdg9876hf.
The taxonomy mapping per country can be extracted from here: https://gitlab.openquake.org/risk/global_risk_model/Scripts/-/blob/master/grm_calculations/job_files.csv
NB: the taxonomy mapping is a HUGE problem. Currently the engine cannot handle the case of two assets with the same taxonomy being mapped to different vulnerability functions because they belong to different countries. The taxonomy mapping is global, while we would need it to be country-dependent. Also, splitting the exposure by country and performing multiple calculations is a solution only in theory, since it makes everything more complex and much slower. We will probably have to completely rewrite the risk calculators (for instance, the RiskComputer assumes that assets with the same taxonomy are associated with the same risk functions), which is hard :-(
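To make the requirement concrete, here is a toy sketch of a country-dependent mapping keyed by `(country, taxonomy)`. It is not how the engine works today. The `"*"` wildcard country used as a global fallback, the function names and the IDs are all hypothetical. The point is that assets must be grouped by resolved risk function rather than by taxonomy alone.

```python
from collections import defaultdict


def map_assets(assets, mapping, fallback_country="*"):
    """Resolve a risk function ID for each asset from (country, taxonomy).

    mapping: {(country, taxonomy): risk_id}; entries under the hypothetical
    '*' country act as a global default.
    """
    resolved = []
    for asset in assets:
        key = (asset["country"], asset["taxonomy"])
        default = (fallback_country, asset["taxonomy"])
        if key in mapping:
            resolved.append(mapping[key])
        elif default in mapping:
            resolved.append(mapping[default])
        else:
            raise KeyError(f"no risk function for {key}")
    return resolved


def group_by_risk_function(assets, mapping):
    """Group asset IDs by resolved risk function instead of by taxonomy."""
    groups = defaultdict(list)
    for asset, risk_id in zip(assets, map_assets(assets, mapping)):
        groups[risk_id].append(asset["id"])
    return dict(groups)
```

With this grouping, two `CR/LFINF` assets in ITA and TUR end up in different groups, which is exactly what a taxonomy-only grouping cannot express.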
Michele, please see the attached CSV to help you map between the GRM repos and the hazard mosaic repos. I also included some comments on 'exceptions' to the general case.
If you clone the relevant risk region repo (e.g., global_risk_model/Africa) with `--recurse-submodules` (or update the submodules after cloning), you should have all the dependencies (hazard, exposure, vulnerability) at the appropriate versions. The job.ini file lists all the specific paths you need for the gmmLT, vulnerability curves, etc. The Exposure_
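A small sketch for collecting those paths from a regional job.ini with `configparser`. The listed keys are common OpenQuake job.ini parameters, but which of them a given regional file actually uses is an assumption. Multi-file values (several paths separated by spaces) are not handled.

```python
import configparser
import os

# Common OpenQuake file parameters; extend as needed for each region
PATH_KEYS = ("exposure_file", "gsim_logic_tree_file", "site_model_file",
             "structural_vulnerability_file", "taxonomy_mapping_csv")


def input_paths(job_ini_text, base_dir="."):
    """Return {parameter: normalized path} for the file parameters found."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(job_ini_text)
    paths = {}
    for section in parser.sections():
        for key in PATH_KEYS:
            if parser.has_option(section, key):
                # Paths in job.ini are relative to the job.ini directory
                rel = parser.get(section, key).strip()
                paths[key] = os.path.normpath(os.path.join(base_dir, rel))
    return paths
```

Reading the text with `open(job_ini).read()` and passing `os.path.dirname(job_ini)` as `base_dir` resolves everything relative to the regional repo, including `../` references into submodules.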
The current status of the risk repos on cole/davis is unknown: we have not run the GRM since June, individual modellers may have changed those files, and some repos may be only partially cloned (without submodules) after server/cluster modifications.
Currently the idea is to build a few HDF5 files at each new release of the mosaic:
- `site_model.hdf5` for the hazard mosaic (using `utils/build_global_sites`)
- `exposure.hdf5` for the risk mosaic (using `utils/build_global_exposure`)
Then the Aristotle calculator will be able to extract from such files the relevant information quickly.
A further crucial feature will be the inclusion of recording station data for ground motion conditioning, if such station data is already available at the time of launching an Aristotle calculation.
Given a USGS ShakeMap id, the station data curated by the USGS for the event can be found in the associated stationlist.json file, for instance https://earthquake.usgs.gov/product/shakemap/us7000m9g4/us/1715297585708/download/stationlist.json for the 2024 M7.4 Hualien earthquake earlier this year in Taiwan. INGV uses an identical format for their station data file, for instance http://shakemap.rm.ingv.it/shake4/data/8863681/current/products/stationlist.json for the 2016 M6.5 Norcia earthquake. Documentation of the stationlist.json file format is available at https://usgs.github.io/shakemap/manual4_0/ug_products.html#stationlist-geojson.
The json file would need to be parsed, checked for duplicate entries and outliers, and converted to the csv format accepted by the OpenQuake engine (or directly to the internal dataframe format used by the engine after reading the csv station data input file).
The station data file can contain two kinds of stations – 'seismic' stations and 'macroseismic' stations. Seismic stations report the ground motions recorded by instruments, whereas macroseismic stations might report intensity values inferred from observed damage patterns for historical earthquakes or inferred from felt reports for recent earthquakes. For this implementation, only the seismic stations should be considered. All available IMTs relevant for the risk calculations should be read from the station data file – typically available IMTs might include PGA, SA(0.3), SA(1.0), and SA(3.0).
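A minimal sketch of the conversion described in the last two paragraphs. It assumes the ShakeMap stationlist.json layout (features with `station_type`, `channels` and per-channel `amplitudes` carrying `name`, `value`, `units`, `flag` and `ln_sigma`) and the OpenQuake station CSV columns `STATION_ID`, `STATION_NAME`, `LONGITUDE`, `LATITUDE`, `STATION_TYPE`, `<IMT>_VALUE`, `<IMT>_LN_SIGMA`; both should be checked against the documentation. Taking the maximum over channels and treating non-zero flags as outliers are simplifying choices, not established rules.

```python
import csv
import io

DEFAULT_IMTS = ("PGA", "SA(0.3)", "SA(1.0)", "SA(3.0)")


def _amplitude_in_g(amp):
    """Amplitude in g, or None if missing or flagged (flag != '0')."""
    value = amp.get("value")
    if value in (None, "null") or str(amp.get("flag", "0")) != "0":
        return None
    value = float(value)
    return value / 100.0 if amp.get("units") == "%g" else value


def parse_stationlist(data, imts=DEFAULT_IMTS):
    """Convert a stationlist.json dict into rows for the OQ station CSV."""
    wanted = {imt.lower(): imt for imt in imts}
    rows, seen = [], set()
    for feature in data["features"]:
        props = feature["properties"]
        if props.get("station_type") != "seismic":
            continue  # macroseismic stations are ignored
        lon, lat = feature["geometry"]["coordinates"][:2]
        loc = (round(lon, 4), round(lat, 4))
        if loc in seen:
            continue  # duplicate location: keep the first station
        best = {}  # imt -> (value in g, ln_sigma)
        for channel in props.get("channels", []):
            for amp in channel.get("amplitudes", []):
                imt = wanted.get(str(amp.get("name", "")).lower())
                value = _amplitude_in_g(amp)
                if imt is None or value is None:
                    continue
                if imt not in best or value > best[imt][0]:
                    best[imt] = (value, float(amp.get("ln_sigma", 0.0)))
        if not best:
            continue
        seen.add(loc)
        row = {"STATION_ID": props["code"], "STATION_NAME": props.get("name", ""),
               "LONGITUDE": lon, "LATITUDE": lat, "STATION_TYPE": "seismic"}
        for imt in imts:
            value, sigma = best.get(imt, ("", ""))
            row[f"{imt}_VALUE"] = value
            row[f"{imt}_LN_SIGMA"] = sigma
        rows.append(row)
    return rows


def to_csv(rows, imts=DEFAULT_IMTS):
    """Serialize the rows with a fixed column order."""
    fields = ["STATION_ID", "STATION_NAME", "LONGITUDE", "LATITUDE", "STATION_TYPE"]
    for imt in imts:
        fields += [f"{imt}_VALUE", f"{imt}_LN_SIGMA"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Missing IMTs are left as empty cells; whether the engine accepts them or requires the row to be dropped is something to confirm.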
A site model would also need to be generated for the station sites. If the Vs30 values at the station locations are already available in the stationlist.json file, they can be used directly; otherwise the Vs30 values for the stations would need to be extracted from the global Vs30 HDF5 file. If the ground motion models used in the calculation require site parameters other than Vs30, those additional parameters will also need to be included in the station site model file.
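A sketch of the station site model, assuming the global Vs30 grid has already been read into `(lon, lat, vs30)` tuples (reading the HDF5 file itself is left out). As an example of an additional site parameter, it derives `z1pt0` (in metres) with the Chiou and Youngs (2014) California relation; whether that relation, or `z1pt0` at all, is appropriate depends on the GMPEs actually used. Nearest-neighbour lookup by equirectangular distance is an illustrative simplification.

```python
import math


def nearest_vs30(lon, lat, grid):
    """Vs30 of the grid point closest to (lon, lat); grid = [(lon, lat, vs30)]."""
    coslat = math.cos(math.radians(lat))

    def dist2(point):
        dx = (point[0] - lon) * coslat
        dy = point[1] - lat
        return dx * dx + dy * dy

    return min(grid, key=dist2)[2]


def z1pt0_cy14(vs30):
    """Depth (m) to Vs = 1 km/s, Chiou & Youngs (2014), California."""
    ratio = (vs30 ** 4 + 571.0 ** 4) / (1360.0 ** 4 + 571.0 ** 4)
    return math.exp(-7.15 / 4.0 * math.log(ratio))


def station_site_model(stations, grid):
    """Rows with lon, lat, vs30, z1pt0; stationlist Vs30 is used when present."""
    rows = []
    for st in stations:
        vs30 = st.get("vs30")
        if vs30 is None:
            vs30 = nearest_vs30(st["lon"], st["lat"], grid)
        vs30 = float(vs30)
        rows.append({"lon": st["lon"], "lat": st["lat"],
                     "vs30": vs30, "z1pt0": z1pt0_cy14(vs30)})
    return rows
```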
Once we have these two new inputs (the station data file or dataframe, and the station site model), Aristotle should run the requested scenario with the conditioned_gmfs calculator.
Points from 2024-07-24 meeting:
- [ ] If the user uploads a rupture file, or if we have the finite fault rupture from the USGS, then the input fields for strike and dip should be greyed out (see draft here: https://github.com/gem/oq-engine/pull/9883)
- [x] Presentation of the results table could be improved, e.g. use human-readable numbers for the losses, reduce the number of decimal places, and clean up the names of the ground motion models when they span multiple lines, as in
  ```
  [KothaEtAl2020ESHM20SlopeGeology]
  sigma_mu_epsilon = -2.85697000
  c3_epsilon = -1.73205100
  ```
- [ ] Possibility to directly read the stationlist.json file instead of the user uploading the station data csv file (Catalina has a parser to convert the json to the OpenQuake csv format). The json format is not really standardised though, so there might be many edge cases we run into. (Alberto mentioned an xml format as well, but that might be deprecated) (implemented in this draft: https://github.com/gem/oq-engine/pull/9899)
- [ ] Document how to use the API to trigger/run calculations automatically instead of using the web interface manually
- [ ] Use ShakeMap directly as input for the risk calculations, now that they also include SA(0.6) as an IMT
- [ ] Possibly remove asset hazard distance option - TBD
- [ ] Links to OQ manual, or ‘i’ buttons to help explain terms?
- [ ] Add uncertainty/confidence intervals - like we are doing for event response
- [ ] Check that the TRT has been harmonised across the globe + provide guidance on TRT around the world
- [ ] Have slightly different interfaces for different users? Expert, basic? Or have workflows for ‘rapid’ and ‘updates’
- [ ] Make clear the parameters that are default?
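For the results table item, a sketch of two formatting helpers: one rounds losses to three significant digits before choosing a k/M/B/T suffix (so 999999 becomes "1M" rather than "1e+03k"), and one turns a multi-line GMPE representation into a one-line name with shortened parameters. The helper names and the suffix scheme are illustrative choices.

```python
SUFFIXES = ((1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "k"))


def human_readable(value, digits=3):
    """Format a loss with `digits` significant digits and a k/M/B/T suffix."""
    rounded = float(f"{value:.{digits}g}")  # round first to avoid '1e+03k'
    for threshold, suffix in SUFFIXES:
        if abs(rounded) >= threshold:
            return f"{rounded / threshold:.{digits}g}{suffix}"
    return f"{rounded:.{digits}g}"


def short_gsim_name(gsim_repr, digits=3):
    """'[Name]\\nkey = value ...' -> 'Name(key=value, ...)'."""
    lines = [ln.strip() for ln in gsim_repr.strip().splitlines() if ln.strip()]
    name = lines[0].strip("[]")
    params = []
    for line in lines[1:]:
        key, _, val = line.partition("=")
        val = val.strip()
        try:
            val = f"{float(val):.{digits}g}"
        except ValueError:
            pass  # keep non-numeric parameters as they are
        params.append(f"{key.strip()}={val}")
    return f"{name}({', '.join(params)})" if params else name
```

With the Kotha example above, `short_gsim_name` gives `KothaEtAl2020ESHM20SlopeGeology(sigma_mu_epsilon=-2.86, c3_epsilon=-1.73)`.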
We need to expose the time_event parameter through the webui (converting UTC to the local time), with the possibility to override it.
The USGS rupture.json file contains the UTC timestamp (https://earthquake.usgs.gov/realtime/product/shakemap/us6000n8tq/us/1721368807663/download/rupture.json), but the finite-fault file may or may not contain the time (https://earthquake.usgs.gov/realtime/product/finite-fault/us6000n8tq_1/us/1719609323516/shakemap_polygon.txt). Python's time module can convert from UTC to local time: https://docs.python.org/3.11/library/time.html#time.localtime
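Note that `time.localtime` converts to the timezone of the machine running the code, not that of the epicentre. A rough sketch that instead approximates the epicentral local time from longitude (15 degrees per hour, ignoring political time zones and daylight saving) and maps it to a `time_event` value. The hour boundaries for "day", "night" and "transit" are hypothetical placeholders; a real implementation would look up the actual timezone (e.g. via `zoneinfo`) and use the exposure's own definitions.

```python
from datetime import datetime, timedelta, timezone


def local_time(utc_iso, lon):
    """Approximate local time at longitude `lon` for a UTC ISO timestamp."""
    # Replace 'Z' so fromisoformat also works on Python < 3.11
    utc = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    offset = timedelta(hours=round(lon / 15.0))  # solar-time approximation
    return utc.astimezone(timezone(offset))


def time_event(local, day=(10, 17), night=(22, 6)):
    """Map a local time to 'day', 'night' or 'transit' (hypothetical hours)."""
    hour = local.hour
    if hour >= night[0] or hour < night[1]:
        return "night"
    if day[0] <= hour < day[1]:
        return "day"
    return "transit"
```

For the us70006sj8 example (17:18 UTC, longitude 74.63) the approximate offset is UTC+5, i.e. 22:18 local time, which falls in the hypothetical "night" window; the value shown in the web UI would remain user-overridable.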
NOTE: currently drafted here: https://github.com/gem/oq-engine/pull/9883
Here we can find some code to convert the USGS stationlist.json to a csv file in a format compatible with OQ: https://github.com/gem/earthquake-scenarios/blob/main/src/1_1_stations_usgs_json_to_csv.ipynb
Testing the service on recent earthquakes, we noticed that the USGS usually provides very limited information right after the event and for the next few days, so in most cases we cannot rely on ShakeMap or finite-fault products for quick responses. We need to collect statistics on when the USGS makes data available after events, and figure out proper strategies to run calculations with different sets of data at different time deltas after an event.