SourceSpec v2

Open claudiodsf opened this issue 1 year ago • 17 comments

This issue is for discussing the development of SourceSpec v2.

The development takes place in the v2 branch.

The main objectives of this major release concern three areas:

  • Make code easier to understand and to maintain
  • Provide a single executable, called sourcespec with subcommands
  • Officially support using SourceSpec as a Python API, with provided examples

Code improvements

  • [x] Use a global config object, making it unnecessary to pass config as a function parameter. This has been implemented through these commits
  • [ ] Make logging optional (to improve API usage)
  • [ ] Make writing output to disk optional (to improve API usage)
  • [ ] Reorganize the Python sources in submodules (subdirectories), dropping the ssp_ prefix.
    • [ ] Each Python file should expose only one public function or public class (with maybe, as an exception, a data_types.py file, exposing many data types classes)
    • [ ] Each submodule should expose, via __init__.py, only the public functions and classes that are used by other submodules. An example is the current implementation of config/__init__.py
    • [ ] The main script (which calls the subcommands) will go into a file called main.py in the main directory
    • [ ] Subcommands will go into a subcommands submodule (subdirectory). E.g., subcommands/spectral_inversion.py, subcommands/direct_modelling.py, subcommands/source_residuals.py
  • [ ] Improve the structure of the config file / config object (see below)
  • [ ] Change all the copyright headers to :copyright: 2018-2024 The SourceSpec Developers
  • [ ] Change the licence to GPLv3
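
As an illustration of the first item in the list above (the global config object), here is a minimal sketch. It is not the actual SourceSpec implementation: the class is a stand-in (a dict with attribute access, created once at module level), and the two parameters are used only as placeholders.

```python
class Config(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err

    def __setattr__(self, key, value):
        self[key] = value


# Single module-level instance, shared by every module that imports it.
# The two parameters below are placeholders for illustration.
config = Config(win_length=5.0, plot_show=False)


def window_length_in_samples(sampling_rate):
    """Read the global config: no `config` argument is needed."""
    return int(config.win_length * sampling_rate)


config.update({'win_length': 10.0})
print(window_length_in_samples(100.0))
```

Since every caller sees the same object, an update made by the API user before a run is visible to all functions without changing their signatures.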

Single executable

We will provide a single executable, called sourcespec with subcommands.

Here's a mockup of invoking sourcespec -h:

usage: sourcespec [-h] [-c CONFIGFILE] [-v] <command> [options] ...

sourcespec: Earthquake source parameters from P- or S-wave displacement spectra

options:
  -h, --help            show this help message and exit
  -c CONFIGFILE, --configfile CONFIGFILE
                        config file (default: sourcespec.conf)
  -v, --version         show program's version number and exit

commands:
    sample_config        write sample config file to current directory and exit
    update_config        update an existing config file to the latest version
    update_database      update an existing SourceSpec database from a previous version
    sample_sspevent      write sample SourceSpec Event File and exit
    spectral_inversion   inversion of P- or S-wave spectra
    direct_modelling     direct modelling of P- or S-wave spectra, based on user-defined earthquake source parameters
    source_residuals     compute station residuals from source_spec output
    clipping_detection   test the clipping detection algorithm
    plot_sourcepars      1D or 2D plot of source parameters from a sqlite parameter file

The old commands (source_spec, source_model, source_residual, etc.) will still exist in v2.0 and will print an error message when invoked. They will be removed in v2.1.
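
A rough sketch of how the single executable could be wired with argparse subparsers. Only three of the commands from the mockup are registered, and the helper deprecated_entry_point() with its message wording is hypothetical; the actual implementation may differ.

```python
import argparse


def build_parser():
    """Build a `sourcespec` parser with a global option and subcommands."""
    parser = argparse.ArgumentParser(
        prog='sourcespec',
        description='Earthquake source parameters from P- or S-wave '
                    'displacement spectra')
    parser.add_argument(
        '-c', '--configfile', default='sourcespec.conf',
        help='config file (default: %(default)s)')
    subparsers = parser.add_subparsers(
        dest='command', metavar='<command>', title='commands',
        required=True)
    # Only a subset of the commands from the mockup is registered here
    commands = (
        ('sample_config',
         'write sample config file to current directory and exit'),
        ('spectral_inversion', 'inversion of P- or S-wave spectra'),
        ('source_residuals',
         'compute station residuals from source_spec output'),
    )
    for name, help_text in commands:
        subparsers.add_parser(name, help=help_text)
    return parser


def deprecated_entry_point(old_name, new_command):
    """Message printed by an old v1 command (hypothetical wording)."""
    return (
        f'{old_name} is no longer available: '
        f'use "sourcespec {new_command}" instead')


args = build_parser().parse_args(['-c', 'my.conf', 'spectral_inversion'])
print(args.command, args.configfile)
print(deprecated_entry_point('source_spec', 'spectral_inversion'))
```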

Python API

Here are some ideas from @krisvanneste, which I reworked:

# Import the global config object, pre-filled with default values
from sourcespec.config import config

# Update one or more values and, optionally, validate the configuration
config.update(<DICTIONARY WITH CONF PARAMS>)
config.validate()

# Optionally, init logging
from sourcespec.logging import init_logging
init_logging()

# Functions reading inventory and traces from disk.
# They return standard ObsPy objects and can be replaced by user-defined ones.
# Note that those functions are aware of the global `config` object.
from sourcespec.input import read_station_metadata, read_traces
inv = read_station_metadata()
st = read_traces()

# Read events and picks. Returns SSPEvent() and SSPPick() objects.
# Can be replaced by user, but the user should take care of using SSPEvent() and SSPPick() objects
from sourcespec.input import read_event_and_picks
ssp_event, ssp_picks = read_event_and_picks()
# optionally, the stream can be passed, in case event and/or pick information is in the trace header
ssp_event, ssp_picks = read_event_and_picks(st)

# Functions to further prepare the data for the inversion
from sourcespec.preprocess import augment_event, augment_traces

# add velocity info to hypocenter, add evname, add event to config
augment_event(ssp_event)

# add information in trace objects
st = augment_traces(st, inv, ssp_event, ssp_picks)
# here's what this function does internally:
#	for trace in st:
#		_correct_traceid(trace)
#		_add_instrtype(trace)
#		_add_inventory(trace, inventory)
#		_check_instrtype(trace)
#		_add_coords(trace)
#		_add_event(trace, ssp_event)
#		_add_picks(trace, picks)

# process traces, build spectra
from sourcespec.process import process_traces, build_spectra
proc_st = process_traces(st)
spec_st, specnoise_st, weight_st = build_spectra(proc_st)

# Spectral inversion
from sourcespec.spectral_inversion import spectral_inversion
sspec_output = spectral_inversion(spec_st, weight_st)

# Compute summary statistics from station spectral parameters
from sourcespec.statistics import compute_summary_statistics
compute_summary_statistics(sspec_output)

# Other optional things like:
# - radiated energy
# - plotting
# - local magnitude

Reorganize configuration

We will reorganize the configuration into sections, reflecting the submodule structure. The new config file should look like the following (comments removed here for simplicity):

[ general ]
author_name = None
author_email = None
agency_full_name = None
agency_short_name = None
agency_url = None
agency_logo = None

[ input ]
mis_oriented_channels = None
instrument_code_acceleration = None
instrument_code_velocity = None
traceid_mapping_file = None
ignore_traceids = None
use_traceids = None
epi_dist_ranges = None
station_metadata = None
sensitivity = None
database_file = None
correct_instrumental_response = True
trace_units = auto

[ processing ]
vp_tt = None
vs_tt = None
NLL_time_dir = None
p_arrival_tolerance = 4.0
s_arrival_tolerance = 4.0
noise_pre_time = 6.0
signal_pre_time = 1.0
win_length = 5.0
variable_win_length_factor = None

...

The parameters will be accessible from the config object, as in the following examples:

config.general.author_name

config.input.station_metadata

config.processing.win_length
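
A minimal sketch of how a sectioned file could map onto this kind of attribute access. It uses configparser and SimpleNamespace from the standard library as stand-ins, which is an assumption made for illustration: the real config object may rely on a different parser and class.

```python
import ast
import configparser
from types import SimpleNamespace

SAMPLE = """
[ general ]
author_name = None

[ input ]
station_metadata = None
correct_instrumental_response = True
trace_units = auto

[ processing ]
win_length = 5.0
"""


def _convert(value):
    """Turn 'None', 'True', '5.0' into Python objects; keep other strings."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def load_config(text):
    """Parse a sectioned config into nested namespaces."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of parameter names
    parser.read_string(text)
    sections = {}
    for section in parser.sections():
        params = {key: _convert(val) for key, val in parser[section].items()}
        # Section headers are written as "[ name ]": strip the spaces
        sections[section.strip()] = SimpleNamespace(**params)
    return SimpleNamespace(**sections)


config = load_config(SAMPLE)
print(config.processing.win_length, config.input.trace_units)
```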

How to test SourceSpec v2

The easiest way is to clone the git repository to a new directory, called sourcespec2, then immediately switch to the v2 branch:

git clone git@github.com:SeismicSource/sourcespec.git sourcespec2
cd sourcespec2 && git checkout v2

In the v2 branch, the package name has been temporarily renamed to sourcespec2 so that it can be installed alongside the current version. To install it, go to the sourcespec2 directory and run:

pip install -e .

This will install the sourcespec2 package and the command line utils, currently named source_spec2, source_model2, etc.

Keeping the branch up-to-date

The v2 branch is frequently rebased, so make sure to do a git pull --force.

Contributing to SourceSpec v2

Contributions are always more than welcome!

Just make sure to create your development branch from the v2 branch and to make your pull requests against the v2 branch 😉.

Looking for feedback

Pinging here @krisvanneste and @rcabdia who are the main API users.

Everybody else is welcome to contribute to the discussion!

claudiodsf, Jul 10 '24 09:07

Hi Claudio,

I made a first attempt to refactor some functions (in ssp_setup.py, ssp_read_traces.py and source_spec.py) in order to make it possible to run sourcespec as a function. Should I create a new branch for this?

krisvanneste, Jul 10 '24 16:07

OK, I created a new branch called v2_ssp_func, but now I get a strange error when trying to push it to my GitHub fork: "refusing to allow an OAuth App to create or update workflow .github/workflows/github-deploy.yml without workflow scope". I will try to resolve this tomorrow...

krisvanneste, Jul 10 '24 17:07

OK, I created a new branch called v2_ssp_func, but now I get a strange error when trying to push it to my GitHub fork: "refusing to allow an OAuth App to create or update workflow .github/workflows/github-deploy.yml without workflow scope". I will try to resolve this tomorrow...

Hi Kris, maybe the solution is here : https://stackoverflow.com/questions/64059610/how-to-resolve-refusing-to-allow-an-oauth-app-to-create-or-update-workflow-on

claudiodsf, Jul 11 '24 06:07

I have been able to solve it by changing the repository URL in Sourcetree, as mentioned here. Pull request will follow.

krisvanneste, Jul 11 '24 09:07

I'm able to run sourcespec2 in a Jupyter notebook, without writing anything to disk, with the modifications I made! Here's a PDF showing the notebook: test_ssp_func.pdf. For now I read all required data from our own servers, which are not open to the outside world. We will need to replace that with FDSN service calls. Some lessons learned:

  • it would be nice to have a default config.options mockup of the command-line arguments
  • I had to add a TRACEID_MAP attribute to config
  • I noticed that all config attributes that are lists contain strings. In the case of config.Er_freq_range (default: ['None', 'None']), this results in an error; depending on the configuration, there may be other such cases
  • it would be nice to add methods to SSPEvent / SSPPick to construct them from obspy.core.event.Event / obspy.core.event.Pick objects
  • it should be possible to pass an empty inventory if instrument response is already removed or if the metadata are already attached to the traces

There is room for further improvements/streamlining, but I think we are on the right track.

krisvanneste, Jul 11 '24 14:07

Thanks, Kris, for the example!

Here's my comment:

  • it would be nice to have a default config.options mockup of the command-line arguments

I would like to bring the options into the config object as normal attributes: from an API-usage point of view, there is no point in keeping them separate in a sub-object.

For the CLI usage, the options should be used to override the config parameters: it should be possible to run SourceSpec without any option, and have everything in the config file.
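
A possible sketch of this override logic, assuming that options not given on the command line are left at None. The function name apply_cli_overrides() is hypothetical.

```python
def apply_cli_overrides(config, options):
    """Return a copy of `config` where options given on the CLI win.

    Options left at None (i.e. not given by the user) are ignored, so that
    the value from the config file is kept.
    """
    merged = dict(config)
    merged.update(
        {key: value for key, value in options.items() if value is not None})
    return merged


# Values below are made up for illustration
file_config = {'station_metadata': 'stations.xml', 'plot_show': False}
cli_options = {'station_metadata': None, 'plot_show': True}
print(apply_cli_overrides(file_config, cli_options))
```

With this approach, running without any option simply returns the config file values unchanged.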

  • I had to add a TRACEID_MAP attribute to config

Ok, I will put it into Config().__init__().

  • I noticed that all config attributes that are lists contain strings. In the case of config.Er_freq_range (default: ['None', 'None']), this results in an error; depending on the configuration, there may be other such cases

Ok, I will fix it in Config().__init__(). I made a quick scan: it doesn't seem to me that there are other such cases.

  • it would be nice to add methods to SSPEvent / SSPPick to construct them from obspy.core.event.Event / obspy.core.event.Pick objects

Noted 😉

  • it should be possible to pass an empty inventory if instrument response is already removed or if the metadata are already attached to the traces

Noted 😉

There is room for further improvements/streamlining, but I think we are on the right track.

Great!

claudiodsf, Jul 15 '24 10:07

  • I had to add a TRACEID_MAP attribute to config
  • I noticed that all config attributes that are lists contain strings. In the case of config.Er_freq_range (default: ['None', 'None']), this results in an error; depending on the configuration, there may be other such cases

These two points are fixed in this commit: https://github.com/SeismicSource/sourcespec/commit/87ce4f4315a24cc2da3527be6fbd793e3269b27e

claudiodsf, Jul 15 '24 14:07

Claudio, I tested my notebook after the new rebase, and almost everything still works, except:

  • I had to update the call to the ssp_output function
  • I get an error when plotting stacked spectra because my version of matplotlib does not support ax.inset_axes(); I will try to solve this with a local patch

Based on an issue with the main branch that I experienced yesterday, I realized that we also need to have a function to clean up state at the end of each run (or probably better at the beginning), so that no problems occur when ssp_run is called a second time. So far, I have to do the following:

from sourcespec import ssp_setup
ssp_setup.oldlogfile = None
from sourcespec import ssp_wave_arrival
ssp_wave_arrival.add_arrival_to_trace.pick_cache = dict()
ssp_wave_arrival.add_arrival_to_trace.travel_time_cache = dict()
ssp_wave_arrival.add_arrival_to_trace.angle_cache = dict()
from sourcespec import ssp_plot_traces
ssp_plot_traces.SAVED_FIGURE_CODES = []
ssp_plot_traces.BBOX = None
from sourcespec import ssp_plot_spectra
ssp_plot_spectra.SAVED_FIGURE_CODES = []
ssp_plot_spectra.BBOX = None

I haven't checked yet if this still works in v2, but I guess you know this better than me.

krisvanneste, Jan 09 '25 11:01

Ok, thanks for the feedback!

Whenever you have time, it would be great if you could contribute this ssp_clean_state() function 😉

claudiodsf, Jan 09 '25 13:01

Claudio,

What are the further steps needed to finalize a first v2 version?

Some possible things that come to mind:

  • do we need a function or method to populate Options with all possible keys (but set to None)?
  • do we want to allow for interactive plotting or just stick with writing to an output folder?
  • I think the logging doesn't fully work in interactive mode currently: a number of messages seem to be missing, and when I run the code a second time there are even fewer messages
  • ...

krisvanneste, Oct 16 '25 10:10

Thanks for the feedback

  • do we need a function or method to populate Options with all possible keys (but set to None)?

This is interesting and not too difficult to tackle.

Currently, the following code:

from sourcespec2.setup import config

gives a config object that has all the parameters set to their defaults, but config.options is empty.

Can you make a PR with that?
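
One possible starting point, assuming the options are defined with argparse: parsing an empty argument list yields every option key with its default value. The three options below are a hypothetical subset, and the function default_options() is not part of the current code.

```python
import argparse


def build_parser():
    """Parser with a hypothetical subset of options, for illustration."""
    parser = argparse.ArgumentParser(prog='sourcespec')
    parser.add_argument('-t', '--trace_path', nargs='+', default=None)
    parser.add_argument('-q', '--qml_file', default=None)
    parser.add_argument('-o', '--outdir', default=None)
    return parser


def default_options():
    """Return a dict with every option key, each set to its default."""
    return vars(build_parser().parse_args([]))


print(default_options())
```

This only works as long as the parser has no required arguments; otherwise parse_args([]) exits with an error.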

  • do we want to allow for interactive plotting or just stick with writing to an output folder?

That would be wonderful. SourceSpec has some sort of interactive plotting (config.plot_show = True). But I never tested it in a Jupyter notebook. We should maybe start from here.

  • I think the logging doesn't fully work in interactive mode currently: a number of messages seem to be missing, and when I run the code a second time there are even fewer messages

I can tackle this later on, once we have finished restructuring the modules.
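
The cause of the missing messages has not been investigated yet. One thing that may help in interactive sessions is making the logging setup idempotent, so that a second run starts from a clean set of handlers. A minimal sketch with the standard logging module follows; the logger name and message format are made up for illustration.

```python
import logging


def init_logging(name='sourcespec_sketch', level=logging.INFO):
    """Configure a logger so that repeated calls leave exactly one handler."""
    logger = logging.getLogger(name)
    # Remove handlers left over from a previous run in the same session
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(level)
    # Avoid duplicated output through the root logger
    logger.propagate = False
    return logger


logger = init_logging()
logger = init_logging()  # second run: still exactly one handler
logger.info('spectral inversion started')
```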

  • ...

Todo:

  • [ ] Rebase on current v1!
  • [ ] Create the processing subdir and move processing modules in there
  • [ ] Move the other modules in their subdirs (e.g., inversion, postprocessing, output, plotting)
  • [ ] Select the examples to show in the paper
  • [ ] Prepare a first quick draft for the paper!

claudiodsf, Oct 16 '25 14:10

Can you make a PR with that?

OK, I will look into that. I will also check what happens if I set config.plot_show = True in a notebook.

krisvanneste, Oct 16 '25 15:10

That would be wonderful. SourceSpec has some sort of interactive plotting (config.plot_show = True). But I never tested it in a Jupyter notebook. We should maybe start from here.

I tried plotting interactively, but this results in the following error:

E:\Home\_kris\Python\cloned_repos\sourcespec2\sourcespec2\ssp_plot_spectra.py in _make_fig(plot_params)
    140     if not stack_plots:
    141         textstr += (
--> 142             f'- {config.end_of_run.strftime("%Y-%m-%d %H:%M:%S")} '
    143             f'{config.end_of_run} '
    144         )

E:\Home\_kris\Python\cloned_repos\sourcespec2\sourcespec2\setup\config.py in __getattr__(self, key)
    196             return self.__getitem__(key)
    197         except KeyError as err:
--> 198             raise AttributeError(err) from err
    199 
    200     __setattr__ = __setitem__

AttributeError: 'end_of_run'

Apparently, end_of_run and end_of_run_tz are added to the config object in the ssp_output.write_output function. Also, additional information is appended to sspec_output.run_info in this function. I think it would be better 1) to add the end of run information to run_info as well, and 2) add all the necessary information to run_info when the inversion is completed, before ssp_output is called.
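
A sketch of what this could look like. RunInfo is a stand-in for sspec_output.run_info, and its field names, as well as the function finalize_run_info(), are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunInfo:
    """Stand-in for sspec_output.run_info; field names are hypothetical."""
    run_completed: Optional[datetime] = None
    timezone_name: Optional[str] = None


def finalize_run_info(run_info, now=None):
    """Store end-of-run information; to be called once the inversion is done."""
    if now is None:
        now = datetime.now(timezone.utc).astimezone()
    run_info.run_completed = now
    run_info.timezone_name = now.tzname()
    return run_info


def figure_footer(run_info):
    """Build the plot footer from run_info only, without reading config."""
    timestamp = run_info.run_completed.strftime('%Y-%m-%d %H:%M:%S')
    return f'- {timestamp} {run_info.timezone_name}'


run_info = finalize_run_info(RunInfo())
print(figure_footer(run_info))
```

The plotting code would then depend only on run_info, so it would work whether or not the output is written to disk.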

krisvanneste, Oct 18 '25 10:10

add all the necessary information to run_info when the inversion is completed, before ssp_output is called.

Maybe in a dedicated function that is called at the end of ssp_run

krisvanneste, Oct 18 '25 10:10

add all the necessary information to run_info when the inversion is completed, before ssp_output is called.

Maybe in a dedicated function that is called at the end of ssp_run

Do you want me to investigate this?

krisvanneste, Oct 21 '25 09:10

add all the necessary information to run_info when the inversion is completed, before ssp_output is called.

Maybe in a dedicated function that is called at the end of ssp_run

Do you want me to investigate this?

yes please!

claudiodsf, Oct 21 '25 09:10

OK, I will try to come up with a solution.

krisvanneste, Oct 21 '25 09:10