
Completing pygrb xml to hdf5 transition

Open pannarale opened this issue 2 years ago • 11 comments

Compared with the most complete results webpage generated from the old xml results files, the latest results webpage is missing the following items:

  • [ ] tables of missed and quiet injections (A31 and A32)
  • [ ] follow ups of the 10 loudest quiet injections (sections 3.02 and 3.04, these are done for each injection set)
  • [ ] all of section 4, i.e., loudest offsource events distributions, table and follow ups of the 10 loudest
  • [ ] all of section 5 (exclusion distances)

With the exception of the follow ups, which we will handle later, these correspond to 3 scripts that require updating:

  • [x] pycbc_pygrb_plot_stats_distribution (assigned to @ETVincent)
  • [ ] pycbc_pygrb_page_tables (assigned to @MarcoCusinato)
  • [x] pycbc_pygrb_efficiency (assigned to @jakeb245)

At the moment these still assume the input files will be xml files, so it’s a matter of getting them to read the hdf5 files we have now and then adjusting the scripts so they navigate those files accordingly to produce the plots/tables.
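As a starting point for the "navigate those files" step, here is a minimal sketch for listing what an hdf5 file contains. It is an illustration under assumptions, not the layout PyGRB actually writes: the function names `list_datasets` and `summarize_file` are hypothetical, and the walker relies only on the fact that h5py groups behave like nested mappings, so it can be tried on plain dicts too.

```python
def list_datasets(group, prefix=""):
    """Recursively yield (path, shape) for every leaf under `group`.

    `group` can be an open h5py.File / h5py.Group or a nested dict:
    anything with .items() is treated as a group, anything else as a dataset.
    """
    for name, item in group.items():
        path = f"{prefix}/{name}"
        if hasattr(item, "items"):
            # A group (or nested dict): descend into it
            yield from list_datasets(item, path)
        else:
            # A dataset (or other leaf); plain Python leaves have no shape
            yield path, getattr(item, "shape", None)


def summarize_file(filename):
    """Hypothetical helper: open an hdf5 file read-only and list its datasets."""
    import h5py  # imported here so list_datasets can be tried without h5py
    with h5py.File(filename, "r") as handle:
        return list(list_datasets(handle))
```

Calling `summarize_file` on one of the new results files should make it easier to see which groups and datasets the old xml table columns need to be mapped to.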

pannarale avatar Jun 29 '23 11:06 pannarale

Old command lines are available under the plots and tables in the webpage linked in the issue description. Take one of those, edit it to point to hdf5 files and then start upgrading the scripts.

Make sure you run in an environment up to date with gwastro/pycbc/master as it is today.

A possible bump in the road is that so far with the new code we have run short, small tests, so I don’t think we have results files with enough background nor files with missed injections. This means your first plots and tables should be empty! However, we do want the codes to be able to produce empty output if the input is not interesting: after we get there, we can run a longer test run and make sure we have meaningful information to display.
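To illustrate the "empty output from uninteresting input" requirement, here is a minimal sketch of a table builder that returns a header and zero rows instead of failing when there are no events. The function name `loudest_events_table` and the column names are hypothetical, chosen for the example only; they are not what pycbc_pygrb_page_tables uses.

```python
def loudest_events_table(times, stats, n_loudest=10):
    """Return (header, rows) for the n_loudest events, ranked by statistic.

    With empty inputs this returns the header and an empty list of rows,
    so the caller can still write out a valid (empty) table.
    """
    if len(times) != len(stats):
        raise ValueError("times and stats must have the same length")
    header = ["Rank", "Time", "Statistic"]
    # Indices of the events sorted from loudest to quietest, truncated
    order = sorted(range(len(stats)), key=lambda i: stats[i], reverse=True)
    order = order[:n_loudest]
    rows = [[rank + 1, times[i], stats[i]] for rank, i in enumerate(order)]
    return header, rows
```

The point of the design is that the empty case goes through the same code path as the populated one, so there is no special branch to forget when a longer test run produces real events.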

You can find input files on CIT. Ask me if you need detailed paths.

pannarale avatar Jun 29 '23 11:06 pannarale

@MarcoCusinato is an assignee too, but right now the search in the assignees box is not picking up his user name.

pannarale avatar Jun 29 '23 11:06 pannarale

@MarcoCusinato, @ETVincent, see https://github.com/gwastro/pycbc/pull/4427: this is relevant for you as well.

pannarale avatar Jul 20 '23 11:07 pannarale

A practical way to approach this is to search for glue.ligolw imports (these appear in the three executables mentioned above and in pycbc/results/pygrb_postprocessing_utils.py), remove them, and replace anything that relies on them.
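The search for those imports can be sketched with the standard library alone. This is only an illustration: `find_ligolw_imports` and `scan_files` are hypothetical helper names, and a plain text search (e.g. grep) would work just as well for a first pass. Parsing with `ast` has the small advantage of ignoring comments and strings that merely mention the module.

```python
import ast


def find_ligolw_imports(source, module_prefix="glue.ligolw"):
    """Return sorted (line number, dotted name) pairs for matching imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import glue.ligolw.utils [as x]
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # from glue.ligolw import lsctables -> glue.ligolw.lsctables
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            if name == module_prefix or name.startswith(module_prefix + "."):
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_files(paths):
    """Map each path to its matching imports; files are read as text."""
    results = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            results[path] = find_ligolw_imports(handle.read())
    return results
```

Passing the three executables and pycbc/results/pygrb_postprocessing_utils.py to `scan_files` gives a checklist of lines to remove; the code that depended on each import still has to be found and replaced by hand.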

pannarale avatar Jul 27 '23 13:07 pannarale

I have gotten pycbc_pygrb_efficiency to run using HDF trigger files (examples here). The full changes live on this branch. Unfortunately, it's currently an ugly mess of various branches. Some of these branches have PRs and some don't.

My intent is to get the currently open PRs merged in (those being #4427 and #4443), and then open one for the rest of the changes.

I had to update two functions (ppu.load_time_slides and ppu.load_segment_dict) that are common to all three of the executables listed above, so hopefully that helps move things along.

Getting these executables running in the workflow will be another task!

jakeb245 avatar Sep 07 '23 20:09 jakeb245

Putting this here to keep the conversation going. I have a branch here with pycbc_pygrb_plot_stats_distribution that I believe does what we need for what I was assigned. It is based on Jacob's updates via the above-mentioned PRs. My only comment is that I have not tested it on veto files.

ETVincent avatar Oct 04 '23 17:10 ETVincent

After the call today, @pannarale mentioned that I needed a clean branch so that I would be able to pull. I have the branch stats_dist_clean that updates only pycbc_pygrb_plot_stats_distribution. Note that it still requires Jacob's updates in the above PRs to function properly.

ETVincent avatar Oct 12 '23 13:10 ETVincent

Thanks, @ETVincent. Could you open a PR, please?

pannarale avatar Oct 19 '23 12:10 pannarale

Yes of course, #4538

ETVincent avatar Oct 19 '23 12:10 ETVincent

For the record, ongoing development for page_tables is happening on https://github.com/MarcoCusinato/pycbc/tree/page_tables

pannarale avatar Nov 23 '23 13:11 pannarale

@MarcoCusinato, @jakeb245's relevant PRs are through: can you set up your PR for pycbc_pygrb_page_tables please?

pannarale avatar Feb 15 '24 20:02 pannarale

https://github.com/gwastro/pycbc/pull/4649 completes the xml to hdf5 switch for pycbc_pygrb_page_tables

pannarale avatar Apr 23 '24 16:04 pannarale

I am closing this issue. The remaining tasks pertain to the webpage in general, rather than the xml to hdf5 transition. They have been copied over to issue #3660.

pannarale avatar Apr 23 '24 20:04 pannarale