Best practice for bookkeeping?
Once again let me say that I am blown away by the power of this wonderful tool. So much fun.
I am currently working to build an image calibration pipeline, centred on astrophot modelling/photometry for stars. There are a few steps to the process: first I fit many stars separately and independently to assess their suitability for PSF modelling; then I fit an ensemble of 'good' stars with a single model to determine the PSF and to get the positions and fluxes to be used for astrometric and photometric calibration. Like I say: fun! : )
The hardest part of this with astrophot is actually tracking and extracting the results of the fits — which says a lot about how good astrophot is at what it is supposed to do! But still, it would be good to have some documentation or guidance and possibly even some extra functionality to help with record keeping.
I'm not sure that what I'm doing is particularly good or clever, but just to show the approach that I have gravitated towards:
In the first phase, when I fit many stars separately and independently, here is what I do:
```python
import numpy as np
from astropy.table import Table
import astrophot

list_of_row_data = []
for star in stars:
    # relevant information for this source
    objid, xcen, ycen = star['objid'], star['xcen'], star['ycen']
    # create the model for this source
    star_model = astrophot.models.AstroPhot_Model(
        name=str(star['pts_key']), target=target_image, pixelscale=1,
        model_type='point model', parameters={"center": [xcen, ycen], "flux": 1.7},
        psf=psf_model, psf_mode='full',
    )
    # do the fit
    star_result = astrophot.fit.LM(star_model).fit()
    star_result.update_uncertainty()
    # now start extracting values
    star_values = star_model.parameters.vector_values().detach().cpu().numpy()
    star_covar = star_result.covariance_matrix.detach().cpu().numpy()
    star_chi2min = star_result.chi2min()
    # assemble these into what will become a table row - this is brittle
    row_data = np.hstack((star['pts_key'],               # identifier
                          star_values,                   # fit parameters (n, Rd, x, y, flux)
                          np.sqrt(np.diag(star_covar)),  # fit uncertainties
                          star_chi2min))                 # min chi2
    # append this row array to a list
    list_of_row_data.append(row_data)

# define the column names corresponding to the row data - this is brittle
colnames = ('pts_key star_shape star_size star_xcenter star_ycenter star_logflux'
            ' star_shape_unc star_size_unc star_xcenter_unc star_ycenter_unc star_logflux_unc'
            ' star_chi2_min').split()

# then construct the final output as an astropy Table
data = Table({col: vals for (col, vals)
              in zip(colnames, np.vstack(list_of_row_data).T)})
```
In the later phase, when I have the positions and fluxes of many stars fit with the same PSF, I have a different but equally awkward pattern:
```python
model = astrophot.models.AstroPhot_Model(
    name='psf star models', target=target_image,
    model_type='group model', psf_mode='full',
    models=good_star_models,  # this is a list of point source models
)
model.initialize()
result = astrophot.fit.LM(model, verbose=1).fit()
result.update_uncertainty()

# now begins the process of extracting results ... much fiddlier!
# start by defining the column names
colnames = ('pts_key psf_xcenter psf_ycenter psf_xcenter_unc psf_ycenter_unc'
            ' psf_logflux psf_logflux_unc').split()
# create an empty dictionary to hold column data as a list of values
data_dict = {col: [] for col in colnames}
# step through each model in the group of models
for node in model.parameters.nodes.values():
    # a sky component behaves differently and shouldn't be reported
    if 'sky' in node.name:
        continue
    # now extract and store each quantity carefully
    data_dict['pts_key'].append(int(node.name))
    xcen, ycen = node['center'].value.detach().cpu().numpy()
    dx, dy = node['center'].uncertainty.detach().cpu().numpy()
    data_dict['psf_xcenter'].append(xcen)
    data_dict['psf_ycenter'].append(ycen)
    data_dict['psf_xcenter_unc'].append(dx)
    data_dict['psf_ycenter_unc'].append(dy)
    flux = node['flux'].value.detach().cpu().numpy()
    dflux = node['flux'].uncertainty.detach().cpu().numpy()
    data_dict['psf_logflux'].append(flux)
    data_dict['psf_logflux_unc'].append(dflux)

# finally use this dictionary of lists to construct an astropy table of results
psf_data = Table(data_dict)
```
I recognise that the pattern one should use is specific to the use case, and that the point of astrophot is that it is structured to be applicable to many and varied situations. So conceiving of a unified scheme for 'just works' dumping of data into a table is probably a fool's errand. But it might make sense to have some facility to handle common/likely use cases like the ones above?
But regardless, it would be very helpful to have a tutorial notebook that includes some examples and/or guidance on best-practice ways of constructing catalogues/tables of results from astrophot.
Hi @entaylor, indeed there is a lot of bookkeeping to do! It's so cool to see built-out pipelines with the code :) I think there are a bunch of tricks that I could write out in a tutorial for sure. For example, for any model you can do `model.parameter_order` to grab a tuple with all the parameter names. Or even better, you can do `model.parameters.vector_names()` to get a list of strings with the same length as `model.parameters.vector_values()`, which has all of the fitted values.
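For instance, here is a minimal sketch of turning those two calls into a one-row results table, assuming a fitted `star_model` like the one in the first example above; the column names now come from the model itself rather than a hand-written list:

```python
from astropy.table import Table

# names and values are matched element-for-element, so the columns
# automatically track whatever parameters the model actually has
names = star_model.parameters.vector_names()
values = star_model.parameters.vector_values().detach().cpu().numpy()
star_table = Table({name: [value] for name, value in zip(names, values)})
```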
I had attempted to make a human-readable automatic output at one point: this is the `model.save()` yaml file. Ultimately, because there can be a lot of complexity in how models share parameters, and how one can make arbitrarily nested group models, I couldn't get a structure that was a really clean table like what I think you want.
Perhaps we can use this issue to discuss a format that AstroPhot could automatically output which would be useful for pipelines like what you are constructing. I have a few ideas for what could be done, which have advantages and drawbacks.
- Users could just do `model.parameters.vector_names()` and `model.parameters.vector_values()` to get a matched list of parameter names and values. But if you are fitting multiple copies of the same model, then the parameter names will be duplicated. So maybe I could add a function like `model.parameters.vector_model_names()` so you can tell which model(s) each parameter points to.
- I could make something like `model.save_csv()` which dumps the data into a csv where each row is a model, and the columns are the model name, parameter value, parameter uncertainty (see the sketch after this list). A challenge here is for parameters like `center` which have multiple values, but I could just do like I do for `vector_names` and convert the array into `center:0` and `center:1`. Then it would be a trivial matter for a user to rename the columns `x` and `y`, or `RA` and `DEC`, or anything else depending on their circumstance. Another challenge here is for different model types: if you have a bunch of stars and a sky model, then how does the csv handle that? I suspect I would just group models of the same type together and then have to make a new header row when I get to a different model type.
- I could make a tutorial which builds a csv using for loops etc., like the example you provided. Then users can grab that code and modify it how they like.
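To make the second option concrete, here is a rough sketch of what a `save_csv()` like that could do. To be clear, `save_csv` does not exist in AstroPhot; this is just an illustration of the proposal, reusing the group-model traversal and the `center:0`/`center:1` flattening described above, with the parameter names hard-coded from the point-model examples:

```python
import csv

def save_csv(group_model, filename):
    """Hypothetical helper: one csv row per sub-model of a group model."""
    rows = []
    for node in group_model.parameters.nodes.values():
        row = {"model": node.name}
        # 'center' and 'flux' are assumed parameter names, as in the
        # point-model examples above; a real implementation would
        # discover them from the model itself
        for pname in ("center", "flux"):
            values = node[pname].value.detach().cpu().numpy().ravel()
            uncs = node[pname].uncertainty.detach().cpu().numpy().ravel()
            if values.size == 1:
                row[pname] = values[0]
                row[pname + "_unc"] = uncs[0]
            else:
                # multi-valued parameters get flattened as center:0, center:1, ...
                for i, (v, u) in enumerate(zip(values, uncs)):
                    row[f"{pname}:{i}"] = v
                    row[f"{pname}:{i}_unc"] = u
        rows.append(row)
    # assumes all sub-models are the same type, so every row has the
    # same columns; mixed model types would need a new header per type
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Called as `save_csv(model, "psf_stars.csv")` on the group model from the second example, this would reproduce roughly the table that the loop above builds by hand.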
I'm sure there are other options too. Let me know what you think!
Hi @entaylor, I am working on some example scripts to help users get started. One of the scripts takes an image and a segmentation map from the user and fits a single model type to everything in the image (plus a sky model, and a "primary object" which can have different model types). Check out PR #209; the file you'll want to look at is docs/source/prebuilt/segmap_models_fit.py, which has a pre-designed fitting routine. I have added at the end a few lines which print the parameters for all the segmap models to a .csv file. Perhaps this is a good way to answer your question? In certain circumstances we can make efficient files like csv, as this example shows, but in a fully general fit we wouldn't be able to.