[Bug]: "basename" error when image set = 'core' + additional sets

llwiggins opened this issue 6 months ago • 1 comment

Checklist

[x] Find the offending file in the output. If processing halts, re-run analysis with topostats --core 1 process.
[x] Describe the bug.
[x] Include the configuration file.
[x] Copy of the log-file from running with topostats --log-level debug <command>.
[x] The exact command that failed. This is what you typed at the command line, including any options.
[x] TopoStats version, this is reported by topostats --version
[x] Operating System and Python Version

Describe the bug

We have identified a bug that occurs when image_set in the config file includes 'core' together with other image sets such as 'disordered_tracing'. For example:

image_set: # Options : all, core, filters, grains, grain_crops, disordered_tracing, nodestats, ordered_tracing, splining. Uncomment to include
    # - all
    - core
    # - filters
    # - grains
    # - grain_crops
    - disordered_tracing
    # - nodestats
    # - ordered_tracing
    # - splining

This results in `KeyError: 'basename'` when TopoStats attempts to write the output CSV files (see the traceback below, where `save_folder_grainstats()` fails while saving `mol_stats`). It would be good either to add a safeguard that prevents users from combining core with other image_set options, or to allow core to be used in combination with other image_set options.
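A minimal sketch of the first option, a safeguard that rejects `core` when it is combined with other image sets. `validate_image_set()` and `VALID_IMAGE_SETS` are hypothetical names, not part of TopoStats; the valid options are copied from the comment on `image_set` in the configuration below.

```python
from __future__ import annotations

# Options listed in the `image_set` comment of the TopoStats configuration file.
VALID_IMAGE_SETS = {
    "all", "core", "filters", "grains", "grain_crops",
    "disordered_tracing", "nodestats", "ordered_tracing", "splining",
}


def validate_image_set(image_set: list[str]) -> list[str]:
    """Hypothetical safeguard: reject unknown options and 'core' mixed with other sets."""
    unknown = sorted(set(image_set) - VALID_IMAGE_SETS)
    if unknown:
        raise ValueError(f"Unknown image_set option(s): {unknown}")
    if "core" in image_set and len(set(image_set)) > 1:
        others = sorted(set(image_set) - {"core"})
        raise ValueError(
            f"image_set 'core' cannot currently be combined with other sets: {others}"
        )
    # Valid selection: return it unchanged.
    return image_set
```

For the configuration in this report, `validate_image_set(["core", "disordered_tracing", "ordered_tracing"])` would raise a `ValueError` naming the two extra sets, while `["core"]` on its own passes through unchanged.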

Copy of the log-file from running with `topostats --log-level debug <command>`

Traceback (most recent call last):
  File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\Scripts\topostats.exe\__main__.py", line 7, in <module>
    sys.exit(entry_point())
  File "C:\TopoStats\topostats\entry_point.py", line 1284, in entry_point
    args.func(args)
  File "C:\TopoStats\topostats\run_modules.py", line 383, in process
    save_folder_grainstats(config["output_dir"], config["base_dir"], mols_results, "mol_stats")
  File "C:\TopoStats\topostats\io.py", line 465, in save_folder_grainstats
    dirs = set(all_stats_df["basename"].values)
  File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'basename'

Include the configuration file

base_dir: ./ # Directory in which to search for data files
output_dir: MC4Rx_Processed./output # Directory to output results to
log_level: info # Verbosity of output. Options: warning, error, info, debug
cores: 2 # Number of CPU cores to utilise for processing multiple files simultaneously.
file_ext: .spm # File extension of the data files.
loading:
  channel: Height Sensor # Channel to pull data from in the data files.
  extract: raw # Array to extract when loading .topostats files.
filter:
  run: true # Options : true, false
  row_alignment_quantile: 0.5 # lower values may improve flattening of larger features
  threshold_method: std_dev # Options : otsu, std_dev, absolute
  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: 10.0 # Threshold for data below the image background
    above: 1.0 # Threshold for data above the image background
  threshold_absolute:
    below: -1.0 # Threshold for data below the image background
    above: 1.0 # Threshold for data above the image background
  gaussian_size: 1.0121397464510862 # Gaussian blur intensity in px
  gaussian_mode: nearest # Mode for Gaussian blurring. Options : nearest, reflect, constant, mirror, wrap
  # Scar removal parameters. Be careful with editing these as making the algorithm too sensitive may
  # result in ruining legitimate data.
  remove_scars:
    run: false
    removal_iterations: 2 # Number of times to run scar removal.
    threshold_low: 0.250 # lower values make scar removal more sensitive
    threshold_high: 0.666 # lower values make scar removal more sensitive
    max_scar_width: 4 # Maximum thickness of scars in pixels.
    min_scar_length: 16 # Minimum length of scars in pixels.
grains:
  run: true # Options : true, false
  # Thresholding by height
  grain_crop_padding: 1 # Padding to apply to grains. Needs to be at least 1, more padding may help with unets.
  threshold_method: std_dev # Options : std_dev, otsu, absolute, unet
  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: [5.0] # Thresholds for grains below the image background. List[float].
    above: [1.0] # Thresholds for grains above the image background. List[float].
  threshold_absolute:
    below: [-1.0] # Thresholds for grains below the image background. List[float].
    above: [1.0] # Thresholds for grains above the image background. List[float].
  direction: above # Options: above, below, both (defines whether to look for grains above or below thresholds or both)
  area_thresholds:
    above: [300, 3000] # above surface [Low, High] in nm^2 (also takes null)
    below: [null, null] # below surface [Low, High] in nm^2 (also takes null)
  remove_edge_intersecting_grains: true # Whether or not to remove grains that touch the image border
  unet_config:
    model_path: null # Path to a trained U-Net model
    upper_norm_bound: 5.0 # Upper bound for normalisation of input data. This should be slightly higher than the maximum desired / expected height of grains.
    lower_norm_bound: -1.0 # Lower bound for normalisation of input data. This should be slightly lower than the minimum desired / expected height of the background.
    remove_disconnected_grains: false # Whether to remove grains in the crop that don't touch the original grain mask.
    confidence: 0.5 # Confidence threshold for the UNet model. Smaller is more generous, larger is more strict.
  vetting:
    whole_grain_size_thresholds: null # Size thresholds for whole grains in nanometres squared, ie all classes combined. Tuple of 2 floats, ie [low, high] eg [100, 1000] for grains to be between 100 and 1000 nm^2. Can use None to not set an upper/lower bound.
    class_conversion_size_thresholds: null # Class conversion size thresholds, list of tuples of 3 integers and 2 integers, ie list[tuple[tuple[int, int, int], tuple[int, int]]] eg [[[1, 2, 3], [5, 10]]] for each region of class 1 to convert to 2 if smaller than 5 nm^2 and to class 3 if larger than 10 nm^2.
    class_region_number_thresholds: null # Class region number thresholds, list of lists, ie [[class, low, high],] eg [[1, 2, 4], [2, 1, 1]] for class 1 to have 2-4 regions and class 2 to have 1 region. Can use None to not set an upper/lower bound.
    class_size_thresholds: null # Class size thresholds (nm^2), list of tuples of 3 integers, ie [[class, low, high],] eg [[1, 100, 1000], [2, 1000, None]] for class 1 to have 100-1000 nm^2 and class 2 to have 1000-any nm^2. Can use None to not set an upper/lower bound.
    nearby_conversion_classes_to_convert: null # Class conversion for nearby regions, list of tuples of two-integer tuples, eg [[[1, 2], [3, 4]]] to convert class 1 to 2 and 3 to 4 for small touching regions
    class_touching_threshold: 5 # Number of dilation steps to use for detecting touching regions
    keep_largest_labelled_regions_classes: null # Classes to keep the only largest regions for, list of integers eg [1, 2] to keep only the largest regions of class 1 and 2
    class_connection_point_thresholds: null # Class connection point thresholds, [[[class_1, class_2], [min, max]]] eg [[[1, 2], [1, 1]]] for class 1 to have 1 connection point with class 2
  classes_to_merge: null # Classes to merge into a single combined class. List of lists, eg [[1, 2]] to merge classes 1 and 2. New classes will be appended to the tensor.
grainstats:
  run: true # Options : true, false
  edge_detection_method: binary_erosion # Options: canny, binary erosion. Do not change this unless you are sure of what this will do.
  extract_height_profile: true # Extract height profiles along maximum feret of molecules
  class_names: ["DNA", "Protein"] # The names corresponding to each class of a object identified, please specify merged classes after.
disordered_tracing:
  run: true # Options : true, false
  class_index: 1 # The class index to trace. This is the class index of the grains.
  min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.
  mask_smoothing_params:
    gaussian_sigma: 2 # Gaussian smoothing parameter 'sigma' in pixels.
    dilation_iterations: 2 # Number of dilation iterations to use for grain smoothing.
    holearea_min_max: [0, null] # Range (min, max) of a hole area in nm to refill in the smoothed masks.
  skeletonisation_params:
    method: topostats # Options : zhang | lee | thin | topostats
    height_bias: 0.6 # Percentage of lowest pixels to remove each skeletonisation iteration. 1 equates to zhang.
  pruning_params:
    method: topostats # Method to clean branches of the skeleton. Options : topostats
    max_length: 10.0 # Maximum length in nm to remove a branch containing an endpoint.
    height_threshold: # The height to remove branches below.
    method_values: mid # The method to obtain a branch's height for pruning. Options : min | median | mid.
    method_outlier: mean_abs # The method to prune branches based on height. Options : abs | mean_abs | iqr.
    only_height_prune_endpoints: False # Whether to restrict height-based pruning to skeleton segments containing an endpoint or not.
nodestats:
  run: true # Options : true, false
  node_joining_length: 7.0 # The distance in nanometres over which to join nearby crossing points.
  node_extend_dist: 14.0 # The distance in nanometres over which to join nearby odd-branched nodes.
  branch_pairing_length: 20.0 # The length in nanometres from the crossing point to pair and trace, obtaining FWHM's.
  pair_odd_branches: false # Whether to try and pair odd-branched nodes. Options: true and false.
ordered_tracing:
  run: true
  ordering_method: nodestats # The method of ordering the disordered traces.
splining:
  run: true # Options : true, false
  method: "rolling_window" # Options : "spline", "rolling_window"
  rolling_window_size: 20.0e-9 # size in nm of the rolling window.
  rolling_window_resampling: true # Whether to resample the trace or not.
  rolling_window_resample_regular_spatial_interval: 0.5e-9 # The spatial interval to resample the trace to in nm.
  spline_step_size: 7.0e-9 # The sampling rate of the spline in metres.
  spline_linear_smoothing: 5.0 # The amount of smoothing to apply to linear features.
  spline_circular_smoothing: 5.0 # The amount of smoothing to apply to circular features.
  spline_degree: 3 # The polynomial degree of the spline.
curvature:
  run: False # Options : true, false
  colourmap_normalisation_bounds: [-0.5, 0.5] # Radians per nm to normalise the colourmap to.
plotting:
  run: true # Options : true, false
  style: topostats.mplstyle # Options : topostats.mplstyle or path to a matplotlibrc params file
  savefig_format: null # Options : null, png, svg or pdf. tif is also available although no metadata will be saved. (defaults to png) See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
  savefig_dpi: 100 # Options : null (defaults to the value in topostats/plotting_dictionary.yaml), see https://afm-spm.github.io/TopoStats/main/configuration.html#further-customisation and https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
  pixel_interpolation: null # Options : https://matplotlib.org/stable/gallery/images_contours_and_fields/interpolation_methods.html
  grain_crop_plot_size_nm: -1 # Size in nm of the square cropped grain images if using the grains image set. If -1, will use the grain's default bounding box size.
  image_set: # Options : all, core, filters, grains, grain_crops, disordered_tracing, nodestats, ordered_tracing, splining. Uncomment to include
    # - all
    - core
    # - filters
    # - grains
    # - grain_crops
    - disordered_tracing
    # - nodestats
    - ordered_tracing
    # - splining
  zrange: [-2, 3] # low and high height range for core images (can take [null, null]). low <= high
  colorbar: true # Options : true, false
  axes: false # Options : true, false (due to off being a bool when parsed)
  num_ticks: [null, null] # Number of ticks to have along the x and y axes. Options : null (auto) or integer > 1
  cmap: null # Colormap/colourmap to use (default is 'nanoscope' which is used if null, other options are 'afmhot', 'viridis' etc.)
  mask_cmap: blue_purple_green # Options : blu, jet_r and any in matplotlib
  histogram_log_axis: false # Options : true, false
summary_stats:
  run: false # Whether to make summary plots for output data
  config: null

To Reproduce

No response

TopoStats Version

Git main branch

Python Version

3.1?

Operating System

Windows

Python Packages

No response

Aug 13 '25 15:08 llwiggins

We do have some checking in place to ensure that the run options are consistent. It's the `topostats.processing.check_run_steps()` function, but it only looks at the various `*_run` options.
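Since `check_run_steps()` only inspects the run options, a companion check on `plotting.image_set` could be run at the same point. The sketch below assumes the loaded configuration is a nested dictionary laid out like the config file in this issue (`plotting` → `image_set`); `check_image_set_config()` is a hypothetical name and not an existing TopoStats function.

```python
import logging

LOGGER = logging.getLogger(__name__)


def check_image_set_config(config: dict) -> None:
    """Hypothetical check on plotting.image_set, complementing check_run_steps()."""
    # image_set is None in the parsed YAML when every option is commented out.
    image_set = set(config.get("plotting", {}).get("image_set") or [])
    if "core" in image_set and len(image_set) > 1:
        raise ValueError(
            "plotting.image_set: 'core' cannot be combined with "
            f"{sorted(image_set - {'core'})}; use 'core' on its own or remove it."
        )
    LOGGER.debug("plotting.image_set OK: %s", sorted(image_set))
```

Raising early, before any files are processed, would give users a clear message instead of the `KeyError` that currently only appears at the end of a run.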

I think this will be addressed when #1200 is undertaken, so I've made this a sub-issue of that.

Aug 13 '25 15:08 ns-rse
