[Bug]: "basename" error when image set = 'core' + additional sets
Checklist
- [x] Find the offending file in the output. If processing halts, re-run analysis with topostats --core 1 process.
- [x] Describe the bug.
- [x] Include the configuration file.
- [x] Copy of the log-file from running with topostats --log-level debug <command>.
- [x] The exact command that failed. This is what you typed at the command line, including any options.
- [x] TopoStats version, this is reported by topostats --version
- [x] Operating System and Python Version
Describe the bug
We have identified a bug that occurs when image_set in the config file includes 'core' together with other image sets such as 'disordered_tracing'. For example:
image_set: # Options : all, core, filters, grains, grain_crops, disordered_tracing, nodestats, ordered_tracing, splining. Uncomment to include
# - all
- core
# - filters
# - grains
# - grain_crops
- disordered_tracing
# - nodestats
# - ordered_tracing
# - splining
This results in KeyError: 'basename' when TopoStats attempts to produce the output CSV files. It would be good either to add a safeguard that prevents other image_set options from being used at the same time as core, or to enable core to be used in combination with other image_set options.
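The first option (a safeguard) could look roughly like the sketch below. This is only an illustration, not existing TopoStats code: the function name validate_image_set and the EXCLUSIVE_OPTIONS constant are hypothetical, and treating 'all' as exclusive in the same way as 'core' is an assumption.

```python
# Hypothetical helper, not part of TopoStats. Treating 'all' like 'core'
# (i.e. as an option that must be used on its own) is an assumption.
EXCLUSIVE_OPTIONS = {"all", "core"}


def validate_image_set(image_set):
    """Return image_set as a list, raising ValueError for mixed selections."""
    # Accept a bare string such as "core" as well as a list of options.
    selected = [image_set] if isinstance(image_set, str) else list(image_set)
    exclusive = sorted(set(selected) & EXCLUSIVE_OPTIONS)
    others = sorted(set(selected) - EXCLUSIVE_OPTIONS)
    # Reject 'core'/'all' used together, or either one mixed with other sets.
    if len(exclusive) > 1 or (exclusive and others):
        raise ValueError(
            f"image_set {selected} is not supported: {exclusive} must be used "
            "alone. Either keep only that option or list individual image sets."
        )
    return selected
```

With the selection from this report, validate_image_set(["core", "disordered_tracing", "ordered_tracing"]) would raise a ValueError when the configuration is read, rather than failing later while the CSV files are written.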
Copy of the log-file from running with topostats --log-level debug <command>
Traceback (most recent call last):
File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\Scripts\topostats.exe\__main__.py", line 7, in <module>
sys.exit(entry_point())
File "C:\TopoStats\topostats\entry_point.py", line 1284, in entry_point
args.func(args)
File "C:\TopoStats\topostats\run_modules.py", line 383, in process
save_folder_grainstats(config["output_dir"], config["base_dir"], mols_results, "mol_stats")
File "C:\TopoStats\topostats\io.py", line 465, in save_folder_grainstats
dirs = set(all_stats_df["basename"].values)
File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\mt1hr\AppData\Local\miniconda3\envs\topostats\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: 'basename'
Include the configuration file
base_dir: ./ # Directory in which to search for data files
output_dir: MC4Rx_Processed./output # Directory to output results to
log_level: info # Verbosity of output. Options: warning, error, info, debug
cores: 2 # Number of CPU cores to utilise for processing multiple files simultaneously.
file_ext: .spm # File extension of the data files.
loading:
channel: Height Sensor # Channel to pull data from in the data files.
extract: raw # Array to extract when loading .topostats files.
filter:
run: true # Options : true, false
row_alignment_quantile: 0.5 # lower values may improve flattening of larger features
threshold_method: std_dev # Options : otsu, std_dev, absolute
otsu_threshold_multiplier: 1.0
threshold_std_dev:
below: 10.0 # Threshold for data below the image background
above: 1.0 # Threshold for data above the image background
threshold_absolute:
below: -1.0 # Threshold for data below the image background
above: 1.0 # Threshold for data above the image background
gaussian_size: 1.0121397464510862 # Gaussian blur intensity in px
gaussian_mode: nearest # Mode for Gaussian blurring. Options : nearest, reflect, constant, mirror, wrap
# Scar removal parameters. Be careful with editing these as making the algorithm too sensitive may
# result in ruining legitimate data.
remove_scars:
run: false
removal_iterations: 2 # Number of times to run scar removal.
threshold_low: 0.250 # lower values make scar removal more sensitive
threshold_high: 0.666 # lower values make scar removal more sensitive
max_scar_width: 4 # Maximum thickness of scars in pixels.
min_scar_length: 16 # Minimum length of scars in pixels.
grains:
run: true # Options : true, false
# Thresholding by height
grain_crop_padding: 1 # Padding to apply to grains. Needs to be at least 1, more padding may help with unets.
threshold_method: std_dev # Options : std_dev, otsu, absolute, unet
otsu_threshold_multiplier: 1.0
threshold_std_dev:
below: [5.0] # Thresholds for grains below the image background. List[float].
above: [1.0] # Thresholds for grains above the image background. List[float].
threshold_absolute:
below: [-1.0] # Thresholds for grains below the image background. List[float].
above: [1.0] # Thresholds for grains above the image background. List[float].
direction: above # Options: above, below, both (defines whether to look for grains above or below thresholds or both)
area_thresholds:
above: [300, 3000] # above surface [Low, High] in nm^2 (also takes null)
below: [null, null] # below surface [Low, High] in nm^2 (also takes null)
remove_edge_intersecting_grains: true # Whether or not to remove grains that touch the image border
unet_config:
model_path: null # Path to a trained U-Net model
upper_norm_bound: 5.0 # Upper bound for normalisation of input data. This should be slightly higher than the maximum desired / expected height of grains.
lower_norm_bound: -1.0 # Lower bound for normalisation of input data. This should be slightly lower than the minimum desired / expected height of the background.
remove_disconnected_grains: false # Whether to remove grains in the crop that don't touch the original grain mask.
confidence: 0.5 # Confidence threshold for the UNet model. Smaller is more generous, larger is more strict.
vetting:
whole_grain_size_thresholds: null # Size thresholds for whole grains in nanometres squared, ie all classes combined. Tuple of 2 floats, ie [low, high] eg [100, 1000] for grains to be between 100 and 1000 nm^2. Can use None to not set an upper/lower bound.
class_conversion_size_thresholds: null # Class conversion size thresholds, list of tuples of 3 integers and 2 integers, ie list[tuple[tuple[int, int, int], tuple[int, int]]] eg [[[1, 2, 3], [5, 10]]] for each region of class 1 to convert to 2 if smaller than 5 nm^2 and to class 3 if larger than 10 nm^2.
class_region_number_thresholds: null # Class region number thresholds, list of lists, ie [[class, low, high],] eg [[1, 2, 4], [2, 1, 1]] for class 1 to have 2-4 regions and class 2 to have 1 region. Can use None to not set an upper/lower bound.
class_size_thresholds: null # Class size thresholds (nm^2), list of tuples of 3 integers, ie [[class, low, high],] eg [[1, 100, 1000], [2, 1000, None]] for class 1 to have 100-1000 nm^2 and class 2 to have 1000-any nm^2. Can use None to not set an upper/lower bound.
nearby_conversion_classes_to_convert: null # Class conversion for nearby regions, list of tuples of two-integer tuples, eg [[[1, 2], [3, 4]]] to convert class 1 to 2 and 3 to 4 for small touching regions
class_touching_threshold: 5 # Number of dilation steps to use for detecting touching regions
keep_largest_labelled_regions_classes: null # Classes to keep the only largest regions for, list of integers eg [1, 2] to keep only the largest regions of class 1 and 2
class_connection_point_thresholds: null # Class connection point thresholds, [[[class_1, class_2], [min, max]]] eg [[[1, 2], [1, 1]]] for class 1 to have 1 connection point with class 2
classes_to_merge: null # Classes to merge into a single combined class. List of lists, eg [[1, 2]] to merge classes 1 and 2. New classes will be appended to the tensor.
grainstats:
run: true # Options : true, false
edge_detection_method: binary_erosion # Options: canny, binary erosion. Do not change this unless you are sure of what this will do.
extract_height_profile: true # Extract height profiles along maximum feret of molecules
class_names: ["DNA", "Protein"] # The names corresponding to each class of a object identified, please specify merged classes after.
disordered_tracing:
run: true # Options : true, false
class_index: 1 # The class index to trace. This is the class index of the grains.
min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.
mask_smoothing_params:
gaussian_sigma: 2 # Gaussian smoothing parameter 'sigma' in pixels.
dilation_iterations: 2 # Number of dilation iterations to use for grain smoothing.
holearea_min_max: [0, null] # Range (min, max) of a hole area in nm to refill in the smoothed masks.
skeletonisation_params:
method: topostats # Options : zhang | lee | thin | topostats
height_bias: 0.6 # Percentage of lowest pixels to remove each skeletonisation iteration. 1 equates to zhang.
pruning_params:
method: topostats # Method to clean branches of the skeleton. Options : topostats
max_length: 10.0 # Maximum length in nm to remove a branch containing an endpoint.
height_threshold: # The height to remove branches below.
method_values: mid # The method to obtain a branch's height for pruning. Options : min | median | mid.
method_outlier: mean_abs # The method to prune branches based on height. Options : abs | mean_abs | iqr.
only_height_prune_endpoints: False # Whether to restrict height-based pruning to skeleton segments containing an endpoint or not.
nodestats:
run: true # Options : true, false
node_joining_length: 7.0 # The distance in nanometres over which to join nearby crossing points.
node_extend_dist: 14.0 # The distance in nanometres over which to join nearby odd-branched nodes.
branch_pairing_length: 20.0 # The length in nanometres from the crossing point to pair and trace, obtaining FWHM's.
pair_odd_branches: false # Whether to try and pair odd-branched nodes. Options: true and false.
ordered_tracing:
run: true
ordering_method: nodestats # The method of ordering the disordered traces.
splining:
run: true # Options : true, false
method: "rolling_window" # Options : "spline", "rolling_window"
rolling_window_size: 20.0e-9 # size in nm of the rolling window.
rolling_window_resampling: true # Whether to resample the trace or not.
rolling_window_resample_regular_spatial_interval: 0.5e-9 # The spatial interval to resample the trace to in nm.
spline_step_size: 7.0e-9 # The sampling rate of the spline in metres.
spline_linear_smoothing: 5.0 # The amount of smoothing to apply to linear features.
spline_circular_smoothing: 5.0 # The amount of smoothing to apply to circular features.
spline_degree: 3 # The polynomial degree of the spline.
curvature:
run: False # Options : true, false
colourmap_normalisation_bounds: [-0.5, 0.5] # Radians per nm to normalise the colourmap to.
plotting:
run: true # Options : true, false
style: topostats.mplstyle # Options : topostats.mplstyle or path to a matplotlibrc params file
savefig_format: null # Options : null, png, svg or pdf. tif is also available although no metadata will be saved. (defaults to png) See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
savefig_dpi: 100 # Options : null (defaults to the value in topostats/plotting_dictionary.yaml), see https://afm-spm.github.io/TopoStats/main/configuration.html#further-customisation and https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
pixel_interpolation: null # Options : https://matplotlib.org/stable/gallery/images_contours_and_fields/interpolation_methods.html
grain_crop_plot_size_nm: -1 # Size in nm of the square cropped grain images if using the grains image set. If -1, will use the grain's default bounding box size.
image_set: # Options : all, core, filters, grains, grain_crops, disordered_tracing, nodestats, ordered_tracing, splining. Uncomment to include
# - all
- core
# - filters
# - grains
# - grain_crops
- disordered_tracing
# - nodestats
- ordered_tracing
# - splining
zrange: [-2, 3] # low and high height range for core images (can take [null, null]). low <= high
colorbar: true # Options : true, false
axes: false # Options : true, false (due to off being a bool when parsed)
num_ticks: [null, null] # Number of ticks to have along the x and y axes. Options : null (auto) or integer > 1
cmap: null # Colormap/colourmap to use (default is 'nanoscope' which is used if null, other options are 'afmhot', 'viridis' etc.)
mask_cmap: blue_purple_green # Options : blu, jet_r and any in matplotlib
histogram_log_axis: false # Options : true, false
summary_stats:
run: false # Whether to make summary plots for output data
config: null
To Reproduce
No response
TopoStats Version
Git main branch
Python Version
3.1?
Operating System
Windows
Python Packages
No response
We do have some checking in place that the run options are rational: the topostats.processing.check_run_steps() function, but it only looks at the different *_run options.
I think this will be addressed when #1200 is undertaken, so I've made this a sub-issue of that.