API

DeepFinder

Each step of the DeepFinder workflow is implemented as a class. The parameters of each step are stored as attributes of the object and are given default values in the constructor. To customize a parameter, assign a new value to the attribute after construction:

from deepfinder.training import Train
trainer = Train(Ncl=5, dim_in=56) # initialize training task, where default batch_size=25
trainer.batch_size = 16 # customize batch_size value

Each class has a main method called ‘launch’ that executes the procedure. All of these classes inherit from a parent class, ‘DeepFinder’, which provides features for communicating with the GUI.
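
The pattern described above can be sketched in a few lines. This is only an illustration: the class names `BaseTask` and `DemoTask`, the `display` method and the `epochs` attribute are hypothetical stand-ins and are not part of DeepFinder. Only the idea (defaults set in the constructor, attributes overridden afterwards, then `launch` called) comes from the text.

```python
class BaseTask:
    """Hypothetical parent class, standing in for a shared base class."""

    def __init__(self):
        self.messages = []  # stand-in for whatever would be sent to a GUI

    def display(self, message):
        # Collect messages instead of printing them, to keep the sketch testable.
        self.messages.append(message)


class DemoTask(BaseTask):
    """Hypothetical workflow step with default parameters set in the constructor."""

    def __init__(self, n_classes, dim_in):
        super().__init__()
        self.n_classes = n_classes
        self.dim_in = dim_in
        self.batch_size = 25  # default value, can be overridden afterwards
        self.epochs = 100     # hypothetical parameter

    def launch(self):
        # The main method reads its parameters from the attributes.
        self.display(f"batch_size={self.batch_size}, epochs={self.epochs}")
        return self.batch_size * self.epochs


task = DemoTask(n_classes=5, dim_in=56)
task.batch_size = 16       # customize a parameter before launching
n_samples = task.launch()
```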

Training

class deepfinder.training.TargetBuilder
generate_with_shapes(objl, target_array, ref_list)

Generates segmentation targets from object list. Here macromolecules are annotated with their shape.

Parameters:
  • objl (list of dictionaries) – object list. Each object needs to contain the [phi,psi,the] Euler angles used to orient the shapes.
  • target_array (3D numpy array) – array that initializes the training target. This makes it possible to pass an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].
  • ref_list (list of 3D numpy arrays) – these reference arrays are expected to be cubic and to contain the shape of the macromolecules (‘1’ for ‘is object’ and ‘0’ for ‘is not object’). The order of the references in the list should correspond to the class labels. For example: 1st element of list -> reference of class 1; 2nd element of list -> reference of class 2, etc.
Returns:

Target array, where ‘0’ is the background class and {‘1’,‘2’,…} are the object classes.

Return type:

3D numpy array

generate_with_spheres(objl, target_array, radius_list)

Generates segmentation targets from object list. Here macromolecules are annotated with spheres. This method requires neither knowledge of the macromolecule shape nor Euler angles in the objl. On the other hand, a network trained with ‘sphere targets’ may be less accurate than one trained with ‘shape targets’.

Parameters:
  • objl (list of dictionaries) – object list with the positions and class labels of the annotated macromolecules
  • target_array (3D numpy array) – array that initializes the training target. This makes it possible to pass an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].
  • radius_list (list of int) – contains the sphere radius of each class (in voxels). The order of the radii in the list should correspond to the class labels. For example: 1st element of list -> sphere radius for class 1; 2nd element of list -> sphere radius for class 2, etc.
Returns:

Target array, where ‘0’ is the background class and {‘1’,‘2’,…} are the object classes.

Return type:

3D numpy array
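
To make the idea of ‘sphere targets’ concrete, here is a minimal NumPy sketch. It is not the code of `generate_with_spheres`; the function name `paint_spheres` is hypothetical, and the dictionary keys `label`, `x`, `y` and `z` (with integer voxel coordinates) are assumptions about the object list format.

```python
import numpy as np


def paint_spheres(objl, target_array, radius_list):
    """Write a sphere carrying the class label around each object position."""
    target = target_array.copy()
    dim_z, dim_y, dim_x = target.shape
    for obj in objl:
        label = obj["label"]                    # assumed key name
        radius = radius_list[label - 1]         # 1st element -> class 1
        z, y, x = obj["z"], obj["y"], obj["x"]  # assumed key names
        # Work only on the bounding box of the sphere, clipped to the volume.
        z0, z1 = max(z - radius, 0), min(z + radius + 1, dim_z)
        y0, y1 = max(y - radius, 0), min(y + radius + 1, dim_y)
        x0, x1 = max(x - radius, 0), min(x + radius + 1, dim_x)
        zz, yy, xx = np.ogrid[z0:z1, y0:y1, x0:x1]
        inside = (zz - z) ** 2 + (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
        # The slice is a view, so this assignment modifies `target` in place.
        target[z0:z1, y0:y1, x0:x1][inside] = label
    return target


objl = [
    {"label": 1, "x": 5, "y": 5, "z": 5},
    {"label": 2, "x": 14, "y": 10, "z": 3},
]
target = paint_spheres(objl, np.zeros((20, 20, 20), dtype=np.int8), [2, 3])
```

Because the input array is copied and only sphere voxels are overwritten, structures already present in `target_array` (for example membranes) are preserved wherever no sphere covers them.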

class deepfinder.training.Train(Ncl, dim_in)
launch(path_data, path_target, objlist_train, objlist_valid)

Launches the training procedure. For each epoch, an image is plotted that shows the progression of several metrics: loss, accuracy, F1-score, recall and precision. Every 10 epochs, the current network weights are saved.

Parameters:
  • path_data (list of string) – contains paths to data files (i.e. tomograms)
  • path_target (list of string) – contains paths to target files (i.e. annotated volumes)
  • objlist_train (list of dictionaries) – contains information about annotated objects (e.g. class, position). In particular, the tomo_idx of each object should correspond to an index of ‘path_data’ and ‘path_target’. See utils/objl.py for more information about object lists. During training, these coordinates are used to guide the patch sampling procedure.
  • objlist_valid (list of dictionaries) – same as ‘objlist_train’, except that the objects in this list are used for validation instead of training. This makes it possible to monitor the training and to check for over- or under-fitting. Ideally, the validation objects should originate from different tomograms than the training objects.

Note

The function saves the following files at regular intervals:

net_weights_epoch*.h5: contains current network weights

net_train_history.h5: contains arrays with all metrics per training iteration

net_train_history_plot.png: plotted metric curves
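
The following sketch illustrates how object coordinates can guide patch sampling and how tomo_idx selects the tomogram. It is an assumption about the general technique, not the sampling code used by `Train`. The function name `sample_patch`, the `max_shift` parameter and the dictionary keys `x`, `y`, `z` are hypothetical. An even `dim_in` and volumes of at least `dim_in` voxels per axis are assumed.

```python
import numpy as np


def sample_patch(volumes, obj, dim_in, rng, max_shift=4):
    """Cut a cubic patch of side dim_in near the position of one object."""
    volume = volumes[obj["tomo_idx"]]  # tomo_idx selects the tomogram
    half = dim_in // 2                 # dim_in is assumed to be even
    corner = []
    for key, size in zip(("z", "y", "x"), volume.shape):
        # Random shift, so the object is not always at the patch center.
        center = obj[key] + int(rng.integers(-max_shift, max_shift + 1))
        # Keep the whole patch inside the volume.
        center = min(max(center, half), size - half)
        corner.append(center - half)
    z, y, x = corner
    return volume[z:z + dim_in, y:y + dim_in, x:x + dim_in]


rng = np.random.default_rng(0)
# Stand-ins for the tomograms that would be loaded from 'path_data'.
volumes = [np.zeros((32, 32, 32)), np.ones((40, 40, 40))]
objl = [
    {"tomo_idx": 0, "label": 1, "x": 2, "y": 16, "z": 30},
    {"tomo_idx": 1, "label": 2, "x": 20, "y": 20, "z": 20},
]
patches = [sample_patch(volumes, obj, dim_in=8, rng=rng) for obj in objl]
```

The same corner indices would be applied to the target volume, so that each data patch and its target patch stay aligned.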

Inference

class deepfinder.inference.Segment(Ncl, path_weights, patch_size=192)
launch(dataArray)

Segments a tomogram. Because tomograms are too large to be processed in a single pass, the tomogram is decomposed into smaller overlapping 3D patches.

Parameters:
  • dataArray (3D numpy array) – the volume to be segmented
  • path_weights (str) – path to the .h5 file containing the network weights obtained by the training procedure. This argument is passed to the constructor, not to ‘launch’.
Returns:

predicted score maps, as an array with index order [class,z,y,x]

Return type:

numpy array
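
The decomposition into overlapping patches can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the code of `Segment`: the helper names, the `overlap` parameter, the averaging of scores in overlapping regions and the `toy_predict` stand-in for the network are all hypothetical. Each axis of the volume is assumed to be at least `patch_size` long.

```python
import numpy as np


def patch_starts(length, patch_size, overlap):
    """Start indices of overlapping windows that cover [0, length)."""
    step = patch_size - overlap
    starts = list(range(0, length - patch_size + 1, step))
    if starts[-1] + patch_size < length:
        starts.append(length - patch_size)  # last window flush with the border
    return starts


def segment_by_patches(volume, predict, n_classes, patch_size, overlap):
    """Apply `predict` patch by patch and average scores where patches overlap."""
    scores = np.zeros((n_classes,) + volume.shape, dtype=np.float32)
    counts = np.zeros(volume.shape, dtype=np.float32)
    axes = [patch_starts(n, patch_size, overlap) for n in volume.shape]
    for z in axes[0]:
        for y in axes[1]:
            for x in axes[2]:
                sl = (slice(z, z + patch_size),
                      slice(y, y + patch_size),
                      slice(x, x + patch_size))
                scores[(slice(None),) + sl] += predict(volume[sl])
                counts[sl] += 1
    return scores / counts


def toy_predict(patch):
    # Stand-in for a network: two "classes" derived from a threshold.
    foreground = (patch > 0.5).astype(np.float32)
    return np.stack([1.0 - foreground, foreground])


rng = np.random.default_rng(0)
volume = rng.random((20, 24, 28)).astype(np.float32)
scoremaps = segment_by_patches(volume, toy_predict, n_classes=2,
                               patch_size=8, overlap=2)
```

The result has index order [class,z,y,x], matching the return value described above.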

class deepfinder.inference.Cluster(clustRadius)
launch(labelmap)

Analyzes the segmented tomogram (i.e. the labelmap), identifies individual macromolecules and outputs their coordinates. This is achieved with a clustering algorithm (mean shift).

Parameters:
  • labelmap (3D numpy array) – segmented tomogram
  • clustRadius (int) – parameter for the clustering algorithm. Corresponds to the average object radius (in voxels). This argument is passed to the constructor, not to ‘launch’.
Returns:

the object list with coordinates and class labels of identified macromolecules

Return type:

list of dict
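
To illustrate how mean shift turns a labelmap into an object list, here is a simplified flat-kernel variant written with NumPy only. It is a sketch, not the implementation used by `Cluster`: the function names, the majority vote for the class label, the merging of nearby modes and the output keys (`label`, `x`, `y`, `z`, `cluster_size`) are assumptions. The pairwise distance matrix makes it suitable for small examples only.

```python
import numpy as np


def mean_shift(points, radius, n_iter=50):
    """Flat-kernel mean shift; returns the converged mode of every point."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Distance from every current mode to every data point.
        dist = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        within = (dist <= radius).astype(float)
        # Move each mode to the mean of the points inside its window.
        shifted = (within @ points) / within.sum(axis=1, keepdims=True)
        if np.allclose(shifted, modes):
            break
        modes = shifted
    return modes


def cluster_labelmap(labelmap, radius):
    """Return one dict per detected object: centroid, class label, cluster size."""
    coords = np.argwhere(labelmap > 0)  # rows are (z, y, x)
    modes = mean_shift(coords, radius)
    centers, members = [], []
    for idx, mode in enumerate(modes):
        # Assign the voxel to the first cluster whose center is close enough.
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < radius:
                members[k].append(idx)
                break
        else:
            centers.append(mode)
            members.append([idx])
    objl = []
    for idx_list in members:
        voxels = coords[idx_list]
        labels = labelmap[voxels[:, 0], voxels[:, 1], voxels[:, 2]].astype(np.int64)
        z, y, x = voxels.mean(axis=0)
        objl.append({
            "label": int(np.bincount(labels).argmax()),  # majority vote
            "x": float(x), "y": float(y), "z": float(z),
            "cluster_size": len(idx_list),
        })
    return objl


labelmap = np.zeros((16, 16, 16), dtype=np.int8)
labelmap[2:5, 2:5, 2:5] = 1
labelmap[10:13, 10:13, 10:13] = 2
objl = cluster_labelmap(labelmap, radius=3)
```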

Utilities

Common utils

deepfinder.utils.common.bin_array(array)

Subsamples a 3D array by a factor of 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.

Parameters:
  • array (numpy array) –
Returns:

binned array

Return type:

numpy array

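
Averaging over 2x2x2 tiles can be written compactly with a reshape. The sketch below shows the technique only and is not the code of `bin_array`; the name `bin_by_two` is hypothetical, and dropping the last voxel along odd-sized axes is an assumption of this sketch.

```python
import numpy as np


def bin_by_two(array):
    """Average non-overlapping 2x2x2 tiles of a 3D array."""
    # Drop a trailing voxel along any odd-sized axis (assumption of this sketch).
    dz, dy, dx = (n - n % 2 for n in array.shape)
    cropped = array[:dz, :dy, :dx]
    # Split every axis into (number of tiles, 2) and average over the 2s.
    tiles = cropped.reshape(dz // 2, 2, dy // 2, 2, dx // 2, 2)
    return tiles.mean(axis=(1, 3, 5))


volume = np.arange(4 * 4 * 6, dtype=np.float32).reshape(4, 4, 6)
binned = bin_by_two(volume)
```
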
deepfinder.utils.common.plot_volume_orthoslices(vol, filename)

Writes an image file containing ortho-slices of the input volume. Generates the same visualization as the MATLAB function ‘tom_volxyz’ from the TOM toolbox. If the volume type is int8, the function assumes that the volume is a labelmap and plots it in color scale. Otherwise, it assumes that the volume is tomographic data and plots it in gray scale.

Parameters:
  • vol (3D numpy array) – the volume to plot
  • filename (str) – ‘/path/to/file.png’

deepfinder.utils.common.read_array(filename, dset_name='dataset')

Reads an array from a file. Handles .h5 and .mrc files, depending on the file extension.

Parameters:
  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’
  • dset_name (str, optional) – h5 dataset name. Not necessary to specify when reading .mrc
Returns:

numpy array

deepfinder.utils.common.write_array(array, filename, dset_name='dataset')

Writes an array to a file. Can write .h5 and .mrc files, depending on the extension specified in filename.

Parameters:
  • array (numpy array) –
  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’
  • dset_name (str, optional) – h5 dataset name. Not necessary to specify when writing .mrc

Object list utils

deepfinder.utils.objl.above_thr(objlIN, thr)
Parameters:
  • objlIN (list of dict) – input object list
  • thr (float) – threshold
Returns:

object list that contains only the objects with cluster size >= thr

Return type:

list of dict
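
Since an object list is a plain list of dictionaries, this kind of filtering amounts to a list comprehension. The sketch below is illustrative only: the function name is hypothetical and the key name `cluster_size` is an assumption about the object list format.

```python
def keep_above_threshold(objl, thr):
    """Keep the objects whose cluster size is at least `thr`."""
    # 'cluster_size' is an assumed key name.
    return [obj for obj in objl if obj["cluster_size"] >= thr]


objl = [
    {"label": 1, "x": 10, "y": 12, "z": 7, "cluster_size": 4},
    {"label": 1, "x": 40, "y": 31, "z": 9, "cluster_size": 120},
    {"label": 2, "x": 22, "y": 18, "z": 5, "cluster_size": 50},
]
large = keep_above_threshold(objl, thr=50)
```

Thresholding on cluster size is a simple way to discard very small clusters, which are more likely to be false positives than real macromolecules.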

deepfinder.utils.objl.disp(objlIN)

Prints the object list in the terminal.

deepfinder.utils.objl.get_class(objlIN, label)

Get all objects of specified class.

Parameters:
  • objlIN (list of dict) – input object list
  • label (int) – class label
Returns:

contains only objects from class ‘label’

Return type:

list of dict

deepfinder.utils.objl.get_labels(objlIN)

Returns a list of the unique class labels contained in the input object list.
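
Selecting by class and listing the unique labels can be sketched as below. The function names are hypothetical, the key name `label` is an assumption about the object list format, and returning the labels in sorted order is a choice of this sketch.

```python
def objects_of_class(objl, label):
    """Keep only the objects that belong to class `label`."""
    return [obj for obj in objl if obj["label"] == label]  # 'label' is an assumed key


def unique_labels(objl):
    """List each class label that occurs in the object list once."""
    return sorted({obj["label"] for obj in objl})


objl = [
    {"label": 3, "x": 1, "y": 2, "z": 3},
    {"label": 1, "x": 4, "y": 5, "z": 6},
    {"label": 3, "x": 7, "y": 8, "z": 9},
]
labels = unique_labels(objl)
class_three = objects_of_class(objl, 3)
```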

deepfinder.utils.objl.get_obj(objl, obj_id)

Get objects with specified object ID.

Parameters:
  • objl (list of dict) – input object list
  • obj_id (list of int) – object ID of wanted object(s)
Returns:

contains object(s) with obj ID ‘obj_id’

Return type:

list of dict

deepfinder.utils.objl.get_tomo(objlIN, tomo_idx)

Get all objects originating from tomo ‘tomo_idx’.

Parameters:
  • objlIN (list of dict) – contains objects from various tomograms
  • tomo_idx (int) – tomogram index
Returns:

contains objects from tomogram ‘tomo_idx’

Return type:

list of dict

deepfinder.utils.objl.read(filename)

Reads an object list from a file. Handles .xml and .xlsx files, depending on the file extension.

Parameters:
  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’
Returns:

list of dict

deepfinder.utils.objl.remove_class(objl, label_list)

Removes all objects from specified classes.

Parameters:
  • objl (list of dict) – input object list
  • label_list (list of int) – labels of the classes to remove
Returns:

same as input object list but with objects from classes ‘label_list’ removed

Return type:

list of dict

deepfinder.utils.objl.remove_obj(objl, obj_id)

Removes objects by object ID.

Parameters:
  • objl (list of dict) – input object list
  • obj_id (list of int) – object ID of the object(s) to remove
Returns:

same as input object list but with object(s) ‘obj_id’ removed

Return type:

list of dict

deepfinder.utils.objl.scale_coord(objlIN, scale)

Scales coordinates by specified factor. Useful when using binned (sub-sampled) volumes, where coordinates need to be multiplied or divided by 2.

Parameters:
  • objlIN (list of dict) – input object list
  • scale (float, int or tuple) – if float or int, the same scale is applied to all dimensions
Returns:

object list with scaled coordinates

Return type:

list of dict
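
A minimal sketch of coordinate scaling, for illustration only. The function name is hypothetical, the key names `x`, `y`, `z` are assumptions, and the tuple order (x, y, z) is an assumption as well: the order expected by `scale_coord` is not stated here.

```python
import copy


def scale_coordinates(objl, scale):
    """Return a copy of the object list with scaled coordinates."""
    if isinstance(scale, (int, float)):
        scale = (scale, scale, scale)  # same factor for all dimensions
    sx, sy, sz = scale                 # assumed order: (x, y, z)
    scaled = copy.deepcopy(objl)       # leave the input list untouched
    for obj in scaled:
        obj["x"] *= sx
        obj["y"] *= sy
        obj["z"] *= sz
    return scaled


objl = [{"label": 1, "x": 10, "y": 20, "z": 30}]
full_res = scale_coordinates(objl, 2)            # binned -> unbinned coordinates
mixed = scale_coordinates(objl, (1, 2, 0.5))
```

For example, coordinates found in a volume binned by a factor of 2 are multiplied by 2 to locate the same objects in the unbinned volume.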

deepfinder.utils.objl.write(objl, filename)

Writes an object list to a file. Can write .xml and .xlsx files, depending on the extension specified in filename.

Parameters:
  • objl (list of dict) –
  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’

Scoremap utils

deepfinder.utils.smap.bin(scoremaps)

Subsamples the scoremaps by a factor of 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.

Parameters:
  • scoremaps (4D numpy array) – array with index order [class,z,y,x]
Returns:

4D numpy array

deepfinder.utils.smap.read_h5(filename)

Reads scoremaps stored in an .h5 file.

Parameters:
  • filename (str) – path to the file. This .h5 file has one dataset per class (dataset ‘/class*’ contains the scoremap of class *).
Returns:

scoremaps array with index order [class,z,y,x]

Return type:

4D numpy array

deepfinder.utils.smap.to_labelmap(scoremaps)

Converts scoremaps into a labelmap.

Parameters:
  • scoremaps (4D numpy array) – array with index order [class,z,y,x]
Returns:

array with index order [z,y,x]

Return type:

3D numpy array

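
A common way to turn per-class score maps into a labelmap is to pick, for every voxel, the class with the highest score. The sketch below assumes that this is the intended conversion; it is not taken from the code of `to_labelmap`. The function name is hypothetical and the int8 output type is an assumption.

```python
import numpy as np


def scoremaps_to_labelmap(scoremaps):
    """Assign every voxel the index of the class with the highest score."""
    # Axis 0 is the class axis, given the index order [class, z, y, x].
    return np.argmax(scoremaps, axis=0).astype(np.int8)


scoremaps = np.zeros((3, 2, 2, 2), dtype=np.float32)
scoremaps[0] = 0.2               # background score everywhere
scoremaps[1, 0, 0, 0] = 0.9      # class 1 wins at voxel (0, 0, 0)
scoremaps[2, 1, 1, 1] = 0.7      # class 2 wins at voxel (1, 1, 1)
labelmap = scoremaps_to_labelmap(scoremaps)
```
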
deepfinder.utils.smap.write_h5(scoremaps, filename)

Writes scoremaps to an .h5 file.

Parameters:
  • scoremaps (4D numpy array) – array with index order [class,z,y,x]
  • filename (str) – path to the file. This .h5 file has one dataset per class (dataset ‘/class*’ contains the scoremap of class *).