cd.data

This submodule contains data-related code and NumPy operations.

Datasets

BBBC039

class BBBC039Test(directory, download=False)

BBBC039 Test.

Test split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Train(directory, download=False)

BBBC039 Train.

Training split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Val(directory, download=False)

BBBC039 Validation.

Validation split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters
  • directory – Root directory.

  • download – Whether to download the dataset.

download_bbbc039(directory)

Download BBBC039.

Download and extract the BBBC039 dataset to given directory.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters

directory – Root directory.

BBBC041

class BBBC041Test(directory, download=False)

BBBC041 Test data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters
  • directory – Data directory.

  • download – Whether to download data.

class BBBC041Train(directory, download=False)

BBBC041 Train data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters
  • directory – Data directory.

  • download – Whether to download data.

download_bbbc041(directory, url='https://data.broadinstitute.org/bbbc/BBBC041/malaria.zip')

Download BBBC041.

Download and extract the BBBC041 dataset to given directory.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Synth

class SynthTest(directory, download=False, cache=True)

Synth Test data.

Parameters
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthTrain(directory, download=False, cache=True)

Synth Train data.

Parameters
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthVal(directory, download=False, cache=True)

Synth Val data.

Parameters
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

download_synth(directory, url='https://celldetection.org/data/synth.zip')

Download Synth.

Download and extract the Synth dataset to given directory.

Parameters
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Data Operations

channels_first2channels_last(inputs: ndarray, spatial_dims=2, has_batch=False) ndarray

Channels first to channels last.

Parameters
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns

Transposed array.
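As a rough illustration of the transposition, assuming it amounts to moving the channel axis behind the spatial axes, a NumPy-only sketch might look like this. The helper name channels_first_to_last_sketch is hypothetical and not part of the library.

```python
import numpy as np

def channels_first_to_last_sketch(inputs, has_batch=False):
    """Move the channel axis to the end (hypothetical helper, not the library code)."""
    # With a batch dimension the channel axis is 1, otherwise it is 0.
    channel_axis = 1 if has_batch else 0
    return np.moveaxis(inputs, channel_axis, -1)

x = np.zeros((3, 32, 48))       # (c, h, w)
xb = np.zeros((8, 3, 32, 48))   # (b, c, h, w)
print(channels_first_to_last_sketch(x).shape)                   # (32, 48, 3)
print(channels_first_to_last_sketch(xb, has_batch=True).shape)  # (8, 32, 48, 3)
```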

channels_last2channels_first(inputs: ndarray, spatial_dims=2, has_batch=False) ndarray

Channels last to channels first.

Parameters
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns

Transposed array.

ensure_tensor(x, device=None, dtype=torch.float32)

Ensure tensor.

Mapping ndarrays to Tensor. Possible shape mappings:

  • (h, w) -> (1, 1, h, w)

  • (h, w, c) -> (1, c, h, w)

  • (b, c, h, w) -> (b, c, h, w)

Parameters
  • x – Inputs.

  • device – Either Device or a Module or Tensor to retrieve the device from.

  • dtype – Data type.

Returns

Tensor
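The shape mappings listed for ensure_tensor can be sketched with NumPy alone. This is only an illustration of the layout rules: the helper name to_bchw_sketch is hypothetical, and the final conversion to a Tensor (including device and dtype handling) is deliberately left out.

```python
import numpy as np

def to_bchw_sketch(x):
    """Map an array to (b, c, h, w) layout following the documented shape rules."""
    x = np.asarray(x)
    if x.ndim == 2:                        # (h, w) -> (1, 1, h, w)
        return x[None, None]
    if x.ndim == 3:                        # (h, w, c) -> (1, c, h, w)
        return np.moveaxis(x, -1, 0)[None]
    if x.ndim == 4:                        # (b, c, h, w) is kept as is
        return x
    raise ValueError(f"Unsupported number of dimensions: {x.ndim}")

print(to_bchw_sketch(np.zeros((64, 48))).shape)     # (1, 1, 64, 48)
print(to_bchw_sketch(np.zeros((64, 48, 3))).shape)  # (1, 3, 64, 48)
```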

labels2crops(labels: ndarray, image: ndarray)

Labels to crops.

Crop all objects that are represented in labels from given image and return a list of all image crops, and a list of masks, each marking object pixels for respective crop.

Parameters
  • labels – Label image. Array[h, w(, c)].

  • image – Image. Array[h, w, …].

Returns

(crop_list, mask_list)
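The cropping idea can be sketched as follows, assuming a 2D label image of shape (h, w) and tight bounding boxes around each object. The helper name labels2crops_sketch is hypothetical; the library function may differ in details such as channel handling.

```python
import numpy as np

def labels2crops_sketch(labels, image):
    """Crop each labeled object from image using its bounding box (illustrative only)."""
    crops, masks = [], []
    for label in np.unique(labels[labels > 0]):
        obj = labels == label                      # pixels of this object
        rows, cols = np.where(obj)
        r0, r1 = rows.min(), rows.max() + 1        # bounding box rows
        c0, c1 = cols.min(), cols.max() + 1        # bounding box columns
        crops.append(image[r0:r1, c0:c1])
        masks.append(obj[r0:r1, c0:c1])            # object pixels within the crop
    return crops, masks

labels = np.zeros((5, 5), dtype=int)
labels[1:3, 1:4] = 1
labels[4, 4] = 2
image = np.arange(25).reshape(5, 5)
crops, masks = labels2crops_sketch(labels, image)
print([c.shape for c in crops])  # [(2, 3), (1, 1)]
```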

normalize_percentile(image, percentile=99.9, to_uint8=True)
padding_stack(*images, axis=0) ndarray

Padding stack.

Stack images along axis. If images have different shapes, all images are padded to the largest shape.

Parameters
  • *images – Images.

  • axis – Axis used for stacking.

Returns

Array
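A minimal sketch of the padding-then-stacking idea is given below. It assumes that all images have the same number of dimensions, that padding is appended at the end of each axis, and that zeros are used as fill value; none of these details are confirmed by the documentation above. The name padding_stack_sketch and the constant parameter are hypothetical.

```python
import numpy as np

def padding_stack_sketch(*images, axis=0, constant=0):
    """Pad all images to the largest shape, then stack them (illustrative only)."""
    # Largest extent per axis across all images.
    target = np.max([img.shape for img in images], axis=0)
    padded = [
        np.pad(img, [(0, int(t - s)) for s, t in zip(img.shape, target)],
               constant_values=constant)
        for img in images
    ]
    return np.stack(padded, axis=axis)

a = np.ones((2, 3))
b = np.ones((4, 2))
print(padding_stack_sketch(a, b).shape)  # (2, 4, 3)
```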

random_crop(*arrays, height, width=None)

Random crop.

Parameters
  • *arrays – Input arrays that are to be cropped. None values accepted. The shape of the first element is used as reference.

  • height – Output height.

  • width – Output width. Default is same as height.

Returns

Cropped array if arrays contains a single array; a list of cropped arrays otherwise.
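The behavior described above can be approximated with a short NumPy sketch. It assumes arrays in (h, w, ...) layout and that the first element is not None; the rng parameter and the name random_crop_sketch are hypothetical additions for reproducibility.

```python
import numpy as np

def random_crop_sketch(*arrays, height, width=None, rng=None):
    """Apply one random crop window to all arrays (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    width = height if width is None else width
    ref_h, ref_w = arrays[0].shape[:2]             # first element is the reference
    top = int(rng.integers(0, ref_h - height + 1))
    left = int(rng.integers(0, ref_w - width + 1))
    crops = [None if a is None else a[top:top + height, left:left + width]
             for a in arrays]
    return crops[0] if len(crops) == 1 else crops

image = np.arange(100).reshape(10, 10)
labels = image * 2
crop_image, crop_labels = random_crop_sketch(image, labels, height=4, width=6)
print(crop_image.shape, crop_labels.shape)  # (4, 6) (4, 6)
```

Because the same window is used for every array, images and their label maps stay aligned after cropping.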

resample_contours(contours, num=None, close=True, epsilon=1e-06)

Resample contour.

Sample num equidistant points on each contour in contours.

Notes

  • Works for closed and open contours.

Parameters
  • contours – Contours to sample from. Array[…, num’, 2] or list of Arrays.

  • num – Number of points.

  • close – Set True if contours contains closed contours, with the end point not being equal to the start point. Set False otherwise.

  • epsilon – Epsilon.

Returns

Array[…, num, 2] or list of Arrays.
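One common way to obtain equidistant points is to interpolate along the cumulative arc length of the contour. The following sketch handles a single contour of shape (num', 2) and is not necessarily how the library implements resampling; the name resample_contour_sketch is hypothetical and the epsilon handling is omitted.

```python
import numpy as np

def resample_contour_sketch(contour, num, close=True):
    """Sample num equidistant points along one contour (illustrative only)."""
    contour = np.asarray(contour, dtype=float)
    if close:
        # Append the start point so the closing segment is part of the path.
        contour = np.concatenate([contour, contour[:1]])
    seg = np.linalg.norm(np.diff(contour, axis=0), axis=1)   # segment lengths
    arc = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    # For closed contours the end point equals the start point, so skip it.
    targets = np.linspace(0.0, arc[-1], num, endpoint=not close)
    x = np.interp(targets, arc, contour[:, 0])
    y = np.interp(targets, arc, contour[:, 1])
    return np.stack([x, y], axis=1)

square = np.array([[0, 0], [4, 0], [4, 4], [0, 4]])
print(resample_contour_sketch(square, num=8).tolist())
# [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [4.0, 2.0], [4.0, 4.0], [2.0, 4.0], [0.0, 4.0], [0.0, 2.0]]
```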

rgb_to_scalar(inputs: ndarray, dtype='int32')

RGB to scalar.

Convert RGB data to scalar, while maintaining color uniqueness.

Parameters
  • inputs – Input array. Shape ([d1, d2, …, dn,] 3)

  • dtype – Data type

Returns

Output array. Shape ([d1, d2, …, dn])
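A color-preserving conversion can be achieved by treating the three channels as digits of a base-256 number. The sketch below assumes 8-bit channel values and is only one possible encoding; the scalar values produced by the library may differ. The name rgb_to_scalar_sketch is hypothetical.

```python
import numpy as np

def rgb_to_scalar_sketch(inputs, dtype="int32"):
    """Encode each RGB triple as a unique scalar (assumes values in 0..255)."""
    inputs = np.asarray(inputs).astype(dtype)
    r, g, b = inputs[..., 0], inputs[..., 1], inputs[..., 2]
    # Base-256 encoding: distinct colors always map to distinct scalars.
    return r + g * 256 + b * 256 ** 2

rgb = np.array([[[1, 0, 0], [0, 1, 0]],
                [[0, 0, 1], [255, 255, 255]]], dtype=np.uint8)
print(rgb_to_scalar_sketch(rgb).tolist())  # [[1, 256], [65536, 16777215]]
```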

rle2mask(code, shape, transpose=True, min_index=1, constant=1) ndarray

Run length encoding to mask.

Convert run length encoding to mask image.

Parameters
  • code – Run length code. Accepted forms:
    As ndarray: array([idx0, len0, idx1, len1, …]) or array([[idx0, len0], [idx1, len1], …])
    As list: [idx0, len0, idx1, len1, …] or [[idx0, len0], [idx1, len1], …]
    As str: 'idx0 len0 idx1 len1 …'

  • shape – Mask shape.

  • transpose – If True decode row by row, otherwise decode column by column.

  • min_index – Smallest pixel index. Depends on rle encoding.

  • constant – Pixels marked by rle are set to this value.

Returns

Mask image.
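To make the parameters more concrete, here is a minimal decoding sketch. It follows the parameter descriptions above (including the stated meaning of transpose), but it is an illustration under those assumptions rather than the library implementation; the name rle2mask_sketch is hypothetical.

```python
import numpy as np

def rle2mask_sketch(code, shape, transpose=True, min_index=1, constant=1):
    """Decode (index, length) runs into a mask (illustrative only)."""
    if isinstance(code, str):
        code = [int(v) for v in code.split()]
    # Flat and nested forms both become rows of (idx, len).
    code = np.asarray(code, dtype=int).reshape(-1, 2)
    flat = np.zeros(int(np.prod(shape)), dtype=np.uint8)
    for idx, length in code:
        start = idx - min_index                # shift to zero-based indexing
        flat[start:start + length] = constant
    if transpose:
        return flat.reshape(shape)             # decode row by row
    return flat.reshape(shape[::-1]).T         # decode column by column

print(rle2mask_sketch('1 2 5 1', (2, 3)).tolist())                   # [[1, 1, 0], [0, 1, 0]]
print(rle2mask_sketch('1 2 5 1', (2, 3), transpose=False).tolist())  # [[1, 0, 1], [1, 0, 0]]
```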

to_tensor(inputs: ndarray, spatial_dims=2, transpose=False, has_batch=False, dtype=None, device=None) Tensor

Array to Tensor.

Converts numpy array to Tensor and optionally transposes from channels last to channels first.

Parameters
  • inputs – Input array.

  • transpose – Whether to transpose channels from channels last to channels first.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

  • dtype – Data type of output Tensor.

  • device – Device of output Tensor.

Returns

Tensor.

transpose_spatial(inputs: ndarray, inputs_channels_last=True, spatial_dims=2, has_batch=False)
universal_dict_collate_fn(batch, check_padding=True) OrderedDict
boxes2masks(boxes, size)
fill_label_gaps_(labels)

Fill label gaps.

Ensure that labels greater than zero are within the interval [1, num_unique_labels_greater_zero]. Works fast if gaps are unlikely, slow otherwise. Alternatively, consider using np.vectorize. Labels <= 0 are preserved as is.

Parameters

labels

Returns:
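The effect of gap filling can be illustrated with np.unique. Note that this is a different technique from the one hinted at above (it does not get faster when gaps are rare); it only demonstrates the intended result. The name fill_label_gaps_sketch is hypothetical.

```python
import numpy as np

def fill_label_gaps_sketch(labels):
    """Remap positive labels to 1..n in place; labels <= 0 stay untouched."""
    positive = labels > 0
    # inverse holds, for each positive pixel, the rank of its label.
    _, inverse = np.unique(labels[positive], return_inverse=True)
    labels[positive] = inverse.reshape(-1) + 1
    return labels

labels = np.array([[0, 2, 2],
                   [7, -1, 4]])
print(fill_label_gaps_sketch(labels).tolist())  # [[0, 1, 1], [3, -1, 2]]
```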

fill_padding_(inputs, padding: int, constant=-1)
filter_instances_(labels, partials=True, partials_border=1, min_area=4, max_area=None, constant=-1, continuous=True)

Filter instances from label image.

Note

Filtered instance labels are set to constant. Labels might not be continuous afterwards.

Parameters
  • labels

  • partials

  • partials_border

  • min_area

  • max_area

  • constant

  • continuous

Returns:

relabel_(label_stack, axis=2)

Relabel.

Inplace relabeling of a label stack. After applying this op the labels in label_stack are continuous, starting at 1. Negative labels remain untouched.

Notes

  • Uses label function from skimage.morphology

Parameters
  • label_stack – Array[height, width, channels].

  • axis – Channel axis.

remove_padding(inputs, padding: int)
remove_partials_(label_stack, border=1, constant=-1)
stack_labels(*maps, axis=2, dtype='int32', relabel=True)

Stack labels.

Parameters
  • *maps – List[Union[Array[h, w], Array[h, w, 3]]]. Grayscale or RGB label maps. RGB labels are assumed to encode labels with color and are converted to grayscale labels before stacking.

  • axis – Stacking axis.

  • dtype – Output data type.

  • relabel – Whether to assign new labels or not.

Returns

Array[h, w, c]. Label image.

unary_masks2labels(unary_masks, transpose=True)

Unary masks to labels.

Parameters
  • unary_masks – List[Array[height, width]] or Array[num_objects, height, width] List of masks. Each mask is assumed to contain exactly one object.

  • transpose – If True label images are in channels last format, otherwise channels first.

Returns

Label image. Array[height, width, num_objects] if transpose else Array[num_objects, height, width].
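The conversion can be pictured as assigning one label per mask and keeping one channel per object. The sketch below assumes labels are numbered 1..num_objects in input order, which the text above does not state explicitly; the name unary_masks2labels_sketch is hypothetical.

```python
import numpy as np

def unary_masks2labels_sketch(unary_masks, transpose=True):
    """Turn a stack of single-object masks into a label stack (illustrative only)."""
    masks = np.asarray(unary_masks) > 0                     # (num_objects, h, w)
    ids = np.arange(1, len(masks) + 1).reshape(-1, 1, 1)    # one label per mask
    labels = masks * ids
    # Channels last if transpose, channels first otherwise.
    return np.moveaxis(labels, 0, -1) if transpose else labels

m0 = np.array([[1, 0], [0, 0]])
m1 = np.array([[0, 1], [0, 1]])
labels = unary_masks2labels_sketch([m0, m1])
print(labels.shape)              # (2, 2, 2)
print(labels[..., 1].tolist())   # [[0, 2], [0, 2]]
```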

CPN Operations

class CPNTargetGenerator(samples, order, random_sampling=True, remove_partials=False, min_fg_dist=0.75, max_bg_dist=0.5, flag_fragmented=True, flag_fragmented_constant=-1)
property contours
feed(labels, border=1, min_area=3, max_area=None)

Notes

  • May apply inplace changes to labels.

Parameters
  • labels – Single label image. E.g. of shape (height, width, channels).

  • border

  • min_area

  • max_area

property fourier
property locations
property sampled_contours

Returns: Tensor[num_contours, num_points, 2]

property sampled_sizes

Notes

The quality of sizes depends on how accurately sampled_contours represents the actual contours.

Returns

Tensor[num_contours, 2]. Contains height and width for each contour.

property sampling
clip_contour_(contour, size)
contours2fourier(contours: dict, order=5, dtype=<class 'numpy.float32'>)
contours2labels(contours, size, rounded=True, clip=True, initial_depth=1, gap=3, dtype='int32')

Contours to labels.

Converts contours to label image.

Notes

~137 ms for contours.shape=(1284, 128, 2), size=(1000, 1000).

Parameters
  • contours – Contours. Array[num_contours, num_points, 2] or List[Array[num_points, 2]].

  • size – Label image size. (height, width).

  • rounded – Whether to round contour coordinates.

  • clip – Whether to clip contour coordinates to given size.

  • initial_depth – Initial number of channels. More channels are used if necessary.

  • gap – Gap between instances.

  • dtype – Data type of label image.

Returns

Array[height, width, channels]. Channels are used to model overlap.

efd(contour, order=10, epsilon=1e-06)

Elliptic fourier descriptor.

Computes elliptic fourier descriptors from contour data.

Parameters
  • contour – Tensor of shape (…, num_points, 2). Should be set of num_points 2D points that describe the contour of an object. Based on each contour a descriptor of shape (order, 4) is computed. The result has thus a shape of (…, order, 4). As num_points may differ from one contour to another a list of (num_points, 2) arrays may be passed as a numpy array with object as its data type, i.e. np.array(list_of_contours).

  • order – Order of resulting descriptor. The higher the order, the more detail can be preserved. An order of 1 produces ellipses.

  • epsilon – Epsilon value. Used to avoid division by zero.

Notes

Locations may contain NaN if contour only contains a single point.

Returns

Tensor of shape (…, order, 4).
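For orientation, the classic elliptic Fourier formulation for a closed polygonal contour can be sketched in NumPy as below. This handles a single contour of shape (num_points, 2), works on arrays rather than Tensors, and assumes the coefficient order (a, b, c, d) per harmonic; the library may order or normalize coefficients differently. The name efd_sketch is hypothetical.

```python
import numpy as np

def efd_sketch(contour, order=10, epsilon=1e-6):
    """Elliptic Fourier coefficients of one closed contour (illustrative only)."""
    contour = np.asarray(contour, dtype=float)
    closed = np.concatenate([contour, contour[:1]])        # repeat the start point
    dxy = np.diff(closed, axis=0)                          # (num_points, 2)
    dt = np.sqrt((dxy ** 2).sum(axis=1)) + epsilon         # avoid division by zero
    t = np.concatenate([[0.0], np.cumsum(dt)])
    total = t[-1]                                          # contour length
    phi = 2 * np.pi * t / total
    n = np.arange(1, order + 1)[:, None]                   # (order, 1)
    d_cos = np.cos(n * phi[1:]) - np.cos(n * phi[:-1])     # (order, num_points)
    d_sin = np.sin(n * phi[1:]) - np.sin(n * phi[:-1])
    const = (total / (2 * n ** 2 * np.pi ** 2))[:, 0]      # (order,)
    rate = dxy / dt[:, None]                               # dx/dt and dy/dt
    a = const * (d_cos * rate[:, 0]).sum(axis=1)
    b = const * (d_sin * rate[:, 0]).sum(axis=1)
    c = const * (d_cos * rate[:, 1]).sum(axis=1)
    d = const * (d_sin * rate[:, 1]).sum(axis=1)
    return np.stack([a, b, c, d], axis=1)                  # (order, 4)

angles = np.linspace(0, 2 * np.pi, 256, endpoint=False)
circle = 5 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(efd_sketch(circle, order=3).shape)  # (3, 4)
```

For a circle, only the first harmonic is significant, which matches the remark that an order of 1 produces ellipses.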

filter_contours_by_intensity(img, contours, min_intensity=None, max_intensity=200, aggregate='mean')
fourier2contour(fourier, locations, samples=64, sampling=None)

Parameters
  • fourier – Array[…, order, 4]

  • locations – Array[…, 2]

  • samples – Number of samples.

  • sampling – Array[samples] or Array[fourier.shape[:-2] + (samples,)]. Default is linspace from 0 to 1 with samples values.

Returns

Contours.

labels2contour_list(labels, **kwargs) list
labels2contours(labels, mode=0, method=1, flag_fragmented=False, raise_fragmented=True, constant=-1) dict

Labels to contours.

Notes

  • If flag_fragmented is True, labels may be modified inplace.

Parameters
  • labels

  • mode

  • method – Contour method. CHAIN_APPROX_NONE must be used if contours are used for CPN.

  • flag_fragmented – Whether to flag fragmented labels. Flagging sets labels that consist of more than one connected component to constant.

  • constant – Flagging constant.

  • raise_fragmented – Whether to raise ValueError when encountering fragmented labels.

Returns

dict

labels2distances(labels, distance_type=2, overlap_zero=True)

Label stacks to distances.

Measures the distance from each pixel to the closest border, relative to the largest distance. Values are expressed as a percentage. Overlap is zero.

Notes

54.9 ms ± 3.41 ms (shape (576, 576, 3); 762 instances in three channels)

Parameters
  • labels – Label stack. (height, width, channels)

  • distance_type – opencv distance type.

  • overlap_zero – Whether to set overlapping regions to zero.

Returns

Distance map of shape (height, width). All overlapping pixels are 0. Instance centers are 1. The labels are also returned; they are altered if overlap_zero is True.

mask_labels_by_distance_(labels, distances, max_bg_dist, min_fg_dist)
masks2labels(masks, connectivity=8, label_axis=2, count=False, reduce=<function amax>, keepdims=True, **kwargs)

Masks to labels.

Notes

~ 11.7 ms for Array[25, 256, 256]. For same array skimage.measure.label takes ~ 17.9 ms.

Parameters
  • masks – List[Array[height, width]] or Array[num_masks, height, width]

  • connectivity – 8 or 4 for 8-way or 4-way connectivity respectively

  • label_axis – Axis used for stacking label maps. One per mask.

  • count – Whether to count and return the number of components.

  • reduce – Callable used to reduce label_axis. If set to None, label_axis will not be reduced. Can be used if instances do not overlap.

  • **kwargs – Kwargs for cv2.connectedComponents.

Returns

labels or (labels, count)

render_contour(contour, val=1, dtype='int32')

Eval Operations

class LabelMatcher(inputs=None, targets=None, iou_thresh=None, zero_division='warn')

Evaluation of a label image with a target label image.

Simple interface to evaluate a label image with a target label image with different metrics and IOU thresholds.

The IOU threshold is the minimum IOU that two objects must have to be counted as a match. Each target object can be matched with at most one inferred object and vice versa.
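The matching rule can be illustrated with a small sketch that computes pairwise IOU values and then pairs objects one-to-one. It assumes 2D label images without a channel axis and uses a greedy highest-IOU-first strategy; the actual matching procedure of LabelMatcher is not documented here and may differ. Both function names are hypothetical.

```python
import numpy as np

def iou_matrix_sketch(inputs, targets):
    """Pairwise IOU between all positive labels of two 2D label images."""
    in_ids = np.unique(inputs[inputs > 0])
    tg_ids = np.unique(targets[targets > 0])
    iou = np.zeros((len(in_ids), len(tg_ids)))
    for i, a in enumerate(in_ids):
        for j, b in enumerate(tg_ids):
            inter = np.logical_and(inputs == a, targets == b).sum()
            union = np.logical_or(inputs == a, targets == b).sum()
            iou[i, j] = inter / union
    return iou

def count_matches_sketch(iou, iou_thresh):
    """Greedy one-to-one matching: each row and column is used at most once."""
    iou = iou.copy()
    matches = 0
    while iou.size and iou.max() >= iou_thresh:
        i, j = np.unravel_index(iou.argmax(), iou.shape)
        matches += 1
        iou[i, :] = -1      # inferred object i is taken
        iou[:, j] = -1      # target object j is taken
    return matches

inputs = np.zeros((4, 4), dtype=int)
inputs[0:2, 0:2] = 1
inputs[2:4, 2:4] = 2
targets = np.zeros((4, 4), dtype=int)
targets[0:2, 0:2] = 1
targets[2:4, 1:4] = 2
iou = iou_matrix_sketch(inputs, targets)
print(count_matches_sketch(iou, 0.5), count_matches_sketch(iou, 0.75))  # 2 1
```

In this toy case the second object pair has an IOU of 4/6, so it counts as a match at a threshold of 0.5 but not at 0.75.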

Initialize LabelMatcher object.

Parameters
  • inputs – Input labels. Array[height, width, channels].

  • targets – Target labels. Array[height, width, channels].

  • iou_thresh – IOU threshold.

  • zero_division – One of (‘warn’, 0, 1). Sets the default return value for ZeroDivisionErrors. The default ‘warn’ will show a warning and return 0. For example: If there are no true positives and no false positives, precision will return the value of zero_division and optionally show a warning.
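The zero_division behavior described above can be sketched for precision as follows. This is a plain-Python illustration of the documented semantics, not the class implementation; the function name precision_sketch and the warning text are hypothetical.

```python
import warnings

def precision_sketch(true_positives, false_positives, zero_division="warn"):
    """Precision with a configurable fallback for the 0 / 0 case."""
    denominator = true_positives + false_positives
    if denominator == 0:
        if zero_division == "warn":
            # Default: show a warning and fall back to 0.
            warnings.warn("Precision is ill-defined; returning 0.")
            return 0.0
        return float(zero_division)
    return true_positives / denominator

print(precision_sketch(3, 1))                    # 0.75
print(precision_sketch(0, 0, zero_division=1))   # 1.0
```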

property ap
property f1
property false_negative_labels
property false_negatives
property false_positive_labels
property false_positives
filter_and_threshold(*a, **k)
property iou_thresh
property precision
property recall
property true_positive_labels
property true_positives
update(inputs, targets, iou_thresh=None)
class LabelMatcherList(iterable=(), /)

Label Matcher List.

Simple interface to get averaged results from a list of LabelMatcher objects.

Examples

>>> lml = LabelMatcherList([
...     LabelMatcher(pred_labels_0, target_labels_0),
...     LabelMatcher(pred_labels_1, target_labels_1),
... ])
>>> lml.iou_thresh = 0.5  # set iou_thresh for all LabelMatcher objects
>>> print('Average F1 score for iou threshold 0.5:', lml.avg_f1)
Average F1 score for iou threshold 0.5: 0.92
>>> # Testing different IOU thresholds:
>>> for lml.iou_thresh in (.5, .75):
...     print('thresh:', lml.iou_thresh, '       f1:', lml.avg_f1)
thresh: 0.5      f1: 0.92
thresh: 0.75     f1: 0.91
property avg_ap

Average AP.

property avg_f1

Average F1 score.

property avg_precision

Average precision.

property avg_recall

Average recall.

property f1

F1 score from average recall and precision.
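The two F1 variants generally give different numbers. The sketch below contrasts them in plain Python, assuming per-item precision and recall values are already available and that a zero denominator yields 0; both function names are hypothetical.

```python
def f1_from_averages_sketch(precisions, recalls):
    """F1 computed from the average precision and average recall."""
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

def average_f1_sketch(precisions, recalls):
    """Mean of the per-item F1 scores."""
    scores = [2 * p * r / (p + r) if (p + r) > 0 else 0.0
              for p, r in zip(precisions, recalls)]
    return sum(scores) / len(scores)

precisions = [1.0, 0.5]
recalls = [0.5, 1.0]
print(f1_from_averages_sketch(precisions, recalls))      # 0.75
print(round(average_f1_sketch(precisions, recalls), 4))  # 0.6667
```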

property iou_thresh

Gets unique IOU thresholds from all items. If there is only one unique threshold, it is returned.

get_pos_labels(v)
intersection_mask(a, b)
labels2counts(a)
labels_exist(func)
matching_labels(a, b)
vec2matches(v)

Toy Data

random_circle(image, mask, x, y, color, radius_range=(3, 28))
random_ellipse(image, mask, x, y, color, radius_range=(3, 28))
random_geometric_objects(height=256, width=256, radius_range=(3, 28), intensity_range=(0, 180), margin=13)
random_rectangle(image, mask, x, y, color, radius_range=(3, 28))
random_triangle(image, mask, x, y, color, radius_range=(3, 28))