cd.data

This submodule contains data-related code and NumPy operations.

Datasets

BBBC039

class BBBC039Test(directory, download=False)

BBBC039 Test.

Test split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Train(directory, download=False)

BBBC039 Train.

Training split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Val(directory, download=False)

BBBC039 Validation.

Validation split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

download_bbbc039(directory)

Download BBBC039.

Download and extract the BBBC039 dataset to given directory.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.

BBBC041

class BBBC041Test(directory, download=False)

BBBC041 Test data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Data directory.

  • download – Whether to download data.

class BBBC041Train(directory, download=False)

BBBC041 Train data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Data directory.

  • download – Whether to download data.

download_bbbc041(directory, url='https://data.broadinstitute.org/bbbc/BBBC041/malaria.zip')

Download BBBC041.

Download and extract the BBBC041 dataset to given directory.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Synth

class SynthTest(directory, download=False, cache=True)

Synth Test data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthTrain(directory, download=False, cache=True)

Synth Train data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthVal(directory, download=False, cache=True)

Synth Val data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

download_synth(directory, url='https://celldetection.org/data/synth.zip')

Download Synth.

Download and extract the Synth dataset to given directory.

Parameters:
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Data Operations

channels_first2channels_last(inputs: ndarray, spatial_dims=2, has_batch=False) → ndarray

Channels first to channels last.

Parameters:
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.

channels_last2channels_first(inputs: ndarray, spatial_dims=2, has_batch=False) → ndarray

Channels last to channels first.

Parameters:
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.
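
Both conversions amount to moving the channel axis. The following is a minimal NumPy sketch, not the library's implementation. It assumes that in channels-first layout the channel axis comes directly after the optional batch axis; the helper names are hypothetical.

```python
import numpy as np


def to_channels_last(x, has_batch=False):
    # Hypothetical helper: move the channel axis behind the spatial axes.
    channel_axis = 1 if has_batch else 0
    return np.moveaxis(x, channel_axis, -1)


def to_channels_first(x, has_batch=False):
    # Hypothetical helper: move the trailing channel axis in front of the spatial axes.
    channel_axis = 1 if has_batch else 0
    return np.moveaxis(x, -1, channel_axis)


image = np.zeros((3, 32, 48))      # (c, h, w)
batch = np.zeros((8, 3, 32, 48))   # (b, c, h, w)
image_last = to_channels_last(image)
batch_last = to_channels_last(batch, has_batch=True)
```

np.moveaxis returns a view where possible, so no pixel data is copied by these helpers.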

ensure_tensor(x, device=None, dtype=torch.float32)

Ensure tensor.

Mapping ndarrays to Tensor. Possible shape mappings:
  • (h, w) -> (1, 1, h, w)

  • (h, w, c) -> (1, c, h, w)

  • (b, c, h, w) -> (b, c, h, w)

Parameters:
  • x – Inputs.

  • device – Either Device or a Module or Tensor to retrieve the device from.

  • dtype – Data type.

Returns:

Tensor

labels2crops(labels: ndarray, image: ndarray)

Labels to crops.

Crop all objects that are represented in labels from given image and return a list of all image crops, and a list of masks, each marking object pixels for respective crop.

Parameters:
  • labels – Label image. Array[h, w(, c)].

  • image – Image. Array[h, w, …].

Returns:

(crop_list, mask_list)
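
The cropping idea can be sketched in plain NumPy as follows. This is an illustration under assumptions, not the library's code: it handles only a channel-free label image of shape (h, w), crops each object to its tight bounding box, and the function name is hypothetical.

```python
import numpy as np


def labels_to_crops_sketch(labels, image):
    # Hypothetical sketch: one bounding-box crop and one boolean mask per positive label.
    crops, masks = [], []
    for idx in np.unique(labels):
        if idx <= 0:
            continue  # skip background and negative labels
        obj = labels == idx
        rows = np.flatnonzero(obj.any(axis=1))
        cols = np.flatnonzero(obj.any(axis=0))
        window = (slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1))
        crops.append(image[window])
        masks.append(obj[window])
    return crops, masks


labels = np.zeros((5, 5), dtype=int)
labels[1:3, 1:4] = 1
labels[4, 0] = 2
image = np.arange(25).reshape(5, 5)
crops, masks = labels_to_crops_sketch(labels, image)
```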

labels2properties(labels: ndarray, *properties, iter_channels=True, **kwargs)

Labels to properties.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:
  • labels – Label image.

  • *properties – Property names. See [1] for details.

  • iter_channels – Whether to iterate over the channel axis of the label image. If False, the label image is processed as is.

  • **kwargs – Keyword arguments for skimage.measure.regionprops.

Returns:

List of property lists.

labels2property_table(labels: ndarray, *properties, iter_channels=True, **kwargs) → DataFrame

Labels to property table.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:
  • labels – Label image.

  • *properties – Property names. See [1] for details.

  • iter_channels – Whether to iterate over the channel axis of the label image. If False, the label image is processed as is.

  • **kwargs – Keyword arguments for skimage.measure.regionprops_table.

Returns:

Table (pd.DataFrame) of properties.

normalize_percentile(image, percentile=99.9, to_uint8=True)
pad_to_div(v, div=32, nd=2, **kwargs)

Pad to div.

Applies padding to input Array to make it divisible by div.

Parameters:
  • v – Input array.

  • div – Div tuple. If single integer, nd is used to define number of dimensions to pad.

  • nd – Number of dimensions to pad. Only used if div is not a tuple or list.

  • **kwargs – Additional keyword arguments for np.pad.

Returns:

Padded Array.
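
A minimal sketch of padding to a divisible size with np.pad. It is an assumption of this sketch that padding is added at the end of each axis; the library may distribute it differently. The function name is hypothetical.

```python
import numpy as np


def pad_to_divisible(v, div=32, nd=2, **kwargs):
    # Hypothetical re-implementation: pad the end of the leading axes.
    divs = tuple(div) if isinstance(div, (tuple, list)) else (div,) * nd
    # (-size) % d is the number of elements missing to reach the next multiple of d.
    pad_width = [(0, (-size) % d) for size, d in zip(v.shape, divs)]
    pad_width += [(0, 0)] * (v.ndim - len(pad_width))  # leave remaining axes untouched
    return np.pad(v, pad_width, **kwargs)


image = np.ones((50, 70, 3))
padded = pad_to_divisible(image)               # both spatial axes divisible by 32
padded_custom = pad_to_divisible(image, div=(16, 8))
```

Padding to a multiple of 32 is a common requirement for encoder-decoder networks that downsample the input several times.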

pad_to_size(v, size, **kwargs)

Pad to size.

Applies padding to end of each dimension.

Parameters:
  • v – Input array.

  • size – Size tuple. First element corresponds to first dimension of input v.

  • **kwargs – Additional keyword arguments for np.pad.

Returns:

Padded Array.

padding_stack(*images, axis=0) → ndarray

Padding stack.

Stack images along axis. If images have different shapes, all images are padded to the largest shape.

Parameters:
  • *images – Images.

  • axis – Axis used for stacking.

Returns:

Array
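
The behaviour described above can be approximated as follows. This sketch assumes that all images have the same number of dimensions and that zero padding is added at the end of each axis; the function name is hypothetical.

```python
import numpy as np


def pad_and_stack(*images, axis=0):
    # Hypothetical sketch: pad every image to the largest shape, then stack.
    target = np.max([img.shape for img in images], axis=0)
    padded = [
        np.pad(img, [(0, t - s) for t, s in zip(target, img.shape)])
        for img in images
    ]
    return np.stack(padded, axis=axis)


a = np.ones((2, 3))
b = np.ones((4, 2))
stacked = pad_and_stack(a, b)
```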

random_crop(*arrays, height, width=None)

Random crop.

Parameters:
  • *arrays – Input arrays that are to be cropped. None values accepted. The shape of the first element is used as reference.

  • height – Output height.

  • width – Output width. Default is same as height.

Returns:

Cropped array if arrays contains a single array; a list of cropped arrays otherwise
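
The key point is that one random window is drawn and applied to every array, so images and labels stay aligned. A minimal sketch, assuming height and width are the first two axes and the first array is not None; the function name and the rng parameter are hypothetical additions.

```python
import numpy as np


def random_crop_sketch(*arrays, height, width=None, rng=None):
    # Hypothetical sketch: draw one window from the first array's shape and reuse it.
    rng = np.random.default_rng() if rng is None else rng
    width = height if width is None else width
    h, w = arrays[0].shape[:2]
    top = rng.integers(0, h - height + 1)   # upper bound is exclusive
    left = rng.integers(0, w - width + 1)
    crops = [
        None if a is None else a[top:top + height, left:left + width]
        for a in arrays
    ]
    return crops[0] if len(crops) == 1 else crops


image = np.arange(100 * 120).reshape(100, 120)
labels = image.copy()
img_crop, none_crop, lbl_crop = random_crop_sketch(
    image, None, labels, height=32, width=48, rng=np.random.default_rng(0)
)
```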

resample_contours(contours, num=None, close=True, epsilon=1e-06)

Resample contour.

Sample num equidistant points on each contour in contours.

Notes

  • Works for closed and open contours.

Parameters:
  • contours – Contours to sample from. Array[…, num’, 2] or list of Arrays.

  • num – Number of points.

  • close – Set True if contours contains closed contours, with the end point not being equal to the start point. Set False otherwise.

  • epsilon – Epsilon.

Returns:

Array[…, num, 2] or list of Arrays.
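
One way to obtain equidistant points is linear interpolation along the cumulative arc length. The sketch below handles a single contour and is not the library's implementation; it assumes consecutive points are distinct (the role epsilon presumably plays in the real function is not modelled), and the function name is hypothetical.

```python
import numpy as np


def resample_contour_sketch(contour, num, close=True):
    # Hypothetical sketch: resample one contour at equal arc-length steps.
    pts = np.asarray(contour, dtype=float)
    if close:
        pts = np.concatenate([pts, pts[:1]])  # append the start point to close the loop
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each point
    # For closed contours the end point is omitted, since it equals the start point.
    targets = np.linspace(0.0, cum[-1], num, endpoint=not close)
    x = np.interp(targets, cum, pts[:, 0])
    y = np.interp(targets, cum, pts[:, 1])
    return np.stack([x, y], axis=1)


square = [[0, 0], [4, 0], [4, 4], [0, 4]]
resampled = resample_contour_sketch(square, num=8)
line = resample_contour_sketch([[0, 0], [10, 0]], num=5, close=False)
```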

rgb_to_scalar(inputs: ndarray, dtype='int32')

RGB to scalar.

Convert RGB data to scalar, while maintaining color uniqueness.

Parameters:
  • inputs – Input array. Shape ([d1, d2, …, dn,] 3)

  • dtype – Data type

Returns:

Output array. Shape ([d1, d2, …, dn])
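
One encoding that keeps colors unique is to pack the three channels into a single integer. The sketch below assumes 8-bit channel values and a base-256 packing; the library's actual encoding may differ, and the function name is hypothetical.

```python
import numpy as np


def rgb_to_scalar_sketch(inputs, dtype='int32'):
    # Hypothetical sketch: r + 256 * g + 256**2 * b is unique for 8-bit channels.
    rgb = np.asarray(inputs).astype(dtype)
    return rgb[..., 0] + rgb[..., 1] * 256 + rgb[..., 2] * 256 ** 2


rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 0, 0]]],
    dtype='uint8',
)
scalar = rgb_to_scalar_sketch(rgb)
```

Casting before the arithmetic matters: multiplying uint8 values directly would overflow.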

rle2mask(code, shape, transpose=True, min_index=1, constant=1) → ndarray

Run length encoding to mask.

Convert run length encoding to mask image.

Parameters:
  • code – Run length code. Accepted formats:
    As ndarray: array([idx0, len0, idx1, len1, …]) or array([[idx0, len0], [idx1, len1], …]).
    As list: [idx0, len0, idx1, len1, …] or [[idx0, len0], [idx1, len1], …].
    As str: ‘idx0 len0 idx1 len1 …’.

  • shape – Mask shape.

  • transpose – If True decode row by row, otherwise decode column by column.

  • min_index – Smallest pixel index. Depends on rle encoding.

  • constant – Pixels marked by rle are set to this value.

Returns:

Mask image.
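
Decoding a run length code comes down to filling runs in a flat array and reshaping it. The following sketch is illustrative only. The row_major flag is a hypothetical stand-in for the traversal order and is not claimed to map one-to-one onto the transpose parameter.

```python
import numpy as np


def rle_decode_sketch(code, shape, row_major=True, min_index=1, constant=1):
    # Hypothetical sketch: accepts a string, a flat sequence or (index, length) pairs.
    if isinstance(code, str):
        code = [int(token) for token in code.split()]
    runs = np.asarray(code, dtype=int).reshape(-1, 2)
    flat = np.zeros(int(np.prod(shape)), dtype='uint8')
    for start, length in runs:
        start -= min_index  # shift to zero-based indexing
        flat[start:start + length] = constant
    return flat.reshape(shape, order='C' if row_major else 'F')


mask_rows = rle_decode_sketch('1 3 10 2', shape=(3, 4))
mask_cols = rle_decode_sketch([[1, 3], [10, 2]], shape=(3, 4), row_major=False)
```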

to_tensor(inputs: ndarray, spatial_dims=2, transpose=False, has_batch=False, dtype=None, device=None) → Tensor

Array to Tensor.

Converts numpy array to Tensor and optionally transposes from channels last to channels first.

Parameters:
  • inputs – Input array.

  • transpose – Whether to transpose channels from channels last to channels first.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

  • dtype – Data type of output Tensor.

  • device – Device of output Tensor.

Returns:

Tensor.

transpose_spatial(inputs: ndarray, inputs_channels_last=True, spatial_dims=2, has_batch=False)
universal_dict_collate_fn(batch, check_padding=True) → OrderedDict
boxes2masks(boxes, size)
fill_label_gaps_(labels)

Fill label gaps.

Ensure that labels greater than zero are within the interval [1, num_unique_labels_greater_zero]. Works fast if gaps are unlikely, slow otherwise. Alternatively, consider using np.vectorize. Labels <= 0 are preserved as is.
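
The intended result can be illustrated with np.unique. Unlike the in-place function documented here, this hypothetical sketch returns a copy and makes no claim about matching the library's performance characteristics.

```python
import numpy as np


def fill_label_gaps_sketch(labels):
    # Hypothetical sketch: map positive labels onto 1..n, keep labels <= 0 as they are.
    out = labels.copy()
    positive = out > 0
    _, inverse = np.unique(out[positive], return_inverse=True)
    out[positive] = inverse + 1
    return out


labels = np.array([[0, 2, 2], [7, -1, 9]])
filled = fill_label_gaps_sketch(labels)
```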

Parameters:

labels

Returns:

fill_padding_(inputs, padding: int, constant=-1)
filter_instances_(labels, partials=True, partials_border=1, min_area=4, max_area=None, constant=-1, continuous=True)

Filter instances from label image.

Note

Filtered instance labels are set to constant. Labels might not be continuous afterwards.

Parameters:
  • labels

  • partials

  • partials_border

  • min_area

  • max_area

  • constant

  • continuous

Returns:

relabel_(label_stack, axis=2)

Relabel.

Inplace relabeling of a label stack. After applying this op the labels in label_stack are continuous, starting at 1. Negative labels remain untouched.

Notes

  • Uses label function from skimage.morphology

Parameters:
  • label_stack – Array[height, width, channels].

  • axis – Channel axis.

remove_padding(inputs, padding: int)
remove_partials_(label_stack, border=1, constant=-1)
stack_labels(*maps, axis=2, dtype='int32', relabel=True)

Stack labels.

Parameters:
  • *maps – List[Union[Array[h, w], Array[h, w, 3]]]. Grayscale or RGB label maps. RGB labels are assumed to encode labels with color and are converted to grayscale labels before stacking.

  • axis – Stacking axis.

  • dtype – Output data type.

  • relabel – Whether to assign new labels or not.

Returns:

Array[h, w, c]. Label image.

unary_masks2labels(unary_masks, transpose=True)

Unary masks to labels.

Parameters:
  • unary_masks – List[Array[height, width]] or Array[num_objects, height, width] List of masks. Each mask is assumed to contain exactly one object.

  • transpose – If True label images are in channels last format, otherwise channels first.

Returns:

Label image. Array[height, width, num_objects] if transpose else Array[num_objects, height, width].
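
A minimal sketch of the conversion, assuming that the i-th mask receives label i + 1 (the library's numbering is not confirmed by this page). The function name is hypothetical.

```python
import numpy as np


def unary_masks_to_labels_sketch(unary_masks, transpose=True):
    # Hypothetical sketch: give each single-object mask its own label and channel.
    masks = np.asarray(unary_masks) > 0                    # (num_objects, h, w)
    ids = np.arange(1, len(masks) + 1).reshape(-1, 1, 1)   # one label per mask
    labels = masks * ids
    return np.moveaxis(labels, 0, -1) if transpose else labels


m0 = np.array([[1, 0], [0, 0]])
m1 = np.array([[0, 1], [0, 1]])
labels_last = unary_masks_to_labels_sketch([m0, m1])
labels_first = unary_masks_to_labels_sketch([m0, m1], transpose=False)
```

Keeping one channel per object means overlapping objects do not overwrite each other.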

CPN Operations

class CPNTargetGenerator(samples, order, random_sampling=True, remove_partials=False, min_fg_dist=0.75, max_bg_dist=0.5, flag_fragmented=True, flag_fragmented_constant=-1)
property contours
feed(labels, border=1, min_area=1, max_area=None)

Notes

  • May apply inplace changes to labels.

Parameters:
  • labels – Single label image. E.g. of shape (height, width, channels).

  • border

  • min_area

  • max_area

property fourier
property locations
property reduced_labels
property sampled_contours

Returns: Tensor[num_contours, num_points, 2]

property sampled_sizes

Notes

The quality of sizes depends on how accurately sampled_contours represents the actual contours.

Returns:

Tensor[num_contours, 2]. Contains height and width for each contour.

property sampling
clip_contour_(contour, size)
contours2boxes(contours)

Contours to boxes.

Parameters:

contours – Array[num_contours, num_points, 2]. (x, y) format.

Returns:

Array[num_contours, 4]. (x0, y0, x1, y1) format.
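
For contours stored as one array, the boxes are the per-contour coordinate minima and maxima. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def contours_to_boxes_sketch(contours):
    # Hypothetical sketch: (x0, y0) is the minimum and (x1, y1) the maximum over all points.
    contours = np.asarray(contours)
    mins = contours.min(axis=1)
    maxs = contours.max(axis=1)
    return np.concatenate([mins, maxs], axis=1)


contours = np.array([
    [[1, 2], [5, 3], [4, 8]],
    [[0, 0], [2, 0], [1, 1]],
])
boxes = contours_to_boxes_sketch(contours)
```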

contours2labels(contours, size, rounded=True, clip=True, initial_depth=1, gap=3, dtype='int32')

Contours to labels.

Convert contours to label image.

Notes

  • ~137 ms for contours.shape=(1284, 128, 2), size=(1000, 1000).

  • Label images come with channels, as contours may assign pixels to multiple objects. Since such multi-assignments cannot be easily encoded in a channel-free label image, channels are used. To remove channels refer to resolve_label_channels.

Parameters:
  • contours – Contours of a single image. Array[num_contours, num_points, 2] or List[Array[num_points, 2]].

  • size – Label image size. (height, width).

  • rounded – Whether to round contour coordinates.

  • clip – Whether to clip contour coordinates to given size.

  • initial_depth – Initial number of channels. More channels are used if necessary.

  • gap – Gap between instances.

  • dtype – Data type of label image.

Returns:

Array[height, width, channels]. Since contours may assign pixels to multiple objects, the label image comes with channels. To remove channels refer to resolve_label_channels.

contours2properties(contours, *properties, round=True, **kwargs)

Contours to properties.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:
  • contours – Contours.

  • *properties – Property names. See [1] for details.

  • round – Whether to round contours. Default is True.

  • **kwargs – Keyword arguments for skimage.measure.regionprops.

Returns:

List of property lists.

filter_contours_by_intensity(img, contours, min_intensity=None, max_intensity=200, aggregate='mean')
masks2labels(masks, connectivity=8, label_axis=2, count=False, reduce=<function max>, keepdims=True, **kwargs)

Masks to labels.

Notes

~ 11.7 ms for Array[25, 256, 256]. For same array skimage.measure.label takes ~ 17.9 ms.

Parameters:
  • masks – List[Array[height, width]] or Array[num_masks, height, width]

  • connectivity – 8 or 4 for 8-way or 4-way connectivity respectively

  • label_axis – Axis used for stacking label maps. One per mask.

  • count – Whether to count and return the number of components.

  • reduce – Callable used to reduce label_axis. If set to None, label_axis will not be reduced. Can be used if instances do not overlap.

  • **kwargs – Kwargs for cv2.connectedComponents.

Returns:

labels or (labels, count)

render_contour(contour, val=1, dtype='int32', round=False, reference=None)
resolve_label_channels(labels, method='dilation', max_iter=999, kernel=(3, 3))

Resolve label channels.

Remove channels from a label image. Pixels that are assigned to exactly one foreground label remain as is. Pixels that are assigned to multiple foreground labels present a conflict, as they cannot be described by a channel-less label image. Such conflicts are resolved by method.

Parameters:
  • labels – Label image. Array[h, w, c].

  • method – Method to resolve overlapping regions.

  • max_iter – Max iteration.

  • kernel – Kernel.

Returns:

Labels with channels removed. Array[h, w].

Eval Operations

class LabelMatcher(inputs=None, targets=None, iou_thresh=None, zero_division='warn')

Evaluation of a label image with a target label image.

Simple interface to evaluate a label image with a target label image with different metrics and IOU thresholds.

The IOU threshold is the minimum IOU that two objects must have to be counted as a match. Each target object can be matched with at most one inferred object and vice versa.

Initialize LabelMatcher object.

Parameters:
  • inputs – Input labels. Array[height, width, channels].

  • targets – Target labels. Array[height, width, channels].

  • iou_thresh – IOU threshold.

  • zero_division – One of (‘warn’, 0, 1). Sets the default return value for ZeroDivisionErrors. The default ‘warn’ will show a warning and return 0. For example: If there are no true positives and no false positives, precision will return the value of zero_division and optionally show a warning.
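
The matching rule and the role of zero_division can be illustrated with a small stand-alone sketch. It is not the LabelMatcher implementation: it works on channel-free label images of shape (h, w), uses a simple greedy one-to-one matching, and does not model the ‘warn’ option. All function names are hypothetical.

```python
import numpy as np


def match_labels_sketch(pred, target, iou_thresh=0.5):
    # Hypothetical sketch: returns (true positives, false positives, false negatives).
    pred_ids = [i for i in np.unique(pred) if i > 0]
    target_ids = [i for i in np.unique(target) if i > 0]
    pairs = []
    for p in pred_ids:
        for t in target_ids:
            inter = np.logical_and(pred == p, target == t).sum()
            union = np.logical_or(pred == p, target == t).sum()
            pairs.append((inter / union, p, t))
    matched_pred, matched_target = set(), set()
    for iou, p, t in sorted(pairs, reverse=True):  # best overlaps first
        if iou >= iou_thresh and p not in matched_pred and t not in matched_target:
            matched_pred.add(p)
            matched_target.add(t)
    tp = len(matched_pred)
    return tp, len(pred_ids) - tp, len(target_ids) - tp


def precision_recall_f1(tp, fp, fn, zero_division=0):
    # zero_division is returned whenever a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else zero_division
    recall = tp / (tp + fn) if tp + fn else zero_division
    total = precision + recall
    f1 = 2 * precision * recall / total if total else zero_division
    return precision, recall, f1


pred = np.zeros((4, 4), dtype=int)
pred[0:2, 0:2] = 1
pred[2:4, 2:4] = 2
target = np.zeros((4, 4), dtype=int)
target[0:2, 0:2] = 1
target[2:4, 0:2] = 2
tp, fp, fn = match_labels_sketch(pred, target)
scores = precision_recall_f1(tp, fp, fn)
```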

property ap
property f1
property false_negative_labels
property false_negatives
property false_positive_labels
property false_positives
filter_and_threshold(*a, **k)
property iou_thresh
property precision
property recall
property true_positive_labels
property true_positives
update(inputs, targets, iou_thresh=None)
class LabelMatcherList(iterable=(), /)

Label Matcher List.

Simple interface to get averaged results from a list of LabelMatcher objects.

Examples

>>> lml = LabelMatcherList([
...     LabelMatcher(pred_labels_0, target_labels_0),
...     LabelMatcher(pred_labels_1, target_labels_1),
... ])
>>> lml.iou_thresh = 0.5  # set iou_thresh for all LabelMatcher objects
>>> print('Average F1 score for iou threshold 0.5:', lml.avg_f1)
Average F1 score for iou threshold 0.5: 0.92
>>> # Testing different IOU thresholds:
>>> for lml.iou_thresh in (.5, .75):
...     print('thresh:', lml.iou_thresh, '       f1:', lml.avg_f1)
thresh: 0.5      f1: 0.92
thresh: 0.75     f1: 0.91
property avg_ap

Average AP.

property avg_f1

Average F1 score.

property avg_precision

Average precision.

property avg_recall

Average recall.

property f1

F1 score from average recall and precision.
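
An F1 score computed from averaged precision and recall generally differs from the average of per-item F1 scores. The numbers below are hypothetical and only illustrate the arithmetic.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; 0.0 if both are zero.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical (precision, recall) pairs for two images.
per_image = [(1.0, 0.5), (0.5, 1.0)]

# Average of the per-image F1 scores.
avg_f1 = sum(f1_score(p, r) for p, r in per_image) / len(per_image)

# F1 score of the averaged precision and recall.
avg_precision = sum(p for p, _ in per_image) / len(per_image)
avg_recall = sum(r for _, r in per_image) / len(per_image)
f1_of_averages = f1_score(avg_precision, avg_recall)
```

Here the average of the per-image scores is 2/3, while the score from the averaged precision and recall is 0.75.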

property iou_thresh

Gets the unique IOU thresholds from all items. If there is only one unique threshold, it is returned.

get_pos_labels(v)
intersection_mask(a, b)
labels2counts(a)
labels_exist(func)
matching_labels(a, b)
vec2matches(v)

Toy Data

random_circle(image, mask, x, y, color, radius_range=(3, 28))
random_ellipse(image, mask, x, y, color, radius_range=(3, 28))
random_geometric_objects(height=256, width=256, radius_range=(3, 28), intensity_range=(0, 180), margin=13)
random_rectangle(image, mask, x, y, color, radius_range=(3, 28))
random_triangle(image, mask, x, y, color, radius_range=(3, 28))