cd.data

This submodule contains data-related code and NumPy operations.

Datasets

BBBC039

class BBBC039Test(directory, download=False)

BBBC039 Test.

Test split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Train(directory, download=False)

BBBC039 Train.

Training split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Val(directory, download=False)

BBBC039 Validation.

Validation split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

download_bbbc039(directory)

Download BBBC039.

Download and extract the BBBC039 dataset to the given directory.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.

BBBC041

class BBBC041Test(directory, download=False)

BBBC041 Test data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Data directory.

  • download – Whether to download data.

class BBBC041Train(directory, download=False)

BBBC041 Train data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Data directory.

  • download – Whether to download data.

download_bbbc041(directory, url='https://data.broadinstitute.org/bbbc/BBBC041/malaria.zip')

Download BBBC041.

Download and extract the BBBC041 dataset to the given directory.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Synth

class SynthTest(directory, download=False, cache=True)

Synth Test data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthTrain(directory, download=False, cache=True)

Synth Train data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthVal(directory, download=False, cache=True)

Synth Val data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

download_synth(directory, url='https://celldetection.org/data/synth.zip')

Download Synth.

Download and extract the Synth dataset to the given directory.

Parameters:
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Data Operations

channels_first2channels_last(inputs: ndarray, spatial_dims=2, has_batch=False) → ndarray

Channels first to channels last.

Parameters:
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.
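
The layout change can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation; the helper name channels_first_to_last is hypothetical, and it assumes the channel axis sits directly after the optional batch axis.

```python
import numpy as np

def channels_first_to_last(inputs, spatial_dims=2, has_batch=False):
    """Hypothetical stand-in: move the channel axis behind the spatial axes."""
    # Channels-first layout: ([b,] c, *spatial). Target layout: ([b,] *spatial, c).
    assert inputs.ndim == spatial_dims + 1 + int(has_batch)
    channel_axis = 1 if has_batch else 0
    return np.moveaxis(inputs, channel_axis, -1)

x = np.zeros((3, 64, 48))                        # (c, h, w)
y = channels_first_to_last(x)                    # (h, w, c)
xb = np.zeros((8, 3, 64, 48))                    # (b, c, h, w)
yb = channels_first_to_last(xb, has_batch=True)  # (b, h, w, c)
```

The reverse direction would move the last axis back to the channel position in the same way.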

channels_last2channels_first(inputs: ndarray, spatial_dims=2, has_batch=False) → ndarray

Channels last to channels first.

Parameters:
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.

ensure_tensor(x, device=None, dtype=torch.float32)

Ensure tensor.

Maps ndarrays to Tensor. Possible shape mappings:

  • (h, w) -> (1, 1, h, w)

  • (h, w, c) -> (1, c, h, w)

  • (b, c, h, w) -> (b, c, h, w)

Parameters:
  • x – Inputs.

  • device – Either Device or a Module or Tensor to retrieve the device from.

  • dtype – Data type.

Returns:

Tensor
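
The shape mappings listed above can be traced with a NumPy-only sketch. The helper to_bchw is hypothetical and only mirrors the documented shapes; the actual function returns a torch Tensor and also handles device and dtype, which this sketch ignores.

```python
import numpy as np

def to_bchw(x):
    """Hypothetical helper mirroring the documented shape mappings (NumPy only)."""
    x = np.asarray(x)
    if x.ndim == 2:   # (h, w) -> (1, 1, h, w)
        return x[None, None]
    if x.ndim == 3:   # (h, w, c) -> (1, c, h, w)
        return np.moveaxis(x, -1, 0)[None]
    if x.ndim == 4:   # (b, c, h, w) is kept as is
        return x
    raise ValueError(f'Unsupported number of dimensions: {x.ndim}')

gray = to_bchw(np.zeros((32, 24)))
color = to_bchw(np.zeros((32, 24, 3)))
batch = to_bchw(np.zeros((4, 3, 32, 24)))
```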

labels2crops(labels: ndarray, image: ndarray)

Labels to crops.

Crop all objects represented in labels from the given image. Returns a list of image crops and a list of masks, each marking the object pixels of the respective crop.

Parameters:
  • labels – Label image. Array[h, w(, c)].

  • image – Image. Array[h, w, …].

Returns:

(crop_list, mask_list)
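
A rough sketch of the idea for a channel-free label image, using only NumPy. The name crops_from_labels is hypothetical, and treating labels <= 0 as background is an assumption; the library function may handle channels and background differently.

```python
import numpy as np

def crops_from_labels(labels, image):
    """Hypothetical sketch: one bounding-box crop and one mask per positive label."""
    crops, masks = [], []
    for label in np.unique(labels):
        if label <= 0:  # assumption: non-positive labels are background
            continue
        rows, cols = np.nonzero(labels == label)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        crops.append(image[r0:r1, c0:c1])            # image region of the object
        masks.append(labels[r0:r1, c0:c1] == label)  # object pixels within the crop
    return crops, masks

labels = np.zeros((6, 6), dtype=int)
labels[1:3, 1:4] = 1
labels[4:6, 0:2] = 2
image = np.arange(36).reshape(6, 6)
crops, masks = crops_from_labels(labels, image)
```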

labels2properties(labels: ndarray, *properties, iter_channels=True, **kwargs)

Labels to properties.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:
  • labels – Label image.

  • *properties – Property names. See [1] for details.

  • iter_channels – Whether to iterate channel axis of label image. If False label image is processed as is.

  • **kwargs – Keyword arguments for skimage.measure.regionprops.

Returns:

List of property lists.

labels2property_table(labels: ndarray, *properties, iter_channels=True, **kwargs) → DataFrame

Labels to property table.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:
  • labels – Label image.

  • *properties – Property names. See [1] for details.

  • iter_channels – Whether to iterate channel axis of label image. If False label image is processed as is.

  • **kwargs – Keyword arguments for skimage.measure.regionprops_table.

Returns:

Table (pd.DataFrame) of properties.

normalize_percentile(image, percentile=99.9, to_uint8=True)
pad_to_div(v, div=32, nd=2, **kwargs)

Pad to div.

Applies padding to the input Array so that its size is divisible by div.

Parameters:
  • v – Input array.

  • div – Div tuple. If single integer, nd is used to define number of dimensions to pad.

  • nd – Number of dimensions to pad. Only used if div is not a tuple or list.

  • **kwargs – Additional keyword arguments for np.pad.

Returns:

Padded Array.
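
The arithmetic behind such padding can be sketched as follows. The helper pad_to_multiple is hypothetical; padding only at the end of the leading nd dimensions is an assumption made for this illustration.

```python
import numpy as np

def pad_to_multiple(v, div=32, nd=2, **kwargs):
    """Hypothetical sketch: pad the leading dimensions up to a multiple of div."""
    divs = (div,) * nd if isinstance(div, int) else tuple(div)
    # (-size) % d is the number of elements missing to reach the next multiple of d
    pad_width = [(0, (-size) % d) for size, d in zip(v.shape, divs)]
    pad_width += [(0, 0)] * (v.ndim - len(pad_width))  # leave remaining axes untouched
    return np.pad(v, pad_width, **kwargs)

image = np.ones((100, 70, 3))
padded = pad_to_multiple(image, div=32)  # 100 -> 128, 70 -> 96
```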

pad_to_size(v, size, **kwargs)

Pad to size.

Applies padding to end of each dimension.

Parameters:
  • v – Input array.

  • size – Size tuple. First element corresponds to first dimension of input v.

  • **kwargs – Additional keyword arguments for np.pad.

Returns:

Padded Array.
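
A minimal NumPy sketch of end-of-dimension padding to a target size. The helper name pad_to_size_sketch is hypothetical, and leaving dimensions that already exceed the target unchanged is an assumption.

```python
import numpy as np

def pad_to_size_sketch(v, size, **kwargs):
    """Hypothetical sketch: pad the end of each dimension up to the given size."""
    pad_width = [(0, max(0, target - current)) for current, target in zip(v.shape, size)]
    pad_width += [(0, 0)] * (v.ndim - len(pad_width))  # axes not covered by size
    return np.pad(v, pad_width, **kwargs)

v = np.ones((3, 5))
padded_v = pad_to_size_sketch(v, (4, 8), constant_values=-1)
```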

padding_stack(*images, axis=0) → ndarray

Padding stack.

Stack images along axis. If images have different shapes, all images are padded to the largest shape.

Parameters:
  • *images – Images.

  • axis – Axis used for stacking.

Returns:

Array
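
The behavior described above can be approximated with NumPy as shown below. The helper stack_with_padding is hypothetical; zero padding at the end of each dimension and equal numbers of dimensions across images are assumptions of this sketch.

```python
import numpy as np

def stack_with_padding(*images, axis=0):
    """Hypothetical sketch: zero-pad all images to the largest shape, then stack."""
    target = np.max([img.shape for img in images], axis=0)  # per-axis maximum
    padded = [
        np.pad(img, [(0, int(t - s)) for s, t in zip(img.shape, target)])
        for img in images
    ]
    return np.stack(padded, axis=axis)

a = np.ones((2, 3))
b = np.ones((4, 2))
stacked = stack_with_padding(a, b)
```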

random_crop(inputs, size=None, *args, return_coords=False, return_slices=False, **kwargs)
regionprops2d(label_image, intensity_image=None, cache=True, *, extra_properties=None, spacing=None, offset=None)

Regionprops 2d.

Helper function that allows using skimage.measure.regionprops with label images that have channels.

Note

Labels may not be yielded in order!

Parameters:
  • label_image – Array[h, w] or Array[h, w, c].

  • intensity_image

  • cache

  • extra_properties

  • spacing

  • offset

Returns:

resample_contours(contours, num=None, close=True, epsilon=1e-06)

Resample contour.

Sample num equidistant points on each contour in contours.

Notes

  • Works for closed and open contours.

Parameters:
  • contours – Contours to sample from. Array[…, num’, 2] or list of Arrays.

  • num – Number of points.

  • close – Set True if contours contains closed contours, with the end point not being equal to the start point. Set False otherwise.

  • epsilon – Epsilon.

Returns:

Array[…, num, 2] or list of Arrays.
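
One common way to obtain equidistant points is arc-length interpolation. The sketch below handles a single closed contour and is not the library's implementation; the helper name resample_closed_contour is hypothetical, and distance is measured along the polygon outline.

```python
import numpy as np

def resample_closed_contour(contour, num):
    """Hypothetical sketch: place num points at equal arc-length steps on a closed polygon."""
    closed = np.concatenate([contour, contour[:1]])         # append start point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)   # length of each segment
    cum = np.concatenate([[0.0], np.cumsum(seg)])           # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], num, endpoint=False)
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.stack([x, y], axis=1)

square = np.array([[0., 0.], [4., 0.], [4., 4.], [0., 4.]])
points = resample_closed_contour(square, num=8)  # one point every 2 units of perimeter
```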

rgb_to_scalar(inputs: ndarray, dtype='int32')

RGB to scalar.

Convert RGB data to scalar, while maintaining color uniqueness.

Parameters:
  • inputs – Input array. Shape ([d1, d2, …, dn,] 3)

  • dtype – Data type

Returns:

Output array. Shape ([d1, d2, …, dn])
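
A uniqueness-preserving conversion can be sketched by packing the three channels into one integer. This is one possible encoding, assuming 8-bit channel values; the encoding actually used by the library may differ, and the helper name pack_rgb is hypothetical.

```python
import numpy as np

def pack_rgb(inputs, dtype='int32'):
    """Hypothetical sketch: pack 8-bit RGB values into one scalar per pixel."""
    rgb = inputs.astype(dtype)
    # Base-256 positional encoding: distinct colors map to distinct scalars
    return rgb[..., 0] * 256 * 256 + rgb[..., 1] * 256 + rgb[..., 2]

colors = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 0, 0]], dtype='uint8')
scalar = pack_rgb(colors.reshape(2, 2, 3))
```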

rle2mask(code, shape, transpose=True, min_index=1, constant=1) → ndarray

Run length encoding to mask.

Convert run length encoding to mask image.

Parameters:
  • code – Run length code, in one of the following forms. As ndarray: array([idx0, len0, idx1, len1, …]) or array([[idx0, len0], [idx1, len1], …]). As list: [idx0, len0, idx1, len1, …] or [[idx0, len0], [idx1, len1], …]. As str: ‘idx0 len0 idx1 len1 …’.

  • shape – Mask shape.

  • transpose – If True decode row by row, otherwise decode column by column.

  • min_index – Smallest pixel index. Depends on rle encoding.

  • constant – Pixels marked by rle are set to this value.

Returns:

Mask image.
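
The decoding step can be sketched as follows for the str and list forms. The helper decode_rle is hypothetical; it only covers a row-by-row (row-major) layout and leaves out the transpose option.

```python
import numpy as np

def decode_rle(code, shape, min_index=1, constant=1):
    """Hypothetical sketch: decode (start index, run length) pairs row by row."""
    if isinstance(code, str):
        code = [int(token) for token in code.split()]
    pairs = np.asarray(code, dtype=int).reshape(-1, 2)  # works for flat and nested input
    flat = np.zeros(int(np.prod(shape)), dtype='uint8')
    for start, length in pairs:
        begin = start - min_index                       # shift to 0-based indexing
        flat[begin:begin + length] = constant
    return flat.reshape(shape)

mask = decode_rle('2 3 9 2', shape=(3, 4))
same = decode_rle([[2, 3], [9, 2]], shape=(3, 4))
```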

split(n: int, *splits, shuffle=True, seed=None)

Split.

Splits a range of indices into multiple sets based on the given fractions.

Parameters:
  • n – The total number of indices.

  • *splits – Variable length list of floats representing the fraction of the dataset for each split.

  • shuffle – Whether to shuffle the indices before splitting.

  • seed – Seed for the random number generator.

Returns:

Split indices.
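
A minimal sketch of fraction-based index splitting. The helper split_indices is hypothetical; rounding each fraction to a whole number of indices and assigning any remainder to the last split are assumptions of this sketch.

```python
import numpy as np

def split_indices(n, *fractions, shuffle=True, seed=None):
    """Hypothetical sketch: split range(n) into consecutive chunks by fraction."""
    indices = np.arange(n)
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    sizes = [int(round(f * n)) for f in fractions]
    bounds = np.cumsum(sizes)[:-1]  # split points; the last chunk takes the remainder
    return np.split(indices, bounds)

train, val, test = split_indices(10, 0.8, 0.1, 0.1, seed=0)
```

Passing a fixed seed makes the shuffled split reproducible across runs.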

to_tensor(inputs: ndarray, spatial_dims=2, transpose=False, has_batch=False, dtype=None, device=None) → Tensor

Array to Tensor.

Converts numpy array to Tensor and optionally transposes from channels last to channels first.

Parameters:
  • inputs – Input array.

  • transpose – Whether to transpose channels from channels last to channels first.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

  • dtype – Data type of output Tensor.

  • device – Device of output Tensor.

Returns:

Tensor.

transpose_spatial(inputs: ndarray, inputs_channels_last=True, spatial_dims=2, has_batch=False)
universal_dict_collate_fn(batch, check_padding=True) → OrderedDict
boxes2masks(boxes, size)
fill_label_gaps_(labels)

Fill label gaps.

Ensure that labels greater than zero are within the interval [1, num_unique_labels_greater_zero]. Works fast if gaps are unlikely, slow otherwise. Alternatively, consider using np.vectorize. Labels <= 0 are preserved as is.
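
The effect on a label image can be sketched with np.unique. The helper close_label_gaps is hypothetical; unlike the in-place function documented here, it returns a copy, and its strategy is not necessarily the one used by the library.

```python
import numpy as np

def close_label_gaps(labels):
    """Hypothetical sketch: map positive labels to 1..k, keep labels <= 0 as is."""
    out = labels.copy()
    positive = labels > 0
    # np.unique sorts the distinct positive labels; inverse holds each pixel's rank
    _, inverse = np.unique(labels[positive], return_inverse=True)
    out[positive] = inverse + 1
    return out

gappy = np.array([[0, 2, 2], [7, -1, 7], [0, 9, 9]])
dense = close_label_gaps(gappy)
```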

Parameters:

labels

Returns:

fill_padding_(inputs, padding: int, constant=-1, preserve_existing=True, axes=(0, 1))
filter_instances_(labels, partials=True, partials_border=1, min_area=4, max_area=None, constant=-1, continuous=True)

Filter instances from label image.

Note

Filtered instance labels are set to constant. Labels might not be continuous afterwards.

Parameters:
  • labels

  • partials

  • partials_border

  • min_area

  • max_area

  • constant

  • continuous

Returns:

relabel_(label_stack, axis=2)

Relabel.

In-place relabeling of a label stack. After applying this op, the labels in label_stack are continuous, starting at 1. Negative labels remain untouched.

Notes

  • Uses the label function from skimage.morphology

Parameters:
  • label_stack – Array[height, width, channels].

  • axis – Channel axis.

remove_padding(inputs, padding: int)
remove_partials_(label_stack, border=1, constant=-1)
stack_labels(*maps, axis=2, dtype='int32', relabel=True)

Stack labels.

Parameters:
  • *maps – List[Union[Array[h, w], Array[h, w, 3]]]. Grayscale or RGB label maps. RGB labels are assumed to encode labels with color and are converted to grayscale labels before stacking.

  • axis – Stacking axis.

  • dtype – Output data type.

  • relabel – Whether to assign new labels or not.

Returns:

Array[h, w, c]. Label image.

unary_masks2labels(unary_masks, transpose=True)

Unary masks to labels.

Parameters:
  • unary_masks – List[Array[height, width]] or Array[num_objects, height, width] List of masks. Each mask is assumed to contain exactly one object.

  • transpose – If True label images are in channels last format, otherwise channels first.

Returns:

Label image. Array[height, width, num_objects] if transpose else Array[num_objects, height, width].
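
The conversion can be pictured with a short NumPy sketch. The helper masks_to_labels is hypothetical, and assigning label i + 1 to the i-th mask is an assumption made for illustration.

```python
import numpy as np

def masks_to_labels(unary_masks, transpose=True):
    """Hypothetical sketch: one label channel per single-object mask."""
    masks = np.asarray(unary_masks) > 0     # (num_objects, height, width)
    ids = np.arange(1, len(masks) + 1)      # assumption: mask i receives label i + 1
    labels = masks * ids[:, None, None]
    # Channels last: (height, width, num_objects)
    return np.moveaxis(labels, 0, -1) if transpose else labels

unary = np.zeros((2, 3, 3), dtype=bool)
unary[0, :2, :2] = True
unary[1, 1:, 1:] = True
label_image = masks_to_labels(unary)
```

Because each object keeps its own channel, the overlapping pixel at (1, 1) carries both labels.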

CPN Operations

class CPNTargetGenerator(samples, order, random_sampling=True, remove_partials=False, min_fg_dist=0.75, max_bg_dist=0.5, flag_fragmented=True, flag_fragmented_constant=-1)
property contours
feed(labels, border=1, min_area=1, max_area=None, **kwargs)

Notes

  • May apply in-place changes to labels.

Parameters:
  • labels – Single label image. E.g. of shape (height, width, channels).

  • border

  • min_area

  • max_area

property fourier
property locations
property reduced_labels
property resampled_contours

Returns: Tensor[num_contours, num_points, 2]

property sampled_contours

Returns: Tensor[num_contours, num_points, 2]

property sampled_sizes

Notes

The quality of sizes depends on how accurately sampled_contours represents the actual contours.

Returns:

Tensor[num_contours, 2]. Contains height and width for each contour.

property sampling
clip_contour_(contour, size)
contours2boxes(contours)

Contours to boxes.

Parameters:

contours – Array[num_contours, num_points, 2]. (x, y) format.

Returns:

Array[num_contours, 4]. (x0, y0, x1, y1) format.
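
For contours stored as one array, the boxes follow from per-contour minima and maxima. A minimal NumPy sketch with a hypothetical helper name:

```python
import numpy as np

def boxes_from_contours(contours):
    """Hypothetical sketch: axis-aligned bounding box (x0, y0, x1, y1) per contour."""
    contours = np.asarray(contours)  # (num_contours, num_points, 2), (x, y) format
    mins = contours.min(axis=1)      # (num_contours, 2): x0, y0
    maxs = contours.max(axis=1)      # (num_contours, 2): x1, y1
    return np.concatenate([mins, maxs], axis=1)

triangle = np.array([[[1, 2], [5, 2], [3, 7]]])
boxes = boxes_from_contours(triangle)
```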

contours2labels(contours, size, rounded=True, clip=True, initial_depth=1, gap=3, dtype='int32', ioa_thresh=None, sort_by=None, sort_descending=True, return_indices=False)

Contours to labels.

Convert contours to label image.

Notes

  • ~137 ms for contours.shape=(1284, 128, 2), size=(1000, 1000).

  • Label images come with channels, as contours may assign pixels to multiple objects. Since such multi-assignments cannot be easily encoded in a channel-free label image, channels are used. To remove channels refer to resolve_label_channels.

Parameters:
  • contours – Contours of a single image. Array[num_contours, num_points, 2] or List[Array[num_points, 2]].

  • size – Label image size. (height, width).

  • rounded – Whether to round contour coordinates.

  • clip – Whether to clip contour coordinates to given size.

  • initial_depth – Initial number of channels. More channels are used if necessary.

  • gap – Gap between instances.

  • dtype – Data type of label image.

  • ioa_thresh – Intersection over area threshold. Skip contours that have an intersection over own area (i.e. area of contour that already contains a label vs. area of contour) greater than ioa_thresh, compared to the union of all contours painted before. Note that the order of contours is relevant, as contours are processed iteratively. An IoA of 0 means no labels are present so far; an IoA of 1 means the entire contour area is already covered by other contours.

  • sort_by – Optional Array used to sort contours. Note that if this option is used, labels and contour indices no longer correspond.

  • sort_descending – Whether to sort by descending.

  • return_indices – Whether to return indices.

Returns:

Array[height, width, channels]. Since contours may assign pixels to multiple objects, the label image comes with channels. To remove channels refer to resolve_label_channels.

contours2overlay(contours, size, hue_range=(0, 180), saturation_range=(60, 133), value_range=(180, 256), rounded=True, clip=True, intermediate_dtype='float16')
contours2properties(contours, *properties, round=True, **kwargs)

Contours to properties.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:
  • contours – Contours.

  • *properties – Property names. See [1] for details.

  • round – Whether to round contours. Default is True.

  • **kwargs – Keyword arguments for skimage.measure.regionprops.

Returns:

List of property lists.

draw_contours(canvas, contours, val=(51, 255, 51), round=True, contour_idx=-1, thickness=2, **kwargs)
filter_contours_by_intensity(img, contours, min_intensity=None, max_intensity=200, aggregate='mean')
masks2labels(masks, connectivity=8, label_axis=2, count=False, reduce=<function max>, keepdims=True, **kwargs)

Masks to labels.

Notes

~ 11.7 ms for Array[25, 256, 256]. For same array skimage.measure.label takes ~ 17.9 ms.

Parameters:
  • masks – List[Array[height, width]] or Array[num_masks, height, width]

  • connectivity – 8 or 4 for 8-way or 4-way connectivity respectively

  • label_axis – Axis used for stacking label maps. One per mask.

  • count – Whether to count and return the number of components.

  • reduce – Callable used to reduce label_axis. If set to None, label_axis will not be reduced. Can be used if instances do not overlap.

  • **kwargs – Kwargs for cv2.connectedComponents.

Returns:

labels or (labels, count)

render_contour(contour, val=1, dtype='int32', round=False, reference=None)
resolve_label_channels(labels, method='dilation', max_iter=999, kernel=(3, 3))

Resolve label channels.

Remove channels from a label image. Pixels that are assigned to exactly one foreground label remain as is. Pixels that are assigned to multiple foreground labels present a conflict, as they cannot be described by a channel-less label image. Such conflicts are resolved by method.

Parameters:
  • labels – Label image. Array[h, w, c].

  • method – Method to resolve overlapping regions.

  • max_iter – Max iteration.

  • kernel – Kernel.

Returns:

Labels with channels removed. Array[h, w].

Eval Operations

class LabelMatcher(inputs=None, targets=None, iou_thresh=None, zero_division='warn', epsilon=1e-12)

Evaluation of a label image with a target label image.

Simple interface to evaluate a label image with a target label image with different metrics and IOU thresholds.

The IOU threshold is the minimum IOU that two objects must have to be counted as a match. Each target object can be matched with at most one inferred object and vice versa.

Initialize LabelMatcher object.

Parameters:
  • inputs – Input labels. Array[height, width, channels].

  • targets – Target labels. Array[height, width, channels].

  • iou_thresh – IOU threshold.

  • zero_division – One of (‘warn’, 0, 1). Sets the default return value for ZeroDivisionErrors. The default ‘warn’ will show a warning and return 0. For example: If there are no true positives and no false positives, precision will return the value of zero_division and optionally show a warning.
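
The matching rule and the role of zero_division can be sketched for channel-free label images. Everything below is a hypothetical illustration, not the class's implementation: match_labels pairs objects greedily by descending IOU, and f1_score returns zero_division when the denominator is zero.

```python
import numpy as np

def match_labels(inputs, targets, iou_thresh=0.5):
    """Hypothetical sketch: one-to-one matching, returns (tp, fp, fn)."""
    in_ids = [i for i in np.unique(inputs) if i > 0]
    tg_ids = [t for t in np.unique(targets) if t > 0]
    candidates = []
    for i in in_ids:
        a = inputs == i
        for t in tg_ids:
            b = targets == t
            inter = np.logical_and(a, b).sum()
            if inter and inter / np.logical_or(a, b).sum() >= iou_thresh:
                candidates.append((inter / np.logical_or(a, b).sum(), i, t))
    used_in, used_tg, tp = set(), set(), 0
    for _, i, t in sorted(candidates, reverse=True):  # best IOU first
        if i not in used_in and t not in used_tg:     # each object is matched at most once
            used_in.add(i)
            used_tg.add(t)
            tp += 1
    return tp, len(in_ids) - tp, len(tg_ids) - tp

def f1_score(tp, fp, fn, zero_division=0):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else zero_division

targets = np.zeros((8, 8), dtype=int)
targets[0:4, 0:4] = 1
targets[5:8, 5:8] = 2
inputs = np.zeros((8, 8), dtype=int)
inputs[0:4, 0:3] = 1   # overlaps target 1 with IOU 12 / 16 = 0.75
inputs[6:8, 0:2] = 2   # overlaps nothing
tp, fp, fn = match_labels(inputs, targets, iou_thresh=0.5)
f1 = f1_score(tp, fp, fn)
```

Raising the threshold above 0.75 in this toy case turns the only match into one false positive and one false negative.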

property f1
property false_negative_labels
property false_negatives
property false_positive_labels
property false_positives
filter_and_threshold(*a, **k)
property fowlkes_mallows
property iou_thresh
property jaccard
property precision
property recall
property true_positive_labels
property true_positives
update(inputs, targets, iou_thresh=None)
class LabelMatcherList(*args, epsilon=1e-12, rank=None, num_ranks=None, device=None, cache=False, **kwargs)

Label Matcher List.

Simple interface to get averaged results from a list of LabelMatcher objects.

Note

Distributed use assumes that each example is shown exactly once. Duplicates are not removed. Check your sampler accordingly. Also make sure that each rank calls the same methods in the same order.

Examples

>>> lml = LabelMatcherList([
...     LabelMatcher(pred_labels_0, target_labels0),
...     LabelMatcher(pred_labels_1, target_labels1),
... ])
>>> lml.iou_thresh = 0.5  # set iou_thresh for all LabelMatcher objects
>>> print('Average F1 score for iou threshold 0.5:', lml.avg_f1)
Average F1 score for iou threshold 0.5: 0.92
>>> # Testing different IOU thresholds:
>>> for lml.iou_thresh in (.5, .75):
...     print('thresh:', lml.iou_thresh, '   f1:', lml.avg_f1)
thresh: 0.5          f1: 0.92
thresh: 0.75         f1: 0.91

Parameters:
  • *args

  • epsilon

  • rank – Rank (e.g. trainer.global_rank). Allows for distributed communication. If not passed, results will only be computed locally. If passed, results are synced across all ranks.

  • num_ranks – Number of ranks (e.g. trainer.world_size). Allows for distributed communication. If not passed, results will only be computed locally. If passed results are synced across all ranks.

  • cache – Whether to cache aggregated results. Currently only for distributed environments.

  • **kwargs

append(_LabelMatcherList__object)
property avg_f1

Average F1 score.

property avg_fowlkes_mallows
property avg_jaccard

Average Jaccard index.

property avg_precision

Average precision.

property avg_recall

Average recall.

clear()
clear_cache()
copy()
property distributed
extend(_LabelMatcherList__iterable)
property f1

F1 score from average recall and precision.

property f1_np

F1 score from negatives and positives.
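
The three F1 aggregates generally differ. The plain-Python sketch below uses made-up per-image counts and reads "from negatives and positives" as counts summed over all items, which is an interpretation of the descriptions above rather than a confirmed detail.

```python
# Hypothetical (true positives, false positives, false negatives) per image
counts = [(9, 1, 0), (1, 0, 4)]

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_from(p, r):
    return 2 * p * r / (p + r)

# Average of per-image F1 scores (cf. avg_f1)
per_image = [f1_from(precision(tp, fp), recall(tp, fn)) for tp, fp, fn in counts]
avg_f1 = sum(per_image) / len(per_image)

# F1 of averaged precision and recall (cf. f1)
avg_p = sum(precision(tp, fp) for tp, fp, _ in counts) / len(counts)
avg_r = sum(recall(tp, fn) for tp, _, fn in counts) / len(counts)
f1_of_averages = f1_from(avg_p, avg_r)

# F1 of pooled counts (cf. f1_np, under the assumption stated above)
tp, fp, fn = (sum(c) for c in zip(*counts))
f1_of_counts = 2 * tp / (2 * tp + fp + fn)
```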

property false_negatives
property false_positives
property fowlkes_mallows_np
insert(_LabelMatcherList__index, _LabelMatcherList__object)
property iou_thresh

Gets local unique IOU thresholds from all items. If there is only one unique threshold, it is returned.

property jaccard_np
property length: int
pop(*args, **kwargs)
property precision
property recall
property true_positives
get_pos_labels(v)
intersection_mask(a, b)
labels2counts(a)
labels_exist(func)
matching_labels(a, b)
vec2matches(v)

Toy Data

random_circle(image, mask, x, y, color, radius_range=(3, 28))
random_ellipse(image, mask, x, y, color, radius_range=(3, 28))
random_geometric_objects(height=256, width=256, radius_range=(3, 28), intensity_range=(0, 180), margin=13)
random_rectangle(image, mask, x, y, color, radius_range=(3, 28))
random_triangle(image, mask, x, y, color, radius_range=(3, 28))