cd.data

This submodule contains data-related code and NumPy operations.

Datasets

BBBC039

class BBBC039Test(directory, download=False)

BBBC039 Test.

Test split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.
download – Whether to download the dataset.

class BBBC039Train(directory, download=False)

BBBC039 Train.

Training split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.
download – Whether to download the dataset.

class BBBC039Val(directory, download=False)

BBBC039 Validation.

Validation split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.
download – Whether to download the dataset.

download_bbbc039(directory)

Download BBBC039.

Download and extract the BBBC039 dataset to given directory.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.

BBBC041

class BBBC041Test(directory, download=False)

BBBC041 Test data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:

directory – Data directory.
download – Whether to download data.

class BBBC041Train(directory, download=False)

BBBC041 Train data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:

directory – Data directory.
download – Whether to download data.

download_bbbc041(directory, url='https://data.broadinstitute.org/bbbc/BBBC041/malaria.zip')

Download BBBC041.

Download and extract the BBBC041 dataset to given directory.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:

directory – Root directory.
url – Download URL (this dataset is distributed in a single zip file).

Synth

class SynthTest(directory, download=False, cache=True)

Synth Test data.

Parameters:

directory – Data directory. E.g. directory='data/synth'.
download – Whether to download data.
cache – Whether to hold data in memory.

class SynthTrain(directory, download=False, cache=True)

Synth Train data.

Parameters:

directory – Data directory. E.g. directory='data/synth'.
download – Whether to download data.
cache – Whether to hold data in memory.

class SynthVal(directory, download=False, cache=True)

Synth Val data.

Parameters:

directory – Data directory. E.g. directory='data/synth'.
download – Whether to download data.
cache – Whether to hold data in memory.

download_synth(directory, url='https://celldetection.org/data/synth.zip')

Download Synth.

Download and extract the Synth dataset to given directory.

Parameters:

directory – Root directory.
url – Download URL (this dataset is distributed in a single zip file).

Data Operations

channels_first2channels_last(inputs: ndarray, spatial_dims=2, has_batch=False) → ndarray

Channels first to channels last.

Parameters:

inputs – Input array.
spatial_dims – Number of spatial dimensions.
has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.

channels_last2channels_first(inputs: ndarray, spatial_dims=2, has_batch=False) → ndarray

Channels last to channels first.

Parameters:

inputs – Input array.
spatial_dims – Number of spatial dimensions.
has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.
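
As a rough illustration of what channels_first2channels_last and channels_last2channels_first do for the default spatial_dims=2, the same axis moves can be sketched with plain NumPy. The helper names to_channels_last and to_channels_first are hypothetical, and this is not the library's implementation.

```python
import numpy as np

def to_channels_last(x, has_batch=False):
    # Hypothetical helper: the channel axis is 0 without a batch axis, 1 with one.
    return np.moveaxis(x, 1 if has_batch else 0, -1)

def to_channels_first(x, has_batch=False):
    # Hypothetical helper: move the trailing channel axis back to the front.
    return np.moveaxis(x, -1, 1 if has_batch else 0)

chw = np.zeros((3, 64, 48))               # (c, h, w)
hwc = to_channels_last(chw)               # (h, w, c)
bchw = np.zeros((2, 3, 64, 48))           # (b, c, h, w)
bhwc = to_channels_last(bchw, has_batch=True)
round_trip = to_channels_first(bhwc, has_batch=True)
```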

ensure_tensor(x, device=None, dtype=torch.float32)

Ensure tensor.

Mapping ndarrays to Tensor. Possible shape mappings:

- (h, w) -> (1, 1, h, w)
- (h, w, c) -> (1, c, h, w)
- (b, c, h, w) -> (b, c, h, w)

Parameters:

x – Inputs.
device – Either Device or a Module or Tensor to retrieve the device from.
dtype – Data type.

Returns:

Tensor
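
The shape mappings listed for ensure_tensor can be sketched with NumPy alone. This only mirrors the shape logic: the actual function returns a torch Tensor and also handles device and dtype. The helper name to_bchw is hypothetical.

```python
import numpy as np

def to_bchw(x):
    """Hypothetical helper: bring an array into (b, c, h, w) layout."""
    x = np.asarray(x)
    if x.ndim == 2:       # (h, w) -> (1, 1, h, w)
        return x[None, None]
    if x.ndim == 3:       # (h, w, c) -> (1, c, h, w)
        return np.moveaxis(x, -1, 0)[None]
    if x.ndim == 4:       # (b, c, h, w) is returned unchanged
        return x
    raise ValueError(f'Unsupported number of dimensions: {x.ndim}')

gray = to_bchw(np.zeros((32, 24)))
rgb = to_bchw(np.zeros((32, 24, 3)))
batch = to_bchw(np.zeros((4, 3, 32, 24)))
```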

labels2crops(labels: ndarray, image: ndarray)

Labels to crops.

Crop all objects that are represented in labels from given image and return a list of all image crops, and a list of masks, each marking object pixels for respective crop.

Parameters:

labels – Label image. Array[h, w(, c)].
image – Image. Array[h, w, …].

Returns:

(crop_list, mask_list)
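
A minimal sketch of the idea behind labels2crops, assuming a channel-free label image of shape (h, w) and axis-aligned bounding-box crops. The helper name crops_from_labels is hypothetical; the library may handle channels and crop extents differently.

```python
import numpy as np

def crops_from_labels(labels, image):
    """Hypothetical helper: one bounding-box crop and one boolean mask per positive label."""
    crops, masks = [], []
    for lbl in np.unique(labels):
        if lbl <= 0:  # skip background
            continue
        ys, xs = np.nonzero(labels == lbl)
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        crops.append(image[y0:y1, x0:x1])
        masks.append(labels[y0:y1, x0:x1] == lbl)  # marks object pixels inside the crop
    return crops, masks

labels = np.zeros((6, 6), dtype=int)
labels[1:3, 1:4] = 1
labels[4:6, 2:3] = 2
image = np.arange(36).reshape(6, 6)
crops, masks = crops_from_labels(labels, image)
```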

labels2properties(labels: ndarray, *properties, iter_channels=True, **kwargs)

Labels to properties.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:

labels – Label image.
*properties – Property names. See [1] for details.
iter_channels – Whether to iterate channel axis of label image. If False label image is processed as is.
**kwargs – Keyword arguments for skimage.measure.regionprops.

Returns:

List of property lists.

labels2property_table(labels: ndarray, *properties, iter_channels=True, **kwargs) → DataFrame

Labels to property table.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:

labels – Label image.
*properties – Property names. See [1] for details.
iter_channels – Whether to iterate channel axis of label image. If False label image is processed as is.
**kwargs – Keyword arguments for skimage.measure.regionprops_table.

Returns:

Table (pd.DataFrame) of properties.

normalize_percentile(image, percentile=99.9, to_uint8=True)

pad_to_div(v, div=32, nd=2, **kwargs)

Pad to div.

Applies padding to input Array to make it divisible by div.

Parameters:

v – Input array.
div – Div tuple. If single integer, nd is used to define number of dimensions to pad.
nd – Number of dimensions to pad. Only used if div is not a tuple or list.
**kwargs – Additional keyword arguments for np.pad.

Returns:

Padded Array.
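
The arithmetic behind pad_to_div can be sketched as follows. This sketch assumes that the first nd dimensions are padded and that padding is appended at the end of each dimension; neither detail is stated above. The helper name pad_to_divisible is hypothetical.

```python
import numpy as np

def pad_to_divisible(v, div=32, nd=2, **kwargs):
    """Hypothetical helper: pad leading dimensions up to the next multiple of div."""
    divs = (div,) * nd if isinstance(div, int) else tuple(div)
    # (-s) % d is the number of elements missing to reach the next multiple of d.
    pad = [(0, (-s) % d) for s, d in zip(v.shape, divs)]
    pad += [(0, 0)] * (v.ndim - len(divs))  # leave remaining dimensions untouched
    return np.pad(v, pad, **kwargs)

image = np.ones((100, 70, 3))
padded = pad_to_divisible(image, div=32)
padded_mixed = pad_to_divisible(image, div=(8, 16))
```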

pad_to_size(v, size, **kwargs)

Pad to size.

Applies padding to end of each dimension.

Parameters:

v – Input array.
size – Size tuple. First element corresponds to first dimension of input v.
**kwargs – Additional keyword arguments for np.pad.

Returns:

Padded Array.

padding_stack(*images, axis=0) → ndarray

Padding stack.

Stack images along axis. If images have different shapes, all images are padded to the largest shape.

Parameters:

*images – Images.
axis – Axis used for stacking.

Returns:

Array
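
A minimal sketch of the padding_stack behavior, assuming all images have the same number of dimensions and are zero-padded at the end of each dimension. The helper name stack_with_padding is hypothetical.

```python
import numpy as np

def stack_with_padding(*images, axis=0):
    """Hypothetical helper: pad every image to the largest shape, then stack."""
    target = np.max([im.shape for im in images], axis=0)  # per-dimension maximum
    padded = [
        np.pad(im, [(0, t - s) for s, t in zip(im.shape, target)])
        for im in images
    ]
    return np.stack(padded, axis=axis)

a = np.ones((2, 3))
b = np.ones((4, 2))
stacked = stack_with_padding(a, b)
```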

random_crop(inputs, size=None, *args, return_coords=False, return_slices=False, **kwargs)

regionprops2d(label_image, intensity_image=None, cache=True, *, extra_properties=None, spacing=None, offset=None)

Regionprops 2d.

Helper function that allows using skimage.measure.regionprops with label images that have channels.

Note

Labels may not be yielded in order!

Parameters:

label_image – Array[h, w] or Array[h, w, c].
intensity_image
cache
extra_properties
spacing
offset

Returns:

resample_contours(contours, num=None, close=True, epsilon=1e-06)

Resample contour.

Sample num equidistant points on each contour in contours.

Notes

Works for closed and open contours.

Parameters:

contours – Contours to sample from. Array[…, num’, 2] or list of Arrays.
num – Number of points.
close – Set True if contours contains closed contours, with the end point not being equal to the start point. Set False otherwise.
epsilon – Epsilon.

Returns:

Array[…, num, 2] or list of Arrays.
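
One way to picture what resample_contours does for a single contour is linear interpolation over cumulative arc length. This is a sketch of that general technique, not necessarily how the library implements it; the helper name resample is hypothetical.

```python
import numpy as np

def resample(contour, num, close=True):
    """Hypothetical helper: sample num points at equal arc-length spacing."""
    pts = np.asarray(contour, dtype=float)
    if close:
        # Closed contour whose end point is not repeated: append the start point.
        pts = np.concatenate([pts, pts[:1]])
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])        # arc length at each vertex
    # For closed contours the last sample must not coincide with the first one.
    t = np.linspace(0.0, cum[-1], num, endpoint=not close)
    return np.stack([np.interp(t, cum, pts[:, i]) for i in range(2)], axis=1)

square = np.array([[0, 0], [4, 0], [4, 4], [0, 4]])
resampled = resample(square, num=8)
open_line = resample(np.array([[0, 0], [10, 0]]), num=3, close=False)
```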

rgb_to_scalar(inputs: ndarray, dtype='int32')

RGB to scalar.

Convert RGB data to scalar, while maintaining color uniqueness.

Parameters:

inputs – Input array. Shape ([d1, d2, …, dn,] 3)
dtype – Data type

Returns:

Output array. Shape ([d1, d2, …, dn])
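
A sketch of one possible encoding that keeps colors unique, in the spirit of rgb_to_scalar. Packing the three 8-bit channels into a single integer is an assumption made for illustration; the library's actual encoding is not specified above. The helper name pack_rgb is hypothetical.

```python
import numpy as np

def pack_rgb(inputs, dtype='int32'):
    """Hypothetical helper: pack 8-bit RGB values into one integer per pixel."""
    x = inputs.astype(dtype)  # widen first to avoid uint8 overflow
    return x[..., 0] * 65536 + x[..., 1] * 256 + x[..., 2]

rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype='uint8')
scalar = pack_rgb(rgb)
```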

rle2mask(code, shape, transpose=True, min_index=1, constant=1) → ndarray

Run length encoding to mask.

Convert run length encoding to mask image.

Parameters:

code – Run length code.
As ndarray: array([idx0, len0, idx1, len1, …]) or array([[idx0, len0], [idx1, len1], …])
As list: [idx0, len0, idx1, len1, …] or [[idx0, len0], [idx1, len1], …]
As str: ‘idx0 len0 idx1 len1 …’
shape – Mask shape.
transpose – If True decode row by row, otherwise decode column by column.
min_index – Smallest pixel index. Depends on rle encoding.
constant – Pixels marked by rle are set to this value.

Returns:

Mask image.
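
The decoding described for rle2mask can be sketched as below. The sketch follows the parameter descriptions above literally (transpose=True decodes row by row) and has not been checked against the library's behavior. The helper name decode_rle is hypothetical.

```python
import numpy as np

def decode_rle(code, shape, transpose=True, min_index=1, constant=1):
    """Hypothetical helper: turn (index, length) runs into a 2D mask."""
    if isinstance(code, str):
        code = [int(c) for c in code.split()]
    runs = np.asarray(code).reshape(-1, 2)  # accepts flat or paired input
    flat = np.zeros(shape[0] * shape[1], dtype='uint8')
    for idx, length in runs:
        start = idx - min_index             # shift to 0-based pixel index
        flat[start:start + length] = constant
    if transpose:
        return flat.reshape(shape)          # runs advance along rows
    return flat.reshape(shape[::-1]).T      # runs advance along columns

rows = decode_rle('1 2 5 1', shape=(2, 3))
cols = decode_rle([[1, 2], [5, 1]], shape=(2, 3), transpose=False)
```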

split(n: int, *splits, shuffle=True, seed=None)

Split.

Splits a range of indices into multiple sets based on the given fractions.

Parameters:

n – The total number of indices.
*splits – Variable length list of floats representing the fraction of the dataset for each split.
shuffle – Whether to shuffle the indices before splitting.
seed – Seed for the random number generator.

Returns:

Split indices.
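
A minimal sketch of fraction-based index splitting as described for split, using only the standard library. How the library rounds split sizes or treats leftover indices is not stated above, so the rounding here is an assumption. The helper name split_indices is hypothetical.

```python
import random

def split_indices(n, *splits, shuffle=True, seed=None):
    """Hypothetical helper: split range(n) into consecutive chunks by fraction."""
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded for reproducibility
    result, start = [], 0
    for fraction in splits:
        stop = start + int(round(fraction * n))
        result.append(indices[start:stop])
        start = stop
    return result

train, val, test = split_indices(10, 0.8, 0.1, 0.1, seed=0)
ordered = split_indices(10, 0.5, 0.5, shuffle=False)
```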

to_tensor(inputs: ndarray, spatial_dims=2, transpose=False, has_batch=False, dtype=None, device=None) → Tensor

Array to Tensor.

Converts numpy array to Tensor and optionally transposes from channels last to channels first.

Parameters:

inputs – Input array.
transpose – Whether to transpose channels from channels last to channels first.
spatial_dims – Number of spatial dimensions.
has_batch – Whether inputs has a batch dimension.
dtype – Data type of output Tensor.
device – Device of output Tensor.

Returns:

Tensor.

transpose_spatial(inputs: ndarray, inputs_channels_last=True, spatial_dims=2, has_batch=False)

universal_dict_collate_fn(batch, check_padding=True) → OrderedDict

boxes2masks(boxes, size)

fill_label_gaps_(labels)

Fill label gaps.

Ensure that labels greater than zero are within the interval [1, num_unique_labels_greater_zero]. Fast if gaps are unlikely, slow otherwise; alternatively, consider using np.vectorize. Labels <= 0 are preserved as is.

Parameters:

labels

Returns:
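
As a rough illustration of the effect of fill_label_gaps_, the following sketch remaps positive labels to a gap-free range with np.unique. Unlike the in-place function documented here, it returns a copy, and it is not the library's algorithm. The helper name fill_gaps is hypothetical.

```python
import numpy as np

def fill_gaps(labels):
    """Hypothetical helper: map positive labels to 1..k, keep labels <= 0."""
    out = labels.copy()
    positive = labels > 0
    # inverse holds, for each positive pixel, the rank of its label among unique labels.
    _, inverse = np.unique(labels[positive], return_inverse=True)
    out[positive] = inverse.reshape(-1) + 1
    return out

labels = np.array([[0, 5, 5],
                   [-1, 9, 2]])
filled = fill_gaps(labels)
```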

fill_padding_(inputs, padding: int, constant=-1, preserve_existing=True, axes=(0, 1))

filter_instances_(labels, partials=True, partials_border=1, min_area=4, max_area=None, constant=-1, continuous=True)

Filter instances from label image.

Note

Filtered instance labels are set to constant. Labels might not be continuous afterwards.

Parameters:

labels
partials
partials_border
min_area
max_area
constant
continuous

Returns:

relabel_(label_stack, axis=2)

Relabel.

In-place relabeling of a label stack. After applying this op, the labels in label_stack are continuous, starting at 1. Negative labels remain untouched.

Notes

Uses the label function from skimage.morphology.

Parameters:

label_stack – Array[height, width, channels].
axis – Channel axis.

remove_padding(inputs, padding: int)

remove_partials_(label_stack, border=1, constant=-1)

stack_labels(*maps, axis=2, dtype='int32', relabel=True)

Stack labels.

Parameters:

*maps – List[Union[Array[h, w], Array[h, w, 3]]]. Grayscale or RGB label maps. RGB labels are assumed to encode labels with color and are converted to grayscale labels before stacking.
axis – Stacking axis.
dtype – Output data type.
relabel – Whether to assign new labels or not.

Returns:

Array[h, w, c]. Label image.

unary_masks2labels(unary_masks, transpose=True)

Unary masks to labels.

Parameters:

unary_masks – List[Array[height, width]] or Array[num_objects, height, width] List of masks. Each mask is assumed to contain exactly one object.
transpose – If True label images are in channels last format, otherwise channels first.

Returns:

Label image. Array[height, width, num_objects] if transpose else Array[num_objects, height, width].
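
A sketch of the layout that unary_masks2labels describes, assuming the i-th mask receives label i + 1 (the label numbering is not stated above). The helper name masks_to_labels is hypothetical.

```python
import numpy as np

def masks_to_labels(unary_masks, transpose=True):
    """Hypothetical helper: one label channel per single-object mask."""
    masks = np.asarray(unary_masks) > 0                  # (num_objects, h, w)
    ids = np.arange(1, len(masks) + 1).reshape(-1, 1, 1)
    labels = masks * ids                                 # object i gets label i + 1
    return np.moveaxis(labels, 0, -1) if transpose else labels

m0 = np.zeros((3, 3), dtype=bool)
m0[0, 0] = True
m1 = np.zeros((3, 3), dtype=bool)
m1[2, 2] = True
channels_last = masks_to_labels([m0, m1])
channels_first = masks_to_labels([m0, m1], transpose=False)
```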

CPN Operations

class CPNTargetGenerator(samples, order, random_sampling=True, remove_partials=False, min_fg_dist=0.75, max_bg_dist=0.5, flag_fragmented=True, flag_fragmented_constant=-1)

property contours

feed(labels, border=1, min_area=1, max_area=None, **kwargs)

Notes

May apply in-place changes to labels.

Parameters:

labels – Single label image. E.g. of shape (height, width, channels).
border
min_area
max_area

property fourier

property locations

property reduced_labels

property resampled_contours: Returns: Tensor[num_contours, num_points, 2]

property sampled_contours: Returns: Tensor[num_contours, num_points, 2]

property sampled_sizes

Notes

The quality of sizes depends on how accurately sampled_contours represents the actual contours.

Returns:

Tensor[num_contours, 2]. Contains height and width for each contour.

property sampling

clip_contour_(contour, size)

contours2boxes(contours)

Contours to boxes.

Parameters:

contours – Array[num_contours, num_points, 2]. (x, y) format.

Returns:

Array[num_contours, 4]. (x0, y0, x1, y1) format.
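
The conversion that contours2boxes describes amounts to taking per-contour coordinate minima and maxima. A minimal NumPy sketch, with the hypothetical helper name boxes_from_contours:

```python
import numpy as np

def boxes_from_contours(contours):
    """Hypothetical helper: (x0, y0, x1, y1) box around each (x, y) contour."""
    mins = contours.min(axis=1)   # (num_contours, 2) -> x0, y0
    maxs = contours.max(axis=1)   # (num_contours, 2) -> x1, y1
    return np.concatenate([mins, maxs], axis=1)

contours = np.array([[[1, 2], [5, 3], [2, 7]],
                     [[0, 0], [3, 0], [3, 4]]])
boxes = boxes_from_contours(contours)
```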

contours2labels(contours, size, rounded=True, clip=True, initial_depth=1, gap=3, dtype='int32', ioa_thresh=None, sort_by=None, sort_descending=True, return_indices=False)

Contours to labels.

Convert contours to label image.

Notes

~137 ms for contours.shape=(1284, 128, 2), size=(1000, 1000).
Label images come with channels, as contours may assign pixels to multiple objects. Since such multi-assignments cannot be easily encoded in a channel-free label image, channels are used. To remove channels refer to resolve_label_channels.

Parameters:

contours – Contours of a single image. Array[num_contours, num_points, 2] or List[Array[num_points, 2]].
size – Label image size. (height, width).
rounded – Whether to round contour coordinates.
clip – Whether to clip contour coordinates to given size.
initial_depth – Initial number of channels. More channels are used if necessary.
gap – Gap between instances.
dtype – Data type of label image.
ioa_thresh – Intersection over area threshold. Skip contours whose intersection over own area (i.e. area of contour that already contains a label vs. area of contour) is greater than ioa_thresh, compared to the union of all contours painted before. Note that the order of contours is relevant, as contours are processed iteratively. An IoA of 0 means no labels are present so far; an IoA of 1 means the entire contour area is already covered by other contours.
sort_by – Optional Array used to sort contours. Note that if this option is used, labels and contour indices no longer correspond.
sort_descending – Whether to sort by descending.
return_indices – Whether to return indices.

Returns:

Array[height, width, channels]. Since contours may assign pixels to multiple objects, the label image comes with channels. To remove channels refer to resolve_label_channels.

contours2overlay(contours, size, hue_range=(0, 180), saturation_range=(60, 133), value_range=(180, 256), rounded=True, clip=True, intermediate_dtype='float16')

contours2properties(contours, *properties, round=True, **kwargs)

Contours to properties.

References

[1] https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.regionprops

Parameters:

contours – Contours.
*properties – Property names. See [1] for details.
round – Whether to round contours. Default is True.
**kwargs – Keyword arguments for skimage.measure.regionprops.

Returns:

List of property lists.

draw_contours(canvas, contours, val=(51, 255, 51), round=True, contour_idx=-1, thickness=2, **kwargs)

filter_contours_by_intensity(img, contours, min_intensity=None, max_intensity=200, aggregate='mean')

masks2labels(masks, connectivity=8, label_axis=2, count=False, reduce=<function max>, keepdims=True, **kwargs)

Masks to labels.

Notes

~ 11.7 ms for Array[25, 256, 256]. For same array skimage.measure.label takes ~ 17.9 ms.

Parameters:

masks – List[Array[height, width]] or Array[num_masks, height, width]
connectivity – 8 or 4 for 8-way or 4-way connectivity respectively
label_axis – Axis used for stacking label maps. One per mask.
count – Whether to count and return the number of components.
reduce – Callable used to reduce label_axis. If set to None, label_axis will not be reduced. Can be used if instances do not overlap.
**kwargs – Kwargs for cv2.connectedComponents.

Returns:

labels or (labels, count)

render_contour(contour, val=1, dtype='int32', round=False, reference=None)

resolve_label_channels(labels, method='dilation', max_iter=999, kernel=(3, 3))

Resolve label channels.

Remove channels from a label image. Pixels that are assigned to exactly one foreground label remain as is. Pixels that are assigned to multiple foreground labels present a conflict, as they cannot be described by a channel-less label image. Such conflicts are resolved by method.

Parameters:

labels – Label image. Array[h, w, c].
method – Method to resolve overlapping regions.
max_iter – Max iteration.
kernel – Kernel.

Returns:

Labels with channels removed. Array[h, w].
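
The distinction that resolve_label_channels draws between unambiguous pixels and conflicts can be sketched as follows. The sketch only separates the two cases; how conflicts are then filled in (for example by the 'dilation' method) is not shown. The helper name split_conflicts is hypothetical.

```python
import numpy as np

def split_conflicts(labels):
    """Hypothetical helper: keep single-label pixels, flag multi-label pixels."""
    fg_count = (labels > 0).sum(axis=-1)       # foreground labels per pixel
    # Pixels with exactly one foreground label keep that label.
    resolved = np.where(fg_count == 1, labels.max(axis=-1), 0)
    conflicts = fg_count > 1                   # still to be resolved by some method
    return resolved, conflicts

channel0 = np.array([[1, 1], [0, 0]])
channel1 = np.array([[0, 2], [2, 0]])
labels = np.stack([channel0, channel1], axis=-1)   # Array[h, w, c]
resolved, conflicts = split_conflicts(labels)
```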

Eval Operations

class LabelMatcher(inputs=None, targets=None, iou_thresh=None, zero_division='warn', epsilon=1e-12)

Evaluation of a label image with a target label image.

Simple interface to evaluate a label image with a target label image with different metrics and IOU thresholds.

The IOU threshold is the minimum IOU that two objects must have to be counted as a match. Each target object can be matched with at most one inferred object and vice versa.

Initialize LabelMatcher object.

Parameters:

inputs – Input labels. Array[height, width, channels].
targets – Target labels. Array[height, width, channels].
iou_thresh – IOU threshold.
zero_division – One of (‘warn’, 0, 1). Sets the default return value for ZeroDivisionErrors. The default ‘warn’ will show a warning and return 0. For example: If there are no true positives and no false positives, precision will return the value of zero_division and optionally show a warning.
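
To make the matching rule of LabelMatcher concrete, here is a simplified sketch for channel-free 2D label images: candidate pairs are ranked by IOU and matched greedily one-to-one, then counts are turned into scores. The greedy strategy, the >= comparison, and the helper names match_labels and scores are assumptions for illustration, not the class's implementation.

```python
import numpy as np

def match_labels(pred, target, iou_thresh=0.5):
    """Hypothetical helper: return (true positives, false positives, false negatives)."""
    pred_ids = [i for i in np.unique(pred) if i > 0]
    target_ids = [i for i in np.unique(target) if i > 0]
    candidates = []
    for p in pred_ids:
        for t in target_ids:
            inter = np.logical_and(pred == p, target == t).sum()
            if inter:
                iou = inter / np.logical_or(pred == p, target == t).sum()
                if iou >= iou_thresh:
                    candidates.append((iou, p, t))
    used_p, used_t = set(), set()
    for _, p, t in sorted(candidates, reverse=True):  # best IOU first
        if p not in used_p and t not in used_t:       # each object matches at most once
            used_p.add(p)
            used_t.add(t)
    tp = len(used_p)
    return tp, len(pred_ids) - tp, len(target_ids) - tp

def scores(tp, fp, fn, zero_division=0):
    """Hypothetical helper: precision, recall and F1 with a zero-division fallback."""
    precision = tp / (tp + fp) if tp + fp else zero_division
    recall = tp / (tp + fn) if tp + fn else zero_division
    total = precision + recall
    f1 = 2 * precision * recall / total if total else zero_division
    return precision, recall, f1

pred = np.zeros((10, 10), dtype=int)
pred[0:4, 0:4] = 1      # overlaps target object 1 with IOU 12 / 16
pred[6:8, 6:8] = 2      # overlaps nothing
target = np.zeros((10, 10), dtype=int)
target[0:4, 0:3] = 1
target[5:9, 0:2] = 2
counts_loose = match_labels(pred, target, iou_thresh=0.5)
counts_strict = match_labels(pred, target, iou_thresh=0.8)
```

Raising the threshold from 0.5 to 0.8 turns the single match (IOU 0.75) into one false positive and one false negative.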

property f1

property false_negative_labels

property false_negatives

property false_positive_labels

property false_positives

filter_and_threshold(*a, **k)

property fowlkes_mallows

property iou_thresh

property jaccard

property precision

property recall

property true_positive_labels

property true_positives

update(inputs, targets, iou_thresh=None)

class LabelMatcherList(*args, epsilon=1e-12, rank=None, num_ranks=None, device=None, cache=False, **kwargs)

Label Matcher List.

Simple interface to get averaged results from a list of LabelMatcher objects.

Note

Distributed use assumes that each example is shown exactly once. Duplicates are not removed. Check your sampler accordingly. Also make sure that each rank calls the same methods in the same order.

Examples

>>> lml = LabelMatcherList([
...     LabelMatcher(pred_labels_0, target_labels0),
...     LabelMatcher(pred_labels_1, target_labels1),
... ])
>>> lml.iou_thresh = 0.5  # set iou_thresh for all LabelMatcher objects
>>> print('Average F1 score for iou threshold 0.5:', lml.avg_f1)
Average F1 score for iou threshold 0.5: 0.92

>>> # Testing different IOU thresholds:
>>> for lml.iou_thresh in (.5, .75):
...     print('thresh:', lml.iou_thresh, '   f1:', lml.avg_f1)
thresh: 0.5          f1: 0.92
thresh: 0.75         f1: 0.91

Parameters:

*args
epsilon
rank – Rank (e.g. trainer.global_rank). Allows for distributed communication. If not passed, results will only be computed locally. If passed results are synced across all ranks.
num_ranks – Number of ranks (e.g. trainer.world_size). Allows for distributed communication. If not passed, results will only be computed locally. If passed results are synced across all ranks.
cache – Whether to cache aggregated results. Currently only for distributed environments.
**kwargs

append(_LabelMatcherList__object)

property avg_f1: Average F1 score.

property avg_fowlkes_mallows

property avg_jaccard: Average Jaccard index.

property avg_precision: Average precision.

property avg_recall: Average recall.

clear()

clear_cache()

copy()

property distributed

extend(_LabelMatcherList__iterable)

property f1: F1 score from average recall and precision.

property f1_np: F1 score from negatives and positives.

property false_negatives

property false_positives

property fowlkes_mallows_np

insert(_LabelMatcherList__index, _LabelMatcherList__object)

property iou_thresh: Gets local unique IOU thresholds from all items, if there is only one unique threshold, it is returned.

property jaccard_np

property length: int

pop(*args, **kwargs)

property precision

property recall

property true_positives
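
The properties avg_f1, f1 and f1_np of LabelMatcherList aggregate per-image results in different ways, and the three generally disagree. The sketch below works through the difference on made-up per-image counts; the formulas are a reading of the one-line property descriptions above, not taken from the implementation.

```python
from statistics import mean

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Illustrative (true positives, false positives, false negatives) per image.
counts = [(8, 2, 0), (1, 0, 3)]

precisions = [tp / (tp + fp) for tp, fp, fn in counts]
recalls = [tp / (tp + fn) for tp, fp, fn in counts]

# Average of per-image F1 scores (cf. avg_f1).
avg_f1 = mean(f1_score(p, r) for p, r in zip(precisions, recalls))

# F1 from average precision and average recall (cf. f1).
f1_from_averages = f1_score(mean(precisions), mean(recalls))

# F1 from pooled positives and negatives (cf. f1_np).
tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
f1_from_counts = 2 * tp / (2 * tp + fp + fn)
```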

get_pos_labels(v)

intersection_mask(a, b)

labels2counts(a)

labels_exist(func)

matching_labels(a, b)

vec2matches(v)

Toy Data

random_circle(image, mask, x, y, color, radius_range=(3, 28))

random_ellipse(image, mask, x, y, color, radius_range=(3, 28))

random_geometric_objects(height=256, width=256, radius_range=(3, 28), intensity_range=(0, 180), margin=13)

random_rectangle(image, mask, x, y, color, radius_range=(3, 28))

random_triangle(image, mask, x, y, color, radius_range=(3, 28))