cd.data

This submodule contains data-related code and NumPy operations.

Datasets

BBBC039

class BBBC039Test(directory, download=False)

BBBC039 Test.

Test split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Train(directory, download=False)

BBBC039 Train.

Training split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

class BBBC039Val(directory, download=False)

BBBC039 Validation.

Validation split of the BBBC039 dataset.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:
  • directory – Root directory.

  • download – Whether to download the dataset.

download_bbbc039(directory)

Download BBBC039.

Download and extract the BBBC039 dataset to the given directory.

References

https://bbbc.broadinstitute.org/BBBC039

Parameters:

directory – Root directory.

BBBC041

class BBBC041Test(directory, download=False)

BBBC041 Test data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Data directory.

  • download – Whether to download data.

class BBBC041Train(directory, download=False)

BBBC041 Train data.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Data directory.

  • download – Whether to download data.

download_bbbc041(directory, url='https://data.broadinstitute.org/bbbc/BBBC041/malaria.zip')

Download BBBC041.

Download and extract the BBBC041 dataset to the given directory.

References

https://bbbc.broadinstitute.org/BBBC041

Parameters:
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Synth

class SynthTest(directory, download=False, cache=True)

Synth Test data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthTrain(directory, download=False, cache=True)

Synth Train data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

class SynthVal(directory, download=False, cache=True)

Synth Val data.

Parameters:
  • directory – Data directory. E.g. directory='data/synth'.

  • download – Whether to download data.

  • cache – Whether to hold data in memory.

download_synth(directory, url='https://celldetection.org/data/synth.zip')

Download Synth.

Download and extract the Synth dataset to the given directory.

Parameters:
  • directory – Root directory.

  • url – Download URL (this dataset is distributed in a single zip file).

Data Operations

channels_first2channels_last(inputs: ndarray, spatial_dims=2, has_batch=False) ndarray

Channels first to channels last.

Parameters:
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.

channels_last2channels_first(inputs: ndarray, spatial_dims=2, has_batch=False) ndarray

Channels last to channels first.

Parameters:
  • inputs – Input array.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

Returns:

Transposed array.
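The two layout conversions above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper names are hypothetical, and it assumes the input has exactly one channel axis (first or last, after an optional batch axis), so spatial_dims is not needed.

```python
import numpy as np


def channels_last_to_first(x, has_batch=False):
    """Hypothetical helper: move the trailing channel axis in front of the spatial axes."""
    # With a batch axis the channels go to position 1, otherwise to position 0.
    return np.moveaxis(x, -1, 1 if has_batch else 0)


def channels_first_to_last(x, has_batch=False):
    """Hypothetical helper: move the leading channel axis behind the spatial axes."""
    return np.moveaxis(x, 1 if has_batch else 0, -1)


image = np.zeros((32, 48, 3))                           # (h, w, c)
batch = np.zeros((4, 32, 48, 3))                        # (b, h, w, c)
chw = channels_last_to_first(image)                     # (c, h, w)
bchw = channels_last_to_first(batch, has_batch=True)    # (b, c, h, w)
restored = channels_first_to_last(bchw, has_batch=True) # back to (b, h, w, c)
```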

ensure_tensor(x, device=None, dtype=torch.float32)

Ensure tensor.

Mapping ndarrays to Tensor. Possible shape mappings:

  • (h, w) -> (1, 1, h, w)

  • (h, w, c) -> (1, c, h, w)

  • (b, c, h, w) -> (b, c, h, w)

Parameters:
  • x – Inputs.

  • device – Either Device or a Module or Tensor to retrieve the device from.

  • dtype – Data type.

Returns:

Tensor
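The shape mappings listed for ensure_tensor can be sketched with NumPy alone. The helper below is hypothetical and only reproduces the documented shape logic; the conversion to a Tensor, the device handling and the dtype cast are left out.

```python
import numpy as np


def to_bchw(x):
    """Hypothetical helper reproducing the documented shape mappings."""
    x = np.asarray(x)
    if x.ndim == 2:
        # (h, w) -> (1, 1, h, w): add batch and channel axes.
        return x[None, None]
    if x.ndim == 3:
        # (h, w, c) -> (1, c, h, w): channels first, then add a batch axis.
        return np.moveaxis(x, -1, 0)[None]
    if x.ndim == 4:
        # (b, c, h, w) is assumed to be in the target layout already.
        return x
    raise ValueError(f'Unsupported number of dimensions: {x.ndim}')
```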

labels2crops(labels: ndarray, image: ndarray)

Labels to crops.

Crop all objects that are represented in labels from the given image. Returns a list of all image crops and a list of masks, each marking the object pixels of the respective crop.

Parameters:
  • labels – Label image. Array[h, w(, c)].

  • image – Image. Array[h, w, …].

Returns:

(crop_list, mask_list)
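The idea behind labels2crops can be sketched as follows, assuming a 2D label image where 0 marks background and each object is cropped by its bounding box. The function name and the bounding-box strategy are assumptions made for illustration, not the library's code.

```python
import numpy as np


def crops_from_labels(labels, image):
    """Hypothetical sketch: return (crop_list, mask_list) for each positive label."""
    crops, masks = [], []
    for label in np.unique(labels[labels > 0]):
        ys, xs = np.nonzero(labels == label)
        # Tight bounding box of the object (end indices are exclusive).
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        crops.append(image[y0:y1, x0:x1])
        # The mask marks which pixels of the crop belong to the object.
        masks.append(labels[y0:y1, x0:x1] == label)
    return crops, masks


labels = np.zeros((6, 6), dtype=int)
labels[1:3, 1:4] = 1
labels[4:6, 2:4] = 2
image = np.arange(36).reshape(6, 6)
crops, masks = crops_from_labels(labels, image)
```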

normalize_percentile(image, percentile=99.9, to_uint8=True)
padding_stack(*images, axis=0) ndarray

Padding stack.

Stack images along axis. If images have different shapes, all images are padded to the largest shape.

Parameters:
  • *images – Images.

  • axis – Axis used for stacking.

Returns:

Array
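A minimal sketch of the pad-then-stack behaviour described above. It assumes all images have the same number of dimensions, pads with zeros, and appends the padding at the end of each axis; where the library places its padding and which value it uses is not stated in this reference.

```python
import numpy as np


def pad_and_stack(*images, axis=0):
    """Hypothetical sketch: zero-pad images to a common shape, then stack them."""
    # Largest extent along every dimension over all images.
    target = np.max([img.shape for img in images], axis=0)
    padded = []
    for img in images:
        # Assumption: padding is appended at the end of each axis.
        pad_width = [(0, int(t - s)) for s, t in zip(img.shape, target)]
        padded.append(np.pad(img, pad_width))
    return np.stack(padded, axis=axis)


a = np.ones((2, 3))
b = np.ones((4, 2))
stacked = pad_and_stack(a, b)
```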

random_crop(*arrays, height, width=None)

Random crop.

Parameters:
  • *arrays – Input arrays that are to be cropped. None values accepted. The shape of the first element is used as reference.

  • height – Output height.

  • width – Output width. Default is same as height.

Returns:

Cropped array if arrays contains a single array; a list of cropped arrays otherwise
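The behaviour described above (one shared crop window, None values passed through, single array returned unwrapped) can be sketched like this. The function name and the rng parameter are hypothetical, and the arrays are assumed to have a (height, width, ...) layout.

```python
import numpy as np


def random_crop_sketch(*arrays, height, width=None, rng=None):
    """Hypothetical sketch: crop all arrays with the same random window."""
    rng = np.random.default_rng() if rng is None else rng
    width = height if width is None else width
    # The first array serves as the shape reference (assumed not to be None).
    ref_h, ref_w = arrays[0].shape[:2]
    top = int(rng.integers(0, ref_h - height + 1))
    left = int(rng.integers(0, ref_w - width + 1))
    out = [None if a is None else a[top:top + height, left:left + width]
           for a in arrays]
    return out[0] if len(out) == 1 else out


image = np.zeros((10, 12, 3))
labels = np.zeros((10, 12), dtype=int)
img_crop, lbl_crop, none_crop = random_crop_sketch(image, labels, None, height=4, width=5)
single = random_crop_sketch(image, height=6)
```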

resample_contours(contours, num=None, close=True, epsilon=1e-06)

Resample contour.

Sample num equidistant points on each contour in contours.

Notes

  • Works for closed and open contours.

Parameters:
  • contours – Contours to sample from. Array[…, num’, 2] or list of Arrays.

  • num – Number of points.

  • close – Set True if contours contains closed contours, with the end point not being equal to the start point. Set False otherwise.

  • epsilon – Epsilon.

Returns:

Array[…, num, 2] or list of Arrays.
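Equidistant resampling of a single contour can be sketched with arc-length parametrisation and linear interpolation. This is an illustrative approach under stated assumptions, not necessarily the one the library uses: the helper name is hypothetical, it handles one contour of shape (num_points, 2), and it ignores the epsilon parameter.

```python
import numpy as np


def resample_contour(contour, num, close=True):
    """Hypothetical sketch: sample num points that are equidistant along the contour."""
    contour = np.asarray(contour, dtype=float)
    if close:
        # Closed contour whose end point differs from its start point:
        # append the start point so the last segment is covered as well.
        contour = np.concatenate([contour, contour[:1]])
    segment_lengths = np.linalg.norm(np.diff(contour, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    # For closed contours the end point equals the start point, so skip it.
    targets = np.linspace(0.0, arc[-1], num, endpoint=not close)
    x = np.interp(targets, arc, contour[:, 0])
    y = np.interp(targets, arc, contour[:, 1])
    return np.stack([x, y], axis=1)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
closed_samples = resample_contour(square, num=8)
open_samples = resample_contour([(0, 0), (10, 0)], num=3, close=False)
```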

rgb_to_scalar(inputs: ndarray, dtype='int32')

RGB to scalar.

Convert RGB data to scalar, while maintaining color uniqueness.

Parameters:
  • inputs – Input array. Shape ([d1, d2, …, dn,] 3)

  • dtype – Data type

Returns:

Output array. Shape ([d1, d2, …, dn])
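One way to map RGB triples to unique scalars is to pack the three channels into a single integer. The packing below is an assumption chosen for illustration (the reference does not state which encoding the library uses), and it presumes channel values in the range 0 to 255.

```python
import numpy as np


def rgb_to_scalar_sketch(inputs, dtype='int32'):
    """Hypothetical sketch: pack (r, g, b) into one integer per pixel."""
    rgb = np.asarray(inputs).astype(dtype)
    # Assumed packing: r + 256 * g + 65536 * b, unique for 8-bit channels.
    return rgb[..., 0] + rgb[..., 1] * 256 + rgb[..., 2] * 65536


rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 0, 0]]], dtype='uint8')
scalar = rgb_to_scalar_sketch(rgb)
```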

rle2mask(code, shape, transpose=True, min_index=1, constant=1) ndarray

Run length encoding to mask.

Convert run length encoding to mask image.

Parameters:
  • code – Run length code.
    As ndarray: array([idx0, len0, idx1, len1, …]) or array([[idx0, len0], [idx1, len1], …]).
    As list: [idx0, len0, idx1, len1, …] or [[idx0, len0], [idx1, len1], …].
    As str: 'idx0 len0 idx1 len1 …'.

  • shape – Mask shape.

  • transpose – If True decode row by row, otherwise decode column by column.

  • min_index – Smallest pixel index. Depends on the RLE encoding.

  • constant – Pixels marked by the RLE code are set to this value.

Returns:

Mask image.
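A minimal decoding sketch for the code formats listed above. The function name is hypothetical; the mapping of transpose to memory order follows the parameter description in this reference (row by row if True, column by column otherwise) and should be checked against the library before relying on it.

```python
import numpy as np


def rle_to_mask(code, shape, transpose=True, min_index=1, constant=1):
    """Hypothetical sketch: decode (index, length) runs into a mask image."""
    if isinstance(code, str):
        code = [int(v) for v in code.split()]
    # Flat and nested inputs both end up as rows of (index, length).
    pairs = np.asarray(code, dtype=int).reshape(-1, 2)
    flat = np.zeros(int(np.prod(shape)), dtype='uint8')
    for start, length in pairs:
        begin = start - min_index  # shift to zero-based indexing
        flat[begin:begin + length] = constant
    # Row-by-row decoding is C order, column-by-column is Fortran order.
    return flat.reshape(shape, order='C' if transpose else 'F')


mask_rows = rle_to_mask('1 3 7 2', (3, 4))
mask_cols = rle_to_mask([[1, 3], [7, 2]], (3, 4), transpose=False)
```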

to_tensor(inputs: ndarray, spatial_dims=2, transpose=False, has_batch=False, dtype=None, device=None) Tensor

Array to Tensor.

Converts numpy array to Tensor and optionally transposes from channels last to channels first.

Parameters:
  • inputs – Input array.

  • transpose – Whether to transpose channels from channels last to channels first.

  • spatial_dims – Number of spatial dimensions.

  • has_batch – Whether inputs has a batch dimension.

  • dtype – Data type of output Tensor.

  • device – Device of output Tensor.

Returns:

Tensor.

transpose_spatial(inputs: ndarray, inputs_channels_last=True, spatial_dims=2, has_batch=False)
universal_dict_collate_fn(batch, check_padding=True) OrderedDict
boxes2masks(boxes, size)
fill_label_gaps_(labels)

Fill label gaps.

Ensure that labels greater than zero are within the interval [1, num_unique_labels_greater_zero]. Works fast if gaps are unlikely, slow otherwise. Alternatively, consider using np.vectorize. Labels <= 0 are preserved as is.

Parameters:

labels

Returns:

fill_padding_(inputs, padding: int, constant=-1)
filter_instances_(labels, partials=True, partials_border=1, min_area=4, max_area=None, constant=-1, continuous=True)

Filter instances from label image.

Note

Filtered instance labels are set to constant. Labels might not be continuous afterwards.

Parameters:
  • labels

  • partials

  • partials_border

  • min_area

  • max_area

  • constant

  • continuous

Returns:

relabel_(label_stack, axis=2)

Relabel.

In-place relabeling of a label stack. After applying this op, the labels in label_stack are continuous, starting at 1. Negative labels remain untouched.

Notes

  • Uses the label function from skimage.morphology (scikit-image).

Parameters:
  • label_stack – Array[height, width, channels].

  • axis – Channel axis.

remove_padding(inputs, padding: int)
remove_partials_(label_stack, border=1, constant=-1)
stack_labels(*maps, axis=2, dtype='int32', relabel=True)

Stack labels.

Parameters:
  • *maps – List[Union[Array[h, w], Array[h, w, 3]]]. Grayscale or RGB label maps. RGB labels are assumed to encode labels with color and are converted to grayscale labels before stacking.

  • axis – Stacking axis.

  • dtype – Output data type.

  • relabel – Whether to assign new labels or not.

Returns:

Array[h, w, c]. Label image.

unary_masks2labels(unary_masks, transpose=True)

Unary masks to labels.

Parameters:
  • unary_masks – List[Array[height, width]] or Array[num_objects, height, width]. List of masks. Each mask is assumed to contain exactly one object.

  • transpose – If True label images are in channels last format, otherwise channels first.

Returns:

Label image. Array[height, width, num_objects] if transpose else Array[num_objects, height, width].
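The conversion can be sketched by giving each mask its own label value and its own channel. Numbering the objects from 1 in list order is an assumption made for this example, and the function name is hypothetical.

```python
import numpy as np


def unary_masks_to_labels(unary_masks, transpose=True):
    """Hypothetical sketch: one label channel per single-object mask."""
    masks = np.asarray(unary_masks) > 0                      # (num_objects, h, w)
    # Assumption: the i-th mask receives label i + 1.
    ids = np.arange(1, len(masks) + 1).reshape(-1, 1, 1)
    labels = masks * ids
    # Channels last if transpose, channels first otherwise.
    return np.moveaxis(labels, 0, -1) if transpose else labels


m0 = np.zeros((3, 3), dtype=bool)
m0[0, 0] = True
m1 = np.zeros((3, 3), dtype=bool)
m1[2, 1:] = True
labels_last = unary_masks_to_labels([m0, m1])
labels_first = unary_masks_to_labels([m0, m1], transpose=False)
```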

CPN Operations

class CPNTargetGenerator(samples, order, random_sampling=True, remove_partials=False, min_fg_dist=0.75, max_bg_dist=0.5, flag_fragmented=True, flag_fragmented_constant=-1)
property contours
feed(labels, border=1, min_area=1, max_area=None)

Notes

  • May apply inplace changes to labels.

Parameters:
  • labels – Single label image. E.g. of shape (height, width, channels).

  • border

  • min_area

  • max_area

property fourier
property locations
property reduced_labels
property sampled_contours

Returns: Tensor[num_contours, num_points, 2]

property sampled_sizes

Notes

The quality of sizes depends on how accurately sampled_contours represents the actual contours.

Returns:

Tensor[num_contours, 2]. Contains height and width for each contour.

property sampling
clip_contour_(contour, size)
contours2boxes(contours)

Contours to boxes.

Parameters:

contours – Array[num_contours, num_points, 2]. (x, y) format.

Returns:

Array[num_contours, 4]. (x0, y0, x1, y1) format.
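For contours in (x, y) format, the bounding boxes follow from the per-contour minimum and maximum. A minimal NumPy sketch with a hypothetical function name:

```python
import numpy as np


def contours_to_boxes(contours):
    """Hypothetical sketch: (x0, y0, x1, y1) box for each contour."""
    contours = np.asarray(contours)                  # (num_contours, num_points, 2)
    # Minimum and maximum over the points axis give the two box corners.
    return np.concatenate([contours.min(axis=1), contours.max(axis=1)], axis=1)


contours = np.array([
    [(1, 2), (5, 3), (2, 7)],
    [(0, 0), (4, 0), (4, 4)],
])
boxes = contours_to_boxes(contours)
```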

contours2fourier(contours: dict, order=5, dtype=<class 'numpy.float32'>)
contours2labels(contours, size, rounded=True, clip=True, initial_depth=1, gap=3, dtype='int32')

Contours to labels.

Converts contours to label image.

Notes

~137 ms for contours.shape=(1284, 128, 2), size=(1000, 1000).

Parameters:
  • contours – Contours. Array[num_contours, num_points, 2] or List[Array[num_points, 2]].

  • size – Label image size. (height, width).

  • rounded – Whether to round contour coordinates.

  • clip – Whether to clip contour coordinates to given size.

  • initial_depth – Initial number of channels. More channels are used if necessary.

  • gap – Gap between instances.

  • dtype – Data type of label image.

Returns:

Array[height, width, channels]. Channels are used to model overlap.

efd(contour, order=10, epsilon=1e-06)

Elliptic Fourier descriptor.

Computes elliptic Fourier descriptors from contour data.

Parameters:
  • contour – Tensor of shape (…, num_points, 2). Should be a set of num_points 2D points that describe the contour of an object. For each contour a descriptor of shape (order, 4) is computed, so the result has a shape of (…, order, 4). As num_points may differ from one contour to another, a list of (num_points, 2) arrays may be passed as a numpy array with object as its data type, i.e. np.array(list_of_contours).

  • order – Order of resulting descriptor. The higher the order, the more detail can be preserved. An order of 1 produces ellipses.

  • epsilon – Epsilon value. Used to avoid division by zero.

Notes

Locations may contain NaN if contour only contains a single point.

Returns:

Tensor of shape (…, order, 4).
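For a single closed contour, elliptic Fourier coefficients can be computed with the classic Kuhl and Giardina formulation. The sketch below is a NumPy illustration of that general technique, not the library's code: the function name is hypothetical, the (a_n, b_n, c_n, d_n) column order is an assumption, and the locations (contour offsets) are not computed.

```python
import numpy as np


def efd_sketch(contour, order=10, epsilon=1e-6):
    """Hypothetical sketch: elliptic Fourier coefficients of one closed contour."""
    contour = np.asarray(contour, dtype=float)
    closed = np.concatenate([contour, contour[:1]])   # append the start point
    dxy = np.diff(closed, axis=0)                     # segment vectors
    dt = np.sqrt((dxy ** 2).sum(axis=1)) + epsilon    # segment lengths, never zero
    t = np.concatenate([[0.0], np.cumsum(dt)])        # arc length at each point
    total = t[-1]
    phi = 2 * np.pi * t / total
    coeffs = np.zeros((order, 4))
    for n in range(1, order + 1):
        const = total / (2 * n ** 2 * np.pi ** 2)
        d_cos = np.cos(n * phi[1:]) - np.cos(n * phi[:-1])
        d_sin = np.sin(n * phi[1:]) - np.sin(n * phi[:-1])
        # Assumed column order: a_n, b_n (x terms), c_n, d_n (y terms).
        coeffs[n - 1] = const * np.array([
            np.sum(dxy[:, 0] / dt * d_cos),
            np.sum(dxy[:, 0] / dt * d_sin),
            np.sum(dxy[:, 1] / dt * d_cos),
            np.sum(dxy[:, 1] / dt * d_sin),
        ])
    return coeffs


angles = np.linspace(0, 2 * np.pi, 256, endpoint=False)
circle = 5 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
descriptor = efd_sketch(circle, order=3)
```

For the circle of radius 5 used here, the first harmonic is close to (5, 0, 0, 5) and the higher harmonics are close to zero, which matches the statement that an order of 1 produces ellipses.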

filter_contours_by_intensity(img, contours, min_intensity=None, max_intensity=200, aggregate='mean')
fourier2contour(fourier, locations, samples=64, sampling=None)

Parameters:
  • fourier – Array[…, order, 4]

  • locations – Array[…, 2]

  • samples – Number of samples.

  • sampling – Array[samples] or Array[fourier.shape[:-2] + (samples,)]. Default is linspace from 0 to 1 with samples values.

Returns:

Contours.
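Reconstructing a contour from Fourier coefficients amounts to evaluating the truncated series at the sampling positions. The sketch below handles a single contour and assumes that the four coefficients per order are (a_n, b_n, c_n, d_n), with a_n and b_n describing x and c_n and d_n describing y; the function name and that ordering are assumptions.

```python
import numpy as np


def fourier_to_contour(fourier, locations, samples=64, sampling=None):
    """Hypothetical sketch: evaluate the Fourier series of one contour."""
    fourier = np.asarray(fourier, dtype=float)        # (order, 4)
    locations = np.asarray(locations, dtype=float)    # (2,)
    if sampling is None:
        # Documented default: linspace from 0 to 1 with samples values.
        sampling = np.linspace(0.0, 1.0, samples)
    sampling = np.asarray(sampling, dtype=float)
    n = np.arange(1, fourier.shape[0] + 1)[:, None]   # harmonic numbers
    phase = 2 * np.pi * n * sampling[None, :]         # (order, samples)
    a, b, c, d = (fourier[:, i:i + 1] for i in range(4))
    x = locations[0] + np.sum(a * np.cos(phase) + b * np.sin(phase), axis=0)
    y = locations[1] + np.sum(c * np.cos(phase) + d * np.sin(phase), axis=0)
    return np.stack([x, y], axis=-1)                  # (samples, 2)


# Order-1 descriptor of a circle with radius 3 centred at (10, 20).
contour = fourier_to_contour([[3, 0, 0, 3]], [10, 20], samples=5)
```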

labels2contour_list(labels, **kwargs) list
labels2contours(labels, mode=0, method=1, flag_fragmented_inplace=False, raise_fragmented=True, constant=-1) dict

Labels to contours.

Notes

  • If flag_fragmented_inplace is True, labels may be modified inplace.

Parameters:
  • labels

  • mode

  • method – Contour method. CHAIN_APPROX_NONE must be used if contours are used for CPN.

  • flag_fragmented_inplace – Whether to flag fragmented labels. Flagging sets labels that consist of more than one connected component to constant.

  • constant – Flagging constant.

  • raise_fragmented – Whether to raise ValueError when encountering fragmented labels.

Returns:

dict

labels2distances(labels, distance_type=2, overlap_zero=True)

Label stacks to distances.

Measures the distance from each pixel to the closest border, relative to the largest distance. Values are expressed as a fraction of that largest distance, so instance centers are 1. Overlapping regions are zero.

Notes

54.9 ms ± 3.41 ms (shape (576, 576, 3); 762 instances in three channels)

Parameters:
  • labels – Label stack. (height, width, channels)

  • distance_type – OpenCV distance type.

  • overlap_zero – Whether to set overlapping regions to zero.

Returns:

Distance map of shape (height, width). All overlapping pixels are 0. Instance centers are 1. The labels are returned as well; they are altered if overlap_zero is True.

mask_labels_by_distance_(labels, distances, max_bg_dist, min_fg_dist)
masks2labels(masks, connectivity=8, label_axis=2, count=False, reduce=<function amax>, keepdims=True, **kwargs)

Masks to labels.

Notes

~ 11.7 ms for Array[25, 256, 256]. For same array skimage.measure.label takes ~ 17.9 ms.

Parameters:
  • masks – List[Array[height, width]] or Array[num_masks, height, width]

  • connectivity – 8 or 4 for 8-way or 4-way connectivity respectively

  • label_axis – Axis used for stacking label maps. One per mask.

  • count – Whether to count and return the number of components.

  • reduce – Callable used to reduce label_axis. If set to None, label_axis will not be reduced. Can be used if instances do not overlap.

  • **kwargs – Kwargs for cv2.connectedComponents.

Returns:

labels or (labels, count)

render_contour(contour, val=1, dtype='int32')

Eval Operations

class LabelMatcher(inputs=None, targets=None, iou_thresh=None, zero_division='warn')

Evaluation of a label image with a target label image.

Simple interface to evaluate a label image with a target label image with different metrics and IOU thresholds.

The IOU threshold is the minimum IOU that two objects must have to be counted as a match. Each target object can be matched with at most one inferred object and vice versa.
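The one-to-one matching rule can be illustrated with a small greedy sketch on binary object masks. The function names and the greedy strategy are assumptions made for this example and do not describe how LabelMatcher is implemented.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def match_objects(pred_masks, target_masks, iou_thresh=0.5):
    """Hypothetical sketch: greedy one-to-one matching, best IOU first."""
    candidates = sorted(
        ((iou(p, t), i, j)
         for i, p in enumerate(pred_masks)
         for j, t in enumerate(target_masks)),
        reverse=True,
    )
    used_pred, used_target = set(), set()
    for score, i, j in candidates:
        if score < iou_thresh:
            break  # all remaining pairs are below the threshold
        if i not in used_pred and j not in used_target:
            used_pred.add(i)
            used_target.add(j)
    tp = len(used_pred)
    # Returns (true positives, false positives, false negatives).
    return tp, len(pred_masks) - tp, len(target_masks) - tp


def box_mask(y0, y1, x0, x1, size=10):
    mask = np.zeros((size, size), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


preds = [box_mask(0, 4, 0, 4), box_mask(6, 8, 6, 8)]
targets = [box_mask(0, 4, 0, 3), box_mask(8, 10, 0, 2)]
counts_50 = match_objects(preds, targets, iou_thresh=0.5)
counts_80 = match_objects(preds, targets, iou_thresh=0.8)
```

The first prediction overlaps the first target with an IOU of 0.75, so it counts as a match at a threshold of 0.5 but not at 0.8.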

Initialize LabelMatcher object.

Parameters:
  • inputs – Input labels. Array[height, width, channels].

  • targets – Target labels. Array[height, width, channels].

  • iou_thresh – IOU threshold.

  • zero_division – One of (‘warn’, 0, 1). Sets the default return value for ZeroDivisionErrors. The default ‘warn’ will show a warning and return 0. For example: If there are no true positives and no false positives, precision will return the value of zero_division and optionally show a warning.
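The effect of zero_division can be sketched with the standard definitions of precision, recall and F1. The helper names are hypothetical, and the exact formulas used by LabelMatcher are assumed rather than taken from its source.

```python
import warnings


def safe_divide(numerator, denominator, zero_division='warn'):
    """Return numerator / denominator, or the fallback value on division by zero."""
    if denominator == 0:
        if zero_division == 'warn':
            warnings.warn('Division by zero, returning 0.')
            return 0.0
        return float(zero_division)
    return numerator / denominator


def detection_scores(tp, fp, fn, zero_division='warn'):
    """Hypothetical sketch: precision, recall and F1 from match counts."""
    precision = safe_divide(tp, tp + fp, zero_division)
    recall = safe_divide(tp, tp + fn, zero_division)
    f1 = safe_divide(2 * precision * recall, precision + recall, zero_division)
    return precision, recall, f1


scores = detection_scores(tp=8, fp=2, fn=2)
# No true positives and no false positives: precision falls back to zero_division.
empty_prediction = detection_scores(tp=0, fp=0, fn=3, zero_division=1)
```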

property ap
property f1
property false_negative_labels
property false_negatives
property false_positive_labels
property false_positives
filter_and_threshold(*a, **k)
property iou_thresh
property precision
property recall
property true_positive_labels
property true_positives
update(inputs, targets, iou_thresh=None)
class LabelMatcherList(iterable=(), /)

Label Matcher List.

Simple interface to get averaged results from a list of LabelMatcher objects.

Examples

>>> lml = LabelMatcherList([
...     LabelMatcher(pred_labels_0, target_labels_0),
...     LabelMatcher(pred_labels_1, target_labels_1),
... ])
>>> lml.iou_thresh = 0.5  # set iou_thresh for all LabelMatcher objects
>>> print('Average F1 score for iou threshold 0.5:', lml.avg_f1)
Average F1 score for iou threshold 0.5: 0.92
>>> # Testing different IOU thresholds:
>>> for lml.iou_thresh in (.5, .75):
...     print('thresh:', lml.iou_thresh, '       f1:', lml.avg_f1)
thresh: 0.5      f1: 0.92
thresh: 0.75     f1: 0.91
property avg_ap

Average AP.

property avg_f1

Average F1 score.

property avg_precision

Average precision.

property avg_recall

Average recall.

property f1

F1 score from average recall and precision.
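The difference between avg_f1 (mean of per-item F1 scores) and f1 (F1 computed from the averaged precision and recall) can be seen in a small pure-Python sketch. The per-image values are made up for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical (precision, recall) pairs, one per image.
per_image = [(1.0, 0.5), (0.5, 1.0), (0.9, 0.9)]

# Average of the per-image F1 scores.
avg_f1 = sum(f1_score(p, r) for p, r in per_image) / len(per_image)

# F1 score computed from the averaged precision and recall.
avg_precision = sum(p for p, _ in per_image) / len(per_image)
avg_recall = sum(r for _, r in per_image) / len(per_image)
f1_from_averages = f1_score(avg_precision, avg_recall)
```

The two quantities generally differ, because the harmonic mean penalises imbalanced precision and recall on each item before averaging.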

property iou_thresh

Gets the unique IOU thresholds from all items. If there is only one unique threshold, it is returned.

get_pos_labels(v)
intersection_mask(a, b)
labels2counts(a)
labels_exist(func)
matching_labels(a, b)
vec2matches(v)

Toy Data

random_circle(image, mask, x, y, color, radius_range=(3, 28))
random_ellipse(image, mask, x, y, color, radius_range=(3, 28))
random_geometric_objects(height=256, width=256, radius_range=(3, 28), intensity_range=(0, 180), margin=13)
random_rectangle(image, mask, x, y, color, radius_range=(3, 28))
random_triangle(image, mask, x, y, color, radius_range=(3, 28))