cd.data
This submodule contains data-related code and NumPy operations.
Datasets
BBBC039
- class BBBC039Test(directory, download=False)
BBBC039 Test.
Test split of the BBBC039 dataset.
References
https://bbbc.broadinstitute.org/BBBC039
- Parameters:
directory – Root directory.
download – Whether to download the dataset.
- class BBBC039Train(directory, download=False)
BBBC039 Train.
Training split of the BBBC039 dataset.
References
https://bbbc.broadinstitute.org/BBBC039
- Parameters:
directory – Root directory.
download – Whether to download the dataset.
- class BBBC039Val(directory, download=False)
BBBC039 Validation.
Validation split of the BBBC039 dataset.
References
https://bbbc.broadinstitute.org/BBBC039
- Parameters:
directory – Root directory.
download – Whether to download the dataset.
- download_bbbc039(directory)
Download BBBC039.
Download and extract the BBBC039 dataset to given directory.
References
https://bbbc.broadinstitute.org/BBBC039
- Parameters:
directory – Root directory.
BBBC041
- class BBBC041Test(directory, download=False)
BBBC041 Test data.
References
https://bbbc.broadinstitute.org/BBBC041
- Parameters:
directory – Data directory.
download – Whether to download data.
- class BBBC041Train(directory, download=False)
BBBC041 Train data.
References
https://bbbc.broadinstitute.org/BBBC041
- Parameters:
directory – Data directory.
download – Whether to download data.
- download_bbbc041(directory, url='https://data.broadinstitute.org/bbbc/BBBC041/malaria.zip')
Download BBBC041.
Download and extract the BBBC041 dataset to given directory.
References
https://bbbc.broadinstitute.org/BBBC041
- Parameters:
directory – Root directory.
url – Download URL (this dataset is distributed in a single zip file).
Synth
- class SynthTest(directory, download=False, cache=True)
Synth Test data.
- Parameters:
directory – Data directory. E.g. directory='data/synth'.
download – Whether to download data.
cache – Whether to hold data in memory.
- class SynthTrain(directory, download=False, cache=True)
Synth Train data.
- Parameters:
directory – Data directory. E.g. directory='data/synth'.
download – Whether to download data.
cache – Whether to hold data in memory.
- class SynthVal(directory, download=False, cache=True)
Synth Val data.
- Parameters:
directory – Data directory. E.g. directory='data/synth'.
download – Whether to download data.
cache – Whether to hold data in memory.
- download_synth(directory, url='https://celldetection.org/data/synth.zip')
Download Synth.
Download and extract the Synth dataset to given directory.
- Parameters:
directory – Root directory.
url – Download URL (this dataset is distributed in a single zip file).
Data Operations
- channels_first2channels_last(inputs: ndarray, spatial_dims=2, has_batch=False) -> ndarray
Channels first to channels last.
- Parameters:
inputs – Input array.
spatial_dims – Number of spatial dimensions.
has_batch – Whether inputs has a batch dimension.
- Returns:
Transposed array.
- channels_last2channels_first(inputs: ndarray, spatial_dims=2, has_batch=False) -> ndarray
Channels last to channels first.
- Parameters:
inputs – Input array.
spatial_dims – Number of spatial dimensions.
has_batch – Whether inputs has a batch dimension.
- Returns:
Transposed array.
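The two layout conversions above amount to moving one axis. A minimal NumPy sketch, assuming the channel axis sits directly after the optional batch axis in channels-first layout. The helper names `first_to_last` and `last_to_first` are hypothetical and not part of the library:

```python
import numpy as np

def first_to_last(x, has_batch=False):
    # Channels-first keeps the channel axis at 1 (with batch) or 0 (without).
    channel_axis = 1 if has_batch else 0
    return np.moveaxis(x, channel_axis, -1)

def last_to_first(x, has_batch=False):
    # Inverse move: take the trailing channel axis back to the front.
    channel_axis = 1 if has_batch else 0
    return np.moveaxis(x, -1, channel_axis)

chw = np.zeros((3, 32, 48))        # (c, h, w)
hwc = first_to_last(chw)           # (h, w, c)
bhwc = np.zeros((2, 32, 48, 3))    # (b, h, w, c)
bchw = last_to_first(bhwc, has_batch=True)
print(hwc.shape, bchw.shape)       # (32, 48, 3) (2, 3, 32, 48)
```

Because only one axis is moved, the same sketch works for any number of spatial dimensions.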
- ensure_tensor(x, device=None, dtype=torch.float32)
Ensure tensor.
Mapping ndarrays to Tensor. Possible shape mappings:
(h, w) -> (1, 1, h, w)
(h, w, c) -> (1, c, h, w)
(b, c, h, w) -> (b, c, h, w)
- Parameters:
x – Inputs.
device – Either Device or a Module or Tensor to retrieve the device from.
dtype – Data type.
- Returns:
Tensor
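The shape mappings listed for ensure_tensor can be illustrated with NumPy alone. This is only a sketch of the reshaping logic under that reading of the docstring. The real function returns a torch Tensor and also handles device and dtype, which are left out here. The name `to_bchw` is hypothetical:

```python
import numpy as np

def to_bchw(x):
    x = np.asarray(x)
    if x.ndim == 2:
        # (h, w) -> (1, 1, h, w): add batch and channel axes.
        return x[None, None]
    if x.ndim == 3:
        # (h, w, c) -> (1, c, h, w): channels first, then add a batch axis.
        return np.moveaxis(x, -1, 0)[None]
    if x.ndim == 4:
        # (b, c, h, w) is assumed to be in the target layout already.
        return x
    raise ValueError(f"Unsupported number of dimensions: {x.ndim}")

print(to_bchw(np.zeros((64, 48))).shape)      # (1, 1, 64, 48)
print(to_bchw(np.zeros((64, 48, 3))).shape)   # (1, 3, 64, 48)
```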
- labels2crops(labels: ndarray, image: ndarray)
Labels to crops.
Crop all objects that are represented in labels from the given image and return a list of all image crops and a list of masks, each marking the object pixels of its respective crop.
- Parameters:
labels – Label image. Array[h, w(, c)].
image – Image. Array[h, w, …].
- Returns:
(crop_list, mask_list)
- normalize_percentile(image, percentile=99.9, to_uint8=True)
- padding_stack(*images, axis=0) -> ndarray
Padding stack.
Stack images along axis. If images have different shapes, all images are padded to the largest shape.
- Parameters:
*images – Images.
axis – Axis used for stacking.
- Returns:
Array
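The pad-then-stack idea can be sketched in a few lines of NumPy. Two assumptions are made that the text above does not confirm: padding is appended at the end of each axis, and the fill value is zero. The name `pad_and_stack` is hypothetical:

```python
import numpy as np

def pad_and_stack(*images, axis=0, constant=0):
    # Largest extent per dimension across all inputs (same ndim assumed).
    target = np.max([im.shape for im in images], axis=0)
    padded = []
    for im in images:
        # Pad only at the end of each axis, up to the target extent.
        pad_width = [(0, int(t - s)) for s, t in zip(im.shape, target)]
        padded.append(np.pad(im, pad_width, mode="constant", constant_values=constant))
    return np.stack(padded, axis=axis)

a = np.ones((3, 5))
b = np.ones((4, 2))
stacked = pad_and_stack(a, b)
print(stacked.shape)  # (2, 4, 5)
```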
- random_crop(*arrays, height, width=None)
Random crop.
- Parameters:
*arrays – Input arrays that are to be cropped. None values accepted. The shape of the first element is used as reference.
height – Output height.
width – Output width. Default is same as height.
- Returns:
Cropped array if arrays contains a single array; a list of cropped arrays otherwise
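A sketch of the behaviour described above: one random window is drawn from the shape of the first array and applied to every input, with None passed through. It assumes arrays are laid out as (height, width, ...) and that the first element is not None. `random_crop_sketch` and its `rng` argument are hypothetical:

```python
import numpy as np

def random_crop_sketch(*arrays, height, width=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    width = height if width is None else width
    # The first array defines the region from which the window is drawn.
    h, w = arrays[0].shape[:2]
    top = int(rng.integers(0, h - height + 1))
    left = int(rng.integers(0, w - width + 1))
    out = [None if a is None else a[top:top + height, left:left + width]
           for a in arrays]
    return out[0] if len(out) == 1 else out

image = np.arange(100).reshape(10, 10)
labels = image * 2
crop_img, crop_lbl, crop_none = random_crop_sketch(
    image, labels, None, height=4, width=6, rng=np.random.default_rng(0))
print(crop_img.shape, crop_none)  # (4, 6) None
```

Sharing one window across all inputs keeps images and their label maps aligned after cropping.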
- resample_contours(contours, num=None, close=True, epsilon=1e-06)
Resample contour.
Sample num equidistant points on each contour in contours.
Notes
Works for closed and open contours.
- Parameters:
contours – Contours to sample from. Array[…, num’, 2] or list of Arrays.
num – Number of points.
close – Set True if contours contains closed contours, with the end point not being equal to the start point. Set False otherwise.
epsilon – Epsilon.
- Returns:
Array[…, num, 2] or list of Arrays.
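Equidistant resampling of a closed contour can be sketched with linear interpolation over the cumulative arc length. This is an illustration of the idea, not the library's implementation. It handles a single closed contour only and ignores the epsilon parameter. The name `resample_closed` is hypothetical:

```python
import numpy as np

def resample_closed(contour, num):
    contour = np.asarray(contour, dtype=float)
    # Append the start point so the last segment closes the polygon.
    pts = np.concatenate([contour, contour[:1]])
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each vertex
    # num positions spread evenly over the perimeter (end point excluded).
    targets = np.linspace(0.0, cum[-1], num, endpoint=False)
    x = np.interp(targets, cum, pts[:, 0])
    y = np.interp(targets, cum, pts[:, 1])
    return np.stack([x, y], axis=1)

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
resampled = resample_closed(square, num=8)
print(resampled.shape)  # (8, 2)
```

For the square with perimeter 16, the eight samples land every 2 units along the outline.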
- rgb_to_scalar(inputs: ndarray, dtype='int32')
RGB to scalar.
Convert RGB data to scalar, while maintaining color uniqueness.
- Parameters:
inputs – Input array. Shape ([d1, d2, …, dn,] 3)
dtype – Data type
- Returns:
Output array. Shape ([d1, d2, …, dn])
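One way to map RGB triples to unique scalars is to pack the three 8-bit channels into a single integer. The encoding below is an assumption chosen for illustration. The text above only states that color uniqueness is maintained, not which encoding the library uses. `rgb_to_scalar_sketch` is a hypothetical name:

```python
import numpy as np

def rgb_to_scalar_sketch(rgb, dtype="int32"):
    # Assumes 8-bit channel values in [0, 255].
    rgb = np.asarray(rgb).astype(dtype)
    # Pack as 0xRRGGBB so that distinct colors give distinct integers.
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

colors = np.array([[[1, 2, 3], [0, 0, 0]],
                   [[255, 255, 255], [1, 2, 4]]], dtype="uint8")
scalars = rgb_to_scalar_sketch(colors)
print(scalars.shape)  # (2, 2)
```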
- rle2mask(code, shape, transpose=True, min_index=1, constant=1) -> ndarray
Run length encoding to mask.
Convert run length encoding to mask image.
- Parameters:
code – Run length code.
As ndarray: array([idx0, len0, idx1, len1, …]) or array([[idx0, len0], [idx1, len1], …]).
As list: [idx0, len0, idx1, len1, …] or [[idx0, len0], [idx1, len1], …].
As str: ‘idx0 len0 idx1 len1 …’.
shape – Mask shape.
transpose – If True decode row by row, otherwise decode column by column.
min_index – Smallest pixel index. Depends on rle encoding.
constant – Pixels marked by rle are set to this value.
- Returns:
Mask image.
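Decoding a run length code can be sketched as filling runs in a flat buffer and reshaping it. The sketch accepts the string, flat, and paired input forms listed above, but always fills in row-major order and leaves out the transpose option. `rle_decode` is a hypothetical name:

```python
import numpy as np

def rle_decode(code, shape, min_index=1, constant=1):
    if isinstance(code, str):
        code = [int(v) for v in code.split()]
    # Normalize flat and nested input to rows of (start index, run length).
    pairs = np.asarray(code, dtype=int).reshape(-1, 2)
    flat = np.zeros(int(np.prod(shape)), dtype="uint8")
    for start, length in pairs:
        start = start - min_index  # shift to zero-based indexing
        flat[start:start + length] = constant
    return flat.reshape(shape)

mask = rle_decode("1 3 8 2", shape=(3, 4))
print(mask)
# [[1 1 1 0]
#  [0 0 0 1]
#  [1 0 0 0]]
```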
- to_tensor(inputs: ndarray, spatial_dims=2, transpose=False, has_batch=False, dtype=None, device=None) -> Tensor
Array to Tensor.
Converts numpy array to Tensor and optionally transposes from channels last to channels first.
- Parameters:
inputs – Input array.
transpose – Whether to transpose channels from channels last to channels first.
spatial_dims – Number of spatial dimensions.
has_batch – Whether inputs has a batch dimension.
dtype – Data type of output Tensor.
device – Device of output Tensor.
- Returns:
Tensor.
- transpose_spatial(inputs: ndarray, inputs_channels_last=True, spatial_dims=2, has_batch=False)
- universal_dict_collate_fn(batch, check_padding=True) -> OrderedDict
- boxes2masks(boxes, size)
- fill_label_gaps_(labels)
Fill label gaps.
Ensure that labels greater than zero are within the interval [1, num_unique_labels_greater_zero]. Works fast if gaps are unlikely, slow otherwise. Alternatively, consider using np.vectorize. Labels <= 0 are preserved as is.
- Parameters:
labels –
Returns:
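The effect of gap filling can be shown with np.unique. This sketch returns a copy instead of working in place, and it is not the strategy the library uses. It only illustrates the resulting label values. `fill_gaps` is a hypothetical name:

```python
import numpy as np

def fill_gaps(labels):
    labels = np.asarray(labels)
    out = labels.copy()
    positive = labels > 0
    # return_inverse gives, for each positive label, its rank among the unique values.
    _, inverse = np.unique(labels[positive], return_inverse=True)
    out[positive] = inverse.reshape(-1) + 1  # ranks start at 0, labels at 1
    return out

labels = np.array([[0, 2, 2],
                   [7, -1, 4]])
print(fill_gaps(labels))
# [[ 0  1  1]
#  [ 3 -1  2]]
```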
- fill_padding_(inputs, padding: int, constant=-1)
- filter_instances_(labels, partials=True, partials_border=1, min_area=4, max_area=None, constant=-1, continuous=True)
Filter instances from label image.
Note
Filtered instance labels are set to constant. Labels might not be continuous afterwards.
- Parameters:
labels –
partials –
partials_border –
min_area –
max_area –
constant –
continuous –
Returns:
- relabel_(label_stack, axis=2)
Relabel.
Inplace relabeling of a label stack. After applying this op the labels in label_stack are continuous, starting at 1. Negative labels remain untouched.
Notes
Uses label function from skimage.morphology
- Parameters:
label_stack – Array[height, width, channels].
axis – Channel axis.
- remove_padding(inputs, padding: int)
- remove_partials_(label_stack, border=1, constant=-1)
- stack_labels(*maps, axis=2, dtype='int32', relabel=True)
Stack labels.
- Parameters:
*maps – List[Union[Array[h, w], Array[h, w, 3]]]. Grayscale or RGB label maps. RGB labels are assumed to encode labels with color and are converted to grayscale labels before stacking.
axis – Stacking axis.
dtype – Output data type.
relabel – Whether to assign new labels or not.
- Returns:
Array[h, w, c]. Label image.
- unary_masks2labels(unary_masks, transpose=True)
Unary masks to labels.
- Parameters:
unary_masks – List[Array[height, width]] or Array[num_objects, height, width]. List of masks. Each mask is assumed to contain exactly one object.
transpose – If True label images are in channels last format, otherwise channels first.
- Returns:
Label image. Array[height, width, num_objects] if transpose else Array[num_objects, height, width].
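A sketch of the conversion under one assumption that the text above does not state: object i (counting from 0) receives the label value i + 1 in its own channel. `masks_to_labels` is a hypothetical name:

```python
import numpy as np

def masks_to_labels(unary_masks, transpose=True):
    # One channel per object; foreground pixels get the (assumed) label index + 1.
    labels = np.stack([(np.asarray(m) > 0).astype("int32") * (i + 1)
                       for i, m in enumerate(unary_masks)])
    # Channels last if transpose, channels first otherwise.
    return np.moveaxis(labels, 0, -1) if transpose else labels

m0 = np.zeros((4, 5), dtype=bool)
m0[:2, :2] = True
m1 = np.zeros((4, 5), dtype=bool)
m1[1:3, 1:4] = True
label_image = masks_to_labels([m0, m1])
print(label_image.shape)  # (4, 5, 2)
```

Keeping one channel per object lets overlapping objects, such as the two masks here at pixel (1, 1), retain their full extent.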
CPN Operations
- class CPNTargetGenerator(samples, order, random_sampling=True, remove_partials=False, min_fg_dist=0.75, max_bg_dist=0.5, flag_fragmented=True, flag_fragmented_constant=-1)
- property contours
- feed(labels, border=1, min_area=1, max_area=None)
Notes
May apply inplace changes to labels.
- Parameters:
labels – Single label image. E.g. of shape (height, width, channels).
border –
min_area –
max_area –
- property fourier
- property locations
- property reduced_labels
- property sampled_contours
Returns: Tensor[num_contours, num_points, 2]
- property sampled_sizes
Notes
The quality of sizes depends on how accurately sampled_contours represents the actual contours.
- Returns:
Tensor[num_contours, 2]. Contains height and width for each contour.
- property sampling
- clip_contour_(contour, size)
- contours2boxes(contours)
Contours to boxes.
- Parameters:
contours – Array[num_contours, num_points, 2]. (x, y) format.
- Returns:
Array[num_contours, 4]. (x0, y0, x1, y1) format.
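With contours stored as Array[num_contours, num_points, 2] in (x, y) format, the boxes follow from the per-contour minimum and maximum. A minimal sketch, with the hypothetical name `contours_to_boxes`:

```python
import numpy as np

def contours_to_boxes(contours):
    contours = np.asarray(contours)
    # Reduce over the points axis: min gives (x0, y0), max gives (x1, y1).
    return np.concatenate([contours.min(axis=1), contours.max(axis=1)], axis=1)

contours = np.array([[[1, 2], [5, 3], [2, 7]],
                     [[0, 0], [4, 1], [3, 9]]])
boxes = contours_to_boxes(contours)
print(boxes)
# [[1 2 5 7]
#  [0 0 4 9]]
```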
- contours2fourier(contours: dict, order=5, dtype=<class 'numpy.float32'>)
- contours2labels(contours, size, rounded=True, clip=True, initial_depth=1, gap=3, dtype='int32')
Contours to labels.
Converts contours to label image.
Notes
~137 ms for contours.shape=(1284, 128, 2), size=(1000, 1000).
- Parameters:
contours – Contours. Array[num_contours, num_points, 2] or List[Array[num_points, 2]].
size – Label image size. (height, width).
rounded – Whether to round contour coordinates.
clip – Whether to clip contour coordinates to given size.
initial_depth – Initial number of channels. More channels are used if necessary.
gap – Gap between instances.
dtype – Data type of label image.
- Returns:
Array[height, width, channels]. Channels are used to model overlap.
- efd(contour, order=10, epsilon=1e-06)
Elliptic fourier descriptor.
Computes elliptic fourier descriptors from contour data.
- Parameters:
contour – Tensor of shape (…, num_points, 2). Should be a set of num_points 2D points that describe the contour of an object. For each contour, a descriptor of shape (order, 4) is computed, so the result has shape (…, order, 4). As num_points may differ from one contour to another, a list of (num_points, 2) arrays may be passed as a numpy array with object as its data type, i.e. np.array(list_of_contours).
order – Order of resulting descriptor. The higher the order, the more detail can be preserved. An order of 1 produces ellipses.
epsilon – Epsilon value. Used to avoid division by zero.
Notes
Locations may contain NaN if contour only contains a single point.
- Returns:
Tensor of shape (…, order, 4).
- filter_contours_by_intensity(img, contours, min_intensity=None, max_intensity=200, aggregate='mean')
- fourier2contour(fourier, locations, samples=64, sampling=None)
- Parameters:
fourier – Array[…, order, 4]
locations – Array[…, 2]
samples – Number of samples.
sampling – Array[samples] or Array[fourier.shape[:-2] + (samples,)]. Default is linspace from 0 to 1 with samples values.
- Returns:
Contours.
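Reconstructing a contour from elliptic Fourier coefficients is a sum of harmonics around the location. The sketch below handles one contour and assumes that the four coefficients per order are laid out as (a, b, c, d), with x using (a, b) and y using (c, d). That layout is an assumption, not something the text above confirms. `fourier_to_contour` is a hypothetical name:

```python
import numpy as np

def fourier_to_contour(fourier, location, samples=64):
    fourier = np.asarray(fourier, dtype=float)        # (order, 4)
    t = np.linspace(0.0, 1.0, samples)                # default sampling from 0 to 1
    n = np.arange(1, fourier.shape[0] + 1)[:, None]   # harmonic index per order
    cos, sin = np.cos(2 * np.pi * n * t), np.sin(2 * np.pi * n * t)
    # Sum all harmonics and shift by the contour location.
    x = location[0] + (fourier[:, 0:1] * cos + fourier[:, 1:2] * sin).sum(axis=0)
    y = location[1] + (fourier[:, 2:3] * cos + fourier[:, 3:4] * sin).sum(axis=0)
    return np.stack([x, y], axis=-1)                  # (samples, 2)

# Order 1 yields an ellipse: half-axes 3 and 2 around (10, 20).
ellipse = fourier_to_contour([[3.0, 0.0, 0.0, 2.0]], location=(10.0, 20.0))
print(ellipse.shape)  # (64, 2)
```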
- labels2contour_list(labels, **kwargs) -> list
- labels2contours(labels, mode=0, method=1, flag_fragmented_inplace=False, raise_fragmented=True, constant=-1) -> dict
Labels to contours.
Notes
If flag_fragmented_inplace is True, labels may be modified inplace.
- Parameters:
labels –
mode –
method – Contour method. CHAIN_APPROX_NONE must be used if contours are used for CPN.
flag_fragmented_inplace – Whether to flag fragmented labels. Flagging sets labels that consist of more than one connected component to constant.
constant – Flagging constant.
raise_fragmented – Whether to raise ValueError when encountering fragmented labels.
- Returns:
dict
- labels2distances(labels, distance_type=2, overlap_zero=True)
Label stacks to distances.
Measures distances from pixel to closest border, relative to largest distance. Values as percentage. Overlap is zero.
Notes
54.9 ms ± 3.41 ms (shape (576, 576, 3); 762 instances in three channels)
- Parameters:
labels – Label stack. (height, width, channels)
distance_type – opencv distance type.
overlap_zero – Whether to set overlapping regions to zero.
- Returns:
Distance map of shape (height, width). All overlapping pixels are 0. Instance centers are 1. Also labels are returned. They are altered if overlap_zero is True.
- mask_labels_by_distance_(labels, distances, max_bg_dist, min_fg_dist)
- masks2labels(masks, connectivity=8, label_axis=2, count=False, reduce=<function amax>, keepdims=True, **kwargs)
Masks to labels.
Notes
~ 11.7 ms for Array[25, 256, 256]. For same array skimage.measure.label takes ~ 17.9 ms.
- Parameters:
masks – List[Array[height, width]] or Array[num_masks, height, width]
connectivity – 8 or 4 for 8-way or 4-way connectivity respectively
label_axis – Axis used for stacking label maps. One per mask.
count – Whether to count and return the number of components.
reduce – Callable used to reduce label_axis. If set to None, label_axis will not be reduced. Can be used if instances do not overlap.
**kwargs – Kwargs for cv2.connectedComponents.
- Returns:
labels or (labels, count)
- render_contour(contour, val=1, dtype='int32')
Eval Operations
- class LabelMatcher(inputs=None, targets=None, iou_thresh=None, zero_division='warn')
Evaluation of a label image with a target label image.
Simple interface to evaluate a label image with a target label image with different metrics and IOU thresholds.
The IOU threshold is the minimum IOU that two objects must have to be counted as a match. Each target object can be matched with at most one inferred object and vice versa.
Initialize LabelMatcher object.
- Parameters:
inputs – Input labels. Array[height, width, channels].
targets – Target labels. Array[height, width, channels].
iou_thresh – IOU threshold.
zero_division – One of (‘warn’, 0, 1). Sets the default return value for ZeroDivisionErrors. The default ‘warn’ will show a warning and return 0. For example: If there are no true positives and no false positives, precision will return the value of zero_division and optionally show a warning.
- property ap
- property f1
- property false_negative_labels
- property false_negatives
- property false_positive_labels
- property false_positives
- filter_and_threshold(*a, **k)
- property iou_thresh
- property precision
- property recall
- property true_positive_labels
- property true_positives
- update(inputs, targets, iou_thresh=None)
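The matching rule described for LabelMatcher (a pair counts as a match if its IOU reaches the threshold, and each object is matched at most once) can be sketched for single-channel label images as follows. The greedy best-IOU-first matching and the `>=` comparison are assumptions made for illustration. `match_counts` and `f1_score` are hypothetical names, not library functions:

```python
import numpy as np

def match_counts(inputs, targets, iou_thresh=0.5):
    in_ids = [i for i in np.unique(inputs) if i > 0]
    tg_ids = [t for t in np.unique(targets) if t > 0]
    candidates = []
    for i in in_ids:
        a = inputs == i
        for t in tg_ids:
            b = targets == t
            inter = np.logical_and(a, b).sum()
            if inter:
                iou = inter / np.logical_or(a, b).sum()
                if iou >= iou_thresh:
                    candidates.append((float(iou), int(i), int(t)))
    used_in, used_tg, tp = set(), set(), 0
    # Greedy one-to-one assignment, best IOU first.
    for iou, i, t in sorted(candidates, reverse=True):
        if i not in used_in and t not in used_tg:
            used_in.add(i)
            used_tg.add(t)
            tp += 1
    return tp, len(in_ids) - tp, len(tg_ids) - tp  # tp, fp, fn

def f1_score(tp, fp, fn, zero_division=0):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else zero_division

targets = np.zeros((10, 10), dtype=int)
targets[0:4, 0:4] = 1
targets[5:9, 5:9] = 2
inputs = np.zeros((10, 10), dtype=int)
inputs[0:4, 1:5] = 1   # overlaps target 1 with IOU 12 / 20 = 0.6
inputs[0:2, 7:9] = 2   # overlaps nothing
tp, fp, fn = match_counts(inputs, targets, iou_thresh=0.5)
print(tp, fp, fn, f1_score(tp, fp, fn))  # 1 1 1 0.5
```

Raising the threshold to 0.75 rejects the only overlapping pair, so every object becomes a false positive or false negative.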
- class LabelMatcherList(iterable=(), /)
Label Matcher List.
Simple interface to get averaged results from a list of LabelMatcher objects.
Examples
>>> lml = LabelMatcherList([
...     LabelMatcher(pred_labels_0, target_labels0),
...     LabelMatcher(pred_labels_1, target_labels1),
... ])
>>> lml.iou_thresh = 0.5  # set iou_thresh for all LabelMatcher objects
>>> print('Average F1 score for iou threshold 0.5:', lml.avg_f1)
Average F1 score for iou threshold 0.5: 0.92
>>> # Testing different IOU thresholds:
>>> for lml.iou_thresh in (.5, .75):
...     print('thresh:', lml.iou_thresh, ' f1:', lml.avg_f1)
thresh: 0.5 f1: 0.92
thresh: 0.75 f1: 0.91
- property avg_ap
Average AP.
- property avg_f1
Average F1 score.
- property avg_precision
Average precision.
- property avg_recall
Average recall.
- property f1
F1 score from average recall and precision.
- property iou_thresh
Gets unique IOU thresholds from all items, if there is only one unique threshold, it is returned.
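The properties avg_f1 and f1 above describe two different aggregations: the mean of per-item F1 scores, and an F1 score computed from the averaged precision and recall. A small arithmetic sketch with made-up per-image values shows that the two can differ:

```python
# Made-up per-image precision and recall values, for illustration only.
precisions = [1.0, 0.5]
recalls = [0.5, 1.0]

def f1_from(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Average of per-image F1 scores.
avg_f1 = sum(f1_from(p, r) for p, r in zip(precisions, recalls)) / len(precisions)

# F1 from average precision and average recall.
avg_p = sum(precisions) / len(precisions)
avg_r = sum(recalls) / len(recalls)
f1_from_averages = f1_from(avg_p, avg_r)

print(round(avg_f1, 4), f1_from_averages)  # 0.6667 0.75
```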
- get_pos_labels(v)
- intersection_mask(a, b)
- labels2counts(a)
- labels_exist(func)
- matching_labels(a, b)
- vec2matches(v)
Toy Data
- random_circle(image, mask, x, y, color, radius_range=(3, 28))
- random_ellipse(image, mask, x, y, color, radius_range=(3, 28))
- random_geometric_objects(height=256, width=256, radius_range=(3, 28), intensity_range=(0, 180), margin=13)
- random_rectangle(image, mask, x, y, color, radius_range=(3, 28))
- random_triangle(image, mask, x, y, color, radius_range=(3, 28))