Erosion and Dilation in TensorFlow
Guys, I'm looking for a way to implement or run erosion and dilation on my image mask in TensorFlow 1.15. I've got a simple binary mask and want something like this.
How to do it with TensorFlow 1.*?
It looks like both TensorFlow methods (`tf.nn.max_pool2d` and `tf.nn.dilation2d`) will work, but they are horrifically inefficient compared to what they could be (as demonstrated by OpenCV). The following script yields:
Dilation of 640x480 image with a 25x25 kernel took:
545.27ms with maxpool
228.72ms with dilate
0.66ms with opencv
On my computer (MacBook Air with M1).
Source:
import numpy as np
import cv2
import tensorflow as tf
import time


def tf_dilate(heatmap, width: int = 20, use_max_pool_backend: bool = False):
    """ Dilate the heatmap with a square kernel
    Note - this is probably inefficient, as I suspect it's computing a max over a 20x20 box of pixels for each pixel
    """
    if use_max_pool_backend:
        return tf.nn.max_pool2d(heatmap[None, :, :, None], ksize=width, padding='SAME', strides=(1, 1))[0, :, :, 0]
    else:
        return tf.nn.dilation2d(heatmap[None, :, :, None], filters=tf.zeros((width, width, 1), dtype=heatmap.dtype),
                                strides=(1, 1, 1, 1), padding="SAME", data_format="NHWC", dilations=(1, 1, 1, 1))[0, :, :, 0]


def test_dilation_options(img_shape=(480, 640), kernel_size=25):
    img = np.random.randn(*img_shape).astype(np.float32)**2
    tf_image = tf.constant(img, dtype=tf.float32)
    t0 = time.time()
    result_tf_maxpool = tf_dilate(tf_image, width=kernel_size, use_max_pool_backend=True)
    t1 = time.time()
    result_tf_dilate = tf_dilate(tf_image, width=kernel_size, use_max_pool_backend=False)
    t2 = time.time()
    result_opencv = cv2.dilate(img, kernel=np.ones((kernel_size, kernel_size), dtype=np.float32))
    t3 = time.time()
    assert np.array_equal(result_tf_maxpool.numpy(), result_tf_dilate.numpy()), "Results of two tensorflow dilates not equal"
    assert np.array_equal(result_tf_dilate.numpy(), result_opencv), "Results of tensorflow and opencv not equal"
    print(f'Dilation of {img_shape[1]}x{img_shape[0]} image with a {kernel_size}x{kernel_size} kernel took: '
          f'\n {(t1-t0)*1000:.2f}ms with maxpool'
          f'\n {(t2-t1)*1000:.2f}ms with dilate'
          f'\n {(t3-t2)*1000:.2f}ms with opencv'
          )


if __name__ == '__main__':
    test_dilation_options()
It looks like we get a roughly 10x speedup (which still leaves us about 40x slower than OpenCV) if we decompose the square dilation into a row-wise pass followed by a column-wise pass.
Dilation of 640x480 image with a 25x25 kernel took:
597.59ms with maxpool
23.50ms with dilate
0.50ms with opencv
Modified dilation function:
def tf_dilate(heatmap, width: int = 20, use_max_pool_backend: bool = False):
    """ Dilate the heatmap with a square kernel.
    The square dilation is decomposed into a row-wise dilation followed by a column-wise dilation,
    which is much faster than dilating with the full width x width kernel in one pass.
    """
    if use_max_pool_backend:
        return tf.nn.max_pool2d(heatmap[None, :, :, None], ksize=width, padding='SAME', strides=(1, 1))[0, :, :, 0]
    else:
        row_dilation = tf.nn.dilation2d(heatmap[None, :, :, None], filters=tf.zeros((1, width, 1), dtype=heatmap.dtype),
                                        strides=(1, 1, 1, 1), padding="SAME", data_format="NHWC", dilations=(1, 1, 1, 1))
        full_dilation = tf.nn.dilation2d(row_dilation, filters=tf.zeros((width, 1, 1), dtype=heatmap.dtype),
                                         strides=(1, 1, 1, 1), padding="SAME", data_format="NHWC", dilations=(1, 1, 1, 1))
        return full_dilation[0, :, :, 0]
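For completeness, the same separable decomposition should also apply to erosion, since a min filter over a square window is likewise separable into a row-wise and a column-wise pass. Here is an untested sketch along the same lines, assuming the TF 2.x keyword names (`filters`, `dilations`, `data_format`) used above and a flat (all-zero) kernel:

def tf_erode(heatmap, width: int = 20):
    """ Erode the heatmap with a square kernel, decomposed into a row-wise and a column-wise erosion.
    With an all-zero (flat) kernel, tf.nn.erosion2d reduces to a plain min filter.
    """
    row_erosion = tf.nn.erosion2d(heatmap[None, :, :, None], filters=tf.zeros((1, width, 1), dtype=heatmap.dtype),
                                  strides=(1, 1, 1, 1), padding="SAME", data_format="NHWC", dilations=(1, 1, 1, 1))
    full_erosion = tf.nn.erosion2d(row_erosion, filters=tf.zeros((width, 1, 1), dtype=heatmap.dtype),
                                   strides=(1, 1, 1, 1), padding="SAME", data_format="NHWC", dilations=(1, 1, 1, 1))
    return full_erosion[0, :, :, 0]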
I've posted a StackOverflow question to solicit more input on this: https://stackoverflow.com/questions/72733907/efficient-image-dilation-in-tensorflow
There is another option: you can do it with TensorFlow's max-pooling operation, and it does not require version 2. Here is how you can do it.
# x is a 4-D NHWC tensor, e.g. mask[None, :, :, None]; padding='SAME' keeps the output the same size as the input
erosion = -tf.nn.max_pool2d(-x, ksize=(k, k), strides=1, padding='SAME', name='erosion2D')
dilation = tf.nn.max_pool2d(x, ksize=(k, k), strides=1, padding='SAME', name='dilation2D')
This is a concise way to get both dilation and erosion in TensorFlow.
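For reference, here is a minimal, self-contained sketch of that trick applied to a binary mask. The mask, its size, and the kernel size k are purely illustrative, and the snippet assumes eager execution (in TF 1.x graph mode you would evaluate the resulting tensors in a session):

import numpy as np
import tensorflow as tf

# Illustrative binary mask and kernel size (replace with your own).
mask = tf.constant((np.random.rand(480, 640) > 0.99).astype(np.float32))
k = 25

# Add batch and channel dimensions so the tensor is 4-D NHWC.
x = mask[None, :, :, None]

# Dilation: max over each k x k neighbourhood.
dilated = tf.nn.max_pool2d(x, ksize=(k, k), strides=1, padding='SAME')[0, :, :, 0]

# Erosion: min over each k x k neighbourhood, written as -max(-x).
eroded = -tf.nn.max_pool2d(-x, ksize=(k, k), strides=1, padding='SAME')[0, :, :, 0]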
I haven't dug into the efficiency of this implementation. E.g., say you have a KxK square kernel on an HxW image - imagine K=25 or something.
Is it O(H*W*K*K) (i.e. a max over each KxK box per pixel), or something more efficient? It seems like it must be possible to take advantage of the fact that neighbouring boxes mostly overlap; I assume the implementation does not do this, since max pooling is usually done with non-overlapping boxes.
There are two implementations in TensorFlow for erosion and dilation in 2D:
tf.nn.erosion2d(
    value,
    filters,
    strides,
    padding,
    data_format,
    dilations,
    name=None
)

tf.nn.dilation2d(
    input,
    filters,
    strides,
    padding,
    data_format,
    dilations,
    name=None
)
The official docs for these are written for TensorFlow 2.*, but I believe they will work in TensorFlow 1.* as well. Here are the links to tf.nn.erosion2d and tf.nn.dilation2d.
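As a rough illustration, here is a sketch of how these could be called on a binary mask using the TF 2.x keyword names listed above. The mask and kernel size are placeholders; note that in TF 1.x the corresponding arguments are named differently (e.g. rates rather than dilations), so the call would need adjusting there:

import numpy as np
import tensorflow as tf

# Illustrative binary mask (replace with your own), put into NHWC layout.
mask = tf.constant((np.random.rand(480, 640) > 0.99).astype(np.float32))
x = mask[None, :, :, None]

k = 25
# A flat (all-zero) structuring element gives plain grey-scale/binary erosion and dilation.
se = tf.zeros((k, k, 1), dtype=x.dtype)

dilated = tf.nn.dilation2d(x, filters=se, strides=(1, 1, 1, 1), padding='SAME',
                           data_format='NHWC', dilations=(1, 1, 1, 1))[0, :, :, 0]
eroded = tf.nn.erosion2d(x, filters=se, strides=(1, 1, 1, 1), padding='SAME',
                         data_format='NHWC', dilations=(1, 1, 1, 1))[0, :, :, 0]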