Several years ago, the computer vision datasets that laid the foundation for many AI models offered annotations precise enough to meet the needs of machine recognition systems. However, AI has now reached a point where enabling sensitive human-machine interaction and immersive virtual experiences demands highly accurate results from computer vision algorithms. Image segmentation, one of the most fundamental computer vision tasks, is vital for helping robots recognize and understand the world around them.
Applications such as 3D reconstruction, augmented reality (AR), medical image processing, image editing, satellite image analysis, and robotic manipulation require more precise descriptions of their targets than image classification and object detection can provide. Based on how directly these applications affect physical objects, we can classify them as 'light' or 'heavy.' Light applications can tolerate segmentation failures and defects to a greater extent, because these errors mainly increase labor and time costs, usually within acceptable limits.
In contrast, failures in heavy applications are more likely to have serious consequences, such as physical damage to objects or injuries that can be catastrophic for people and animals. Models for these applications must therefore be both reliable and accurate. Unfortunately, because of their limited precision and robustness, most segmentation models are still unsuitable for heavy applications.
Compared with semantic segmentation, the proposed dichotomous image segmentation (DIS) task focuses on images containing one or more targets, which makes it easier to obtain fuller, more accurate information about each target. This makes it highly worthwhile to define a category-agnostic DIS task for accurately segmenting objects of varying structural complexity, regardless of their other characteristics.