Robust Multi-Scale Methods for Optic Flow (ENN.6760)
Project number:
enn6760
Description of the research
Research:
Medical applications, video applications in the consumer market, robot vision, traffic monitoring, industrial inspection and various other applications of image processing technology require objects in video sequences to be segmented. In such applications, a (typically small) number of objects has to be tracked against a continuously changing background.
There is an increasing need to extract the motion fields of such objects automatically. The amount of data is becoming overwhelming, especially in the medical field, but the same holds for all the fields mentioned above. Some medical examples:
- Extraction of the deformation field ('local warping') to register two images from different modalities or acquired at different moments, in order to integrate their complementary information or to study changes over time (e.g. tumor growth)
- Quantitative analysis of ventricular heart wall motion, to study the mechanical properties of the infarcted heart and to estimate the location of the infarcted region
- Motion vector fields are also exploited to arrive at a meaningful segmentation: a segment is then characterized by all of its pixels having sufficiently uniform velocities
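The criterion in the last item can be illustrated with a minimal region-growing sketch over a dense flow field. It is not the method proposed in this project: the function name `segment_by_velocity`, the tolerance `tol`, 4-connectivity and the comparison of each pixel against its segment's seed velocity are all our own assumptions.

```python
from collections import deque
from typing import List, Tuple

# flow[row][col] = (u, v): hypothetical dense velocity field in pixels/frame
Flow = List[List[Tuple[float, float]]]


def segment_by_velocity(flow: Flow, tol: float) -> List[List[int]]:
    """Label 4-connected regions whose velocities lie within `tol` of the region's seed.

    Greedy region growing: pixels are visited in raster order; an unlabeled pixel
    starts a new segment, which then absorbs neighbours whose velocity differs
    from the seed velocity by at most `tol` (Euclidean distance).
    """
    rows, cols = len(flow), len(flow[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            seed_u, seed_v = flow[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        u, v = flow[nr][nc]
                        if ((u - seed_u) ** 2 + (v - seed_v) ** 2) ** 0.5 <= tol:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels


if __name__ == "__main__":
    # Hypothetical 3 x 4 field: left half moves right (with slight noise), right half is static.
    flow = [[(1.0, 0.0), (0.95, 0.05), (0.0, 0.0), (0.02, 0.0)] for _ in range(3)]
    for row in segment_by_velocity(flow, tol=0.2):
        print(row)  # each row prints as [0, 0, 1, 1]
```

Note that the result depends directly on `tol`: with noisy flow, a small tolerance fragments objects into many segments, which is the over-segmentation problem described in the next paragraph.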
Motion fields obtained by present motion estimation algorithms, however, lack stability, are often sparse and are poorly suited as input to time-dependent segmentation algorithms. Noise in the motion fields leads to severe over-segmentation, while regularization (smoothing) of the motion field degrades the accuracy of the segments.
The problem with current approaches is that they are intrinsically local, relying on local differential-geometric properties and constraints. There is a need to develop global methods based on contextual operators and mechanisms of perceptual grouping. Objects in images move as coherent 'blobs', which calls for analysis algorithms that go beyond the notion of 'pixels' and study the image as a hierarchy of nested, meaningful areas (segments). Multi-scale computer vision ('scale-space' methods) is a highly promising field, which we will exploit in this project. It has a solid mathematical foundation and has proven to be an adequate and robust tool for extracting geometrical information from noisy input data. The human visual system is known to sample the world on the retina as a multi-scale stack of images (fig. 1). It has a dedicated multi-scale motion channel in the visual front-end, namely the parasol ganglion cells in the retina, which project onto the magnocellular layers of the lateral geniculate nucleus in the thalamus.
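In standard notation (our own choice of symbols: f(x, y, t) is the image sequence, σ the spatial scale, (u, v) the velocity), the multi-scale representation and the local constraint referred to above can be written as follows. The first line generates the scale-space by Gaussian blurring, equivalently by linear diffusion over scale; the second is the classical optic flow constraint, evaluated here on the blurred image L.

```latex
\begin{aligned}
L(x, y, t; \sigma) &= \iint_{\mathbb{R}^2} \frac{1}{2\pi\sigma^2}\, e^{-(\xi^2 + \eta^2)/(2\sigma^2)}\, f(x - \xi,\, y - \eta,\, t)\, d\xi\, d\eta,
\qquad \frac{\partial L}{\partial \sigma} = \sigma \left( \frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2} \right), \\
u \,\frac{\partial L}{\partial x} + v \,\frac{\partial L}{\partial y} + \frac{\partial L}{\partial t} &= 0 .
\end{aligned}
```

The constraint gives one equation per pixel for two unknowns (u, v) (the aperture problem), which is one way to see why purely local methods need additional assumptions such as smoothing, with the drawbacks noted above.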
Multi-scale methods currently receive much attention in the computer vision community. The image, considered at all scales simultaneously, is a high-dimensional object (termed a 'scale-space', fig. 1) that contains information about the hierarchical relations in the image. Upon blurring, information is lost in a highly specific manner: for example, extrema of the image intensity annihilate with intensity saddle points at so-called 'toppoints'. We have recently been able to extract the first tree graph from an image, clearly showing the promise of hierarchical methods (fig. 4). Segments are formed by the branches of the tree, and each branch moves as a grouped assembly. The structure over scale (also termed the 'deep structure') has only recently been recognized as a rich framework for the development of hierarchical and robust computer vision methods.
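The loss of structure under blurring can be seen in a one-dimensional sketch: we blur a test signal with Gaussians of increasing scale and count its local extrema. In 1-D there are no saddles, so extrema disappear pairwise (a maximum annihilates with a neighbouring minimum); the 2-D toppoints described above are the analogous events. The test signal, the truncation of the kernel at 4σ and the replicated borders are our own assumptions.

```python
import math
from typing import List, Sequence


def gaussian_blur(signal: Sequence[float], sigma: float) -> List[float]:
    """Convolve with a sampled Gaussian (sigma > 0), truncated at 4*sigma and normalised.

    Border samples are replicated so that the output has the same length as the input.
    """
    radius = max(1, math.ceil(4 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # replicate the border value
            acc += w * signal[j]
        out.append(acc / norm)
    return out


def count_extrema(signal: Sequence[float]) -> int:
    """Number of interior samples where the discrete slope changes sign."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0
    )


if __name__ == "__main__":
    # Hypothetical test signal: a slow sine (period 100) plus a weaker fast sine (period 10).
    n = 200
    signal = [math.sin(2 * math.pi * i / 100) + 0.3 * math.sin(2 * math.pi * i / 10) for i in range(n)]
    for sigma in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"sigma={sigma:4.1f}  extrema={count_extrema(gaussian_blur(signal, sigma))}")
```

At fine scales the count is dominated by the fast component; by σ = 8 the fast oscillation is strongly attenuated and essentially only the extrema of the slow sine should remain. For sampled, truncated kernels the count is not guaranteed to decrease at every step, so this is a qualitative illustration only.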
In this project we propose to investigate how scale-space methods can be exploited to analyse time-dependent image streams. In particular, the aim is to obtain stable, dense and accurate motion vector fields, and to derive time-dependent segmentations from these motion fields.
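As an illustration of interrogating an image stream at several scales, the sketch below estimates a single 1-D velocity from two frames by least squares on the constraint v·L_x + L_t = 0, evaluated on Gaussian-blurred frames. This is a textbook single-velocity estimator used for illustration, not the dense method this project aims to develop; the function names, the Gaussian test patterns and the choice of finite differences are our assumptions.

```python
import math
from typing import List, Sequence


def gaussian_blur(signal: Sequence[float], sigma: float) -> List[float]:
    """Convolve with a sampled Gaussian (sigma > 0), truncated at 4*sigma; borders replicated."""
    radius = max(1, math.ceil(4 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            acc += w * signal[min(max(i + k - radius, 0), n - 1)]
        out.append(acc / norm)
    return out


def flow_1d(frame0: Sequence[float], frame1: Sequence[float], sigma: float) -> float:
    """Least-squares velocity v minimising sum (v * L_x + L_t)^2 at scale sigma.

    Positive v means the pattern moves towards increasing x between the two frames.
    """
    l0 = gaussian_blur(frame0, sigma)
    l1 = gaussian_blur(frame1, sigma)
    num = 0.0
    den = 0.0
    for i in range(1, len(l0) - 1):
        # Spatial derivative: central difference of the temporal average of both frames.
        lx = ((l0[i + 1] + l1[i + 1]) - (l0[i - 1] + l1[i - 1])) / 4.0
        # Temporal derivative: difference between consecutive frames.
        lt = l1[i] - l0[i]
        num += lx * lt
        den += lx * lx
    return -num / den


def make_bump(center: float, width: float, n: int) -> List[float]:
    """Hypothetical test pattern: sampled Gaussian bump with standard deviation `width`."""
    return [math.exp(-((x - center) ** 2) / (2 * width * width)) for x in range(n)]


if __name__ == "__main__":
    frame0 = make_bump(100.0, 2.0, 200)
    frame1 = make_bump(106.0, 2.0, 200)  # true displacement: 6 samples
    for sigma in (1.0, 4.0, 12.0):
        print(f"sigma={sigma:4.1f}  v={flow_1d(frame0, frame1, sigma):.2f}")
```

In this example the displacement is large relative to the width of the pattern, so the linearised constraint is unreliable at fine scales, and blurring to a coarser scale brings the estimate closer to the true shift; this is one motivation for coarse-to-fine schemes. For small displacements even a fine scale suffices.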
The utilization areas have been chosen by the industrial partners and are described in the next section.
The partners form an experienced and balanced team of experts in mathematics, multi-scale (biomedical) computer vision applications, video sequence analysis and X-ray fluoroscopic cardiovascular applications. The partners have collaborated in a number of previous projects.
Utilisation
We distinguish short-term and long-term utilisation.
In the long term, this research contributes to the representation of 2-D, 3-D and 4-D signals (images, time sequences of images, volume sets and time sequences of volume sets) at higher semantic levels. This means that meaningful, time-dependent features in the signals can be represented, transmitted, manipulated and stored explicitly. Examples are the identification of motion patterns for diagnostic purposes in medical applications, the encoding of moving objects for efficient bandwidth utilisation in video, and motion detection in surveillance applications. This has a large impact on a broad range of applications, ranging from entertainment and consumer applications (interactive video, videophone, computer games, content-based data retrieval for the Web, etc.) to professional applications (pattern recognition and computer vision, simulation and augmented reality, etc.).
In the short term, we focus on two important utilization areas, each with a substantial market:
- Medical diagnosis, treatment planning and intervention in moving tissue (e.g., diagnosis of heart disorders). The Philips group X-ray Development will use the developed techniques for tracking catheters, where the predicted motion path may lead to a substantial dose reduction and to improved detection of the catheter's precise location.
- Video encoding of moving segments ('video objects' as defined in the MPEG-4 context). Philips Research has longstanding experience in dynamic video analysis. The developed techniques may lead to highly efficient predictive coding of contextual motion fields, and to efficient, robust, quantitative and dense motion detection.
The choice of these two areas allows us to gain experience in both professional and consumer markets.
Furthermore, the wide variety of characteristics in these areas (3-D+time vs. 2-D+time; density values vs. colour; very different SNR regimes; highly non-linear motions vs. piecewise (nearly) affine motions; semi-transparent media vs. non-transparent, occluding objects) demonstrates the generic potential of the proposed methods.
Publications
Prof.dr.ir. B.M. ter Haar Romeny
Technische Universiteit Eindhoven, Biomedical Engineering, Biomedical Image Processing
P.O. Box 513, 5600 MB Eindhoven
Users
Started: 01-06-2005
End date: 01-06-2008