Stereo vision uses image data from two regular cameras to derive 3-D measurements. Using these measurements, a 3-D model of the surface of the observed scene can be reconstructed. Simply put, point correspondences are identified in both images, and with knowledge of the relative position of the two cameras and the projective geometry, the 3-D position of the scene point can be calculated.


Advantages:

  • Very accurate measurements possible.
  • Reliable measurements.
  • Uses regular, established 2-D cameras.
  • Provides a matching color image.
  • Benefits from ambient light.


Disadvantages:

  • Does not work on homogeneous surfaces (a texture may be projected to compensate).
  • Computationally demanding (but real-time processing is possible).
  • Many parameters to tune.
  • Poor performance in low-light conditions (additional light may be projected to compensate).
Image acquisition with two cameras

The illustration shows the typical setup for epipolar geometry. Two cameras capture the same scene from different points of view. Epipolar geometry then describes the relation between the two resulting views.

Technical Background

Intrinsic and Extrinsic Camera Calibration

Initially, the two cameras are calibrated. First, their intrinsic parameters (e.g., focal length and principal point) are determined using software such as MetriCalibrate Intrinsic. With these parameters, the view ray direction for each pixel can be calculated. Second, the relative position of the two cameras to each other is established using, e.g., MetriCalibrate Stereo.
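The back-projection of a pixel to its view ray can be sketched in a few lines. This is a minimal illustration assuming an ideal pinhole model with a hypothetical intrinsic matrix K; a real pipeline would first undistort the pixel using the lens distortion coefficients found during calibration.

```python
import numpy as np

def view_ray(K, u, v):
    """Back-project pixel (u, v) to a unit-length view ray in camera
    coordinates, assuming an ideal pinhole model (no lens distortion)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

# Hypothetical intrinsics: 800 px focal length, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# The principal point back-projects to the optical axis (0, 0, 1).
print(view_ray(K, 320, 240))
```

Applying this to every pixel yields the per-pixel view ray directions mentioned above.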

Epipolar geometry

Epipolar geometry. A point x_L in the left image defines a line (e_R) in the right image.

Epipolar Geometry

An arbitrary 3-D point in the scene is projected onto the image planes of the two cameras. The projected points are denoted x and x’. The two camera centers and the scene point define a plane in 3-D, the epipolar plane. The projected image points x and x’ also lie in that plane.

For reconstruction of the 3-D point, x and x’ must be identified in the two images. Then, its 3-D position can be computed using the geometric constraints established during calibration and a technique called triangulation. The inputs are:

  • View ray of the first camera (known from point correspondence)
  • View ray of the second camera (known from point correspondence)
  • Relative position of the two cameras (known from calibration)
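The triangulation step can be sketched with the standard linear (DLT) method. The projection matrices P1 and P2 below encode exactly the inputs listed above (view rays via the intrinsics, relative pose via the extrinsics); the rig parameters are hypothetical:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point correspondence.

    P1, P2: 3x4 projection matrices of the two cameras.
    x1, x2: corresponding pixel coordinates (u, v).
    Returns the 3-D point in world coordinates as the null vector of the
    stacked reprojection constraints, found via SVD."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Hypothetical stereo rig: identical intrinsics, second camera shifted
# 0.1 m along x (camera 1 is the world frame).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# Project a known scene point, then recover it from its two image points.
X_true = np.array([0.2, -0.1, 2.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # close to X_true
```

With noise-free correspondences the point is recovered exactly; with real, noisy matches the SVD yields the least-squares solution.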

Undistortion and Rectification

Using ideal projections, the point x in the image of the first camera defines a line in 3-D (the view ray), which is projected onto a line in the second camera’s image. The corresponding point x’ must lie on that line. However, real-world cameras have a much more complicated projective transformation, including lens distortion. Hence, virtual cameras are defined that have coplanar image planes and an ideal projective transformation. The input images are mapped to these virtual cameras in a first step.


When the images are properly rectified, the search space for point correspondences is reduced to a single image line. Then, for each pixel in the left image, the best correspondence on its epipolar line in the right image is searched for. To improve robustness, a block-matching algorithm is usually used. Semi-global block matching (SGBM) also includes the local neighborhood of a pixel to enforce smoothness and reduce noise in the disparity image. From the disparity, the 3-D position of the scene point can be computed directly using triangulation.
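For a rectified rig, the triangulation collapses to a closed form: depth Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity. A minimal sketch with hypothetical rig parameters:

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Depth map for a rectified stereo pair: Z = f * B / d.

    disparity: disparity map in pixels (0 marks pixels with no match).
    Returns depth in meters; invalid pixels become inf."""
    with np.errstate(divide="ignore"):
        return np.where(disparity > 0.0,
                        focal_length_px * baseline_m / disparity,
                        np.inf)

# Hypothetical rig: 800 px focal length, 0.1 m baseline.
# A disparity of 40 px then corresponds to 800 * 0.1 / 40 = 2 m depth;
# larger disparities mean closer points.
d = np.array([[40.0, 80.0],
              [0.0, 20.0]])
print(depth_from_disparity(d, 800.0, 0.1))
```

The inverse relation between disparity and depth also explains why depth resolution degrades with distance: at large depths, a one-pixel disparity step covers an increasingly large depth interval.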