Evaluation of Lens Distortion Errors Using
An Underwater Camera System
For Video-Based Motion Analysis
Jeffrey Poliner
Lockheed Engineering & Sciences Company
Houston, Texas
Lauren Fletcher & Glenn K. Klute
Lyndon B. Johnson Space Center
Houston, Texas
INTRODUCTION
Video-based motion analysis systems are widely employed to study human
movement, using computers to capture, process, and analyze video data. This
video data can be collected in any environment where cameras can be located.
The Anthropometry and Biomechanics Laboratory (ABL) at the Johnson Space
Center is responsible for the collection and quantitative evaluation of human
performance data for the National Aeronautics and Space Administration (NASA).
One of the NASA facilities where human performance research is conducted is the
Weightless Environment Training Facility (WETF). In this underwater facility,
suited or unsuited crew members or subjects can be made neutrally buoyant by
adding weights or buoyant foam at various locations on their bodies. Because it
is underwater, the WETF poses unique problems for collecting video data.
Primarily, cameras must be either waterproof or encased in a waterproof housing.
The video system currently used by the ABL is manufactured by Underwater
Video Vault. This system consists of closed circuit video cameras (Panasonic
WV-BL202) enclosed in a cylindrical case with a plexiglass dome covering the
lens. The dome, used to counter the magnifying effect of the water, is
hypothesized to introduce distortion errors.
As with any data acquisition system, it is important for users to determine
the accuracy and reliability of the system. Motion analysis systems have many
possible sources of error inherent in the hardware, such as the resolution of
the recording, viewing, and digitizing equipment, as well as distortion
introduced by the camera lens in a video-based motion analysis system. It is,
therefore, of interest to determine the degree of this error in
various regions of the lens. A previous study (Poliner et al., 1993) developed
a methodology for evaluating errors introduced by
lens distortion. In that study, it was seen that errors near the center of the
video image were relatively small and the error magnitude increased with the
radial distance from the center. Both wide angle and standard lenses introduced
some degree of barrel distortion (fig. 1).
Since the ABL conducts underwater experiments that involve evaluating crew
members' movements to understand and quantify the way they will perform in
space, it is of interest to apply this methodology to the cameras used to record
underwater activities. In addition to distortions from the lens itself, there
will be additional distortions caused by the refractive properties of the
interfaces between the water and camera lens.
This project evaluates the error caused by the lens distortion of the cameras
used by the ABL in the WETF.
METHODS
Data Collection
A grid was constructed from a sheet of 0.32 cm (0.125 in) Plexiglas. Thin
black lines spaced 3.8 cm (1.5 in) apart were drawn vertically and horizontally
on one side of the sheet. Both sides of the sheet were then painted with a
WETF-approved white placite to give color contrast to the lines. The total grid
size
was 99.1 x 68.6 cm (39.0 x 27.0 in). The intersections of the 19 horizontal and
27 vertical lines defined a total of 513 points (fig. 2). The center point of
the grid was marked for easy reference. Using Velcro, the grid was attached to a
wooden frame, which was then attached to a stand and placed on the floor of the
WETF pool (fig. 2).
At the heart of the Video Vault system was a Panasonic model WV-BL202 closed
circuit video camera. The camera had been focused above water and placed on
the WETF floor facing the grid.
Divers used voice cues from the test director for fine alignment of the camera
with the center of the grid. By viewing the video on pool side monitors, the
camera was positioned so that a predetermined region of the grid nearly filled
the field of view. The distance from the camera to the grid was adjusted several
times, ranging from 65.3 to 72.6 cm (25.7 to 28.6 in). Data collection consisted
of videotaping the grid for at least 30 seconds in each of the positions, with
each position considered a separate trial. Descriptions of the arrangements of
the four trials are given in table 1.
Distance refers to the distance from the camera to the grid. Image size was
calculated by estimating the total number of grid units from the video. The
distance from the outermost visible grid lines to the edge of the image was
estimated to the nearest one-tenth of a grid unit. The distance and image size
values are all in centimeters.
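The image-size arithmetic described above reduces to a few lines; a minimal sketch follows, where the 3.8 cm grid spacing comes from the grid construction but the unit counts are hypothetical examples, not the study's measured values.

```python
GRID_CM = 3.8  # spacing between adjacent grid lines, cm

def image_size_cm(whole_units, margin_a, margin_b):
    """Estimate one image dimension (cm): the count of whole grid units
    visible plus the two margins (estimated to the nearest one-tenth of
    a grid unit) from the outermost visible lines to the image edges."""
    return (whole_units + margin_a + margin_b) * GRID_CM

# Hypothetical: 20 whole units visible, edge margins of 0.6 and 0.5 units.
width = image_size_cm(20, 0.6, 0.5)  # 21.1 units at 3.8 cm per unit
```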
Data Analysis
An Ariel Performance Analysis System (APAS) was used to process the
video data. Recorded images of the grid were played back on a VCR. A
personal computer was used to grab and store the images on disk. For each trial,
several frames were chosen from the recording and saved, as per APAS
requirements. From these, analyses were performed on a single frame for each
trial.
Because of the large number of points (up to 357) being digitized in each
trial, the grid was subdivided into separate regions for digitizing and
analysis. Each row was defined as a region and digitized separately.
An experienced operator digitized all points in the grid for each of the
trials. Here digitizing refers to the process of the operator identifying the
location of points of interest in the image with the use of a mouse-driven
cursor. Often digitizing is used to refer to the process of grabbing an image
from video format and saving it in digital format on the computer. Digitizing
and subsequent processing resulted in X and Y coordinates for the points.
Part of the digitizing process involved identifying points of known
coordinates as control (calibration) points. Digitization of these allows for
calculation of the transformation relations from image space to actual
coordinates. In this study, the four points diagonal from the center of the grid
were used as the control points (points marked "X" in fig. 2). These
were chosen because it was anticipated that errors would be smallest near the
center of the image. Using control points which were in the distorted region of
the image would have further complicated the results. The control points were
digitized and their known coordinates were used to determine the scaling from
screen units to actual coordinates.
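The scaling step above can be sketched as a per-axis least-squares fit of a scale and offset from the four control points. This is an illustrative reconstruction, not the APAS algorithm; the pixel coordinates below are hypothetical, while the 3.8 cm offsets reflect control points one grid unit diagonal from the center.

```python
import numpy as np

# Hypothetical digitized screen coordinates (pixels) of the four control
# points diagonal from the grid center, and their known actual
# coordinates (cm). The pixel values are illustrative only.
screen = np.array([[282.0, 198.0], [358.0, 198.0],
                   [282.0, 282.0], [358.0, 282.0]])
actual = np.array([[-3.8,  3.8], [ 3.8,  3.8],
                   [-3.8, -3.8], [ 3.8, -3.8]])

def fit_axis(s, a):
    """Least-squares scale and offset mapping screen -> actual on one axis."""
    A = np.column_stack([s, np.ones_like(s)])
    (scale, offset), *_ = np.linalg.lstsq(A, a, rcond=None)
    return scale, offset

sx, ox = fit_axis(screen[:, 0], actual[:, 0])
sy, oy = fit_axis(screen[:, 1], actual[:, 1])  # sy is negative: screen Y grows downward

def to_actual(pt):
    """Transform a digitized point (pixels) to actual coordinates (cm)."""
    return np.array([sx * pt[0] + ox, sy * pt[1] + oy])
```

With these control points, the image center maps to the grid origin, as expected.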
For trial 1, the coordinates ranged from 0 to approximately ±38.1 cm in the X
direction and 0 to approximately ±30.48 cm in the Y direction. For trials 2 and
3, the ranges were 0 to ±34.29 cm in X and 0 to ±26.67 cm in Y. For trial 4, the
range was 0 to ±34.29 cm in X and from -22.86 to +26.67 cm in Y. To remove the
dependence of the data on the size of the grid, normalized coordinates were
calculated by dividing the calculated X and Y coordinates by half the total
image size in the X and Y directions, respectively. Table 1 lists these sizes
for the four trials. Thus, normalized coordinates in both the X and Y directions
were dimensionless and ranged approximately from -1 to +1 for all four trials.
For all trials, the error for each digitized point was calculated as the
distance from the known coordinates of the point to the calculated coordinates.
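The per-point error and coordinate normalization described above amount to a few lines of arithmetic. In this sketch the digitized coordinates are hypothetical illustration values; the 40.2 cm half-image width for trial 1 is taken from the text.

```python
import math

def point_error(known, calculated):
    """Euclidean distance (cm) between a point's known location and its
    calculated (digitized) location."""
    return math.hypot(calculated[0] - known[0], calculated[1] - known[1])

def normalize(coord, half_x, half_y):
    """Divide coordinates by half the image size in each direction,
    giving dimensionless values spanning roughly -1 to +1."""
    return coord[0] / half_x, coord[1] / half_y

# Hypothetical digitized point vs. its known grid location (cm):
known = (11.4, 7.6)    # 3 x 2 grid units at 3.8 cm spacing
calc = (11.55, 7.48)   # digitized location, with some error
err_cm = point_error(known, calc)

# Error as a percent of half the horizontal image size (trial 1: 40.2 cm).
err_pct = 100.0 * err_cm / 40.2
```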
RESULTS
Raw data from the four trials are presented in figure
3. Shown are graphs of the calculated normalized coordinates of points. Grid
lines on the graphs do not necessarily correspond to the edges of the images.
For each trial, the error of each point was calculated as the distance
between the calculated location (un-normalized) and the known location of that
point. These error values were then normalized by calculating them as a percent
of half the image size in the horizontal direction (trial 1, 40.2 cm; trial 2,
36.8 cm; trials 3 and 4, 36.2 cm). This dimension was chosen arbitrarily to be
representative of the size of the image.
Figure 4 presents
contour plots of the normalized error as a function of the normalized X-Y
location in the image for each of the trials. This type of graph, commonly used
in land elevation maps, displays information three dimensionally. The coordinate
axes represent two of the dimensions. Here, these were the X and Y coordinates
of the points. The third dimension represents the value of interest as a
function of the first two dimensions, in this case, the error as a function of
the X and Y location. Curves were created by connecting points of identical
value.
Interpreting these graphs is similar to interpreting a land map; peaks and
valleys are displayed as closed contour lines. Once again, it was clear that
errors were small near the center of the image and became progressively greater
further away from the center.
The unevenness evident in some of these graphs can be partly attributed to
splitting the image into separate regions for the purpose of digitizing. The
control points were redigitized for each individual section. Since the control
points were close to the center of the image, a small error in their
digitization would be magnified for points further away from the center.
Another quantitative way of viewing these data was to examine how the error
varied as a function of the radial distance from the center of the image. This
distance was normalized by dividing by half the image size in the horizontal
direction (trial 1, 40.2 cm; trial 2, 36.8 cm; trials 3 and 4, 36.2 cm). Figure
5 presents these data for each of the four trials.
Linear and binomial regressions were then fit to the data for each trial and
for all data combined. The linear fit was of the form
Error = A0 + A1 R
where R was the radial distance from the center of the image (normalized),
and A0 and A1 were the coefficients of the least-squares fit. The binomial fit
was of the form:
Error = B0 + B1 R + B2 R^2
where B0, B1, and B2 were the coefficients of the fit. The results of these
least-squares fits are presented in table 2.
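Both fits can be reproduced with NumPy's least-squares polynomial fitting. The data below are synthetic (a quadratic trend in normalized radial distance plus noise), generated only to show the mechanics; the resulting coefficients are not the study's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: normalized radial distance R in [0, 1] and a
# percent error growing roughly quadratically with R, plus random noise.
R = rng.uniform(0.0, 1.0, 200)
error = 0.2 + 1.0 * R + 3.5 * R**2 + rng.normal(0.0, 0.2, 200)

# Linear fit: Error = A0 + A1 R  (polyfit returns highest power first).
A1, A0 = np.polyfit(R, error, 1)

# Binomial (quadratic) fit: Error = B0 + B1 R + B2 R^2.
B2, B1, B0 = np.polyfit(R, error, 2)

def r_squared(y, y_hat):
    """Coefficient of determination of a fit."""
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

r2_lin = r_squared(error, A0 + A1 * R)
r2_quad = r_squared(error, B0 + B1 * R + B2 * R**2)
```

Because the linear model is nested within the quadratic one, the quadratic fit's R^2 can never be lower, consistent with the observation that the binomial curve represented the data more closely.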
DISCUSSION
When reviewing these results, several points should be noted. First, this
study utilized a two-dimensional analysis algorithm. A limitation of the study
was that exactly four calibration points were required to define the scaling
from screen coordinates to actual coordinates. The use of more than four points
would likely result in less variability. Second, all coordinates and calculated
errors were normalized to dimensions of the image. Although there were many
possibilities for the choice of dimension (e.g., horizontal, vertical or
diagonal image size; maximum horizontal, vertical, or diagonal coordinate;
average of horizontal and vertical image size or maximum coordinate; etc.), the
dimensions used to normalize were assumed to best represent the image size.
It is clear from these data that a systematic error caused by lens distortion
occurred when using the underwater video system. Lens distortion errors were
less than 1% from the center of the image up to radial distances equivalent to
25% of the horizontal image length (normalized R equal to 0.5). Errors were less
than 5% for normalized R up to 1, an area covering most of the image.
There seemed to be some degree of random noise. This was evident in the
scatter pattern seen in the graphs in figure 5. This error can most likely be
attributed to the process of digitizing. There are factors which limit the
ability to correctly digitize the location of a point, such as: if the point is
more than one pixel in either or both dimensions, irregularly shaped points, a
blurred image, shadows, etc. Because of these factors, positioning the cursor
when digitizing was often a subjective decision.
Four trials were analyzed in this study. Although all the data were
normalized, there were slight differences among the four trials (fig. 5 and
table 2). These can most likely be attributed to the uncertainty in determining
the grid size, which was estimated from the fraction of a grid unit from the
outermost visible grid lines to the edge of the images.
Two types of regressions were fit to the data: linear and binomial. The
interpretation of the
coefficients of the linear regression can provide insight into the data. A1,
the slope of the error-distance relation, represents the sensitivity of the
error to the distance from the origin; thus, it is a measure of the lens
distortion. A0, the intercept of the linear relation, can be interpreted as the
error at a distance of zero. If the relation being modeled were truly linear,
this would be
related to the random error not accounted for by lens distortion. However, in
this case, it is not certain if the error-distance relation was linear. The R^2
values gave an indication of how good the fit was. The binomial curve fit seemed
to more correctly represent the data. The interpretation of these coefficients,
however, is not as straightforward.
CONCLUSIONS
This study examined one of the sources of error in video-based motion
analysis using an underwater video system. It was demonstrated that
errors from lens distortion could be as high as 5%. By avoiding the outermost
regions of the lens, the errors can be kept to less than 0.5%.
REFERENCES
Poliner, J., Wilmington, R.P., Klute, G.K., and Micocci, A. Evaluation of
Lens Distortion for Video-Based Motion Analysis, NASA Technical Paper 3266, May
1993.