An Introduction to Video Capturing
In order to understand how the computer can capture pictures from the
video, it is necessary to consider some history and explanations of previous techniques
and technologies. The use of the video camera/VCR with the computer was considered, at the
time, a major breakthrough. A faster, more streamlined approach that enables the user to
send the pictures directly from the video camera/camcorder to the computer, however, is
now available. A brief explanation of the earlier systems will enhance understanding of
this new technique.
Consider what happens when a video camera is used
to take a picture. A video camera captures only one image at a time, but it records a
continuous series of such images for as long as it is recording. This sequence of
individual images may record a collection of positions of an athlete as he or she performs
a certain athletic movement. Information can be extracted from these pictures if they were
taken at the moment when a position crucial to the movement or event occurred, such as the
position of a discus thrower just before he releases the discus.
In order to capture the position needed using the
techniques of video, it is necessary to start recording with the camcorder before the
desired movement begins and to stop recording after the action is complete. For
this example, the camera would be started before the discus thrower began his spin and
stopped after the discus was released. Examining the discus thrower's videotape in a
VCR provides an opportunity to view the sequence as many times as desired. Using the
variety of controls on the VCR (play, stop, pause and so on), it is easy to
view and analyze interesting positions in the movement. The greatest strength of combined
camcorder/VCR/computer use is the extensive amount of dynamic information that can be
extracted by sophisticated software. Dynamic information includes not just
the position but the variation or progression from image to image.
The standard camcorder is capable of recording 60
(NTSC) or 50 (PAL) images every second onto the videotape inserted in the camcorder. Every
image in the video is made of multiple lines and a large number of points in every line.
The actual image can be compared to a huge table where each cell in the table represents a
color. There are two different image rates because different countries have adopted
different video standards. The two standards are PAL (mostly used in European countries)
and NTSC (the North American standard). The major difference between the two standards is,
of course, the number of images that are stored every second. In addition, these standards
also regulate how the color is represented in the actual image and the number of lines and
points in every line of the image (i.e. the table principle). In summary, some of the
differences are:
|                          | PAL | NTSC |
| Number of images/sec     | 50  | 60   |
| Number of lines          | 384 | 240  |
| Number of points in line | 576 | 640  |
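
The "table principle" can be illustrated with a short Python sketch. It is only a model of the idea, assuming the PAL figures given above and an arbitrary (red, green, blue) triple for each cell; the function name `make_blank_image` is invented for this illustration.

```python
# PAL dimensions as quoted in this article.
PAL_LINES = 384
PAL_POINTS_PER_LINE = 576


def make_blank_image(lines, points_per_line, color=(0, 0, 0)):
    """Build an image as a table: one row per line, one (R, G, B) cell per point."""
    return [[color for _ in range(points_per_line)] for _ in range(lines)]


image = make_blank_image(PAL_LINES, PAL_POINTS_PER_LINE)

# Change a single cell: line 10, point 20 becomes pure red.
image[10][20] = (255, 0, 0)

# Total number of cells (points) in the table.
cell_count = len(image) * len(image[0])
print(cell_count)
```

Each cell holds one color, so the whole picture is simply the collection of all cells, read line by line.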
The camcorder/VCR has been used extensively during
the last 10 years to extract visual information about movement. The complex movements
associated with athletic performances have been easier to dissect with the development and
application of these technological improvements. Comparing two different trials by the
same person is quite useful. However, this requires that both the athlete and the coach be
highly skilled at viewing movement and determining the appropriate changes. Often, it is
impossible to identify movement patterns, or the deceleration and acceleration of various
body parts throughout the activity, using only the human eye. Merely examining the
videotape does not provide as much information as processing the movement patterns with
computer software. A better use of the videotape is to process the recorded video sequence
on the computer with "easy to use" software packages that quantify the movement using
mathematical techniques.
In order to understand more precisely how the
computer is able to utilize the videotape for subsequent mathematical applications, a
brief explanation is needed. When video is recorded onto the videotape, it is stored as
magnetic fields. The actual video is normally viewed using a VCR connected to a
television. The connection between the VCR and the television is a small cable in which
electric signals represent the actual images. In other words, the image stored
on the videotape is converted from magnetic fields to electric signals that are sent to
the television and displayed.
These same electric signals (which represent the
images from the videotape) can be transformed into images in the computer. Converting
electric signals to something the computer can understand requires the use of
specialized hardware called an A/D (Analog to Digital) converter. An A/D converter
converts the analog electric signal representing the video image into a large number of
digital values which the computer recognizes. The computer cannot understand what to do
with the analog electric signal unless it has been transformed or converted into a
meaningful form, that is, a digital signal. This is analogous to two people speaking
different languages, such as a Frenchman trying to talk with someone who only speaks
Japanese. If neither individual understands the language of the other, they need someone or
something to translate from one language to the other. This is the function of an
A/D converter for computers, which need to have electrical signals translated into digital
form.
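
What an A/D converter does can be sketched in a few lines of Python. This is a simplified illustration, not a description of any particular capture card: the 0.0 to 1.0 volt input range, the 8-bit resolution, and the function name `analog_to_digital` are all assumptions made for the example.

```python
def analog_to_digital(voltage, v_min=0.0, v_max=1.0, bits=8):
    """Map an analog voltage onto one of 2**bits integer codes (a simple quantizer)."""
    levels = 2 ** bits
    # Clamp so that an out-of-range input still produces a valid code.
    clamped = min(max(voltage, v_min), v_max)
    # Position of the voltage within the input range, from 0.0 to 1.0.
    fraction = (clamped - v_min) / (v_max - v_min)
    # Scale to the highest code and round to the nearest whole number.
    return round(fraction * (levels - 1))


samples = [0.0, 0.25, 0.75, 1.0]  # assumed analog voltages
codes = [analog_to_digital(v) for v in samples]
print(codes)
```

With 8 bits, every voltage is translated into a whole number between 0 and 255, which is a form the computer can store and process.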
After the analog electric signal has been converted
into a digital number, the computer can store that number in memory or on the hard
drive. The hardware and software that perform the A/D conversion of video images and store
this digital information in memory or on the hard drive are called a Video Capture System.
In the literature, this conversion and storing of images to a computer's memory or hard
drive is referred to as Capturing or Video Capturing.
As explained earlier, every image in the video
sequence consists of several lines and a number of points in every line. This was called
the "table principle". When capturing and storing images in the computer, the
size, that is, the amount of space required for storage on the computer, is a function of
several parameters. These image parameters include the number of lines in the image, the
number of points in every line, and the number of bytes used to represent the color of
each point. A sample calculation of the size follows, using PAL as the example:
size of image = (number of lines) x (number of points in line) x (number of bytes for every color)
PAL image size = 384 x 576 x 3 = 663552 bytes
Normally the number of bytes for the color is 3.
This gives 3 x 8 bits = 24-bit color, which is very close to the maximum
of what the human eye can distinguish. When the number of images captured every second is
taken into account, the amount of data that the computer has to move to the hard drive
every second is 50 images/sec x 663552 bytes = 33177600 bytes, which is more than 30MB
per second. People with a little knowledge of computers are aware that no computer
currently available can transport this amount of data to the hard drive. The storage
capacity of the hard drive is sufficient, but the flow rate from the Video Capture System
to the hard drive is simply too high for the system. Therefore, an alternative method is
employed which reduces the amount of data that has to go to the computer's hard drive.
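
The size and data rate calculation above can be reproduced with a short Python sketch. The function names are invented for this illustration, and the NTSC figures are taken from the comparison of the two standards earlier in this article.

```python
def image_size_bytes(lines, points_per_line, bytes_per_point=3):
    """Storage needed for one uncompressed image."""
    return lines * points_per_line * bytes_per_point


def data_rate_bytes_per_sec(image_bytes, images_per_sec):
    """Amount of data produced every second at a given image rate."""
    return image_bytes * images_per_sec


pal_image = image_size_bytes(384, 576)
pal_rate = data_rate_bytes_per_sec(pal_image, 50)

ntsc_image = image_size_bytes(240, 640)
ntsc_rate = data_rate_bytes_per_sec(ntsc_image, 60)

print("PAL :", pal_image, "bytes per image,", pal_rate, "bytes per second")
print("NTSC:", ntsc_image, "bytes per image,", ntsc_rate, "bytes per second")
```

The PAL result matches the 663552 bytes per image and 33177600 bytes per second quoted in the text.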
Since the beginning of the computer, programmers
have tried to reduce the amount of data that has to be stored, because no computer has
unlimited storage. There are several ways that the
amount of data can be reduced. One way would be to decrease the number of lines or the
number of points in every line. Unfortunately, either of these choices would decrease the
quality of the image significantly. Reducing the number of colors would be another option.
Using only a gray scale (8 bits, or 1 byte, per point) would reduce the amount of data by a
factor of three. But roughly 11MB/sec is still very high, and gray scale images are not
always as satisfactory as color images. Fortunately, the solution for this specific Video
Capturing System is a special compression algorithm, which allows large amounts of data to
be stored in a relatively small amount of memory with very little loss of information in
the reduction process.
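
The effect of the simple reduction options can be compared with a short Python sketch. The "half resolution" case (half the lines and half the points per line) is a hypothetical choice made only for this illustration, as is the function name `rate_bytes_per_sec`.

```python
def rate_bytes_per_sec(lines, points_per_line, bytes_per_point, images_per_sec=50):
    """Uncompressed data rate for a given image format (PAL image rate by default)."""
    return lines * points_per_line * bytes_per_point * images_per_sec


options = {
    "full color": rate_bytes_per_sec(384, 576, 3),
    "gray scale": rate_bytes_per_sec(384, 576, 1),       # 1 byte per point
    "half resolution": rate_bytes_per_sec(192, 288, 3),  # assumed example
}

for name, rate in options.items():
    print(f"{name}: {rate / 1_000_000:.1f} million bytes per second")
```

Even the reduced options remain in the range of several million bytes per second, which is why compression is used instead.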
One compression algorithm is called JPG (JPEG) and has
become a standard used extensively on the Internet for compressing images. The Internet is
another part of the computer world which also requires a reduction in the amount of data
moved from one end of the world to the other. If compression were not used on today's
networks, it would be very difficult to download anything from the Internet, and
downloading or even just viewing web pages would take far too long. JPG used in video
sequences is called MotionJPG or just MJPG. A large and varied number of compression
algorithms are currently available, and most of them were developed with specific goals
in mind. However, JPG has been determined to be the best standard for video-based
motion analysis systems.
The 33MB/second delivered by the Video Capturing
System can be reduced to as little as 1.1MB/second. Of course, with this amount of
compression there is some loss of information, but if the compression is set to produce
about 4-6MB/second, the quality of the image will remain very high. Even 4-6MB/sec
produces a large amount of data for storage on the hard drive, and most older computers
(like the old 486 or older Pentiums) are unable to accommodate such files. The
requirements for the computer to adequately store these larger data files are as follows:
Pentium 200MHz
More than 2GB hard drive
The hard drive must be using Ultra DMA (which is the case for any new
computer) or Ultra SCSI when communicating with the processor.
The amount of memory must be higher than 16MB
Use Win95/Win98 or WinNT
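
To put the compressed data rates and the hard drive requirement in perspective, the following Python sketch estimates the compression ratio and how many seconds of video would fit on a 2GB drive. It assumes that 1GB equals 1000MB and that the whole drive is free for video, which is optimistic; the function names are invented for this illustration.

```python
RAW_RATE_MB = 33.0   # uncompressed rate quoted above, in MB per second
DISK_MB = 2 * 1000   # a 2GB drive, assuming 1GB = 1000MB and an empty disk


def compression_ratio(raw_rate, compressed_rate):
    """How many times smaller the compressed stream is than the raw stream."""
    return raw_rate / compressed_rate


def seconds_of_video(disk_mb, rate_mb_per_sec):
    """How long a recording fits on the disk at a given data rate."""
    return disk_mb / rate_mb_per_sec


for rate in (1.1, 4.0, 6.0):
    ratio = compression_ratio(RAW_RATE_MB, rate)
    seconds = seconds_of_video(DISK_MB, rate)
    print(f"{rate} MB/sec: about {ratio:.1f}:1 compression, {seconds:.0f} sec on disk")
```

At the recommended 4-6MB/sec the drive holds only a few minutes of video, which is usually enough for short athletic events but explains why a large hard drive is required.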
In order to make the Video Capturing System
a success, an appropriate compression algorithm to reduce the amount of data is essential.
The algorithm used in JPG is a complicated mathematical process which removes certain
information. Letting the computer's processor perform this function in real time is not
practical, since even a Pentium 450MHz is not fast enough. This is because computer
processors were not built specifically to execute these highly mathematical processes.
Although they can do such work, they were not designed for these tasks. Therefore, most
computers use specialized video processors mounted on the Video Capture System in order to
compress the video without losing images. These processors are specialized and dedicated
to this purpose only.
After the video has been stored on the computer's
hard drive, displaying it on the computer's monitor requires decompression of the video in
order to view the actual image again. The same speed problem occurs and, therefore, instead
of using the computer processor for decompressing the video, the computer uses the video
processor on the Video Capturing System before the data is displayed on the computer
monitor.
The COmpression and DECompression unit is called a
CODEC. A codec can be either hardware, as explained above, or software. A codec is
installed in Windows 95/98/NT in the same manner as hardware. If you would like to see
which codecs are installed on your system, please follow these steps:
1: Click the Start button in the task bar
2: Under the menu called Settings, choose the Control Panel
3: In the Control Panel Window, find the item called Multimedia
4: Double click the Multimedia Icon
5: In the Multimedia Properties, click the Tab called Devices
6: Under Devices it is possible to view all components installed for Multimedia purposes.
7: Click the Plus sign next to the title Video Compression Codecs
8: You will now be presented with a list of all the codecs installed on your computer
When you open a video clip on your computer, the
system checks the Multimedia Properties for an installed codec that can
decompress the video you selected. If one is installed, you will see the video directly on
your computer, but if the computer cannot find a codec that can decompress the video, you
will get an error message explaining that it is not possible to find a codec for this
video. This means that if the system has a codec installed that fits the video you need to
use, then everything will work just fine. Otherwise, you will have to install a codec
before the video clip can be successfully decompressed for your use.
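
The lookup described above can be modeled with a small Python sketch. It does not use any Windows API: the registry dictionary, the format name "zlib-demo", the `CodecNotFoundError` class, and the `open_clip` function are all hypothetical, and Python's lossless `zlib` module stands in for a real video codec such as MJPG.

```python
import zlib

# Hypothetical registry: format name -> decompression function.
# A real system would list video codecs such as MJPG here.
installed_codecs = {
    "zlib-demo": zlib.decompress,  # stand-in codec for this illustration
}


class CodecNotFoundError(Exception):
    """Raised when no installed codec can decompress the requested format."""


def open_clip(format_name, data, codecs=installed_codecs):
    """Decompress a clip if a matching codec is installed, otherwise report an error."""
    decompress = codecs.get(format_name)
    if decompress is None:
        raise CodecNotFoundError(f"no codec installed for format {format_name!r}")
    return decompress(data)


clip = zlib.compress(b"frame data")
print(open_clip("zlib-demo", clip))  # a matching codec is installed

try:
    open_clip("MJPG", clip)  # no codec registered under this name
except CodecNotFoundError as error:
    print("Error:", error)
```

Installing a codec corresponds to adding an entry to the registry, after which the same clip opens without an error.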
Moving the captured video from one computer to
another requires that both computers have the correct codec installed, either in hardware
or software form. It is possible to view the video on the computer where the video was
captured or on another computer which has the appropriate codec installed. The only
software MJPG codec currently available is free on the Internet and is called Paradigem
MJPG Codec.
Following this generalized explanation of video
capturing, it should be easier to relate to the operation of the APAS system. Most of the
information was provided as background on how video signals can be captured into
computers. The connection between this information and the APAS system is essential
because the APAS system utilizes video for a computer-based analysis system.
In general, the analysis of human movement can be
divided into a few steps:
1. Record the event using video cameras (camcorders)
2. Capture the video from the camcorder to the computer
3. Digitize the event utilizing the video
4. Transform the video-digitized coordinates to a 3D coordinate system
5. Filter the transformed coordinates
6. Analyze the movement
Step 2 is the actual capturing of the video from
the camcorder to the computer. For this purpose, the APAS system provides two modules:
Capture and RealCap.
The Capture module employs a special technique
which uses the connected VCR to capture the actual video. This process is based on
stepping the video forward for each image and capturing that image individually. This is
just like creating animated cartoons where the drawing is moved a fraction of a millimeter
and every time the drawing is moved a picture is taken.
The RealCap module utilizes the video compression
MJPG, which was described previously. This technique enables the module to stream the
video directly onto the hard drive while the video is in play mode. There are several
advantages to this approach: (1) the time for capturing a 3 sec event is much
shorter than with the Capture module, (2) the quality of the images is much better, and
(3) there is no need for external VCR equipment, thus reducing the price of the overall
system.
The images are better using the
RealCap module because the Capture module uses a pause mode during the image capturing.
While pausing, the VCR tries to adjust the video heads to acquire a stable image. This
adjustment can introduce a small jitter in the image. During play mode, this jitter
is not present.
Although the RealCap module appears to have more
pluses than the Capture module, there is a small minus. When using more than one view of
the recorded event (this is required when analyzing 3D movement), the first image of every
view must show the same instant of the movement. If this is not the case, a huge error
will be introduced into the analysis.
Capturing video while the VCR/camcorder is in play
mode requires a certain level of finesse by the user. This difficulty is related to
positioning the video with such accuracy that it can be reproduced from one view to
another, that is, from one video sequence to another. This necessitates some kind of
trimming of each individual view. The RealCap module provides an opportunity to adjust the
individual view when saving the video to the hard drive. This adjustment enables the
individual views to have the same starting position and also to have the same length. The
best technique is to locate a clearly defined position or event within the movement and
select this as the synchronizing image. An example of a clearly defined position could be
the heel strike during walking or the release of a discus in a discus throw. However, the
most important point to remember is that the synchronizing image must be clearly defined and
the same in all the views used for the analysis.
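
The idea behind trimming can be sketched in Python. This is not the RealCap implementation, only an illustration under stated assumptions: each view is described by the index of its synchronizing image and its total number of images, and the function name `trim_views` and its parameters are invented for the example.

```python
def trim_views(sync_frames, frame_counts, frames_before, frames_after):
    """Return one (start, end) image range per view, with end exclusive.

    sync_frames[i]  : index of the synchronizing image in view i
    frame_counts[i] : total number of captured images in view i
    Every returned range has the same length and the synchronizing
    image at the same offset from its start.
    """
    # Shrink the requested window so that it fits inside every view.
    before = min(frames_before, min(sync_frames))
    after = min(
        frames_after,
        min(count - sync - 1 for sync, count in zip(sync_frames, frame_counts)),
    )
    return [(sync - before, sync + after + 1) for sync in sync_frames]


# Two views of the same throw; the discus release is image 40 in the
# first view and image 55 in the second (assumed example values).
ranges = trim_views([40, 55], [150, 160], frames_before=30, frames_after=100)
print(ranges)
```

After trimming, image number N in one view corresponds to image number N in every other view, which is the condition the analysis depends on.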
An explanation of the trimming process is presented
in greater detail elsewhere. Please refer to the online
manuals of the RealCap for this description.