Introduction to video capture with APAS
In order to understand how the computer can capture pictures from the video, it is necessary to consider some history and explanations of previous techniques and technologies. The use of the video camera/VCR with the computer was considered, at the time, a major breakthrough. A faster, more streamlined approach that enables the user to send the pictures directly from the video camera/camcorder to the computer, however, is now available. A brief explanation of the earlier systems will enhance understanding of this new technique.
Consider what happens when a video camera is used to take a picture. A video camera captures only one image at a time, recording a series of images each time the record button is pressed. This sequence of individual images may record a collection of positions assumed by an athlete as he or she performs a certain athletic movement. Information can be extracted from these pictures if they were taken at the moment when a position crucial for the actual movement/event occurred, such as the position of a discus thrower just before he releases the discus.
In order to capture the needed position using video, it is necessary to start recording with the camcorder prior to the beginning of the desired movement and to stop after the completion of the action. For this example, the camera would be started before the discus thrower began his spin and stopped after the discus was released. Examination of the discus thrower's videotape in a VCR provides an opportunity for viewing the sequence as many times as desired. Using the variety of controls on the VCR (play, stop, pause and so on), it is easy to view and analyze interesting positions in the movement. The greatest strength of the camcorder/VCR/computer combination is the extensive amount of dynamic information that can be extracted by virtue of the sophisticated software. Dynamic information includes not just the position but the variation or progression from image to image.
The standard camcorder is capable of recording 60 (NTSC) or 50 (PAL) images every second onto the videotape inserted in the camcorder. Every image in the video is made of multiple lines, with a large number of points in every line. The actual image can be compared to a huge table where each cell in the table represents a color. There are two different numbers of images that can be stored on videotape every second because different countries have adopted different standards. The two standards are PAL (mostly used in European countries) and NTSC (the North American standard). The major difference between the two standards is, of course, the number of images that are stored every second. In addition, these standards also regulate how the color is represented in the actual picture/image and the number of lines and points in every line in the image (i.e. the table principle). In summary, some of the differences:
The camcorder/VCR has been used extensively during the last 10 years to extract visual information about movement. The complex movements associated with athletic performances have been easier to dissect with the development and application of these technological improvements. Evaluations that compare two different trials by the same person are quite useful. However, they require that both the athlete and the coach be highly skilled at viewing movement and determining the appropriate changes. More often, it is impossible to identify movement patterns, or the deceleration and acceleration of various body parts throughout the activity, by using only the human eye. Merely examining the videotape does not provide as much information as processing the movement patterns with computerized software. A better use of the videotape is to have the computer process the recorded video sequence with "easy to use" software packages which allow quantification of the movement with highly mathematical techniques.
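As a rough illustration of what quantifying movement can mean, the sketch below estimates velocity and acceleration from positions measured on consecutive images. The position values, the 50 images per second rate, and the function name finite_differences are assumptions made for this example; it is not a description of how any particular software package performs the calculation.

```python
# Hypothetical example: horizontal positions of one body point,
# measured on consecutive images of a video sequence.
FRAME_RATE = 50.0  # images per second (PAL rate, assumed for this sketch)


def finite_differences(values, frame_rate):
    """Return the rate of change between consecutive samples, per second."""
    dt = 1.0 / frame_rate  # time between two consecutive images
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Made-up positions in metres, one value per image.
positions = [0.00, 0.02, 0.06, 0.12, 0.20]

velocities = finite_differences(positions, FRAME_RATE)      # metres per second
accelerations = finite_differences(velocities, FRAME_RATE)  # metres per second squared

print(velocities)
print(accelerations)
```

Each differencing step shortens the list by one value, so five positions give four velocities and three accelerations. Real measurements contain noise, which simple differencing amplifies, so practical systems usually smooth the data first.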
In order to understand more precisely how the computer is able to utilize the videotape for subsequent mathematical applications, a brief explanation is needed. When video is recorded onto the videotape, it is stored using electromagnetic fields. The actual video is normally viewed using a VCR connected to a television. The connection between the VCR and the television is a small cable where electric signals are used to represent the actual images. In other words, the image stored on the videotape is converted from magnetic fields to electric signals that are sent to the television and displayed.
These same electric signals (which represent the images from the videotape) can be transformed into images in the computer. Converting electric signals to something the computer can understand requires the use of specialized hardware called an A/D (Analog to Digital) converter. An A/D converter converts the analog electric signal representing the video image into a large quantity of digital numbers which the computer recognizes. The computer cannot understand what to do with the analog electric signal unless it has been transformed or converted into a meaningful form, that is, a digital signal. This is analogous to two people speaking different languages, such as a Frenchman trying to talk with someone who only speaks Japanese. If neither individual understands the language of the other, they need someone or something to translate or convert from one language to another. This is the function of an A/D converter for computers which need to have electrical signals translated into digital form.
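The idea of turning an analog signal into digital numbers can be sketched in a few lines. The example below is only an illustration: the 0 to 1 volt range, the 8-bit resolution, the sine-wave test signal, and the function name quantize are all assumptions, and a real A/D converter is a hardware device rather than a piece of software.

```python
import math


def quantize(voltage, v_min=0.0, v_max=1.0, bits=8):
    """Map an analog voltage to the nearest integer code in 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1                    # 255 for 8 bits
    clamped = min(max(voltage, v_min), v_max)   # out-of-range inputs are clipped
    return round((clamped - v_min) / (v_max - v_min) * levels)


# A made-up analog signal: one cycle of a sine wave between 0 V and 1 V,
# sampled at 8 evenly spaced instants.
samples = [0.5 + 0.5 * math.sin(2 * math.pi * n / 8) for n in range(8)]

# The list of integers is the "digital" form the computer can store.
codes = [quantize(v) for v in samples]
print(codes)
```

With 8 bits, every sample becomes a whole number between 0 and 255, which fits in exactly one byte.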
After the analog electric signal has been converted into a digital number, the computer can store that number in the memory or on the hard drive. The A/D conversion of video images and the storage of this digital information to the memory/hard drive is called a Video Capture System. In many different books and literature this conversion and storing of images to a computer memory/hard drive is referred to as Capturing or Video Capturing.
As explained earlier, every image in the video sequence consists of several lines and a number of points in every line. This was called the "table principle". When images are captured and stored in the computer, the size, that is, the amount of space required for storage on the computer, is a function of several parameters. These image parameters include the number of lines in the image, the number of points in every line, and the number of bytes used to represent the color of each point. A sample calculation of the size follows, using the PAL mode as the example:
size of image = (number of lines) x (number of points in line) x (number of bytes for every color)
PAL Image size = 576 x 384 x 3 = 663552 bytes
Normally the number of bytes for the color is 3. This gives 3 x 8 bits = 24-bit color, which is very close to the maximum of what the human eye can distinguish. When the number of images captured every second is taken into account, the amount of data that the computer has to move to the hard drive is 50 images/sec x 663552 bytes = 33177600 bytes every second, which is more than 30 MB per second. People with a little knowledge of computers are aware that no computer currently available can transfer this amount of data to the hard drive. The storage capacity of the hard drive is sufficient, but the flow rate from the Video Capture System to the hard drive is simply too high for the system. Therefore, an alternative method is employed which reduces the amount of data that has to go to the computer's hard drive.
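The size and data rate arithmetic above can be checked with a short script. The function names are chosen for this illustration only, and the dimensions are taken as 576 lines of 384 points each; the product is the same whichever of the two numbers is treated as the line count.

```python
def image_size_bytes(lines, points_per_line, bytes_per_point):
    """Storage needed for one uncompressed image (the 'table principle')."""
    return lines * points_per_line * bytes_per_point


def data_rate_bytes_per_sec(image_bytes, images_per_sec):
    """Bytes that must reach the hard drive every second."""
    return image_bytes * images_per_sec


# PAL example from the text: 3 bytes (24 bits) of color per point, 50 images/sec.
pal_image = image_size_bytes(lines=576, points_per_line=384, bytes_per_point=3)
pal_rate = data_rate_bytes_per_sec(pal_image, images_per_sec=50)

print(pal_image)  # 663552
print(pal_rate)   # 33177600
```

The same two functions can be reused for other image sizes or frame rates by changing the arguments.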
Since the beginning of the computer, programmers have tried to reduce the amount of data that has to be stored, because no computer has unlimited storage. There are several ways that the amount of data can be reduced. One way would be to decrease the number of lines or the number of points in every line. Unfortunately, either of these choices would decrease the quality of the image significantly. Reducing the number of colors would be another option. Using only a gray scale (8 bits, or 1 byte) would reduce the amount of data by a factor of three. But about 11 MB/sec is still very high, and gray scale images are not always as nice as color images. Fortunately, the solution for this specific Video Capturing System is a set of special compression algorithms which allow large amounts of data to be stored in relatively small amounts of memory with very little loss of information in the reduction process.
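To make the trade-offs concrete, the following sketch compares the uncompressed data rate for the options just described. The halved line and point counts are hypothetical values picked for illustration; the text does not state how far either would be reduced. The PAL figures (576 lines, 384 points, 3 bytes, 50 images per second) come from the sample calculation earlier.

```python
IMAGES_PER_SEC = 50  # PAL


def rate(lines, points_per_line, bytes_per_point, images_per_sec=IMAGES_PER_SEC):
    """Uncompressed data rate in bytes per second."""
    return lines * points_per_line * bytes_per_point * images_per_sec


options = {
    "full color (3 bytes per point)": rate(576, 384, 3),
    "gray scale (1 byte per point)": rate(576, 384, 1),
    "half the lines (hypothetical)": rate(288, 384, 3),
    "half the points per line (hypothetical)": rate(576, 192, 3),
}

for name, bytes_per_sec in options.items():
    # Report in millions of bytes per second, rounded to one decimal place.
    print(f"{name}: {bytes_per_sec / 1_000_000:.1f} million bytes/sec")
```

None of these simple reductions brings the rate down far enough without visibly harming the image, which is why compression is the preferred route.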
One compression algorithm is called JPG and has become a standard used extensively on the Internet for compressing images. The Internet is another part of the computer world which also requires a reduction in the amount of data moved from one end of the world to the other. If compression were not used on today's networks, it would be very difficult to download anything from the Internet, and downloading or just viewing web pages would take forever. JPG used in video sequences is called MotionJPG or just MJPG. A large and varied number of compression algorithms are currently available, and most of them were developed with specific goals in mind. However, JPG has been determined to be the best standard for video-based motion analysis systems.
The 33 MB/second delivered by the Video Capturing System can be reduced to as little as 1.1 MB/second. Of course, with this amount of compression there is some loss of information, but if the compression is set so that the rate is about 4-6 MB/second, the quality of the image will remain very high. Even 4-6 MB/second produces a large amount of data for storage on the hard drive, and most older computers (like the old 486 or older Pentiums) are unable to accommodate such a file. The requirements for the computer to adequately store these larger data files are as follows: