SCHOOL OF ENGINEERING, CUSAT
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY KOCHI-
Imagine yourself in a world where humans interact with computers. You are sitting in front of a personal computer that can listen, talk, or even scream aloud. It can gather information about you and interact with you through techniques such as facial recognition and speech recognition. It can even understand your emotions at the touch of the mouse. It verifies your identity, feels your presence, and starts interacting with you. You ask the computer to dial your friend at his office. It realizes the urgency of the situation through the mouse, dials your friend at his office, and establishes a connection.
The initiative to make this happen is the Blue Eyes research project, currently being implemented by the center's User Systems Ergonomic Research group (USER). Blue Eyes seeks attentive computation by integrating perceptual abilities into computers, wherein non-obtrusive sensing technology, such as video cameras and microphones, is used to identify and observe your actions. As you walk by the computer screen, for example, the camera would immediately "sense" your presence and automatically turn on the room lights, the television, or the radio while popping up your favorite Internet website on the display.
This project is not only about teaching computers to sense or perceive user actions. Computers are also being programmed to know how users feel (depressed, ecstatic, bored, amused, or anxious) and to respond accordingly. A computer can, on its own, play a funny Flash animation to entertain its "master" if it notices a sad look on his or her face. Sound capabilities can also be integrated, with the computer "talking" to its user about the task at hand or simply acknowledging a command with a respectful "yes, sir."
In these cases, the computer extracts key information, such as where the user is looking, what he or she is saying or gesturing or how the subject’s emotions are evident with a grip on the pointing device. These cues are analyzed to determine the
user’s physical, emotional, or informational state, which can be used to increase productivity. This is done by performing expected actions or by providing expected information.
Now let us look at the human cognition mechanism. Human cognition depends primarily on the ability to perceive, interpret, and integrate audiovisual and sensory information. Adding extraordinary perceptual abilities to computers would enable them to work together with human beings as intimate partners. Researchers are attempting to add capabilities to computers that will allow them to interact like humans: to recognize human presence, talk, listen, or even guess human feelings.
The Blue Eyes technology aims at creating computational machines that have perceptual and sensory abilities like those of human beings. It uses a non-obtrusive sensing method, employing modern video cameras and microphones, to identify the user's actions through these imparted sensory abilities. The machine can understand what a user wants and where he is looking, and can even recognize his physical or emotional state.
For a long time emotions have been kept out of the deliberate tools of science; scientists have expressed emotion, but no tools could sense and respond to their affective information. This paper highlights research aimed at giving computers the ability to comfortably sense, recognize and respond to the human communication of emotion, especially affective states such as frustration, confusion, interest, distress, anger and joy.
Two main themes of sensing, self-report and concurrent expression, are described, together with examples of systems that give users new ways to communicate emotions to computers and, through computers, to other people. In addition to building systems that try to elicit and detect frustration, a system has been developed that responds to user frustration in a way that appears to help alleviate it. This paper highlights applications of this research to interface design, wearable computing, entertainment and education, and briefly presents some potential ethical concerns and how they might be addressed.
Not all computers need to "pay attention" to emotions or to have the capability to emulate emotion. Some machines are useful as rigid tools, and it is fine to keep them that way. However, there are situations in which human-computer interaction could be improved by having the computer adapt to the user, and in which communication about when, where, how and how important it is to adapt involves the use of emotional information.
Findings of Reeves and Nass at Stanford University suggest that the interaction between human and machine is largely natural and social, indicating that factors important in human-human interaction are also important in human-computer interaction. In human-human interaction, it has been argued that skills of so-called "emotional intelligence" are more important than traditional mathematical and verbal skills of intelligence. These skills include the ability to recognize the emotions of another and to respond appropriately to these emotions. Whether or not these particular skills are more important than certain other skills will depend on the situation and goals of the person, but what is clear is that these skills are important in human-human interaction, and when they are missing, interaction is more likely to be perceived as frustrating and not very intelligent.
Current computer input devices, particularly the common ones such as keyboards and mice, are limited in their capabilities. Interfaces should not be limited merely to the screen, which forms the intermediary between the user and the results of the computer processes. Rather, the subsidiary devices should also be brought into the equation. In a sense, a computer interface could be seen as a "peer", one that responds actively to user input as a reflection of, and a response to, the user's feelings and emotions, in order to better understand the true intentions of the user.
There are three key aspects that are important in representing the "emotions" that a computer is believed to possess: automatic signals, facial expressions and behavioral manifestations. Studies of human communication have shown that, apart from facial expressions, gestures, touch and other signs of body language play a vital role in the communication of feelings and emotion. However, one failing of the desktop PC is its inability to simulate the effect of touch. Humans are experts at interpreting facial expressions and tones of voice and at making accurate inferences about others' internal states from these clues. Controversy rages over anthropomorphism: should we leverage this expertise in the service of computer interface design, given that attributing human characteristics to machines often means setting unrealistic and unfulfillable expectations about the machine's capabilities? Show a human face, and users expect human capabilities that far outstrip the machine's. Yet the fact remains that faces have been used effectively in media to represent a wide variety of internal states. With careful design, we regard emotional expression via face and sound as a potentially effective means of communicating a wide array of information to computer users. As systems become more capable of emotional communication with users, we see them needing more and more sophisticated emotionally expressive capability.
Sensors, tactile or otherwise, are an integral part of an effective computing system because they provide information about the wearer's physical state or behavior. They can gather data continuously without having to interrupt the user. The emphasis here is on describing physiological sensors; however, there are many kinds of new sensors currently under development that might be useful in recognizing affective cues. (Tactile) sensors that receive human feeling as input have been developed progressively over the last few decades. Since the human brain communicates its emotions as electrical signals, sensitive equipment and apparatus are able to pick up these weak signals. Here, we provide a concise list of the currently available technology that could be further developed into input devices for obtaining the user's emotional information.
Blue Eyes system provides technical means for monitoring and recording human-operator's physiological condition.
The key features of the system are:
1. Visual attention monitoring (eye motility analysis)
2. Physiological condition monitoring (pulse rate, blood oxygenation)
3. Operator’s position detection (standing, lying)
4. Wireless data acquisition using Bluetooth technology
5. Real-time user-defined alarm triggering
6. Recording of physiological data, the operator's voice and an overall view of the control room
7. Recorded data playback
The system consists of a portable measuring unit (the Data Acquisition Unit) and a central analytical system (the Central System Unit). The mobile device is integrated with a Bluetooth module providing a wireless interface between the operator-worn sensors and the central unit. ID cards assigned to each of the operators and adequate user profiles on the central unit side provide the necessary data personalization so that different people can use a single sensor device.
The Blue Eyes system provides technical means for monitoring and recording the operator's basic physiological parameters. The most important parameter is saccadic activity, which enables the system to monitor the status of the operator's visual attention, along with head acceleration, which accompanies large displacements of the visual axis (saccades larger than 15 degrees). Saccades are rapid eye jumps to new locations within a visual environment, assigned predominantly by the conscious attention process.
The JAZZ-novo is a multisensor system that acquires eye movements with excellent spatial and temporal resolution, together with other physiological and environmental signals. The main idea behind developing the JAZZ-novo multisensor was to gather different kinds of information about a pilot's interaction in the cockpit using a single device. The physiological signals measured by JAZZ-novo include:
• Eye movements in horizontal and vertical axis (1000 Hz sampling frequency),
• Head rotation velocity in horizontal and vertical axis (1000 Hz sampling frequency),
• Head acceleration in horizontal and vertical axis (1000 Hz sampling frequency),
• Photoplethysmographic signals at two light wavelengths (500 Hz sampling frequency),
• Audio signal recording (8000 Hz sampling frequency).
JAZZ Multisensor provides the Data Acquisition Unit with necessary physiological data. It supplies raw digital data regarding eye position, the level of blood oxygenation, acceleration along horizontal and vertical axes and ambient light intensity.
Eye movement is measured using direct infrared oculographic transducers. The eye movement is sampled at 1 kHz, the other parameters at 250 Hz. The sensor sends approximately 5.2 kB of data per second.
For eye movement measurement the JAZZ-novo system uses the Cyclops-ODS (Oculus Dexter Sinister) technology (infrared oculography, IRO), optimized for easy set-up and minimal intrusiveness, which are crucial for monitoring subject behavior in a non-laboratory environment. The Cyclops-ODS set of optoelectronic sensors is placed between the eyes, hiding the sensor in the "shadow" of the nose. The limitations of the visual field are minimal, reducing the risk that JAZZ-novo interferes with the subject's visual exploration of the working environment. The eye movement measurements are performed with high temporal and spatial resolution, allowing precise detection of saccades, the fast eye movements used to move the point of gaze around the available field of view. Statistical processing of detected saccades over selected periods of time (their quantity, amplitude, and the duration of preceding fixations) provides important information about the involvement of the operator's visual attention.
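The saccade detection described above can be illustrated with a simple velocity-threshold detector. This is only a sketch under stated assumptions: the text does not say which detection algorithm Blue Eyes or JAZZ-novo uses, and the function name `detect_saccades`, the 30 deg/s threshold and the minimum duration are hypothetical values chosen for illustration. Only the 1 kHz sampling rate comes from the text.

```python
def detect_saccades(angles_deg, sample_rate_hz=1000.0,
                    velocity_threshold=30.0, min_samples=3):
    """Return (start_index, end_index, amplitude_deg) for each saccade.

    angles_deg: eye position samples along one axis, in degrees.
    velocity_threshold: assumed threshold in degrees per second.
    min_samples: assumed minimum saccade duration, in samples.
    """
    dt = 1.0 / sample_rate_hz
    saccades = []
    start = None  # index where the current fast movement began
    for i in range(1, len(angles_deg)):
        velocity = abs(angles_deg[i] - angles_deg[i - 1]) / dt
        if velocity >= velocity_threshold:
            if start is None:
                start = i - 1
        elif start is not None:
            end = i - 1
            if end - start >= min_samples:
                amplitude = abs(angles_deg[end] - angles_deg[start])
                saccades.append((start, end, amplitude))
            start = None
    # Close a saccade that is still in progress at the end of the trace.
    if start is not None:
        end = len(angles_deg) - 1
        if end - start >= min_samples:
            saccades.append((start, end, abs(angles_deg[end] - angles_deg[start])))
    return saccades


# Synthetic trace: fixation at 0 degrees, a 10 degree jump, fixation at 10 degrees.
trace = [0.0] * 50 + [i * 0.5 for i in range(1, 21)] + [10.0] * 50
print(detect_saccades(trace))
```

Counting the returned tuples, averaging their amplitudes, and measuring the gaps between consecutive saccades would give the quantity, amplitude and preceding-fixation statistics mentioned in the text.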
Head rotation velocity and acceleration measurements allow detection of the head movements linked with the visual exploration of the control room environment.
Additionally the absolute measurement of horizontal head rotation velocity allows automatic calibration of horizontal eye movement in angular degree units.
The photoplethysmographic signals measured by the JAZZ-novo system allow evaluation of the operator's heartbeat and of relative changes in blood oxygenation.
As these signals carry information about the vasodilatation/vasoconstriction responses regulated by the sympathetic system, their analysis can be used to assess the subject's workload.
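As an illustration of how a heart rate could be derived from a photoplethysmographic signal, the sketch below counts pulse peaks and converts the mean peak-to-peak interval into beats per minute. The peak-picking rule, the function name `estimate_heart_rate_bpm` and the 0.3 s refractory period are assumptions made for this example, not the method used by the system; the default sampling rate follows the 500 Hz figure listed for the photoplethysmographic channel.

```python
import math


def estimate_heart_rate_bpm(ppg, sample_rate_hz=500.0, refractory_s=0.3):
    """Estimate heart rate from PPG samples; return None if fewer than 2 peaks."""
    mean_level = sum(ppg) / len(ppg)
    refractory = int(refractory_s * sample_rate_hz)  # minimum peak spacing, in samples
    peaks = []
    for i in range(1, len(ppg) - 1):
        is_local_max = ppg[i - 1] < ppg[i] >= ppg[i + 1]
        if is_local_max and ppg[i] > mean_level:
            # Ignore peaks that follow the previous one too closely.
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    if len(peaks) < 2:
        return None
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / sample_rate_hz
    return 60.0 / mean_interval_s


# Synthetic 10 s test signal pulsing at 1.2 Hz, i.e. 72 beats per minute.
signal = [math.sin(2 * math.pi * 1.2 * n / 500.0) for n in range(5000)]
print(estimate_heart_rate_bpm(signal))
```

A real signal would need filtering before peak picking; the sketch skips that step to keep the idea visible.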
ii. MC145483 - 13-bit PCM codec
A PCM Codec-Filter is used for digitizing and reconstructing the human voice. These devices are used primarily in the telephone network to facilitate voice switching and transmission. Once the voice is digitized, it may be switched by digital switching methods or transmitted over long distances (T1, microwave, satellites, etc.) without degradation. The name codec is a contraction of "COder", for the analog-to-digital converter (ADC) used to digitize voice, and "DECoder", for the digital-to-analog converter (DAC) used to reconstruct voice. A codec is a single device that performs both the ADC and DAC conversions.
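The coder and decoder roles can be sketched as uniform 13-bit quantization. This is a generic illustration of linear PCM, not a model of the MC145483 interface or its filters; the function names `encode` and `decode` and the normalized input range of -1.0 to 1.0 are assumptions made for the example.

```python
BITS = 13
FULL_SCALE = 2 ** (BITS - 1)  # 4096 steps on each side of zero


def encode(sample):
    """Coder (ADC role): map a sample in [-1.0, 1.0] to a signed 13-bit code."""
    code = round(sample * FULL_SCALE)
    # Clip to the representable range -4096 .. 4095.
    return max(-FULL_SCALE, min(FULL_SCALE - 1, code))


def decode(code):
    """Decoder (DAC role): map a signed 13-bit code back to a sample value."""
    return code / FULL_SCALE


original = 0.3
restored = decode(encode(original))
print(encode(original), restored, abs(restored - original))
```

The reconstruction error for in-range samples stays within half a quantization step, which is why the digitized voice can be switched and transmitted without further degradation.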
iii. Bluetooth module (based on ROK101008)
ROK101008 is a Bluetooth module developed by the electronics giant Ericsson. It is readily available on the market at a considerably low cost, and it was chosen for the system because of this low cost and availability.
b. The Software
The main task of the Blue Eyes software is to look after the working operators' physiological condition. To ensure an instant reaction to a change in an operator's condition, the software performs real-time buffering of the incoming data, real-time physiological data analysis and alarm triggering.
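The combination of buffering, analysis and alarm triggering can be sketched with a fixed-size sliding window and a user-supplied rule. The class name `AlarmMonitor`, the window size and the example rule (mean pulse rate below 50) are hypothetical; the text does not describe the actual data structures or alarm conditions.

```python
from collections import deque


class AlarmMonitor:
    """Keep the most recent samples and evaluate a user-defined alarm rule."""

    def __init__(self, window_size, rule):
        self.buffer = deque(maxlen=window_size)  # oldest samples drop out automatically
        self.rule = rule  # callable: list of samples -> bool

    def push(self, sample):
        """Add one sample; return True if the alarm rule fires on a full window."""
        self.buffer.append(sample)
        if len(self.buffer) < self.buffer.maxlen:
            return False  # not enough data yet
        return bool(self.rule(list(self.buffer)))


# Hypothetical rule: alarm when the mean of the last 3 pulse readings is below 50.
monitor = AlarmMonitor(3, lambda window: sum(window) / len(window) < 50)
for pulse in [70, 60, 45, 40, 38]:
    print(pulse, monitor.push(pulse))
```

Passing the rule in as a callable mirrors the idea of user-defined alarms: the buffering code stays the same while the condition changes per operator or per parameter.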
The Blue Eyes software comprises several functional modules:
i. SystemCore facilitates the data flow between other system modules (e.g. transfers raw data from the Connection Manager to data analyzers, processed data from the data analyzers to GUI controls, other data analyzers, data logger etc.).
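The routing role described for SystemCore can be pictured as a small publish/subscribe hub. The sketch below is an assumption about one possible structure, not the actual implementation; the class name `SystemCoreSketch`, the topic names and the doubling "analyzer" are all hypothetical.

```python
class SystemCoreSketch:
    """Route data between modules by topic name (hypothetical design)."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, data):
        # Deliver the data to every module registered for this topic.
        for handler in self._subscribers.get(topic, []):
            handler(data)


core = SystemCoreSketch()
log, gui = [], []
# A stand-in analyzer: receives raw data and republishes a processed value.
core.subscribe("raw", lambda sample: core.publish("processed", sample * 2))
core.subscribe("processed", log.append)   # stand-in for the data logger
core.subscribe("processed", gui.append)   # stand-in for a GUI control
core.publish("raw", 21)                   # stand-in for the Connection Manager
print(log, gui)
```

With this shape, adding another analyzer or display only requires one more `subscribe` call, which matches the description of the core as a data-flow facilitator between modules.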
The term eye tracker refers to a device that tracks the movement of the eye. IBM's Almaden Research Center has developed an eye tracker.
In comparison to the system reported in early studies, this system is much more compact and reliable. However, we felt that it was still not robust enough for a variety of people with different eye characteristics, such as pupil brightness and corrective glasses. We hence chose to develop and use our own eye tracking system. Available commercial systems, such as those made by ISCAN Incorporated, LC Technologies, and Applied Science Laboratories (ASL), rely on a single light source that is positioned either off the camera axis, in the case of the ISCAN ETL-400 systems, or on-axis, in the case of the LCT and the ASL E504 systems. Eye tracking data can be acquired simultaneously with MRI scanning using a system that illuminates the left eye of a subject with an infrared (IR) source, acquires a video image of that eye, locates the corneal reflection (CR) of the IR source, and in real time calculates, displays and records the gaze direction and pupil diameter.
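One common way to turn a located pupil and corneal reflection into a gaze direction is to map the pupil-minus-CR offset to an angle with a calibrated straight line. The text does not say that any of the systems named above works this way, so the sketch below is an assumption for illustration only; `fit_axis`, `gaze_angle` and the calibration values are hypothetical, and only a single axis is handled.

```python
def fit_axis(offsets, target_angles):
    """Least-squares line: angle = gain * offset + bias, for one axis."""
    n = len(offsets)
    mean_o = sum(offsets) / n
    mean_t = sum(target_angles) / n
    variance = sum((o - mean_o) ** 2 for o in offsets)
    covariance = sum((o - mean_o) * (t - mean_t)
                     for o, t in zip(offsets, target_angles))
    gain = covariance / variance
    return gain, mean_t - gain * mean_o


def gaze_angle(pupil_pos, cr_pos, gain, bias):
    """Convert pupil and corneal-reflection positions (pixels) to degrees."""
    return gain * (pupil_pos - cr_pos) + bias


# Hypothetical calibration: the subject looks at targets at -10, 0 and 10 degrees
# while the measured pupil-CR offsets are -2, 0 and 2 pixels.
gain, bias = fit_axis([-2.0, 0.0, 2.0], [-10.0, 0.0, 10.0])
print(gaze_angle(101.0, 100.0, gain, bias))
```

Using the difference between pupil and reflection, rather than the pupil position alone, makes the estimate less sensitive to small shifts of the head relative to the camera.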
The user speaks to the computer through a microphone, whose output is passed through a bank of audio filters; a simple system may contain a minimum of three filters. The more filters used, the higher the probability of accurate recognition. Presently, switched-capacitor digital filters are used because these can be custom-built in integrated circuit form. They are smaller and cheaper than active filters using operational amplifiers. The filter output is then fed to the ADC to translate the analogue signal into a digital word. The ADC samples the filter outputs many times a second. Each sample represents a different amplitude of the signal. Evenly spaced vertical lines represent the amplitude of the audio filter output at the instant of sampling. Each value is then converted to a binary number proportional to the amplitude of the sample. A central processing unit (CPU) controls the input circuits that are fed by the ADCs. A large RAM (random access memory) stores all the digital values in a buffer area. This digital information, representing the spoken word, is then accessed by the CPU for further processing. Normal speech has a frequency range of 200 Hz to 7 kHz. Recognizing speech over a telephone call is more difficult, as the call has a bandwidth limitation of 300 Hz to 3.3 kHz.
The spoken words are processed by the filters and ADCs. The binary representation of each of these words becomes a template or standard against which future words are compared. These templates are stored in memory. Once the storing process is completed, the system can go into its active mode and is capable of identifying spoken words. As each word is spoken, it is converted into its binary equivalent and stored in RAM. The computer then starts searching and compares the binary input pattern with the templates. It is to be noted that even if the same speaker speaks the same text, there are always slight variations in the amplitude or loudness of the signal, pitch, frequency, time gaps, etc. For this reason, there is never a perfect match between the template and the binary input word. The pattern matching process therefore uses statistical techniques and is designed to look for the best fit. The values of the binary input word are subtracted from the corresponding values in the templates. If both values are the same, the difference is zero and there is a perfect match. If not, the subtraction produces some difference or error. The smaller the error, the better the match. When the best match occurs, the word is identified and displayed on the screen or used in some other manner. The search process takes a considerable amount of time, as the CPU has to make many comparisons before recognition occurs. This necessitates the use of very high-speed processors.
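The best-fit search described above can be sketched as follows. The error measure used here, the sum of absolute differences between corresponding values, is one simple reading of "subtract and take the smallest error"; the function name `match_word` and the tiny example templates are hypothetical, and the sketch assumes the input and every template have the same length.

```python
def match_word(input_pattern, templates):
    """Return (best_word, error) for the template closest to the input.

    templates: dict mapping a word to its stored list of values.
    """
    best_word, best_error = None, float("inf")
    for word, template in templates.items():
        # Subtract corresponding values and accumulate the absolute error.
        error = sum(abs(a - b) for a, b in zip(input_pattern, template))
        if error < best_error:
            best_word, best_error = word, error
    return best_word, best_error


templates = {"yes": [3, 7, 5, 1], "no": [6, 2, 2, 8]}
print(match_word([3, 6, 5, 2], templates))  # close to "yes", but not identical
print(match_word([6, 2, 2, 8], templates))  # identical to "no", so the error is zero
```

Every stored template is compared against every input, which is why the search cost grows with vocabulary size and motivates the fast processors mentioned in the text.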
A large RAM is also required because, even though a spoken word may last only a few hundred milliseconds, it is translated into many thousands of digital words. The words and templates must also be aligned correctly in time before the similarity score is computed. This process, termed dynamic time warping, recognizes that different speakers pronounce the same words at different speeds and elongate different parts of the same word. This is especially important for speaker-independent recognizers. It is also important to consider the environment in which the speech recognition system has to work.
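The time alignment step, dynamic time warping, can be sketched with the standard dynamic-programming recurrence. This is a textbook formulation on plain number sequences, offered as an illustration rather than as the recognizer's actual implementation; the function name `dtw_distance` and the use of absolute difference as the local cost are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


word = [1, 2, 3]
slow_word = [1, 1, 2, 2, 3, 3]   # same shape, spoken at half speed
print(dtw_distance(word, slow_word))
print(dtw_distance(word, [3, 2, 1]))
```

Because each element may be matched against several elements of the other sequence, a word spoken slowly still aligns with its template at zero cost, whereas a plain value-by-value subtraction would report a large error.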
The grammar used by the speaker and accepted by the system, the noise level, the noise type, the position of the microphone, and the speed and manner of the user's speech are some of the factors that may affect the quality of speech recognition.
When you dial the telephone number of a big company, you are likely to hear the sonorous voice of a cultured lady who responds to your call with great courtesy, saying "Welcome to company X. Please give me the extension number you want". You pronounce the extension number, your name, and the name of the person you want to contact. If the called person accepts the call, the connection is made quickly. This is artificial intelligence at work: an automatic call-handling system is used without employing any telephone operator.
One of the main benefits of a speech recognition system is that it lets the user do other work simultaneously. The user can concentrate on observation and manual operations and still control the machinery by voice input commands. Another major application of speech processing is in military operations. Voice control of weapons is an example. With reliable speech recognition equipment, pilots can give commands and information to the computers by simply speaking into their microphones; they do not have to use their hands for this purpose. Another good example is a radiologist scanning hundreds of X-rays, ultrasonograms and CT scans while simultaneously dictating conclusions to a speech recognition system connected to a word processor. The radiologist can focus his attention on the images rather than on writing the text. Voice recognition could also be used on computers for making airline and hotel reservations. A user simply needs to state his needs in order to make a reservation, cancel a reservation, or make enquiries about a schedule.
The Blue Eyes system has the following advantages:
It provides a real-time monitoring system for a human operator. The approach is innovative since it supervises the operator, not the process, as presently available solutions do. The system will help avoid potential threats resulting from human error, weariness, oversight, tiredness or temporary indisposition. It is still possible to improve the system.
The use of a miniature CMOS camera integrated into the eye movement sensor will enable the system to calculate the point of gaze and observe what the operator is actually looking at.
Introducing a voice recognition algorithm will facilitate communication between the operator and the central system and simplify the authorization process. Although only operators working in control rooms have been considered here, the system may well be applied to everyday life situations. Assuming the operator is a driver and the supervised process is car driving, it is possible to build a simpler embedded online system that only monitors conscious brain involvement and warns when necessary. As the logging module is redundant in this case, and Bluetooth technology is becoming more and more popular, the commercial implementation of such a system would be relatively inexpensive.