To tackle this stiff challenge, the BU team is asking multiple signers to sit in a studio, one at a time, and sign through 3,000 gestures in a classic American Sign Language (ASL) dictionary. As they sign, four high-speed, high-quality cameras simultaneously pick up front and side views, as well as facial expressions. According to Neidle, smiles, frowns, and raised eyebrows are a largely understudied part of ASL that could offer strong clues to a gesture's meaning.
As the visual data comes in, Neidle and her students analyze it, marking the start and finish of each sign and identifying key subgestures, units equivalent to English phonemes. Meanwhile, Sclaroff is using this information to develop algorithms that can, say, distinguish the signer's hands from the background, or recognize hand position, hand shape, and patterns of movement. Because different individuals may sign the same word in slightly different ways, the team is analyzing gestures from both native and non-native signers, hoping to develop a computer recognizer that can handle such variations.
The main challenge going forward may be taking into account the many uncontrollable factors on the user's side of the interface, says Sclaroff. For example, someone using a gesture to enter a search query into a laptop will probably have a lower-quality camera than the ones in the studio. The background may be more cluttered than in the carefully controlled studio environment of the database samples, and the computer will have to adjust for variables like clothing and skin tone.
"Just to produce the sign and look it up--that's the real novelty we're trying to accomplish," says Neidle. "That would be an improvement over anything that exists now."