You Sound different today


At the heart of the TeSLA system lie several instruments that implement various biometrics-based identity-verification systems. TeSLA collects several kinds of biometric data from each learner. The instruments process these biometric samples to determine whether the learner is in fact the person he or she claims to be (i.e., the instruments verify the identity of the learner). At present the set of instruments includes Face Recognition (FR), Voice Recognition (VR), Keystroke-Dynamics (KSD) based identification, and Forensic Analysis (FA) of text.

It is well known that most identity-verification systems can be attacked, to malicious ends, in various ways. The easiest kind of attack is the spoof attack, formally called a presentation attack (PA): an attack made on the biometric sensor (e.g., the camera of a face-recognition system or the microphone of a voice-recognition system). Therefore, TeSLA also includes two instruments dedicated to presentation-attack detection (PAD): one for face-PAD and another for voice-PAD. A previous blog entry discussed face presentation-attack detection (face-PAD for short), that is, detecting spoof attacks on the face-recognition instrument. Today we will talk about voice-PAD, that is, how to detect PAs on the VR instrument.

Suppose Bob has enrolled his voice pattern in the TeSLA VR instrument. The next time he uses TeSLA, the VR instrument can verify his identity based on samples of his speech. There are three common types of attack that another person, say Alice, can perform on Bob’s voice-based identity:

replay: Alice records Bob’s voice, and replays it back, say, on a laptop or a mobile device, to TeSLA,
speech-synthesis: Alice uses speech-synthesis software to generate a speech-sample that sounds like Bob’s voice, or
voice-conversion: Alice records herself, speaking some text, and uses a voice-conversion software tool to generate a speech sample sounding like Bob’s voice.

Of these, replay-attacks are the easiest to perpetrate, and probably also the most common. The other two kinds of attacks require access to some sophisticated technology, but are not improbable in today’s world.

Listen to the following examples and see if you can distinguish the different kinds of attacks.

Genuine voice sample

Replay-attack sample

Speech-synthesis attack sample

Voice-conversion attack sample

In the voice-conversion example, the voice of a woman has been converted to resemble that of the male speaker in the genuine voice sample.

One common way of analyzing a speech waveform is to look at its spectrogram. The figure shows the spectrograms of the four speech samples above. The spectrograms indicate the following:

The genuine speech sample covers the entire frequency range (0 to 8 kHz), but most of the energy is concentrated in the lower frequency bands (up to 500 Hz). During the short silences within the sample, some low-frequency signal is still present.
The spectrogram of the replay-attack sample looks similar to that of the genuine sample, except that a lot of the high-frequency information is missing, and some upper mid-range frequencies (5 kHz to 6 kHz) are more prominent than in the genuine sample. This is because the replay-attack sample was generated by re-recording the genuine sample as it was played back on a laptop.
In both the speech-synthesis sample and the voice-conversion sample, the speech pattern looks quite unnatural: the silences are very clean, and most of the high-frequency information is missing.
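
To make the "missing high-frequency information" observation concrete, here is a minimal sketch that measures how much of a frame's spectral energy lies above a cutoff frequency. It is only an illustration: the signals are synthetic tones rather than real speech, and the function names, the 256-sample frame, and the 4 kHz cutoff are assumptions chosen for the example, not part of the TeSLA instrument. A spectrogram repeats this kind of frequency analysis over many short, consecutive frames.

```python
import cmath
import math


def band_energy_fraction(frame, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or above cutoff_hz in one frame."""
    n = len(frame)
    total = 0.0
    high = 0.0
    # Naive DFT over the non-negative frequency bins (0 .. n/2).
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


def tone(freq_hz, sample_rate, n, amplitude=1.0):
    """Generate n samples of a sine tone (a stand-in for real audio)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 16000  # sampling at 16 kHz covers the 0 to 8 kHz range
FRAME_LEN = 256      # assumed frame length for this toy example

low_part = tone(500, SAMPLE_RATE, FRAME_LEN)
high_part = tone(7000, SAMPLE_RATE, FRAME_LEN, amplitude=0.3)

# "Full-band" toy signal: strong low-frequency content plus some high-frequency content.
full_band = [a + b for a, b in zip(low_part, high_part)]
# "Band-limited" toy signal: the high-frequency content has been lost.
band_limited = low_part

full_frac = band_energy_fraction(full_band, SAMPLE_RATE, cutoff_hz=4000)
limited_frac = band_energy_fraction(band_limited, SAMPLE_RATE, cutoff_hz=4000)
print(f"high-band energy fraction, full-band signal:    {full_frac:.4f}")
print(f"high-band energy fraction, band-limited signal: {limited_frac:.4f}")
```

The band-limited signal ends up with essentially no energy above the cutoff, which is the kind of difference the spectrograms make visible. A single number like this would be far too crude for real attack detection, though.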

The voice-PAD instrument tries to detect attacks based on the spectrogram of the input sample. For this, the instrument is first trained to recognize the spectral characteristics of genuine speech samples, using only trusted genuine samples from a large set of speakers. This kind of classifier, trained on the genuine class alone, is called a one-class classifier (OCC). OCCs are a good approach to PAD because they can deal with different kinds of attacks without being explicitly trained to detect them.
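
The one-class idea can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the instrument's actual model: it fits a single diagonal Gaussian to made-up two-dimensional "genuine" feature vectors, and it picks the decision threshold so that about 5% of the genuine training scores fall below it. The function names, the toy data, and the 5% rejection rate are all assumptions made for illustration. No attack data is used at any point in training.

```python
import math
import random


def fit_genuine_model(samples):
    """Estimate a per-dimension mean and variance from genuine samples only."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    variances = [
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(x, model):
    """Log-likelihood of x under the diagonal Gaussian fitted to genuine data."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def pick_threshold(genuine_scores, reject_rate=0.05):
    """Choose a threshold that rejects roughly reject_rate of genuine scores."""
    ordered = sorted(genuine_scores)
    return ordered[int(reject_rate * len(ordered))]


rng = random.Random(0)
# Toy "genuine" feature vectors (hypothetical; real features would come from speech).
genuine = [[rng.gauss(0.0, 1.0), rng.gauss(2.0, 0.5)] for _ in range(200)]

model = fit_genuine_model(genuine)
threshold = pick_threshold([log_likelihood(x, model) for x in genuine])

typical_genuine = [0.1, 2.1]   # resembles the training data
unseen_attack = [6.0, -3.0]    # a kind of input never seen during training

for name, x in [("genuine-like", typical_genuine), ("attack-like", unseen_attack)]:
    score = log_likelihood(x, model)
    verdict = "accept" if score >= threshold else "reject"
    print(f"{name}: score={score:.2f}, threshold={threshold:.2f} -> {verdict}")
```

The attack-like input is rejected simply because it does not look like genuine data, even though the model never saw an example of that attack.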

To be specific, a Gaussian mixture model (GMM) based OCC can be trained to recognize genuine samples based on Mel-frequency cepstral coefficients (MFCC) and their derivative features. Once trained, the GMM produces a probability value for each input speech sample. When presented with a genuine sample, the output of the GMM is expected to be higher than a certain threshold, whereas for attack presentations it is expected to produce low probability values.
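
As a rough sketch of how such a GMM-based one-class scorer might work, the example below trains a small diagonal-covariance GMM with expectation-maximization on genuine frames only, then scores an utterance by its average per-frame log-likelihood. Several things here are assumptions made for illustration: the two-dimensional toy vectors stand in for real MFCC frames (feature extraction is not shown), the number of components and iterations are arbitrary, the score is a log-likelihood rather than a probability, and the threshold is set by hand as a fixed margin below the training score.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_likelihood(x, gmm):
    """Log-likelihood of one frame under the mixture (list of weight, mean, var)."""
    return logsumexp([math.log(w) + log_gauss(x, m, v) for w, m, v in gmm])


def train_gmm(frames, k, iterations=20, seed=0):
    """Fit a k-component diagonal GMM to genuine frames with plain EM."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialize means from randomly chosen frames, with unit variances.
    gmm = [(1.0 / k, list(m), [1.0] * dim) for m in rng.sample(frames, k)]
    for _ in range(iterations):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
            norm = logsumexp(logs)
            resp.append([math.exp(value - norm) for value in logs])
        # M-step: re-estimate weights, means and variances (with small floors).
        new_gmm = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            mean = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj for d in range(dim)]
            var = [
                max(sum(r[j] * (x[d] - mean[d]) ** 2 for r, x in zip(resp, frames)) / nj, 1e-3)
                for d in range(dim)
            ]
            new_gmm.append((max(nj / len(frames), 1e-10), mean, var))
        gmm = new_gmm
    return gmm


def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance."""
    return sum(frame_log_likelihood(x, gmm) for x in frames) / len(frames)


def toy_frames(rng, centers, n, spread=0.5):
    """Draw n toy feature frames around the given cluster centers."""
    return [[rng.gauss(c, spread) for c in rng.choice(centers)] for _ in range(n)]


rng = random.Random(1)
genuine_centers = [(0.0, 0.0), (4.0, 4.0)]   # hypothetical "genuine" feature clusters
train_frames = toy_frames(rng, genuine_centers, 300)
gmm = train_gmm(train_frames, k=2)

# Hand-picked threshold (an assumption): a fixed margin below the training score.
threshold = utterance_score(train_frames, gmm) - 3.0

genuine_score = utterance_score(toy_frames(rng, genuine_centers, 50), gmm)
attack_score = utterance_score(toy_frames(rng, [(10.0, -5.0)], 50), gmm)
print(f"threshold={threshold:.2f} genuine={genuine_score:.2f} attack={attack_score:.2f}")
```

In practice the threshold would be tuned on held-out data rather than fixed by hand, and a library implementation of the GMM would normally replace the hand-written EM loop.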

Idiap Team


8 November 2017 in General
