At the heart of the TeSLA system lie several instruments that implement various biometrics-based identity-verification systems. TeSLA collects several kinds of biometric data from each learner. These biometric samples are processed by the various instruments to determine whether the learner is in fact the person he or she claims to be (i.e., the instruments verify the identity of the learner). At present, the set of instruments includes Face Recognition (FR), Voice Recognition (VR), Keystroke-dynamics (KSD) based identification, and Forensic Analysis (FA) of text.
It is well known that most identity-verification systems can be attacked, to malicious ends, in various ways. The easiest kind of attack is the spoof attack, formally called a presentation attack (PA): an attack made on the biometric sensor (e.g., the camera of a face-recognition system or the microphone of a voice-recognition system). Therefore, TeSLA also includes two instruments dedicated to presentation-attack detection (PAD): one for face-PAD, and another for voice-PAD. A previous blog entry discussed face presentation-attack detection (face-PAD for short), that is, detecting spoof attacks on the face-recognition instrument. Today we will talk about voice-PAD, that is, how to detect PAs on the VR instrument.
Suppose Bob has enrolled his voice-pattern in the TeSLA VR instrument. Next time he uses TeSLA, the VR instrument can verify his identity based on samples of his speech. There are three common types of attacks that another person, say Alice, can perform on Bob’s voice-based identity: replay attacks, speech-synthesis attacks, and voice-conversion attacks.
Of these, replay-attacks are the easiest to perpetrate, and probably also the most common. The other two kinds of attacks require access to some sophisticated technology, but are not improbable in today’s world.
Listen to the following examples and see if you can distinguish the different kinds of attacks.
Genuine voice sample
Replay-attack sample
Speech-synthesis attack sample
Voice-conversion attack sample
In the voice-conversion example, the voice of a woman has in fact been converted to resemble that of the male speaker in the genuine voice sample.
One common way of analyzing a speech waveform is by looking at its spectrogram. The figure shows the spectrograms of the four speech samples above. The spectrograms indicate the following:
The voice-PAD instrument tries to detect attacks based on the spectrogram of the input sample. For this, the instrument is first trained, using only trusted genuine samples from a large set of speakers, to recognize the spectral characteristics of genuine speech samples. This kind of classifier, trained on only the genuine class, is called a one-class classifier (OCC). OCCs are a good approach to PAD because they can deal with different kinds of attacks without being explicitly trained to detect those kinds of attacks.
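Since the instrument starts from the spectrogram of the input sample, it may help to see how one can be computed. The following is a minimal sketch using a short-time Fourier transform in NumPy. The function name, the frame length and the hop size are illustrative assumptions, not the settings used in TeSLA, and a synthetic 1 kHz tone stands in for a real speech recording.

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Return a (n_frames, frame_len // 2 + 1) log-magnitude spectrogram.

    frame_len and hop are illustrative values, not TeSLA's settings.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of every frame, on a log scale.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)


# Synthetic stand-in for a speech recording: a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
# The strongest frequency bin, averaged over time, converted back to Hz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 512
```

Each row of `spec` describes one short time frame and each column one frequency band, which is the picture a spectrogram plot shows. For the synthetic tone, the energy concentrates in the bin that corresponds to 1 kHz.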
To be specific, a Gaussian mixture-model (GMM) based OCC can be trained to recognize genuine samples based on Mel-frequency cepstral coefficients (MFCC) and their derivative features. Once trained, the GMM produces a probability value for each input speech sample. When presented with a genuine sample, the output of the GMM is expected to be higher than a certain threshold, whereas for attack presentations it is expected to produce low probability values.
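To make the idea concrete, here is a minimal sketch of a GMM-based one-class classifier, written with NumPy only. It is not the TeSLA implementation. Several things are assumptions made for illustration: random 4-dimensional vectors stand in for real MFCC features, the GMM has two diagonal-covariance components fitted by a plain EM loop, the score of a sample is its average per-frame log-likelihood (a log-scale analogue of the probability value described above), and the threshold is set at the 5th percentile of the scores of the genuine training samples. All function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log-density of every row of x under every diagonal Gaussian: (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (
        np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


def fit_gmm(x, n_components=2, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to the rows of x with plain EM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(x.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = np.maximum(resp.sum(axis=0), 1e-10)
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        second_moment = (resp.T @ x ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, var_floor)
    return weights, means, variances


def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of one sample under the GMM."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(frames, means, variances)
    return float(logsumexp(log_p, axis=1).mean())


rng = np.random.default_rng(1)


def fake_genuine(n_frames=50, dim=4):
    # Assumption: synthetic stand-in for the MFCC frames of genuine speech.
    centers = rng.choice([-2.0, 2.0], size=(n_frames, 1))
    return centers + rng.normal(size=(n_frames, dim))


def fake_attack(n_frames=50, dim=4):
    # Assumption: attack frames come from a shifted distribution.
    return 8.0 + rng.normal(size=(n_frames, dim))


# Training uses genuine samples only; no attack is seen at this stage.
train_utts = [fake_genuine() for _ in range(40)]
gmm = fit_gmm(np.vstack(train_utts))
train_scores = [utterance_score(u, gmm) for u in train_utts]
threshold = float(np.percentile(train_scores, 5))

heldout_scores = [utterance_score(fake_genuine(), gmm) for _ in range(20)]
attack_score = utterance_score(fake_attack(), gmm)
accepted = float(np.mean([s >= threshold for s in heldout_scores]))
print(f"threshold={threshold:.2f}, attack score={attack_score:.2f}, "
      f"genuine accepted={accepted:.0%}")
```

Note that the threshold is chosen from genuine scores alone, which is what makes this a one-class approach. The price is that a small fraction of genuine samples will fall below the threshold and be rejected; where to place the threshold is a trade-off between such false rejections and missed attacks.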
Idiap Team
TeSLA is coordinated by Universitat Oberta de Catalunya (UOC) and funded by the European Commission’s Horizon 2020 ICT Programme. This website reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.