Machine learning (ML) architectures based on neural models have garnered considerable attention in the field of language classification. Code-mixing, the practice of mixing two or more languages within a single text, is a common phenomenon on social networking sites when users express opinions on a topic. This paper applies a code-mixing index to Indian social media texts and compares the complexity of identifying language at the word level using a Bi-directional Long Short-Term Memory (Bi-LSTM) model. The major contribution of the work is a technique for identifying the language of Hindi–English code-mixed data from three social media platforms, namely Facebook, Twitter and WhatsApp. We demonstrate that the Bi-LSTM model is capable of learning and accurately predicting the languages used in social media texts.
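To make the word-level tagging setup concrete, here is a minimal sketch of a Bi-LSTM token tagger of the kind the abstract describes, written in PyTorch. The vocabulary size, layer dimensions, and the three-way HI/EN/OTHER label set are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class BiLSTMLanguageTagger(nn.Module):
    """Tags each token in a code-mixed sentence with a language label
    (e.g. HI / EN / OTHER). All sizes below are illustrative guesses."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_labels=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # The bi-directional LSTM reads the sentence left-to-right and
        # right-to-left, so each word sees context from both sides.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq, embed)
        outputs, _ = self.lstm(embedded)       # (batch, seq, 2*hidden)
        return self.classifier(outputs)        # per-token label logits

# Toy usage: one 5-token sentence from a hypothetical 10,000-word vocabulary.
model = BiLSTMLanguageTagger(vocab_size=10_000)
tokens = torch.randint(1, 10_000, (1, 5))
predicted_labels = model(tokens).argmax(dim=-1)  # one language tag per word
```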
Speech-to-text conversion of French requires that both the acoustic-level recognition and the language modeling be tailored to the French language. Work in this area was initiated at LIMSI over 10 years ago. This paper summarizes the ongoing research in this direction, including studies on the distributional properties of French text materials; problems of speech-to-text conversion particular to French; studies in phoneme-to-grapheme conversion for continuous, error-free phonemic strings; past work on isolated-word speech-to-text conversion; and more recent work on continuous-speech speech-to-text conversion. Also demonstrated is the use of phone recognition for both language and speaker identification.
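The paper does not spell out its phoneme-to-grapheme method, but the task itself is easy to illustrate: French has many spellings per sound, so an error-free phone string expands into several candidate orthographies that must then be disambiguated. A toy sketch, with a hypothetical phone inventory and rule table:

```python
# Toy phoneme-to-grapheme expansion. The phone symbols and spelling rules
# below are hypothetical simplifications, not LIMSI's actual inventory.
PHONE_TO_GRAPHEMES = {
    "o":  ["o", "au", "eau"],    # French /o/ has several spellings
    "s":  ["s", "ss", "c", "ç"],
    "e~": ["in", "ain", "ein"],  # a nasal vowel
}

def grapheme_candidates(phones):
    """Expand a phone sequence into all candidate letter sequences.
    A real system would rank these with a lexicon or language model."""
    candidates = [""]
    for phone in phones:
        spellings = PHONE_TO_GRAPHEMES.get(phone, [phone])
        candidates = [c + s for c in candidates for s in spellings]
    return candidates

print(grapheme_candidates(["s", "o"]))  # ['so', 'sau', 'seau', 'sso', ...]
```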
The continuous speech-to-text conversion for French is based on a speaker-independent, vocabulary-independent recognizer. In this paper, phone recognition and word recognition results are reported for this recognizer evaluated on read speech from the BREF corpus. The recognizer was trained on over 4 hours of speech from 57 speakers and tested on sentences from an independent set of 19 speakers. A phone accuracy of 78.7% was obtained using a set of 35 phones. The word accuracy was 88% for a 1139-word lexicon and 86% for a 2716-word lexicon, using a word-pair grammar with respective perplexities of 100 and 160. Using a bigram grammar, word accuracies of 85.5% and 81.7% were obtained with 5K- and 20K-word vocabularies, with respective perplexities of 122 and 205.
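The perplexities quoted above measure how strongly each grammar constrains the recognizer's word choices: lower perplexity means fewer plausible next words. A minimal sketch of how bigram perplexity is computed over a test text, assuming simple add-alpha smoothing (the paper's exact estimation method is not stated):

```python
import math
from collections import Counter

def make_bigram_model(training_tokens, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram probabilities from a training corpus.
    vocab_size is the assumed vocabulary size of the recognizer."""
    bigrams = Counter(zip(training_tokens, training_tokens[1:]))
    unigrams = Counter(training_tokens)
    def prob(prev, word):
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return prob

def bigram_perplexity(test_tokens, bigram_prob):
    """PP = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), where N counts the
    predicted words (every word after the first)."""
    log_prob = sum(math.log(bigram_prob(prev, word))
                   for prev, word in zip(test_tokens, test_tokens[1:]))
    return math.exp(-log_prob / (len(test_tokens) - 1))

# Toy example on a tiny French corpus.
train = "le chat dort le chien dort".split()
model = make_bigram_model(train, vocab_size=5)
print(bigram_perplexity("le chat dort".split(), model))  # ~3.24
```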
In this paper, we describe our work in spoken language identification using Visual Speech Recognition (VSR) and analyze the effect of the visual speech units used to transcribe visual speech on language recognition. We propose a new approach of word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word bigram language model for language discrimination. Also, as opposed to the traditional visemic approach, we propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. The results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in the speaker-dependent case, and 58% WRR and 77% LRR in the speaker-independent case, on an English and Marathi digit classification task. The proposed approach is also evaluated on continuous speech input. The results show that a spoken language identification rate of 50% is achievable even when the WRR of the Visual Speech Recognition is below 10%, using only 1 s of speech. Language discrimination also improves by about 5% compared to traditional visemic approaches.
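The WRWLM decision rule itself is straightforward: once VSR has produced a word string, score that string under a per-language word bigram model and pick the best-scoring language. A hedged sketch of this final step, with toy digit-string corpora standing in for real training data (the paper's features, smoothing, and lexica are not reproduced here):

```python
import math
from collections import Counter

def train_word_bigram_lm(tokens, vocab_size, alpha=0.1):
    """Add-alpha smoothed word bigram LM, returned as a log-score closure."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    def log_prob(seq):
        return sum(math.log((bigrams[(prev, word)] + alpha) /
                            (unigrams[prev] + alpha * vocab_size))
                   for prev, word in zip(seq, seq[1:]))
    return log_prob

def identify_language(recognized_words, language_models):
    """WRWLM-style decision: pick the language whose word bigram model
    assigns the highest likelihood to the recognized word string."""
    return max(language_models,
               key=lambda lang: language_models[lang](recognized_words))

# Hypothetical digit-string corpora for the two target languages.
lms = {
    "english": train_word_bigram_lm("one two three two one".split(), 10),
    "marathi": train_word_bigram_lm("ek don teen don ek".split(), 10),
}
print(identify_language(["two", "three", "two"], lms))  # -> 'english'
```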
Spoken language identification is the process of recognising the language in an audio segment and is a precursor to several technologies such as automatic call routing, language recognition, multilingual conversation, language parsing, and sentiment analysis. Language identification is a challenging task for low-resource languages like Kashmiri and Ladakhi, spoken in the union territories of Jammu and Kashmir (JK) and Ladakh, India. This is mainly due to speaker variations such as duration, moderator, and ambiance, particularly when training and testing are done on different datasets, which lowers the accuracy of a language identification system in actual deployment. To tackle this problem, we propose a hybrid convolutional bi-directional gated recurrent unit (Bi-GRU) that exploits both the static and dynamic behaviour of the audio signal to achieve better results than state-of-the-art models. The audio signals are first converted into two-dimensional structures called Mel-spectrograms, which represent the frequency distribution over time. To investigate the spectral behaviour of the audio signals, we employ a convolutional neural network (CNN) that perceives the Mel-spectrograms in multiple dimensions. The CNN-learned feature vector serves as input to the Bi-GRU, which captures the dynamic behaviour of the audio signal. Experiments are done on six spoken languages: Ladakhi, Kashmiri, Hindi, Urdu, English, and Dogri. The data corpora used for experimentation are the International Institute of Information Technology Hyderabad-Indian Language Speech Corpus (IIITH-ILSC) and a self-created corpus for the Ladakhi language. The model is tested in two settings, speaker-dependent and speaker-independent, achieving accuracies of 99% and 91%, respectively, which is promising in comparison to the available state-of-the-art models.
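A minimal PyTorch sketch of the hybrid architecture described: a CNN summarizes the spectral (static) structure of the Mel-spectrogram, and a Bi-GRU models its temporal (dynamic) behaviour before a six-way language classifier. Layer sizes, pooling choices, and the input clip length are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CNNBiGRU(nn.Module):
    """Hybrid CNN + Bi-GRU spoken language identifier (illustrative sizes)."""

    def __init__(self, n_mels=64, num_languages=6):
        super().__init__()
        # CNN over the Mel-spectrogram: pool the frequency axis, keep time.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        feat_dim = 64 * (n_mels // 4)   # channels x reduced mel bins
        self.gru = nn.GRU(feat_dim, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, num_languages)

    def forward(self, mel):                    # mel: (batch, 1, n_mels, time)
        x = self.cnn(mel)                      # (batch, 64, n_mels/4, time)
        x = x.permute(0, 3, 1, 2).flatten(2)   # (batch, time, feat_dim)
        _, hidden = self.gru(x)                # final states, both directions
        x = torch.cat([hidden[0], hidden[1]], dim=-1)
        return self.classifier(x)              # one logit per language

model = CNNBiGRU()
spectrogram = torch.randn(1, 1, 64, 300)  # a short clip; hop size hypothetical
print(model(spectrogram).shape)           # torch.Size([1, 6])
```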