M Tanusri
Research Assistant, Osmania University College of Engineering, India
Publications
Research Article
Unified Speech-To-Speech Models for Real-Time, Multilingual, and Emotionally Aware AI
Author(s): Vansh Kumar* and M Tanusri
This paper presents a novel speech-to-speech (S2S) model: a 250-billion-parameter AI model built on the multimodal AI foundation Vision [16]. The model is trained to natively understand and generate speech while preserving prosody, emotional nuance, and speaker-specific characteristics, enabling fully end-to-end, real-time conversational interaction. Unlike traditional cascaded systems that rely on separate automatic speech recognition (ASR), large language model (LLM), and text-to-speech (TTS) components, our model integrates speech understanding, reasoning, and generation within a unified framework, minimizing latency and mitigating error propagation. The system is trained on more than 400,000 hours of multilingual conversational and expressive speech, supports over 200 languages (including all major Indian languages), and is capable of cross-lingual prosody adaptation. Evaluations on extensive benchmarks demonstrate state-of-the-art performance in technical reasoni...

