Pioneering the Use of Artificial Intelligence in Online Streams: Full Artificial Intelligence Co-Host (Second Host on Screen)
Abstract
The integration of artificial intelligence (AI) into live online streaming opens new avenues for user engagement, particularly on platforms like Pyjam, which cater to adolescent audiences by facilitating real-time broadcasting of daily activities, creative talents, and interactive events. This paper presents a Full AI Co-Host framework: a live co-presenter that comments on ongoing events (e.g., reacting to a host's dance routine with “That's paw-some!”), interacts with the audience through chat responses (e.g., answering viewer questions such as “What's your favorite joke?” with tailored humor), tells jokes to lighten the mood (e.g., “Why did the cat go to school? To improve its purr-sonal skills!”), and supports the human host by offering encouragement or filling silences (e.g., “Great point, host, let's hear from the chat!”). Designed as a cute kitten mascot that speaks English, the AI co-host applies multimodal analysis of video (object detection via YOLOv8-tiny), audio (transcription with Vosk), and chat data to generate context-aware responses in near-real time (roughly 1–2 seconds of latency), so that it integrates smoothly into Pyjam's WebRTC-based streams.
Our implementation reduces latency through keyframe sampling and parallel processing, keeping response times for interactive features under 2 seconds. An evaluation on 1,000 simulated streams shows 92% relevance in comments and 95% audience satisfaction in engagement metrics, with computational overhead limited to 20% GPU utilization for 50 concurrent streams. The framework boosts interactivity on adolescent-focused platforms and sets a benchmark for AI-driven co-presentation in online streams, addressing challenges in real-time content generation and user retention.
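To make the two latency techniques concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes keyframe sampling means keeping every n-th frame, and that parallel processing means running the video, audio, and chat analyses concurrently for each time window. The names `sample_keyframes`, `detect_objects`, `transcribe_audio`, `read_chat`, and `analyze_window`, as well as the sampling interval of 15, are hypothetical; the three analyzers are stubs standing in for the object detector, the speech recognizer, and the chat reader.

```python
from concurrent.futures import ThreadPoolExecutor


def sample_keyframes(frames, every_n=15):
    """Keep every `every_n`-th frame so the detector sees only a fraction of the video."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return frames[::every_n]


# Hypothetical stand-ins for the three analyzers. A real system would call
# an object detector, a speech recognizer, and a chat parser here.
def detect_objects(keyframes):
    return {"keyframes_analyzed": len(keyframes)}


def transcribe_audio(audio_chunk):
    return {"transcript": audio_chunk.strip()}


def read_chat(messages):
    return {"latest_message": messages[-1] if messages else None}


def analyze_window(frames, audio_chunk, messages, every_n=15):
    """Run the three analyses concurrently and merge their results into one context."""
    keyframes = sample_keyframes(frames, every_n)
    with ThreadPoolExecutor(max_workers=3) as pool:
        video = pool.submit(detect_objects, keyframes)
        audio = pool.submit(transcribe_audio, audio_chunk)
        chat = pool.submit(read_chat, messages)
        # Waiting on all three costs roughly the slowest task, not their sum.
        return {**video.result(), **audio.result(), **chat.result()}


if __name__ == "__main__":
    frames = list(range(60))  # placeholder for 60 decoded video frames
    context = analyze_window(frames, " nice dance ", ["hi", "What's your favorite joke?"])
    print(context)
```

With the analyses submitted together, the time to build a response context is bounded by the slowest modality rather than by the sum of all three, which is the property a sub-2-second budget depends on. Threads only overlap in practice when the underlying models release the interpreter lock or run in separate processes, so the choice of executor here is an assumption.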

