Research

What we're working on

Human-AI interaction is more than a single model. It's the interplay of perception, reasoning, latency, and conversational intuition. We're working across each of these dimensions to make multimodal agents feel less like software and more like presence.

Seamless Multimodality

Architecting a unified runtime that transitions fluidly between voice-only and full multimodal interaction — vision, text, and audio — without session disruption or perceptible mode-switching latency.
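A minimal sketch of the idea, using entirely hypothetical names (Session, Modality, attach) rather than our actual runtime API: the session object owns the conversation state, and modalities attach to or detach from it in place, so nothing is torn down when the mode changes.

```python
# Hypothetical sketch of in-place modality switching; names are
# illustrative, not a real runtime API.
from dataclasses import dataclass, field
from enum import Enum, auto


class Modality(Enum):
    AUDIO = auto()
    VISION = auto()
    TEXT = auto()


@dataclass
class Session:
    """Holds conversation state that must survive mode switches."""
    history: list[str] = field(default_factory=list)
    active: set[Modality] = field(default_factory=lambda: {Modality.AUDIO})

    def attach(self, modality: Modality) -> None:
        # Enabling a new input stream mutates the live session rather
        # than recreating it, so history and context carry over.
        self.active.add(modality)

    def detach(self, modality: Modality) -> None:
        self.active.discard(modality)


session = Session()
session.history.append("user: what's on my screen?")
session.attach(Modality.VISION)  # voice-only -> voice + vision, same session
assert Modality.VISION in session.active
```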

Latency & Responsiveness

Reducing end-to-end pipeline latency across transcription, reasoning, and synthesis. Building toward response times below the threshold of perceptible delay, so conversations feel truly synchronous.
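One way to picture the latency win, sketched below with stand-in stage functions rather than real ASR, reasoning, or TTS components: when stages stream partial output to one another instead of running to completion in sequence, time-to-first-audio drops from the sum of all stage durations to roughly the sum of their first-chunk latencies.

```python
# Sketch of stage overlap in a voice pipeline. The stage functions
# are stand-ins, not a real ASR/LLM/TTS API.
from typing import Iterator


def transcribe(audio_chunks: Iterator[bytes]) -> Iterator[str]:
    # Emit partial transcripts as audio arrives, not one final transcript.
    for chunk in audio_chunks:
        yield f"<partial transcript of {len(chunk)} bytes>"


def reason(transcript: Iterator[str]) -> Iterator[str]:
    # Begin generating response tokens from partial transcripts.
    for partial in transcript:
        yield f"<response token for: {partial}>"


def synthesize(tokens: Iterator[str]) -> Iterator[bytes]:
    # Convert tokens to audio frames as they stream in.
    for token in tokens:
        yield token.encode()  # stand-in for TTS audio frames


# Stages are chained lazily, so the first audio frame can be emitted
# before the user has even finished speaking.
mic = iter([b"\x00" * 320] * 3)
for frame in synthesize(reason(transcribe(mic))):
    pass  # stream frame to the speaker
```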

Conversational Dynamics

Improving bidirectional conversation quality through better interruption handling, speech endpoint detection, and turn-taking models that understand human conversational rhythm.
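As a toy illustration of endpoint detection (not a production model), the sketch below closes a turn after a run of trailing silence. The frame size, energy gate, and 600 ms threshold are illustrative assumptions; real turn-taking models also weigh prosody and semantics to avoid cutting speakers off mid-thought.

```python
# Toy silence-based endpoint detector. All thresholds are
# illustrative assumptions, not production values.
FRAME_MS = 20       # duration of one audio frame
ENDPOINT_MS = 600   # trailing silence required to close the turn


def is_silent(frame: bytes) -> bool:
    # Crude amplitude gate; a stand-in for a real voice activity detector.
    return max(frame, default=0) < 8


def end_of_turn(frames: list[bytes]) -> bool:
    # Count the run of silent frames at the end of the buffer.
    silent_run = 0
    for frame in frames:
        silent_run = silent_run + 1 if is_silent(frame) else 0
    return silent_run * FRAME_MS >= ENDPOINT_MS


speech = [bytes([40]) * 320] * 10 + [bytes([0]) * 320] * 30
print(end_of_turn(speech))  # True: 600 ms of trailing silence
```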

Purpose-Built Models

Fine-tuning frontier models for specific interaction modalities while also building in-house models from the ground up, each optimized for a distinct stage of the conversational pipeline.
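Purely illustrative, with made-up model names: the shape of this approach is a registry that routes each pipeline stage to its own specialized model, rather than sending every stage through one general-purpose model.

```python
# Illustrative configuration only; stage and model names are invented
# to show one specialized model per pipeline stage.
PIPELINE_MODELS = {
    "transcription": {"model": "asr-small-streaming", "origin": "fine-tuned"},
    "reasoning":     {"model": "frontier-chat-ft",    "origin": "fine-tuned"},
    "turn_taking":   {"model": "endpoint-net",        "origin": "in-house"},
    "synthesis":     {"model": "tts-lowlatency",      "origin": "in-house"},
}


def model_for(stage: str) -> str:
    return PIPELINE_MODELS[stage]["model"]


print(model_for("reasoning"))  # frontier-chat-ft
```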

Interested in our work or exploring collaboration?

[email protected]