India’s Sarvam AI beats Google Gemini and ChatGPT, tops this task

Bengaluru-based startup Sarvam AI has put India on the global AI map. The company has launched an OCR tool called Sarvam Vision that, according to its published benchmark results, outperforms tools like Gemini and ChatGPT at reading documents in Indian languages.


When it comes to building AI models, the first countries that come to mind are probably the US and China. But you'll be proud to know that an Indian AI model has outperformed giants like Gemini and ChatGPT. The model comes from Sarvam AI, a Bengaluru-based startup, and this week two of its tools, Sarvam Vision and Bulbul, are making headlines. Sarvam Vision is an OCR tool that outperforms Gemini and ChatGPT at reading documents in Indian languages, while Bulbul V3 is a new model for AI voice generation. Let's take a closer look.

Setting New Benchmarks in OCR

According to an India Today report, Sarvam Vision outperforms large, popular AI models such as ChatGPT, Google Gemini, and Anthropic's Claude on several Optical Character Recognition (OCR) benchmarks, which is its area of specialization. Its results have drawn praise from both users and experts.

Sarvam AI co-founder Pratyush Kumar recently detailed the achievements of the company’s in-house AI models in several posts on X. According to the company, Sarvam Vision achieved an accuracy score of 84.3 percent on OmniOCR-Bench. This score is higher than Gemini 3 Pro and recent OCR models like DeepSeek OCR v2, while ChatGPT ranked significantly lower.


Sarvam Vision also performed well on OmniDocBench v1.5, a benchmark that tests how well AI systems read and understand real-world documents. It scored 93.28 percent overall, with particularly strong results on complex layouts, technical tables, and mathematical formulas. Traditional OCR systems often struggle in these areas because of poor formatting and dense content.

The tool's performance has attracted worldwide attention. Sarvam, whose focus on Indian-language models was once questioned, is now receiving praise.

Tech commentator Deedy Das, who had earlier questioned the value of building small Indian-language models, recently admitted that he had underestimated the company. In a post on X, Das said that Sarvam's OCR and speech models for Indian languages are strong and fill a gap that large global AI labs have largely ignored.

“I was wrong about Sarvam. When I wrote about them a year ago, I thought training small Indian language models was the wrong direction. But wow, they’ve done amazing things,” he wrote. “They have the best text-to-speech, speech-to-text, and OCR models for Indian languages, and they’re truly valuable. The price is also very reasonable.”

It has also received praise from users. One user shared their experience with Sarvam’s models, writing, “I used this a few days ago! Wow.”

Bulbul Brings AI Voices to Indian Languages

In addition to the OCR tool, Sarvam has launched Bulbul V3, a new text-to-speech model that converts written text into natural-sounding speech. It competes with tools from companies such as ElevenLabs, which is widely regarded as a leader in this field.

Sarvam said in a blog post, "Today we're launching Bulbul V3, our most capable text-to-speech model, designed to deliver natural, expressive, and production-ready voices for Indian languages. Bulbul V3 reduces the chance of failure and delivers stable, context-appropriate speech for the kinds of inputs that Indian use cases require."

Bulbul V3 currently supports more than 35 voices across 11 Indian languages. The company says it plans to expand support to a total of 22 languages.

Bulbul is also receiving praise. KissanAI founder Pratik Desai wrote on X, “We use Bulbul as our preferred TTS model for our Indian use case, and it’s gotten better with each new release. Meanwhile, ElevenLabs’ pricing has never felt right for Indian or any other language.”

Rithanya

Rithanya has been a professional blog writer since 2019. She loves reading stories and playing badminton. She completed a B.E. in Information and Technology.
