IIT Madras has unveiled TamilGPT, a large language model trained on 50 billion tokens of Tamil text spanning classical Sangam literature, medieval Bhakti poetry, and modern digital content.
The model, developed by the institute's Robert Bosch Centre for Data Science and AI, outperforms GPT-4 and Gemini by a significant margin on Tamil-language benchmarks.
TamilGPT can understand and generate text in both classical Tamil (செம்மொழி) and modern colloquial Tamil, bridging a linguistic gap that global models struggle to close.