MUMBAI, India, Jan. 2 -- Intellectual Property India has published a patent application (202541123664 A) filed by Vit-Ap University, Amaravati, Andhra Pradesh, on Dec. 8, 2025, for 'system and method for automated image caption generation using deep learning models.'
The inventors are M Balakrishna Mallapu and Deepthi Godavarthi.
The patent application was published on Jan. 2 under issue no. 01/2026.
According to the abstract released by Intellectual Property India: "The present disclosure relates to a system for automatic image caption generation. The system (102) receives a digital image from an image source such as a camera, local storage device, network-connected database, or image dataset. The received image is pre-processed through operations including gray scaling, resizing, normalization, or noise reduction. The system (102) extracts visual feature representations from pre-processed image using a Vision Transformer (ViT) model. A Transformer-based decoder generates a natural language caption by comparing the extracted visual feature representations to learned language patterns stored in memory, the learned patterns obtained through training on large-scale image-caption datasets. The generated caption is evaluated using caption quality metrics stored in the memory (204), and a syntactically coherent and semantically meaningful textual description of the image is output based on the evaluation. The system (102) improves recognition of fine-grained objects and overall scene context, enabling more accurate and fluent automatic caption generation."
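For illustration only, the pre-processing stage named in the abstract (gray scaling, resizing, normalization) might look something like the Python sketch below. It is not the applicants' implementation. The abstract does not give the grayscale weights, the interpolation method or the target size, so the ITU-R BT.601 luminance weights, the nearest-neighbour resizing, the 224-pixel default and all function names here are assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # 8-bit (R, G, B) values
RGBImage = List[List[Pixel]]          # rows of pixels
GrayImage = List[List[float]]         # rows of intensities


def to_grayscale(image: RGBImage) -> GrayImage:
    """Convert RGB to luminance using ITU-R BT.601 weights (an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]


def resize_nearest(image: GrayImage, out_h: int, out_w: int) -> GrayImage:
    """Resize with nearest-neighbour sampling (chosen here for brevity)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(image: GrayImage) -> GrayImage:
    """Scale 8-bit intensities to roughly the range 0.0 to 1.0."""
    return [[v / 255.0 for v in row] for row in image]


def preprocess(image: RGBImage, size: int = 224) -> GrayImage:
    """Hypothetical pipeline: gray scaling, then resizing, then normalization."""
    return normalize(resize_nearest(to_grayscale(image), size, size))
```

In this sketch, a call such as `preprocess(image, size=224)` would return a 224 by 224 grid of values ready to be split into patches for a feature extractor. Noise reduction, the ViT encoder, the decoder and the caption quality metrics described in the abstract are left out.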
Disclaimer: Curated by HT Syndication.