MUMBAI, India, June 24 -- Intellectual Property India has published a patent application (202417105177 A) filed by Sense Insights Inc. on December 31, 2024, for an invention titled "System And Method For Extracting And Categorizing Information From Online Sources."

Inventors include Selvaraj, Ernest Kirubakaran; Golsefid, Samira; Bajaria, Viral Tarun; Chilloji, Satish Arjun; Shah, Akshay Rajendra; Sekar, Amresh; and Sunwalka, Shubham Kumar.

The application was published on June 12, 2026, in issue no. 24/2026.

Abstract: A system and method for efficiently extracting and categorizing business information from online sources is disclosed. The system comprises a web crawler that obtains company domains from a database and collects depth-1 URLs from company homepages. A classification model, utilizing a fine-tuned BERT architecture, predicts which URLs contain relevant information for generating tags. A content extractor then extracts content from these predicted URLs using one or more modules. Finally, a large language model (LLM) processes the extracted content and generates tags using custom prompts designed for each tag category. These prompts are tailored to the nature of the extracted content, enhancing the context provided to the LLM. This multi-stage approach addresses challenges in processing large-scale, unstructured business data from diverse web sources, potentially offering improved efficiency, scalability, and accuracy in automated business intelligence gathering.
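The abstract does not disclose implementation details. The Python sketch below is only a rough illustration of the four stages, built on several assumptions. "Depth-1 URLs" is read as same-domain links that appear directly on the homepage. A keyword heuristic (`predict_relevant`, hypothetical) stands in for the fine-tuned BERT classifier. A standard-library HTML text extractor stands in for the unspecified content-extraction modules. The per-category prompt templates in `PROMPT_TEMPLATES` and the `llm` callable are hypothetical placeholders for a real model call.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on a single page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def depth1_urls(homepage_url, homepage_html):
    """Stage 1 (assumed reading): same-domain URLs linked directly from the homepage."""
    parser = LinkCollector()
    parser.feed(homepage_html)
    parser.close()
    domain = urlparse(homepage_url).netloc
    urls = []
    for href in parser.hrefs:
        url = urljoin(homepage_url, href).split("#")[0]  # resolve relative links, drop fragments
        if urlparse(url).netloc == domain and url != homepage_url and url not in urls:
            urls.append(url)
    return urls


# Hypothetical stand-in for the fine-tuned BERT URL classifier (stage 2).
RELEVANT_HINTS = ("about", "product", "service", "solution")


def predict_relevant(urls):
    """Keeps URLs whose path contains a hint word; a real system would use a trained model."""
    return [u for u in urls if any(h in urlparse(u).path.lower() for h in RELEVANT_HINTS)]


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Stage 3 (simplified): plain text from one page's HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical prompt templates, one per tag category (stage 4).
PROMPT_TEMPLATES = {
    "industry": "From the company text below, name the company's industry:\n{content}",
    "offerings": "List the products or services described below as short tags:\n{content}",
}


def generate_tags(content, llm):
    """Sends one category-specific prompt per tag category to the supplied llm callable."""
    return {
        category: llm(template.format(content=content))
        for category, template in PROMPT_TEMPLATES.items()
    }
```

For example, with homepage `https://example.com/` and links to `/about`, `/careers#open`, `/products`, and an external site, `depth1_urls` yields the three same-domain pages, and `predict_relevant` keeps `/about` and `/products`. Keeping the LLM as an injected callable means the sketch runs without network access; a real pipeline would fetch pages over HTTP and call an actual model there.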

Disclaimer: Curated by HT Syndication.