Image Text Similarity using Deep Learning Object Detection and Word Spotting Approach
Date
2025-01-21
Authors
Tassadit
Abstract
With the fast expansion of Deep Learning, multi-modal models have become increasingly popular for tasks requiring complex data inputs. Content generation, such as image, video, or text generation, as well as recent object detection and segmentation methods, frequently relies on Large Language Models (LLMs). This project focuses on enhancing image and text similarity measures, aiming to improve the CLIP (Contrastive Language-Image Pre-training) method by examining the impact of object semantics on image descriptions. Our approach, named ODITS (Object Driven Image and Text Similarity), uses the CLIP model pre-trained with the ViT-B/32 architecture, which is subsequently fine-tuned for our specific purposes. We evaluated the performance of the fine-tuned model using modified metrics, selecting the optimal checkpoint based on precision to minimize false associations between descriptions and images. Our findings indicate that this optimal checkpoint is 10% more precise than the original checkpoint. The weights from this model will be integrated into ODITS’s shared components with CLIP, providing a robust starting point for further optimization. The research component of the ODITS model, including theoretical and preliminary analysis, is also discussed, providing insights into its potential and areas for future development.
Keywords
Image and Text Similarity, Multi-Modal models, CLIP, ODITS, Zero-Shot Learning, Large Language Models, Object Detection, Image Segmentation, Text Recognition and Spotting