Image Text Similarity using Deep Learning Object Detection and Word Spotting Approach

Billal MOKHTARI; Lilia MAHDID

dc.contributor.author	Billal MOKHTARI
dc.contributor.author	Lilia MAHDID
dc.date.accessioned	2025-01-21T21:18:45Z
dc.date.available	2025-01-21T21:18:45Z
dc.date.issued	2025-01-21
dc.description.abstract	With the fast expansion of Deep Learning, multi-modal models have become increasingly popular for tasks requiring complex data inputs. Content generation (such as image, video, or text generation), as well as recent object detection and segmentation methods, frequently use Large Language Models (LLMs). This project focuses on enhancing image and text similarity measures, aiming to improve the CLIP (Contrastive Language-Image Pre-training) method by examining the impact of object semantics on image descriptions. Our approach, named ODITS (Object Driven Image and Text Similarity), uses the CLIP model pre-trained with the ViT-B/32 architecture, which is subsequently fine-tuned for our specific purposes. We evaluated the performance of the fine-tuned model using modified metrics, selecting the optimal checkpoint based on precision to minimize false associations between descriptions and images. Our findings indicate that this optimal checkpoint is 10% more precise than the original checkpoint. The weights from this model will be integrated into ODITS’s shared components with CLIP, providing a robust starting point for further optimization. The research component of the ODITS model, including theoretical and preliminary analysis, is also discussed, providing insights into its potential and areas for future development.
dc.identifier.uri	https://dspace.estin.dz/handle/123456789/31
dc.language.iso	en
dc.publisher	Tassadit
dc.subject	Image and Text Similarity
dc.subject	Multi-Modal models
dc.subject	CLIP
dc.subject	ODITS
dc.subject	Zero-Shot Learning
dc.subject	Large Language Models
dc.subject	Object Detection
dc.subject	Image Segmentation
dc.subject	Text Recognition and Spotting
dc.title	Image Text Similarity using Deep Learning Object Detection and Word Spotting Approach
dc.type	Thesis

Files

Original bundle

Name: Image_Text_Similarity_using_Deep__Learning_Object_Detection_and__Word_Spotting_Approach_Report - Billal MOKHTARI.pdf
Size: 4.5 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

Projet de Fin d'Études : Mémoire d'Ingénieur