Image Text Similarity using Deep Learning Object Detection and Word Spotting Approach

dc.contributor.author: Billal MOKHTARI
dc.contributor.author: Lilia MAHDID
dc.date.accessioned: 2025-01-21T21:18:45Z
dc.date.available: 2025-01-21T21:18:45Z
dc.date.issued: 2025-01-21
dc.description.abstract: With the rapid expansion of Deep Learning, multi-modal models have become increasingly popular for tasks requiring complex data inputs. Content generation (such as image, video, or text generation) and recent object detection and segmentation methods frequently use Large Language Models (LLMs). This project focuses on enhancing image and text similarity measures, aiming to improve the CLIP (Contrastive Language-Image Pre-training) method by examining the impact of object semantics on image descriptions. Our approach, named ODITS (Object Driven Image and Text Similarity), uses the CLIP model pre-trained with the ViT-B/32 architecture, which is subsequently fine-tuned for our specific purposes. We evaluated the fine-tuned model using modified metrics, selecting the optimal checkpoint based on precision to minimize false associations between descriptions and images. Our findings indicate that this optimal checkpoint is 10% more precise than the original checkpoint. The weights from this model will be integrated into the components that ODITS shares with CLIP, providing a robust starting point for further optimization. The research component of the ODITS model, including theoretical and preliminary analysis, is also discussed, providing insights into its potential and areas for future development.
dc.identifier.uri: https://dspace.estin.dz/handle/123456789/31
dc.language.iso: en
dc.publisher: Tassadit
dc.subject: Image and Text Similarity
dc.subject: Multi-Modal models
dc.subject: CLIP
dc.subject: ODITS
dc.subject: Zero-Shot Learning
dc.subject: Large Language Models
dc.subject: Object Detection
dc.subject: Image Segmentation
dc.subject: Text Recognition and Spotting
dc.title: Image Text Similarity using Deep Learning Object Detection and Word Spotting Approach
dc.type: Thesis

Files

Original bundle
Name: Image_Text_Similarity_using_Deep__Learning_Object_Detection_and__Word_Spotting_Approach_Report - Billal MOKHTARI.pdf
Size: 4.5 MB
Format: Adobe Portable Document Format