Implementation Approach for Duplicate Image Identification and Removal

Zaw Ye Htet; Tin Shine Aung

Subject: Information Technology

Type: Research Article

Volume : 2 Issue : 1 2024
Page Number : 11-17
Publication : ISRDO

Published Manuscript

Author

1. Zaw Ye Htet, Student, Yangon Technological University, Myanmar
2. Tin Shine Aung, Lecturer, Yangon Technological University, Myanmar

Abstract

This paper presents a systematic approach for identifying and removing duplicate images from various 3D image format collections. The identification process considers image structure, density, meta descriptions, and other properties. The system employs a preprocessing module to standardise and extract meta descriptions from diverse formats like STL, OBJ, FBX, and others. A vector database, utilising tools like FAISS or Milvus, stores the image vectors and meta descriptions for efficient similarity searches. Deep learning models, particularly Convolutional Neural Networks (CNNs), are trained to extract image features and compare vectors using cosine similarity or Euclidean distance. An integrated search engine allows users to find similar images by uploading an image and its meta description. A human validation interface is provided for manual confirmation of potential duplicates. This approach ensures efficient management and retrieval of 3D images while enhancing storage utilisation. Future work will further explore alternative models and similarity measures to improve system accuracy and efficiency.

Keywords

Duplicate image identification; 3D image formats; image structure; image density; meta descriptions; preprocessing module; Vision Transformers

Conclusion

This article details an implementation technique for finding and removing duplicate images in 3D image collections, offering a comprehensive and scalable answer to a problem that many businesses encounter. The system uses deep learning methods, namely Convolutional Neural Networks (CNNs), to extract image features, and compares the resulting vectors using cosine similarity or Euclidean distance. Basing duplicate detection on structure, density, and meta-descriptions together is intended to keep identification accurate.
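The vector comparison described above can be illustrated with a minimal sketch. It assumes the CNN has already produced fixed-length feature vectors; the function names and the 0.98 threshold are illustrative choices, not values taken from the system.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors (0.0 means identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_probable_duplicate(
    a: Sequence[float], b: Sequence[float], min_cosine: float = 0.98
) -> bool:
    """Flag a pair as a candidate duplicate; the threshold is an assumed value."""
    return cosine_similarity(a, b) >= min_cosine
```

Cosine similarity ignores vector magnitude, whereas Euclidean distance does not, so the two measures can rank the same candidates differently unless the vectors are normalised first.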

The system's preprocessing module standardises various 3D image formats and extracts necessary meta-descriptions, facilitating consistent and accurate processing. Utilising a vector database like FAISS or Milvus, the system efficiently stores and retrieves image vectors, enabling rapid and precise similarity searches. Including a search engine allows users to find similar images by uploading an image and its meta description, further enhancing the system's utility.
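To make the storage and retrieval step concrete, the following sketch uses a small in-memory index as a stand-in for a vector database. It does not use the FAISS or Milvus APIs; the class `InMemoryVectorIndex` and its methods are hypothetical, and the search is exhaustive rather than approximate.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


def _normalise(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot index or query an all-zero vector")
    return [x / norm for x in vector]


@dataclass
class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database; brute-force search only."""

    vectors: Dict[str, List[float]] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)

    def add(self, item_id: str, vector: Sequence[float], description: str = "") -> None:
        """Store a unit-length copy of the vector together with its meta description."""
        self.vectors[item_id] = _normalise(vector)
        self.metadata[item_id] = description

    def search(self, query: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        """Return up to k (item_id, cosine similarity) pairs, most similar first."""
        q = _normalise(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, vec)))
            for item_id, vec in self.vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A dedicated vector database would replace the linear scan with an index structure suited to large collections, but the interface of adding vectors with metadata and querying for the nearest neighbours stays the same in spirit.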

The human validation interface is a crucial system feature, allowing users to manually confirm or reject potential duplicates flagged by the AI. This human-centric approach ensures that the system can handle ambiguous cases and meet user expectations. Integrating user feedback helps continuously refine and improve the system's accuracy.
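One way to picture the validation step is as a queue of flagged pairs awaiting a reviewer's decision. The sketch below is an assumption about how such a queue could be modelled; the names `ReviewQueue`, `CandidatePair`, and `confirmation_rate` are hypothetical and do not describe the system's actual interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class CandidatePair:
    """A pair of items flagged by the similarity search as a possible duplicate."""

    first_id: str
    second_id: str
    similarity: float
    decision: Decision = Decision.PENDING


class ReviewQueue:
    """Hypothetical queue that holds flagged pairs until a person reviews them."""

    def __init__(self) -> None:
        self._pairs: List[CandidatePair] = []

    def flag(self, first_id: str, second_id: str, similarity: float) -> CandidatePair:
        pair = CandidatePair(first_id, second_id, similarity)
        self._pairs.append(pair)
        return pair

    def pending(self) -> List[CandidatePair]:
        return [p for p in self._pairs if p.decision is Decision.PENDING]

    def record(self, pair: CandidatePair, is_duplicate: bool) -> None:
        """Store the reviewer's verdict on a flagged pair."""
        pair.decision = Decision.CONFIRMED if is_duplicate else Decision.REJECTED

    def confirmation_rate(self) -> Optional[float]:
        """Share of reviewed pairs confirmed as duplicates; None if nothing is reviewed."""
        reviewed = [p for p in self._pairs if p.decision is not Decision.PENDING]
        if not reviewed:
            return None
        confirmed = sum(1 for p in reviewed if p.decision is Decision.CONFIRMED)
        return confirmed / len(reviewed)
```

A low confirmation rate would suggest that the similarity threshold is flagging too many non-duplicates, which is one simple form the feedback loop could take.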

Exploring alternative approaches, such as using Transformer-based models like Vision Transformers (ViTs) for image feature extraction, offers promising avenues for further improvement. Additionally, considering different similarity measures, such as Jaccard similarity and Hamming distance, and leveraging Natural Language Processing (NLP) techniques for meta-description analysis can enhance the system's performance and accuracy.
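The two alternative measures named above can be sketched as follows. The example assumes that meta descriptions are compared as sets of lower-cased word tokens and that image features have been reduced to fixed-length binary hashes stored as non-negative integers; both assumptions are illustrative and not part of the described system.

```python
import re
from typing import Set


def tokenise(description: str) -> Set[str]:
    """Split a meta description into a set of lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token intersection divided by the size of the token union."""
    tokens_a, tokens_b = tokenise(a), tokenise(b)
    if not tokens_a and not tokens_b:
        return 1.0  # convention chosen for this sketch: two empty descriptions match
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which two binary hashes differ."""
    if hash_a < 0 or hash_b < 0:
        raise ValueError("hashes must be non-negative integers")
    return bin(hash_a ^ hash_b).count("1")
```

Jaccard similarity is insensitive to word order and repetition, which suits short descriptive tags, while Hamming distance only applies once features have been binarised.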

In conclusion, the presented approach improves storage efficiency and enhances the accuracy and speed of duplicate image identification and removal. Future work will explore these alternative models and similarity measures to further optimise the system, ensuring it remains a robust solution for managing extensive collections of 3D images.

Author Contribution

The study's design, data collection, result analysis, and manuscript preparation were entirely managed by the author.

Funding

No grants from public, commercial, or non-profit funding agencies supported the research, authorship, or publication of this article.

Conflict of Interest

The authors disclose no conflicts of interest in relation to this work.

Data Sharing Statement

There are no data available for sharing in this work.

Software And Tools Use

The research did not involve the use of any particular software or tools.

Acknowledgements

My gratitude goes to those who assisted in this study and manuscript preparation, and to the anonymous reviewers for their constructive insights.

Corresponding Author

Zaw Ye Htet

Yangon Technological University, Student, Myanmar

Tin Shine Aung

Yangon Technological University, Lecturer, Myanmar

Copyright

Copyright: ©2026 Corresponding Author. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MLA

Htet, Zaw Ye, and Aung, Tin Shine. “Implementation Approach for Duplicate Image Identification and Removal.” Scientific Research Journal of Science, Engineering and Technology, vol. 2, no. 1, 2024, pp. 11-17, https://isrdo.org/journal/SRJSET/currentissue/implementation-approach-for-duplicate-image-identification-and-removal

APA

Htet, Z., & Aung, T. (2024). Implementation Approach for Duplicate Image Identification and Removal. Scientific Research Journal of Science, Engineering and Technology, 2(1), 11-17. https://isrdo.org/journal/SRJSET/currentissue/implementation-approach-for-duplicate-image-identification-and-removal

Chicago

Htet Zaw Ye and Aung Tin Shine, Implementation Approach for Duplicate Image Identification and Removal, Scientific Research Journal of Science, Engineering and Technology 2, no. 1(2024): 11-17, https://isrdo.org/journal/SRJSET/currentissue/implementation-approach-for-duplicate-image-identification-and-removal
