Fast retrieval of multi-modal embeddings for e-commerce applications

Alessandro Abluton, Daniele Ciarlo, Luigi Portinale

Research output: Contribution to journal › Article › peer review

Abstract

In this paper, we introduce a retrieval framework designed for e-commerce applications, which employs a multi-modal approach to represent items of interest. This approach incorporates both textual descriptions and images of products, alongside a locality-sensitive hashing (LSH) indexing scheme for rapid retrieval of potentially relevant products. Our focus is on a data-independent methodology, where the indexing mechanism remains unaffected by the specific dataset, while the multi-modal representation is learned beforehand. Specifically, we utilize a multi-modal architecture, CLIP, to learn a latent representation of items by combining text and images in a contrastive manner. The resulting item embeddings encapsulate both the visual and textual information of the products and are then subjected to various types of LSH to balance result quality against retrieval speed. We present the findings of our experiments conducted on two real-world datasets sourced from e-commerce platforms, comprising both product images and textual descriptions. Promising results were achieved, demonstrating favorable retrieval time and average precision; these results were obtained by testing the approach both with a specifically selected set of queries and with synthetic queries generated using a Large Language Model.
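
The pipeline described in the abstract, CLIP-based multi-modal embeddings indexed with a data-independent LSH scheme, can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes the Hugging Face transformers CLIP API and the openai/clip-vit-base-patch32 checkpoint, and the fusion rule (summing image and text embeddings before normalisation), the hash width, and the single-table bucket index are illustrative assumptions rather than details taken from the paper.

import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_item(image: Image.Image, description: str) -> np.ndarray:
    """Fuse CLIP image and text embeddings into one L2-normalised vector.
    Summing before normalising is equivalent in direction to averaging;
    the paper's actual fusion rule is not specified here (assumption)."""
    inputs = processor(text=[description], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    fused = (img + txt)[0].numpy()
    return fused / np.linalg.norm(fused)

class RandomHyperplaneLSH:
    """Data-independent LSH for cosine similarity: hash bits are the signs
    of projections onto random hyperplanes drawn once, independently of the
    indexed data, matching the abstract's data-independent setting."""
    def __init__(self, dim: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}  # binary signature -> list of (item_id, vector)

    def _signature(self, vec: np.ndarray) -> tuple:
        return tuple((self.planes @ vec > 0).astype(np.int8))

    def add(self, item_id, vec: np.ndarray) -> None:
        self.buckets.setdefault(self._signature(vec), []).append((item_id, vec))

    def query(self, vec: np.ndarray, k: int = 10) -> list:
        # Rank only the candidates in the query's bucket, by cosine
        # similarity (dot product of L2-normalised vectors).
        candidates = self.buckets.get(self._signature(vec), [])
        return sorted(candidates, key=lambda c: -float(c[1] @ vec))[:k]

Indexing a catalogue then amounts to calling index.add(product_id, embed_item(image, text)) per product on a RandomHyperplaneLSH(dim=512), and retrieval to index.query(embed_item(query_image, query_text)). Increasing n_bits makes buckets smaller (faster lookups but lower recall); the various types of LSH evaluated in the paper trade these quantities off differently, and a common extension keeps several such tables with independently drawn hyperplanes to raise recall at the cost of memory.
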
Original language: English
Pages (from-to): 765-779
Number of pages: 15
Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
Volume: 28
Issue number: 4
DOI
Publication status: Published - 2024

Keywords

  • Multi-modal embeddings
  • e-commerce applications
  • locality sensitive hashing
