Enhancing Fashion Recommendation using Visual, Textual and Temporal Features

Last updated on Mar 7, 2023

Depiction of the pre-processing pipeline adopted by the research work

The proposed work tackles the high sparsity of user behavioral data in the fashion domain by incorporating external factors when training the recommender. More specifically, it explores whether textual, visual, item, and temporal features can improve performance over conventional collaborative filtering models, and it aims to address the long-tail problem common in fashion recommendation systems.

Models implemented

User-Item Interaction History: The experiments begin by using the users' behaviour history to build a set of models that predict the rating for an unseen user-item pair in the test set. The models are:
1. Item Similarity Based Rating: A basic collaborative filtering implementation that uses item-item similarity and mean ratings to make predictions.
2. Latent Factor Model: Matrix factorization of the rating matrix produces latent factor vectors for users and items, and a prediction function combines them to estimate the rating.
3. SVD++: An extension of the well-known Singular Value Decomposition (SVD) approach from the Netflix Prize that additionally accounts for implicit feedback (which items a user has rated). It uses a probabilistic matrix factorization technique to obtain the latent vectors.
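The latent factor model can be illustrated with a minimal sketch, assuming a biased matrix factorization of the form r_hat(u, i) = mu + b_u + b_i + p_u . q_i trained by stochastic gradient descent. The function names, hyperparameters, and toy ratings below are hypothetical and not taken from the work.

```python
import random


def train_mf(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i with SGD.

    ratings: list of (user, item, rating) triples (hypothetical input format).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}  # user biases
    b_i = {i: 0.0 for i in items}  # item biases
    # Small random initialization of the latent factor vectors.
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i])))
            # Gradient steps with L2 regularization on every parameter.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return mu, b_u, b_i, P, Q


def predict(model, user, item):
    """Predict a rating; unknown users or items fall back to the available biases."""
    mu, b_u, b_i, P, Q = model
    pred = mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)
    if user in P and item in Q:
        pred += sum(p * q for p, q in zip(P[user], Q[item]))
    return pred


ratings = [("ana", "dress", 5), ("ana", "scarf", 4), ("ben", "dress", 1),
           ("ben", "boots", 2), ("cy", "scarf", 5), ("cy", "boots", 1)]
model = train_mf(ratings)
print(round(predict(model, "ana", "boots"), 2))
```

The biases capture how generous a user is and how well liked an item is, while the dot product models the remaining user-item affinity. For a user with no history, this sketch falls back to the global mean plus the item bias.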
Textual Features: Models are built around the textual features associated with user reviews and item metadata. More specifically, the proposed work attempts to design:
1. Rating Regression with Text Input: Text embeddings are generated from the users' review text. These embeddings serve as features for training a regressor that predicts the rating for a new review.
2. Recommendation using a Textual Compatibility Model: The titles of the fashion items are used as features to estimate product similarity. The rating is then calculated from the similarity scores using defined heuristics.
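The rating-regression idea can be sketched as follows, assuming normalized bag-of-words term frequencies in place of the learned text embeddings the work uses, and plain linear regression fitted by batch gradient descent as the regressor. The names, hyperparameters, and reviews are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def featurize(text, vocab):
    """Normalized term frequencies over a fixed vocabulary (stand-in for embeddings)."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]


def fit_text_regressor(reviews, ratings, lr=0.5, reg=1e-3, epochs=500):
    vocab = sorted({w for text in reviews for w in tokenize(text)})
    X = [featurize(t, vocab) for t in reviews]
    w = [0.0] * len(vocab)
    b = sum(ratings) / len(ratings)  # start from the mean rating
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(X, ratings):
            err = b + sum(wj * xj for wj, xj in zip(w, x)) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        # Batch gradient step on the squared error plus an L2 penalty on w.
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + reg * wj) for wj, gj in zip(w, grad_w)]
    return vocab, w, b


def predict_rating(model, text):
    vocab, w, b = model
    return b + sum(wj * xj for wj, xj in zip(w, featurize(text, vocab)))


reviews = ["love this dress, great fit", "terrible fabric and poor fit",
           "great color, love it", "poor quality, terrible stitching"]
model = fit_text_regressor(reviews, [5, 1, 5, 1])
print(round(predict_rating(model, "love the color"), 2))
```

Words that appear mostly in high-rated reviews receive positive weights and pull predictions up; swapping the bag-of-words features for dense text embeddings would leave the regression step unchanged.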
Image Features:
1. This model extends the idea behind the textual compatibility model by using images of the items to compute item-item compatibility. Pre-trained convolutional neural network (CNN) models extract salient features from each product image, and these features are then used to estimate item-item similarity.
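The compatibility-based scoring can be sketched as below, assuming the CNN image features have already been extracted into one vector per item (for example from a pre-trained network's penultimate layer, not shown). The scoring heuristic is an assumption: a user's rating for a target item is predicted as the similarity-weighted average of that user's ratings on other items, using only positive cosine similarities. Function names, the heuristic, and the toy vectors are hypothetical; the same function would apply unchanged to title vectors in the textual compatibility model.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_by_similarity(user_ratings, features, target, default):
    """Similarity-weighted average of the user's ratings on other items.

    user_ratings: {item: rating} for items the user has already rated.
    features: {item: feature vector}, e.g. precomputed CNN image embeddings.
    default: fallback (e.g. a mean rating) when no similar rated item exists.
    """
    num = den = 0.0
    for item, rating in user_ratings.items():
        if item == target or item not in features:
            continue
        sim = cosine(features[item], features[target])
        if sim > 0:  # ignore dissimilar items
            num += sim * rating
            den += sim
    return num / den if den else default


features = {"red_dress": [0.9, 0.1, 0.0], "red_skirt": [0.8, 0.2, 0.1],
            "black_boots": [0.0, 0.1, 0.9]}
print(round(predict_by_similarity({"red_skirt": 5, "black_boots": 1},
                                  features, "red_dress", default=3.0), 2))
```

Because the red dress is visually close to the highly rated red skirt and far from the poorly rated boots, the prediction lands near 5.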
The Factorization Machine: The research work implements a factorization machine (FM), extending conventional latent factor models to incorporate pairwise interactions between users, items, and external features. The features included in the FM are:
1. Temporal Features: Day of the week and month of the year.
2. Color of the item
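A factorization machine scores a feature vector x as y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the pairwise term can be computed in O(k n) time as 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]. The sketch below is a hypothetical illustration rather than the work's code: it one-hot encodes the user, item, day of the week, month, and color of a rating event and checks the fast formula against the naive double sum. In practice the parameters w0, w, and V would be learned (for example by SGD); here they are placeholders.

```python
import itertools


def encode(user, item, day_of_week, month, color, index):
    """One-hot encode a rating event as a sparse {column: value} dict.

    index maps (field, value) pairs to column ids and grows as new values appear.
    """
    x = {}
    for key in [("user", user), ("item", item), ("dow", day_of_week),
                ("month", month), ("color", color)]:
        if key not in index:
            index[key] = len(index)
        x[index[key]] = 1.0
    return x


def fm_predict(x, w0, w, V):
    """FM score with the O(k*n) reformulation of the pairwise interaction term."""
    linear = w0 + sum(w[i] * xi for i, xi in x.items())
    k = len(V[next(iter(x))])  # latent dimension
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score computed directly as a sum over all feature pairs i < j."""
    linear = w0 + sum(w[i] * xi for i, xi in x.items())
    pairwise = sum(sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
                   for i, j in itertools.combinations(sorted(x), 2))
    return linear + pairwise


index = {}
x = encode("ana", "dress_42", "Sat", "Dec", "red", index)
# Placeholder parameters; in practice w0, w and V are learned from the ratings.
w = {c: 0.1 * (c + 1) for c in index.values()}
V = {c: [0.1 * c, -0.2, 0.05 * c * c] for c in index.values()}
print(round(fm_predict(x, 3.0, w, V), 4), round(fm_predict_naive(x, 3.0, w, V), 4))
```

The pairwise term lets the model learn, for instance, that a given color interacts with a given month, even when that exact user-item pair was never observed.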

The visual compatibility-based recommender achieves a 95.58% decrease in Mean Squared Error (MSE) compared with the trivial baseline predictor. In contrast, the textual-feature-based models and the factorization machine perform relatively poorly.
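The reported figure is a relative reduction, 100 * (MSE_baseline - MSE_model) / MSE_baseline, so a 95.58% decrease means the visual model's MSE is about 4.42% of the baseline's. The sketch below assumes the trivial baseline predicts the global mean training rating for every test pair, which the text does not specify; the function names and toy numbers are illustrative only and are not the work's data.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def trivial_baseline(train_ratings, n_test):
    """Assumed baseline: predict the global mean training rating for every test pair."""
    mu = sum(train_ratings) / len(train_ratings)
    return [mu] * n_test


def percent_decrease(baseline_mse, model_mse):
    return 100.0 * (baseline_mse - model_mse) / baseline_mse


train = [5, 4, 1, 2, 5, 1]
test_true = [5, 1, 4]
base = mse(test_true, trivial_baseline(train, len(test_true)))
model = mse(test_true, [4.8, 1.3, 3.9])  # toy model predictions
print(round(percent_decrease(base, model), 2))
```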


Niraj Yagnik

Head of Machine Learning