Summary

The paper investigates using LLMs as zero-shot ranking agents for recommender systems. Along the way, it surfaces some interesting observations about LLM rankers, such as position bias and popularity bias, and proposes prompting and bootstrapping strategies to mitigate them.
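The bootstrapping idea (re-ranking several shuffled orders of the candidates and aggregating the results) could be sketched as below. The `rank_fn` callable stands in for the LLM ranking call and is an assumption, not the paper's code:

```python
import random

def bootstrap_rank(candidates, rank_fn, rounds=3, seed=0):
    """Mitigate position bias: rank several shuffled orders of the
    candidates and aggregate by summed rank position (lower = better)."""
    rng = random.Random(seed)
    scores = {c: 0 for c in candidates}
    for _ in range(rounds):
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        ranking = rank_fn(shuffled)  # stand-in for the LLM: returns items best-first
        for pos, item in enumerate(ranking):
            scores[item] += pos
    return sorted(candidates, key=lambda c: scores[c])

# toy rank_fn that prefers shorter titles, for illustration only
demo = bootstrap_rank(["Heat", "Alien", "Up"], lambda cs: sorted(cs, key=len))
```

Because each round sees a different candidate order, an LLM that favors early positions has its bias averaged out across rounds.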

Dataset

Filter out users and items with fewer than 5 interactions.
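This filtering has to be applied iteratively, since dropping a user can push an item below the threshold and vice versa. A minimal sketch, assuming interactions are (user, item) pairs:

```python
from collections import Counter

def k_core_filter(interactions, k=5):
    """Repeatedly drop users/items with fewer than k interactions
    until every remaining user and item has at least k."""
    while True:
        user_cnt = Counter(u for u, _ in interactions)
        item_cnt = Counter(i for _, i in interactions)
        kept = [(u, i) for u, i in interactions
                if user_cnt[u] >= k and item_cnt[i] >= k]
        if len(kept) == len(interactions):  # fixed point reached
            return kept
        interactions = kept
```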

Experiment

Setting 1

Give the user's historical behavior as context. For each ground-truth item, randomly retrieve m − 1 items from the entire item set I as negative instances, where m = 20. The candidate items are then randomly shuffled before constructing the prompts.
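The candidate construction for this setting (one ground-truth item plus m − 1 random negatives, shuffled) might look like the following sketch; the function name and signature are my own:

```python
import random

def build_candidates(ground_truth, item_pool, m=20, seed=0):
    """Sample m-1 random negatives from the item pool (excluding the
    ground truth), add the ground truth, and shuffle the candidate list."""
    rng = random.Random(seed)
    negatives = rng.sample([i for i in item_pool if i != ground_truth], m - 1)
    candidates = negatives + [ground_truth]
    rng.shuffle(candidates)  # hide the ground truth's position
    return candidates
```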

Prompt: “[pattern that contains sequential historical interactions H] [pattern that contains retrieved candidate items C] Please rank these movies by measuring the possibilities that I would like to watch next most, according to my watching history. You MUST rank the given candidate movies. You cannot generate movies that are not in the given candidate list.”
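Filling that template could be sketched as below. The exact wording used for the history and candidate patterns is an assumption (the paper only shows them as placeholders); the ranking instruction is quoted from the note above:

```python
def build_prompt(history, candidates):
    """Fill the ranking prompt with the user's watch history (pattern H)
    and the shuffled candidate list (pattern C). Pattern wording is assumed."""
    h = "I've watched the following movies in order: " + ", ".join(history) + ". "
    c = f"Now there are {len(candidates)} candidate movies: " + ", ".join(candidates) + ". "
    instr = ("Please rank these movies by measuring the possibilities that "
             "I would like to watch next most, according to my watching history. "
             "You MUST rank the given candidate movies. You cannot generate "
             "movies that are not in the given candidate list.")
    return h + c + instr
```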


Now, results with different prompting strategies:

(results figure not captured in these notes)

Setting 2

Use a different candidate generation method to produce the candidates. These candidates can be viewed as hard negatives for the ground-truth items.
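Taking a retrieval model's top-scoring items as hard negatives could be sketched as below; `score_fn`, standing in for any candidate generation model (e.g. a popularity or matrix-factorization retriever), is an assumption:

```python
def hard_negative_candidates(ground_truth, item_pool, score_fn, m=20):
    """Take the m-1 highest-scoring non-ground-truth items from a
    candidate generation model as hard negatives, then append the ground truth."""
    ranked = sorted((i for i in item_pool if i != ground_truth),
                    key=score_fn, reverse=True)
    return ranked[:m - 1] + [ground_truth]
```

Unlike the random negatives of Setting 1, these items are plausible recommendations themselves, so the LLM has to do finer-grained discrimination.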


Setting 3

The top-3 candidates from each of 7 candidate generation methods are merged into one candidate set, containing a total of 21 items. (Unlike the previous settings, the ground truth is not explicitly included.)
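The merge step can be sketched as a simple union over the generators' top-k lists; note that if two generators retrieve the same item, the merged set would fall below 21 items, so deduplication here is my own assumption:

```python
def merge_candidates(generators, top_k=3):
    """Union the top-k lists from each candidate generation method,
    preserving first-seen order and dropping duplicates."""
    merged = []
    for gen in generators:  # each gen: a ranked list of items, best first
        for item in gen[:top_k]:
            if item not in merged:
                merged.append(item)
    return merged
```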