The paper investigates using LLMs as zero-shot ranking agents for recommender systems. Along the way, it surfaces several interesting properties of LLMs, such as position bias and popularity bias, and proposes prompting and bootstrapping strategies to mitigate them.
Preprocessing: filter out users and items with fewer than 5 interactions.
Historical behavior is given as context. For each ground-truth item, m − 1 items are randomly retrieved from the entire item set I as negative instances, where m = 20. These candidate items are then randomly shuffled before constructing the prompts.
Prompt: “[pattern that contains sequential historical interactions H] [pattern that contains retrieved candidate items C] Please rank these movies by measuring the possibilities that I would like to watch next most, according to my watching history. You MUST rank the given candidate movies. You cannot generate movies that are not in the given candidate list.”
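A minimal sketch of this setup (the helper names and data shapes are my assumptions, not from the paper): sample m − 1 random negatives, shuffle them together with the ground-truth item, and assemble the ranking prompt.

```python
import random

def build_candidates(ground_truth, all_items, m=20, seed=0):
    """Sample m-1 random negatives from the item set and shuffle them
    together with the ground-truth item (shuffling probes position bias)."""
    rng = random.Random(seed)
    negatives = rng.sample([i for i in all_items if i != ground_truth], m - 1)
    candidates = negatives + [ground_truth]
    rng.shuffle(candidates)
    return candidates

def build_prompt(history, candidates):
    """Fill the prompt template with history H and candidate set C."""
    hist = ", ".join(f"'{i}. {title}'" for i, title in enumerate(history))
    cand = "\n".join(f"- {title}" for title in candidates)
    return (
        f"I've watched the following movies in the past in order: {hist}.\n"
        f"Now there are {len(candidates)} candidate movies:\n{cand}\n"
        "Please rank these movies by measuring the possibilities that I would "
        "like to watch next most, according to my watching history. You MUST "
        "rank the given candidate movies. You cannot generate movies that are "
        "not in the given candidate list."
    )
```

The resulting string would then be sent to the LLM, and the returned ranking parsed back against the candidate list.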

Results with different prompting strategies:
Sequential prompting
“I’ve watched the following movies in the past in order: ’0. Multiplicity’, ’1. Jurassic Park’, . . .”
Recency focused prompting → newly proposed
“I’ve watched the following movies in the past in order: ’0. Multiplicity’, ’1. Jurassic Park’, . . .. Note that my most recently watched movie is Dead Presidents. . . .”
In-context learning (ICL) → newly proposed
“If I’ve watched the following movies in the past in order: ’0. Multiplicity’, ’1. Jurassic Park’, . . ., then you should recommend Dead Presidents to me, and now that I’ve watched Dead Presidents, then . . .”
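The three strategies differ only in how the history string is phrased. A rough sketch of how each prompt variant could be constructed (function names and exact wording around the template are my assumptions):

```python
def sequential_prompt(history):
    """Plain chronological listing of the interaction history."""
    hist = ", ".join(f"'{i}. {title}'" for i, title in enumerate(history))
    return f"I've watched the following movies in the past in order: {hist}."

def recency_focused_prompt(history):
    """Same as sequential, plus an explicit reminder of the latest item."""
    return (sequential_prompt(history)
            + f" Note that my most recently watched movie is {history[-1]}.")

def icl_prompt(history):
    """Uses the last interaction as an in-context demonstration: the prefix
    history 'should' lead to recommending the held-out last item."""
    prefix, demo = history[:-1], history[-1]
    hist = ", ".join(f"'{i}. {title}'" for i, title in enumerate(prefix))
    return (f"If I've watched the following movies in the past in order: "
            f"{hist}, then you should recommend {demo} to me, and now that "
            f"I've watched {demo}, then you should recommend: ")
```

Recency-focused and ICL prompting both push the model to attend to the end of the history rather than treating all interactions uniformly.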

They also experiment with different candidate generation methods (these candidates can be viewed as hard negatives for the ground-truth items).

The top-3 candidates from each of 7 candidate generation methods are merged into one candidate set, containing a total of 21 items (unlike the previous setting, the ground truth is not explicitly included).
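The merging step above could look like the following sketch, where each retriever is assumed to be a callable returning an ordered item list for a user (the interface is my assumption; if the retrievers' top lists overlap, deduplication would leave fewer than 21 items):

```python
def merge_candidates(retrievers, user, k=3):
    """Take the top-k items from each candidate-generation method and
    merge them into one deduplicated candidate set."""
    merged = []
    for retrieve in retrievers:
        for item in retrieve(user)[:k]:
            if item not in merged:  # keep first occurrence only
                merged.append(item)
    return merged
```

With 7 retrievers and k = 3, disjoint top lists yield the 21-item candidate set used in this setting.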