# Sequential Recommender Systems Walk-Through

Comparing how SRS-related methods differ, by modeling the same example.

### Introduction
I am blessed to have a cat baby, and I enjoy baking a lot; recently I also started working out to lose weight. So my Amazon order history looks like this:
- Months 1–6: lots of cat stuff (litter mat, fountain, scratching post, treats, toys, carrier).
- Months 7–9: baking accessories (sheet pans, silicone mats, piping bags, cake turntable, mixer bowl).
- Last 2 weeks: workout kick (Theragun → whey protein → lifting gloves).

Apparently my purchases have sequential dependencies, which conventional recommendation systems (collaborative filtering and content-based filtering) cannot capture: they model consumers by their interactions with items in an order-agnostic way, as pair-wise correlations between the consumer and items based on engagements (clicks, conversions, etc.).

To model such sequential dependencies, many models have been proposed, from non-DNN methods, to DNNs, to the latest LLMs. This doc summarizes how each model handles my user behavior. For the difficulties/challenges/characteristics of sequential recommenders, please refer to [Sequential Recommender Systems: Challenges, Progress and Prospects](https://arxiv.org/pdf/2001.04830).

### Collaborative Filtering - Matrix-factorization CF
Matrix-factorization CF learns a latent vector $p_u$ for each user. Although $p_u$ is not literally a frequency-by-category vector, in practice it ends up being:
```math
$$p_u \approx (Q^\top C_u Q + \lambda I)^{-1} Q^\top C_u r_u$$
```

Here:
* {{< math >}}$|I|${{< /math >}} is the number of items; $k$ is the item-factor dimension.
* {{< math >}}$Q \in \mathbb{R}^{|I|\times k}${{< /math >}}: Item factor matrix. Row $i$ is the $k$-dim vector for item $i$. 
* {{< math >}}$r_u \in \mathbb{R}^{|I|}${{< /math >}}: User $u$'s engagement vector; entry {{< math >}}$r_{ui}${{< /math >}} is the interaction count (#clicks, #conversions) of user $u$ with item $i$.
* {{< math >}}$C_u \in \mathbb{R}^{|I|\times |I|}${{< /math >}}: A **diagonal** confidence matrix (all off-diagonal entries = 0) of user {{< math >}}$u${{< /math >}} for items. {{< math >}}$c_{ui}${{< /math >}} is the confidence score of user $u$ with item $i$, normally derived from the engagement data {{< math >}}$r_{ui}${{< /math >}}:

```math
$$c_{ui} = 1 + \alpha r_{ui}$$
```

The term {{< math >}}$(Q^\top C_u Q+\lambda I)^{-1}${{< /math >}} acts as a kind of normalization, while $C_u$ and $r_u$ both come from consumer engagement, so the learned consumer representation ends up being an engagement-weighted (clicks/conversions) combination of item vectors. In my case, my representation will be closest to the cat-supplies embeddings, so when predicting the next items, it will likely end up with **cat supplies**.
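A minimal sketch of this closed-form user-vector update (the ALS step for implicit-feedback MF), using a toy catalog and invented engagement counts; all item indices and hyperparameter values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, k, alpha, lam = 6, 3, 40.0, 0.1

Q = rng.normal(size=(n_items, k))          # item factor matrix Q, one k-dim row per item
r_u = np.array([5., 4., 6., 1., 0., 0.])   # engagements r_u: items 0-2 are "cat supplies"
c_u = 1.0 + alpha * r_u                    # confidence c_ui = 1 + alpha * r_ui
C_u = np.diag(c_u)                         # diagonal confidence matrix

# p_u = (Q^T C_u Q + lam I)^{-1} Q^T C_u r_u
p_u = np.linalg.solve(Q.T @ C_u @ Q + lam * np.eye(k), Q.T @ C_u @ r_u)

scores = Q @ p_u                           # predicted preference for every item
print(np.round(scores, 2))                 # heavily engaged items dominate the scores
```

Note that the whole update is driven by $C_u$ and $r_u$, so nothing about purchase *order* survives: buying cat supplies six months ago weighs the same as buying lifting gloves yesterday.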

### Before DNN - Sequential pattern mining
Mine frequent patterns from sequence data, then use those patterns for subsequent recommendations. Although simple and straightforward, the mined patterns can be redundant. E.g., I buy cat supplies monthly while occasionally buying baking supplies in between, so a mined pattern could be something like `cat_food` -> `cat_litter` -> `baking_supplies`.
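A toy illustration of the idea, assuming invented sessions and a simplified miner that only counts contiguous bigram patterns (real miners such as PrefixSpan handle gaps and longer patterns):

```python
from collections import Counter

# Invented purchase sessions for illustration
sessions = [
    ["cat_food", "cat_litter", "baking_supplies", "cat_food"],
    ["cat_food", "cat_litter", "cat_toys"],
    ["cat_food", "cat_litter", "baking_supplies"],
]

# Count every contiguous (a -> b) pattern across sessions
pattern_counts = Counter()
for seq in sessions:
    for a, b in zip(seq, seq[1:]):
        pattern_counts[(a, b)] += 1

# Keep only patterns meeting a minimum support threshold
min_support = 2
frequent = {p: c for p, c in pattern_counts.items() if c >= min_support}
print(frequent)
```

Even in this tiny example, `('cat_litter', 'baking_supplies')` survives the support threshold, illustrating the redundancy problem: the baking purchase rides along inside the cat-supply routine.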

### Basic Markov Chain
The hypothesis is that the next purchase depends only on the previous $k$ purchases. A transition matrix is learned, with each entry representing how often state {{< math >}}$i${{< /math >}} transitions to {{< math >}}$j${{< /math >}}, followed by row normalization (each row sums to 1).
- A first-order chain computes {{< math >}}$P(x_{t+1}=j \mid x_t = i)${{< /math >}}
- A higher-order chain computes {{< math >}}$P(x_{t+1}=j \mid x_t = i_1, x_{t-1} = i_2, \dots, x_{t-k+1} = i_k)${{< /math >}}
  - And for a higher-order chain, the probability of a whole sequence is:
   ```math
   $$P(x_1, \dots, x_T) = P(x_1, \dots, x_k) \prod_{t=k+1}^{T} P(x_t \mid x_{t-1}, \dots, x_{t-k})$$
   ```
The transition matrix is:
```math
$$T_{ij} = \frac{\#(i \rightarrow j)}{\sum_{j'} \#(i \rightarrow j')}$$
```
Since my last item is lifting gloves, the prediction will be the item with the largest probability in the transition-matrix row for lifting gloves, {{< math >}}$T_{(\text{lifting gloves}),\, \cdot}${{< /math >}}.
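The first-order case can be sketched as follows, over an invented purchase history (item names are illustrative): count transitions, row-normalize, then take the argmax of the row for the last purchase.

```python
from collections import Counter, defaultdict

# Invented purchase history in chronological order
history = ["cat_food", "cat_litter", "cat_food", "cat_litter", "cat_food",
           "theragun", "whey_protein", "lifting_gloves", "whey_protein",
           "lifting_gloves", "whey_protein"]

# Count raw transitions #(i -> j)
counts = defaultdict(Counter)
for i, j in zip(history, history[1:]):
    counts[i][j] += 1

# T_ij = #(i -> j) / sum_j' #(i -> j')   (row normalization, each row sums to 1)
T = {i: {j: c / sum(row.values()) for j, c in row.items()}
     for i, row in counts.items()}

last = "lifting_gloves"
pred = max(T[last], key=T[last].get)   # argmax over the row for the last item
print(pred)                            # → whey_protein
```

Notice the prediction is driven entirely by the last item: the six months of cat purchases no longer matter, which is exactly the Markov assumption at work.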

### Latent Markov Embedding based approach




# Deep Learning Era
## RNN-based approaches
Recurrent Neural Networks can capture longer-term dependencies in sequential data, making them more suitable for modeling complex user behavior patterns.

## Attention-based models
Modern attention mechanisms can focus on relevant parts of the purchase history when making recommendations.

# LLM Wave
## Large Language Models for recommendation
Recent advances in LLMs show promise for understanding complex user preferences and generating personalized recommendations based on natural language descriptions of user behavior.

# Conclusion
Sequential recommendation systems offer a more nuanced understanding of user behavior by considering the temporal order of interactions. While traditional methods focus on static user-item relationships, sequential models can capture evolving preferences and behavioral patterns over time.
