
I've been reading this paper on recommendation systems using reinforcement learning (RL) and knowledge graphs (KGs).

To give some background: the graph contains finitely many entities, some of which are user entities and others item entities. The goal is to recommend items to users, i.e. to find a recommendation set of items for every user such that the user and each recommended item are connected by a reasoning path.

I'm attaching an example of such a graph for more clarity (from the paper itself):

[Image: example knowledge graph from the paper]

In the paper above, they say

First, we do not have pre-defined targeted items for any user, so it is not applicable to use a binary reward indicating whether the user interacts with the item or not. A better design of the reward function is to incorporate the uncertainty of how an item is relevant to a user based on the rich heterogeneous information given by the knowledge graph.

I'm not able to understand the above extract, which discusses what reward function to use: binary, or something else. A detailed explanation of what the authors are trying to convey here would really help.

nbro
stoic-santiago

2 Answers


I found the answer further in the paper (Section 3.2, Formulation as Markov Decision Process)! I'll post it here for everyone.

Given any user, there is no pre-known targeted item in the KGRE-Rec problem, so it is unfeasible to consider binary rewards indicating whether the agent has reached a target or not. Instead, the agent is encouraged to explore as many “good” paths as possible. Intuitively, in the context of recommendations, a “good” path is one that leads to an item that a user will interact with, with high probability. To this end, we consider to give a soft reward only for the terminal state $s_{T}=\left(u, e_{T}, h_{T}\right)$ based on another scoring function $f(u, i)$. The terminal reward $R_{T}$ is defined as

$$ R_{T}= \begin{cases}\max \left(0, \frac{f\left(u, e_{T}\right)}{\max_{i \in \mathcal{I}} f(u, i)}\right), & \text { if } e_{T} \in \mathcal{I} \\ 0, & \text { otherwise }\end{cases} $$

where the value of $R_{T}$ is normalized to the range of $[0,1]$. $f(u, i)$ is also introduced in the next section.

Details about $f(u,i)$, the scoring function, can be found in the paper.
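To make the formula concrete, here is a minimal Python sketch of that terminal reward. The function names, array shapes, and the dot-product stand-in for the paper's scoring function $f(u, i)$ are my own assumptions for illustration; the paper defines its own $f$.

```python
import numpy as np

def terminal_reward(user_emb, terminal_emb, item_embs, terminal_is_item):
    """Soft terminal reward R_T: the terminal entity's score, normalized by
    the best item score for this user and clipped to be non-negative."""
    # Non-item terminal entities get zero reward, as in the paper's formula.
    if not terminal_is_item:
        return 0.0

    def score(u, i):
        # Placeholder for the paper's scoring function f(u, i); a dot
        # product of user and item embeddings is just one common choice.
        return float(np.dot(u, i))

    best = max(score(user_emb, i) for i in item_embs)  # max_{i in I} f(u, i)
    # Normalize by the best item score and clip negatives to 0, so the
    # reward lies in [0, 1] (assuming the best score is positive).
    return max(0.0, score(user_emb, terminal_emb) / best)
```

The point of this design is that the agent is not penalized for missing a single "correct" target item; any item that the scoring function rates highly relative to the user's best-scoring item yields a reward close to 1, which encourages exploring many "good" paths rather than hunting for one binary hit.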

nbro
stoic-santiago

to incorporate the uncertainty... I'm not able to understand the above extract,

Usually, uncertainty is defined by variance... therefore you can add some noise - see here

Moreover, the concept of natural noise is an emerging trend related to the inconsistent behaviour of users.

-> read the Conclusion

The results obtained from the experimental section indicate that it is possible to improve the accuracy of recommendation methods through the detection and correction of noisy ratings

For details - read the whole...

Though algorithms can differ... another possible implementation is described here... it's up to the data analyst to choose which analysis tools to use.
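To make the variance/noise idea concrete, here is a toy sketch (not taken from the linked papers; the function names, the Gaussian noise model, and the 2-sigma threshold are my own assumptions): (a) perturbing a reward with zero-mean Gaussian noise whose variance models uncertainty, and (b) flagging "naturally noisy" ratings that deviate strongly from a user's own mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_reward(reward, std=0.1):
    # Model uncertainty by adding zero-mean Gaussian noise;
    # the variance (std**2) encodes how unsure we are about the reward.
    return reward + rng.normal(0.0, std)

def flag_natural_noise(user_ratings, threshold=2.0):
    # Flag ratings more than `threshold` standard deviations away from the
    # user's own mean as candidate noisy ratings (inconsistent behaviour)
    # to be corrected or discarded.
    r = np.asarray(user_ratings, dtype=float)
    std = r.std()
    if std == 0.0:
        return np.zeros(r.shape, dtype=bool)
    return np.abs(r - r.mean()) > threshold * std

print(noisy_reward(0.8))                          # reward perturbed slightly
print(flag_natural_noise([4, 5, 4, 1, 5, 4, 4]))  # only the 1-rating is flagged
```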

JeeyCi