
I've built an agent-based model where agents are placed on an NK landscape and tasked with finding the best-fitting solution (the highest point). To move around the landscape, agents can choose between two different climbing methods, which determine how candidate solutions are generated. One method is a simple hill climb that compares the current fitness with that of a candidate solution in the local neighborhood, while the other also uses memory of past solutions when evaluating candidates (a rough sketch of both is below). My goal is to see whether the optimal climbing method can be learned by maneuvering around the landscape and experimenting with both methods, and, more broadly, whether there is an optimal policy for agents to follow that would allow them to outperform agents that simply exploit one of the two strategies.
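Roughly, the two methods look something like this (the `landscape.fitness` call and the exact memory/acceptance rules are placeholders for what the real model does):

```python
import random

def hill_climb_step(solution, fitness, landscape):
    # Simple hill climb: flip one bit and keep the change only if it
    # improves on the current fitness.
    candidate = solution[:]
    i = random.randrange(len(candidate))
    candidate[i] ^= 1  # one-bit move in the local neighborhood
    return candidate if landscape.fitness(candidate) > fitness else solution

def memory_climb_step(solution, fitness, landscape, memory):
    # Memory-based climb: a candidate must also beat the best fitness
    # remembered from past solutions before it is accepted.
    candidate = solution[:]
    i = random.randrange(len(candidate))
    candidate[i] ^= 1
    candidate_fit = landscape.fitness(candidate)
    best_seen = max(memory) if memory else fitness
    memory.append(candidate_fit)
    return candidate if candidate_fit > max(fitness, best_seen) else solution
```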

To generate the NK landscape, I take draws from a uniform distribution on [0, 1]; these make up the individual payoff contributions of each bit in the solution. Then a fitness value is generated using a function that maps the dependencies between bits, with each bit's contribution depending on its own state and the states of the K bits it interacts with. I'm not sure whether this meaningfully changes the distribution of the original uniform values or not.
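For concreteness, this is the standard NK construction I have in mind (my implementation may differ in details; the function names here are just illustrative):

```python
import itertools
import random

def make_nk_landscape(N, K, seed=None):
    # Draw a U[0, 1] payoff for every combination of a bit's state and
    # the states of the K bits it depends on.
    rng = random.Random(seed)
    deps = [[i] + rng.sample([j for j in range(N) if j != i], K) for i in range(N)]
    tables = [{bits: rng.random()
               for bits in itertools.product((0, 1), repeat=K + 1)}
              for _ in range(N)]
    return deps, tables

def nk_fitness(solution, deps, tables):
    # Overall fitness is the mean of the N bit-level contributions.
    contribs = [tables[i][tuple(solution[j] for j in deps[i])]
                for i in range(len(solution))]
    return sum(contribs) / len(contribs)
```

If this matches my setup, the overall fitness is a mean of N roughly uniform contributions, so the resulting values concentrate around 0.5 rather than staying uniform on [0, 1], which is part of what I'm unsure about.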

I have been looking into Thompson sampling as a way to solve this problem; however, I am concerned, and my initial results suggest, that the algorithm may not work with how my data is structured. I am currently using a Gaussian Thompson sampling algorithm for unknown mean and variance based on these equations.
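I believe these are the standard Normal-Inverse-Gamma updates for a Gaussian reward with unknown mean and variance; a minimal sketch of that formulation (the prior values here are placeholders, not the ones I actually use):

```python
import numpy as np

class GaussianThompsonArm:
    # Normal-Inverse-Gamma posterior over one arm's unknown mean and
    # variance, updated one reward at a time.
    def __init__(self, mu0=0.5, lam0=1.0, alpha0=1.0, beta0=1.0):
        self.mu, self.lam, self.alpha, self.beta = mu0, lam0, alpha0, beta0

    def update(self, x):
        # Sequential NIG update for a single observed reward x.
        mu_new = (self.lam * self.mu + x) / (self.lam + 1)
        self.beta += 0.5 * self.lam * (x - self.mu) ** 2 / (self.lam + 1)
        self.mu = mu_new
        self.lam += 1
        self.alpha += 0.5

    def sample_mean(self, rng):
        # Draw sigma^2 from its Inverse-Gamma posterior, then mu given sigma^2.
        sigma2 = 1.0 / rng.gamma(self.alpha, 1.0 / self.beta)
        return rng.normal(self.mu, np.sqrt(sigma2 / self.lam))

# At each step, play the climbing method whose sampled mean is largest,
# then feed the observed fitness reward back into that arm's update().
rng = np.random.default_rng(0)
arms = [GaussianThompsonArm(), GaussianThompsonArm()]
chosen = int(np.argmax([a.sample_mean(rng) for a in arms]))
```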

With all of this in mind, I believe that my data violates many of the assumptions of Thompson sampling: the fitness rewards are bounded on [0, 1] rather than Gaussian, and the payoff of each climbing method presumably shifts as the agent moves around the landscape, so the rewards are not stationary either. Is Thompson sampling robust enough to work in this scenario?
