1

The code below is adapted from this implementation.

from math import floor

basehash = hash

class IHT: "Structure to handle collisions" def init(self, sizeval): self.size = sizeval self.overfullCount = 0 self.dictionary = {}

def count (self):
    return len(self.dictionary)

def getindex (self, obj, readonly=False):
    d = self.dictionary
    if obj in d: return d[obj]
    elif readonly: return None
    size = self.size
    count = self.count()
    if count >= size:
        if self.overfullCount==0: print('IHT full, starting to allow collisions')
        self.overfullCount += 1
        return basehash(obj) % self.size
    else:
        d[obj] = count
        return count

def hashcoords(coordinates, m, readonly=False): if type(m)==IHT: return m.getindex(tuple(coordinates), readonly) if type(m)==int: return basehash(tuple(coordinates)) % m if m==None: return coordinates

def tiles(ihtORsize, numtilings, floats, ints=[], readonly=False): """returns num-tilings tile indices corresponding to the floats and ints""" qfloats = [floor(f*numtilings) for f in floats] Tiles = [] for tiling in range(numtilings): tilingX2 = tiling*2 coords = [tiling] b = tiling for q in qfloats: coords.append( (q + b) // numtilings ) b += tilingX2 coords.extend(ints) Tiles.append(hashcoords(coords, ihtORsize, readonly)) return Tiles

if name == 'main': tc=IHT(4096) tiles = tiles(tc, 8, [0, 0.5], ints=[], readonly=False) print(tiles)

I'm trying to figure out how the function tiles() works. It implements tile coding, which is explained in "Reinforcement Learning: An Introduction" (2020) Sutton and Barto on page 217.

So far I've figured out:

  • qfloats rescales the floating numbers to be the largest integer less than or equal to the original floating number, these are then re-scaled by the number of tilings.

  • Then, for each tiling, a list is created, the first element of the list is the tiling number, followed by the coordinates of the floats for that tiling, i.e. each of the re-scaled numbers is offset by the tiling number, b and integer divided by numtilings.

  • Finally, hashcoords first checks the dictionary d to see if this list has appeared before, if it has it returns the number relating to that list. It not it either creates a new entry with that list at the key and the count as the value or if the count is more than or equal to the size it adds one to overfullCount and returns basehash(obj) % self.size.

I'm struggling to understand two parts:

  1. What is the tilingX2 part doing?

  2. Why is tilingX2 added to b after the first coordinate has been calculated? It seems to me that each coordinate should be treated separately

  3. And why by a factor of 2?

  4. What is the expression basehash(obj) % self.size doing? I'm quite new to the concept of hashing. I know that, generally, they create a unique number for a given input (up to a limit), but I'm really struggling to understand what is going on in the line above.

nbro
  • 42,615
  • 12
  • 119
  • 217
Bazman
  • 111
  • 2

0 Answers0