When I use llama-cli, I ask models questions and they generate tokens. I see the tokens appear as the model generates them; the model samples each token based on the random seed.
But what I want is to see a list of candidate next tokens and select one from the list myself, then have the LLM show me another list of next tokens, and so on, so that I can construct my own sentences.
Is there a way to do this with llama-cpp-python?
That question is related, but not specific to llama-cpp-python. This one is the same idea, except specific to llama.cpp.
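For concreteness, here is a rough sketch of what I'm imagining, assuming the `Llama` class in recent llama-cpp-python versions exposes `tokenize`, `eval`, `scores`, `n_tokens`, and `detokenize` (the model path, prompt, and `top_k` value are just placeholders); I don't know whether this is the intended way to do it:

```python
import numpy as np
from llama_cpp import Llama

# Load the model; logits_all=True keeps logits for every evaluated position.
# (model path is a placeholder)
llm = Llama(model_path="./model.gguf", n_ctx=2048, logits_all=True, verbose=False)

prompt = b"Once upon a time"
tokens = llm.tokenize(prompt, add_bos=True)
llm.eval(tokens)  # feed the prompt through the model

top_k = 10
while True:
    # Logits for the most recently evaluated position (assumes the
    # scores array is indexed by position and sized n_ctx x n_vocab).
    logits = np.array(llm.scores[llm.n_tokens - 1])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Show the top-k candidate next tokens with their probabilities.
    candidates = np.argsort(probs)[::-1][:top_k]
    for i, tok in enumerate(candidates):
        piece = llm.detokenize([int(tok)]).decode("utf-8", errors="replace")
        print(f"{i}: {piece!r}  (p={probs[tok]:.3f})")

    choice = input("pick a token index (or 'q' to stop): ")
    if choice == "q":
        break
    chosen = int(candidates[int(choice)])
    tokens.append(chosen)
    llm.eval([chosen])  # append the chosen token and get the next distribution
    print("so far:", llm.detokenize(tokens).decode("utf-8", errors="replace"))
```

Is something along these lines possible, or is there a better-supported way to get the candidate list at each step?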