For reference, I've been playing around with Mistral 7B v0.1 and v0.3, but I didn't like being limited by A100 availability on Google Colab, so I wanted to try the 4-bit and 8-bit quantized models. The problem is that inference with them is drastically slow.
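For concreteness, this is roughly the kind of 4-bit loading code I mean. It's a minimal sketch assuming transformers, bitsandbytes, and accelerate are installed; the repo ID and quantization settings here are my best guess, not necessarily the exact notebook I ran:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.3"  # assumed checkpoint; v0.1 works the same way

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place the model on the GPU
)
```

One thing worth noting: as far as I can tell, setting `bnb_4bit_compute_dtype` explicitly matters, since the default (float32) makes generation noticeably slower than float16/bfloat16.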
At first I thought the models were not on the GPU, but that doesn't seem to be the case. I even tried other people's Jupyter notebooks, and they were just as slow.
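This is what I used to convince myself the weights were actually on the GPU (assuming the model was loaded with `device_map="auto"` as above):

```python
# Sanity-check where the quantized weights actually live
print(model.device)                     # usually cuda:0 when everything fits on one GPU
print(model.hf_device_map)              # per-module placement chosen by device_map="auto"
print(next(model.parameters()).device)  # device of the first parameter tensor
```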
Side note: it was almost impossible to find any 8-bit code for Mistral. I did find some, but only one example. Maybe I wasn't looking hard enough, but I found it interesting that 8-bit code was so much harder to find than 4-bit code.
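In case it helps anyone else searching: the 8-bit path is just a different flag on the same config (again a sketch, not the exact code from the notebook I found):

```python
# 8-bit (LLM.int8) quantization instead of 4-bit NF4
bnb_config_8bit = BitsAndBytesConfig(load_in_8bit=True)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config_8bit,
    device_map="auto",
)
```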