For reference, I've been playing around with Mistral 7B v0.1 and v0.3, but I didn't like being limited by A100 availability on Google Colab, so I wanted to try the 4-bit and 8-bit quantized models. The problem is that inference with them is drastically slow.
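For concreteness, this is roughly the kind of 4-bit loading code I mean. It's a minimal sketch assuming transformers, bitsandbytes, and accelerate are installed; the repo ID and quantization settings here are my best guess, not necessarily the exact notebook I ran:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.3"  # assumed checkpoint; v0.1 works the same way

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place the model on the GPU
)
```

One thing worth noting: as far as I can tell, setting `bnb_4bit_compute_dtype` explicitly matters, since the default (float32) makes generation noticeably slower than float16/bfloat16.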
At first I thought the models were not on the GPU, but that doesn't seem to be the case. I even tried other people's Jupyter notebooks, and they were just as slow.
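This is what I used to convince myself the weights were actually on the GPU (assuming the model was loaded with `device_map="auto"` as above):

```python
# Sanity-check where the quantized weights actually live
print(model.device)                     # usually cuda:0 when everything fits on one GPU
print(model.hf_device_map)              # per-module placement chosen by device_map="auto"
print(next(model.parameters()).device)  # device of the first parameter tensor
```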
Side note: it was almost impossible to find any 8-bit code for Mistral. I did find some, but only one example. Maybe I wasn't looking hard enough, but I found it interesting that 8-bit code was so much harder to find than 4-bit code.
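In case it helps anyone else searching: the 8-bit path is just a different flag on the same config (again a sketch, not the exact code from the notebook I found):

```python
# 8-bit (LLM.int8) quantization instead of 4-bit NF4
bnb_config_8bit = BitsAndBytesConfig(load_in_8bit=True)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config_8bit,
    device_map="auto",
)
```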