Questions tagged [quantization]

5 questions
1
vote
2 answers

How do quantized models manage to be fast while still being quantized in memory?

As I understand it, modern CPUs and GPUs are highly optimized for the following calculations: arithmetic on floating-point numbers (8, 16, 32, or 64 bits); arithmetic on integers (8, 16, 32, or 64 bits). So all arithmetic is optimized for bytes, not…
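A minimal sketch of the usual answer, assuming a weight-only int8 scheme (names and shapes here are illustrative, not any particular library's API): weights stay int8 in memory, so loading them costs roughly 4x less bandwidth than fp32, and they are dequantized on the fly inside the matmul. Real kernels fuse the dequantization into the GEMM tile loop.

```python
import torch

def quantize_weights(w: torch.Tensor):
    # per-output-channel absmax scale mapping weights into [-127, 127]
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def int8_linear(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
    # dequantize just-in-time; only int8 data crossed the memory bus
    return x @ (q.to(x.dtype) * scale).t()

w = torch.randn(1024, 1024)
q, scale = quantize_weights(w)
x = torch.randn(1, 1024)
err = (int8_linear(x, q, scale) - x @ w.t()).abs().max()
print(f"max abs error vs fp32: {err:.4f}")  # small rounding error
```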
0
votes
0 answers

Activations quantization in BitNet paper

I am having a closer look at the BitNet paper (arXiv:2310.11453v1). To quantize the activations to $b$ bits, they use absmax quantization into the range $[-Q_b, +Q_b]$. So, $$\tilde{x} = \mathrm{Quant}(x) = \mathrm{Clip}\big(x \times \tfrac{Q_b}{\gamma},\ -Q_b + \epsilon,\ …$$
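A short sketch of that absmax quantization, assuming $Q_b = 2^{b-1}$, $\gamma = \max|x|$, and a clip upper bound of $Q_b - \epsilon$ (the upper bound is cut off in the excerpt above; this assumes the symmetric form used in the BitNet paper):

```python
import torch

def absmax_quantize(x: torch.Tensor, b: int = 8, eps: float = 1e-5):
    Qb = 2 ** (b - 1)
    gamma = x.abs().max()                          # absmax scale
    # clip range assumed symmetric: [-Qb + eps, Qb - eps]
    x_tilde = torch.clamp(x * Qb / gamma, -Qb + eps, Qb - eps)
    return x_tilde, gamma

x = torch.randn(8)
x_q, gamma = absmax_quantize(x)
x_back = x_q * gamma / 2 ** 7                      # approximate dequantization
```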
0
votes
1 answer

What types of quantization will improve LLM inference latency and throughput?

Quantization is the mapping of values in a high-precision representation to a low-precision one. I observed that either the weights of the model or the activation values, or both, can be quantized by different techniques. While quantization is…
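A sketch contrasting the two regimes named in the question (all names here are illustrative): weight-only quantization (e.g. W8A16) shrinks memory traffic, which mainly helps bandwidth-bound decode latency; quantizing activations as well (e.g. W8A8) additionally lets the GEMM itself run on integer units, which helps the compute-bound prefill/throughput case.

```python
import torch

def dynamic_quant(t: torch.Tensor):
    # per-tensor absmax quantization to int8
    scale = t.abs().amax() / 127.0
    q = torch.clamp((t / scale).round(), -127, 127).to(torch.int8)
    return q, scale

wq, ws = dynamic_quant(torch.randn(256, 256))   # weights: quantized once
x = torch.randn(1, 256)
xq, xs = dynamic_quant(x)                        # activations: per call

# int8 x int8 with int32 accumulation, rescaled back to float afterwards
y = (xq.to(torch.int32) @ wq.to(torch.int32).t()).to(torch.float32) * (xs * ws)
```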
0
votes
0 answers

Is a quantized Mistral LLM slower than a non-quantized Mistral LLM?

For reference, I've been playing around with Mistral 7B v0.1 and v0.3 but did not like being limited by A100 availability on Google Colab, so I wanted to try 4-bit and 8-bit quantized models, but they are drastically slow. At first I thought…
0
votes
3 answers

Does 1-bit quantization (layers with boolean tensors) exist in machine learning?

Does 1-bit quantization exist in machine learning? PyTorch's docs on "Quantization" define it as "techniques for performing computations and storing tensors at lower bitwidths than floating point precision." torch.bool tensors exist in PyTorch, but…
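It does; a minimal sketch in the BinaryConnect/XNOR-Net style (illustrative code, not a PyTorch API): weights are reduced to signs {-1, +1} plus one floating-point scale. A torch.bool tensor could store the sign bits compactly, but plain ±1 floats keep the arithmetic readable here.

```python
import torch

def binarize(w: torch.Tensor):
    # per-tensor fp scale; mean |w| minimizes the L2 binarization error
    alpha = w.abs().mean()
    wb = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
    return wb, alpha

w = torch.randn(16, 16)
wb, alpha = binarize(w)
x = torch.randn(1, 16)
y = (x @ wb.t()) * alpha   # binary weights, full-precision activations
```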