Questions tagged [large-language-models]

For questions about large language models (LLMs), i.e. language models that are "large" both in parameter count and in the amount of data they are trained on.

"Large language models" (LLMs) is a collective term for natural language models trained on vast quantities of unlabelled text using self-supervised learning. Notable examples include BERT, GPT-(2, 3, 3.5, 4), LaMDA, Chinchilla, PaLM, and LLaMA. There is no formal definition of the term.

249 questions
34 votes • 1 answer

How does the (decoder-only) transformer architecture work?

How does the (decoder-only) transformer architecture work which is used in impressive models such as GPT-4?
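For readers skimming this entry: the causal self-attention at the heart of a decoder-only transformer can be sketched in plain Python. This is a toy single-head version with made-up 2-dimensional vectors; real models add learned Q/K/V projections, multiple heads, residual connections, and MLP blocks.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask.

    Q, K, V: lists of d-dimensional vectors, one per token position.
    Position i may only attend to positions j <= i, which is what makes
    a decoder-only transformer autoregressive.
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Scores against the current and all earlier positions only.
        scores = [sum(qk * kk for qk, kk in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * V[j][t] for j, w in enumerate(weights))
                    for t in range(d)])
    return out

# Three toy token positions; using the same vectors for Q, K, and V.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(Q, Q, Q)
# The first position can only attend to itself, so its output equals V[0].
print(out[0])  # [1.0, 0.0]
```

Because of the mask, each new token's output depends only on earlier tokens, which is what lets these models generate text left to right.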
13 votes • 4 answers

Why do LLMs and RNNs learn so fast during inference but, ironically, so slowly during training?

Why do LLMs learn so fast during inference but, ironically, so slowly during training? That is, if you teach an AI a new concept in a prompt, it will learn and use the concept perfectly and flawlessly, throughout the whole prompt, after just one shot.…
11 votes • 3 answers

Why are LLMs able to reproduce bodies of known text exactly?

Mathematically, I wouldn't expect LLMs to be able to reproduce source texts exactly unless the source text was the probable outcome given some prompt. However, I have now tested HuggingFaceH4/zephyr-7b-beta, TheBloke/Llama-2-7B-Chat-GGUF, and…
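The probability intuition in this excerpt can be made concrete: sampling a long passage verbatim requires the product of per-token probabilities to stay near 1, but greedy (or low-temperature) decoding only requires the correct token to be the argmax at every step. The per-token probabilities below are made-up illustrations, not measurements.

```python
import math

# Hypothetical probabilities the model assigns to the "correct" next
# token of a 200-token passage (both values are made up).
memorized = [0.99] * 200   # near-certain at every step
generic   = [0.60] * 200   # merely "likely" at every step

def seq_prob(ps):
    """Probability of emitting the whole sequence when sampling."""
    return math.exp(sum(math.log(p) for p in ps))

print(seq_prob(memorized))  # ~0.134: even this is unlikely under sampling
print(seq_prob(generic))    # ~4e-45: essentially never sampled verbatim
```

Under greedy decoding, the `generic` passage is still reproduced exactly, because 0.60 can be the highest-probability token at every step. That gap between "sampled verbatim" and "decoded verbatim" is one reason exact reproduction is more common than the raw product of probabilities suggests.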
8 votes • 3 answers

Why can't Lucene search be used to power LLM applications?

With respect to LLM applications using the RAG (retrieval-augmented generation) architecture, people have started taking it for granted that they will be powered by a vector database. E.g., see this: The most important piece of the preprocessing pipeline, from…
morpheus • 314 • 1 • 7
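The lexical retrieval Lucene provides is essentially BM25 scoring over an inverted index, and retrieved passages can feed an LLM exactly as a vector store's would. A minimal sketch, with a made-up toy corpus and standard BM25 parameters:

```python
import math
from collections import Counter

# Toy corpus; a Lucene-style keyword engine ranks documents with BM25,
# no embeddings or vector database involved.
docs = [
    "the transformer architecture uses self attention",
    "lucene implements bm25 ranking over an inverted index",
    "vector databases store dense embeddings for similarity search",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for d in tokenized for t in set(d))  # document frequencies

def bm25(query, doc, k1=1.5, b=0.75):
    """BM25 score of one document for a whitespace-tokenized query."""
    tf = Counter(doc)
    score = 0.0
    for t in query.split():
        if t not in tf:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

query = "bm25 inverted index"
best = max(range(N), key=lambda i: bm25(query, tokenized[i]))
print(docs[best])  # the Lucene document ranks first
```

In practice, lexical BM25 and dense-vector retrieval have complementary strengths (exact terms vs. paraphrase), which is why many RAG stacks combine both.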
7 votes • 5 answers

Are LLMs "lazy" in their responses?

(Just to preface that I do not have such a great understanding of LLMs and AI in general...) My question is, when I pose a question to an LLM, will it present the fastest response that satisfies the parameters of the query, irrespective of whether it is…
Oktarine • 181 • 1 • 3
7 votes • 3 answers

Are there strictly deterministic LLMs?

LLMs are understood to generate non-deterministic outputs. Are there LLMs out there that are capable of producing deterministic outputs for any given input, given fixed parameters (e.g. temperature)? I heard that llama.cpp - if run on a CPU…
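At the decoding-algorithm level the question is straightforward: greedy (temperature 0) decoding is a pure argmax and is deterministic, while sampling is deterministic only if the RNG seed is fixed too. The sketch below uses made-up logits; the remaining practical caveat is that GPU kernels can use non-deterministic floating-point reduction orders, which is presumably why the excerpt singles out llama.cpp on a CPU.

```python
import math
import random

def sample(logits, temperature, rng):
    """Temperature sampling; temperature == 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    # random.choices accepts unnormalized weights.
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [0.1, 2.3, -1.0, 0.7]  # made-up next-token logits

# Greedy decoding is deterministic regardless of the RNG state.
assert all(sample(logits, 0, random.Random(i)) == 1 for i in range(5))

# Sampling is reproducible only when the seed is fixed as well.
assert (sample(logits, 1.0, random.Random(42))
        == sample(logits, 1.0, random.Random(42)))
```

So "deterministic LLM" usually means fixed weights + greedy decoding (or a fixed seed) + deterministic kernels, not a different model class.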
5 votes • 1 answer

Is PAC-unlearnability a fundamental limitation for LLM reasoning?

For simplicity, let’s focus on knowledge reasoning tasks with Yes/No answers. According to learning theory, even moderately complex knowledge reasoning tasks are PAC-unlearnable. This implies that no learning-based reasoning engine trained on a…
nova • 180 • 6
5 votes • 1 answer

How do LLMs tokenize Python (significant whitespace)?

I was learning about tokenization (WordPiece) and how there is a normalization step prior to that that will remove consecutive whitespace from the input text, since these are not significant normally. But that got me wondering how LLMs still…
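The short answer is that LLM tokenizers do not normalize whitespace away, and vocabularies trained on code typically contain dedicated tokens for runs of spaces, so a 4-space Python indent survives as a single token. A toy greedy longest-match tokenizer over a made-up vocabulary illustrates the idea:

```python
# Toy vocabulary (made up); note the dedicated 4-space indent token,
# analogous to the multi-space tokens real BPE vocabularies learn
# from code-heavy training data.
vocab = ["def", " main", "():", "\n", "    ", "return", " ", "1"]

def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    while text:
        match = max((v for v in vocab if text.startswith(v)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"untokenizable text: {text!r}")
        tokens.append(match)
        text = text[len(match):]
    return tokens

src = "def main():\n    return 1"
print(tokenize(src))
# ['def', ' main', '():', '\n', '    ', 'return', ' ', '1']
```

Real byte-level BPE tokenizers can fall back to single-byte tokens, so no whitespace is ever dropped; indentation is always recoverable from the token sequence.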
5 votes • 1 answer

Is natural language reasoning the right way to implement reasoning in AI?

It is well known that human reasoning, after evolving for at least several thousand years, has gradually transformed from natural language reasoning to formal reasoning. In modern science, a significant indicator of a discipline's maturity is…
jario • 53 • 5
5 votes • 2 answers

Are the model implementations in Hugging Face’s transformers library created by the original model authors or by Hugging Face?

I've been exploring the implementation of models like Llama in Hugging Face’s transformers library, for example: Hugging Face's Llama model implementation. I’m curious about how these implementations work: Are the model codes in Hugging Face’s…
5 votes • 1 answer

Why do LLM tokenizers use a special symbol for space such as Ġ in BPE or ▁ in SPM?

Popular tokenizers use a special symbol such as "Ġ" (BPE) or "▁" (SentencePiece) to represent space. What is the reasoning behind this? I did try searching for the answer. I got two types of explanations, but they don't explain anything to me. Some…
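One concrete way to see where "Ġ" comes from: GPT-2's byte-level BPE (this mirrors the mapping in the public GPT-2 code) remaps every byte to a printable character so that token strings contain no raw spaces or control characters. Printable bytes map to themselves; the rest, including the space byte 0x20, are shifted into unused code points starting at 256, and 0x20 happens to land on "Ġ".

```python
def bytes_to_unicode():
    """GPT-2 style byte-to-printable-character mapping.

    Bytes that are already printable map to themselves; the others
    (space, newline, control bytes, ...) are shifted to chr(256 + n)
    so every token string is fully visible and unambiguous.
    """
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    mapping = {}
    n = 0
    for b in range(256):
        if b in printable:
            mapping[b] = chr(b)
        else:
            mapping[b] = chr(256 + n)
            n += 1
    return mapping

m = bytes_to_unicode()
print(m[ord(" ")])   # Ġ -- the space byte becomes a visible character
print(m[ord("\n")])  # Ċ -- so does newline
```

SentencePiece's "▁" serves the same purpose from the other direction: it marks word boundaries explicitly so the tokenization is lossless and reversible without a separate whitespace-splitting pre-tokenizer.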
5 votes • 3 answers

How do open source LLMs compare to GPT-4?

I have heard some back and forth regarding open source LLMs like Llama. I have heard that on certain benchmarks they perform close to, the same as, or better than GPT-4, with the caveats that they tend to lack the diversity and range of GPT-4, and also fail to…
5 votes • 1 answer

Who invented DAN?

DAN was a prompt that went through many, many iterations during the initial months of ChatGPT’s release to the public. DAN is an acronym which stood for “Do Anything Now”, and was a prompt specifically designed to circumvent the guidelines OpenAI…
Julius Hamilton • 225 • 2 • 10
5 votes • 3 answers

For an LLM model, how can I estimate its memory requirements based on storage usage?

It is easy to see the amount of disk space consumed by an LLM model (downloaded from Hugging Face, for instance). Just go into the relevant directory and check the file sizes. How can I estimate the amount of GPU RAM required to run the model? For…
ahron • 265 • 2 • 7
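The usual rule of thumb: checkpoint size on disk is roughly parameters × bytes per parameter, and inference VRAM is that same figure plus headroom for activations and the KV cache. The 20% overhead factor below is an assumption, not a measured constant, and the KV cache portion grows with context length and batch size.

```python
def estimate_vram_gb(n_params, bytes_per_param, overhead=1.2):
    """Rule-of-thumb VRAM estimate for inference.

    Weights dominate: parameters x bytes per parameter, plus ~20%
    headroom (an assumed figure) for activations and the KV cache.
    """
    return n_params * bytes_per_param * overhead / 1024**3

# A 7B-parameter model (sizes are illustrative):
print(round(estimate_vram_gb(7e9, 2), 1))    # fp16 (2 bytes/param): ~15.6 GB
print(round(estimate_vram_gb(7e9, 0.5), 1))  # 4-bit quantized: ~3.9 GB
```

Equivalently: a model whose weights occupy N GB on disk in fp16 needs a bit more than N GB of GPU RAM to run at that precision; quantized formats shrink both numbers proportionally.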
5 votes • 2 answers

LLM-like architecture capable of dynamically learning from its own output

Large Language Models (LLMs) have demonstrated remarkable capabilities in quick learning during inference. They can effectively grasp a concept from a single example and generate relevant outputs. However, a noticeable limitation of LLMs is their…