Questions tagged [large-language-models]

For questions about large language models (LLMs), i.e. language models that are "large" both in parameter count and in the amount of data they are trained on.

"Large language models" (LLMs) is a collective term for natural language models trained on vast quantities of unlabelled text using self-supervised learning. Notable examples include BERT, GPT-(2, 3, 3.5, 4), LaMDA, Chinchilla, PaLM, and LLaMA. There is no formal definition of the term.

249 questions
34 votes • 1 answer

How does the (decoder-only) transformer architecture work?

How does the (decoder-only) transformer architecture work which is used in impressive models such as GPT-4?
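For readers skimming this entry: the causal self-attention at the heart of a decoder-only transformer can be sketched in plain Python. This is a toy single-head version with made-up 2-dimensional vectors; real models add learned Q/K/V projections, multiple heads, residual connections, and MLP blocks.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask.

    Q, K, V: lists of d-dimensional vectors, one per token position.
    Position i may only attend to positions j <= i, which is what makes
    a decoder-only transformer autoregressive.
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Scores against the current and all earlier positions only.
        scores = [sum(qk * kk for qk, kk in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * V[j][t] for j, w in enumerate(weights))
                    for t in range(d)])
    return out

# Three toy token positions; using the same vectors for Q, K, and V.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(Q, Q, Q)
# The first position can only attend to itself, so its output equals V[0].
print(out[0])  # [1.0, 0.0]
```

Because of the mask, each new token's output depends only on earlier tokens, which is what lets these models generate text left to right.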
13 votes • 4 answers

Why do LLMs and RNNs learn so fast during inference but, ironically, so slowly during training?

Why do LLMs learn so fast during inference but, ironically, so slowly during training? That is, if you teach an AI a new concept in a prompt, it will learn and use the concept perfectly and flawlessly, throughout the whole prompt, after just one shot.…
11 votes • 3 answers

Why are LLMs able to reproduce bodies of known text exactly?

Mathematically, I wouldn't expect LLMs to be able to reproduce source texts exactly unless the source text was the probable outcome given some prompt. However, I have now tested HuggingFaceH4/zephyr-7b-beta, TheBloke/Llama-2-7B-Chat-GGUF, and…
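The probability intuition in this excerpt can be made concrete: sampling a long passage verbatim requires the product of per-token probabilities to stay near 1, but greedy (or low-temperature) decoding only requires the correct token to be the argmax at every step. The per-token probabilities below are made-up illustrations, not measurements.

```python
import math

# Hypothetical probabilities the model assigns to the "correct" next
# token of a 200-token passage (both values are made up).
memorized = [0.99] * 200   # near-certain at every step
generic   = [0.60] * 200   # merely "likely" at every step

def seq_prob(ps):
    """Probability of emitting the whole sequence when sampling."""
    return math.exp(sum(math.log(p) for p in ps))

print(seq_prob(memorized))  # ~0.134: even this is unlikely under sampling
print(seq_prob(generic))    # ~4e-45: essentially never sampled verbatim
```

Under greedy decoding, the `generic` passage is still reproduced exactly, because 0.60 can be the highest-probability token at every step. That gap between "sampled verbatim" and "decoded verbatim" is one reason exact reproduction is more common than the raw product of probabilities suggests.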
8 votes • 3 answers

Why can't Lucene search be used to power LLM applications?

With respect to LLM applications using the RAG (retrieval-augmented generation) architecture, people have started taking it for granted that they will be powered by a vector database. E.g., see this: The most important piece of the preprocessing pipeline, from…
morpheus • 314 • 1 • 7
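The lexical retrieval Lucene provides is essentially BM25 scoring over an inverted index, and retrieved passages can feed an LLM exactly as a vector store's would. A minimal sketch, with a made-up toy corpus and standard BM25 parameters:

```python
import math
from collections import Counter

# Toy corpus; a Lucene-style keyword engine ranks documents with BM25,
# no embeddings or vector database involved.
docs = [
    "the transformer architecture uses self attention",
    "lucene implements bm25 ranking over an inverted index",
    "vector databases store dense embeddings for similarity search",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for d in tokenized for t in set(d))  # document frequencies

def bm25(query, doc, k1=1.5, b=0.75):
    """BM25 score of one document for a whitespace-tokenized query."""
    tf = Counter(doc)
    score = 0.0
    for t in query.split():
        if t not in tf:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

query = "bm25 inverted index"
best = max(range(N), key=lambda i: bm25(query, tokenized[i]))
print(docs[best])  # the Lucene document ranks first
```

In practice, lexical BM25 and dense-vector retrieval have complementary strengths (exact terms vs. paraphrase), which is why many RAG stacks combine both.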
7 votes • 5 answers

Are LLMs "lazy" in their responses?

(Just to preface that I do not have such a great understanding of LLMs and AI in general...) My question is, when I pose a question to an LLM, will it present the fastest response that satisfies the parameters of the query, irrespective of whether it is…
Oktarine • 181 • 1 • 3
7 votes • 3 answers

Are there strictly deterministic LLMs?

LLMs are understood to generate non-deterministic outputs. Are there LLMs out there that are capable of producing deterministic outputs for any given input, given fixed parameters (e.g. temperature)? I heard that llama.cpp - if run on a CPU…
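At the decoding-algorithm level the question is straightforward: greedy (temperature 0) decoding is a pure argmax and is deterministic, while sampling is deterministic only if the RNG seed is fixed too. The sketch below uses made-up logits; the remaining practical caveat is that GPU kernels can use non-deterministic floating-point reduction orders, which is presumably why the excerpt singles out llama.cpp on a CPU.

```python
import math
import random

def sample(logits, temperature, rng):
    """Temperature sampling; temperature == 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    # random.choices accepts unnormalized weights.
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [0.1, 2.3, -1.0, 0.7]  # made-up next-token logits

# Greedy decoding is deterministic regardless of the RNG state.
assert all(sample(logits, 0, random.Random(i)) == 1 for i in range(5))

# Sampling is reproducible only when the seed is fixed as well.
assert (sample(logits, 1.0, random.Random(42))
        == sample(logits, 1.0, random.Random(42)))
```

So "deterministic LLM" usually means fixed weights + greedy decoding (or a fixed seed) + deterministic kernels, not a different model class.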
5 votes • 1 answer

Is PAC-unlearnability a fundamental limitation for LLM reasoning?

For simplicity, let’s focus on knowledge reasoning tasks with Yes/No answers. According to learning theory, even moderately complex knowledge reasoning tasks are PAC-unlearnable. This implies that no learning-based reasoning engine trained on a…
nova • 180 • 6
5 votes • 1 answer

How do LLMs tokenize Python (significant whitespace)?

I was learning about tokenization (WordPiece) and how there is a normalization step prior to that that will remove consecutive whitespace from the input text, since these are not significant normally. But that got me wondering how LLMs still…
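The short answer is that LLM tokenizers do not normalize whitespace away, and vocabularies trained on code typically contain dedicated tokens for runs of spaces, so a 4-space Python indent survives as a single token. A toy greedy longest-match tokenizer over a made-up vocabulary illustrates the idea:

```python
# Toy vocabulary (made up); note the dedicated 4-space indent token,
# analogous to the multi-space tokens real BPE vocabularies learn
# from code-heavy training data.
vocab = ["def", " main", "():", "\n", "    ", "return", " ", "1"]

def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    while text:
        match = max((v for v in vocab if text.startswith(v)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"untokenizable text: {text!r}")
        tokens.append(match)
        text = text[len(match):]
    return tokens

src = "def main():\n    return 1"
print(tokenize(src))
# ['def', ' main', '():', '\n', '    ', 'return', ' ', '1']
```

Real byte-level BPE tokenizers can fall back to single-byte tokens, so no whitespace is ever dropped; indentation is always recoverable from the token sequence.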
5 votes • 1 answer

Is natural language reasoning the right way to implement reasoning in AI?

It is well known that human reasoning, after evolving for at least several thousand years, has gradually transformed from natural language reasoning to formal reasoning. In modern science, a significant indicator of a discipline's maturity is…
jario • 53 • 5
5 votes • 2 answers

Are the model implementations in Hugging Face’s transformers library created by the original model authors or by Hugging Face?

I've been exploring the implementation of models like Llama in Hugging Face’s transformers library, for example: Hugging Face's Llama model implementation. I’m curious about how these implementations work: Are the model codes in Hugging Face’s…
5 votes • 1 answer

Why do LLM tokenizers use a special symbol for space such as Ġ in BPE or ▁ in SPM?

Popular tokenizers use a special symbol such as "Ġ" (BPE) or "▁" (SentencePiece) to represent space. What is the reasoning behind this? I did try searching for the answer. I got two types of explanations, but they don't explain anything to me. Some…
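One concrete way to see where "Ġ" comes from: GPT-2's byte-level BPE (this mirrors the mapping in the public GPT-2 code) remaps every byte to a printable character so that token strings contain no raw spaces or control characters. Printable bytes map to themselves; the rest, including the space byte 0x20, are shifted into unused code points starting at 256, and 0x20 happens to land on "Ġ".

```python
def bytes_to_unicode():
    """GPT-2 style byte-to-printable-character mapping.

    Bytes that are already printable map to themselves; the others
    (space, newline, control bytes, ...) are shifted to chr(256 + n)
    so every token string is fully visible and unambiguous.
    """
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    mapping = {}
    n = 0
    for b in range(256):
        if b in printable:
            mapping[b] = chr(b)
        else:
            mapping[b] = chr(256 + n)
            n += 1
    return mapping

m = bytes_to_unicode()
print(m[ord(" ")])   # Ġ -- the space byte becomes a visible character
print(m[ord("\n")])  # Ċ -- so does newline
```

SentencePiece's "▁" serves the same purpose from the other direction: it marks word boundaries explicitly so the tokenization is lossless and reversible without a separate whitespace-splitting pre-tokenizer.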
5 votes • 3 answers

How do open source LLMs compare to GPT-4?

I have heard some back and forth regarding open source LLMs like Llama. I have heard that on certain benchmarks they perform close to, the same as, or better than GPT-4, with the caveats that they tend to lack the diversity and range of GPT-4, and also fail to…
5 votes • 1 answer

Who invented DAN?

DAN was a prompt that went through many, many iterations during the initial months of ChatGPT’s release to the public. DAN is an acronym which stood for “Do Anything Now”, and was a prompt specifically designed to circumvent the guidelines OpenAI…
Julius Hamilton • 225 • 2 • 10
5 votes • 3 answers

For an LLM model, how can I estimate its memory requirements based on storage usage?

It is easy to see the amount of disk space consumed by an LLM model (downloaded from Hugging Face, for instance). Just go into the relevant directory and check the file sizes. How can I estimate the amount of GPU RAM required to run the model? For…
ahron • 265 • 2 • 7
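The usual rule of thumb: checkpoint size on disk is roughly parameters × bytes per parameter, and inference VRAM is that same figure plus headroom for activations and the KV cache. The 20% overhead factor below is an assumption, not a measured constant, and the KV cache portion grows with context length and batch size.

```python
def estimate_vram_gb(n_params, bytes_per_param, overhead=1.2):
    """Rule-of-thumb VRAM estimate for inference.

    Weights dominate: parameters x bytes per parameter, plus ~20%
    headroom (an assumed figure) for activations and the KV cache.
    """
    return n_params * bytes_per_param * overhead / 1024**3

# A 7B-parameter model (sizes are illustrative):
print(round(estimate_vram_gb(7e9, 2), 1))    # fp16 (2 bytes/param): ~15.6 GB
print(round(estimate_vram_gb(7e9, 0.5), 1))  # 4-bit quantized: ~3.9 GB
```

Equivalently: a model whose weights occupy N GB on disk in fp16 needs a bit more than N GB of GPU RAM to run at that precision; quantized formats shrink both numbers proportionally.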
5 votes • 2 answers

LLM-like architecture capable of dynamically learning from its own output

Large Language Models (LLMs) have demonstrated remarkable capabilities in quick learning during inference. They can effectively grasp a concept from a single example and generate relevant outputs. However, a noticeable limitation of LLMs is their…