Questions tagged [natural-language-processing]

For questions related to natural language processing (NLP), which is concerned with the interactions between computers and human (or natural) languages, in particular how to create programs that process and analyze large amounts of natural language data.

See: Natural language processing (NLP) at Wikipedia.

762 questions
102 votes, 4 answers

How can neural networks deal with varying input sizes?

As far as I can tell, neural networks have a fixed number of neurons in the input layer. If neural networks are used in a context like NLP, sentences or blocks of text of varying sizes are fed to a network. How is the varying input size reconciled…
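One common workaround (a standard practice, not something stated in the question excerpt) is to pad or truncate every sequence in a batch to the same length and pass along a mask marking which positions hold real tokens. A minimal PyTorch sketch with made-up token ids:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three hypothetical sentences, already tokenized to ids, with different lengths.
sentences = [torch.tensor([12, 5, 81]),
             torch.tensor([7, 42]),
             torch.tensor([3, 9, 9, 27, 1])]

# Pad to the length of the longest sentence; 0 is reserved as the padding id here.
padded = pad_sequence(sentences, batch_first=True, padding_value=0)  # shape (3, 5)
mask = padded != 0  # True for real tokens, False for padding

print(padded)
print(mask)
```

Recurrent models then consume such sequences step by step, while attention-based models typically use the mask to ignore the padded positions.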
83 votes, 4 answers

Why does the transformer do better than RNN and LSTM in long-range context dependencies?

I am reading the article How Transformers Work, where the author writes: Another problem with RNNs and LSTMs is that it's hard to parallelize the work of processing sentences, since you have to process word by word. Not only that, but there is no…
58 votes, 2 answers

How does ChatGPT retain the context of previous questions?

One of the innovations with OpenAI's ChatGPT is how natural it is for users to interact with it. What is the technical enabler for ChatGPT to maintain the context of previous questions in its answers? For example, ChatGPT understands a prompt of…
39 votes, 2 answers

How can Transformers handle arbitrary length input?

The transformer, introduced in the paper Attention Is All You Need, is a popular new neural network architecture that is commonly viewed as an alternative to recurrent neural networks, like LSTMs and GRUs. However, having gone through the paper, as…
35 votes, 3 answers

Can BERT be used for sentence generating tasks?

I am new to NLP and interested in the sentence-generation task. As far as I know, one state-of-the-art method is CharRNN, which uses an RNN to generate a sequence of words. However, BERT came out several weeks ago and is…
34 votes, 6 answers

How does an AI like ChatGPT answer a question in a subject which it may not know?

After seeing Stack Overflow's ban on ChatGPT, I explored it out of curiosity. It's marvellous, as it can write code by itself! Later, to check whether it also knows chess like Google DeepMind's AlphaZero AI, I asked the questions below: Me: Does openai…
31 votes, 1 answer

How is BERT different from the original transformer architecture?

As far as I can tell, BERT is a type of Transformer architecture. What I do not understand is: how is BERT different from the original transformer architecture? What tasks are better suited for BERT, and what tasks are better suited for the…
30 votes, 4 answers

Why is ChatGPT bad at math?

As opposed to How does ChatGPT know math?, I've been seeing some things floating around the Twitterverse about how ChatGPT can actually be very bad at math. For instance, I asked it "If it takes 5 machines 5 minutes to make 5 devices, how long would…
30 votes, 9 answers

What is the actual quality of machine translations?

As an AI layman, I remain confused to this day by the gap between the promised and the achieved improvements in automated translation. My impression is that there is still a very, very long way to go. Or are there other explanations for why the automated translations (offered and…
24 votes, 5 answers

Why does ChatGPT fail in playing "20 questions"?

IBM Watson's success in playing "Jeopardy!" was a landmark in the history of artificial intelligence. In the seemingly simpler game of "Twenty questions" where player B has to guess a word that player A thinks of by asking questions to be answered…
23 votes, 2 answers

Why does GPT-2 Exclude the Transformer Encoder?

After looking into transformers, BERT, and GPT-2, from what I understand, GPT-2 essentially uses only the decoder part of the original transformer architecture and uses masked self-attention that can only look at prior tokens. Why does GPT-2 not…
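For illustration only (this is my own sketch, not GPT-2's actual code), here is the causal mask that masked self-attention applies: position i may attend to positions 0..i but not to later ones, which is what lets a decoder-only stack generate text left to right without a separate encoder:

```python
import torch
import torch.nn.functional as F

seq_len, d = 4, 8
q = torch.randn(seq_len, d)   # queries
k = torch.randn(seq_len, d)   # keys
v = torch.randn(seq_len, d)   # values

scores = q @ k.T / d ** 0.5                           # raw attention scores
causal = torch.tril(torch.ones(seq_len, seq_len))     # 1s on and below the diagonal
scores = scores.masked_fill(causal == 0, float("-inf"))
weights = F.softmax(scores, dim=-1)                   # future positions get weight 0
out = weights @ v                                     # each row mixes only current and past values
print(weights)
```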
22 votes, 1 answer

What is the intuition behind the dot product attention?

I am watching the video Attention Is All You Need by Yannic Kilcher. My question is: what is the intuition behind the dot product attention? $$A(q, K, V) = \sum_i \frac{e^{q \cdot k_i}}{\sum_j e^{q \cdot k_j}} v_i$$ becomes: $$A(Q, K, V) = \text{softmax}(QK^T)V$$
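As a rough numeric illustration of that formula (made-up two-dimensional vectors), the output is a weighted average of the values, where the weight of $v_i$ is the softmax of the similarity score $q \cdot k_i$, so the value whose key best matches the query dominates:

```python
import numpy as np

def dot_product_attention(q, K, V):
    scores = K @ q                       # q·k_i for every key: higher means more similar
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over the keys
    return weights @ V                   # convex combination of the values

q = np.array([1.0, 0.0])
K = np.array([[1.0, 0.0],                # key 0 points in the same direction as q
              [0.0, 1.0]])               # key 1 is orthogonal to q
V = np.array([[10.0, 0.0],
              [0.0, 10.0]])
print(dot_product_attention(q, K, V))    # output is pulled toward V[0]
```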
21 votes, 3 answers

What kind of word embedding is used in the original transformer?

I am currently trying to understand transformers. To start, I read Attention Is All You Need and also this tutorial. What makes me wonder is the word embedding used in the model. Is word2vec or GloVe being used? Are the word embeddings trained from…
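For context (and as a simplified sketch, with sizes chosen only for illustration): the embeddings in the original paper are neither word2vec nor GloVe; they are an ordinary lookup table learned from scratch together with the rest of the model, and scaled by $\sqrt{d_{\text{model}}}$:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 32000, 512            # illustrative sizes
embed = nn.Embedding(vocab_size, d_model)   # weights are trained jointly with the model

token_ids = torch.tensor([[5, 112, 7]])     # a hypothetical tokenized sentence
x = embed(token_ids) * d_model ** 0.5       # the paper scales embeddings by sqrt(d_model)
print(x.shape)                              # torch.Size([1, 3, 512])
```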
21 votes, 2 answers

What are the main differences between skip-gram and continuous bag of words?

The skip-gram and continuous bag of words (CBOW) are two different types of word2vec models. What are the main differences between them? What are the pros and cons of both methods?
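As a concrete point of comparison, in gensim's word2vec implementation (parameter names from gensim 4.x) the two variants differ only by a flag: `sg=1` trains skip-gram (predict each context word from the centre word), `sg=0` trains CBOW (predict the centre word from its averaged context). A toy sketch:

```python
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]

skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)      # CBOW

print(skipgram.wv.most_similar("cat", topn=2))
print(cbow.wv.most_similar("cat", topn=2))
```

Roughly, skip-gram is usually reported to handle rare words better at a higher training cost, while CBOW trains faster on frequent words.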
18 votes, 2 answers

What research has been done in the domain of "identifying sarcasm in text"?

Identifying sarcasm is considered one of the most difficult open problems in ML and NLP/NLU. So, has there been any considerable research on this front? If so, what is the accuracy like? Please also explain the NLP model…