Questions tagged [n-gram]
For questions about the concept of an n-gram in the context of natural language processing and other AI sub-fields.
4 questions
1 vote · 1 answer
What is the lowest possible loss for a language model?
Example: Suppose a character-level language model (three input letters used to predict the next one), trained on a dataset that contains three instances of the sequence aei, two followed by o and one followed by u, i.e., the dataset…
asked by ViniciusArruda
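For the setup in this excerpt, the floor on the loss follows from matching the empirical next-character distribution: with aei followed by o twice and u once, the best achievable per-prediction cross-entropy for that context is the entropy of (2/3, 1/3). A minimal sketch (counts taken from the excerpt; everything else illustrative):

    import math

    # Counts from the excerpt: the context "aei" is followed by "o"
    # twice and by "u" once in the training data.
    counts = {"o": 2, "u": 1}
    total = sum(counts.values())

    # A model minimizes expected cross-entropy by predicting the
    # empirical distribution, so the lowest possible loss for this
    # context is the entropy of (2/3, 1/3).
    min_loss = -sum((c / total) * math.log(c / total) for c in counts.values())
    print(f"lowest possible loss for 'aei': {min_loss:.4f} nats")  # ~0.6365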
1 vote · 1 answer
Bag of Tricks: n-grams as additional features?
I've been experimenting with PyTorch's nn.EmbeddingBag for sentence classification for about a month, doing some feature engineering, trying different tokenizers, etc. I'm just trying to get the best performance out of this simple model as…
asked by rocksNwaves
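A hedged sketch of the "Bag of Tricks" idea the title refers to (the fastText recipe of Joulin et al.): append hashed bigram ids to each sentence's token ids so a single nn.EmbeddingBag averages unigram and bigram features together. Vocabulary size, bucket count, and dimensions below are illustrative, not from the question:

    import torch
    import torch.nn as nn

    # Illustrative sizes: VOCAB unigram rows plus BUCKETS hash rows
    # for bigram features, all stored in one embedding table.
    VOCAB, BUCKETS, DIM = 10_000, 50_000, 64
    embedding = nn.EmbeddingBag(VOCAB + BUCKETS, DIM, mode="mean")

    def add_bigram_ids(token_ids):
        # Hash each adjacent pair of token ids into one of the
        # BUCKETS extra rows reserved after the unigram vocabulary.
        bigrams = [VOCAB + (hash((a, b)) % BUCKETS)
                   for a, b in zip(token_ids, token_ids[1:])]
        return token_ids + bigrams

    sentence = [3, 17, 42, 7]                    # toy token ids
    ids = torch.tensor(add_bigram_ids(sentence))
    offsets = torch.tensor([0])                  # one bag = one sentence
    features = embedding(ids, offsets)           # shape: (1, DIM)
    print(features.shape)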
0 votes · 1 answer
If the unigram precision is (N-1)/N, then the bigram precision is:
Consider the following machine translation scenario. The reference translation has N words (do not count the sentence-start token 'hat' or the sentence-end token 'dot'). The machine output also has N words. If the unigram precision is (N-1)/N, then the…
asked by Geeklovenerds
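Without reproducing the question's intended answer, modified n-gram precision can be checked directly. The sketch below uses BLEU-style clipped counting on illustrative sentences (not from the question) where one interior word of an N = 5 word output is wrong, so unigram precision is (N-1)/N:

    from collections import Counter

    def ngram_precision(candidate, reference, n):
        # Modified (clipped) n-gram precision as used in BLEU:
        # each candidate n-gram counts at most as often as it
        # appears in the reference.
        cand = Counter(zip(*[candidate[i:] for i in range(n)]))
        ref = Counter(zip(*[reference[i:] for i in range(n)]))
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        return matched / max(sum(cand.values()), 1)

    # Illustrative: N = 5 words, one interior word substituted.
    reference = "the cat sat on mat".split()
    candidate = "the cat lay on mat".split()
    print(ngram_precision(candidate, reference, 1))  # 0.8 = (N-1)/N
    # The one wrong interior word breaks both bigrams containing it,
    # so bigram precision in this example is (N-3)/(N-1) = 2/4.
    print(ngram_precision(candidate, reference, 2))  # 0.5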
0 votes · 1 answer
Why would adding all the possible embeddings be "worse" than using 1D-convolutions?
Suppose we are using word2vec and have embeddings of individual words $w_1, \dots, w_{10}$. Let's say we wanted to analyze $2$-grams or $3$-grams.
Why would adding all the possible embedding combinations, $\binom{10}{2}$ or $\binom{10}{3}$ of them, be "worse" than…
asked by aiguy123
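One concrete way to see the contrast the question asks about: adding embeddings is commutative, so a summed 2-gram feature cannot distinguish word order, while a kernel-size-2 Conv1d applies different weights to each position in the pair. A minimal sketch (all dimensions illustrative):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    # Toy word2vec-style embeddings: 10 words, 16-dim vectors.
    emb = torch.randn(10, 16)
    w1, w2 = emb[0], emb[1]

    # Addition is commutative: the 2-grams "w1 w2" and "w2 w1"
    # collapse to the same summed feature, losing word order.
    print(torch.allclose(w1 + w2, w2 + w1))          # True

    # A 1D convolution with kernel size 2 sees the pair in order:
    # swapping the words changes the output.
    conv = nn.Conv1d(in_channels=16, out_channels=8, kernel_size=2)
    pair = torch.stack([w1, w2], dim=1).unsqueeze(0)      # (1, 16, 2)
    swapped = torch.stack([w2, w1], dim=1).unsqueeze(0)
    print(torch.allclose(conv(pair), conv(swapped)))      # False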