
Imagine two languages, English and German, each with only four words.

English:

Man = 1,
deer = 2, 
eat = 3,
grass = 4 

Suppose these are all the sentences you can form from these words:

Man eats deer.
Deer eats grass.
Man eats.
Deer eats.

German:

Mensch = 5,
Gras = 6, 
isst = 7, 
Hirsch = 8

Possible German sentences:

Mensch isst Hirsch.
Hirsch isst Gras.
Mensch isst.
Hirsch isst.

How would you write a program that would figure out which words have the same meaning in English and German?

It is possible.

Every word gets its meaning from the sentences in which it can be used: its connections with other words define its meaning.

We need to write a program that would recognize that a word is connected to other words in the same way in both languages. Then it would know those two words must have the same meaning.
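One way to make this concrete, as a minimal sketch: encode every sentence as a tuple of the word ids above and search for a one-to-one word mapping that turns the set of English sentences exactly into the set of German sentences. The function name `find_alignments` is hypothetical, and the brute-force search over all n! mappings is only feasible for tiny vocabularies like this one.

```python
from itertools import permutations

# Word ids from the question (English 1-4, German 5-8)
english = {1: "man", 2: "deer", 3: "eat", 4: "grass"}
german = {5: "Mensch", 6: "Gras", 7: "isst", 8: "Hirsch"}

# Every sentence in each toy language, encoded as a tuple of word ids
en_sentences = {(1, 3, 2), (2, 3, 4), (1, 3), (2, 3)}
de_sentences = {(5, 7, 8), (8, 7, 6), (5, 7), (8, 7)}


def find_alignments(src_words, src_sents, tgt_words, tgt_sents):
    """Return every one-to-one word mapping that turns the source sentence set
    exactly into the target sentence set (brute force over all bijections)."""
    src = sorted(src_words)
    results = []
    for perm in permutations(sorted(tgt_words)):
        mapping = dict(zip(src, perm))
        translated = {tuple(mapping[w] for w in s) for s in src_sents}
        if translated == tgt_sents:
            results.append(mapping)
    return results


alignments = find_alignments(english, en_sentences, german, de_sentences)
for mapping in alignments:
    for en_id, de_id in mapping.items():
        print(f"{english[en_id]} = {german[de_id]}")
```

For this toy vocabulary exactly one mapping survives (man = Mensch, deer = Hirsch, eat = isst, grass = Gras). Real languages rarely line up this exactly, and the factorial search does not scale, so practical systems score approximate matches instead.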

If we take the word "deer" (2), it has this structure in English:

1-3-2
2-3-4

In German, "Hirsch" (8) has this structure:

5-7-8
8-7-6

We get the same structure (pattern) in both languages: 2 and 8 each appear once in first position and once in last position; the middle word is the same in both sentences, and the remaining word differs between the two sentences. So we can conclude that 8 = 2, because both elements are connected with other elements in the same way.
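The positional pattern described above can also be computed per word, without trying every possible mapping. A minimal sketch, assuming a word's "signature" is simply the sorted list of (sentence length, position) pairs in which it occurs, with the sentences encoded directly from the word lists (so "Mensch isst Hirsch" is 5-7-8). The names `signatures` and `match_by_signature` are hypothetical.

```python
from collections import defaultdict

# Sentences encoded with the word ids from the question
en_sentences = [(1, 3, 2), (2, 3, 4), (1, 3), (2, 3)]
de_sentences = [(5, 7, 8), (8, 7, 6), (5, 7), (8, 7)]


def signatures(sentences):
    """Map each word id to a sorted tuple of (sentence length, position) pairs."""
    sig = defaultdict(list)
    for sent in sentences:
        for pos, word in enumerate(sent):
            sig[word].append((len(sent), pos))
    return {word: tuple(sorted(pairs)) for word, pairs in sig.items()}


def match_by_signature(src_sentences, tgt_sentences):
    """Pair words whose signature occurs exactly once on each side."""
    src_groups, tgt_groups = defaultdict(list), defaultdict(list)
    for word, s in signatures(src_sentences).items():
        src_groups[s].append(word)
    for word, s in signatures(tgt_sentences).items():
        tgt_groups[s].append(word)
    # Ambiguous signatures (shared by several words) are skipped
    return {
        src_groups[s][0]: tgt_groups[s][0]
        for s in src_groups
        if len(src_groups[s]) == 1 and len(tgt_groups.get(s, [])) == 1
    }


print(signatures(en_sentences)[2])  # ((2, 0), (3, 0), (3, 2))  "deer"
print(match_by_signature(en_sentences, de_sentences))  # {1: 5, 3: 7, 2: 8, 4: 6}
```

Here every signature is unique, so all four words get paired. In a larger vocabulary many words would share a signature (any noun that can be both subject and object, for example), and the signatures would have to be refined with the signatures of neighbouring words, similar to colour refinement in graph isomorphism testing.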

Maybe we just need to write a very good program for recognizing analogies and we will be on the right track to creating AI?

nbro
Tone Škoda

4 Answers

2

You are implying that such ideas are novel, and that such tools do not exist. But the idea is very popular, and there are numerous tools.

"We need to write a program that would recognize that a word is connected to other words in the same way in both languages. Then it would know those two words must have the same meaning."

You are describing the essence of well-known natural language processing (NLP) tasks such as word alignment (linking words in different languages that have the same meaning) and, of course, machine translation.

While learning a machine translation model, we actually do discover which words (or parts of words, or sequences of words) in different languages have the same meaning.

Here are some concepts I would recommend for further study of this subject:

  • Word alignment: a well-known and popular tool is fast_align
  • Word embeddings: word2vec is a widely used tool
  • Modern machine translation with sequence-to-sequence models: well-known tools are fairseq and Sockeye
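To give a flavour of how statistical word alignment works, here is a minimal sketch of IBM Model 1 trained with expectation-maximization, the classic starting point for word aligners (fast_align itself uses a reparameterization of IBM Model 2, not this code). It assumes something the question does not provide: a sentence-aligned parallel corpus, so the English and German example sentences are simply paired by position. The function `ibm_model1` is hypothetical and, for brevity, omits the NULL word.

```python
from collections import defaultdict

# Assumed parallel corpus: the question's sentences paired by position
pairs = [
    (["man", "eat", "deer"], ["Mensch", "isst", "Hirsch"]),
    (["deer", "eat", "grass"], ["Hirsch", "isst", "Gras"]),
    (["man", "eat"], ["Mensch", "isst"]),
    (["deer", "eat"], ["Hirsch", "isst"]),
]


def ibm_model1(corpus, iterations=20):
    """Estimate t[(f, e)] = P(f | e) with EM (IBM Model 1, no NULL word)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each German word among the English words in its sentence
        for es, fs in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities P(f | e)
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


t = ibm_model1(pairs)
english_words = sorted({e for es, _ in pairs for e in es})
german_words = sorted({f for _, fs in pairs for f in fs})
# Most probable German word for each English word
best = {e: max(german_words, key=lambda f: t[(f, e)]) for e in english_words}
for e in english_words:
    print(e, "->", best[e])
```

Unlike the question's setting, this relies on knowing which sentences are translations of each other; "eat" is easy because it co-occurs with "isst" in every pair, and the rarer words are resolved as the frequent ones claim their partners.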
Mathias Müller
1

Isn't this what Word2Vec and other word-embedding techniques already do? "You shall know a word by the company it keeps" is an idea that has been around for some time now.

-1

It is wrong to assume that a word's meaning is defined only by its connections to other words.

Give an AI a hundred novels and it would still not know what the word "cat" means.

Show the AI a picture of a cat with the word "cat" underneath it and it would know straight away.

So an AI needs to learn a minimum number of words through experience other than combinations of other words. From there it may be able to deduce the meanings of new words.

Likewise, if I gave you a hundred novels in Chinese, you would never be able to understand Chinese. If I showed you a picture book in Chinese, maybe you would have a chance.

zooby
-1

For this example, this function will do:

TSAI.Analogies.FindAnalogy(List ex1, List ex2, List ex3, out List ex4)

ex1 is to ex2 as ex3 is to ex4; the task is to figure out ex4.

Fill ex4 with the values from ex2. Then, for every value in ex3, look up the value in ex1 at the same position, find every position where that ex1 value is repeated in ex2, and copy the ex3 value to those positions in ex4.
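A minimal Python sketch of the procedure just described, assuming the lists hold word ids. The name `find_analogy` is hypothetical: this is not the TSAI.Analogies code, only one reading of the description, and it returns ex4 instead of using an `out` parameter.

```python
def find_analogy(ex1, ex2, ex3):
    """ex1 is to ex2 as ex3 is to ex4; return ex4.

    Start from a copy of ex2, then for each position i, copy ex3[i] into
    every position of ex4 where ex2 repeats the value ex1[i].
    """
    ex4 = list(ex2)
    for a, c in zip(ex1, ex3):
        for j, b in enumerate(ex2):  # search the original ex2, not ex4
            if b == a:
                ex4[j] = c
    return ex4


# "Man eats deer" is to "Deer eats grass" as "Mensch isst Hirsch" is to ...?
print(find_analogy([1, 3, 2], [2, 3, 4], [5, 7, 8]))  # [8, 7, 4]
```

The result 8-7-4 reads as "Hirsch isst grass": the last position keeps the English id 4 because nothing in ex1 or ex3 says how to translate it. Comparing this with the actual German sentence 8-7-6 then suggests that 4 (grass) corresponds to 6 (Gras).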

Tone Škoda