9

In programming languages, there is a set of grammar rules which govern the construction of valid statements and expressions. These rules help in parsing the programs written by the user.

Can there ever be a functionally complete set of grammar rules which can parse any statement in English (locale-specific) accurately and which can be possibly implemented for use in AI-based projects?

I know that there are a lot of NLP toolkits available online, but they are not that effective. Most of them are trained on specific corpora and sometimes fail to infer complex correlations between the various parts of an expression.

In other words, I am asking whether it is possible for a computer to parse a well-formed sentence written in English as if it were parsed by an adult English-speaking human.

EDIT: If it cannot be represented using simple grammar rules, what kind of semantic structure can be used to generalize it?

EDIT2: This paper proves the absence of context-freeness in natural languages. I am looking for a solution, even if it is too complex.

Douglas Daseeco
skrtbhtngr

4 Answers

6

Can there ever be a functionally complete set of grammar rules which can parse any statement in English (locale-specific) accurately and which can be possibly implemented for use in AI-based projects?

Parse it, yes; accurately, most likely not.

Why?

As I understand how we derive meaning from sounds, there are two complementary strategies:

Grammar rules: a rule-based system for ordering words to facilitate communication. Here meaning is derived from the interaction of discrete sounds and their independent meanings, so you could parse a sentence based on a rule book.

E.g. "This was a triumph": the parser would extract a pronoun ("This") with its corresponding meaning (a specific person or thing), then a verb ("was") with its corresponding meaning (occurred). With "a" we hit the first parsing problem: what should the parser extract, a noun or an indefinite article? So we consult the grammar rule book and settle for the indefinite article (meaning: any one of). Strictly, you have to parse the next word and refer back to it, but let's gloss over that for now. Finally comes "triumph", which could be a noun or a verb, but thanks to the grammar rule book we settle for a noun with the meaning (victory, conquest). So in the end, joining the meanings, we have:

"A specific thing occurred of victory."

Close enough, and I am glossing over a few other rules, but that's not the point. The other strategy is:

A lexical dictionary (or lexicon), where words or sounds are associated with a specific meaning. Here meaning is derived from one or more words or sounds taken as a unit. This poses a problem for a parser because, well, it shouldn't parse these units at all.

E.g. "Non Plus Ultra": the AI parser would recognize that this phrase is not to be parsed and should instead be matched with its meaning:

The highest point or culmination

Lexical units introduce another issue: they can themselves be part of a sentence like the first example, so you end up with recursion.
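To make the grammar-rule strategy concrete, here is a minimal sketch of a rule-book parser for "This was a triumph". It uses the CYK chart-parsing algorithm over a toy grammar; the rules, the category names and the tiny word list are all assumptions made up for this illustration, not a real grammar of English. Note that "triumph" is listed as both noun and verb, and it is the rule book that settles on the noun reading.

```python
from itertools import product

# Hypothetical toy rule book in Chomsky normal form: (left, right) -> parent
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
    ("Det", "N"): "NP",
}

# Hypothetical word list; ambiguous words carry several categories
WORD_CATEGORIES = {
    "this": {"NP", "Det"},   # pronoun (a full noun phrase) or determiner
    "was": {"V"},
    "a": {"Det"},
    "triumph": {"N", "V"},   # noun or verb
}

def cyk_parse(words):
    """Return one parse tree rooted in S, or None if the rules do not cover the input."""
    n = len(words)
    # chart[i][j] maps a category to one derivation tree for words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in WORD_CATEGORIES.get(word.lower(), ()):
            chart[i][i + 1][cat] = (cat, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                pairs = product(chart[i][k].items(), chart[k][j].items())
                for (lcat, ltree), (rcat, rtree) in pairs:
                    parent = BINARY_RULES.get((lcat, rcat))
                    if parent and parent not in chart[i][j]:
                        chart[i][j][parent] = (parent, ltree, rtree)
    return chart[0][n].get("S")

tree = cyk_parse("This was a triumph".split())
print(tree)
```

The verb reading of "triumph" is never used because no rule combines a determiner with a verb, which is the "consult the rule book" step in code form. Anything outside the toy rules simply returns `None`.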

if it is possible for a computer to parse a well-versed sentence written in English as if it were parsed by an adult English-speaking human?

I believe it could be possible. Most examples I've seen deal effectively with either the grammar rule book or the lexicon, and I am not aware of a combination of both, but in terms of programming it could happen.

Unfortunately, even if you solve this problem, your AI would not really understand things in the strict sense, but would rather present you with very elaborate synonyms. Additionally, context (as mentioned in the comments) plays a role in both the grammar and lexicon strategies.

If it cannot be represented using simple grammar rules, what kind of semantic structure can be used to generalize it?

One way could be a mixed structure with both grammar rules and a lexicon, where both can change and be influenced by the AI's specific context and experience, together with a system for dealing with these objects.
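As a rough sketch of how the lexicon half of such a mixed structure might sit in front of a grammar, the snippet below matches fixed phrases first and leaves everything else as single words for a rule-based parser to handle. The `PHRASE_LEXICON` table, the `chunk` function and the `"PHRASE"`/`"WORD"` labels are hypothetical names invented for this example, and greedy longest match is just one possible policy.

```python
# Hypothetical multiword lexicon: tuple of lowercase tokens -> meaning
PHRASE_LEXICON = {
    ("non", "plus", "ultra"): "the highest point or culmination",
}

def chunk(tokens):
    """Greedy longest match: fixed phrases become one unit, other tokens stay single words."""
    max_len = max((len(key) for key in PHRASE_LEXICON), default=1)
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + size])
            if key in PHRASE_LEXICON:
                text = " ".join(tokens[i:i + size])
                units.append(("PHRASE", text, PHRASE_LEXICON[key]))
                i += size
                break
        else:
            # No lexicon entry starts here: leave the word for the grammar rules
            units.append(("WORD", tokens[i], None))
            i += 1
    return units

for unit in chunk("This was the non plus ultra".split()):
    print(unit)
```

Because the lexicon is an ordinary dictionary, entries can be added or changed while the program runs, which is one simple way to let experience influence it.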

Keno
2

We've concluded that it is a two-faceted, circular problem: structure cannot be inferred without context, but knowing the structure also helps infer the context. So, here is your complex solution: start with the context, which is determined by the combination of words in the sentence (a combinatorics and search problem). From there, determine your structure, or "parse" (at this step you can also filter out some insignificant words, or at least assign lesser weights to them). Then go back to the context, back to parsing, and so on until you arrive at the meaning. Thus, by iterative, recursive reduction, the whole problem can be solved.

2

I strongly disagree with all the former comments. Not because they are wrong (they are not), but because they are misleading, though unintentionally.

For example: if one looks at these problems from an academic position, the problems will always seem insurmountable. This is because everything is coldly assessed and calculated in isolation from everything else.

The answer predominantly lies in word association. You have to write a program that can process a vast database of digital books and register, for every word, all the words in that language which are associated with it, plus all the statistical information for each associated word and its associated punctuation.

This will then give you the basis on which an AI can decide several things:

  1. Whether the structure of a given sentence is correct.
  2. If the structure is bad, what the probability is for determining the context and intent of what is being said.
  3. The correct meaning and application of a multifaceted word (e.g. "triumph"), chosen by probability according to the statistics.
  4. Where a conversation is likely to be going.
  5. What the correct grammar and punctuation should be.

So, in conclusion, you have two things to look for: Association and probability.
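A minimal sketch of association plus probability: count which words follow which, then turn the counts into probabilities. The four-sentence `CORPUS` is a made-up stand-in for the "vast database of digital books", and the function names are hypothetical; a real system would need far more data and some form of smoothing.

```python
from collections import Counter, defaultdict

# Made-up toy corpus standing in for a large collection of digital books
CORPUS = [
    "this was a triumph",
    "this was a victory",
    "they triumph again",
    "a triumph for everyone",
]

def build_associations(sentences):
    """For each word, count how often every other word directly follows it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            following[left][right] += 1
    return following

def next_word_probability(following, left, right):
    """Estimate P(right | left) from the raw counts; 0.0 if `left` was never seen."""
    counts = following.get(left)
    if not counts:
        return 0.0
    return counts[right] / sum(counts.values())

model = build_associations(CORPUS)
print(next_word_probability(model, "a", "triumph"))
print(model["was"].most_common(1))
```

The same counts can be read in several ways: a very low probability hints at a badly structured sentence, and the most common follower of a word is a crude guess at where the text is going.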

When a language model is stored digitally in a database, word and sentence "strings" become available, so that every variation of language structure in a given sentence can be determined before, during and after a text sample is written. This fine-grained control over language model patterns means that sensitive components such as "subject" and "object" can be determined easily by code.

Engage
1

I'm pretty sure that the answer is "no" in the strictest sense, since English simply doesn't have a formal definition. That is, nobody controls English and publishes a formal grammar that everyone is required to adhere to. English is built up through an experiential process and it has contradictions and flaws, but the probabilistic nature of the human mind allows us to work around those.

For example, take this "sentence":

This sentence no verb

Technically it's not a sentence at all, since it doesn't have a verb. But did anybody have any problem understanding what it meant? Doubtful. Try coming up with a formal rule for that though. And that's just one example.

Now, could you come up with a formal grammar that covers, maybe, 90% of cases, and is "good enough" for most practical uses? Possibly, maybe even probably. But I am pretty sure it's not possible to get to 100%.

mindcrime