
For non-English languages (in my case, Portuguese), what is the best approach? Should I use the less complete tools available for my language, or should I translate the text to English and then use the English tools? Lemmatization, for example, is not as good for non-English languages.

nbro


Check out spaCy: it's a powerful NLP library that provides many different language models, including one for Portuguese.

To answer the more general question: translating to another language undermines the whole purpose of text pre-processing. Translation will introduce errors, even when the target is a widely supported language like English. More importantly, every language has its own specific linguistic characteristics, such as grammatical gender, verb tenses, and grammar rules for plurals, adjectives, adverbs and so on. By translating, you throw all that information in the bin.
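To make the information-loss point concrete, here is a toy sketch. It uses a tiny hand-written lexicon (hypothetical, not spaCy and not a real translator or lemmatizer) for the four forms of the Portuguese noun "gato". In Portuguese each form carries both gender and number, while a word-by-word English translation keeps only number.

```python
# Toy illustration with a HYPOTHETICAL hand-written lexicon; this is not a
# real lemmatizer or translation system, just a demonstration of what is lost.
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    gender: str  # "masc" or "fem"
    number: str  # "sing" or "plur"


# Portuguese morphological analyses (assumed for illustration).
PT_LEXICON = {
    "gato": Analysis("gato", "masc", "sing"),
    "gata": Analysis("gato", "fem", "sing"),
    "gatos": Analysis("gato", "masc", "plur"),
    "gatas": Analysis("gato", "fem", "plur"),
}

# Word-level English translations (assumed for illustration).
PT_TO_EN = {"gato": "cat", "gata": "cat", "gatos": "cats", "gatas": "cats"}


def features_in_portuguese(words):
    """Distinct (gender, number) pairs visible in the original Portuguese."""
    return {(PT_LEXICON[w].gender, PT_LEXICON[w].number) for w in words}


def distinct_after_translation(words):
    """Distinct English forms left after word-by-word translation."""
    return {PT_TO_EN[w] for w in words}


words = ["gato", "gata", "gatos", "gatas"]
print(len(features_in_portuguese(words)))         # 4 distinct gender/number pairs
print(sorted(distinct_after_translation(words)))  # ['cat', 'cats']
```

All four forms share the lemma "gato", so lemmatizing in Portuguese still groups them while keeping gender available as a separate feature. After translation, the masculine/feminine distinction is gone and cannot be recovered.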

Oliver Mason
Edoardo Guerriero