I want to fine-tune MARBERT for Tunisian dialect text classification, using this dataset: https://www.kaggle.com/datasets/waalbannyantudre/tunisian-arabizi-dialect-data-sentiment-analysis
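For context, the setup I have in mind is roughly the sketch below (the `UBC-NLP/MARBERT` checkpoint name and `num_labels=2` are my assumptions; the dataset's label set may differ):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint name on the Hugging Face Hub (MARBERTv2 also exists there).
model_name = "UBC-NLP/MARBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# num_labels=2 assumes binary sentiment (positive/negative); adjust to the dataset.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
```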
I have tested the tokenizer:
text example: "bravo slim riahii hay la3bed li tkhdem fi bladha"
tokenizer: ['[CLS]',
'bra',
'##vo',
'sl',
'##im',
'r',
'##iah',
'##ii',
'hay',
'la',
'##3',
'##be',
'##d',
'li',
't',
'##kh',
'##de',
'##m',
'fi',
'bl',
'##adh',
'##a',
'[SEP]',
'[PAD]',
'[PAD]',
'[PAD]',
'[PAD]',
'[PAD]',
'[PAD]',
'[PAD]',
'[PAD]',
'[PAD]']
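For reference, this output came from something like the following (the checkpoint name is my assumption; `max_length=32` matches the padded length shown above):

```python
from transformers import AutoTokenizer

# Assumed checkpoint; padding to a fixed max_length explains the trailing [PAD] tokens.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")

text = "bravo slim riahii hay la3bed li tkhdem fi bladha"
encoding = tokenizer(text, padding="max_length", max_length=32, truncation=True)
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```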
Can anyone tell me whether I can use this tokenizer as-is, or whether I need to train my own? The example gets split into many short subwords, which makes me wonder how well MARBERT's vocabulary covers Tunisian Arabizi.
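One way I'm thinking of checking this is to measure the [UNK] rate and the average subwords per word (fertility) over the whole dataset; here is a sketch (the CSV path and column name are placeholders for the Kaggle file):

```python
import pandas as pd
from transformers import AutoTokenizer

# Placeholder path and column name; adjust to the actual Kaggle CSV.
df = pd.read_csv("train.csv")
texts = df["text"].astype(str).tolist()

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")

total_words = total_subwords = total_unk = 0
for text in texts:
    tokens = tokenizer.tokenize(text)
    total_words += len(text.split())
    total_subwords += len(tokens)
    total_unk += tokens.count(tokenizer.unk_token)

# High fertility (many subwords per word) or a high [UNK] rate would suggest
# the vocabulary fits Arabizi poorly.
print(f"avg subwords per word: {total_subwords / total_words:.2f}")
print(f"[UNK] rate: {total_unk / total_subwords:.4%}")
```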