Year of publication: 2025
Document type: Master's Thesis
Language: English
Natural Language Processing (NLP) plays a significant role in enabling machines to understand, interpret, and produce human language across a wide range of tasks and domains. For low-resource languages, developing feasible NLP solutions remains a challenge in the absence of large annotated datasets and linguistic infrastructure. The research presented in this thesis contributes to closing this gap by evaluating sentiment analysis for Albanian on social media data. The main objective is to assess how well cross-lingual pre-trained transformer models, namely mBERT, XLM-R, and mT5, can be adapted by fine-tuning for sentiment classification (assigning an input text a positive, negative, or neutral label). Two fine-tuning approaches are compared: full (vanilla) fine-tuning and Low-Rank Adaptation (LoRA). The models are fine-tuned and tested on a manually annotated Albanian dataset that contains expressions typical of social media interactions (e.g., code-switching, emoticons, and words with repeated letters). XLM-R achieved the strongest generalization, consistently scoring highest on metrics such as F1-score and overall accuracy. mBERT followed closely, while mT5, likely due to its generative architecture, yielded comparatively lower results than its encoder-based counterparts. LoRA, on the other hand, trained faster but showed a notable drop in classification performance compared with vanilla fine-tuning, highlighting the trade-off involved in adopting this strategy. With these results, the thesis provides a basis for comparing fine-tuning strategies across three key pre-trained transformer models, which can inform future research on low-resource language modelling (specifically for Albanian) and domain-specific adaptation.
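The LoRA-versus-vanilla trade-off mentioned in the abstract rests on a simple idea: instead of updating a full weight matrix, LoRA trains only a low-rank additive update. The sketch below illustrates this in plain NumPy; the dimensions, rank, and scaling factor are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

# Minimal sketch of the Low-Rank Adaptation (LoRA) idea: keep the
# pre-trained weight W frozen and learn only a low-rank update
# delta_W = (alpha / r) * B @ A, with rank r << min(d_out, d_in).
d_in, d_out, r = 768, 768, 8   # hypothetical hidden size and LoRA rank
alpha = 16                     # hypothetical LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init, so the
                                            # model starts unchanged

def lora_forward(x):
    """Forward pass: frozen weight plus the scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size            # parameters updated by vanilla fine-tuning
lora_params = A.size + B.size   # parameters updated by LoRA
print(f"trainable params: LoRA {lora_params} vs full {full_params} "
      f"({full_params / lora_params:.0f}x fewer)")
```

Because B is initialised to zero, the adapted model initially reproduces the frozen model exactly; training then moves only A and B, which is why LoRA fits faster and with far less memory, at the possible cost in accuracy that the thesis reports.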
