Context-Aware personalized news delivery through efficient text classification of Sinhalese news articles

  • Nishan Wickramasinghe
  • Raza Hasan
  • Salman Mahmood
  • Deborah Adedigba

Research output: Contribution to journal › Article › peer-review

Abstract

Personalized news delivery systems are critically underdeveloped for low-resource languages, with Sinhalese facing significant challenges due to its complex linguistic structure and the scarcity of comprehensive public datasets. This study aims to overcome these limitations by developing a context-aware personalized news delivery system for Sinhalese news articles, enhancing accessibility and relevance for readers. The methodology involved compiling a new, comprehensive dataset of 7254 articles across 14 distinct categories, sourced from public repositories and web scraping. This data underwent rigorous preprocessing, including tokenization, stop word removal, and stemming, before being used to train and evaluate several text classification models. A fine-tuned SinBERT-small model demonstrated the most effective performance, achieving a weighted F1-score of 87.63%. The developed system integrates this high-performance classifier with context-aware recommendation techniques to provide personalized content. A primary contribution of this research is the successful development of a functional and publicly accessible news categorizer API, representing a significant advancement in practical NLP tools for the Sinhalese language. By offering both a practical solution and a new, large-scale categorized dataset, this work bridges a crucial gap in NLP for low-resource languages and provides a foundation for future research.
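
The weighted F1-score reported above averages per-class F1 scores in proportion to how many articles truly belong to each class, which matters when the 14 categories are unevenly sized. The sketch below is only an illustration of that metric in plain Python, not the authors' evaluation code; the function name `weighted_f1` and the category labels in the usage note are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)      # true articles per class
    predicted = Counter(y_pred)    # predicted articles per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the data.
        score += f1 * n_true / total
    return score
```

For example, with the made-up labels `["sports", "sports", "politics", "business"]` as ground truth and `["sports", "politics", "politics", "business"]` as predictions, the per-class F1 scores are 2/3, 2/3, and 1, and the support-weighted mean is 0.75.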
Original language: English
Journal: International Journal of Information Technology (Singapore)
Early online date: 9 Oct 2025
Publication status: Published - 9 Oct 2025
