Abstract
Duplicate questions on crowd-sourced question-and-answer websites such as Quora create redundancy and make information retrieval inefficient. This research presents a systematic comparative analysis of machine learning and deep learning models for detecting semantic similarity between questions. Using the Quora Question Pairs dataset, we evaluate a spectrum of models: a classical TF-IDF baseline, feature-engineered Random Forest and XGBoost classifiers, a Siamese Manhattan LSTM (MaLSTM), and a fine-tuned BERT model. The study reveals a clear performance hierarchy. A key finding is that the classical models, trained on a limited set of hand-crafted linguistic features, underperformed the simple TF-IDF baseline. The MaLSTM network showed a moderate improvement, but the fine-tuned BERT model was clearly superior, reaching an accuracy of 86.26%, a statistically significant result. This highlights the critical role of deep contextual embeddings for this task. However, BERT’s state-of-the-art performance comes at a significant computational cost, revealing a crucial trade-off between accuracy and resource efficiency. These findings provide a pragmatic guide for designing effective and scalable duplicate question detection systems.
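As a rough illustration of what a TF-IDF baseline for this task can look like, the sketch below scores a question pair by the cosine similarity of its TF-IDF vectors and flags the pair as a duplicate above a cut-off. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names, and the 0.5 threshold are all assumptions chosen for illustration, and the three example questions are made up rather than taken from the Quora Question Pairs dataset.

```python
import math
from collections import Counter

PUNCT = ".,?!\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]

def fit_idf(corpus):
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf_vector(text, idf):
    # Sparse TF-IDF vector as a dict; terms unseen during fitting are ignored.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def is_duplicate(q1, q2, idf, threshold=0.5):
    # threshold is a hypothetical cut-off; in practice it would be tuned on held-out pairs.
    return cosine(tfidf_vector(q1, idf), tfidf_vector(q2, idf)) >= threshold

# Made-up example questions, for illustration only.
questions = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "How can I cook rice?",
]
idf = fit_idf(questions)
score = cosine(tfidf_vector(questions[0], idf), tfidf_vector(questions[1], idf))
print(f"similarity: {score:.3f}")
```

A baseline of this kind compares surface vocabulary only, so paraphrases that share few words score low. That limitation is the gap that contextual models such as BERT are meant to close.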
| Original language | English |
|---|---|
| Pages (from-to) | 1719-1728 |
| Number of pages | 10 |
| Journal | Journal of Umm Al-Qura University for Engineering and Architecture |
| Volume | 16 |
| Issue number | 4 |
| Early online date | 1 Sept 2025 |
| Publication status | Published - 1 Sept 2025 |