An Analytical Survey of Cyberbullying Detection Using Machine Learning Algorithms

Nikesh Aote, Sheron Sheikh, Soheb Pathan, Sohel Sayyad, Akash Bisen, Sumit Kalbande

Abstract

Moderating harmful text directly from dynamic desktop screens requires converting pixels into textual content reliably and classifying that text accurately, all under tight latency and privacy constraints. This survey integrates research across three pillars to guide the design of on-device moderation systems: (i) optical character recognition (OCR) for heterogeneous user interfaces (UIs), including neural (LSTM-based) OCR, scene text detectors, and sequence recognition models; (ii) abusive/toxic language detection methods that range from lexicon rules to supervised transformers and zero-shot classification framed as natural language inference (NLI); and (iii) system-level design strategies, such as high-throughput screen capture, region-of-interest (ROI) scheduling, frame skipping, GPU-aware inference, and efficient overlay compositing. We prioritized peer-reviewed venues and canonical documentation in selecting sources. The review finds that OCR fidelity is the principal ceiling for downstream moderation, that hybrid pipelines combining lexicons with context-aware transformers typically outperform single-signal approaches, and that zero-shot models broaden label coverage and improve cross-lingual generalization but require threshold calibration and bias auditing. Significant gaps remain in handling code-mixed Hindi–English text, stabilizing OCR on stylized UI renderings, and mitigating unintended biases across user groups. The survey concludes with practical engineering guidance, an evaluation blueprint (accuracy, latency, and fairness), and research directions for robust, privacy-preserving on-device moderation [1]–[14].
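To make the OCR and zero-shot pillars concrete, the following is a minimal sketch, assuming a Python stack with Pillow and pytesseract for region capture and OCR and the Hugging Face zero-shot classification pipeline for NLI-based labeling; the model name, candidate labels, capture region, and decision threshold are illustrative placeholders rather than values prescribed by the surveyed works.

    # Illustrative sketch: OCR a screen region, then apply zero-shot toxicity
    # classification framed as NLI with a calibrated decision threshold.
    # Assumes pytesseract, Pillow, and transformers are installed; labels,
    # model, bbox, and threshold below are assumptions for demonstration only.

    from PIL import ImageGrab          # screen capture (platform support varies)
    import pytesseract                 # Tesseract OCR bindings (LSTM-based engine)
    from transformers import pipeline  # zero-shot classification via an NLI model

    # The NLI model scores each candidate label as an entailed hypothesis.
    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    CANDIDATE_LABELS = ["harassment", "hate speech", "threat", "neutral"]  # assumed label set
    TOXIC_THRESHOLD = 0.7  # assumed value; in practice calibrated on a validation set

    def moderate_screen_region(bbox=(0, 0, 800, 600)):
        """Capture a screen region, OCR it, and flag labels above the threshold."""
        frame = ImageGrab.grab(bbox=bbox)          # region-of-interest capture
        text = pytesseract.image_to_string(frame)  # pixels -> text; OCR fidelity bounds recall
        if not text.strip():
            return []
        result = classifier(text, candidate_labels=CANDIDATE_LABELS, multi_label=True)
        # Keep only non-neutral labels whose score exceeds the calibrated threshold.
        return [(label, score)
                for label, score in zip(result["labels"], result["scores"])
                if label != "neutral" and score >= TOXIC_THRESHOLD]

    if __name__ == "__main__":
        print(moderate_screen_region())

In a production pipeline the capture and OCR steps would run on an ROI schedule with frame skipping, and the classifier threshold would be tuned per label to balance the accuracy, latency, and fairness criteria discussed in the evaluation blueprint.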
