arXiv Research
Abstract: This paper presents a comparative analysis of Large Language Models (LLMs) and traditional Optical Character Recognition (OCR) systems on Urdu newspapers, addressing challenges posed by complex multi-column layouts, low-resolution scans, and the stylistic variability of the Nastaliq script. To handle these challenges, we fine-tu…
arXiv Research
Abstract: The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect news in regional…
arXiv Research
Abstract: Speech Emotion Recognition (SER) has seen significant progress with deep learning, yet remains challenging for Low-Resource Languages (LRLs) due to the scarcity of annotated data. In this work, we explore unsupervised learning to improve SER in low-resource settings. Specifically, we investigate contrastive learning (CL) and Bootstrap Your Own Latent (BYOL)…
arXiv Research
Abstract: Large multimodal models (LMMs) have recently gained attention due to their effectiveness in understanding and generating descriptions of visual content. Most existing LMMs are in the English language. While few recent…
arXiv Research
Abstract: Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish,…
arXiv Research
Abstract: Training deep learning networks with minimal supervision has gained significant research attention due to its potential to reduce reliance on extensive labelled… they remain vulnerable to errors from noisy pseudo labels. Moreover, most recent approaches to the few-label classification problem are either designed for resource-rich languages such as English or involve complex cascading…
arXiv Research
Abstract: In natural language processing, multilingual models like mBERT and XLM-RoBERTa promise broad coverage but often struggle with languages that share a script yet differ in orthographic norms and cultural context. This issue is especially notable in Arabic-script languages such as Kurdish Sorani, Arabic, Persian, and Urdu. We introduce the Arabic Scrip…
arXiv Research
Abstract: Multilingual Large Language Models (LLMs) have shown remarkable performance across various languages; however, they often include significantly less data for low-resource languages such as Urdu compared to high-resource languages like English. To assess the linguistic knowledge of LLMs in Urdu, we present the Urdu Benchmark of Linguistic Minimal Pairs (UrBLiMP) i.e. pa…
arXiv Research
Abstract: With the widespread adoption of Large Language Models (LLMs) across various applications, it is imperative to ensure their fairness across all user communities. However, most LLMs are trained and evaluated on Western-centric data, with little attention paid to low-resource languages and regional contexts. To address this gap, we introduce PakB…
arXiv Research
Abstract: Urdu is a cursive-script language and has similarities with Arabic and many other South Asian languages. Urdu is difficult to classify due to its complex geometrical and morphological structure. Character classification can be processed further if the segmentation technique is efficient, but due to context sensitivity in Urdu, segmentation-based recognition often result…
arXiv Research
Abstract: Urdu, spoken by over 250 million people, remains critically under-served in multimodal and vision-language research. The absence of large-scale, high-quality datasets has limited the development of Urdu-capable systems and reinforced biases in multilingual vision-language models trained pr…
arXiv Research
Abstract: Multilingual language models are trained on a fixed set of languages, and to support new languages, the models need to be retrained from scratch. This is an expensive endeavor and is often infeasible, as model developers tend not to release their pre-training data. Naive approaches, such as continued pretraining, suffer from catastrophic forgetting; however, mitigation s…