Urdu AI Dashboard

Monitoring AI enhancements in Urdu

AI News in Urdu

Latest Urdu-focused AI updates and research-backed news

arXiv Research

From Press to Pixels: Evolving Urdu Text Recognition

Abstract: This paper presents a comparative analysis of Large Language Models (LLMs) and traditional Optical Character Recognition (OCR) systems on Urdu newspapers, addressing challenges posed by complex multi-column layouts, low-resolution scans, and the stylistic variability of the Nastaliq script. To handle these challenges, we fine-tune …

arXiv Research

Unified Large Language Models for Misinformation Detection in Low-Resource Linguistic Settings

Abstract: The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect news in regional …

arXiv Research

Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition

Abstract: Speech Emotion Recognition (SER) has seen significant progress with deep learning, yet remains challenging for Low-Resource Languages (LRLs) due to the scarcity of annotated data. In this work, we explore unsupervised learning to improve SER in low-resource settings. Specifically, we investigate contrastive learning (CL) and Bootstrap Your Own Latent (BYOL) …
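The contrastive-learning setup this abstract mentions is typically trained with an NT-Xent-style loss over paired embeddings. A minimal NumPy sketch of that loss, for illustration only (the batch shape and temperature are assumptions, not the paper's settings):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss for a
    batch of paired embeddings, where z1[i] and z2[i] are two views
    (e.g. augmentations) of the same utterance."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # The positive for row i is row i+n (and i-n for the second half).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - (row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True)))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

When the two views of each item coincide, the loss is low; it rises as positives drift apart from each other relative to the negatives.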

arXiv Research

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Abstract: Large multimodal models (LMMs) have recently gained attention due to their effectiveness in understanding and generating descriptions of visual content. Most existing LMMs are in English. While a few recent …

arXiv Research

Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models

Abstract: Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish, …
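The translation-based approach named in this title routes non-English posts through machine translation and then applies an English classifier. A hedged sketch of that pipeline; both the translator and the classifier below are toy stand-ins for illustration, not the paper's models:

```python
from typing import Callable

def detect_hate_speech(
    text: str,
    lang: str,
    translate: Callable[[str, str], str],   # (text, source_lang) -> English text
    classify_en: Callable[[str], bool],     # English-only classifier
) -> bool:
    """Translate non-English input to English, then run the English classifier."""
    english = text if lang == "en" else translate(text, lang)
    return classify_en(english)

# Toy stand-ins: a real system would call an MT model and an LLM classifier.
_toy_lexicon = {"schlecht": "bad"}

def toy_translate(text: str, lang: str) -> str:
    return " ".join(_toy_lexicon.get(word, word) for word in text.split())

def toy_classify_en(text: str) -> bool:
    return "bad" in text.split()
```

The design choice this illustrates: the classifier only ever needs English training data, and language coverage is delegated entirely to the translation step.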

arXiv Research

Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages

Abstract: Training deep learning networks with minimal supervision has gained significant research attention due to its potential to reduce reliance on extensive labelled data … they remain vulnerable to errors from noisy pseudo labels. Moreover, most recent approaches to the few-label classification problem are either designed for resource-rich languages such as English or involve complex cascading …

arXiv Research

The Role of Orthographic Consistency in Multilingual Embedding Models for Text Classification in Arabic-Script Languages

Abstract: In natural language processing, multilingual models like mBERT and XLM-RoBERTa promise broad coverage but often struggle with languages that share a script yet differ in orthographic norms and cultural context. This issue is especially notable in Arabic-script languages such as Kurdish Sorani, Arabic, Persian, and Urdu. We introduce the Arabic Script …

arXiv Research

UrBLiMP: A Benchmark for Evaluating the Linguistic Competence of Large Language Models in Urdu

Abstract: Multilingual Large Language Models (LLMs) have shown remarkable performance across various languages; however, they often include significantly less data for low-resource languages such as Urdu compared to high-resource languages like English. To assess the linguistic knowledge of LLMs in Urdu, we present the Urdu Benchmark of Linguistic Minimal Pairs (UrBLiMP), i.e. pairs …
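Minimal-pair benchmarks like this are usually scored by checking whether a model assigns higher probability to the grammatical sentence of each pair. A sketch of that protocol, with a toy scorer standing in for a real LLM (the pair format and scoring interface are assumptions, not UrBLiMP's actual API):

```python
from typing import Callable, List, Tuple

def minimal_pair_accuracy(
    pairs: List[Tuple[str, str]],        # (grammatical, ungrammatical) sentences
    log_prob: Callable[[str], float],    # model's log-probability of a sentence
) -> float:
    """Fraction of pairs where the model prefers the grammatical sentence."""
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)

# Toy stand-in scorer: a real evaluation would sum token log-probabilities
# from an LLM; here we simply penalize a marker of ungrammatical forms.
def toy_log_prob(sentence: str) -> float:
    return -1.0 if "##" in sentence else -0.5

pairs = [("he walks", "he ##walk"), ("she runs", "she ##run")]
accuracy = minimal_pair_accuracy(pairs, toy_log_prob)
```

Because the metric only compares the two scores within a pair, it needs no probability calibration across sentences of different lengths beyond what the scorer itself provides.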

arXiv Research

PakBBQ: A Culturally Adapted Bias Benchmark for QA

Abstract: With the widespread adoption of Large Language Models (LLMs) across various applications, it is imperative to ensure their fairness across all user communities. However, most LLMs are trained and evaluated on Western-centric data, with little attention paid to low-resource languages and regional contexts. To address this gap, we introduce PakBBQ …

arXiv Research

Exploration of Deep Learning Based Recognition for Urdu Text

Abstract: Urdu is a cursive-script language and has similarities with Arabic and many other South Asian languages. Urdu is difficult to classify due to its complex geometrical and morphological structure. Character classification can be processed further if the segmentation technique is efficient, but due to context sensitivity in Urdu, segmentation-based recognition often results …

arXiv Research

COCO-Urdu: A Large-Scale Urdu Image-Caption Dataset with Multimodal Quality Estimation

Abstract: Urdu, spoken by over 250 million people, remains critically under-served in multimodal and vision-language research. The absence of large-scale, high-quality datasets has limited the development of Urdu-capable systems and reinforced biases in multilingual vision-language models trained …

arXiv Research

Continually Adding New Languages to Multilingual Language Models

Abstract: Multilingual language models are trained on a fixed set of languages, and to support new languages, the models need to be retrained from scratch. This is an expensive endeavor and is often infeasible, as model developers tend not to release their pre-training data. Naive approaches, such as continued pretraining, suffer from catastrophic forgetting; however, mitigation strategies …
