Urdu AI Dashboard

Monitoring AI enhancements in Urdu

AI News in Urdu

Latest Urdu-focused AI updates and research-backed news

arXiv Research

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

Abstract: Call Centers have huge amount of audio data which can be used for achieving valuable business insights and transcription of phone calls is manually tedious task. An effective Automated Speech Recognition syste…

arXiv Research

Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts

Abstract: Detecting online sexual predatory behaviours and abusive language on social media platforms has become a critical area of research due to the growing concerns about online safety, especially for vulnerab… …Llama 2 7B-parameter model, recently released by Meta GenAI. We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu and…

arXiv Research

Transcending Controlled Environments: Assessing the Transferability of ASR-Robust NLU Models to Real-World Applications

Abstract: This research investigates the transferability of Automatic Speech Recognition (ASR)-robust Natural Language Understanding (NLU) models from controlled experimental conditions to practical, real-world applications. Focused on smart home automation commands in Urdu, the study asse…

arXiv Research

WER We Stand: Benchmarking Urdu ASR Models

Abstract: This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. We analyze the performance of three ASR model families: Whisper, MMS, and Seamless-M4T using Word Error Rate (WER), along with a detailed examination of the most frequent wrong words and error types including insertions, deletion…
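
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of how Word Error Rate is commonly computed: a word-level edit distance between a reference transcript and a model's hypothesis, divided by the number of reference words. It is not the paper's evaluation code. The names `word_error_rate` and `WerResult` are illustrative, and the whitespace tokenization is an assumption; real evaluations usually normalize text first.

```python
from typing import NamedTuple


class WerResult(NamedTuple):
    wer: float
    substitutions: int
    insertions: int
    deletions: int


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    """Word-level Levenshtein alignment with per-error-type counts."""
    ref, hyp = reference.split(), hypothesis.split()  # assumed tokenization
    if not ref:
        raise ValueError("reference must contain at least one word")

    # cost[i][j] = (edits, subs, ins, dels) for aligning ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, 0, i)  # only deletions when hypothesis is empty
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, j, 0)  # only insertions when reference is empty

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # match, no new edit
                continue
            e, s, n, d = cost[i - 1][j - 1]
            substitution = (e + 1, s + 1, n, d)
            e, s, n, d = cost[i][j - 1]
            insertion = (e + 1, s, n + 1, d)
            e, s, n, d = cost[i - 1][j]
            deletion = (e + 1, s, n, d + 1)
            # Tuples compare on total edits first, so this picks a cheapest path.
            cost[i][j] = min(substitution, insertion, deletion)

    edits, subs, ins, dels = cost[-1][-1]
    return WerResult(edits / len(ref), subs, ins, dels)


result = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(result)
```

Because WER divides by the reference length, a hypothesis with many insertions can score above 1.0.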

arXiv Research

From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language

Abstract: Automatic Speech Recognition (ASR) technology has witnessed significant advancements in recent years, revolutionizing human-computer interactions. While major languages have benefited from these developments, lesser-resourced languages like Urdu face unique challenges. This paper provides an extensive exploration of the dynamic landscape of ASR research, focusing particul…

arXiv Research

Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases

Abstract: This study evaluates Large Language Models' (LLMs) ability to simulate non-native-like English use observed in human second language (L2) learners interfered with by their native first language (L1). In dialogue-based interviews, we prompt LLMs to mimic L2 English learners with specific L1s (e.g., Japanese, Thai, Urdu) across seven languages, comparing the…

arXiv Research

UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings

Abstract: Multilingual Large Language Models (LLMs) often provide suboptimal performance on low-resource languages like Urdu. This paper introduces UrduLLaMA 1.0, a model derived from the open-source Llama-3.1-8B-Instruct architecture and continually pre-trained on 128 million Urdu tokens, capturing the rich diversity of the language. To enhance instruction-following and trans…

arXiv Research

Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models

Abstract: As the Information Retrieval (IR) field increasingly recognizes the importance of inclusivity, addressing the needs of low-resource languages remains a significant challenge. Transliteration between…

arXiv Research

Long-context Non-factoid Question Answering in Indic Languages

Abstract: Question Answering (QA) tasks, which involve extracting answers from a given context, are relatively straightforward for modern Large Language Models (LLMs) when the context is short. However, long contexts pose challenges due to the quadratic co…

arXiv Research

Improving Informally Romanized Language Identification

Abstract: The Latin script is often used to informally write languages with non-Latin native scripts. In many cases (e.g., most languages in India), the lack of conventional spelling in the Latin script results in high spelling variability. Such romanization renders languages that are…

arXiv Research

A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition

Abstract: The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource…

arXiv Research

Enhanced Urdu Intent Detection with Large Language Models and Prototype-Informed Predictive Pipelines

Abstract: Multifarious intent detection predictors are developed for different languages, including English, Chinese and French, however, the field remains underdeveloped for Urdu, the 10th most spoken language. In the realm of well-k…
