Transfer Learning in Low-Resource Language Processing Applications

Authors

  • Prof. (Dr.) Arpit Jain, K L E F Deemed University, Vaddeswaram, Andhra Pradesh 522302, India

DOI:

https://doi.org/10.63345/

Keywords:

Transfer Learning, Low-Resource Languages, Natural Language Processing, Cross-Lingual Embeddings, Machine Translation, Multilingual Models

Abstract

The digital revolution has accelerated the development of natural language processing (NLP), yet its benefits remain unevenly distributed across languages. While high-resource languages such as English, Chinese, and Spanish enjoy state-of-the-art NLP applications, the majority of the world’s languages are classified as low-resource, lacking sufficient annotated corpora, computational resources, and linguistic expertise. This imbalance exacerbates digital exclusion and undermines linguistic diversity. Transfer learning has emerged as a powerful paradigm for addressing these challenges by leveraging models pre-trained on high-resource languages and adapting them to low-resource contexts.
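To make the paradigm concrete, the sketch below fine-tunes a multilingual pre-trained encoder on a small labelled dataset in a target low-resource language, a common cross-lingual transfer setup. It is illustrative only and not taken from the article: the model name, file names, label count, and hyperparameters (e.g., xlm-roberta-base, train_lowres.csv) are placeholder assumptions, and it relies on the Hugging Face transformers and datasets libraries.

```python
# Minimal sketch (not from the article): fine-tune a multilingual pre-trained
# encoder on a small labelled dataset in a low-resource target language.
# Model name, file paths, label count, and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"  # encoder pre-trained on ~100 languages

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with "text" and "label" columns in the target language.
data = load_dataset(
    "csv",
    data_files={"train": "train_lowres.csv", "validation": "dev_lowres.csv"},
)

def tokenize(batch):
    # Sub-word tokenisation with the multilingual vocabulary; padding is applied
    # per batch by the Trainer's default data collator.
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="xlmr-lowres",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```

Because the encoder already encodes shared multilingual representations, a relatively small amount of target-language supervision can be enough to adapt it, which is the central appeal of transfer learning in low-resource settings.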

Published

02-08-2025

Issue

Section

Review Article

How to Cite

Transfer Learning in Low-Resource Language Processing Applications. (2025). Scientific Journal of Artificial Intelligence and Blockchain Technologies, 2(3), Aug, 81–89. https://doi.org/10.63345/
