"Enhancing Low-Resource Language Performance in Multilingual Large Lang" by Mingqi Li

Date of Award

12-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Committee Chair/Advisor

Feng Luo

Committee Member

Rong Ge

Committee Member

Long Cheng

Committee Member

Nianyi Li

Abstract

Large language models play an important role in many natural language tasks. However, training these models requires large amounts of data that are not available for many languages. As a result, a noticeable performance gap exists between English and other languages, and it is most pronounced for low-resource languages. It is therefore imperative to improve large language models for low-resource languages. To address this challenge, we developed knowledge distillation, strategic prompt-learning, and attention alignment methods that improve the representation capabilities of large language models for low-resource languages and thereby enhance their performance on downstream tasks.

In our first study, we developed a knowledge distillation method that transfers knowledge encoded in English to low-resource languages in pre-trained multilingual language models. Existing knowledge distillation methods do not focus on learning the semantic structure of representations and are therefore not well suited to multilingual language models. We proposed Multi-level Multilingual Knowledge Distillation (MMKD), a novel method for improving multilingual language models. Specifically, we employed a teacher-student framework to transfer the rich semantic representation knowledge in English BERT. We proposed token-, word-, sentence-, and structure-level alignment objectives that encourage multiple levels of consistency between source-target pairs and correlation similarity between the teacher and student models. We conducted experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD. Experimental results showed that MMKD outperforms other baseline models of similar size on XNLI and XQuAD and achieves comparable performance on PAWS-X. In particular, MMKD obtains significant performance gains on low-resource languages.
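To illustrate the general idea of multi-level distillation (this is a minimal sketch, not the actual MMKD objective; the function names, pooling choices, and loss weights below are assumptions, and the word-level alignment step, which requires a word aligner across languages, is omitted), a combined loss can align the hidden states of a teacher and a student model on parallel sentence pairs at several levels:

import torch
import torch.nn.functional as F

def token_level_loss(student_hidden, teacher_hidden):
    # Per-token alignment between parallel inputs (assumes teacher and student
    # share the hidden size; otherwise a projection layer would be needed).
    return F.mse_loss(student_hidden, teacher_hidden)

def sentence_level_loss(student_hidden, teacher_hidden):
    # Align mean-pooled sentence embeddings with a cosine objective.
    s = student_hidden.mean(dim=1)
    t = teacher_hidden.mean(dim=1)
    return 1.0 - F.cosine_similarity(s, t, dim=-1).mean()

def structure_level_loss(student_hidden, teacher_hidden):
    # Encourage a similar pairwise-similarity structure within a batch.
    s = F.normalize(student_hidden.mean(dim=1), dim=-1)
    t = F.normalize(teacher_hidden.mean(dim=1), dim=-1)
    return F.mse_loss(s @ s.T, t @ t.T)

def distillation_loss(student_hidden, teacher_hidden, weights=(1.0, 1.0, 1.0)):
    w_tok, w_sent, w_struct = weights
    return (w_tok * token_level_loss(student_hidden, teacher_hidden)
            + w_sent * sentence_level_loss(student_hidden, teacher_hidden)
            + w_struct * structure_level_loss(student_hidden, teacher_hidden))

# Random tensors stand in for the hidden states of a parallel English /
# target-language batch (batch=4, length=16, hidden size=768).
student_h = torch.randn(4, 16, 768)
teacher_h = torch.randn(4, 16, 768)
print(distillation_loss(student_h, teacher_h))

In practice, the teacher hidden states would come from English BERT on the English side of a translation pair and the student hidden states from the multilingual model on the target-language side.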

In the second study, we developed a prompt-learning method that fine-tunes multilingual language models effectively. Prompt-learning has shown impressive performance on the cross-lingual natural language inference task. However, prior works relied on external resources such as human-designed multilingual templates and bilingual dictionaries, which hinders their adaptability to other languages and may not be feasible in a low-resource regime. Meanwhile, fine-tuning the entire model along with the prompt-related parameters can result in overfitting in the few-shot scenario. We introduced the Lottery Ticket Prompt-learning (LTP) framework, which integrates the Lottery Ticket Hypothesis into cross-lingual prompt-learning. Specifically, we selected the subset of parameters that changed the most during fine-tuning with the Masked Language Modeling objective. We then prepended soft prompts to the original pre-trained language model and updated only the selected parameters together with the prompt-related parameters when adapting to downstream tasks. We demonstrated the effectiveness of the proposed LTP framework on the XNLI dataset in the few-shot setting. Our approach outperformed the baselines while updating only 20% of the original parameters.
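A minimal sketch of this lottery-ticket style parameter selection (assumed helper names, not the actual LTP implementation): compare the pre-trained and MLM-fine-tuned checkpoints, keep the roughly 20% of weights that changed most, and mask out the gradients of the remaining weights during downstream training, while the soft-prompt parameters stay fully trainable:

import torch

def build_update_masks(pretrained_state, finetuned_state, keep_ratio=0.2):
    # Per-weight change magnitude between the pre-trained and MLM-fine-tuned
    # checkpoints; only floating-point tensors are considered.
    changes = {name: (finetuned_state[name] - p).abs()
               for name, p in pretrained_state.items()
               if p.is_floating_point()}
    all_changes = torch.cat([c.flatten() for c in changes.values()])
    k = max(1, int(keep_ratio * all_changes.numel()))
    threshold = torch.topk(all_changes, k).values.min()
    # Boolean mask per parameter tensor: True = part of the selected subset.
    return {name: c >= threshold for name, c in changes.items()}

def mask_gradients(model, masks):
    # Call after loss.backward(): zero the gradients of unselected weights so
    # the optimizer only updates the chosen subset; soft-prompt parameters are
    # not present in `masks` and therefore remain fully trainable.
    for name, param in model.named_parameters():
        if name in masks and param.grad is not None:
            param.grad.mul_(masks[name].to(param.grad.dtype))

Here `pretrained_state` and `finetuned_state` are assumed to be `state_dict()` snapshots taken before and after MLM fine-tuning, and `mask_gradients` would be called between `loss.backward()` and `optimizer.step()` in the downstream training loop.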

In the third study, we explored a method to unlock Chain-of-Thought (CoT) reasoning for multilingual language models through attention alignment. While CoT reasoning has demonstrated success in enhancing reasoning capabilities in English, extending it to low-resource languages presents unique challenges. In addition, how to effectively activate reasoning in these languages remains unclear. To address this, we proposed an attention-guided prompt optimization (AttnPO) framework that aligns the model's attention with LLM-suggested key elements in multilingual contexts. By analyzing attention matrices and refining prompts based on attention alignment, the framework enhances CoT reasoning in multilingual language models, enabling structured, step-by-step reasoning across languages. Experimental results on multilingual tasks such as commonsense reasoning and math problems demonstrated that this approach improves model accuracy, particularly in low-resource languages, by guiding the model's focus to the relevant parts of the input.
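To make the idea of attention alignment concrete, the following sketch (an assumed setup with a placeholder model, tokenizer, and matching heuristic; not the actual AttnPO framework) scores a candidate prompt by the fraction of attention mass that falls on tokens matching LLM-suggested key elements, a quantity that could then be used to rank and refine prompts:

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # placeholder; any multilingual encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def attention_on_key_elements(prompt, key_phrases):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    # Average the attention weights over layers and heads: (seq_len, seq_len).
    attn = torch.stack(outputs.attentions).mean(dim=(0, 2)).squeeze(0)
    # Rough heuristic: mark token positions whose surface form appears in the
    # LLM-suggested key phrases.
    key_words = {w.lower() for phrase in key_phrases for w in phrase.split()}
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    key_mask = torch.tensor([t.lower().lstrip("▁") in key_words for t in tokens])
    # Fraction of the total attention mass received by key-element positions.
    return (attn[:, key_mask].sum() / attn.sum()).item()

# A prompt whose attention covers the key elements better would be preferred
# when iteratively refining the prompt.
print(attention_on_key_elements(
    "If Anna has 3 apples and buys 2 more, how many apples does she have?",
    ["3 apples", "2 more", "how many"]))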

In summary, our proposed MMKD, LTP, and AttnPO have demonstrated their effectiveness in improving multilingual performance, particularly in low-resource scenarios, using a limited amount of data. Together, these approaches help narrow the performance gap between high-resource and low-resource languages, promoting greater linguistic equality and accessibility in language model applications.
