Let's read some LLM papers from April!
Stay Up to Date with Latest Large Language Models Research.
Large language models (LLMs) have advanced significantly in recent years, driving continuous research and development. In this article, I summarize important LLM papers published during April 2024, covering topics such as model reasoning, performance enhancement, and even personality traits in LLMs. Staying current with these research areas will help guide continued progress toward more robust, human-centered models.
1. Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Authors: Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui and Furu Wei
Date: 4 April 2024
Link: https://arxiv.org/pdf/2404.03622
Summary: The article proposes Visualization-of-Thought (VoT) prompting, which elicits spatial reasoning by giving LLMs a visuospatial sketchpad to visualize their reasoning steps and inform subsequent ones. The approach uses zero-shot prompting rather than few-shot examples or CLIP-based text-to-image visualization. The study assessed the effectiveness of VoT on three tasks that demand spatial awareness in LLMs: natural-language navigation, visual navigation, and visual tiling. The results showed that VoT prompting reliably leads LLMs to visualize their reasoning steps and use those visualizations to inform subsequent steps, yielding significant performance gains on the associated tasks.
Personal thoughts: Beyond the summary above, I find this paper interesting because it discusses the Mind’s Eye: the uniquely human ability to form mental representations of unseen objects and actions, allowing us to imagine the unseen world.
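To make the idea concrete, here is a minimal sketch of what a zero-shot VoT-style prompt could look like. The wording is my paraphrase of the paper's idea, not its exact prompt:

```python
def vot_prompt(task: str) -> str:
    """Build a zero-shot Visualization-of-Thought style prompt: the model
    is asked to draw its intermediate spatial state as a text grid (a
    'mental sketchpad') after every reasoning step."""
    return (
        f"{task}\n\n"
        "Solve this step by step. After each step, visualize the current "
        "state as a small text grid (your sketchpad), then use that "
        "visualization to decide the next step."
    )

print(vot_prompt("Navigate from S to G on a 3x3 grid with a wall at (1,1)."))
```

The key design choice is that the visualization is generated by the LLM itself as text, with no external image model in the loop.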
2. Advancing LLM Reasoning Generalists with Preference Trees
Authors: Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu and Maosong Sun.
Date: 2 April 2024
Link:
- Article: https://arxiv.org/pdf/2404.02078
- Github: https://github.com/OpenBMB/Eurus.git
Summary: The article introduces EURUS, a suite of large language models optimized for reasoning, fine-tuned from Mistral-7B and CodeLlama-70B. EURUS outperforms open-source models on a variety of benchmarks covering mathematics, code generation, and logical reasoning. Its strong performance is largely due to ULTRAINTERACT, a large-scale alignment dataset built for challenging reasoning tasks. ULTRAINTERACT provides a preference tree for each instruction, enabling a thorough investigation of preference learning for reasoning tasks.
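To illustrate the preference-tree idea, here is a hypothetical sketch of such a data structure (the names and layout are mine, not the paper's): each node holds one model action, and correct/incorrect sibling actions yield (chosen, rejected) pairs for preference learning.

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceNode:
    """One action in a hypothetical ULTRAINTERACT-style preference tree.
    Siblings at the same depth compare alternative actions; children
    continue the multi-turn trajectory."""
    action: str
    correct: bool
    children: list = field(default_factory=list)

def preference_pairs(node):
    """Collect (chosen, rejected) action pairs from every sibling group."""
    goods = [c for c in node.children if c.correct]
    bads = [c for c in node.children if not c.correct]
    pairs = [(g.action, b.action) for g in goods for b in bads]
    for child in node.children:
        pairs += preference_pairs(child)
    return pairs

root = PreferenceNode("instruction", True, [
    PreferenceNode("correct step", True),
    PreferenceNode("buggy step", False),
])
print(preference_pairs(root))
```

Pairs extracted this way can feed standard preference-learning objectives such as DPO or reward modeling.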
3. Editing Personality For Large Language Models
Authors: Shengyu Mao, Xiaohan Wang, Mengru Wang, Yong Jiang, Pengjun Xie, Fei Huang and Ningyu Zhang
Date: 4 April 2024
Link: https://arxiv.org/pdf/2310.02168
Summary: The paper introduces the task of editing personality traits in Large Language Models (LLMs) by adjusting their responses to opinion-related questions. It constructs PersonalityEdit, a new benchmark grounded in social psychology’s Big Five factor structure, using three representative traits: Neuroticism, Extraversion, and Agreeableness. Data is collected with GPT-4, which generates responses aligned with each topic and embodying the targeted personality trait. The study evaluates several model-editing methods on the benchmark and analyzes LLM behavior before and after personality editing.
Personal thoughts: In my view, this paper could play a vital role in investigating personality traits in LLMs. I have built a small demo as a proof of concept that LLMs can act as if they have a persona. I’ll leave the link here.
4. Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Authors: Jooyoung Lee, Fan Yang, Thanh Tran, Qian Hu, Emre Barut, Kai-Wei Chang and Chengwei Su
Date: 4 April 2024
Link: https://arxiv.org/pdf/2404.03414
Summary: The LM-Guided CoT framework uses a lightweight language model to guide a large, frozen language model in reasoning tasks. The lightweight model generates a rationale for each input instance, and the large model then predicts the task output. The approach is resource-efficient: the small rationale model is first trained via knowledge distillation and then refined with reinforcement learning. Experiments on extractive multi-hop question answering using HotpotQA and 2WikiMultiHopQA show that LM-Guided CoT prompting outperforms both standard prompting and the original CoT prompting. By fine-tuning only the smaller LM, the framework offers a unique alternative to directly optimizing large LMs and gives practitioners greater control over each step of the task.
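The two-model pipeline can be sketched in a few lines. This is a toy illustration of the division of labor, not the authors' code; `small_lm` and `large_lm` are placeholder callables standing in for real model APIs:

```python
def lm_guided_cot(question, context, small_lm, large_lm):
    """LM-Guided CoT sketch: the lightweight model writes the rationale,
    and the frozen large model consumes it to produce the final answer."""
    rationale = small_lm(
        f"Context: {context}\nQuestion: {question}\nRationale:"
    )
    answer = large_lm(
        f"Context: {context}\nQuestion: {question}\n"
        f"Rationale: {rationale}\nAnswer:"
    )
    return rationale, answer

# Toy stand-ins for the two models:
small = lambda prompt: "The context states Paris is the capital of France."
large = lambda prompt: "Paris"
print(lm_guided_cot("What is the capital of France?", "...", small, large))
```

Only the small model's rationale generation needs training, which is what makes the framework cheap relative to fine-tuning the large LM directly.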
5. THOUGHTSCULPT: Reasoning with Intermediate Revision and Search
Authors: Yizhou Chi, Kevin Yang, Dan Klein
Date: 09 April 2024
Link: https://arxiv.org/abs/2404.05966
Summary: THOUGHTSCULPT is a general reasoning and search algorithm for tasks whose outputs can be decomposed into components. It employs Monte Carlo Tree Search (MCTS) to construct solutions one action at a time, evaluating them with a domain-specific evaluator. Crucially, its action space includes revision actions, allowing the search to refine earlier parts of a solution, and it outperforms state-of-the-art reasoning methods on three challenging tasks: story outline improvement, mini-crossword solving, and constrained generation.
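Below is a minimal MCTS skeleton in the spirit of the paper, a toy re-implementation of the idea rather than the authors' code. The toy domain (building a short string) stands in for LLM-generated solution components; note how the action space includes revision actions, not only extensions:

```python
import math

def mcts(root, actions, evaluate, n_iter=200, c=1.4):
    """Build solutions one action at a time with UCT selection,
    scoring partial solutions with a domain-specific evaluator."""
    tree = {root: {"n": 0, "w": 0.0, "children": []}}
    for _ in range(n_iter):
        # Selection: descend by UCT until we reach a leaf.
        path, state = [root], root
        while tree[state]["children"]:
            parent = state
            state = max(
                tree[parent]["children"],
                key=lambda s: tree[s]["w"] / (tree[s]["n"] + 1e-9)
                + c * math.sqrt(math.log(tree[parent]["n"] + 1) / (tree[s]["n"] + 1e-9)),
            )
            path.append(state)
        # Expansion: add unseen successors (extensions and revisions).
        for act in actions(state):
            child = act(state)
            if child not in tree:
                tree[child] = {"n": 0, "w": 0.0, "children": []}
                tree[state]["children"].append(child)
        # Evaluation + backpropagation.
        value = evaluate(state)
        for s in path:
            tree[s]["n"] += 1
            tree[s]["w"] += value
    return max(tree, key=evaluate)

# Toy domain: build a 3-character string over {a, b}, rewarding 'a's.
def actions(s):
    acts = []
    if len(s) < 3:  # extension actions
        acts += [lambda st, ch=ch: st + ch for ch in "ab"]
    if s:           # revision actions: rewrite the last step
        acts += [lambda st, ch=ch: st[:-1] + ch for ch in "ab"]
    return acts

print(mcts("", actions, lambda s: s.count("a")))
```

In the real system, actions are LLM-proposed edits and the evaluator is itself an LLM critic; the search structure is the same.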
6. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Authors: Tsendsuren Munkhdalai, Manaal Faruqui and Siddharth Gopal
Date: 10 April 2024
Link: https://arxiv.org/pdf/2404.07143
Summary: This paper presents an effective way to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. The core of the proposal is a new attention mechanism called Infini-attention, which combines a compressive memory with masked local attention and long-term linear attention in a single Transformer block. The technique is effective on long-context language modeling benchmarks, 1M-token passkey retrieval, and 500K-token book summarization with 1B and 8B LLMs. The approach introduces minimal additional memory parameters and enables fast streaming inference for LLMs.
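The compressive memory can be sketched with a standard linear-attention update. This is my simplified reading of the mechanism, not the authors' implementation: past segments' keys and values are folded into a fixed-size matrix, so memory cost stays constant in sequence length.

```python
import numpy as np

def elu1(x):
    """ELU + 1 feature map, a common kernel choice in linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

class CompressiveMemory:
    """Toy sketch of an Infini-attention-style compressive memory:
    segments are folded into matrix M and normalizer z, so the state
    size never grows with context length."""

    def __init__(self, d_key, d_value):
        self.M = np.zeros((d_key, d_value))  # associative memory
        self.z = np.zeros(d_key)             # normalization term

    def update(self, K, V):
        """Fold in a segment's keys K (n, d_key) and values V (n, d_value)."""
        sK = elu1(K)
        self.M += sK.T @ V
        self.z += sK.sum(axis=0)

    def retrieve(self, Q):
        """Read memory with queries Q (m, d_key) -> (m, d_value)."""
        sQ = elu1(Q)
        return (sQ @ self.M) / ((sQ @ self.z) + 1e-6)[:, None]
```

In the paper, this long-term read is mixed with local softmax attention through a learned gate inside each block; that gating is omitted here.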
7. Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Authors: Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi and Dong Yu
Date: 18 April 2024
Link: https://arxiv.org/pdf/2404.12253
Summary: Large Language Models (LLMs) show impressive capabilities but still struggle with scenarios that require complex reasoning and planning. Recent work enhances LLMs’ reasoning abilities through advanced prompting techniques or fine-tuning on high-quality data, yet both approaches are limited by data availability and quality. Self-correction and self-learning are promising alternatives, though the efficacy of LLMs at self-refinement remains uncertain. This paper introduces AlphaLLM, which integrates Monte Carlo Tree Search (MCTS) with LLMs to create a self-improvement loop that requires no additional annotations. Drawing inspiration from AlphaGo, AlphaLLM addresses the challenges specific to language tasks: data scarcity, vast search spaces, and subjective feedback. Experimental results show that AlphaLLM significantly improves LLMs’ performance without additional annotations.
8. A Survey on Self-Evolution of Large Language Models
Authors: Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao and Jingren Zhou
Date: 22 April 2024
Link:
- Article: https://arxiv.org/pdf/2404.14387
- Github: https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/Awesome-Self-Evolution-of-LLM
Summary: Self-evolution approaches, which enable LLMs to autonomously acquire, refine, and learn from experiences the model itself generates, are growing rapidly. This work presents a comprehensive survey of self-evolution in LLMs: it proposes a conceptual framework, categorizes evolution objectives, summarizes the literature, and outlines future directions for improving self-evolution frameworks.
9. How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs’ internal prior
Authors: Kevin Wu, Eric Wu, James Zou
Date: 16 April 2024
Link: https://arxiv.org/abs/2404.10198
Summary: The study examines the interplay between a large language model’s internal knowledge and retrieved information when the two disagree. It finds that providing the correct retrieved information fixes most model mistakes (94% accuracy). However, when the reference document is perturbed with increasingly wrong values, the LLM is more likely to adopt the incorrect, modified information when its internal prior is weak, and more resistant when its prior is strong. The more the modified information deviates from the model’s prior, the less likely the model is to prefer it. These results highlight an underlying tension between a model’s prior knowledge and the information presented in reference documents.
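The perturbation protocol is easy to sketch: scale a numeric fact in the reference document by increasing factors, query the model with each version, and record whether it follows the document or its prior. The exact factors and prompt wording below are mine, not the paper's:

```python
def perturb_fact(value, factors=(2.0, 5.0, 10.0)):
    """Scale a numeric fact by increasing factors, mimicking the
    paper's document-perturbation protocol."""
    return [value * f for f in factors]

def rag_prompt(question, doc):
    """A minimal RAG-style prompt pairing a reference document with a question."""
    return (
        f"Reference document:\n{doc}\n\n"
        f"Question: {question}\nAnswer using the document:"
    )

# One query per perturbation level; in the study, the model's answer is
# then compared against both the (wrong) document value and its prior.
for dose in perturb_fact(120.0):
    print(rag_prompt("What is the drug's maximum daily dose in mg?",
                     f"The maximum daily dose is {dose} mg."))
```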
10. Make Your LLM Fully Utilize the Context
Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng and Jian-Guang Lou
Date: 26 April 2024
Link:
- Article: https://arxiv.org/pdf/2404.16811
- Github: https://github.com/microsoft/FILM.git
Summary: Large language models (LLMs) suffer from the lost-in-the-middle problem: they struggle to use information located in the middle of a long context. The authors attribute this to insufficient explicit supervision during long-context training and present INformation-INtensive (IN2) training, a data-driven solution built on a synthesized long-context question-answer dataset. FILM-7B (FILl-in-the-Middle) was created using this training. Probing tasks show that FILM-7B can retrieve information from arbitrary positions within its 32K context window, boosting performance on real-world long-context tasks while remaining comparable on short-context tasks.
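The data recipe can be sketched as follows: a short segment containing the answer is inserted at a random position inside a long filler context, so the model is supervised to use information from anywhere in the window. This is my simplification of the paper's synthesis pipeline, not its actual code:

```python
import random

def make_in2_example(fact, question, filler_segments, seed=0):
    """Build one IN2-style training example: the answer-bearing segment
    lands at a random position in the long context."""
    rng = random.Random(seed)
    segments = list(filler_segments)
    pos = rng.randint(0, len(segments))
    segments.insert(pos, fact)
    return {
        "context": "\n".join(segments),
        "question": question,
        "fact_position": pos,
    }

example = make_in2_example(
    "The vault code is 4217.",
    "What is the vault code?",
    [f"Filler paragraph {i}." for i in range(10)],
)
print(example["fact_position"])
```

Sampling the position uniformly is the point: it denies the model any shortcut of attending only to the beginning or end of the context.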
11. CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Authors: Makesh Narsimhan Sreedhar, Traian Rebedea, Shaona Ghosh and Christopher Parisien
Date: 4 April 2024
Link: https://arxiv.org/pdf/2404.03820
Summary: The CantTalkAboutThis dataset aims to align language models with the task relevance of a conversation, a crucial aspect of chatbot deployment. It consists of synthetic dialogues across a wide range of conversation topics, each interleaved with distractor turns that deliberately steer the chatbot away from its predefined topic. Fine-tuning language models on this dataset improves their resilience to deviating from their assigned role and enhances performance on fine-grained instruction-following tasks, compared to general-purpose instruction-tuned LLMs such as GPT-4-turbo and Mixtral-Instruct.
12. Social Skill Training with Large Language Models
Authors: Diyi Yang, Caleb Ziems, William Held, Omar Shaikh, Michael S. Bernstein and John Mitchell
Date: 5 April 2024
Link: https://arxiv.org/pdf/2404.04204
Summary: The article argues that large language models can make social skill training more accessible, safe, and inviting. It proposes two complementary visions of AI assistance: the AI Partner, which provides a scalable route to experiential learning through simulated practice, and the AI Mentor, which offers personalized feedback grounded in domain expertise and factual knowledge. Together, these frameworks aim to combine realistic practice with tailored feedback, lowering the socioeconomic barrier to entering specialized fields. The authors note that cross-disciplinary innovation is needed to address the broad implications of these approaches.
All rights reserved