Directional stimulus prompting (DSP) is a prompting technique in natural language processing (NLP) in which a model is given a directed, structured stimulus to steer it toward the desired output.
Unlike standard approaches such as zero-shot, one-shot or few-shot prompting, DSP gives more direct control over the model's output by establishing criteria or providing instruction: a guiding stimulus acts as a control mechanism that shapes the model's generative process along lines defined by the task's requirements.
Directional stimulus prompting is useful when a task calls for specific, highly context-sensitive responses but little or no labeled data is available.
For instance, in summarization tasks, where retaining essential information is crucial, DSP provides a guiding stimulus that nudges the model to cover that information, leading to more accurate and contextually appropriate summaries.1
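As a concrete illustration, the short sketch below shows one plausible way keyword stimuli could be folded into a summarization prompt. The template and the "Hint" wording are illustrative assumptions rather than a fixed part of DSP.

    # Illustrative only: build a summarization prompt that carries a keyword stimulus.
    def build_dsp_prompt(article: str, stimulus_keywords: list[str]) -> str:
        hint = "; ".join(stimulus_keywords)
        return (
            f"Article: {article}\n"
            f"Hint (cover these points): {hint}\n"
            "Summarize the article in two to three sentences."
        )

    prompt = build_dsp_prompt(
        article="The city council approved a new transit plan on Tuesday...",
        stimulus_keywords=["city council", "transit plan", "approved on Tuesday"],
    )
    print(prompt)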
Large language models (LLMs) such as GPT-3, GPT-4 and PaLM are commonly referred to as "black box" models because users do not have access to their internals, such as parameters, tuning methods or decision-making processes.
Interaction with these models happens almost entirely through text prompts submitted via application programming interface (API) calls. While the models are highly capable, their ability to produce precise, task-specific outputs is often contingent on prompt quality.2, 3
This makes prompt engineering, the practice of designing targeted prompts to steer model behavior, essential. Both manual and automated approaches to prompt engineering have yielded notable success, but they have clear limitations, especially for tasks that demand fine-grained control or instance-specific outputs.
For example, tasks such as summarization or dialogue generation require the model to follow target behaviors systematically, such as including key details or adhering to a strict reasoning pattern or prescribed stylistic guidelines. Conventional techniques are often not enough to guarantee consistent compliance with these nuanced requirements.
Directional stimulus prompting (DSP) fills this gap. DSP trains a small auxiliary policy model that generates an instance-specific directional stimulus for each input, which is then used to guide the LLM's generation.
These stimuli supply context tailored to each instance and coax the LLM toward more aligned, desirable outputs. By plugging DSP into the process, users gain a practical way to steer the behavior of black box LLMs toward greater consistency, relevance and accuracy in work that needs precision.1
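The sketch below outlines the two stages of this pipeline under stated assumptions: a small fine-tuned T5 checkpoint serves as the policy model (the path is a placeholder), and call_llm() is a hypothetical stub standing in for whatever black box LLM API a deployment actually uses.

    # A high-level sketch of the DSP pipeline: a small policy model proposes an
    # instance-specific stimulus, which is then folded into the prompt sent to
    # the black box LLM. The checkpoint path and call_llm() are placeholders.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    policy = T5ForConditionalGeneration.from_pretrained("path/to/policy-checkpoint")
    tok = T5TokenizerFast.from_pretrained("path/to/policy-checkpoint")

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for the black box LLM API call.
        raise NotImplementedError("Replace with the deployment's actual API client.")

    def generate_stimulus(article: str) -> str:
        # Stage 1: the policy model generates keywords (the directional stimulus).
        ids = tok("generate keywords: " + article, return_tensors="pt",
                  truncation=True, max_length=512).input_ids
        out = policy.generate(ids, max_new_tokens=32)
        return tok.decode(out[0], skip_special_tokens=True)

    def dsp_summarize(article: str) -> str:
        # Stage 2: the stimulus guides the black box LLM through the prompt.
        stimulus = generate_stimulus(article)
        prompt = f"Article: {article}\nHint: {stimulus}\nSummarize the article."
        return call_llm(prompt)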
Training the policy model with supervised fine-tuning (SFT)
The process of training the policy model begins with supervised fine-tuning (SFT) of a pretrained model such as T5, GPT-2 or another suitable pretrained language model. The key idea is to fine-tune a smaller policy model that generates directional stimuli, rather than modifying the LLM directly.
This process is efficient because fine-tuning a smaller, task-specific policy model avoids the challenges and computational costs associated with training large, complex models directly.
To train this policy model, a small labeled dataset is created in which each input is paired with a pseudo-stimulus. These pseudo-stimuli are designed to guide the LLM's responses in the desired direction for the task at hand.
For instance, in a summarization task, the pseudo-stimulus could consist of keywords or phrases drawn from a reference summary. Similarly, for dialogue generation tasks, dialogue acts such as requests, questions or statements can be used as pseudo-stimuli.
These stimuli serve as signals that the policy model uses to generate task-specific inputs that effectively direct the LLM's output toward the target behavior.
The labeled dataset used for SFT can be relatively small, because the goal is to give the policy language model the knowledge needed to generate stimuli, not to train a massive LLM from scratch. This makes SFT a resource-efficient way to bootstrap the policy model with foundational knowledge of the task-specific requirements.4
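The sketch below shows what this SFT step could look like with a Hugging Face T5 checkpoint, assuming the pseudo-stimuli are keywords derived from reference summaries; extract_keywords() is a naive stand-in for illustration, not part of the published DSP code.

    # A minimal SFT sketch for the stimulus policy model: each training pair maps
    # an article to a pseudo-stimulus built from its reference summary.
    import torch
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

    def extract_keywords(reference_summary: str) -> str:
        # Naive stand-in: derive pseudo-stimulus keywords from the reference summary.
        return "; ".join(reference_summary.split()[:10])

    def sft_step(article: str, reference_summary: str) -> float:
        stimulus = extract_keywords(reference_summary)        # pseudo-stimulus target
        enc = tokenizer("generate keywords: " + article,
                        return_tensors="pt", truncation=True, max_length=512)
        labels = tokenizer(stimulus, return_tensors="pt",
                           truncation=True, max_length=64).input_ids
        loss = model(**enc, labels=labels).loss               # standard seq2seq loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()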
Refinement through reinforcement learning (RL)
After the initial fine-tuning with SFT, the policy model is optimized through reinforcement learning (RL). RL enables the policy model to explore and refine its ability to generate stimuli that lead to higher-quality LLM outputs. The core idea in this phase is to use a reward function to evaluate the effectiveness of the generated stimuli.
For example, in summarization tasks, the reward function can be based on metrics such as ROUGE or BLEU scores, which measure the quality of the generated summary against a reference summary.
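As an example of such a reward, the sketch below scores the LLM's summary against the reference with ROUGE-L F1 using the rouge_score package; treat it as a simplified stand-in for the full reward design.

    # A minimal reward sketch for the RL phase: the stimulus is rewarded by how
    # close the resulting LLM summary is to the reference summary (ROUGE-L F1).
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

    def stimulus_reward(llm_summary: str, reference_summary: str) -> float:
        return scorer.score(reference_summary, llm_summary)["rougeL"].fmeasure

    # A higher reward means the stimulus steered the LLM closer to the reference.
    r = stimulus_reward(
        llm_summary="The council approved the transit plan on Tuesday.",
        reference_summary="On Tuesday, the city council approved a new transit plan.",
    )
    print(f"reward = {r:.3f}")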
By focusing on training the policy model instead of the LLM directly, DSP overcomes the challenges associated with fine-tuning black box models, leading to a more efficient and scalable method.
Figure 1: Architecture of DSP framework
Directional stimulus prompting has notable advantages and some challenges, making it an intriguing yet intricate technique. Here’s a closer examination of its merits and demerits.5
Pros:
Targeted attention mechanism: The targeted attention mechanism in DSP emphasizes relevant tokens or information, enhancing accuracy and efficiency by concentrating processing on essential components.
Optimized resource usage: By concentrating on pertinent stimuli, directional stimulus prompting reduces dataset requirements, resulting in faster processing times and lower computational costs.
Enhanced precision: By isolating and emphasizing the most relevant input tokens, directional stimulus prompting boosts the accuracy of language model responses and interpretations.
Adaptability: This approach can be customized for various language tasks, ranging from text generation to sentiment analysis, offering versatility across different natural language processing applications.
Cons:
Reliance on accurate cues: The success of directional stimulus prompting heavily relies on precise stimuli, which can be challenging to achieve in complex or noisy environments. If the context or stimuli undergo significant changes, the method's effectiveness might diminish, resulting in reduced reliability.
Configuration complexity: Setting up directional stimuli needs careful design and calibration, which can make the initial configuration process more complicated.
Limited generalization: Its capacity to generalize across different signal types or unexpected input variations is limited, restricting its applicability in wider contexts.
Directional stimulus prompting (DSP) shows great potential across various NLP tasks, effectively guiding models to enhance their performance.
Summarization: DSP is used to generate summaries that align more closely with reference summaries. In reported experiments using just 4,000 samples from the CNN/Daily Mail dataset, DSP improved metrics such as ROUGE and BLEU, as well as human preference scores, by 4–13%, surpassing some fully supervised models.6
Dialogue response generation: In task-oriented dialogue generation, DSP assisted ChatGPT in producing more accurate and relevant responses. For example, with only 80 dialogues from the MultiWOZ dataset, DSP achieved a performance boost of 41.4%, outpacing several state-of-the-art models (such as ChatGPT, Codex and InstructGPT) trained on larger datasets.7
Chain-of-thought reasoning: DSP also enhances chain-of-thought reasoning by generating instance-specific prompts that outperformed human-designed and automatically generated task-specific prompts, leading to improved reasoning accuracy. These examples illustrate how DSP can offer targeted guidance, enhancing model performance across a range of NLP applications.8
1 Zekun Li, Baolin Peng, Pengcheng He, Michel Galley, Xifeng Yan and Jianfeng Gao. Guiding Large Language Models via Directional Stimulus Prompting. arXiv:2302.11520, 2023.
https://github.com/Leezekun/Directional-Stimulus-Prompting.
2 Sun, T., et al. Black-box tuning for language-model-as-a-service. In International Conference on Machine Learning, pp. 20841–20855. PMLR, 2022.
3 OpenAI. GPT-4 technical report, 2023.
4 Wanwei He, et al., Galaxy: A generative pre-trained model for task-oriented dialog with semi-supervised learning and explicit policy injection. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10749–10757, 2022.
5 Fei Liu. A Systematic Survey on Large Language Models for Algorithm Design. arXiv:2410.14716, 2024.
6 Goyal, T., Li, J. J., and Durrett, G. News summarization and evaluation in the era of GPT-3. arXiv preprint arXiv:2209.12356, 2022.
7 Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C., and Zaharia, M. Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv preprint arXiv:2212.14024, 2022.
8 Shi, W., Min, S., Yasunaga, M., Seo, M., James, R., Lewis, M., Zettlemoyer, L., and Yih, W.-t. Replug: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652, 2023.