# Lecture 10: Prompting (Chapter 3)

**CSC 375/575 - Generative AI**  
**Prof. Rongyu Lin, Quinnipiac University**

Based on Chapter 3 "Prompting" from *Foundations of Large Language Models* (NiuTrans Research)

---

```python
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Image
```

# 3.1 Prompt Design

## Introduction

**Prompting** is the process of providing specific instructions or examples to guide an LLM in generating desired outputs.

### Mathematical Foundation

Given a prompt $\mathbf{x}$ (input text), the model generates output $\mathbf{y}$ by maximizing:

$$\text{Pr}(\mathbf{y} \mid \mathbf{x})$$

The model predicts the most likely continuation token-by-token.
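
The token-by-token view can be made explicit with the chain rule of probability. As a sketch, assume the output is a sequence of $n$ tokens $y_1, \ldots, y_n$ (this indexing is introduced here for illustration and is not part of the lecture's notation):

$$\text{Pr}(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{n} \text{Pr}(y_i \mid \mathbf{x}, y_1, \ldots, y_{i-1})$$

Choosing the most likely token at each step (greedy decoding) is a common approximation to maximizing this product. It does not guarantee the globally most probable sequence.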

## 3.1.1 Components of a Prompt

The term prompt is used in many different ways. In this chapter we define a **prompt** as the input text to an LLM, denoted by $\mathbf{x}$. The LLM generates a text $\mathbf{y}$ by maximizing the probability $\operatorname{Pr}(\mathbf{y} \mid \mathbf{x})$. In this generation process, the prompt acts as the condition on which we make predictions, and it can contain any information that helps describe and solve the problem.

### Prompt Templates

A prompt can be obtained using a **prompt template** (or template for short) [Liu et al., 2023a]. A template is a piece of text containing placeholders or variables, where each placeholder can be filled with specific information. Here are two templates for asking the LLM for weekend suggestions (from textbook, Page 104-105):

**Template 1:**
```
Please give me some suggestions for a fun weekend.
```

**Template 2:**
```
If {premise}, what are your suggestions for a fun weekend.
```

In the first template, we simply instruct the LLM to return some suggestions. So the template is just a piece of text with no variables. In the second template, the variable `{premise}` needs to be specified by the users to provide a premise for making suggestions. For example, if we input:

$$\text{premise} = \text{the weather is nice this weekend}$$

then we can generate a prompt:

```
If the weather is nice this weekend, what are your suggestions for a fun weekend.
```

**Example: Multiple variables (from textbook, Page 105)**

We can also design a template with multiple variables. Here is an example in which we compare the two sentences in terms of their semantic similarity:

```
Here is a sentence
{sentence1}
Here is another sentence
{sentence2}
Compute the semantic similarity between the two sentences
```
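
Filling a template is plain string substitution. The sketch below uses Python's built-in `str.format` on the two templates above. The helper name `fill_template` and the two sample sentences are made up for illustration.

```python
# Templates copied from the examples above; the helper name is hypothetical.
WEEKEND_TEMPLATE = "If {premise}, what are your suggestions for a fun weekend."
SIMILARITY_TEMPLATE = (
    "Here is a sentence\n"
    "{sentence1}\n"
    "Here is another sentence\n"
    "{sentence2}\n"
    "Compute the semantic similarity between the two sentences"
)


def fill_template(template: str, **variables: str) -> str:
    """Replace every {placeholder} in the template with the given value."""
    return template.format(**variables)


weekend_prompt = fill_template(
    WEEKEND_TEMPLATE, premise="the weather is nice this weekend"
)
similarity_prompt = fill_template(
    SIMILARITY_TEMPLATE,
    sentence1="The cat sat on the mat.",      # made-up sentence
    sentence2="A cat was sitting on a rug.",  # made-up sentence
)
print(weekend_prompt)
print(similarity_prompt)
```

`str.format` raises a `KeyError` when a placeholder has no value, which is a convenient check that no variable was forgotten. Literal braces in a template would need to be escaped as `{{` and `}}`.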

### Common Prompt Formats

**1. "name:content" Format**

A popular way to format prompts is to write each input or output in a "name:content" style. For example, we can describe a conversation between two people, named John and David, and use the LLM to continue the conversation. A template of such prompts is given by:

```
John: {utterance1}
David: {utterance2}
John: {utterance3}
David: {utterance4}
John: {utterance5}
David: {utterance6}
John: {utterance7}
David: _
```
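
One possible way to assemble such a prompt in code is shown below. The function name and the sample utterances are hypothetical. The only idea taken from the template is that each turn is written as `name: content` and the prompt ends with the name of the speaker whose turn the LLM should produce.

```python
from typing import List, Tuple


def build_dialogue_prompt(turns: List[Tuple[str, str]], next_speaker: str) -> str:
    """Format each turn as 'name: content' and leave the last turn open."""
    lines = [f"{name}: {utterance}" for name, utterance in turns]
    lines.append(f"{next_speaker}:")  # the LLM continues from here
    return "\n".join(lines)


# Made-up utterances, only to show the layout.
turns = [
    ("John", "Are you free this weekend?"),
    ("David", "I think so. Why?"),
    ("John", "I was thinking of going hiking."),
]
dialogue_prompt = build_dialogue_prompt(turns, next_speaker="David")
print(dialogue_prompt)
```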

**2. Question-Answer Format**

The "name:content" format can be used to define the task that we want the LLM to perform. For example, given that "Q" and "A" are commonly used abbreviations for "Question" and "Answer", respectively, we can use the following template to do question-answering:

```
Q: {question}
A: _
```

**3. Task Specification Format**

This format can be used to describe more complex tasks. For example, the following is an example of providing a specification for a translation task (from textbook, Page 105):

```
Task: Translation
Source language: English
Target language: Chinese
Style: Formal text
Template: Translate the following sentence: {sentence}
```

**Footnote 1 (from textbook):** In practical systems, it is common to represent and store such data in key-value pairs, such as the JSON format.
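
As a rough illustration of the footnote, the translation specification above can be held in a Python dictionary, stored as JSON with the standard `json` module, and rendered back into "name:content" lines. The function `render_spec` and the sample sentence are assumptions made for this sketch.

```python
import json

# The task specification from the example above, as key-value pairs.
task_spec = {
    "Task": "Translation",
    "Source language": "English",
    "Target language": "Chinese",
    "Style": "Formal text",
    "Template": "Translate the following sentence: {sentence}",
}


def render_spec(spec: dict, **variables: str) -> str:
    """Render each key-value pair as a 'name: content' line, filling placeholders."""
    lines = []
    for key, value in spec.items():
        lines.append(f"{key}: {value.format(**variables)}")
    return "\n".join(lines)


stored = json.dumps(task_spec, ensure_ascii=False, indent=2)  # what would be saved
restored = json.loads(stored)                                 # what would be loaded
spec_prompt = render_spec(restored, sentence="Good morning.")  # made-up sentence
print(spec_prompt)
```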

**4. Role-Based Prompts with System Information**

When the problem is difficult to describe in an attribute-based manner, it is more common to instruct LLMs with a clear and detailed description. One example is to assign a role to LLMs and provide sufficient context. The following is a template that instructs an LLM to act as an expert and answer questions from children (from textbook, Page 106):

```
You are a computer scientist with extensive knowledge in the field of deep learning.
Please explain the following computer-related concept to a child around 10 years old,
using simple examples whenever possible.
{concept}
```

Here the text "You are a computer scientist ... deep learning." is sometimes called **system information**, and is provided to help the LLM understand the context or constraints of the task it is being asked to perform.

## 3.1.2 In-Context Learning

### Zero-Shot Learning

Zero-shot learning, as its name implies, does not involve a traditional "learning" process. Instead, it directly applies LLMs to new problems that were not observed during training. In practice, we can iteratively adjust prompts to guide the LLMs toward better responses, without demonstrating problem-solving steps or providing examples.

**Example: Grammar correction (from textbook, Page 106-107)**

Consider the following example. Suppose we want to use an LLM as an assistant that can help correct English sentences. A zero-shot learning prompt is given by:

**INPUT (Prompt):**
```
SYSTEM: You are a helpful assistant, and are great at grammar correction.
USER: You will be provided with a sentence in English. The task is to output the
correct sentence.
Input: She don't like going to the park.
Output:
```

**OUTPUT (LLM Response):**
```
She doesn't like going to the park.
```

Here field labels such as SYSTEM and USER (printed in gray in the textbook) indicate the different fields of the prompt.

### Additional Few-Shot Learning Examples

**Example 1: Translation Pattern (from textbook, Page 108)**

It is also possible to use simpler patterns to achieve few-shot learning. For example, one can use the following few-shot learning prompt for translating words from Chinese to English:

| DEMO | 现在 | → | now |
|:---:|:---:|:---:|:---:|
|  | 来 | → | come |
|  | 去 | → | go |
|  | 男孩 | → | boy |
| USER | 女孩 | → |  |
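
A few-shot prompt of this kind is just the demonstrations followed by the unfinished query. The sketch below builds it as plain text. The function name and the choice to put one pair per line are assumptions, not something the textbook prescribes.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demos: List[Tuple[str, str]], query: str, arrow: str = "→"
) -> str:
    """Write each demonstration as 'source → target' and leave the query open."""
    lines = [f"{source} {arrow} {target}" for source, target in demos]
    lines.append(f"{query} {arrow}")  # the LLM is expected to fill in the target
    return "\n".join(lines)


# Demonstrations and query taken from the table above.
demos = [("现在", "now"), ("来", "come"), ("去", "go"), ("男孩", "boy")]
few_shot_prompt = build_few_shot_prompt(demos, "女孩")
print(few_shot_prompt)
```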

**Example 2: Mathematical Reasoning (from textbook, Page 108)**

If the LLM is powerful enough, few-shot learning can enable it to address complex problems, such as mathematical reasoning. For example, consider the following task of summing two numbers and then dividing the sum by their product:

| DEMO | 12 5 | → | $(12+5)/(12 \times 5)=0.283$ |
|:---:|:---:|:---:|:---:|
|  | 3 1 | → | $(3+1)/(3 \times 1)=1.33$ |
|  | -9 4 | → | $(-9+4)/(-9 \times 4)=0.138$ |
|  | 15 15 | → | $(15+15)/(15 \times 15)=0.133$ |
| USER | 19 73 | → |  |
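
To check the demonstrations, the rule can be written as a small function. This sketch reads each input as a pair of numbers $a$ and $b$, as the expressions in the output column indicate, and computes $(a+b)/(a \times b)$. The function name is made up.

```python
def sum_over_product(a: float, b: float) -> float:
    """Sum two numbers, then divide the sum by their product."""
    return (a + b) / (a * b)


# Number pairs read off the demonstrations in the table above.
demo_inputs = [(12, 5), (3, 1), (-9, 4), (15, 15)]
for a, b in demo_inputs:
    print(f"{a} {b} -> {sum_over_product(a, b):.4f}")
```

For the pair -9 and 4, the exact value is $5/36 \approx 0.1389$, so the 0.138 in the table is a truncated value rather than a rounded one.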

### Effectiveness of In-Context Learning

In many practical applications, the effectiveness of in-context learning relies heavily on the quality of prompts and the fundamental abilities of pre-trained LLMs. On one hand, we need a significant prompt engineering effort to develop appropriate prompts that help LLMs learn more effectively from demonstrations. On the other hand, stronger LLMs can make better use of in-context learning for performing new tasks.

**Example: Limitations (from textbook, Page 108)**

For example, suppose we wish to use an LLM to translate words from Inuktitut to English. If the LLM lacks pre-training on Inuktitut data, its understanding of Inuktitut will be weak, and it will be difficult for the model to perform well in translation regardless of how we prompt it. In this case, we need to continue training the LLM with more Inuktitut data, rather than trying to find better prompts.

### Theoretical Perspectives

It might be interesting to explore how in-context learning emerges during pre-training and why it works during inference. One simple understanding is that LLMs have gained some knowledge of problem-solving, but there are many possible predictions, which are hard to distinguish when the models confront new problems. Providing demonstrations can guide the LLMs to follow the "correct" paths.

Furthermore, some researchers have tried to interpret in-context learning from several different perspectives, including (from textbook, Page 108):

- **Bayesian inference** [Xie et al., 2022]
- **Gradient descent** [Dai et al., 2023; Von Oswald et al., 2023]
- **Linear regression** [Akyürek et al., 2023]
- **Meta learning** [Garg et al., 2022]
- And so on.

---

### Exercise: Zero-shot vs Few-shot

Try both with `ollama run llama3.2` and compare results:

```
Classify sentiment: "This restaurant was terrible."
```

```
"Amazing service!" -> Positive
"Horrible food" -> Negative
"This restaurant was terrible." -> ?
```

---

## 3.1.3 Prompt Engineering Strategies

Designing prompts is highly empirical. In general, there are many ways to prompt an LLM for performing the same task, and we need to perform a number of trial-and-error runs to find a satisfactory prompt. To write good prompts more efficiently, one can follow certain strategies. Examples of common prompting principles include (from textbook, Page 109):

### Strategy 1: Describing the task as clearly as possible

When we apply an LLM to solve a problem, we need to provide a precise, specific, and clear description of the problem and instruct the LLM to perform as we expect. This is particularly important when we want the output of the LLM to meet certain expectations.

**Example (from textbook, Page 109):**

For example, suppose we are curious about climate change. A simple prompt for asking the LLM to provide some information is:

```
Tell me about climate change.
```

Since this instruction is too general, the LLM may generate a response that addresses any aspect of climate change, which may not align with our specific interests. In this case, we can instead use prompts that are specific and detailed. One such example is:

```
Provide a detailed explanation of the causes and effects of climate change, including
the impact on global temperatures, weather patterns, and sea levels. Also, discuss
possible solutions and actions being taken to mitigate these effects.
```

Now suppose we intend to explain climate change to a 10-year-old child. We can adjust the above prompt further:

```
Explain the causes and effects of climate change to a 10-year-old child. Talk about
how it affects the weather, sea levels, and temperatures. Also, mention some things
people are doing to help. Try to explain in simple terms and do not exceed 500 words.
```

### Strategy 2: Guiding LLMs to think

LLMs have exhibited surprisingly good capabilities to "think". A common example is that well-developed LLMs have achieved impressive performance in mathematical reasoning tasks, which are considered challenging. In prompt engineering, the "thinking" ability of LLMs needs to be activated through appropriate prompting, especially for problems that require significant reasoning efforts.

In many cases, an LLM that is instructed to "think" can produce completely different results compared with the same LLM that is instructed to perform the task straightforwardly. For example, Kojima et al. [2022] found that simply appending **"Let's think step by step"** to the end of each prompt can improve the performance of LLMs on several reasoning tasks.
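
In code, this trick amounts to one string concatenation. The sketch below is a minimal illustration. The helper name, the `separator` parameter, and the sample question are assumptions.

```python
COT_TRIGGER = "Let's think step by step."


def add_step_by_step(
    prompt: str, trigger: str = COT_TRIGGER, separator: str = " "
) -> str:
    """Append the reasoning trigger to the end of a prompt."""
    return prompt.rstrip() + separator + trigger


# Made-up question in the Q/A format introduced earlier.
question_prompt = "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\nA:"
cot_prompt = add_step_by_step(question_prompt)
print(cot_prompt)
```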

LLMs can be prompted to "think" in a number of ways. One method is to instruct LLMs to generate steps for reasoning about the problem before reaching the final answer.

**Method 1: Instruct LLMs to generate reasoning steps**

For example, consider a task of solving mathematical problems. See below for a simple prompt for this task (from textbook, Page 110):

```
You are a mathematician. You will be provided with a math problem. Please solve the problem.
```

Since solving math problems requires a detailed reasoning process, LLMs would probably make mistakes if they attempted to work out the answer directly. So we can explicitly ask LLMs to follow a given reasoning process before coming to a conclusion:

```
You are a mathematician. You will follow these detailed reasoning steps when solving
math problems.

Step 1: Problem Interpretation.
The mathematician carefully listens to your query and understands the intricate details
of the mathematical challenge you have presented.

Step 2: Strategy Formulation.
Drawing upon their extensive knowledge, the mathematician chooses the most effective
strategy tailored to the type of math problem, whether it is algebra, calculus, or
geometry.

Step 3: Detailed Calculation.
With precision and expertise, the mathematician performs the necessary calculations
step by step, adhering to all mathematical principles.

Step 4: Solution Review.
Before providing the final answer, the mathematician meticulously checks the
calculations for accuracy and offers a concise explanation or rationale for the solution.

You will be provided with a math problem. Please solve the problem.
{problem}
```

**Method 2: Multiple rounds of interaction with LLMs**

Another method to guide LLMs to "think" is through multiple rounds of interaction with LLMs. For example, as a first step, we can instruct LLMs to solve the problem directly (from textbook, Page 110-111):

```
You will be provided with a math problem. Please solve the problem.
{problem}
```

Now we have an initial answer to the problem. As a second step, we prompt LLMs to evaluate the correctness of the answer and, if necessary, rework it to find a better solution:

```
You will be provided with a math problem, along with a solution. Evaluate the
correctness of this solution, and identify any errors if present. Then, work out
your own solution.
Problem: {problem}
Solution: {solution}
```
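
The two rounds can be chained in a few lines. In the sketch below, `llm` is any function that maps a prompt string to a response string. `fake_llm` is a hypothetical stand-in that returns canned text so the example runs without a model, and in practice it would be replaced by a real model call.

```python
from typing import Callable, List

SOLVE_TEMPLATE = (
    "You will be provided with a math problem. Please solve the problem.\n"
    "{problem}"
)
REVIEW_TEMPLATE = (
    "You will be provided with a math problem, along with a solution. Evaluate the "
    "correctness of this solution, and identify any errors if present. Then, work out "
    "your own solution.\n"
    "Problem: {problem}\n"
    "Solution: {solution}"
)


def solve_then_review(problem: str, llm: Callable[[str], str]) -> str:
    """Round 1: solve directly. Round 2: review the first answer and rework it."""
    first_answer = llm(SOLVE_TEMPLATE.format(problem=problem))
    return llm(REVIEW_TEMPLATE.format(problem=problem, solution=first_answer))


seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: returns canned text and logs the prompt."""
    seen_prompts.append(prompt)
    if "Solution:" in prompt:  # second round
        return "The solution contains an arithmetic slip. Correct answer: 8"
    return "7"                 # first round, deliberately wrong


final_answer = solve_then_review("What is 3 + 5?", fake_llm)
print(final_answer)
```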

The prompts presented here are closely related to a long line of research on reasoning problems in LLMs. A complete discussion of all related issues is not possible here because this topic covers a large family of methods. Section 3.2 discusses in more detail how to improve prompting through more reasoning.

### Strategy 3: Providing reference information

As discussed in the previous section, we can include demonstrations in prompts and allow LLMs to learn in context from these demonstrations how to perform the task. In fact, given the remarkable language understanding ability of LLMs, we can add any type of text to the prompts so that these models can make predictions based on enriched contexts.

In many applications, we have various information that is relevant to user queries. Instead of using LLMs to make unconstrained predictions, we often want LLMs to produce outputs that are confined to the relevant text. One such example is retrieval-augmented generation (RAG), where the relevant text for the user query is provided by calling an information retrieval (IR) system, and we prompt LLMs to generate responses based on this provided relevant text. The following prompt shows an example (from textbook, Page 111):

```
You are an expert that can generate answers to input queries. You have now been
provided with a query and the corresponding context information. Please generate an
answer based on this context information. Note that you need to provide the answer
in your own words, not just copy from the context provided.

Context information: {IR-result}
Query: {query}
```

If the context information is highly reliable, we can even restrict LLMs to answering using only the provided text. An example prompt is shown as follows (from textbook, Page 112):

```
You are an expert tasked with generating answers from input queries. You have been
provided with a query and corresponding context information, organized in a table
where each row represents a useful record. Please generate an answer using only this
context information. Ensure that you provide the answer in your own words.
Context information: {table}
Query: {query}
```
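
The retrieve-then-prompt flow can be sketched end to end with a toy retriever. Everything below apart from the prompt wording is an assumption: the word-overlap scoring stands in for a real IR system, the three documents are made up, and the placeholder `{IR-result}` is renamed `{context}` to keep the Python format field simple.

```python
import re
from typing import List, Set

RAG_TEMPLATE = (
    "You are an expert that can generate answers to input queries. You have now been "
    "provided with a query and the corresponding context information. Please generate "
    "an answer based on this context information. Note that you need to provide the "
    "answer in your own words, not just copy from the context provided.\n\n"
    "Context information: {context}\n"
    "Query: {query}"
)


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True
    )
    return ranked[:top_k]


# Made-up document collection.
documents = [
    "The library opens at 9 am on weekdays.",
    "The cafeteria serves lunch from 11 am to 2 pm.",
    "Parking permits are issued by the campus security office.",
]
query = "When does the library open?"
context = "\n".join(retrieve(query, documents))
rag_prompt = RAG_TEMPLATE.format(context=context, query=query)
print(rag_prompt)
```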

When dealing with real-world problems, we often have prior knowledge and additional information about the problems that help produce better answers. Considering such information in prompting is generally helpful in improving the result.

### Strategy 4: Paying attention to prompt formats

In general, the performance of LLMs is highly sensitive to the prompts we input. Sometimes a small modification to a prompt can lead to a big change in model output. An interesting example is that changing the order of sentences in a prompt may cause LLMs to generate different results. To make prompts easy to read and reduce ambiguity, it is common to format them in a way that ensures clarity.

**Formatting Approaches:**

1. **Define fields for prompts** and fill different information in each field
2. **Use code-style prompts** for LLMs which can understand and generate both natural language and code

**Example: Code-style prompt for translation (from textbook, Page 112):**

**INPUT (Prompt):**
```
[English] = [I have an apple.]
[German] = [Ich habe einen Apfel.]
[English] = [I have an orange.]
[German] =
```

**OUTPUT (LLM Response):**
```
[Ich habe eine Orange.]
```
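
A code-style prompt like this is easy to generate and to parse, which is part of its appeal. The sketch below builds the prompt from demonstration pairs and pulls the bracketed text out of a response. Both function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def build_code_style_prompt(
    pairs: List[Tuple[str, str]],
    source_text: str,
    source_lang: str = "English",
    target_lang: str = "German",
) -> str:
    """Write demonstrations as '[Lang] = [text]' lines and leave the last target open."""
    lines = []
    for source, target in pairs:
        lines.append(f"[{source_lang}] = [{source}]")
        lines.append(f"[{target_lang}] = [{target}]")
    lines.append(f"[{source_lang}] = [{source_text}]")
    lines.append(f"[{target_lang}] =")
    return "\n".join(lines)


def parse_bracketed(response: str) -> Optional[str]:
    """Return the text inside the first pair of square brackets, or None."""
    match = re.search(r"\[(.*?)\]", response)
    return match.group(1) if match else None


code_prompt = build_code_style_prompt(
    [("I have an apple.", "Ich habe einen Apfel.")], "I have an orange."
)
print(code_prompt)
print(parse_bracketed("[Ich habe eine Orange.]"))
```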

LLMs can receive text in various formats. This allows us to use control characters, XML tags, and specific formatting to represent complex data. It is also useful to specify how the input and output should be formatted or structured. For example, we can delimit sections of text using quotes and prompt LLMs accordingly (e.g., adding a sentence like "the input text is delimited by double quotes" to the prompt).

Above, we have discussed only a few strategies for writing good prompts. There are, of course, many such methods, and one needs to develop their own through practice. Interested readers can refer to various online documents for more information, such as OpenAI's manual on the GPT series models.

**Footnote 2 (from textbook):** See https://platform.openai.com/docs/guides/prompt-engineering/six-strategies-for-getting-better-results.

---

### Exercise: Vague vs Specific Instructions

Compare the quality of responses:

```
Explain machine learning.
```

```
Explain supervised learning to a beginner in 3 sentences using a spam detection example.
```

---

## 3.1.4 More Examples

In this subsection, we consider more examples of prompting LLMs to perform various NLP tasks. The motivation here is not to give standard prompts for these tasks, but rather to use simple examples to illustrate how LLMs can be prompted to deal with NLP problems.

### 3.1.4.1 Text Classification

Text classification is perhaps one of the most common problems in NLP. Many tasks can be broadly categorized as assigning pre-defined labels to a given text. Here we consider the polarity classification problem in sentiment analysis. We choose polarity classification for illustration because it is one of the most popular and well-defined text classification tasks. In a general setup of polarity classification, we are required to categorize a given text into one of three categories: negative, positive, or neutral. Below is a simple prompt for doing this (for easy reading, we highlight the task description in the prompt) (from textbook, Page 113):

**INPUT (Prompt):**
```
Analyze the polarity of the following text and classify it as positive, negative, or
neutral.

Text:
The service at the restaurant was slower than expected, which was a bit frustrating.
```

**OUTPUT (LLM Response):**
```
The polarity of the text can be classified as negative.
```

**To make the example complete, we show the response generated by the LLM (underlined text in textbook).**

Although the answer is correct, the LLM gives it not as a label but as text describing the result. The problem is that LLMs are designed to generate text, not to assign labels to text, so they treat classification problems as text generation problems. As a result, we need another step to map the LLM's output to the label space (call it **label mapping**); that is, we extract "negative" from "The polarity of the text can be classified as negative". This is trivial in most cases because we can identify label words via simple heuristics. But occasionally, LLMs may not express the classification results using these label words. In this case, the problem becomes more complicated, as we need some way to map the generated text or words to predefined label words.
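
One such simple heuristic is to look for label words in the generated text. The sketch below returns a label only when exactly one label word appears and otherwise signals that the mapping failed. The function name and the rule for ambiguous text are assumptions made for illustration.

```python
import re
from typing import Optional, Sequence

LABELS = ("positive", "negative", "neutral")


def map_to_label(generated_text: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Return the label word if exactly one label appears in the text, else None."""
    words = re.findall(r"[a-z]+", generated_text.lower())
    found = [label for label in labels if label in words]
    return found[0] if len(found) == 1 else None


print(map_to_label("The polarity of the text can be classified as negative."))
print(map_to_label("The reviewer sounds unhappy."))  # no label word: mapping fails
```

The second call returns `None`, which is the harder case described above: the response expresses the result without using any of the predefined label words.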

**Method 1: Cloze Task (from textbook, Page 114)**

One method to induce output labels from LLMs is to reframe the problem as a cloze task. For example, the following shows a cloze-like prompt for polarity classification:

```
Analyze the polarity of the following text and classify it as positive, negative, or
neutral.

Text:
The service at the restaurant was slower than expected, which was a bit frustrating.

The polarity of the text is _______
```

We can use LLMs to complete the text and fill the blank with the most appropriate word. Ideally, we wish the filled word would be positive, negative, or neutral. However, LLMs are not guaranteed to generate these label words. One method to address this problem is to constrain the prediction to the set of label words and select the one with the highest probability. Then, the output label is given by:

$$\text{label} = \underset{y \in Y}{\arg \max} \operatorname{Pr}(y \mid \mathbf{x})$$

where $y$ denotes the word filled in the blank, and $Y$ denotes the set of label words {positive, negative, neutral}.
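
The constrained argmax can be sketched as follows. The log-probabilities are made-up numbers standing in for what a model might assign to candidate words for the blank. How to obtain such scores depends on the model or API in use and is not shown here.

```python
import math
from typing import Dict, Sequence

LABEL_WORDS = ("positive", "negative", "neutral")


def pick_label(log_probs: Dict[str, float], label_words: Sequence[str]) -> str:
    """Return the label word with the highest score, ignoring all other words."""
    return max(label_words, key=lambda word: log_probs.get(word, -math.inf))


def renormalize(log_probs: Dict[str, float], label_words: Sequence[str]) -> Dict[str, float]:
    """Turn the scores of the label words into probabilities that sum to 1."""
    weights = {w: math.exp(log_probs.get(w, -math.inf)) for w in label_words}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


# Hypothetical log-probabilities for the word in the blank (not real model output).
blank_log_probs = {
    "frustrating": -0.3,
    "negative": -0.4,
    "neutral": -2.3,
    "slow": -3.0,
    "positive": -4.6,
}

unconstrained = max(blank_log_probs, key=blank_log_probs.get)
label = pick_label(blank_log_probs, LABEL_WORDS)
label_probs = renormalize(blank_log_probs, LABEL_WORDS)
print(unconstrained, label, label_probs)
```

With these invented scores, the unconstrained best word is "frustrating", which is not a label, while restricting the search to $Y$ yields "negative".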

**Method 2: Constrain output with prompts (from textbook, Page 114-115)**

Another method of using LLMs to generate labels is to constrain the output with prompts. For example, we can prompt LLMs to predict within a controlled set of words. Here is an example:

**INPUT (Prompt):**
```
Analyze the polarity of the following text and classify it as positive, negative, or neutral.

Text:
The service at the restaurant was slower than expected, which was a bit frustrating.

What is the polarity of the text?
Just answer: positive, negative, or neutral.
```

**OUTPUT (LLM Response):**
```
Negative
```

Sentiment analysis is a common NLP problem that has probably been well understood by LLMs through pre-training or fine-tuning. Thus we can prompt LLMs using simple instructions to perform the task. However, for new classification problems, it may be necessary to provide additional details about the task, such as the classification standards, so that the LLMs can perform correctly. To do this, we can add a more detailed description of the task and/or demonstrate classification examples in the prompts.

**Method 3: Detailed task description (from textbook, Page 115)**

The following prompt adds a definition of each category to the task description:

**INPUT (Prompt):**
```
Analyze the polarity of the following text and classify it as positive, negative, or neutral.
Here's what each category represents:

Positive: This indicates that the text conveys a positive emotion or attitude. For example,
texts expressing happiness, satisfaction, excitement, or admiration are considered positive.

Negative: This refers to a text that expresses a negative emotion or attitude. It encompasses
feelings of sadness, anger, frustration, or criticism.

Neutral: Neutral sentiment is used to describe texts that do not exhibit clear positive or
negative emotions but instead convey informational, factual, or indifferent tones.

Text:
The service at the restaurant was slower than expected, which was a bit frustrating.

What is the polarity of the text?
```

**OUTPUT (LLM Response):**
```
Negative
```

While it seems straightforward to use LLMs for classification problems, there are still issues that have not been well addressed. For example, when dealing with a large number of categories, it remains challenging to effectively prompt LLMs. Note that if we face a very difficult classification problem and have a certain amount of labeled data, fine-tuning LLMs or adopting "BERT + classifier"-like architectures is also desirable.

---

### 3.1.4.2 Information Extraction

Many NLP problems can be regarded as information extraction problems, involving the identification or extraction of specific pieces of information from unstructured text. This information can include named entities, relationships, events, and other relevant data points. The goal of information extraction is to transform raw data into a format that can be easily analyzed and used in various downstream applications.

As information extraction covers a wide range of problems, we cannot discuss them all here. Instead, we start with the task of **named entity recognition** - a task that has long been a concern in NLP. Named entity recognition is a process that detects and classifies key information in text into specific groups. These key pieces of information, known as named entities, typically include proper names and are categorized into distinct classes such as people, locations, organizations, dates, monetary values, and percentages.

**Example 1: Extracting person names (from textbook, Page 116)**

Consider first a simple example of extracting person names from a text:

**INPUT (Prompt):**
```
Identify all person names in the provided text.

Text:
Is the UK really doing that badly or have travel trends shifted, post-pandemic? For Tom
Jenkins, CEO of the European Tourism Organisation, it's the latter. "I think the UK is
doing perfectly well but we'll see more people going to Europe," he says of 2024, adding
that there's "allegedly a slight plateau of American demand for the UK."
```

**OUTPUT (LLM Response):**
```
The person name identified in the provided text is: Tom Jenkins
```

**Footnote 3 (from textbook):** The text is from https://edition.cnn.com/travel

**Example 2: Extract all named entities (from textbook, Page 116-117)**

Then, we can extract all named entities:

**INPUT (Prompt):**
```
Identify and classify all named entities in the provided text into categories such as
person names, locations, dates, and organizations. List each entity with its type on one line.

Text:
Is the UK really doing that badly or have travel trends shifted, post-pandemic? For Tom
Jenkins, CEO of the European Tourism Organisation, it's the latter. "I think the UK is
doing perfectly well but we'll see more people going to Europe," he says of 2024, adding
that there's "allegedly a slight plateau of American demand for the UK."
```

**OUTPUT (LLM Response):**
```
1. Tom Jenkins - Person Name
2. European Tourism Organisation - Organization
3. UK - Location
4. Europe - Location
5. 2024 - Date
```
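
Because the prompt asks for one entity per line, the response can be turned into structured data with a small parser. The sketch below assumes the exact `N. entity - type` layout shown in the output above. A real response may deviate from it, in which case non-matching lines are simply skipped.

```python
import re
from typing import List, Tuple

# Matches lines such as "1. Tom Jenkins - Person Name".
LINE_PATTERN = re.compile(r"^\s*\d+\.\s*(.+?)\s+-\s+(.+?)\s*$")


def parse_entities(response: str) -> List[Tuple[str, str]]:
    """Extract (entity, type) pairs from a numbered 'entity - type' list."""
    entities = []
    for line in response.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            entities.append((match.group(1), match.group(2)))
    return entities


ner_response = """1. Tom Jenkins - Person Name
2. European Tourism Organisation - Organization
3. UK - Location
4. Europe - Location
5. 2024 - Date"""

entities = parse_entities(ner_response)
print(entities)
```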

**Example 3: Relation extraction (from textbook, Page 117)**

Given these identified named entities, we can further process the text using other information extraction tasks. For example, we can identify the relationships between these named entities (call it **relation extraction**). The corresponding prompt is shown as follows:

**INPUT (Prompt):**
```
Given a text and a list of named entities identified within it, analyze and describe the
relationships between these entities. Explain how each entity is contextually related to others.

Text:
Is the UK really doing that badly or have travel trends shifted, post-pandemic? For Tom
Jenkins, CEO of the European Tourism Organisation, it's the latter. "I think the UK is
doing perfectly well but we'll see more people going to Europe," he says of 2024, adding
that there's "allegedly a slight plateau of American demand for the UK."

Named Entity:
Tom Jenkins, European Tourism Organisation, UK, Europe, 2024
```

**OUTPUT (LLM Response):**
```
Tom Jenkins - European Tourism Organisation: Tom Jenkins is the CEO of the European Tourism
Organisation. This indicates a professional association where Tom Jenkins likely has a
significant role in directing the organization's activities, especially those related to
tourism in Europe.
...
```

**General Template for Information Extraction (from textbook, Page 117)**

If LLMs have been fine-tuned with instruction following for information extraction tasks, it is relatively easy to perform various information extraction tasks. For example, the following is a prompt template for information extraction:

```
You will be provided with a text. Your task is to {task-description}
Text: {text}
```

Here `task-description` is the instruction for the task. Examples of the instruction include:

- extract the keywords from the text.
- extract the key events from the text.
- detect the coreference and anaphoric links between entities from the text.
- and so on.

Of course, we can provide a more detailed description of the task to prompt LLMs more accurately.

---

### 3.1.4.3 Text Generation

In NLP, the term text generation is used broadly to cover a wide range of different problems, many of which focus on generating text based on given text. These problems can be roughly categorized into two classes:

1. **Text completion**: Continuing to write based on the input text
2. **Text transformation**: Transformation of the input text into another text (discussed in Section 3.1.4.4)

Here, we illustrate the prompts for the former.

**Example 1: Story continuation (from textbook, Page 118)**

A simple example of text completion is continuing to write a story starting with a few given sentences:

**INPUT (Prompt):**
```
You are a writer who is great at writing stories. You are provided with a few sentences
to start with. Please continue the story from the provided sentences.

Start of the story:
One sunny afternoon, Lily discovered a tiny, golden key hidden under a rock in her
backyard. Curious, she picked it up and wondered what it might unlock. Just then, her
cat, Whiskers, began to meow excitedly, leading her towards the old oak tree at the
edge of the garden.
```

**OUTPUT (LLM Response):**
```
Lily followed Whiskers, her heart pounding with excitement and a touch of nervousness.
The old oak tree had always seemed mysterious to her, its gnarled branches casting long
shadows even on the brightest days. As they reached the tree, Whiskers darted ahead and
started pawing at something buried beneath a pile of autumn leaves...
```

**Example 2: Conversation completion (from textbook, Page 118)**

Likewise, we can create a conversation given the conversational context:

**INPUT (Prompt):**
```
You are provided with a conversation between a tourist and a taxi driver. Please complete
the conversation.

Tourist: Could you take me to the downtown museum, please?
Taxi Driver: Of course! First time in the city?
Tourist: Yes, any must-see places around?
Taxi Driver: Definitely check out the botanical gardens after the museum. They're nearby
and quite beautiful.
Tourist: Great, thanks for the tip!
```

**OUTPUT (LLM Response):**
```
Taxi Driver: No problem. Enjoy your visit!
```

**Example 3: Content generation based on requirements (from textbook, Page 118-119)**

Another application of text completion is generating text based on user requirements. The requirements can include style, tone, length, and any specific content that the text should contain. For example, users may need a formal report, a creative story, or a piece of programming code. Here is a prompt for generating a five-character regulated Chinese poem:

**INPUT (Prompt):**
```
You are a poet. Please write a traditional Chinese poem. Here is the requirement.
Theme: Spring's rejuvenation
Structure: Five-character regulated poem
Emotion: Convey a sense of happiness and renewal through vivid spring imagery.
```

**OUTPUT (LLM Response):**
```
春意盎然
翠柳风中舞。
红花雨后新。
山明水又绿。
天宽鸟自频。
```

**Example 4: Code completion (from textbook, Page 119)**

If the LLM is trained to generate language and code, we can prompt it to perform code completion tasks. Here is an example:

**INPUT (Prompt):**
```
Please write a Python function to calculate the average of a list of numbers.
```

**OUTPUT (LLM Response):**
```python
def calculate_average(numbers):
    if numbers:  # Check if the list is not empty
        return sum(numbers) / len(numbers)
    else:
        return 0  # Return 0 if the list is empty
```

---

### 3.1.4.4 Text Transformation

Typical text transformation tasks include machine translation, summarization, text style transfer, and so on. The prompts used to perform these tasks are not significantly different from the ones presented previously. Here, we present a few examples to showcase these prompts (from textbook, Page 119-120).

**Example 1: Machine translation**

**INPUT (Prompt):**
```
Translate the following text from English to Spanish.
Text:
The quick brown fox jumps over the lazy dog.
```

**OUTPUT (LLM Response):**
```
El rápido zorro marrón salta sobre el perro perezoso.
```

**Example 2: Summarization**

```
Summarize the following article in no more than 50 words:

Article:
In recent years, urban areas have been facing significant challenges related to
sustainability and environmental management. Cities are experiencing higher rates of
pollution, increased traffic congestion, and greater demands on infrastructure. This
growth has led to numerous environmental issues, including elevated levels of air and
water pollution, increased waste production, and strained public services...

Urban areas are grappling with sustainability challenges, such as rising pollution,
traffic congestion, and infrastructure demands...
```

**Example 3: Style transfer**

```
Rewrite this text in a formal tone.
Text:
Hey, what's up? Long time no see!

Hello, how have you been? It has been quite some time since we last met!
```

### 3.1.4.5 Question Answering

The question-answering format is inherently simple. For a given question, there is an answer that corresponds to it. For example, in **open-domain question answering**, we expect the system to return an answer in response to a user-submitted question. Prompt templates for general-purpose question answering can be (from textbook, Page 120-121):

```
{question}
```

Or:

```
Question: {question}
Answer:
```
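As a minimal sketch, the second template can be filled with Python's built-in `str.format`. The names `QA_TEMPLATE` and `build_qa_prompt` are illustrative and do not come from the textbook.

```python
# Hypothetical helper: fill the "Question: ... Answer:" template shown above.
QA_TEMPLATE = "Question: {question}\nAnswer:"


def build_qa_prompt(question: str) -> str:
    """Return the prompt text that would be sent to an LLM."""
    return QA_TEMPLATE.format(question=question.strip())


print(build_qa_prompt("What is the capital of France?"))
```

Ending the prompt with `Answer:` leaves the model to continue the text with the answer itself.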

**Important:** Question answering matters in NLP because **many problems can be framed as question-answering tasks**. In particular, many recent reasoning tasks are defined in the form of question answering.

### Benchmark 1: MMLU

**MMLU (Massive Multitask Language Understanding)** [Hendrycks et al., 2021]

- Each example consists of a **multiple-choice question**
- LLMs are required to select the correct answer

**Example from textbook (Page 121):**

**INPUT (Prompt):**
```
(Z, *) is a group with a * b = a + b + 1 for all a, b in Z. 
The inverse of a is

(A) 0
(B) -2
(C) a - 2
(D) (2 + a) * -1
```

**OUTPUT (LLM Response):**
```
D
```
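To see why D is the expected answer: the identity $e$ must satisfy $a + e + 1 = a$, so $e = -1$, and the inverse $b$ of $a$ must satisfy $a + b + 1 = -1$, so $b = -a - 2 = (2 + a) \cdot (-1)$. This reads the final `*` in option D as ordinary multiplication, which is an assumption about the notation. The short check below confirms the result numerically; the function names are illustrative.

```python
def star(a: int, b: int) -> int:
    """The group operation from the question: a * b = a + b + 1."""
    return a + b + 1


IDENTITY = -1  # solves a + e + 1 == a


def inverse(a: int) -> int:
    """Option D, read with ordinary multiplication: (2 + a) * -1."""
    return (2 + a) * -1


for a in range(-50, 51):
    assert star(a, IDENTITY) == a            # -1 acts as the identity
    assert star(a, inverse(a)) == IDENTITY   # option D yields the inverse
print("Option D gives the inverse for every tested a.")
```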

### Benchmark 2: GSM8K

**GSM8K (Grade School Math 8K)** [Cobbe et al., 2021]

- Consists of thousands of grade school math word problems
- For each problem, LLMs return a solution in **natural language**

**Example from textbook (Page 121-122):**

**INPUT (Prompt with demonstration):**
```
DEMO Q: Jerry's two daughters play softball on different teams. They each have 8 games 
this season. Each team practices 4 hours for every game they play. If each game lasts 
for 2 hours, how many hours will Jerry spend at the field watching his daughters play 
and practice altogether?

A: Jerry will spend 8 games * 2 hours per game = <<8*2=16>> 16 hours watching one 
daughter play her games. He will spend 16 * 2 = <<16*2=32>> 32 hours watching both 
daughters play their games. He will spend 8 games * 4 hours of practice = <<8*4=32>> 
32 hours watching one daughter practice. He will spend 32 * 2 = <<32*2=64>> 64 hours 
watching both daughters practice. He will spend a total of 32 hours watching games + 
64 hours watching practice = <<32+64=96>> 96 hours. #### 96
```

**Key Features (from textbook):**

- **Few-shot prompt** is adopted
- LLM learns from demonstrations: (1) the way of problem-solving, (2) the way of formatting output
- **Final result** follows the `####` token
- **Calculation annotation**: `<<...>>` annotates detailed calculation steps

**Footnote 4 (from textbook):** During prediction, a calculator is used when we see `<<...>>`. More specifically, once the LLM encounters "=" in a `<<...>>`, then the calculator calculates the expression on the left-hand side of "=". This method helps reduce the calculation errors made by LLMs.
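The output format described above can be handled with a few lines of Python. This is a minimal sketch under stated assumptions, not the textbook's or the benchmark's own tooling: it checks the `<<...>>` annotations after generation instead of calling a calculator during decoding, and the names `calculate`, `check_annotations`, and `final_answer` are hypothetical.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def calculate(expression: str):
    """Compute the left-hand side of an annotation such as '8*2'."""
    return _eval(ast.parse(expression, mode="eval"))


def check_annotations(solution: str):
    """Return (expression, claimed, computed) for every <<lhs=rhs>> annotation."""
    results = []
    for lhs, rhs in re.findall(r"<<([^=<>]+)=([^<>]+)>>", solution):
        results.append((lhs, float(rhs), float(calculate(lhs))))
    return results


def final_answer(solution: str) -> str:
    """Return the text that follows the '####' marker."""
    return solution.split("####")[-1].strip()


demo = ("8 games * 2 hours = <<8*2=16>> 16 hours. 16 * 2 = <<16*2=32>> 32 hours. "
        "32 + 64 = <<32+64=96>> 96 hours. #### 96")
print(check_annotations(demo), final_answer(demo))
```

Parsing with `ast` rather than `eval` keeps the calculator restricted to plain arithmetic, which matters when the expression text comes from a model.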

**Additional GSM8K Examples (from textbook):**

```
Q: Mary bought six apples from the store. From the apples she bought, for each that 
Mary ate, she planted two trees from the remaining ones. How many apples did Mary eat?

A: She planted eight trees. This means she used half of that amount of apples, which 
is 8 trees / 2 trees/apple = <<8/2=4>> 4 apples. That means that she planted four of 
the six apples she bought, leaving only 6 apples - 4 apples = <<6-4=2>> 2 apples to 
be eaten. #### 2
```

```
USER Q: Boris has 100 apples. Beck has 23 fewer apples than Boris. If Boris gives Beck 
10 apples, how many fewer apples does Beck have than Boris now?

A: Boris starts with 100 apples and gives Beck 10 apples, leaving him with 100 apples 
- 10 apples = <<100-10=90>> 90 apples. Initially, Beck has 23 fewer apples than Boris, 
giving him a starting count of 100 - 23 = <<100-23=77>> 77 apples. After receiving 10 
apples from Boris, Beck's total comes to 77 apples + 10 apples = <<77+10=87>> 87 apples. 
Consequently, Beck now has 90 - 87 = <<90-87=3>> 3 fewer apples than Boris. #### 3
```

### Understanding the GSM8K Format

Here a **few-shot prompt** is adopted. The LLM learns from these demonstrations of problem-solution pairs not only the way of problem-solving but also the way of formatting the output. For example, the final result of calculation follows the `####` token, and `<<...>>` annotates the detailed calculation steps (called **calculation annotation**).

---

# 3.2 Advanced Prompting Methods

So far in this chapter, we have introduced the basic concepts related to LLM prompting and presented a number of prompts for NLP tasks. We now consider several techniques for enhancing the effectiveness of prompting.

## 3.2.1 Chain of Thought

We have encountered the concept of **chain of thought (CoT)** several times in this chapter and previous ones [Wei et al., 2022c; Chowdhery et al., 2022].

CoT methods provide a simple way to prompt LLMs to generate **step-by-step reasoning** for complex problems, thereby approaching tasks in a more human-like manner. Rather than coming to a conclusion directly, the CoT methods instruct LLMs to generate reasoning steps or to learn from demonstrations of detailed reasoning processes provided in the prompts.

### The Problem: Direct Answers Fail

To illustrate CoT, we consider the problem of algebraic calculation, as commonly described in the literature. Suppose we are given an algebraic problem:

```
Calculate the average of the numbers 2, 4, and 9.
```

We can consider it as the question and prompt an LLM to answer it.

**Attempt 1 (from textbook, Page 122-123):**

```
Q: Please calculate the average of the numbers 2, 4, and 9.
A: The answer is 6.
```

It seems **difficult for the LLM to directly give a correct answer**. (The correct answer is 5, not 6.)

**Attempt 2 with Few-Shot (from textbook):**

A simple improvement is to add demonstrations of similar problems in the prompt, and thus the LLM can learn from these demonstrations.

```
Q: Please calculate the average of the numbers 1, 3, 5, and 7.
A: The answer is 4.
Q: Please calculate the average of the numbers 2, 4, and 9.
A: The answer is 7.
```

**The problem here is that**, although we have shown a similar question-answer pair, it remains **difficult for the LLM to reason out the correct answer**.

---

### Exercise: Test Direct Math Reasoning

Try these math problems directly and check accuracy:

```
What is the average of 8, 15, and 22?
```

```
A laptop costs $850 and is sold at a 15% discount. What is the final price?
```

```
Alice has twice as many books as Bob. Bob has 12 books. How many books do they have in total?
```
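To check an LLM's replies, it helps to have reference answers computed independently. The snippet below is a small sketch that does this for the three exercise prompts; it assumes the third question asks for Alice's and Bob's books combined.

```python
from statistics import mean

# Reference answers for the three exercise prompts, computed directly.
average = mean([8, 15, 22])            # (8 + 15 + 22) / 3
final_price = 850 * (100 - 15) / 100   # price after a 15% discount
total_books = 12 + 2 * 12              # Bob's books plus Alice's (twice Bob's)

print(average, final_price, total_books)
```

The three values are 15, 722.5, and 36, so any other reply from the model counts as an error.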

### Solution: Few-Shot CoT

**In CoT**, not only can LLMs learn from the correspondence between questions and answers but they may gain more from **detailed problem-solving steps** that are used to derive the answers. To do this, we can incorporate some reasoning steps into the prompt to obtain a CoT prompt.

**Few-Shot CoT Example (from textbook, Page 123):**

**INPUT (Prompt with reasoning steps):**
```
Q: Please calculate the mean square of the numbers 1, 3, 5, and 7.
A: Calculate the square of each number: 1² = 1, 3² = 9, 5² = 25, and 7² = 49.
   Sum the squares, 1 + 9 + 25 + 49 = 84. There are 4 numbers in total. Divide
   the sum by the number of items, 84 / 4 = 21. The answer is 21.

Q: Please calculate the average of the numbers 2, 4, and 9.
A:
```

**OUTPUT (LLM Response):**
```
Calculate 2 + 4 + 9, which equals 15. There are three numbers. Divide the
total sum by the count, resulting in 15 / 3 = 5. The answer is 5.
```
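A few-shot CoT prompt of this shape can be assembled programmatically. The helper below is a minimal sketch; `build_few_shot_cot_prompt` is a hypothetical name, and the `Q:`/`A:` layout simply mirrors the example above.

```python
from typing import List, Tuple


def build_few_shot_cot_prompt(demos: List[Tuple[str, str]], question: str) -> str:
    """Place (question, worked solution) demonstrations ahead of the new question."""
    blocks = [f"Q: {q}\nA: {solution}" for q, solution in demos]
    # The final block ends with "A:" so the model continues with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


demo = (
    "Please calculate the mean square of the numbers 1, 3, 5, and 7.",
    "Calculate the square of each number: 1, 9, 25, and 49. Sum the squares, "
    "1 + 9 + 25 + 49 = 84. Divide the sum by the number of items, 84 / 4 = 21. "
    "The answer is 21.",
)
print(build_few_shot_cot_prompt(
    [demo], "Please calculate the average of the numbers 2, 4, and 9."))
```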

**Key point:** The reasoning steps (highlighted in green in the textbook) spell out each calculation. The demonstration shows how to solve a "mean square" problem with detailed reasoning; the LLM picks up this reasoning pattern and applies it to the "average" problem.

By providing a detailed reasoning process for a similar problem, the LLM learns to reason and thereby generates a problem-solving path that leads to the correct answer.

### Benefits of CoT (from textbook, Page 123)

There are several benefits of using CoT prompting:

1. **Decomposition**: CoT allows LLMs to decompose complex problems into smaller, sequential reasoning steps. This somewhat mirrors human problem-solving behaviors, making it particularly effective for tasks requiring detailed, multi-step reasoning.

2. **Transparency**: CoT makes the reasoning process more transparent and interpretable. Since all reasoning steps are visible, we can understand and interpret how a conclusion was reached.

3. **Trust**: If users can follow the logic behind the reasoning process, they will be more likely to trust the predictions of an LLM. This is particularly important when applying LLMs in fields like medicine, education, and finance.

4. **Efficiency**: CoT is an in-context learning approach, and thus, it is applicable to most well-trained, off-the-shelf LLMs. Moreover, CoT provides efficient ways to adapt LLMs to different types of problems.

5. **Creativity**: It can even inspire more creative solutions by exploring various alternative reasoning paths, which might not be obvious when arriving at a conclusion directly.

### Zero-Shot CoT

The method described above requires providing one or more examples of CoT reasoning, typically called the **few-shot CoT method**. By contrast, the **zero-shot CoT method** does not require such examples. It instead prompts LLMs to reason step-by-step by incorporating specific instructions in prompts.

**Zero-Shot CoT Example (from textbook, Page 123):**

**INPUT (Prompt):**
```
Q: Please calculate the average of the numbers 2, 4, and 9.
A: Let's think step by step.
```

**OUTPUT (LLM Response):**
```
We have three numbers: 2, 4, and 9. Add these numbers together, 2 + 4 + 9 = 15. 
Determine how many numbers there are, which in this case is three. The average is 
calculated by dividing the total sum by the number of elements. Completing the 
division gives 15 / 3 = 5. So the answer is 5.
```

Following the instruction **"Let's think step by step"**, the LLM is prompted to generate detailed reasoning steps.

**Additional Notes (from textbook, Page 123-124):**

As discussed in Kojima et al. [2022]'s work, prompting with such instructions may result in LLMs generating only the reasoning steps without a clear conclusion. In this case, a **second round of prompting** can be used to extract the answer from these reasoning steps. For example, Kojima et al. [2022] create a second prompt which combines both the input and output in the first round of prompting. Using this combined input, the LLM can continue its reasoning process and then generate the correct answer.
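The two rounds can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of Kojima et al.: `llm` is a hypothetical callable that maps a prompt string to generated text, the answer-extraction phrase is one plausible choice, and `fake_llm` is a stand-in so the sketch runs without a model.

```python
from typing import Callable

TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str, llm: Callable[[str], str]) -> str:
    """Two-round zero-shot CoT: elicit reasoning, then extract the answer."""
    # Round 1: the trigger phrase asks the model to reason step by step.
    first_prompt = f"Q: {question}\nA: {TRIGGER}"
    reasoning = llm(first_prompt)

    # Round 2: combine the first prompt with its output and ask for the answer.
    second_prompt = f"{first_prompt} {reasoning}\nTherefore, the answer is"
    return llm(second_prompt).strip()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.endswith("Therefore, the answer is"):
        return " 5."
    return "2 + 4 + 9 = 15, and 15 / 3 = 5."


print(zero_shot_cot("Please calculate the average of the numbers 2, 4, and 9.",
                    fake_llm))
```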

Furthermore, it is possible to prompt LLMs to reason using instructions other than "Let's think step by step", such as:
- "Let's think logically"
- "Please show me your thinking steps first"

---

### Exercise: Add Chain-of-Thought

Now add "Let's think step by step" and compare:

```
What is the average of 8, 15, and 22?
Let's think step by step.
```

```
A laptop costs $850 and is sold at a 15% discount. What is the final price?
Let's think step by step.
```

### Applications of CoT (from textbook, Page 124)

While we have illustrated CoT methods using an algebraic reasoning problem, these methods can be applied to a variety of different problems. **Typical problem-solving scenarios for CoT include:**

- Mathematical reasoning
- Logical reasoning
- Commonsense reasoning
- Symbolic reasoning
- Code generation
- And so on

**See Figure 3.1 for more examples of applying CoT in various tasks.**

In [None]:
# Display Figure 3.1 from textbook
Image(filename='fig_3_1_cot_applications.jpeg', width=900)

**Figure 3.1 (from textbook, Page 125):** CoT in four different reasoning tasks, including CSQA, StrategyQA, Dyck languages, and Last Letter Concatenation. The CoT parts are highlighted in green.

### Understanding the Benchmark Tasks in Figure 3.1

The figure shows CoT applied to four different reasoning benchmarks:

**1. CSQA (CommonsenseQA)**
- Tests commonsense reasoning about everyday situations
- Multiple-choice questions requiring world knowledge
- Example: "What type of person typically contracts illness?" → Answer: "elderly person" (people with weaker immune systems)

**2. StrategyQA**
- Tests multi-hop reasoning with implicit decomposition
- Yes/no questions requiring multiple reasoning steps
- Example: "Can you use Microsoft Office without internet?" → Need to reason about offline capabilities vs cloud features

**3. Dyck Languages**
- Tests formal language understanding and bracket matching
- Requires tracking nested structures step-by-step
- Example: Complete "[ { " by closing the brackets in reverse order of opening → Answer: " } ]"

**4. Last Letter Concatenation**
- Tests string manipulation and pattern following
- Extract last letters from each word and concatenate
- Example: "Elon Musk" → "n" + "k" = "nk"

**Key Insight**: CoT helps LLMs solve these diverse tasks by breaking them into explicit reasoning steps (shown in green highlighting in the textbook figure). This demonstrates CoT's versatility across mathematical, logical, commonsense, and symbolic reasoning domains.
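Two of these tasks, last letter concatenation and Dyck languages, have simple reference solutions, which makes it easy to check an LLM's step-by-step output. The functions below are a small sketch with hypothetical names; the Dyck helper assumes brackets are separated by spaces, as in the example above.

```python
def last_letter_concat(text: str) -> str:
    """Concatenate the last letter of each whitespace-separated word."""
    return "".join(word[-1] for word in text.split())


PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def close_brackets(prefix: str) -> str:
    """Return the closing sequence that balances a Dyck-language prefix."""
    stack = []
    for token in prefix.split():
        if token in PAIRS:
            stack.append(token)          # remember every opening bracket
        elif stack and PAIRS[stack[-1]] == token:
            stack.pop()                  # a matching close cancels the latest open
        else:
            raise ValueError(f"unbalanced token: {token!r}")
    # Close the most recently opened bracket first.
    return " ".join(PAIRS[opening] for opening in reversed(stack))


print(last_letter_concat("Elon Musk"))
print(close_brackets("[ {"))
```

The stack in `close_brackets` plays the same role as the intermediate reasoning steps in a CoT answer: each step records which brackets are still open.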

### Current Research and Extensions (from textbook, Page 124)

CoT today is one of the most active fields of prompt engineering. This has not only led to improved performance for LLM prompting but has opened the door to a wide range of methods for studying and verifying reasoning capabilities of LLMs.

Although we have focused on the basic idea of CoT in this section, it can be improved in several ways. For example, we can consider the reasoning process as **a problem of searching through many possible paths**, each of which may consist of multiple intermediate states (i.e., reasoning steps). In general, we wish the search space to be well-defined and sufficiently large, so that we are more likely to find the optimal result.

For this reason, an area of current LLM research is aimed at **designing better structures for representing reasoning processes**, allowing LLMs to tackle more complex reasoning challenges. These structures include:
- Tree-based structures [Yao et al., 2024]
- Graph-based structures [Besta et al., 2024]
- And so on

By using these compact representations of reasoning paths, LLMs can explore a wider range of decision-making paths, analogous to **System 2 thinking**.

**Footnote 5 (from textbook):** System 1 and System 2 thinking, as described by Kahneman [2011], represent two different modes of cognitive processing:
- **System 1** is fast, automatic, intuitive, and emotional. This mode of thinking operates effortlessly and quickly, and is often what guides our daily decisions, judgments, and impressions.
- **System 2** is slow, deliberate, and analytical. It is activated when we need to perform complex computations.

### Understanding Reasoning Structures: Chain vs Tree vs Graph

The textbook mentions that CoT can be improved through **tree-based** and **graph-based structures**. Let's understand what these mean and why they matter.

---

#### 1. Traditional CoT: Chain Structure

```
Problem → Step 1 → Step 2 → Step 3 → Answer
```

**Characteristics**: 
- Linear, single path from start to finish
- Once you make a decision, you continue forward
- No backtracking or exploring alternatives

**Limitation**: If you make a mistake in any intermediate step, the entire reasoning chain fails.

**Example**:
```
Q: Calculate 15 + 23 × 2
A: First multiply: 23 × 2 = 46
   Then add: 15 + 46 = 61 ✓
```

If the LLM makes an error in step 1, there's no way to recover.

---

#### 2. Tree-Based Structure (e.g., Tree of Thoughts - ToT)

```
                    Problem
             /         |     \
         Path1       Path2   Path3
        /     \        |       |
    Step1A  Step1B  Step2A  Step3A
       |       |       |       |
    Result1 Result2 Result3 Result4
```

**Characteristics**:
- Explore multiple reasoning paths simultaneously
- Can branch at any decision point
- Select the best path based on evaluation
- Like a decision tree in games or algorithms

**Advantage**: If one path doesn't work, you have alternatives to explore.

**Example - Tree of Thoughts (ToT)**:
```
Q: Use four 4's and operators to get 24

Branch 1: (4 + 4) × (4 - 4) = 8 × 0 = 0  ✗ (dead end)
Branch 2: (4 × 4) + (4 + 4) = 16 + 8 = 24 ✓ (found it!)
Branch 3: 4 × (4 + 4 - 4) = 4 × 4 = 16  ✗ (dead end)

→ Select Branch 2 as the solution
```

**How It Works**:
1. At each reasoning step, generate multiple candidate next steps
2. Evaluate each candidate (e.g., "how promising is this path?")
3. Expand the most promising candidates
4. Prune unpromising branches
5. Continue until finding a solution

**Use Cases**: Creative tasks, puzzle solving, strategic planning, game playing
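The branching and pruning idea can be illustrated on the "make 24" puzzle with an ordinary search program. This is a deliberately simplified sketch: plain depth-first search with backtracking stands in for the LLM-driven proposal and evaluation steps of ToT, it explores branches exhaustively instead of ranking them, and the names `solve_24` and `_search` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Depth-first search over partial expressions; returns one solution or None."""
    start = [(Fraction(n), str(n)) for n in numbers]
    return _search(start, Fraction(target))


def _search(items, target):
    # A leaf: one value left, so the branch either hits the target or is a dead end.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Branch: choose two values and every way of combining them.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} × {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} ÷ {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} ÷ {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found  # stop at the first successful branch
    return None  # every branch from this state failed, so backtrack


print(solve_24([4, 4, 4, 4]))
```

In an actual ToT setup, an LLM would propose the candidate steps and score how promising each partial state is, so that only a few branches are expanded.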

---

#### 3. Graph-Based Structure (e.g., Graph of Thoughts - GoT)

```
      Problem A ↔ Problem B
       ↓ ↘      ↓
      Step 1 → Step 2
       ↓       ↗ ↓
      Step 3 ← Step 4
       ↓
      Answer
```

**Characteristics**:
- Most flexible structure
- Can have cycles, loops, and bidirectional connections
- Steps can depend on multiple previous steps
- Can revisit and revise earlier decisions
- Supports parallel reasoning paths that merge later

**Advantage**: Handles complex dependencies and allows iterative refinement.

**Example - Graph of Thoughts (GoT)**:
```
Q: Sort the array [5, 2, 8, 1, 9]

Node A: Compare 5 and 2 → [2, 5, ...]    (parallel)
Node B: Compare 8 and 1 → [1, 8, ...]    (parallel)
Node C: Merge results from A and B → [1, 2, 5, 8, ...]
Node D: Insert 9 → [1, 2, 5, 8, 9]

Can loop back to Node C to try different merge strategies
```

**Key Features**:
- **Parallel reasoning**: Multiple sub-problems solved simultaneously
- **Iterative refinement**: Can revisit previous steps to improve solutions
- **Dependencies**: A step can depend on results from multiple earlier steps
- **Aggregation**: Multiple reasoning paths can merge into a unified answer

**Use Cases**: Complex problem-solving with interdependencies, multi-document QA, system design, optimization problems
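The sorting example above has the shape of a merge sort, which gives a compact way to picture independent nodes and aggregation nodes. The code below is only an analogy under that assumption: it uses Python's standard `heapq.merge` instead of any LLM calls, it has no refinement loops, and `sort_by_graph` is a hypothetical name rather than part of any GoT framework.

```python
from heapq import merge


def sort_by_graph(values, chunk_size=2):
    """Sort chunks independently, then aggregate them with pairwise merges."""
    # Independent nodes: each chunk could be handled by a separate reasoning step.
    nodes = [sorted(values[i:i + chunk_size])
             for i in range(0, len(values), chunk_size)]

    # Aggregation nodes: each merge depends on the results of two earlier nodes.
    while len(nodes) > 1:
        merged = [list(merge(nodes[i], nodes[i + 1]))
                  for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:  # carry an unpaired node over to the next round
            merged.append(nodes[-1])
        nodes = merged
    return nodes[0] if nodes else []


print(sort_by_graph([5, 2, 8, 1, 9]))
```

With the array from the example, the chunks `[5, 2]` and `[8, 1]` are sorted separately, merged, and then combined with `[9]`, matching Nodes A to D.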

---

#### Comparison Table

| Structure | Shape | Exploration | Best For | Cost |
|-----------|-------|-------------|----------|------|
| **Chain** (CoT) | Linear | Single path | Simple, well-defined problems | Low |
| **Tree** (ToT) | Tree | Multi-path + pruning | Problems requiring trial-and-error | Medium |
| **Graph** (GoT) | Graph | Any connections + cycles | Complex dependencies, parallel reasoning | High |

---

#### Why Do We Need These Complex Structures?

**The Problem with Simple Chain**:
```
Q: Use 1, 3, 5, 7 and +, -, ×, ÷ to get 24

Chain: 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16 ✗ (stuck, can't backtrack)

Tree: Try multiple combinations
  - (1 + 3) × (5 + 7) = 4 × 12 = 48 ✗
  - (1 + 5) × (7 - 3) = 6 × 4 = 24 ✓ (found it!)
  
Graph: Remember failed attempts to avoid repeating them
       Can try paths in parallel and merge insights
```

---

#### Real-World Analogy

**Chain (CoT)**: Like following GPS directions - one route, no alternatives
- Fast and efficient if the route works
- Problems if you hit a roadblock

**Tree (ToT)**: Like considering multiple routes on a map
- Can compare different paths
- Choose the best based on criteria (time, distance, traffic)

**Graph (GoT)**: Like a group of friends planning a road trip together
- Multiple people can explore different routes simultaneously
- Can share findings and combine the best parts of each route
- Can revise the plan as new information comes in

---

#### Key Takeaway

As reasoning tasks become more complex, we need more sophisticated structures to represent the problem-solving process:

- **Simple math**: Chain is enough
- **Puzzles/creativity**: Tree helps explore alternatives  
- **Complex systems**: Graph handles interdependencies and parallel thinking

These structures allow LLMs to think more like humans: we explore multiple possibilities, backtrack when stuck, and refine our thinking iteratively.

### Limitations of CoT (from textbook, Page 124)

Before leaving our discussion of CoT, we should consider its practical limitations:

1. **Need for detailed demonstrations**: One limitation is the need for detailed, multi-step reasoning demonstrations in few-shot CoT scenarios, which may be difficult to obtain, either automatically or manually.

2. **No standard decomposition method**: There is no standard method for breaking down complex problems into simpler problem-solving steps. This often heavily depends on the user's experience.

3. **Error propagation**: Errors in intermediate steps can also affect the accuracy of the final conclusion.

For further discussion on the pros and cons of CoT, the interested reader can refer to recent surveys on this topic [Chu et al., 2023; Yu et al., 2023; Zhang et al., 2023a].

---

## 3.2.2 Problem Decomposition (Will continue in the next lecture)

We have seen that LLMs can benefit from solving a complex problem by breaking it down into simpler problem-solving tasks. Such an approach can be seen as an example of a broader paradigm known as **problem decomposition**, which has been extensively explored and discussed in psychology and computer science.

### Psychological Perspective (from textbook, Page 124-125)

From the psychological perspective, **complex problem-solving** refers to a process of addressing a problem using knowledge that helps overcome the barriers of the problem.

**Footnote 6 (from textbook):** A relatively formal definition can be found in Frensch and Funke [2014]'s book: complex problem-solving occurs to overcome barriers between a given state and a desired goal state by means of behavioral and/or cognitive, multi-step activities.

There are generally no standard or clear paths to a solution for a complex problem. However, it is often advantageous to employ strategies that decompose the problem, thereby making it easier to tackle the corresponding sub-problems with less effort.

---

### Exercise: Problem Decomposition

Compare direct vs decomposed prompts:

**Direct:**
```
Explain how a neural network learns.
```

**Decomposed:**
```
Explain neural network learning in steps:
1. How does forward propagation work?
2. What is the loss function?
3. How does backpropagation update weights?
Keep each step to 2 sentences.
```

### Example 1: Blog Writing (from textbook, Page 125-126)

For example, consider writing a blog about the risks of AI. If we simply prompt an LLM with the instruction "Please write a blog about the risks of AI", the LLM may generate a blog with arbitrary structures and writing styles. A better method, instead, could be to outline the blog and provide more detailed information about each section. Consider the following prompt:

```
You are a blog writer. Please follow the provided outline below to write a blog about 
the risks of AI.

- Introduction
  Introduce AI, its relevance, and the importance of understanding its risks for youth.

- Privacy Concerns
  Discuss how AI might compromise personal privacy through interactions online.

- Misinformation
  Explore AI's role in spreading misinformation and influencing young people's decisions.

- Cyberbullying
  Highlight how AI tools can be utilized in cyberbullying and the impact on mental health.

- Tips for Safe AI Use
  Offer guidelines for responsible AI usage and promote critical thinking.

- Conclusion
  Recap main points and encourage proactive engagement with AI ethics.
```

Here we give the title and major points for each section. Then, the LLM can use this structure to break down the writing task by filling in content for these sections. Note that the way to structure the blog can be provided by humans or even generated automatically. For example, we can use the LLM to first generate the outline, and then ask it to follow this outline to complete the writing.

### Computer Science Perspective (from textbook, Page 126)

In computer science, decomposing complex problems is a commonly used strategy in software and hardware system design. A well-known example is the **divide-and-conquer paradigm**, which is often used to design algorithms for computation problems that can be reduced to simpler, more manageable problems.

### Example 2: Document Classification (from textbook, Page 126-127)

For example, consider a problem of determining whether a document discusses the risks of AI. We can instruct the LLM with the following prompt:

```
You are provided with a text. Please determine whether it discusses the risks of AI.
{document}
```

If the document is long, the computation will be expensive. Alternatively, we can **divide the document into relatively short segments** and perform the same task on each segment. These segments can be processed in parallel to further reduce the computational cost. Next, we determine the relevancy of each segment to the topic of AI risks. The final output is then generated using another prompt:

```
Your task is to determine whether a text discusses the risks of AI. This text has been 
divided into segments, and you have obtained the relevancy of each segment to the topic 
of AI risks. Based on this, please provide your final result.

Segment 1: {relevancy-to-the-topic1}
Segment 2: {relevancy-to-the-topic2}
Segment 3: {relevancy-to-the-topic3}
...
```
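The segment-then-aggregate procedure can be sketched in Python. Everything here is an assumption made for illustration: `llm` is a hypothetical callable that maps a prompt to generated text, segments are cut by word count, and a thread pool stands in for whatever parallel inference setup is actually available.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

SEGMENT_PROMPT = (
    "You are provided with a text. Please determine whether it discusses "
    "the risks of AI.\n{document}"
)


def split_into_segments(document: str, max_words: int = 200) -> List[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = document.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def build_final_prompt(relevancies: List[str]) -> str:
    """Fill the aggregation prompt with one line per segment."""
    header = (
        "Your task is to determine whether a text discusses the risks of AI. "
        "This text has been divided into segments, and you have obtained the "
        "relevancy of each segment to the topic of AI risks. Based on this, "
        "please provide your final result.\n\n"
    )
    lines = [f"Segment {i}: {r}" for i, r in enumerate(relevancies, start=1)]
    return header + "\n".join(lines)


def classify_document(document: str, llm: Callable[[str], str],
                      max_words: int = 200) -> str:
    """Judge each segment separately, then ask for a final decision."""
    segments = split_into_segments(document, max_words)
    prompts = [SEGMENT_PROMPT.format(document=s) for s in segments]
    # Segments are independent, so they can be sent concurrently.
    with ThreadPoolExecutor() as pool:
        relevancies = list(pool.map(llm, prompts))
    return llm(build_final_prompt(relevancies))
```

Splitting on word count is the simplest choice; splitting on paragraph or section boundaries would keep each segment more self-contained.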

### General Framework (from textbook, Page 127)

Now let us return to a more general discussion of problem decomposition in prompting. While problem decomposition can be applied to various NLP problems, it has been more extensively discussed and tested in reasoning tasks recently.

For complex reasoning tasks, we often need a multi-step reasoning path to reach a correct conclusion. We can use LLMs to achieve this in **three different ways**:

1. **Direct conclusion**: LLMs can directly reach the conclusion. In other words, they can predict without explicit reasoning processes, and there is a hidden and uninterpretable reasoning mechanism.

2. **Single-run CoT**: LLMs are prompted to generate a multi-step reasoning path that leads to the conclusion, like CoT. However, we run LLMs just once, and all intermediate steps in reasoning are generated in a single prediction.

3. **Decomposition**: We break down the original problem into a number of sub-problems, which are either addressed in separate runs of LLMs or tackled using other systems.

Here we focus our attention on the third approach, which is closely related to problem decomposition.

**A general framework for problem decomposition involves two elements:**

- **Sub-problem Generation**: This involves decomposing the input problem into a number of sub-problems.
- **Sub-problem Solving**: This involves solving each sub-problem and deriving intermediate and final conclusions through reasoning.

### Two-Step Process (from textbook, Page 127)

These two issues can be modeled in different ways, leading to various problem decomposition methods. One approach is to treat them as separate steps in a two-step process. For example, consider the blog writing task described at the beginning of this subsection:

- In the **first step**, we decompose the entire problem into sub-problems all at once (i.e., outline the blog).
- In the **second step**, we solve the sub-problems either sequentially or in another order (i.e., fill in content for each section as needed).

The final output of this process combines the results from solving each sub-problem. While this method is simple and straightforward, it assumes that the problem is compositional, making it more suitable for tasks like writing and code generation.

### Dynamic Sub-Problem Generation (from textbook, Page 127-128)

However, many real-world problems require complex reasoning. One key characteristic of these problems is that **the reasoning steps may not be fixed**. The reasoning path can vary for different problems, and each step of reasoning may depend on the outcomes of prior steps.

In such cases, it is undesirable to use fixed sub-problem generation in advance. Instead, **sub-problems should be generated dynamically** based on the input problem, and, if possible, generated on the fly during the reasoning process. This makes problem decomposition more challenging compared with designing divide-and-conquer algorithms.

Ideally, we would like to jointly design both the systems for sub-problem generation and sub-problem solving. But a more practical and widely used approach is to adopt separate models for these tasks. A straightforward way to achieve this is to adapt an LLM for these tasks by either prompting or tuning the model.

### Least-to-Most Prompting (from textbook, Page 128-130)

Here we consider a method based on the above idea, called **least-to-most prompting** [Zhou et al., 2023b]. The motivation for this method arises from the challenges of solving difficult reasoning problems - those that cannot be addressed by simply generalizing from a few examples. For these problems, a more effective problem-solving strategy is to follow a progressive sequence of sub-problems that systematically lead to the conclusion.

More specifically, in the least-to-most prompting method, **sub-problem generation is performed by prompting an LLM** with instructions and/or demonstrations.

#### Stage 1: Sub-Problem Generation

**Example from textbook (Page 128):** Below is a 2-shot prompt for sub-problem generation in least-to-most prompting:

```
TASK: Your task is to decompose a problem into several sub-problems. You will be given 
a few examples to illustrate how to achieve this.

DEMO Q: In a community, 5% of the population are infants, 15% are children, 40% are 
adults, and 40% are seniors. Which group makes up the largest portion of the population?
A: To answer the question "Which group makes up the largest portion of the population?", 
we need to know: "How many percent are infants?", "How many percent are children?", 
"How many percent are adults?", "How many percent are seniors?".

Q: Alice, Bob, and Charlie brought beads for their group project in their craft class. 
Alice has twice as many beads as Bob, and Bob has five times as many beads as Charlie. 
If Charlie has 6 beads, how many beads can they use for their craft project?
A: To answer the question "How many beads can they use for their craft project?", we 
need to know: "How many beads does Bob have?", "How many beads does Alice have?".

USER Q: The environmental study conducted from 2015 to 2020 revealed that the average 
temperature in the region increased by 2.3 degrees Celsius. What was the duration of 
the environmental study?
A: To answer the question "What was the duration of the environmental study?", we need 
to know: "When did the environmental study start?", "When did the environmental study end?".
```

By learning from the examples, the LLM can generate two sub-problems for answering the new problem "What was the duration of the environmental study?" (highlighted in blue and orange in the textbook).

#### Stage 2: Sequential Sub-Problem Solving

Given these sub-problems, we solve them **sequentially**. For each sub-problem, we take all previously-generated QA pairs as context, and then produce the answer.
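
The way the context grows from one step to the next can be sketched in a few lines. This is an illustrative helper only (the name `build_step_prompt` and its arguments are assumptions); it reuses the `SUB-PROB` and `FINAL` labels from the textbook example.

```python
def build_step_prompt(passage, solved, label, question):
    """Build the input for one step: passage, all earlier QA pairs, then the new question.

    `solved` is a list of (question, answer) pairs for the sub-problems answered so far.
    """
    lines = [passage, ""]
    for i, (q, a) in enumerate(solved, start=1):
        lines += [f"SUB-PROB{i} Q: {q}", f"A: {a}"]
    lines.append(f"{label} Q: {question}")
    return "\n".join(lines)

passage = ("The environmental study conducted from 2015 to 2020 revealed that the "
           "average temperature in the region increased by 2.3 degrees Celsius.")
solved = [("When did the environmental study start?",
           "The environmental study started in 2015.")]

# Input for the second sub-problem: the first QA pair is part of the context
print(build_step_prompt(passage, solved, "SUB-PROB2",
                        "When did the environmental study end?"))
```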

**Step 1: Solve first sub-problem (from textbook, Page 128-129):**

**INPUT:**
```
The environmental study conducted from 2015 to 2020 revealed that the average 
temperature in the region increased by 2.3 degrees Celsius.

SUB-PROB1 Q: When did the environmental study start?
```

**OUTPUT:**
```
A: The environmental study started in 2015.
```

**Step 2: Solve second sub-problem using first answer (from textbook):**

Once we have the answer to the first sub-problem, we proceed to the second one. This time, we include both the first sub-problem and its corresponding answer in the input.

**INPUT:**
```
The environmental study conducted from 2015 to 2020 revealed that the average 
temperature in the region increased by 2.3 degrees Celsius.

SUB-PROB1 Q: When did the environmental study start?
A: The environmental study started in 2015.
SUB-PROB2 Q: When did the environmental study end?
```

**OUTPUT:**
```
A: The environmental study ended in 2020.
```

**Step 3: Solve original problem (from textbook):**

Finally, we use the LLM to solve the original problem given the answers to all the sub-problems.

**INPUT:**
```
The environmental study conducted from 2015 to 2020 revealed that the average 
temperature in the region increased by 2.3 degrees Celsius.

SUB-PROB1 Q: When did the environmental study start?
A: The environmental study started in 2015.
SUB-PROB2 Q: When did the environmental study end?
A: The environmental study ended in 2020.
FINAL Q: What was the duration of the environmental study?
```

**OUTPUT:**
```
A: The duration of the environmental study was 5 years.
```

### Improvements to Least-to-Most (from textbook, Page 129)

The least-to-most method offers a basic approach to prompting LLMs to generate and solve sub-problems separately. We can improve it in several ways. One simple improvement is to apply various advanced prompting techniques, which do not require changes to the problem decomposition framework. For example, we can **incorporate CoT into the prompting** to enhance the reasoning performance of sub-problem generation and solving.

### Formal Mathematical Description (from textbook, Page 129-130)

Another improvement is to explore methods for better decomposing problems and organizing problem-solving paths. To describe these approaches, we will use the symbol $p_0$ to denote the input problem, and use the symbols $\{p_1, \ldots, p_n\}$ to denote the sub-problems corresponding to $p_0$.

For least-to-most prompting, we decompose $p_0$ into $\{p_1, \ldots, p_n\}$, given by:

$$\{p_1, \ldots, p_n\} = G(p_0)$$

where $G(\cdot)$ denotes the function of sub-problem generation.

Then, we solve the sub-problems $\{p_1, \ldots, p_n\}$ sequentially, resulting in a sequence of answers $\{a_1, \ldots, a_n\}$. For answering the $i$-th sub-problem $p_i$, we include both the original problem $p_0$ and all previously-seen problem-answer pairs in the context for prediction. The answer $a_i$ is given by:

$$a_i = S_i(p_i, \{p_0, p_{<i}, a_{<i}\})$$

where $p_{<i} = \{p_1, \ldots, p_{i-1}\}$ and $a_{<i} = \{a_1, \ldots, a_{i-1}\}$. $S_i(\cdot)$ denotes the function that solves the sub-problem $p_i$ given the context $\{p_0, p_{<i}, a_{<i}\}$.

The last step is to generate the answer to the original problem $p_0$, which can be expressed in a similar manner:

$$a_0 = S_0(p_0, \{p_{\leq n}, a_{\leq n}\})$$
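
The three equations above map directly onto a short control loop. The sketch below is one possible reading, with $G$ and $S$ passed in as ordinary Python callables; `toy_generate` and `toy_solve` are hand-written stand-ins for LLM calls (an assumption made so the example runs without a model).

```python
import re

def least_to_most(p0, generate, solve):
    """Run least-to-most prompting with pluggable G and S functions."""
    sub_problems = generate(p0)          # {p_1, ..., p_n} = G(p_0)
    history = []                         # accumulates (p_i, a_i) pairs
    for p_i in sub_problems:
        a_i = solve(p_i, p0, history)    # a_i = S_i(p_i, {p_0, p_<i, a_<i})
        history.append((p_i, a_i))
    a_0 = solve(p0, p0, history)         # a_0 = S_0(p_0, {p_<=n, a_<=n})
    return a_0, history

P0 = ("The environmental study conducted from 2015 to 2020 revealed that the average "
      "temperature in the region increased by 2.3 degrees Celsius. "
      "What was the duration of the environmental study?")

def toy_generate(p0):
    # Stand-in for G: a real system would prompt an LLM here
    return ["When did the environmental study start?",
            "When did the environmental study end?"]

def toy_solve(question, p0, history):
    # Stand-in for S: reads years from the problem text instead of calling an LLM
    if question == p0:
        start, end = (int(answer) for _, answer in history)
        return f"{end - start} years"
    years = re.findall(r"\b(?:19|20)\d{2}\b", p0)
    return years[0] if "start" in question else years[1]

answer, history = least_to_most(P0, toy_generate, toy_solve)
print(answer)  # 5 years
```

Keeping `generate` and `solve` as parameters means the same loop works whether the functions wrap an LLM, a rule-based system, or a mix of both.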

### Extension 1: Dynamic Generation (from textbook, Page 130)

One way to refine this model is to modify the $G(\cdot)$ function so that the model can **generate sub-problems dynamically**. Instead of generating all sub-problems at one time, we can generate each of them during problem-solving [Dua et al., 2022]. To do this, we can replace the decomposition equation $\{p_1, \ldots, p_n\} = G(p_0)$ with:

$$p_i = G_i(p_0, \{p_{<i}, a_{<i}\})$$

Hence we obtain a sub-problem generation model that operates in a **step-by-step manner**. At each step $i$:
1. We first generate the sub-problem $p_i$ by prompting an LLM with the original problem $p_0$ and the problem-solving history $\{p_{<i}, a_{<i}\}$.
2. We then generate the answer $a_i$ for this sub-problem using the same or a different LLM, based on the same contextual information.

This method effectively expands the reasoning capacity of LLMs by allowing them to dynamically generate and solve sub-problems in intermediate reasoning steps. As a result, the reasoning paths are not fixed in advance, and the models can choose and adapt their reasoning strategies during problem-solving.
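
A minimal sketch of this step-by-step variant is shown below, using the beads problem from the earlier demonstration. The stopping rule (the generator returns `None` when no further sub-problem is needed) and the `max_steps` cap are assumptions added for the sketch; the toy functions again stand in for LLM calls.

```python
def dynamic_least_to_most(p0, generate_next, solve, max_steps=5):
    """Interleave sub-problem generation and solving, one step at a time."""
    history = []
    for _ in range(max_steps):
        p_i = generate_next(p0, history)   # p_i = G_i(p_0, {p_<i, a_<i})
        if p_i is None:                    # assumed signal: nothing left to ask
            break
        a_i = solve(p_i, p0, history)      # a_i = S_i(p_i, {p_0, p_<i, a_<i})
        history.append((p_i, a_i))
    return solve(p0, p0, history), history

P0 = ("Alice has twice as many beads as Bob, and Bob has five times as many beads "
      "as Charlie. If Charlie has 6 beads, how many beads can they use?")
CHARLIE_BEADS = 6
BOB_Q = "How many beads does Bob have?"
ALICE_Q = "How many beads does Alice have?"

def toy_generate_next(p0, history):
    # Stand-in for G_i: pick the next unanswered question based on the history
    asked = {question for question, _ in history}
    for question in (BOB_Q, ALICE_Q):
        if question not in asked:
            return question
    return None

def toy_solve(question, p0, history):
    # Stand-in for S_i: later answers depend on earlier ones in the history
    answers = dict(history)
    if question == p0:
        return CHARLIE_BEADS + sum(answers.values())
    if question == BOB_Q:
        return 5 * CHARLIE_BEADS
    return 2 * answers[BOB_Q]

total, history = dynamic_least_to_most(P0, toy_generate_next, toy_solve)
print(total)  # 96
```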

### Extension 2: Better Sub-Problem Solvers (from textbook, Page 130)

Another way to improve the above model is to focus on developing better sub-problem solvers. In our previous discussion, we restricted $S_i(\cdot)$ to LLMs that are prompted to solve the sub-problem $p_i$. In fact, we can expand this function to **any system that is capable of addressing the sub-problem**. For example:

- $S_i(\cdot)$ could make calls to **IR systems**, thereby allowing us to access a broader range of data for problem-solving.
- Another example is using $S_i(\cdot)$ as a **calculator** to accurately compute results in mathematical problem-solving.
- If the sub-problem $p_i$ is complex and requires multiple intermediate problem-solving steps, it is also possible to **further decompose $p_i$ into smaller sub-problems**. For example, $S_i(\cdot)$ can be defined as a recursive program that generates and solves sub-problems. This incorporates recursion into problem-solving and allows us to address problems by iteratively decomposing them. As a result, we can define a **hierarchical structure for problem-solving** [Khot et al., 2023].
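
Two of these ideas, a calculator as solver and recursive decomposition, can be combined in a small sketch. Everything here is illustrative: `calculator`, `solve_recursively`, and `split_sum` are hypothetical names, and the "problems" are plain arithmetic strings so that the example runs without an LLM.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression):
    """Evaluate +, -, *, / over numeric literals without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return ev(ast.parse(expression, mode="eval").body)

def solve_recursively(problem, decompose, solve_leaf, combine, depth=0, max_depth=3):
    """Decompose a problem until its parts are simple enough for solve_leaf."""
    parts = decompose(problem) if depth < max_depth else []
    if not parts:
        return solve_leaf(problem)
    answers = [solve_recursively(part, decompose, solve_leaf, combine,
                                 depth + 1, max_depth) for part in parts]
    return combine(problem, answers)

def split_sum(problem):
    # Toy decomposition: split a sum into its terms; a single term is a leaf
    parts = problem.split(" + ")
    return parts if len(parts) > 1 else []

# Charlie + Bob + Alice from the beads example: 6 + 5*6 + 2*5*6
total = solve_recursively("6 + 5 * 6 + 2 * 5 * 6", split_sum, calculator,
                          lambda problem, answers: sum(answers))
print(total)  # 96
```

The `max_depth` argument is a safeguard added in this sketch so that a decomposer that never bottoms out cannot recurse forever.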

### Extension 3: Reinforcement Learning Formulation (from textbook, Page 130-131)

If we generalize the above formulation a bit further, we can consider it as a **reinforcement learning problem**. A typical method is to model a problem-solving process as a decision making process. In each step of this process, an action is taken based on the current state. These actions can include all functions for sub-problem generation and solving (i.e., $G_i(\cdot)$ and $S_i(\cdot)$). Thus, the action sequence corresponds to a problem-solving path.

Since the discussion of reinforcement learning problems is beyond the scope of this chapter, we skip the precise description of this learning task. Nevertheless, developing an **agent or controller** to determine when and how to generate and solve a sub-problem is also a natural choice.

### Related Work (from textbook, Page 131)

#### Multi-Hop Question Answering

In NLP, problem decomposition is related to a long line of research on **multi-hop question answering** [Mavi et al., 2024]. This task requires the system to gather and combine information from multiple pieces of text to provide an accurate answer to a complex question.

For example, to answer the question "What is the capital of the country where Albert Einstein was born?", we need to know:
1. "Where was Albert Einstein born?"
2. "What's the capital of Germany?"
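
The dependency between the two hops can be made explicit with a toy lookup, where the answer to the first question becomes the argument of the second. The two dictionaries are tiny hand-written stand-ins for a retrieval system (an assumption for illustration), not a real knowledge base.

```python
# Toy "knowledge sources"; a real system would retrieve these facts from text
BIRTH_COUNTRY = {"Albert Einstein": "Germany"}
CAPITAL = {"Germany": "Berlin"}

def capital_of_birth_country(person):
    """Answer a two-hop question by chaining two single-hop lookups."""
    country = BIRTH_COUNTRY[person]   # hop 1: where was the person born?
    capital = CAPITAL[country]        # hop 2: what is the capital of that country?
    return [("Where was the person born?", country),
            ("What is the capital of that country?", capital)]

for question, answer in capital_of_birth_country("Albert Einstein"):
    print(question, "->", answer)
```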

Earlier work in this and related areas has investigated the issue of problem decomposition, though the methods might not be based on LLMs. For example, a popular method is to develop an additional neural model to generate simpler questions that address different aspects of the original question [Andreas et al., 2016; Talmor and Berant, 2018; Min et al., 2019]. This question generator can create questions in a batch or sequential manner.

#### Compositionality

Broadly speaking, problem decomposition is also related to the **compositionality issue in NLP** [Drozdov et al., 2022; Press et al., 2023]. For example, in semantic parsing, we map natural language sentences into structured meaning representations by breaking them down into constituent parts and understanding the sentences based on the meanings of these parts and the rules used to combine them.

In early studies of this field, highly compositional sentences were considered easier for testing systems, as it is relatively straightforward to decompose such sentences and compose the meanings of their parts. However, the task becomes much more difficult when more generalization is required for modeling compositionality in new data. In this case, we want systems to have improved abilities of **compositional generalization**.

In more recent research on LLMs, this issue has been frequently discussed in compositional reasoning tasks, such as **SCAN**.

**Footnote 7 (from textbook):** The SCAN tasks (Simplified versions of the CommAI Navigation tasks) are designed to evaluate the ability of LLMs to perform compositional generalization [Lake and Baroni, 2018]. They involve translating natural language commands into a sequence of actions. For example, a command "jump opposite left and walk thrice" can be translated into the action sequence "LTURN LTURN JUMP WALK WALK WALK".
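
To make the compositional structure of such commands concrete, here is a small rule-based interpreter for a subset of SCAN-style commands. It is a sketch under stated assumptions: it covers only `and`, `after`, `twice`, `thrice`, `opposite`, `around`, and simple directions, it uses the token names from the footnote (`LTURN`, `JUMP`, `WALK`), and `RTURN`, `LOOK`, and `RUN` are assumed by analogy.

```python
PRIMITIVES = {"walk": "WALK", "look": "LOOK", "run": "RUN", "jump": "JUMP"}
TURNS = {"left": "LTURN", "right": "RTURN"}
REPEATS = {"twice": 2, "thrice": 3}

def interpret_phrase(words):
    """Translate one phrase such as 'jump opposite left' or 'walk thrice'."""
    words = list(words)
    repeat = REPEATS[words.pop()] if words[-1] in REPEATS else 1
    verb, rest = words[0], words[1:]
    act = [] if verb == "turn" else [PRIMITIVES[verb]]
    if not rest:
        actions = act
    elif len(rest) == 1:                    # e.g. "walk left"
        actions = [TURNS[rest[0]]] + act
    elif rest[0] == "opposite":             # turn twice, then act
        actions = [TURNS[rest[1]]] * 2 + act
    elif rest[0] == "around":               # turn and act, four times
        actions = ([TURNS[rest[1]]] + act) * 4
    else:
        raise ValueError(f"cannot interpret: {' '.join(words)}")
    return actions * repeat

def interpret(command):
    """Compose phrase meanings: 'x and y' runs x then y; 'x after y' runs y then x."""
    words = command.split()
    for conjunction in ("and", "after"):
        if conjunction in words:
            i = words.index(conjunction)
            first = interpret_phrase(words[:i])
            second = interpret_phrase(words[i + 1:])
            ordered = first + second if conjunction == "and" else second + first
            return " ".join(ordered)
    return " ".join(interpret_phrase(words))

print(interpret("jump opposite left and walk thrice"))
# LTURN LTURN JUMP WALK WALK WALK
```

The meaning of the whole command is computed only from the meanings of its parts and the rule that joins them, which is exactly the property that compositional generalization benchmarks test.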

This is considered an important aspect of testing the language understanding and reasoning abilities of LLMs. This also presents new tasks for developing and examining problem decomposition methods.

### Application: Tool Use (from textbook, Page 131)

In LLMs, one interesting application of problem decomposition is **tool use**. In some cases, it is necessary to integrate external tools into LLMs to access accurate data not available during training or fine-tuning. For example, LLMs can integrate with APIs to fetch real-time data such as weather updates, stock market prices, or news feeds, enabling them to provide up-to-date responses to user queries.

When using tools, LLM predictions might include markers that indicate where and how to call external APIs. This requires decomposing the problem into sub-problems, with some handled by the LLMs and others by external tools. More detailed discussions on this topic will be presented in Section 3.2.5.
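
One way to picture the marker mechanism is a post-processing pass that finds markers in the model's draft and replaces each with the tool's result. The marker syntax `[[tool_name: argument]]` and the `add` tool below are hypothetical, chosen only for this sketch; real systems define their own formats.

```python
import re

# Hypothetical marker syntax: [[tool_name: argument]]
MARKER = re.compile(r"\[\[(\w+):\s*([^\]]*)\]\]")

def add_numbers(argument):
    """Toy tool: add a comma-separated list of integers."""
    return str(sum(int(x) for x in argument.split(",")))

TOOLS = {"add": add_numbers}

def resolve_tool_calls(text, tools=TOOLS):
    """Replace each marker with the output of the named tool."""
    def call(match):
        name, argument = match.group(1), match.group(2)
        tool = tools.get(name)
        # Unknown tools are left untouched so the problem stays visible
        return tool(argument) if tool else match.group(0)
    return MARKER.sub(call, text)

# The LLM handles the wording; the arithmetic sub-problem goes to the tool
draft = "Together they can use [[add: 6, 30, 60]] beads."
print(resolve_tool_calls(draft))  # Together they can use 96 beads.
```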

---

## Final Exercise: Comprehensive Prompting Challenge

Use `ollama run llama3.2` to solve this multi-dimensional problem using different prompting techniques.
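
If you prefer to send prompts from a notebook cell rather than the interactive shell, a thin wrapper around the command line is enough. This sketch assumes the `ollama` CLI is installed and the `llama3.2` model has already been pulled; `build_command` and `run_prompt` are hypothetical helper names.

```python
import subprocess

def build_command(prompt, model="llama3.2"):
    """Return the argument list for a one-shot `ollama run MODEL PROMPT` call."""
    return ["ollama", "run", model, prompt]

def run_prompt(prompt, model="llama3.2", timeout=300):
    """Send one prompt to a local model and return its reply as text.

    Raises CalledProcessError if the command fails and TimeoutExpired if the
    model takes longer than `timeout` seconds.
    """
    result = subprocess.run(build_command(prompt, model), capture_output=True,
                            text=True, timeout=timeout, check=True)
    return result.stdout.strip()
```

Calling `run_prompt("your prompt text")` once per prompt keeps each approach in its own fresh session, which makes the outputs easier to compare.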

---

### Scenario: Research Paper Analysis and Decision Making

You are helping a graduate student decide whether a research paper is suitable for their literature review on **"AI Safety and Alignment"**.

**Paper Abstract:**
```
"We present a novel approach to value alignment in large language models through 
constitutional AI. Our method involves training models with explicit principles 
and using AI feedback to evaluate outputs. Experiments on GPT-3 variants show 
47% reduction in harmful outputs and 23% improvement in truthfulness metrics. 
The technique requires minimal human supervision (only 160 hours of expert annotation) 
compared to traditional RLHF methods (2000+ hours). Implementation costs averaged 
$12,000 in compute resources. We released our dataset of 15,000 annotated examples 
on GitHub."
```

**Your Task:** Design prompts using THREE different approaches and compare results:


**Deliverables:**
1. Write all three prompts
2. Note which approach gives the most useful analysis
3. Reflection: Which technique would you use for complex academic decisions? Why?