# Lecture 10: Prompting (Chapter 3) - Part II

**CSC 375/575 - Generative AI**  
**Prof. Rongyu Lin, Quinnipiac University**


### Topics Covered

- **3.2.2 Problem Decomposition** - Breaking complex problems into manageable sub-problems
- **3.2.3 Self-refinement** - Iterative improvement of LLM outputs
- **3.2.4 Ensembling** - Combining multiple predictions for better results
- **3.2.5 RAG and Tool Use** - Retrieving external knowledge and using tools

### Learning Objectives

By the end of this lecture, students will be able to:

1. **Apply problem decomposition techniques** to break down complex reasoning tasks into sequential sub-problems using methods like least-to-most prompting
2. **Implement self-refinement workflows** to iteratively improve LLM outputs through systematic prediction, feedback collection, and refinement cycles
3. **Design ensemble strategies** for LLM prompting by combining multiple prompts or outputs to improve prediction quality and robustness
4. **Integrate RAG systems** effectively by retrieving relevant external knowledge and prompting LLMs to generate contextually accurate responses
5. **Understand tool use patterns** in LLMs and how to decompose problems for integration with external APIs and computational systems

## Setup

First, let's import the necessary libraries for our demonstrations.

```python
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from typing import List, Dict, Tuple
from IPython.display import Image, display

# For those with API access (optional):
# import openai
# from anthropic import Anthropic

print("Setup complete!")
```

## Background: Review of Chain of Thought

Before diving into advanced techniques, let's briefly review **Chain of Thought (CoT)** prompting, which forms the foundation for many of the methods we'll discuss.

### What is Chain of Thought?

**CoT** methods prompt LLMs to generate **step-by-step reasoning** for complex problems. Rather than directly reaching a conclusion, CoT instructs LLMs to:
- Generate intermediate reasoning steps, or
- Learn from demonstrations of detailed reasoning processes

### Example: Average Calculation

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>❌ Direct Prompting (often fails):</strong><br>
<strong>Q:</strong> Calculate the average of 2, 4, and 9.<br>
<strong>A:</strong> The answer is <span style="background-color: #ffcdd2; padding: 2px 4px;">6</span>. ❌ (Incorrect!)<br><br>


<strong>✓ Few-shot CoT (with reasoning steps):</strong><br>

<strong>Q:</strong> Calculate the mean square of 1, 3, 5, and 7.<br>
<strong>A:</strong> Calculate squares: <span style="background-color: #c8e6c9; padding: 2px 4px;">1²=1, 3²=9, 5²=25, 7²=49</span>.<br>
Sum: <span style="background-color: #c8e6c9; padding: 2px 4px;">1+9+25+49=84</span>. Count: 4 numbers.<br>
Divide: <span style="background-color: #c8e6c9; padding: 2px 4px;">84/4=21</span>. Answer: <strong style="background-color: #4caf50; color: white; padding: 2px 4px;">21</strong>. ✓<br>

<strong>Q:</strong> Calculate the average of 2, 4, and 9.<br>
<strong>A:</strong> Calculate <span style="background-color: #c8e6c9; padding: 2px 4px;">2+4+9=15</span>. Count: 3 numbers.<br>
Divide: <span style="background-color: #c8e6c9; padding: 2px 4px;">15/3=5</span>. Answer: <strong style="background-color: #4caf50; color: white; padding: 2px 4px;">5</strong>. ✓<br><br>


<strong>✓ Zero-shot CoT (using trigger instruction):</strong><br>

<strong>Q:</strong> Calculate the average of 2, 4, and 9.<br>
<strong>A:</strong> <span style="background-color: #ffeb3b; padding: 2px 4px;">Let's think step-by-step.</span><br>
Add: <span style="background-color: #bbdefb; padding: 2px 4px;">2+4+9=15</span>. Count: 3 numbers.<br>
Average: <span style="background-color: #bbdefb; padding: 2px 4px;">15/3=5</span>. Answer: <strong style="background-color: #2196f3; color: white; padding: 2px 4px;">5</strong>. ✓

</div>

### Key Benefits

- **Transparency**: Makes reasoning visible and verifiable
- **Flexibility**: Works across different problem types
- **No Training**: In-context learning with off-the-shelf LLMs

### Limitations

Despite its success, CoT has practical limitations:
1. **Demonstration Cost**: Few-shot CoT requires detailed, multi-step worked examples
2. **No Standard Method**: How a problem is decomposed depends on the prompt author's experience
3. **Error Propagation**: Mistakes in intermediate steps carry through to the final answer
4. **Complexity**: Some problems are too complex for simple sequential reasoning

The following figure shows CoT applied to various reasoning tasks:

<img src="images/fig3-1_cot_examples.jpeg" alt="Figure 3.1: Chain of Thought Examples" width="50%">

*Figure 3.1: CoT prompting examples across different reasoning tasks (CSQA, StrategyQA, Dyck languages, Last Letter Concatenation)*
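The two CoT styles reviewed above differ only in how the prompt string is assembled. The following is a minimal sketch of that assembly, not a prescribed implementation: the function names `build_few_shot_cot_prompt` and `build_zero_shot_cot_prompt` are hypothetical, and sending the prompt to an actual LLM is left out.

```python
from typing import List, Tuple

# Trigger phrase taken from the zero-shot example above.
ZERO_SHOT_TRIGGER = "Let's think step-by-step."


def build_few_shot_cot_prompt(demos: List[Tuple[str, str]], question: str) -> str:
    """Concatenate worked (question, reasoning) demonstrations, then the new question."""
    parts = [f"Q: {q}\nA: {reasoning}" for q, reasoning in demos]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def build_zero_shot_cot_prompt(question: str) -> str:
    """No demonstrations: end the prompt with the trigger so the model continues reasoning."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


demos = [(
    "Calculate the mean square of 1, 3, 5, and 7.",
    "Calculate squares: 1, 9, 25, 49. Sum: 1+9+25+49=84. Count: 4 numbers. "
    "Divide: 84/4=21. Answer: 21.",
)]
question = "Calculate the average of 2, 4, and 9."

few_shot_prompt = build_few_shot_cot_prompt(demos, question)
zero_shot_prompt = build_zero_shot_cot_prompt(question)

print(few_shot_prompt)
print("---")
print(zero_shot_prompt)
```

The few-shot prompt pays for its demonstration in tokens, while the zero-shot prompt relies entirely on the trigger phrase.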

## 3.2.2 Problem Decomposition

### Introduction

We have seen that LLMs can benefit from **breaking complex problems down into simpler problem-solving tasks**. This approach exemplifies a broader paradigm known as **problem decomposition**, which has been extensively explored in psychology and computer science.

From a **psychological perspective**, complex problem-solving refers to addressing problems using knowledge that helps overcome barriers. There are generally no standard or clear paths to solutions for complex problems. However, **decomposing the problem** often makes it easier to tackle corresponding sub-problems with less effort.

### General Framework

A general framework for problem decomposition involves two elements:

1. **Sub-problem Generation**: Decomposing the input problem into a number of sub-problems
2. **Sub-problem Solving**: Solving each sub-problem and deriving intermediate and final conclusions through reasoning

<img src="images/problem_decomposition_framework.png" alt="Problem Decomposition Framework" width="50%">

*Problem Decomposition Framework*

### Example 1: Blog Writing with Problem Decomposition

Consider this example of breaking down a blog writing task:

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Prompt:</strong><br>

You are a blog writer. Please follow this outline to write about AI risks:<br>

• <strong>Introduction</strong>: Introduce AI, its relevance, and importance of understanding risks<br>
• <strong>Privacy Concerns</strong>: Discuss how AI might compromise personal privacy<br>
• <strong>Misinformation</strong>: Explore AI's role in spreading misinformation<br>
• <strong>Cyberbullying</strong>: Highlight how AI tools can be utilized in cyberbullying<br>
• <strong>Tips for Safe AI Use</strong>: Offer guidelines for responsible AI usage<br>
• <strong>Conclusion</strong>: Recap main points and encourage proactive engagement<br><br>
<u>___________</u>

<strong>💡 Key Insight:</strong> By decomposing the complex <span style="background-color: #ffeb3b; padding: 2px 4px;">"write a blog"</span> task into <span style="background-color: #90ee90; padding: 2px 4px;">structured sections</span>, each sub-task becomes more manageable for the LLM.

</div>

### Example 2: Document Analysis with Divide-and-Conquer

In computer science, decomposing complex problems is a commonly used strategy. A well-known example is the **divide-and-conquer paradigm**.

Consider determining whether a long document discusses the risks of AI:

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Direct Approach (Computationally Expensive):</strong><br><br>

<span style="background-color: #ffb6c1; padding: 2px 4px;">Task:</span> You are provided with a text. Determine whether it discusses AI risks.<br>

{document}<br>
<u>___________</u>


<br>
<strong>Divide-and-Conquer Approach (Efficient):</strong><br>

<strong>Step 1:</strong> Divide the document into relatively short segments<br>
<strong>Step 2:</strong> Process each segment in parallel<br>
<strong>Step 3:</strong> Determine relevancy of each segment<br>
<strong>Step 4:</strong> Aggregate results for final output<br><br>



<strong>Final Aggregation Prompt:</strong><br>

<span style="background-color: #87ceeb; padding: 2px 4px;">Task:</span> Your task is to determine whether a text discusses AI risks. The text has been divided into segments with relevancy scores:<br><br>

<span style="background-color: #90ee90; padding: 2px 4px;">Segment 1:</span> {relevancy-to-topic1}<br>
<span style="background-color: #90ee90; padding: 2px 4px;">Segment 2:</span> {relevancy-to-topic2}<br>
<span style="background-color: #90ee90; padding: 2px 4px;">Segment 3:</span> {relevancy-to-topic3}<br>
...<br>
<u>___________</u>

<strong>💡 Key Benefit:</strong> This approach enables <span style="background-color: #ffeb3b; padding: 2px 4px;">parallel processing</span> of segments, making analysis of long documents much more efficient.

</div>

<img src="images/document_analysis_workflow.png" alt="Document Analysis Workflow" width="50%">

*Document Analysis Workflow: Long documents are divided into segments, processed in parallel, and results are aggregated for a final decision.*
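The four divide-and-conquer steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `score_segment` is a hypothetical keyword heuristic standing in for a per-segment LLM relevancy call, and the segment size, threshold, and function names are arbitrary choices rather than values from the textbook.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_into_segments(document: str, max_words: int = 50) -> List[str]:
    """Step 1: divide the document into relatively short segments."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score_segment(segment: str) -> float:
    """Step 3 placeholder: a real system would ask an LLM for a relevancy score."""
    tokens = set(re.findall(r"[a-z]+", segment.lower()))
    mentions_ai = "ai" in tokens
    mentions_risk = bool(tokens & {"risk", "risks", "harm", "harms", "danger", "dangers"})
    return 1.0 if (mentions_ai and mentions_risk) else 0.0


def build_aggregation_prompt(scores: List[float]) -> str:
    """Fill the final aggregation prompt shown above with per-segment scores."""
    lines = [
        "Your task is to determine whether a text discusses AI risks. "
        "The text has been divided into segments with relevancy scores:",
        "",
    ]
    lines += [f"Segment {i}: {s:.1f}" for i, s in enumerate(scores, start=1)]
    return "\n".join(lines)


def discusses_ai_risks(document: str, max_words: int = 50,
                       threshold: float = 0.5) -> Tuple[bool, List[float]]:
    segments = split_into_segments(document, max_words)
    # Step 2: segments are independent, so they can be scored in parallel.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(score_segment, segments))
    # Step 4: a simple rule-based aggregation; an LLM could do this instead.
    return any(s >= threshold for s in scores), scores


document = ("The weather was mild this spring. " * 10
            + "Researchers warn that AI poses risks to privacy.")
found, scores = discusses_ai_risks(document, max_words=20)
print(found, scores)
print(build_aggregation_prompt(scores))
```

Because each segment is scored independently, the per-segment calls are the part that parallelizes; only the aggregation step needs all results at once.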

### Least-to-Most Prompting

**Least-to-most prompting** [Zhou et al., 2023b] addresses difficult reasoning problems by following a **progressive sequence of sub-problems** that systematically lead to the conclusion.

The method is motivated by difficult reasoning problems that cannot be solved by simply generalizing from a few examples. For such problems, solving easier sub-problems first and building on their answers is more effective than attempting the full problem directly.

#### Two-Stage Process:

**Stage 1: Sub-problem Generation**
- Use prompting with instructions and/or demonstrations
- LLM decomposes the problem into sub-problems
- Uses few-shot examples to learn decomposition pattern

**Stage 2: Sequential Solving**  
- Solve sub-problems **one by one**
- Each step includes **all previously-generated QA pairs** as context
- Build up knowledge progressively
- Finally solve the original problem with all sub-problem answers

#### Example: Environmental Study Duration

Let's see how this works with a concrete example.

<img src="images/least_to_most_workflow.png" alt="Least-to-Most Prompting Workflow" width="60%">

*Least-to-Most Prompting Workflow*

**Stage 1: Sub-problem Generation (Textbook Example)**

<div style="max-width: 800px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>TASK:</strong> Your task is to decompose a problem into several sub-problems. You will be given a few examples to illustrate how to achieve this.<br><br>

<strong>DEMO </strong><br>
<strong>Q:</strong> In a community, 5% of the population are infants, 15% are children, 40% are adults, and 40% are seniors. Which group makes up the largest portion of the population?<br>
<strong>A:</strong> To answer the question "Which group makes up the largest portion of the population?", we need to know: <span style="background-color: #ffeb3b; padding: 2px 4px;">"How many percent are infants?"</span>, <span style="background-color: #ffeb3b; padding: 2px 4px;">"How many percent are children?"</span>, <span style="background-color: #ffeb3b; padding: 2px 4px;">"How many percent are adults?"</span>, <span style="background-color: #ffeb3b; padding: 2px 4px;">"How many percent are seniors?"</span>.<br><br>

<strong>Q:</strong> Alice, Bob, and Charlie brought beads for their group project in their craft class. Alice has twice as many beads as Bob, and Bob has five times as many beads as Charlie. If Charlie has 6 beads, how many beads can they use for their craft project?<br>
<strong>A:</strong> To answer the question "How many beads can they use for their craft project?", we need to know: <span style="background-color: #ffeb3b; padding: 2px 4px;">"How many beads does Bob have?"</span>, <span style="background-color: #ffeb3b; padding: 2px 4px;">"How many beads does Alice have?"</span>.<br><br>

<strong>USER</strong><br>
<strong> Q:</strong> The environmental study conducted from 2015 to 2020 revealed that the average temperature in the region increased by 2.3 degrees Celsius. What was the duration of the environmental study?<br>
<strong>A:</strong> To answer the question<span style="background-color: #4caf50; padding: 2px 4px; color: white;"> "What was the duration of the environmental study?"</span>, we need to know: <span style="background-color: #4caf50; padding: 2px 4px; color: white;">"When did the environmental study start?"</span>, <span style="background-color: #4caf50; padding: 2px 4px; color: white;">"When did the environmental study end?"</span>.
</div>

**Stage 2: Sequential Sub-problem Solving (Textbook Example)**

Given the sub-problems, we solve them sequentially, taking all previously generated QA pairs as context.

<div style="max-width: 800px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Step 1: Solving First Sub-problem</strong><br>
The environmental study conducted from 2015 to 2020 revealed that the average temperature in the region increased by 2.3 degrees Celsius.<br>

<span style="background-color: #90ee90; padding: 2px 4px;">SUB-PROB1 Q: When did the environmental study start?</span><br>
<strong>A:</strong> <u>The environmental study started in 2015.</u><br><br>


<strong>Step 2: Solving Second Sub-problem (with context)</strong><br>
The environmental study conducted from 2015 to 2020 revealed that the average temperature in the region increased by 2.3 degrees Celsius.<br>

<span style="background-color: #87ceeb; padding: 2px 4px;">SUB-PROB1 Q: When did the environmental study start?</span><br>
<strong>A:</strong> The environmental study started in 2015.<br>
<span style="background-color: #90ee90; padding: 2px 4px;">SUB-PROB2 Q: When did the environmental study end?</span><br>
<strong>A:</strong> <u>The environmental study ended in 2020.</u><br><br>


<strong>Step 3: Solving Original Problem (with all sub-problem answers)</strong><br>
The environmental study conducted from 2015 to 2020 revealed that the average temperature in the region increased by 2.3 degrees Celsius.<br>

<span style="background-color: #87ceeb; padding: 2px 4px;">SUB-PROB1 Q: When did the environmental study start?</span><br>
<strong>A:</strong> The environmental study started in 2015.<br>
<span style="background-color: #87ceeb; padding: 2px 4px;">SUB-PROB2 Q: When did the environmental study end?</span><br>
<strong>A:</strong> The environmental study ended in 2020.<br>
<span style="background-color: #ffa500; padding: 2px 4px;"><strong>FINAL Q: What was the duration of the environmental study?</strong></span><br>
<strong>A:</strong><u> The duration of the environmental study was 5 years.</u><br><br>

<hr style="border: 1px solid #ddd; margin: 12px 0;">

<strong>💡 Key Point:</strong> Each step includes ALL previous QA pairs as context!<br><br>
<strong>Color Legend:</strong><br>
• <span style="background-color: #90ee90; padding: 2px 4px;">Green</span> = New sub-problem being solved<br>
• <span style="background-color: #87ceeb; padding: 2px 4px;">Light blue</span> = Previous sub-problems (context)<br>
• <span style="background-color: #ffa500; padding: 2px 4px;">Orange</span> = Final question

</div>
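The sequential solving stage above can be sketched as a loop that grows the prompt with every answered sub-problem. This is a sketch under assumptions: `solve_least_to_most` is a hypothetical helper, and `stub_llm` returns canned answers in place of a real model call so the example runs offline.

```python
from typing import Callable, List, Tuple


def solve_least_to_most(context: str, sub_problems: List[str], final_question: str,
                        llm: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Solve sub-problems in order; each prompt carries all earlier QA pairs."""
    qa_pairs: List[Tuple[str, str]] = []
    prompts: List[str] = []
    for question in sub_problems + [final_question]:
        history = "".join(f"Q: {q}\nA: {a}\n" for q, a in qa_pairs)
        prompt = f"{context}\n{history}Q: {question}\nA:"
        prompts.append(prompt)
        qa_pairs.append((question, llm(prompt)))
    return qa_pairs[-1][1], prompts


# Canned answers standing in for model output (an assumption for this demo).
CANNED = {
    "When did the environmental study start?": "The environmental study started in 2015.",
    "When did the environmental study end?": "The environmental study ended in 2020.",
    "What was the duration of the environmental study?":
        "The duration of the environmental study was 5 years.",
}


def stub_llm(prompt: str) -> str:
    # Recover the last question in the prompt and look up its canned answer.
    last_question = prompt.rsplit("Q: ", 1)[1].split("\n")[0]
    return CANNED[last_question]


context = ("The environmental study conducted from 2015 to 2020 revealed that the "
           "average temperature in the region increased by 2.3 degrees Celsius.")
sub_problems = ["When did the environmental study start?",
                "When did the environmental study end?"]
final_answer, prompts = solve_least_to_most(
    context, sub_problems, "What was the duration of the environmental study?", stub_llm)

print(prompts[-1])
print(final_answer)
```

The last prompt printed contains both earlier QA pairs, mirroring Step 3 in the box above.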

### Mathematical Formulation

Let $p_0$ denote the input problem, and $\{p_1, \ldots, p_n\}$ the sub-problems.

**Sub-problem Generation**:
$$\{p_1, \ldots, p_n\} = G(p_0)$$

where $G(\cdot)$ is the sub-problem generation function.

**Sequential Solving**: For the $i$-th sub-problem:
$$a_i = S_i(p_i, \{p_0, p_{<i}, a_{<i}\})$$

where:
- $p_{<i} = \{p_1, \ldots, p_{i-1}\}$ (previous sub-problems)
- $a_{<i} = \{a_1, \ldots, a_{i-1}\}$ (previous answers)
- $S_i(\cdot)$ solves sub-problem $p_i$ given context

**Final Answer**:
$$a_0 = S_0(p_0, \{p_{\leq n}, a_{\leq n}\})$$

### Dynamic Sub-problem Generation

Instead of generating all sub-problems at once, we can generate each dynamically during problem-solving:
$$p_i = G_i(p_0, \{p_{<i}, a_{<i}\})$$

This allows the reasoning path to adapt based on intermediate results.
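The formulation above maps onto a short control loop. The sketch below is one possible reading, with hypothetical callables `generate_next` (playing the role of $G_i$) and `solve` (playing the role of $S_i$ and $S_0$); the toy implementations hard-code the bead problem from the earlier demonstration instead of calling an LLM.

```python
from typing import Callable, List, Optional, Tuple

QA = Tuple[str, str]


def solve_dynamically(p0: str,
                      generate_next: Callable[[str, List[QA]], Optional[str]],
                      solve: Callable[[str, str, List[QA]], str],
                      max_steps: int = 10) -> Tuple[str, List[QA]]:
    history: List[QA] = []
    for _ in range(max_steps):
        p_i = generate_next(p0, history)   # p_i = G_i(p_0, {p_<i, a_<i})
        if p_i is None:                    # generator signals no more sub-problems
            break
        a_i = solve(p_i, p0, history)      # a_i = S_i(p_i, {p_0, p_<i, a_<i})
        history.append((p_i, a_i))
    a0 = solve(p0, p0, history)            # a_0 = S_0(p_0, {p_<=n, a_<=n})
    return a0, history


# Toy stand-ins for LLM calls (assumptions for illustration only).
def toy_generate_next(p0: str, history: List[QA]) -> Optional[str]:
    asked = {q for q, _ in history}
    for q in ("How many beads does Bob have?", "How many beads does Alice have?"):
        if q not in asked:
            return q
    return None


def toy_solve(p: str, p0: str, history: List[QA]) -> str:
    known = {q: int(a) for q, a in history}
    if p == "How many beads does Bob have?":
        return str(5 * 6)                  # five times Charlie's 6 beads
    if p == "How many beads does Alice have?":
        return str(2 * known["How many beads does Bob have?"])
    return str(6 + sum(known.values()))    # Charlie + Bob + Alice


p0 = "How many beads can they use for their craft project?"
a0, history = solve_dynamically(p0, toy_generate_next, toy_solve)
print(history)
print(a0)
```

Generating all sub-problems up front corresponds to a `generate_next` that ignores the answers in `history`; the dynamic variant is free to inspect them before choosing the next sub-problem.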

### Related Topics

Problem decomposition relates to:

1. **Multi-hop Question Answering**: Gathering and combining information from multiple sources
2. **Compositional Reasoning**: Breaking complex sentences into constituent parts
3. **Tool Use**: Integrating external tools and APIs (discussed in Section 3.2.5)

**Example - SCAN Task**: Tests compositional generalization by translating commands into action sequences:
- Command: "jump opposite left and walk thrice"
- Actions: "LTURN LTURN JUMP WALK WALK WALK"
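To make the compositional structure concrete, here is a small interpreter for a simplified subset of SCAN-style commands. It is an illustrative sketch only: it handles a primitive action, an optional `left`/`right` or `opposite left`/`opposite right` modifier, an optional `twice`/`thrice`, and `and`; the full SCAN grammar has more constructs, and the token names follow the shorthand used in this lecture.

```python
from typing import List

ACTIONS = {"walk": "WALK", "jump": "JUMP", "run": "RUN", "look": "LOOK"}
TURNS = {"left": "LTURN", "right": "RTURN"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret_phrase(phrase: str) -> List[str]:
    """Translate one phrase such as 'jump opposite left' or 'walk thrice'."""
    words = phrase.split()
    repeat = 1
    if words[-1] in REPEATS:
        repeat = REPEATS[words.pop()]
    action = ACTIONS[words[0]]
    modifiers = words[1:]
    if not modifiers:
        tokens = [action]
    elif len(modifiers) == 1 and modifiers[0] in TURNS:
        tokens = [TURNS[modifiers[0]], action]            # turn once, then act
    elif len(modifiers) == 2 and modifiers[0] == "opposite" and modifiers[1] in TURNS:
        tokens = [TURNS[modifiers[1]]] * 2 + [action]     # turn twice, then act
    else:
        raise ValueError(f"Unsupported phrase: {phrase!r}")
    return tokens * repeat


def interpret_command(command: str) -> List[str]:
    """Decompose on 'and', interpret each part, and concatenate the results."""
    tokens: List[str] = []
    for phrase in command.split(" and "):
        tokens.extend(interpret_phrase(phrase))
    return tokens


actions = interpret_command("jump opposite left and walk thrice")
print(" ".join(actions))
```

The interpreter solves the full command by solving each phrase independently and joining the results, which is the decomposition pattern the task is designed to test.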

## 3.2.3 Self-refinement

### Introduction

In many cases, predictions of LLMs can be inaccurate or incorrect. **Self-refinement** explores methods for LLMs to iteratively improve their outputs.

This is analogous to human behavior - for example, a designer might:
1. Create a basic prototype
2. Evaluate and test
3. Refine the design to enhance user experience
4. Iterate until satisfactory

In NLP, early examples include:
- **Brill's tagger** (1992): Iteratively refining POS tagging using rules
- **Sequence-to-sequence refinement**: Grammar correction, text rewriting

### Example: Translation Refinement

Let's see how self-refinement works with Chinese-to-English translation:

<img src="images/self_refinement_cycle.png" alt="Self-refinement Iterative Cycle" width="50%">

*Self-refinement Iterative Cycle*

**Example: Self-refinement for Translation**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Step 1: Initial Translation</strong><br>

<span style="background-color: #ffeb3b; padding: 2px 4px;">Chinese:</span><br>
一系列考古发现奠定红山文化在中华文明起源研究中的重要地位。最新公布的研究成果认为，大约从距今5800年开始，中华大地上各个区域相继出现较为明显的社会分化，进入文明起源的加速阶段。<br>

<span style="background-color: #90ee90; padding: 2px 4px;">English Translation:</span>
<u>A series of discoveries have cemented the significant role of the Hongshan culture in studies on the origins of Chinese civilization. The latest research findings suggest that, starting from around 5800 years ago, many regions across China began to experience noticeable social differentiations, entering an accelerated phase in the origin of civilization.</u><br><br>


<strong>Step 2: Refinement</strong><br>

Please review and refine the following English translation to improve its accuracy, fluency, and naturalness:<br>

<span style="background-color: #ffb6c1; padding: 2px 4px;">Current Translation:</span><br>
A series of discoveries have cemented the significant role...<br><br>

<span style="background-color: #90ee90; padding: 2px 4px;">Refined Translation:</span><br>
<u>A series of <strong style="background-color: #90ee90; padding: 1px 3px;">archaeological</strong> discoveries have cemented the significant role of the Hongshan culture in studies on the origins of Chinese civilization. The latest research findings suggest that, starting from around 5800 years ago, <strong style="background-color: #90ee90; padding: 1px 3px;">various</strong> regions across China began to experience noticeable social differentiations, entering an accelerated phase in the origin of civilization.</u><br><br>

<span style="background-color: #ffeb3b; padding: 2px 4px;">💡 Improvements:</span> Added "archaeological", changed "many" → "various"

</div>

### General Framework

A general framework of self-refinement with LLMs involves three steps [Madaan et al., 2024]:

1. **Prediction**: Use an LLM to produce the initial model output
2. **Feedback Collection**: Obtain feedback on the model output
3. **Refinement**: Use the LLM to refine the output based on feedback

The last two steps can be repeated multiple times, leading to an iterative self-refinement process.
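The three steps can be sketched as a loop over hypothetical callables. In the sketch below, `predict`, `get_feedback`, and `refine` are assumed interfaces (each would wrap an LLM prompt in practice), and the toy versions use a keyword checklist purely so the loop can be run and inspected.

```python
from typing import Callable, List, Optional, Tuple


def self_refine(task: str,
                predict: Callable[[str], str],
                get_feedback: Callable[[str, str], Optional[str]],
                refine: Callable[[str, str, str], str],
                max_rounds: int = 3) -> Tuple[str, List[str]]:
    output = predict(task)                          # Step 1: prediction
    trace = [output]
    for _ in range(max_rounds):
        feedback = get_feedback(task, output)       # Step 2: feedback collection
        if feedback is None:                        # nothing left to improve
            break
        output = refine(task, output, feedback)     # Step 3: refinement
        trace.append(output)
    return output, trace


# Toy stand-ins for LLM calls (assumptions for illustration only).
REQUIRED = ["climate change", "pollution", "deforestation"]


def toy_predict(task: str) -> str:
    return "Major concerns include climate change"


def toy_feedback(task: str, output: str) -> Optional[str]:
    missing = [topic for topic in REQUIRED if topic not in output]
    return f"Add: {missing[0]}" if missing else None


def toy_refine(task: str, output: str, feedback: str) -> str:
    return output + ", " + feedback[len("Add: "):]


final_output, trace = self_refine(
    "What are some major environmental concerns today?",
    toy_predict, toy_feedback, toy_refine)
for step, text in enumerate(trace):
    print(step, text)
```

The `max_rounds` cap matters in practice: without a stopping rule, a feedback model that always finds something to criticize would loop indefinitely.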

### Example: Response Generation with Feedback

Let's consider generating responses to user questions with iterative refinement:

**Example: Response Generation with Feedback**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Step 1: Initial Response</strong><br>

<span style="background-color: #87ceeb; padding: 2px 4px;">Question:</span> What are some major environmental concerns today?<br>

Response:<br>
<u>Major environmental concerns today include climate change, air and water pollution, deforestation, loss of biodiversity, and ocean degradation.</u><br><br>

<strong>Step 2: Feedback Generation</strong><br>

<span style="background-color: #ffb6c1; padding: 2px 4px;">Feedback:</span><br>
<u>The response is accurate and highlights key concerns. However, it would benefit from more specific and detailed descriptions. <strong>Suggestions for improvement:</strong><br>
1. Include specific examples of environmental issues<br>
2. Add information about major pollutants<br>
3. Specify the regions most affected</u><br><br>


<strong>Step 3: Refined Response</strong><br>

<span style="background-color: #90ee90; padding: 2px 4px;">Improved Response:</span><br>
<u>Key environmental concerns include <strong style="background-color: #90ee90; padding: 1px 3px;">climate change, evidenced by rising global temperatures and severe weather patterns</strong>; <strong style="background-color: #90ee90; padding: 1px 3px;">significant air and water pollution in urban and industrial areas</strong>; rampant <strong style="background-color: #90ee90; padding: 1px 3px;">deforestation in regions like the Amazon</strong>, causing biodiversity loss; and ocean degradation, highlighted by <strong style="background-color: #90ee90; padding: 1px 3px;">coral reef bleaching and widespread overfishing</strong>.</u>

</div>

### Deliberate-then-Generate (DTG) Method

The **deliberate-then-generate (DTG)** method [Li et al., 2023a] prompts LLMs to first deliberate on potential errors before generating improved output.

**Example: DTG with Negative Evidence**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>DTG Template:</strong><br><br>

Given the Chinese sentence: {source}<br>
The English translation is: {target}<br>
<span style="background-color: #ffeb3b; padding: 2px 4px;">Please first detect the type of error, and then refine the translation.</span><br>
Error Type:
<span style="background-color: #87ceeb; padding: 2px 4px;"><u>___...________</u></span><br>
<span style="background-color: #ffb6c1; padding: 2px 4px;"><u>____..._______</u></span><br><br>

<strong>❌ Example with Negative Evidence (Incorrect Translation):</strong><br>

Chinese Sentence:<br>
一系列考古发现奠定红山文化在中华文明起源研究中的重要地位。<br>

Incorrect English Translation:<br>
A variety of <strong style="background-color: #ffb6c1; padding: 1px 3px;">innovative techniques</strong> have redefined the importance of <strong style="background-color: #ffb6c1; padding: 1px 3px;">modern art</strong> in <strong style="background-color: #ffb6c1; padding: 1px 3px;">contemporary cultural studies</strong>.<br>

Error Type Detected: <strong><span style="background-color: #ffa500; padding: 2px 4px;">Incorrect Translation</span></strong><br>
(The translation talks about <em>modern art</em> instead of <em>archaeological discoveries</em> and Hongshan culture!)<br><br>

<strong>💡 Key Insight:</strong> DTG uses <span style="background-color: #ffeb3b; padding: 2px 4px;">"negative evidence"</span> - showing the LLM what <strong>NOT</strong> to do. This helps the model learn from mistakes and generate better outputs by first identifying errors.

</div>
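Filling the DTG template shown above is a plain string substitution. The helper below is a minimal sketch with a hypothetical name, `build_dtg_prompt`; it only builds the prompt and leaves the model call and the parsing of the detected error type to the caller.

```python
# Template text copied from the DTG box above.
DTG_TEMPLATE = (
    "Given the Chinese sentence: {source}\n"
    "The English translation is: {target}\n"
    "Please first detect the type of error, and then refine the translation.\n"
    "Error Type:"
)


def build_dtg_prompt(source: str, target: str) -> str:
    """Insert the source sentence and a (possibly wrong) candidate translation."""
    return DTG_TEMPLATE.format(source=source, target=target)


source = "一系列考古发现奠定红山文化在中华文明起源研究中的重要地位。"
wrong_target = ("A variety of innovative techniques have redefined the importance "
                "of modern art in contemporary cultural studies.")
dtg_prompt = build_dtg_prompt(source, wrong_target)
print(dtg_prompt)
```

The prompt ends at `Error Type:` so that the model's continuation starts with its error diagnosis, followed by the refined translation.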

## 3.2.4 Ensembling

### Introduction

**Model ensembling** for text generation has been extensively discussed in NLP literature. The idea is to **combine predictions of two or more models** to generate a better prediction.

For LLM prompting, it's possible to improve performance by combining predictions based on **different prompts**.

### Example: Text Simplification with Multiple Prompts

Consider three different prompt templates for text simplification:

<img src="images/ensembling_workflow.png" alt="LLM Ensembling Workflow" width="50%">

*LLM Ensembling Workflow*

**Example: Multiple Prompts for Text Simplification**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Three Different Prompt Templates:</strong><br><br>

<strong>Prompt 1:</strong> Make this text simpler.<br>
{text}<br><u>________</u><br>

<strong>Prompt 2:</strong> Condense and simplify this text.<br>
{text}<br><u>________</u><br>

<strong>Prompt 3:</strong> Rewrite for easy reading.<br>
{text}<br><u>________</u><br><br>



<strong>💡 Ensembling Strategy:</strong> Each prompt leads to a <strong>different prediction</strong>. We can combine all three predictions to generate a <span style="background-color: #ffeb3b; padding: 2px 4px;">final ensemble prediction</span> that is often better than any single prompt's output.

</div>

### Mathematical Formulation

Let $\{\mathbf{x}_1, \ldots, \mathbf{x}_K\}$ be $K$ prompts for the same task.

Given an LLM $\text{Pr}(\cdot \mid \cdot)$, find the best prediction for each prompt:
$$\hat{\mathbf{y}}_i = \arg\max_{\mathbf{y}_i} \text{Pr}(\mathbf{y}_i \mid \mathbf{x}_i)$$

**Combine predictions**:
$$\hat{\mathbf{y}} = \text{Combine}(\hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_K)$$

**Token-level Averaging**:
$$\hat{y}_j = \arg\max_{y_j} \sum_{k=1}^{K} \log \text{Pr}(y_j \mid \mathbf{x}_k, \hat{y}_1, \ldots, \hat{y}_{j-1})$$
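The token-level rule can be checked on a toy example. The numbers below are invented for illustration (three prompts, a three-word vocabulary, one decoding step), and `ensemble_next_token` is a hypothetical helper; a real system would obtain the per-prompt log-probabilities from the model.

```python
import math
from typing import List


def ensemble_next_token(log_probs: List[List[float]]) -> int:
    """log_probs[k][v] = log Pr(token v | prompt k, tokens chosen so far).

    Returns the index of the token with the highest summed log-probability.
    """
    vocab_size = len(log_probs[0])
    totals = [sum(row[v] for row in log_probs) for v in range(vocab_size)]
    return max(range(vocab_size), key=totals.__getitem__)


vocab = ["easy", "simple", "plain"]
# Made-up next-token distributions, one row per prompt.
probs = [
    [0.5, 0.4, 0.1],   # prompt 1 prefers "easy"
    [0.2, 0.5, 0.3],   # prompt 2 prefers "simple"
    [0.3, 0.4, 0.3],   # prompt 3 prefers "simple"
]
log_probs = [[math.log(p) for p in row] for row in probs]

chosen = vocab[ensemble_next_token(log_probs)]
print(chosen)
```

Summing log-probabilities is equivalent to multiplying the probabilities, so a token that any single prompt considers very unlikely is penalized heavily even if another prompt favors it.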

### Bayesian Perspective

Treat the prompt $\mathbf{x}$ as a latent variable given problem $p$:
$$\text{Pr}(\mathbf{y} \mid p) = \int \text{Pr}(\mathbf{y} \mid \mathbf{x}) \text{Pr}(\mathbf{x} \mid p) d\mathbf{x}$$

This marginalizes over all possible prompts, weighting each by its probability $\text{Pr}(\mathbf{x} \mid p)$ given the problem.

**Approximation**: Use Monte Carlo sampling with a finite set of diverse prompts.
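Written out, and assuming the $K$ prompts $\mathbf{x}_1, \ldots, \mathbf{x}_K$ can be treated as (approximate) samples from $\text{Pr}(\mathbf{x} \mid p)$, the Monte Carlo estimate replaces the integral with a simple average:

$$\text{Pr}(\mathbf{y} \mid p) \approx \frac{1}{K} \sum_{k=1}^{K} \text{Pr}(\mathbf{y} \mid \mathbf{x}_k), \quad \mathbf{x}_k \sim \text{Pr}(\mathbf{x} \mid p)$$

Hand-written or paraphrased prompts are not true samples from this distribution, so the estimate should be read as a heuristic justification for prompt ensembling rather than an exact computation.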

### Creating Diverse Prompts

Methods for generating diverse prompts:

1. **Manual Creation**: Create different demonstrations manually
2. **Automatic Generation**: Use LLMs to generate demonstrations and prompts
3. **Demonstration Ordering**: Change the order of examples in prompts
4. **Paraphrasing**: Use LLMs to generate similar prompts
5. **Translation**: Translate prompts into other languages

**Key Principle**: Diverse prompts lead to diverse outputs, especially for difficult problems.

### Self-Consistency Method

**Self-consistency** [Wang et al., 2022a; 2023b] outputs the prediction that best aligns with other predictions rather than the one with highest probability.

**Process**:
1. Prompt LLM with CoT and generate multiple reasoning paths by sampling
2. Count the frequency of each answer across reasoning paths
3. Select the answer with the highest count

**Example: Self-Consistency with Multiple Predictions (Coin Flip Probability)**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Question:</strong> Three friends flip a fair coin once each. What is the probability that exactly one flips heads?<br><br>

<hr style="border: 1px solid #ddd; margin: 10px 0;">

<strong>Prediction 1 ✓ Correct</strong><br>
<strong>Reasoning:</strong> Let's think step by step. Each flip is independent with 50% chance of H or T. Possible outcomes: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT<br>
Exactly one head: <span style="background-color: #90ee90; padding: 2px 4px;">HTT, THT, TTH</span>. That's 3 out of 8.<br>
<strong>Answer:</strong> <strong style="font-size: 1.1em;">3/8 = 37.5%</strong><br><br>

<hr style="border: 1px solid #ddd; margin: 10px 0;">

<strong>Prediction 2 ✓ Correct</strong><br>
<strong>Reasoning:</strong> Let's think step by step. Using <span style="background-color: #87ceeb; padding: 2px 4px;">binomial probability formula</span>: p=0.5, n=3 trials, k=1 success.<br>
<span style="background-color: #87ceeb; padding: 2px 4px;">P(X=1) = C(3,1) × (0.5)^1 × (0.5)^2</span> = 3 × 0.5 × 0.25 = 0.375<br>
<strong>Answer:</strong> <strong style="font-size: 1.1em;">37.5%</strong><br><br>

<hr style="border: 1px solid #ddd; margin: 10px 0;">

<strong>Prediction 3 ✗ Incorrect</strong><br>
<strong>Reasoning:</strong> Let's think step by step. Each flip is 50%. For exactly one head, we pick which flip is heads. There are three flips, so <span style="background-color: #ffb6c1; padding: 2px 4px;">1 out of 3</span>. ❌<br>
<strong>Answer:</strong> <strong style="font-size: 1.1em;">1/3 = 33.3%</strong><br><br>

<hr style="border: 2px solid #ddd; margin: 12px 0;">

<strong style="font-size: 1.1em;">📊 Answer Frequency Count:</strong><br>

<span style="background-color: #90ee90; padding: 3px 8px; font-weight: bold;">37.5%</span> → <strong style="font-size: 1.2em;">2 occurrences</strong> (Predictions 1 & 2)<br>
<span style="background-color: #ffb6c1; padding: 3px 8px; font-weight: bold;">33.3%</span> → 1 occurrence (Prediction 3)<br><br>

<strong style="font-size: 1.15em;">✓ Self-Consistency selects: <span style="background-color: #90ee90; padding: 2px 6px; font-weight: bold;">37.5%</span></strong><br>
This is correct! Predictions 1 and 2 agree on the same answer.<br><br>

<strong>💡 Key Point:</strong> Self-consistency selects the answer with the <span style="background-color: #ffeb3b; padding: 2px 4px;">highest frequency</span> across multiple reasoning paths, rather than the one with highest probability.

</div>
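The voting step in the example above amounts to a frequency count over final answers. The sketch below assumes the final answers have already been extracted from each reasoning path as numeric strings; `self_consistency` is a hypothetical helper, and normalizing through `Fraction` is one design choice so that "3/8" and "0.375" count as the same answer.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Tuple


def self_consistency(answers: List[str]) -> Tuple[Fraction, int]:
    """Normalize each extracted answer and return the most frequent one with its count."""
    counts = Counter(Fraction(a) for a in answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes


# Final answers from the three sampled reasoning paths above.
sampled_answers = ["3/8", "0.375", "1/3"]
winner, votes = self_consistency(sampled_answers)
print(f"{float(winner):.1%} with {votes} votes")
```

Answer normalization is the fragile part in practice: without it, equivalent answers written in different forms would split the vote.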

### Types of Ensembling

Ensembling methods for LLMs can be categorized into three main types, each with distinct approaches to improving prediction quality:

<img src="images/fig3-2_ensembling_methods.png" alt="Figure 3.2: Ensembling methods for LLMs" width="75%" style="display: block; margin: 20px auto;">

*Figure 3.2: Ensembling methods for LLMs*

#### 1. Model Ensembling (a)

**Approach**: Use **multiple LLMs** with different architectures or parameters

**Process**:
- Each LLM receives the **same prompt**
- Each LLM produces its own prediction independently
- Predictions are **combined** to generate the final output

**Benefits**:
- Leverages strengths of different model architectures
- Reduces bias from any single model
- Most diverse but computationally expensive

**Example**: Combine predictions from GPT-4, Claude, and PaLM for the same query

#### 2. Prompt Ensembling (b)

**Approach**: Use **one LLM** with **multiple prompts**

**Process**:
- Generate diverse prompts for the same task
- Feed each prompt to the same LLM
- Combine predictions from different prompts

**Benefits**:
- More cost-effective than model ensembling
- Can be applied with single API access
- Leverages prompt diversity

**Example**: Use different phrasings like "Simplify this text", "Make this easier to read", and "Rewrite for clarity"

#### 3. Output Ensembling (c)

**Approach**: Use **one LLM** to **sample multiple predictions** from the prediction space

**Process**:
- Given a single prompt, sample multiple outputs
- Use temperature or top-k/top-p sampling for diversity
- Combine sampled predictions

**Benefits**:
- Simplest approach with single model and prompt
- Boosts performance of the LLM itself
- Exploits model's stochasticity

**Example**: Sample 5 different responses with temperature=0.8 and select the most frequent answer (or aggregate them in another way)

### Combining Ensembling Methods

**Key Insight**: These methods can be **combined** to increase prediction diversity even further.

**Combined Approaches**:
- **Prompt + Output**: Use multiple prompts and sample multiple outputs for each
- **Model + Prompt**: Use multiple models with different prompts for each
- **Model + Output**: Use multiple models and sample from each
- **All Three**: Maximum diversity but highest computational cost

**Example Strategy**: Apply both prompt ensembling and output ensembling to obtain more diverse predictions that are more likely to include correct answers.
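
A rough sketch of the prompt + output combination follows. It assumes a `generate(prompt, sample_id)` callable that wraps a sampled LLM call; here `toy_generate` is a canned stand-in (not a real model) so the example runs deterministically, and the prompts are illustrative.

```python
from collections import Counter
from itertools import product

def ensemble(prompts, n_samples, generate):
    """Query every prompt n_samples times and majority-vote the outputs."""
    outputs = [generate(p, i) for p, i in product(prompts, range(n_samples))]
    winner, _ = Counter(outputs).most_common(1)[0]
    return winner, outputs

# Stand-in for a sampled LLM call (hypothetical): it occasionally "errs"
def toy_generate(prompt, sample_id):
    return "Lyon" if sample_id == 2 and "capital" in prompt else "Paris"

prompts = [
    "What is the capital of France?",
    "Name the city that serves as France's capital.",
    "France's seat of government is in which city?",
]
winner, outputs = ensemble(prompts, 3, toy_generate)
print(winner, len(outputs))
```

With three prompts and three samples each, nine predictions feed the vote, so a few bad samples do not change the outcome.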

## 3.2.5 RAG and Tool Use

### Introduction to RAG

**Retrieval-Augmented Generation (RAG)** is employed when standard LLMs, relying solely on pre-trained knowledge, lack accuracy and depth. By drawing from **external databases and documents**, RAG can significantly improve response quality, making answers more contextually relevant and factually grounded.

RAG is particularly useful in scenarios requiring:
- High factual accuracy
- Up-to-date information
- Complex question answering

### Key Steps in RAG

1. **Prepare a collection of texts** as an additional knowledge source
2. **Retrieve relevant texts** for a given query
3. **Input retrieved texts and query** into an LLM to produce the final prediction

Steps 1 and 2 can be implemented using an external **information retrieval system** (e.g., vector database with similarity search).

<img src="images/rag_workflow.png" alt="Retrieval-Augmented Generation (RAG) Workflow" width="45%" style="display: block; margin: 20px auto;">

*Retrieval-Augmented Generation (RAG) Workflow*

### Example: Olympics Location Query

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Question:</strong> <span style="background-color: #ffeb3b; padding: 2px 4px;">"Where will the 2028 Olympics be held?"</span><br><br>

<hr style="border: 1px solid #ddd; margin: 12px 0;">

<strong>Step 1: Retrieval</strong><br><br>

Search online, retrieve relevant texts:<br><br>

<span style="background-color: #87ceeb; padding: 2px 4px;">(Wikipedia)</span><br>
The <span style="background-color: #ffeb3b; padding: 2px 4px;">2028 Summer Olympics</span>, officially the Games of the XXXIV Olympiad and commonly known as <strong style="background-color: #90ee90; padding: 1px 3px;">Los Angeles 2028 or LA28</strong>, is an upcoming international multi-sport event scheduled to take place from <span style="background-color: #ffeb3b; padding: 2px 4px;">July 14-30, 2028</span>, in the <strong style="background-color: #90ee90; padding: 1px 3px;">United States</strong>. ...<br><br>

<span style="background-color: #87ceeb; padding: 2px 4px;">(The Sporting News)</span><br>
In 2028, <strong style="background-color: #90ee90; padding: 1px 3px;">Los Angeles</strong> will become the third city, following London and Paris, to host three Olympics after hosting the Summer Games in 1932 and 1984. It will also be the first time the United States has hosted an Olympic Games since the 2002 Winter Games in Salt Lake City. ...<br><br>

<hr style="border: 1px solid #ddd; margin: 12px 0;">

<strong>Step 2: Augmentation & Generation</strong><br><br>

Prompt LLM with <span style="background-color: #ffeb3b; padding: 2px 4px;">retrieved texts</span> + <span style="background-color: #ffeb3b; padding: 2px 4px;">original question</span> to generate informed answer.

</div>

**Example: Basic RAG Prompt**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

Task: Answer the following question based on provided texts.<br><br>

Question: Where will the 2028 Olympics be held?<br><br>

Relevant Text 1:<br>
The 2028 Summer Olympics, officially the Games of the XXXIV Olympiad and commonly known as <strong style="background-color: #90ee90; padding: 1px 3px;">Los Angeles 2028 or LA28</strong>, is an upcoming international multi-sport event scheduled to take place from July 14-30, 2028, <strong style="background-color: #90ee90; padding: 1px 3px;">in the United States</strong>...<br><br>

Relevant Text 2:<br>
In 2028, <strong style="background-color: #90ee90; padding: 1px 3px;">Los Angeles</strong> will become the third city, following London and Paris, to host three Olympics after hosting the Summer Games in 1932 and 1984...<br><br>


Answer: <u>The 2028 Olympics will be held in <strong>Los Angeles</strong>.</u>

</div>
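
The prompt layout above is easy to assemble programmatically. The helper below is a minimal sketch; `build_rag_prompt` is a hypothetical name, and the retrieved texts are shortened placeholders rather than full passages.

```python
def build_rag_prompt(question, texts):
    """Assemble task, question, and numbered retrieved texts into one prompt."""
    lines = [
        "Task: Answer the following question based on provided texts.",
        "",
        f"Question: {question}",
        "",
    ]
    for i, text in enumerate(texts, start=1):
        lines += [f"Relevant Text {i}:", text, ""]
    lines.append("Answer:")  # the LLM continues from here
    return "\n".join(lines)

prompt = build_rag_prompt(
    "Where will the 2028 Olympics be held?",
    ["The 2028 Summer Olympics ... Los Angeles 2028 or LA28 ...",
     "In 2028, Los Angeles will become the third city ..."],
)
print(prompt)
```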

### Robust RAG with Insufficient Information Handling

The information retrieval system may sometimes provide **irrelevant or incorrect texts**. We need to enhance LLM robustness to handle inaccurate inputs.

**Improved Prompt**: Allow the LLM to decline to answer when the provided information is insufficient.

**Example: Robust RAG Prompt (Handling Insufficient Information)**

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

Task: Answer the question if sufficient information is provided. Otherwise, output "No answer!"<br><br>
<span style="background-color: #ffa500; padding: 3px 6px; font-weight: bold;">Please note that your answers need to be as accurate as possible and faithful to the facts. If the information provided is insufficient for an accurate response, you may simply output "No answer!".</span> <br><br>
<span style="background-color: #87ceeb; padding: 2px 4px;">Question:</span> Where will the 2028 Olympics be held?<br><br>

Relevant Text (Incorrect):<br>
The <strong style="background-color: #ffb6c1; padding: 1px 3px;">2024 Summer Olympics</strong>, officially the Games of the XXXIII Olympiad and branded as <strong style="background-color: #ffb6c1; padding: 1px 3px;">Paris 2024</strong>, were an international multi-sport event...<br><br>



<strong>⚠️ Analysis:</strong> The provided text is about the <span style="background-color: #ffb6c1; padding: 1px 3px;">2024 Olympics (Paris)</span>, NOT the 2028 Olympics!<br><br>


<span style="background-color: #ffeb3b; padding: 3px 6px; font-weight: bold;">Answer:</span> <strong style="font-size: 1.1em;">No answer!</strong><br><br>

<span style="background-color: #90ee90; padding: 2px 4px;">💡 Key Point:</span> The LLM correctly refuses to answer because the provided information is insufficient/incorrect.

</div>
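
On the application side, the refusal string can be treated as a signal to try another source. The sketch below is one possible way to do that, not a prescribed design: `answer_with_fallback`, the two toy retrievers, and `toy_generate` are all hypothetical stand-ins for a real retrieval system and LLM call.

```python
REFUSAL = "No answer!"

def answer_with_fallback(question, retrievers, generate):
    """Try each retriever in order until the model returns a real answer."""
    for retrieve in retrievers:
        texts = retrieve(question)
        reply = generate(question, texts).strip()
        if reply != REFUSAL:
            return reply
    return None  # every attempt was refused

# Hypothetical stand-ins so the sketch runs without a search backend or an LLM
def stale_index(question):
    return ["The 2024 Summer Olympics, branded as Paris 2024, were ..."]

def fresh_index(question):
    return ["In 2028, Los Angeles will host the Summer Olympics ..."]

def toy_generate(question, texts):
    return "Los Angeles" if any("2028" in t for t in texts) else REFUSAL

answer = answer_with_fallback(
    "Where will the 2028 Olympics be held?", [stale_index, fresh_index], toy_generate
)
print(answer)
```

Returning `None` when every source is refused lets the caller decide whether to report "unknown" or escalate, instead of passing a guess downstream.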

### RAG vs. Fine-tuning

Both RAG and fine-tuning are common methods for adapting LLMs using task-specific data:

**RAG**:
- Training-free, can be directly applied
- External knowledge can be updated easily
- Lower computational cost
- Requires good retrieval system

**Fine-tuning**:
- Internalizes knowledge into model weights
- Better for consistent task-specific behavior
- Higher computational cost
- Requires labeled training data

RAG can be further improved through fine-tuning components of the retrieval or generation process.

### 3.2.5.1 RAG Implementation: Popular Approaches

Now let's explore the **two most popular RAG approaches** in production systems today, with practical implementation examples.

## Approach 1: Vector Embedding RAG (Most Widely Used)

**Vector Embedding RAG** is the **most popular and widely adopted** RAG approach and the usual starting point for enterprise systems.

### Why Vector RAG is Most Popular

**Reasons**:
- **Simple Implementation**: embedding model + vector database
- **Mature Tooling**: FAISS, Milvus, Pinecone, Chroma, Weaviate
- **Default in Frameworks**: LangChain and LlamaIndex use vector RAG by default
- **Production-Ready**: Battle-tested in thousands of applications

**Best For**: General Q&A, enterprise document search, customer support, standard RAG applications

### How It Works

1. **Document Processing**: Split documents into chunks (e.g., 512 tokens each)
2. **Embedding**: Convert each chunk into dense vectors using embedding models
3. **Vector Storage**: Store embeddings in a vector database (e.g., FAISS)
4. **Query Processing**: Convert user query into a vector
5. **Similarity Search**: Find k most similar document chunks (cosine similarity)
6. **Generation**: Feed retrieved chunks + query to LLM for answer generation
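
Steps 1 to 5 can be traced end to end with a toy example. The sketch below deliberately swaps the neural embedding model for simple word counts so that it runs with the standard library only; `chunk`, `embed`, `cosine`, and `top_k` are illustrative names, and the chunk size counts words rather than tokens.

```python
import math
import re
from collections import Counter

def chunk(text, size):
    """Step 1: split a document into chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Steps 2 and 4: toy embedding (word counts instead of a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, chunks, k):
    """Step 5: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = ("The 2028 Summer Olympics are in Los Angeles. "
          "The 2024 Summer Olympics were held in Paris. "
          "The 2020 Summer Olympics were held in Tokyo.")
chunks = chunk(corpus, 8)
best = top_k("Where are the 2028 Olympics?", chunks, 1)
print(best)
```

The retrieved chunk would then be placed in the prompt together with the query (step 6).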

### Popular Vector Databases

<div style="max-width: 900px; font-size: 0.9em; line-height: 1.5; padding: 15px; border-radius: 5px;">

<table style="width: 100%; border-collapse: collapse; margin: 15px 0;">
<thead style="background-color: #003865; color: white;">
<tr>
<th style="padding: 10px; border: 1px solid #ddd;">Tool</th>
<th style="padding: 10px; border: 1px solid #ddd;">Type</th>
<th style="padding: 10px; border: 1px solid #ddd;">Best For</th>
</tr>
</thead>
<tbody>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>FAISS</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Local library (Facebook AI)</td>
<td style="padding: 8px; border: 1px solid #ddd;">Research, prototyping, local development</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Chroma</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Open-source, embeddable</td>
<td style="padding: 8px; border: 1px solid #ddd;">Lightweight applications, easy integration</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Pinecone</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Cloud-managed service</td>
<td style="padding: 8px; border: 1px solid #ddd;">Production deployments, scalability</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Milvus</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Open-source, distributed</td>
<td style="padding: 8px; border: 1px solid #ddd;">Large-scale enterprise systems</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Weaviate</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Open-source, GraphQL</td>
<td style="padding: 8px; border: 1px solid #ddd;">Hybrid search, complex queries</td>
</tr>
</tbody>
</table>

</div>

Let's implement a simple Vector RAG system using **FAISS** (Facebook AI Similarity Search).

### Vector RAG Demo with FAISS

**Install**: `pip install faiss-cpu sentence-transformers`

**Important**: FAISS and sentence-transformers serve different purposes:

- **sentence-transformers**: Converts text → vector embeddings (text-to-vector model)
- **FAISS**: Stores vectors and performs fast similarity search (vector index library)

**Key Point**: **FAISS does NOT generate vectors** - it's embedding model agnostic!

You can use **any** embedding model with FAISS:
- `sentence-transformers` (we use this)
- `OpenAI embeddings` (text-embedding-3)
- `Cohere embeddings`
- `BGE`, `E5`, custom models, etc.

**Workflow**: 
```
text → [YOUR CHOICE OF EMBEDDING MODEL] → embeddings → FAISS → search results
```

FAISS only requires:
1. All vectors have the **same dimension** (e.g., all 384D)
2. You use the **same embedding model** for documents and queries

In our demo: We use `sentence-transformers` to generate embeddings, then `FAISS` to search them.

In [None]:
!pip install faiss-cpu
!pip install sentence-transformers

In [None]:
# Step 1: Vectorize documents with sentence-transformers
# Install: pip install sentence-transformers

from sentence_transformers import SentenceTransformer

documents = ["2028 Olympics: Los Angeles", "2024 Olympics: Paris", "2020 Olympics: Tokyo"]

# Use sentence-transformers to vectorize
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(documents)

print(f"Documents: {len(documents)}")
print(f"Embeddings: {embeddings.shape}")
print(f"Each document → {embeddings.shape[1]}D vector")

In [None]:
# Step 2: Build FAISS index
# Install: pip install faiss-cpu
import faiss

dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)  # exact L2 search; on unit-normalized embeddings this ranks results the same as cosine similarity
index.add(embeddings)

print(f"FAISS index created: {dimension}D vectors")
print(f"Total documents indexed: {index.ntotal}")

In [None]:
# Step 3: Search with FAISS
query = "Where is 2028 Olympics?"
query_vec = model.encode([query])

# Search for top 1 similar document
k = 1
distances, indices = index.search(query_vec, k)

result = documents[indices[0][0]]
print(f"Query: {query}")
print(f"Distance: {distances[0][0]:.4f}")
print(f"Retrieved: {result}")
print("\n✓ FAISS search complete!")

## Approach 2: Graph RAG (Rapidly Rising)

**Graph RAG** is **rapidly gaining popularity**, especially for complex reasoning tasks. Major tech companies are investing in it; Microsoft, for example, has open-sourced its GraphRAG project.

### How Graph RAG Works

Instead of flat vector retrieval, Graph RAG builds a **knowledge graph** from documents where:

**Nodes** represent:
- Document paragraphs
- Entities (people, organizations, locations, concepts)
- Sub-topics and themes
- Key concepts

**Edges** represent:
- Semantic relationships between entities
- References and citations
- Causal connections
- Temporal relationships

### Why Graph RAG is Gaining Momentum

**Advantages**:
1. **Deep Reasoning**: Supports complex multi-hop reasoning across documents
2. **Structured Knowledge**: Maintains relationships and context
3. **Explainability**: Can trace reasoning paths through the graph
4. **Complex Domains**: Excels in domains with rich interconnections

**Industry Adoption**:
- **Microsoft**: GraphRAG for enterprise knowledge bases
- **OpenAI**: Internal knowledge graph systems
- **NVIDIA**: NeMo Guardrails with graph-based retrieval
- **Academic Research**: Growing body of research papers

### Best Use Cases

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Ideal Applications:</strong><br>
• <strong>Scientific Literature</strong>: Research papers with complex citations and dependencies<br>
• <strong>Legal Documents</strong>: Laws, regulations, case references<br>
• <strong>Medical Knowledge</strong>: Disease relationships, treatment protocols, drug interactions<br>
• <strong>Engineering Documentation</strong>: System dependencies, component relationships<br>
• <strong>Cross-Document Reasoning</strong>: Questions requiring information from multiple sources<br><br>

<strong>Example Query:</strong><br>
<span style="background-color: #ffeb3b; padding: 2px 4px;">"What treatments are effective for patients with both diabetes and hypertension, considering potential drug interactions?"</span><br><br>

This requires:<br>
1. Finding diabetes treatments<br>
2. Finding hypertension treatments<br>
3. Identifying drug interactions between them<br>
4. Filtering safe combinations<br><br>

Graph RAG excels at this by traversing: <span style="background-color: #90ee90; padding: 2px 4px;">Diabetes → Treatments → Drug Interactions ← Treatments ← Hypertension</span>

</div>
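
The four-step traversal can be illustrated on a tiny hand-built graph. Everything in this sketch is made up for illustration: the drug names are placeholders with no medical meaning, and a real Graph RAG system would extract the nodes and edges from documents rather than hard-code them.

```python
from itertools import product

# "treats" edges: condition -> treatments (placeholder drug names)
TREATS = {
    "Diabetes": ["DrugA", "DrugB"],
    "Hypertension": ["DrugC", "DrugD"],
}
# "interacts with" edges between treatments (undirected, so use frozenset)
INTERACTS = {frozenset(("DrugA", "DrugC"))}

def safe_combinations(cond_1, cond_2):
    """Follow treats-edges from both conditions, then drop interacting pairs."""
    pairs = product(TREATS[cond_1], TREATS[cond_2])
    return [(a, b) for a, b in pairs if frozenset((a, b)) not in INTERACTS]

combos = safe_combinations("Diabetes", "Hypertension")
print(combos)
```

Each returned pair corresponds to a path through the graph, which is what makes the retrieval result explainable.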

### Vector RAG vs. Graph RAG: Comparison

<div style="max-width: 900px; font-size: 0.9em; line-height: 1.5; padding: 15px; border-radius: 5px;">

<table style="width: 100%; border-collapse: collapse; margin: 15px 0;">
<thead style="background-color: #003865; color: white;">
<tr>
<th style="padding: 10px; border: 1px solid #ddd;">Aspect</th>
<th style="padding: 10px; border: 1px solid #ddd;">Vector RAG</th>
<th style="padding: 10px; border: 1px solid #ddd;">Graph RAG</th>
</tr>
</thead>
<tbody>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Popularity</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Most widely used</span></td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #ffeb3b; padding: 2px 4px;">Rapidly rising</span></td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Implementation</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Simple (embedding + vector DB)</td>
<td style="padding: 8px; border: 1px solid #ddd;">Complex (entity extraction + graph construction)</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Setup Time</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Hours to days</td>
<td style="padding: 8px; border: 1px solid #ddd;">Days to weeks</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Reasoning</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Single-hop, direct similarity</td>
<td style="padding: 8px; border: 1px solid #ddd;">Multi-hop, relational reasoning</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Best For</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">General Q&A, document search</td>
<td style="padding: 8px; border: 1px solid #ddd;">Complex domains, relational queries</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Scalability</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Excellent (billions of vectors)</td>
<td style="padding: 8px; border: 1px solid #ddd;">Good (millions of nodes)</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Explainability</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Limited (similarity scores)</td>
<td style="padding: 8px; border: 1px solid #ddd;">High (reasoning paths visible)</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Tools</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">FAISS, Pinecone, Chroma, Milvus</td>
<td style="padding: 8px; border: 1px solid #ddd;">Neo4j, Microsoft GraphRAG, LlamaIndex</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Maintenance</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Low (just update embeddings)</td>
<td style="padding: 8px; border: 1px solid #ddd;">Medium (maintain graph structure)</td>
</tr>
</tbody>
</table>

</div>

### Recommendation

**Start with Vector RAG** for most applications:
- Faster to implement
- Mature tooling
- Sufficient for most common use cases

**Consider Graph RAG** when you need:
- Complex multi-hop reasoning
- Explainable retrieval paths
- Rich entity relationships
- Domain-specific knowledge structures

Many production systems use **hybrid approaches**: Vector RAG for initial retrieval + Graph RAG for complex reasoning.

### Tool Use in LLMs

**Tool use** integrates external tools into LLMs to access accurate data not available during training or fine-tuning.

**Examples**:
- **APIs**: Fetch real-time data (weather, stock prices, news)
- **Calculators**: Perform accurate mathematical computations
- **Code Executors**: Run code and return results
- **Search Engines**: Retrieve up-to-date information

### Problem Decomposition for Tool Use

Tool use requires decomposing problems into sub-problems:
- **Some handled by LLMs**: Natural language understanding, reasoning
- **Some handled by external tools**: Computation, data retrieval, execution

LLM predictions might include **markers** indicating where and how to call external APIs.

This is a natural application of the problem decomposition framework discussed in Section 3.2.2.

### Example 1: Web Search Tool

Consider answering "Where will the 2028 Olympics be held?" with web search capability:

<div style="max-width: 880px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

Task: Answer the following question. You may use external tools, such as web search, to assist you.<br><br>

Question: Where will the 2028 Olympics be held?<br><br>

<span style="background-color: #90ee90; padding: 2px 4px;">LLM Output:</span><br>
<u>The information regarding this question is given as follows:<br>
<strong style="background-color: #ffa500; padding: 2px 6px;">{tool: web-search, query: "2028 Olympics"}</strong><br>
So the answer is: <strong>Los Angeles</strong></u><br><br>


<strong>💡 How it works:</strong><br>

<strong>1. Tool Marker:</strong> The string <span style="background-color: #ffa500; padding: 2px 4px;">{tool: web-search, query: "2028 Olympics"}</span> indicates a request to the web search system<br>

<strong>2. Execution:</strong> When the LLM generates this marker, the system executes a web search using query "2028 Olympics"<br>

<strong>3. Context Replacement:</strong> The web search result replaces the marker string<br>

<strong>4. Continued Generation:</strong> The LLM uses this search result as context to produce the correct answer "Los Angeles"

</div>
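
The marker-replacement loop might look like the sketch below. It assumes the marker syntax shown in this example; the regular expression, the `TOOLS` registry, and `fake_web_search` (which returns a canned sentence instead of calling a search engine) are hypothetical.

```python
import re

# Matches markers of the form {tool: web-search, query: "..."}
MARKER = re.compile(r'\{tool: (?P<tool>[\w-]+), query: "(?P<query>[^"]*)"\}')

def fake_web_search(query):
    """Stand-in for a real search backend (canned result)."""
    return "The 2028 Summer Olympics will be held in Los Angeles."

TOOLS = {"web-search": fake_web_search}

def expand_markers(text):
    """Replace each tool marker with the output of the named tool."""
    return MARKER.sub(lambda m: TOOLS[m["tool"]](m["query"]), text)

draft = ('The information regarding this question is given as follows:\n'
         '{tool: web-search, query: "2028 Olympics"}')
context = expand_markers(draft)
print(context)
```

After expansion, `context` is handed back to the LLM so that generation continues from the search result.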

### Example 2: Calculator Tool for Mathematical Problem-Solving

Consider a mathematical problem requiring precise calculations:

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<span style="background-color: #ffeb3b; padding: 3px 6px; font-weight: bold;">Problem:</span><br>
A swimming pool needs to be filled with water. The pool measures <strong>10 meters</strong> in length, <strong>4 meters</strong> in width, and <strong>2 meters</strong> in depth. Calculate the volume of the pool in cubic meters and then determine how many liters of water are needed to fill it (considering 1 cubic meter equals 1000 liters).<br><br>


<span style="background-color: #90ee90; padding: 2px 4px;">LLM Solution:</span><br>

To solve this problem, I need to first calculate the volume of the pool using the formula for a rectangular prism: <span style="background-color: #87ceeb; padding: 2px 4px;">Length × Width × Depth</span>.<br><br>

Therefore, the volume is:<br>
10 m × 4 m × 2 m = <strong style="background-color: #ffa500; padding: 2px 6px;">{tool: calculator, expression: 10 * 4 * 2}</strong> m³<br>
<em>→ Calculator returns: <strong style="background-color: #90ee90; padding: 2px 6px;">80</strong></em><br><br>

Next, to find how many liters of water are needed, I multiply the volume in cubic meters by 1000 (since 1 cubic meter equals 1000 liters).<br><br>

Therefore:<br>
80 × 1000 = <strong style="background-color: #ffa500; padding: 2px 6px;">{tool: calculator, expression: 80 * 1000}</strong> liters<br>
<em>→ Calculator returns: <strong style="background-color: #90ee90; padding: 2px 6px;">80000</strong></em><br><br>

<strong style="font-size: 1.1em;">Final Answer:</strong> The pool has a volume of <strong>80 m³</strong> and requires <strong>80,000 liters</strong> of water to fill it.<br><br>

<hr style="border: 1px solid #ddd; margin: 12px 0;">

<strong>💡 Key Points:</strong><br><br>

<strong>1. Tool Markers During Inference:</strong> The string <span style="background-color: #ffa500; padding: 2px 4px;">{tool: calculator, expression: 10 * 4 * 2}</span> triggers the calculator during token prediction<br>

<strong>2. Result Replacement:</strong> The calculation result (80) replaces the tool marker and becomes part of the context<br>

<strong>3. Sequential Tool Use:</strong> The LLM can use the result from the first calculation (80) in subsequent reasoning steps<br>

<strong>4. Accuracy:</strong> External computational tools ensure mathematical accuracy that LLMs might struggle with

</div>
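
A calculator tool should not simply pass model output to `eval()`. The sketch below shows one cautious alternative, assuming only the four basic operators are needed: it walks Python's own syntax tree and rejects anything else. The function names and the marker regex are illustrative.

```python
import ast
import operator
import re

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculate(expression):
    """Evaluate +, -, *, / arithmetic on numbers without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression.strip(), mode="eval").body)

CALC = re.compile(r"\{tool: calculator, expression: ([^}]*)\}")

def run_calculator_markers(text):
    """Replace every calculator marker with its computed value."""
    return CALC.sub(lambda m: str(calculate(m.group(1))), text)

step = "10 m x 4 m x 2 m = {tool: calculator, expression: 10 * 4 * 2} m^3"
print(run_calculator_markers(step))
```

Anything outside plain arithmetic, such as a function call, raises `ValueError` instead of being executed.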

### RAG vs. Tool Use: Key Differences

While both RAG and tool use integrate external resources, they differ in **when** and **how** the integration happens:

<div style="max-width: 900px; font-size: 0.9em; line-height: 1.5; padding: 15px; border-radius: 5px;">

<table style="width: 100%; border-collapse: collapse; margin: 15px 0;">
<thead style="background-color: #003865; color: white;">
<tr>
<th style="padding: 10px; border: 1px solid #ddd;">Aspect</th>
<th style="padding: 10px; border: 1px solid #ddd;">RAG (Retrieval-Augmented Generation)</th>
<th style="padding: 10px; border: 1px solid #ddd;">Tool Use</th>
</tr>
</thead>
<tbody>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Timing</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #87ceeb; padding: 2px 4px;">Before prediction</span><br>Retrieved texts provided before generation begins</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #ffa500; padding: 2px 4px;">During inference</span><br>External functions called while generating tokens</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Integration</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Static context augmentation</td>
<td style="padding: 8px; border: 1px solid #ddd;">Dynamic function execution with markers</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Purpose</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Access external knowledge bases, documents</td>
<td style="padding: 8px; border: 1px solid #ddd;">Execute computations, API calls, real-time data</td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Examples</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Wikipedia search, document retrieval</td>
<td style="padding: 8px; border: 1px solid #ddd;">Calculator, web search, code execution</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Training</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Can work with off-the-shelf LLMs</td>
<td style="padding: 8px; border: 1px solid #ddd;">Requires fine-tuning to generate tool markers</td>
</tr>
</tbody>
</table>

<hr style="border: 1px solid #ddd; margin: 12px 0;">

<strong>🔍 Unified Perspective:</strong><br><br>

From the language modeling perspective, <strong>both approaches do the same thing</strong>: before generating the final result, they use external tools (either manually or automatically) to obtain <span style="background-color: #90ee90; padding: 2px 4px;">sufficient and relevant context</span>.<br><br>

The high-level interpretation: Both rely on an <strong>"agent"</strong> that determines <em>where</em> and <em>how</em> to call external functions to generate the necessary context for accurate prediction.

</div>

### Fine-tuning for Tool Use

Base LLMs are not trained to generate tool-use markers, so they are typically taught to do so in three steps:

1. **Data Annotation**: Replace parts of outputs requiring tools with predefined commands/markers
2. **Fine-tuning**: Train the LLM on annotated data to generate tool commands
3. **Inference**: Execute tool commands in model outputs to get external assistance

This chapter focuses on prompting, so fine-tuning details are beyond scope. The key insight is that tool use enables LLMs to function as **autonomous agents** rather than mere text generators.
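
To make the data annotation step concrete, the sketch below rewrites a worked calculation in a training example so that the computed result is replaced by a calculator marker. It is only an illustration under simplifying assumptions: the regular expression handles integer arithmetic written with spaces around operators, and `annotate` is a hypothetical helper.

```python
import re

# Matches e.g. "10 * 4 * 2 = 80": an integer expression followed by its result
EQUATION = re.compile(r"(\d+(?: [-+*/] \d+)+) = \d+")

def annotate(text):
    """Replace each computed result with a calculator marker for that expression."""
    def to_marker(match):
        expr = match.group(1)
        return f"{expr} = {{tool: calculator, expression: {expr}}}"
    return EQUATION.sub(to_marker, text)

example = "The volume is 10 * 4 * 2 = 80 cubic meters."
annotated = annotate(example)
print(annotated)
```

Fine-tuning on text annotated this way teaches the model to emit the marker at the point where the tool result is needed.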

### 3.2.5.2 Tool Use with MCP (Model Context Protocol)

**Model Context Protocol (MCP)** is an open standard protocol introduced by Anthropic for connecting LLMs with external tools and data sources in a standardized way.

### What is MCP?

MCP provides a **universal protocol** for LLM-tool integration, solving the problem of fragmented tool ecosystems.

**Key Concepts**:
- **Standardized Interface**: All tools expose capabilities through a unified protocol
- **Client-Server Architecture**: LLM applications act as clients, tools are exposed by servers
- **Type Safety**: Strongly-typed tool definitions and parameters
- **Bidirectional Communication**: Tools can request information from LLMs

### Why MCP Matters

**Before MCP**: Each tool required custom integration code
```
LLM → Custom Code for Tool A
LLM → Different Custom Code for Tool B
LLM → Yet Another Integration for Tool C
```

**With MCP**: One standardized protocol for all tools
```
LLM → MCP Protocol → Tool A (MCP Server)
                   → Tool B (MCP Server)
                   → Tool C (MCP Server)
```

### Advantages of MCP

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Benefits:</strong><br>
• <strong>Standardization</strong>: Single protocol for all tools<br>
• <strong>Composability</strong>: Easily combine multiple tools<br>
• <strong>Discoverability</strong>: Tools can advertise their capabilities<br>
• <strong>Security</strong>: Authorization and user-consent controls (sandboxing is left to the host application)<br>
• <strong>Extensibility</strong>: Easy to add new tools without changing LLM code<br><br>

<strong>Popular MCP Use Cases:</strong><br>
• File system access (read/write files)<br>
• Database queries (SQL, vector databases)<br>
• Web browsing and scraping<br>
• API calls (REST, GraphQL)<br>
• Code execution (Python, JavaScript)<br>
• System commands (git, docker, etc.)

</div>

### MCP Tool Use Example: Brave Search (Real MCP Server)

Let's see how MCP works with **Brave Search**, a real MCP server for web search.

**Scenario**: User asks "What are the latest advances in AI?"

#### Step 1: Tool Registration (Real MCP Schema)

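The Brave Search MCP server advertises a tool definition along these lines (simplified for illustration; MCP tool definitions name the argument schema field `inputSchema`):

```json
{
  "name": "brave_search",
  "description": "Perform web search using Brave Search API",
  "inputSchema": {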
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "count": {
        "type": "integer",
        "description": "Number of results",
        "default": 5
      }
    },
    "required": ["query"]
  }
}
```
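
A schema like this lets the client check a tool call before sending it. The sketch below is a deliberately small validator, assuming only the `string` and `integer` types used here; a production client would rely on a full JSON Schema library. `prepare_arguments` and `TYPES` are hypothetical names.

```python
TYPES = {"string": str, "integer": int}

# The argument schema from the tool definition (descriptions omitted)
SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "count": {"type": "integer", "default": 5},
    },
    "required": ["query"],
}

def prepare_arguments(args, schema):
    """Check required fields and types, then fill in declared defaults."""
    props = schema["properties"]
    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for name, spec in props.items():
        if name in args and not isinstance(args[name], TYPES[spec["type"]]):
            raise TypeError(f"{name} must be of type {spec['type']}")
    defaults = {n: s["default"] for n, s in props.items() if "default" in s}
    return {**defaults, **args}

prepared = prepare_arguments({"query": "latest advances in AI 2025"}, SCHEMA)
print(prepared)
```

Because `count` is optional with a default, the caller can omit it and still send a complete argument set.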

#### Step 2: LLM Recognizes Need for Tool

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>User Query:</strong> <span style="background-color: #ffeb3b; padding: 2px 4px;">"What are the latest advances in AI?"</span><br><br>

<strong>LLM Analysis:</strong><br>
"This requires current information from the web. I should use <span style="background-color: #90ee90; padding: 2px 4px;">brave_search</span> tool."<br><br>

<strong>LLM → MCP Tool Call:</strong><br>
<pre style="background-color: #f5f5f5; padding: 10px; border-radius: 5px;">
{
  "tool": "brave_search",
  "parameters": {
    "query": "latest advances in AI 2025",
    "count": 3
  }
}
</pre>

</div>

#### Step 3: MCP Server Executes and Returns Results

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<strong>Brave Search MCP Server Response:</strong><br>
<pre style="background-color: #f5f5f5; padding: 10px; border-radius: 5px;">
{
  "results": [
    {
      "title": "GPT-5 and Multimodal AI Advances",
      "url": "https://example.com/ai-2025",
      "snippet": "Major breakthroughs in multimodal AI..."
    },
    {
      "title": "Quantum AI Computing Progress",
      "url": "https://example.com/quantum-ai",
      "snippet": "New quantum algorithms for AI..."
    },
    {
      "title": "AI Safety Research Updates",
      "url": "https://example.com/ai-safety",
      "snippet": "Latest developments in AI alignment..."
    }
  ]
}
</pre>

</div>
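
The messages in Steps 2 and 3 are shown in simplified form. On the wire, MCP exchanges JSON-RPC 2.0 messages; the sketch below shows roughly what a `tools/call` request and a text result look like, based on my reading of the MCP specification. The helper names and the sample reply are hypothetical, and field details should be checked against the current specification.

```python
import json

def make_tool_call(request_id, name, arguments):
    """Build a JSON-RPC 2.0 request asking a server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def extract_text(response):
    """Join the text items of a tool result, raising if the tool failed."""
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(c["text"] for c in result["content"] if c["type"] == "text")

request = make_tool_call(
    1, "brave_search", {"query": "latest advances in AI 2025", "count": 3}
)
print(json.dumps(request, indent=2))

# Hypothetical server reply, shaped like an MCP tool result
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Major breakthroughs in multimodal AI..."}],
        "isError": False,
    },
}
print(extract_text(response))
```

The extracted text is what gets appended to the LLM's context before it writes the final answer.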

#### Step 4: LLM Generates Natural Language Response

<div style="max-width: 850px; font-family: monospace; font-size: 0.9em; line-height: 1.6; padding: 15px; border-left: 4px solid #003865; border-radius: 5px;">

<span style="background-color: #90ee90; padding: 2px 4px;">LLM Final Answer:</span><br><br>

"Based on recent web search results, here are the latest advances in AI:<br><br>

1. <strong>Multimodal AI</strong>: GPT-5 and similar models show significant improvements in combining text, images, and audio processing.<br><br>

2. <strong>Quantum AI</strong>: New quantum algorithms are accelerating AI training and inference.<br><br>

3. <strong>AI Safety</strong>: Major progress in AI alignment and safety research to ensure responsible AI development."

</div>

**Key Advantage**: MCP gives LLMs access to **real-time web information** through a standardized protocol, so the same client code can call any compliant tool server.

### MCP vs. Traditional Tool Use

<div style="max-width: 900px; font-size: 0.9em; line-height: 1.5; padding: 15px; border-radius: 5px;">

<table style="width: 100%; border-collapse: collapse; margin: 15px 0;">
<thead style="background-color: #003865; color: white;">
<tr>
<th style="padding: 10px; border: 1px solid #ddd;">Aspect</th>
<th style="padding: 10px; border: 1px solid #ddd;">Traditional Tool Use</th>
<th style="padding: 10px; border: 1px solid #ddd;">MCP Tool Use</th>
</tr>
</thead>
<tbody>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Integration</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Custom code for each tool</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Standardized protocol</span></td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Discovery</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Manual documentation</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Automatic capability discovery</span></td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Type Safety</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">String-based, error-prone</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Strongly typed schemas</span></td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Maintenance</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">High (update each integration)</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Lower (one shared protocol across tools)</span></td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Composability</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Difficult to chain tools</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Easy tool chaining</span></td>
</tr>
<tr>
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Security</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Ad-hoc per tool</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #90ee90; padding: 2px 4px;">Built-in authorization</span></td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 8px; border: 1px solid #ddd;"><strong>Adoption</strong></td>
<td style="padding: 8px; border: 1px solid #ddd;">Varies by framework</td>
<td style="padding: 8px; border: 1px solid #ddd;"><span style="background-color: #ffeb3b; padding: 2px 4px;">Growing (Anthropic, community)</span></td>
</tr>
</tbody>
</table>

</div>
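
The "Type Safety" row can be made concrete with a small check of tool arguments against a declared input schema. This is a simplified sketch: `SEARCH_SCHEMA` is an assumed schema written for illustration, and `validate_arguments` is a hypothetical helper that covers only required fields and two basic types. A real client would typically rely on a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Assumed input schema for illustration (JSON Schema style "object" definition)
SEARCH_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "count": {"type": "integer"},
    },
    "required": ["query"],
}

# Mapping from the schema type names used above to Python types
PYTHON_TYPES = {"string": str, "integer": int}

def validate_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors (empty if the arguments are valid)."""
    errors = []
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown argument: {name}")
        elif not isinstance(value, PYTHON_TYPES[spec["type"]]):
            errors.append(f"{name} should be of type {spec['type']}")
    return errors

good = validate_arguments({"query": "latest advances in AI 2025", "count": 3}, SEARCH_SCHEMA)
bad = validate_arguments({"count": "three"}, SEARCH_SCHEMA)
print(good)  # []
print(bad)   # ['missing required argument: query', 'count should be of type integer']
```

Rejecting a malformed call before it reaches the tool is what turns "string-based, error-prone" integration into something checkable.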

### MCP Resources

**Official Documentation**: [modelcontextprotocol.io](https://modelcontextprotocol.io)

**Common MCP Servers**:
- **File System**: Read/write local files
- **GitHub**: Repository operations
- **PostgreSQL**: Database queries  
- **Brave Search**: Web search
- **Playwright**: Browser automation

**Key Takeaway**: MCP provides a **standardized, composable protocol for LLM-tool integration**: tools describe themselves through schemas, and any compliant client can discover and call them without custom integration code.

## Summary

In this lecture, we covered advanced prompting techniques that extend and enhance basic Chain of Thought methods:

### 3.2.2 Problem Decomposition
- **Framework**: Sub-problem generation + sub-problem solving
- **Least-to-Most Prompting**: Progressive sub-problem sequences leading to solutions
- **Dynamic Generation**: Adaptively create sub-problems based on intermediate results
- **Applications**: Multi-hop QA, compositional reasoning, tool use

### 3.2.3 Self-refinement
- **Process**: Prediction → Feedback Collection → Refinement → Iterate
- **Benefits**: Error recovery, quality improvement, iterative enhancement
- **Methods**: Direct feedback, DTG (deliberate-then-generate), negative evidence
- **Challenges**: Computational cost, evaluation difficulty, convergence
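
As a reminder of the control flow, the prediction, feedback, and refinement cycle can be sketched as a short loop. The three `toy_*` functions are hypothetical stand-ins for LLM calls, used only so the example runs without an API key. `max_iterations` bounds the computational cost noted above.

```python
from typing import Callable, Optional

def self_refine(
    task: str,
    generate: Callable[[str], str],
    get_feedback: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    max_iterations: int = 3,
) -> str:
    """Generate an output, then refine it until feedback is empty or the budget runs out."""
    output = generate(task)
    for _ in range(max_iterations):
        feedback = get_feedback(task, output)
        if feedback is None:  # no remaining issues: stop early
            break
        output = refine(task, output, feedback)
    return output

# Toy stand-ins for LLM calls (assumptions for illustration only)
def toy_generate(task: str) -> str:
    return "paris is the capital of France"

def toy_feedback(task: str, output: str) -> Optional[str]:
    if not output[0].isupper():
        return "Capitalize the first letter."
    if not output.endswith("."):
        return "End with a period."
    return None

def toy_refine(task: str, output: str, feedback: str) -> str:
    if "Capitalize" in feedback:
        return output[0].upper() + output[1:]
    return output + "."

result = self_refine("Name the capital of France.", toy_generate, toy_feedback, toy_refine)
print(result)  # Paris is the capital of France.
```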

### 3.2.4 Ensembling
- **Types**: Model ensembling, prompt ensembling, output ensembling
- **Self-Consistency**: Select answer with highest frequency across reasoning paths
- **Diversity**: Key to effective ensembling through varied prompts/outputs
- **Bayesian View**: Marginalize over prompt uncertainty
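
Self-consistency reduces to a majority vote over the final answers extracted from several sampled reasoning paths. Here is a minimal sketch, assuming the answers have already been extracted as strings. The sample values are made up for illustration.

```python
from collections import Counter
from typing import List, Tuple

def self_consistency(answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent answer and the fraction of paths that agree with it."""
    counts = Counter(a.strip() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Final answers from five hypothetical reasoning paths for the same question
sampled_answers = ["18", "18", "26", "18", "20"]
answer, agreement = self_consistency(sampled_answers)
print(answer, agreement)  # 18 0.6
```

The agreement ratio is a rough confidence signal: low agreement suggests the question deserves more samples or a different prompt.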

### 3.2.5 RAG and Tool Use
- **RAG Steps**: Prepare knowledge → Retrieve relevant texts → Generate with context
- **Robustness**: Handle insufficient/irrelevant information gracefully
- **Tool Use**: Integrate external APIs and computational systems
- **Problem Decomposition**: Separate LLM tasks from tool tasks
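
The three RAG steps can be sketched end to end with a toy retriever. The example below scores documents by word overlap with the query, which is an assumption chosen for simplicity. Real systems usually use embedding similarity or BM25. The document texts and the prompt wording are also illustrative.

```python
import re
from typing import List, Set

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

# Step 1: prepare knowledge (a tiny in-memory collection)
documents = [
    "Self-consistency selects the most frequent answer across sampled reasoning paths.",
    "RAG retrieves relevant texts and adds them to the prompt as context.",
    "Least-to-most prompting solves sub-problems in order of difficulty.",
]

# Step 2: retrieve relevant texts
query = "How does RAG use retrieved texts in the prompt?"
top = retrieve(query, documents, k=1)

# Step 3: generate with context (the prompt would be sent to an LLM)
prompt = (
    "Use the context to answer. If the context is insufficient, say so.\n\n"
    "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"
)
print(prompt)
```

The instruction to admit insufficient context is one simple way to address the robustness point above.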

### Key Takeaways

1. **Decomposition** makes complex problems tractable by breaking them into simpler steps
2. **Iteration** enables improvement through refinement and feedback loops
3. **Combination** leverages diversity to improve prediction quality
4. **External Knowledge** compensates for LLM limitations through retrieval and tools

Each technique builds on basic Chain of Thought prompting, and they combine naturally: a decomposed problem can call tools for individual sub-problems, refine intermediate answers, and vote across several sampled solutions.

## Exercise: Implementing a Simple "Problem Decomposition + Tool Use" Process

**Background**: You are building an intelligent Q&A system where users input questions requiring multi-step reasoning. You need to use problem decomposition to break down complex questions into sub-problems and call external tools (such as calculators) to assist with calculations.

**Task**: Complete the Python code below so that it does the following:

1. Define a function `decompose_question(question)` that breaks the input question into two sub-questions.

2. Define a function `use_tool(sub_question)` that simulates calling an external tool (such as a calculator) to answer a sub-question, and prints the calculation it performs as a `Tool call:` line.

3. In the `main` function, combine the answers to the sub-questions to form the final answer.

**Expected Output**:

Sub-question 1: What is the volume of the swimming pool in cubic meters?<br>

Tool call: Calculate 10 * 4 * 2<br>

Sub-question 1 answer: 80 cubic meters<br><br>

Sub-question 2: How many liters of water are needed to fill it?<br>

Tool call: Calculate 80 * 1000<br>

Sub-question 2 answer: 80000 liters<br><br>

Final answer: The swimming pool has a volume of 80 cubic meters and requires 80000 liters of water to fill.

In [None]:
def decompose_question(question):
    # Return a list of two sub-questions
    # Hint: Observe the example question, it can be decomposed into two sub-questions:
    # 1. Calculate volume (cubic meters)
    # 2. Calculate water amount (liters)
    sub_questions = []
    # Your code here
    return sub_questions

def use_tool(sub_question):
    # Simulate tool calls, such as a calculator
    # Hint: Determine what calculation to perform based on keywords in the sub-question
    # If the sub-question contains "volume", calculate 10*4*2
    # If the sub-question contains "liters", calculate 80*1000
    # Print the calculation as "Tool call: Calculate <expression>" before returning
    answer = None  # Replace with the computed answer, including its unit
    # Your code here
    return answer

def main(question):
    sub_questions = decompose_question(question)
    answers = []
    for i, sub_q in enumerate(sub_questions, 1):
        print(f"Sub-question {i}: {sub_q}")
        answer = use_tool(sub_q)
        print(f"Sub-question {i} answer: {answer}")
        answers.append(answer)
    # Combine final answer
    final_answer = f"The swimming pool has a volume of {answers[0]} and requires {answers[1]} of water to fill."
    print("Final answer:", final_answer)

# Test
question = "A swimming pool is 10 meters long, 4 meters wide, and 2 meters deep. What is its volume in cubic meters? How many liters of water are needed to fill it?"
main(question)

In [8]:
def decompose_question(question):
    # Return a list of two sub-questions
    sub_questions = [
        "What is the volume of the swimming pool in cubic meters?",
        "How many liters of water are needed to fill it?"
    ]
    return sub_questions

def use_tool(sub_question):
    # Simulate a calculator tool: choose the calculation from keywords in the sub-question
    if "volume" in sub_question:
        # Volume = length × width × depth
        expression, result, unit = "10 * 4 * 2", 10 * 4 * 2, "cubic meters"
    elif "liters" in sub_question:
        # 1 cubic meter = 1000 liters; 80 is the volume from sub-question 1
        expression, result, unit = "80 * 1000", 80 * 1000, "liters"
    else:
        return "Cannot answer this sub-question"
    # Show the calculation that was sent to the tool
    print(f"Tool call: Calculate {expression}")
    return f"{result} {unit}"

def main(question):
    sub_questions = decompose_question(question)
    answers = []
    for i, sub_q in enumerate(sub_questions, 1):
        print(f"Sub-question {i}: {sub_q}")
        answer = use_tool(sub_q)  # prints the "Tool call: ..." line
        print(f"Sub-question {i} answer: {answer}")
        answers.append(answer)
        print()

    # Combine final answer
    final_answer = f"The swimming pool has a volume of {answers[0]} and requires {answers[1]} of water to fill."
    print("Final answer:", final_answer)

# Test
question = "A swimming pool is 10 meters long, 4 meters wide, and 2 meters deep. What is its volume in cubic meters? How many liters of water are needed to fill it?"
main(question)

Sub-question 1: What is the volume of the swimming pool in cubic meters?
Tool call: Calculate 10 * 4 * 2
Sub-question 1 answer: 80 cubic meters

Sub-question 2: How many liters of water are needed to fill it?
Tool call: Calculate 80 * 1000
Sub-question 2 answer: 80000 liters

Final answer: The swimming pool has a volume of 80 cubic meters and requires 80000 liters of water to fill.
