Building LLM Applications: From RAG to Agents

Overview

Large Language Models (LLMs) are reshaping how we interact with computers. From intelligent assistants to code generation, from document analysis to creative writing, the range of LLM applications keeps expanding. This tutorial takes you through the core techniques of LLM application development; through five hands-on case studies you will build the full skill set needed for production-grade LLM applications.

You will learn:

  • LLM fundamentals and API usage
  • Prompt engineering techniques
  • Building retrieval-augmented generation (RAG) systems
  • Vector databases and embeddings
  • LLM agent architecture and implementation
  • Function calling and tool integration
  • Fine-tuning basics
  • Deployment and optimization

Chapter 1: LLM Fundamentals and API Usage

1.1 What Is a Large Language Model?

Large language models are deep learning models based on the Transformer architecture. Pre-trained on massive amounts of text, they acquire strong language understanding and generation capabilities.

Key characteristics:

  • Autoregressive generation: predicts the next token one step at a time
  • In-context learning: learns a task from examples in the prompt
  • Emergent abilities: new capabilities appear as model scale grows
  • Generality: one model handles many different tasks
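To make "autoregressive generation" concrete, here is a minimal sketch that decodes greedily from a toy bigram table. The `BIGRAM` table and its tokens are invented for illustration; a real LLM predicts a distribution over tens of thousands of tokens with a neural network, but the loop structure is the same.

```python
# Hypothetical toy "model": next-token probabilities keyed on the previous token.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4},
    "a": {"dog": 0.7, "cat": 0.3},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def generate(max_tokens=10):
    """Greedy autoregressive decoding: repeatedly pick the most likely next token."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        probs = BIGRAM[tokens[-1]]              # conditional distribution P(next | last token)
        next_token = max(probs, key=probs.get)  # greedy: take the argmax
        if next_token == "</s>":                # stop at the end-of-sequence token
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])

print(generate())  # → "the cat sat"
```

Sampling from `probs` instead of taking the argmax is what the `temperature` parameter in the API calls below controls.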

Popular model families:

  • GPT series (OpenAI)
  • Claude series (Anthropic)
  • LLaMA series (Meta)
  • Qwen series (Alibaba)
  • Gemini series (Google)

1.2 Using the OpenAI API

python
# Install dependencies first:
# pip install openai python-dotenv

import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize the client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

# Basic chat completion
def chat_completion(messages, model='gpt-3.5-turbo', temperature=0.7):
    """Send a chat completion request."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=1024
    )
    return response.choices[0].message.content

# Example
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please explain what machine learning is."}
]

response = chat_completion(messages)
print(response)

1.3 Streaming Responses

python
def stream_chat(messages, model='gpt-3.5-turbo'):
    """Stream the response as it is generated."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True
    )

    for chunk in stream:
        # Some chunks carry no content (e.g. the final one), so guard against None
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='', flush=True)

    print()  # newline

# Use streaming
messages = [
    {"role": "user", "content": "Write a short poem about spring"}
]
stream_chat(messages)

1.4 Using Other Models

python
# Anthropic Claude
from anthropic import Anthropic

anthropic_client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))

def claude_chat(prompt, max_tokens=1024):
    response = anthropic_client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Local models (Ollama)
import requests

def ollama_chat(prompt, model='llama2'):
    response = requests.post(
        'http://localhost:11434/api/generate',
        json={
            'model': model,
            'prompt': prompt,
            'stream': False
        }
    )
    return response.json()['response']

# Alibaba Cloud Qwen (Tongyi Qianwen)
from dashscope import Generation

def qwen_chat(prompt):
    response = Generation.call(
        model='qwen-turbo',
        prompt=prompt,
        api_key=os.getenv('DASHSCOPE_API_KEY')
    )
    return response.output.text

Chapter 2: Prompt Engineering in Practice

2.1 Prompt Engineering Basics

Prompt engineering is the art and science of designing and refining input prompts to get better model outputs.

Core principles:

  1. Be clear and specific: avoid ambiguity
  2. Provide context: help the model understand the task
  3. Give examples: few-shot learning
  4. Structure the output: specify format requirements
  5. Iterate: keep refining the prompt
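As a small illustration of principles 1, 2, and 4, compare a vague prompt with a sharpened version. The exact wording and the `{article}` placeholder are just one possible design:

```python
# The same request twice: first vague, then rewritten following principles 1, 2 and 4.
vague_prompt = "Summarize this."

improved_prompt = (
    "You are an editor for a technical blog.\n"                 # provide context
    "Summarize the article below in exactly 3 bullet points, "  # clear and specific
    "each under 20 words.\n"
    "Output the bullet points as a JSON list of strings.\n"     # structured output
    "Article:\n{article}"
)

# The template is filled in at call time, so it can be reused and iterated on.
print(improved_prompt.format(article="LLMs are reshaping software development..."))
```

Keeping prompts as reusable templates like this makes principle 5 (iterate) much easier: you edit one string and re-run your evaluation.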

2.2 Basic Prompting Techniques

python
class PromptTemplates:
    """A collection of prompt templates."""
    
    @staticmethod
    def role_prompt(role, task, context=""):
        """Role-play prompt."""
        return f"""You are a {role}.
{context}
Please complete the following task: {task}"""
    
    @staticmethod
    def few_shot_prompt(task, examples, input_text):
        """Few-shot prompt."""
        prompt = f"Task: {task}\n\n"
        prompt += "Examples:\n"
        for ex in examples:
            prompt += f"Input: {ex['input']}\n"
            prompt += f"Output: {ex['output']}\n\n"
        prompt += f"Now process: {input_text}"
        return prompt
    
    @staticmethod
    def chain_of_thought(prompt):
        """Chain-of-thought prompt."""
        return f"""Please reason through this problem step by step:
{prompt}

Let's think step by step:"""
    
    @staticmethod
    def structured_output(task, schema):
        """Structured-output prompt."""
        return f"""Please complete the following task: {task}

Output in the following JSON format:
{schema}

Output only the JSON, with no other text."""

# Usage examples
templates = PromptTemplates()

# Role prompt
role_prompt = templates.role_prompt(
    role="senior Python engineer",
    task="review the following code and suggest improvements",
    context="You specialize in writing efficient, readable Python code."
)

# Few-shot prompt
examples = [
    {"input": "The weather is lovely today", "output": "POSITIVE"},
    {"input": "This product is terrible", "output": "NEGATIVE"},
    {"input": "It's okay, nothing special", "output": "NEUTRAL"}
]
sentiment_prompt = templates.few_shot_prompt(
    task="sentiment analysis",
    examples=examples,
    input_text="I really love this feature!"
)

# Chain of thought
cot_prompt = templates.chain_of_thought(
    "If 5 machines make 5 parts in 5 minutes, how many minutes do 100 machines need to make 100 parts?"
)

2.3 Advanced Prompting Techniques

python
from collections import Counter

class AdvancedPrompts:
    """Advanced prompting techniques."""
    
    @staticmethod
    def self_consistency(answers):
        """Self-consistency: take the majority vote over multiple samples.

        `answers` should be responses from several independent model calls
        to the same question (e.g. sampled at temperature > 0).
        """
        votes = Counter(a.strip() for a in answers)
        return votes.most_common(1)[0][0]
    
    @staticmethod
    def generated_knowledge(prompt):
        """Generated-knowledge prompt."""
        return f"""First, list the knowledge needed to answer this question:
{prompt}

Relevant knowledge:
[list relevant knowledge here]

Now, answer the question based on the knowledge above:
{prompt}"""
    
    @staticmethod
    def reflexion(prompt, feedback=""):
        """Reflection prompt."""
        return f"""{prompt}

Initial answer: {feedback}

Please reflect on your answer and consider whether it can be improved.
If there are mistakes, correct them and provide a better answer."""
    
    @staticmethod
    def tree_of_thoughts(problem, num_branches=3):
        """Tree-of-thoughts prompt."""
        return f"""Let's solve this problem with the tree-of-thoughts method: {problem}

Step 1: generate {num_branches} different solution approaches
Step 2: evaluate the feasibility of each approach
Step 3: pick the best approach and develop it in depth
Step 4: arrive at the final answer

Begin:"""

# Usage examples
advanced = AdvancedPrompts()

# Generated-knowledge prompt
knowledge_prompt = advanced.generated_knowledge(
    "What is the difference between quantum computers and classical computers?"
)

# Reflection prompt
reflexion_prompt = advanced.reflexion(
    prompt="Explain the core concepts of relativity",
    feedback="[initial answer]"
)

2.4 Prompt Optimization in Practice

python
def optimize_prompt(base_prompt, task_description, test_cases):
    """A simple prompt-optimization loop."""
    
    iterations = []
    current_prompt = base_prompt
    
    for i in range(5):  # at most 5 iterations
        # Evaluate the current prompt
        results = []
        for test in test_cases:
            response = chat_completion([
                {"role": "user", "content": current_prompt.format(input=test['input'])}
            ])
            results.append({
                'input': test['input'],
                'expected': test['expected'],
                'actual': response,
                'correct': response.strip() == test['expected']
            })
        
        accuracy = sum(1 for r in results if r['correct']) / len(results)
        iterations.append({
            'iteration': i,
            'prompt': current_prompt,
            'accuracy': accuracy,
            'results': results
        })
        
        print(f"Iteration {i}: Accuracy = {accuracy*100:.1f}%")
        
        if accuracy == 1.0:
            break
        
        # Improve the prompt based on error analysis
        errors = [r for r in results if not r['correct']]
        current_prompt = improve_prompt(current_prompt, errors)
    
    return iterations

def improve_prompt(current_prompt, errors):
    """Improve the prompt based on observed errors."""
    # Analyze error patterns
    error_patterns = [e['actual'] for e in errors]
    
    # Generate improvement hints (a real system could ask an LLM for these)
    improvements = []
    if any('format' in str(e).lower() for e in error_patterns):
        improvements.append("Specify the output format explicitly")
    if any(len(str(e)) > 100 for e in error_patterns):
        improvements.append("Ask for a concise answer")
    
    # Apply the improvements
    improved_prompt = current_prompt
    for imp in improvements:
        improved_prompt += f"\n\nNote: {imp}"
    
    return improved_prompt

Chapter 3: Case Study 1: Building a RAG System

3.1 How RAG Works

Retrieval-augmented generation (RAG) combines the strengths of retrieval systems and generative models:

  1. Retrieve: look up relevant information in a knowledge base
  2. Augment: add the retrieved results to the prompt as context
  3. Generate: produce an answer grounded in that context

Advantages:

  • Fewer hallucinations
  • Access to up-to-date information
  • Traceable sources
  • Lower training cost
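The retrieve step boils down to ranking documents by vector similarity. Here is a minimal sketch with hand-made 3-dimensional vectors; the numbers are invented for illustration, and real systems use learned embeddings with hundreds of dimensions, as in the next section:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical "embeddings" for three documents and a query.
doc_vectors = {
    "Python was created in 1991": [0.9, 0.1, 0.0],
    "Machine learning learns from data": [0.1, 0.9, 0.2],
    "RAG combines retrieval and generation": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "When was Python created?"

# Step 1 (retrieve): rank documents by cosine similarity to the query.
ranked = sorted(doc_vectors, key=lambda d: cosine(doc_vectors[d], query_vector), reverse=True)
print(ranked[0])  # the most relevant document becomes the context for step 2
```

A vector database performs exactly this ranking, but with approximate nearest-neighbor indexes so it stays fast at millions of documents.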

3.2 Vector Database Basics

python
# Install dependencies first:
# pip install chromadb sentence-transformers

import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer

# Initialize the vector database
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="documents")

# Initialize the embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

def embed_text(text):
    """Generate a text embedding."""
    return embedding_model.encode(text).tolist()

# Add documents
def add_documents(documents, ids=None, metadatas=None):
    """Add documents to the vector store."""
    if ids is None:
        ids = [f"doc_{i}" for i in range(len(documents))]
    
    embeddings = [embed_text(doc) for doc in documents]
    
    collection.add(
        documents=documents,
        embeddings=embeddings,
        ids=ids,
        metadatas=metadatas
    )

# Example documents
documents = [
    "Python is a high-level programming language created by Guido van Rossum in 1991.",
    "Machine learning is a branch of artificial intelligence that lets computers learn from data.",
    "Deep learning uses neural networks to mimic how the human brain works.",
    "The Transformer architecture revolutionized natural language processing.",
    "RAG systems combine the strengths of retrieval and generation."
]

add_documents(documents)

3.3 Implementing RAG Retrieval

python
class RAGRetriever:
    """RAG retriever."""
    
    def __init__(self, collection, embedding_model, top_k=3):
        self.collection = collection
        self.embedding_model = embedding_model
        self.top_k = top_k
    
    def retrieve(self, query):
        """Retrieve relevant documents."""
        query_embedding = self.embedding_model.encode(query).tolist()
        
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=self.top_k,
            include=['documents', 'metadatas', 'distances']
        )
        
        return {
            'documents': results['documents'][0],
            'metadatas': results['metadatas'][0],
            'distances': results['distances'][0]
        }
    
    def retrieve_with_scores(self, query):
        """Retrieve documents together with relevance scores."""
        results = self.retrieve(query)
        
        # Convert distances to similarity scores (a rough heuristic;
        # the right mapping depends on the collection's distance metric)
        scores = [1 - d for d in results['distances']]
        
        return list(zip(results['documents'], scores))

# Use the retriever
retriever = RAGRetriever(collection, embedding_model, top_k=3)

# Test retrieval
query = "When was Python created?"
results = retriever.retrieve(query)

print(f"Query: {query}")
print("\nRetrieved documents:")
for i, (doc, score) in enumerate(zip(results['documents'], 
                                      [1-d for d in results['distances']])):
    print(f"{i+1}. [similarity: {score:.3f}] {doc}")

3.4 Building the Complete RAG System

python
class RAGSystem:
    """A complete RAG system."""
    
    def __init__(self, collection, embedding_model, llm_client, top_k=3):
        self.retriever = RAGRetriever(collection, embedding_model, top_k)
        self.llm_client = llm_client
    
    def build_prompt(self, query, contexts):
        """Build the RAG prompt."""
        context_text = "\n\n".join([
            f"[Source {i+1}]: {ctx}" 
            for i, ctx in enumerate(contexts)
        ])
        
        return f"""Answer the question based on the information below. If the information is insufficient, say so.

Relevant information:
{context_text}

Question: {query}

Answer:"""
    
    def query(self, question):
        """Run a RAG query."""
        # Retrieve relevant documents
        results = self.retriever.retrieve(question)
        contexts = results['documents']
        
        # Build the prompt
        prompt = self.build_prompt(question, contexts)
        
        # Generate the answer
        response = chat_completion([
            {"role": "user", "content": prompt}
        ])
        
        return {
            'answer': response,
            'contexts': contexts,
            'sources': results['metadatas']
        }
    
    def query_with_sources(self, question):
        """RAG query with source citations."""
        result = self.query(question)
        
        # Append source citations
        answer = result['answer']
        answer += "\n\n**Sources:**\n"
        for i, ctx in enumerate(result['contexts']):
            answer += f"- [Source {i+1}] {ctx[:100]}...\n"
        
        return answer

# Initialize the RAG system
rag = RAGSystem(collection, embedding_model, client)

# Test
questions = [
    "Who created Python?",
    "What is the difference between machine learning and deep learning?",
    "Why is the Transformer important?"
]

for q in questions:
    print(f"\n{'='*50}")
    print(f"Question: {q}")
    print(f"{'='*50}")
    result = rag.query(q)
    print(f"Answer: {result['answer']}")

3.5 Loading and Processing Documents

python
from pathlib import Path
import PyPDF2
import docx

class DocumentLoader:
    """Document loader."""
    
    @staticmethod
    def load_text_file(file_path):
        """Load a plain-text file."""
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read()
    
    @staticmethod
    def load_pdf(file_path):
        """Load a PDF file."""
        text = ""
        with open(file_path, 'rb') as f:
            reader = PyPDF2.PdfReader(f)
            for page in reader.pages:
                text += page.extract_text()
        return text
    
    @staticmethod
    def load_docx(file_path):
        """Load a Word file."""
        doc = docx.Document(file_path)
        return "\n".join([para.text for para in doc.paragraphs])
    
    @staticmethod
    def chunk_text(text, chunk_size=500, overlap=50):
        """Split text into overlapping chunks."""
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")
        chunks = []
        start = 0
        while start < len(text):
            end = start + chunk_size
            chunk = text[start:end]
            chunks.append(chunk)
            start = end - overlap
        return chunks
    
    @staticmethod
    def load_directory(dir_path, extensions=['.txt', '.md']):
        """Load all documents under a directory."""
        documents = []
        for file_path in Path(dir_path).rglob('*'):
            if file_path.suffix in extensions:
                try:
                    content = DocumentLoader.load_text_file(file_path)
                    chunks = DocumentLoader.chunk_text(content)
                    for i, chunk in enumerate(chunks):
                        documents.append({
                            'content': chunk,
                            'source': str(file_path),
                            'chunk': i
                        })
                except Exception as e:
                    print(f"Failed to load {file_path}: {e}")
        return documents

# Use the document loader
loader = DocumentLoader()

# Load a directory
docs = loader.load_directory('./knowledge_base')

# Add to the vector store
if docs:
    add_documents(
        documents=[d['content'] for d in docs],
        ids=[f"{Path(d['source']).stem}_{d['chunk']}" for d in docs],
        metadatas=[{'source': d['source'], 'chunk': d['chunk']} for d in docs]
    )
    print(f"Loaded {len(docs)} document chunks")

3.6 Advanced RAG Techniques

python
class AdvancedRAG(RAGSystem):
    """Advanced RAG system."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.query_history = []
    
    def hyde_retrieval(self, query):
        """HyDE: Hypothetical Document Embeddings."""
        # Generate a hypothetical answer
        hyde_prompt = f"Write an ideal answer to this question: {query}"
        hypothetical_doc = chat_completion([{"role": "user", "content": hyde_prompt}])
        
        # Retrieve using the hypothetical answer
        return self.retriever.retrieve(hypothetical_doc)
    
    def multi_query_retrieval(self, query, num_queries=3):
        """Multi-query retrieval."""
        # Generate several related query variants
        gen_prompt = f"""Based on the question below, generate {num_queries} related query variants:
Question: {query}

Query variants:"""
        variations = chat_completion([{"role": "user", "content": gen_prompt}])
        
        # Retrieve with each variant and merge the results
        all_results = []
        for q in variations.split('\n'):
            if q.strip():
                results = self.retriever.retrieve(q.strip())
                all_results.extend(results['documents'])
        
        # De-duplicate while preserving order
        unique_results = list(dict.fromkeys(all_results))
        return unique_results[:self.retriever.top_k]
    
    def rerank_results(self, query, results):
        """Rerank results with a cross-encoder."""
        from sentence_transformers import CrossEncoder
        
        reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
        
        pairs = [[query, doc] for doc in results]
        scores = reranker.predict(pairs)
        
        # Sort by score
        sorted_results = sorted(
            zip(results, scores), 
            key=lambda x: x[1], 
            reverse=True
        )
        
        return [r[0] for r in sorted_results]
    
    def query(self, question, use_hyde=False, use_multi_query=False):
        """Query with optional enhancements."""
        if use_hyde:
            results = self.hyde_retrieval(question)
            contexts = results['documents']
        elif use_multi_query:
            contexts = self.multi_query_retrieval(question)
        else:
            results = self.retriever.retrieve(question)
            contexts = results['documents']
        
        # Optional: rerank
        # contexts = self.rerank_results(question, contexts)
        
        prompt = self.build_prompt(question, contexts)
        response = chat_completion([{"role": "user", "content": prompt}])
        
        return {
            'answer': response,
            'contexts': contexts
        }

# Use the advanced RAG system
advanced_rag = AdvancedRAG(collection, embedding_model, client)

# HyDE retrieval
result = advanced_rag.query("How can I improve code quality?", use_hyde=True)
print(f"HyDE answer: {result['answer']}")

Chapter 4: Case Study 2: Building an LLM Agent

4.1 Agent Architecture

An LLM agent is an intelligent system that can carry out tasks autonomously. Its core components are:

  1. Planning: decompose tasks and make a plan
  2. Memory: short-term and long-term memory
  3. Tool use: call external APIs and functions
  4. Reflection: evaluate and improve

4.2 A Basic Agent Implementation

python
import json
from typing import List, Dict, Any

class Tool:
    """Base class for tools."""
    
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
    
    def run(self, *args, **kwargs) -> Any:
        raise NotImplementedError

class CalculatorTool(Tool):
    """Calculator tool."""
    
    def __init__(self):
        super().__init__(
            name="calculator",
            description="Performs math calculations. Input: a math expression, e.g. '2 + 2'"
        )
    
    def run(self, expression: str) -> str:
        try:
            # Evaluate with builtins disabled to limit what the expression can do
            result = eval(expression, {"__builtins__": {}}, {})
            return f"Result: {result}"
        except Exception as e:
            return f"Calculation error: {e}"

class SearchTool(Tool):
    """Search tool."""
    
    def __init__(self):
        super().__init__(
            name="search",
            description="Searches the web. Input: a search query"
        )
    
    def run(self, query: str) -> str:
        # Simulated search (a real application would call a search API)
        return f"Search results: information about '{query}'..."

class LLMEngine:
    """LLM engine."""
    
    def __init__(self, client):
        self.client = client
    
    def chat(self, messages: List[Dict]) -> str:
        return chat_completion(messages)
    
    def extract_json(self, text: str) -> Dict:
        """Extract a JSON object from text."""
        import re
        match = re.search(r'\{.*\}', text, re.DOTALL)
        if match:
            return json.loads(match.group())
        return {}

class SimpleAgent:
    """A simple agent."""
    
    def __init__(self, llm: LLMEngine, tools: List[Tool]):
        self.llm = llm
        self.tools = {tool.name: tool for tool in tools}
        self.tool_descriptions = "\n".join([
            f"- {tool.name}: {tool.description}"
            for tool in tools
        ])
        self.memory = []
    
    def build_system_prompt(self):
        """Build the system prompt."""
        return f"""You are an intelligent assistant with access to the following tools:

{self.tool_descriptions}

When you need a tool, reply in JSON format:
{{
    "thought": "your reasoning",
    "action": "tool name",
    "action_input": "tool input"
}}

If no tool is needed, reply with:
{{
    "thought": "your reasoning",
    "action": "final_answer",
    "action_input": "your answer"
}}"""
    
    def run(self, query: str, max_iterations: int = 5) -> str:
        """Run the agent."""
        messages = [
            {"role": "system", "content": self.build_system_prompt()},
            {"role": "user", "content": query}
        ]
        
        for i in range(max_iterations):
            # Get the LLM's response
            response = self.llm.chat(messages)
            
            # Parse the response
            action_dict = self.llm.extract_json(response)
            thought = action_dict.get('thought', '')
            action = action_dict.get('action', '')
            action_input = action_dict.get('action_input', '')
            
            print(f"\n[Iteration {i+1}]")
            print(f"Thought: {thought}")
            print(f"Action: {action}")
            
            # Execute the action
            if action == "final_answer":
                return action_input
            elif action in self.tools:
                # Pass the input positionally; each tool takes a single argument
                tool_result = self.tools[action].run(action_input)
                print(f"Result: {tool_result}")
                
                # Add to memory
                messages.append({"role": "assistant", "content": response})
                messages.append({
                    "role": "user", 
                    "content": f"Tool result: {tool_result}\nPlease continue."
                })
            else:
                return f"Unknown tool: {action}"
        
        return "Reached the maximum number of iterations without finishing the task."

# Create the agent
llm = LLMEngine(client)
tools = [CalculatorTool(), SearchTool()]
agent = SimpleAgent(llm, tools)

# Test
queries = [
    "Calculate 123 * 456",
    "Search for the features of the latest Python release"
]

for q in queries:
    print(f"\n{'='*50}")
    print(f"Query: {q}")
    print(f"{'='*50}")
    result = agent.run(q)
    print(f"\nFinal answer: {result}")

4.3 An Agent with Memory

python
class MemoryAgent(SimpleAgent):
    """An agent with memory."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.short_term_memory = []  # conversation history
        self.long_term_memory = []   # important facts
    
    def add_to_short_term(self, role: str, content: str):
        """Add to short-term memory."""
        self.short_term_memory.append({"role": role, "content": content})
    
    def add_to_long_term(self, fact: str):
        """Add to long-term memory."""
        self.long_term_memory.append(fact)
        # Cap the size of long-term memory
        if len(self.long_term_memory) > 100:
            self.long_term_memory = self.long_term_memory[-50:]
    
    def get_memory_context(self):
        """Build a context string from memory."""
        context = ""
        if self.long_term_memory:
            context += "Known facts:\n"
            for fact in self.long_term_memory[-10:]:
                context += f"- {fact}\n"
        return context
    
    def run(self, query: str, max_iterations: int = 5) -> str:
        """Run the memory-enabled agent."""
        memory_context = self.get_memory_context()
        
        messages = [
            {"role": "system", "content": self.build_system_prompt() + "\n\n" + memory_context},
            *self.short_term_memory[-10:],  # the last 10 turns
            {"role": "user", "content": query}
        ]
        
        for i in range(max_iterations):
            response = self.llm.chat(messages)
            action_dict = self.llm.extract_json(response)
            
            thought = action_dict.get('thought', '')
            action = action_dict.get('action', '')
            action_input = action_dict.get('action_input', '')
            
            if action == "final_answer":
                # Update memory
                self.add_to_short_term("user", query)
                self.add_to_short_term("assistant", action_input)
                return action_input
            elif action in self.tools:
                tool_result = self.tools[action].run(action_input)
                
                messages.append({"role": "assistant", "content": response})
                messages.append({
                    "role": "user", 
                    "content": f"Tool result: {tool_result}\nPlease continue."
                })
            else:
                return f"Unknown tool: {action}"
        
        return "Reached the maximum number of iterations."

# Test the memory agent
memory_agent = MemoryAgent(llm, tools)

# Multi-turn conversation
conversation = [
    "My name is Zhang San",
    "Remember that I like programming in Python",
    "What is my name?",
    "Which programming language do I like?"
]

for query in conversation:
    print(f"\nUser: {query}")
    response = memory_agent.run(query)
    print(f"Assistant: {response}")

4.4 A ReAct Agent

python
class ReActAgent:
    """ReAct (Reasoning + Acting) agent."""
    
    def __init__(self, llm: LLMEngine, tools: List[Tool]):
        self.llm = llm
        self.tools = {tool.name: tool for tool in tools}
    
    def build_prompt(self, query: str, history: str = ""):
        """Build the ReAct prompt."""
        return f"""Solve the following problem using a Thought - Action - Observation loop.

Available tools:
{chr(10).join([f'- {t.name}: {t.description}' for t in self.tools.values()])}

Format:
Thought: your reasoning
Action: tool name
Action Input: tool input
Observation: tool result
... (Thought/Action/Observation can repeat)
Thought: I now know the final answer
Final Answer: the final answer

Question: {query}
{history}"""
    
    def parse_response(self, response: str):
        """Parse a ReAct response."""
        lines = response.strip().split('\n')
        
        thought = ""
        action = None
        action_input = None
        
        for line in lines:
            if line.startswith('Thought:'):
                thought = line.replace('Thought:', '').strip()
            elif line.startswith('Action:'):
                action = line.replace('Action:', '').strip()
            elif line.startswith('Action Input:'):
                action_input = line.replace('Action Input:', '').strip()
            elif line.startswith('Final Answer:'):
                return {
                    'type': 'final',
                    'answer': line.replace('Final Answer:', '').strip()
                }
        
        if action and action_input:
            return {
                'type': 'action',
                'thought': thought,
                'action': action,
                'action_input': action_input
            }
        
        return {'type': 'thought', 'thought': thought}
    
    def run(self, query: str, max_iterations: int = 5) -> str:
        """Run the ReAct agent."""
        history = ""
        
        for i in range(max_iterations):
            prompt = self.build_prompt(query, history)
            response = self.llm.chat([{"role": "user", "content": prompt}])
            
            parsed = self.parse_response(response)
            
            print(f"\n[Iteration {i+1}]")
            print(f"Thought: {parsed.get('thought', '')}")
            
            if parsed['type'] == 'final':
                print(f"Final answer: {parsed['answer']}")
                return parsed['answer']
            
            elif parsed['type'] == 'action':
                action = parsed['action']
                action_input = parsed['action_input']
                print(f"Action: {action}({action_input})")
                
                if action in self.tools:
                    observation = self.tools[action].run(action_input)
                    print(f"Observation: {observation}")
                    history += f"\n{response}\nObservation: {observation}"
                else:
                    history += f"\n{response}\nObservation: unknown tool {action}"
            else:
                history += f"\n{response}"
        
        return "Reached the maximum number of iterations."

# Test the ReAct agent
react_agent = ReActAgent(llm, tools)

result = react_agent.run("Calculate (15 + 25) * 3")
print(f"\nResult: {result}")

4.5 Building an Agent with LangChain

python
# Install LangChain first:
# pip install langchain langchain-openai langchain-community

from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.tools import Tool
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# Define the tools
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def search(query: str) -> str:
    """Search for information."""
    return f"Search results: {query}"

tools = [
    Tool(
        name="Calculator",
        func=calculator,
        description="For math calculations. Input: a math expression"
    ),
    Tool(
        name="Search",
        func=search,
        description="Searches the web. Input: a search query"
    )
]

# Create the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Create the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with access to tools."),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

# Create the agent
agent = create_openai_functions_agent(llm, tools, prompt)

# Create the executor
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True
)

# Use the agent
result = agent_executor.invoke({"input": "Calculate 123 + 456"})
print(f"Result: {result['output']}")

Chapter 5: Case Study 3: Function Calling and Tool Integration

5.1 Function Calling Basics

Modern LLMs support function calling, which lets the model invoke external functions in a structured way.

python
# Define the function schemas
functions = [
    {
        "name": "get_weather",
        "description": "Get weather information for a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Beijing' or 'Shanghai'"
                },
                "date": {
                    "type": "string",
                    "description": "Date in YYYY-MM-DD format, defaults to today"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform a math calculation",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A math expression, e.g. '2 + 2'"
                }
            },
            "required": ["expression"]
        }
    }
]

# Implement the functions
def get_weather(city: str, date: str = None) -> str:
    """Get the weather."""
    # Simulated weather API
    return f"It is sunny in {city} today, 25°C"

def calculate(expression: str) -> str:
    """Calculate."""
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

# Function dispatch table
function_map = {
    "get_weather": get_weather,
    "calculate": calculate
}

# Chat with function calling
def chat_with_functions(messages: List[Dict]) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        functions=functions,
        function_call="auto"
    )
    
    message = response.choices[0].message
    
    # Check whether the model wants to call a function
    if message.function_call:
        function_name = message.function_call.name
        function_args = json.loads(message.function_call.arguments)
        
        print(f"Calling function: {function_name}")
        print(f"Arguments: {function_args}")
        
        # Execute the function
        if function_name in function_map:
            result = function_map[function_name](**function_args)
            
            # Return the result to the model
            messages.append(message)
            messages.append({
                "role": "function",
                "name": function_name,
                "content": result
            })
            
            # Get the final response
            final_response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages
            )
            return final_response.choices[0].message.content
    
    return message.content

# Test
messages = [{"role": "user", "content": "What's the weather like in Beijing today?"}]
response = chat_with_functions(messages)
print(f"Answer: {response}")

5.2 Integrating Multiple Tools

python
class ToolIntegration:
    """Multi-tool integration."""
    
    def __init__(self):
        self.tools = {
            "get_weather": {
                "description": "Get the weather",
                "schema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"]
                },
                "func": self._get_weather
            },
            "search_web": {
                "description": "Search the web",
                "schema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    },
                    "required": ["query"]
                },
                "func": self._search_web
            },
            "send_email": {
                "description": "Send an email",
                "schema": {
                    "type": "object",
                    "properties": {
                        "to": {"type": "string"},
                        "subject": {"type": "string"},
                        "body": {"type": "string"}
                    },
                    "required": ["to", "subject", "body"]
                },
                "func": self._send_email
            }
        }
    
    def _get_weather(self, city: str) -> str:
        return f"It is sunny in {city}, 25°C"
    
    def _search_web(self, query: str) -> str:
        return f"Search results: {query}"
    
    def _send_email(self, to: str, subject: str, body: str) -> str:
        return f"Email sent to {to}"
    
    def get_openai_functions(self):
        """Return the function definitions in OpenAI format."""
        return [
            {
                "name": name,
                "description": tool["description"],
                "parameters": tool["schema"]
            }
            for name, tool in self.tools.items()
        ]
    
    def execute(self, name: str, args: Dict) -> str:
        """Execute a tool."""
        if name in self.tools:
            return self.tools[name]["func"](**args)
        raise ValueError(f"Unknown tool: {name}")
    
    def chat(self, user_message: str) -> str:
        """Chat, using tools automatically when needed."""
        messages = [{"role": "user", "content": user_message}]
        
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
            functions=self.get_openai_functions(),
            function_call="auto"
        )
        
        message = response.choices[0].message
        
        while message.function_call:
            func_name = message.function_call.name
            func_args = json.loads(message.function_call.arguments)
            
            result = self.execute(func_name, func_args)
            
            messages.append(message)
            messages.append({
                "role": "function",
                "name": func_name,
                "content": result
            })
            
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                functions=self.get_openai_functions()
            )
            message = response.choices[0].message
        
        return message.content

# Use the tool integration
integration = ToolIntegration()

queries = [
    "What's the weather in Beijing?",
    "Search for Python tutorials",
    "Send an email to test@example.com with the subject 'Greetings' and the body 'Hello'"
]

for q in queries:
    print(f"\nUser: {q}")
    response = integration.chat(q)
    print(f"Assistant: {response}")

5.3 自定义工具开发

python
import json
from typing import Dict, List
from abc import ABC, abstractmethod

class BaseTool(ABC):
    """工具基类"""
    
    @property
    @abstractmethod
    def name(self) -> str:
        pass
    
    @property
    @abstractmethod
    def description(self) -> str:
        pass
    
    @property
    @abstractmethod
    def parameters(self) -> Dict:
        pass
    
    @abstractmethod
    def execute(self, **kwargs) -> str:
        pass
    
    def to_openai_format(self) -> Dict:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters
        }

# 数据库查询工具
class DatabaseQueryTool(BaseTool):
    @property
    def name(self) -> str:
        return "query_database"
    
    @property
    def description(self) -> str:
        return "查询数据库获取信息"
    
    @property
    def parameters(self) -> Dict:
        return {
            "type": "object",
            "properties": {
                "table": {
                    "type": "string",
                    "description": "表名"
                },
                "columns": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "要查询的列"
                },
                "where": {
                    "type": "string",
                    "description": "WHERE 条件"
                }
            },
            "required": ["table"]
        }
    
    def execute(self, table: str, columns: List[str] = None, where: str = None) -> str:
        # 实际应用中连接数据库
        cols = ", ".join(columns) if columns else "*"
        query = f"SELECT {cols} FROM {table}"
        if where:
            query += f" WHERE {where}"
        return f"执行查询:{query}\n结果:[模拟数据]"

# 文件操作工具
class FileTool(BaseTool):
    @property
    def name(self) -> str:
        return "file_operation"
    
    @property
    def description(self) -> str:
        return "读取或写入文件"
    
    @property
    def parameters(self) -> Dict:
        return {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["read", "write"],
                    "description": "操作类型"
                },
                "path": {
                    "type": "string",
                    "description": "文件路径"
                },
                "content": {
                    "type": "string",
                    "description": "写入的内容(仅 write 操作需要)"
                }
            },
            "required": ["operation", "path"]
        }
    
    def execute(self, operation: str, path: str, content: str = None) -> str:
        try:
            if operation == "read":
                with open(path, 'r', encoding='utf-8') as f:
                    return f.read()[:1000]  # 限制长度
            elif operation == "write":
                with open(path, 'w', encoding='utf-8') as f:
                    f.write(content)
                return f"文件已写入:{path}"
        except Exception as e:
            return f"错误:{e}"

# 代码执行工具
class CodeExecutionTool(BaseTool):
    @property
    def name(self) -> str:
        return "execute_code"
    
    @property
    def description(self) -> str:
        return "执行 Python 代码(安全沙箱环境)"
    
    @property
    def parameters(self) -> Dict:
        return {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "要执行的 Python 代码"
                },
                "language": {
                    "type": "string",
                    "enum": ["python"],
                    "description": "编程语言"
                }
            },
            "required": ["code"]
        }
    
    def execute(self, code: str, language: str = "python") -> str:
        # 注意:exec 并非真正的沙箱,生产环境应使用容器或 RestrictedPython 等方案
        import io
        import contextlib

        # 只开放少量内置函数;完全清空 __builtins__ 会导致 print 等不可用
        safe_builtins = {"print": print, "range": range, "len": len, "sum": sum}
        buffer = io.StringIO()

        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, {"__builtins__": safe_builtins}, {})
            output = buffer.getvalue()
            return f"执行成功\n输出:{output}" if output else "执行成功,无输出"
        except Exception as e:
            return f"执行错误:{e}"

# 使用自定义工具
custom_tools = [
    DatabaseQueryTool(),
    FileTool(),
    CodeExecutionTool()
]

def create_agent_with_tools(tools: List[BaseTool]):
    """创建带自定义工具的 Agent"""
    functions = [tool.to_openai_format() for tool in tools]
    tool_map = {tool.name: tool for tool in tools}
    
    def chat(user_message: str) -> str:
        messages = [{"role": "user", "content": user_message}]
        
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
            functions=functions,
            function_call="auto"
        )
        
        message = response.choices[0].message
        
        while message.function_call:
            func_name = message.function_call.name
            func_args = json.loads(message.function_call.arguments)
            
            if func_name in tool_map:
                result = tool_map[func_name].execute(**func_args)
            else:
                result = f"未知函数:{func_name}"
            
            messages.append(message)
            messages.append({
                "role": "function",
                "name": func_name,
                "content": result
            })
            
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                functions=functions
            )
            message = response.choices[0].message
        
        return message.content
    
    return chat

# 测试
agent_chat = create_agent_with_tools(custom_tools)
response = agent_chat("读取文件 test.txt 的内容")
print(f"响应:{response}")
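模型生成的函数参数并不总是符合 schema,交给工具执行前最好先做一次校验。下面是一个仅用标准库的简化校验示意(只覆盖 `type`、`properties`、`required` 这个子集;`validate_args` 是本文为演示假设的辅助函数,真实项目可考虑 jsonschema 库):

```python
import json

def validate_args(schema: dict, args: dict) -> list:
    """按 JSON Schema 的一个子集检查参数,返回错误信息列表(空列表表示通过)"""
    errors = []
    type_map = {"string": str, "array": list, "object": dict, "number": (int, float)}
    # 检查必填字段
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"缺少必填参数:{field}")
    # 检查字段是否已声明、类型是否匹配
    for field, value in args.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            errors.append(f"未知参数:{field}")
        elif not isinstance(value, type_map.get(spec.get("type"), object)):
            errors.append(f"参数 {field} 类型应为 {spec.get('type')}")
    return errors

schema = {
    "type": "object",
    "properties": {"table": {"type": "string"}, "columns": {"type": "array"}},
    "required": ["table"]
}

# 模拟模型返回的参数串
good = json.loads('{"table": "users", "columns": ["id", "name"]}')
bad = json.loads('{"columns": "id"}')
print(validate_args(schema, good))  # []
print(validate_args(schema, bad))   # 缺少 table,columns 类型错误
```

校验失败时可以把错误信息作为 function 结果返回给模型,让它自行修正参数后重试。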

第六章 实战案例四:构建智能文档助手

6.1 项目概述

我们将构建一个完整的智能文档助手,支持:

  • 文档上传和解析
  • 向量检索和 RAG
  • 多轮对话
  • 引用来源
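其中"文档上传和解析"会用到前文 RAG 章节的 `DocumentLoader`;为便于理解,这里先给出带重叠窗口的切块逻辑示意(简化版,按字符计数,与前文 `chunk_text` 的参数含义一致):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """按固定窗口切块,相邻块保留 overlap 个字符的重叠,降低语义被截断的影响"""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# 用 1200 个字符验证切块数量与重叠
text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 300]
```

重叠窗口保证一句话即使跨越切块边界,也至少在一个块中完整出现。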

6.2 完整实现

python
import os
import hashlib
from datetime import datetime
from typing import List, Dict, Optional
from dataclasses import dataclass

@dataclass
class Document:
    """文档数据类"""
    id: str
    content: str
    source: str
    metadata: Dict
    created_at: str

class DocumentStore:
    """文档存储"""
    
    def __init__(self, persist_dir: str = "./document_store"):
        self.persist_dir = persist_dir
        os.makedirs(persist_dir, exist_ok=True)
        self.documents: Dict[str, Document] = {}
        self._load_metadata()
    
    def _generate_id(self, content: str) -> str:
        return hashlib.md5(content.encode()).hexdigest()
    
    def _load_metadata(self):
        """加载元数据"""
        meta_path = os.path.join(self.persist_dir, "metadata.json")
        if os.path.exists(meta_path):
            import json
            with open(meta_path, 'r', encoding='utf-8') as f:
                data = json.load(f)
                for doc_id, doc_data in data.items():
                    self.documents[doc_id] = Document(**doc_data)
    
    def _save_metadata(self):
        """保存元数据"""
        import json
        meta_path = os.path.join(self.persist_dir, "metadata.json")
        data = {
            doc_id: {
                'id': doc.id,
                'content': doc.content,
                'source': doc.source,
                'metadata': doc.metadata,
                'created_at': doc.created_at
            }
            for doc_id, doc in self.documents.items()
        }
        with open(meta_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
    
    def add_document(self, content: str, source: str, metadata: Dict = None) -> Document:
        """添加文档"""
        doc_id = self._generate_id(content)
        
        if doc_id in self.documents:
            return self.documents[doc_id]
        
        doc = Document(
            id=doc_id,
            content=content,
            source=source,
            metadata=metadata or {},
            created_at=datetime.now().isoformat()
        )
        
        self.documents[doc_id] = doc
        self._save_metadata()
        
        # 保存到向量库(collection 与 embed_text 来自前文的向量库初始化)
        collection.add(
            documents=[content],
            embeddings=[embed_text(content)],
            ids=[doc_id],
            metadatas=[{'source': source, **doc.metadata}]
        )
        
        return doc
    
    def get_document(self, doc_id: str) -> Optional[Document]:
        return self.documents.get(doc_id)
    
    def list_documents(self) -> List[Document]:
        return list(self.documents.values())

class DocumentAssistant:
    """智能文档助手"""
    
    def __init__(self, document_store: DocumentStore, collection, embedding_model):
        self.store = document_store
        self.collection = collection
        self.embedding_model = embedding_model
        self.conversation_history = []
    
    def upload_document(self, file_path: str) -> Optional[Document]:
        """上传文档,切块入库,返回第一个文档块(无内容时返回 None)"""
        # DocumentLoader 来自前文 RAG 章节
        content = DocumentLoader.load_text_file(file_path)
        chunks = DocumentLoader.chunk_text(content, chunk_size=500, overlap=50)
        
        docs = []
        for i, chunk in enumerate(chunks):
            doc = self.store.add_document(
                content=chunk,
                source=file_path,
                metadata={'chunk': i, 'total_chunks': len(chunks)}
            )
            docs.append(doc)
        
        print(f"已上传 {len(docs)} 个文档块")
        return docs[0] if docs else None
    
    def query(self, question: str, top_k: int = 3) -> Dict:
        """查询文档"""
        # 检索相关文档
        query_embedding = self.embedding_model.encode(question).tolist()
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=top_k,
            include=['documents', 'metadatas', 'distances']
        )
        
        contexts = []
        sources = set()
        
        for doc, meta, dist in zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        ):
            contexts.append(f"[来源:{meta.get('source', 'unknown')}] {doc}")
            sources.add(meta.get('source', 'unknown'))
        
        # 构建提示
        context_text = "\n\n".join(contexts)
        prompt = f"""基于以下文档内容回答问题。如果文档中没有相关信息,请说明。

相关文档:
{context_text}

问题:{question}

请给出详细回答,并注明信息来源。

回答:"""
        
        # 生成回答
        messages = [
            {"role": "system", "content": "你是一个专业的文档助手,基于提供的文档内容回答问题。"},
            *self.conversation_history[-6:],  # 最近 3 轮对话
            {"role": "user", "content": prompt}
        ]
        
        response = chat_completion(messages)
        
        # 更新对话历史
        self.conversation_history.append({"role": "user", "content": question})
        self.conversation_history.append({"role": "assistant", "content": response})
        
        return {
            'answer': response,
            'contexts': contexts,
            'sources': list(sources)
        }
    
    def chat(self, message: str) -> str:
        """普通聊天(不检索文档)"""
        messages = [
            {"role": "system", "content": "你是一个友好的助手。"},
            *self.conversation_history[-6:],
            {"role": "user", "content": message}
        ]
        
        response = chat_completion(messages)
        
        self.conversation_history.append({"role": "user", "content": message})
        self.conversation_history.append({"role": "assistant", "content": response})
        
        return response
    
    def clear_history(self):
        """清除对话历史"""
        self.conversation_history = []

# 使用文档助手
store = DocumentStore()
assistant = DocumentAssistant(store, collection, embedding_model)

# 上传文档
# assistant.upload_document("./docs/guide.md")

# 查询
result = assistant.query("文档中提到了哪些主要内容?")
print(f"回答:{result['answer']}")
print(f"来源:{result['sources']}")
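`DocumentAssistant` 按消息条数(最近 6 条)截断历史;更稳妥的做法是按 token 预算截断。下面是一个简化示意(以字符数近似 token,`truncate_history` 为本文假设的辅助函数):

```python
def truncate_history(history: list, max_chars: int = 2000) -> list:
    """从最新消息往前累加,超出预算即停止,保证最近的对话优先保留"""
    kept = []
    total = 0
    for msg in reversed(history):
        length = len(msg["content"])
        if total + length > max_chars:
            break
        kept.append(msg)
        total += length
    return list(reversed(kept))

history = [
    {"role": "user", "content": "a" * 900},
    {"role": "assistant", "content": "b" * 900},
    {"role": "user", "content": "c" * 900},
]
trimmed = truncate_history(history, max_chars=2000)
print(len(trimmed))  # 2:最旧的一条被丢弃
```

精确计数可以把 `len(msg["content"])` 换成 tiktoken 等分词器的 token 数。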

6.3 Web 界面(Streamlit)

python
# app.py - Streamlit Web 应用
"""
运行:streamlit run app.py
"""

import streamlit as st
from document_assistant import DocumentAssistant, DocumentStore

st.set_page_config(page_title="智能文档助手", page_icon="📚")

# 初始化
@st.cache_resource
def get_assistant():
    # collection 与 embedding_model 的初始化同前文,此处省略
    store = DocumentStore()
    return DocumentAssistant(store, collection, embedding_model)

assistant = get_assistant()

# 侧边栏
with st.sidebar:
    st.title("📚 文档管理")
    
    uploaded_file = st.file_uploader("上传文档", type=['txt', 'md', 'pdf'])
    if uploaded_file:
        # 保存临时文件(以原文件的扩展名作为后缀)
        import os
        import tempfile
        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(uploaded_file.name)[1]) as f:
            f.write(uploaded_file.getvalue())
            temp_path = f.name
        
        with st.spinner("处理文档中..."):
            assistant.upload_document(temp_path)
        st.success("文档上传成功!")
    
    st.divider()
    
    if st.button("清除对话历史"):
        assistant.clear_history()
        st.success("历史已清除")

# 主界面
st.title("🤖 智能文档助手")
st.markdown("上传文档后,可以基于文档内容进行问答")

# 对话历史
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# 用户输入
if prompt := st.chat_input("请输入问题..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    
    with st.chat_message("assistant"):
        with st.spinner("思考中..."):
            response = assistant.query(prompt)
            st.markdown(response['answer'])
            
            # 显示来源
            if response['sources']:
                with st.expander("📖 查看信息来源"):
                    for i, ctx in enumerate(response['contexts']):
                        st.markdown(f"**来源 {i+1}**:\n{ctx}")
    
    st.session_state.messages.append({"role": "assistant", "content": response['answer']})

第七章 实战案例五:模型微调基础

7.1 微调概述

当预训练模型不能满足特定需求时,可以进行微调(Fine-tuning):

微调场景:

  • 特定领域知识
  • 特殊输出格式
  • 特定任务优化
  • 降低成本(用小模型)

7.2 数据准备

python
import json
from datasets import Dataset

# 准备微调数据
training_data = [
    {
        "messages": [
            {"role": "system", "content": "你是一个 Python 编程助手。"},
            {"role": "user", "content": "如何读取 CSV 文件?"},
            {"role": "assistant", "content": "可以使用 pandas 库:\n```python\nimport pandas as pd\ndf = pd.read_csv('file.csv')\n```"}
        ]
    },
    # ... 更多样本
]

# 保存为 JSONL 格式
with open('training_data.jsonl', 'w', encoding='utf-8') as f:
    for item in training_data:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')

# 使用 HuggingFace Dataset
dataset = Dataset.from_list(training_data)
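上传前先校验 JSONL 格式,可以避免微调任务因数据问题中途失败。下面是一个仅用标准库的简化校验示意(`validate_jsonl` 为本文假设的辅助函数,只检查基本结构):

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(path: str) -> list:
    """逐行检查:必须是合法 JSON、含 messages 列表、角色合法且至少有一条 assistant 回复"""
    errors = []
    with open(path, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f, 1):
            try:
                item = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"第 {i} 行:不是合法 JSON")
                continue
            messages = item.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"第 {i} 行:缺少 messages 列表")
                continue
            roles = [m.get("role") for m in messages]
            if any(r not in VALID_ROLES for r in roles):
                errors.append(f"第 {i} 行:存在非法 role")
            if "assistant" not in roles:
                errors.append(f"第 {i} 行:缺少 assistant 回复")
    return errors

# 构造一个小样本做自检:第 2 条缺少 assistant 回复
sample = [
    {"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]},
    {"messages": [{"role": "user", "content": "hi"}]},
]
with open('check_sample.jsonl', 'w', encoding='utf-8') as f:
    for item in sample:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')

print(validate_jsonl('check_sample.jsonl'))
```

官方还会检查 token 长度上限等约束,以平台文档为准。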

7.3 使用 OpenAI 微调

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

# 上传训练文件
file_response = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)

print(f"文件 ID: {file_response.id}")

# 创建微调任务
job_response = client.fine_tuning.jobs.create(
    training_file=file_response.id,
    model="gpt-3.5-turbo"
)

job_id = job_response.id
print(f"微调任务 ID: {job_id}")

# 查看状态
status = client.fine_tuning.jobs.retrieve(job_id)
print(f"状态:{status.status}")

# 等待完成(任务状态会依次经历 validating_files / queued / running)
import time
while status.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)
    status = client.fine_tuning.jobs.retrieve(job_id)
    print(f"当前状态:{status.status}")

# 使用微调后的模型
if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[
            {"role": "user", "content": "如何排序列表?"}
        ]
    )
    print(response.choices[0].message.content)

7.4 使用 HuggingFace 微调

python
# 安装依赖
# pip install transformers datasets accelerate peft

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer
)
from peft import LoraConfig, get_peft_model
import torch

# 加载模型和分词器
model_name = "Qwen/Qwen1.5-1.8B-Chat"  # Qwen1.5 使用标准 transformers 架构,自带 chat template
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 配置 LoRA(参数高效微调)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# 数据预处理
def preprocess(example):
    messages = example["messages"]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return tokenizer(text, truncation=True, max_length=512)

# 移除原始列,只保留分词结果,否则 Trainer 会因多余字段报错
tokenized_dataset = dataset.map(preprocess, remove_columns=dataset.column_names)

# 训练配置
training_args = TrainingArguments(
    output_dir="./fine_tuned_model",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    report_to="none"
)

# 创建 Trainer(用语言模型 collator 自动构造 labels,否则无法计算损失)
from transformers import DataCollatorForLanguageModeling

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

# 开始训练
trainer.train()

# 保存模型
trainer.save_model("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")

7.5 评估微调模型

python
def evaluate_fine_tuned_model(model_path, test_data):
    """评估微调模型"""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch
    
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    
    results = []
    
    for example in test_data:
        messages = example["messages"][:-1]  # 去掉最后一个助手回复
        expected = example["messages"][-1]["content"]
        
        input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
        
        with torch.no_grad():
            output = model.generate(
                input_ids,
                max_new_tokens=256,
                temperature=0.7,
                do_sample=True
            )
        
        # 只解码新生成的部分,去掉输入提示
        generated = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
        
        results.append({
            'input': messages[-1]['content'],
            'expected': expected,
            'generated': generated
        })
    
    return results

# 使用评估
test_data = [...]  # 测试数据
results = evaluate_fine_tuned_model("./fine_tuned_model", test_data)

for r in results:
    print(f"输入:{r['input']}")
    print(f"期望:{r['expected']}")
    print(f"生成:{r['generated']}")
    print("-" * 50)
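除了逐条人工对比,也可以用一个粗略的文本相似度给"期望 vs 生成"打分。下面用标准库 difflib 做一个示意(仅作参考指标,不能替代任务级评估):

```python
from difflib import SequenceMatcher

def similarity(expected: str, generated: str) -> float:
    """0~1 之间的字符级相似度,越高表示越接近期望答案"""
    return SequenceMatcher(None, expected, generated).ratio()

# 模拟两组(期望, 生成)结果
pairs = [
    ("使用 sorted(lst) 或 lst.sort()", "可以使用 sorted(lst) 或 lst.sort() 排序"),
    ("使用 pandas 读取 CSV", "完全无关的回答"),
]
scores = [similarity(e, g) for e, g in pairs]
avg = sum(scores) / len(scores)
print([round(s, 2) for s in scores], round(avg, 2))
```

字符级相似度对同义改写不敏感,更严谨的做法是用 embedding 余弦相似度或人工评分。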

第八章 部署与优化

8.1 API 服务部署

python
# 使用 FastAPI 部署
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn

app = FastAPI(title="LLM API Service")

from typing import Optional

class ChatRequest(BaseModel):
    message: str
    use_rag: bool = False
    conversation_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    sources: Optional[list] = None
    conversation_id: Optional[str] = None

@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    try:
        if request.use_rag:
            result = assistant.query(request.message)
            return ChatResponse(
                response=result['answer'],
                sources=result['sources'],
                conversation_id=request.conversation_id
            )
        else:
            response = assistant.chat(request.message)
            return ChatResponse(
                response=response,
                conversation_id=request.conversation_id
            )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

from fastapi import UploadFile, File

@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    """接收上传文件,写入临时路径后交给 assistant 处理"""
    import tempfile
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(await file.read())
    assistant.upload_document(f.name)
    return {"filename": file.filename}

# 运行服务
# uvicorn app:app --host 0.0.0.0 --port 8000

8.2 性能优化技巧

python
# 1. 缓存
from functools import lru_cache
from typing import List

@lru_cache(maxsize=1000)
def cached_embedding(text: str):
    return embed_text(text)

# 2. 批处理
def batch_embeddings(texts: List[str], batch_size=32):
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        embeddings = embedding_model.encode(batch)
        all_embeddings.extend(embeddings.tolist())
    return all_embeddings

# 3. 流式响应
async def stream_response(messages):
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True
    )
    
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

# 4. 异步处理(使用 AsyncOpenAI 客户端并发请求)
import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def async_query(message: str) -> str:
    response = await async_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

async def process_multiple_queries(queries):
    tasks = [async_query(q) for q in queries]
    return await asyncio.gather(*tasks)

8.3 成本控制

python
class CostTracker:
    """成本追踪"""
    
    PRICES = {
        # 示例价格(美元 / 1K tokens),请以官方最新定价为准
        'gpt-3.5-turbo': {'input': 0.0015, 'output': 0.002},
        'gpt-4': {'input': 0.03, 'output': 0.06},
    }
    
    def __init__(self):
        self.total_cost = 0
        self.usage_log = []
    
    def track_usage(self, model: str, input_tokens: int, output_tokens: int):
        price = self.PRICES.get(model, {'input': 0, 'output': 0})
        cost = (input_tokens * price['input'] + output_tokens * price['output']) / 1000
        self.total_cost += cost
        self.usage_log.append({
            'model': model,
            'input_tokens': input_tokens,
            'output_tokens': output_tokens,
            'cost': cost
        })
    
    def get_summary(self):
        return {
            'total_cost': self.total_cost,
            'total_requests': len(self.usage_log),
            'avg_cost_per_request': self.total_cost / len(self.usage_log) if self.usage_log else 0
        }

# 使用成本追踪
tracker = CostTracker()

def cost_aware_chat(messages, model='gpt-3.5-turbo'):
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    
    usage = response.usage
    tracker.track_usage(
        model=model,
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens
    )
    
    return response.choices[0].message.content
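在 CostTracker 之外,还可以在发送请求前做预算控制。下面是一个示意:用"约 4 个字符 ≈ 1 token"的粗略英文经验值估算输入开销(中文实际更接近 1~2 字符/token,精确计数可用 tiktoken 库;`BudgetGuard` 为本文假设的类名):

```python
class BudgetGuard:
    """在调用前按粗略 token 估算进行预算检查"""

    def __init__(self, budget_usd: float, input_price_per_1k: float = 0.0015):
        self.budget = budget_usd
        self.spent = 0.0
        self.price = input_price_per_1k

    @staticmethod
    def estimate_tokens(text: str) -> int:
        # 粗略经验值:英文约 4 字符/token,仅用于预算量级估计
        return max(1, len(text) // 4)

    def check(self, prompt: str) -> bool:
        """返回 True 表示预算允许发送该请求,并累计预估花费"""
        cost = self.estimate_tokens(prompt) * self.price / 1000
        if self.spent + cost > self.budget:
            return False
        self.spent += cost
        return True

guard = BudgetGuard(budget_usd=0.001)
prompt = "请解释什么是机器学习?" * 100
allowed = guard.check(prompt)
print(allowed, round(guard.spent, 6))
```

预算耗尽后,可以降级到更便宜的模型或直接拒绝请求。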

总结

本教程系统介绍了 LLM 应用开发的核心技术:

  1. 基础 API 使用:掌握主流模型的调用方法
  2. 提示工程:设计高效提示的技巧
  3. RAG 系统:构建知识增强应用
  4. Agent 开发:创建自主智能体
  5. 函数调用:集成外部工具
  6. 模型微调:定制专属模型
  7. 部署优化:生产环境实践

LLM 应用开发趋势:

  • 多模态理解(文本 + 图像 + 音频)
  • 长上下文处理(100K+ tokens)
  • Agent 协作(多智能体系统)
  • 边缘部署(本地运行大模型)
  • 垂直领域应用(医疗、法律、金融)

下一步学习建议:

  1. 深入理解 Transformer 架构
  2. 学习更多 RAG 优化技术
  3. 探索 Agent 框架(AutoGen、CrewAI)
  4. 实践完整项目开发
  5. 关注最新研究和工具

记住,LLM 领域发展迅速,保持学习和实践是关键!


Released under the MIT License.