LLM Application Development: From RAG to Building Agents
Overview
Large Language Models (LLMs) are reshaping how we interact with computers. From intelligent assistants to code generation, from document analysis to creative writing, the range of LLM applications keeps expanding. This tutorial walks you through the core techniques of LLM application development; across five hands-on projects you will build the full skill set needed for production-grade LLM applications.
What you will learn:
- LLM fundamentals and API usage
- Prompt engineering techniques
- Building retrieval-augmented generation (RAG) systems
- Vector databases and embeddings
- LLM agent architecture and implementation
- Function calling and tool integration
- Fine-tuning basics
- Application deployment and optimization
Chapter 1: LLM Fundamentals and API Usage
1.1 What Is a Large Language Model?
A large language model is a deep learning model based on the Transformer architecture. Pre-trained on massive text corpora, it acquires strong language understanding and generation abilities.
Core characteristics:
- Autoregressive generation: predicts the next token one step at a time
- In-context learning: learns a task from examples in the prompt
- Emergent abilities: new capabilities appear as scale grows
- Generality: one model handles many different tasks
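The autoregressive property above can be sketched with a toy next-token loop. The hand-written bigram table below stands in for a real neural network and is purely illustrative; no name here comes from an actual library:

```python
# Toy "model": for each token, a single most-likely next token.
# A real LLM predicts a probability distribution over the whole vocabulary.
bigram = {
    "<s>": "the", "the": "cat", "cat": "sat",
    "sat": "on", "on": "mat", "mat": "</s>",
}

def generate(start="<s>", max_tokens=10):
    """Greedy autoregressive generation: each step feeds the
    previously emitted token back in as the new context."""
    tokens = []
    cur = start
    for _ in range(max_tokens):
        nxt = bigram.get(cur)
        if nxt is None or nxt == "</s>":  # end-of-sequence
            break
        tokens.append(nxt)
        cur = nxt
    return " ".join(tokens)

print(generate())  # the cat sat on mat
```

The loop structure is the same in a real model; only the next-token predictor changes.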
Mainstream models:
- GPT series (OpenAI)
- Claude series (Anthropic)
- LLaMA series (Meta)
- Qwen series (Alibaba)
- Gemini series (Google)
1.2 Using the OpenAI API
python
# 安装依赖(在 shell 中运行):pip install openai python-dotenv
import os
from openai import OpenAI
from dotenv import load_dotenv
# 加载环境变量
load_dotenv()
# 初始化客户端
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
# 基础对话
def chat_completion(messages, model='gpt-3.5-turbo', temperature=0.7):
"""发送对话请求"""
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=temperature,
max_tokens=1024
)
return response.choices[0].message.content
# 示例
messages = [
{"role": "system", "content": "你是一个有帮助的助手。"},
{"role": "user", "content": "请解释什么是机器学习?"}
]
response = chat_completion(messages)
print(response)
1.3 Streaming Responses
python
def stream_chat(messages, model='gpt-3.5-turbo'):
"""流式输出"""
stream = client.chat.completions.create(
model=model,
messages=messages,
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end='', flush=True)
print() # 换行
# 使用流式
messages = [
{"role": "user", "content": "写一首关于春天的诗"}
]
stream_chat(messages)
1.4 Using Other Models
python
# Anthropic Claude
from anthropic import Anthropic
anthropic_client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))
def claude_chat(prompt, max_tokens=1024):
response = anthropic_client.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=max_tokens,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
# 本地模型(Ollama)
import requests
def ollama_chat(prompt, model='llama2'):
response = requests.post(
'http://localhost:11434/api/generate',
json={
'model': model,
'prompt': prompt,
'stream': False
}
)
return response.json()['response']
# 阿里云通义千问
from dashscope import Generation
def qwen_chat(prompt):
response = Generation.call(
model='qwen-turbo',
prompt=prompt,
api_key=os.getenv('DASHSCOPE_API_KEY')
)
return response.output.text
Chapter 2: Prompt Engineering in Practice
2.1 Prompt Engineering Basics
Prompt engineering is the art and science of designing and refining input prompts to obtain better model outputs.
Core principles:
- Be clear and specific: avoid ambiguity
- Provide context: help the model understand the task
- Give examples: few-shot learning
- Structure the output: specify format requirements
- Iterate: keep improving the prompt
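The first four principles can be made concrete with a small prompt-composition helper. The function and parameter names below are illustrative, not from any library:

```python
def build_prompt(task, context, examples, output_format):
    """Compose a prompt that follows the principles above: an explicit
    task statement, background context, few-shot examples, and a
    format specification for the output."""
    lines = [f"任务:{task}", f"背景:{context}", "示例:"]
    for inp, out in examples:
        lines.append(f"  输入:{inp} -> 输出:{out}")
    lines.append(f"输出格式:{output_format}")
    return "\n".join(lines)

prompt = build_prompt(
    task="情感分类",
    context="用户评论来自电商网站",
    examples=[("物流很快", "POSITIVE"), ("包装破损", "NEGATIVE")],
    output_format="只输出 POSITIVE/NEGATIVE/NEUTRAL 之一",
)
print(prompt)
```

The fifth principle, iteration, is covered by the optimization loop in section 2.4.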
2.2 Basic Prompting Techniques
python
class PromptTemplates:
"""提示模板集合"""
@staticmethod
def role_prompt(role, task, context=""):
"""角色设定提示"""
return f"""你是一位{role}。
{context}
请完成以下任务:{task}"""
@staticmethod
def few_shot_prompt(task, examples, input_text):
"""少样本学习提示"""
prompt = f"任务:{task}\n\n"
prompt += "示例:\n"
for ex in examples:
prompt += f"输入:{ex['input']}\n"
prompt += f"输出:{ex['output']}\n\n"
prompt += f"现在请处理:{input_text}"
return prompt
@staticmethod
def chain_of_thought(prompt):
"""思维链提示"""
return f"""请逐步思考这个问题:
{prompt}
让我们一步一步地思考:"""
@staticmethod
def structured_output(task, schema):
"""结构化输出提示"""
return f"""请完成以下任务:{task}
请按照以下 JSON 格式输出:
{schema}
只输出 JSON,不要有其他内容。"""
# 使用示例
templates = PromptTemplates()
# 角色提示
role_prompt = templates.role_prompt(
role="资深 Python 工程师",
task="审查以下代码并提供改进建议",
context="你擅长编写高效、可读的 Python 代码。"
)
# 少样本提示
examples = [
{"input": "今天天气真好", "output": "POSITIVE"},
{"input": "这个产品太糟糕了", "output": "NEGATIVE"},
{"input": "一般般吧,没什么特别的", "output": "NEUTRAL"}
]
sentiment_prompt = templates.few_shot_prompt(
task="情感分析",
examples=examples,
input_text="我非常喜欢这个功能!"
)
# 思维链
cot_prompt = templates.chain_of_thought(
"如果 5 台机器 5 分钟生产 5 个零件,那么 100 台机器生产 100 个零件需要多少分钟?"
)
2.3 Advanced Prompting Techniques
python
class AdvancedPrompts:
"""高级提示技巧"""
@staticmethod
def self_consistency(question, num_samples=5):
    """自一致性:多次采样,对答案做多数投票"""
    from collections import Counter
    answers = [
        chat_completion([{"role": "user", "content": f"请回答:{question}"}])
        for _ in range(num_samples)
    ]
    # 取出现次数最多的答案
    return Counter(a.strip() for a in answers).most_common(1)[0][0]
@staticmethod
def generated_knowledge(prompt):
"""生成知识提示"""
return f"""首先,列出回答这个问题所需的知识:
{prompt}
相关知识:
[在此列出相关知识]
现在,基于以上知识回答问题:
{prompt}"""
@staticmethod
def reflexion(prompt, feedback=""):
"""反思提示"""
return f"""{prompt}
初始回答:{feedback}
请反思你的回答,考虑是否有改进空间。
如果有错误,请纠正并提供更好的答案。"""
@staticmethod
def tree_of_thoughts(problem, num_branches=3):
"""思维树提示"""
return f"""让我们用思维树方法解决这个问题:{problem}
步骤 1:生成 {num_branches} 个不同的解决思路
步骤 2:评估每个思路的可行性
步骤 3:选择最佳思路并深入展开
步骤 4:得出最终答案
现在开始:"""
# 使用示例
advanced = AdvancedPrompts()
# 生成知识提示
knowledge_prompt = advanced.generated_knowledge(
"量子计算机和传统计算机有什么区别?"
)
# 反思提示
reflexion_prompt = advanced.reflexion(
prompt="解释相对论的核心概念",
feedback="[初始回答]"
)
2.4 Prompt Optimization in Practice
python
def optimize_prompt(base_prompt, task_description, test_cases):
"""提示优化框架"""
iterations = []
current_prompt = base_prompt
for i in range(5): # 最多 5 次迭代
# 评估当前提示
results = []
for test in test_cases:
response = chat_completion([
{"role": "user", "content": current_prompt.format(input=test['input'])}
])
results.append({
'input': test['input'],
'expected': test['expected'],
'actual': response,
'correct': response.strip() == test['expected']
})
accuracy = sum(1 for r in results if r['correct']) / len(results)
iterations.append({
'iteration': i,
'prompt': current_prompt,
'accuracy': accuracy,
'results': results
})
print(f"Iteration {i}: Accuracy = {accuracy*100:.1f}%")
if accuracy == 1.0:
break
# 基于错误分析改进提示
errors = [r for r in results if not r['correct']]
current_prompt = improve_prompt(current_prompt, errors)
return iterations
def improve_prompt(current_prompt, errors):
"""基于错误改进提示"""
# 分析错误模式
error_patterns = [e['actual'] for e in errors]
# 生成改进建议(实际应用中可调用 LLM)
improvements = []
if any('format' in str(e).lower() for e in error_patterns):
improvements.append("明确指定输出格式")
if any(len(str(e)) > 100 for e in error_patterns):
improvements.append("要求简洁回答")
# 应用改进
improved_prompt = current_prompt
for imp in improvements:
improved_prompt += f"\n\n注意:{imp}"
return improved_prompt
Chapter 3: Hands-On Project 1: Building a RAG System
3.1 How RAG Works
Retrieval-Augmented Generation (RAG) combines the strengths of retrieval systems and generative models:
- Retrieve: look up relevant information in a knowledge base
- Augment: feed the retrieved results to the model as context
- Generate: produce an answer grounded in that context
Advantages:
- Fewer hallucinations
- Access to up-to-date information
- Traceable sources
- Lower training cost
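Before introducing vector databases, the retrieve-augment-generate flow can be sketched end to end, with naive word-overlap scoring standing in for embedding search. The scoring is deliberately simplistic and every name here is illustrative:

```python
docs = [
    "Python was created by Guido van Rossum in 1991.",
    "RAG combines retrieval with generation.",
]

def retrieve(query, corpus, k=1):
    """Retrieve: rank documents by word overlap with the query
    (a crude stand-in for vector similarity search)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, contexts):
    """Augment: pack the retrieved text into the prompt as grounding
    context; the LLM then generates an answer from it."""
    ctx = "\n".join(contexts)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

contexts = retrieve("who created Python", docs)
prompt = build_rag_prompt("who created Python", contexts)
print(prompt)
```

The rest of this chapter replaces the overlap scorer with real embeddings and a vector store.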
3.2 Vector Database Basics
python
# 安装依赖(在 shell 中运行):pip install chromadb sentence-transformers
import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer
# 初始化向量数据库
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="documents")
# 初始化嵌入模型
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
def embed_text(text):
"""生成文本嵌入"""
return embedding_model.encode(text).tolist()
# 添加文档
def add_documents(documents, ids=None, metadatas=None):
"""添加文档到向量库"""
if ids is None:
ids = [f"doc_{i}" for i in range(len(documents))]
embeddings = [embed_text(doc) for doc in documents]
collection.add(
documents=documents,
embeddings=embeddings,
ids=ids,
metadatas=metadatas
)
# 示例文档
documents = [
"Python 是一种高级编程语言,由 Guido van Rossum 于 1991 年创建。",
"机器学习是人工智能的一个分支,让计算机从数据中学习。",
"深度学习使用神经网络模拟人脑的工作方式。",
"Transformer 架构彻底改变了自然语言处理领域。",
"RAG 系统结合了检索和生成的优势。"
]
add_documents(documents)3.3 实现 RAG 检索
python
class RAGRetriever:
"""RAG 检索器"""
def __init__(self, collection, embedding_model, top_k=3):
self.collection = collection
self.embedding_model = embedding_model
self.top_k = top_k
def retrieve(self, query):
"""检索相关文档"""
query_embedding = self.embedding_model.encode(query).tolist()
results = self.collection.query(
query_embeddings=[query_embedding],
n_results=self.top_k,
include=['documents', 'metadatas', 'distances']
)
return {
'documents': results['documents'][0],
'metadatas': results['metadatas'][0],
'distances': results['distances'][0]
}
def retrieve_with_scores(self, query):
"""检索并返回相关性分数"""
results = self.retrieve(query)
# 将距离转换为相似度分数(假设余弦距离;Chroma 默认使用 L2,
# 可在创建集合时通过 metadata={"hnsw:space": "cosine"} 指定)
scores = [1 - d for d in results['distances']]
return list(zip(results['documents'], scores))
# 使用检索器
retriever = RAGRetriever(collection, embedding_model, top_k=3)
# 测试检索
query = "Python 是什么时候创建的?"
results = retriever.retrieve(query)
print(f"查询:{query}")
print(f"\n检索结果:")
for i, (doc, score) in enumerate(zip(results['documents'],
[1-d for d in results['distances']])):
print(f"{i+1}. [相似度:{score:.3f}] {doc}")
3.4 Building a Complete RAG System
python
class RAGSystem:
"""完整 RAG 系统"""
def __init__(self, collection, embedding_model, llm_client, top_k=3):
self.retriever = RAGRetriever(collection, embedding_model, top_k)
self.llm_client = llm_client
def build_prompt(self, query, contexts):
"""构建 RAG 提示"""
context_text = "\n\n".join([
f"[来源 {i+1}]: {ctx}"
for i, ctx in enumerate(contexts)
])
return f"""基于以下信息回答问题。如果信息不足,请说明。
相关信息:
{context_text}
问题:{query}
回答:"""
def query(self, question):
"""RAG 查询"""
# 检索相关文档
results = self.retriever.retrieve(question)
contexts = results['documents']
# 构建提示
prompt = self.build_prompt(question, contexts)
# 生成回答
response = chat_completion([
{"role": "user", "content": prompt}
])
return {
'answer': response,
'contexts': contexts,
'sources': results['metadatas']
}
def query_with_sources(self, question):
"""带来源引用的 RAG 查询"""
result = self.query(question)
# 添加来源引用
answer = result['answer']
answer += "\n\n**来源:**\n"
for i, ctx in enumerate(result['contexts']):
answer += f"- [来源 {i+1}] {ctx[:100]}...\n"
return answer
# 初始化 RAG 系统
rag = RAGSystem(collection, embedding_model, client)
# 测试
questions = [
"Python 是谁创建的?",
"机器学习和深度学习有什么区别?",
"Transformer 有什么重要性?"
]
for q in questions:
print(f"\n{'='*50}")
print(f"问题:{q}")
print(f"{'='*50}")
result = rag.query(q)
print(f"回答:{result['answer']}")
3.5 Document Loading and Processing
python
from pathlib import Path
import PyPDF2
import docx
class DocumentLoader:
"""文档加载器"""
@staticmethod
def load_text_file(file_path):
"""加载文本文件"""
with open(file_path, 'r', encoding='utf-8') as f:
return f.read()
@staticmethod
def load_pdf(file_path):
"""加载 PDF 文件"""
text = ""
with open(file_path, 'rb') as f:
reader = PyPDF2.PdfReader(f)
for page in reader.pages:
text += page.extract_text()
return text
@staticmethod
def load_docx(file_path):
"""加载 Word 文件"""
doc = docx.Document(file_path)
return "\n".join([para.text for para in doc.paragraphs])
@staticmethod
def chunk_text(text, chunk_size=500, overlap=50):
"""文本分块"""
chunks = []
start = 0
while start < len(text):
end = start + chunk_size
chunk = text[start:end]
chunks.append(chunk)
start = end - overlap
return chunks
@staticmethod
def load_directory(dir_path, extensions=['.txt', '.md']):
"""加载目录下所有文档"""
documents = []
for file_path in Path(dir_path).rglob('*'):
if file_path.suffix in extensions:
try:
content = DocumentLoader.load_text_file(file_path)
chunks = DocumentLoader.chunk_text(content)
for i, chunk in enumerate(chunks):
documents.append({
'content': chunk,
'source': str(file_path),
'chunk': i
})
except Exception as e:
print(f"加载失败 {file_path}: {e}")
return documents
# 使用文档加载器
loader = DocumentLoader()
# 加载目录
docs = loader.load_directory('./knowledge_base')
# 添加到向量库
if docs:
add_documents(
documents=[d['content'] for d in docs],
ids=[f"{Path(d['source']).stem}_{d['chunk']}" for d in docs],
metadatas=[{'source': d['source'], 'chunk': d['chunk']} for d in docs]
)
print(f"已加载 {len(docs)} 个文档块")
3.6 Advanced RAG Techniques
python
class AdvancedRAG(RAGSystem):
"""高级 RAG 系统"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.query_history = []
def hyde_retrieval(self, query):
"""HyDE:假设性文档嵌入"""
# 生成假设性回答
hyde_prompt = f"请写出这个问题的理想答案:{query}"
hypothetical_doc = chat_completion([{"role": "user", "content": hyde_prompt}])
# 用假设性回答检索
return self.retriever.retrieve(hypothetical_doc)
def multi_query_retrieval(self, query, num_queries=3):
"""多查询检索"""
# 生成多个相关查询
gen_prompt = f"""基于以下问题,生成 {num_queries} 个相关的查询变体:
问题:{query}
查询变体:"""
variations = chat_completion([{"role": "user", "content": gen_prompt}])
# 检索并合并结果
all_results = []
for q in variations.split('\n'):
if q.strip():
results = self.retriever.retrieve(q.strip())
all_results.extend(results['documents'])
# 去重
unique_results = list(dict.fromkeys(all_results))
return unique_results[:self.retriever.top_k]
def rerank_results(self, query, results):
"""重排序结果"""
# 使用交叉编码器重排序
from sentence_transformers import CrossEncoder
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
pairs = [[query, doc] for doc in results]
scores = reranker.predict(pairs)
# 按分数排序
sorted_results = sorted(
zip(results, scores),
key=lambda x: x[1],
reverse=True
)
return [r[0] for r in sorted_results]
def query(self, question, use_hyde=False, use_multi_query=False):
"""增强查询"""
if use_hyde:
results = self.hyde_retrieval(question)
contexts = results['documents']
elif use_multi_query:
contexts = self.multi_query_retrieval(question)
else:
results = self.retriever.retrieve(question)
contexts = results['documents']
# 可选:重排序
# contexts = self.rerank_results(question, contexts)
prompt = self.build_prompt(question, contexts)
response = chat_completion([{"role": "user", "content": prompt}])
return {
'answer': response,
'contexts': contexts
}
# 使用高级 RAG
advanced_rag = AdvancedRAG(collection, embedding_model, client)
# HyDE 检索
result = advanced_rag.query("如何提高代码质量?", use_hyde=True)
print(f"HyDE 回答:{result['answer']}")
Chapter 4: Hands-On Project 2: Building an LLM Agent
4.1 Agent Architecture
An LLM agent is an intelligent system that carries out tasks autonomously. Its core components are:
- Planning: decompose the task and form a plan
- Memory: short-term and long-term memory
- Tool use: call external APIs and functions
- Reflection: evaluate and improve its own results
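These components combine into a plan-act-observe loop. A minimal sketch, with a scripted "planner" standing in for the LLM (every name below is illustrative; the full implementation follows in 4.2):

```python
def run_agent(goal, plan_fn, tools, max_steps=5):
    """Generic agent loop: plan the next step, execute the chosen tool,
    record the observation in memory, stop when the planner answers."""
    memory = []  # short-term memory of observations
    for _ in range(max_steps):
        step = plan_fn(goal, memory)                        # planning
        if step["action"] == "final":
            return step["input"]
        observation = tools[step["action"]](step["input"])  # tool use
        memory.append(observation)                          # memory
    return "no answer"

def scripted_planner(goal, memory):
    """Stand-in for the LLM: compute first, then answer with the
    observation (a real agent would also reflect on it)."""
    if not memory:
        return {"action": "calc", "input": "6*7"}
    return {"action": "final", "input": memory[-1]}

# Restricted eval, mirroring the calculator tool defined below.
tools = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}, {}))}
print(run_agent("what is 6*7?", scripted_planner, tools))  # 42
```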
4.2 A Basic Agent Implementation
python
import json
from typing import List, Dict, Any
class Tool:
"""工具基类"""
def __init__(self, name: str, description: str):
self.name = name
self.description = description
def run(self, **kwargs) -> Any:
raise NotImplementedError
class CalculatorTool(Tool):
"""计算器工具"""
def __init__(self):
super().__init__(
name="calculator",
description="执行数学计算。输入:数学表达式,如 '2 + 2'"
)
def run(self, expression: str) -> str:
try:
result = eval(expression, {"__builtins__": {}}, {})
return f"计算结果:{result}"
except Exception as e:
return f"计算错误:{e}"
class SearchTool(Tool):
"""搜索工具"""
def __init__(self):
super().__init__(
name="search",
description="搜索网络信息。输入:搜索查询"
)
def run(self, query: str) -> str:
# 模拟搜索(实际应用中调用搜索 API)
return f"搜索结果:关于'{query}'的信息..."
class LLMEngine:
"""LLM 引擎"""
def __init__(self, client):
self.client = client
def chat(self, messages: List[Dict]) -> str:
return chat_completion(messages)
def extract_json(self, text: str) -> Dict:
"""从文本中提取 JSON"""
import re
match = re.search(r'\{.*\}', text, re.DOTALL)
if match:
return json.loads(match.group())
return {}
class SimpleAgent:
"""简单 Agent"""
def __init__(self, llm: LLMEngine, tools: List[Tool]):
self.llm = llm
self.tools = {tool.name: tool for tool in tools}
self.tool_descriptions = "\n".join([
f"- {tool.name}: {tool.description}"
for tool in tools
])
self.memory = []
def build_system_prompt(self):
"""构建系统提示"""
return f"""你是一个智能助手,可以使用以下工具:
{self.tool_descriptions}
当需要时,请以 JSON 格式回复,包含:
{{
"thought": "你的思考过程",
"action": "工具名称",
"action_input": "工具输入"
}}
如果不需要工具,直接回复:
{{
"thought": "你的思考",
"action": "final_answer",
"action_input": "你的回答"
}}"""
def run(self, query: str, max_iterations: int = 5) -> str:
"""运行 Agent"""
messages = [
{"role": "system", "content": self.build_system_prompt()},
{"role": "user", "content": query}
]
for i in range(max_iterations):
# 获取 LLM 响应
response = self.llm.chat(messages)
# 解析响应
action_dict = self.llm.extract_json(response)
thought = action_dict.get('thought', '')
action = action_dict.get('action', '')
action_input = action_dict.get('action_input', '')
print(f"\n[迭代 {i+1}]")
print(f"思考:{thought}")
print(f"动作:{action}")
# 执行动作
if action == "final_answer":
return action_input
elif action in self.tools:
tool_result = self.tools[action].run(action_input)  # 子类的 run 接受单个位置参数
print(f"结果:{tool_result}")
# 添加到记忆
messages.append({"role": "assistant", "content": response})
messages.append({
"role": "user",
"content": f"工具执行结果:{tool_result}\n请继续。"
})
else:
return f"未知工具:{action}"
return "达到最大迭代次数,未能完成任务。"
# 创建 Agent
llm = LLMEngine(client)
tools = [CalculatorTool(), SearchTool()]
agent = SimpleAgent(llm, tools)
# 测试
queries = [
"计算 123 * 456",
"搜索 Python 最新版本的特性"
]
for q in queries:
print(f"\n{'='*50}")
print(f"查询:{q}")
print(f"{'='*50}")
result = agent.run(q)
print(f"\n最终答案:{result}")
4.3 An Agent with Memory
python
class MemoryAgent(SimpleAgent):
"""带记忆的 Agent"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.short_term_memory = [] # 对话历史
self.long_term_memory = [] # 重要事实
def add_to_short_term(self, role: str, content: str):
"""添加到短期记忆"""
self.short_term_memory.append({"role": role, "content": content})
def add_to_long_term(self, fact: str):
"""添加到长期记忆"""
self.long_term_memory.append(fact)
# 限制长期记忆大小
if len(self.long_term_memory) > 100:
self.long_term_memory = self.long_term_memory[-50:]
def get_memory_context(self):
"""获取记忆上下文"""
context = ""
if self.long_term_memory:
context += "已知信息:\n"
for fact in self.long_term_memory[-10:]:
context += f"- {fact}\n"
return context
def run(self, query: str, max_iterations: int = 5) -> str:
"""运行带记忆的 Agent"""
memory_context = self.get_memory_context()
messages = [
{"role": "system", "content": self.build_system_prompt() + "\n\n" + memory_context},
*self.short_term_memory[-10:], # 最近 10 轮对话
{"role": "user", "content": query}
]
for i in range(max_iterations):
response = self.llm.chat(messages)
action_dict = self.llm.extract_json(response)
thought = action_dict.get('thought', '')
action = action_dict.get('action', '')
action_input = action_dict.get('action_input', '')
if action == "final_answer":
# 更新记忆
self.add_to_short_term("user", query)
self.add_to_short_term("assistant", action_input)
return action_input
elif action in self.tools:
tool_result = self.tools[action].run(action_input)  # 子类的 run 接受单个位置参数
messages.append({"role": "assistant", "content": response})
messages.append({
"role": "user",
"content": f"工具执行结果:{tool_result}\n请继续。"
})
else:
return f"未知工具:{action}"
return "达到最大迭代次数。"
# 测试记忆 Agent
memory_agent = MemoryAgent(llm, tools)
# 多轮对话
conversation = [
"我的名字是张三",
"记住我喜欢 Python 编程",
"我叫什么名字?",
"我喜欢什么编程语言?"
]
for query in conversation:
print(f"\n用户:{query}")
response = memory_agent.run(query)
print(f"助手:{response}")
4.4 The ReAct Agent
python
class ReActAgent:
"""ReAct(Reasoning + Acting)Agent"""
def __init__(self, llm: LLMEngine, tools: List[Tool]):
self.llm = llm
self.tools = {tool.name: tool for tool in tools}
def build_prompt(self, query: str, history: str = ""):
"""构建 ReAct 提示"""
return f"""解决以下问题,使用思考 - 行动 - 观察循环。
可用工具:
{chr(10).join([f'- {t.name}: {t.description}' for t in self.tools.values()])}
格式:
Thought: 你的思考
Action: 工具名称
Action Input: 工具输入
Observation: 工具结果
... (可以重复 Thought/Action/Observation)
Thought: 我现在知道最终答案
Final Answer: 最终答案
问题:{query}
{history}"""
def parse_response(self, response: str):
"""解析 ReAct 响应"""
lines = response.strip().split('\n')
thought = ""
action = None
action_input = None
for line in lines:
if line.startswith('Thought:'):
thought = line.replace('Thought:', '').strip()
elif line.startswith('Action:'):
action = line.replace('Action:', '').strip()
elif line.startswith('Action Input:'):
action_input = line.replace('Action Input:', '').strip()
elif line.startswith('Final Answer:'):
return {
'type': 'final',
'answer': line.replace('Final Answer:', '').strip()
}
if action and action_input:
return {
'type': 'action',
'thought': thought,
'action': action,
'action_input': action_input
}
return {'type': 'thought', 'thought': thought}
def run(self, query: str, max_iterations: int = 5) -> str:
"""运行 ReAct Agent"""
history = ""
for i in range(max_iterations):
prompt = self.build_prompt(query, history)
response = self.llm.chat([{"role": "user", "content": prompt}])
parsed = self.parse_response(response)
print(f"\n[迭代 {i+1}]")
print(f"思考:{parsed.get('thought', '')}")
if parsed['type'] == 'final':
print(f"最终答案:{parsed['answer']}")
return parsed['answer']
elif parsed['type'] == 'action':
action = parsed['action']
action_input = parsed['action_input']
print(f"动作:{action}({action_input})")
if action in self.tools:
observation = self.tools[action].run(action_input)  # 子类的 run 接受单个位置参数
print(f"观察:{observation}")
history += f"\n{response}\nObservation: {observation}"
else:
history += f"\n{response}\nObservation: 未知工具 {action}"
else:
history += f"\n{response}"
return "达到最大迭代次数。"
# 测试 ReAct Agent
react_agent = ReActAgent(llm, tools)
result = react_agent.run("计算 (15 + 25) * 3 的结果")
print(f"\n结果:{result}")
4.5 Building an Agent with LangChain
python
# 安装 LangChain
# pip install langchain langchain-openai langchain-community
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.tools import Tool
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
# 定义工具
def calculator(expression: str) -> str:
"""计算数学表达式"""
try:
result = eval(expression, {"__builtins__": {}}, {})
return str(result)
except Exception as e:
return f"错误:{e}"
def search(query: str) -> str:
"""搜索信息"""
return f"搜索结果:{query}"
tools = [
Tool(
name="Calculator",
func=calculator,
description="用于数学计算。输入:数学表达式"
),
Tool(
name="Search",
func=search,
description="搜索网络信息。输入:搜索查询"
)
]
# 创建 LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# 创建提示
prompt = ChatPromptTemplate.from_messages([
("system", "你是一个有帮助的助手,可以使用工具。"),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad")
])
# 创建 Agent
agent = create_openai_functions_agent(llm, tools, prompt)
# 创建执行器
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
memory=memory,
verbose=True
)
# 使用 Agent
result = agent_executor.invoke({"input": "计算 123 + 456"})
print(f"结果:{result['output']}")
Chapter 5: Hands-On Project 3: Function Calling and Tool Integration
5.1 Function Calling Basics
Modern LLMs support function calling, which lets the model invoke external functions in a structured way. (The examples below use the legacy `functions`/`function_call` parameters of the OpenAI API; newer SDK versions expose the same capability through `tools` and `tool_choice`.)
python
# 定义函数 schema
functions = [
{
"name": "get_weather",
"description": "获取指定城市的天气信息",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "城市名称,如'北京'、'上海'"
},
"date": {
"type": "string",
"description": "日期,格式 YYYY-MM-DD,默认为今天"
}
},
"required": ["city"]
}
},
{
"name": "calculate",
"description": "执行数学计算",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "数学表达式,如 '2 + 2'"
}
},
"required": ["expression"]
}
}
]
# 实现函数
def get_weather(city: str, date: str = None) -> str:
"""获取天气"""
# 模拟天气 API
return f"{city}今天晴朗,气温 25°C"
def calculate(expression: str) -> str:
"""计算"""
try:
result = eval(expression, {"__builtins__": {}}, {})
return str(result)
except Exception as e:
return f"错误:{e}"
# 函数映射
function_map = {
"get_weather": get_weather,
"calculate": calculate
}
# 使用函数调用
def chat_with_functions(messages: List[Dict]) -> str:
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
functions=functions,
function_call="auto"
)
message = response.choices[0].message
# 检查是否需要调用函数
if message.function_call:
function_name = message.function_call.name
function_args = json.loads(message.function_call.arguments)
print(f"调用函数:{function_name}")
print(f"参数:{function_args}")
# 执行函数
if function_name in function_map:
result = function_map[function_name](**function_args)
# 将结果返回给模型
messages.append(message)
messages.append({
"role": "function",
"name": function_name,
"content": result
})
# 获取最终响应
final_response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages
)
return final_response.choices[0].message.content
return message.content
# 测试
messages = [{"role": "user", "content": "北京今天天气怎么样?"}]
response = chat_with_functions(messages)
print(f"回答:{response}")
5.2 Multi-Tool Integration
python
class ToolIntegration:
"""多工具集成"""
def __init__(self):
self.tools = {
"get_weather": {
"description": "获取天气",
"schema": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"]
},
"func": self._get_weather
},
"search_web": {
"description": "网络搜索",
"schema": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
},
"func": self._search_web
},
"send_email": {
"description": "发送邮件",
"schema": {
"type": "object",
"properties": {
"to": {"type": "string"},
"subject": {"type": "string"},
"body": {"type": "string"}
},
"required": ["to", "subject", "body"]
},
"func": self._send_email
}
}
def _get_weather(self, city: str) -> str:
return f"{city}天气晴朗,25°C"
def _search_web(self, query: str) -> str:
return f"搜索结果:{query}"
def _send_email(self, to: str, subject: str, body: str) -> str:
return f"邮件已发送到 {to}"
def get_openai_functions(self):
"""获取 OpenAI 格式的函数定义"""
return [
{
"name": name,
"description": tool["description"],
"parameters": tool["schema"]
}
for name, tool in self.tools.items()
]
def execute(self, name: str, args: Dict) -> str:
"""执行工具"""
if name in self.tools:
return self.tools[name]["func"](**args)
raise ValueError(f"未知工具:{name}")
def chat(self, user_message: str) -> str:
"""聊天并自动使用工具"""
messages = [{"role": "user", "content": user_message}]
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
functions=self.get_openai_functions(),
function_call="auto"
)
message = response.choices[0].message
while message.function_call:
func_name = message.function_call.name
func_args = json.loads(message.function_call.arguments)
result = self.execute(func_name, func_args)
messages.append(message)
messages.append({
"role": "function",
"name": func_name,
"content": result
})
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
functions=self.get_openai_functions()
)
message = response.choices[0].message
return message.content
# 使用工具集成
integration = ToolIntegration()
queries = [
"北京天气怎么样?",
"搜索 Python 教程",
"给 test@example.com 发邮件,主题是问候,内容是你好"
]
for q in queries:
print(f"\n用户:{q}")
response = integration.chat(q)
print(f"助手:{response}")
5.3 Developing Custom Tools
python
from abc import ABC, abstractmethod
class BaseTool(ABC):
"""工具基类"""
@property
@abstractmethod
def name(self) -> str:
pass
@property
@abstractmethod
def description(self) -> str:
pass
@property
@abstractmethod
def parameters(self) -> Dict:
pass
@abstractmethod
def execute(self, **kwargs) -> str:
pass
def to_openai_format(self) -> Dict:
return {
"name": self.name,
"description": self.description,
"parameters": self.parameters
}
# 数据库查询工具
class DatabaseQueryTool(BaseTool):
@property
def name(self) -> str:
return "query_database"
@property
def description(self) -> str:
return "查询数据库获取信息"
@property
def parameters(self) -> Dict:
return {
"type": "object",
"properties": {
"table": {
"type": "string",
"description": "表名"
},
"columns": {
"type": "array",
"items": {"type": "string"},
"description": "要查询的列"
},
"where": {
"type": "string",
"description": "WHERE 条件"
}
},
"required": ["table"]
}
def execute(self, table: str, columns: List[str] = None, where: str = None) -> str:
# 实际应用中连接数据库
cols = ", ".join(columns) if columns else "*"
query = f"SELECT {cols} FROM {table}"
if where:
query += f" WHERE {where}"
return f"执行查询:{query}\n结果:[模拟数据]"
# 文件操作工具
class FileTool(BaseTool):
@property
def name(self) -> str:
return "file_operation"
@property
def description(self) -> str:
return "读取或写入文件"
@property
def parameters(self) -> Dict:
return {
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["read", "write"],
"description": "操作类型"
},
"path": {
"type": "string",
"description": "文件路径"
},
"content": {
"type": "string",
"description": "写入的内容(仅 write 操作需要)"
}
},
"required": ["operation", "path"]
}
def execute(self, operation: str, path: str, content: str = None) -> str:
try:
if operation == "read":
with open(path, 'r', encoding='utf-8') as f:
return f.read()[:1000] # 限制长度
elif operation == "write":
with open(path, 'w', encoding='utf-8') as f:
f.write(content)
return f"文件已写入:{path}"
except Exception as e:
return f"错误:{e}"
# 代码执行工具
class CodeExecutionTool(BaseTool):
@property
def name(self) -> str:
return "execute_code"
@property
def description(self) -> str:
return "执行 Python 代码(安全沙箱环境)"
@property
def parameters(self) -> Dict:
return {
"type": "object",
"properties": {
"code": {
"type": "string",
"description": "要执行的 Python 代码"
},
"language": {
"type": "string",
"enum": ["python"],
"description": "编程语言"
}
},
"required": ["code"]
}
def execute(self, code: str, language: str = "python") -> str:
# 注意:实际应用中需要使用安全沙箱
import io
import sys
old_stdout = sys.stdout
sys.stdout = io.StringIO()
try:
exec(code, {"__builtins__": {"print": print}}, {})  # 放行 print,否则捕获不到任何输出
output = sys.stdout.getvalue()
return f"执行成功\n输出:{output}" if output else "执行成功,无输出"
except Exception as e:
return f"执行错误:{e}"
finally:
sys.stdout = old_stdout
# 使用自定义工具
custom_tools = [
DatabaseQueryTool(),
FileTool(),
CodeExecutionTool()
]
def create_agent_with_tools(tools: List[BaseTool]):
"""创建带自定义工具的 Agent"""
functions = [tool.to_openai_format() for tool in tools]
tool_map = {tool.name: tool for tool in tools}
def chat(user_message: str) -> str:
messages = [{"role": "user", "content": user_message}]
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
functions=functions,
function_call="auto"
)
message = response.choices[0].message
while message.function_call:
func_name = message.function_call.name
func_args = json.loads(message.function_call.arguments)
if func_name in tool_map:
result = tool_map[func_name].execute(**func_args)
else:
result = f"未知函数:{func_name}"
messages.append(message)
messages.append({
"role": "function",
"name": func_name,
"content": result
})
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
functions=functions
)
message = response.choices[0].message
return message.content
return chat
# 测试
agent_chat = create_agent_with_tools(custom_tools)
response = agent_chat("读取文件 test.txt 的内容")
print(f"响应:{response}")
Chapter 6: Hands-On Project 4: Building an Intelligent Document Assistant
6.1 Project Overview
We will build a complete intelligent document assistant that supports:
- Document upload and parsing
- Vector retrieval and RAG
- Multi-turn conversation
- Source citations
6.2 Full Implementation
python
import os
import hashlib
from datetime import datetime
from typing import List, Dict, Optional
from dataclasses import dataclass
@dataclass
class Document:
"""文档数据类"""
id: str
content: str
source: str
metadata: Dict
created_at: str
class DocumentStore:
"""文档存储"""
def __init__(self, persist_dir: str = "./document_store"):
self.persist_dir = persist_dir
os.makedirs(persist_dir, exist_ok=True)
self.documents: Dict[str, Document] = {}
self._load_metadata()
def _generate_id(self, content: str) -> str:
return hashlib.md5(content.encode()).hexdigest()
def _load_metadata(self):
"""加载元数据"""
meta_path = os.path.join(self.persist_dir, "metadata.json")
if os.path.exists(meta_path):
import json
with open(meta_path, 'r', encoding='utf-8') as f:
data = json.load(f)
for doc_id, doc_data in data.items():
self.documents[doc_id] = Document(**doc_data)
def _save_metadata(self):
"""保存元数据"""
import json
meta_path = os.path.join(self.persist_dir, "metadata.json")
data = {
doc_id: {
'id': doc.id,
'content': doc.content,
'source': doc.source,
'metadata': doc.metadata,
'created_at': doc.created_at
}
for doc_id, doc in self.documents.items()
}
with open(meta_path, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=2)
def add_document(self, content: str, source: str, metadata: Dict = None) -> Document:
"""添加文档"""
doc_id = self._generate_id(content)
if doc_id in self.documents:
return self.documents[doc_id]
doc = Document(
id=doc_id,
content=content,
source=source,
metadata=metadata or {},
created_at=datetime.now().isoformat()
)
self.documents[doc_id] = doc
self._save_metadata()
# 保存到向量库
collection.add(
documents=[content],
embeddings=[embed_text(content)],
ids=[doc_id],
metadatas=[{'source': source, **(metadata or {})}]  # metadata 可能为 None
)
return doc
def get_document(self, doc_id: str) -> Optional[Document]:
return self.documents.get(doc_id)
def list_documents(self) -> List[Document]:
return list(self.documents.values())
class DocumentAssistant:
    """Smart document assistant."""

    def __init__(self, document_store: DocumentStore, collection, embedding_model):
        self.store = document_store
        self.collection = collection
        self.embedding_model = embedding_model
        self.conversation_history = []

    def upload_document(self, file_path: str) -> Optional[Document]:
        """Upload a document: load it, chunk it, and store every chunk."""
        content = DocumentLoader.load_text_file(file_path)
        chunks = DocumentLoader.chunk_text(content, chunk_size=500, overlap=50)
        docs = []
        for i, chunk in enumerate(chunks):
            doc = self.store.add_document(
                content=chunk,
                source=file_path,
                metadata={'chunk': i, 'total_chunks': len(chunks)}
            )
            docs.append(doc)
        print(f"Uploaded {len(docs)} document chunks")
        return docs[0] if docs else None
    def query(self, question: str, top_k: int = 3) -> Dict:
        """Answer a question against the stored documents."""
        # Retrieve the most relevant chunks
        query_embedding = self.embedding_model.encode(question).tolist()
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=top_k,
            include=['documents', 'metadatas', 'distances']
        )
        contexts = []
        sources = set()
        for doc, meta, dist in zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        ):
            contexts.append(f"[Source: {meta.get('source', 'unknown')}] {doc}")
            sources.add(meta.get('source', 'unknown'))
        # Build the prompt
        context_text = "\n\n".join(contexts)
        prompt = f"""Answer the question based on the documents below. If the documents do not contain the relevant information, say so.

Documents:
{context_text}

Question: {question}

Give a detailed answer and cite the sources of the information.

Answer:"""
        # Generate the answer
        messages = [
            {"role": "system", "content": "You are a professional document assistant. Answer questions based on the provided documents."},
            *self.conversation_history[-6:],  # last 3 turns of conversation
            {"role": "user", "content": prompt}
        ]
        response = chat_completion(messages)
        # Update the conversation history
        self.conversation_history.append({"role": "user", "content": question})
        self.conversation_history.append({"role": "assistant", "content": response})
        return {
            'answer': response,
            'contexts': contexts,
            'sources': list(sources)
        }
    def chat(self, message: str) -> str:
        """Plain chat (no document retrieval)."""
        messages = [
            {"role": "system", "content": "You are a friendly assistant."},
            *self.conversation_history[-6:],
            {"role": "user", "content": message}
        ]
        response = chat_completion(messages)
        self.conversation_history.append({"role": "user", "content": message})
        self.conversation_history.append({"role": "assistant", "content": response})
        return response

    def clear_history(self):
        """Clear the conversation history."""
        self.conversation_history = []
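# Illustration of the history window used by query() and chat() above:
# conversation_history[-6:] keeps the last 6 messages, i.e. the 3 most
# recent user/assistant turns, and always starts on a user message.
_history = [{"role": r, "content": str(i)} for i in range(5) for r in ("user", "assistant")]
_window = _history[-6:]
assert len(_window) == 6
assert _window[0]["role"] == "user"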
# Use the document assistant
store = DocumentStore()
assistant = DocumentAssistant(store, collection, embedding_model)

# Upload a document
# assistant.upload_document("./docs/guide.md")

# Query
result = assistant.query("What are the main topics covered in the document?")
print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")

6.3 Web UI (Streamlit)
python
# app.py - Streamlit web app
"""
Run: streamlit run app.py
"""
import streamlit as st
# collection and embedding_model are assumed to be created in document_assistant.py
from document_assistant import DocumentAssistant, DocumentStore, collection, embedding_model

st.set_page_config(page_title="Smart Document Assistant", page_icon="📚")

# Initialization (cached across Streamlit reruns)
@st.cache_resource
def get_assistant():
    store = DocumentStore()
    return DocumentAssistant(store, collection, embedding_model)
assistant = get_assistant()

# Sidebar
with st.sidebar:
    st.title("📚 Document Management")
    uploaded_file = st.file_uploader("Upload a document", type=['txt', 'md', 'pdf'])
    if uploaded_file:
        # Save to a temporary file, keeping the original extension as the suffix
        import os
        import tempfile
        suffix = os.path.splitext(uploaded_file.name)[1]
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as f:
            f.write(uploaded_file.getvalue())
            temp_path = f.name
        with st.spinner("Processing document..."):
            assistant.upload_document(temp_path)
        st.success("Document uploaded!")
    st.divider()
    if st.button("Clear chat history"):
        assistant.clear_history()
        st.success("History cleared")
# Main page
st.title("🤖 Smart Document Assistant")
st.markdown("Upload a document, then ask questions about its content")

# Chat history
if "messages" not in st.session_state:
    st.session_state.messages = []
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
if prompt := st.chat_input("Ask a question..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = assistant.query(prompt)
            st.markdown(response['answer'])
            # Show the sources
            if response['sources']:
                with st.expander("📖 View sources"):
                    for i, ctx in enumerate(response['contexts']):
                        st.markdown(f"**Source {i+1}**:\n{ctx}")
    st.session_state.messages.append({"role": "assistant", "content": response['answer']})

Chapter 7 Case Study 5: Fine-Tuning Basics
7.1 Fine-Tuning Overview
When a pretrained model cannot meet a specific need, you can fine-tune it.
Typical fine-tuning scenarios:
- Domain-specific knowledge
- Special output formats
- Optimizing for a specific task
- Cost reduction (using a smaller model)
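Whichever scenario applies, fine-tuning quality depends heavily on well-formed training data. A minimal sketch for sanity-checking chat-format samples before upload (the rules here are an illustrative check, not OpenAI's official validation schema):

```python
VALID_ROLES = {"system", "user", "assistant"}

def validate_chat_sample(sample: dict) -> list:
    """Return a list of problems found in one training sample (empty = OK)."""
    problems = []
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i}: bad role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i}: empty content")
    if messages[-1].get("role") != "assistant":
        problems.append("last message should be an assistant reply")
    return problems

# A well-formed sample passes; a sample with no assistant reply does not
good = {"messages": [{"role": "user", "content": "hi"},
                     {"role": "assistant", "content": "hello"}]}
bad = {"messages": [{"role": "user", "content": "hi"}]}
assert validate_chat_sample(good) == []
assert validate_chat_sample(bad) == ["last message should be an assistant reply"]
```

Running a validator like this over every line of the JSONL file catches format errors locally instead of after a failed fine-tuning job.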
7.2 Data Preparation
python
import json
from datasets import Dataset

# Prepare the fine-tuning data
training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a Python programming assistant."},
            {"role": "user", "content": "How do I read a CSV file?"},
            {"role": "assistant", "content": "You can use the pandas library:\n```python\nimport pandas as pd\ndf = pd.read_csv('file.csv')\n```"}
        ]
    },
    # ... more samples
]

# Save as JSONL
with open('training_data.jsonl', 'w', encoding='utf-8') as f:
    for item in training_data:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')

# As a HuggingFace Dataset
dataset = Dataset.from_list(training_data)

7.3 Fine-Tuning with the OpenAI API
python
from openai import OpenAI

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

# Upload the training file
file_response = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)
print(f"File ID: {file_response.id}")

# Create the fine-tuning job
job_response = client.fine_tuning.jobs.create(
    training_file=file_response.id,
    model="gpt-3.5-turbo"
)
job_id = job_response.id
print(f"Fine-tuning job ID: {job_id}")

# Check the status
status = client.fine_tuning.jobs.retrieve(job_id)
print(f"Status: {status.status}")

# Poll until the job reaches a terminal state
# (jobs also pass through states such as 'validating_files' and 'queued')
import time
while status.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)
    status = client.fine_tuning.jobs.retrieve(job_id)
    print(f"Current status: {status.status}")

# Use the fine-tuned model
if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[
            {"role": "user", "content": "How do I sort a list?"}
        ]
    )
    print(response.choices[0].message.content)

7.4 Fine-Tuning with HuggingFace
python
# Install dependencies:
# pip install transformers datasets accelerate peft
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model
import torch

# Load the model and tokenizer
model_name = "Qwen/Qwen-1.8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Configure LoRA (parameter-efficient fine-tuning)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Preprocess the data
def preprocess(example):
    messages = example["messages"]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return tokenizer(text, truncation=True, max_length=512)

# Drop the raw columns so the Trainer only sees token tensors
tokenized_dataset = dataset.map(preprocess, remove_columns=dataset.column_names)

# Training configuration
training_args = TrainingArguments(
    output_dir="./fine_tuned_model",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    report_to="none"
)

# Create the Trainer (the collator builds causal-LM labels from input_ids)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

# Train
trainer.train()

# Save the model
trainer.save_model("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")

7.5 Evaluating the Fine-Tuned Model
python
def evaluate_fine_tuned_model(model_path, test_data):
    """Evaluate a fine-tuned model on held-out examples."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    results = []
    for example in test_data:
        messages = example["messages"][:-1]  # drop the final assistant reply
        expected = example["messages"][-1]["content"]
        input_text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            output = model.generate(
                input_ids,
                max_new_tokens=256,
                temperature=0.7,
                do_sample=True
            )
        # Decode only the newly generated tokens, not the prompt
        generated = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
        results.append({
            'input': messages[-1]['content'],
            'expected': expected,
            'generated': generated
        })
    return results

# Run the evaluation
test_data = [...]  # test examples
results = evaluate_fine_tuned_model("./fine_tuned_model", test_data)
for r in results:
    print(f"Input: {r['input']}")
    print(f"Expected: {r['expected']}")
    print(f"Generated: {r['generated']}")
    print("-" * 50)

Chapter 8 Deployment and Optimization
8.1 Deploying an API Service
python
# Deploy with FastAPI
from typing import Optional
from fastapi import FastAPI, File, HTTPException, UploadFile
from pydantic import BaseModel
import uvicorn

app = FastAPI(title="LLM API Service")

class ChatRequest(BaseModel):
    message: str
    use_rag: bool = False
    conversation_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    sources: Optional[list] = None
    conversation_id: Optional[str] = None

@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    try:
        if request.use_rag:
            result = assistant.query(request.message)
            return ChatResponse(
                response=result['answer'],
                sources=result['sources'],
                conversation_id=request.conversation_id
            )
        else:
            response = assistant.chat(request.message)
            return ChatResponse(
                response=response,
                conversation_id=request.conversation_id
            )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    # Handle the file upload
    pass

# Run the service:
# uvicorn app:app --host 0.0.0.0 --port 8000

8.2 Performance Optimization Tips
python
# 1. Caching
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_embedding(text: str):
    return embed_text(text)

# 2. Batching
def batch_embeddings(texts: List[str], batch_size=32):
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        embeddings = embedding_model.encode(batch)
        all_embeddings.extend(embeddings.tolist())
    return all_embeddings
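# The batching loop above yields ceil(len(texts) / batch_size) slices; e.g.
# with 10 texts and batch_size=4 the slice sizes are [4, 4, 2]:
_texts = [f"t{i}" for i in range(10)]
_batches = [_texts[i:i + 4] for i in range(0, len(_texts), 4)]
assert [len(b) for b in _batches] == [4, 4, 2]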
# 3. Streaming responses (use the async client so the event loop is not blocked)
from openai import AsyncOpenAI
async_client = AsyncOpenAI()

async def stream_response(messages):
    stream = await async_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

# 4. Concurrency with asyncio
import asyncio

async def process_multiple_queries(queries):
    # async_query: an async version of the query function (not shown here)
    tasks = [async_query(q) for q in queries]
    results = await asyncio.gather(*tasks)
    return results

8.3 Cost Control
python
class CostTracker:
    """Track API spend per request."""

    # USD per 1K tokens (rates change; check current pricing)
    PRICES = {
        'gpt-3.5-turbo': {'input': 0.0015, 'output': 0.002},
        'gpt-4': {'input': 0.03, 'output': 0.06},
    }

    def __init__(self):
        self.total_cost = 0
        self.usage_log = []

    def track_usage(self, model: str, input_tokens: int, output_tokens: int):
        price = self.PRICES.get(model, {'input': 0, 'output': 0})
        cost = (input_tokens * price['input'] + output_tokens * price['output']) / 1000
        self.total_cost += cost
        self.usage_log.append({
            'model': model,
            'input_tokens': input_tokens,
            'output_tokens': output_tokens,
            'cost': cost
        })

    def get_summary(self):
        return {
            'total_cost': self.total_cost,
            'total_requests': len(self.usage_log),
            'avg_cost_per_request': self.total_cost / len(self.usage_log) if self.usage_log else 0
        }
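# Sanity check of the pricing arithmetic used in track_usage: at the
# gpt-3.5-turbo rates above (0.0015 / 0.002 USD per 1K tokens),
# 1000 input + 500 output tokens cost (1000 * 0.0015 + 500 * 0.002) / 1000 USD.
_cost = (1000 * 0.0015 + 500 * 0.002) / 1000
assert abs(_cost - 0.0025) < 1e-9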
# Use the cost tracker
tracker = CostTracker()

def cost_aware_chat(messages, model='gpt-3.5-turbo'):
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    usage = response.usage
    tracker.track_usage(
        model=model,
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens
    )
    return response.choices[0].message.content

Summary
This tutorial has covered the core techniques of LLM application development:
- Core API usage: calling the mainstream models
- Prompt engineering: designing effective prompts
- RAG systems: building knowledge-augmented applications
- Agent development: creating autonomous agents
- Function calling: integrating external tools
- Model fine-tuning: customizing your own models
- Deployment and optimization: production practices

Trends in LLM application development:
- Multimodal understanding (text + images + audio)
- Long-context processing (100K+ tokens)
- Agent collaboration (multi-agent systems)
- Edge deployment (running large models locally)
- Vertical applications (healthcare, law, finance)

Suggested next steps:
- Study the Transformer architecture in depth
- Learn more RAG optimization techniques
- Explore agent frameworks (AutoGen, CrewAI)
- Build a complete end-to-end project
- Follow the latest research and tooling

The LLM field moves fast; keep learning and keep building!

References:
- OpenAI API docs: https://platform.openai.com/docs
- LangChain: https://python.langchain.com/
- HuggingFace: https://huggingface.co/
- LlamaIndex: https://docs.llamaindex.ai/
- Papers With Code, LLM section: https://paperswithcode.com/area/large-language-models