Understanding LightRAG

2025-08-19

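Introduction

Like GraphRAG, LightRAG turns documents into a knowledge graph and then retrieves against that graph. The rest of this post walks through the pipeline at a high level.

Knowledge graph generation

This step is essentially the same as in GraphRAG: the graph is produced by prompting an LLM.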


---Goal---
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.
Use English as output language.

---Steps---
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, use same language as input text. If English, capitalized the name
- entity_type: One of the following types: [organization,person,geo,event,category]
- entity_description: Provide a comprehensive description of the entity's attributes and activities *based solely on the information present in the input text*. **Do not infer or hallucinate information not explicitly stated.** If the text provides insufficient information to create a comprehensive description, state "Description not available in text."
Format each entity as ("entity"<|><entity_name><|><entity_type><|><entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
- relationship_keywords: one or more high-level key words that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details
Format each relationship as ("relationship"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_keywords><|><relationship_strength>)

3. Identify high-level key words that summarize the main concepts, themes, or topics of the entire text. These should capture the overarching ideas present in the document.
Format the content-level key words as ("content_keywords"<|><high_level_keywords>)

4. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **##** as the list delimiter.

5. When finished, output <|COMPLETE|>

######################
---Examples---
######################
Example 1:

Entity_types: [person, technology, mission, organization, location]
Text:
```
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.

Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. "If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us."

The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.

It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
```

Output:
("entity"<|>"Alex"<|>"person"<|>"Alex is a character who experiences frustration and is observant of the dynamics among other characters.")##
("entity"<|>"Taylor"<|>"person"<|>"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective.")##
("entity"<|>"Jordan"<|>"person"<|>"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device.")##
("entity"<|>"Cruz"<|>"person"<|>"Cruz is associated with a vision of control and order, influencing the dynamics among other characters.")##
("entity"<|>"The Device"<|>"technology"<|>"The Device is central to the story, with potential game-changing implications, and is revered by Taylor.")##
("relationship"<|>"Alex"<|>"Taylor"<|>"Alex is affected by Taylor's authoritarian certainty and observes changes in Taylor's attitude towards the device."<|>"power dynamics, perspective shift"<|>7)##
("relationship"<|>"Alex"<|>"Jordan"<|>"Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision."<|>"shared goals, rebellion"<|>6)##
("relationship"<|>"Taylor"<|>"Jordan"<|>"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce."<|>"conflict resolution, mutual respect"<|>8)##
("relationship"<|>"Jordan"<|>"Cruz"<|>"Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order."<|>"ideological conflict, rebellion"<|>5)##
("relationship"<|>"Taylor"<|>"The Device"<|>"Taylor shows reverence towards the device, indicating its importance and potential impact."<|>"reverence, technological significance"<|>9)##
("content_keywords"<|>"power dynamics, ideological conflict, discovery, rebellion")<|COMPLETE|>
#############################
Example 2:

Entity_types: [company, index, commodity, market_trend, economic_policy, biological]
Text:
```
Stock markets faced a sharp downturn today as tech giants saw significant declines, with the Global Tech Index dropping by 3.4% in midday trading. Analysts attribute the selloff to investor concerns over rising interest rates and regulatory uncertainty.

Among the hardest hit, Nexon Technologies saw its stock plummet by 7.8% after reporting lower-than-expected quarterly earnings. In contrast, Omega Energy posted a modest 2.1% gain, driven by rising oil prices.

Meanwhile, commodity markets reflected a mixed sentiment. Gold futures rose by 1.5%, reaching $2,080 per ounce, as investors sought safe-haven assets. Crude oil prices continued their rally, climbing to $87.60 per barrel, supported by supply constraints and strong demand.

Financial experts are closely watching the Federal Reserve's next move, as speculation grows over potential rate hikes. The upcoming policy announcement is expected to influence investor confidence and overall market stability.
```

Output:
("entity"<|>"Global Tech Index"<|>"index"<|>"The Global Tech Index tracks the performance of major technology stocks and experienced a 3.4% decline today.")##
("entity"<|>"Nexon Technologies"<|>"company"<|>"Nexon Technologies is a tech company that saw its stock decline by 7.8% after disappointing earnings.")##
("entity"<|>"Omega Energy"<|>"company"<|>"Omega Energy is an energy company that gained 2.1% in stock value due to rising oil prices.")##
("entity"<|>"Gold Futures"<|>"commodity"<|>"Gold futures rose by 1.5%, indicating increased investor interest in safe-haven assets.")##
("entity"<|>"Crude Oil"<|>"commodity"<|>"Crude oil prices rose to $87.60 per barrel due to supply constraints and strong demand.")##
("entity"<|>"Market Selloff"<|>"market_trend"<|>"Market selloff refers to the significant decline in stock values due to investor concerns over interest rates and regulations.")##
("entity"<|>"Federal Reserve Policy Announcement"<|>"economic_policy"<|>"The Federal Reserve's upcoming policy announcement is expected to impact investor confidence and market stability.")##
("relationship"<|>"Global Tech Index"<|>"Market Selloff"<|>"The decline in the Global Tech Index is part of the broader market selloff driven by investor concerns."<|>"market performance, investor sentiment"<|>9)##
("relationship"<|>"Nexon Technologies"<|>"Global Tech Index"<|>"Nexon Technologies' stock decline contributed to the overall drop in the Global Tech Index."<|>"company impact, index movement"<|>8)##
("relationship"<|>"Gold Futures"<|>"Market Selloff"<|>"Gold prices rose as investors sought safe-haven assets during the market selloff."<|>"market reaction, safe-haven investment"<|>10)##
("relationship"<|>"Federal Reserve Policy Announcement"<|>"Market Selloff"<|>"Speculation over Federal Reserve policy changes contributed to market volatility and investor selloff."<|>"interest rate impact, financial regulation"<|>7)##
("content_keywords"<|>"market downturn, investor sentiment, commodities, Federal Reserve, stock performance")<|COMPLETE|>
#############################
Example 3:

Entity_types: [economic_policy, athlete, event, location, record, organization, equipment]
Text:
```
At the World Athletics Championship in Tokyo, Noah Carter broke the 100m sprint record using cutting-edge carbon-fiber spikes.
```

Output:
("entity"<|>"World Athletics Championship"<|>"event"<|>"The World Athletics Championship is a global sports competition featuring top athletes in track and field.")##
("entity"<|>"Tokyo"<|>"location"<|>"Tokyo is the host city of the World Athletics Championship.")##
("entity"<|>"Noah Carter"<|>"athlete"<|>"Noah Carter is a sprinter who set a new record in the 100m sprint at the World Athletics Championship.")##
("entity"<|>"100m Sprint Record"<|>"record"<|>"The 100m sprint record is a benchmark in athletics, recently broken by Noah Carter.")##
("entity"<|>"Carbon-Fiber Spikes"<|>"equipment"<|>"Carbon-fiber spikes are advanced sprinting shoes that provide enhanced speed and traction.")##
("entity"<|>"World Athletics Federation"<|>"organization"<|>"The World Athletics Federation is the governing body overseeing the World Athletics Championship and record validations.")##
("relationship"<|>"World Athletics Championship"<|>"Tokyo"<|>"The World Athletics Championship is being hosted in Tokyo."<|>"event location, international competition"<|>8)##
("relationship"<|>"Noah Carter"<|>"100m Sprint Record"<|>"Noah Carter set a new 100m sprint record at the championship."<|>"athlete achievement, record-breaking"<|>10)##
("relationship"<|>"Noah Carter"<|>"Carbon-Fiber Spikes"<|>"Noah Carter used carbon-fiber spikes to enhance performance during the race."<|>"athletic equipment, performance boost"<|>7)##
("relationship"<|>"World Athletics Federation"<|>"100m Sprint Record"<|>"The World Athletics Federation is responsible for validating and recognizing new sprint records."<|>"sports regulation, record certification"<|>9)##
("content_keywords"<|>"athletics, sprinting, record-breaking, sports technology, competition")<|COMPLETE|>
#############################

#############################
---Real Data---
######################
Entity_types: [organization,person,geo,event,category]
Text:
<the chunk text goes here>
######################
Output:

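Inserting the chunk into a prompt this long weakens the LLM's comprehension; one symptom is that fewer relations get extracted. LightRAG therefore runs a second extraction pass: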

MANY entities and relationships were missed in the last extraction. Please find only the missing entities and relationships from previous text.

---Remember Steps---

1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, use same language as input text. If English, capitalized the name
- entity_type: One of the following types: [organization,person,geo,event,category]
- entity_description: Provide a comprehensive description of the entity's attributes and activities *based solely on the information present in the input text*. **Do not infer or hallucinate information not explicitly stated.** If the text provides insufficient information to create a comprehensive description, state "Description not available in text."
Format each entity as ("entity"<|><entity_name><|><entity_type><|><entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
- relationship_keywords: one or more high-level key words that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details
Format each relationship as ("relationship"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_keywords><|><relationship_strength>)

3. Identify high-level key words that summarize the main concepts, themes, or topics of the entire text. These should capture the overarching ideas present in the document.
Format the content-level key words as ("content_keywords"<|><high_level_keywords>)

4. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **##** as the list delimiter.

5. When finished, output <|COMPLETE|>

---Output---

Add new entities and relations below using the same format, and do not include entities and relations that have been previously extracted. :

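In other words, the conversation is prompt 1 + LLM output + prompt 2 (the one above), which pulls out additional entities and relations.

That completes construction of the knowledge graph.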
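The output format requested by the prompt can be turned back into structured records with a small parser. The following is a minimal sketch, not LightRAG's own code: the function name `parse_extraction` and the dictionary keys are assumptions made for illustration, while the three delimiters are the ones that appear in the prompt.

```python
TUPLE_DELIM = "<|>"             # separates fields inside one record
RECORD_DELIM = "##"             # separates records
COMPLETE_DELIM = "<|COMPLETE|>"  # marks the end of the output


def parse_extraction(output: str) -> dict:
    """Split raw LLM output into entities, relationships and content keywords."""
    parsed = {"entities": [], "relationships": [], "content_keywords": []}
    body = output.replace(COMPLETE_DELIM, "")
    for record in body.split(RECORD_DELIM):
        record = record.strip()
        if not (record.startswith("(") and record.endswith(")")):
            continue  # skip empty or malformed records
        fields = [f.strip().strip('"') for f in record[1:-1].split(TUPLE_DELIM)]
        kind = fields[0]
        if kind == "entity" and len(fields) == 4:
            parsed["entities"].append(
                {"name": fields[1], "type": fields[2], "description": fields[3]})
        elif kind == "relationship" and len(fields) == 6:
            parsed["relationships"].append({
                "source": fields[1], "target": fields[2],
                "description": fields[3], "keywords": fields[4],
                "strength": float(fields[5])})
        elif kind == "content_keywords" and len(fields) == 2:
            parsed["content_keywords"] = [k.strip() for k in fields[1].split(",")]
    return parsed
```

Running the same parser over the second-pass output and appending the results to the first pass gives the combined entity and relation lists.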

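I suspect the prompts in WeKnora, which Tencent open-sourced, may work better, because they extract entities and relations separately.

This stage also exposes a few problems:

Extraction quality: whether triples are missed or extracted incorrectly.
Entity merging: duplicate entities need to be fused.
Overall, the quality of the knowledge graph determines how well everything downstream works.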

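Query

The author provides four query modes: Naive Search, Local Search, Global Search and Hybrid Search. Naive Search is plain chunk retrieval and Hybrid Search is a fusion of Local and Global, so we only look at Local and Global below.

Parsing the user query

LightRAG parses the user query into two parts, low-level keywords and high-level keywords. The difference is that the former focus on specific entities, while the latter capture the overall theme. The corresponding code is: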


hl_keywords, ll_keywords = await get_keywords_from_query(
        query, query_param, global_config, hashing_kv
    )

Here is its prompt:

---Role---
You are an expert keyword extractor, specializing in analyzing user queries for a Retrieval-Augmented Generation (RAG) system. Your purpose is to identify both high-level and low-level keywords in the user's query that will be used for effective document retrieval.

---Goal---
Given a user query, your task is to extract two distinct types of keywords:
1. **high_level_keywords**: for overarching concepts or themes, capturing user's core intent, the subject area, or the type of question being asked.
2. **low_level_keywords**: for specific entities or details, identifying the specific entities, proper nouns, technical jargon, product names, or concrete items.

---Instructions & Constraints---
1. **Output Format**: Your output MUST be a valid JSON object and nothing else. Do not include any explanatory text, markdown code fences (like ```json), or any other text before or after the JSON. It will be parsed directly by a JSON parser.
2. **Source of Truth**: All keywords must be derived directly from or be a direct interpretation of the user query.
3. **Concise & Meaningful**: Keywords should be concise words or meaningful phrases. Prioritize multi-word phrases when they represent a single concept. For example, from "latest financial report of Apple Inc.", you should extract "latest financial report" and "Apple Inc." rather than "latest", "financial", "report", and "Apple".
4. **No Overlap**: A keyword or its core concept should not appear in both the high-level and low-level lists.
5. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.

---Examples---
Example 1:

Query: "How does international trade influence global economic stability?"

Output:
{
  "high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
  "low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}


Example 2:

Query: "What are the environmental consequences of deforestation on biodiversity?"

Output:
{
  "high_level_keywords": ["Environmental consequences", "Deforestation", "Biodiversity loss"],
  "low_level_keywords": ["Species extinction", "Habitat destruction", "Carbon emissions", "Rainforest", "Ecosystem"]
}


Example 3:

Query: "What is the role of education in reducing poverty?"

Output:
{
  "high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"],
  "low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]
}



---Real Data---
User Query: How many new-energy vehicles produced by Chery in Wuhu, Anhui were sold in North America?

---Output---


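[Images: ChatGPT result | DeepSeek result]

Shouldn't "new-energy vehicles", or even just "new energy", be in the low-level keywords? So the first problem is that the LLM's understanding of the query affects downstream retrieval quality. (The original query was written in Chinese; if the prompt above were rewritten in Chinese, I think DeepSeek would also do better.)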
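Since the prompt demands bare JSON but models sometimes add fences or stray text anyway, the response is worth parsing defensively. A minimal sketch, assuming a hypothetical helper named `parse_keywords` (it is not part of LightRAG's API) and the two JSON keys named in the prompt:

```python
import json
import re


def parse_keywords(response: str) -> tuple[list[str], list[str]]:
    """Return (high_level, low_level) keywords; empty lists if parsing fails."""
    # Grab the outermost {...} so code fences or extra prose are tolerated.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        return [], []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return [], []
    return (list(data.get("high_level_keywords", [])),
            list(data.get("low_level_keywords", [])))
```

Returning two empty lists on failure mirrors the prompt's own edge-case rule for vague queries, so the caller can treat both situations the same way.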

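Next, the flow branches on the search mode, as shown in the figure below. Every mode ultimately returns entities and relations, so let's look at the search strategy in each of the two modes.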

Local Search

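entities: recall from the entity vector store using the low-level keywords, i.e. topK(cos_sim(get_embed('Wuhu Anhui, Chery, Chery new-energy vehicles, North America'), entities vector db)).
relations: take those entities and collect the edges to their one-hop neighbours, i.e. [graph.list_edges(node) for node in entities].

Then comes ranking: entities are sorted by degree, and relations are ranked on a combination of src degree + tgt degree and edge weight (the weight is assigned by the LLM when the knowledge graph is built).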
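The Local Search steps can be sketched on a toy graph. This is an illustration under stated assumptions rather than LightRAG's implementation: `local_search`, the plain-dict graph and the tuple sort key (degree sum first, then weight) are all choices made here, since the post only says the two signals are combined.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_search(query_vec, entity_vecs, edges, top_k=2):
    """entity_vecs: {name: vector}; edges: {(src, tgt): weight}, undirected."""
    # 1. Recall entities by similarity to the low-level keyword embedding.
    hits = sorted(entity_vecs,
                  key=lambda n: cos_sim(query_vec, entity_vecs[n]),
                  reverse=True)[:top_k]
    # Node degree = number of edges touching the node.
    degree = {}
    for src, tgt in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[tgt] = degree.get(tgt, 0) + 1
    # 2. Collect the one-hop edges of the recalled entities.
    one_hop = [e for e in edges if e[0] in hits or e[1] in hits]
    # 3. Rank entities by degree, relations by (degree sum, edge weight).
    entities = sorted(hits, key=lambda n: degree.get(n, 0), reverse=True)
    relations = sorted(one_hop,
                       key=lambda e: (degree[e[0]] + degree[e[1]], edges[e]),
                       reverse=True)
    return entities, relations
```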

Global Search

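relations: recall from the relation vector store using the high-level keywords, i.e. topK(cos_sim(get_embed('new-energy vehicles, sales volume, North American market'), relations vector db)).
entities: the entities follow directly from the retrieved edges, as the src and tgt of each edge.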
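Global Search runs in the opposite direction, from edges to nodes. A minimal sketch under the same assumptions as before (toy vectors, a hypothetical `global_search` function, relations keyed by their endpoint pair):

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def global_search(query_vec, relation_vecs, top_k=2):
    """relation_vecs: {(src, tgt): vector}. Returns (relations, entities)."""
    # 1. Recall relations by similarity to the high-level keyword embedding.
    relations = sorted(relation_vecs,
                       key=lambda e: cos_sim(query_vec, relation_vecs[e]),
                       reverse=True)[:top_k]
    # 2. Entities are the endpoints of those edges, de-duplicated in order.
    entities = list(dict.fromkeys(n for src, tgt in relations for n in (src, tgt)))
    return relations, entities
```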

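Fetching the related chunks from entities and relations

With Local Search there are no relations and entities from the Global side, and vice versa; with Hybrid, the entities and relations from both are merged.

In short, each retrieved entity carries chunk_ids recording which chunks it was extracted from. Those chunks are pooled and then ranked by cos_sim against the user's original query.
Likewise, each relation records which chunks it came from, and those are ranked against the original query in the same way.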
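The pooling and ranking of chunks might look like the sketch below. The function `collect_chunks` and the `chunk_ids` key on each item are illustrative assumptions; the real storage layout may differ.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collect_chunks(items, chunk_vecs, query_vec, top_k=3):
    """items: entities or relations, each a dict with a 'chunk_ids' list."""
    # Pool the chunk ids referenced by the items, keeping first-seen order.
    pooled = list(dict.fromkeys(cid for item in items for cid in item["chunk_ids"]))
    # Rank the pooled chunks by similarity to the original user query.
    pooled.sort(key=lambda cid: cos_sim(query_vec, chunk_vecs[cid]), reverse=True)
    return pooled[:top_k]
```

The same function serves both entities and relations, because all it needs from an item is its list of source chunk ids.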

LLM answer

The three pieces above (entities, relations and chunks) are packed into one large prompt for the reply. The prompt is:

-----Entities(KG)-----

```json
{entities_str}
```

-----Relationships(KG)-----

```json
{relations_str}
```

-----Document Chunks(DC)-----

```json
[]
```

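That gives a rough picture of how it is implemented.

Summary

Apart from the problems already noted in each part above, I think the implementation could also add multi-hop queries to handle more complex chains of relations, though opinions will differ on that.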


Understanding PIKE-RAG

2025-08-13

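Introduction

PIKE-RAG stands for "sPecIalized KnowledgE and Rationale Augmented Generation". Released by Microsoft, it focuses on extracting, understanding and applying domain-specific knowledge and on building reasoning logic that guides large language models (LLMs) step by step toward accurate answers, addressing the limitations of conventional RAG systems in complex industrial applications.

Chunking pipeline

1. Data preparation

Put the files to be parsed under data/biology/contents, for example:

2. chunking.yml

Edit examples/biology/configs/chunking.yml:

Because the source material is in Chinese, the official Chinese prompts are used here as well. The modified config file is: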

# Environment Variable Setting
################################################################################
dotenv_path: env_configs/.env


# Logging Setting
################################################################################
log_root_dir: logs/biology

# experiment_name: would be used to create log_dir = log_root_dir/experiment_name/
experiment_name: chunking


# Input Document & Output Dir Setting
################################################################################
input_doc_setting:
  doc_dir: data/biology/contents

output_doc_setting:
  doc_dir: data/biology/chunks


# LLM Setting
################################################################################
llm_client:
  module_path: pikerag.llm_client
  # available class_name: AzureMetaLlamaClient, AzureOpenAIClient, HFMetaLlamaClient
  class_name: AzureOpenAIClient
  args: {}

  llm_config:
    # available model name for AzureOpenAIClient: gpt-4, gpt-35-turbo
    # available model name for AzureMetaLlamaClient: llama-2-7b-chat-22, llama-2-13b-chat-19, llama-2-70b-chat-19,
    #                                                meta-llama-3-8b-instruct-4, meta-llama-3-70b-instruct-4
    # available model name for HuggingFaceMetaLlamaClient: meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct
    model: deepseek-chat
    temperature: 0
    # enable max_new_tokens when using llama model, response seems truncated without it
    # max_new_tokens: 1024

  cache_config:
    # location: will be joined with log_dir to generate the full path;
    #   if set to null, the experiment_name would be used
    location_prefix: null
    auto_dump: True


# Splitter Setting
################################################################################
chunking_protocol:
  module_path: pikerag.prompts.chunking
  chunk_summary: chunk_summary_protocol_Chinese
  chunk_summary_refinement: chunk_summary_refinement_protocol_Chinese
  chunk_resplit: chunk_resplit_protocol_Chinese


splitter:
  module_path: pikerag.document_transformers
  class_name: LLMPoweredRecursiveSplitter
  args:
    separators:
      - "\n"
    is_separator_regex: False
    chunk_size: 512
    chunk_overlap: 0


# Run:
python examples/chunking.py examples/biology/configs/chunking.yml

3. split_documents

This enters the document-splitting step:

4. first_chunk_summary

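Summarizing the first chunk (the run used the official Chinese prompt; it is shown here in English translation):

# Source

The text comes from 招标公告.txt, a policy document at data/biology/contents/招标公告.txt.

# Text

"Partial text":
I. Tender No.: 皖E0-51-2025-0160
II. Project name: Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract
III. Project location: Hanshan County, Ma'anshan City
IV. Implementing entity (tenderee): Anhui Runhan Forestry Development Co., Ltd.
V. Address of the implementing entity (tenderee): 4th floor, Longkang Building, Wangmei Road, Hanshan County, Ma'anshan City
VI. Tendering method: open tender
VII. Project overview and investment scale:
1. Project overview: this project is the Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract. It is mainly located in
Huanfeng, Taochang, Xianzong and Lintou towns of Hanshan County. The main works are tending of young and middle-aged forest, conversion of existing forest, and intensive plantation cultivation; the
main tasks are planting, transporting and maintaining seedlings; sowing and maintaining seeds; cutting and clearing all shrubs, vines and weeds in the forest; felling; clearing branches,
shrubs, roots and other residue from the forest land, crushing it and hauling it to Everbright Bioenergy (Hanshan) Co., Ltd.; setting out and digging planting holes; and, for new plantations,
manual tending and engineered isolation belts. Seedlings include Nuttall oak, oil-tea camellia, wild chrysanthemum, sawtooth oak, sweetgum and Chinese cork oak. Bidders are to survey the site, transport conditions and natural
geography themselves. (See the bill of quantities for details.)
2. Approving or filing authority: Hanshan County Development and Reform Commission (含发改投资（2023）55号); unified project code: 2304-

# Task

Your task is to output a summary of the "partial text" above.

# Output

Output only the summary; do not add anything else.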

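The chunk_summary produced for the first chunk:

1. Tender No.: 皖E0-51-2025-0160
2. Project name: Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract
3. Location: Hanshan County, Ma'anshan City
4. Tenderee: Anhui Runhan Forestry Development Co., Ltd. (address: 4th floor, Longkang Building, Wangmei Road, Hanshan County, Ma'anshan City)
5. Tendering method: open tender
6. Project overview:
   - Scope: Huanfeng, Taochang, Xianzong and Lintou towns, Hanshan County
   - Content: tending of young and middle-aged forest, conversion of existing forest and intensive plantation cultivation, including seedling planting, transport and maintenance, seed sowing, clearing of shrubs, vines and weeds, removal and crushing of residue, setting out, plantation tending, etc.
   - Seedling species: Nuttall oak, oil-tea camellia, wild chrysanthemum, etc.
7. Approving authority: Hanshan County Development and Reform Commission (filing No.: 含发改投资（2023）55号)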

5. resplit_chunk_and_generate_summary

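Re-splitting the chunk and generating the corresponding summaries (again shown in English translation of the official Chinese prompt):

# Source

The text comes from 招标公告.txt, a policy document at data/biology/contents/招标公告.txt.

# Text

Summary of the "first part" of the "partial text":
1. Tender No.: 皖E0-51-2025-0160
2. Project name: Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract
3. Location: Hanshan County, Ma'anshan City
4. Tenderee: Anhui Runhan Forestry Development Co., Ltd. (address: 4th floor, Longkang Building, Wangmei Road, Hanshan County, Ma'anshan City)
5. Tendering method: open tender
6. Project overview:
   - Scope: Huanfeng, Taochang, Xianzong and Lintou towns, Hanshan County
   - Content: tending of young and middle-aged forest, conversion of existing forest and intensive plantation cultivation, including seedling planting, transport and maintenance, seed sowing, clearing of shrubs, vines and weeds, removal and crushing of residue, setting out, plantation tending, etc.
   - Seedling species: Nuttall oak, oil-tea camellia, wild chrysanthemum, etc.
7. Approving authority: Hanshan County Development and Reform Commission (filing No.: 含发改投资（2023）55号)

"Partial text":
Line 0 	 I. Tender No.: 皖E0-51-2025-0160
Line 1 	 II. Project name: Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract
Line 2 	 III. Project location: Hanshan County, Ma'anshan City
Line 3 	 IV. Implementing entity (tenderee): Anhui Runhan Forestry Development Co., Ltd.
Line 4 	 V. Address of the implementing entity (tenderee): 4th floor, Longkang Building, Wangmei Road, Hanshan County, Ma'anshan City
Line 5 	 VI. Tendering method: open tender
Line 6 	 VII. Project overview and investment scale:
Line 7 	 1. Project overview: this project is the Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract. It is mainly located in
Line 8 	 Huanfeng, Taochang, Xianzong and Lintou towns of Hanshan County. The main works are tending of young and middle-aged forest, conversion of existing forest, and intensive plantation cultivation; the
Line 9 	 main tasks are planting, transporting and maintaining seedlings; sowing and maintaining seeds; cutting and clearing all shrubs, vines and weeds in the forest; felling; clearing branches,
Line 10 	 shrubs, roots and other residue from the forest land, crushing it and hauling it to Everbright Bioenergy (Hanshan) Co., Ltd.; setting out and digging planting holes; and, for new plantations,
Line 11 	 manual tending and engineered isolation belts. Seedlings include Nuttall oak, oil-tea camellia, wild chrysanthemum, sawtooth oak, sweetgum and Chinese cork oak. Bidders are to survey the site, transport conditions and natural
Line 12 	 geography themselves. (See the bill of quantities for details.)
Line 13 	 2. Approving or filing authority: Hanshan County Development and Reform Commission (含发改投资（2023）55号); unified project code: 2304-
Line 14 	 340522-04-01-681551.
Line 15 	 3. Investment scale: total project investment is RMB 1,997.2266 million.
Line 16 	 4. Quality requirement: qualified.
Line 17 	 5. Maximum bid price (RMB): the tender control price of this project (i.e. the "maximum bid
Line 18 	 price" prepared by the cost consultancy commissioned by the tenderee, likewise below) is 11195580.75 yuan.
Line 19 	 VIII. Source of funds: other investment
Line 20 	 IX. Planned duration: 365 calendar days
Line 21 	 VIII. Source of funds: other investment
Line 22 	 IX. Planned duration: 365 calendar days
Line 23 	 Planned start date: 16 June 2025
Line 24 	 (the actual start date is the one in the commencement order issued by the supervising engineer)
Line 25 	 Planned completion date: 15 June 2026.
Line 26 	 X. Bidder qualification requirements:
Line 27 	 1. Bidder qualification: none.
Line 28 	 2. Requirements for the proposed project manager: the project manager must hold a title of senior engineer or above (senior engineer included) in a forestry or landscaping discipline. (
Line 29 	 Note: the discipline on the title certificate must be forestry or landscaping; if the certificate shows no discipline, the bid must include other material proving
Line 30 	 one of the above disciplines.)
Line 31 	 3. Consortium bids: consortium bids are accepted in this tender, subject to the following requirements:
Line 32 	 (1) The consortium members shall sign a consortium agreement in the format provided in the tender documents, naming the lead member and each party's rights and obligations, and shall submit
Line 33 	 the consortium agreement in the bid.
Line 34 	 (2) The number of consortium members (including the lead member) shall not exceed 3

# Task

Your task:
1. Understand the auxiliary information about the "first part" of the "partial text" and the content of the "partial text".
2. Analyse the structure of the "partial text" and split it strictly into a "first part" and a "second part", with no content lost.
3. Give the "end line number" of the "first part". Note that the "first part" is defined as all of the "partial text" between "Line 0" and "Line <end line number> + 1"; it must not be empty. Note that the "maximum line number" of this text is 34.
4. Summarize the main content of the "first part".
5. For the "second part", summarize its main content using the context and the "first part". Note that the "second part" is defined as all of the "partial text" from "Line <end line number> + 1" onward.

# Output

Output in the following format:

Thinking: <Following the task requirements, carefully analyse the structure of the "partial text" above, think about how to divide it sensibly into two parts, and write out your reasoning.>

<result>
<chunk>
  <endline>The end line number, a non-negative number indicating the line on which the "first part" ends. The first part includes this line.</endline>
  <summary>A detailed summary of the "first part". Begin with "The main content of this part is"; you may draw on the "partial text".</summary>
</chunk>
<chunk>
  <summary>Summarize the main content of the second part using the context and the first part. Begin with "The main content of this part is".</summary>
</chunk>
</result>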

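6. The first complete chunk

Content of the first complete chunk (in English translation):

I. Tender No.: 皖E0-51-2025-0160
II. Project name: Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract
III. Project location: Hanshan County, Ma'anshan City
IV. Implementing entity (tenderee): Anhui Runhan Forestry Development Co., Ltd.
V. Address of the implementing entity (tenderee): 4th floor, Longkang Building, Wangmei Road, Hanshan County, Ma'anshan City
VI. Tendering method: open tender
VII. Project overview and investment scale:
1. Project overview: this project is the Hanshan County (Anhui Province) National Reserve Forest Project, 2025 Afforestation Works (Batch 1), General Construction Contract. It is mainly located in
Huanfeng, Taochang, Xianzong and Lintou towns of Hanshan County. The main works are tending of young and middle-aged forest, conversion of existing forest, and intensive plantation cultivation; the
main tasks are planting, transporting and maintaining seedlings; sowing and maintaining seeds; cutting and clearing all shrubs, vines and weeds in the forest; felling; clearing branches,
shrubs, roots and other residue from the forest land, crushing it and hauling it to Everbright Bioenergy (Hanshan) Co., Ltd.; setting out and digging planting holes; and, for new plantations,
manual tending and engineered isolation belts. Seedlings include Nuttall oak, oil-tea camellia, wild chrysanthemum, sawtooth oak, sweetgum and Chinese cork oak. Bidders are to survey the site, transport conditions and natural
geography themselves. (See the bill of quantities for details.)
2. Approving or filing authority: Hanshan County Development and Reform Commission (含发改投资（2023）55号); unified project code: 2304-
340522-04-01-681551.
3. Investment scale: total project investment is RMB 1,997.2266 million.
4. Quality requirement: qualified.
5. Maximum bid price (RMB): the tender control price of this project (i.e. the "maximum bid
price" prepared by the cost consultancy commissioned by the tenderee, likewise below) is 11195580.75 yuan.
VIII. Source of funds: other investment
IX. Planned duration: 365 calendar days
VIII. Source of funds: other investment
IX. Planned duration: 365 calendar days
Planned start date: 16 June 2025
(the actual start date is the one in the commencement order issued by the supervising engineer)
Planned completion date: 15 June 2026.
X. Bidder qualification requirements:

It can also be highlighted in the original text:

Next, dropped_len is computed from the line number in the output, and the next round uses text = text[dropped_len:].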
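Turning the returned line number into dropped_len can be sketched as follows. This is an assumption-laden illustration, not PIKE-RAG's code: the helper name `split_first_chunk` is hypothetical, and it assumes the end line is inclusive, as the `<endline>` description in the prompt states.

```python
import re


def split_first_chunk(text: str, llm_output: str) -> tuple[str, str]:
    """Cut `text` after the line given in <endline>; return (chunk, rest)."""
    match = re.search(r"<endline>\s*(\d+)\s*</endline>", llm_output)
    if match is None:
        raise ValueError("no <endline> found in LLM output")
    lines = text.splitlines(keepends=True)
    # Clamp to the last available line in case the model overshoots.
    end = min(int(match.group(1)), len(lines) - 1)
    # dropped_len = number of characters in lines 0..end, inclusive.
    dropped_len = sum(len(line) for line in lines[: end + 1])
    return text[:dropped_len], text[dropped_len:]
```

In the example above the first chunk stops at "X. Bidder qualification requirements:", which is Line 26 of the numbered text, so an end line of 26 would reproduce that cut.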

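Summary of the chunking pipeline

1. Use a text splitter to split the text (the original document) a first time, producing chunks.
2. Summarize chunk[1] to get the chunk summary.
3. From the chunk summary + (chunk[1] + chunk[2]), obtain the LLM-based chunk, the summary of that chunk, and the summary of the next chunk; at this point text = text[dropped_len:].
4. Use the next chunk's summary as the chunk summary for the new round and repeat step 3 until all chunks have been produced.

Unlike conventional splitting by a fixed number of characters, PIKE-RAG's splitting is an iterative process that combines a rolling window with semantic guidance. It first generates a chunk summary, then uses the summary plus the next stretch of text to decide the split boundary and produce a new chunk, until the text is exhausted. This keeps context connected across chunks and makes the later Atomic Question Tagging more precise.
Note: each split point is dynamic, but the chunk_size used for each round is fixed.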
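The rolling loop can be sketched with the two LLM calls passed in as plain functions. Everything here is a simplification under stated assumptions: `llm_chunking`, `summarize` and `resplit` are hypothetical names, and the window is measured in characters instead of going through a real text splitter.

```python
def llm_chunking(text, chunk_size, summarize, resplit):
    """Return a list of (chunk_text, chunk_summary) pairs.

    summarize(text) -> summary
    resplit(summary, window) -> (dropped_len, chunk_summary, next_summary)
    """
    chunks = []
    summary = summarize(text[:chunk_size])       # summarize the first window
    while text:
        if len(text) <= chunk_size:              # last piece: nothing to split
            chunks.append((text, summary))
            break
        window = text[: 2 * chunk_size]          # roughly chunk[1] + chunk[2]
        dropped_len, chunk_summary, next_summary = resplit(summary, window)
        # Clamp so that every round consumes at least one character.
        dropped_len = max(1, min(dropped_len, len(window)))
        chunks.append((text[:dropped_len], chunk_summary))
        text, summary = text[dropped_len:], next_summary
    return chunks
```

The window size stays fixed while the cut point inside it moves, which matches the note that the split is dynamic but chunk_size is not.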

That covers the flow of LLM-based chunking (LLMPoweredRecursiveSplitter). As we can also see, LLM-based chunking puts the LLM's comprehension to the test. You can download the trimmed-down code here.

Atomic Question Tagging

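Flow

Entry point:

Generate as many answerable questions as possible for this content:

Summary

Using the deepseek-chat model to generate questions avoids the simple pronoun problem the prompt warns about (he, she, it), but the questions can still be underspecified, for example:

What is the name of the tendering agency?
What is the total investment of the project?
In which towns of Hanshan County is the project mainly located?

Underspecified questions like these hurt recall downstream; ChatGPT does somewhat better here.
So the quality of the generated questions directly determines the granularity and accuracy of recall, because downstream retrieval relies entirely on this question bank to locate the relevant chunks.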
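The question bank is essentially a list of questions that each point back to a source chunk. A minimal sketch, with the LLM call stubbed out as a function argument; `build_question_index` and the record layout are assumptions made for illustration:

```python
def build_question_index(chunks, ask_questions):
    """chunks: {chunk_id: text}; ask_questions(text) -> list of questions."""
    index = []
    for chunk_id, text in chunks.items():
        for question in ask_questions(text):
            question = question.strip()
            if question:  # drop blank lines the model may emit
                index.append({"question": question, "chunk_id": chunk_id})
    return index
```

Because retrieval later matches against the question text alone, a question such as "What is the total investment of the project?" carries no hint of which project it refers to, which is the weakness described above.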

QA

Entry point and overall flow

1. propose question decomposition (decompose the user's question)

Query decomposition prompt:

# Task
Your task is to analyse the providing context then raise atomic sub-questions for the knowledge that can help you answer the question better. Think in different ways and raise as many diverse questions as possible.
# Output Format
Please output in following JSON format:
{
    "thinking": <A string. Your thinking for this task, including analysis to the question and the given context.>,
    "sub_questions": <A list of string. The sub-questions indicating what you need.>
}
# Context
The context we already have:
# Question
Which country is home to Alsa Mall and Spencer Plaza?
# Your Output:

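This yields the sub-questions decomposed from the user's question. The list may be empty, in which case:

An empty list could mean the question cannot be decomposed any further, but that does not hold here.
More likely, the given {chosen_context} is already enough to answer the user's question; if {chosen_context} were empty, the model would certainly decompose the question.

So if this step returns nothing, the flow goes straight to answering the user's question and skips the steps below.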

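2. Retrieve relevant atom information (recall the relevant chunks)

2.1 Recall using the sub-questions generated from the user's question

The flow, briefly:
step1: suppose the user asks,

When Huangzhuang is hit by a once-in-a-century flood, how should Danjiangkou be operated?
The resulting sub-questions (translated from the Chinese output) are:

{
    "thinking": "This question concerns flood management and the operation of water-conservancy works, specifically the operating strategy of the Danjiangkou Reservoir when Huangzhuang is hit by a once-in-a-century flood. Answering it fully requires information on several fronts, including the basic functions of the Danjiangkou Reservoir, its operating principles, the location of Huangzhuang and its flood situation, and the hydraulic connection between the two.",
    "sub_questions": [
        "What are the main functions of the Danjiangkou Reservoir? What role does it play in flood control?",
        "Which river basin is Huangzhuang in? Is it connected to the Danjiangkou Reservoir's water system?",
        "What exactly is the standard for a once-in-a-century flood? How badly does it affect Huangzhuang?",
        "What is the Danjiangkou Reservoir's routine operating strategy when floods occur upstream or downstream?",
        "Are there historical cases showing how the Danjiangkou Reservoir was operated in similar floods?",
        "Which body is responsible for operating decisions at the Danjiangkou Reservoir? What is the decision process?",
        "During operation, how are flood control, water supply, power generation and other goals balanced?",
        "Are there other water-conservancy facilities near Huangzhuang that can be operated jointly with the Danjiangkou Reservoir?",
        "What are the current water level and storage of the Danjiangkou Reservoir? Is there operating headroom?",
        "What does the weather forecast predict for rainfall over the next few days? Will it affect operating decisions?"
    ]
}

step2: for each sub-question, recall from the question bank produced by Atomic Question Tagging. If each search is limited to 4 results, the 10 sub-questions above can return up to 40 results. Because every entry in the question bank records which chunk it was generated from, those chunks are returned as well.

2.2 Backup retrieval 1

If there are no sub-questions, search the question bank directly with the user's query.

2.3 Backup retrieval 2

If the previous two searches still return nothing, fall back to query-to-document search.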
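The three retrieval tiers can be sketched as one function with the two search backends passed in. The names `retrieve_with_fallback`, `search_questions` and `search_documents` are hypothetical stand-ins, not PIKE-RAG's API:

```python
def retrieve_with_fallback(query, sub_questions, search_questions,
                           search_documents, top_k=4):
    """Return a list of (matched_question, chunk_id) pairs.

    search_questions(q, k) -> list of (question, chunk_id)
    search_documents(q, k) -> list of chunk_id
    """
    hits = []
    for sub_q in sub_questions:            # 2.1: one search per sub-question
        hits.extend(search_questions(sub_q, top_k))
    if not hits:                           # 2.2: search with the raw query
        hits = search_questions(query, top_k)
    if not hits:                           # 2.3: query-to-document search
        hits = [(None, cid) for cid in search_documents(query, top_k)]
    return hits
```

Different sub-questions often match the same stored question, so the combined list can contain repeats; de-duplicating it before it reaches the selection prompt would be a reasonable addition.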

3. Let the LLM decide among the search results

Filtering the sub-questions:

# Task
Your task is to analyse the providing context then decide which sub-questions may be useful to be answered before you can answer the given question. Select a most relevant sub-question from the given question list, avoid selecting sub-question that can already be answered with the given context or with your own knowledge.
# Output Format
Please output in following JSON format:
{
    "thinking": <A string. Your thinking for this selection task.>,
    "question_idx": <An integer, indicating a sub-question index from 1 to 30.>
}
# Context
The context we already have:
# Sub-Questions You Can Choose From
Question 1: Where is Alsa Mall located?
Question 2: What is the address of Alsa Mall?
Question 3: Which city is Alsa Mall in?
Question 4: What is Alsa Mall known for?
Question 5: Where is Spencer Plaza located?
Question 6: When was Spencer Plaza originally built?
Question 7: What was the original purpose of the site where Spencer Plaza now stands?
Question 8: What are some notable features of Spencer Plaza's history?
Question 9: Where is Alsa Mall located?
Question 10: Which city is Alsa Mall in?
Question 11: What is the address of Alsa Mall?
Question 12: Which country is Alsa Mall located in?
Question 13: What is Alsa Mall known for?
Question 14: What is the address of Alsa Mall?
Question 15: Where is Alsa Mall located?
Question 16: What are some landmarks near Alsa Mall?
Question 17: What are some notable features of Spencer Plaza's history?
Question 18: Where is Spencer Plaza located?
Question 19: When was Spencer Plaza originally built?
Question 20: What is the gross lettable area of Spencer Plaza?
Question 21: Which country is Alsa Mall located in?
Question 22: What is Alsa Mall known for?
Question 23: Which other mall was established around the same time as Alsa Mall?
Question 24: Which city is Alsa Mall in?
Question 25: What is Alsa Mall known for?
Question 26: Where is Alsa Mall located?
Question 27: What are some landmarks near Alsa Mall?
Question 28: What is the historical importance of Alsa Mall?
Question 29: What is the significance of Alsa Mall in Chennai's history?
Question 30: What are some landmarks near Alsa Mall?
# Question
Which country is home to Alsa Mall and Spencer Plaza?
# Your output:

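Entry point:

Corresponding prompt:

Summary:

Given the recalled sub-questions, the LLM decides which one helps answer the user's question. (Why select only one sub-question?)
If the LLM selects none, a fallback prompt is used: it gives the LLM the chunks behind those sub-questions and asks it to pick the single most suitable chunk. (Each chunk also carries its sub-question, chunk_id and so on, so this effectively yields a sub-question as well.)
Note: every prompt has a chosen_context section. Its purpose is to tell the LLM which context has already been selected, so that it reasons from that before choosing a new chunk to add to the context.
This repeats until the configured number of rounds is reached; finally the user's question and the accumulated chosen_context are handed to the LLM for the final answer. (This is another realization of the ReAct idea: the chunk returned for each sub-question informs which new sub-question and chunk are chosen next.)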
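The whole loop can be sketched with each LLM or retrieval step injected as a function. This is a schematic under stated assumptions: `answer_iteratively` and its four callables are hypothetical, and the fallback paragraph-selection prompt is assumed to live inside `select`.

```python
def answer_iteratively(question, decompose, retrieve, select, answer,
                       max_rounds=3):
    """Grow chosen_context one chunk per round, then answer.

    decompose(question, context) -> list of sub-questions
    retrieve(question, sub_questions) -> list of (sub_question, chunk)
    select(question, candidates, context) -> index into candidates, or None
    answer(question, context) -> final answer
    """
    chosen_context = []
    for _ in range(max_rounds):
        sub_questions = decompose(question, chosen_context)
        if not sub_questions:      # context already suffices: go answer
            break
        candidates = retrieve(question, sub_questions)
        picked = select(question, candidates, chosen_context)
        if picked is None:         # nothing useful among the candidates
            break
        chunk = candidates[picked][1]
        if chunk not in chosen_context:
            chosen_context.append(chunk)
    return answer(question, chosen_context)
```

Passing chosen_context into both `decompose` and `select` is what lets each round build on the previous one.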

Fallback prompt:

# Task
Your task is to analyse the providing context then decide which paragraph in the list may be useful for you to answer the given question. Select a most relevant paragraph from the given paragraph list.

# Output Format
Please output in following JSON format:
{{
    "thinking": <A string. Your thinking for this selection task.>,
    "paragraph_idx": <An integer. A paragraph index from 1 to {num_chunks}.>
}}

# Context
The context we already have:
{chosen_context}

# Paragraph List You Can Choose From
{chunk_list_str}

# Question
{content}

# Your output:

Summary

That completes the analysis of these three parts; if anything is wrong, please point it out.


Swin Transformer, an upgraded ViT: telling at a glance whether your photo is tilted

2025-08-11

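Preface

Accessible articles are easier for readers to get into, so this post again skips the fine details and complex theory and gets you started with Swin Transformer from a hands-on angle.

Swin Transformer is an upgraded version of the Vision Transformer (ViT). ViT splits an image into fixed-size patches and computes attention globally with a Transformer, whereas Swin Transformer introduces a shifted-window mechanism that lets it look at the image across scales and local regions:

It first learns details inside small windows
It then shifts the windows to connect context across regions
Finally it fuses global information

In short, it is more flexible and efficient for image processing and a strong upgrade over ViT.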
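The window idea can be seen on a toy grid of patch ids. The sketch below is a pure-Python illustration, not the model's code: it only shows how tiles are formed before and after a cyclic shift, and it leaves out the attention masking that the real model applies to wrapped-around windows. The function names are chosen here for illustration.

```python
def window_partition(grid, window):
    """Split an H x W grid (list of rows) into window x window tiles."""
    h, w = len(grid), len(grid[0])
    return [[row[c:c + window] for row in grid[r:r + window]]
            for r in range(0, h, window) for c in range(0, w, window)]


def cyclic_shift(grid, shift):
    """Roll the grid up and left by `shift` cells, wrapping at the edges."""
    rolled_rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 patch ids
regular = window_partition(grid, 2)                   # plain 2x2 windows
shifted = window_partition(cyclic_shift(grid, 1), 2)  # windows after shift
print(regular[0])  # patches that share the first regular window
print(shifted[0])  # patches that share the first shifted window
```

After the shift, the first window holds patches that previously sat in four different windows, which is how information crosses window borders from one layer to the next.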

Data examples


[Images: example1 | example2]

Hands-on demo

Training

import os
import pprint
import random
import re
from collections import defaultdict

import numpy as np
import torch
import torch.nn as nn
import tqdm
from PIL import Image
from datasets import Dataset as HFDataset
from sklearn.metrics import classification_report, f1_score
from torch.nn import BCELoss
import torch.nn.functional as F
from torch.utils.data import Dataset
# Load the SwinTransformer model and processor
from transformers import (AutoFeatureExtractor, SwinModel, Trainer,
                          TrainingArguments)
from transformers.models.swin.modeling_swin import SwinImageClassifierOutput
from torchvision import transforms

os.environ["WANDB_DISABLED"] = "true"
MODEL_NAME_OR_PATH = './swin-large-patch4-window12-384-in22k/'


transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.1),
    transforms.RandomResizedCrop(384),
])

def load_dataset_from_path():
    result = []
    image_dirs = ['./images', ]
    for image_dir in image_dirs:
        for image_name in tqdm.tqdm(os.listdir(image_dir), total=len(os.listdir(image_dir)), postfix=image_dir):
            label = re.search(r"l\-(是|否)\.", image_name).group(1)
            image_path = os.path.join(image_dir, image_name)
            try:
                transform(Image.open(image_path).convert("RGB"))
            except Exception as e:
                print(f"invalid image: {image_path}, err:{e}")
                continue

            result.append({"image_path": image_path, "label": label})

    # #################### split
    random.shuffle(result)

    counter = defaultdict(int)
    for item in result:
        counter[item['label']] += 1
    pprint.pprint(counter)

    trains, tests = [], []
    train_counter = defaultdict(int)
    for item in result:
        if train_counter[item['label']] / counter[item['label']] > 0.85:
            tests.append(item)
        else:
            train_counter[item['label']] += 1
            trains.append(item)
    return HFDataset.from_list(trains), HFDataset.from_list(tests)
    # all_dataset = HFDataset.from_list(result)
    #
    # split = all_dataset.train_test_split(0.15, seed=1)
    # return split['train'], split['test']


train_dataset, eval_dataset = load_dataset_from_path()
print(
    f'total train size:{train_dataset.shape[0]},eval size:{eval_dataset.shape[0]}')


class SwinForClassify(nn.Module):
    def __init__(self, model_name="microsoft/swin-large-patch4-window12-384-in22k", num_labels=1):
        super().__init__()
        self.backbone = SwinModel.from_pretrained(model_name)  # load only the feature-extraction backbone
        hidden_size = self.backbone.config.hidden_size  # feature dimension of Swin's final stage
        self.classify = nn.Linear(hidden_size, num_labels)  # linear classification head

    def forward(self, pixel_values, labels=None):
        outputs = self.backbone(pixel_values)  # extract features
        pooled_output = outputs.pooler_output  # (batch_size, hidden_size)
        logits = torch.sigmoid(self.classify(pooled_output))  # probability after sigmoid, shape (batch_size, num_labels)
        loss = None
        if labels is not None:
            cri = BCELoss()
            loss = cri(logits, labels)

        return SwinImageClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
            reshaped_hidden_states=outputs.reshaped_hidden_states,

        )


# Example
model = SwinForClassify(model_name=MODEL_NAME_OR_PATH, num_labels=1)

processor = AutoFeatureExtractor.from_pretrained(MODEL_NAME_OR_PATH)


class ClassifyDataset(Dataset):
    def __init__(self, dataset, processor, transform=None):
        self.dataset = dataset
        self.processor = processor
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image = self.dataset[idx]['image_path']
        image = Image.open(image).convert('RGB')
        image = self.transform(image)

        label = self.dataset[idx]['label']
        if label == '是':
            label = 1
        elif label == '否':
            label = 0
        else:
            raise ValueError(f'{label} not valid.')
        return {"image": image, "label": label}


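# Create the dataset objects. The transform has to be passed in: without it
# self.transform is None and __getitem__ fails when it tries to call it.
train_dataset = ClassifyDataset(train_dataset, processor, transform=transform)
eval_dataset = ClassifyDataset(eval_dataset, processor, transform=transform)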

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",  # output directory
    overwrite_output_dir=True,
    num_train_epochs=10,  # number of epochs
    per_device_train_batch_size=4,  # training batch size per device
    per_device_eval_batch_size=8,  # evaluation batch size per device
    evaluation_strategy="epoch",  # evaluate once per epoch (named eval_strategy in newer transformers releases)
    save_strategy="epoch",  # save once per epoch
    logging_dir='./logs',  # log directory
    logging_steps=10,
    learning_rate=2e-5,  # learning rate
    weight_decay=0.01,  # weight decay
    load_best_model_at_end=True,  # load the best checkpoint when training ends
    remove_unused_columns=False,
    metric_for_best_model='f1',
)


def data_collator(batch):
    images = [item["image"] for item in batch]
    labels = [item["label"] for item in batch]
    inputs = processor(images, return_tensors="pt")
    inputs["labels"] = torch.tensor(labels, dtype=torch.float).reshape(-1,1)
    return inputs


def compute_metrics(eval_preds):
    preds, labels = [], []
    acc, total = 0, 0

    for (pred, label) in zip(eval_preds.predictions, eval_preds.label_ids):
        if pred > 0.5:
            pred = 1
        else:
            pred = 0
        preds.append(pred)
        labels.append(label)

        total += 1
        if pred == label:
            acc += 1
    print(f'ACC: {acc/(total+1e-5)}')
    print(classification_report(y_true=labels, y_pred=preds, ))
    return {"f1": f1_score(y_true=labels, y_pred=preds, average='macro')}


# Define the Trainer
trainer = Trainer(
    model=model,  # model with the pretrained Swin backbone
    args=training_args,  # training arguments
    train_dataset=train_dataset,  # training set
    eval_dataset=eval_dataset,  # validation set
    tokenizer=processor,  # the Swin image processor
    compute_metrics=compute_metrics,
    data_collator=data_collator
)

# Train the model
#trainer.train(resume_from_checkpoint=True)
trainer.train()

# Save the model
trainer.save_model("./final_model")

Inference

import os

import torch
import torch.nn as nn
import tqdm
from PIL import Image
from datasets import Dataset as HFDataset
from sklearn.metrics import classification_report, f1_score
from torch.nn import BCELoss
import torch.nn.functional as F
from torch.utils.data import Dataset
# Load the Swin Transformer model and processor
from transformers import (AutoFeatureExtractor, SwinModel, Trainer,
                                  TrainingArguments, set_seed)
from transformers.models.swin.modeling_swin import SwinImageClassifierOutput
from torchvision import transforms
from safetensors.torch import load_file

set_seed(1)

transform = transforms.Compose([
    transforms.RandomResizedCrop(384),
])


class SwinForClassify(nn.Module):
    def __init__(self, model_name="microsoft/swin-large-patch4-window12-384-in22k", num_labels=1):
        super().__init__()
        self.backbone = SwinModel.from_pretrained(model_name)  # load only the feature-extraction backbone
        hidden_size = self.backbone.config.hidden_size  # feature dimension of Swin's final stage
        self.classify = nn.Linear(hidden_size, num_labels)  # linear classification head

    @torch.no_grad()
    def forward(self, pixel_values, labels=None):
        outputs = self.backbone(pixel_values)  # extract features
        pooled_output = outputs.pooler_output  # (batch_size, hidden_size)
        logits = torch.sigmoid(self.classify(pooled_output))  # probability after sigmoid, shape (batch_size, num_labels)
        loss = None
        if labels is not None:
            cri = BCELoss()
            loss = cri(logits, labels)

        return SwinImageClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
            reshaped_hidden_states=outputs.reshaped_hidden_states,

        )
      
if __name__ == "__main__":
    MODEL_NAME_OR_PATH = './swin-large-patch4-window12-384-in22k/'
    # Example
    model = SwinForClassify(model_name=MODEL_NAME_OR_PATH, num_labels=1)
    state_dict = load_file("./final_model/model.safetensors")
    model.load_state_dict(state_dict)
    model.eval()

    processor = AutoFeatureExtractor.from_pretrained(MODEL_NAME_OR_PATH)

    test_image_dir = "../image-rotate/test_images"
    for image_name in os.listdir(test_image_dir):
        image_path = os.path.join(test_image_dir, image_name)
        image = Image.open(image_path).convert("RGB")
        image = transform(image)
        inputs = processor([image], return_tensors='pt')
        output = model(**inputs).logits[0]
        print(image_name, output > 0.5, output)


Teaching YOLOv11 to Recognize Masks, Gloves, and Coveralls in Seconds: A Hands-On Guide to the CPPE-5 Dataset

2025-08-06

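Let's not beat around the bush with object detection. This post teaches you one thing:

In 3 minutes, learn to detect masks, gloves, and coveralls with YOLOv11, covering industrial, medical, and security scenarios.

It does not dig into how the model works. Instead it walks you step by step through the full training and inference pipeline, using the classic CPPE-5 dataset from HuggingFace, which covers object detection for several kinds of personal protective equipment (PPE).

✳️ What is CPPE-5?

This guide uses the public rishitdagli/cppe-5 dataset, which has the following five labels:

Class label	Meaning
Coverall	Protective suit / coverall
Face_Shield	Face shield
Gloves	Gloves
Goggles	Goggles
Mask	Mask

It contains 1,000 images of real, cluttered scenes, which makes it a good fit for hands-on object detection practice.

🚀 Quick start (end to end)

1. Download the dataset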


wget https://huggingface.co/datasets/rishitdagli/cppe-5/resolve/main/data/test-00000-of-00001.parquet

wget https://huggingface.co/datasets/rishitdagli/cppe-5/resolve/main/data/train-00000-of-00001.parquet

2. Convert to YOLO format

2.1 Preprocessing script


import pathlib

from datasets import load_dataset
from PIL import Image, ImageDraw

# Adjust these paths to wherever the parquet files actually live
cppe5 = load_dataset(
    'parquet',
    data_files={
        "train": str(pathlib.Path(__file__).parent.parent.joinpath("train-00000-of-00001.parquet")),
        "test": str(pathlib.Path(__file__).parent.parent.joinpath("test-00000-of-00001.parquet"))
    }
)
categories = cppe5["train"].features["objects"].feature["category"].names
id2label = {index: x for index, x in enumerate(categories, start=0)}
label2id = {v: k for k, v in id2label.items()}
# ['Coverall', 'Face_Shield', 'Gloves', 'Goggles', 'Mask']
print(categories)


def cppe5_to_yolo_format(examples, data_type='train'):
    # Note: images/train holds the training images and images/test the test images
    # Likewise, labels/train holds the labels for the corresponding images
    pathlib.Path(__file__).parent.joinpath(
        f"images/{data_type}").mkdir(exist_ok=True, parents=True)
    pathlib.Path(__file__).parent.joinpath(
        f"labels/{data_type}").mkdir(exist_ok=True, parents=True)
    for index, sample in enumerate(examples):
        image = sample["image"].convert("RGB")
        yolo_labels = []

        # draw = ImageDraw.ImageDraw(image)
        # for bbox, category in zip(sample["objects"]["bbox"], sample["objects"]["category"]):
        #     x, y, w, h = tuple(bbox)
        #     if max(bbox) > 1.0:
        #         x1, y1 = int(x), int(y)
        #         x2, y2 = int(x + w), int(y + h)
        #     else:
        #         raise NotImplementedError
        #     draw.rectangle((x1, y1, x2, y2), outline="red")
        #     draw.text((x1, y1), str(id2label[category]), fill="red")
        # image.show()

        for bbox, category in zip(sample["objects"]["bbox"], sample["objects"]["category"]):

            x, y, w, h = tuple(bbox)
            if max(bbox) > 1.0:
                x1, y1 = int(x), int(y)
                x2, y2 = int(x + w), int(y + h)
            else:
                raise NotImplementedError
            img_w, img_h = image.size
            x_center = (x1 + x2) / 2.0 / img_w
            y_center = (y1 + y2) / 2.0 / img_h
            width = (x2 - x1) / img_w
            height = (y2 - y1) / img_h
            yolo_labels.append(
                f'{category} {x_center} {y_center} {width} {height}'
            )
        image.save(pathlib.Path(__file__).parent.joinpath(
            f"images/{data_type}/{index}.jpg"))
        with open(pathlib.Path(__file__).parent.joinpath(f"labels/{data_type}/{index}.txt"), "w") as f:
            f.write("\n".join(yolo_labels))


# The function writes files to disk and returns nothing, so there is no result to keep
cppe5_to_yolo_format(cppe5["train"], data_type='train')
cppe5_to_yolo_format(cppe5["test"], data_type='test')
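The core of the conversion is turning a box given as `[x, y, w, h]` in pixels (top-left corner plus size) into YOLO's normalized `x_center y_center width height`. A standalone sketch of just that arithmetic follows. `coco_to_yolo` is a hypothetical helper name, and unlike the script above it does not round the corners to integers first.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert [x, y, w, h] in pixels to normalized (x_center, y_center, width, height)."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_w,  # box center, as a fraction of image width
        (y + h / 2) / img_h,  # box center, as a fraction of image height
        w / img_w,            # box width, as a fraction of image width
        h / img_h,            # box height, as a fraction of image height
    )


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image
print(coco_to_yolo([100, 50, 200, 100], img_w=400, img_h=200))  # (0.5, 0.5, 0.5, 0.5)
```

All four values fall in [0, 1], which is what the label files under labels/ are expected to contain after the class id.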

2.2 Config file `data.yaml`

train: /object_detection/yolo_impl/images/train
val: /object_detection/yolo_impl/images/test
nc: 5
names: ["Coverall", "Face_Shield", "Gloves", "Goggles", "Mask"]

3. Start YOLOv11 training


import os.path

from ultralytics import YOLO

os.environ['WANDB_DISABLED'] = 'true'
os.environ['YOLO_OFFLINE'] = 'true'


model_path = '/object_detection/yolo_impl/yolo11n.pt'
assert os.path.exists(model_path)

model = YOLO(model_path)

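model.train(
    data=os.path.join(os.path.dirname(__file__), 'data.yaml'),
    epochs=100,             # number of epochs
    imgsz=640,              # input image size
    batch=64,               # batch size; adjust to fit GPU memory
    workers=4,              # data-loading workers

    # Optimizer and learning rate
    lr0=0.001,              # initial learning rate
    lrf=0.01,               # final learning rate as a fraction of lr0 (final lr = lr0 * lrf)
    optimizer='SGD',        # optimizer
    weight_decay=0.0005,    # weight decay
    warmup_epochs=3,        # warm-up epochs
    patience=20,            # early-stopping patience, in epochs without improvement

    # Augmentation and regularization
    augment=True,           # enable data augmentation
    mosaic=1.0,
    mixup=0.1,
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    scale=0.5,
    translate=0.1,
    degrees=0.0,            # no rotation

    # Loss weights
    box=7.5,                # box loss weight
    cls=0.5,                # classification loss weight

    iou=0.2,                # IoU threshold for NMS during validation (not a loss term)

    device=0                # use GPU 0

)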

📊 Training results (example):

4. Inference


import pathlib

from ultralytics import YOLO

model = YOLO(pathlib.Path(__file__).parent.joinpath(
    "runs/detect/train/weights/best.pt"
    )
)

results = model.predict(

    source='/object_detection/yolo_impl/images/test',
    save=True

)
# Process the results
for result in results:
    print(f"Objects detected: {len(result.boxes)}")
    for box, conf, cls in zip(result.boxes.xyxy, result.boxes.conf, result.boxes.cls):
        print(f"class: {result.names[int(cls)]}, confidence: {conf:.2f}, box: {box}")

🔍 Example results:

[Figure: prediction result 1, prediction result 2]

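✅ Summary

Congratulations: you now know how to quickly train and run a PPE detection model with YOLOv11. As next steps you can try:

Adding your own industrial or medical images
Tuning the hyperparameters
Deploying the model to edge devices

📬 Questions are welcome in the comments, and if this was useful, a like and a share are appreciated 🙌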


From LLM Self-Directed Exploration to LangGraph Flow Control: Two Paradigms for Deep Research

2025-07-09

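1. What is Deep Research, and why is it worth attention?

Deep Research is a new kind of AI agent capability aimed at in-depth research tasks. Instead of passively answering questions, it explores actively, integrates information from multiple sources, and produces answers that are credible and traceable, in the manner of a researcher.

In short, it lets AI take part in research rather than just chat.

The figures below show the deep research interfaces offered by ChatGPT and Grok:

[Figure: ChatGPT running deep thinking | Grok DeepSearch]

2. Gemini + LangGraph: an end-to-end deep research example

gemini-fullstack-langgraph-quickstart is an official open-source example from Google Gemini. It shows how to build an AI agent capable of multi-round reasoning on top of LangGraph and the Gemini API, with a web page as the front end.

The project ships detailed setup instructions, but this post focuses on the core mechanism behind it: how does LangGraph orchestrate the LLM's reasoning flow?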

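3. Core workflow

[Figure: the AI agent reasoning flow in the official example | the directed graph built with LangGraph]

The core nodes are:

generate_query: generates the initial queries from the user input (at most 3 search keywords).
web_search: for each query, calls the Gemini model with the Google Search API and collects web page summaries.
reflection: analyses the search results and judges whether the information is sufficient or knowledge gaps remain. This reflection step is also done by the Gemini model.
evaluate_research: iterative refinement. If knowledge gaps or insufficient information are found, follow-up queries are generated and the web research and reflection steps are repeated (up to a configured maximum number of loops).
finalize_answer: produces the final answer. Once the research is judged sufficient, the agent uses the Gemini model to combine the collected information into a coherent answer with citations to the web sources.

The flow resembles a chain-of-thought style behaviour of "research on its own, keep asking questions, reflect, reach a conclusion", and LangGraph is what arranges these steps into a reusable graph.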
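Stripped of LangGraph and of any real model, the control flow of these nodes can be sketched in plain Python. This is only an illustration of the loop, not the official example's code: `deep_research` is a hypothetical name, and the four callables stand in for the LLM and search calls.

```python
def deep_research(question, generate_query, web_search, reflect, finalize, max_loops=2):
    """Search, reflect, and loop until the information is judged sufficient.

    generate_query(question)          -> list of query strings
    web_search(query)                 -> summary string
    reflect(question, summaries)      -> (is_sufficient, follow_up_queries)
    finalize(question, summaries)     -> final answer
    All four callables are assumptions made for this sketch.
    """
    queries = generate_query(question)[:3]  # at most 3 initial queries
    summaries = []
    loops = 0
    while True:
        summaries += [web_search(q) for q in queries]
        is_sufficient, follow_ups = reflect(question, summaries)
        loops += 1
        # Stop when satisfied, out of budget, or with nothing new to ask.
        if is_sufficient or loops >= max_loops or not follow_ups:
            break
        queries = follow_ups
    return finalize(question, summaries)
```

In the LangGraph version, the `if` inside the loop becomes a conditional edge, and each callable becomes a node that reads from and writes to shared state.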

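4. Is LangGraph worth learning? I think so

LangGraph offers a more structured, flow-driven way to build AI agents. It is more than a framework: it is a different way of thinking about AI application development:

Explicit node definitions (such as search, reflection, decision)
Explicit control flow (branching, looping, termination)
Easy to extend and debug (the LLM and external tools can be swapped out)

rather than stitching together "multi-turn conversation" logic out of piles of if-else.

5. My simplified implementation (Gemini → DeepSeek, Google Search → SearXNG)

I simplified and adapted the official code:

Replaced the LLM with DeepSeek
Replaced the search service with SearXNG, an open-source metasearch tool
Fixed the "duplicate search" bug in the original example

You can read and run this minimal version; see "LangGraph DeepResearch minimal implementation" in the appendix.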

Appendix

SearXNG

LangGraph Studio

LangGraph DeepResearch minimal implementation

👉 LangGraph DeepResearch minimal implementation: https://gist.github.com/geasyheart/053c65c00edb1c90a1882228944015e9


Orchestrating AI Services with LangGraph

2025-07-03

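What is LangGraph?

LangGraph is a workflow orchestration tool from the LangChain team. Built on the idea of a state machine and combined with LangChain's Agent and Tool architecture, it lets us organise multiple AI components, service calls, conditional branches, and context passing as a graph.

Why giving the model tools is not enough: what we really need is orchestration

In LLM agent systems, some introductory tutorials and frameworks (including early versions of LangChain) encourage users to register all kinds of tools with the model and then say:

"You can call these tools; decide for yourself how to complete the task."

This sounds like the agent showing intelligence, but in practice it pushes all the complexity onto the language model itself, and the cost is high.

Models with few parameters do poorly at autonomous decision-making.
The workflow logic is implicit: decisions live in the prompt and the model weights, whereas what we need is a system that is reliable, reproducible, and maintainable, moving from a complete black box to something controllable in engineering terms.

LangGraph: let the model focus on intelligence and leave the flow to the orchestrator

LangGraph is not meant to replace the language model's ability to think. It lets you decouple the workflow logic from the model, so the model concentrates on the intelligent tasks while complex decisions and state control are managed by LangGraph, much like microservices and a scheduler, each doing its own job.

Responsibility	Owner
Deciding which tool to call	LLM
Controlling where the flow goes	LangGraph
State management	LangGraph
Intelligent processing (understanding / summarising / generating)	LLM
Concurrency, fallback, error handling	LangGraph

Building a bot that tells jokes, checks the weather, and chats

The overall flow is shown in the figure above. Now let's see what the LangGraph implementation looks like.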

import os

from langchain_core.messages import AIMessage, AnyMessage
from langchain_deepseek import ChatDeepSeek
from langgraph.constants import END, START
from langgraph.graph import StateGraph
from pydantic import BaseModel, Field
from typing_extensions import TypedDict

INTENTION_PROMPT = """根据下面用户对话信息判断用户意图，意图有：查询天气、讲笑话、闲聊。结果请以json格式输出。
例如：
对话信息：
    User: 今天天气怎么样？
输出：
    {"intention": "查询天气"}

真实对话信息：
{{messages}}
"""


class IntentionOutput(BaseModel):
    intention: str = Field(description='用户意图')


llm = ChatDeepSeek(
    model="deepseek-chat",  # DeepSeek model name
    api_key=os.getenv("DEEPSEEK_API_KEY"),  # read your own DeepSeek API key from the environment
)


class State(TypedDict):
    messages: list[AnyMessage]
    intention: str


# Initialise the state graph
graph_builder = StateGraph(State)


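def chatbot(state: State):
    """Small talk."""
    reply = AIMessage(content='哦吼，我也不知道聊啥子')
    # Append to the history so that messages stays a list, as State declares
    return {"messages": state["messages"] + [reply]}


def get_weather(state: State):
    """Weather lookup (hard-coded reply for the demo)."""
    reply = AIMessage(content='今天天气晴朗，23度，适合外出打羽毛球。')
    return {"messages": state["messages"] + [reply]}


def joke(state: State):
    """Tell a joke."""
    history_msgs = ""
    for msg in state['messages']:
        history_msgs += f"{msg['role']}:{msg['content']}\n"
    prompt = '请根据用户最新的对话信息讲个笑话：\n' + history_msgs
    output = llm.invoke(prompt)
    return {"messages": state["messages"] + [output]}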


def user_intention(state: State):
    """意图识别"""
    history_msgs = ""
    for msg in state['messages']:
        history_msgs += f"{msg['role']}:{msg['content']}\n"
    prompt = INTENTION_PROMPT.replace("{{messages}}", history_msgs)
    print('意图识别'.center(60, '-'))
    print(prompt)
    structure_llm = llm.with_structured_output(IntentionOutput)
    output = structure_llm.invoke(prompt)
    final_intention = output.intention
    if final_intention not in ("查询天气", "讲笑话", "闲聊"):
        final_intention = '闲聊'
    return {"intention": final_intention}


def intention_conditional_edge(state: State):
    if state['intention'] == '查询天气':
        return 'get_weather'
    elif state['intention'] == '讲笑话':
        return 'joke'
    else:
        return 'chatbot'


# Add the nodes to the state graph

graph_builder.add_node('user_intention', user_intention)
graph_builder.add_node("chatbot", chatbot)
graph_builder.add_node('get_weather', get_weather)
graph_builder.add_node('joke', joke)

graph_builder.add_edge(START, 'user_intention')
graph_builder.add_conditional_edges('user_intention', intention_conditional_edge, [
    'chatbot', 'get_weather', 'joke'])
graph_builder.add_edge('chatbot', END)
graph_builder.add_edge('get_weather', END)
graph_builder.add_edge('joke', END)

# Compile the state graph
graph = graph_builder.compile()

# with open("/tmp/b.png", 'wb') as f:
#     f.write(graph.get_graph().draw_mermaid_png())

if __name__ == "__main__":
    round1 = graph.invoke(
        {"messages": [{"role": "user", "content": "我很开心，讲个好笑的事情吧"}]})
    print(round1)

See the official documentation for more examples.

Visualizing the graph

The graph can be visualized with LangGraph Studio. Steps:

1. Install the required package

# Python >= 3.11 is required.

pip install --upgrade "langgraph-cli[inmem]"

2. Create langgraph.json

The project layout is:

lg_demo/
  - chat_demo.py (the code above)
  - langgraph.json

Contents of langgraph.json:

{
  "dependencies": [
    "."
  ],
  "graphs": {
    "agent": "./chat_demo.py:graph"
  }
}

3. Run

DEEPSEEK_API_KEY=<your key> langgraph dev

This prints the following:


INFO:langgraph_api.cli:

        Welcome to

╦  ┌─┐┌┐┌┌─┐╔═╗┬─┐┌─┐┌─┐┬ ┬
║  ├─┤││││ ┬║ ╦├┬┘├─┤├─┘├─┤
╩═╝┴ ┴┘└┘└─┘╚═╝┴└─┴ ┴┴  ┴ ┴

- 🚀 API: http://127.0.0.1:2024
- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- 📚 API Docs: http://127.0.0.1:2024/docs

This in-memory server is designed for development and testing.
For production use, please use LangGraph Cloud.

If the browser cannot open https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024, try Firefox.

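4. Example

The figure below shows the resulting visualization, with every node and its edges (from LangGraph Studio):

5. Calling from a client

You can also call the graph through the Python SDK. Example: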


from langgraph_sdk import get_sync_client

client = get_sync_client(url="http://localhost:2024")

for chunk in client.runs.stream(
    None,  # Threadless run
    "agent",  # Name of assistant. Defined in langgraph.json.
    input={
        "messages": [
            {
                "role": "user",
                "content": "今天南京天气好吗？适合外出吗？",
            },
            {
                "role": "assistant",
                "content": "今天天气晴朗，23度，适合外出打羽毛球。"
            },
            {
                "role": "user",
                "content": "那你给我讲个天气相关的笑话吧"
            }
        ],
    },
    stream_mode="messages",
):
    print(f"Receiving new event of type: {chunk.event}...")
    print(chunk.data)
    print("\n\n")

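Personal take

For a demo, the visualization described above is worth trying.
For integration into an existing service, LangGraph can be used on its own purely for orchestration.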


Steel Quality Inspection

2025-02-28


The Root of the Aha Moment: High-Quality Reasoning Data

2025-02-08

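Preface

Over the Lunar New Year holiday, deepseek-R1 broke into the mainstream: media outlets covered it relentlessly, and the tech community produced article after article explaining how it works. This post looks at it from one angle only, "high-quality data".

Why do we need high-quality datasets?

Isn't that obvious? Without a high-quality dataset you cannot train a high-quality model. That answer is perfectly valid. Judging by current capabilities, large models already match, and in some respects exceed, what most people know; that is the first reason. The second is that in a vertical domain or a specific business, more real data that fits the task more closely yields better results and can cut the required parameter count by an order of magnitude, which is the main reason downstream applications of large models are practical.

The third reason: we cannot build a real environment for a large model to interact with, where it could learn through continual trial and error. So we have to tell the model, in the simplest and most direct way, how humans understand all kinds of problems, the laws of nature, and so on: "this is the answer". That is how SFT came about. Other articles, when they introduce SFT, like to explain it as teaching the model to interact with humans. I think the reason given here is equally important: even if models evolve into embodied intelligence, human knowledge will remain a major part of them, so interaction with humans still cannot be avoided.

What is a high-quality dataset?

That naturally leads here: what a high-quality dataset might contain, and what kind of high-quality dataset is most likely to unlock a large model's abilities (the aha moment, or emergent ability). The aha moment refers to thinking before answering (think step by step), which produces excellent results on reasoning problems such as math and code. Emergent ability appears in the step from the pretrained model to the SFT model: a small amount of annotated QA is enough to make the pretrained model answer user questions in the way humans interact. These are two possible forms a high-quality dataset can take, and they are the new capabilities that OpenAI O1 and deepseek have brought. I believe what multimodal models and agent interaction bring will be the next direction of development.

Below is a conversation demo showing the thinking ability a model gains after SFT on this kind of reasoning dataset.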


A Brief Look at Llava

2024-09-02

Introduction

Llava is a multimodal large model. This post walks through it briefly using the code below.

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # noqa

import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration, LlavaConfig


config = LlavaConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")


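# Built from the config only, so the weights are randomly initialised;
# that is enough to inspect the architecture, not to get meaningful text
model = LlavaForConditionalGeneration(config=config)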

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=prompt, images=image, return_tensors="pt")

generate_ids = model.generate(**inputs, max_new_tokens=15)
processor.batch_decode(generate_ids, skip_special_tokens=True,
                       clean_up_tokenization_spaces=False)[0]

print(model)

Model structure


LlavaForConditionalGeneration(
  (vision_tower): CLIPVisionModel(
    (vision_model): CLIPVisionTransformer(
      (embeddings): CLIPVisionEmbeddings(
        (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
        (position_embedding): Embedding(577, 1024)
      )
      (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (encoder): CLIPEncoder(
        (layers): ModuleList(
          (0-23): 24 x CLIPEncoderLayer(
            (self_attn): CLIPAttention(
              (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (mlp): CLIPMLP(
              (activation_fn): QuickGELUActivation()
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            )
            (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          )
        )
      )
      (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    )
  )
  (multi_modal_projector): LlavaMultiModalProjector(
    (linear_1): Linear(in_features=1024, out_features=4096, bias=True)
    (act): GELUActivation()
    (linear_2): Linear(in_features=4096, out_features=4096, bias=True)
  )
  (language_model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32064, 4096)
      (layers): ModuleList(
        (0-31): 32 x LlamaDecoderLayer(
          (self_attn): LlamaSdpaAttention(
            (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
            (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
            (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
            (act_fn): SiLU()
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
    (lm_head): Linear(in_features=4096, out_features=32064, bias=False)
  )
)

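There are three parts:

CLIPVision handles the image
Llama handles the text
multi_modal_projector projects the image hidden_size to the same dimension as Llama.

Data processing

Images go through the CLIP preprocessing pipeline and are resized to 336*336, so pixel_values has shape (3, 336, 336); nothing else is special. Text goes through the Llama tokenizer, which is familiar territory.

Aligning visual and text

The image goes through the ViT with kernel_size 14 (and stride 14, as in the model dump above), so the computation and result are: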

(336-14)/14+1 = 24
24 * 24 = 576
# Counting the CLIP CLS token as well gives 577.

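The CLIP output dimension is 1024; after multi_modal_projector the shape is (1,576,4096), which is also the shape of image_features below.
At this point both modalities share the same dimension, 4096.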
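The patch arithmetic can be checked with a few lines of Python. `vit_token_count` is a hypothetical helper written for this check; it assumes a square image and a patch convolution whose stride equals its kernel size, as in the `patch_embedding` layer of the model dump.

```python
def vit_token_count(image_size, patch_size, with_cls=False):
    """Number of tokens a ViT produces for a square image."""
    per_side = (image_size - patch_size) // patch_size + 1  # conv output size, no padding
    return per_side * per_side + (1 if with_cls else 0)


print(vit_token_count(336, 14))                 # 576 patch tokens
print(vit_token_count(336, 14, with_cls=True))  # 577, matching Embedding(577, 1024)
```

The 577 agrees with the `position_embedding` size in the model structure, and the 576 with the second dimension of the projected image features.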

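Where the image is inserted

The original prompt is:

prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"

<image> is inserted at a specified position, so image-text alignment here takes on an extra meaning compared with earlier alignment: besides aligning image and text, the position at which the image is inserted also has to be taken into account.

From the _merge_input_ids_with_image_features function in the figure above, together with the other information shown there, the following conclusion is easy to reach:

(final_embedding[:, 5:576+5, :] == image_features[:, :, :]).all() # > tensor(True)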
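The effect of the merge can be illustrated with plain lists standing in for embeddings. This is a simplified sketch, not the library function: `merge_image_features` is a hypothetical name, and the real `_merge_input_ids_with_image_features` also deals with padding, attention masks, labels, and multiple images.

```python
def merge_image_features(token_embeds, image_index, image_features):
    """Replace the single <image> placeholder with all image feature vectors."""
    return token_embeds[:image_index] + image_features + token_embeds[image_index + 1:]


# Strings stand in for embedding vectors; the placeholder sits at index 5,
# as in the check above. Three image features are used instead of 576.
tokens = ["t0", "t1", "t2", "t3", "t4", "<image>", "t6"]
image_features = ["img0", "img1", "img2"]
merged = merge_image_features(tokens, 5, image_features)
print(merged[5:5 + len(image_features)] == image_features)  # True
print(len(merged))  # 9 = 7 - 1 + 3
```

With 576 image features, the slice `5:576+5` is exactly the span that the placeholder expands into, which is what the comparison above confirms.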

Since the downstream target tasks are things like VQA and image captioning, this post stops here.


Multimodal Models: Traditional Chinese in Different Layouts

2024-08-19

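Preface

A recent requirement was to extract information from Traditional Chinese text in different layout formats, so I surveyed the options, from the traditional pipeline of layout analysis + text detection and recognition + reading order + NLP to today's multimodal large models.

Here a demo of text-region detection plus recognition gives a direct feel for what a multimodal large model produces.

Summary

LVLMs (large vision language models) are at least one generation behind LLMs.
LVLMs are more challenging than LLMs. 1) Multimodal information fusion. 2) Judging by results, training time and how fast the loss drops. 3) High-resolution images: to handle images of different sizes, CLIP, for example, originally patched a fixed 224*224 input, Qwen-VL-Chat uses 448, and MiniCPM-V-2_6 computes the slicing dynamically so that slices stay close to the original SigLIP input size and less quality is lost to rescaling. Query embeddings are also introduced to keep the input sequence for high-resolution images from getting too long; see Perceiver Sampler and LlaVA-UHD.
Repeated generation is more pronounced. When I tried ByteDance's Doubao, a question involving a batch-processing scenario triggered it: the model "crashed" and kept repeating one passage of text. Other LLMs have this problem too, just less often than LVLMs. On layouts the model has not seen, or on more complex ones, errors are more likely.
Inference speed. This follows the industry as a whole, though: first-generation LLMs were not very fast either, and only later did inference acceleration frameworks such as VLLM and other techniques appear.

Attachments

Note: the real cases are more complex than those below; this is only a simple demo.

Simplified Chinese, horizontal layout (generalizes even without such training data; the model itself has OCR ability):

Handwriting (still essentially horizontal):

Traditional Chinese, vertical layout:

For more complex layouts, generalization is poor; the official explanation is that the original training corpus contained few such examples.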

