Flink Agents in Practice: Intelligent Analysis of Product Reviews


I. Scenario

1. Process a stream of product reviews; a single agent extracts a satisfaction score (1-5) and the reasons for dissatisfaction from each review.

2. Detect reviews with low scores caused by shipping problems and promptly notify the shipping manager.

3. Aggregate the per-review analysis results over a time window into a product-level summary report (score distribution and common complaints).

4. Generate product improvement suggestions from the summary report.

5. Output both the review analysis results and the improvement suggestions as real-time streams.

II. Environment Setup

1. Apache Flink is installed (apache-flink==1.20.3); if not, follow the official Flink documentation.

2. Python 3.11 is installed; if not, follow the official Python documentation.

3. Ollama is installed, with the models used by the two agents below (qwen3:8b and qwen3:4b) pulled locally.
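
Optionally, verify that the local Ollama server is reachable and that the models have been pulled. A minimal sketch, assuming the ollama Python client is available in your environment:

import ollama

# Lists the models available on the local Ollama server;
# the qwen3 models pulled above should appear in the output.
print(ollama.list())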

4. Install flink-agents==0.1.0:

pip install flink-agents==0.1.0

5. Set the PYTHONPATH environment variable to the directory where flink-agents is installed (the site-packages directory that contains the flink_agents package).
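
If you are unsure where pip placed the package, one way to locate that directory (assuming flink-agents was installed into the current Python interpreter) is:

# Prints the site-packages directory that contains the flink_agents package;
# export this path as PYTHONPATH.
python -c "import flink_agents, os; print(os.path.dirname(flink_agents.__path__[0]))"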

6. Copy the Flink Agents distribution JAR into the lib directory of your Flink installation:

# Copy the JAR from the Python package to Flink's lib directory
cp $PYTHONPATH/flink_agents/lib/flink-agents-dist-0.1.0.jar $FLINK_HOME/lib/

7. Start the Flink cluster:

# Start your Flink cluster
$FLINK_HOME/bin/start-cluster.sh

III. Implementation

1. Creating the review analysis agent (review_analysis_agent.py)

import json
import logging

from flink_agents.api.agent import Agent
from flink_agents.api.prompts.prompt import Prompt
from flink_agents.api.decorators import (action, chat_model_setup, prompt, tool,)
from flink_agents.api.resource import ResourceDescriptor
from flink_agents.api.events.chat_event import ChatRequestEvent, ChatResponseEvent
from flink_agents.api.events.event import InputEvent, OutputEvent
from flink_agents.api.runner_context import RunnerContext
from flink_agents.api.chat_message import ChatMessage, MessageRole
from flink_agents.integrations.chat_models.ollama_chat_model import (
    OllamaChatModelSetup,
)
# Custom types and resources shared by the example (defined in section 4 below).
from custom_types_and_resources import (
    ProductReview,
    review_analysis_prompt,
    notify_shipping_manager,
    ProductReviewAnalysisRes,
    ollama_server_descriptor,
)

# Review analysis agent
class ReviewAnalysisAgent(Agent):
    """An agent that uses a large language model (LLM) to analyze product reviews
    and generate a satisfaction score and potential reasons for dissatisfaction.
    This agent receives a product review and produces a satisfaction score and a list of reasons for dissatisfaction. It handles prompt construction, LLM interaction, and output parsing.
    """

    @prompt
    @staticmethod
    def review_analysis_prompt() -> Prompt:
        """Prompt for review analysis."""
        return review_analysis_prompt

    @tool
    @staticmethod
    def notify_shipping_manager(id: str, review: str) -> None:
        """Notify the shipping manager when product received a negative review due to shipping damage.
        Parameters
        ----------
        id : str
            The id of the product that received a negative review due to shipping damage
        review: str
            The negative review content
        """
        # reuse the declared function, but for parsing the tool metadata, we write doc
        # string here again.
        notify_shipping_manager(id=id, review=review)

    @chat_model_setup
    @staticmethod
    def review_analysis_model() -> ResourceDescriptor:
        """ChatModel which focus on review analysis."""
        return ResourceDescriptor(
            clazz=OllamaChatModelSetup,
            connection="ollama_server",
            model="qwen3:4b",
            prompt="review_analysis_prompt",
            tools=["notify_shipping_manager"],
            extract_reasoning=True,
        )

    @action(InputEvent)
    @staticmethod
    def process_input(event: InputEvent, ctx: RunnerContext) -> None:
        """Process input event and send chat request for review analysis."""
        input: ProductReview = event.input
        ctx.short_term_memory.set("id", input.id)

        content = f"""
            "id": {input.id},
            "review": {input.review}
        """
        msg = ChatMessage(role=MessageRole.USER, extra_args={"input": content})
        ctx.send_event(ChatRequestEvent(model="review_analysis_model", messages=[msg]))

    @action(ChatResponseEvent)
    @staticmethod
    def process_chat_response(event: ChatResponseEvent, ctx: RunnerContext) -> None:
        """Process chat response event and send output event."""
        try:
            json_content = json.loads(event.response.content)
            ctx.send_event(
                OutputEvent(
                    output=ProductReviewAnalysisRes(
                        id=ctx.short_term_memory.get("id"),
                        score=json_content["score"],
                        reasons=json_content["reasons"],
                    )
                )
            )
        except Exception:
            logging.exception(
                f"Error processing chat response {event.response.content}"
            )
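
To make the parsing step concrete, here is a minimal standalone sketch (with hypothetical values) of the JSON the model is asked to return and how process_chat_response maps it into the output type:

import json

from custom_types_and_resources import ProductReviewAnalysisRes

# Hypothetical model reply following the format required by review_analysis_prompt.
raw = '{"id": "12345", "score": 2, "reasons": ["shipping damage", "poor packaging"]}'
parsed = json.loads(raw)

# Mirrors what process_chat_response does with a real ChatResponseEvent.
res = ProductReviewAnalysisRes(id="12345", score=parsed["score"], reasons=parsed["reasons"])
print(res.model_dump())  # {'id': '12345', 'score': 2, 'reasons': ['shipping damage', 'poor packaging']}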

2. Aggregating the score distribution and dissatisfaction reasons (scores_reasons_aggregator.py)

from pyflink.datastream import ProcessWindowFunction
from typing import Iterable
from custom_types_and_resources import ProductReviewSummary, ProductReviewAnalysisRes


class AggregateScoreDistributionAndDislikeReasons(ProcessWindowFunction):
    """Aggregate score distribution and dislike reasons."""

    def process(
        self,
        key: str,
        context: "ProcessWindowFunction.Context",
        elements: Iterable[ProductReviewAnalysisRes],
    ) -> Iterable[ProductReviewSummary]:
        """Aggregate score distribution and dislike reasons."""
        rating_counts = [0 for _ in range(5)]
        reason_list = []
        for element in elements:
            rating = element.score
            if 1 <= rating <= 5:
                rating_counts[rating - 1] += 1
            reason_list = reason_list + element.reasons
        total = sum(rating_counts)
        # Guard against a window whose elements all had out-of-range scores.
        percentages = [
            round((x / total) * 100, 1) if total else 0.0 for x in rating_counts
        ]
        formatted_percentages = [f"{p}%" for p in percentages]
        return [
            ProductReviewSummary(
                id=key,
                score_hist=formatted_percentages,
                unsatisfied_reasons=reason_list,
            )
        ]
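
The windowing itself only runs inside a Flink job, but the core of the aggregation can be sanity-checked standalone. A minimal sketch with hypothetical analysis results for a single window:

from custom_types_and_resources import ProductReviewAnalysisRes

# Hypothetical analysis results for one product within a single window.
elements = [
    ProductReviewAnalysisRes(id="1", score=5, reasons=[]),
    ProductReviewAnalysisRes(id="1", score=2, reasons=["shipping damage"]),
    ProductReviewAnalysisRes(id="1", score=1, reasons=["poor quality"]),
]

rating_counts = [0] * 5
reason_list = []
for e in elements:
    if 1 <= e.score <= 5:
        rating_counts[e.score - 1] += 1
    reason_list += e.reasons

total = sum(rating_counts)
print([f"{round(c / total * 100, 1)}%" for c in rating_counts])
# ['33.3%', '33.3%', '0.0%', '0.0%', '33.3%']
print(reason_list)
# ['shipping damage', 'poor quality']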

3. Creating the product suggestion agent (product_suggestion_agent.py)

import json
import logging

from flink_agents.api.agent import Agent
from flink_agents.api.chat_message import ChatMessage, MessageRole
from flink_agents.api.decorators import (
    action,
    chat_model_setup,
    prompt,
)
from flink_agents.api.events.chat_event import ChatRequestEvent, ChatResponseEvent
from flink_agents.api.events.event import InputEvent, OutputEvent
from flink_agents.api.prompts.prompt import Prompt
from flink_agents.api.resource import ResourceDescriptor
from flink_agents.api.runner_context import RunnerContext
from flink_agents.integrations.chat_models.ollama_chat_model import (
    OllamaChatModelSetup,
)

from custom_types_and_resources import (
    ProductSuggestion,
    ProductReviewSummary,
    product_suggestion_prompt,
)



class ProductSuggestionAgent(Agent):
    """An agent that uses a large language model (LLM) to generate actionable product
    improvement suggestions from aggregated product review data.

    This agent receives a summary of product reviews, including a rating distribution
    and a list of user dissatisfaction reasons, and produces concrete suggestions for
    product enhancement. It handles prompt construction, LLM interaction, and output
    parsing.
    """

    @prompt
    @staticmethod
    def generate_suggestion_prompt() -> Prompt:
        """Generate product suggestions based on the rating distribution and user
        dissatisfaction reasons.
        """
        return product_suggestion_prompt

    @chat_model_setup
    @staticmethod
    def generate_suggestion_model() -> ResourceDescriptor:
        """ChatModel which focus on generating product suggestions."""
        return ResourceDescriptor(
            clazz=OllamaChatModelSetup,
            connection="ollama_server",
            model="qwen3:8b",
            prompt="generate_suggestion_prompt",
            extract_reasoning=True,
        )

    @action(InputEvent)
    @staticmethod
    def process_input(event: InputEvent, ctx: RunnerContext) -> None:
        """Process input event."""
        input: ProductReviewSummary = event.input
        ctx.short_term_memory.set("id", input.id)
        ctx.short_term_memory.set("score_hist", input.score_hist)

        content = f"""
            "id": {input.id},
            "score_histogram": {input.score_hist},
            "unsatisfied_reasons": {input.unsatisfied_reasons}
        """
        ctx.send_event(
            ChatRequestEvent(
                model="generate_suggestion_model",
                messages=[
                    ChatMessage(role=MessageRole.USER, extra_args={"input": content})
                ],
            )
        )

    @action(ChatResponseEvent)
    @staticmethod
    def process_chat_response(event: ChatResponseEvent, ctx: RunnerContext) -> None:
        """Process chat response event."""
        try:
            json_content = json.loads(event.response.content)
            ctx.send_event(
                OutputEvent(
                    output=ProductSuggestion(
                        id=ctx.short_term_memory.get("id"),
                        score_hist=ctx.short_term_memory.get("score_hist"),
                        suggestions=json_content["suggestions"],
                    )
                )
            )
        except Exception:
            logging.exception(
                f"Error processing chat response {event.response.content}"
            )
            # To fail the agent, you can raise an exception here.

4. Custom data types and resources (custom_types_and_resources.py)

from typing import List

from pydantic import BaseModel

from flink_agents.api.chat_message import ChatMessage, MessageRole
from flink_agents.api.prompts.prompt import Prompt
from flink_agents.api.resource import ResourceDescriptor
from flink_agents.integrations.chat_models.ollama_chat_model import (
    OllamaChatModelConnection,
)

# Prompt for review analysis agent.
review_analysis_system_prompt_str = """
    Analyze the user review and product information to determine a
    satisfaction score (1-5) and potential reasons for dissatisfaction.

    Example input format:
    {{
        "id": "12345",
        "review": "The headphones broke after one week of use. Very poor quality."
    }}

    Ensure your response can be parsed by Python JSON, using this format as an example:
    {{
     "id": "12345",
     "score": 1,
     "reasons": [
       "poor quality"
       ]
    }}

    Please note that if a product review includes dissatisfaction with the shipping process,
    you should first notify the shipping manager using the appropriate tools. After executing
    the tools, strictly follow the example above to provide your score and reason — there is
    no need to disclose whether the tool was used.
    """

review_analysis_prompt = Prompt.from_messages(
    messages=[
        ChatMessage(
            role=MessageRole.SYSTEM,
            content=review_analysis_system_prompt_str,
        ),
        # The {input} placeholder below is filled at runtime; the action is
        # responsible for serializing the input element into that text itself.
        ChatMessage(
            role=MessageRole.USER,
            content="""
            "input":
            {input}
            """,
        ),
    ],
)


# Prompt for product suggestion agent.
product_suggestion_prompt_str = """
        Based on the rating distribution and user dissatisfaction reasons, generate three actionable suggestions for product improvement.

        Input format:
        {{
            "id": "1",
            "score_histogram": ["10%", "20%", "10%", "15%", "45%"],
            "unsatisfied_reasons": ["reason1", "reason2", "reason3"]
        }}

        Ensure that your response can be parsed by Python JSON, using the following format as an example:
        {{
            "suggestions": [
                "suggestion1",
                "suggestion2",
                "suggestion3"
            ]
        }}

        input:
        {input}
        """

product_suggestion_prompt = Prompt.from_text(product_suggestion_prompt_str)


# Tool for notifying the shipping manager. For simplicity, just print the message.
def notify_shipping_manager(id: str, review: str) -> None:
    """Notify the shipping manager when product received a negative review due to
    shipping damage.

    Parameters
    ----------
    id : str
        The id of the product that received a negative review due to shipping damage
    review: str
        The negative review content
    """
    content = (
        f"Transportation issue for product [{id}], the customer feedback: {review}"
    )
    print(content)
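
# Example: notify_shipping_manager(id="12345", review="The box arrived crushed.")
# would print:
#   Transportation issue for product [12345], the customer feedback: The box arrived crushed.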


# Custom types used for product suggestion agent
class ProductReviewSummary(BaseModel):
    """Aggregates multiple reviews and insights using LLM for a product.

    Attributes:
        id (str): The unique identifier of the product.
        score_hist (List[str]): The percentage of reviews at each score (1-5).
        unsatisfied_reasons (List[str]): A list of reasons or insights generated by LLM
            to explain the rating.
    """

    id: str
    score_hist: List[str]
    unsatisfied_reasons: List[str]

class ProductSuggestion(BaseModel):
    """Provides a summary of review data including suggestions for improvement.

    Attributes:
        id (str): The unique identifier of the product.
        score_hist (List[str]): The percentage of reviews at each score (1-5).
        suggestions (List[str]): Suggestions or recommendations generated as a result of
            review analysis.
    """

    id: str
    score_hist: List[str]
    suggestions: List[str]


# custom types for review analysis agent.
class ProductReview(BaseModel):
    """Data model representing a product review.

    Attributes:
    ----------
    id : str
        The unique identifier for the product being reviewed.
    review : str
        The review of the product.
    """

    id: str
    review: str

class ProductReviewAnalysisRes(BaseModel):
    """Data model representing analysis result of a product review.

    Attributes:
    ----------
    id : str
        The unique identifier for the product being reviewed.
    score : int
        The satisfaction score given by the reviewer.
    reasons : List[str]
        A list of reasons provided by the reviewer for dissatisfaction, if any.
    """

    id: str
    score: int
    reasons: List[str]


# ollama chat model connection descriptor
ollama_server_descriptor = ResourceDescriptor(
    clazz=OllamaChatModelConnection, request_timeout=120
)
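
These pydantic models also define the wire format of the demo input: each line of the source file read in the next section is one serialized ProductReview. A minimal sketch for generating such a file (the file name is illustrative; the resources directory matches the path used by the file source below):

from custom_types_and_resources import ProductReview

# Hypothetical review records; each line written below is what the file source expects.
reviews = [
    ProductReview(id="1", review="Great sound quality, battery lasts all day."),
    ProductReview(id="1", review="The box arrived crushed and one earcup was cracked."),
]

with open("resources/product_reviews.txt", "w") as f:  # illustrative file name
    for r in reviews:
        f.write(r.model_dump_json() + "\n")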

5. Integrating the agents with the Flink streaming pipeline

from pathlib import Path

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.common import Duration, WatermarkStrategy
from pyflink.common.time import Time
from pyflink.datastream.connectors.file_system import FileSource, StreamFormat
from pyflink.datastream.window import TumblingProcessingTimeWindows

from flink_agents.api.execution_environment import AgentsExecutionEnvironment

from custom_types_and_resources import (
    ollama_server_descriptor,
    ProductReview, 
)
from review_analysis_agent import ReviewAnalysisAgent
from scores_reasons_aggregator import AggregateScoreDistributionAndDislikeReasons
from product_suggestion_agent import ProductSuggestionAgent


base_dir = Path(__file__).parent.parent

def main():
    # Set up the Flink streaming environment and the Agents execution environment.
    env = StreamExecutionEnvironment.get_execution_environment()
    agents_env = AgentsExecutionEnvironment.get_execution_environment(env)

    # Add Ollama chat model connection to be used by the ReviewAnalysisAgent and ProductSuggestionAgent.
    agents_env.add_resource(
        "ollama_server",
        ollama_server_descriptor,
    )
    
    # Read product reviews from a text file as a streaming source.
    # Each line in the file should be a JSON string representing a ProductReview.
    product_review_stream = env.from_source(
        source=FileSource.for_record_stream_format(
            StreamFormat.text_line_format(), f"file:///{base_dir}/resources"
        )
        .monitor_continuously(Duration.of_minutes(1))
        .build(),
        watermark_strategy=WatermarkStrategy.no_watermarks(),
        source_name="streaming_agent_example",
    ).map(
        lambda x: ProductReview.model_validate_json(x)  # Deserialize JSON to ProductReview.
    )
    
    # Use the ReviewAnalysisAgent to analyze each product review.
    review_analysis_res_stream = (
        agents_env.from_datastream(
            input=product_review_stream, key_selector=lambda x: x.id
        )
        .apply(ReviewAnalysisAgent())
        .to_datastream()
    )

    # Aggregate the analysis results in 1-minute tumbling windows.
    # This produces a score distribution and collects all unsatisfied reasons for each
    # product.
    aggregated_analysis_res_stream = (
        review_analysis_res_stream.key_by(lambda x: x.id)
        .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
        .process(AggregateScoreDistributionAndDislikeReasons())
    )

    # Use the ProductSuggestionAgent (LLM) to generate product improvement suggestions
    # based on the aggregated analysis results.
    product_suggestion_res_stream = (
        agents_env.from_datastream(
            input=aggregated_analysis_res_stream,
            key_selector=lambda x: x.id,
        )
        .apply(ProductSuggestionAgent())
        .to_datastream()
    )

    # Print the final product improvement suggestions to stdout.
    product_suggestion_res_stream.print()

    # Execute the pipeline.
    agents_env.execute()

if __name__ == "__main__":
    main()

IV. Summary

In this article, we built a review analysis agent and a product suggestion agent with Flink Agents and integrated them with the Flink engine, achieving real-time analysis of product reviews and real-time output of product improvement suggestions.

Through this example, we have become familiar with the basic building blocks of Flink Agents, laying a solid foundation for developing event-driven agent systems for more complex scenarios on top of Flink Agents.