初步探索 superset新增导出excel功能

1,730 阅读5分钟

一、环境

  • Python 3.8.10
  • Node v16.16.0
  • Windows 10 专业版 64 位操作系统
  • Superset 1.5

二、前端操作步骤

我们可以参照csv的导出方式,去扩展excel。以以下页面为例

2.1 确定导出csv的地方

2.1.1查找样式

发现有个chart-slice的类名,以这个关键词进行全局搜索

发现Chart下有包含

2.1.2 找到渲染导出csv的button组件

进入chart组件可以看到关联组件SliceHeader

按照查找思路SliceHeader --> SliceHeaderControls,该文件下全局搜索CSV,可以确定csv的button组件导出位置

2.2 写入导出EXCEL组件

\superset-1.5\superset-frontend\src\dashboard\components\SliceHeaderControls\index.tsx

2.2.1 添加导出菜单

目前可以看菜单按钮有两块

{this.props.slice.viz_type !== 'filter_box' &&
    this.props.supersetCanCSV && (
    <Menu.Item key={MENU_KEYS.EXPORT_CSV}>{t('Export CSV')}</Menu.Item>
)}
          

{this.props.slice.viz_type !== 'filter_box' &&
  isFeatureEnabled(FeatureFlag.ALLOW_FULL_CSV_EXPORT) &&
  this.props.supersetCanCSV &&
  isTable && (
    <Menu.Item key={MENU_KEYS.EXPORT_FULL_CSV}>
      {t('Export full CSV')}
    </Menu.Item>
  )}

依葫芦画瓢,在后面新增两个excel导出按钮

{this.props.slice.viz_type !== 'filter_box' &&
  this.props.supersetCanExcel && (
    <Menu.Item key={MENU_KEYS.EXPORT_EXCEL}>{t('Export EXCEL')}</Menu.Item>
  )}
{this.props.slice.viz_type !== 'filter_box' &&
    isFeatureEnabled(FeatureFlag.ALLOW_FULL_EXCEL_EXPORT) &&
    this.props.supersetCanExcel &&
    isTable && (
      <Menu.Item key={MENU_KEYS.EXPORT_FULL_EXCEL}>
        {t('Export full EXCEL')}
      </Menu.Item>
  )}

2.2.2 添加相应字段

此时应该有很多未定义报错,所以需要添加以下代码

找到MENU_KEYS,在末尾添加

EXPORT_FULL_CSV: 'export_full_csv',
EXPORT_FULL_EXCEL: 'export_full_excel',

找到\superset-1.5\superset-frontend\packages\superset-ui-core\src\utils\featureFlags.ts中的FeatureFlag,在末尾添加

 ALLOW_FULL_EXCEL_EXPORT = 'ALLOW_FULL_EXCEL_EXPORT',

2.2.3 消除props不通过的校验

现在还有些问题是因为有些属性没有声明,props校验不通过,所以需要添加属性。目前有三个属性未添加supersetCanExcel、exportFullExcel、exportExcel。

  • \superset-1.5\superset-frontend\src\dashboard\components\SliceHeaderControls\index.tsx

找到export interface SliceHeaderControlsProps,在末尾添加

exportExcel?: (sliceId: number) => void;
exportFullExcel?: (sliceId: number) => void;
supersetCanExcel?: boolean;
  • \superset-1.5\superset-frontend\src\dashboard\components\SliceHeader\index.tsx

找到const SliceHeader: FC,在末尾添加

exportExcel = () => ({}),
exportFullExcel,
supersetCanExcel = false,
  • superset-1.5\superset-frontend\src\dashboard\containers\Chart.jsx

在supersetCanCSV: !!dashboardInfo.superset_can_csv, 后面新增

supersetCanExcel: !!dashboardInfo.superset_can_excel,
  • \superset-1.5\superset-frontend\src\dashboard\actions\hydrate.js

由于数据源来自于dashboardInfo,所以我们找到来源为hydrate.js,此时发现这个步骤牵扯到权限问题,目前还未实现,所以我们暂时写死,

// superset_can_excel: findPermission('can_excel', 'Superset', roles),
  superset_can_excel: true,

2.3 添加Excel导出方法

2.3.1 \superset-1.5\superset-frontend\src\dashboard\components\gridComponents\Chart.jsx

现在需要新增导出excel方法,同样参照exportCSV添加以下代码

exportExcel(isFullExcel = false) {
    this.props.logEvent(LOG_ACTIONS_EXPORT_EXCEL_DASHBOARD_CHART, {
      slice_id: this.props.slice.slice_id,
      is_cached: this.props.isCached,
    });
    exportChart({
      formData: isFullExcel
      ? { ...this.props.formData, row_limit: this.props.maxRows }
      : this.props.formData,
      // resultType: 'results',
      resultType: 'full',
      resultFormat: 'excel',
    });
}

  exportFullExcel() {
    this.exportExcel(true);
  }

2.3.2 \superset-1.5\superset-frontend\src\logger\LogUtils.ts

需要添加

  export const LOG_ACTIONS_EXPORT_EXCEL_DASHBOARD_CHART =
  'export_excel_dashboard_chart'; 

2.4 修改exportChart相关导出方法

exportChart是主要的导出csv的方法

export const exportChart = ({
  formData,
  resultFormat = 'json',
  resultType = 'full',
  force = false,
  ownState = {},
}) => {
  let url;
  let payload;
  if (shouldUseLegacyApi(formData)) {
    const endpointType = getLegacyEndpointType({ resultFormat, resultType });
    url = getExploreUrl({
      formData,
      endpointType,
      allowDomainSharding: false,
    });
    payload = formData;
  } else {
    url = '/api/v1/chart/data';
    payload = buildV1ChartDataPayload({
      formData,
      force,
      resultFormat,
      resultType,
      ownState,
    });
  }
  console.log("url:", url, payload);
  postForm(url, payload);
};

shouldUseLegacyApi这个函数主要是根据useLegacyApi来判断调用哪个接口,useLegacyApi是在图表注册时就已经确定好了。里面有两个函数,getLegacyEndpointType(生成最终类型的函数), getExploreUrl(生成导出url的函数)。

2.4.1 修改getLegacyEndpointType

export const getLegacyEndpointType = ({ resultType, resultFormat }) =>
  (resultFormat === 'csv' || resultFormat === 'excel' ) ? resultFormat : resultType;

2.4.2 修改 getExploreUrl

找到

if (endpointType === 'csv') {
    search.csv = 'true';
  }

在后面新增

if(endpointType === 'excel') {
    search.excel = 'true';
}

前端已经修改完毕,最终修改结果

三、后端操作步骤

3.1 下载openpyxl

安装excel的python驱动包openpyxl包,执行

pip install openpyxl

\

3.2 修改config.py

\superset-1.5\superset\config.py

CSV_EXPORT = {"encoding": "utf-8"} 后面新增


CSV_EXPORT = {"encoding": "utf-8"}
EXCEL_EXPORT = {"encoding": "utf-8"}

3.3 添加类型

我们需要在ChartDataResultFormat里面新增一个excel的类型

\superset-1.5\superset\common\chart_data.py

class ChartDataResultFormat(str, Enum):
    """
    Chart data response format
    """

    CSV = "csv"
    JSON = "json"
    EXCEL = "excel" // 新增

3.4 实现ExcelResponse

\superset-1.5\superset\views\base.py

class ExcelResponse(Response):  # pylint: disable=too-many-ancestors
    """
       Override Response to take into account xlsx encoding from config.py
    """

    charset = conf.get("EXCEL_EXPORT").get("encoding", "utf-8")

3.5 扩展generate_json

\superset-1.5\superset\views\core.py

找到最后return的地方,可以看到需要实现generate_json,在这个方法新增如下代码

1import上面的ExcelResponse

from superset.views.base import (
    api,
    BaseSupersetView,
    check_ownership,
    common_bootstrap_payload,
    create_table_permissions,
    CsvResponse,
    ExcelResponse,
    data_payload_response,
    generate_download_headers,
    get_error_msg,
    handle_api_exception,
    json_error_response,
    json_errors_response,
    json_success,
    validate_sqlatable,
)


2、增加response_type
    def generate_json(
        self, viz_obj: BaseViz, response_type: Optional[str] = None
    ) -> FlaskResponse:
        if response_type == ChartDataResultFormat.CSV:
            return CsvResponse(
                viz_obj.get_csv(), headers=generate_download_headers("csv")
            )

        #20220905
        if response_type == ChartDataResultFormat.EXCEL:
            return ExcelResponse(
                viz_obj.get_excel(), headers=generate_download_headers("xlsx")
            )

3.6 实现get_excel

\superset-1.5\superset\viz.py

在头部导入

import io

在get_csv后面新增如下代码

    def get_excel(self) -> bytes:
        data = io.BytesIO()
        df = self.get_df_payload()["df"]
        include_index = not isinstance(df.index, pd.RangeIndex)
        df.to_excel(data, index=include_index, **config.get("EXCEL_EXPORT"))
        data.seek(0)
        return data.read()

3.7 修改接口charts\data\api.py

下载接口一共有两个,一个是explore_json,但不支持表格之类的下载,此时需要修改另一个接口

\superset-1.5\superset\charts\data\api.py中的_send_chart_response方法,在

return Response(
    create_zip(files),
    headers=generate_download_headers("zip"),
    mimetype="application/zip",
)

后面新增

 
# import pandas
import pandas as pd
……
# import ExcelResponse
from superset.views.base import CsvResponse, generate_download_headers, ExcelResponse
 
 ……
 
    )

        # add by yangyuan on 20220811
        if result_format == ChartDataResultFormat.EXCEL:
            # pass
            # Verify user has permission to export excel file
            # if not security_manager.can_access("can_csv", "Superset"):
            #     return self.response_403()
            import io
            config = current_app.config
            if not result["queries"]:
                return self.response_400(_("Empty query result"))
        
            if len(result["queries"]) == 1:
                # print('excel1' * 80)
                # return single query results excel format
                try:
                    str_data = result["queries"][0]["data"]
                    # print(type(str_data))
                    # print(type(str_data[0]))
                    # print(str_data[0:2])
                    df = pd.DataFrame(str_data)
                    # print(df[0:3])
                    data = io.BytesIO()
                    include_index = not isinstance(df.index, pd.RangeIndex)
                    df.to_excel(data, index=include_index, **config.get("EXCEL_EXPORT"))
                    data.seek(0)
                    return ExcelResponse(data.read(), headers=generate_download_headers("xlsx"))
                except Exception as e:
                    print('error', str(e))
                    return self.response_400(_("deal_queries.error:[{}]".format(str(e))))
            # return multi-query excel results bundled as a zip file
            # encoding = current_app.config["EXCEL_EXPORT"].get("encoding", "utf-8")
            files = {}
            for idx, result in enumerate(result["queries"]):
                str_data = result["data"]
                df = pd.DataFrame(str_data)
                data = io.BytesIO()
                include_index = not isinstance(df.index, pd.RangeIndex)
                df.to_excel(data, index=include_index, **config.get("EXCEL_EXPORT"))
                data.seek(0)
                files[f"query_{idx + 1}.xlsx"] = data.read()
            return Response(
                create_zip(files),
                headers=generate_download_headers("zip"),
                mimetype="application/zip",
            )
            return self.response_400(_("Deal with excel error."))

最后一个流程就大致完成了,我们可以在这个页面测试下

下载成功

四、总结

这是superset二次开发支持excel的初步探索,目前还有一些问题未解决。如下:

  • Sqllab中点击.xlsx

点击.xlsx,需要实现对应的/superset/excel/${this.props.query.id}方法。可以参考

\superset-1.5\superset\views\core.py中的这段实现csv的方法

@has_access
@event_logger.log_this
@expose("/csv/<client_id>")
def csv(  # pylint: disable=no-self-use,too-many-locals
    self, client_id: str
) -> FlaskResponse:
    """Download the query results as csv."""
    logger.info("Exporting CSV file [%s]", client_id)
    query = db.session.query(Query).filter_by(client_id=client_id).one()

    try:
        query.raise_for_access()
    except SupersetSecurityException as ex:
        flash(ex.error.message)
        return redirect("/")

    blob = None
    if results_backend and query.results_key:
        logger.info("Fetching CSV from results backend [%s]", query.results_key)
        blob = results_backend.get(query.results_key)
    if blob:
        logger.info("Decompressing")
        payload = utils.zlib_decompress(
            blob, decode=not results_backend_use_msgpack
        )
        obj = _deserialize_results_payload(
            payload, query, cast(bool, results_backend_use_msgpack)
        )
        columns = [c["name"] for c in obj["columns"]]
        df = pd.DataFrame.from_records(obj["data"], columns=columns)
        logger.info("Using pandas to convert to CSV")
    else:
        logger.info("Running a query to turn into CSV")
        if query.select_sql:
            sql = query.select_sql
            limit = None
        else:
            sql = query.executed_sql
            limit = ParsedQuery(sql).limit
        if limit is not None and query.limiting_factor in {
            LimitingFactor.QUERY,
            LimitingFactor.DROPDOWN,
            LimitingFactor.QUERY_AND_DROPDOWN,
        }:
            # remove extra row from `increased_limit`
            limit -= 1
        df = query.database.get_df(sql, query.schema)[:limit]

    csv_data = csv.df_to_escaped_csv(df, index=False, **config["CSV_EXPORT"])
    quoted_csv_name = parse.quote(query.name)
    response = CsvResponse(
        csv_data, headers=generate_download_headers("csv", quoted_csv_name)
    )
    event_info = {
        "event_type": "data_export",
        "client_id": client_id,
        "row_count": len(df.index),
        "database": query.database.name,
        "schema": query.schema,
        "sql": query.sql,
        "exported_format": "csv",
    }
    event_rep = repr(event_info)
    logger.debug(
        "CSV exported: %s", event_rep, extra={"superset_event": event_info}
    )
    return response
  • 导出excel权限问题

其余的导出excel功能也可以参考这种思路去实现

参考: