Elasticsearch:Jira 连接器教程第二部分 - 6 个优化技巧


作者:来自 Elastic Gustavo Llermaly

将 Jira 连接到 Elasticsearch 后,我们现在将回顾最佳实践以升级此部署。

在本系列的第一部分中,我们配置了 Jira 连接器并把对象索引到了 Elasticsearch。在第二部分中,我们将回顾一些最佳实践和高级配置来升级这个连接器。这些实践是对官方文档的补充,主要作用于索引阶段。

运行连接器只是第一步。当你要索引大量数据时,每个细节都很重要;从 Jira 索引文档时,有许多可以优化的地方。

优化点

  1. 通过应用高级同步过滤器仅索引你需要的文档

  2. 仅索引你将使用的字段

  3. 根据你的需求优化映射

  4. 自动化文档级别安全性

  5. 卸载附件提取

  6. 监控连接器的日志

  1. 通过应用高级同步过滤器仅索引你需要的文档


默认情况下,Jira 会发送所有项目、问题和附件。如果你只对其中一些感兴趣,或者例如只对 “In Progress - 正在进行” 的问题感兴趣,我们建议不要索引所有内容。

在文档进入 Elasticsearch 之前,有三个环节可以过滤文档:

  1. 远程:使用 Jira 原生过滤器只获取我们需要的内容。这是最好的选择,应当尽可能优先使用,因为不需要的文档根本不会离开数据源。我们将为此使用高级同步规则(advanced sync rules)。
  2. 集成:如果数据源没有能满足需求的原生过滤器,我们仍然可以使用基本同步规则(basic sync rules),在连接器集成层面先过滤,再导入 Elasticsearch。
  3. 摄取管道:在索引前处理数据的最后一道关口是 Elasticsearch 摄取管道(ingest pipeline)。借助 Painless 脚本,我们可以非常灵活地过滤或修改文档。缺点是此时数据已经离开数据源并经过了连接器,可能给系统带来较大负担,并产生安全问题。
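作为第三种方式(摄取管道)的一个最小示意,下面用 Python 构造一个带 drop 处理器和 Painless 条件的管道定义,在索引前丢弃状态不是 "In Progress" 的问题。字段路径沿用上文的 Issue.status.name;管道名称和创建方式属于示意,并非本文实际采用的方案:

```python
import json

def build_drop_pipeline() -> dict:
    """构建一个摄取管道定义:丢弃状态不是 "In Progress" 的 Jira 问题。"""
    return {
        "description": "Drop Jira issues that are not In Progress",
        "processors": [
            {
                "drop": {
                    # ctx 是待索引的文档;字段缺失时保守起见保留文档
                    "if": (
                        "ctx.Issue?.status?.name != null "
                        "&& ctx.Issue.status.name != 'In Progress'"
                    )
                }
            }
        ],
    }

# 打印请求体;可通过 PUT _ingest/pipeline/<管道名> 创建(名称为示意)
print(json.dumps(build_drop_pipeline(), ensure_ascii=False, indent=2))
```

注意这种过滤发生在 Elasticsearch 一侧,数据仍会离开数据源,所以只应在前两种方式都不可行时使用。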

让我们快速回顾一下 Jira 问题:



GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}


注意:我们使用 exists 查询,只返回包含我们要过滤的字段的文档。

你可以看到 “To Do” 中有很多我们不需要的问题:



{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 6,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer Mars",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer the moon",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Bank Application Frontend",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Development of API for International Transfers",
            "status": {
              "name": "To Do"
            }
          }
        }
      }
    ]
  }
}


为了仅获取 “In Progress” 的问题,我们将使用 JQL 查询(Jira 查询语言)创建高级同步规则:

转到连接器并单击 sync rules 选项卡,然后单击 Draft Rules。进入后,转到 Advanced Sync Rules 并添加以下内容:

[
  {
    "query": "status IN ('In Progress')"
  }
]

应用规则后,运行 Full Content Sync。
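除了在 Kibana UI 中点击触发,较新的 8.x 版本还提供了 Connector API,可以用程序创建同步作业。下面是一个未经本文验证的示意(connector_id 为占位值,需要替换为你自己的连接器 ID;请求本身可交给任意 HTTP 客户端发送):

```python
def build_sync_job_request(connector_id: str, job_type: str = "full") -> dict:
    """构造创建连接器同步作业的请求描述(POST _connector/_sync_job)。"""
    allowed = ("full", "incremental", "access_control")
    if job_type not in allowed:
        raise ValueError(f"job_type must be one of {allowed}, got {job_type!r}")
    return {
        "method": "POST",
        "path": "/_connector/_sync_job",
        "body": {"id": connector_id, "job_type": job_type},
    }

# 占位的连接器 ID,仅作演示
req = build_sync_job_request("my-jira-connector")
print(req["method"], req["path"], req["body"])
```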

此规则将排除所有非 “In Progress” 的问题。你可以通过再次运行查询来检查:



GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}


以下是新的响应:



{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      }
    ]
  }
}


  2. 仅索引你将使用的字段

现在我们只保留了想要的文档,但可以看到其中仍有许多我们不需要的字段。虽然可以在查询时用 _source 隐藏它们,但最好的做法是根本不索引它们。

为此,我们将使用摄取管道(ingest pipeline):创建一个删除所有用不到的字段的管道。假设我们只需要问题中的以下信息:

  • Assignee
  • Title
  • Status

我们可以通过 Content UI 创建一个只保留这些字段的新摄取管道:

单击 Copy and customize,然后修改名为 index-name@custom 的管道,该管道此时应当刚刚创建且为空。我们可以在 Kibana DevTools 控制台中运行以下命令来完成:



PUT _ingest/pipeline/bank@custom
{
  "description": "Only keep needed fields for jira issues and move them to root",
  "processors": [
    {
      "remove": {
        "keep": [
          "Issue.assignee.displayName",
          "Issue.summary",
          "Issue.status.name"
        ],
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.assignee.displayName",
        "target_field": "assignee",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.summary",
        "target_field": "summary",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.status.name",
        "target_field": "status",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "Issue"
      }
    }
  ]
}


这个管道会删除不需要的字段,并把需要的字段移到文档的根级别。

带有 keep 参数的 remove 处理器将从文档中删除除 keep 数组中的字段之外的所有字段。

我们可以运行模拟(simulate)来检查它是否有效,把索引中其中一个文档的内容作为输入:



POST /_ingest/pipeline/bank@custom/_simulate
{
  "docs": [
    {
      "_index": "bank",
      "_id": "Galactic Banking Project-GBP-3",
      "_score": 1,
      "_source": {
        "Type": "Epic",
        "Custom_Fields": {
          "Satisfaction": null,
          "Approvals": null,
          "Change reason": null,
          "Epic Link": null,
          "Actual end": null,
          "Design": null,
          "Campaign assets": null,
          "Story point estimate": null,
          "Approver groups": null,
          "[CHART] Date of First Response": null,
          "Request Type": null,
          "Campaign goals": null,
          "Project overview key": null,
          "Related projects": null,
          "Campaign type": null,
          "Impact": null,
          "Request participants": [],
          "Locked forms": null,
          "Time to first response": null,
          "Work category": null,
          "Audience": null,
          "Open forms": null,
          "Details": null,
          "Sprint": null,
          "Stakeholders": null,
          "Marketing asset type": null,
          "Submitted forms": null,
          "Start date": null,
          "Actual start": null,
          "Category": null,
          "Change risk": null,
          "Target start": null,
          "Issue color": "purple",
          "Parent Link": {
            "hasEpicLinkFieldDependency": false,
            "showField": false,
            "nonEditableReason": {
              "reason": "EPIC_LINK_SHOULD_BE_USED",
              "message": "To set an epic as the parent, use the epic link instead"
            }
          },
          "Format": null,
          "Target end": null,
          "Approvers": null,
          "Team": null,
          "Change type": null,
          "Satisfaction date": null,
          "Request language": null,
          "Amount": null,
          "Rank": "0|i0001b:",
          "Affected services": null,
          "Type": null,
          "Time to resolution": null,
          "Total forms": null,
          "[CHART] Time in Status": null,
          "Organizations": [],
          "Flagged": null,
          "Project overview status": null
        },
        "Issue": {
          "statuscategorychangedate": "2024-11-07T16:59:54.786-0300",
          "issuetype": {
            "avatarId": 10307,
            "hierarchyLevel": 1,
            "name": "Epic",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issuetype/10008",
            "description": "Epics track collections of related bugs, stories, and tasks.",
            "entityId": "f5637521-ec75-48b8-a1b8-de18520807ca",
            "id": "10008",
            "iconUrl": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10307?size=medium",
            "subtask": false
          },
          "components": [],
          "timespent": null,
          "timeoriginalestimate": null,
          "project": {
            "simplified": true,
            "avatarUrls": {
              "48x48": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415",
              "24x24": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=small",
              "16x16": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=xsmall",
              "32x32": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=medium"
            },
            "name": "Galactic Banking Project",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/project/10001",
            "id": "10001",
            "projectTypeKey": "software",
            "key": "GBP"
          },
          "description": null,
          "fixVersions": [],
          "aggregatetimespent": null,
          "resolution": null,
          "timetracking": {},
          "security": null,
          "aggregatetimeestimate": null,
          "attachment": [],
          "resolutiondate": null,
          "workratio": -1,
          "summary": "Intergalactic Security and Compliance",
          "watches": {
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/watchers",
            "isWatching": true,
            "watchCount": 1
          },
          "issuerestriction": {
            "issuerestrictions": {},
            "shouldDisplay": true
          },
          "lastViewed": "2024-11-08T02:04:25.247-0300",
          "creator": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "subtasks": [],
          "created": "2024-10-29T15:52:40.306-0300",
          "reporter": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "aggregateprogress": {
            "total": 0,
            "progress": 0
          },
          "priority": {
            "name": "Medium",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/priority/3",
            "iconUrl": "https://tomasmurua.atlassian.net/images/icons/priorities/medium.svg",
            "id": "3"
          },
          "labels": [],
          "environment": null,
          "timeestimate": null,
          "aggregatetimeoriginalestimate": null,
          "versions": [],
          "duedate": null,
          "progress": {
            "total": 0,
            "progress": 0
          },
          "issuelinks": [],
          "votes": {
            "hasVoted": false,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/votes",
            "votes": 0
          },
          "comment": {
            "total": 0,
            "comments": [],
            "maxResults": 0,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/10008/comment",
            "startAt": 0
          },
          "assignee": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "worklog": {
            "total": 0,
            "maxResults": 20,
            "startAt": 0,
            "worklogs": []
          },
          "updated": "2024-11-07T16:59:54.786-0300",
          "status": {
            "name": "In Progress",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/status/10004",
            "description": "",
            "iconUrl": "https://tomasmurua.atlassian.net/",
            "id": "10004",
            "statusCategory": {
              "colorName": "yellow",
              "name": "In Progress",
              "self": "https://tomasmurua.atlassian.net/rest/api/2/statuscategory/4",
              "id": 4,
              "key": "indeterminate"
            }
          }
        },
        "id": "Galactic Banking Project-GBP-3",
        "_timestamp": "2024-11-07T16:59:54.786-0300",
        "Key": "GBP-3",
        "_allow_access_control": [
          "account_id:63c04b092341bff4fff6e0cb",
          "account_id:712020:88983800-6c97-469a-9451-79c2dd3732b5",
          "name:Gustavo",
          "name:Tomas-Murua"
        ]
      }
    }
  ]
}


响应将是:



{
  "docs": [
    {
      "doc": {
        "_index": "bank",
        "_version": "-3",
        "_id": "Galactic Banking Project-GBP-3",
        "_source": {
          "summary": "Intergalactic Security and Compliance",
          "assignee": "Tomas Murua",
          "status": "In Progress"
        },
        "_ingest": {
          "timestamp": "2024-11-10T06:58:25.494057572Z"
        }
      }
    }
  ]
}


这看起来好多了!现在,让我们运行 Full Content Sync 来应用更改。

  3. 根据你的需求优化映射

文档现在很干净了。不过我们还可以进一步优化,只是从这里开始进入了 "it depends(视情况而定)" 的领域:有些映射适合你的用例,有些则不适合,找出答案的最佳方法是进行实验。

假设我们测试并得到了这个映射设计:

  • assignee:全文搜索和过滤器
  • summary:全文搜索
  • status:过滤器和排序

默认情况下,连接器会使用 dynamic_templates 创建映射,把所有文本字段都配置为可用于全文搜索、过滤和排序。这是一个坚实的基础;但如果我们明确知道每个字段的用途,就可以进一步优化。

以下是默认的 dynamic template 规则:



{
  "all_text_fields": {
    "match_mapping_type": "string",
    "mapping": {
      "analyzer": "iq_text_base",
      "fields": {
        "delimiter": {
          "analyzer": "iq_text_delimiter",
          "type": "text",
          "index_options": "freqs"
        },
        "joined": {
          "search_analyzer": "q_text_bigram",
          "analyzer": "i_text_bigram",
          "type": "text",
          "index_options": "freqs"
        },
        "prefix": {
          "search_analyzer": "q_prefix",
          "analyzer": "i_prefix",
          "type": "text",
          "index_options": "docs"
        },
        "enum": {
          "ignore_above": 2048,
          "type": "keyword"
        },
        "stem": {
          "analyzer": "iq_text_stem",
          "type": "text"
        }
      }
    }
  }
}


该规则为每个文本字段创建了用于不同目的的子字段。你可以在文档中找到关于这些分析器的更多信息。

要使用这些映射,你必须:

  1. 在创建连接器之前创建索引
  2. 创建连接器时,选择该索引而不是创建新索引
  3. 创建摄取管道以获取所需的字段
  4. 运行 Full Content Sync*

*Full Content Sync 会将所有文档发送到 Elasticsearch。Incremental Sync 只会将上次增量或完整内容同步后更改的文档发送到 Elasticsearch。这两种方法都将从数据源获取所有数据。

我们的优化映射如下:



PUT bank-optimal
{
  "mappings": {
    "properties": {
      "assignee": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "summary": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}


对于 assignee,我们保留了原有的映射,因为我们希望该字段同时针对搜索和过滤进行优化。对于 summary,我们删除了 "enum" keyword 子字段,因为我们不会按摘要过滤。我们把 status 直接映射为 keyword,因为我们只会用它来过滤。

注意:如果你不确定如何使用字段,基线分析器应该没问题。
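基于上面的映射设计,下面用 Python 构造一个示意查询:在 summary 和 assignee(及其子字段)上做全文搜索,同时用 status 这个 keyword 字段在 filter 上下文中精确过滤。子字段名(如 summary.stem)来自上文的映射,具体取哪些子字段参与打分属于示意性选择:

```python
def build_search(text: str, status: str) -> dict:
    """组合全文搜索与关键字过滤的 bool 查询。"""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        # 子字段由上面的映射提供,可按需取舍
                        "fields": [
                            "summary", "summary.stem", "summary.prefix",
                            "assignee", "assignee.stem",
                        ],
                    }
                },
                # status 是 keyword:放在 filter 上下文,精确匹配且不参与打分
                "filter": [{"term": {"status": status}}],
            }
        }
    }

# 把该请求体 POST 到 bank-optimal/_search 即可
body = build_search("security compliance", "In Progress")
```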

  4. 自动化文档级安全性

在第一部分中,我们学习了如何使用文档级安全性(Document Level Security - DLS),手动为用户创建 API 密钥并据此限制访问权限。但是,如果你想在用户每次访问你的网站时自动创建带权限的 API 密钥,就需要编写一个脚本:接收请求,使用用户 ID 生成 API 密钥,然后用该密钥在 Elasticsearch 中搜索。

以下是一个 Python 参考实现:



import requests


class ElasticsearchKeyGenerator:
    def __init__(self):
        self.es_url = "https://xxxxxxx.es.us-central1.gcp.cloud.es.io"  # Your Elasticsearch URL
        self.es_user = ""  # Your Elasticsearch User
        self.es_password = ""  # Your Elasticsearch password

        # Basic configuration for requests
        self.auth = (self.es_user, self.es_password)
        self.headers = {'Content-Type': 'application/json'}

    def create_api_key(self, user_id, index, expiration='1d', metadata=None):
        """
        Create an Elasticsearch API key for a single index with user-specific filters.

        Args:
            user_id (str): User identifier on the source system
            index (str): Index name
            expiration (str): Key expiration time (default: '1d')
            metadata (dict): Additional metadata for the API key

        Returns:
            str: Encoded API key if successful, None if failed
        """
        try:
            # Get user-specific ACL filters
            acl_index = f'.search-acl-filter-{index}'
            response = requests.get(
                f'{self.es_url}/{acl_index}/_doc/{user_id}',
                auth=self.auth,
                headers=self.headers
            )
            response.raise_for_status()

            # Build the query
            query = {
                'bool': {
                    'must': [
                        {'term': {'_index': index}},
                        response.json()['_source']['query']
                    ]
                }
            }

            # Set default metadata if none provided
            if not metadata:
                metadata = {'created_by': 'create-api-key'}

            # Prepare API key request body
            api_key_body = {
                'name': user_id,
                'expiration': expiration,
                'role_descriptors': {
                    'jira-role': {
                        'index': [{
                            'names': [index],
                            'privileges': ['read'],
                            'query': query
                        }]
                    }
                },
                'metadata': metadata
            }

            print(api_key_body)

            # Create API key
            api_key_response = requests.post(
                f'{self.es_url}/_security/api_key',
                json=api_key_body,
                auth=self.auth,
                headers=self.headers
            )
            api_key_response.raise_for_status()

            return api_key_response.json()['encoded']

        except requests.exceptions.RequestException as e:
            print(f"Error creating API key: {str(e)}")
            return None


# Example usage
if __name__ == "__main__":
    key_generator = ElasticsearchKeyGenerator()

    encoded_key = key_generator.create_api_key(
        user_id="63c04b092341bff4fff6e0cb",  # User id on Jira
        index="bank",
        expiration="1d",
        metadata={
            "application": "my-search-app",
            "namespace": "dev",
            "foo": "bar"
        }
    )

    if encoded_key:
        print(f"Generated API key: {encoded_key}")
    else:
        print("Failed to generate API key")


你可以在每个 API 请求上调用此 create_api_key 函数来生成 API 密钥,用户可以在后续请求中使用该密钥查询 Elasticsearch。你可以设置到期时间,还可以设置任意元数据,以防你想要注册有关用户或生成密钥的 API 的一些信息。
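拿到 encoded key 之后,客户端就不再使用用户名/密码,而是在 Authorization 头中以 ApiKey 方案携带它来查询。下面是一个构造请求描述的示意(URL 和 key 均为占位值),可以交给任意 HTTP 客户端发送:

```python
def build_search_request(es_url: str, encoded_key: str, index: str, query: dict) -> dict:
    """构造一个带 ApiKey 认证头的 _search 请求描述。"""
    return {
        "url": f"{es_url}/{index}/_search",
        "headers": {
            # encoded key 直接跟在 ApiKey 方案名之后
            "Authorization": f"ApiKey {encoded_key}",
            "Content-Type": "application/json",
        },
        "body": {"query": query},
    }

req = build_search_request(
    "https://xxxxxxx.es.us-central1.gcp.cloud.es.io",  # 占位 URL
    "<encoded-key>",                                   # create_api_key 的返回值
    "bank",
    {"match": {"summary": "security"}},
)
```

由于密钥的 role_descriptors 中带有 DLS 查询,返回结果会自动限制在该用户可见的文档范围内。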

  5. 卸载附件提取

对于内容提取,例如从 PDF 和 PowerPoint 文件中提取文本,Elastic 提供了一项开箱即用的服务。它运行良好,但有大小限制。

默认情况下,本机连接器的提取服务支持每个附件最大 10MB。如果你有更大的附件,例如里面有大图像的 PDF,或者你想要托管提取服务,Elastic 提供了一个工具,可让你部署自己的提取服务。

此选项仅与连接器客户端兼容,因此如果你使用的是本机连接器,则需要将其转换为连接器客户端并将其托管在你自己的基础架构中。

请按照以下步骤操作:

a. 配置自定义提取服务并使用 Docker 运行



docker run \
  -p 8090:8090 \
  -it \
  --name extraction-service \
  docker.elastic.co/enterprise-search/data-extraction-service:$EXTRACTION_SERVICE_VERSION


对于 Elasticsearch 8.15,EXTRACTION_SERVICE_VERSION 应使用 0.3.x。

b. 在 config.yml 中配置提取服务

转到连接器客户端并将以下内容添加到 config.yml 文件以使用提取服务:



extraction_service:
  host: http://localhost:8090


c. 按照步骤运行连接器客户端

配置完成后,运行连接器客户端:



# NOTE: change the absolute path to match where config.yml is located on your machine
docker run \
  -v "</absolute/path/to>/connectors-config:/config" \
  --tty \
  --rm \
  docker.elastic.co/enterprise-search/elastic-connectors:{version}.0 \
  /app/bin/elastic-ingest \
  -c /config/config.yml  # Path to your configuration file in the container


你可以参考文档中的完整流程。

  6. 监控连接器的日志

出现问题时,查看连接器的日志非常重要,而 Elastic 对此提供了开箱即用的支持。

第一步是在集群中激活日志记录。建议将日志发送到其他集群(监控部署),但在开发环境中,你也可以将日志发送到索引文档的同一集群。

默认情况下,连接器会将日志发送到 elastic-cloud-logs-8 索引。如果你使用的是 Cloud,则可以在新的 Logs Explorer 中查看这些日志。
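这些日志也可以直接用查询检索。下面是一个示意查询(假设日志字段遵循 ECS 约定;log.level 的具体取值可能因版本而异),用于拉取最近的错误级别日志,将其 POST 到日志索引的 _search 端点即可:

```python
def build_connector_error_query(size: int = 20) -> dict:
    """检索最近的错误级别日志,按时间倒序排列。"""
    return {
        "size": size,
        # 最新的日志排在最前面
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    # 字段名与取值基于 ECS 约定,属于假设
                    {"term": {"log.level": "ERROR"}},
                ]
            }
        },
    }

query_body = build_connector_error_query()
```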

结论

在本文中,我们了解了在生产环境中使用连接器时需要考虑的不同策略。优化资源、自动化安全性和集群监控是正确运行大型系统的关键机制。

想要获得 Elastic 认证?了解下一期 Elasticsearch 工程师培训的时间!

Elasticsearch 包含许多新功能,可帮助你为你的用例构建最佳搜索解决方案。深入了解我们的示例笔记本以了解更多信息,开始免费云试用,或立即在你的本地机器上试用 Elastic。

原文:Jira connector tutorial part II: 6 optimization tips - Elasticsearch Labs