By Gustavo Llermaly (Elastic)

Having connected Jira to Elasticsearch, we will now review best practices to take this deployment to the next level.

In the first part of this series, we configured the Jira connector and indexed documents into Elasticsearch. In this second part, we will review some best practices and advanced configurations to level up the connector. These practices complement the current documentation and apply at the indexing stage.

Getting the connector running is only the first step. Every detail matters when you want to index large volumes of data, and there are many optimization opportunities when indexing documents from Jira.
Optimization points:

- Index only the documents you need by applying advanced sync filters
- Index only the fields you will use
- Optimize the mappings for your needs
- Automate document-level security
- Offload attachment extraction
- Monitor the connector's logs

1. Index only the documents you need by applying advanced sync filters
By default, Jira sends all projects, issues, and attachments. If you are only interested in some of them — for example, only issues that are "In Progress" — we recommend not indexing everything.

There are three stages where you can filter documents before they land in Elasticsearch:

- Remote: We can use native Jira filters to fetch only what we need. This is the best option, and you should try to use it whenever possible, because the documents never even leave the source on their way to Elasticsearch. We will use advanced sync rules for this.
- Integration: If the source does not provide a native filter for what we need, we can still filter at the integration level using basic sync rules before the data is sent to Elasticsearch.
- Ingest pipelines: The last chance to process data before it is indexed is an Elasticsearch ingest pipeline. Painless scripts give us great flexibility to filter or manipulate documents. The downside is that the data has already left the source and gone through the connector, so this can put a heavy load on the system and raise security concerns.
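As a reference, a minimal sketch of the ingest-pipeline approach (the pipeline name is ours; the field path matches the `Issue.status.name` field used below) is a `drop` processor with a Painless condition that discards everything that is not "In Progress":

```
PUT _ingest/pipeline/drop-not-in-progress
{
  "processors": [
    {
      "drop": {
        "if": "ctx.Issue?.status?.name != 'In Progress'"
      }
    }
  ]
}
```

Remember this runs after the connector has already fetched the data, which is why the remote (advanced sync rules) option below is preferred.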
Let's take a quick look at the Jira issues:
GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}
Note: we use an exists query to return only the documents that contain the field we are filtering on.

You can see there are many issues in "To Do" that we don't need:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 6,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer Mars",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer the moon",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Bank Application Frontend",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Development of API for International Transfers",
            "status": {
              "name": "To Do"
            }
          }
        }
      }
    ]
  }
}
To get only the issues "In Progress", we will create an advanced sync rule using a JQL query (Jira Query Language).

Go to the connector, click the Sync rules tab, and then Draft Rules. Once there, go to Advanced Sync Rules and add the following:
[
  {
    "query": "status IN ('In Progress')"
  }
]
After applying the rule, run a Full Content Sync.

This rule will exclude all issues that are not "In Progress". You can check by running the query again:
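Advanced sync rules accept any valid JQL, so you can combine conditions. For example, to additionally restrict the sync to the Galactic Banking Project (key GBP) from our dataset, a rule could look like:

```
[
  {
    "query": "project = GBP AND status IN ('In Progress')"
  }
]
```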
GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}
This is the new response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      }
    ]
  }
}
2. Index only the fields you will use
Now that we only have the documents we want, you can see that we still get many fields we don't need. We could hide them with _source at query time, but the best option is to not index them at all.

To do this, we will use an ingest pipeline. We can create a pipeline that removes every field we are not going to use. Let's say we only want the following information from each issue:

- Assignee
- Title
- Status

We can create a new ingest pipeline that keeps only those fields using the ingest pipelines Content UI:

Click Copy and customize, and then modify the pipeline named index-name@custom, which should have just been created and be empty. We can do this from the Kibana DevTools console by running:
PUT _ingest/pipeline/bank@custom
{
  "description": "Only keep needed fields for jira issues and move them to root",
  "processors": [
    {
      "remove": {
        "keep": [
          "Issue.assignee.displayName",
          "Issue.summary",
          "Issue.status.name"
        ],
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.assignee.displayName",
        "target_field": "assignee",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.summary",
        "target_field": "summary",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.status.name",
        "target_field": "status",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "Issue"
      }
    }
  ]
}
This deletes the fields we don't need and moves the ones we keep to the root of the document.

The remove processor with the keep parameter deletes every field in the document except the ones listed in the keep array.

We can check that this works by running a simulation, using the content of one of the documents from the index:
POST /_ingest/pipeline/bank@custom/_simulate
{
  "docs": [
    {
      "_index": "bank",
      "_id": "Galactic Banking Project-GBP-3",
      "_score": 1,
      "_source": {
        "Type": "Epic",
        "Custom_Fields": {
          "Satisfaction": null,
          "Approvals": null,
          "Change reason": null,
          "Epic Link": null,
          "Actual end": null,
          "Design": null,
          "Campaign assets": null,
          "Story point estimate": null,
          "Approver groups": null,
          "[CHART] Date of First Response": null,
          "Request Type": null,
          "Campaign goals": null,
          "Project overview key": null,
          "Related projects": null,
          "Campaign type": null,
          "Impact": null,
          "Request participants": [],
          "Locked forms": null,
          "Time to first response": null,
          "Work category": null,
          "Audience": null,
          "Open forms": null,
          "Details": null,
          "Sprint": null,
          "Stakeholders": null,
          "Marketing asset type": null,
          "Submitted forms": null,
          "Start date": null,
          "Actual start": null,
          "Category": null,
          "Change risk": null,
          "Target start": null,
          "Issue color": "purple",
          "Parent Link": {
            "hasEpicLinkFieldDependency": false,
            "showField": false,
            "nonEditableReason": {
              "reason": "EPIC_LINK_SHOULD_BE_USED",
              "message": "To set an epic as the parent, use the epic link instead"
            }
          },
          "Format": null,
          "Target end": null,
          "Approvers": null,
          "Team": null,
          "Change type": null,
          "Satisfaction date": null,
          "Request language": null,
          "Amount": null,
          "Rank": "0|i0001b:",
          "Affected services": null,
          "Type": null,
          "Time to resolution": null,
          "Total forms": null,
          "[CHART] Time in Status": null,
          "Organizations": [],
          "Flagged": null,
          "Project overview status": null
        },
        "Issue": {
          "statuscategorychangedate": "2024-11-07T16:59:54.786-0300",
          "issuetype": {
            "avatarId": 10307,
            "hierarchyLevel": 1,
            "name": "Epic",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issuetype/10008",
            "description": "Epics track collections of related bugs, stories, and tasks.",
            "entityId": "f5637521-ec75-48b8-a1b8-de18520807ca",
            "id": "10008",
            "iconUrl": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10307?size=medium",
            "subtask": false
          },
          "components": [],
          "timespent": null,
          "timeoriginalestimate": null,
          "project": {
            "simplified": true,
            "avatarUrls": {
              "48x48": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415",
              "24x24": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=small",
              "16x16": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=xsmall",
              "32x32": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=medium"
            },
            "name": "Galactic Banking Project",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/project/10001",
            "id": "10001",
            "projectTypeKey": "software",
            "key": "GBP"
          },
          "description": null,
          "fixVersions": [],
          "aggregatetimespent": null,
          "resolution": null,
          "timetracking": {},
          "security": null,
          "aggregatetimeestimate": null,
          "attachment": [],
          "resolutiondate": null,
          "workratio": -1,
          "summary": "Intergalactic Security and Compliance",
          "watches": {
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/watchers",
            "isWatching": true,
            "watchCount": 1
          },
          "issuerestriction": {
            "issuerestrictions": {},
            "shouldDisplay": true
          },
          "lastViewed": "2024-11-08T02:04:25.247-0300",
          "creator": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "subtasks": [],
          "created": "2024-10-29T15:52:40.306-0300",
          "reporter": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "aggregateprogress": {
            "total": 0,
            "progress": 0
          },
          "priority": {
            "name": "Medium",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/priority/3",
            "iconUrl": "https://tomasmurua.atlassian.net/images/icons/priorities/medium.svg",
            "id": "3"
          },
          "labels": [],
          "environment": null,
          "timeestimate": null,
          "aggregatetimeoriginalestimate": null,
          "versions": [],
          "duedate": null,
          "progress": {
            "total": 0,
            "progress": 0
          },
          "issuelinks": [],
          "votes": {
            "hasVoted": false,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/votes",
            "votes": 0
          },
          "comment": {
            "total": 0,
            "comments": [],
            "maxResults": 0,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/10008/comment",
            "startAt": 0
          },
          "assignee": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "worklog": {
            "total": 0,
            "maxResults": 20,
            "startAt": 0,
            "worklogs": []
          },
          "updated": "2024-11-07T16:59:54.786-0300",
          "status": {
            "name": "In Progress",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/status/10004",
            "description": "",
            "iconUrl": "https://tomasmurua.atlassian.net/",
            "id": "10004",
            "statusCategory": {
              "colorName": "yellow",
              "name": "In Progress",
              "self": "https://tomasmurua.atlassian.net/rest/api/2/statuscategory/4",
              "id": 4,
              "key": "indeterminate"
            }
          }
        },
        "id": "Galactic Banking Project-GBP-3",
        "_timestamp": "2024-11-07T16:59:54.786-0300",
        "Key": "GBP-3",
        "_allow_access_control": [
          "account_id:63c04b092341bff4fff6e0cb",
          "account_id:712020:88983800-6c97-469a-9451-79c2dd3732b5",
          "name:Gustavo",
          "name:Tomas-Murua"
        ]
      }
    }
  ]
}
The response will be:
{
  "docs": [
    {
      "doc": {
        "_index": "bank",
        "_version": "-3",
        "_id": "Galactic Banking Project-GBP-3",
        "_source": {
          "summary": "Intergalactic Security and Compliance",
          "assignee": "Tomas Murua",
          "status": "In Progress"
        },
        "_ingest": {
          "timestamp": "2024-11-10T06:58:25.494057572Z"
        }
      }
    }
  ]
}
This looks much better! Now, let's run a Full Content Sync to apply the changes.
3. Optimize the mappings for your needs

The documents are clean. However, we can optimize further. Here we enter "it depends" territory: some mappings may work for your use case while others may not, and the best way to find out is to experiment.

Let's say we tested and arrived at this mapping design:

- assignee: full-text search and filters
- summary: full-text search
- status: filters and sorting

By default, the connector creates mappings using dynamic_templates that configure every text field for full-text search, filtering, and sorting. This is a solid baseline, but if we know what we want to do with each field, it can be optimized.

This is the rule:
{
  "all_text_fields": {
    "match_mapping_type": "string",
    "mapping": {
      "analyzer": "iq_text_base",
      "fields": {
        "delimiter": {
          "analyzer": "iq_text_delimiter",
          "type": "text",
          "index_options": "freqs"
        },
        "joined": {
          "search_analyzer": "q_text_bigram",
          "analyzer": "i_text_bigram",
          "type": "text",
          "index_options": "freqs"
        },
        "prefix": {
          "search_analyzer": "q_prefix",
          "analyzer": "i_prefix",
          "type": "text",
          "index_options": "docs"
        },
        "enum": {
          "ignore_above": 2048,
          "type": "keyword"
        },
        "stem": {
          "analyzer": "iq_text_stem",
          "type": "text"
        }
      }
    }
  }
}
It creates, for every text field, different subfields for different purposes. You can find additional information about the analyzers in the documentation.

To use your own mappings, you have to:

- Create the index before creating the connector
- When creating the connector, pick that index instead of creating a new one
- Create the ingest pipeline that keeps only the fields you need
- Run a Full Content Sync*

*A Full Content Sync sends all documents to Elasticsearch. An Incremental Sync only sends the documents that changed since the last incremental or full content sync. Both methods fetch all the data from the source.

Our optimized mappings look like this:
PUT bank-optimal
{
  "mappings": {
    "properties": {
      "assignee": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "summary": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}
For assignee we kept the original mapping, because we want this field optimized for both search and filtering. For summary we removed the "enum" keyword subfield, because we don't plan to filter on summaries. We mapped status as keyword, because we only plan to filter on that field.

Note: if you are not sure how you will use a field, the baseline analyzers should be fine.
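To illustrate the design (a sketch against the bank-optimal index defined above), a query that runs full-text search on summary while filtering on the status keyword field could look like:

```
GET bank-optimal/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "summary": "security" } }
      ],
      "filter": [
        { "term": { "status": "In Progress" } }
      ]
    }
  }
}
```

Because status is a keyword field, the term filter is an exact, cacheable match with no scoring overhead.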
4. Automate document-level security

In the first part we learned how to use Document Level Security (DLS) by manually creating an API key for a user and restricting access based on it. If, instead, you want to automatically create an API key with the right permissions every time a user visits your website, you need a script that receives the request, generates an API key using the user id, and then uses that key to search Elasticsearch.

Here is a reference implementation in Python:
import os
import requests


class ElasticsearchKeyGenerator:
    def __init__(self):
        self.es_url = "https://xxxxxxx.es.us-central1.gcp.cloud.es.io"  # Your Elasticsearch URL
        self.es_user = ""  # Your Elasticsearch User
        self.es_password = ""  # Your Elasticsearch password

        # Basic configuration for requests
        self.auth = (self.es_user, self.es_password)
        self.headers = {'Content-Type': 'application/json'}

    def create_api_key(self, user_id, index, expiration='1d', metadata=None):
        """
        Create an Elasticsearch API key for a single index with user-specific filters.

        Args:
            user_id (str): User identifier on the source system
            index (str): Index name
            expiration (str): Key expiration time (default: '1d')
            metadata (dict): Additional metadata for the API key

        Returns:
            str: Encoded API key if successful, None if failed
        """
        try:
            # Get user-specific ACL filters
            acl_index = f'.search-acl-filter-{index}'
            response = requests.get(
                f'{self.es_url}/{acl_index}/_doc/{user_id}',
                auth=self.auth,
                headers=self.headers
            )
            response.raise_for_status()

            # Build the query
            query = {
                'bool': {
                    'must': [
                        {'term': {'_index': index}},
                        response.json()['_source']['query']
                    ]
                }
            }

            # Set default metadata if none provided
            if not metadata:
                metadata = {'created_by': 'create-api-key'}

            # Prepare API key request body
            api_key_body = {
                'name': user_id,
                'expiration': expiration,
                'role_descriptors': {
                    'jira-role': {
                        'index': [{
                            'names': [index],
                            'privileges': ['read'],
                            'query': query
                        }]
                    }
                },
                'metadata': metadata
            }

            print(api_key_body)

            # Create API key
            api_key_response = requests.post(
                f'{self.es_url}/_security/api_key',
                json=api_key_body,
                auth=self.auth,
                headers=self.headers
            )
            api_key_response.raise_for_status()

            return api_key_response.json()['encoded']

        except requests.exceptions.RequestException as e:
            print(f"Error creating API key: {str(e)}")
            return None


# Example usage
if __name__ == "__main__":
    key_generator = ElasticsearchKeyGenerator()

    encoded_key = key_generator.create_api_key(
        user_id="63c04b092341bff4fff6e0cb",  # User id on Jira
        index="bank",
        expiration="1d",
        metadata={
            "application": "my-search-app",
            "namespace": "dev",
            "foo": "bar"
        }
    )

    if encoded_key:
        print(f"Generated API key: {encoded_key}")
    else:
        print("Failed to generate API key")
You can call this create_api_key function on every API request to generate an API key that the user can then use to query Elasticsearch in subsequent requests. You can set an expiration time, as well as arbitrary metadata in case you want to record information about the user or about the API that generated the key.
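On the client side, the encoded key is sent in the Authorization header of every search request, using the `ApiKey` scheme Elasticsearch expects; the encoded value is base64 of `id:api_key`. A small sketch of the two pieces involved (the helper names are ours, not part of any Elastic client):

```python
import base64


def api_key_auth_header(encoded_key: str) -> dict:
    """Build the Authorization header Elasticsearch expects for API keys."""
    return {"Authorization": f"ApiKey {encoded_key}"}


def api_key_id(encoded_key: str) -> str:
    """Extract the key id from an encoded key (base64 of "id:api_key").

    Useful for logging or later invalidation; never log the secret part.
    """
    decoded = base64.b64decode(encoded_key).decode("utf-8")
    return decoded.split(":", 1)[0]
```

A subsequent search would then look like `requests.get(f"{es_url}/bank/_search", headers=api_key_auth_header(encoded_key))`, and the DLS query embedded in the key's role descriptor restricts what that request can see.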
5. Offload attachment extraction

For content extraction — for example, extracting text from PDF and PowerPoint files — Elastic provides an out-of-the-box service that works well but has size limits.

By default, the extraction service of native connectors supports attachments up to 10MB each. If you have larger attachments, such as PDFs with big images inside, or you want to host the extraction service yourself, Elastic provides a tool that lets you deploy your own extraction service.

This option is only compatible with connector clients, so if you are using a native connector you will need to convert it to a connector client and host it on your own infrastructure.

Follow these steps:

a. Configure the custom extraction service and run it with Docker
docker run \
  -p 8090:8090 \
  -it \
  --name extraction-service \
  docker.elastic.co/enterprise-search/data-extraction-service:$EXTRACTION_SERVICE_VERSION
For EXTRACTION_SERVICE_VERSION you should use 0.3.x with Elasticsearch 8.15.

b. Configure the yaml file with the custom extraction service

Go to the connector client and add the following to the config.yml file to use the extraction service:
extraction_service:
  host: http://localhost:8090
c. Run the connector client

Once configured, you can run the connector client with the connector you want to use:
# NOTE: change the absolute path to match where config.yml is located on your machine
docker run \
  -v "</absolute/path/to>/connectors-config:/config" \
  --tty \
  --rm \
  docker.elastic.co/enterprise-search/elastic-connectors:{version}.0 \
  /app/bin/elastic-ingest \
  -c /config/config.yml  # Path to your configuration file in the container
You can refer to the documentation for the full process.
6. Monitor the connector's logs

It is very important to look at the connector's logs when something goes wrong, and Elastic provides this capability out of the box.

The first step is to activate logging in the cluster. It is recommended to send the logs to a separate cluster (a monitoring deployment), but in a development environment you can also send them to the same cluster where you index your documents.

By default, the connector sends its logs to the elastic-cloud-logs-8 index. If you are on Cloud, you can inspect the logs in the new Logs Explorer:
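If you prefer DevTools over the UI, a minimal sketch for peeking at the most recent entries (assuming the default elastic-cloud-logs-8 index name mentioned above, and the standard @timestamp field those indices carry):

```
GET elastic-cloud-logs-8*/_search
{
  "size": 10,
  "sort": [
    { "@timestamp": { "order": "desc" } }
  ]
}
```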
Conclusion

In this article we reviewed different strategies to consider when running connectors in production. Optimizing resources, automating security, and monitoring the cluster are key mechanisms for operating a large system properly.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training runs!

Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.

Original article: Jira connector tutorial part II: 6 optimization tips - Elasticsearch Labs