Elasticsearch 搜索

本文最后更新于:2024年3月18日 凌晨

Elasticsearch 搜索

  • Elasticsearch 允许我们搜索存在于所有索引或一些特定索引中的文档。
1
GET /users/_search?q=username:Tim
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "A person"
}
}
]
}
}
  • score:代表该条记录的权重,结果按照权重大小排序。
  • 如下这些参数可以使用统一资源标识符在搜索操作中传递。
参数 说明
q 此参数用于指定查询字符串,
fields 此参数用于在响应中选择返回字段,
sort 可以通过使用这个参数获得排序结果,这个参数的可能值是 fieldName, fieldName:ascfieldname:desc
timeout 使用此参数限定搜索时间,响应只包含指定时间内的匹配,默认情况下,无超时,
terminate_after 可以将响应限制为每个分片的指定数量的文档,当到达这个数量以后,查询将提前终止,默认情况下不设置 terminate_after,
size 它表示要返回的命中数,默认值为 10,

DSL 查询

  • 在 Elasticsearch 中,通过使用基于 JSON 的查询进行搜索,查询由两个子句组成。
    • 叶查询子句:这些子句是匹配,项或范围的,它们在特定字段中查找特定值。
    • 复合查询子句:这些查询是叶查询子句和其他复合查询的组合,用于提取所需的信息。
  • Elasticsearch 支持大量查询,查询从查询关键字开始,然后以 JSON 对象的形式在其中包含条件和过滤器,以下描述了不同类型的查询。

match

  • 模糊查询,会使用分词器解析,先分析再通过分析的文档进行查询。
  • match 查询 keyword 字段: match 会被分词,而 keyword 不会被分词, match 的需要跟 keyword 的完全匹配。
  • match 查询 text 字段: match 分词, text 也分词,只要 match 的分词结果和 text 的分词结果有相同的就匹配。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
POST /users/_search

{
"query": {
"match": {
"desc": "Tim Mike"
}
},
"_source": [
"username",
"desc"
],
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 0,
"size": 2
}
  • 如果有多个查询字段用空格隔开。
  • _source:指定查询的属性。
  • sort:进行排序。
  • fromsize:分页的起始位置和页面大小。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "2",
"_score": null,
"_source": {
"username": "Mike",
"desc": "I am Mike"
},
"sort": [
30
]
},
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": null,
"_source": {
"username": "Tim",
"desc": "I am Tim"
},
"sort": [
20
]
}
]
}
}

match_all

  • 返回所有内容,并每个对象 score 为 1.0
1
2
3
4
5
6
7
POST /users/_search

{
"query": {
"match_all": {}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "2",
"_score": 1,
"_source": {
"username": "Mike",
"password": "333333",
"age": 30,
"date": "1999/1/1",
"desc": "I am Mike"
}
},
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "A person"
}
}
]
}
}

multi_match

  • 此查询将文本或短语与多个字段匹配。
1
2
3
4
5
6
7
8
9
10
11
12
13
POST /users/_search

{
"query": {
"multi_match": {
"query": "Tim",
"fields": [
"username",
"desc"
]
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "I am Tim"
}
}
]
}
}

match_phrase

  • match_phrase 匹配 keyword 字段: match_phrase 会被分词,而 keyword 不会被分词, match_phrase 的需要跟 keyword 的完全匹配。
  • match_phrase 匹配 text 字段: match_phrase 是分词的, text 也是分词的, match_phrase 的分词结果必须在 text 字段分词中都包含,而且顺序必须相同,而且必须都是连续的。
1
2
3
4
5
6
7
8
9
POST /users/_search

{
"query": {
"match_phrase": {
"desc": "I am"
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "2",
"_score": 0.2876821,
"_source": {
"username": "Mike",
"password": "333333",
"age": 30,
"date": "1999/1/1",
"desc": "I am Mike"
}
},
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "I am Tim"
}
}
]
}
}

query_string

  • query_string 用查询解析器解析输入并在运算符周围分割文本。
  • query_string 无法查询查询 key 类型的字段。
  • query_string 查询 text 类型的字段:和 match_phrase 区别的是,不需要连续,顺序还可以调换。
1
2
3
4
5
6
7
8
9
10
POST /users/_search

{
"query":{
"query_string":{
"query": "Ti* OR *ike",
"fields" : ["username"],
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "2",
"_score": 0.2876821,
"_source": {
"username": "Mike",
"password": "333333",
"age": 30,
"date": "1999/1/1",
"desc": "I am Mike"
}
},
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "I am Tim"
}
}
]
}
}

term

  • 使用倒排索引,代表完全匹配,即不进行分词器分析,文档中必须包含整个搜索的词汇。
  • term 查询 keyword 字段: term 不会分词,而 keyword 字段也不分词,需要完全匹配才可。
  • term 查询 text 字段:因为 text 字段会分词,而 term 不分词,所以 term 查询的条件必须是 text 字段分词后的某一个。
1
2
3
4
5
6
7
8
9
POST /users/_search

{
"query": {
"term": {
"age": 20
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "I am Tim"
}
}
]
}
}

range

  • 此查询用于查找值的范围之间的值的对象。
    • gte:大于和等于。
    • gt:大于。
    • lte:小于和等于。
    • lt:小于。
1
2
3
4
5
6
7
8
9
10
11
POST /users/_search

{
"query": {
"range": {
"age": {
"gte": 25
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "2",
"_score": 1,
"_source": {
"username": "Mike",
"password": "333333",
"age": 30,
"date": "1999/1/1",
"desc": "I am Mike"
}
}
]
}
}

bool

  • 布尔查询,将子查询为真的放入查询结果中。
  • must:所有的语句都必须(must)匹配,与 AND 等价。
  • must_not:所有的语句都不能(must not)匹配,与 NOT 等价。
  • should:至少有一个语句要匹配,与 OR 等价。
  • filter:返回的文档必须满足 filter 子句的条件,但是跟 must 不一样的是,不会计算分值,并且可以使用缓存。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
POST /users/_search

{
"query": {
"bool": {
"must": [
{
"match": {
"username": "Tim"
}
},
{
"match": {
"password": 123456
}
}
],
"filter": {
"range": {
"age": {
"gte": 20
}
}
}
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.5753642,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"username": "Tim",
"password": "123456",
"age": 20,
"date": "2000/1/1",
"desc": "I am Tim"
}
}
]
}
}

highlight

  • 高亮查询,对匹配的关键字做高亮处理。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
POST /users/_search

{
"query": {
"match": {
"username": "Tim"
}
},
"highlight": {
"pre_tags": "<i style='color:red;'>",
"post_tags": "</i>",
"fields": {
"name":{}
}
}
}