elasticsearch的集合存储与搜索

发表于2020-05-08，长度4097， 1034个单词， 11分钟读完

从集合数据中检索元素是日常开发常见的需求。Java 当然提供了简单的遍历方法去判断，比如java.util.Collection#contains；类似地，MySQL也提供了遍历方法 find_in_set ，尽管性能不好。那么ES有没有和find_in_set类似的功能呢？这篇文章我们简单探索一下。

这里针对的是elasticsearch 6.8版本

ES 提供了全文检索和精确值检索，在精确值检索中又包括了两种多值检索，一种叫多精确值检索（Terms Query），一种叫多精确值集检索（Terms Set Query）。它们是不是就是find_in_set呢？

Terms Query

Terms Query的查询过程是传入一个集合，判断文档的对应字段是否在这个集合里面。我们试验一把：

PUT localhost:9200/my_index
{
    "mappings": {
        "_doc": {
            "properties": {
                "name": {
                    "type": "keyword"
                },
                "age": {
                    "type": "integer"
                }
            }
        }
    }
}

PUT localhost:9200/my_index/_doc/1
{
    "name": "tom",
    "age": 10
}
PUT localhost:9200/my_index/_doc/2
{
    "name": "jerry",
    "age": 10
}

然后对其进行检索：

GET localhost:9200/my_index/_search
{
    "query" :{
    	"terms":{
    		"name":["tom", "jerry","green"]
    	}
    }
}
结果：
{
    "took": 6,
    "timed_out": false,
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my_index",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "name": "jerry",
                    "age": 10
                }
            },
            {
                "_index": "my_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "tom",
                    "age": 10
                }
            }
        ]
    }
}

也就是说存储的不是集合，查询的是集合，有点像范围查询。和find_in_set正好相反。

不过我们可以存储一下集合进行检索：

PUT localhost:9200/my_index/_doc/3
{
    "name": ["green","jim"],
    "age": 10
}

GET localhost:9200/my_index/_search
{
    "query" :{
    	"terms":{
    		"name":["jim"]
    	}
    }
}
结果：
{
    "took": 7,
    "timed_out": false,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my_index",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "name": [
                        "green",
                        "jim"
                    ],
                    "age": 10
                }
            }
        ]
    }
}

可是我们期望的查询不是使用name数组而是单个名称，想要期望包含该名称的文档被命中（上面就是包含Jim的被命中了）。这时候去掉中括号发现保存，要求terms必须给数组。那不用terms而用term可以吗？

GET localhost:9200/my_index/_search
{
    "query" :{
    	"term":{
    		"name":"jim"
    	}
    }
}
结果：
{
    "took": 8,
    "timed_out": false,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my_index",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "name": [
                        "green",
                        "jim"
                    ],
                    "age": 10
                }
            }
        ]
    }
}

这样就实现了包含匹配的查询。重点在哪里呢？在mapping里面：虽然我们定义的name类型是keyword（或者其他类型），但是可以写入keyword数组。这时候检索就是进行包含匹配的查询。

Terms Set Query

Terms Set Query比Terms Query多一个参数，必须指定文档字段中的数组元组最少要匹配几个才算命中（不指定最少匹配数会报错：“No minimum should match has been specified”）。指定的是文档里面一个数字类型的字段，也就是文档要自己定义匹配数量，然后es去匹配这个字段：

PUT localhost:9200/my_index/_doc/3
{
    "name": ["green","jim"],
    "mm": 1
}

GET localhost:9200/my_index/_search
{
    "query" :{
    	"terms_set":{
    		"name": {
    			"terms":["green"],
                        "minimum_should_match_field": "mm"
    		}
    	}
    }
}

通过字段mm定义最少匹配数，结果是：

{
    "took": 10,
    "timed_out": false,
    "hits": {
        "total": 1,
        "max_score": 0.6931472,
        "hits": [
            {
                "_index": "my_index",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.6931472,
                "_source": {
                    "name": [
                        "green",
                        "jim"
                    ],
                    "mm": 1
                }
            }
        ]
    }
}