Elasticsearch (3) - RESTful API

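This chapter is fairly simple: it lists a selection of the main official APIs that come up in day-to-day development. The APIs covered here fall into two groups: 1. RESTful; 2. SQL. The RESTful style is recommended because it is more portable.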

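Operating principles

Index operations: treat indexes as add-only; changing one amounts to a delete plus a reindex.
Updating data: ES looks up the existing document, applies the change, marks the old version as deleted and indexes the new version. By default, concurrent writes are controlled with version numbers.
Deleting data:

Delete a document: ES marks it as deleted first and physically removes it later, asynchronously.
Delete an index: the index files are removed outright.
Close an index: similar to deleting, but the files are kept on disk.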

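RESTful API format

General format: curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

VERB: the HTTP method: GET, POST, PUT, HEAD, DELETE
PROTOCOL: http or https (https needs TLS enabled on Elasticsearch or an HTTPS proxy in front of it)
HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine
PORT: the port of the Elasticsearch HTTP service, 9200 by default
PATH: the API endpoint, for example _count
QUERY_STRING: optional query-string parameters; for example, ?pretty makes the response more readable, pretty-printed JSON
BODY: a JSON request body (if the request needs one)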

curl -XGET 'http://localhost:9200/_count?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} }
}'
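
The placeholders of the general format can also be assembled in code. The following is a minimal Python sketch using only the standard library; the helper name build_request and its defaults are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(verb, path, query=None, body=None,
                  protocol="http", host="localhost", port=9200):
    """Assemble <VERB> <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> with an optional JSON body."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # A JSON body needs a matching Content-Type header
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=verb)


# Same request as the curl example: count all documents in the cluster
req = build_request("GET", "_count", query={"pretty": "true"},
                    body={"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
```

With a cluster running, the object can be passed to urllib.request.urlopen(req) to execute the call.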

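Query parameters available in the RESTful API

URL parameter	Usage	Also supported in the request body
q	The query string
df	Default field to use when the query string gives no field prefix
analyzer	Name of the analyzer used to analyze the query string
analyze_wildcard	Whether wildcard and prefix queries are analyzed; defaults to false
batched_reduce_size	Number of shard results reduced at once on the coordinating node	Yes
default_operator	Default operator for the query string: OR (the default) or AND
lenient	If true, format-based failures (such as text supplied for a numeric field) are ignored; defaults to false, and changing it is not recommended
explain	Returns, for each hit, an explanation of how its score was computed
_source	Whether to return the _source content; defaults to true
stored_fields	Comma-separated list of stored fields to return for each hit; none are returned if unspecified
sort	Sort order, as fieldName:asc|desc; several may be given, separated by commas
track_scores	Whether to compute relevance scores even when sorting on a field; defaults to false, so set it to true when scores are needed
track_total_hits	true, false, or a positive integer. The default is 10000: totals up to 10000 are exact, beyond that only a lower bound is reported. Use true (or a larger number) for an exact count; with false the total is not tracked.
timeout	No timeout by default; setting one is not recommended	Yes
terminate_after	Maximum number of documents to collect per shard; unlimited by default	Yes
from	Offset of the first result; defaults to 0	Yes
size	Number of results to return; defaults to 10	Yes
search_type	query_then_fetch (the default) or dfs_query_then_fetch; the latter scores more accurately but costs an extra round trip to the shards, so it is slower	Yes
allow_partial_search_results	Defaults to true: partial results are returned when some shards fail or time out	Yes
request_cache	Whether to cache the results of size=0 requests	Yes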

Sample query result

{
  "_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFEk1bWdRSUFCQWh1VUxxTFZGUDB5AAAAAAAAADMWSi1Jc3M2NnNTbWVkN3dkLVZqVC1sZw==",
  "took": 3, # total time taken, in ms
  "timed_out": false,
  "_shards": { # shard information
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {  # result information
    "total": {
      "value": 1000,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [  # the matching documents
      { 
        "_index": "customer",
        "_type": "_doc",
        "_id": "0",
        "_score": null,
        "_source": {
          "account_number": 0,
          "firstname": "Bradshaw"
        },
        "sort": [
          0
        ]
      }
    ]
  }
}

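Query parameters: output formatting

?v: verbose output with column headers (_cat APIs)
?help: help for the API, listing the available columns (_cat APIs)
?pretty: returns pretty-printed JSON
?format=yaml: returns the result as YAML
flat_settings=true: returns settings in flat, dotted-key form
When using Postman, set the body to raw with type JSON (application/json)
human=false: turns off human-readable values; suitable for responses consumed by programs
?h=xx,xx: shows only the listed columns (_cat APIs)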

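Metadata filtering

filter_path=took,hits.hits._id: trims the response so that only the listed fields are returned (use _source filtering for the document content itself). Wildcards are also supported:

*: matches any field or part of a field name, for example metadata.indices.*.stat*
-: a leading minus excludes the field
**: matches fields at any depth of the path

_source=false: do not return the _source content (it is returned by default)
Field filtering: use _source_includes and _source_excludes, both of which accept wildcards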

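The CRUD APIs work on a single index; others such as bulk, the multi APIs and reindex can span several indexes, given as a comma-separated list. Wildcards such as test* can also be used to match indexes.

GET /customer,king/_search?q=tag:wow

ignore_unavailable: whether to ignore indexes that are unavailable (missing or closed)
allow_no_indices: when false, the request fails if a wildcard expression, alias or _all resolves to no index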

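Index name shortcuts

Index names support date math (for example <logstash-{now/d}>), though only for data from the last two days; the aim is to avoid scanning every index.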
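
When a date-math index name is placed in a request URL it has to be percent-encoded. A minimal Python sketch follows; the logstash- prefix and the helper name date_math_index are assumptions chosen for illustration.

```python
from urllib.parse import quote


def date_math_index(prefix, expression="now/d"):
    """Return the raw and URL-encoded forms of a date-math index name such as <logstash-{now/d}>."""
    raw = f"<{prefix}{{{expression}}}>"
    # safe="" also encodes "/", which would otherwise be read as a path separator
    return raw, quote(raw, safe="")


raw, encoded = date_math_index("logstash-")
print(raw)
print(encoded)
```

The encoded form is what goes into the path, as in GET /<encoded>/_search.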

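Debugging parameters

error_trace=true: includes the stack trace in error responses, showing where the failure occurred

Date shortcuts

now is a keyword; the following expressions are shortcuts for setting times relative to it.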

now+1h
now-1h
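// other units: y, M, w, d, h, H, m, s. Examples:
1. now-10M/M  the current time minus 10 months, rounded down to the start of the month (/M means "round to the month")
2. now/d   rounded down to the start of the day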
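
The rounding in these expressions can be illustrated with a small Python model. This is only a sketch of the arithmetic for the /d and /M suffixes, not Elasticsearch's parser; the helper names and the fixed value of now are assumptions.

```python
from datetime import datetime


def minus_months(dt, months):
    """Step back whole calendar months; the day is pinned to 1 to avoid invalid dates such as Feb 30."""
    total = dt.year * 12 + (dt.month - 1) - months
    return dt.replace(year=total // 12, month=total % 12 + 1, day=1)


def round_down(dt, unit):
    """Round down to the start of the day ("d") or of the month ("M")."""
    start_of_day = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return start_of_day
    if unit == "M":
        return start_of_day.replace(day=1)
    raise ValueError(f"unsupported unit: {unit}")


now = datetime(2024, 3, 15, 13, 45)             # a fixed "now" keeps the example repeatable
print(round_down(now, "d"))                     # now/d
print(round_down(minus_months(now, 10), "M"))   # now-10M/M
```

Pinning the day to 1 in minus_months is acceptable here only because the result is rounded to the start of the month afterwards.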

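2. Viewing cluster summary information

Listing sub-endpoints

GET http://localhost:9200/_cat  // works like a help page: lists the available _cat URLs

Cluster health

GET localhost:9200/_cat/health?v  // cluster health; the status column is one of:

green: all primary and replica shards are working normally;
yellow: all primaries are allocated but some replica shards are missing; check unassigned_shards to judge the severity;
red: at least one primary shard is missing or unassigned, so some data is unavailable and needs to be repaired as soon as possible;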
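
The three statuses follow from the shard states. The sketch below is a simplified Python model, not Elasticsearch's own logic: it treats every state other than STARTED as unavailable, and the function name and the shape of the shard records (modelled on the primary and state columns of _cat/shards) are assumptions.

```python
def cluster_status(shards):
    """Classify a cluster as green, yellow or red from a list of shard copies."""
    unavailable = [s for s in shards if s["state"] != "STARTED"]
    if any(s["primary"] for s in unavailable):
        return "red"      # a primary is not serving: some data is unavailable
    if unavailable:
        return "yellow"   # only replicas are missing: data is available but not redundant
    return "green"        # every primary and replica is started


shards = [
    {"index": "customer", "shard": 0, "primary": True, "state": "STARTED"},
    {"index": "customer", "shard": 0, "primary": False, "state": "UNASSIGNED"},
]
print(cluster_status(shards))
```

A single-node cluster with number_of_replicas greater than 0 ends up in exactly this yellow situation, because a replica cannot be placed on the same node as its primary.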

Node information

GET localhost:9200/_cat/nodes?v // all nodes
GET localhost:9200/_cat/master?v  // the elected master node
GET localhost:9200/_cat/recovery?v  // shard recovery status
GET localhost:9200/_cat/thread_pool?v // per-node thread pools

Listing aliases

http://localhost:9200/_cat/aliases?v

Listing shards

http://localhost:9200/_cat/allocation?v
http://localhost:9200/_cat/shards?v

Listing segments

http://localhost:9200/_cat/segments?v

3. Node operations

GET localhost:9200/_nodes
GET localhost:9200/_nodes/_all
GET localhost:9200/_nodes/_local
GET localhost:9200/_nodes/_master
GET localhost:9200/_nodes/nodeName // the actual node name; wildcards are allowed
GET localhost:9200/_nodes/10.0.0.3,10.0.0.4
GET localhost:9200/_nodes/10.0.0.*
GET localhost:9200/_nodes/stats
GET localhost:9200/_nodes/nodeid1,nodeid2/stats

4. Cluster operations

GET localhost:9200/_cluster/health
GET localhost:9200/_cluster/state
GET localhost:9200/_cluster/stats?human&pretty // view cluster statistics
GET localhost:9200/_cluster/settings  // view the settings; to update them, send PUT with a body

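5. Index operations

Treat indexes as add-only; changing one amounts to a delete plus a reindex.

Listing

get localhost:9200/_cat/indices?v

Creating

Once an index exists in ES, mappings do not have to be created by hand: with dynamic mapping (on by default) a new field is added to the mapping the first time it is indexed, and with action.auto_create_index=true (also the default) a missing index is created on first write. Relying on either is not recommended. Index names must be lowercase and cannot contain characters such as \, /, *, ?, ", <, >, |, space, comma or #; digits are allowed.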

put localhost:9200/customer?pretty   // creates an index named customer
or curl -XPUT "localhost:9200/customer?pretty"
// the response looks like this:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "customer"
}

Checking whether an index exists

HEAD customer

Opening or closing an index

POST /customer/_close
POST /customer/_open

Deleting

delete localhost:9200/customer?pretty

Viewing settings

get localhost:9200/bank/_settings?flat_settings=true

Document counts for the cluster and for a single index

http://localhost:9200/_cat/count?v  // whole cluster
http://localhost:9200/_cat/count/customer?v  // a single index

Field data memory usage

http://localhost:9200/_cat/fielddata?v
http://localhost:9200/_cat/fielddata?v&fields=a

Index settings

put customer 
{"settings":{
    "index":{
        "number_of_shards":3,
        "number_of_replicas":2
    }
 }
}

Shrinking an index

Shrinks an existing index into one with fewer shards; not used very often.

Viewing mappings

get /king/_mapping/
{
  "king": {
    "mappings": {
      "properties": {
        "address": {
          "type": "text"
        },
        "age": {
          "type": "long"
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

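6. Document operations

Creating

// An id can be supplied at creation time and used for lookups later; if none is supplied ES generates a random one, in which case the request must be a POST
put localhost:9200/customer/_doc/1?pretty   body = {"name":"John Doe"}
post localhost:9200/customer/_doc/?pretty
// setting the shard routing explicitly, for example:
post localhost:9200/customer/_doc/?routing=kimchy  body={"user":"kimchy", "age":"man"}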

{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,  # number of shard copies the operation should run on
        "successful": 1, # number of shard copies on which it actually succeeded
        "failed": 0
    },
    "_seq_no": 1000,
    "_primary_term": 2
}

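Updating

Documents in Elasticsearch are immutable: they cannot be modified in place. An update is really a replace, carried out as follows: 1. retrieve the JSON from the old document 2. change it 3. delete the old document 4. index the new document

post localhost:9200/customer/_update/1?pretty  body = {"doc":{"name":"jack ld"}}

Adding a field

post localhost:9200/customer/_update/1?pretty  body = {"doc":{"name":"jack ld", "age":20}}

Get by id

// the id here is the one created in the example above
get localhost:9200/customer/_doc/1?pretty
// setting the shard routing explicitly (it must match the routing used when the document was indexed), for example:
get localhost:9200/customer/_doc/1?routing=kimchy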

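Deleting

The document is not removed immediately: it is only marked as deleted, and the data is physically removed when segments are merged.

delete localhost:9200/customer/_doc/2?pretty

Search (list query)

get localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty
bank: the index name
q=*: matches all documents
sort=account_number:asc: the sort order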

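{
    "took": 16, # total execution time in ms
    "timed_out": false, # whether the search timed out
    "_shards": {
        "total": 1, # how many shards were searched in total; the next fields give how many succeeded, were skipped or failed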
        "successful": 1, 
        "skipped": 0,
        "failed": 0
    },
    "hits": {  # the actual search results
        "total": {
            "value": 1000,   # 1000 documents matched; the figure is exact only when relation is eq
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [ # the result documents
            {
                "_index": "bank",
                "_type": "_doc",
                "_id": "1",
                "_score": 1, # relevance score
                "_source": {
                    "account_number": 1,
                    "balance": 39225,
                    "firstname": "Amber",
                    "lastname": "Duke",
                    "age": 32,
                    "gender": "M",
                    "address": "880 Holmes Lane",
                    "employer": "Pyrami",
                    "email": "amberduke@pyrami.com",
                    "city": "Brogan",
                    "state": "IL"
                }
            }
        ]
    }
}

Finding which shards hold the data

GET /_search_shards
GET /_search_shards?routing=liudong // shows which shard a given routing value resolves to
{
  "nodes": {
    "J-Iss66sSmed7wd-VjT-lg": {
      "name": "5fc8e72b7ef3",
      "ephemeral_id": "wtLSy11WSbWAPtsjYUoELw",
      "transport_address": "172.17.0.2:9300",
      "attributes": {
        "ml.machine_memory": "4125892608",
        "xpack.installed": "true",
        "transform.node": "true",
        "ml.max_open_jobs": "20"
      }
    }
  },
  "indices": {
    "customer": {}
  },
  "shards": [
    [
      {
        "state": "STARTED",
        "primary": true,
        "node": "J-Iss66sSmed7wd-VjT-lg",
        "relocating_node": null,
        "shard": 0,
        "index": "customer",
        "allocation_id": {
          "id": "xotuBCQvR1CGHU-bhjl_Tg"
        }
      }
    ]
  ]
}

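7. Bulk operations

The whole bulk request is loaded into memory on the coordinating node, so a single batch is best kept between 1000 and 5000 documents and between 5 MB and 15 MB in size.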
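
One way to respect both limits is to split the documents on the client before sending them. The following Python sketch is an illustration only: the function name and default limits are assumptions, and the size is measured on the serialized document alone, ignoring the action line that accompanies each document in a bulk request.

```python
import json


def batches(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents that stay under both the count and the serialized-size limit."""
    batch, size = [], 0
    for doc in docs:
        doc_size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before adding a document that would break a limit
        if batch and (len(batch) >= max_docs or size + doc_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


docs = [{"account_number": i} for i in range(2500)]
print([len(b) for b in batches(docs)])
```

A document larger than max_bytes still goes out, alone in its own batch, rather than being dropped.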

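Bulk reads

Combining several requests avoids paying the network overhead of each one separately. To retrieve several documents from Elasticsearch, a single request using the multi-get (mget) API is faster than fetching them one at a time.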

GET /_mget 
{
  "docs" : [ 
    {
      "_index" : "website",
      "_type" : "blog",
      "_id" : 2
    }, 
    {
      "_index" : "website",
      "_type" : "pageviews",
      "_id" : 1,
      "_source" : "views"
    } 
  ]
}

GET /website/blog/_mget 
{
   "ids" : [ "2", "1" ] 
}

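Bulk writes

Bulk operations are not transactional: there is no all-or-nothing behaviour. The response reports the outcome of every item, successes and failures alike, for the caller to inspect.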

post localhost:9200/customer/_bulk?pretty
{"index":{"_id":"1"} }
{"name":"john doe"}
{"index":{"_id":"2"} }
{"name":"john doe"}
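// the format is strict: every JSON object sits on its own line, and the last line must end with a newline as well
post localhost:9200/customer/_bulk?pretty
{"update":{"_id":"1"} }
{"doc":{"name":"john doe becomes jane doe"} }
{"delete":{"_id":"2"} }
// create below can be swapped for index, update or delete; the difference is that create fails if the document already exists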
post localhost:9200/customer/_bulk?pretty
{ "create" : { "_index" : "customer", "_id" : "3" } }
{ "name" : "Alice" }
{ "create" : { "_index" : "customer", "_id" : "4" } }
{ "name" : "Smith" }
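
The newline-delimited body is easy to get wrong by hand, so it is usually generated. A minimal Python sketch, assuming a hypothetical helper bulk_body that takes (action, metadata, source) tuples:

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples to the newline-delimited bulk format."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:          # delete carries no source line
            lines.append(json.dumps(source))
    # Each JSON object sits on its own line and the body ends with a newline
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_id": "1"}, {"name": "john doe"}),
    ("update", {"_id": "1"}, {"doc": {"name": "jane doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

The resulting string is sent as the body of POST /customer/_bulk with the Content-Type header set to application/x-ndjson.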

The sample data has the following structure; the bank index holds 1000 documents of this shape:

"account_number": 1,
"balance": 39225,
"firstname": "Amber",
"lastname": "Duke",
"age": 32,
"gender": "M",
"address": "880 Holmes Lane",
"employer": "Pyrami",
"email": "amberduke@pyrami.com",
"city": "Brogan",
"state": "IL"

8. Query SQL

Counting: count

GET /_count
{
  "count": 1000,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  }
}

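Grouping: group by

{
     "size": 0,  # with size 0 no documents are returned, only the aggregation
     "aggs": {
          "group_by_state": {
               "terms": {
                    "field": "state.keyword"  # state is the field; .keyword is its keyword sub-field
               }
          }
     }
}

terms: the counterpart of GROUP BY with a count per group. It is the only aggregation here that needs .keyword appended to the field name, because state is a text field; numeric aggregation functions such as avg or sum do not need it.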
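
What the terms aggregation computes can be reproduced on a handful of documents in plain Python. This is only a model of the result shape (key and doc_count per bucket, most frequent first); the three sample documents are made up for illustration.

```python
from collections import Counter

# A few sample documents shaped like the bank data (values invented for the example)
docs = [
    {"account_number": 1, "state": "IL"},
    {"account_number": 2, "state": "TX"},
    {"account_number": 3, "state": "IL"},
]


def terms_buckets(docs, field, size=10):
    """Group documents by a field and count each group, like SELECT field, COUNT(*) ... GROUP BY field."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


print(terms_buckets(docs, "state"))
```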

Range grouping

"aggs": {
     "group_by_age": {
          "range": {
               "field": "age",
               "ranges": [
                    {
                         "from": 20,
                         "to": 30
                    }
               ]
          }
     }
}

Result set size: limit

{
     "query": {
          "match_all": {}
     },
     "size": 10 # defaults to 10
}

Pagination: limit with offset

from + size cannot exceed index.max_result_window, which defaults to 10000.

{
     "query": {
          "match_all": {}
     },
     "from": 2,
     "size": 10
}
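
Turning a page number into from and size, with a guard for the window limit, can be sketched in Python as follows. The function name and the 1-based page convention are assumptions; the constant mirrors the default of index.max_result_window.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def page_query(page, size=10, query=None):
    """Build a search body for a 1-based page number using from/size."""
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"query": query or {"match_all": {}}, "from": offset, "size": size}


print(page_query(3))
```

With the defaults, page 1000 is the last page that can be requested this way.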

Sorting: order by

{
     "query": {
          "match_all": {}
     },
     "sort": [
          {
               "account_number": "asc"  # ideally the sort field holds unique values
          }
     ]
}
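// When the sort field holds several values, a mode can also be given: min, max, sum, avg or median. For example, if price holds several prices and the sort should use the largest of them, the query is written like this
     "sort": [
          {"price": {"order":"asc", "mode":"max"}}
     ]
// If some documents lack the sort field, their position can be set with missing, which takes _last or _first
     {"price": {"order":"asc", "mode":"max", "missing":"_last"}}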

Selecting specific fields

"_source": [
          "account_number","age"
     ]
 or
 "_source": {
    "includes": [ "account_number",  "firstname"],
    "excludes": ["*.price"]
    },

Matching on a field: where

"query": {
          "match": {"account_number": 1}
     },

Matching on a field: where with several terms

GET customer/_search
{"query":{
    "match":{
        "text":{
            "query":"ld",
            "operator":"and"
        }
    }
}
}

Multi-condition bool queries: or, and, not

A bool query can hold several clauses, and bool queries can be nested inside clauses.

// the address value must contain both Holmes and Lane
{
     "query": {
          "bool": {
               "must": [
                    { "match": {  "address": "Holmes" }  },
                    { "match": {  "address": "Lane" }  }
               ]
          }
     }
}

must: equivalent to and
should: equivalent to or
must_not: the negation of must, equivalent to not
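
The three clause types can be combined in one request body. A minimal Python sketch follows; the helper bool_query is hypothetical, and the must_not condition on state is added purely for illustration.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Build a bool query, leaving out any clause list that is empty."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


# address contains Holmes AND Lane, AND NOT state = IL
query = bool_query(
    must=[{"match": {"address": "Holmes"}}, {"match": {"address": "Lane"}}],
    must_not=[{"match": {"state": "IL"}}],
)
print(json.dumps(query, indent=2))
```

Because each clause is itself a query, a nested bool can be passed as an item of must, should or must_not.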

Filtering: filter

A filter does not compute a score, so it performs somewhat better.

"filter": {
     "range": {
          "balance": {
               "gte": 10000,
               "lte": 30000
          }
     }
}
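
A filter fragment like this sits inside a bool query. The Python sketch below shows one way to build the complete body; the function name is an assumption, and wrapping the filter together with match_all is one common arrangement, not the only one.

```python
def range_filter_query(field, gte=None, lte=None):
    """Wrap a range condition in bool.filter so it restricts results without affecting scores."""
    bounds = {op: value for op, value in (("gte", gte), ("lte", lte)) if value is not None}
    return {"query": {"bool": {"must": {"match_all": {}},
                               "filter": {"range": {field: bounds}}}}}


# balance between 10000 and 30000, both ends inclusive
query = range_filter_query("balance", gte=10000, lte=30000)
print(query["query"]["bool"]["filter"])
```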