Getting Started with Elasticsearch

elasticsearch / 2023-02-09

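I. Introduction to Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data and helps you discover the expected and uncover the unexpected.
(Quoted from the official website)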

II. Installing ES with Docker

Running it alongside Kibana is recommended.

1. Installing ES
  • Start command

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xmx128m -Xms64m" -v /home/yanghuanxi/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v /home/yanghuanxi/elasticsearch/data:/usr/share/elasticsearch/data -v /home/yanghuanxi/elasticsearch/plugins:/usr/share/elasticsearch/plugins -d elasticsearch:7.7.0

2. Installing Kibana
  • Start command
    docker run -d -p 5601:5601 --link elasticsearch -e "ELASTICSEARCH_HOSTS=http://<VM address>:9200" kibana:7.7.0

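Possible error:
The browser shows "Kibana server is not ready yet", and the Kibana logs show that it cannot connect to ES.

Kibana must be told the ES address at startup. There are two ways to do this. The first is the environment variable shown above, which I have not tried; note that Kibana 7.x reads the address from elasticsearch.hosts (ELASTICSEARCH_HOSTS as an environment variable), because the 6.x setting elasticsearch.url was removed in 7.0.
The second is to enter the container, change the ES address in the configuration file, and restart the container.

==⚠ Note: the directories mounted into the ES and Kibana containers must be readable and writable, e.g. chmod -R 777 elasticsearch/==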

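III. Basic Commands

1. _cat

GET /_cat/nodes: list all nodes
GET /_cat/health: show the cluster health
GET /_cat/master: show the master node
GET /_cat/indices: list all indices, roughly the equivalent of show databases; in MySQL

2. Indexing a document

Sample data from the official documentation: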

PUT /customer/_doc/1
{
  "name": "John Doe"
}
POST /customer/_doc/1
{
  "name": "John Doe"
}

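The same endpoint is also used to modify a document. The difference between PUT and POST is that PUT requires an id, while POST can omit it, in which case ES generates one automatically.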
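
The id rule can be sketched as a small Python helper that picks the HTTP method and path for indexing one document. The name index_request is made up for illustration; the function only builds the request line and never contacts a cluster.

```python
from typing import Optional, Tuple


def index_request(index: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (method, path) for indexing one document (illustrative helper)."""
    if doc_id is None:
        # No id supplied: use POST and let ES generate the id.
        return ("POST", f"/{index}/_doc")
    # Explicit id: PUT creates the document or replaces an existing one.
    return ("PUT", f"/{index}/_doc/{doc_id}")


print(index_request("customer", "1"))
print(index_request("customer"))
```

The first call corresponds to the PUT /customer/_doc/1 request shown earlier; the second is the id-less form that only POST allows.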

3. Retrieving a document
GET /customer/_doc/1
result >>> 
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}
4. Bulk insert

Syntax:

Request
POST /_bulk
POST /<index>/_bulk
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
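
The bulk body is newline-delimited JSON: one metadata line per action, followed by a source line for every action except delete. As a rough sketch, a client could assemble such a body as below. The helper build_bulk_body and the Action alias are hypothetical names, and the code only produces the text that would be sent (with Content-Type: application/x-ndjson); it does not send it.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple

# (action metadata, optional body line); "delete" actions carry no body.
Action = Tuple[Dict[str, Any], Optional[Dict[str, Any]]]


def build_bulk_body(actions: Iterable[Action]) -> str:
    """Serialize actions as newline-delimited JSON for POST /_bulk."""
    lines = []
    for meta, source in actions:
        lines.append(json.dumps(meta))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_index": "test", "_id": "1"}}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```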
5. Updating
>>> The version is not incremented if the data did not change
POST fzkj/_update/1
{
  "doc":{
      "name":"test2",
      "address":"北京2"
  }
}
>>> The version is incremented even if the data did not change
POST fzkj/_doc/1
{
      "name":"test3",
      "address":"北京3"
}
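
To make the version behaviour of the two requests concrete, here is a toy in-memory model in Python. ToyDoc is purely illustrative and is not how ES is implemented; it only mimics the observable effect: a partial update that changes nothing is reported as a noop, while re-indexing the whole document always produces a new version.

```python
from typing import Any, Dict


class ToyDoc:
    """In-memory stand-in for one stored document (not real ES code)."""

    def __init__(self, source: Dict[str, Any]) -> None:
        self.source = dict(source)
        self.version = 1

    def update(self, partial: Dict[str, Any]) -> str:
        """Mimic POST <index>/_update/<id> with a "doc" body."""
        merged = {**self.source, **partial}
        if merged == self.source:
            return "noop"  # nothing changed, version stays the same
        self.source = merged
        self.version += 1
        return "updated"

    def index(self, source: Dict[str, Any]) -> str:
        """Mimic POST <index>/_doc/<id>: full replacement, always a new version."""
        self.source = dict(source)
        self.version += 1
        return "updated"


doc = ToyDoc({"name": "test2", "address": "北京2"})
print(doc.update({"name": "test2"}), doc.version)
print(doc.index({"name": "test2", "address": "北京2"}), doc.version)
```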
6. Deleting a document or an index
DELETE fzkj/_doc/1
DELETE fzkj

IV. Advanced Queries

1. match_all
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

result >>> 
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
        "value": 1000,
        "relation": "eq"
    },
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}

By default, the first 10 hits are returned. Use from and size to change this.

>>> _source lists the fields to return
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "xxx": "desc" }
  ],
  "from": 10,
  "size": 10,
  "_source": ["name", "balance"]
}
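
Paging with from and size is simple arithmetic. The following Python sketch turns a 1-based page number into the two parameters; page_params is a hypothetical helper name. Keep in mind that, with default index settings, ES rejects requests where from + size exceeds 10000 (index.max_result_window).

```python
from typing import Any, Dict


def page_params(page: int, page_size: int = 10) -> Dict[str, Any]:
    """Translate a 1-based page number into from/size for a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


search_body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
    **page_params(page=2),
}
print(search_body["from"], search_body["size"])  # 10 10
```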
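2. match: full-text search
>>> Find documents whose address matches mill lane (the query text is analyzed, so a document containing either mill or lane matches)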
GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
3. match_phrase: phrase matching
>>> Find documents whose address contains the phrase mill lane (the terms must appear together and in order)
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

Besides match_phrase, you can also query the <field>.keyword sub-field, for example:

>>> Find documents whose address is exactly mill lane (the value is not analyzed)
GET /bank/_search
{
  "query": { "match": { "address.keyword": "mill lane" } }
}

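The difference between the two:
match_phrase performs phrase matching: the query value is treated as one phrase, and a document matches as long as the field contains that phrase;
.keyword performs an exact match: a document matches only if the entire field value equals the query value.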
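
The three behaviours can be imitated with a toy model in Python. This is only a sketch: analyze is a crude stand-in for a real analyzer (lowercase, split on whitespace), the function names are invented, and the sample addresses are hypothetical.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_value: str, query: str) -> bool:
    """match: any query term appears in the field (default OR operator)."""
    tokens = set(analyze(field_value))
    return any(term in tokens for term in analyze(query))


def match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: all query terms appear adjacent and in order."""
    tokens, terms = analyze(field_value), analyze(query)
    if not terms:
        return False
    return any(tokens[i:i + len(terms)] == terms
               for i in range(len(tokens) - len(terms) + 1))


def keyword_equals(field_value: str, query: str) -> bool:
    """.keyword: the whole, unanalyzed value must be equal."""
    return field_value == query


for address in ["990 Mill Road", "198 Mill Lane", "mill lane"]:
    print(address,
          match(address, "mill lane"),
          match_phrase(address, "mill lane"),
          keyword_equals(address, "mill lane"))
```

In this model only the last address passes all three checks; "990 Mill Road" satisfies match alone, and "198 Mill Lane" satisfies match and match_phrase but not the exact .keyword comparison.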

4. multi_match: matching across multiple fields
GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "multi",
      "fields": ["state", "address"]
    }
  }
}

Finds documents whose state or address field contains multi.

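5. bool (must / must_not / should)
>>> Find documents with age = 40 and state != ID; the should clause is optional here and only raises the score of documents that match it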
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ],
      "should": [{
          "match": {"lastname": "zhangsan"}
      }]
    }
  }
}
6. filter: filtering results (does not contribute to the relevance score)
>>> Find documents with a balance between 20000 and 30000, inclusive
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
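
When request bodies like the two above are built in application code, a small helper keeps the bool clauses tidy. The Python sketch below is one possible shape; bool_query is a hypothetical name, and it merely returns a dictionary that could be serialized and sent to /bank/_search.

```python
import json
from typing import Any, Dict, List, Optional


def bool_query(must: Optional[List[dict]] = None,
               must_not: Optional[List[dict]] = None,
               filters: Optional[List[dict]] = None) -> Dict[str, Any]:
    """Assemble a bool query, leaving out clauses that were not given."""
    clauses = {"must": must, "must_not": must_not, "filter": filters}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
    # gte/lte are inclusive bounds; filter clauses do not affect scoring.
    filters=[{"range": {"balance": {"gte": 20000, "lte": 30000}}}],
)
print(json.dumps(body, indent=2))
```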
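7. term queries

Both term and match are used for searching. ==The difference is that term is meant for exact-value fields, while match is meant for full-text search.== For example, use term for fields such as age or gender, and match for text such as names or descriptions.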

GET /bank/_search
{
  "query": { "term": { "age": "90" } }
}
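
Why term is a poor fit for text fields can be shown with a toy model: a text field stores analyzed tokens, a term query compares its value as-is, and a match query analyzes its input first. The Python below is a simplified illustration with invented function names, and its analyze function is only a rough stand-in for the standard analyzer.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Toy stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def term_query(indexed_tokens: List[str], value: str) -> bool:
    """term: the value is compared as-is against the indexed tokens."""
    return value in indexed_tokens


def match_query(indexed_tokens: List[str], text: str) -> bool:
    """match: the text is analyzed first, then its terms are looked up."""
    return any(t in indexed_tokens for t in analyze(text))


tokens = analyze("880 Holmes Lane")   # ['880', 'holmes', 'lane']
print(term_query(tokens, "Holmes"))   # False: no token equals "Holmes"
print(match_query(tokens, "Holmes"))  # True: analyzed to "holmes" first
```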
8. aggregations
  • Find the age distribution and the average age of everyone whose address contains mill
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    }
  }
}

result >>> 
"aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : {
      "value" : 34.0
    }
  }
  • Aggregate by age and compute the average balance for each age
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
result >>> 
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,
          "doc_count" : 2,
          "balanceAvg" : {
            "value" : 27806.5
          }
        },
        {
          "key" : 28,
          "doc_count" : 1,
          "balanceAvg" : {
            "value" : 19648.0
          }
        },
        {
          "key" : 32,
          "doc_count" : 1,
          "balanceAvg" : {
            "value" : 25571.0
          }
        }
      ]
    }
  }
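
Conceptually, a terms aggregation with a nested avg is a group-by followed by a per-group average. The Python sketch below reproduces that computation on a few hypothetical documents (they are not taken from the bank data set); terms_with_avg is an invented name, and the output only imitates the shape of the buckets shown above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical documents, not taken from the real bank data set.
docs = [
    {"age": 28, "balance": 15000},
    {"age": 38, "balance": 30000},
    {"age": 38, "balance": 20000},
]


def terms_with_avg(docs: List[dict], field: str, avg_field: str) -> List[dict]:
    """Group docs by `field` (terms agg) and average `avg_field` per bucket."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc[avg_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "balanceAvg": {"value": mean(vals)}}
        for key, vals in groups.items()
    ]
    # terms buckets are ordered by doc_count, largest first, by default.
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)


for bucket in terms_with_avg(docs, "age", "balance"):
    print(bucket)
```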
  • Find the full age distribution and, for each age, the average balance of M, the average balance of F, and the overall average balance
GET bank/_search
{
  "query": {
    "match_all": {
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "genderAvg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "balanceAvg":{
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

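V. Mappings

A mapping defines how the fields of an index are stored and indexed. For example, a mapping can define:

  • which string fields should be treated as full-text fields
  • which fields contain numbers, dates, or geolocations
  • whether a field in the document is indexed
  • the format of date values
  • custom rules to control the mapping of dynamically added fields

==⚠️: Since 6.0.0 an index can hold only a single mapping type; types are deprecated in 7.x and removed in 8.0, so all data is stored directly under the index.==

Official website

1. Creating a mapping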
PUT /my-index
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}
2. Adding a new field mapping
>>> "index": false means the field is not indexed
PUT /my-index/_mapping
{
    "properties": {
      "employee-id":    {
        "type": "integer",
        "index": false
      }
    }
}

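==⚠️: The mapping of an existing field cannot be changed; migrate the data to a new index instead==

3. Migrating data (reindex)

First create a new index with the correct mapping, then migrate the data as follows: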

POST _reindex
{
    "source": {
        "index": "old"
    },
    "dest": {
        "index": "new"
    }
}
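
The order of the two steps matters: the destination index has to exist with the corrected mapping before _reindex copies anything into it. A minimal Python sketch of that sequence follows; migration_requests is a hypothetical helper that only lists the requests to run, and the field definition passed in is an example, not a recommendation.

```python
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body)
Request = Tuple[str, str, Dict[str, Any]]


def migration_requests(old: str, new: str,
                       properties: Dict[str, Any]) -> List[Request]:
    """Requests to run in order: create the new index, then copy the data."""
    return [
        ("PUT", f"/{new}", {"mappings": {"properties": properties}}),
        ("POST", "/_reindex", {"source": {"index": old}, "dest": {"index": new}}),
    ]


for method, path, body in migration_requests(
        "old", "new", {"age": {"type": "integer"}}):
    print(method, path, body)
```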

VI. Installing the IK Analyzer

IK is a third-party plugin and is not bundled with ES, so it has to be installed separately.
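The general idea:
Download the IK analyzer into the plugins folder of ES, then edit IK's IKAnalyzer.cfg.xml file and change the dictionary location in it.

VII. Integrating ES with Spring Boot

Official website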
