5、Apache Phoenix(5.0.0-5.1.2) 介绍及部署、使用(基本使用、综合使用、二级索引示例)、数据分区示例
  nNPyvzOmRTFq 2023年11月02日 47 0

Apache Hbase 系列文章

1、hbase-2.1.0介绍及分布式集群部署、HA集群部署、验证、硬件配置推荐 2、hbase-2.1.0 shell基本操作详解 3、HBase的java API基本操作(创建、删除表以及对数据的添加、删除、查询以及多条件查询) 4、HBase使用(namespace、数据分区、rowkey设计、原生api访问hbase) 5、Apache Phoenix(5.0.0-5.1.2) 介绍及部署、使用(基本使用、综合使用、二级索引示例)、数据分区示例 6、Base批量装载——Bulk load(示例一:基本使用示例) 7、Base批量装载-Bulk load(示例二:写千万级数据-mysql数据以ORCFile写入hdfs,然后导入hbase) 8、HBase批量装载-Bulk load(示例三:写千万级数据-mysql数据直接写成Hbase需要的数据,然后导入hbase)


(文章目录)


本文介绍了Apache Phoenix功能、部署以及基本使用示例以及综合使用示例,其中涉及shell应用和Java api应用。 本文前提是hbase可用。 本文分为5个部分,即Apache Phoenix介绍、部署与验证、基本使用示例、综合使用示例和数据分区的示例。

一、Apache Phoenix介绍

1、Phoenix简介

Phoenix官方网址:http://phoenix.apache.org/ Phoenix官网:「We put the SQL back in NoSQL」 在这里插入图片描述

  • Apache Phoenix让Hadoop中支持低延迟OLTP和业务操作分析。
  • 提供标准的SQL以及完备的ACID事务支持
  • 通过利用HBase作为存储,让NoSQL数据库具备通过有模式的方式读取数据,我们可以使用SQL语句来操作HBase,例如:创建表、以及插入数据、修改数据、删除数据等。
  • Phoenix通过协处理器在服务器端执行操作,最小化客户机/服务器数据传输
  • Apache Phoenix可以很好地与其他的Hadoop组件整合在一起,例如:Spark、Hive、Flume以及MapReduce。

Phoenix其本质是用Java写的基于JDBC API操作HBase的开源SQL引擎。 它有如下几个功能特性:

  • 通过JDBC API实现了大部分的java.sql接口,包括元数据API
  • DDL支持:通过CREATE TABLE、DROP TABLE及ALTER TABLE来添加/删除
  • DML支持:用于逐行插入的UPSERT VALUES,用于相同或不同表之间大量数据传输的UPSERT SELECT,用于删除行的DELETE
  • 事务支持:通过客户端的批处理实现的有限的事务支持
  • 二级索引支持
  • 遵循ANSI SQL标准

Phoenix可能存在各种不稳定,如下面描述的几点问题:

  • 最新版本对HBase、Hadoop等有严格版本控制,对于已经用上HBase的业务来说要升级HBase版本适配Phoenix代价太大
  • 与HBase强相关,作为HBase中的一个组件启动,HBase元数据容易遭到破坏
  • 官方提供的创建索引方法,容易导致插入失败,查询失败,程序崩溃等问题
  • 二级索引这个特性应该是大部分用户引入Phoenix主要考虑的因素之一。HBase因其历史原因只支持rowkey索引,当使用rowkey来查询数据时可以很快定位到数据位置。现实中,业务查询需求条件往往比较复杂,带有多个查询字段组合,如果用HBase查的话,只能全表扫描进行过滤,效率很低。而Phoenix支持除rowkey外的其它字段的索引创建,即二级索引,查询效率可大幅提升。

2、索引类别

1)、Covered Indexes

覆盖索引,即索引表中就包含你想要的全部字段数据,这样就只需要通过访问索引表而无需访问主表就能得到数据。 语法:create index my_index on test (v1) include(v2);

当执行select v2 from test where v1='...'时,就只会查找索引表数据,不会去主表扫描。

2)、Global Indexes 全局索引

  • 全局索引适用于读多写少的场景。

  • 全局索引在写数据时会消耗大量资源,所有对数据的增删改操作都会更新索引表,而索引表是分布在各个结点上的,性能会受到影响。

  • 有点:在读多的场景下如果查询的字段用到索引,效率会很快,因为可以很快定位到数据所在具体结点region上,对于写性能就很慢了,因为每写一次,需要更新所有结点上的索引表数据。

  • 语法:create index my_index on test (v1);

  • 如果执行`select v2 from test where v1='...', 实际是用不上索引的,因为v2不在索引字段中,对于全局索引来说,如果查询的字段不包含在索引表中,则还是会去全表扫描主表。

3)、Local Indexes 本地索引

  • 本地索引适用于写多读少场景。
  • 和全局索引类似,Phoenix会在查询时自动选择是否使用索引。如果定义为局部索引,索引表数据和主表数据会放在同一regionserver上,避免写操作时跨节点写索引表带来的额外开销(如Global Indexes)。当使用局部索引查询时,即使查询字段不是索引字段,索引表也会正常使用,这和Global Indexes是有区别的。
  • 在4.8版本之前,所有局部索引数据存放在一个单独的共享表中,4.8之后是存储在主表的一个独立的列族中。因为是局部索引,所以在client端查询使用索引时,需要扫描每个结点上的索引表以得到数据所在具体region位置,当region多时,查询时耗会很高,所以查询性能比较低,适合读少写多场景。
  • 语法:create local index my_index on test (v1);

4)、Mutable Indexing 和Immutable Indexing

  • IMMutable Indexing 不可变索引主要创建在不可变表上,适用于数据只写一次不会有Update等操作,在什么场景下会用到不可变索引呢,很经典的时序数据:write once read many times。在这种场景下,所有索引数据(primary和index)要么全部写成功,要么一个失败全都失败返回错误给客户端。 不可变索引用到场景比较少, 语法:create table test (pk VARCHAR primary key,v1 VARCHAR, v2 VARCHAR) IMMUTABLE_ROWS=true; 即在创建表时指定IMMUTABLE_ROWS参数为true,默认这个参数为false。如果想把不可变索引改为可变索引,可用alter修改: alter table test set IMMUTABLE_ROWS=false;
  • Mutable Indexing 可变索引意思是在修改数据如Insert、Update或Delete数据时会同时更新索引。这里的索引更新涉及WAL,即主表数据更新时,会把索引数据也同步更新到WAL,只有当WAL同步到磁盘时才会去更新实际的primary/index数据,以保证当中间任何一个环节异常时可通过WAL来恢复主表和索引表数据。

3、使用Phoenix是否会影响HBase性能

  • Phoenix不会影响HBase性能,反而会提升HBase性能
  • Phoenix将SQL查询编译为本机HBase扫描
  • 确定scan的key的最佳startKey和endKey
  • 编排scan的并行执行
  • 将WHERE子句中的谓词推送到服务器端
  • 通过协处理器执行聚合查询
  • 用于提高非行键列查询性能的二级索引
  • 统计数据收集,以改进并行化,并指导优化之间的选择
  • 跳过扫描筛选器以优化IN、LIKE和OR查询
  • 行键加盐保证分配均匀,负载均衡

4、官方性能测试

  • Phoenix VS Hive(基于HDFS和HBase) 在这里插入图片描述
  • Phoenix VS Impala 在这里插入图片描述

二、部署Phoenix

1、下载

大家可以从官网上下载与HBase版本对应的Phoenix版本。对应到HBase 2.1,应该使用版本「5.0.0-HBase-2.0」。 下载链接:http://phoenix.apache.org/download.html

2、安装

以下操作是在server1上完成的,如果需要多个phoenix,则需要复制多份即可。

以下英文部分是安装版本:phoenix-hbase-2.1-5.1.2-bin.tar.gz,其他是安装apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz。本示例全部是以apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz进行说明的。 在这里插入图片描述

启动时可能会出现警告: java.lang.IllegalArgumentException: Bad history file syntax! The history file /home/alanchan/.sqlline/history may be an older history: please remove it or use a different history file. 删除.sqlline/history(rm -R .sqlline/),再启动即可。

1)、上传安装包到Linux系统,并解压

cd /usr/local/bigdata
#文件上传至/usr/local/bigdata,解压至/usr/local/bigdata
tar -xvzf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz -C /usr/local/bigdata

2)、将phoenix的所有jar包添加到所有HBase RegionServer和Master的lib目录

#  拷贝jar包到hbase lib目录 
cp /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/phoenix-*.jar /usr/local/bigdata/hbase-2.1.0/lib/

#  进入到hbase lib  目录查看相关jar文件是否拷贝完成
cd /usr/local/bigdata/hbase-2.1.0/lib/

# 分发jar包到每个HBase 节点
scp phoenix-*.jar server2:$PWD
scp phoenix-*.jar server3:$PWD
scp phoenix-*.jar server4:$PWD

3)、修改配置文件

该步骤注意: 如果是hbase和phoenix环境是同时建立的,则可以;否则参见后面启动phoenix时的异常说明。

cd /usr/local/bigdata/hbase-2.1.0/conf/

vim hbase-site.xml

# 1. 将以下配置添加到 hbase-site.xml 后边
<!-- 支持HBase命名空间映射 -->
<property>
    <name>phoenix.schema.isNamespaceMappingEnabled</name>
    <value>true</value>
</property>
<!-- 支持索引预写日志编码 -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>


# 2. 将hbase-site.xml分发到每个节点
scp hbase-site.xml server2:$PWD
scp hbase-site.xml server3:$PWD
scp hbase-site.xml server4:$PWD

4)、将配置后的hbase-site.xml拷贝到phoenix的bin目录

#此处仅仅时在server1上进行部署phoenix,如果需要多台机器上进行部署,则需要拷贝至多个机器上,但未验证
cp /usr/local/bigdata/hbase-2.1.0/conf/hbase-site.xml /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/bin/

5)、重新启动HBase

stop-hbase.sh
start-hbase.sh

6)、启动Phoenix客户端,连接Phoenix Server

注意:第一次启动Phoenix连接HBase会稍微慢一点。
cd /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/

bin/sqlline.py server2:2118(zookeeper服务,任一即可)
# 输入!table查看Phoenix中的表

!table

#按照上述配置后出现的异常
[alanchan@server1 apache-phoenix-5.0.0-HBase-2.0-bin]$ bin/sqlline.py server1:2118
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:server1:2118 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:server1:2118
22/09/29 17:05:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: ERROR 726 (43M10):  Inconsistent namespace mapping properties. Ensure that config phoenix.schema.isNamespaceMappingEnabled is consistent on client and server. (state=43M10,code=726)
java.sql.SQLException: ERROR 726 (43M10):  Inconsistent namespace mapping properties. Ensure that config phoenix.schema.isNamespaceMappingEnabled is consistent on client and server.
        at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
        at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1318)

# 解决办法
# if set once, should not be rollback. Old client will not work after this property is enabled.
# 如果设置一次,就不应该回滚。启用此属性后,旧客户端将无法工作。由于当前的环境是使用了一段时间后增加的phoenix,上述应该是针对新的环境配置(但未验证)。
# 解决方法如下:
# 1、删除在hbase和phoenix的hbase-site.xml下设置的关于命名空间启用的设置。
# 2、重启hbase,进入hbase shell
# 3、按照下面步骤进行操作
# 修改hbase中表SYSTEM:CATALOG名为SYSTEM.CATALOG
[alanchan@server1 conf]$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.1.0, re1673bb0bbfea21d6e5dba73e013b09b8b49b89b, Tue Jul 10 17:26:48 CST 2018
Took 0.0027 seconds                                                                                                                                                                                                                                                      
hbase(main):001:0> disable 'SYSTEM:CATALOG'
Took 1.7845 seconds                                                                                                                                                                                                                                                      
hbase(main):002:0> snapshot 'SYSTEM:CATALOG', 'cata_tableSnapshot'
Took 1.0076 seconds                                                                                                                                                                                                                                                      
hbase(main):003:0> clone_snapshot 'cata_tableSnapshot', 'SYSTEM.CATALOG'
Took 4.3310 seconds                                                                                                                                                                                                                                                      
hbase(main):004:0> drop 'SYSTEM:CATALOG'
Took 1.2898 seconds

# 按照上述处理后,重新启动phoenix后正常
[alanchan@server1 bin]$ ./sqlline.py server2:2118
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:server2:2118 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:server2:2118
22/09/29 17:16:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:server2:2118> 
0: jdbc:phoenix:server2:2118> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_TENANT  | VIEW_STATEMENT  | VIEW_TYPE  | INDEX_TYPE  | TRANSACTIONAL  | IS_NA |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true            | 32            | false         |                 |            |             | false          | false |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
0: jdbc:phoenix:server2:2118> 

在这里插入图片描述

7)、查看HBase的Web UI

可以看到Phoenix在default命名空间下创建了一些表,而且该系统表加载了大量的协处理器。 在这里插入图片描述

三、入门示例

具体的官网操作示例:https://phoenix.apache.org/language/index.html 在这里插入图片描述

1、表操作

在Phoenix中,我们可以使用类似于SQL DDL的方式快速创建表。

1)、创建表

CREATE TABLE IF NOT EXISTS 表名 (
   ROWKEY 名称 数据类型 PRIMARY KEY
          列蔟名.列名1 数据类型 NOT NULL,
          列蔟名.列名2 数据类型 NOT NULL,
          列蔟名.列名3 数据类型
          );
--- 订单明细建表语句
create table if not exists order_info(
    ID varchar primary key,
    C1.STATUS varchar,
    C1.MONEY float,
    C1.PAY_WAY integer,
    C1.USER_ID varchar,
    C1.OPERATION_TIME varchar,
    C1.CATEGORY varchar
);

0: jdbc:phoenix:server2:2118> create table if not exists order_info(
. . . . . . . . . . . . . . >     ID varchar primary key,
. . . . . . . . . . . . . . >     C1.STATUS varchar,
. . . . . . . . . . . . . . >     C1.MONEY float,
. . . . . . . . . . . . . . >     C1.PAY_WAY integer,
. . . . . . . . . . . . . . >     C1.USER_ID varchar,
. . . . . . . . . . . . . . >     C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . >     C1.CATEGORY varchar
. . . . . . . . . . . . . . > );
No rows affected (2.449 seconds)
0: jdbc:phoenix:server2:2118> select * from order_info;
+-----+---------+--------+----------+----------+-----------------+-----------+
| ID  | STATUS  | MONEY  | PAY_WAY  | USER_ID  | OPERATION_TIME  | CATEGORY  |
+-----+---------+--------+----------+----------+-----------------+-----------+
+-----+---------+--------+----------+----------+-----------------+-----------+
No rows selected (0.073 seconds)

通过HBase的Web UI,我们可以看到Phoenix帮助我们自动在HBase中创建了一张名为 order_info 的表格。 在这里插入图片描述

注意: 原 HBase 中的表不会自动映射到 Phoenix 中去,并不会在 Phoenix 中显示出来,但在 Phoenix 中创建相同名字的表,原表数据会导入到表中相应的列(列名一模一样),可以看到原先的数据。其中必须有一项设置为 PRIMARY KEY,表示该项作为 ROWKEY,且在 Phoenix 中,所有名字都会被自动转化成大写,如果需要保留小写,则需加上双引号

2)、查看表的信息

# !desc order_info
# 注意:一定要加上 !

0: jdbc:phoenix:server2:2118> !desc order_info
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+----------------+-------------------+--------------------+--------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |   COLUMN_NAME   | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX  | NULLABLE  | REMARKS  | COLUMN_DEF  | SQL_DATA_TYPE  | SQL_DATETIME_SUB  | CHAR_OCTET_LENGTH  | ORDINAL_POSI |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+----------------+-------------------+--------------------+--------------+
|            |              | ORDER_INFO  | ID              | 12         | VARCHAR    | null         | null           | null            | null            | 0         |          |             | null           | null              | null               | 1            |
|            |              | ORDER_INFO  | STATUS          | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             | null           | null              | null               | 2            |
|            |              | ORDER_INFO  | MONEY           | 6          | FLOAT      | null         | null           | null            | null            | 1         |          |             | null           | null              | null               | 3            |
|            |              | ORDER_INFO  | PAY_WAY         | 4          | INTEGER    | null         | null           | null            | null            | 1         |          |             | null           | null              | null               | 4            |
|            |              | ORDER_INFO  | USER_ID         | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             | null           | null              | null               | 5            |
|            |              | ORDER_INFO  | OPERATION_TIME  | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             | null           | null              | null               | 6            |
|            |              | ORDER_INFO  | CATEGORY        | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             | null           | null              | null               | 7            |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+----------------+-------------------+--------------------+--------------+

3)、删除表语法

drop table if exists order_info;

4)、大小写问题

在HBase中,如果在列簇、列名没有添加双引号。Phoenix会自动转换为大写。 如果要将列的名字改为小写,需要使用双引号。 例如:

create table if not exists order_info(
    ID varchar primary key,
    C1.STATUS varchar,
    C1.MONEY float,
    C1.PAY_WAY integer,
    C1.USER_ID varchar,
    C1.OPERATION_TIME varchar,
    C1.CATEGORY varchar
);

---注意:一旦加了小写,后面都得任何应用该列的地方都得使用双引号,否则将报以下错误:
---Error: ERROR 504 (42703): Undefined column. columnName=order_info.ID

在这里插入图片描述

2、插入数据

在Phoenix中,插入并不是使用insert来实现的。而是 「upsert 」命令。 它的功能为insert + update,与HBase中的put相对应。 如果不存在则插入,否则更新。列表是可选的,如果不存在,值将按模式中声明的顺序映射到列。这些值必须计算为常量。

-- 语法
upsert into 表名(列蔟列名, xxxx, ) VALUES(XXX, XXX, XXX)

--示例
UPSERT INTO order_info VALUES('000001', '已提交', 4070, 1, '4944191', '2020-04-25 12:09:16', '手机;');

插入一条数据 在这里插入图片描述

3、查询数据

与标准SQL一样,Phoenix也是使用select语句来实现数据的查询。 注意:创建表的时候,如果是小写的话,phoenix会自动给转成大写(包含字段和表明),使用的时候,字段必须使用转化后的字段名称,表名称可以使用小写。

-- 查询所有数据
SELECT * FROM order_info;

-- 根据ID查询数据
SELECT * FROM order_info WHERE "ID" = '000001';

4、更新数据

在Phoenix中,更新数据也是使用UPSERT。

-- 语法格式如下:
UPSERT INTO 表名(列名, …) VALUES(对应的值, …);

-- 示例:将ID为'000001'的订单状态修改为已付款。ID是rowkey。如果数据存在则更新;否则是插入。
UPSERT INTO order_info ("ID", C1."STATUS") VALUES ('000001', '已付款');

5、根据ID删除数据

DELETE FROM order_info WHERE "ID" = '000001';
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('000001', '已提交', 4070, 1, '4944191', '2020-04-25 12:09:16', '手机;');
1 row affected (0.084 seconds)


0: jdbc:phoenix:server2:2118> SELECT * FROM order_info;
+---------+---------+---------+----------+----------+----------------------+-----------+
|   ID    | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    | CATEGORY  |
+---------+---------+---------+----------+----------+----------------------+-----------+
| 000001  | 已提交     | 4070.0  | 1        | 4944191  | 2020-04-25 12:09:16  | 手机;       |
+---------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.045 seconds)


0: jdbc:phoenix:server2:2118> UPSERT INTO order_info ("ID", C1."STATUS") VALUES ('000001', '已付款');
1 row affected (0.009 seconds)
0: jdbc:phoenix:server2:2118> select * from order_info;
+---------+---------+---------+----------+----------+----------------------+-----------+
|   ID    | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    | CATEGORY  |
+---------+---------+---------+----------+----------+----------------------+-----------+
| 000001  | 已付款     | 4070.0  | 1        | 4944191  | 2020-04-25 12:09:16  | 手机;       |
+---------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.032 seconds)


0: jdbc:phoenix:server2:2118> SELECT * FROM order_info WHERE "ID" = '000001';
+---------+---------+---------+----------+----------+----------------------+-----------+
|   ID    | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    | CATEGORY  |
+---------+---------+---------+----------+----------+----------------------+-----------+
| 000001  | 已付款     | 4070.0  | 1        | 4944191  | 2020-04-25 12:09:16  | 手机;       |
+---------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.035 seconds)


0: jdbc:phoenix:server2:2118> DELETE FROM order_info WHERE "ID" = '000001';
1 row affected (0.013 seconds)
0: jdbc:phoenix:server2:2118> select * from order_info;
+-----+---------+--------+----------+----------+-----------------+-----------+
| ID  | STATUS  | MONEY  | PAY_WAY  | USER_ID  | OPERATION_TIME  | CATEGORY  |
+-----+---------+--------+----------+----------+-----------------+-----------+
+-----+---------+--------+----------+----------+-----------------+-----------+
No rows selected (0.029 seconds)


0: jdbc:phoenix:server2:2118> select ID,STATUS from order_info;
+-----+---------+
| ID  | STATUS  |
+-----+---------+
+-----+---------+
No rows selected (0.018 seconds)

6、导入测试数据

为了方便我们做更多的查询,将以下SQL语句复制到Phoenix客户端中执行。


UPSERT INTO order_info VALUES('000002','已提交',4070,1,'4944191','2020-04-25 12:09:16','手机;');
UPSERT INTO order_info VALUES('000003','已完成',4350,1,'1625615','2020-04-25 12:09:37','家用电器;;电脑;');
UPSERT INTO order_info VALUES('000004','已提交',6370,3,'3919700','2020-04-25 12:09:39','男装;男鞋;');
UPSERT INTO order_info VALUES('000005','已付款',6370,3,'3919700','2020-04-25 12:09:44','男装;男鞋;');
UPSERT INTO order_info VALUES('000006','已提交',9380,1,'2993700','2020-04-25 12:09:41','维修;手机;');
UPSERT INTO order_info VALUES('000007','已付款',9380,1,'2993700','2020-04-25 12:09:46','维修;手机;');
UPSERT INTO order_info VALUES('000008','已完成',6400,2,'5037058','2020-04-25 12:10:13','数码;女装;');
UPSERT INTO order_info VALUES('000009','已付款',280,1,'3018827','2020-04-25 12:09:53','男鞋;汽车;');
UPSERT INTO order_info VALUES('000010','已完成',5600,1,'6489579','2020-04-25 12:08:55','食品;家用电器;');
UPSERT INTO order_info VALUES('000011','已付款',5600,1,'6489579','2020-04-25 12:09:00','食品;家用电器;');
UPSERT INTO order_info VALUES('000012','已提交',8340,2,'2948003','2020-04-25 12:09:26','男装;男鞋;');
UPSERT INTO order_info VALUES('000013','已付款',8340,2,'2948003','2020-04-25 12:09:30','男装;男鞋;');
UPSERT INTO order_info VALUES('000014','已提交',7060,2,'2092774','2020-04-25 12:09:38','酒店;旅游;');
UPSERT INTO order_info VALUES('000015','已提交',640,3,'7152356','2020-04-25 12:09:49','维修;手机;');
UPSERT INTO order_info VALUES('000016','已付款',9410,3,'7152356','2020-04-25 12:10:01','维修;手机;');
UPSERT INTO order_info VALUES('000017','已提交',9390,3,'8237476','2020-04-25 12:10:08','男鞋;汽车;');
UPSERT INTO order_info VALUES('000018','已提交',7490,2,'7813118','2020-04-25 12:09:05','机票;文娱;');
UPSERT INTO order_info VALUES('000019','已付款',7490,2,'7813118','2020-04-25 12:09:06','机票;文娱;');
UPSERT INTO order_info VALUES('000020','已付款',5360,2,'5301038','2020-04-25 12:08:50','维修;手机;');
UPSERT INTO order_info VALUES('000021','已提交',5360,2,'5301038','2020-04-25 12:08:53','维修;手机;');
UPSERT INTO order_info VALUES('000022','已取消',5360,2,'5301038','2020-04-25 12:08:58','维修;手机;');
UPSERT INTO order_info VALUES('000023','已付款',6490,0,'3141181','2020-04-25 12:09:22','食品;家用电器;');
UPSERT INTO order_info VALUES('000024','已付款',3820,1,'9054826','2020-04-25 12:10:04','家用电器;;电脑;');
UPSERT INTO order_info VALUES('000025','已提交',4650,2,'5837271','2020-04-25 12:08:52','机票;文娱;');
UPSERT INTO order_info VALUES('000026','已付款',4650,2,'5837271','2020-04-25 12:08:57','机票;文娱;');

在这里插入图片描述

7、分页查询

使用limit和offset可以快速进行分页。 limit表示每页多少条记录,offset表示从第几条记录开始查起。

-- 第一页
select * from order_info limit 10 offset 0;
-- 第二页
-- offset从10开始
select * from order_info limit 10 offset 10;
-- 第三页
select * from order_info limit 10 offset 20;

在这里插入图片描述

8、分区表

默认创建表的方式,则HBase顺序写入可能会受到RegionServer热点的影响。对行键进行加盐可以解决热点问题。在HBase中,可以使用两种方式:

  1. ROWKEY预分区
  2. 加盐指定数量分区

1)、ROWKEY预分区

按照用户ID来分区,一共4个分区。并指定数据的压缩格式为GZ

drop table if exists order_info;
create table if not exists order_info(
    ID varchar primary key,
    C1.STATUS varchar,
    C1.MONEY float,
    C1.PAY_WAY integer,
    C1.USER_ID varchar,
    C1.OPERATION_TIME varchar,
    C1.CATEGORY varchar
)
CONPRESSION='GZ'
SPLIT ON ('3','5','7');

#SPLIT ON ('3','5','7'),3、5、7表示是rowkey以3、5、7作为判断数据落入的区间,具体参见本文末尾关于该部分的示例。

0: jdbc:phoenix:server2:2118> drop table if exists order_info;
No rows affected (2.303 seconds)
0: jdbc:phoenix:server3:2118> create table if not exists order_info(
. . . . . . . . . . . . . . >     ID varchar primary key,
. . . . . . . . . . . . . . >     C1.STATUS varchar,
. . . . . . . . . . . . . . >     C1.MONEY float,
. . . . . . . . . . . . . . >     C1.PAY_WAY integer,
. . . . . . . . . . . . . . >     C1.USER_ID varchar,
. . . . . . . . . . . . . . >     C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . >     C1.CATEGORY varchar
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > CONPRESSION='GZ'
. . . . . . . . . . . . . . > SPLIT ON ('3','5','7');
No rows affected (6.944 seconds)
0: jdbc:phoenix:server3:2118> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_TENANT  | VIEW_STATEMENT  | VIEW_TYPE  | INDEX_TYPE  | TRANSACTIONAL  | IS_NA |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true            | 32            | false         |                 |            |             | false          | false |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
|            |              | ORDER_INFO  | TABLE         |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | false |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
0: jdbc:phoenix:server3:2118> 

通过图形界面可以看到是分区了 在这里插入图片描述 在这里插入图片描述 插入一些数据,然后去HBase中查看数据的分布情况。

0: jdbc:phoenix:server3:2118> UPSERT INTO order_info VALUES('f642b16b-eade-4169-9eeb-4d5f294ec594','已提交',4010,1,'6463215','2020-04-25 12:09:29','男鞋;汽车;');
1 row affected (0.003 seconds)
0: jdbc:phoenix:server3:2118> UPSERT INTO order_info VALUES('f642b16b-eade-4169-9eeb-4d5f294ec594','已付款',4010,1,'6463215','2020-04-25 12:09:33','男鞋;汽车;');
1 row affected (0.004 seconds)
0: jdbc:phoenix:server3:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183845','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
1 row affected (0.003 seconds)
0: jdbc:phoenix:server3:2118> select * from order_info;
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
|                  ID                   | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    |  CATEGORY  |
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
| 02602f66-adc7-40d4-8485-76b5632b5b53  | 已提交     | 4070.0  | 1        | 4944191  | 2020-04-25 12:09:16  | 手机;        |
| 31477850-8b15-4f1b-9ec3-939f7dc47241  | 已付款     | 4650.0  | 2        | 5837271  | 2020-04-25 12:08:57  | 机票;文娱;     |
| 526e33d2-a095-4e19-b759-0017b13666ca  | 已完成     | 3280.0  | 0        | 5553283  | 2020-04-25 12:09:01  | 食品;家用电器;   |
| 70fa0ae0-6c02-4cfa-91a9-6ad929fe6b1b  | 已付款     | 4100.0  | 1        | 8598963  | 2020-04-25 12:09:08  | 维修;手机;     |
| 0968a418-f2bc-49b4-b9a9-2157cf214cfd  | 已完成     | 4350.0  | 1        | 1625615  | 2020-04-25 12:09:37  | 家用电器;;电脑;  |
| 39319322-2d80-41e7-a862-8b8858e63316  | 已完成     | 5000.0  | 1        | 5686435  | 2020-04-25 12:08:56  | 家用电器;;电脑;  |
| 5a6932f4-b4a4-4a1a-b082-2475d13f9240  | 已提交     | 50.0    | 2        | 1764961  | 2020-04-25 12:10:07  | 家用电器;;电脑;  |
| 7170ce71-1fc0-4b6e-a339-67f525536dcd  | 已提交     | 9740.0  | 1        | 4816392  | 2020-04-25 12:10:03  | 数码;女装;     |
| 0e01edba-5e55-425e-837a-7efb91c56630  | 已付款     | 6370.0  | 3        | 3919700  | 2020-04-25 12:09:44  | 男装;男鞋;     |
| 3d2254bd-c25a-404f-8e42-2faa4929a629  | 已完成     | 5000.0  | 1        | 1274270  | 2020-04-25 12:08:43  | 男装;男鞋;     |
| 5fc0093c-59a3-417b-a9ff-104b9789b530  | 已提交     | 6310.0  | 2        | 1292805  | 2020-04-25 12:09:36  | 男装;男鞋;     |
| 71961b06-290b-457d-bbe0-86acb013b0e3  | 已完成     | 6550.0  | 3        | 2393699  | 2020-04-25 12:08:49  | 男鞋;汽车;     |
| 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751  | 已付款     | 9380.0  | 1        | 2993700  | 2020-04-25 12:09:46  | 维修;手机;     |
| 42f7fe21-55a3-416f-9535-baa222cc0098  | 已完成     | 3600.0  | 2        | 2661641  | 2020-04-25 12:09:58  | 维修;手机;     |
| 605c6dd8-123b-4088-a047-e9f377fcd866  | 已完成     | 8980.0  | 2        | 6202324  | 2020-04-25 12:09:54  | 机票;文娱;     |

发现数据分布在每一个Region中 在这里插入图片描述

2)、加盐指定数量分区

注意:CONPRESSION和SALT_BUCKETS之间需要使用逗号分隔,否则会出现语法错误
drop table if exists order_info;
create table if not exists order_info(
    ID varchar primary key,
    C1.STATUS varchar,
    C1.MONEY float,
    C1.PAY_WAY integer,
    C1.USER_ID varchar,
    C1.OPERATION_TIME varchar,
    C1.CATEGORY varchar
)
CONPRESSION='GZ', SALT_BUCKETS=10;

0: jdbc:phoenix:server3:2118> drop table if exists order_info;
No rows affected (2.649 seconds)
0: jdbc:phoenix:server3:2118> create table if not exists order_info(
. . . . . . . . . . . . . . >     ID varchar primary key,
. . . . . . . . . . . . . . >     C1.STATUS varchar,
. . . . . . . . . . . . . . >     C1.MONEY float,
. . . . . . . . . . . . . . >     C1.PAY_WAY integer,
. . . . . . . . . . . . . . >     C1.USER_ID varchar,
. . . . . . . . . . . . . . >     C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . >     C1.CATEGORY varchar
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > CONPRESSION='GZ', SALT_BUCKETS=10;
No rows affected (2.394 seconds)

在HBase的Web UI中可以查看到生成了10个Region 在这里插入图片描述 插入数据后,发现数据分部在每一个Region中 在这里插入图片描述 查看HBase中的表,发现Phoenix在每个ID前,都添加了一个Hash值,用来将分布分布到不同的Region中

#phoenix 客户端查询
0: jdbc:phoenix:server3:2118> select * from order_info where ID = 'f8f3ca6f-2f5c-44fd-9755-1792de183845';
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
|                  ID                   | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    | CATEGORY  |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| f8f3ca6f-2f5c-44fd-9755-1792de183845  | 已付款     | 5950.0  | 3        | 4060214  | 2020-04-25 12:09:12  | 机票;文娱;    |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.029 seconds)

#HBase 客户端查询
hbase(main):025:0> scan "ORDER_INFO", {LIMIT => 1}
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x00\x00\x00\x00, timestamp=1664447540561, value=x                                                                                                                                         
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x80\x0B, timestamp=1664447540561, value=\xE5\xB7\xB2\xE4\xBB\x98\xE6\xAC\xBE                                                                                                              
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x80\x0C, timestamp=1664447540561, value=\xC6\x12\x90\x01                                                                                                                                  
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x80\x0D, timestamp=1664447540561, value=\x80\x00\x00\x01                                                                                                                                  
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x80\x0E, timestamp=1664447540561, value=2993700                                                                                                                                           
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x80\x0F, timestamp=1664447540561, value=2020-04-25 12:09:46                                                                                                                               
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                           column=C1:\x80\x10, timestamp=1664447540561, value=\xE7\xBB\xB4\xE4\xBF\xAE;\xE6\x89\x8B\xE6\x9C\xBA;                                                                                                
1 row(s)
Took 0.0931 seconds 

四、综合示例

1、建立视图

1)、应用场景

因为我们之前已经创建了 order_info 表,而且数据添加的方式都是以PUT方式原生API来添加的。故此时,我们不再需要再使用Phoenix创建新的表,而是使用Phoenix中的视图,通过视图来建立与HBase表之间的映射,从而实现数据快速查询。

2)、视图介绍

我们可以在现有的HBase或Phoenix表上创建一个视图。表、列簇和列名必须与现有元数据完全匹配,否则会出现异常。当创建视图后,就可以使用SQL查询视图,和操作Table一样。 语法示例

# Delete phoenix view
DROP VIEW IF EXISTS VIEW_NAME CASCADE;
 
# Create phoenix view
CREATE VIEW VIEW_NAME
(
   ROWKEY   VARCHAR PRIMARY KEY,
   "F1".TEST_ID UNSIGNED_LONG -- F1 is column family
) AS SELECT * FROM VIEW_NAME;

  • 第一种方式创建视图 假如HBase中存在表”student”,需要以视图的方式操作它,那么开始创建视图 会自动对应表名,自动关联字段。
--需要删除phoenix中存在的“student”表:
CREATE view "student"(
"ROW" varchar primary key,
"info"."age" varchar,
"info"."name" varchar,
"info"."sex" varchar
);

CREATE view "user_info"(
"ROW" varchar primary key,
"ext_info"."age" varchar,
"ext_info"."name" varchar,
"ext_info"."sex" varchar
);

CREATE view "user_info"(
"ROW" varchar primary key,
"ext_info"."age" varchar,
"ext_info"."name" varchar,
"ext_info"."hobby" varchar
);

---这里双引号内的 “student” 和HBase中的表名是一样的,所以会自动关联。
  • 第二种方式创建视图 可以在Phoenix table的基础上创建

CREATE VIEW my_VIEW (
new_col VARCHAR,
new_col2 VARCHAR
)
AS
SELECT
*
FROM phoenix_Table
WHERE ……
  • 第三种方式创建视图 是在视图之上建立视图
CREATE VIEW my_VIEW (
new_col VARCHAR,
new_col2 VARCHAR
)
AS
SELECT
*
FROM phoenix_view
WHERE ……
-- 在phoenix中创建的表,可以通过javaapi的原生方法写入,但在phoenix中select * 是查不出来数据的,但是用select 字段名  则可以查出数据
-- 此处的解决办法就是创建视图解决

-- 映射HBase中的表,wb该表通过hbase shell已经创建,现在在phoenix中创建视图
CREATE VIEW wb
(
ID varchar primary key,
C1.NAME varchar,
C1.ADDRESS varchar,
C1.SEX varchar,
C1.PAY_DATE varchar,
C1.NUM_CURRENT varchar,
C1.NUM_USAGE varchar,
C1.TOTAL_MONEY varchar,
C1.RECORD_DATE varchar,
C1.LATEST_DATE varchar
);

-- 映射Phoenix中的表
CREATE VIEW my_view ( new_col SMALLINT )
    AS SELECT * FROM my_table WHERE k = 100;

-- 映射到一个SQL查询
CREATE VIEW my_view_on_view
    AS SELECT * FROM my_view WHERE new_col > 70;

3)、建立ORDER_INFO的视图


-- 参考创建语句:
create view if not exists ORDER_INFO_V AS SELECT * FROM ORDER_INFO ;

0: jdbc:phoenix:server3:2118> create view if not exists ORDER_INFO_V AS SELECT * FROM ORDER_INFO ;
No rows affected (0.02 seconds)
0: jdbc:phoenix:server3:2118> select * from ORDER_INFO_V;
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
|                  ID                   | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    |  CATEGORY  |
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
| 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751  | 已付款     | 9380.0  | 1        | 2993700  | 2020-04-25 12:09:46  | 维修;手机;     |
| 2a01dfe5-f5dc-4140-b31b-a6ee27a6e51e  | 已付款     | 7490.0  | 2        | 7813118  | 2020-04-25 12:09:06  | 机票;文娱;     |
| 31477850-8b15-4f1b-9ec3-939f7dc47241  | 已付款     | 4650.0  | 2        | 5837271  | 2020-04-25 12:08:57  | 机票;文娱;     |
...

0: jdbc:phoenix:server3:2118> select * from ORDER_INFO_V LIMIT 1;
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
|                  ID                   | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    | CATEGORY  |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751  | 已付款     | 9380.0  | 1        | 2993700  | 2020-04-25 12:09:46  | 维修;手机;    |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.029 seconds)

2、开发基于SQL查询数据接口

默认情况下, 直接在 HBase 中创建的表通过 Phoenix 是查不到的 如果要在 Phoenix 中操作直接在 HBase 中创建的表,则需要在 Phoenix 中进行表的映射 映射方式有两种:

  • 视图映射。相比于直接创建映射表,视图的查询效率会低。 原因是创建映射表的时候,Phoenix 会在表中创建一些空的键值对,这些空键值对的存在可以用来提高查询效率。
  • 表映射。使用create table创建的关联表,如果对表进行了修改,源数据也会改变,同时如果关联表被删除,源表也会被删除。但是视图就不会,如果删除视图,源数据不会发生改变。

1)、建立MOMO_CHAT:MESSAGE視圖

由於我們在5、HBase使用(namespace、数据分区、rowkey设计、原生api访问hbase)上一節中已經建立好了表,本處僅僅是使用phoenix與hbase表進行映射。

-- 创建MOMO_CHAT:MESSAGE視圖
create view if not exists "MOMO_CHAT"."MESSAGE" (
    "pk" varchar primary key, -- 指定ROWKEY映射到主键
    "C1"."msg_time" varchar,
    "C1"."sender_nickyname" varchar,
    "C1"."sender_account" varchar,
    "C1"."sender_sex" varchar,
    "C1"."sender_ip" varchar,
    "C1"."sender_os" varchar,
    "C1"."sender_phone_type" varchar,
    "C1"."sender_network" varchar,
    "C1"."sender_gps" varchar,
    "C1"."receiver_nickyname" varchar,
    "C1"."receiver_ip" varchar,
    "C1"."receiver_account" varchar,
    "C1"."receiver_os" varchar,
    "C1"."receiver_phone_type" varchar,
    "C1"."receiver_network" varchar,
    "C1"."receiver_gps" varchar,
    "C1"."receiver_sex" varchar,
    "C1"."msg_type" varchar,
    "C1"."distance" varchar,
    "C1"."message" varchar
);

# 執行時出現phoenix .ReadOnlyTableException: ERROR 505 (42000): Table is read only異常
# 1、在hbase-site.xml文件中增加如下配置,並重新啓動hbase
	<!-- phoenix .ReadOnlyTableException: ERROR 505 (42000): Table is read only  -->
	<!-- Ensure that config phoenix.schema.isNamespaceMappingEnabled is consistent on client and server  -->
	<property>
      <name>phoenix.schema.isNamespaceMappingEnabled</name>
      <value>true</value>
    </property>
    <property>
      <name>phoenix.schema.mapSystemTablesToNamespace</name>
      <value>true</value>
    </property>
 # 2、同時將hbase-site.xml文件拷貝至phoenix的bin目錄下   
cp /usr/local/bigdata/hbase-2.1.0/conf/hbase-site.xml /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/bin/

# 3、再次執行創建視圖語句,成功。

0: jdbc:phoenix:server1:2118> create view if not exists "MOMO_CHAT"."MESSAGE" (
. . . . . . . . . . . . . . >     "pk" varchar primary key, -- 指定ROWKEY映射到主键
. . . . . . . . . . . . . . >     "C1"."msg_time" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_nickyname" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_account" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_sex" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_ip" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_os" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_phone_type" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_network" varchar,
. . . . . . . . . . . . . . >     "C1"."sender_gps" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_nickyname" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_ip" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_account" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_os" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_phone_type" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_network" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_gps" varchar,
. . . . . . . . . . . . . . >     "C1"."receiver_sex" varchar,
. . . . . . . . . . . . . . >     "C1"."msg_type" varchar,
. . . . . . . . . . . . . . >     "C1"."distance" varchar,
. . . . . . . . . . . . . . >     "C1"."message" varchar
. . . . . . . . . . . . . . > );
No rows affected (7.836 seconds)
0: jdbc:phoenix:server1:2118> !table
+------------+--------------+---------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----+
| TABLE_CAT  | TABLE_SCHEM  |  TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_TENANT  | VIEW_STATEMENT  | VIEW_TYPE  | INDEX_TYPE  | TRANSACTIONAL  | IS_ |
+------------+--------------+---------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----+
|            | SYSTEM       | CATALOG       | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | tru |
|            | SYSTEM       | FUNCTION      | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | tru |
|            | SYSTEM       | LOG           | SYSTEM TABLE  |          |            |                            |                 |              | true            | 32            | false         |                 |            |             | false          | tru |
|            | SYSTEM       | SEQUENCE      | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | tru |
|            | SYSTEM       | STATS         | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | tru |
|            |              | ORDER_INFO    | TABLE         |          |            |                            |                 |              | false           | 10            | false         |                 |            |             | false          | fal |
|            |              | ORDER_INFO_V  | VIEW          |          |            |                            |                 |              | false           | 10            | false         |                 | UPDATABLE  |             | false          | fal |
|            | MOMO_CHAT    | MESSAGE       | VIEW          |          |            |                            |                 |              | false           | null          | false         |                 | MAPPED     |             | false          | tru |
+------------+--------------+---------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----+
0: jdbc:phoenix:server1:2118> 

2)、使用SQL语句查询数据

根据日期、发送人账号、接收人账号查询历史消息

-- 查询对应日期的数据
List<Message> message = hbaseNativeChatMessageService.getMessage("2022-09-30", "13978325309", "15338014470");
log.info("message count={}",message.size());

INFO - message count=87

SELECT * FROM "MOMO_CHAT"."MESSAGE" T 
WHERE substr("msg_time", 0, 10) = '2022-09-30'
    AND T."sender_account" = '13978325309'
    AND T."receiver_account" = '15338014470' LIMIT 100;
    

0: jdbc:phoenix:server1:2118> SELECT * FROM "MOMO_CHAT"."MESSAGE" T 
. . . . . . . . . . . . . . > WHERE substr("msg_time", 0, 10) = '2022-09-30'
. . . . . . . . . . . . . . >     AND T."sender_account" = '13978325309'
. . . . . . . . . . . . . . >     AND T."receiver_account" = '15338014470' LIMIT 100;
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+-------------------+
|                       pk                        |       msg_time       | sender_nickyname  | sender_account  | sender_sex  |    sender_ip     |  sender_os   | sender_phone_type  | sender_network  |      sender_gps       | receiver_nickyname  |   receiver_ip     |
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+-------------------+
| 00005903_13978325309_15338014470_1664505396000  | 2022-09-30 10:36:36  | 许兴文               | 13978325309     | 男           | 215.205.47.66    | Android 6.0  | 一加 OnePlus         | 4G              | 114.058533,36.247553  | 叶忆南                 | 178.220.200.51    |
| 2b977e41_13978325309_15338014470_1664504375000  | 2022-09-30 10:19:35  | 伟飞鸿               | 13978325309     | 男           | 151.37.182.209   | IOS 12.0     | Apple iPhone XR    | 4G              | 120.607971,31.650084  | 瑞高飞                 | 104.73.66.176     |
| 591b649a_13978325309_15338014470_1664505936000  | 2022-09-30 10:45:36  | 池涉                | 13978325309     | 男           | 35.252.199.53    | Android 8.0  | 华为 荣耀9X            | 5G              | 108.392166,23.665665  | 鞠良骏                 | 196.75.11
....
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+-------------------+
87 rows selected (6.301 seconds)

3、编写Java代码

本部分也是在前面一篇文章的基础上实现的,即实现原来的接口。

1)、实现步骤

  • 编写PhoenixChatMessageService实现ChatMessageService接口
  • 在构造器中创建JDBC连接 a)、JDBC驱动为:PhoenixDriver.class.getName() b)、JDBC连接URL为:jdbc:phoenix:server1:2118
  • 基于JDBC实现getMessage查询
  • 在close方法中

2)、代码


package org.hbase.momo.service.impl;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

import org.apache.phoenix.jdbc.PhoenixDriver;
import org.hbase.momo.bean.Message;
import org.hbase.momo.service.ChatMessageService;

public class PhoenixChatMessageService implements ChatMessageService {

	private Connection connection;

	public PhoenixChatMessageService() throws Exception {
		// 1. 加载驱动
		Class.forName(PhoenixDriver.class.getName());
		// 2. 获取JDBC连接
		connection = DriverManager.getConnection("jdbc:phoenix:server1:2118");
	}

	@Override
	public List<Message> getMessage(String date, String sender, String receiver) throws Exception {
		// 1. SQL语句
		String sql = "select * from \"MOMO_CHAT\".\"MESSAGE\" where substr(\"msg_time\", 0, 10) = ? and \"sender_account\" = ? and \"receiver_account\" = ?";

		// 2. 构建一个prepareStatement
		PreparedStatement preparedStatement = connection.prepareStatement(sql);

		// 3. 设置Prestatement对应的参数
		preparedStatement.setString(1, date);
		preparedStatement.setString(2, sender);
		preparedStatement.setString(3, receiver);

		// 4. 执行SQL语句,获取到一个ResultSet
		ResultSet resultSet = preparedStatement.executeQuery();

		ArrayList<Message> msgList = new ArrayList<>();

		// 5. 迭代ResultSet将数据封装在Msg里面
		while (resultSet.next()) {
			Message msg = new Message();
			msg.setMsg_time(resultSet.getString("msg_time"));
			msg.setSender_nickyname(resultSet.getString("sender_nickyname"));
			msg.setSender_account(resultSet.getString("sender_account"));
			msg.setSender_sex(resultSet.getString("sender_sex"));
			msg.setSender_ip(resultSet.getString("sender_ip"));
			msg.setSender_os(resultSet.getString("sender_os"));
			msg.setSender_phone_type(resultSet.getString("sender_phone_type"));
			msg.setSender_network(resultSet.getString("sender_network"));
			msg.setSender_gps(resultSet.getString("sender_gps"));
			msg.setReceiver_nickyname(resultSet.getString("receiver_nickyname"));
			msg.setReceiver_ip(resultSet.getString("receiver_ip"));
			msg.setReceiver_account(resultSet.getString("receiver_account"));
			msg.setReceiver_os(resultSet.getString("receiver_os"));
			msg.setReceiver_phone_type(resultSet.getString("receiver_phone_type"));
			msg.setReceiver_network(resultSet.getString("receiver_network"));
			msg.setReceiver_gps(resultSet.getString("receiver_gps"));
			msg.setReceiver_sex(resultSet.getString("receiver_sex"));
			msg.setMsg_type(resultSet.getString("msg_type"));
			msg.setDistance(resultSet.getString("distance"));
			msg.setMessage(resultSet.getString("message"));

			msgList.add(msg);
		}

		// 关闭资源
		resultSet.close();
		preparedStatement.close();

		return msgList;
	}

	@Override
	public void close() throws Exception {
		connection.close();
	}
}

3)、测试类

package org.hbase.momo;

import java.util.List;

import org.hbase.momo.bean.Message;
import org.hbase.momo.service.ChatMessageService;
import org.hbase.momo.service.impl.PhoenixChatMessageService;
import org.springframework.util.StopWatch;

import lombok.extern.slf4j.Slf4j;

/**
 * @author chenw
 *
 */
@Slf4j
public class TestApp {

	public static void main(String[] args) throws Exception {
		StopWatch clock = new StopWatch();
		clock.start(TestApp.class.getSimpleName());

//		ChatMessageService chatMessageService = new HBaseNativeChatMessageService();
		ChatMessageService chatMessageService = new PhoenixChatMessageService();
		// String msg_time, String sender, String receiver
//		MD5Hash_发件人账号_收件人账号_消息时间戳
		//13978325309_15338014470
		List<Message> message = chatMessageService.getMessage("2022-09-30", "13978325309", "15338014470");

		for (Message msg : message) {
			log.info("message:{}", msg);
		}
		log.info("message count={}",message.size());
//		log.info("message:{}", message);
		chatMessageService.close();

		clock.stop();
		log.info(clock.prettyPrint());
	}

}

#phoenix客戶端查詢耗時
 INFO - message:Message(msg_time=2022-09-30 10:31:56, sender_nickyname=凤茂彦, sender_account=13978325309, sender_sex=女, sender_ip=251.228.196.190, sender_os=IOS 9.0, sender_phone_type=华为 荣耀Play4T, sender_network=4G, sender_gps=99.487874,33.21035, receiver_nickyname=井采, receiver_ip=183.88.173.174, receiver_account=15338014470, receiver_os=IOS 9.0, receiver_phone_type=一加 OnePlus, receiver_network=4G, receiver_gps=96.397128,28.774328 , receiver_sex=男, msg_type=TEXT, distance=60.27KM, message=爱你,却要无欲无求,好难!爱你,却要偷偷摸摸,好累!爱你,却让自己心碎,好惨!但竟然心甘情愿,好傻!!!)
 INFO - message count=87
 INFO - StopWatch '': running time (millis) = 4149
-----------------------------------------
ms     %     Task name
-----------------------------------------
04149  100%  TestApp

#hbase原生客戶端查詢耗時
INFO - StopWatch '': running time (millis) = 9737
-----------------------------------------
ms     %     Task name
-----------------------------------------
09737  100%  TestApp

通過運行結果看,時間上大概是hbase原生查詢一半不到,hbase原生查詢時間是9737毫秒

如果程序運行出現如下異常,則需要將上述的hbase-site.xml配置拷貝至resources文件夾下,或直接在conf中配置屬性

	<property>
      <name>phoenix.schema.isNamespaceMappingEnabled</name>
      <value>true</value>
    </property>
    <property>
      <name>phoenix.schema.mapSystemTablesToNamespace</name>
      <value>true</value>
  </property>

Exception in thread "main" java.sql.SQLException: ERROR 726 (43M10): Inconsistent namespace mapping properties. Cannot initiate connection as SYSTEM:CATALOG is found but client does not have phoenix.schema.isNamespaceMappingEnabled enabled

4、二级索引

索引具体介绍参见上文。

1)、索引创建语法

  • 全局索引
CREATE INDEX 索引名称 ON 表名 (列名1, 列名2, 列名3...)
  • 本地索引
CREATE local INDEX 索引名称 ON 表名 (列名1, 列名2, 列名3...)
  • 覆盖索引
CREATE INDEX my_index ON my_table (v1,v2) INCLUDE(v3)
  • 函数索引
-- 创建索引
CREATE INDEX UPPER_NAME_IDX ON EMP (UPPER(FIRST_NAME||' '||LAST_NAME))
-- 以下查询会走索引
SELECT EMP_ID FROM EMP WHERE UPPER(FIRST_NAME||' '||LAST_NAME)='JOHN DOE'

2)、索引示例一:创建全局索引 + 覆盖索引

根据用户ID(USER_ID)来查询订单的ID(RowKey)以及对应的支付金额(MOENEY)。 例如:查询已付款的订单ID和支付金额 此时,就可以在USER_ID列上创建索引,来加快查询

  • 创建索引
-- 创建索引
create index GBL_IDX_ORDER_INFO on order_info (C1."USER_ID") INCLUDE("ID", C1."MONEY");


#創建時可能會出現如下異常 Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=GBL_IDX_ORDER_INFO (state=42Y88,code=1029)

#解決辦法,在hbase-site.xml文件中增加如下配置,並將該文件複製到其他的hbase集群中,并且也拷貝到phoenix的bin目錄下,然後重啓hbase

<!-- 創建二級索引需要的配置 -->
<property> 
	  <name>hbase.regionserver.wal.codec</name> 
  	  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value> 
</property>

重新创建索引,成功

0: jdbc:phoenix:server2:2118> create index GBL_IDX_ORDER_INFO on order_info (C1."USER_ID") INCLUDE("ID", C1."MONEY");
66 rows affected (7.594 seconds)

scan "GBL_IDX_ORDER_INFO", { LIMIT  => 1}

可以在HBase shell中看到,Phoenix自动帮助我们创建了一张GBL_IDX_ORDER_INFO的表。这种表就是一张索引表。 它的数据如下:

hbase(main):004:0> scan "GBL_IDX_ORDER_INFO", { LIMIT  => 1}
ROW                                                                       COLUMN+CELL                                                                                                                                                                                                            
 \x001274270\x003d2254bd-c25a-404f-8e42-2faa4929a629                      column=C1:\x00\x00\x00\x00, timestamp=1664524735818, value=x                                                                                                                                                           
 \x001274270\x003d2254bd-c25a-404f-8e42-2faa4929a629                      column=C1:\x80\x0B, timestamp=1664524735818, value=\xC5\x9C@\x01                                                                                                                                                       
1 row(s)
Took 0.0135 seconds

# 这张表的ROWKEY为: \x00 +用户ID + \x00 + 原始表ROWKEY,列簇对应的就是include中指定的两个字段。

0: jdbc:phoenix:server2:2118> select * from order_info where USER_ID='1274270';
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
|                  ID                   | STATUS  |  MONEY  | PAY_WAY  | USER_ID  |    OPERATION_TIME    | CATEGORY  |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| 3d2254bd-c25a-404f-8e42-2faa4929a629  | 已完成     | 5000.0  | 1        | 1274270  | 2020-04-25 12:08:43  | 男装;男鞋;    |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.087 seconds)


# 查询数据
select user_id, id, money from order_info where user_id = '1274270';

0: jdbc:phoenix:server2:2118> select user_id, id, money from order_info where user_id = '1274270';
+----------+---------------------------------------+---------+
| USER_ID  |                  ID                   |  MONEY  |
+----------+---------------------------------------+---------+
| 1274270  | 3d2254bd-c25a-404f-8e42-2faa4929a629  | 5000.0  |
+----------+---------------------------------------+---------+
1 row selected (0.041 seconds)

# 查看执行计划
explain select user_id, id, money from order_info where user_id = '1274270';

0: jdbc:phoenix:server2:2118> explain select user_id, id, money from order_info where user_id = '1274270';
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                     PLAN                                                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN RANGE SCAN OVER GBL_IDX_ORDER_INFO [0,'1274270'] - [9,'1274270']  | null            | null           | null         |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.027 seconds)

# PLAN中能看到SCAN的是GBL_IDX_ORDER_INFO,说明Phoenix是直接通过查询索引表获取到数据。
  • 删除索引 使用drop index 索引名 ON 表名
drop index GBL_IDX_ORDER_INFO on ORDER_INFO;
  • 查看索引
!table

0: jdbc:phoenix:server2:2118> !table
+------------+--------------+---------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----------------------+
| TABLE_CAT  | TABLE_SCHEM  |     TABLE_NAME      |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_TENANT  | VIEW_STATEMENT  | VIEW_TYPE  | INDEX_TYPE  | TRANSACTIONAL  | IS_NAMESPACE_MAPPED   |
+------------+--------------+---------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----------------------+
|            |              | GBL_IDX_ORDER_INFO  | INDEX         |          |            |                            |                 | ACTIVE       | false           | 10            | false         |                 |            | GLOBAL      | false          | false                 |
|            | SYSTEM       | CATALOG             | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | true                  |
|            | SYSTEM       | FUNCTION            | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | true                  |
|            | SYSTEM       | LOG                 | SYSTEM TABLE  |          |            |                            |                 |              | true            | 32            | false         |                 |            |             | false          | true                  |
|            | SYSTEM       | SEQUENCE            | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | true                  |
|            | SYSTEM       | STATS               | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |             | false          | true                  |
|            |              | ORDER_INFO          | TABLE         |          |            |                            |                 |              | false           | 10            | false         |                 |            |             | false          | false                 |
|            |              | ORDER_INFO_V        | VIEW          |          |            |                            |                 |              | false           | 10            | false         |                 | UPDATABLE  |             | false          | false                 |
|            | MOMO_CHAT    | MESSAGE             | VIEW          |          |            |                            |                 |              | false           | null          | false         |                 | MAPPED     |             | false          | true                  |
+------------+--------------+---------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----------------------+
0: jdbc:phoenix:server2:2118> 
  • 测试查询所有列是否会使用索引
explain select * from order_info where user_id = '1274270';

0: jdbc:phoenix:server2:2118> explain select * from order_info where user_id = '1274270';
+------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                  PLAN                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO  | null            | null           | null         |
|     SERVER FILTER BY C1.USER_ID = '1274270'                            | null            | null           | null         |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.027 seconds)
 
# 通过查询结果发现,PLAN中是执行的FULL SCAN,说明索引并没有生效,进行的全表扫描。
  • 使用Hint强制使用索引
explain select /*+ INDEX(order_info GBL_IDX_ORDER_INFO) */ * from order_info where user_id = '1274270';

0: jdbc:phoenix:server2:2118> explain select /*+ INDEX(order_info GBL_IDX_ORDER_INFO) */ * from order_info where user_id = '1274270';
+-----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                         PLAN                                                          | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO                                                 | null            | null           | null         |
|     SKIP-SCAN-JOIN TABLE 0                                                                                            | null            | null           | null         |
|         CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN RANGE SCAN OVER GBL_IDX_ORDER_INFO [0,'1274270'] - [9,'1274270']  | null            | null           | null         |
|             SERVER FILTER BY FIRST KEY ONLY                                                                           | null            | null           | null         |
|     DYNAMIC SERVER FILTER BY "ORDER_INFO.ID" IN ($11.$13)                                                             | null            | null           | null         |
+-----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.042 seconds)

# 通过执行计划,我们可以观察到查看全局索引,找到ROWKEY,然后执行全表的JOIN,其实就是把对应ROWKEY去查询ORDER_INFO表。

3)、索引示例二:创建本地索引

根据订单ID、订单状态、支付金额、支付方式、用户ID来查询订单 针对这种场景,我们可以使用本地索引来提高查询效率 注意:创建表的时候指定了SALT_BUCKETS,是不支持本地索引的 以下操作是基於下面創建的表結構,用於驗證本地索引是否支持salt_buckets的創建表方式

create table if not exists order_info(
    ID varchar primary key,
    C1.STATUS varchar,
    C1.MONEY float,
    C1.PAY_WAY integer,
    C1.USER_ID varchar,
    C1.OPERATION_TIME varchar,
    C1.CATEGORY varchar
)
CONPRESSION='GZ', SALT_BUCKETS=10;
  • 创建本地索引
create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id) ;

#創建本地索引時會出現以下異常 Error: ERROR 102 (08001): Malformed connection url. ERROR 102 (08001): ERROR 102 (08001): Malformed connection url. :server1:2118,server2:2118,server3:2118:2181:/hbase; LOCAL_IDX_ORDER_INFO (state=08001,code=102) java.sql.SQLException: ERROR 102 (08001): Malformed connection url. ERROR 102 (08001): ERROR 102 (08001): Malformed connection url. :server1:2118,server2:2118,server3:2118:2181:/hbase; LOCAL_IDX_ORDER_INFO

#解決辦法 #將hbase-site.xml文件中的關於zookeeper的配置進行修改 由

 <!-- ZooKeeper的地址 -->
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>server1:2118,server2:2118,server3:2118</value>
	</property>

修改為

<!-- ZooKeeper的地址 -->
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>server1,server2,server3</value>
	</property>
	<property>
		<name>hbase.zookeeper.property.clientPort</name>
		<value>2118</value>
	</property>
	
	#分發hbase-site.xml至hbase集群,並複製到phoenix的bin目錄下,重啓hbase和phoenix

  • 再次執行創建本地索引
0: jdbc:phoenix:server2:2118> create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id) ;
66 rows affected (5.938 seconds)

通过查看WebUI,我们并没有发现创建名为:LOCAL_IDX_ORDER_INFO 的表。 那索引数据是存储在哪儿呢?我们可以通过HBase shell

hbase(main):002:0> scan 'ORDER_INFO' ,{LIMIT => 1}
ROW                                                                       COLUMN+CELL                                                                                                                                                                                                            
 \x00\x00\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751\x00\xE5\xB7\xB2\xE4\xBB column=L#0:\x00\x00\x00\x00, timestamp=1664449069580, value=\x00\x00\x00\x00                                                                                                                                           
 \x98\xE6\xAC\xBE\x00\xC2^Q\x00\xC1\x02\x002993700                                                                                                                                                                                                                                               
1 row(s)
Took 0.4211 seconds 

可以看到Phoenix对数据进行处理,原有的数据发生了变化。

建立了二级索引,具有一定地侵入性,原有的数据Phoenix会编码后存储。但当我们执行get/put/delete请求时,按正常理解是不能操作数据的。Phoenix建立视图时,会在表上创建若干的协处理器。协处理会负责接收客户端发来的请求,get/put/delete/scan,然后进行解码操作,最后我们发现仍然可以操作数据。

  • 查看数据
explain select * from ORDER_INFO WHERE status = '已提交';
explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;

0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交';
+------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                  PLAN                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO  | null            | null           | null         |
|     SERVER FILTER BY C1.STATUS = '已提交'                                 | null            | null           | null         |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.057 seconds)
0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;
+------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                  PLAN                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO  | null            | null           | null         |
|     SERVER FILTER BY (C1.STATUS = '已提交' AND C1.PAY_WAY = 1)            | null            | null           | null         |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.033 seconds)

通过观察上面的两个执行计划发现,两个查询都是通过FULL SCAN來實現的,沒有走索引,説明本地索引未生效。

  • 重新創建表
drop table if exists order_info;
create table order_info(
    ID varchar primary key,
    C1.STATUS varchar,
    C1.MONEY float,
    C1.PAY_WAY integer,
    C1.USER_ID varchar,
    C1.OPERATION_TIME varchar,
    C1.CATEGORY varchar
)
CONPRESSION='GZ'
SPLIT ON ('3','5','7');

創建表之前可能需要將該表的相關索引、視圖等要先刪除

0: jdbc:phoenix:server2:2118> create table if not exists order_info(
. . . . . . . . . . . . . . > ID varchar primary key,
. . . . . . . . . . . . . . > C1.STATUS varchar,
. . . . . . . . . . . . . . > C1.MONEY float,
. . . . . . . . . . . . . . > C1.PAY_WAY integer,
. . . . . . . . . . . . . . > C1.USER_ID varchar,
. . . . . . . . . . . . . . > C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . > C1.CATEGORY varchar
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > CONPRESSION='GZ'
. . . . . . . . . . . . . . > SPLIT ON ('3','5','7');
  • 重新創建本地索引

注意:創建表后,需要先添加數據后,再建索引,否則會出現如下異常: org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to build index for unexpected reason! 據説是hbase與phoenix版本不兼容造成的,本示例使用的hbase2.1.0與phoenix5.0.0.但其他的版本没有验证。 phoenix官方版本说明:Current release 5.1.2 can run on Apache HBase 2.1, 2.2, 2.3 and 2.4。

create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id);
.。。。
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f642b16b-eade-4169-9eeb-4d5f294ec594','已付款',4010,1,'6463215','2020-04-25 12:09:33','男鞋;汽车;');
1 row affected (0.003 seconds)
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183845','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
1 row affected (0.004 seconds)
0: jdbc:phoenix:server2:2118> create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id);
66 rows affected (5.057 seconds)

以下示例創建索引前後插入數據所展示的情況(創建前插入數據沒有異常,創建後再插入數據出現異常) hbase2.1.0对应的phoenix的版本是5.1.2(phoenix-hbase-2.1-5.1.2-bin.tar.gz,该版本部署之前有关于该部分的说明),验证通过

0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183845','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
1 row affected (0.004 seconds)
0: jdbc:phoenix:server2:2118> create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id);
66 rows affected (5.057 seconds)
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183846','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
22/10/08 09:18:52 WARN client.AsyncRequestFutureImpl: id=1, table=ORDER_INFO, attempt=1/16, failed=1ops, last exception=org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to build index for unexpected reason!
  • 重新查看數據
explain select * from ORDER_INFO WHERE status = '已提交';
explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;

0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交';
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                   PLAN                                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER ORDER_INFO [2]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND "STATUS" = '已提交'                  | null            | null           | null         |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.024 seconds)
0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                           PLAN                                            | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER ORDER_INFO [2]                  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND ("STATUS" = '已提交' AND TO_INTEGER("PAY_WAY") = 1)  | null            | null           | null         |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.022 seconds)

通过观察上面的两个执行计划发现,两个查询都是通过RANGE SCAN来实现的。说明本地索引生效。

  • 删除本地索引
drop index LOCAL_IDX_ORDER_INFO  on ORDER_INFO ;

0: jdbc:phoenix:server2:2118> drop index LOCAL_IDX_ORDER_INFO  on ORDER_INFO ;
66 rows affected (0.923 seconds)
  • 重新执行一次扫描,你会发现数据恢复出来了。
hbase(main):004:0> scan 'ORDER_INFO' , {LIMIT => 1}
ROW                                                                       COLUMN+CELL                                                                                                                                                                                                            
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x00\x00\x00\x00, timestamp=1664449069580, value=x                                                                                                                                                           
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x80\x0B, timestamp=1664449069580, value=\xE5\xB7\xB2\xE4\xBB\x98\xE6\xAC\xBE                                                                                                                                
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x80\x0C, timestamp=1664449069580, value=\xC6\x12\x90\x01                                                                                                                                                    
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x80\x0D, timestamp=1664449069580, value=\x80\x00\x00\x01                                                                                                                                                    
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x80\x0E, timestamp=1664449069580, value=2993700                                                                                                                                                             
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x80\x0F, timestamp=1664449069580, value=2020-04-25 12:09:46                                                                                                                                                 
 \x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751                                 column=C1:\x80\x10, timestamp=1664449069580, value=\xE7\xBB\xB4\xE4\xBF\xAE;\xE6\x89\x8B\xE6\x9C\xBA;                                                                                                                  
1 row(s)
Took 0.0300 seconds

4)、索引示例三:创建本地函数索引

  • 创建本地函数索引
CREATE LOCAL INDEX LOCAL_IDX_MOMO_MESSAGE ON MOMO_CHAT.MESSAGE(substr("msg_time", 0, 10), "sender_account", "receiver_account");

0: jdbc:phoenix:server2:2118> CREATE LOCAL INDEX LOCAL_IDX_MOMO_MESSAGE ON MOMO_CHAT.MESSAGE(substr("msg_time", 0, 10), "sender_account", "receiver_account");
976,135 rows affected (24.163 seconds)
0: jdbc:phoenix:server2:2118> select count(*) from "MOMO_CHAT"."MESSAGE";
+-----------+
| COUNT(1)  |
+-----------+
| 976135    |
+-----------+
1 row selected (0.844 seconds)
  • 执行数据查询

没有使用函数索引,程序查询的耗时

List<Message> message = chatMessageService.getMessage("2022-09-30", "13978325309", "15338014470");
INFO - message count=87
INFO - StopWatch '': running time (millis) = 4149

基于函数索引查询

SELECT * FROM "MOMO_CHAT"."MESSAGE" T
WHERE substr("msg_time", 0, 10) = '2022-09-30'
    AND T."sender_account" = '13978325309'
    AND T."receiver_account" = '15338014470' LIMIT 100;
    
0: jdbc:phoenix:server2:2118> SELECT * FROM "MOMO_CHAT"."MESSAGE" T
. . . . . . . . . . . . . . > WHERE substr("msg_time", 0, 10) = '2022-09-30'
. . . . . . . . . . . . . . >     AND T."sender_account" = '13978325309'
. . . . . . . . . . . . . . >     AND T."receiver_account" = '15338014470' LIMIT 100;
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+------------------+-------------------+----+
|                       pk                        |       msg_time       | sender_nickyname  | sender_account  | sender_sex  |    sender_ip     |  sender_os   | sender_phone_type  | sender_network  |      sender_gps       | receiver_nickyname  |   receiver_ip    | receiver_account  | re |
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+------------------+-------------------+----+
| 00005903_13978325309_15338014470_1664505396000  | 2022-09-30 10:36:36  | 许兴文               | 13978325309     | 男           | 215.205.47.66    | Android 6.0  | 一加 OnePlus         | 4G              | 114.058533,36.247553  | 叶忆南                 | 178.220.200.51   | 15338014470    
...
87 rows selected (0.33 seconds)

0: jdbc:phoenix:server2:2118> explain SELECT * FROM "MOMO_CHAT"."MESSAGE" T
. . . . . . . . . . . . . . > WHERE substr("msg_time", 0, 10) = '2022-09-30'
. . . . . . . . . . . . . . >     AND T."sender_account" = '13978325309'
. . . . . . . . . . . . . . >     AND T."receiver_account" = '15338014470' LIMIT 100;
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                                    PLAN                                                                     | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 6-CHUNK 100 ROWS 6700 BYTES SERIAL 6-WAY ROUND ROBIN RANGE SCAN OVER MOMO_CHAT:MESSAGE [1,'2022-09-30','13978325309','15338014470']  | 6700            | 100            | 0            |
|     SERVER FILTER BY FIRST KEY ONLY                                                                                                         | 6700            | 100            | 0            |
|     SERVER 100 ROW LIMIT                                                                                                                    | 6700            | 100            | 0            |
| CLIENT 100 ROW LIMIT                                                                                                                        | 6700            | 100            | 0            |
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.06 seconds)

可以看到,查询速度非常快,0.33秒就查询出来了数据

五、数据分区示例

1、rowkey 是整数

create table test1(
rk integer primary key,
      C1.id varchar,
      C1.name varchar
)
CONPRESSION='GZ'
SPLIT ON (10,20,30,40);

Region 1 : row key 的前两位是 min~10
Region 2 : row key 的前两位是 10~20
Region 3 : row key 的前两位是 20~30
Region 4 : row key 的前两位是 30~40
Region 5 : row key 的前两位是 40~max
--第一个分区
UPSERT INTO test1 VALUES(9,'a1','alan2452');
--第二个分区
UPSERT INTO test1 VALUES(10,'a1','alan2452');
UPSERT INTO test1 VALUES(11,'a1','alan2452');
UPSERT INTO test1 VALUES(19,'a1','alan2452');
--第三个分区
UPSERT INTO test1 VALUES(20,'a1','alan2452');
UPSERT INTO test1 VALUES(21,'a1','alan2452');
UPSERT INTO test1 VALUES(29,'a1','alan2452');
--第四个分区
UPSERT INTO test1 VALUES(30,'a1','alan2452');
UPSERT INTO test1 VALUES(31,'a1','alan2452');
UPSERT INTO test1 VALUES(39,'a1','alan2452');
--第五个分区
UPSERT INTO test1 VALUES(40,'a1','alan2452');
UPSERT INTO test1 VALUES(41,'a1','alan2452');

在这里插入图片描述

2、rowkey是字符串

create table test(
rk varchar primary key,
      C1.id varchar,
      C1.name varchar
)
CONPRESSION='GZ'
SPLIT ON ('aa','bb','cc');

--第0个分区
UPSERT INTO test VALUES('a1','a1','alan2452');
--第1个分区
UPSERT INTO test VALUES('aa1','aa1','alan2452');
UPSERT INTO test VALUES('ab1','ab1','alan2452');
UPSERT INTO test VALUES('ba1','ba1','alan2452');
UPSERT INTO test VALUES('b1','b1','alan2452');
--第2个分区
UPSERT INTO test VALUES('bb1','bb1','alan2452');
UPSERT INTO test VALUES('bc1','bc1','alan2452');
UPSERT INTO test VALUES('c1','c1','alan2452');
UPSERT INTO test VALUES('cb','cb','alan2452');
--第3个分区
UPSERT INTO test VALUES('cc','cc','alan2452');
UPSERT INTO test VALUES('cc1','cc1','alan2452');

在这里插入图片描述

【版权声明】本文内容来自摩杜云社区用户原创、第三方投稿、转载,内容版权归原作者所有。本网站的目的在于传递更多信息,不拥有版权,亦不承担相应法律责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@moduyun.com

  1. 分享:
最后一次编辑于 2023年11月08日 0

暂无评论

推荐阅读
  KRe60ogUm4le   2024年05月31日   101   0   0 flink大数据
  KRe60ogUm4le   2024年05月31日   34   0   0 flink大数据
nNPyvzOmRTFq
最新推荐 更多

2024-05-31