Rows估算错误导致SQL性能偏差_使用扩展统计信息纠正执行计划-摩杜云开发者社区

问题概述

SELECT ROUND( SUM (COALESCE(station_month.maintain_labor_cost,0) + ............ COALESCE(station_month.employee_compensation,0)), 2) AS totalCost, ROUND( SUM ( info.ast_original_value ), 2 ) AS totalAstValue FROM powerX.cost_XXXXX_pl_info info INNER JOIN powerX.cost_XXXXX_pl_month station_month ON info.psr_id = station_month.psr_id AND info.ast_id = station_month.ast_id where info.professional_type = '01' AND info.equip_type = 'xl' AND station_month.sun_year_month in ( '202301', '202302','202303','202304' )

执行计划

Aggregate (cost=4938.02..4938.03 rows=1 width=64) (actual time=85.167..85.169 rows=1 loops=1) Buffers: shared hit=44893 -> Nested Loop (cost=13.38..4937.97 rows=1 width=126) (actual time=0.720..82.459 rows=1343 loops=1) Buffers: shared hit=44893 -> Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1) Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Heap Blocks: exact=764 Buffers: shared hit=786 -> Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional (cost=0.00..12.85 rows=443 width=0) (actual time=0.614..0.614 rows=7013 loops=1) Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Buffers: shared hit=22 -> Index Scan using idx_cost_XXXXX_pl_month_idx_prsid_astid on cost_XXXXX_pl_month station_month (cost=0.42..7.93 rows=1 width=176) (actual time=0.011..0.011 rows=0 loops=7013) Index Cond: (((psr_id)::text = (info.psr_id)::text) AND ((ast_id)::text = (info.ast_id)::text)) Filter: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[])) Rows Removed by Filter: 3 Buffers: shared hit=44107 Planning Time: 0.667 ms Execution Time: 85.243 ms

有同学疑问为什么85ms的SQL还需要优化，因为整体逻辑读4.5W。首次查询近90%需要采用物理读，此时SQL需要近5S左右，还是比较慢的。我们需要优化。

Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1) Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Heap Blocks: exact=764 Buffers: shared hit=786

细心的同学会发现执行计划中估算的rows 443，而实际情况 rows = 7013。而这SQL的性能“杀手“。

疑问为什么估算的rows会差异如此之多？

是不是本次统计信息过期导致rows评估偏差过多？经过排查后发现统计信息并没有过期。

如：

根据SQL条件测试

分别单独测试 (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) 两个条件

执行计划：

Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=129.15..7439.87 rows=6804 width=723) (actual time=219.819..221.361 rows=7013 loops=1) Recheck Cond: ((equip_type)::text = 'xl'::text) Heap Blocks: exact=764 Buffers: shared hit=764 read=22 I/O Timings: read=219.260 -> Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional (cost=0.00..127.45 rows=6804 width=0) (actual time=219.735..219.736 rows=7013 loops=1) Index Cond: ((equip_type)::text = 'xl'::text) Buffers: shared read=22 I/O Timings: read=219.260 Planning Time: 0.126 ms Execution Time: 221.680 ms

单独测试第二个条件

Seq Scan on cost_XXXXX_pl_info info (cost=0.00..8328.08 rows=6804 width=723) (actual time=0.007..22.252 rows=7014 loops=1) Filter: ((professional_type)::text = '01'::text) Rows Removed by Filter: 97392 Buffers: shared hit=7023 Planning Time: 0.117 ms Execution Time: 22.569 ms

发现估算的rows 精确度都比较高，为啥两个一起组合rows偏差如此之大呢？

问题原因

查看rows计算逻辑

with taba as ( select schemaname, tablename, attname, null_frac, array_to_string( most_common_vals ,',') most_common_vals , array_to_string( most_common_freqs ,',') most_common_freqs from pg_stats where tablename = 'cost_XXXXX_pl_info' and attname in( 'professional_type','equip_type') ) select schemaname, tablename, t.reltuples, attname, null_frac, regexp_split_to_table(most_common_vals, ',') vals, regexp_split_to_table(most_common_freqs, ',') cardinality, round( regexp_split_to_table(most_common_freqs, ',')::float * (t.reltuples - null_frac)) rowss from taba inner join pg_class t on taba.tablename = t.relname and t.relnamespace::regnamespace::text = taba.schemaname;

Schemaname \| tablename \|reltuples \|attname \| null_frac \| vals \| most_common_freqs \|rowss powerX cost_XXXXX_pl_info 104406 equip_type 0 dytq 0.7334 76571 powerX cost_XXXXX_pl_info 104406 equip_type 0 dkx 0.1727 18031 powerX cost_XXXXX_pl_info 104406 equip_type 0 xl 0.06516667 6804 powerX cost_XXXXX_pl_info 104406 equip_type 0 zf01 0.019 1984 powerX cost_XXXXX_pl_info 104406 equip_type 0 zf06 0.0059666666 623 powerX cost_XXXXX_pl_info 104406 equip_type 0 zf08 0.0025 261 powerX cost_XXXXX_pl_info 104406 equip_type 0 zf04 0.0010333334 108 powerX cost_XXXXX_pl_info 104406 professional_type 0 03 0.91566664 95601 powerX cost_XXXXX_pl_info 104406 professional_type 0 01 0.06516667 6804 powerX cost_XXXXX_pl_info 104406 professional_type 0 02 0.0191 1994
而条件 6804 正是单一条件的执行计划中估算rows。计算方式： (reltuples - null_frac ) * most_common_freqs = 104406 * 0.06516667 = 6804 同理两个条件的rows估算逻辑：104406 * 0.06516667 0.06516667 正是两个过滤条件的rows

验证两个过滤条件的rows

Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1) Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Heap Blocks: exact=764 Buffers: shared hit=786

解决方案

如何纠正rows评估差异问题？根据官方文档通过创建扩展统计信息

创建扩展统计信息

create statistics stat_p_e ( dependencies ) on professional_type, equip_type from powerX.cost_XXXXX_pl_info; --收集统计信息 Analyse powerX.cost_XXXXX_pl_info;

查看两个过滤条件的执行计划

---- rows 估算正确 Bitmap Heap Scan on cost_stationline_pl_info info (cost=146.09..7472.62 rows=6797 width=723) (actual time=0.730..2.339 rows=7013 loops=1) Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Heap Blocks: exact=764 Buffers: shared hit=786 -> Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional (cost=0.00..144.39 rows=6797 width=0) (actual time=0.648..0.649 rows=7013 loops=1) Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Buffers: shared hit=22 Planning Time: 0.144 ms Execution Time: 2.657 ms

优化效果

Aggregate (cost=14662.17..14662.18 rows=1 width=64) (actual time=16.336..16.339 rows=1 loops=1) Buffers: shared hit=1236 (cost=7574.99..14662.12 rows=1 width=126) (actual time=11.282..13.865 rows=1343 loops=1) Hash Cond: (((station_month.psr_id)::text = (info.psr_id)::text) AND ((station_month.ast_id)::text = (info.ast_id)::text)) Buffers: shared hit=1236 -> Index Scan using cost_stationline_pl_month_idx_sun_year_month on cost_stationline_pl_month station_month (cost=0.42..7070.73 rows=3202 width=176) (actual time=0.063..1.184 rows=3235 loops=1) Index Cond: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[])) Buffers: shared hit=450 I/O Timings: read=290.886 -> Hash (cost=7472.62..7472.62 rows=6797 width=78) (actual time=11.190..11.191 rows=7013 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 620kB Buffers: shared hit=786 -> Bitmap Heap Scan on cost_stationline_pl_info info (cost=146.09..7472.62 rows=6797 width=78) (actual time=2.371..8.550 rows=7013 loops=1) Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Heap Blocks: exact=764 Buffers: shared hit=786 -> Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional (cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1) Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) Buffers: shared hit=22 Planning Time: 1.853 ms Execution Time: 16.453 ms

对比：

	Rows评估	执行计划	执行成本
优化前	(cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)	Nested Loop	shared hit=44893 85ms
优化后	(cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)	Hash join	shared hit=1236 16ms