Rows估算错误导致SQL性能偏差_使用扩展统计信息纠正执行计划
  IE5LYMWlmdvL 2023年11月13日 27 0

问题概述

SELECT  ROUND( SUM (COALESCE(station_month.maintain_labor_cost,0) + ............

COALESCE(station_month.employee_compensation,0)), 2) AS totalCost,  ROUND( SUM ( info.ast_original_value ), 2 ) AS totalAstValue

FROM

powerX.cost_XXXXX_pl_info info

INNER JOIN powerX.cost_XXXXX_pl_month station_month ON info.psr_id = station_month.psr_id

AND info.ast_id = station_month.ast_id

  where  info.professional_type = '01'

  AND info.equip_type = 'xl'

AND station_month.sun_year_month  in ( '202301', '202302','202303','202304'  )

执行计划

Aggregate  (cost=4938.02..4938.03 rows=1 width=64) (actual time=85.167..85.169 rows=1 loops=1)

  Buffers: shared hit=44893

  ->  Nested Loop  (cost=13.38..4937.97 rows=1 width=126) (actual time=0.720..82.459 rows=1343 loops=1)

        Buffers: shared hit=44893

        ->  Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

              Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

              Heap Blocks: exact=764

              Buffers: shared hit=786

              ->  Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional  (cost=0.00..12.85 rows=443 width=0) (actual time=0.614..0.614 rows=7013 loops=1)

                    Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

                    Buffers: shared hit=22

        ->  Index Scan using idx_cost_XXXXX_pl_month_idx_prsid_astid on cost_XXXXX_pl_month station_month  (cost=0.42..7.93 rows=1 width=176) (actual time=0.011..0.011 rows=0 loops=7013)

              Index Cond: (((psr_id)::text = (info.psr_id)::text) AND ((ast_id)::text = (info.ast_id)::text))

              Filter: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[]))

              Rows Removed by Filter: 3

              Buffers: shared hit=44107

Planning Time: 0.667 ms

Execution Time: 85.243 ms

有同学疑问 为什么85ms的SQL还需要优化, 因为整体逻辑读4.5W。 首次查询近90%需要采用物理读,此时SQL需要近5S左右, 还是比较慢的。 我们需要优化。

Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

 Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

              Heap Blocks: exact=764

              Buffers: shared hit=786

细心的同学会发现执行计划中 估算的rows 443, 而实际情况 rows = 7013。 而这SQL的性能“杀手“。

疑问为什么估算的rows会差异如此之多?

是不是本次统计信息过期导致rows评估偏差过多?经过排查后发现统计信息并没有过期。

如:

根据SQL条件测试

分别单独测试 (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text)) 两个条件

执行计划:

Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=129.15..7439.87 rows=6804 width=723) (actual time=219.819..221.361 rows=7013 loops=1)

  Recheck Cond: ((equip_type)::text = 'xl'::text)

  Heap Blocks: exact=764

  Buffers: shared hit=764 read=22

  I/O Timings: read=219.260

  ->  Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional  (cost=0.00..127.45 rows=6804 width=0) (actual time=219.735..219.736 rows=7013 loops=1)

        Index Cond: ((equip_type)::text = 'xl'::text)

        Buffers: shared read=22

        I/O Timings: read=219.260

Planning Time: 0.126 ms

Execution Time: 221.680 ms

单独测试第二个条件

Seq Scan on cost_XXXXX_pl_info info  (cost=0.00..8328.08 rows=6804 width=723) (actual time=0.007..22.252 rows=7014 loops=1)

  Filter: ((professional_type)::text = '01'::text)

  Rows Removed by Filter: 97392

  Buffers: shared hit=7023

Planning Time: 0.117 ms

Execution Time: 22.569 ms

发现估算的rows 精确度都比较高,为啥两个一起组合rows偏差如此之大呢?

问题原因

查看rows计算逻辑

with taba as (

 select   schemaname, tablename, attname, null_frac,     array_to_string(  most_common_vals  ,',')   most_common_vals ,

 array_to_string(  most_common_freqs  ,',')   most_common_freqs

 from   pg_stats where tablename = 'cost_XXXXX_pl_info' and attname in( 'professional_type','equip_type')

 ) select  schemaname, tablename, t.reltuples,  attname, null_frac,    

    regexp_split_to_table(most_common_vals, ',') vals,  regexp_split_to_table(most_common_freqs, ',')  cardinality,

  round( regexp_split_to_table(most_common_freqs, ',')::float *  (t.reltuples  - null_frac))   rowss

from  taba  

inner join  pg_class t on taba.tablename = t.relname  and  t.relnamespace::regnamespace::text = taba.schemaname;

Schemaname  | tablename  |reltuples    |attname   |     null_frac |   vals |  most_common_freqs  |rowss

powerX cost_XXXXX_pl_info 104406 equip_type 0 dytq 0.7334 76571

powerX cost_XXXXX_pl_info 104406 equip_type 0 dkx 0.1727 18031

powerX cost_XXXXX_pl_info 104406 equip_type 0 xl 0.06516667 6804

powerX cost_XXXXX_pl_info 104406 equip_type 0 zf01 0.019      1984

powerX cost_XXXXX_pl_info 104406 equip_type 0 zf06 0.0059666666 623

powerX cost_XXXXX_pl_info 104406 equip_type 0 zf08 0.0025 261

powerX cost_XXXXX_pl_info 104406 equip_type 0 zf04 0.0010333334 108

powerX cost_XXXXX_pl_info 104406 professional_type 0 03 0.91566664 95601

powerX cost_XXXXX_pl_info 104406 professional_type 0 01 0.06516667 6804

powerX cost_XXXXX_pl_info 104406 professional_type 0 02 0.0191 1994

而条件 6804 正是单一条件的执行计划中 估算rows。

计算方式: (reltuples  -   null_frac )  * most_common_freqs  =  104406  * 0.06516667  =  6804  

同理两个条件的rows估算逻辑:104406 * 0.06516667 0.06516667 正是两个过滤条件的rows

验证两个过滤条件的rows 

Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

 Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

              Heap Blocks: exact=764

              Buffers: shared hit=786

解决方案

如何纠正rows评估差异问题? 根据官方文档通过创建扩展统计信息

创建扩展统计信息

create  statistics  stat_p_e  ( dependencies  )  on   professional_type, equip_type   from  powerX.cost_XXXXX_pl_info;

--收集统计信息

Analyse   powerX.cost_XXXXX_pl_info;

查看两个过滤条件的执行计划

---- rows 估算正确

Bitmap Heap Scan on cost_stationline_pl_info info  (cost=146.09..7472.62 rows=6797 width=723) (actual time=0.730..2.339 rows=7013 loops=1)

  Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

  Heap Blocks: exact=764

  Buffers: shared hit=786

  ->  Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional  (cost=0.00..144.39 rows=6797 width=0) (actual time=0.648..0.649 rows=7013 loops=1)

        Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

        Buffers: shared hit=22

Planning Time: 0.144 ms

Execution Time: 2.657 ms

优化效果

 Aggregate  (cost=14662.17..14662.18 rows=1 width=64) (actual time=16.336..16.339 rows=1 loops=1)

  Buffers: shared hit=1236

   (cost=7574.99..14662.12 rows=1 width=126) (actual time=11.282..13.865 rows=1343 loops=1)

        Hash Cond: (((station_month.psr_id)::text = (info.psr_id)::text) AND ((station_month.ast_id)::text = (info.ast_id)::text))

        Buffers: shared hit=1236

        ->  Index Scan using cost_stationline_pl_month_idx_sun_year_month on cost_stationline_pl_month station_month  (cost=0.42..7070.73 rows=3202 width=176) (actual time=0.063..1.184 rows=3235 loops=1)

              Index Cond: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[]))

              Buffers: shared hit=450

              I/O Timings: read=290.886

        ->  Hash  (cost=7472.62..7472.62 rows=6797 width=78) (actual time=11.190..11.191 rows=7013 loops=1)

              Buckets: 8192  Batches: 1  Memory Usage: 620kB

              Buffers: shared hit=786

              ->  Bitmap Heap Scan on cost_stationline_pl_info info  (cost=146.09..7472.62 rows=6797 width=78) (actual time=2.371..8.550 rows=7013 loops=1)

                    Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

                    Heap Blocks: exact=764

                    Buffers: shared hit=786

                    ->  Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional  (cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)

                          Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

                          Buffers: shared hit=22

Planning Time: 1.853 ms

Execution Time: 16.453 ms

 

对比:

 

Rows评估

执行计划

执行成本

优化前

(cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

Nested Loop

shared hit=44893  85ms

优化后

(cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)

Hash join

shared hit=1236  16ms

【版权声明】本文内容来自摩杜云社区用户原创、第三方投稿、转载,内容版权归原作者所有。本网站的目的在于传递更多信息,不拥有版权,亦不承担相应法律责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@moduyun.com

  1. 分享:
最后一次编辑于 2023年11月13日 0

暂无评论

IE5LYMWlmdvL