每日HiveSQL_求解运动员最大连胜的次数_15

2024-01-03 13:30:13

1.现需要从运动员比赛结果表中统计每个运动员最大连胜的次数

需求结果:

2.所用到的表和数据

--表创建
CREATE TABLE athlete_results
(
    athlete_id INT,
    match_time TIMESTAMP,
    result     VARCHAR(10) -- 'win', 'lose', 'draw'
);

--数据装载
INSERT INTO athlete_results
VALUES (1, '2023-11-01', 'win'),
       (1, '2023-11-02', 'win'),
       (1, '2023-11-03', 'lose'),
       (1, '2023-11-04', 'win'),
       (1, '2023-11-05', 'draw'),
       (1, '2023-11-06', 'win'),
       (1, '2023-11-07', 'win'),
       (1, '2023-11-08', 'win'),
       (2, '2023-11-01', 'win'),
       (2, '2023-11-02', 'lose'),
       (2, '2023-11-03', 'draw'),
       (2, '2023-11-04', 'win'),
       (2, '2023-11-05', 'win'),
       (2, '2023-11-06', 'win');

3.答案

解析:由结果可知,此结果是对每个运动员进行分组得到的每个运动员的最大连胜数,此题关键在于如何根据特定的分组条件进行分组,得到每个可以代表运动员和比赛结果的分组字段。
思路一:

首先使用row_number()开窗根据运动员id进行分区,日期进行排序得到分组字段rn1,row_number()开窗根据运动员id和比赛结果进行分区,日期进行排序得到分组字段rn2,如下图rn1 - rn2可以唯一表述每个运动员在一次连续胜利、失败、平局的分组字段。

3.1?使用row_number()开两个窗,一个根据运动员id分区,日期升序排序,得到group1,另一个是根据运动员id和比赛结果进行分区,日期升序排序,得到group2
select athlete_id,
       to_date(match_time)                                                              match_time,
       result,
       row_number() over (partition by athlete_id order by to_date(match_time))         group1,
       row_number() over (partition by athlete_id, result order by to_date(match_time)) group2
from athlete_results;

运行结果:?

3.2筛选出比赛结果为”win“,然后再根据运动员id和win_group(group1 - group2)分组,count(*)得到运动员每个分组阶段的连胜次数
select athlete_id,
       group1 - group2 win_group,
       count(*)        win_count
from (
         select athlete_id,
                to_date(match_time)                                                              match_time,
                result,
                row_number() over (partition by athlete_id order by to_date(match_time))         group1,
                row_number() over (partition by athlete_id, result order by to_date(match_time)) group2
         from athlete_results
     ) t
where result = 'win'
group by athlete_id, group1 - group2;

运行结果:

3.3最后根据以上运行结果根据运动员id进行分组得到最终结果
select athlete_id,
       max(win_count) max_win_count
from (
         select athlete_id,
                group1 - group2 win_group,
                count(*)        win_count
         from (
                  select athlete_id,
                         to_date(match_time)                                                              match_time,
                         result,
                         row_number() over (partition by athlete_id order by to_date(match_time))         group1,
                         row_number() over (partition by athlete_id, result order by to_date(match_time)) group2
                  from athlete_results
              ) t
         where result = 'win'
         group by athlete_id, group1 - group2
     ) t
group by athlete_id;

运行结果:?

思路二:?

思考打破连续胜利的条件,仔细思考可知,一个运动员每次失败或者平局都会开启一个新的分组,怎么描述这个分组呢?很容易可以想到sum(if())结构,如下图?

3.1使用sum(if())开窗得到分组字段win_group
select athlete_id,
       to_date(match_time) match_day,
       result,
       sum(if(result = 'lose' or result = 'draw', 1, 0))
           over (partition by athlete_id order by to_date(match_time)) win_group
from athlete_results;

运行结果:

3.2根据以上结果筛选出result = ‘win’的结果,之后进行根据运动员id和win_group分组,count(*)得到连胜的次数。
select athlete_id,
       count(*) win_count
from (
         select athlete_id,
                result,
                sum(if(result = 'lose' or result = 'draw', 1, 0))
                    over (partition by athlete_id order by match_time) win_group
         from athlete_results) t
where result = 'win'
group by athlete_id, win_group;

运行结果:

3.3根据运动员id进行分组,得到最终结果
select athlete_id,
       max(win_count) max_win_count
from (
         select athlete_id,
                count(*) win_count
         from (
                  select athlete_id,
                         result,
                         sum(if(result = 'lose' or result = 'draw', 1, 0))
                             over (partition by athlete_id order by match_time) win_group
                  from athlete_results) t
         where result = 'win'
         group by athlete_id, win_group
     ) t
group by athlete_id;

运行结果:

文章来源:https://blog.csdn.net/m0_64893323/article/details/135359822
本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。