explode与lateral view使用详解(spark及hive环境对比)
HIVE环境
1.explode 炸裂函数
定义:explode函数能够将array及map类型的数据炸开,实现一行变多行
格式:select explode(array/map) from table
示例
原始数据tmp表
name | id | goods |
a | 1 | book_a,food_a |
b | 2 | book_b,food_b |
c | 3 | null? |
with tmp as (
select 'a' name,'1' as id,'book_a,food_a' as goods
union all
select 'b' name,'2' as id,'book_b,food_b' as goods
union all
select 'c' name,'3' as id,null as goods
)
select explode(split(goods,',')) as goods_col
from tmp
输出结果
goods_col |
book_a |
food_a |
book_b |
food_b |
注意 1.如果炸开字段为null值,则会被直接过滤掉 2.hive环境下使用explode函数只能select explode炸开的列,且仅能包含单个explode,其他列无法取到。如 select explode(split(goods,',')) as goods_col ,name ,id from tmp 则会报错
2.lateral view侧视图
定义:Lateral 也是用来对数组进行列转行的,Lateral View主要解决在select使用UDTF(如explode)做查询的过程中查询只能包含单个UDTF,不能包含其它字段以及多个UDTF的情况(及不能添加额外的select列的问题)
格式:tabelA lateral view UDTF(xxx) 视图别名(虚拟表名) as a,b,c
示例:
with tmp as (
select 'a' name,'1' as id,'book_a,food_a' as goods
union all
select 'b' name,'2' as id,'book_b,food_b' as goods
union all
select 'c' name,'3' as id,null as goods
)
select name,id,goods_col
from tmp
lateral view explode(split(goods,',')) tmp_table as goods_col
输出结果
name | id | goods_col |
a | 1 | book_a |
a | 1 | food_a |
b | 2 | book_a |
b | 2 | food_b |
注意 1.如果炸开字段为null值,则会被直接过滤掉
特别的,如果想要保留炸开字段为空值的行,则可以使用lateral view 来实现,对上述查询逻辑进行改动
with tmp as (
select 'a' name,'1' as id,'book_a,food_a' as goods
union all
select 'b' name,'2' as id,'book_b,food_b' as goods
union all
select 'c' name,'3' as id,null as goods
)
select name,id,goods_col
from tmp
lateral view outer explode(split(goods,',')) tmp_table as goods_col
输出结果
name | id | goods_col |
a | 1 | book_a |
a | 1 | food_a |
b | 2 | book_a |
b | 2 | food_b |
c | 3 | null |
spark
1.explode 炸裂函数
????????spark中的炸裂函数explode具备hive中的explode的功能,不同的是,spark中使用explode函数在select时能够筛选额外的列
with tmp as (
select 'a' name,'1' as id,'book_a,food_a' as goods
union all
select 'b' name,'2' as id,'book_b,food_b' as goods
union all
select 'c' name,'3' as id,null as goods
)
select name,id,explode(split(goods,',')) as goods_col
from tmp
输出结果
name | id | goods_col |
a | 1 | book_a |
a | 1 | food_a |
b | 2 | book_a |
b | 2 | food_b |
????????因此在spark中explode 近似可理解为等同于 lateral view explode,同时,spark中也支持explode_outer 等同于 lateral view outer explode
2.lateral view侧视图
????????spark中使用lateral view侧视图与hive中完全一致
本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。 如若内容造成侵权/违法违规/事实不符,请联系我的编程经验分享网邮箱:veading@qq.com进行投诉反馈,一经查实,立即删除!