Notes on some problems encountered while using ORC
Query errors
hiveserver2 enables the vectorized execution engine by default (hive.vectorized.execution.enabled=true), so tables stored as ORC pick up this setting and go through the vectorized executor, which leads to errors. The errors encountered include:
- Array index out of bounds
select a.serv_dt, a.broker_id, a.log_time, b.log_time
,unix_timestamp(b.log_time)-unix_timestamp(a.log_time) as time_diff
from (select *
from dw_db_temp.GL_20160422_noqiang_page_broker_action_rank
where page_id='1-150000'
) a
join dw_db_temp.GL_20160422_noqiang_page_broker_action_rank b
on a.con=b.con and b.rk=a.rk+1;
- Null pointer exception
drop table if exists dw_db_temp.gh_esf_tmp1;
create table dw_db_temp.gh_esf_tmp1
as
select cal_dt
,city_id
,region_name
,subregion_id, subregion_name
,count(distinct comm_id) as pm_comm
,count(distinct case when is_comm_repeat=1 then comm_id end) as pm_comm_repeat
from dw_db_temp.gh_esf_pm_broker_last
where subregion_name in ('长阳','良乡','窦店','梨园','通州北苑','武夷花园','中央别墅区','通州北苑','华南','泉水','泡崖','机场新区','西山水库','马坡')
group by cal_dt,city_id,region_name,subregion_id,subregion_name
;
Fix: simply turn the vectorized executor off:
set hive.vectorized.execution.enabled = false;
set hive.vectorized.execution.reduce.enabled = false;
Queries returning no results
With a Hive table stored in ORC format, a self-join returned no results and raised no error at all. The problematic SQL:
SELECT a.*
,b.call_7
FROM (SELECT comm_id
,call_30
,call_success_30
,call_nonunique_30
FROM dw_db_temp.carrie_mr_call_jt_160428_3
WHERE type=3
) a
LEFT OUTER JOIN
(SELECT comm_id
,call_30
,call_success_30
,call_nonunique_30
FROM dw_db_temp.carrie_mr_call_jt_160428_3
WHERE type=2
) b
ON a.comm_id=b.comm_id
LIMIT 100
;
The cause is that hiveserver2 enables the parameter hive.optimize.index.filter (set to true) by default. Fixes:
- Store the table as textfile instead
- Turn off hive.optimize.index.filter:
set hive.optimize.index.filter=false;
Errors when writing a table
When writing a table stored as ORC (the default), a query whose where clause contains "where col is null" throws java.lang.IllegalArgumentException: Bad primitive category VOID
drop table if exists dw_db_temp.carrie_call_comm_160428_01;
create table dw_db_temp.carrie_call_comm_160428_01
as
select a.*,b.pcall_id
from (select * from dw_db.dw_phone_call_logs
where created_dt>=date_sub(${dealDate},7)
) a
left outer join
(select date_index,phonenum,loupan_id,pcall_id
from dw_db.dw_nh_adm_transfer_log
where date_index>=date_sub(${dealDate},7)
and flag=2
and type>1
)b
on a.created_dt=b.date_index and a.id=b.pcall_id
where b.pcall_id is null
;
Fix: create the Hive table as textfile:
create table table_name stored as textfile
as
select ......
MapJoin with mismatched column types
Typically a join between a string column and an int column.
In the logs the job hangs at the following point for a long time. Checking the map tasks, their state is FAILED, and the end of the map log contains a message like:
2015-12-11 04:26:06,119 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load back 1 hashtable file from tmp file uri:file:/hadoop3/yarn/nm/usercache/hadoop/appcache/application_1449142557781_73597/container_e11_1449142557781_73597_01_000002/Stage-9.tar.gz/MapJoin-mapfile11--.hashtable
Fix: make sure the columns used in the join have the same type, either with a cast or by changing the table schema directly. If that still does not solve it, manually disable MapJoin first:
set hive.auto.convert.join=false;
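The cast approach can be sketched as follows; the table and column names here are hypothetical, assuming a.id is int and b.id is string:

```sql
-- cast the int key to string so both sides of the join have the same type,
-- letting MapJoin's hash-table lookups match correctly
select a.*, b.val
from t_a a
join t_b b
  on cast(a.id as string) = b.id;
```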
Occasional problems after Hive switched to the new (Kryo) plan serialization
This usually happens when the script sets hive.exec.parallel=true; at run time the following error occasionally appears, and re-running may succeed:
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
When this happens, the fix is to switch the plan serialization method first, i.e. add:
set hive.plan.serialization.format=javaXML;
An int literal in a WHERE condition exceeding Integer.MAX_VALUE causes an error
select *
from dual
where id=2183402559
The id column is of type int; the literal in id=2183402559 exceeds Integer.MAX_VALUE (2147483647), which causes the error below:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10041]: No partition predicate found for Alias “a” Table > “dw_ganji_house_source_officetrade”
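A possible workaround, assuming the values genuinely need to exceed the int range (the column change is a schema migration, shown here only as a sketch):

```sql
-- widen the column so values beyond 2147483647 fit
alter table dual change id id bigint;

-- or force the comparison to bigint; in Hive the L postfix marks a BIGINT literal
select * from dual where cast(id as bigint) = 2183402559L;
```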