[ Hive基础 ] (138)
    Hive是什么
    是有facebook开源,基于hado ...
    Hive基本原理
    是一个hadoop客户端工具,用于将H ...
    HiveServer2服务
    提供 JDBC/ODBC接口,为用户提 ...
        用户访问权限
        hive.server2.enable ...
    Metastore服务
    为Hive CLI或者Hiveserv ...
        嵌入式模式
        独立服务模式
        默认模式。 ...
        SQL Parser 解析器
        将sql字符串转换成抽象语法树AST, ...
        Semantic Analyzer 语义分析器
        将AST划分为QeuryBlock ...
        Logical Plan Gen 逻辑计划生成器
        将语法树转换成逻辑计划 ...
        Logical Optimizer 逻辑优化器
        将逻辑计划进行优化 ...
        Physical Plan Gen 物理计划生成器
        根据优化的逻辑计划生成物理计划,即Ma ...
        Physical Optimizer 物理优化器
        将物理计划进行优化 ...
        Execution 执行器
        执行物理计划,得到结果返回 ...
    HDFS
    Yarn
    命令行
        hive
        beeline
    基础类型
        tinyint
        smallint
        int
        bigint
        float
        double
        boolean
        string
        timestamp
        binary
    复杂类型
        array
        map
        struct
    类型转化
        隐式转换
        类型可以向范围更大的类型隐式转换。 ...
        显式转换
        cast(feild as
    类型取值范围
    按照取值范围进行从小到大排序,转换对照 ...
        void
        boolean
        tinyint
        smallint
        int
        bigint
        float
        double
        decimal
        string
        varchar
        timestamp
        date
        binary
        数据库 Database
            创建数据库
            create database dbn ...
            查询数据库
            show databases ; ...
            修改数据库
            alter database; 数据库 ...
                dbproperties
                location
                owner user
            删除数据库
            drop database ; ...
        表 Table
            创建表
                create table ...
                    管理表
                    外部表 external
                    分区表 partitioned by
                        创建分区
                        create table part_t ...
                        查看分区
                        show partitions par ...
                        增加分区
                        alter table part_ta ...
                        删除分区
                        alter table part_ta ...
                        修复分区
                            add partition
                            drop partition
                            msck
                                msck repair table part_table drop partitions;
                                增加存在的路径但是元数据没有的分区。 ...
                                msck repair table part_table add partitions;
                                删除不存在的路径但是元数据还有的分区。 ...
                                msck repair table part_table sync partitions;
                                相当于执行以上两个命令。 ...
                                msck repair table part_table;
                                等价于 msck repair ta ...
                        二级分区表
                        create table table_ ...
                        动态分区表
                            相关参数
                                set hive.exec.dynamic.partition=true;
                                set hive.exec.dynamic.partition.mode=nonstrict;
                                set hive.exec.max.dynamic.partitions=1000;
                                set hive.exec.max.dynamic.partitions.pernode=100;
                                set hive.exec.max.created.files=100000;
                                set hive.error.on.empty.partition=false;
                    临时表 temporary
                    分桶表
                    create table table_ ...
                        分桶排序表
                        create table table_ ...
                    序列化 row format
                    serde 序列号和反序列化器, 默认 ...
                        delimited
                        每个字段按照特定分隔符进行分割。 ...
                        fields teminated by
                        列分隔符 ...
                        collection items teminated by
                        map,struct,array中元素 ...
                        map keys terminate by
                        map中key与value的分隔符 ...
                        lines terminated by
                        行分隔符 ...
                        null defined as
                        空值符号 ...
                    文件格式 stored as
                    存储位置 location
                    配置参数 tblproperties
                create table as select ... (CTAS)
                这种方式建表不允许构建外部表 ...
                create table like ...
                复制表结构 ...
            查看表
            show tables; desc ...
            修改表
                表重命名
                alter table ... ren ...
                修改列信息
                修改的都是元数据 ...
                    增加列
                    alter table ... add ...
                    更新列
                    alter table ... cha ...
                    替换列
                    重新定义表中所有列, alter t ...
            删除表
            drop table table_na ...
            清空表
            truncate table tabl ...
        load
        load data inpath '' ...
        insert
            将查询结果写入表
            isnert into table t ...
            将给定数据写入表
            insert into table t ...
            将查询结果写入路径
            insert overwrite di ...
        export/import
        export table table_ ...
        基本查询
        select * from table ...
            查全表
            查指定列
            列别名
            limit语句
            where语句
            group by语句
    HiveQL 视图
    HiveQL 索引
    HiveQL 函数
        查看函数命令
        show functions; de ...
        操作符
            操作符优先级
            关系运算符
            主要用于where和having语句中 ...
            数学运算符
            逻辑运算符
            字符串运算符
            符合类型运算符
        函数
            数学函数
                round
                ceil
            集合函数
                size
                map
                map_keys
                map_values
                array
                array_countains
                sort_array
                struct
                named_struct
            类型转换函数
            日期函数
                unix_timestamp
                from_unixtime
                current_date
                current_timestamp
                month
            条件函数
            字符串函数
            数据屏蔽函数
            流程控制函数
                case when
                if
            聚合函数
                count
                count(*),count(1):对 ...
                max
                min
                sum
                avg
            表格生成函数
            窗口分析函数 Window functions
            为每行数据划分一个窗口,然后对窗口内的 ...
                语法
                函数(...) over(窗口范围) ...
                    函数
                    窗口
                        基于行
                        order by [field] ro ...
                        基于值
                        order by [field] ra ...
                        分区
                        partition by [field ...
                        缺省
                        over() 中partition b ...
                            partition by 不写
                            order by 不写
                            rows|range between ... and ... 不写
                                over中有order by
                                默认值: range between ...
                                over中没有order by
                                默认值:rows between un ...
                分类
                    聚合函数
                        max
                        min
                        sum
                        avg
                        count
                    跨行取值函数
                    lead/lag : 不支持自定义 ...
                        lead
                        lead(字段,偏移量,默认值) 后一 ...
                        lag
                        lag(字段,偏移量,默认值) 前一 ...
                        first_value
                        first_value(字段,是否跳过 ...
                        last_value
                        last_value(字段,是否跳过n ...
                    排名函数
                    rank/dense_rank/row ...
                        rank
                        rank() over(partiti ...
                        dense_rank
                        dense_rank() over(p ...
                        row_number
                        row_number() over(p ...
            多维分析函数
            高级聚合函数
                collect_list
                collect_set
            炸裂函数
                explode
                    explode(array)
                    explode(map)
                posexplode
                    posexplode(array)
                inline
                    inline(array>)
                lateral view
                将udtf函数应用到源表的每行数据,将 ...
            自定义函数
                创建临时函数
                create temporary fu ...
                创建永久函数
                create function my_ ...
                自定义函数分类
                    自定义函数 UDF user-defined function
                    一进一出,实现类 GenericUDF ...
                    自定义聚合函数 UDAF user-defined aggregation function
                    多进一出 ...
                    自定义表函数 UDTF user-defined table-generating functions
                    一进多出 ...
        DEFLATE
        算法:DEFLATE,扩展名:.def ...
        Gzip
        算法:DEFLATE,扩展名:.gz, ...
        bzip2
        算法:bzip2,扩展名:.bz2,是 ...
        LZO
        算法:LZO,扩展名:.lzo,是否切 ...
        Snappy
        算法:Snappy,扩展名:.snap ...
    存储格式
        行存储
        一次需要读取一整行数据。 ...
        列存储
        一次只需要读取一列或者几列数据。 ...
        Text file
        行存储 ...
        Orc
        列存储,读取orc文件,需要从Post ...
            文件格式
                Header:ORC
                Strip [num]
                    Index Data
                    各个field的最大值,最小值,行位置 ...
                    column [field]
                    Stripe Footer
                    各个column的编码等信息 ...
                File Footer
                Header长度; 各个stripe信 ...
                Postscript
                file footer长度;文件压缩参 ...
                Postscript Length
            建表语句
            create table tabl_n ...
                tblproperties参数
                    orc.compress
                    默认值:ZLIB,压缩格式:NONE, ...
                    orc.compress.size
                    默认值:256kb,每个压缩块的大小 ...
                    orc.stripe.size
                    默认值:64M,每个stripe的大小 ...
                    orc.row.index.stride
                    默认值:10000,索引步长 ...
        Parquet
        列存储,读取parquet文件,需要从 ...
            文件格式
                PAR1
                Row Group [num]
                    column chunk [field_name]
                        page [num]
                Footer (File Meta Data)
                    Row Group [num] Meta Data
                        column chunk [field_name] meta data
                            数据类型
                            编码方式
                            data page位置
                            统计信息 (最大值、最小值、null值个数)
                Footer Length
                PAR1
            建表语句
            create table tabl_n ...
                tblproperties参数
                    parquet.compression
                    默认值:uncompressed,压缩 ...
                    parquet.block.size
                    默认值:128m,行组大小,通常与hd ...
                    parquet.page.size
                    默认值:1m,页大小 ...
        Sequence file
    压缩
        文件压缩
            Textfile
            压缩设置 set hive.exec. ...
            Orc
            压缩设置 create ... sto ...
            Parquet
            压缩设置 create ... sto ...
        计算中压缩
            单个MR中间结果压缩
            压缩设置 set mapreduce. ...
            单条SQL中间结果压缩
            压缩设置 set hive.exec. ...
    资源配置
        Yarn资源配置
            yarn.nodemanager.resource.memory-mb
            单个nodemanager分配给con ...
            yarn.nodemanager.resource.cpu-vcores
            单个nodemanager分配给con ...
            yarn.scheduler.maximum-allocation-mb
            单个container能够使用的最大内 ...
            yarn.scheduler.minimum-allocation-mb
            单个container能够使用的最小内 ...
        MapReduce资源配置
            mapreduce.map.memory.mb
            单个map task申请的contai ...
            mapreduce.map.cpu.vcores
            单个map task申请的contai ...
            mapreduce.reduce.memory.mb
            单个reduce task申请的con ...
            mapreduce.reduce.cpu.vcores
            单个reduce task申请的con ...
        组成部分
            Stage
            Explain有一系列Stage组成, ...
            Operator Tree
            一个Stage对应一个MapReduc ...
                Map Operator Tree
                Reduce Operator Tree
            Operator
            常见的Operator有以下几种。 ...
                TableScan
                Select Opterator
                Group by Operator
                Reduce Output Operator
                Join Operator
                File Output Operator
                Fetch Operator
        内容结构
            Stage Depenencies
                Stage [num] ...
            Stage Plans
                Stage [num]
                    Map Operator Tree
                        TableScan
                        Select Opterator
                        Group by Operator
                        Reduce Output Operator
                    Reduce Operator Tree
                        Group by Operator
                        File Output Operator
                Stage [num...]
                    Fetch Operator
        基本语法
                Formatted
                Exetended
                Depenency
    语法调优