[ Flink基础 ] (150)
        事务处理
        分析处理
    流处理
        Lambda架构
        有状态流处理
    事件驱动
    流式数据
    Flink Program
    用户开发的应用程序。 负责将程序代码解 ...
    JobManager
    负责接收来着Client(Flink ...
        ResourceManager
        负责资源的管理,在整个 Flink 集 ...
        Dispatcher
        负责接收用户提供的作业,并且负责为这个 ...
        JobMaster
        JobMaster负责管理单个JobG ...
    TaskManager
    负责按照Executor Graph的 ...
    重要概念
        数据流图 DataFlow
        根据用户使用API编写的代码生成的初始 ...
        并行度 Parallism
        一个特定算子的子任务数量。默认并行度: ...
            并行度优先级
            算子设置 > 全局设置 > 集群默认设置 ...
        数据传输方式
            一对一模式 One-To-One
            重分配模式 Redistributing
        算子链
        子任务合并条件是并行度相同,并且是on ...
            禁用算子链
            开始新的算子链
        执行图
            StreamGraph
            根据用户使用Stream API编写的 ...
            JobGraph
            StreamGraph经过优化后生成的 ...
            ExecutionGraph
            JobManager根据JobGrap ...
            物理执行图
            在TaskManager上部署Task ...
        任务 Task
        一个TaskManager是一个JVM ...
        任务槽 Task Slots
        TaskManager通过Task S ...
            Slot共享组
            设置代码 .slotSharingGr ...
        任务共享 Slot
        Flink允许子任务共享Slot,一个 ...
        Total Flink Memory
            JVM Heap Memory
                Framework Heap
                Task Heap
            Off-Heap Memory
                Managed Memory
                Directory Memory
                    Framework Off-Heap
                    Task Off-Heap
                    Network
        JVM Metaspace
        JVM Overhead
        它是为JVM进程预留的其它本地内存,用 ...
    SQL
    高级API,High-level La ...
    Table
    高级API,Declarative D ...
    DataStream/DataSet API
    核心API,Core APIs ...
    ProcessFunction
    低级API,Stateful Stre ...
部署模式
区别: 单作业模式是通过客户端来提交的 ...
    Standalone
    Yarn
    Mesos
        创建执行环境
            getExecuteEnvironment
            createLocalEnvironment
            createRemoteEnvironment
        执行模式 Execution Mode
            批处理环境
            代码:ExecuteEnvironme ...
            流处理环境
            代码:StreamExecuteEnv ...
                流执行模式 Streaming
                批执行模式 Batch
                自动模式 Automatic
    数据类型
        TypeInfomation
        统一表示数据类型。 ...
        基本类型
        数组类型
        复合类型
        泛型类型
        TypeHint
        类型暗示(type hint)机制解决 ...
    算子
        源算子 Source
            从集合读取
            从文件读取
            从Socket读取
            从Kafka读取
            自定义Source
        转换算子 Transformation
            基本转换算子
                map
                flatMap
                fliter
            聚合算子 Aggregation
                keyBy
                reduce
            自定义函数 UDF
                函数类 Function Classes
                匿名函数
                富函数 Rich Function Classes
            分区
                逻辑分区 软分区
                keyBy算子 ...
                物理分区 Physical Partitioning
                    随机分区 shuffle
                    轮询分区 rebalance
                    重缩放分区 rescale
                    广播 broadcast
                    全局分区 global
                    自定义分区 partitionCustom
        输出算子 Sink
            连接到外部系统
            文件
            Kafka
            Redis
            Elasticsearch
            MySQL
            自定义输出
    时间语义
        事件时间 Event Time
        摄入时间 Ingestion Time
    水位线 Watermark
    基于事件时间提出的概念,自定义的逻辑时 ...
        有序数据流
        无序数据流
        水位线特性
        水位线生成
            生成策略
                TimestampAssigner
                从数据流中的事件数据的某个字段中提取时 ...
                WatermarkGenerator
                按照既定的方式,基于时间戳生成水位线。 ...
                    onEvent
                    为每个到来的事件数据调用该方法。 ...
                    onPeriodicEmit
                    周期性调用的方法,默认200ms,可以 ...
            自定义水位线
                周期性水位线生成器(Periodic Generator)
    窗口 Window
        概念
            数据块
            将无限数据切割成有限的“数据块”进行处 ...
            存储桶
            将数据流切割成有限大小的多个“存储桶” ...
        分类
            按照驱动类型分类
                时间窗口 Time Window
                计数窗口 Count Window
            按照分配规则分类
                滚动窗口 Tumbling Windows
                滑动窗口 Sliding Windows
                会话窗口 Session Windows
                全局窗口 Global Windows
        窗口API
            按键分区窗口 Keyed Windows
            stream.keyBy(...).w ...
            非按键分区窗口 Non-Keyed Windows
            stream.windowAll(.. ...
            时间窗口
                滚动时间窗口
                TumblingProcessingT ...
                滑动时间窗口
                SlidingProcessingTi ...
                会话时间窗口
                ProcessingTimeSessi ...
            事件窗口
                滚动事件窗口
                TumblingEventTimeWi ...
                滑动事件窗口
                SlidingEventTimeWin ...
                会话事件窗口
                EventTimeSessionWin ...
            计数窗口
                滚动计数窗口
                stream.keyBy(...).c ...
                滑动计数窗口
                stream.keyBy(...).c ...
            流转换
                DataStream
                DS (keyBy)-> KS , ...
                AllWindowStream
                AWS (reduce,aggrega ...
                KeyedStream
                KS (sum,max,min,red ...
                WindowedStream
                WS (reduce,aggrete, ...
            分类
                    归约函数 ReduceFunction
                    聚合函数 AggregateFunction
                        输入类型 IN
                        累加器类型 ACC
                        输出类型 OUT
                全窗口函数 Full Window Functions
                    窗口函数 WindowFunction
                    stream.keyBy(
                    处理窗口函数 ProcessWindowFunction
                    stream.keyBy(
            其它API
                触发器 Trigger
                    onElement()
                    onEventTime()
                    onProcessingTime()
                    clear()
                移除器 Evictor
                    evictBefore()
                    evictAfter()
                允许延迟 Allowed Lateness
                stream.keyBy(...).w ...
                测输出流 Side Output
                OutputTag ou ...
            处理函数
                基本处理函数 ProcessFunction
                    ProcessFunction
                    KeyedProcessFunction
                    ProcessWindowFunction
                    ProcessAllWindowFunction
                    CoProcessFunction
                    ProcessJoinFunction
                    BroadcastProcessFunction
                    KeyedBroadcastProcessFunction
                按键分区处理函数 KeyedProcessFunction
                    定时器 Timer
                    定时服务 TimerService
    分流
        测输出流 Side Output
        通常使用处理函数 Process Fu ...
    合流
        联合 Union
        连接 Connect
            连接流 ConnectedStreams
        双流联结 Join
            窗口联结 Window Join
            间隔联结 Interval Join
            窗口同组联结 Window CoGroup
    状态
        状态算子
            无状态算子
            map,flatMap,filter ...
            有状态算子
        状态管理
        状态分类
            原始状态 Raw State
            托管状态 Managed State
                按键分区状态 Keyed State
                    状态结构类型
                        值状态 ValueState
                        列表状态 ListState
                        映射状态 MapState
                        规约状态 ReducingState
                        聚合状态 AggregatingState
                    状态生存时间 TTL
                    time-to-live ...
                        StateTtlConfig
                        需要创建StateTtlConfig配 ...
                算子状态 Operator State
                涉及到重分区后状态的重组分配。 ...
                    状态结构类型
                        列表状态 ListState
                        子任务以轮询方式round-robin ...
                        联合列表状态 UnionListState
                        子任务以联合重组方式选择要使用的状态项。 ...
                        广播状态 BroadcastState
                        以映射结构map的键值对key-val ...
                广播状态 Broadcast State
                适用于动态配置/动态规则。 ...
        状态持久化
            检查点 Checkpoint
            状态后端 State Backends
            状态的存储访问维护,有一个可插拔的组件 ...
                分类
                    哈希表状态后端 HashMapStateBackend
                    底层是HashMap结构。 ...
                    内嵌RocksDB状态后端 EmbeddedRocksDBStateBackend
                    底层是key-value存储介质。执行 ...
                配置
                    全局配置
                    flink-conf.yaml 修改 ...
                    单独配置
                    为每个作业单独配置 env.setSt ...
    检查点 Checkpoint
    将某个时间点的所有状态保存下来就是检查 ...
        检查点保存
            周期性保存
            保存流程
            检查点保存要等所有任务将同一条数据处理 ...
        检查点恢复
            重启应用
            读取检查点并重置状态
            重放数据
            继续处理数据
        检查点算法
            Chandy-Lamport 算法
            Flink采用了基于此算法的分布式快照。 ...
            检查点分界线 Checkpoint Barrier
            由source端在数据流中插入的一个标 ...
            检查点协调器 Checkpoint Coordinator
            专门用来协调处理检查点的相关工作。 ...
            分布式快照算法
                分界线传递规则
                    单个上游向多个下游发送barrier
                    需要广播出去。 ...
                    多个上游向单个下游发送barrier
                    执行 分界线对齐 (barrier a ...
                JobManager发送检查点保存指令
                分界线对齐后状态保存到持久存储
                处理换存数据并继续处理其它数据
        检查点配置
            检查点启用
            env.enableCheckpoin ...
            检查点存储 Checkpoint Storage
            env.getCheckpointCo ...
            其它配置
            checkpointConfig.se ...
        保存点用途
            版本管理/归档存储
            更新Flink版本
            更新应用程序
            调整并行度
            暂停应用程序
        保存点使用
            创建保存点
            bin/flink savepoint ...
            重启应用
            bin/flink run -s :s ...
    状态一致性
        一致性概念
        一致性就是结果的正确性。 ...
        一致性级别
            最多一次 AT-MOST-ONCE
            至少一次 AT-LEAST-ONCE
            精确一次 EXACTLY-ONCE
                一致性检查点
                Flink使用一种轻量级快照机制(检查 ...
        端到端(end-to-end)状态一致性
        即结果的正确性贯穿整个流处理应用的始终 ...
            端到端 Exactly-Once
                应用内部
                使用checkpoint进行保证。 ...
                Source 端
                可以重置数据的读取位置。 ...
                Sink 端
                从故障恢复时候,数据不会重复写入外部系统 ...
                    幂等写入 Idempotent Writes
                    即一个操作被重复执行多次,但是修改的结 ...
                    事务写入 Transactional Writes
                    应用中一系列的操作,要么全部成功,要么 ...
                        预写日志 Write-Ahead-Log WAL
                        先把结果当成状态保存起来,在收到che ...
                        两阶段提交 Two-Phase-Commit 2PC
                        实现代码TwoPhaseCommitS ...
                            预提交
                            对于每个checkpoint,sink ...
                            正式提交
                            收到checkpoint完成时候,才正 ...
    Table API
        创建表环境
            表环境 TableEnvironment
                功能
                    注册Catalog和表
                    直线SQL查询
                    注册自定义函数UDF
                    DataStream和表之间的转换
            Catalog目录
            和标准SQL的概念一致,主要用来管理数 ...
        创建表
            连接器表 Connector Tables
            通过连接器connector连接到一个 ...
            虚拟表 Virtual Tables
            创建虚拟试图 tableEnv.cre ...
        表查询
            SQL查询
            tableEnv.sqlQuery(" ...
            Table API查询
            Table API是嵌入式编程语言中的 ...
        输出表
        tableEnv.executeSql ...
        表/流转换
            表转换为流 Table to DataStream
            tableEnv.toDataStre ...
            流转换为表 DataStream to Table
                调用方法
                    fromDataStream()方法
                    createTemporaryView()方法
                    fromChangelogStream()方法
                数据类型
                    原子类型
                    Tuple类型
                    POJO类型
                    Row类型
                        RowKind
        流处理里面的表
            表转换为流
                仅插入流 Insert-Only Streams
                更新日志流 changelog streams
            动态表 Dynamic Tables
                更新日志流 changelog stream
            持续查询 Continuous Query
                执行步骤
                    流转换为动态表
                    在此动态表上进行持续查询,生成新的动态表
                    将动态表转换为流
            流转换为动态表
            SQL持续查询
                更新Update查询
                定义的结果表的更新日志流中,保护了in ...
                追加Append查询
                定义的结果表的更新日志流中,只有ins ...
                查询限制
                有些持续查询的计算代价太高,需要受到限 ...
                    状态大小
                    更新计算
            动态表转换为流
                仅追加Append-Only流
                仅通过插入insert来修改的动态表, ...
                撤回Retract流
                    添加add消息
                    撤回retact消息
                    编码规则
                        Insert插入操作
                        编码为add消息 ...
                        Delete删除操作
                        编码为retract消息 ...
                        Update更新操作
                        编码为被更改行的retact消息,新行 ...
                更新插入Upsert流
                    更新插入upsert消息
                    删除delete消息
        时间属性/窗口
            事件时间
                创建表中定义
                create table TmpTab ...
                流转换为表中定义
                自定义提取时间戳和生成水位线 stre ...
            处理时间
                创建表中定义
                create table TmpTab ...
                流转换为表中定义
                定义流 stream = ...; 将 ...
            窗口
                分组窗口 Group Window
                老版本中使用 ...
                窗口表值函数 Windowing TVFs
                    表函数
                        窗口起始点 window_start
                        窗口结束点 window_end
                        窗口时间 window_time
                        window_time = windo ...
                    分类
                        滚动窗口 Tumbling Windows
                        tumble(table TmpTab ...
                        滑动窗口 Hop Wiondows 跳跃窗口
                        hop(table TmpTable, ...
                        累积窗口 Cumulate Windows
                        cumulate(table TmpT ...
                            最大窗口长度 max window size
                            累积步长 step
                        会话窗口 Session Windows
        聚合查询 Aggregation
            分组聚合 Group Aggregation
            select user, count( ...
                状态TTL
                配置项:table.exec.stat ...
            窗口聚合 Window Aggregation
            select user, window ...
            开窗聚合 Over Aggregation (Over聚合)
            针对某一行的前后的行数开窗来统计计算。 ...
                Over子句
                    partition by
                    order by
                    开窗范围
                    between ... precedi ...
                    范围间隔
                    range between inter ...
                    行间隔
                    rows between 5 prec ...
        联结查询 Join
            常规联结查询 Regular Join
                等值内联结 Inner Equl-Join
                等值外联结 Outer Equl-Join
            间隔联结查询 Interval Join
                两表联结
                联结条件
                时间间隔限制
            时间联结查询 Temporal Join
                版本表 Versioned Table
        函数 functions
            系统函数
                标量函数 Scalar Functions
                    比较函数 Comparison Functions
                    逻辑函数 Logical Functions
                    算术函数 Arithmetic Functions
                    字符串函数 String Functions
                    时间函数 Temporal Functions
                聚合函数 Aggregate Functions
                    Count()
                    Sum()
                    Rank()
                    Row_number()
            自定义函数 UDF
                分类
                    标量函数 Scalar Functions
                    表函数 Table Functions
                    聚合函数 Aggregate Functions
                    表聚合函数 Table Aggregate Functions
                调用流程
                    实现UDF抽象类
                    注册函数
                    Table API 调用函数
                    SQL 调用函数
        SQL客户端
            命令行交互界面CLI
            启动客户端
            ./bin/sql-client.sh ...
            设置运行模式
            set 'execution.runt ...
        连接到外部系统
            Kafka
            upsert kafka ...
            文件系统
            JDBC
            Elasticsearch
            HBase
            Hive
                Hive目录
                HiveCatalog ...
                Hive元数据
                Metastore ...
                设置方言
                default和hive , set ...
        解析引擎
            JavaCC (Java Compiler Compiler)
            JavaCC 是一个开源的语法分析器生 ...
            Apache Calcite
            Calcite使用Java CC对SQ ...
                解析过程
                    Parse阶段
                    SQL 解析阶段(SQL–>SqlNo ...
                    Validate阶段
                    SqlNode 验证(SqlNode– ...
                    TranslateToRel阶段
                    生成逻辑计划阶段。语义分析(SqlNo ...
                    Optimize阶段
                    优化阶段(Logical RelNod ...
                        SubQuery Decorrellation 子查询拆解
                        Fliter/Project Pushdown 过滤映射下推
                        Join Recorder 连接记录
                    Execute阶段
                    生成ExecutionPlan,生成物 ...
                    (+)Trasnformations阶段
                    Physical Plan使用code ...
                        Code Optimizations
                        State-of-art Opertors
                        Resource Optimizations
                    (+)Job Graph
                    ransformations之后生成J ...
            Changelog机制 Changelog Mechansim
            也可以叫做Retraction机制。 ...
                确定node该产生消息类型
                    消息类型
                    flink流处理的动态实时表有三种操作 ...
                确定update消息内容
                    消息内容
                    一个update消息有两个内容:upd ...
            优化操作
                内部数据结构优化
                原有row对象结构:1.占用过多空间; ...
                预聚合策略
                table.optimizer.agg ...
                支持微批
                开启支持微批 table.exec.m ...
                top-n策略优化
                使用计算top n来替代实时计算中的全 ...
        SQL语法