基于 Hadoop HDFS、Apache Spark 和 Apache Hive 的研究型 A 股数据流水线。 项目从真实数据采集开始,完成 RAW、ODS、DWD、DWS、ADS ...
ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly.