
Reference site

https://www.bucketplace.com/post/2021-04-13-%EB%B2%84%ED%82%B7%ED%94%8C%EB%A0%88%EC%9D%B4%EC%8A%A4-airflow-%EB%8F%84%EC%9E%85%EA%B8%B0/

 

Bucketplace Airflow Adoption Story - 오늘의집 blog

Adopting Airflow for an outstanding data platform

 

  • Secondary Namenode: The HDFS Secondary Namenode periodically fetches the primary Namenode's metadata and merges the edit logs into a checkpoint. If the primary Namenode fails, the most recent checkpoint can be used to recover it and restore its metadata. The Secondary Namenode therefore takes load off the primary Namenode and helps keep the HDFS cluster stable.
  • Standby Namenode: The HDFS Standby Namenode works together with the primary Namenode to provide High Availability (HA). It replicates the primary Namenode's metadata and stays up to date at all times, so when the primary Namenode fails, the Standby Namenode takes over its role right away. This HA setup minimizes downtime caused by failure or maintenance of the primary Namenode and helps guarantee the availability of the HDFS cluster.

In short, both the Secondary Namenode and the Standby Namenode improve the stability and availability of the primary Namenode. The Secondary Namenode, however, is not an HA setup, so recovery can take longer when the primary Namenode fails. The Standby Namenode is an HA setup, so it is more effective at minimizing downtime and keeping the cluster available.
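
To see which role each Namenode currently holds in an HA pair, the `hdfs haadmin -getServiceState` command can be wrapped in a small helper. This is only a sketch: the helper name is made up for illustration, and the Namenode IDs you pass in (for example `nn1` and `nn2`) are hypothetical and must match the IDs configured under `dfs.ha.namenodes.<nameservice>` in your hdfs-site.xml.

```shell
#!/bin/sh
# Print the HA state ("active" or "standby") of each Namenode ID given as an argument.
# Assumes the hdfs CLI is on PATH and Namenode HA is already configured.
print_nn_states() {
    for nn_id in "$@"; do
        # haadmin -getServiceState prints the state of a single Namenode
        state=$(hdfs haadmin -getServiceState "$nn_id" 2>/dev/null) || state="unreachable"
        printf '%s: %s\n' "$nn_id" "$state"
    done
}
```

Calling `print_nn_states nn1 nn2` prints one line per Namenode, which makes it easy to confirm that exactly one of them is active.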


Prometheus download

  1. wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz 
  2. tar xvzf prometheus-2.35.0.linux-amd64.tar.gz

Running Prometheus

nohup ./prometheus --config.file=prometheus.yml >> ./prometheus_run.log 2>&1 &

Prometheus starts with the config values written in the YAML file. By default it listens on:

address=0.0.0.0:9090
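
The post does not show the contents of prometheus.yml, so the following is a minimal sketch of a config that only makes Prometheus scrape its own metrics. The function name, the 15s interval, and the job name are illustrative assumptions, not the author's settings.

```shell
#!/bin/sh
# Write a minimal prometheus.yml that makes Prometheus scrape its own metrics.
# The 15s interval and the job name "prometheus" are illustrative choices.
write_prometheus_config() {
    config_path=${1:-prometheus.yml}
    cat > "$config_path" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
EOF
}
```

After `write_prometheus_config prometheus.yml`, the `promtool` binary shipped in the same tarball can validate the file with `./promtool check config prometheus.yml` before Prometheus is started.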

Grafana download

  1. wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.3-1.x86_64.rpm
  2. sudo yum install grafana-enterprise-8.5.3-1.x86_64.rpm

Running Grafana

  1. sudo systemctl daemon-reload
  2. sudo systemctl start grafana-server
  3. sudo systemctl status grafana-server
    ● grafana-server.service - Grafana instance
         Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; disabled; vendor preset: disabled)
         Active: active (running) since Wed 2022-05-25 17:15:47 KST; 5s ago
           Docs: http://docs.grafana.org
       Main PID: 15426 (grafana-server)
          Tasks: 9
         CGroup: /system.slice/grafana-server.service
                 └─15426 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-
  4. sudo systemctl enable grafana-server
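
Once both services are started, it can help to wait until they actually answer HTTP requests. The helper below is a sketch under a few assumptions: `curl` is installed, Grafana listens on its default port 3000 (unless changed in grafana.ini), and the function name is made up for illustration.

```shell
#!/bin/sh
# Poll an HTTP endpoint until it answers successfully or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
    url=$1
    attempts=${2:-10}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl return non-zero on HTTP errors, -s silences progress output
        if curl -fs -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For Grafana the health endpoint is `http://localhost:3000/api/health`, and for Prometheus it is `http://localhost:9090/-/healthy`; for example `wait_for_http http://localhost:3000/api/health 30 2`.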

 

 


Example server layout

                | Zookeeper      | HDFS                               | YARN            | MapReduce
----------------+----------------+------------------------------------+-----------------+-------------------
 Master         | quorumpeermain | journal node / Name Node (active)  | ResourceManager | Job HistoryServer
 Standby Master | quorumpeer     | journal node / Name Node (standby) | ResourceManager |
 Worker 1       | quorumpeer     | journal node / Data Node 1         | NodeManager     |
 Worker 2       |                | Data Node 2                        | NodeManager     |
 Worker 3       |                | Data Node 3                        | NodeManager     |

1. Start ZooKeeper

- On each ZooKeeper server: zkServer.sh start

- The quorumpeer / quorumpeermain processes come up

- Which server becomes leader or follower is decided by election, so it is effectively random

2. Start the journal nodes on the servers that run ZooKeeper (needed for Namenode HA)

- Port 8485 only opens once they are running. If they are not running, formatting the Namenode fails with an error

3. On the server that will run the active Namenode

- Namenode reset (format)

- Namenode start

4. On the server that will run the standby Namenode

- Copy the active Namenode's metadata

- hdfs namenode -bootstrapStandby

- Namenode start

5. Start the ZooKeeper failover controller (ZKFC) on the Namenode servers

6. Start the Datanodes on Worker 1-3

7. Start the RM and the NMs on their respective servers

- The ResourceManager can also be configured for HA

8. Start the MR job historyserver daemon on the Master server
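
The startup order above can be written down as a plan that is printed rather than executed. This is a sketch only: it assumes the Hadoop 3.x `--daemon` syntax (Hadoop 2.x uses `hadoop-daemon.sh` and `yarn-daemon.sh` instead), the role names are made up to mirror the layout table, and the function names are hypothetical.

```shell
#!/bin/sh
# Print (not run) the startup plan as "STEP ROLES COMMAND" lines.
# Steps 3 and 4 include first-time initialisation commands; "hdfs namenode -format"
# wipes existing Namenode metadata, so run it on a brand-new cluster only.
ha_startup_plan() {
    cat <<'EOF'
1 master,standby,worker1 zkServer.sh start
2 master,standby,worker1 hdfs --daemon start journalnode
3 master hdfs namenode -format
3 master hdfs --daemon start namenode
4 standby hdfs namenode -bootstrapStandby
4 standby hdfs --daemon start namenode
5 master,standby hdfs --daemon start zkfc
6 worker1,worker2,worker3 hdfs --daemon start datanode
7 master,standby yarn --daemon start resourcemanager
7 worker1,worker2,worker3 yarn --daemon start nodemanager
8 master mapred --daemon start historyserver
EOF
}

# Show only the plan lines that apply to one role, keeping the step numbers.
commands_for_role() {
    ha_startup_plan | awk -v role="$1" '{
        n = split($2, roles, ",")
        for (i = 1; i <= n; i++) {
            if (roles[i] == role) { print; break }
        }
    }'
}
```

`commands_for_role standby` lists what to run on the standby master and in which step. On a first-time setup with automatic failover, Hadoop also expects a one-time `hdfs zkfc -formatZK` before the ZKFC is started; that step is not part of the list above and is left out of the plan for that reason.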

 

 


HUE download

Pick the Hue release you want and download it:
https://github.com/cloudera/hue/tags

 


 

Dependency hell

After installing mvn and a database, the basic setup is done (including creating the hue database and user!)

python pip upgrade

curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o get-pip.py
python get-pip.py
pip install --upgrade pip

Install Python packages

pip install psycopg2
pip install psycopg2-binary

Install OS packages

sudo yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel openssl-devel

Upgrade SQLite (Hue's Django will not install if the SQLite version is too old)

https://kojipkgs.fedoraproject.org/packages/sqlite/

Download the matching sqlite and sqlite-devel RPMs from there:

wget https://kojipkgs.fedoraproject.org/packages/sqlite/3.12.2/1.fc24/x86_64/sqlite-3.12.2-1.fc24.x86_64.rpm
wget https://kojipkgs.fedoraproject.org/packages/sqlite/3.12.2/1.fc24/x86_64/sqlite-devel-3.12.2-1.fc24.x86_64.rpm

rpm -Uvh sqlite-3.12.2-1.fc24.x86_64.rpm sqlite-devel-3.12.2-1.fc24.x86_64.rpm
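
To confirm that the upgrade took effect before building Hue, the installed version can be compared against a minimum. This is a sketch: the function names are made up, the minimum version is supplied by the caller (the post does not state which version Hue's Django needs), and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed sqlite3 version with a required minimum passed as $1.
sqlite_is_at_least() {
    # "sqlite3 --version" prints the version number as its first field
    installed=$(sqlite3 --version | awk '{print $1}')
    version_ge "$installed" "$1"
}
```

For example, `sqlite_is_at_least 3.8.3 && echo ok` uses 3.8.3 purely as an illustrative threshold; check the release notes of the Django version your Hue build pulls in for the real requirement.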

HUE Build

Edit desktop/devtools.mk

DEVTOOLS += \
        ipython[7.10.0] \
        ipdb[0.13.9] \

Build Hue

cd ${HUE_SRC}
make apps

 

HUE Start

[sync the hue database] build/env/bin/hue migrate
[start the hue server] build/env/bin/hue runserver 0.0.0.0:8000
[hue login] user id / password: admin / admin
[create the HDFS home directory for the user] hdfs dfs -mkdir /user/admin
[change ownership of the HDFS user directory] hdfs dfs -chown -R admin:admin /user/admin

 

HUE Configs

vi ${HUE_SRC}/desktop/conf/pseudo-distributed.ini
  [[database]]
    engine=postgresql_psycopg2
    host=1.2.3.4
    name=hue
    port=5432
    user=hue
    password=hue


[hadoop]
  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://1.2.3.4:8020
      webhdfs_url=http://1.2.3.4:50070/webhdfs/v1
# ------------------------------------------------------------------------     
[beeswax]
  hive_server_host=1.2.3.4
  hive_server_port=10000
  hive_server_http_port=10001
  max_number_of_sessions=3
  thrift_version=11
  use_sasl=true
  # ------------------------------------------------------------------------
[hbase]
  hbase_clusters=(Cluster|1.2.3.4:9090)
  thrift_transport=buffered
  ssl_cert_ca_verify=false

This post is a translation and summary of a YouTube video.

                        | Hive                                    | Impala
------------------------+-----------------------------------------+------------------------------------------
 Architecture           | Originally ran on MapReduce, now        | Massively parallel processing (MPP)
                        | supports several engines:               | Uses a lot of RAM
                        | - MapReduce                             |
                        | - Tez                                   |
                        | - Spark                                 |
                        | (I heard Tez is now the default engine, |
                        | but this needs checking)                |
 Language               | Java                                    | C++
 Use cases              | ETL                                     | Low latency, interactive queries
                        | Batch processing of historical data     |
                        | Near-interactive queries via Tez and    |
                        | LLAP                                    |
 Strengths              | Fault tolerance                         | Better interactive queries (not fault
                        | Can join large tables with each other   | tolerant)
                        |                                         | Optimized for star-schema joins
 Supported file formats | Hadoop file formats                     | Supports many formats, but Parquet
                        | Various structured and semi-structured  | works best
                        | data formats                            |

 

 

After actually installing it

Impala cannot start unless the Hive Metastore is running.

The two appear to share the Metastore, but I need to look into this further.
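
Because Impala depends on a running Hive Metastore, a quick reachability check before starting Impala can save some debugging. The sketch below assumes `bash` and `timeout` are available and that the Metastore uses its usual Thrift port 9083; the function name is made up for illustration.

```shell
#!/bin/sh
# Return success if something accepts TCP connections on HOST:PORT.
# Uses bash's /dev/tcp pseudo-device, with a 2 second limit on the attempt.
# 9083 is the usual Hive Metastore Thrift port; adjust it if yours differs.
metastore_reachable() {
    host=${1:-localhost}
    port=${2:-9083}
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" >/dev/null 2>&1
}
```

A start script could then run `metastore_reachable metastore-host 9083 || echo "start the Hive Metastore first"`, where `metastore-host` is a placeholder. The Metastore itself is typically started with `hive --service metastore`.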

YouTube link - https://www.youtube.com/watch?v=vmiWOlcnFW8


Example Kudu cluster layout

kudu1.com kudu master
kudu2.com kudu master
kudu3.com kudu tserver
kudu4.com kudu tserver
kudu5.com kudu tserver

Master Server

kudu1 node

[user@kudu1 ~] ${KUDU_HOME}/sbin/kudu-master --rpc_bind_addresses=0.0.0.0:7051 \
                                             --log_dir=${KUDU_HOME}/logs/master \
                                             --fs_wal_dir=${KUDU_HOME}/logs/master \
                                             --fs_data_dirs=${KUDU_HOME}/logs/master \
                                             --webserver_port=8051 \
                                             --master_addresses=kudu1.com:7051,kudu2.com:7051 &

kudu2 node

[user@kudu2 ~] ${KUDU_HOME}/sbin/kudu-master --rpc_bind_addresses=0.0.0.0:7051 \
                                             --log_dir=${KUDU_HOME}/logs/master \
                                             --fs_wal_dir=${KUDU_HOME}/logs/master \
                                             --fs_data_dirs=${KUDU_HOME}/logs/master \
                                             --webserver_port=8051 \
                                             --master_addresses=kudu1.com:7051,kudu2.com:7051 &

 

Tablet Server

kudu3 node

[user@kudu3 ~] ${KUDU_HOME}/sbin/kudu-tserver --rpc_bind_addresses=0.0.0.0:7050 \
                                              --log_dir=${KUDU_HOME}/logs/tserver \
                                              --fs_wal_dir=${KUDU_HOME}/logs/tserver \
                                              --fs_data_dirs=${KUDU_HOME}/logs/tserver \
                                              --webserver_port=8050 \
                                              --tserver_master_addrs=kudu1.com:7051,kudu2.com:7051 &

kudu4 node

[user@kudu4 ~] ${KUDU_HOME}/sbin/kudu-tserver --rpc_bind_addresses=0.0.0.0:7050 \
                                              --log_dir=${KUDU_HOME}/logs/tserver \
                                              --fs_wal_dir=${KUDU_HOME}/logs/tserver \
                                              --fs_data_dirs=${KUDU_HOME}/logs/tserver \
                                              --webserver_port=8050 \
                                              --tserver_master_addrs=kudu1.com:7051,kudu2.com:7051 &

kudu5 node

[user@kudu5 ~] ${KUDU_HOME}/sbin/kudu-tserver --rpc_bind_addresses=0.0.0.0:7050 \
                                              --log_dir=${KUDU_HOME}/logs/tserver \
                                              --fs_wal_dir=${KUDU_HOME}/logs/tserver \
                                              --fs_data_dirs=${KUDU_HOME}/logs/tserver \
                                              --webserver_port=8050 \
                                              --tserver_master_addrs=kudu1.com:7051,kudu2.com:7051 &

 

Option descriptions

 Option                 | Description
------------------------+--------------------------------------------------------------
 --rpc_bind_addresses   | Address and port the RPC server binds to
 --log_dir              | Where log files are stored
 --fs_wal_dir           | Where the write-ahead log (WAL) is stored
 --fs_data_dirs         | Where data blocks are stored
 --webserver_port       | Port of the web UI
 --tserver_master_addrs | Comma-separated list of master addresses that the tablet
                        | server connects to. Masters do not read this flag.
 --master_addresses     | Comma-separated list of all RPC addresses that make up the
                        | master consensus configuration. If unset, the master runs
                        | standalone.
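
Both `--master_addresses` and `--tserver_master_addrs` take the same comma-separated `host:port` list, so it can be built once and reused on every node. The helper below is a sketch with a made-up function name. Note that Kudu's documentation recommends an odd number of masters (typically three); the two hosts here only mirror the example layout.

```shell
#!/bin/sh
# Join host names into the comma-separated "host:port" list that
# --master_addresses and --tserver_master_addrs expect.
# Usage: join_master_addrs PORT HOST [HOST...]
join_master_addrs() {
    port=$1
    shift
    result=""
    for host in "$@"; do
        # Append a comma only between entries
        result="${result:+$result,}$host:$port"
    done
    printf '%s\n' "$result"
}
```

With `MASTERS=$(join_master_addrs 7051 kudu1.com kudu2.com)`, the masters can be started with `--master_addresses="$MASTERS"` and the tablet servers with `--tserver_master_addrs="$MASTERS"`, which keeps the two lists from drifting apart.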

 


Running Airflow after installation

nohup airflow webserver --port 8080 > webserver.log 2>&1 &

Running the Airflow scheduler

nohup airflow scheduler > scheduler.log 2>&1 &

Checking the Airflow DAG list

airflow dags list

Output

dag_id                        | filepath                      | owner   | paused
==============================+===============================+=========+=======
example_bash_operator         | /usr/local/lib/python3.7/site | airflow | False
                              | -packages/airflow/example_dag |         |
                              | s/example_bash_operator.py    |         |
example_branch_datetime_opera | /usr/local/lib/python3.7/site | airflow | True
tor_2                         | -packages/airflow/example_dag |         |
                              | s/example_branch_datetime_ope |         |
                              | rator.py                      |         |
example_branch_dop_operator_v | /usr/local/lib/python3.7/site | airflow | True
3                             | -packages/airflow/example_dag |         |
                              | s/example_branch_python_dop_o |         |
                              | perator_3.py                  |         |
example_branch_labels         | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_branch_labels.py    |         |
example_branch_operator       | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_branch_operator.py  |         |
example_complex               | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_complex.py          |         |
example_dag_decorator         | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_dag_decorator.py    |         |
example_external_task_marker_ | /usr/local/lib/python3.7/site | airflow | True
child                         | -packages/airflow/example_dag |         |
                              | s/example_external_task_marke |         |
                              | r_dag.py                      |         |
example_external_task_marker_ | /usr/local/lib/python3.7/site | airflow | True
parent                        | -packages/airflow/example_dag |         |
                              | s/example_external_task_marke |         |
                              | r_dag.py                      |         |
example_nested_branch_dag     | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_nested_branch_dag.p |         |
                              | y                             |         |
example_passing_params_via_te | /usr/local/lib/python3.7/site | airflow | True
st_command                    | -packages/airflow/example_dag |         |
                              | s/example_passing_params_via_ |         |
                              | test_command.py               |         |
example_python_operator       | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_python_operator.py  |         |
example_short_circuit_operato | /usr/local/lib/python3.7/site | airflow | True
r                             | -packages/airflow/example_dag |         |
                              | s/example_short_circuit_opera |         |
                              | tor.py                        |         |
example_skip_dag              | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_skip_dag.py         |         |
example_sla_dag               | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_sla_dag.py          |         |
example_subdag_operator       | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_subdag_operator.py  |         |
example_subdag_operator.secti | /usr/local/lib/python3.7/site | airflow | True
on-1                          | -packages/airflow/example_dag |         |
                              | s/example_subdag_operator.py  |         |
example_subdag_operator.secti | /usr/local/lib/python3.7/site | airflow | True
on-2                          | -packages/airflow/example_dag |         |
                              | s/example_subdag_operator.py  |         |
example_task_group            | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_task_group.py       |         |
example_task_group_decorator  | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_task_group_decorato |         |
                              | r.py                          |         |
example_time_delta_sensor_asy | /usr/local/lib/python3.7/site | airflow | True
nc                            | -packages/airflow/example_dag |         |
                              | s/example_time_delta_sensor_a |         |
                              | sync.py                       |         |
example_trigger_controller_da | /usr/local/lib/python3.7/site | airflow | True
g                             | -packages/airflow/example_dag |         |
                              | s/example_trigger_controller_ |         |
                              | dag.py                        |         |
example_trigger_target_dag    | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_trigger_target_dag. |         |
                              | py                            |         |
example_weekday_branch_operat | /usr/local/lib/python3.7/site | airflow | True
or                            | -packages/airflow/example_dag |         |
                              | s/example_branch_day_of_week_ |         |
                              | operator.py                   |         |
example_xcom                  | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_xcom.py             |         |
example_xcom_args             | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_xcomargs.py         |         |
example_xcom_args_with_operat | /usr/local/lib/python3.7/site | airflow | True
ors                           | -packages/airflow/example_dag |         |
                              | s/example_xcomargs.py         |         |
latest_only                   | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_latest_only.py      |         |
latest_only_with_trigger      | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/example_latest_only_with_tr |         |
                              | igger.py                      |         |
tutorial                      | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/tutorial.py                 |         |
tutorial_etl_dag              | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/tutorial_etl_dag.py         |         |
tutorial_taskflow_api_etl     | /usr/local/lib/python3.7/site | airflow | True
                              | -packages/airflow/example_dag |         |
                              | s/tutorial_taskflow_api_etl.p |         |
                              | y                             |         |
tutorial_taskflow_api_etl_vir | /usr/local/lib/python3.7/site | airflow | True
tualenv                       | -packages/airflow/example_dag |         |
                              | s/tutorial_taskflow_api_etl_v |         |
                              | irtualenv.py

The output shows where the bundled example Airflow DAGs are located.
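
Every entry in that listing comes from Airflow's bundled examples, which are loaded because `load_examples` defaults to True. If you would rather see only your own DAGs, one option (a sketch, not something the original setup did) is to override that setting through the environment:

```shell
#!/bin/sh
# Airflow reads AIRFLOW__<SECTION>__<KEY> environment variables as config overrides.
# This one overrides [core] load_examples so the bundled example DAGs are not loaded.
export AIRFLOW__CORE__LOAD_EXAMPLES=False
```

The variable has to be set in the environment of both the webserver and the scheduler before they start. Example DAGs that were already registered in the metadata database may still show up until they are removed.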


                   List of relations
 Schema |             Name              | Type  | Owner
--------+-------------------------------+-------+-------
 public | BUCKETING_COLS                | table | hive
 public | CDS                           | table | hive
 public | COLUMNS_V2                    | table | hive
 public | CTLGS                         | table | hive
 public | DATABASE_PARAMS               | table | hive
 public | DBS                           | table | hive
 public | DB_PRIVS                      | table | hive
 public | DELEGATION_TOKENS             | table | hive
 public | FUNCS                         | table | hive
 public | FUNC_RU                       | table | hive
 public | GLOBAL_PRIVS                  | table | hive
 public | IDXS                          | table | hive
 public | INDEX_PARAMS                  | table | hive
 public | I_SCHEMA                      | table | hive
 public | KEY_CONSTRAINTS               | table | hive
 public | MASTER_KEYS                   | table | hive
 public | METASTORE_DB_PROPERTIES       | table | hive
 public | MV_CREATION_METADATA          | table | hive
 public | MV_TABLES_USED                | table | hive
 public | NOTIFICATION_LOG              | table | hive
 public | NOTIFICATION_SEQUENCE         | table | hive
 public | NUCLEUS_TABLES                | table | hive
 public | PARTITIONS                    | table | hive
 public | PARTITION_EVENTS              | table | hive
 public | PARTITION_KEYS                | table | hive
 public | PARTITION_KEY_VALS            | table | hive
 public | PARTITION_PARAMS              | table | hive
 public | PART_COL_PRIVS                | table | hive
 public | PART_COL_STATS                | table | hive
 public | PART_PRIVS                    | table | hive
 public | ROLES                         | table | hive
 public | ROLE_MAP                      | table | hive
 public | SCHEMA_VERSION                | table | hive
 public | SDS                           | table | hive
 public | SD_PARAMS                     | table | hive
 public | SEQUENCE_TABLE                | table | hive
 public | SERDES                        | table | hive
 public | SERDE_PARAMS                  | table | hive
 public | SKEWED_COL_NAMES              | table | hive
 public | SKEWED_COL_VALUE_LOC_MAP      | table | hive
 public | SKEWED_STRING_LIST            | table | hive
 public | SKEWED_STRING_LIST_VALUES     | table | hive
 public | SKEWED_VALUES                 | table | hive
 public | SORT_COLS                     | table | hive
 public | TABLE_PARAMS                  | table | hive
 public | TAB_COL_STATS                 | table | hive
 public | TBLS                          | table | hive
 public | TBL_COL_PRIVS                 | table | hive
 public | TBL_PRIVS                     | table | hive
 public | TXN_WRITE_NOTIFICATION_LOG    | table | hive
 public | TYPES                         | table | hive
 public | TYPE_FIELDS                   | table | hive
 public | VERSION                       | table | hive
 public | WM_MAPPING                    | table | hive
 public | WM_POOL                       | table | hive
 public | WM_POOL_TO_TRIGGER            | table | hive
 public | WM_RESOURCEPLAN               | table | hive
 public | WM_TRIGGER                    | table | hive
 public | aux_table                     | table | hive
 public | compaction_queue              | table | hive
 public | completed_compactions         | table | hive
 public | completed_txn_components      | table | hive
 public | hive_locks                    | table | hive
 public | materialization_rebuild_locks | table | hive
 public | min_history_level             | table | hive
 public | next_compaction_queue_id      | table | hive
 public | next_lock_id                  | table | hive
 public | next_txn_id                   | table | hive
 public | next_write_id                 | table | hive
 public | repl_txn_map                  | table | hive
 public | runtime_stats                 | table | hive
 public | txn_components                | table | hive
 public | txn_to_write_id               | table | hive
 public | txns                          | table | hive
 public | write_set                     | table | hive
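
The listing above looks like psql `\dt` output for a Hive Metastore schema stored in PostgreSQL. As a sketch of how those tables relate, the helper below prints a query that joins `DBS` and `TBLS` to list every Hive table with its database. The function name is made up for illustration.

```shell
#!/bin/sh
# Print a query that lists every Hive table with its database and table type.
# Identifiers are double-quoted because the metastore tables were created with
# upper-case names, which PostgreSQL only matches when quoted.
hive_tables_sql() {
    cat <<'EOF'
SELECT d."NAME" AS db_name, t."TBL_NAME", t."TBL_TYPE"
FROM "TBLS" t
JOIN "DBS" d ON d."DB_ID" = t."DB_ID"
ORDER BY d."NAME", t."TBL_NAME";
EOF
}
```

It could be run as `hive_tables_sql | psql -U hive -d metastore`, where the database name `metastore` is a placeholder for whatever your metastore database is called. Querying is safe, but the schema should only be modified through Hive itself.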


I first wrote this post when I had to build Kudu RPMs without knowing what Kudu was or ever having used it, and I recently came back to revise it.

I still have not used Kudu myself. Most Hadoop ecosystem components share a similar architecture, though, so it is not very hard to understand.

When I first looked into Kudu

Kudu's architecture resembles HBase in several ways.

So if you have used HBase, Kudu should not be hard to understand.

The core keyword for Kudu is key-value.

Search for Kudu and read around a bit, and you start to wonder: so is it basically something like HBase?

The keyword that keeps coming up is key-value, which is also the core keyword for HBase.

HBase before Kudu

Before looking at Kudu, a quick word on HBase. HDFS is append-only and immutable: a file stored in HDFS cannot be modified in place (to change it, you overwrite it).

HBase, unlike HDFS, supports updates and random-access reads and writes. Access is fast, and its low latency keeps delays in input and output to a minimum.

 

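How do HBase and Kudu differ?

So how is HBase different from Kudu?

As a rough comparison, HBase has lower throughput and Kudu has higher throughput.

Loosely speaking, working with HBase is like working with JSON files on HDFS, while working with Kudu is like working with Parquet files on HDFS. (This is only an analogy: neither system stores its data in those formats, and Kudu does not use HDFS at all.)

Kudu is something like the child of Parquet and HBase.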

 

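If you have worked with Parquet files, you will have noticed that the high compression ratio (smaller files and so on) translates into high throughput.

What is Parquet?

Like HBase, it is column-oriented storage (strictly speaking, Parquet is a columnar file format while HBase is column-family oriented). Kudu is said to be best thought of as an HBase-like Parquet, or a Parquet-like HBase.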

 

Kudu is a hybrid of HBase and Parquet

When storing data as it arrives in real time, it behaves like HBase;
when running analytic workloads on that data in real time, it behaves like Parquet.

A single query can easily combine long-term historical data with short-term real-time data.

HBase in-memory + Parquet columnar layout = Kudu

What works with Kudu?

  • Impala
    • Kudu has no query language of its own, so a query engine such as Impala is needed.
    • It is noticeably faster than Hive queries. When heavy queries and long-running workflows dominate, even the Tez engine can become a burden. Compared with Hive on the MR engine, it is much faster.
  • C++, Java, Python clients
  • MapReduce
  • Spark

Kudu architecture

Kudu Master

  • Fault tolerant
  • Fails over to backup masters
  • Raft is used to elect a new leader
  • Only the leader serves client requests

Tablets

  • Similar to an HBase region
  • Can be replicated 3 to 5 times
  • Replicas use Raft's leader/follower pattern to reach consensus
  • Data is stored on local disk, not in HDFS
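
Raft only makes progress while a majority of replicas agree, which is why replica counts of 3 or 5 are the usual choices. The arithmetic can be sketched as follows (the function names are made up for illustration):

```shell
#!/bin/sh
# Number of replicas that must agree for a Raft group of size $1 to make progress.
raft_majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of replicas that can fail while a Raft group of size $1 stays available.
raft_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}
```

With 3 replicas the majority is 2 and one failure is tolerated; with 5 replicas the majority is 3 and two failures are tolerated. An even count adds a replica without adding tolerance, since 4 replicas still survive only one failure.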

 

Running Kudu

https://n-a-y-a.tistory.com/108

[kudu] Running Kudu in cluster mode

 

Kudu, Impala, Hive Metastore integration

https://n-a-y-a.tistory.com/110

 

hive metastore - kudu - impala integration

hive-site.xml hive.metastore.transactional.event.listeners org.apache.hive.hcatalog.listener.DbNotificationListener, org.apache.kudu.hive.metastore.KuduMetastorePlugin hive.metastore.disallow.incompatible.col.type.changes false hive.metastore.notifications

Integration with the Hive Metastore does not seem to be strictly required, but if you already use Hive, integrating the two is the convenient choice.

 
