๋ฐ˜์‘ํ˜•

HUE Download

Pick the HUE release you want and download it:
https://github.com/cloudera/hue/tags

 


 

The endless pain of dependencies

This assumes Maven and the database are already installed and the basic setup is done (including creating the hue database and user!).

Upgrade pip (Python 2.7)

curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o get-pip.py
python get-pip.py
pip install --upgrade pip

Install Python packages

pip install psycopg2
pip install psycopg2-binary

Install OS packages

sudo yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel openssl-devel

Upgrade SQLite (HUE's Django needs a newer version; the install fails if SQLite is too old)

https://kojipkgs.fedoraproject.org/packages/sqlite/

 


Download the matching sqlite and sqlite-devel RPMs from there:

wget https://kojipkgs.fedoraproject.org/packages/sqlite/3.12.2/1.fc24/x86_64/sqlite-3.12.2-1.fc24.x86_64.rpm
wget https://kojipkgs.fedoraproject.org/packages/sqlite/3.12.2/1.fc24/x86_64/sqlite-devel-3.12.2-1.fc24.x86_64.rpm

rpm -Uvh sqlite-3.12.2-1.fc24.x86_64.rpm sqlite-devel-3.12.2-1.fc24.x86_64.rpm

HUE Build

Edit desktop/devtools.mk

DEVTOOLS += \
        ipython[7.10.0] \
        ipdb[0.13.9] \

Build HUE

cd ${HUE_SRC}
make apps

 

HUE Start

[sync the hue database] build/env/bin/hue migrate
[start the hue server] build/env/bin/hue runserver 0.0.0.0:8000
[hue login] user id / password: admin / admin
[create the hdfs user dir] hdfs dfs -mkdir /user/admin
[change hdfs user dir ownership] hdfs dfs -chown -R admin:admin /user/admin

 

HUE Configs

vi ${HUE_SRC}/desktop/conf/pseudo-distributed.ini
  [[database]]
    engine=postgresql_psycopg2
    host=1.2.3.4
    name=hue
    port=5432
    user=hue
    password=hue


[hadoop]
  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://1.2.3.4:8020
      webhdfs_url=http://1.2.3.4:50070/webhdfs/v1
# ------------------------------------------------------------------------     
[beeswax]
  hive_server_host=1.2.3.4
  hive_server_port=10000
  hive_server_http_port=10001
  max_number_of_sessions=3
  thrift_version=11
  use_sasl=true
  # ------------------------------------------------------------------------
[hbase]
  hbase_clusters=(Cluster|1.2.3.4:9090)
  thrift_transport=buffered
  ssl_cert_ca_verify=false

Prerequisites

  1. JAVA_HOME must be added to the root account
  2. Install Solr
  3. Install Maven 3.6.3
  4. Install PostgreSQL and create DB: ranger, User: rangeradmin (pw: rangeradmin) - see the snippet right after this list
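
For step 4, the Ranger database and account can be created along these lines (a sketch; adjust names and passwords to your environment):

psql -U postgres
CREATE USER rangeradmin WITH PASSWORD 'rangeradmin';
CREATE DATABASE ranger OWNER rangeradmin;
\l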


These steps must be run as root or as an account with the right privileges.
Solr must be installed first!
https://n-a-y-a.tistory.com/m/68

[Apache Solr] Installing Apache Solr 8.5.0 - Solr is a prerequisite for Ranger and Atlas.


Required packages

$ sudo yum install git gcc python3 python3-devel
$ sudo yum install -y npm nodejs
$ npm install node-ranger
$ pip3 install requests

Install Ranger and build it with Maven

$ sudo wget https://downloads.apache.org/ranger/2.1.0/apache-ranger-2.1.0.tar.gz
$ sudo tar xvzf apache-ranger-2.1.0.tar.gz
$ cd apache-ranger-2.1.0/
$ mvn -Pall -DskipTests=true clean compile package install

The Maven build takes about an hour and a half; I did not hit any build errors.
If it does fail, check your Maven settings.

Ranger - install the Admin module

$ cd ${RANGER_SRC}/target/
$ ls -al
-rw-r--r--.  1 root  root  248560962 Jul  1 17:50 ranger-2.1.0-admin.tar.gz
$ sudo tar xvzf ranger-2.1.0-admin.tar.gz 
$ cd ranger-2.1.0-admin

Ranger - configure the Admin module

$ vi ranger-2.1.0-admin/install.properties
###
DB_FLAVOR=POSTGRES
SQL_CONNECTOR_JAR=/usr/share/java/postgresql.jar
db_root_user=postgres
db_root_password=postgres
db_host=localhost:5432/ranger
db_name=ranger
db_user=rangeradmin
db_password=rangeradmin
audit_solr_urls=http://localhost:6083/solr/ranger_audits
hadoop_conf=/opt/hadoop-3.1.1/etc/hadoop
###


The PostgreSQL JDBC driver must be present in the /usr/share/java/ directory!
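
One way to put it there on CentOS (an assumption on my part; the package name and jar filename can vary by distro and version):

$ sudo yum install -y postgresql-jdbc
$ sudo ln -s /usr/share/java/postgresql-jdbc.jar /usr/share/java/postgresql.jar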

Solr ์‹คํ–‰๋˜์–ด ์žˆ๋Š” ์ƒํƒœ์—์„œ solr-ranger set up

ranger-2.1.0-admin/contrib/solr_for_audit_setup/setup.sh

Start Solr

# /opt/solr/ranger_audit_server/scripts/start_solr.sh

Run the ranger-admin setup scripts

# ranger-2.1.0-admin/set_globals.sh
# ranger-2.1.0-admin/setup.sh

Success log

2021-07-02 14:09:47,267  [I] --------- Verifying Ranger DB connection ---------
2021-07-02 14:09:47,267  [I] Checking connection..
2021-07-02 14:09:47,267  [JISQL] /usr/lib/jvm/java-1.8.0-openjdk/bin/java  -cp /usr/share/java/postgresql-42.2.8.jar:/opt/ranger-2.1.0-admin/jisql/lib/* org.apache.util.sql.Jisql -driver postgresql -cstring jdbc:postgresql://localhost:5432/ranger              # for db_flavor=mysql|postgres|sqla|mssql       #for example: db_host=localhost:3306/ranger -u rangeradmin -p '********' -noheader -trim -c \;  -query "select 1;"
2021-07-02 14:09:47,508  [I] Checking connection passed.
Installation of Ranger PolicyManager Web Application is completed.

Start ranger-admin

$ sudo ranger-admin start


You can reach the Ranger UI on port 6080.
ID/PW: admin


์•„ํŒŒ์น˜ ์—์ด๋ธŒ๋กœ๋ž€ ? 

https://dennyglee.com/2013/03/12/using-avro-with-hdinsight-on-azure-at-343-industries/

- ํŠน์ • ์–ธ์–ด์— ์ข…์†๋˜์ง€ ์•Š๋Š” ์–ธ์–ด ์ค‘๋ฆฝ์  ๋ฐ์ดํ„ฐ ์ง๋ ฌํ™” ์‹œ์Šคํ…œ

- ํ•˜๋‘ก Writable์˜ ์ฃผ์š” ๋‹จ์ ์ธ ์–ธ์–ด ์ด์‹์„ฑ ํ•ด๊ฒฐ ์œ„ํ•ด ์ƒ๊ฒจ๋‚จ

 

์•„ํŒŒ์น˜ ์“ฐ๋ฆฌํ”„ํŠธ, ๊ตฌ๊ธ€ ํ”„๋กœํ† ์ฝœ ๋ฒ„ํผ์™€ ๋‹ค๋ฅธ ์ฐจ๋ณ„ํ™”๋œ ํŠน์„ฑ๊ฐ€์ง€๊ณ  ์žˆ์Œ

 

๋ฐ์ดํ„ฐ๋Š” ๋‹ค๋ฅธ ์‹œ์Šคํ…œ๊ณผ ๋น„์Šทํ•˜๊ฒŒ ์–ธ์–ด ๋…๋ฆฝ ์Šคํ‚ค๋งˆ๋กœ ๊ธฐ์ˆ ๋จ

์—์ด๋ธŒ๋กœ์—์„œ ์ฝ”๋“œ ์ƒ์„ฑ์€ ์„ ํƒ์‚ฌํ•ญ์ž„

๋ฐ์ดํ„ฐ๋ฅผ ์ฝ๊ณ  ์“ฐ๋Š” ์‹œ์ ์— ์Šคํ‚ค๋งˆ๋Š” ํ•ญ์ƒ ์กด์žฌํ•œ๋‹ค ๊ฐ€์ •ํ•จ - ๋งค์šฐ ๊ฐ„๊ฒฐํ•œ ์ฝ”๋”ฉ์ด ๊ฐ€๋Šฅ

 

์Šคํ‚ค๋งˆ์˜ ์ž‘์„ฑ

JSON

๋ฐ์ดํ„ฐ๋Š” ๋ฐ”์ด๋„ˆ๋ฆฌ ํฌ๋งท์œผ๋กœ ์ธ์ฝ”๋”ฉ

 

์—์ด๋ธŒ๋กœ ๋ช…์„ธ - ๋ชจ๋“  ๊ตฌํ˜„์ฒด๊ฐ€ ์ง€์›ํ•ด์•ผ ํ•˜๋Š” ๋ฒ„์ด๋„ˆ๋ฆฌ ํฌ๋งท์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ

API - ์—์ด๋ธŒ๋กœ ๋ช…์„ธ์—์„œ ๋น ์ ธ์žˆ๋Š” ๋‚ด์šฉ์ž„. ๊ฐ ํŠน์ •์–ธ์–ด์— ๋”ฐ๋ผ ๋‹ค๋ฅด๊ฒŒ ์ž‘์„ฑ๋จ. ์–ธ์–ด์˜ ๋ฐ”์ธ๋”ฉ ํŽธ์˜์„ฑ ๋†’์ด๊ณ  ์ƒํ˜ธ์šด์˜์„ฑ ์ €ํ•˜ ๋ฌธ์ œ ํ•ด๊ฒฐ๋จ

 

์Šคํ‚ค๋งˆํ•ด์„ - ์‹ ์ค‘ํ•˜๊ฒŒ ์ •์˜๋œ ์–ด๋– ํ•œ ์ œ์•ฝ์กฐ๊ฑด์—์„œ๋„ ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ์Šคํ‚ค๋งˆ์™€ ๋ฐ์ดํ„ฐ๋ฅผ ์“ฐ๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ์Šคํ‚ค๋งˆ๊ฐ€ ๊ฐ™์ง€ ์•Š์•„๋„ ๋œ๋‹ค.(์Šคํ‚ค๋งˆ ๋ณ€ํ˜• ๋ฉ”์ปค๋‹ˆ์ฆ˜)

 

ex ) ๊ณผ๊ฑฐ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ์„ ๋•Œ ์‚ฌ์šฉํ•œ ์Šคํ‚ค๋งˆ์— ์ƒˆ๋กœ์šด ํ•„๋“œ๋ฅผ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ƒˆ๋กœ์šด ์‚ฌ์šฉ์ž์™€ ๊ธฐ์กด ์‚ฌ์šฉ์ž๋Š” ๋ชจ๋‘ ๊ณผ๊ฑฐ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฌธ์ œ์—†์ด ์ฝ์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ์ƒˆ๋กœ์šด ์‚ฌ์šฉ์ž๋Š” ์ƒˆ๋กœ์šด ํ•„๋“œ๊ฐ€ ์ถ”๊ฐ€๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์“ธ์ˆ˜ ์žˆ๋‹ค. ๊ธฐ์กด ์‚ฌ์šฉ์ž๋Š” ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๊ฒŒ๋˜๋Š”๋ฐ ์ƒˆ๋กœ์šด ํ•„๋“œ๋Š” ๋ฌด์‹œํ•˜๊ณ  ๊ธฐ์กด ๋ฐ์ดํ„ฐ ์ž‘์—…์ฒ˜๋Ÿผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๊ฐ์ฒด ์ปจํ…Œ์ด๋„ˆ ํฌ๋งท ์ œ๊ณต(ํ•˜๋‘ก ์‹œํ€€์Šค ํŒŒ์ผ๊ณผ ์œ ์‚ฌํ•จ)

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์€ ์Šคํ‚ค๋งˆ๊ฐ€ ์ €์žฅ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์„น์…˜์„ ํฌํ•จํ•˜๊ณ  ์žˆ์–ด ์ž์‹ ์„ ์„ค๋ช…ํ•˜๋Š” ํŒŒ์ผ์ž„

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์€ ์••์ถ•๊ณผ ๋ถ„ํ•  ๊ธฐ๋Šฅ ์ œ๊ณต

 

์—์ด๋ธŒ๋กœ ์ž๋ฃŒํ˜•๊ณผ ์Šคํ‚ค๋งˆ

์—์ด๋ธŒ๋กœ์˜ ๊ธฐ๋ณธ ์ž๋ฃŒํ˜• ํ‘œ์‹œ

{"type":"null"}
{"type":"boolean"}
{"type":"int"}
{"type":"long"}
{"type":"float"}
{"type":"double"}
{"type":"bytes"}
{"type":"string"}

Avro complex types

array: an ordered collection of objects, all of the same type

{
 "type": "array",
 "items": "long"
}

map: unordered key-value pairs, with all values of the same type

{
 "type": "map",
 "values": "string"
}

record ์ž„์˜์˜ ์ž๋ฃŒํ˜•

{
 "type": "record",
 "name": "WeatherRecord",
 "doc": "A weather reading.",
 "fields": [
  {"name": "year", "type": "int"},
  {"name": "temperature", "type": "int"},
  {"name": "stationId", "type": "string"}
 ]
}

enum ๋ช…๋ช…๋œ ๊ฐ’์˜ ์ง‘ํ•ฉ

{
 "type": "enum",
 "name": "Cutlery",
 "doc": "An eating utensil.",
 "symbols": ["KNIFE", "FORK", "SPOON"]
}

fixed ๊ณ ์ •๊ธธ์ด 8๋น„ํŠธ ๋ถ€ํ˜ธ ์—†๋Š” ๋ฐ”์ดํŠธ

{
 "type": "fixed",
 "name": "Md5Hash",
 "size": 16
}

union ์Šคํ‚ค๋งˆ์˜ ์œ ๋‹ˆ์˜จ, ๋ฐฐ์—ด์˜ ๊ฐ์š”์†Œ๋Š” ์Šคํ‚ค๋งˆ์ž„

[
 "null",
 "string",
 {"type": "map", "values": "string"}
]

 

 

์—์ด๋ธŒ๋กœ ์ž๋ฃŒํ˜•๊ณผ ๋‹ค๋ฅธ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด ์ž๋ฃŒํ˜• ๋งคํ•‘ ํ•„์š”

 

์ž๋ฐ” - ์ œ๋„ค๋ฆญ ๋งคํ•‘ : ์Šคํ‚ค๋งˆ๋ฅผ ๊ฒฐ์ •ํ•  ์ˆ˜ ์—†์„ ๋•Œ

์ž๋ฐ”, C++ - ๊ตฌ์ฒด์  ๋งคํ•‘ : ์Šคํ‚ค๋งˆ ๋ฐ์ดํ„ฐ ํ‘œํ˜„ ์ฝ”๋“œ ์ƒ์„ฑ

์ž๋ฐ” - ๋ฆฌํ”Œ๋ ‰ํŠธ ๋งคํ•‘ : ์—์ด๋ธŒ๋กœ ์ž๋ฃŒํ˜•์„ ๊ธฐ์กด ์ž๋ฐ” ์ž๋ฃŒํ˜•์œผ๋กœ ๋งคํ•‘ 

 

 

์ธ๋ฉ”๋ชจ๋ฆฌ ์ง๋ ฌํ™”์™€ ์—ญ์ง๋ ฌํ™”

์—์ด๋ธŒ๋กœ ์Šคํ‚ค๋งˆ ์˜ˆ์‹œ - ํ•ด๋‹น ์—์ด๋ธŒ๋กœ ์Šคํ‚ค๋งˆ๋Š” StringPair.avsc ์— ์ €์žฅ๋จ 

{
 "type": "record",
 "name": "StringPair",
 "doc": "A pair of strings.",
 "fields": [
  {"name": "left", "type": "string"},
  {"name": "right", "type": "string"}
 ]
}

ํŒŒ์ผ์„ ํด๋ž˜์Šค ๊ฒฝ๋กœ์— ์ €์žฅํ•œ ํ›„ ๋กœ๋”ฉํ•จ

Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(
getClass().getResourceAsStream("StringPair.avsc"));

Create an instance of an Avro record using the generic API:

GenericRecord datum = new GenericData.Record(schema);
datum.put("left", "L");
datum.put("right", "R");

Serialize the record to an output stream:

ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer =
    new GenericDatumWriter<GenericRecord>(schema);
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(datum, encoder);
encoder.flush();
out.close();

DatumWriter: translates data objects into the types an encoder can understand

GenericDatumWriter: passes the fields of the GenericRecord through to the encoder

null is passed to the encoder factory because no previously constructed encoder is being reused.

write() can be called again on the stream before it is closed, if needed.

encoder.flush(): after calling the write method, flush the encoder, then close the output stream.

Reversing the process reads the object back from the byte buffer:

 DatumReader<GenericRecord> reader =
 new GenericDatumReader<GenericRecord>(schema);
 Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(),
 null);
 GenericRecord result = reader.read(null, decoder);
 assertThat(result.get("left").toString(), is("L"));
 assertThat(result.get("right").toString(), is("R"));

 

 

๊ตฌ์ฒด์ ์ธ API

๋ฉ”์ด๋ธ์— ์ถ”๊ฐ€ํ•˜์—ฌ ์ž๋ฐ”๋กœ๋œ ์Šคํ‚ค๋งˆ ์ฝ”๋“œ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

<project>
 ...
 <build>
 <plugins>
 <plugin>
 <groupId>org.apache.avro</groupId>
 <artifactId>avro-maven-plugin</artifactId>
 <version>${avro.version}</version>
 <executions>
 <execution>
 <id>schemas</id>
 <phase>generate-sources</phase>
 <goals>
 <goal>schema</goal>
 </goals>
 <configuration>
 <includes>
 <include>StringPair.avsc</include>
 </includes>
 <stringType>String</stringType>
 <sourceDirectory>src/main/resources</sourceDirectory>
 <outputDirectory>${project.build.directory}/generated-sources/java
 </outputDirectory>
 </configuration>
 </execution>
 </executions>
 </plugin>
 </plugins>
 </build>
 ...
</project>
 

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ

์ธ๋ฉ”๋ชจ๋ฆฌ ์ŠคํŠธ๋ฆผ์—์„œ ํŒŒ์ผ ์ฝ๊ธฐ

๋ฐ์ดํ„ฐ ํŒŒ์ผ = ์—์ด๋ธŒ๋กœ ์Šคํ‚ค๋งˆ +  ํ—ค๋” (๋ฉ”ํƒ€๋ฐ์ดํ„ฐ (์‹ฑํฌ๋งˆ์ปค ํฌํ•จ)) +์ผ๋ จ์˜ ๋ธ”๋ก (์ง๋ ฌํ™”๋œ ์—์ด๋ธŒ๋กœ ๊ฐ์ฒด๊ฐ€ ์žˆ๋Š”)

๋ฐ์ดํ„ฐ ํŒŒ์ผ์— ๊ธฐ๋ก๋œ ๊ฐ์ฒด๋Š” ๋ฐ˜๋“œ์‹œ ํŒŒ์ผ์˜ ์Šคํ‚ค๋งˆ์™€ ์ผ์น˜ํ•ด์•ผํ•œ๋‹ค.

์ผ์น˜ํ•˜์ง€์•Š๋Š” ๊ฒฝ์šฐ์— append์‹œ ์˜ˆ์™ธ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค.

File file = new File("data.avro");
DatumWriter<GenericRecord> writer =
 new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter =
new DataFileWriter<GenericRecord>(writer);
dataFileWriter.create(schema, file);
dataFileWriter.append(datum);
dataFileWriter.close();
 

 

ํŒŒ์ผ์— ํฌํ•จ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ฐธ์กฐํ•˜์—ฌ ์ฝ๊ธฐ ๋•Œ๋ฌธ์— ์Šคํ‚ค๋งˆ๋ฅผ ๋”ฐ๋กœ ์ •์˜ํ•˜์ง€์•Š์•„๋„๋จ

getSchema()๋ฅผ ์ด์šฉํ•˜๋ฉด DataFileReader ์ธ์Šคํ„ด์Šค์˜ ์Šคํ‚ค๋งˆ ์ •๋ณด ์–ป์„ ์ˆ˜ ์žˆ๊ณ  ์›๋ณธ ๊ฐ์ฒด์— ์‚ฌ์šฉํ•œ ์Šคํ‚ค๋งˆ์™€ ๊ฐ™์€์ง€ ํ™•์ธ๊ฐ€๋Šฅ 

DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> dataFileReader =
new DataFileReader<GenericRecord>(file, reader);
assertThat("Schema is the same", schema, is(dataFileReader.getSchema()));

DataFileReader is a regular Java iterator, so all of the data objects can be iterated over by repeatedly calling

its hasNext and next methods. Here we check that there is exactly one record and that it has the expected field values:

assertThat(dataFileReader.hasNext(), is(true));
GenericRecord result = dataFileReader.next();
assertThat(result.get("left").toString(), is("L"));
assertThat(result.get("right").toString(), is("R"));
assertThat(dataFileReader.hasNext(), is(false));

์ƒํ˜ธ ์šด์˜์„ฑ

ํŒŒ์ด์ฌ API

import os
import string
import sys
from avro import schema
from avro import io
from avro import datafile
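
The imports above come from the legacy Python avro package. A rough sketch of how they fit together to write and then read back a pairs.avro file (the schema literal and values are illustrative, and exact function names can differ slightly between avro versions):

PAIR_SCHEMA = schema.parse("""\
{"type": "record", "name": "StringPair", "doc": "A pair of strings.",
 "fields": [{"name": "left", "type": "string"},
            {"name": "right", "type": "string"}]}""")

# write a couple of records; DataFileWriter embeds the schema in the file header
writer = datafile.DataFileWriter(open('pairs.avro', 'wb'), io.DatumWriter(), PAIR_SCHEMA)
writer.append({'left': 'a', 'right': '1'})
writer.append({'left': 'b', 'right': '2'})
writer.close()

# read them back; the reader picks the schema up from the file itself
reader = datafile.DataFileReader(open('pairs.avro', 'rb'), io.DatumReader())
for pair in reader:
    print(pair)
reader.close()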

 

์—์ด๋ธŒ๋กœ ๋„๊ตฌ

% java -jar $AVRO_HOME/avro-tools-*.jar tojson pairs.avro

 

Schema resolution

Data can be read back using a reader schema that differs from the writer schema it was recorded with (example below).

Added field - the reader has the newer schema: the reader uses the default value of the new field

Added field - the writer has the newer schema: the reader does not know about the new field, so it is ignored

Removed field - the reader has the newer schema: the removed field is ignored

Removed field - the writer has the newer schema: the removed field is no longer written; update the reader schema to match the writer schema, or to an earlier version

Sort order

Every type except record has a predefined sort order.

For records, the order attribute can be specified on a field to control sorting (see the sketch below).

ascending (the default)

descending: "descending"

ignore: "ignore"
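
The SortedStringPair.avsc used in the sort example further down presumably looks roughly like this (a sketch; the point is the order attributes, which sort by the right field in descending order and ignore left when comparing):

{
 "type": "record",
 "name": "StringPair",
 "doc": "A pair of strings, sorted by the right field descending.",
 "fields": [
  {"name": "left", "type": "string", "order": "ignore"},
  {"name": "right", "type": "string", "order": "descending"}
 ]
}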

 

Avro MapReduce

Weather record schema

{
 "type": "record",
 "name": "WeatherRecord",
 "doc": "A weather reading.",
 "fields": [
  {"name": "year", "type": "int"},
  {"name": "temperature", "type": "int"},
  {"name": "stationId", "type": "string"}
 ]
}

์ตœ๊ณ  ๊ธฐ์˜จ์„ ์ฐพ๋Š” ๋งต๋ฆฌ๋“€์Šค ํ”„๋กœ๊ทธ๋žจ, ์—์ด๋ธŒ๋กœ ์ถœ๋ ฅ ๋งŒ๋“ฆ

public class AvroGenericMaxTemperature extends Configured implements Tool {
 
 private static final Schema SCHEMA = new Schema.Parser().parse(
 "{" +
 " \"type\": \"record\"," +
 " \"name\": \"WeatherRecord\"," +
 " \"doc\": \"A weather reading.\"," +
 " \"fields\": [" +
 " {\"name\": \"year\", \"type\": \"int\"}," +
 " {\"name\": \"temperature\", \"type\": \"int\"}," +
 " {\"name\": \"stationId\", \"type\": \"string\"}" +
 " ]" +
 "}"
 );
 public static class MaxTemperatureMapper
 extends Mapper<LongWritable, Text, AvroKey<Integer>,
 AvroValue<GenericRecord>> {
 private NcdcRecordParser parser = new NcdcRecordParser();
 private GenericRecord record = new GenericData.Record(SCHEMA);
 @Override
 protected void map(LongWritable key, Text value, Context context)
 throws IOException, InterruptedException {
 parser.parse(value.toString());
 if (parser.isValidTemperature()) {
 record.put("year", parser.getYearInt());
 record.put("temperature", parser.getAirTemperature());
 record.put("stationId", parser.getStationId());
 context.write(new AvroKey<Integer>(parser.getYearInt()),
 new AvroValue<GenericRecord>(record));
 }
 }
 }
 
 public static class MaxTemperatureReducer
 extends Reducer<AvroKey<Integer>, AvroValue<GenericRecord>,
 AvroKey<GenericRecord>, NullWritable> {
 @Override
 protected void reduce(AvroKey<Integer> key, Iterable<AvroValue<GenericRecord>>
 values, Context context) throws IOException, InterruptedException {
 GenericRecord max = null;
 for (AvroValue<GenericRecord> value : values) {
 GenericRecord record = value.datum();
 if (max == null ||
     (Integer) record.get("temperature") > (Integer) max.get("temperature")) {
 max = newWeatherRecord(record);
 }
 }
 context.write(new AvroKey(max), NullWritable.get());
 }
 private GenericRecord newWeatherRecord(GenericRecord value) {
 GenericRecord record = new GenericData.Record(SCHEMA);
 record.put("year", value.get("year"));
 record.put("temperature", value.get("temperature"));
 record.put("stationId", value.get("stationId"));
 return record;
 }
 }
 @Override
 public int run(String[] args) throws Exception {
 if (args.length != 2) {
 System.err.printf("Usage: %s [generic options] <input> <output>\n",
 getClass().getSimpleName());
 ToolRunner.printGenericCommandUsage(System.err);
 return -1;
 }
 Job job = new Job(getConf(), "Max temperature");
 job.setJarByClass(getClass());
 job.getConfiguration().setBoolean(
 Job.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
 FileInputFormat.addInputPath(job, new Path(args[0]));
 FileOutputFormat.setOutputPath(job, new Path(args[1]));
 AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.INT));
 AvroJob.setMapOutputValueSchema(job, SCHEMA);
 AvroJob.setOutputKeySchema(job, SCHEMA);
 job.setInputFormatClass(TextInputFormat.class);
 job.setOutputFormatClass(AvroKeyOutputFormat.class);
 job.setMapperClass(MaxTemperatureMapper.class);
 job.setReducerClass(MaxTemperatureReducer.class);
 return job.waitForCompletion(true) ? 0 : 1;
 }
 
 public static void main(String[] args) throws Exception {
 int exitCode = ToolRunner.run(new AvroGenericMaxTemperature(), args);
 System.exit(exitCode);
 }
}

Running the program

export HADOOP_CLASSPATH=avro-examples.jar
export HADOOP_USER_CLASSPATH_FIRST=true # override version of Avro in Hadoop
% hadoop jar avro-examples.jar AvroGenericMaxTemperature \
 input/ncdc/sample.txt output

Printing the output

% java -jar $AVRO_HOME/avro-tools-*.jar tojson output/part-r-00000.avro
{"year":1949,"temperature":111,"stationId":"012650-99999"}
{"year":1950,"temperature":22,"stationId":"011990-99999"}

 

์—์ด๋ธŒ๋กœ ๋งต๋ฆฌ๋“€์Šค ์ด์šฉํ•ด ์ •๋ ฌ

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์„ ์ •๋ ฌํ•˜๋Š” ๋งต๋ฆฌ๋“€์Šค ํ”„๋กœ๊ทธ๋žจ

public class AvroSort extends Configured implements Tool {
 static class SortMapper<K> extends Mapper<AvroKey<K>, NullWritable,
 AvroKey<K>, AvroValue<K>> {
 @Override
 protected void map(AvroKey<K> key, NullWritable value,
 Context context) throws IOException, InterruptedException {
 context.write(key, new AvroValue<K>(key.datum()));
 }
 }
 static class SortReducer<K> extends Reducer<AvroKey<K>, AvroValue<K>,
 AvroKey<K>, NullWritable> {
 @Override
 protected void reduce(AvroKey<K> key, Iterable<AvroValue<K>> values,
 Context context) throws IOException, InterruptedException {
 for (AvroValue<K> value : values) {
 context.write(new AvroKey(value.datum()), NullWritable.get());
 }
 }
 }
 @Override
 public int run(String[] args) throws Exception {
 
 if (args.length != 3) {
 System.err.printf(
 "Usage: %s [generic options] <input> <output> <schema-file>\n",
 getClass().getSimpleName());
 ToolRunner.printGenericCommandUsage(System.err);
 return -1;
 }
 
 String input = args[0];
 String output = args[1];
 String schemaFile = args[2];
 Job job = new Job(getConf(), "Avro sort");
 job.setJarByClass(getClass());
 job.getConfiguration().setBoolean(Job.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
 FileInputFormat.addInputPath(job, new Path(input));
 FileOutputFormat.setOutputPath(job, new Path(output));
 AvroJob.setDataModelClass(job, GenericData.class);
 Schema schema = new Schema.Parser().parse(new File(schemaFile));
 AvroJob.setInputKeySchema(job, schema);
 AvroJob.setMapOutputKeySchema(job, schema);
 AvroJob.setMapOutputValueSchema(job, schema);
 AvroJob.setOutputKeySchema(job, schema);
 job.setInputFormatClass(AvroKeyInputFormat.class);
 job.setOutputFormatClass(AvroKeyOutputFormat.class);
 job.setOutputKeyClass(AvroKey.class);
 job.setOutputValueClass(NullWritable.class);
 job.setMapperClass(SortMapper.class);
 job.setReducerClass(SortReducer.class);
 return job.waitForCompletion(true) ? 0 : 1;
 }
 
 public static void main(String[] args) throws Exception {
 int exitCode = ToolRunner.run(new AvroSort(), args);
 System.exit(exitCode);
 }
}

์ •๋ ฌ์€ ๋งต๋ฆฌ๋“€์Šค ์…”ํ”Œ ๊ณผ์ •์—์„œ ์ผ์–ด๋‚˜๋ฉฐ ์ •๋ ฌ๊ธฐ๋Šฅ์€ ์—์ด๋ธŒ๋กœ์˜ ์Šคํ‚ค๋งˆ์— ์˜ํ•ด ์ •ํ•ด์ง

์ž…๋ ฅ๋ฐ์ดํ„ฐ ์ ๊ฒ€

% java -jar $AVRO_HOME/avro-tools-*.jar tojson input/avro/pairs.avro
{"left":"a","right":"1"}
{"left":"c","right":"2"}
{"left":"b","right":"3"}
{"left":"b","right":"2"}

Sorting with the program

% hadoop jar avro-examples.jar AvroSort input/avro/pairs.avro output \
 ch12-avro/src/main/resources/SortedStringPair.avsc

Printing the sorted output file

% java -jar $AVRO_HOME/avro-tools-*.jar tojson output/part-r-00000.avro
{"left":"b","right":"3"}
{"left":"b","right":"2"}
{"left":"c","right":"2"}
{"left":"a","right":"1"}

For this HUE install I will assume the Hadoop ecosystem components covered earlier are already installed to some degree.

HUE needs some preparation before it can be installed.

PostgreSQL configuration is covered in a separate post;

this HUE install guide only covers creating the database HUE will use.

 

Preparation

HUE uses Python, so the Python version needs to be set through an environment variable.

I added the variable to .bash_profile.

Add the Python environment variable

$ sudo vi ~/.bash_profile

export PYTHON_VER=python3.8

 

Install psycopg2 (pip must already be installed)

$ pip install psycopg2

$ python setup.py build

$ sudo python setup.py install

$ pip install psycopg2-binary


Install nodejs (on CentOS 7)

 
$ sudo yum install epel-release

$ sudo yum install nodejs


hue ์—์„œ ์‚ฌ์šฉํ•˜๋Š” package ์„ค์น˜

$ sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel


Install Maven

$ wget https://downloads.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz -P /tmp

$ sudo tar xf /tmp/apache-maven-3.6.3-bin.tar.gz -C /opt

$ sudo ln -s /opt/apache-maven-3.6.3 /opt/maven

$ sudo vi ~/.bash_profile

#MAVEN

export MAVEN_HOME=/opt/maven

export M2_HOME=$MAVEN_HOME

PATH=$PATH:$M2_HOME/bin

$ source ~/.bash_profile

$ vi /opt/maven/conf/settings.xml


mirror ์‚ฌ์ดํŠธ ์ถ”๊ฐ€ํ•˜๊ธฐ

maven build๊ฐ€ ํ•„์š”ํ•œ ์•„ํŒŒ์น˜ ์˜คํ”ˆ์†Œ์Šค๋“ค์ด ์žˆ๋Š”๋ฐ, centos์˜ ๊ฒฝ์šฐ yum install maven์‹œ 3.0.5๊ฐ€ ์„ค์น˜๋œ๋‹ค.
3.0.5๋ฒ„์ „์œผ๋กœ ๋นŒ๋“œ ์‹œ fail์ด ๋นˆ๋ฒˆํ•˜๊ธฐ๋„ํ•˜๊ณ , ๊ณต์‹์‚ฌ์ดํŠธ์—์„œ๋„ 3.3์ด์ƒ ๋ฒ„์ „ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•˜๊ธฐ ๋•Œ๋ฌธ์—
์•„ํŒŒ์น˜ ๋ฏธ๋Ÿฌ ์‚ฌ์ดํŠธ์—์„œ ์ตœ์‹ ๋ฒ„์ „ maven์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•œ๋‹ค.

postgres์— hue db, user ์ถ”๊ฐ€ํ•˜๊ธฐ

psql -U postgres

CREATE USER hue WITH PASSWORD 'hue';

CREATE DATABASE hue OWNER hue;

\l

ํœด ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์™€ ์˜ค๋„ˆ ํ™•์ธํ•˜๊ธฐ

---

Solr ์„ค์น˜

https://n-a-y-a.tistory.com/m/68

 


---

 

Install HUE

 
$ wget https://cdn.gethue.com/downloads/hue-4.0.1.tgz

$ tar -xvzf hue-4.0.1.tgz

$ ln -s hue-4.0.1 hue

$ cd hue

$ export PREFIX=/usr/local

$ make

$ make install

 


Run HUE

 
$ ./build/env/bin/supervisor &

$ netstat -nltp | grep 8888

If the port shows up, the service is running.


***HDFS***
***HIVE***
***HBASE***
and the other services are not connected yet,
so the matching config values for each of them have to be found and updated.


์ฐธ๊ณ ์‚ฌ์ดํŠธ
docs.gethue.com/administrator/installation/

 


 


์ฃผํ‚คํผ๋ž€ ๋ถ„์‚ฐ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์œ„ํ•œ ๋ถ„์‚ฐ ์ฝ”๋””๋„ค์ด์…˜์ด๋‹ค.

 

znode(์ €๋„๋…ธ๋“œ)๊ฐ€ ๊ฐ๊ฐ์˜ ์„œ๋ฒ„์— ์œ„์น˜ํ•ด ์žˆ๋‹ค.

๊ฐ ํ•˜๋‘ก์˜ ์„œ๋น„์Šค๋“ค์ด ์ž˜ ๋™์ž‘ํ•˜๊ณ  ์žˆ๋Š”์ง€ ํ™•์ธํ•œ๋‹ค.

์ฃผ๊ธฐ์ ์œผ๋กœ ํ•˜ํŠธ๋น„ํŠธ ์š”๊ตฌํ•˜์—ฌ ๋ฐ›๋Š” ๋ฐฉ์‹์œผ๋กœ,

 

๋”ฐ๋ผ์„œ ์ฃผ๊ธฐํผ๋Š” ํ™€์ˆ˜๋กœ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š”๋ฐ

์—ฌ๊ธฐ์„œ ๋“ค์–ด๊ฐ€๋Š” ๊ฐœ๋…์ด ์ฟผ๋Ÿผ์ด๋‹ค.

 

์ฟผ๋Ÿผ์ด๋ž€? 

๋‹ค์ˆ˜๊ฒฐ๋กœ ์˜ˆ๋ฅผ ๋“ค์–ด 5๊ฐœ์˜ ์„œ๋ฒ„๋กœ ๊ตฌ์„ฑ ๋˜์–ด์žˆ๊ณ ,

2๊ฐœ์˜ ์„œ๋ฒ„๊ฐ€ ์ฃฝ๋Š”๋‹ค๊ณ  ๊ฐ€์ •ํ–ˆ์„ ๋•Œ ์ •์ƒ์ ์œผ๋กœ ๋™์ž‘ํ•œ๋‹ค๊ณ  ํŒ๋‹จํ•œ๋‹ค.

๊ทธ๋ฆฌ๊ณ  5๊ฐœ ์ค‘ 3๊ฐœ์˜ ์„œ๋ฒ„๊ฐ€ ์ฃฝ์—ˆ์„ ๊ฒฝ์šฐ, ๋‹ค์ˆ˜๊ฒฐ๋กœ ์ธํ•ด ๋น„์ •์ƒ์ด๋ผ๊ณ  ํŒ๋‹คํ•œ๋‹ค.

๊ทธ๋กœ ์ธํ•ด, ์ฃผํ‚คํผ๋Š” ํ™€์ˆ˜๋กœ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๊ตฌ์„ฑํ•œ๋‹ค.

 

zookeeper ํด๋Ÿฌ์Šคํ„ฐ๋Š”

ํ•˜๋‚˜์˜ ์„œ๋ฒ„๊ฐ€ ๋ฆฌ๋”์ด๊ณ , ๋‹ค๋ฅธ ์„œ๋ฒ„๋Š” ํŒ”๋กœ์›Œ์ด๋‹ค

๋ฆฌ๋” ์„œ๋ฒ„๋ฅผ ๊ธฐ์ค€์œผ๋กœ sync๋ฅผ ๋งž์ถ˜๋‹ค.

 

์ž์„ธํ•œ ๋‚ด์šฉ์€ ๊ณต์‹ ์‚ฌ์ดํŠธ ์ฐธ์กฐ๋ฐ”๋žŒ

 

์ฃผํ‚คํผ ์„ค์น˜ ๋ฐฉ๋ฒ•

์ฃผํ‚คํผ ํŒŒ์ผ ๋‹ค์šด๋กœ๋“œ ํ›„ ์••์ถ• ํ•ด์ œ ํ›„ ํ…Œ์ŠคํŠธ

wget https://mirror.navercorp.com/apache/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9.tar.gz
tar xvzf apache-zookeeper-3.5.9.tar.gz
cd apache-zookeeper-3.5.9/bin
./zkCli.sh -server 127.0.0.1:2181

 

์ฃผํ‚คํผ conf์—์„œ zoo.cfg ํŒŒ์ผ ์ƒ์„ฑ

$ vi $ZOOKEEPER_HOME/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/userDIr/zookeeper
clientPort=2181
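
To build the odd-numbered quorum described above, the ensemble members also have to be listed. A rough sketch for three servers (the zk1/zk2/zk3 hostnames are placeholders, and each host needs a myid file under dataDir holding its own number):

server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
# on server 1: echo 1 > /data/userDIr/zookeeper/myid   (use 2 and 3 on the others)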

 

์ฃผํ‚คํผ ์‹คํ–‰

bin/zkServer.sh start

 

Check with jps that ZooKeeper is running

jps
-------------------
QuorumPeerMain

You can also check ZooKeeper with netstat -nltp | grep 2181

 

 

์ฃผํ‚คํผ์˜ ํฌํŠธ๋ฒˆํ˜ธ๋Š” zoo.cfg ํŒŒ์ผ์—์„œ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ๋‹ค.


livy-env.sh

export SPARK_HOME=/usr/lib/spark

export HADOOP_CONF_DIR=/etc/hadoop/conf

 

livy start

./bin/livy-server start

 

livy ์ •์ƒ๋™์ž‘ํ•˜๋Š”์ง€ spark์—์„œ ํ…Œ์ŠคํŠธํ•˜๋Š” ์˜ˆ์ œ

sudo pip install requests

 

import json, pprint, requests, textwrap

host = 'http://localhost:8998'

data = {'kind': 'spark'}

headers = {'Content-Type': 'application/json'}

r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)

r.json()
# {u'state': u'starting', u'id': 0, u'kind': u'spark'}
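
Once that session reaches the idle state, a statement can be submitted to it in the same way. A rough sketch based on the Livy REST API (the session id 0 comes from the response above):

# run a bit of code in session 0 and poll for the result
statements_url = host + '/sessions/0/statements'
r = requests.post(statements_url, data=json.dumps({'code': '1 + 1'}), headers=headers)
r.json()

# poll the statement until its state is 'available', then read the 'output' field
requests.get(statements_url + '/0', headers=headers).json()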


https://dlcdn.apache.org/hive/hive-3.1.2/

 


Download the binary package of the Hive version you want from an Apache mirror site.

 

Prerequisite - the Hadoop path must already be set

export HADOOP_HOME=<hadoop-install-dir>

 

Download and extract

wget https://dlcdn.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar xvzf apache-hive-3.1.2-bin.tar.gz

 

Set environment variables

The HIVE_HOME environment variable has to be set.

Rather than editing .bash_profile, add a shell script under /etc/profile.d/:

vi /etc/profile.d/hive_home.sh

export HIVE_HOME=/opt/apache-hive-3.1.2-bin

export PATH=$PATH:$HIVE_HOME/bin

 

ํ•ด๋‹น ํŒŒ์ผ ์ €์žฅ ํ›„ ํ•œ๋ฒˆ ์‹คํ–‰ํ•ด์ค€๋‹ค.

chmod +x hive_home.sh
./hive_home.sh
source hive_home.sh

echo $HIVE_HOME

Check with that command that the variable was applied correctly.

 

 

Hadoop์— tmp, hive warehouse ๋””๋ ‰ํ„ฐ๋ฆฌ ์ƒ์„ฑ

hadoop fs -mkdir       /tmp
hadoop fs -mkdir       /user/hive/warehouse
hadoop fs -chmod g+w   /tmp
hadoop fs -chmod g+w   /user/hive/warehouse

 

Run the Hive CLI and initialize the metastore schema

 $HIVE_HOME/bin/hive
 $HIVE_HOME/bin/schematool -dbType <db type> -initSchema

dbtype ์€ ๊ธฐ์กด์— ์„ค์น˜๋˜์–ด์žˆ๋Š” DB๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ํ•ด๋‹น ๋ถ€๋ถ„์— mysql, oracle, postgres ๋ฅผ ์ž…๋ ฅํ•˜๋ฉด ๋˜๊ณ ,

ํ…Œ์ŠคํŠธ๋ฅผ ์œ„ํ•˜๋ฉด derby๋ผ๋Š” hive ๋‚ด์žฅ DB๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.

 

Configure Hive

cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml

Copy the template to hive-site.xml.

 

Edit hive-site.xml when using PostgreSQL as the metastore DB:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://mypostgresql.testabcd1111.us-west-2.rds.amazonaws.com:5432/mypgdb</value>
    <description>PostgreSQL JDBC driver connection URL</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>PostgreSQL metastore driver class name</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>database_username</value>
    <description>the username for the DB instance</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>database_password</value>
    <description>the password for the DB instance</description>
  </property>
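
The metastore also needs the PostgreSQL JDBC driver on Hive's classpath. A common way to do that (an assumption; the jar name and version will differ in your environment) is to drop the driver jar into $HIVE_HOME/lib:

cp postgresql-42.2.8.jar $HIVE_HOME/lib/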

RDBMS

Strongly dependent on the schema,

so there are cases where work simply cannot proceed:

 - when the schema is not properly defined, or

 - when a query does not match the schema

Unsuitable for processing very large volumes of data; it takes too much time.

 

HIVE

A data warehousing infrastructure

No schema validation when storing or processing data

Queries that do not match the schema return null

 

  • Uses HiveQL, which is similar to SQL
  • Provides a query interface instead of requiring MapReduce programs to be written
  • When a query runs, it is converted into a MapReduce job that produces the result (see the sketch below)
  • Not well suited to analyzing unstructured input sources
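
For instance, a simple aggregation like the following is compiled into a MapReduce job behind the scenes (the weather table here is hypothetical):

SELECT year, MAX(temperature) AS max_temp
FROM weather   -- hypothetical table
GROUP BY year;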

HIVE Architecture

- HIVE Client

  - supports JDBC applications

  - supports Thrift-based applications (Thrift is a protocol that lets processes communicate with each other)

  - supports ODBC-based applications

- HIVE Server

  - the HIVE server itself

  - CLI (command line interface): an interface where users can run HIVE queries

  - Hive Web Interface: a web interface where users can run HIVE queries

  - Driver: created when a job arrives at HIVE and drives its execution; consults the metastore

  - Metastore: the store that holds the metadata (table definitions, etc.)

  - Apache Derby Database: can hold the actual table contents, i.e. the data (HDFS is normally used for this)

HIVE ๋ฐ์ดํ„ฐ ๋ชจ๋ธ

Hadoop ์ƒ์— ๊ตฌ์ถ•๋œ ์ •ํ˜•ํ™”๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ด€๋ฆฌํ•˜๊ณ  ์ฟผ๋ฆฌํ•˜๋Š” ์‹œ์Šคํ…œ

์Šคํ† ๋ฆฌ์ง€๋กœ HDFS์‚ฌ์šฉํ•จ

OLTP์—๋Š” ์ ํ•ฉํ•˜์ง€ ์•Š์Œ

๋ฐ์ดํ„ฐ ๊ด€๋ฆฌ ๋ฐฉ์‹

ํ…Œ์ด๋ธ”์— ํ•ด๋‹น ๋˜๋Š” ์š”์†Œ๋ฅผ HDFS ๋””๋ ‰ํ† ๋ฆฌ๋กœ ๋งตํ•‘

ํŒŒํ‹ฐ์…˜ ํ…Œ์ด๋ธ” ์š”์†Œ๋ฅผ ํŒŒํ‹ฐ์…”๋‹ ํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ HDFS ์„œ๋ธŒ ๋””๋ ‰ํ„ฐ๋ฆฌ

๋ฐ์ดํ„ฐ๋Š” HDFS์˜ ํŒŒ์ผ๋กœ ์ƒ๊ฐํ•˜์ž.

 

HIVE metastore

Stores the metadata of tables

Anything that is not the actual data, such as column data types, is treated as metadata

A Hive table is made up of a schema stored in the metastore, with the data itself stored in HDFS

 

Hive query interfaces

- Hive client (legacy)

  start it by typing hive

  How Hive processes a query: 1. the server interprets the user's HiveQL command and turns it into a MapReduce job

  2. it looks up the table structure and data location in the metastore

  3. the query is then run against the actual data

- Beeline client (current)

 

Hive and HCatalog

- acts as the bridge through which the rest of the ecosystem can access the metastore

- manages the metastore

 

HIVE ํ…Œ์ด๋ธ” ๊ด€๋ฆฌ

HIVE ํ…Œ์ด๋ธ”

1. ๋ฐ์ดํ„ฐ๋ฅผ HIVE ํ…Œ์ด๋ธ”๋กœ ๊ฐ€์ ธ์˜ค๋ฉด?

HiveQL, ํ”ผ๊ทธ, ์ŠคํŒŒํฌ ๋“ฑ์„ ํ™œ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌ > ์ƒํ˜ธ์šด์˜ ๋ณด์žฅ

2. HIVE๊ฐ€ ์ง€์›ํ•˜๋Š” ํ…Œ์ด๋ธ” ์ข…๋ฅ˜

    - ๋‚ด๋ถ€ ํ…Œ์ด๋ธ” : HIVE๊ฐ€ ๊ด€๋ฆฌ, HIVE/ ๋ฐ์ดํ„ฐ์›จ์–ดํ•˜์šฐ์Šค์— ์ €์žฅ, ๋‚ด๋ถ€ํ…Œ์ด๋ธ” ์‚ญ์ œ ์‹œ ๋ฉ”ํƒ€์ •์˜์™€ ๋ฐ์ดํ„ฐ๊นŒ์ง€ ์‚ญ์ œ๋จ,

   ORC๊ฐ™์€ ํ˜•์‹์œผ๋กœ ์ €์žฅ๋˜์–ด ๋น„๊ต์  ๋น ๋ฅธ ์„ฑ๋Šฅ

    - ์™ธ๋ถ€ ํ…Œ์ด๋ธ” : ํ•˜์ด๋ธŒ๊ฐ€ ์ง์ ‘ ๊ด€๋ฆฌํ•˜์ง€ ์•Š์Œ,

   ํ•˜์ด๋ธŒ์˜ ๋ฉ”ํƒ€์ •์˜๋งŒ ์‚ฌ์šฉํ•˜์—ฌ ์›์‹œ ํ˜•ํƒœ๋กœ ์ €์žฅ๋œ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ์ ‘๊ทผ

   ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ญ์ œํ•ด๋„ ํ…Œ์ด๋ธ” ๋ฉ”ํƒ€ ์ •์˜๋งŒ ์‚ญ์ œ๋˜๊ณ  ๋ฐ์ดํ„ฐ๋Š” ์œ ์ง€๋จ.

   ํ•ด๋‹น ๋ฐ์ดํ„ฐ๊ฐ€ ํ•˜์ด๋ธŒ ์™ธ๋ถ€์— ์ ์žฌ ๋˜์–ด์žˆ๊ฑฐ๋‚˜ ํ…Œ์ด๋ธ”์ด ์‚ญ์ œ๋˜๋”๋ผ๋„ ์›๋ณธ ๋ฐ์ดํ„ฐ๊ฐ€ ๋‚จ์•„ ์žˆ์–ด์•ผํ•  ๋•Œ ์‚ฌ์šฉ

3.csv ํŒŒ์ผ์„ ํ•˜์ด๋ธŒ ํ…Œ์ด๋ธ”๋กœ ๊ฐ€์ ธ์˜ค๊ธฐ

  1.names.csv ์„ HDFS์— ๋ณต์‚ฌ

  2. hdfs dfsmkdir names

  3. hdfs dfs –put names.csv names

  4. hive ์‹คํ–‰ ํ›„ ์ฟผ๋ฆฌ๋กœ ํ…Œ์ด๋ธ” ์ƒ์„ฑ  location ‘/directory’ ๊ตฌ๋ฌธ์€ ํ…Œ์ด๋ธ”์ด ์‚ฌ์šฉํ•  ์ž…๋ ฅ ํŒŒ์ผ์˜ ๊ฒฝ๋กœ์ด๋‹ค.

  5. select * from ~ ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ

  6. stored as orc > ๋‚ด๋ถ€ ํ…Œ์ด๋ธ”

  7. ๋ฐ์ดํ„ฐ ํ˜•์‹ ํ…์ŠคํŠธ ํŒŒ์ผ, ์‹œํ€€์Šค ํŒŒ์ผ(k-v์Œ), RC ํŒŒ์ผ, ORC ํ˜•์‹, Parquet ํ˜•์‹

 

์™ธ๋ถ€ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

suhdfs

hdfs dfsmkdir /Smartcar

hdfs dfs –put /txtfile.txt /Smartcar

hdfs dfschown –R hive /Smartcar

hdfs dfschmod –R 777 /Smartcar

su – hive

hive

create external table (~) ~ location /Smartcar;

๋‚ด๋ถ€ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

create table (~) ~ location /Smartcar;

์™ธ๋ถ€ ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ ๋‚ด๋ถ€ ํ…Œ์ด๋ธ”๋กœ ๋ณต์‚ฌ

insert overwrite table SmartCar_in

select * from SmartCar_ex;

๋‚ด๋ถ€ ํ…Œ์ด๋ธ” ๋””๋ ‰ํ„ฐ๋ฆฌ ์ƒ์„ฑํ™•์ธ

hdfs dfs –ls /Smartcar

/Smartcar/base_0000001/bucket_00000/bucket_00000


ํ•˜๋‘ก์— ๋“ค์–ด๊ฐ€๊ธฐ ์•ž์„œ ๋น…๋ฐ์ดํ„ฐ๊ฐ€ ๋ญ”์ง€ ์•Œ์•„๋ณด๋ ค ํ•œ๋‹ค.

 

๋น…๋ฐ์ดํ„ฐ๋ž€ ? 

๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๊ด€๋ฆฌ ๋„๊ตฌ ๋ฐฉ๋ฒ•์œผ๋กœ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์—†๋Š” ๊ทœ๋ชจ๋กœ ๋ณต์žกํ•œ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.

 

๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ ๋ฒ ์ด์Šค๋Š” OLTP์„ฑ์œผ๋กœ ๋น ๋ฅด๊ณ  ์ •ํ™•ํ•˜๋‹ค.

 

๋น…๋ฐ์ดํ„ฐ๋Š” ์ •ํ™•์„ฑ์— ์ดˆ์ ์„ ๋‘๊ธฐ๋ณด๋‹ค๋Š” ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ์ฒ˜๋ฆฌํ•˜๋Š”๋ฐ์— ์ดˆ์ ์„ ๋‘๊ณ  ์žˆ๋‹ค.

 

๋”ฐ๋ผ์„œ pk, update๋“ฑ ์•ˆ๋˜๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ์ƒˆ๋กœ putํ•ด์•ผ ํ•œ๋‹ค.

 

๋น…๋ฐ์ดํ„ฐ 3V

  • Volume 
    • ๋Œ€๊ทœ๋ชจ์˜ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. (๊ธฐ์—…๋งˆ๋‹ค ์ฐจ์ด๋Š” ์žˆ์ง€๋งŒ ์ˆ˜์‹ญํ…Œ๋ผ๋ฐ”์ดํŠธ๋ถ€ํ„ฐ ์ˆ˜์‹ญํŽ˜ํƒ€๋ฐ์ดํ„ฐ ์ด์ƒ)
  • Variety
    • ์กด์žฌํ•˜๋Š” ๋ฐ์ดํ„ฐ์˜ ๋ฐฉ์‹์ด ๋‹ค์–‘ํ•˜๋‹ค.
    • ์ •ํ˜•   : ์˜๋ฏธ ํŒŒ์•…ํ•˜๊ธฐ ์‰ฌ์šฐ๋ฉฐ ๊ทœ์น™์ ์ธ ๋ฐ์ดํ„ฐ
    • ๋ฐ˜์ •ํ˜•: HTML, XML,JSON ํ˜•ํƒœ๋กœ ํ•œ ํ…์ŠคํŠธ์— column, value ๊ฐ™์ด
    • ๋น„์ •ํ˜•:ํ…์ŠคํŠธ, ์Œ์„ฑ, ์˜์ƒ ๋“ฑ์œผ๋กœ ๊ทœ์น™์ ์ด์ง€ ์•Š์€ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์˜ˆ์‹œ๋กœ ๋ฉ”์‹ ์ €๋กœ ์ฃผ๊ณ  ๋ฐ›์€ ๋‚ด์šฉ, ํ†ตํ™”๋‚ด์šฉ ๋“ฑ์ด ์žˆ๋‹ค.
  • Velocity
    • ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ์†๋„๊ฐ€ ๋น ๋ฅด๊ณ  ํšจ์œจ์ ์ด์—ฌ์•ผ ํ•œ๋‹ค

ํ•˜๋‘ก ์ด๋ž€?

๋Œ€๋Ÿ‰์˜ ์ž๋ฃŒ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ํฐ ์ปดํ“จํ„ฐ ํด๋Ÿฌ์Šคํ„ฐ์—์„œ ๋™์ž‘ํ•˜๋Š” ๋ถ„์‚ฐ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์„ ์ง€์›ํ•œ๋‹ค.

์™œ ํ•˜๋‘ก?

๋ผ์ด์„ ์Šค ๋น„์šฉ ๋“ค์ง€ ์•Š์Œ > ์ €๋ ดํ•œ ๊ตฌ์ถ• ๋น„์šฉ

๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ ๋น ๋ฅธ ์ฒ˜๋ฆฌ

๋ฐ์ดํ„ฐ์˜ ๋ณต์ œ ๋ณธ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ฐ์ดํ„ฐ ๋ณต๊ตฌ ๊ฐ€๋Šฅํ•˜๋‹ค.

 

HDFS + MapReduce structure

 

HDFS (Hadoop Distributed File System)

Hadoop uses two kinds of servers (for distributed processing: to keep a single server from being overloaded, and to store the data safely as copies).

There are several ways to make servers redundant; here I only think of it in simple Master-Slave terms.

The Master server runs the "NameNode".

The NameNode holds the metadata about the data.

Metadata - the FSImage (namespace information, block-to-DataNode mapping) and

         the Edits log (a log of metadata changes)

DataNodes - with 3-way replication, one copy sits on the master and two on slave servers; these hold the actual files.

Writing to HDFS

- The application asks the HDFS client to write data.

The client asks the NameNode, which hands back the addresses of three DataNodes.

The data is written to the first of those DataNode addresses and stored across the three as replicas.

Reading from HDFS

- The application asks the HDFS client to read data.

The client requests the metadata from the NameNode,

which provides DataNode addresses on the master and slave servers,

and the data is served from the closest of those addresses.

 

MapReduce

Put simply, it is a system for processing very large files,

made up of Map and Reduce.

The role of Map

The raw data is split into 64 MB chunks,

and each split is temporarily turned into key/value pairs.

Temporarily > on disk

The split data is then divided into partition regions

in a memory buffer by the partition function.

The role of Reduce

The intermediate partition files are merged by Reduce, sorted by key, and the reduce function writes the final result to disk.

