๋ฐ˜์‘ํ˜•

๋น…๋ฐ์ดํ„ฐ ๋ถ„์•ผ ์ค‘ ๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด๋ง์— ๊ด€ํ•œ ์ฑ…๋„ ์ถœ๊ฐ„์ด ๋งŽ์ด ๋Š˜์—ˆ์Šต๋‹ˆ๋‹ค.

The book I'm reviewing is ๊ฒฌ๊ณ ํ•œ ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง from Hanbit Media.

https://www.yes24.com/Product/Goods/119712582

 


์›์„œ ์ œ๋ชฉ์€ ์˜ค๋ผ์ผ๋ฆฌ์‚ฌ์˜ The Fundamental of Data Engineering ์ž…๋‹ˆ๋‹ค.

 

 

ํ•ด๋‹น ์ฑ…์€ ๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด์—๊ฒŒ ์ถ”์ฒœ์ด ๋งŽ์€ ์ฑ…์ž…๋‹ˆ๋‹ค. ๊ผญ ๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด๊ฐ€ ์•„๋‹ˆ๋”๋ผ๋„ ํ•ด๋‹น ์ง๋ฌด์— ๊ด€์‹ฌ์ด์žˆ๊ฑฐ๋‚˜ ๋น…๋ฐ์ดํ„ฐ๋ฅผ ๋„์ž…ํ•˜๊ฑฐ๋‚˜ ๊ด€์‹ฌ์ด ์žˆ๋Š” ๋ถ„์—๊ฒŒ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค. ์ €๋„ ์ถœ๊ฐ„๋˜๋Š”๊ฒƒ์„ ๊ธฐ๋‹ค๋ ธ๋Š”๋ฐ, ์ถœ๊ฐ„๋˜๊ณ  ์šด์ด์ข‹๊ฒŒ ์ด๋ฒคํŠธ๋กœ ์ฑ…์„ ๋ฐ›์•„ ๋ณผ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

 

A quick look at the table of contents:

[PART I Building the Foundations of Data Engineering]
CHAPTER 1 Data Engineering Described
CHAPTER 2 The Data Engineering Lifecycle
CHAPTER 3 Designing Good Data Architecture
CHAPTER 4 Choosing Technologies Across the Data Engineering Lifecycle

[PART II The Data Engineering Lifecycle in Depth]
CHAPTER 5 Stage 1: Data Generation in Source Systems
CHAPTER 6 Stage 2: Storage
CHAPTER 7 Stage 3: Ingestion
CHAPTER 8 Stage 4: Queries, Modeling, and Transformation
CHAPTER 9 Stage 5: Serving Data for Analytics, Machine Learning, and Reverse ETL


[PART III Security, Privacy, and the Future of Data Engineering]
CHAPTER 10 Security and Privacy
CHAPTER 11 The Future of Data Engineering

์ด์ฑ…์€ ์•ฝ 534ํŽ˜์ด์ง€์ •๋„๋กœ ๋งŽ์€ ๋‚ด์šฉ์„ ๋‹ค๋ฃจ๋Š” ์–‘์ด๊ณ , ๊ด€์‹ฌ์ด ์žˆ์œผ์‹  ๋ถ„๋“ค์€ ๊ผญ! ์„œ์ ์—์„œ ๋ณด๊ฑฐ๋‚˜ ๊ตฌ๋งคํ•ด์„œ ๋ณด๊ธฐ๋ฅผ ์ถ”์ฒœํ•ฉ

Of its contents, I want to review Chapter 4, "Choosing Technologies Across the Data Engineering Lifecycle."

 

Gathering my thoughts before diving in

๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด๋งํŒ€ ๊ฐ ํšŒ์‚ฌ์—์„œ ๋ถ€๋ฅด๋Š” ๋ช…์นญ์ด ๋‹ค์–‘ํ•  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค

๋ฐ์ดํ„ฐํ”Œ๋žซํผํŒ€, ๋ฐ์ดํ„ฐ๊ฐœ๋ฐœํŒ€, ๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด๋งํŒ€ ๋“ฑ๋“ฑ ๋ช…์นญ๋„ ๋‹ค์–‘ํ•˜๊ณ  ๊ทธ ํŒ€์—์„œ ์—…๋ฌด๋˜ํ•œ ํšŒ์‚ฌ๋งˆ๋‹ค ๋ฒ”์œ„๊ฐ€ ๋‹ค๋ฅผ๊ฒƒ๊ฐ™์Šต๋‹ˆ๋‹ค.

ํ•˜๋‘ก์—์ฝ”์‹œ์Šคํ…œ ์šด์˜, ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ๋ฐ ์ ์žฌ, ํƒ€ํŒ€์œผ๋กœ ๋ฐ์ดํ„ฐ ๋”œ๋ฆฌ๋ฒ„๋ฆฌ ๋„“๊ฒŒ๋Š” ์‹œ๊ฐํ™” ๋“ฑ๋“ฑ์˜ ์—…๋ฌด๋„ ํŒ€๋‚ด ์—…๋ฌด๋กœ ์ง€์ •์ด ๋˜์–ด์žˆ์„๊ฒƒ์œผ๋กœ ์ถ”์ธก๋ฉ๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐํŒ€์— ์ด๋ฏธ ๊ธฐ์กด์— ๊ตฌ์ถ•๋˜์–ด ์žˆ๋Š” ํ•˜๋‘กํ”Œ๋žซํผ์ด ์žˆ๊ณ  ์ข€ ๋” ํšจ์œจ์ ์ธ ์—…๋ฌด์ง„ํ–‰์„ ์œ„ํ•ด์„œ ์ƒˆ๋กœ์šด ์˜คํ”ˆ์†Œ์Šค ๋„์ž… ๋˜๋Š” ์ถ”๊ฐ€๊ฐœ๋ฐœ์ด ์—†์–ด์ง„ ์˜คํ”ˆ์†Œ์Šค ์ œ๊ฑฐ ๋“ฑ๋“ฑ์œผ๋กœ ๊ธฐ์ˆ  ๊ณ ๋ฏผ์ด ์žˆ์—ˆ์œผ๋ฉฐ ์•ž์œผ๋กœ๋„ ๊ด€๋ จ๋œ ๊ณ ๋ฏผ์ด ๋Š˜์–ด๋‚  ๊ฒƒ์ด๋ผ ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.

๊ฐ€์žฅ ์ตœ๊ทผ์—๋Š” ํด๋ผ์šฐ๋“œ ๋„์ž…์œผ๋กœ ์ธํ•œ ํ•˜๋‘ก๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ์—…๋ฌด ๋˜๋Š” ์˜คํ”ˆ์†Œ์Šค๊ต์ฒด ๋“ฑ์˜ ์—…๋ฌด๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๋ถ„๋“ค์ด ๋Š˜์–ด๋‚ฌ์„ ๊ฒƒ์ด๋ผ ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.

Fundamentals of Data Engineering offers hints about the work data engineers actually do and the problems they run into on the job.

Out of all its content, I'll read and review the chapter "Choosing Technologies Across the Data Engineering Lifecycle," which covers comparing present and future technologies, comparing where data storage sits, and more.

 

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง ์ˆ˜๋ช…์ฃผ๊ธฐ ์ „์ฒด์— ๊ฑธ์นœ ๊ธฐ์ˆ  ์„ ํƒ

์•„ํ‚คํ…์ณ๋ฅผ ์‹คํ˜„ํ•˜๋Š”๋ฐ ์“ฐ์ด๋Š” ๋„๊ตฌ๋Š” "์–ด๋–ป๊ฒŒ" ๊ตฌ์ถ•ํ• ์ง€ ๊ฒฐ์ •ํ•œ

๊ธฐ์ˆ (๋„๊ตฌ)์„ ์„ ํƒํ•  ๋•Œ ๊ณ ๋ คํ•ด์•ผํ•˜๋Š” ์‚ฌํ•ญ์ด๋‹ค.

 

1. ํŒ€์˜ ๊ทœ๋ชจ์™€ ๋Šฅ๋ ฅ

2. ์‹œ์žฅ ์ถœ์‹œ ์†๋„

3. ์ƒํ˜ธ์šด์šฉ์„ฑ

4. ๋น„์šฉ์ตœ์ ํ™” ๋ฐ ๋น„์ฆˆ๋‹ˆ์Šค ๊ฐ€์น˜

5. ํ˜„์žฌ์™€ ๋ฏธ๋ž˜: ๋ถˆ๋ณ€์˜ ๊ธฐ์ˆ ๊ณผ ์ผ์‹œ์  ๊ธฐ์ˆ  ๋น„๊ต

6. ๊ตฌ์ถ•๊ณผ ๊ตฌ๋งค ๋น„๊ต

7. ๋ชจ๋†€๋ฆฌ์‹๊ณผ ๋ชจ๋“ˆ์‹ ๋น„๊ต

8. ์„œ๋ฒ„๋ฆฌ์Šค์™€ ์„œ๋ฒ„์˜ ๋น„๊ต

9. ์ตœ์ ํ™”, ์„ฑ๋Šฅ, ๋ฒค์น˜๋งˆํฌ

10. ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง ์ˆ˜๋ช…์ฃผ๊ธฐ์˜ ๋“œ๋Ÿฌ๋‚˜์ง€์•Š๋Š” ์š”์†Œ

 

๊ธฐ์ˆ  ์„ ํƒ์— ๊ณ ๋ คํ•ด์•ผํ•˜๋Š” ์š”์†Œ ์ค‘

5๋ฒˆ ํ•ญ๋ชฉ ํ˜„์žฌ์™€ ๋ฏธ๋ž˜์˜ ๋ถˆ๋ณ€์˜ ๊ธฐ์ˆ ๊ณผ ์ผ์‹œ์ ์ธ ๊ธฐ์ˆ ์— ๋Œ€ํ•ด ๋น„๊ตํ•ด๋ณด์ž ํ•œ๋‹ค.

์ฒ˜์Œ์— ๋ณด๋ฉด ๊ธ€์”จ๋กœ ์ ‘ํ•ด ์กฐ๊ธˆ ์–ด๋ ค์šด๊ฒƒ ๊ฐ™์œผ๋‚˜ ์‰ฝ๊ฒŒ ๋‹ค์‹œ ์ƒ๊ฐํ•ด๋ณด๋ฉด ๋ฐ”๋€Œ์ง€ ์•Š๋Š” ๊ธฐ์ˆ ๋“ค๊ณผ ๊ณ„์†ํ•ด์„œ ํŠธ๋ Œ๋“œ๊ณผ ๋ฐ”๋€Œ๋Š” ๊ธฐ์ˆ ๋“ค์ด ์žˆ๋‹ค.

์‰ฝ๊ฒŒ ๋ฐ”๋€Œ์ง€ ์•Š๋Š” ๊ธฐ์ˆ ๋กœ๋Š” ๋„คํŠธ์›Œํฌ, ๋ณด์•ˆ ๋“ฑ์ด ์žˆ๋‹ค.

์‰ฝ๊ฒŒ ๋ฐ”๋€Œ์ง€์•Š๋Š” ๊ธฐ์ˆ ์— ์ถ”๊ฐ€๋  ๊ธฐ์ˆ ๋กœ๋Š” AWS S3, GCP Bigquery, Azure Blob ๋“ฑ ์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ์ฒด ์Šคํ† ๋ฆฌ์ง€์— ์ €์žฅํ•˜๋Š” ๊ฒƒ์ด ํ˜„๋ช…ํ•œ ์„ ํƒ์ด๋‹ค.

์ผ์‹œ์ ์ธ ๊ธฐ์ˆ ๋กœ๋Š” ํ”„๋ก ํŠธ์—”๋“œ๋ฅผ ์˜ˆ๋ฅผ ๋“ค ์ˆ˜ ์žˆ๋‹ค. ํ”„๋ก ํŠธ์—”๋“œ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ๋ณด๋ฉด ๊ณผ๊ฑฐ์—๋Š” apache ambari ์›น ๊ฐœ๋ฐœ์— ์‚ฌ์šฉ๋œ ํ”„๋ ˆ์ž„์›Œํฌ์ธ ember.js ๋“ฑ ์—์„œ ํ˜„์žฌ๋Š” react๋กœ ํŠธ๋ Œ๋“œ๊ฐ€ ๋ฐ”๋€ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

๋”ฐ๋ผ์„œ, ๋‚˜๋ฅผ ์œ„ํ•œ ๊ธฐ์ˆ (๋„๊ตฌ)๋ฅผ ์„ ํƒํ•  ๋•Œ ๋ถˆ๋ณ€์˜ ๊ธฐ์ˆ ์„ ๋‚ด ๊ธฐ์ˆ ๋กœ ์‚ผ๊ณ  ์ผ์‹œ์ ์ธ ๊ธฐ์ˆ ์€ ๊ธฐ์ˆ ์ฃผ์œ„์˜ ๋„๊ตฌ๋กœ ์‚ผ์•„์•ผํ•œ๋‹ค.

๊ฐœ์ธ์ ์œผ๋กœ๋Š” ๋ถˆ๋ณ€์˜ ๊ธฐ์ˆ ๋กœ๋Š” ์ปดํ“จํ„ฐ ๊ณตํ•™์˜ ๊ธฐ๋ณธ์„ ๋‚ด ๊ธฐ์ˆ ๋กœ ์‚ผ๊ณ  ๊ทธ ์™ธ ์˜คํ”ˆ์†Œ์Šค๋ฅผ ๋‹ค๋ฃจ๋Š” ๊ฒƒ์€ ๊ธฐ์ˆ ์ฃผ์œ„์˜ ๋„๊ตฌ๋กœ ์‚ผ์œผ๋ผํ•˜๋Š” ํ•„์ž์˜ ์กฐ์–ธ์œผ๋กœ ๋Š๋‚„ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง ๋„๊ตฌ๋ฅผ ์„ ํƒํ•  ๋•Œ๋„ ๋งŒ์— ํ•˜๋‚˜๋ผ๋„ ํ•ด๋‹น ๋„๊ตฌ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ์„ ํƒํ•ด์•ผํ•œ๋‹ค.

์˜ˆ๋ฅผ๋“ค๋ฉด ํ”„๋กœ์ ํŠธ๊ฐ€ ์—†์–ด์ง€๊ฑฐ๋‚˜ ํšŒ์‚ฌ๊ฐ€ ์—†์–ด์ง€๊ฑฐ๋‚˜ ๋“ฑ์„ ์—ผ๋ คํ•ด๋‘๊ณ  ๋‹ค๋ฅธ ๋„๊ตฌ๋กœ ์ „ํ™˜ํ•ด์•ผํ•˜๋Š” ๊ฒฝ์šฐ๋ฅผ ์ธ์ง€ํ•ด์•ผํ•œ๋‹ค.

 

Item 6: build versus buy

์ฑ…์—์„œ OSS (์˜คํ”ˆ์†Œ์Šค์†Œํ”„ํŠธ์›จ์–ด)์™€ ์ƒ์šฉ OSS๋ฅผ ๋น„๊ตํ•ด์ค€๋‹ค.

ํŽธํ•˜๊ฒŒ ์ƒ๊ฐํ•˜๋ฉด OSS ์—์„  apache ์žฌ๋‹จ์˜ ์˜คํ”ˆ์†Œ์Šค๋ฅผ ๋– ์˜ฌ๋ฆฌ๊ณ  ์ƒ์šฉ OSS์—์„  apache spark๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ธŒ๋ฆญ์Šค ์ œํ’ˆ, apache kafka๊ธฐ๋ฐ˜ confluent ์ œํ’ˆ์„ ๋– ์˜ฌ๋ฆด ์ˆ˜ ์žˆ๋‹ค.

๊ตฌ์ถ•๊ณผ ๊ตฌ๋งค ๋น„๊ตํ•ด๋ณด์ž๋ฉด ์žฅ๋‹จ์ ์ด ๋ช…ํ™•ํ•˜๋‹ค.

์ง์ ‘ ๊ตฌ์ถ•์€ ๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด์—๊ฒŒ ์„ฑ์žฅํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐ‘๊ฑฐ๋ฆ„์ด ๋œ๋‹ค. ํ™˜๊ฒฝ ๊ตฌ์„ฑํ•˜๋ฉฐ ๋ถ€๋”ชํžˆ๋Š” ํŠธ๋Ÿฌ๋ธ”์ŠˆํŒ…์ด๋‚˜ ํ™˜๊ฒฝ์—๋Œ€ํ•œ ์ดํ•ด๋„ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์˜คํ”ˆ์†Œ์Šค์ด๋‹ค ๋ณด๋‹ˆ known issue๋˜๋Š” ์ƒˆ๋กœ์šด ์ด์Šˆ๋“ค์ด ์ƒ๊ธธ ์ˆ˜ ์žˆ์œผ๋ฉฐ ์šด์˜์ค‘์ธ ํ™˜๊ฒฝ์—์„œ ์˜คํ”ˆ์†Œ์Šค ์ด์Šˆ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š”๋ฐ ์ƒ๋‹นํžˆ ๋งŽ์€ ์‹œ๊ฐ„์ด ์†Œ์š”๋œ๋‹ค. 

์ƒ์šฉ OSS๋ฅผ ๊ตฌ๋งคํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์šด์˜์— ๋Œ€ํ•œ ๋ถ€๋‹ด์ด ๋œํ•œ ํŽธ์ด๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์˜คํ”ˆ์†Œ์Šค์†Œํ”„ํŠธ์›จ์–ด๊ฐ€ ์•„๋‹Œ ์ƒ์šฉ์ด๊ธฐ๋•Œ๋ฌธ์— ์ง€์›์ด๋‚˜ ์ด์Šˆ๋Œ€์‘์„ ๋ฐ›์„ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ฒ˜์Œ๋ถ€ํ„ฐ ์ƒ์šฉ OSS๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด ์ง์ ‘ ๊ตฌ์ถ•์— ๋น„ํ•ด ์ดํ•ด๋„๊ฐ€ ์ข€ ๋–จ์–ด ์งˆ์ˆ˜๋„ ์žˆ๋‹ค๋Š” ์ƒ๊ฐ๋„ ๋“œ๋‚˜ ๋ฐ์ดํ„ฐ์—”์ง€๋‹ˆ์–ด ๊ฐœ์ธ์ด ๊ณต๋ถ€๋งŒ ํ•œ๋‹ค๋ฉด ์ฐจ์ด๋ฅผ ๊ทน๋ณตํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ƒ๊ฐํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋‹จ์ ๋„ ์žˆ๋‹ค. ๋น„๊ต์  ํฐ ๋น„์šฉ์ด ๋“ค๋ฉฐ ์—…๋ฌด ์ง„ํ–‰ํ•  ๋•Œ ์ง€์›์ด๋‚˜ ๋ฌธ์„œ์ง€์›์— ๋Œ€ํ•ด์„œ๋„ ๊ณ ๋ฏผ์„ ํ•ด๋ด์•ผํ•œ๋‹ค.

 

Closing thoughts

I currently work as a data engineer. When questions come up in the course of work, there is honestly less information out there than for web development. I wanted to hear colleagues' takes on the non-technical questions too, and beyond that other people's experiences and thoughts, and reading this book resolved a good part of that.

Someone with no experience at all will find it hard to understand everything, but anyone hoping for a data engineering role should definitely read it. With time, the content will click.

 

 

ํ•œ๋น›๋ฏธ๋””์–ด์—์„œ ์ฑ…์„ ์ œ๊ณต ๋ฐ›์•„ ์ž‘์„ฑ๋œ ์„œํ‰์ž…๋‹ˆ๋‹ค.


์Šค์นผ๋ผ ํด๋ž˜์Šค์— ๊ด€ํ•ด ์ •๋ฆฌํ•จ
https://docs.scala-lang.org/overviews/scala-book/classes.html

๊ธฐ๋ณธ ํด๋ž˜์Šค ์ƒ์„ฑ์ž

class Person(var firstName: String, var lastName: String)
val p = new Person("Bill", "Panner")
println(p.firstName + " " + p.lastName) //Bill Panner
p.firstName = "William" 
p.lastName = "Bernheim"

val makes a field read-only.

val (value): cannot be reassigned
var (variable): can be reassigned

When writing object-oriented Scala, make fields var so they can be changed.
When writing functional Scala, you generally use case classes instead of classes.
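
A minimal sketch of the difference (the Point and PointFP names here are mine, just for illustration):

class Point(var x: Int, val y: Int)

val pt = new Point(1, 2)
pt.x = 10          // OK: x is a var field
// pt.y = 20       // compile error: y is a val (read-only) field

// Functional style: a case class gives immutable fields plus copy()
case class PointFP(x: Int, y: Int)
val q = PointFP(1, 2)
val q2 = q.copy(x = 10)   // build a modified copy instead of mutating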

ํด๋ž˜์Šค ์ƒ์„ฑ์ž

์ƒ์„ฑ์ž ๋งค๊ฐœ๋ณ€์ˆ˜
ํด๋ž˜์Šค ๋ณธ์ฒด์—์„œ ํ˜ธ์ถœ๋˜๋Š” ๋ฉ”์„œ๋“œ
ํด๋ž˜์Šค ๋ณธ๋ฌธ์—์„œ ์‹คํ–‰๋˜๋Š” ๋ช…๋ น๋ฌธ ๋ฐ ํ‘œํ˜„์‹

๋‹ค๋ฅธ ์Šค์นผ๋ผ ํด๋ž˜์Šค ์˜ˆ์ œ

class Pizza (var crustSize: Int, var crustType: String)

// a stock, like AAPL or GOOG
class Stock(var symbol: String, var price: BigDecimal)

// a network socket
class Socket(val timeout: Int, val linger: Int) {
    override def toString = s"timeout: $timeout, linger: $linger"
}

class Address (
    var street1: String,
    var street2: String,
    var city: String, 
    var state: String
)
 

์Šค์นผ๋ผ์˜ ๋ง›

์Šค์นผ๋ผ ํŠน์ง•

์ •์  ํƒ€์ž…
๊ตฌ๋ฌธ ๊ฐ„๊ฒฐํ•˜๋ฉฐ ์ฝ๊ธฐ ์‰ฌ์›€
๊ฐ์ฒด ์ง€ํ–ฅ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๊ณผ ํ•จ์ˆ˜ํ˜• ํ”„๋กœ๊ทธ๋ž˜๋ฐ ํŒจ๋Ÿฌ๋‹ค์ž„ ์ง€์›
์ •๊ตํ•œ ์œ ํ˜• ์ถ”๋ก  ์‹œ์Šคํ…œ?
JVM์—์„œ ์‹คํ–‰๋˜๋Š” ํด๋ž˜์ŠคํŒŒ์ผ ์ƒ์„ฑ
์ž๋ฐ” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉํ•˜๊ธฐ ์‰ฌ์›€

Hello, World

Hello.scala

object Hello extends App {
	println("Hello, World")
}

Two kinds of variables

val: an immutable variable — like final in Java
var: a mutable variable — use only when there's a specific reason to

Declaring variable types

Variables can be created without declaring a type:

val x = 1
val s = "string"
val p = new Person("Regina")

๋ฐ์ดํ„ฐ ์œ ํ˜•์„ ์œ ์ถ”ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ฝ”๋“œ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ์œ ์ง€ํ•˜๋Š”๋ฐ ๋„์›€๋จ
์œ ํ˜• ๋ช…์‹œํ•  ์ˆ˜ ์žˆ์œผ๋‚˜ ์ผ๋ฐ˜์ ์œผ๋กœ ํ•„์š”ํ•˜์ง€์•Š์Œ

Control structures

if (test1) {
    doA()
} else if (test2) {
    doB()
} else if (test3) {
    doC()
} else {
    doD()
}

if as a ternary operator

val x = if (a < b) a else b

match expressions

In Scala, match is similar to switch in Java:

val result = i match {
    case 1 => "one"
    case 2 => "two"
    case _ => "not 1 or 2"
}

It isn't limited to integers; it works with any data type, including booleans:

val booleanAsString = bool match {
    case true => "true"
    case false => "false"
}

try/catch expressions

Similar to Java's try/catch, but the catch block uses match-expression syntax:

try {
    writeToFile(text)
} catch {
    case fnfe: FileNotFoundException => println(fnfe)
    case ioe: IOException => println(ioe)
}

for loops and expressions

for (arg <- args) println(arg)

// "x to y" syntax
for (i <- 0 to 5) println(i)

// "x to y by" syntax
for (i <- 0 to 10 by 2) println(i)

Adding the yield keyword to a for loop produces a result:

val x = for (i <- 1 to 5) yield i * 2

while and do/while expressions

// while loop
while(condition) {
    statement(a)
    statement(b)
}

// do-while
do {
   statement(a)
   statement(b)
} 
while(condition)

Classes

class Person(var firstName: String, var lastName: String) {
    def printFullName() = println(s"$firstName $lastName")
}
val p = new Person("Julia", "Kern")
println(p.firstName)
p.lastName = "Manes"
p.printFullName()

์Šค์นผ๋ผ ๋ฉ”์„œ๋“œ

๋ฐ˜ํ™˜ ์œ ํ˜• ์„ ์–ธํ•  ํ•„์š” ์—†์Œ

def sum(a: Int, b: Int) = a + b
def concatenate(s1: String, s2: String) = s1 + s2

How to call the methods:

val added = sum(1, 2)                      // 3
val joined = concatenate("foo", "bar")     // "foobar"

Traits

I'll look at these in detail later; skipping for now.
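
For reference, a minimal sketch of what a trait looks like (the names are illustrative):

trait Greeter {
  def name: String                               // abstract member
  def hello(): Unit = println(s"Hello, $name")   // concrete member
}

class Greeting(val name: String) extends Greeter

new Greeting("Kim").hello()   // Hello, Kim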

Collection classes

Tuples

I'll just get a rough sense of these here and move on.


HUE download

Pick the Hue release you want and download it:
https://github.com/cloudera/hue/tags

 


 

Endless dependency pain

After installing mvn and a database, the basic setup is done (including creating the hue database and user!).

python pip upgrade

curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o get-pip.py
python get-pip.py
pip install --upgrade pip

Install python packages

pip install psycopg2
pip install psycopg2-binary

Install OS packages

sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
libffi-devel python-devel openssl-devel -y

Upgrade SQLite (because of Hue's Django dependency, the build fails if the version is too old)

https://kojipkgs.fedoraproject.org/packages/sqlite/

 


Download the matching sqlite and sqlite-devel rpms from here:

wget https://kojipkgs.fedoraproject.org/packages/sqlite/3.12.2/1.fc24/x86_64/sqlite-3.12.2-1.fc24.x86_64.rpm
wget https://kojipkgs.fedoraproject.org/packages/sqlite/3.12.2/1.fc24/x86_64/sqlite-devel-3.12.2-1.fc24.x86_64.rpm

rpm -Uvh sqlite-3.12.2-1.fc24.x86_64.rpm sqlite-devel-3.12.2-1.fc24.x86_64.rpm

HUE Build

Edit desktop/devtools.mk:

DEVTOOLS += \
        ipython[7.10.0] \
        ipdb[0.13.9] \

Build Hue:

cd ${HUE_SRC}
make apps

 

HUE Start

[sync the hue database] build/env/bin/hue migrate 
[start the hue server] build/env/bin/hue runserver 0.0.0.0:8000
[hue login] user id/password - admin/admin
[create the hdfs user dir] hdfs dfs -mkdir /user/admin
[change dir ownership] hdfs dfs -chown -R admin:admin /user/admin

 

HUE Configs

vi ${HUE_SRC}/desktop/conf/pseudo-distributed.ini
  [[database]]
    engine=postgresql_psycopg2
    host=1.2.3.4
    name=hue
    port=5432
    user=hue
    password=hue


[hadoop]
  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://1.2.3.4:8020
      webhdfs_url=http://1.2.3.4:50070/webhdfs/v1
# ------------------------------------------------------------------------     
[beeswax]
  hive_server_host=1.2.3.4
  hive_server_port=10000
  hive_server_http_port=10001
  max_number_of_sessions=3
  thrift_version=11
  use_sasl=true
  # ------------------------------------------------------------------------
[hbase]
  hbase_clusters=(Cluster|1.2.3.4:9090)
  thrift_transport=buffered
  ssl_cert_ca_verify=false

์ŠคํŒŒํฌ๋ฅผ ์‹คํ–‰ํ•  ๋•Œ, ๋ฉ”๋ชจ๋ฆฌ์™€ ์ฝ”์–ด๋ฅผ ์„ค์ •ํ•˜์—ฌ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

x = sc.parallelize(["spark", "rdd", "example", "sample", "example"], 3)   # parallelize (transformation)

y = x.map(lambda x: (x, 1))   # input: x, output: (x, 1) — map (transformation)

y.collect()   # collect (action)

[('spark', 1), ('rdd', 1), ('example', 1), ('sample', 1), ('example', 1)]

 

Running Spark on YARN

scala : spark-shell --master yarn --queue queue_name

python : pyspark --master yarn --queue queue_name

--driver-memory 3G : memory for the Spark driver (default = 1024M)

--executor-memory 3G : memory for each Spark executor

--executor-cores NUM : number of cores for each Spark executor

์ž‘์„ฑํ•œ ํŒŒ์ผ Spark์—์„œ ์‹คํ–‰์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•

ํŒŒ์ด์ฌ ํŒŒ์ผ

spark-submit –master local[num] ํŒŒ์ผ๋ช….py 

  (num์€ ์“ฐ๋ ˆ๋“œ ๊ฐœ์ˆ˜,default ๊ฐ’์€ 2~4๊ฐœ ์ •๋„)

์ž๋ฐ”,์Šค์นผ๋ผ

spark-submit \ --class “SimpleApp”\ --master local[num] /location~/name.jar

 


HBase is a NoSQL database.

It stores Hadoop data as NoSQL (key, value) pairs.

 

 

$ /hadoop/sbin/start-all.sh

$ ./start-hbase.sh

$ ./hbase shell

### hbase test ###

create 'test', 'cf'

list 'test'

describe 'test'

put 'test', 'row1', 'cf:a', 'value1'

put 'test', 'row2', 'cf:b', 'value2'

put 'test', 'row3', 'cf:c', 'value3'

scan 'test'

------------------------

ROW COLUMN+CELL

row1 column=cf:a, timestamp=1612833812641, value=value1  

row2 column=cf:b, timestamp=1612833817184, value=value2

row3 column=cf:c, timestamp=1612833818011, value=value3

3 row(s)

Took 0.8014 seconds

whoami

grant 'username','RWXCA'


HIVE ํ…Œ์ด๋ธ” ๊ด€๋ฆฌ

HIVE ํ…Œ์ด๋ธ”

1. ๋ฐ์ดํ„ฐ๋ฅผ HIVE ํ…Œ์ด๋ธ”๋กœ ๊ฐ€์ ธ์˜ค๋ฉด?

HiveQL, ํ”ผ๊ทธ, ์ŠคํŒŒํฌ ๋“ฑ์„ ํ™œ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌ > ์ƒํ˜ธ์šด์˜ ๋ณด์žฅ

2. HIVE๊ฐ€ ์ง€์›ํ•˜๋Š” ํ…Œ์ด๋ธ” ์ข…๋ฅ˜

    - Internal tables: managed by HIVE and stored in the HIVE data warehouse; dropping an internal table deletes both the metadata definition and the data,

   stored in formats like ORC, giving comparatively fast performance

    - External tables: not managed directly by HIVE;

   only HIVE's metadata definition is used to access text data stored in its raw form.

   Deleting an external table removes only the table's metadata definition; the data remains.

   Used when the data is loaded outside HIVE, or when the original data must survive even if the table is dropped

3. Importing a csv file into a HIVE table (see the sketch after these steps)

  1. Copy names.csv to HDFS

  2. hdfs dfs -mkdir names

  3. hdfs dfs -put names.csv names

  4. Run hive and create the table with a query; the location '/directory' clause is the path of the input files the table will use.

  5. Check the data with select * from ~

  6. stored as orc > an internal table

  7. Data formats: text files, sequence files (k-v pairs), RC files, ORC format, Parquet format

 

์™ธ๋ถ€ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

suhdfs

hdfs dfsmkdir /Smartcar

hdfs dfs –put /txtfile.txt /Smartcar

hdfs dfschown –R hive /Smartcar

hdfs dfschmod –R 777 /Smartcar

su – hive

hive

create external table (~) ~ location /Smartcar;

๋‚ด๋ถ€ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

create table (~) ~ location /Smartcar;

์™ธ๋ถ€ ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ ๋‚ด๋ถ€ ํ…Œ์ด๋ธ”๋กœ ๋ณต์‚ฌ

insert overwrite table SmartCar_in

select * from SmartCar_ex;

๋‚ด๋ถ€ ํ…Œ์ด๋ธ” ๋””๋ ‰ํ„ฐ๋ฆฌ ์ƒ์„ฑํ™•์ธ

hdfs dfs –ls /Smartcar

/Smartcar/base_0000001/bucket_00000/bucket_00000

 

HIVE is similar to SQL,

so if you've already studied SQL it isn't difficult.


These are commands I've compiled personally, 

based on Hadoop version 3.1.

 

If you've studied Linux before, learning Hadoop commands isn't terribly hard. (A programmatic Scala equivalent appears after the command list below.)

 

1. hdfs dfs -cat /tmp/Sample2.txt      # read a file

2. hdfs dfs -checksum /tmp/Sample2.txt  # data integrity

3. hdfs dfs -chgrp kyn /tmp/Sample2.txt

4. hdfs dfs -chown kyn /tmp/Sample2.txt

5. hdfs dfs -chmod -R 777 /tmp/Sample2.txt

6. hdfs dfs -copyFromLocal /tmp/Sample2.txt  # similar to put

7. hdfs dfs -copyToLocal /tmp/Sample2.txt

8. hdfs dfs -count /tmp/Sample2.txt

9. hdfs dfs -cp /tmp/Sample2.txt /tmp/rename.txt

10. hdfs dfs -createSnapshot

11. hdfs dfs -deleteSnapshot

12. hdfs dfs -df -h /tmp/  # free disk space

13. hdfs dfs -du -h /tmp/  # disk usage

14. hdfs dfs -expunge  # empty the trash (deleted hdfs files sit in the trash and are purged after a delay)

15. hdfs dfs -find /temp/Sample2.txt

16. hdfs dfs -get /temp/Sample2.txt

17. hdfs dfs -head /temp/Sample2.txt

18. hdfs dfs -help

20. hdfs dfs -ls /

21. hdfs dfs -mkdir /temp/test

22. hdfs dfs -moveFromLocal  (local to hdfs)

23. hdfs dfs -moveToLocal  (hdfs to local)

24. hdfs dfs -mv /tmp/Sample2.txt /tmp/test

25. hdfs dfs -put /root/home/Sample.txt /tmp

26. hdfs dfs -renameSnapshot oldname newname

27. hdfs dfs -rmdir /test

28. hdfs dfs -rm /tmp/Sample2.txt

29. hdfs dfs -stat '%b %o %r %u %n' /tmp/Sample2.txt

%b file size  %o file block size  %r replication count  %u owner  %n file name

30. hdfs dfs -tail -F /tmp/Sample2.txt

31. hdfs dfs -test ~  # returns 0 for true, -1 for false

32. hdfs dfs -text /tmp/filename

33. hdfs dfs -touch /tmp/filename

34. hdfs dfs -touchz /tmp/filename  (0 bytes)

35. hdfs dfs -usage command  # print how to use a command
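
The same operations are also available programmatically through the Hadoop FileSystem API — a minimal Scala sketch using the sample paths from above:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsExample extends App {
  // Picks up core-site.xml / hdfs-site.xml from the classpath
  val fs = FileSystem.get(new Configuration())

  fs.mkdirs(new Path("/temp/test"))                          // hdfs dfs -mkdir
  fs.copyFromLocalFile(new Path("/root/home/Sample.txt"),    // hdfs dfs -put
                       new Path("/tmp"))
  val st = fs.getFileStatus(new Path("/tmp/Sample.txt"))     // hdfs dfs -stat
  println(s"size=${st.getLen} replication=${st.getReplication} owner=${st.getOwner}")
  fs.delete(new Path("/tmp/Sample.txt"), false)              // hdfs dfs -rm (non-recursive)
}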

 

File system checking

1. hdfs fsck /

2. hdfs fsck -delete

deletes corrupted files

3. hdfs fsck -move

moves corrupted files

 

Status: HEALTHY

 Number of data-nodes:  3

 Number of racks:               1

 Total dirs:                    950

 Total symlinks:                0

Replicated Blocks:

 Total size:    3664434506 B (Total open files size: 1353 B)  bytes currently in use

 Total files:   2127 (Files currently being written: 10)

 Total blocks (validated):      1998 (avg. block size 1834051 B)

 (Total open file blocks (not validated): 5)

 Minimally replicated blocks:   1998 (100.0 %)  blocks with at least the minimum number of replicas

 Over-replicated blocks:        0 (0.0 %)     blocks replicated above the configured factor

 Under-replicated blocks:       1998 (100.0 %)     blocks replicated below the configured factor

 Mis-replicated blocks:         0 (0.0 %)     blocks violating the placement policy

 Default replication factor:    3     the dfs.replication value

 Average block replication:     1.998999     average replication

 Missing blocks:                0

 Corrupt blocks:                0     corrupted blocks

 Missing replicas:              1998 (33.34446 %)       blocks whose replicas are missing

 

 

์ปค๋ŸฝํŠธ ์ƒํƒœ

๋ชจ๋“  ๋ธ”๋ก์— ๋ฌธ์ œ ์ƒ๊ฒจ

๋ณต๊ตฌ ๋ชปํ•˜๋Š” ์ƒํƒœ

3copy ๋ฐฉ์‹์œผ๋กœ ๋ฐ์ดํ„ฐ๋…ธ๋“œ ์ค‘ ๋ฌธ์ œ ์ƒ๊ธฐ๋ฉด

Reblancing์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ์˜ ํฌ๊ธฐ๋ฅผ ๋งž์ถ”๊ฑฐ๋‚˜

์œ ์‹ค๋œ?๋ฐ์ดํ„ฐ๋ฅผ copy

 

 

 

NameNode ์ƒํƒœ ๊ด€๋ฆฌ

Hdsf dfsadminsafemode enter  : Namenode ๋ฐ์ดํ„ฐ ๋ณ€๊ฒฝ ๋ชปํ•˜๊ฒŒ safemode

Hdfs dfsadminsafemode get : name node ํ™•์ธ

Hdfs dfsadminsafemode leave : name node๊ฐ€ safemode ๋‚˜๊ฐ

 

hdfs envvars (prints the environment variables)

JAVA_HOME='/usr/java/jdk'

HADOOP_HDFS_HOME='/usr/hdp/3.1.0.0-78/hadoop-hdfs'

HDFS_DIR='./'

HDFS_LIB_JARS_DIR='lib'

HADOOP_CONF_DIR='/usr/hdp/3.1.0.0-78/hadoop/conf'

HADOOP_TOOLS_HOME='/usr/hdp/3.1.0.0-78/hadoop'

HADOOP_TOOLS_DIR='share/hadoop/tools'

HADOOP_TOOLS_LIB_JARS_DIR='share/hadoop/tools/lib'

 

 

hdfs httpfs

runs the HttpFS server, the HDFS HTTP gateway

 

1. hdfs version  # check the installed Hadoop version

2. hdfs classpath  # print the classpath of the installed Hadoop jars and required libraries

3. hdfs groups  hdfs : hadoop hdfs kyn

4. hdfs lsSnapshottableDir  # print the list of snapshottable directories

5. hdfs jmxget  # print JMX information

 

 

init: server=localhost;port=;service=NameNode;localVMUrl=null

Domains:

        Domain = JMImplementation

        Domain = com.sun.management

        Domain = java.lang

        Domain = java.nio

        Domain = java.util.logging

MBeanServer default domain = DefaultDomain

MBean count = 22

Query MBeanServer MBeans:

List of all the available keys:

 

JMX is a Java API that provides tools for monitoring and managing applications (software), objects, devices (printers, etc.), and service-oriented networks.

 

 

1. hdfs oev   # Hadoop offline edits viewer — parses the editlog file format

2. hdfs oiv   # Hadoop offline image viewer — renders the fsimage human-readable

3. hdfs snapshotDiff [path] snapshotA snapshotB

prints what differs in snapshot A relative to snapshot B

 

 

 

Administration Commands

 

hdfs balancer : analyzes block placement and balances data across the datanodes

hdfs cacheadmin : interact with cache pools

hdfs crypto : lists the encryption zones (up to a maximum returned per batch); improves namenode performance

hdfs datanode : run an HDFS datanode; rollback

hdfs dfsrouter : run the DFS router

hdfs dfsrouteradmin : manage router-based federation

hdfs diskbalancer : run the disk balancer (distributes data across all disks of a datanode)

hdfs ec : erasure coding commands

hdfs haadmin : check namenode status; set the active namenode

hdfs journalnode : start a journalnode

hdfs mover : run data migration

hdfs namenode : run the namenode; backup, recovery, upgrade, rollback to the previous version, etc.

hdfs nfs3 : run the NFS3 gateway for the HDFS NFS3 service

hdfs portmap : run the RPC portmap for the HDFS NFS3 service

hdfs secondarynamenode : run the secondary namenode

hdfs storagepolicies : list and set storage policies  

BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK]

, replicationFallbacks=[PROVIDED, DISK]}

hdfs zkfc : run the ZooKeeper failover controller process

 

hdfs dfsadmin -report

Configured Capacity: 248290449920 (231.24 GB)

Present Capacity: 194169229810 (180.83 GB)

DFS Remaining: 183161573874 (170.58 GB)

DFS Used: 11007655936 (10.25 GB)

DFS Used%: 5.67%

Replicated Blocks:

        Under replicated blocks: 0

        Blocks with corrupt replicas: 0

        Missing blocks: 0

        Missing blocks (with replication factor 1): 0

        Low redundancy blocks with highest priority to recover: 0

        Pending deletion blocks: 0

Erasure Coded Block Groups:

        Low redundancy block groups: 0

        Block groups with corrupt internal blocks: 0

        Missing block groups: 0

        Low redundancy blocks with highest priority to recover: 0

        Pending deletion blocks: 0

-------------------------------------------------

 

Live datanodes (3):

Name: 192.168.56.131:50010 

Hostname: bdd.co.kr

Decommission Status : Normal

Configured Capacity: 76060626432 (70.84 GB)

DFS Used: 3599179776 (3.35 GB)

Non DFS Used: 21608422912 (20.12 GB)

DFS Remaining: 50584588454 (47.11 GB)

DFS Used%: 4.73%

DFS Remaining%: 66.51%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 6

Last contact: Fri Sep 04 12:14:34 KST 2020

Last Block Report: Fri Sep 04 08:00:09 KST 2020

Num of Blocks: 2003

 

 

hdfs dfsadmin -report -live : cluster status including the live datanodes

hdfs dfsadmin -report -dead : status of the dead datanodes

 

 

Debug Commands

hdfs debug verifyMeta : verifies HDFS metadata and block files;

matches the meta file's checksum against the block file to verify it

hdfs debug computeMeta : computes HDFS metadata from a block file;

computes the checksum from the block file and saves it into the resulting meta file

hdfs debug recoverLease : recovers the lease on the specified path; the client sets how many times recoverLease is called (default = 1)

An HDFS lease >> grants a client the right to open a file for writing


ํ•˜๋‘ก์˜ ๋ฌธ์ œ์ ์„ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด ์ŠคํŒŒํฌ ์ƒ๊น€

 

ํ•˜๋‘ก์˜ ๋ฌธ์ œ๋Š”

1. ๋ฐ˜๋ณต์ ์ธ ์ž‘์—…์—๋Š” ๋น„ํšจ์œจ์ ์ž„

2. ๋งต๋ฆฌ๋“€์Šค์‹œ ๋„คํŠธ์›Œํฌ ํŠธ๋ž˜ํ”ฝ์œผ๋กœ ์ธํ•ด ์„ฑ๋Šฅ์ €ํ•˜๋จ.

 

์ŠคํŒŒํฌ๋ž€?

๊ธฐ์กด ๋งต๋ฆฌ๋“€์Šค์˜ ๋””์Šคํฌ ์ž…์ถœ๋ ฅ์„ ๋ณด์™„ํ•˜์—ฌ

์ธ ๋ฉ”๋ชจ๋ฆฌ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ํ”„๋ ˆ์ž„ ์›Œํฌ ์ด๋‹ค.

 

์ธ ๋ฉ”๋ชจ๋ฆฌ - ์ตœ์ดˆ ๋ฐ์ดํ„ฐ ์ž…๋ ฅ, ์ถœ๋ ฅ์—๋งŒ ๋””์Šคํฌ์— ์ž‘์„ฑํ•จ์œผ๋กœ ๋„คํŠธ์›Œํฌ ํŠธ๋ž˜ํ”ฝ ๋ฐœ์ƒ ๋‚ฎ์ถค, ์ค‘๊ฐ„ ๊ฒฐ๊ณผ๋Š” ๋ณ‘๋ ฌ์ฒ˜๋ฆฌํ•จ

 

์ŠคํŒŒํฌ์˜ ์ฃผ์š”๊ธฐ๋Šฅ - ์ŠคํŒŒํฌ SQL, ์ŠคํŒŒํฌ ์ŠคํŠธ๋ฆฌ๋ฐ, ์ŠคํŒŒํฌ MLlib, ์ŠคํŒŒํฌ GraphX, ์ŠคํŒŒํฌ ์ฝ”์–ด, ์ŠคํŒŒํฌ ์ž‘์—… ์ฒ˜๋ฆฌ

 

์ŠคํŒŒํฌ ์•„ํ‚คํ…์ฒ˜

๋…ธ๋“œ๋งค๋‹ˆ์ € ์•ˆ์—  ๋“œ๋ผ์ด๋ฒ„ ํ”„๋กœ๊ทธ๋žจ์ด ์žˆ์Œ.

1. ๋“œ๋ผ์ด๋ฒ„ ํ”„๋กœ๊ทธ๋žจ์ด SparkContext ์ธ์Šคํ„ด์Šค ์ƒ์„ฑํ•จ(์ด๋•Œ yarn๊ณผ ์—ฐ๊ฒฐ)

2. executors ๋ฅผ ์š”๊ตฌ

3. ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ์ฝ”๋“œ๋ฅผ executors์— ๋ณด๋‚ผ ๊ฒƒ

4. SparkContext ๋Š” executors๋ฅผ ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•ด task๋ฅผ ๋ณด๋ƒ„

 

Spark์˜ Driver๋Š” YARN์—์„œ Application Master์™€ ๊ฐ™์Œ

 

์ŠคํŒŒํฌ ์„ค์น˜

$ wget https://downloads.apache.org/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
$ tar xvzf spark-2.4.7-bin-hadoop2.7.tgz
$ ln -s spark-2.4.7-bin-hadoop2.7/ spark

 

์ŠคํŒŒํฌ์˜ RDD

RDD์˜ ๊ฐœ๋…(Resilient Distributed Datasets)

-์ŠคํŒŒํฌ ๋‚ด์— ์ €์žฅ๋˜๋Š” ๋ฐ์ดํ„ฐ ์…‹ ํƒ€์ž…

-๋‚ด๋ถ€์ ์œผ๋กœ ์—ฐ์‚ฐํ•˜๋Š” ๋ฐ์ดํ„ฐ๋“ค์„ ๋ชจ๋‘ RDD ํƒ€์ž…์œผ๋กœ ์ฒ˜๋ฆฌ

Immutable, Partitioned Collections of Records

์—ฌ๋Ÿฌ ๋ถ„์‚ฐ ๋…ธ๋“œ์— ๋‚˜๋ˆ„์–ด์ง€๋ฉฐ

๋‹ค์ˆ˜์˜ ํŒŒํ‹ฐ์…˜์œผ๋กœ ๊ด€๋ฆฌ๋จ

๋ณ€๊ฒฝ์ด ๋ถˆ๊ฐ€๋Šฅํ•œ ๋ฐ์ดํ„ฐ ์…‹

 

RDD์˜ ์ƒ์„ฑ

1. ์™ธ๋ถ€๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ๋•Œ

2. ์ฝ”๋“œ์—์„œ ์ƒ์„ฑ๋˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•  ๋•Œ

 

The two operation types that drive RDDs

1. Transformation: a function that creates a new RDD from an RDD (filter, map)

2. Action: a function that turns an RDD into a non-RDD type of data (count, collect)
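
A minimal Scala sketch of the two kinds — nothing actually runs until the first action:

val nums = sc.parallelize(1 to 10)         // create an RDD
val evens = nums.filter(_ % 2 == 0)        // transformation: only recorded in the lineage
val pairs = evens.map(n => (n, n * n))     // transformation: still nothing executed
println(pairs.count())                     // action: triggers the whole chain — prints 5
pairs.collect().foreach(println)           // action: (2,4), (4,16), (6,36), (8,64), (10,100)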

 

How RDDs are distributed

1. Immutable: the dataset does not change after creation

2. Partitioned: the dataset is cut into pieces

 

RDD partitioning

One RDD is split into multiple partitions.

You can choose the number of partitions and the partitioning.

 

RDD dependencies

- Narrow dependency

Partitions map 1:1, so no network is needed and the work can run on a single node; partitions are also easy to restore.

- Wide dependency

Partitions map 1:N, so recomputing a partition is expensive and uses the network.

RDD lineage

The order of RDD operations is recorded -> a DAG (no cycles)

 Fault tolerant: the exact same RDD can be recreated from the lineage

 Lazy execution

- a Transformation appends to the lineage

- an Action executes the lineage

The pre-built lineage can also inform resource allocation:

the resources in use now, the resources needed next, and the dependencies can all feed into job scheduling.
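
You can inspect the recorded lineage directly (a sketch; the file path is arbitrary):

val words = sc.textFile("hdfs:///tmp/Sample2.txt").flatMap(_.split(" "))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)   // wide dependency: shuffle
println(counts.toDebugString)                            // prints the DAG; nothing has run yet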

 

Running Spark on YARN

scala : spark-shell --master yarn --queue queue_name
python : pyspark --master yarn --queue queue_name
 
--driver-memory 3G : memory for the Spark driver (default = 1024M)
--executor-memory 3G : memory for each Spark executor
--executor-cores NUM : number of cores for each Spark executor

Enter the Spark shell and try writing some code:

x = sc.parallelize(["spark", "rdd", "example", "sample", "example"], 3)
y = x.map(lambda x: (x, 1))
y.collect()
[('spark', 1), ('rdd', 1), ('example', 1), ('sample', 1), ('example', 1)]

x์— ์ŠคํŒŒํฌ ์ฝ˜ํ…์ŠคํŠธ ๋ณ‘๋ ฌํ™”๋กœ ์ƒ์„ฑํ•จ

y์— x๋ฅผ ๋งตํ˜•์‹์œผ๋กœ x๊ฐ’๊ณผ 1 ์ €์žฅํ•จ

collect ์จ์„œ ์ง‘ํ•ฉ ์ถœ๋ ฅํ•จ

 

์ž‘์„ฑํ•œ ์ŠคํŒŒํฌ์—์„œ ์‹คํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•

ํŒŒ์ด์ฌ ํŒŒ์ผ  (num์€ ์“ฐ๋ ˆ๋“œ ๊ฐœ์ˆ˜, default ๊ฐ’์€ 2~4๊ฐœ ์ •๋„)

1
spark-submit –master local[num] ํŒŒ์ผ๋ช….py 
cs

์ž๋ฐ”,์Šค์นผ๋ผ ํŒŒ์ผ

1
spark-submit \ --class “SimpleApp”\ --master local[num] /location~/name.jar
cs

 

์ŠคํŒŒํฌ์—์„œ ๋งต๋ฆฌ๋“€์Šค

1
2
3
4
val input: RDD[(K1, V1)] = ...
val mapOutput: RDD[(K2, V2)] = input.flatMap(mapFn)
val shuffled: RDD[(K2, Iterable[V2])] = mapOutput.groupByKey().sortByKey()
val output: RDD[(K3, V3)] = shuffled.flatMap(reduceFn)
cs

The RDD of (K1, V1) pairs is the input data.

A flatMap() over it produces the RDD of (K2, V2) pairs.

That RDD is shuffled by key, with groupByKey() and sortByKey() applied.

The result of flatMap() over the shuffled data is stored as the RDD of (K3, V3) pairs.

 

 


์•„ํŒŒ์น˜ ์—์ด๋ธŒ๋กœ๋ž€ ? 

https://dennyglee.com/2013/03/12/using-avro-with-hdinsight-on-azure-at-343-industries/

- A language-neutral data serialization system, not tied to any specific language

- Created to solve the main drawback of Hadoop Writables: the lack of language portability

 

It has characteristics that set it apart from Apache Thrift and Google Protocol Buffers.

 

๋ฐ์ดํ„ฐ๋Š” ๋‹ค๋ฅธ ์‹œ์Šคํ…œ๊ณผ ๋น„์Šทํ•˜๊ฒŒ ์–ธ์–ด ๋…๋ฆฝ ์Šคํ‚ค๋งˆ๋กœ ๊ธฐ์ˆ ๋จ

์—์ด๋ธŒ๋กœ์—์„œ ์ฝ”๋“œ ์ƒ์„ฑ์€ ์„ ํƒ์‚ฌํ•ญ์ž„

๋ฐ์ดํ„ฐ๋ฅผ ์ฝ๊ณ  ์“ฐ๋Š” ์‹œ์ ์— ์Šคํ‚ค๋งˆ๋Š” ํ•ญ์ƒ ์กด์žฌํ•œ๋‹ค ๊ฐ€์ •ํ•จ - ๋งค์šฐ ๊ฐ„๊ฒฐํ•œ ์ฝ”๋”ฉ์ด ๊ฐ€๋Šฅ

 

์Šคํ‚ค๋งˆ์˜ ์ž‘์„ฑ

JSON

๋ฐ์ดํ„ฐ๋Š” ๋ฐ”์ด๋„ˆ๋ฆฌ ํฌ๋งท์œผ๋กœ ์ธ์ฝ”๋”ฉ

 

Avro specification — details the binary format that all implementations must support.

APIs — left out of the Avro specification; written differently for each language. This raises the binding convenience for each language and resolves the problem of degraded interoperability.

 

Schema resolution — within carefully defined constraints, the schema used to read data does not have to be the same as the schema used to write it (the schema resolution mechanism).

 

e.g., A new field can be added to the schema used to read old data. New and old users can both read the old data without problems, and new users can write data that includes the new field. Old users see the new data, ignore the new field, and process it just like old data.

 

๊ฐ์ฒด ์ปจํ…Œ์ด๋„ˆ ํฌ๋งท ์ œ๊ณต(ํ•˜๋‘ก ์‹œํ€€์Šค ํŒŒ์ผ๊ณผ ์œ ์‚ฌํ•จ)

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์€ ์Šคํ‚ค๋งˆ๊ฐ€ ์ €์žฅ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์„น์…˜์„ ํฌํ•จํ•˜๊ณ  ์žˆ์–ด ์ž์‹ ์„ ์„ค๋ช…ํ•˜๋Š” ํŒŒ์ผ์ž„

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์€ ์••์ถ•๊ณผ ๋ถ„ํ•  ๊ธฐ๋Šฅ ์ œ๊ณต

 

์—์ด๋ธŒ๋กœ ์ž๋ฃŒํ˜•๊ณผ ์Šคํ‚ค๋งˆ

์—์ด๋ธŒ๋กœ์˜ ๊ธฐ๋ณธ ์ž๋ฃŒํ˜• ํ‘œ์‹œ

{"type": "null"}
{"type": "boolean"}
{"type": "int"}
{"type": "long"}
{"type": "float"}
{"type": "double"}
{"type": "bytes"}
{"type": "string"}

Avro complex types

array — an ordered collection of objects, all of the same type

{
 "type": "array",
 "items": "long"
}

map — unordered key-value pairs; values of the same type

{
 "type": "map",
 "values": "string"
}

record ์ž„์˜์˜ ์ž๋ฃŒํ˜•

{
 "type": "record",
 "name": "WeatherRecord",
 "doc": "A weather reading.",
 "fields": [
  {"name": "year", "type": "int"},
  {"name": "temperature", "type": "int"},
  {"name": "stationId", "type": "string"}
 ]
}

enum ๋ช…๋ช…๋œ ๊ฐ’์˜ ์ง‘ํ•ฉ

{
 "type": "enum",
 "name": "Cutlery",
 "doc": "An eating utensil.",
 "symbols": ["KNIFE", "FORK", "SPOON"]
}

fixed ๊ณ ์ •๊ธธ์ด 8๋น„ํŠธ ๋ถ€ํ˜ธ ์—†๋Š” ๋ฐ”์ดํŠธ

{
 "type": "fixed",
 "name": "Md5Hash",
 "size": 16
}

union — a union of schemas; each element of the array is a schema

[
 "null",
 "string",
 {"type": "map", "values": "string"}
]

 

 

์—์ด๋ธŒ๋กœ ์ž๋ฃŒํ˜•๊ณผ ๋‹ค๋ฅธ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด ์ž๋ฃŒํ˜• ๋งคํ•‘ ํ•„์š”

 

์ž๋ฐ” - ์ œ๋„ค๋ฆญ ๋งคํ•‘ : ์Šคํ‚ค๋งˆ๋ฅผ ๊ฒฐ์ •ํ•  ์ˆ˜ ์—†์„ ๋•Œ

์ž๋ฐ”, C++ - ๊ตฌ์ฒด์  ๋งคํ•‘ : ์Šคํ‚ค๋งˆ ๋ฐ์ดํ„ฐ ํ‘œํ˜„ ์ฝ”๋“œ ์ƒ์„ฑ

์ž๋ฐ” - ๋ฆฌํ”Œ๋ ‰ํŠธ ๋งคํ•‘ : ์—์ด๋ธŒ๋กœ ์ž๋ฃŒํ˜•์„ ๊ธฐ์กด ์ž๋ฐ” ์ž๋ฃŒํ˜•์œผ๋กœ ๋งคํ•‘ 

 

 

์ธ๋ฉ”๋ชจ๋ฆฌ ์ง๋ ฌํ™”์™€ ์—ญ์ง๋ ฌํ™”

์—์ด๋ธŒ๋กœ ์Šคํ‚ค๋งˆ ์˜ˆ์‹œ - ํ•ด๋‹น ์—์ด๋ธŒ๋กœ ์Šคํ‚ค๋งˆ๋Š” StringPair.avsc ์— ์ €์žฅ๋จ 

{
 "type": "record",
 "name": "StringPair",
 "doc": "A pair of strings.",
 "fields": [
  {"name": "left", "type": "string"},
  {"name": "right", "type": "string"}
 ]
}

ํŒŒ์ผ์„ ํด๋ž˜์Šค ๊ฒฝ๋กœ์— ์ €์žฅํ•œ ํ›„ ๋กœ๋”ฉํ•จ

Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(
    getClass().getResourceAsStream("StringPair.avsc"));

Create an instance of an Avro record using the generic API:

GenericRecord datum = new GenericData.Record(schema);
datum.put("left", "L");
datum.put("right", "R");

Serialize the record to an output stream:

ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer =
    new GenericDatumWriter<GenericRecord>(schema);
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(datum, encoder);
encoder.flush();
out.close();

DatumWriter : translates data objects into the types an encoder can understand

GenericDatumWriter : passes the fields of a GenericRecord to the encoder

We pass null to the encoder factory since we aren't reusing a previously constructed encoder.

write() can be called again on the stream before it is closed, if needed.

encoder.flush() is called after the write methods, then the output stream is closed.

 

Reversing the process reads the object back out of the byte buffer:

DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
GenericRecord result = reader.read(null, decoder);
assertThat(result.get("left").toString(), is("L"));
assertThat(result.get("right").toString(), is("R"));

 

 

๊ตฌ์ฒด์ ์ธ API

๋ฉ”์ด๋ธ์— ์ถ”๊ฐ€ํ•˜์—ฌ ์ž๋ฐ”๋กœ๋œ ์Šคํ‚ค๋งˆ ์ฝ”๋“œ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

<project>
 ...
 <build>
  <plugins>
   <plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>${avro.version}</version>
    <executions>
     <execution>
      <id>schemas</id>
      <phase>generate-sources</phase>
      <goals>
       <goal>schema</goal>
      </goals>
      <configuration>
       <includes>
        <include>StringPair.avsc</include>
       </includes>
       <stringType>String</stringType>
       <sourceDirectory>src/main/resources</sourceDirectory>
       <outputDirectory>${project.build.directory}/generated-sources/java
       </outputDirectory>
      </configuration>
     </execution>
    </executions>
   </plugin>
  </plugins>
 </build>
 ...
</project>

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ

์ธ๋ฉ”๋ชจ๋ฆฌ ์ŠคํŠธ๋ฆผ์—์„œ ํŒŒ์ผ ์ฝ๊ธฐ

๋ฐ์ดํ„ฐ ํŒŒ์ผ = ์—์ด๋ธŒ๋กœ ์Šคํ‚ค๋งˆ +  ํ—ค๋” (๋ฉ”ํƒ€๋ฐ์ดํ„ฐ (์‹ฑํฌ๋งˆ์ปค ํฌํ•จ)) +์ผ๋ จ์˜ ๋ธ”๋ก (์ง๋ ฌํ™”๋œ ์—์ด๋ธŒ๋กœ ๊ฐ์ฒด๊ฐ€ ์žˆ๋Š”)

๋ฐ์ดํ„ฐ ํŒŒ์ผ์— ๊ธฐ๋ก๋œ ๊ฐ์ฒด๋Š” ๋ฐ˜๋“œ์‹œ ํŒŒ์ผ์˜ ์Šคํ‚ค๋งˆ์™€ ์ผ์น˜ํ•ด์•ผํ•œ๋‹ค.

์ผ์น˜ํ•˜์ง€์•Š๋Š” ๊ฒฝ์šฐ์— append์‹œ ์˜ˆ์™ธ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค.

File file = new File("data.avro");
DatumWriter<GenericRecord> writer =
    new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter =
    new DataFileWriter<GenericRecord>(writer);
dataFileWriter.create(schema, file);
dataFileWriter.append(datum);
dataFileWriter.close();

 

ํŒŒ์ผ์— ํฌํ•จ๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ฐธ์กฐํ•˜์—ฌ ์ฝ๊ธฐ ๋•Œ๋ฌธ์— ์Šคํ‚ค๋งˆ๋ฅผ ๋”ฐ๋กœ ์ •์˜ํ•˜์ง€์•Š์•„๋„๋จ

getSchema()๋ฅผ ์ด์šฉํ•˜๋ฉด DataFileReader ์ธ์Šคํ„ด์Šค์˜ ์Šคํ‚ค๋งˆ ์ •๋ณด ์–ป์„ ์ˆ˜ ์žˆ๊ณ  ์›๋ณธ ๊ฐ์ฒด์— ์‚ฌ์šฉํ•œ ์Šคํ‚ค๋งˆ์™€ ๊ฐ™์€์ง€ ํ™•์ธ๊ฐ€๋Šฅ 

DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> dataFileReader =
    new DataFileReader<GenericRecord>(file, reader);
assertThat("Schema is the same", schema, is(dataFileReader.getSchema()));

DataFileReader is a regular Java iterator; calling its hasNext and next methods repeatedly

cycles through every data object. Check that there is exactly one record and that it has the expected field values:

assertThat(dataFileReader.hasNext(), is(true));
GenericRecord result = dataFileReader.next();
assertThat(result.get("left").toString(), is("L"));
assertThat(result.get("right").toString(), is("R"));
assertThat(dataFileReader.hasNext(), is(false));

์ƒํ˜ธ ์šด์˜์„ฑ

ํŒŒ์ด์ฌ API

import os
import string
import sys
from avro import schema
from avro import io
from avro import datafile

 

์—์ด๋ธŒ๋กœ ๋„๊ตฌ

% java -jar $AVRO_HOME/avro-tools-*.jar tojson pairs.avro

 

Schema resolution

Data can be read back using a reader schema different from the writer schema it was recorded with.

์ถ”๊ฐ€๋œ ํ•„๋“œ - reader ์‹ ๊ทœ์ผ ๋•Œ, reader๋Š” ์‹ ๊ทœ ํ•„๋“œ์˜ ๊ธฐ๋ณธ๊ฐ’์ด์šฉ

์ถ”๊ฐ€๋œ ํ•„๋“œ - writer ์‹ ๊ทœ์ผ ๋•Œ, reader๋Š” ์‹ ๊ทœ ํ•„๋“œ ๋ชจ๋ฅด๊ธฐ ๋•Œ๋ฌธ์— ๋ฌด์‹œํ•จ

์ œ๊ฑฐ๋œ ํ•„๋“œ - reader ์‹ ๊ทœ์ผ ๋•Œ, ์‚ญ์ œ๋œ ํ•„๋“œ ๋ฌด์‹œ

์ œ๊ฑฐ๋œ ํ•„๋“œ - writer ์‹ ๊ทœ์ผ ๋•Œ, ์ œ๊ฑฐ๋œ ํ•„๋“œ ๊ธฐ๋กํ•˜์ง€ ์•Š์Œ. reader ์˜ ์Šคํ‚ค๋งˆ๋ฅผ writer ์Šคํ‚ค๋งˆ์™€ ๊ฐ™๊ฒŒ ๋งž์ถ”๊ฑฐ๋‚˜ ์ด์ „์œผ๋กœ ๊ฐฑ์‹ ํ•จ

Sort order

Every type except record has a predefined sort order.

For records, the order property can be specified to control sorting:

ascending (the default)

descending

ignore (the field is skipped when comparing)
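
A sketch of a record schema using the order attribute — my guess at what the SortedStringPair.avsc used below might look like, not the actual file:

{
 "type": "record",
 "name": "StringPair",
 "doc": "A pair of strings, sorted by the right field descending.",
 "fields": [
  {"name": "left", "type": "string", "order": "ignore"},
  {"name": "right", "type": "string", "order": "descending"}
 ]
}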

 

Avro MapReduce

A weather record: 

{
 "type": "record",
 "name": "WeatherRecord",
 "doc": "A weather reading.",
 "fields": [
  {"name": "year", "type": "int"},
  {"name": "temperature", "type": "int"},
  {"name": "stationId", "type": "string"}
 ]
}

์ตœ๊ณ  ๊ธฐ์˜จ์„ ์ฐพ๋Š” ๋งต๋ฆฌ๋“€์Šค ํ”„๋กœ๊ทธ๋žจ, ์—์ด๋ธŒ๋กœ ์ถœ๋ ฅ ๋งŒ๋“ฆ

public class AvroGenericMaxTemperature extends Configured implements Tool {

  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{" +
      "  \"type\": \"record\"," +
      "  \"name\": \"WeatherRecord\"," +
      "  \"doc\": \"A weather reading.\"," +
      "  \"fields\": [" +
      "    {\"name\": \"year\", \"type\": \"int\"}," +
      "    {\"name\": \"temperature\", \"type\": \"int\"}," +
      "    {\"name\": \"stationId\", \"type\": \"string\"}" +
      "  ]" +
      "}"
  );

  public static class MaxTemperatureMapper
      extends Mapper<LongWritable, Text, AvroKey<Integer>, AvroValue<GenericRecord>> {
    private NcdcRecordParser parser = new NcdcRecordParser();
    private GenericRecord record = new GenericData.Record(SCHEMA);

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      parser.parse(value.toString());
      if (parser.isValidTemperature()) {
        record.put("year", parser.getYearInt());
        record.put("temperature", parser.getAirTemperature());
        record.put("stationId", parser.getStationId());
        context.write(new AvroKey<Integer>(parser.getYearInt()),
            new AvroValue<GenericRecord>(record));
      }
    }
  }

  public static class MaxTemperatureReducer
      extends Reducer<AvroKey<Integer>, AvroValue<GenericRecord>,
          AvroKey<GenericRecord>, NullWritable> {
    @Override
    protected void reduce(AvroKey<Integer> key, Iterable<AvroValue<GenericRecord>>
        values, Context context) throws IOException, InterruptedException {
      GenericRecord max = null;
      for (AvroValue<GenericRecord> value : values) {
        GenericRecord record = value.datum();
        if (max == null ||
            (Integer) record.get("temperature") > (Integer) max.get("temperature")) {
          max = newWeatherRecord(record);
        }
      }
      context.write(new AvroKey(max), NullWritable.get());
    }

    private GenericRecord newWeatherRecord(GenericRecord value) {
      GenericRecord record = new GenericData.Record(SCHEMA);
      record.put("year", value.get("year"));
      record.put("temperature", value.get("temperature"));
      record.put("stationId", value.get("stationId"));
      return record;
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.printf("Usage: %s [generic options] <input> <output>\n",
          getClass().getSimpleName());
      ToolRunner.printGenericCommandUsage(System.err);
      return -1;
    }
    Job job = new Job(getConf(), "Max temperature");
    job.setJarByClass(getClass());
    job.getConfiguration().setBoolean(
        Job.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.INT));
    AvroJob.setMapOutputValueSchema(job, SCHEMA);
    AvroJob.setOutputKeySchema(job, SCHEMA);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(AvroKeyOutputFormat.class);
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new AvroGenericMaxTemperature(), args);
    System.exit(exitCode);
  }
}

Running the program:

export HADOOP_CLASSPATH=avro-examples.jar
export HADOOP_USER_CLASSPATH_FIRST=true # override version of Avro in Hadoop
% hadoop jar avro-examples.jar AvroGenericMaxTemperature \
 input/ncdc/sample.txt output

Printing the result:

% java -jar $AVRO_HOME/avro-tools-*.jar tojson output/part-r-00000.avro
{"year":1949,"temperature":111,"stationId":"012650-99999"}
{"year":1950,"temperature":22,"stationId":"011990-99999"}

 

์—์ด๋ธŒ๋กœ ๋งต๋ฆฌ๋“€์Šค ์ด์šฉํ•ด ์ •๋ ฌ

์—์ด๋ธŒ๋กœ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์„ ์ •๋ ฌํ•˜๋Š” ๋งต๋ฆฌ๋“€์Šค ํ”„๋กœ๊ทธ๋žจ

public class AvroSort extends Configured implements Tool {

  static class SortMapper<K> extends Mapper<AvroKey<K>, NullWritable,
      AvroKey<K>, AvroValue<K>> {
    @Override
    protected void map(AvroKey<K> key, NullWritable value,
        Context context) throws IOException, InterruptedException {
      context.write(key, new AvroValue<K>(key.datum()));
    }
  }

  static class SortReducer<K> extends Reducer<AvroKey<K>, AvroValue<K>,
      AvroKey<K>, NullWritable> {
    @Override
    protected void reduce(AvroKey<K> key, Iterable<AvroValue<K>> values,
        Context context) throws IOException, InterruptedException {
      for (AvroValue<K> value : values) {
        context.write(new AvroKey(value.datum()), NullWritable.get());
      }
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 3) {
      System.err.printf(
          "Usage: %s [generic options] <input> <output> <schema-file>\n",
          getClass().getSimpleName());
      ToolRunner.printGenericCommandUsage(System.err);
      return -1;
    }

    String input = args[0];
    String output = args[1];
    String schemaFile = args[2];
    Job job = new Job(getConf(), "Avro sort");
    job.setJarByClass(getClass());
    job.getConfiguration().setBoolean(Job.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
    FileInputFormat.addInputPath(job, new Path(input));
    FileOutputFormat.setOutputPath(job, new Path(output));
    AvroJob.setDataModelClass(job, GenericData.class);
    Schema schema = new Schema.Parser().parse(new File(schemaFile));
    AvroJob.setInputKeySchema(job, schema);
    AvroJob.setMapOutputKeySchema(job, schema);
    AvroJob.setMapOutputValueSchema(job, schema);
    AvroJob.setOutputKeySchema(job, schema);
    job.setInputFormatClass(AvroKeyInputFormat.class);
    job.setOutputFormatClass(AvroKeyOutputFormat.class);
    job.setOutputKeyClass(AvroKey.class);
    job.setOutputValueClass(NullWritable.class);
    job.setMapperClass(SortMapper.class);
    job.setReducerClass(SortReducer.class);
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new AvroSort(), args);
    System.exit(exitCode);
  }
}

์ •๋ ฌ์€ ๋งต๋ฆฌ๋“€์Šค ์…”ํ”Œ ๊ณผ์ •์—์„œ ์ผ์–ด๋‚˜๋ฉฐ ์ •๋ ฌ๊ธฐ๋Šฅ์€ ์—์ด๋ธŒ๋กœ์˜ ์Šคํ‚ค๋งˆ์— ์˜ํ•ด ์ •ํ•ด์ง

์ž…๋ ฅ๋ฐ์ดํ„ฐ ์ ๊ฒ€

% java -jar $AVRO_HOME/avro-tools-*.jar tojson input/avro/pairs.avro
{"left":"a","right":"1"}
{"left":"c","right":"2"}
{"left":"b","right":"3"}
{"left":"b","right":"2"}

Sorting it with the program:

% hadoop jar avro-examples.jar AvroSort input/avro/pairs.avro output \
 ch12-avro/src/main/resources/SortedStringPair.avsc

Printing the sorted output file:

% java -jar $AVRO_HOME/avro-tools-*.jar tojson output/part-r-00000.avro
{"left":"b","right":"3"}
{"left":"b","right":"2"}
{"left":"c","right":"2"}
{"left":"a","right":"1"}