
[Hadoop] Testing Hadoop 3.1.1 / How to test a Hadoop installation


🐰히히 2021. 6. 28. 11:12

After installing Hadoop and configuring it for your environment, you need to verify that the installation actually works.

 

Using the examples jar that ships with Hadoop, this test generates random data, sorts it in parallel, and validates the sorted output, which also serves as a basic performance check.

 

Examples were hard to find on the official site, so I followed the Microsoft Azure documentation for this test.

 

https://docs.microsoft.com/ko-kr/azure/hdinsight/hadoop/apache-hadoop-run-samples-linux

 


 

10GB GraySort example

  • TeraGen: a MapReduce program that generates the rows of data to sort.
  • TeraSort: samples the input data and uses MapReduce to sort the data into a total order.
  • TeraSort is a standard MapReduce sort except for its custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, every key such that sample[i-1] <= key < sample[i] is sent to reduce i. This guarantees that all output of reduce i is less than the output of reduce i+1.
  • TeraValidate: a MapReduce program that validates that the output is globally sorted.
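
The partitioning rule in the third bullet can be sketched in a few lines of shell. This is only an illustration using a linear scan: `reduce_for_key` and the split points passed to it are hypothetical, it is not TeraSort's actual implementation, and reduces are numbered from 0 here.

```shell
# Hypothetical helper: print the index of the reduce that should receive KEY.
# Usage: reduce_for_key KEY SPLIT_1 ... SPLIT_N-1   (split points sorted ascending)
# With N-1 split points there are N reduces, numbered 0..N-1.
reduce_for_key() {
    key=$1
    shift
    i=0
    for split in "$@"; do
        # Stop at the first split point that is strictly greater than the key.
        # Appending "" forces awk to compare as strings; LC_ALL=C gives byte order.
        if LC_ALL=C awk -v k="$key" -v s="$split" 'BEGIN { exit !((k "") < (s "")) }'; then
            break
        fi
        i=$((i + 1))
    done
    echo "$i"
}
```

For example, `reduce_for_key monkey g n t` prints `1`, because `g <= monkey < n`.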

Generate 10 GB of data, which is stored in the cluster's default storage.

$ yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar \
    teragen -Dmapred.map.tasks=50 100000000 /example/data/10GB-sort-input
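
The row count passed to teragen follows from the row size: each TeraGen row is 100 bytes, which agrees with the counters in the log at the end of this post (100000000 records, 10000000000 bytes). A small sketch of that arithmetic, where `teragen_rows` is a hypothetical helper name:

```shell
# TeraGen writes fixed-size 100-byte rows, so the row count determines the data size.
ROW_BYTES=100

# Hypothetical helper: print the number of rows needed for a target size in bytes.
teragen_rows() {
    echo $(( $1 / $ROW_BYTES ))
}

# 10 GB (decimal), the size used in this post.
teragen_rows 10000000000   # prints 100000000
```

To test with a smaller data set, pass a smaller byte count and use the printed value as the teragen row argument.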

 

Sort the data.

$ yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar \
    terasort -Dmapred.map.tasks=50 -Dmapred.reduce.tasks=25 \
    /example/data/10GB-sort-input /example/data/10GB-sort-output

 

Validate the data produced by the sort.

$ yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar \
    teravalidate -Dmapred.map.tasks=50 -Dmapred.reduce.tasks=25 \
    /example/data/10GB-sort-output /example/data/10GB-sort-validate

 

The name and location of the examples jar differ depending on the Hadoop version.

 

Before running the test, use the jps command to check that the Hadoop daemons are up.

If the test completes without problems, the Map and Reduce functions are working.
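
The jps check can be scripted roughly as follows. This is a sketch, not a standard tool: `missing_daemons` is a hypothetical helper, and the daemon list is an assumption for a single-node setup that should be adjusted to the role of each node.

```shell
# Daemons expected on this node (an assumption; adjust per node role).
EXPECTED_DAEMONS="NameNode DataNode ResourceManager NodeManager"

# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in $EXPECTED_DAEMONS; do
        # jps prints "<pid> <main class name>", one process per line.
        # Exact match on field 2 keeps SecondaryNameNode from counting as NameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Running `jps | missing_daemons` on a node prints nothing when every expected daemon is up.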

 

If the test does not run correctly, the config values need to be corrected.

 

The test also fails when the NameNode and the DataNodes are not properly connected.

Check the connection with hdfs dfsadmin -report. In most cases, deleting the directory where the DataNode stores its data (on the DataNode itself) and then restarting restores normal operation; this discards any blocks held on that node, so it suits a fresh test cluster.
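
One way to read the report quickly is to pull out the number of live DataNodes. The sketch below assumes the report contains a line of the form "Live datanodes (N):"; `live_datanodes` is a hypothetical helper and prints nothing if no such line is found.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and print
# the live DataNode count.
# Assumes the report has a line of the form "Live datanodes (N):".
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}
```

With a running cluster this would be used as `hdfs dfsadmin -report | live_datanodes`; a result lower than the number of DataNodes you started points to the connection problem described above.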

 

The log below is the output after TeraSort finished successfully.

 

2021-06-28 11:04:18,129 INFO mapreduce.Job: Job job_1624841289722_0002 completed successfully
2021-06-28 11:04:18,257 INFO mapreduce.Job: Counters: 55
        File System Counters
                FILE: Number of bytes read=10400013350
                FILE: Number of bytes written=20816541505
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=10000006150
                HDFS: Number of bytes written=10000000000
                HDFS: Number of read operations=275
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=50
        Job Counters
                Killed map tasks=2
                Killed reduce tasks=2
                Launched map tasks=50
                Launched reduce tasks=27
                Data-local map tasks=50
                Total time spent by all maps in occupied slots (ms)=3529488
                Total time spent by all reduces in occupied slots (ms)=4102629
                Total time spent by all map tasks (ms)=1764744
                Total time spent by all reduce tasks (ms)=1367543
                Total vcore-milliseconds taken by all map tasks=1764744
                Total vcore-milliseconds taken by all reduce tasks=1367543
                Total megabyte-milliseconds taken by all map tasks=3614195712
                Total megabyte-milliseconds taken by all reduce tasks=4201092096
        Map-Reduce Framework
                Map input records=100000000
                Map output records=100000000
                Map output bytes=10200000000
                Map output materialized bytes=10400007500
                Input split bytes=6150
                Combine input records=0
                Combine output records=0
                Reduce input groups=100000000
                Reduce shuffle bytes=10400007500
                Reduce input records=100000000
                Reduce output records=100000000
                Spilled Records=200000000
                Shuffled Maps =1250
                Failed Shuffles=0
                Merged Map outputs=1250
                GC time elapsed (ms)=32863
                CPU time spent (ms)=581450
                Physical memory (bytes) snapshot=61669625856
                Virtual memory (bytes) snapshot=247083884544
                Total committed heap usage (bytes)=68783828992
                Peak Map Physical memory (bytes)=817356800
                Peak Map Virtual memory (bytes)=2748321792
                Peak Reduce Physical memory (bytes)=869761024
                Peak Reduce Virtual memory (bytes)=4411772928
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=10000000000
        File Output Format Counters
                Bytes Written=10000000000
2021-06-28 11:04:18,273 INFO terasort.TeraSort: done
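
A quick sanity check on a log like this is that no records were lost between map input and reduce output. The sketch below assumes the job log has been saved to a file; `counter_value` and `records_preserved` are hypothetical helper names.

```shell
# Hypothetical helper: print the value of a named counter from a saved job log.
# Usage: counter_value "Counter name" LOGFILE
counter_value() {
    awk -F= -v name="$1" '
        { key = $1; sub(/^[ \t]+/, "", key) }   # strip the log indentation
        key == name { print $2; exit }
    ' "$2"
}

# Succeeds when the sort wrote exactly as many records as it read.
# Usage: records_preserved LOGFILE
records_preserved() {
    records_in=$(counter_value "Map input records" "$1")
    records_out=$(counter_value "Reduce output records" "$1")
    [ -n "$records_in" ] && [ "$records_in" = "$records_out" ]
}
```

For the log above, both counters are 100000000, so `records_preserved` would succeed.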
 