[Hadoop] Testing Hadoop 3.1.1 / How to Test a Hadoop Installation
After installing Hadoop and configuring it for your environment, you need to verify that the installation actually works.
The method described here uses the examples jar that ships with Hadoop to generate random data, sort it in parallel, and validate the sorted output, which also serves as a basic performance test.
I had trouble finding an example on the official site, so I followed the Microsoft Azure documentation for this test.
https://docs.microsoft.com/ko-kr/azure/hdinsight/hadoop/apache-hadoop-run-samples-linux
10GB GraySort example
- TeraGen: a MapReduce program that generates the rows of data to sort
- TeraSort: samples the input data and uses MapReduce to sort the data into a total order
- TeraSort is a standard MapReduce sort except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the outputs of reduce i+1.
- TeraValidate: a MapReduce program that verifies that the output is globally sorted
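The partitioning rule described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not TeraSort's actual implementation: pick_reducer is a hypothetical helper, it uses small integer keys instead of the 10-byte keys that TeraSort sorts, and reducers are numbered from 0.

```shell
# Hypothetical helper: print the reducer index for a key, given the sorted
# list of N-1 sample keys (integer keys are an assumption for illustration).
# Usage: pick_reducer KEY SAMPLE_1 SAMPLE_2 ... SAMPLE_N-1
pick_reducer() (
    key=$1
    shift
    i=0
    for sample in "$@"; do
        # The first sample that is greater than the key closes the range.
        if [ "$key" -lt "$sample" ]; then
            break
        fi
        i=$((i + 1))
    done
    echo "$i"
)

pick_reducer 25 10 20 30   # prints 2, because 20 <= 25 < 30
```

With three sample keys there are four reducers. Because every key sent to reducer i is smaller than every key sent to reducer i+1, concatenating the reducer outputs in order gives a globally sorted result.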
Generate 10GB of data, which is stored in the cluster's default storage.
$ yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar \
    teragen -Dmapred.map.tasks=50 100000000 /example/data/10GB-sort-input
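The row count passed to teragen follows from the record size: TeraGen writes 100-byte rows, so 100000000 rows come to 10,000,000,000 bytes. A small sketch of that arithmetic, where teragen_rows is a hypothetical helper name and the target size is given in bytes:

```shell
# Hypothetical helper: number of TeraGen rows needed for a target size.
# Each TeraGen row is 100 bytes, so rows = bytes / 100.
teragen_rows() {
    echo $(( $1 / 100 ))
}

teragen_rows 10000000000   # prints 100000000
```

The same helper can size a smaller smoke test before committing to the full 10GB run, for example 1000000000 bytes for 1GB.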
Sort the data.
$ yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar \
    terasort -Dmapred.map.tasks=50 -Dmapred.reduce.tasks=25 /example/data/10GB-sort-input /example/data/10GB-sort-output
Validate the data produced by the sort.
$ yarn jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar \
    teravalidate -Dmapred.map.tasks=50 -Dmapred.reduce.tasks=25 /example/data/10GB-sort-output /example/data/10GB-sort-validate
The name and location of the examples jar differ depending on the Hadoop version.
Before running the test, use the jps command to confirm that the Hadoop daemons are up.
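The jps check can be scripted. The sketch below assumes jps prints one "PID ClassName" pair per line; missing_daemons is a hypothetical helper, and which daemons you expect on a given host depends on your cluster layout.

```shell
# Hypothetical helper: read jps output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() (
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare against the second column so that NameNode does not
        # match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
)

# Example (the daemon list is an assumption for a single-node setup):
# jps | missing_daemons NameNode DataNode ResourceManager NodeManager
```

Empty output means every listed daemon was found. Any name that is printed should be started before running the test.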
If this test completes without problems, the Map and Reduce functions are working.
If the test does not run correctly, the configuration values need to be adjusted.
The test also fails when the NameNode and the DataNodes are not connected properly.
Check this with hdfs dfsadmin -report. In most cases, deleting the directory where the DataNode stores its data on the DataNode host (which also discards any blocks held there) and then restarting restores normal operation.
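One way to automate the connectivity check is to pull the live DataNode count out of the report. This sketch assumes the report contains a line of the form "Live datanodes (N):", which should be confirmed against the output of your own cluster; live_datanodes is a hypothetical helper.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and print
# the number of live DataNodes, or nothing if no such line is present.
# Assumes a line shaped like "Live datanodes (3):".
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}

# Example:
# n=$(hdfs dfsadmin -report | live_datanodes)
# [ "${n:-0}" -gt 0 ] || echo "no live DataNodes; check NameNode/DataNode connectivity"
```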
The log below is from a TeraSort run that completed successfully.
2021-06-28 11:04:18,129 INFO mapreduce.Job: Job job_1624841289722_0002 completed successfully
2021-06-28 11:04:18,257 INFO mapreduce.Job: Counters: 55
    File System Counters
        FILE: Number of bytes read=10400013350
        FILE: Number of bytes written=20816541505
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=10000006150
        HDFS: Number of bytes written=10000000000
        HDFS: Number of read operations=275
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=50
    Job Counters
        Killed map tasks=2
        Killed reduce tasks=2
        Launched map tasks=50
        Launched reduce tasks=27
        Data-local map tasks=50
        Total time spent by all maps in occupied slots (ms)=3529488
        Total time spent by all reduces in occupied slots (ms)=4102629
        Total time spent by all map tasks (ms)=1764744
        Total time spent by all reduce tasks (ms)=1367543
        Total vcore-milliseconds taken by all map tasks=1764744
        Total vcore-milliseconds taken by all reduce tasks=1367543
        Total megabyte-milliseconds taken by all map tasks=3614195712
        Total megabyte-milliseconds taken by all reduce tasks=4201092096
    Map-Reduce Framework
        Map input records=100000000
        Map output records=100000000
        Map output bytes=10200000000
        Map output materialized bytes=10400007500
        Input split bytes=6150
        Combine input records=0
        Combine output records=0
        Reduce input groups=100000000
        Reduce shuffle bytes=10400007500
        Reduce input records=100000000
        Reduce output records=100000000
        Spilled Records=200000000
        Shuffled Maps =1250
        Failed Shuffles=0
        Merged Map outputs=1250
        GC time elapsed (ms)=32863
        CPU time spent (ms)=581450
        Physical memory (bytes) snapshot=61669625856
        Virtual memory (bytes) snapshot=247083884544
        Total committed heap usage (bytes)=68783828992
        Peak Map Physical memory (bytes)=817356800
        Peak Map Virtual memory (bytes)=2748321792
        Peak Reduce Physical memory (bytes)=869761024
        Peak Reduce Virtual memory (bytes)=4411772928
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=10000000000
    File Output Format Counters
        Bytes Written=10000000000
2021-06-28 11:04:18,273 INFO terasort.TeraSort: done
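A quick sanity check on a log like this is to compare a few counters, for instance that every input record reached the output. The sketch below assumes the job log was saved to a file (terasort.log is a made-up name) and that counters appear as "name=value" lines; counter_value is a hypothetical helper.

```shell
# Hypothetical helper: read a job log on stdin and print the value of the
# first counter whose name matches the argument exactly. Leading indentation
# and a stray space before "=" (as in "Shuffled Maps =1250") are ignored.
counter_value() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop indentation
            eq = index(line, "=")
            if (eq == 0) next                 # not a counter line
            key = substr(line, 1, eq - 1)
            sub(/[ \t]+$/, "", key)           # drop space before "="
            if (key == name) { print substr(line, eq + 1); exit }
        }'
}

# Example (terasort.log is an assumed file name):
# in=$(counter_value 'Map input records' < terasort.log)
# out=$(counter_value 'Reduce output records' < terasort.log)
# [ "$in" = "$out" ] && echo "record counts match: $in"
```

In the run shown here both counters are 100000000, the Shuffle Errors are all 0, and Shuffled Maps is 1250, which equals the 50 map tasks times the 25 reduce tasks requested on the command line.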