[Spark] Apache Spark Command Summary

🐰히히 2021. 5. 2. 22:27

When launching Spark, you can specify how much memory and how many cores it should use.

 

x = sc.parallelize(["spark", "rdd", "example", "sample", "example"], 3)  # parallelize: create an RDD split into 3 partitions

y = x.map(lambda w: (w, 1))  # input: w, output: (w, 1); map is a transformation

y.collect()  # collect is an action; it returns the results to the driver

[('spark', 1), ('rdd', 1), ('example', 1), ('sample', 1), ('example', 1)]
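The three steps above (parallelize, map, collect) can be wrapped in a small function. This is only a minimal sketch: it assumes it is called from the pyspark shell, where `sc` already exists, and the names `to_pair` and `pair_words` are hypothetical helpers introduced here for illustration.

```python
def to_pair(word):
    """Turn one word into a (word, 1) pair, like the lambda in the example."""
    return (word, 1)


def pair_words(sc, words, num_slices=3):
    """Spread words over num_slices partitions, pair each with 1, and collect."""
    rdd = sc.parallelize(words, num_slices)  # create the RDD
    pairs = rdd.map(to_pair)                 # transformation: nothing runs yet
    return pairs.collect()                   # action: runs the job, returns a list


# In the pyspark shell (sc is provided by the shell):
# pair_words(sc, ["spark", "rdd", "example", "sample", "example"])
```

Passing `sc` in as an argument keeps the function usable both in the shell and in a submitted script that creates its own SparkContext.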

 

Running Spark on YARN

scala : spark-shell --master yarn --queue queue_name

python : pyspark --master yarn --queue queue_name

--driver-memory 3G : memory for the Spark driver (default: 1024M)

--executor-memory 3G : memory for each Spark executor

--executor-cores NUM : number of cores for each Spark executor
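These options can be combined on a single command line. The sketch below only assembles such a command as a list of arguments and prints it; it does not launch Spark. The function name `build_yarn_command` and its default values are assumptions made for illustration, and only the flags listed above are used.

```python
def build_yarn_command(shell="pyspark", queue="queue_name",
                       driver_memory="3G", executor_memory="3G",
                       executor_cores=2):
    """Return the argument list for starting a Spark shell on YARN."""
    return [
        shell,                                    # "pyspark" or "spark-shell"
        "--master", "yarn",
        "--queue", queue,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),  # flags must be strings
    ]


print(" ".join(build_yarn_command()))
# pyspark --master yarn --queue queue_name --driver-memory 3G --executor-memory 3G --executor-cores 2
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later without worrying about shell quoting.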

How to run a file you have written with Spark

Python file

spark-submit --master local[num] filename.py

  (num is the number of worker threads, typically around 2 to 4; local[*] uses one thread per logical core)

Java, Scala

spark-submit --class "SimpleApp" --master local[num] /location~/name.jar

 
