๋ฐ˜์‘ํ˜•

partitionColumn: the column used to determine the partitions.

lowerBound and upperBound: determine the range of partitionColumn values that is split into partitions.

 

The bounds are not applied as a filter. Every row of the table is read, so the full data set corresponds to this query:

 

SELECT * FROM table

For example, with these settings:

  • lowerBound: 0
  • upperBound: 1000
  • numPartitions: 10

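The stride is 100, and the partitions correspond to the following queries. The first and last partitions are open-ended, so rows outside the bounds (and NULLs) still land in a partition.

  • SELECT * FROM table WHERE partitionColumn < 100 OR partitionColumn IS NULL
  • SELECT * FROM table WHERE partitionColumn >= 100 AND partitionColumn < 200
  • ...
  • SELECT * FROM table WHERE partitionColumn >= 900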

 

 

 

The stride is computed roughly as:

stride = upperBound / numPartitions - lowerBound / numPartitions
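
The stride formula and the per-partition queries can be reproduced in plain Python. This is only a sketch of the behavior, not Spark's actual code: the function name partition_predicates is hypothetical, it assumes non-negative integer bounds with integer division on each term, and it assumes at least two partitions.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clauses used for a partitioned JDBC read.

    Hypothetical helper for illustration; assumes non-negative integer
    bounds and num_partitions >= 2.
    """
    if num_partitions < 2:
        raise ValueError("this sketch assumes at least two partitions")

    # Integer division on each term, as in the stride formula.
    stride = upper_bound // num_partitions - lower_bound // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < num_partitions - 1 else None

        if lower and upper:
            predicates.append(f"{lower} AND {upper}")
        elif upper:
            # The first partition also picks up NULLs and values below lowerBound.
            predicates.append(f"{upper} OR {column} IS NULL")
        else:
            # The last partition picks up everything at or above its start value.
            predicates.append(lower)
    return stride, predicates


stride, predicates = partition_predicates("partitionColumn", 0, 1000, 10)
for predicate in predicates:
    print(f"SELECT * FROM table WHERE {predicate}")
```

Because the first and last clauses are open-ended, badly chosen bounds do not drop rows. They only skew how many rows each partition receives.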

 

How to obtain lowerBound and upperBound

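# Placeholders: replace with the real partition column and table name.
partitionColumn = "partitionColumn"
table = "db.table"

# Spark does not allow "dbtable" and "query" together, so only "query" is set here.
query = f"SELECT MIN({partitionColumn}), MAX({partitionColumn}) FROM {table}"
min_max_df = spark.read \
	.format("jdbc") \
	.option("url", "jdbc:postgresql:postgres") \
	.option("user", "user") \
	.option("password", "pass") \
	.option("query", query) \
	.load()
# The single result row holds (min, max) of the partition column.
lowerBound, upperBound = min_max_df.collect()[0]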
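
Once the bounds are known, they can be passed into the partitioned read. The sketch below only builds the option map so it runs without a database. The helper name partitioned_jdbc_options is hypothetical, and the URL, table, and column values are the same placeholders used above. The option keys are the standard Spark JDBC option names.

```python
def partitioned_jdbc_options(url, table, column, lower_bound, upper_bound, num_partitions):
    """Build the option map for a partitioned JDBC read (hypothetical helper)."""
    return {
        "url": url,
        # "dbtable" is used here because "query" cannot be combined
        # with "partitionColumn".
        "dbtable": table,
        "partitionColumn": column,
        # Option values are passed as strings.
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


# Placeholder values; in practice the bounds come from the MIN/MAX query.
options = partitioned_jdbc_options(
    "jdbc:postgresql:postgres", "db.table", "partitionColumn", 0, 1000, 10
)
```

With a live session, the map can be applied with spark.read.format("jdbc").options(**options), followed by the user and password options and load().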

์ŠคํŒŒํฌ ์„ธ์…˜ ์ƒ์„ฑ

import org.apache.spark.sql.SparkSession

val spark = SparkSession
    .builder()
    .appName("Spark Session")
    .config("config.name", "config.value")
    .getOrCreate()

 

 

์ŠคํŒŒํฌ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ -> ์ŠคํŒŒํฌ ๋“œ๋ผ์ด๋ฒ„ ํ”„๋กœ๊ทธ๋žจ -> ์ŠคํŒŒํฌ ์„ธ์…˜ ๊ฐ์ฒด

 

์ŠคํŒŒํฌ ๋“œ๋ผ์ด๋ฒ„๋Š” ์ŠคํŒŒํฌ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ํ•˜๋‚˜์ด์ƒ์˜ ์žก์œผ๋กœ ๋ณ€ํ™˜ 

์žก์—๋Š” ์—ฐ์‚ฐ์ด ์—ฌ๋Ÿฌ๊ฐœ์˜ ์ŠคํŒŒํฌ ์Šคํ…Œ์ด์ง€๋กœ ๋‚˜๋‰จ

๊ฐ ์Šคํ…Œ์ด์ง€๋Š” ์ตœ์†Œ ์‹คํ–‰ ๋‹จ์œ„์ด๋ฉฐ ์—ฐํ•ฉ ์‹คํ–‰๋˜๋Š” ์ŠคํŒŒํฌ ํƒœ์Šคํฌ๋“ค๋กœ ์ด๋ฃจ์–ด์ง

 
