data_lab
spark mysql option
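partitionColumn: the column used to decide how rows are split into partitions. It must be a numeric, date, or timestamp column.
lowerBound and upperBound: the range of partitionColumn values used to compute the partition stride. They do not filter rows, so rows outside this range are still read.
Taken together, the partitions cover the whole table, which is equivalent to the following query:
SELECT * FROM table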
- lowerBound: 0
- upperBound: 1000
- numPartitions: 10
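With these settings the stride is 100, and the partitions correspond to the following queries. The first partition also picks up NULLs and values below lowerBound, and the last partition has no upper limit.
- SELECT * FROM table WHERE partitionColumn < 100 OR partitionColumn IS NULL
- SELECT * FROM table WHERE partitionColumn >= 100 AND partitionColumn < 200
- ...
- SELECT * FROM table WHERE partitionColumn >= 900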
The stride is computed as:
stride = upperBound / numPartitions - lowerBound / numPartitions
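The stride formula and the resulting per-partition queries can be sketched in plain Python. This is a simplified illustration, not Spark's actual source: the function name `partition_predicates` is hypothetical, it assumes non-negative integer bounds, and it ignores the adjustments Spark makes when the range is smaller than numPartitions.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE clause per partition (simplified, integer bounds only)."""
    if num_partitions <= 1:
        # A single partition reads the whole table, so no predicate is needed.
        return []

    # Same formula as in the post, using integer division.
    stride = upper_bound // num_partitions - lower_bound // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        # The last partition has no upper limit.
        upper = f"{column} < {current}" if i != num_partitions - 1 else None

        if lower is None:
            # The first partition also collects NULL values.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


for clause in partition_predicates("partitionColumn", 0, 1000, 10):
    print(f"SELECT * FROM table WHERE {clause}")
```

Because the first and last clauses are open-ended, choosing bounds that are too narrow does not drop rows. It only makes the edge partitions larger than the others.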
How to find lowerBound and upperBound
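# Assumes an existing SparkSession named `spark` and a partitionColumn variable set by the caller.
table = "db.table"
query = f"SELECT MIN({partitionColumn}) AS min_val, MAX({partitionColumn}) AS max_val FROM {table}"
min_max_df = (
    spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql:postgres")
    .option("user", "user")
    .option("password", "pass")
    # "dbtable" and "query" cannot be set at the same time, so only "query" is used here.
    .option("query", query)
    .load()
)
# The result is a single row: (min_val, max_val).
lowerBound, upperBound = min_max_df.collect()[0]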
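Once lowerBound and upperBound are known, they can be passed to the partitioned read. The sketch below is one way to do that: the helper names `jdbc_partition_options` and `read_partitioned` are hypothetical, and the URL, table, and credentials are the same placeholders as above. Note that partitionColumn cannot be combined with the "query" option, so the table is passed through "dbtable".

```python
def jdbc_partition_options(url, table, column, lower_bound, upper_bound, num_partitions):
    """Build the JDBC options for a partitioned read (hypothetical helper)."""
    return {
        "url": url,
        # partitionColumn requires "dbtable"; it cannot be used with "query".
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options, user, password):
    """Run the partitioned read on an existing SparkSession passed in by the caller."""
    return (
        spark.read
        .format("jdbc")
        .options(**options)
        .option("user", user)
        .option("password", password)
        .load()
    )


# Example values taken from the post: bounds 0 and 1000, 10 partitions.
opts = jdbc_partition_options(
    "jdbc:postgresql:postgres", "db.table", "partitionColumn", 0, 1000, 10
)
print(opts)
# With a live SparkSession and database:
# df = read_partitioned(spark, opts, "user", "pass")
```

Keeping the options in a dictionary makes it easy to reuse the values returned by the MIN/MAX query for several tables.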