Using Apache Spark 2.0 and Python, I'll show how to import a table from a relational database (using its JDBC driver) into a Spark DataFrame and save it to a Parquet file. In this demo the database is an Oracle 12.x instance. File jdbc-to-parquet.py:

```
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# Read the table through the Oracle JDBC driver
df = spark.read.format("jdbc").options(
    url="jdbc:oracle:thin:ro/ro@mydboracle.redaelli.org:1521:MYSID",
    dbtable="myuser.dim_country",
    driver="oracle.jdbc.OracleDriver").load()

# Save the DataFrame as a Parquet file
df.write.parquet("country.parquet")
```
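Note that the Oracle JDBC driver is not bundled with Spark, so its jar must be on the classpath when the script runs, e.g. `spark-submit --jars ojdbc7.jar jdbc-to-parquet.py` (ojdbc7.jar is the usual jar name for the Oracle 12c driver; adjust it to your client version).

To check the result, the Parquet output can be read back into a DataFrame. A minimal sketch, assuming the script above has already run in the same directory:

```
# Read the Parquet file back and print a few rows
df2 = spark.read.parquet("country.parquet")
df2.show(5)
```

Keep in mind that `country.parquet` is actually a directory containing one or more Parquet part files, which is how Spark writes the output of its parallel tasks.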