Apache Spark howto import data from a jdbc database using python

Using Apache spark 2.0 and python I’ll show how to import a table from a relational database (using its jdbc driver) into a python dataframe and save it in a parquet file. In this demo the database is an oracle 12.x file jdbc-to-parquet.py:``` from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName(“Python Spark SQL basic example”) \ .getOrCreate() df = spark.read.format(“jdbc”).options(url=“jdbc:oracle:thin:ro/ro@mydboracle.redaelli.org:1521:MYSID”, dbtable=“myuser.dim_country”, driver=“oracle.jdbc.OracleDriver”).load() df.write.parquet(“country.parquet”)

October 27, 2016 · 1 min · 66 words · Matteo Redaelli