The purpose of this blog post (including the source code) is to demonstrate a reliable way to connect to a PostgreSQL server from Spark 2.1.1.
In this case, the JDBC driver file is "postgresql-42.1.1.jar".
val driver = "org.postgresql.Driver"
connectionProperties.put("driver", driver)
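The two lines above only work once a `Properties` object and a JDBC URL exist. Here is a minimal sketch of that setup. The host, port, database name, user, and password are placeholders chosen for illustration and are not taken from the original post.

```scala
import java.util.Properties

// Hypothetical connection details: replace with your own server settings.
val host = "localhost"
val port = 5432
val database = "testdb"

// Standard PostgreSQL JDBC URL format: jdbc:postgresql://host:port/database
val url = s"jdbc:postgresql://$host:$port/$database"

val driver = "org.postgresql.Driver"

val connectionProperties = new Properties()
connectionProperties.put("user", "spark_user")     // placeholder user
connectionProperties.put("password", "change_me")  // placeholder password
connectionProperties.put("driver", driver)         // tells Spark which JDBC driver class to load
```

Setting the `driver` property explicitly avoids relying on the driver being discovered automatically from the classpath.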
The following is a "popular" SQL question :) Given a table of employees with their salaries and departments, find the three highest salaries in each department.
The employee table in the PostgreSQL database:
create table employee (
    id int,
    name char(50),
    salary int,
    department char(50)
);
| id| name|salary| department|
| 1|Joe ...| 70000|IT ...|
| 2|Henry ...| 80000|Sales ...|
| 3|Sam ...| 60000|Sales ...|
| 4|Max ...| 90000|IT ...|
| 5|Janet ...| 69000|IT ...|
| 6|Randy ...| 85000|IT ...|
The three highest salaries in each department:
| department| name|salary|
|IT ...|Max ...| 90000|
|IT ...|Randy ...| 85000|
|IT ...|Joe ...| 70000|
|Sales ...|Henry ...| 80000|
|Sales ...|Sam ...| 60000|
// url holds the JDBC connection string for the PostgreSQL server
val employees_table = spark.read.jdbc(url, "employee", connectionProperties).cache()
select department, name, salary
from (
select department, name, salary, dense_rank() over (partition by department order by salary desc) salary_rank
from global_temp.employee
) t
where salary_rank <= 3
order by department, salary desc
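To see what the `dense_rank()` filter in the query above selects, the same logic can be sketched with plain Scala collections, without a cluster or database. This illustrates the ranking rule and is not the code Spark runs. The `Employee` case class and the `topSalaries` helper are hypothetical names introduced here, and the rows are copied from the employee table shown earlier.

```scala
// Sample rows copied from the employee table above.
case class Employee(id: Int, name: String, salary: Int, department: String)

val employees = Seq(
  Employee(1, "Joe", 70000, "IT"),
  Employee(2, "Henry", 80000, "Sales"),
  Employee(3, "Sam", 60000, "Sales"),
  Employee(4, "Max", 90000, "IT"),
  Employee(5, "Janet", 69000, "IT"),
  Employee(6, "Randy", 85000, "IT")
)

// Keep the rows whose salary is among the n highest distinct salaries
// of their department, which is what "dense_rank <= n" expresses.
def topSalaries(rows: Seq[Employee], n: Int): Seq[(String, String, Int)] =
  rows.groupBy(_.department).toSeq.flatMap { case (dept, members) =>
    // Distinct salaries, highest first: index i corresponds to dense rank i + 1.
    val topN = members.map(_.salary).distinct.sortBy(s => -s).take(n).toSet
    members.filter(e => topN.contains(e.salary)).map(e => (dept, e.name, e.salary))
  }.sortBy { case (dept, _, salary) => (dept, -salary) }

val result = topSalaries(employees, 3)
result.foreach(println)
```

With the sample data this yields the same five rows as the result table above. Because the ranking is dense, employees who tie on a salary would all be kept.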
var query_str = """
(select e.department, name, e.salary
from employee e
where e.salary in (
    select distinct salary as salary_d
    from employee
    where department=e.department
    order by salary_d desc
    limit 3
)
order by e.department, e.salary desc) as e_q
"""
spark.read.jdbc(url, query_str, connectionProperties)
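The query string above is wrapped in parentheses and given the alias `e_q` because Spark's JDBC reader accepts, in place of a table name, anything that is valid in a SQL FROM clause. PostgreSQL then executes the subquery and Spark reads only its result. A small sketch of that wrapping step follows. The helper name `asJdbcTable` is hypothetical and not part of Spark.

```scala
// Hypothetical helper: turns a SELECT statement into a parenthesized,
// aliased subquery that can be passed where the JDBC reader expects a table name.
def asJdbcTable(sql: String, alias: String): String =
  s"(${sql.trim.stripSuffix(";")}) as $alias"

val pushed = asJdbcTable("select department, name, salary from employee;", "e_q")
println(pushed)
// prints: (select department, name, salary from employee) as e_q
```

The alias matters because PostgreSQL requires a subquery in a FROM clause to be named.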