add single col join syntax, 4th if_else arg #81

Open
wants to merge 5 commits into
base: main
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
# TidierDB.jl updates
## v0.5.1 - 2024-10-29
- adds `@union_all` to bind all rows, not just the distinct rows kept by `@union`
- joining syntax now supports `(table1, table2, col_name)` when the joining columns share a name
- `if_else` now has an optional final argument for handling missing values, to match TidierData

# TidierDB.jl updates
## v0.5.0 - 2024-10-15
Breaking Changes:
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "TidierDB"
uuid = "86993f9b-bbba-4084-97c5-ee15961ad48b"
authors = ["Daniel Rizk <[email protected]> and contributors"]
-version = "0.5.0"
+version = "0.5.1"

[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
2 changes: 1 addition & 1 deletion README.md
@@ -38,7 +38,7 @@ TidierDB.jl currently supports the following top-level macros:
| **Category** | **Supported Macros and Functions** |
|----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Data Manipulation** | `@arrange`, `@group_by`, `@filter`, `@select`, `@mutate` (supports `across`), `@summarize`/`@summarise` (supports `across`), `@distinct` |
-| **Joining** | `@left_join`, `@right_join`, `@inner_join`, `@anti_join`, `@full_join`, `@semi_join`, `@union` |
+| **Joining** | `@left_join`, `@right_join`, `@inner_join`, `@anti_join`, `@full_join`, `@semi_join`, `@union`, `@union_all` |
| **Slice and Order** | `@slice_min`, `@slice_max`, `@slice_sample`, `@order`, `@window_order`, `@window_frame` |
| **Utility** | `@show_query`, `@collect`, `@head`, `@count`, `show_tables`, `@create_view` , `drop_view` |
| **Helper Functions** | `across`, `desc`, `if_else`, `case_when`, `n`, `starts_with`, `ends_with`, `contains`, `as_float`, `as_integer`, `as_string`, `is_missing`, `missing_if`, `replace_missing` |
2 changes: 1 addition & 1 deletion docs/examples/UserGuide/duckplyr_reprex.jl
@@ -29,7 +29,7 @@
# ```julia
# @chain DB.t(data) begin
# DB.@filter(str_detect(count, r"^\d+$"))
-# DB.@mutate(count_ = "TRY_CAST(count AS INT)")
+# DB.@mutate(count_ = as_integer(count))
# DB.@filter(count_ > 0)
# DB.@inner_join(
# (@chain DB.t(age) begin
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -33,7 +33,7 @@ TidierDB.jl currently supports:
| **Category** | **Supported Macros and Functions** |
|----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Data Manipulation** | `@arrange`, `@group_by`, `@filter`, `@select`, `@mutate` (supports `across`), `@summarize`/`@summarise` (supports `across`), `@distinct` |
-| **Joining** | `@left_join`, `@right_join`, `@inner_join`, `@anti_join`, `@full_join`, `@semi_join`, `@union` |
+| **Joining** | `@left_join`, `@right_join`, `@inner_join`, `@anti_join`, `@full_join`, `@semi_join`, `@union`, `@union_all` |
| **Slice and Order** | `@slice_min`, `@slice_max`, `@slice_sample`, `@order`, `@window_order`, `@window_frame` |
| **Utility** | `@show_query`, `@collect`, `@head`, `@count`, `show_tables`, `@create_view` , `drop_view` |
| **Helper Functions** | `across`, `desc`, `if_else`, `case_when`, `n`, `starts_with`, `ends_with`, `contains`, `as_float`, `as_integer`, `as_string`, `is_missing`, `missing_if`, `replace_missing` |
2 changes: 1 addition & 1 deletion ext/LibPQExt.jl
@@ -42,7 +42,7 @@ function TidierDB.show_tables(con::LibPQ.Connection)
end

function TidierDB.final_compute(sqlquery::SQLQuery, ::Type{<:postgres}, sql_cr_or_relace::String=nothing)
-final_query = finalize_query(sqlquery)
+final_query = TidierDB.finalize_query(sqlquery)
final_query = sql_cr_or_relace * final_query
return LibPQ.execute(sqlquery.db, final_query)
end
2 changes: 1 addition & 1 deletion ext/MySQLExt.jl
@@ -51,7 +51,7 @@ function TidierDB.show_tables(conn::MySQL.Connection)
end

function TidierDB.final_compute(sqlquery::SQLQuery, ::Type{<:mysql}, sql_cr_or_relace::String=nothing)
-final_query = finalize_query(sqlquery)
+final_query = TidierDB.finalize_query(sqlquery)
final_query = sql_cr_or_relace * final_query
return DBInterface.execute(sqlquery.db, final_query)
end
15 changes: 5 additions & 10 deletions src/TBD_macros.jl
@@ -3,7 +3,7 @@ $docstring_select
"""
macro select(sqlquery, exprs...)
exprs = parse_blocks(exprs...)
# exprs_str = parse_interpolation2.(exprs)

return quote
exprs_str = map(expr -> isa(expr, Symbol) ? string(expr) : expr, $exprs)
let columns = parse_tidy_db(exprs_str, $(esc(sqlquery)).metadata)
@@ -39,7 +39,7 @@ $docstring_filter
"""
macro filter(sqlquery, conditions...)
conditions = parse_blocks(conditions...)
conditions = parse_interpolation2.(conditions)

return quote
sq = $(esc(sqlquery))
if isa(sq, SQLQuery)
@@ -121,7 +121,6 @@ desc(col::Symbol) = (col, :desc)
$docstring_arrange
"""
macro arrange(sqlquery, columns...)
columns = parse_interpolation2.(columns)

# Initialize a string to hold column order specifications
order_specs = String[]
@@ -200,7 +199,7 @@ $docstring_mutate
"""
macro mutate(sqlquery, mutations...)
mutations = parse_blocks(mutations...)
mutations = parse_interpolation2.(mutations)

return quote
sq = $(esc(sqlquery))
if isa(sq, SQLQuery)
@@ -309,7 +308,6 @@ $docstring_group_by
"""
macro group_by(sqlquery, columns...)
columns = parse_blocks(columns...)
columns = parse_interpolation2.(columns)

return quote
columns_str = map(col -> isa(col, Symbol) ? string(col) : col, $columns)
@@ -347,7 +345,6 @@ end
$docstring_distinct
"""
macro distinct(sqlquery, distinct_columns...)
distinct_columns = parse_interpolation2.(distinct_columns)
return quote
sq = $(esc(sqlquery))
if isa(sq, SQLQuery)
@@ -408,7 +405,6 @@ $docstring_summarize
"""
macro summarize(sqlquery, expressions...)
expressions = parse_blocks(expressions...)
expressions = parse_interpolation2.(expressions)

return quote
sq = $(esc(sqlquery))
@@ -471,7 +467,7 @@ $docstring_count
"""
macro count(sqlquery, group_by_columns...)
# Convert the group_by_columns to a string representation
group_by_columns = parse_interpolation2.(group_by_columns)

group_by_cols_str = [string(col) for col in group_by_columns]
group_clause = join(group_by_cols_str, ", ")

@@ -511,7 +507,7 @@ $docstring_rename
"""
macro rename(sqlquery, renamings...)
renamings = parse_blocks(renamings...)
renamings = parse_interpolation2.(renamings)

return quote
# Prepare the renaming rules from the macro arguments
renamings_dict = Dict{String, String}()
@@ -713,4 +709,3 @@ $docstring_show_tables
function show_tables(con::Union{DuckDB.DB, DuckDB.Connection})
return DataFrame(DBInterface.execute(con, "SHOW ALL TABLES"))
end

109 changes: 80 additions & 29 deletions src/db_parsing.jl
@@ -190,33 +190,75 @@ function parse_tidy_db(exprs, metadata::DataFrame)
return included_columns
end

using MacroTools

function parse_if_else(expr)
transformed_expr = MacroTools.postwalk(expr) do x
# Ensure we're dealing with an Expr object and it's a call to if_else
if isa(x, Expr) && x.head == :call && x.args[1] == :if_else && length(x.args) == 4
# Extract condition, true_case, and false_case from the arguments
condition = x.args[2]
true_case = x.args[3]
false_case = x.args[4]

# Check and handle `missing` cases and formatting for string literals
true_case_formatted = (string(true_case) == "missing") ? "NULL" : (isa(true_case, String) ? "'$true_case'" : true_case)
false_case_formatted = (string(false_case) == "missing") ? "NULL" : (isa(false_case, String) ? "'$false_case'" : false_case)

# Construct the SQL CASE statement as a string
sql_case = "CASE WHEN $(condition) THEN $(true_case_formatted) ELSE $(false_case_formatted) END"

# Return just the string
return sql_case
# Check if the expression is a call to if_else
if isa(x, Expr) && x.head == :call && x.args[1] == :if_else
args_length = length(x.args)

# Handle 4-argument if_else
if args_length == 4
condition = x.args[2]
true_case = x.args[3]
false_case = x.args[4]

# Format true_case
true_case_formatted = string(true_case) == "missing" ? "NULL" :
isa(true_case, String) ? "'$true_case'" : true_case

# Format false_case
false_case_formatted = string(false_case) == "missing" ? "NULL" :
isa(false_case, String) ? "'$false_case'" : false_case

# Construct SQL CASE WHEN statement
sql_case = "CASE WHEN $(condition) THEN $(true_case_formatted) ELSE $(false_case_formatted) END"

return sql_case

# Handle 5-argument if_else
elseif args_length == 5
condition = x.args[2]
true_case = x.args[3]
false_case = x.args[4]
missing_case = x.args[5]

# Format true_case
true_case_formatted = string(true_case) == "missing" ? "NULL" :
isa(true_case, String) ? "'$true_case'" : true_case

# Format false_case
false_case_formatted = string(false_case) == "missing" ? "NULL" :
isa(false_case, String) ? "'$false_case'" : false_case

# Format missing_case
missing_case_formatted = string(missing_case) == "missing" ? "NULL" :
isa(missing_case, String) ? "'$missing_case'" : missing_case

# Construct SQL CASE WHEN statement
sql_case = "CASE WHEN $(condition) THEN $(true_case_formatted) ELSE $(false_case_formatted) END"

# Wrap the CASE statement to handle the missing_case
# If the inner CASE evaluates to NULL, return missing_case instead of NULL
sql_case_with_missing = "CASE WHEN ($sql_case) IS NULL THEN $(missing_case_formatted) ELSE ($sql_case) END"

return sql_case_with_missing

else
# Unsupported number of arguments; return as is
return x
end
else
# Return the unmodified object if it's not an Expr or not an if_else call
# Not an if_else call; return as is
return x
end
end
return transformed_expr
end
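
Review note: the SQL strings that `parse_if_else` builds can be sketched without MacroTools. The helpers `format_case` and `if_else_sql` below are hypothetical names used only for illustration; they mirror the string formatting in the hunk above and are not part of TidierDB. Note that the diff counts the function name in `x.args`, so its "4-argument" and "5-argument" branches correspond to three and four user-supplied arguments.

```julia
# Hypothetical helpers (not TidierDB API): reproduce the string formatting from the hunk above.

# `missing` becomes NULL, string literals are single-quoted, everything else is printed as-is.
format_case(x) = string(x) == "missing" ? "NULL" : x isa String ? "'$x'" : string(x)

function if_else_sql(condition, true_case, false_case, missing_case = nothing)
    sql = "CASE WHEN $(condition) THEN $(format_case(true_case)) ELSE $(format_case(false_case)) END"
    # Without a missing_case, the plain CASE expression is the result.
    missing_case === nothing && return sql
    # With a missing_case, a NULL result of the inner CASE is replaced by it.
    return "CASE WHEN ($sql) IS NULL THEN $(format_case(missing_case)) ELSE ($sql) END"
end

three_arg = if_else_sql(:(x > 1), "high", "low")
four_arg = if_else_sql(:(x > 1), "high", "low", "unknown")

println(three_arg)
println(four_arg)
```

One thing worth checking before merge: in SQL, a `NULL` condition falls through to the `ELSE` branch, so the wrapper only substitutes `missing_case` when the selected branch value is itself `NULL`. As I understand TidierData, its final argument applies when the condition is missing, so the two may not match exactly.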



function parse_case_when(expr)
MacroTools.postwalk(expr) do x
# Ensure we're dealing with an Expr object
@@ -439,15 +481,6 @@ function parse_interpolation2(expr)
end
end
end
#my_var = "gear"
#my_var = :gear
#my_val = 3.7
#my_var = [:gear, :cyl]
#expr = :((!!my_var) * (!!my_val))
#parse_interpolation2(expr)

#expr = :((!!my_val) * (!!my_var))
#parse_interpolation2(expr)


function parse_blocks(exprs...)
@@ -481,14 +514,32 @@ function construct_window_clause(sq::SQLQuery ; from_cumsum::Bool = false)
end

function parse_join_expression(expr)
if expr.head == :(=)
if isa(expr, Expr) && expr.head == :(=)
# Handle equality condition (e.g., ticker = ticker)
rhs_column = expr.args[1]
lhs_column = expr.args[2]
# Convert column references to strings
lhs_col_str = string(lhs_column)
rhs_col_str = string(rhs_column)
return lhs_col_str, rhs_col_str

elseif isa(expr, Symbol)
# Handle single column reference (e.g., ticker)
col_str = string(expr)
return col_str, col_str

else
error("Unsupported join expression: $expr")
end
end
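
Review note: a minimal sketch of how the two accepted join forms reach this function as Julia syntax objects. `join_columns` is a hypothetical stand-in, not the package function; it returns the columns in written order, whereas the diff binds `expr.args[1]` to `rhs_column`, so its return order is reversed relative to this sketch.

```julia
# Hypothetical stand-in for the dispatch in parse_join_expression (illustration only).
function join_columns(expr)
    if expr isa Expr && expr.head == :(=)
        # `id = id2` parses to Expr(:(=), :id, :id2)
        return string(expr.args[1]), string(expr.args[2])
    elseif expr isa Symbol
        # A bare column name is used on both sides of the join
        return string(expr), string(expr)
    else
        error("Unsupported join expression: $expr")
    end
end

pair_form = join_columns(:(id = id2))
single_form = join_columns(:id)

println(pair_form)
println(single_form)
```

The `isa(expr, Expr)` guard added in this PR matters because a bare `Symbol` has no `head` field; without the guard the single-column form would throw before reaching the new branch.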

function parse_closest_expression(expr)
as_of = ""
if expr.head == :call && string(expr.args[1]) == "closest"
inner_expr = expr.args[2]
as_of = " ASOF "
and = " AND"
return string(" ", inner_expr), as_of, and
else
error("Expression must be of the form lhs = rhs")
error("Expression must be of the form closest(expr)")
end
end
59 changes: 54 additions & 5 deletions src/docstrings.jl
@@ -736,7 +736,7 @@ julia> @chain db_table(db, :df_mem) begin
const docstring_full_join =
"""
@full_join(sql_query, join_table, original_table_col = new_table_col)

Perform a full join between two SQL queries based on a specified condition.
This syntax here is slightly different than TidierData.jl, however, because
SQL does not drop the joining column, for the metadata storage, it is
@@ -754,7 +754,7 @@ julia> df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
value = repeat(1:5, 2),
percent = 0.1:0.1:1.0);

-julia> df2 = DataFrame(id2 = ["AA", "AC", "AE", "AG", "AI", "AK", "AM"],
+julia> df2 = DataFrame(id = ["AA", "AC", "AE", "AG", "AI", "AK", "AM"],
category = ["X", "Y", "X", "Y", "X", "Y", "X"],
score = [88, 92, 77, 83, 95, 68, 74]);

@@ -765,12 +765,12 @@ julia> copy_to(db, df, "df_mem");
julia> copy_to(db, df2, "df_join");

julia> @chain db_table(db, :df_mem) begin
-@full_join((@chain db_table(db, "df_join") @filter(score > 70)), id = id2)
+@full_join((@chain db_table(db, "df_join") @filter(score > 70)), id)
#@aside @show_query _
@collect
end
11×7 DataFrame
-Row │ id groups value percent id2 category score
+Row │ id groups value percent id_1 category score
│ String? String? Int64? Float64? String? String? Int64?
─────┼──────────────────────────────────────────────────────────────────
1 │ AA bb 1 0.1 AA X 88
@@ -1292,7 +1292,7 @@ Combine two SQL queries using the `UNION` operator.
- `sql_query2`: The second SQL query to combine.

# Returns
-- A new SQL query struct representing the combined queries.
+- A lazy query of all distinct rows in the second query bound to the first

# Examples
```julia
@@ -1336,6 +1336,55 @@ julia> @chain t(df1_table) begin
2 │ 2 20
3 │ 3 30
4 │ 5 50

julia> @chain t(df1_table) begin
@union(t(df1_table))
@collect
end
3×2 DataFrame
Row │ id value
│ Int64 Int64
─────┼──────────────
1 │ 1 10
2 │ 2 20
3 │ 3 30
```
"""

const docstring_union_all =
"""
@union_all(sql_query1, sql_query2)

Combine two SQL queries using the `UNION ALL` operator.

# Arguments
- `sql_query1`: The first SQL query to combine.
- `sql_query2`: The second SQL query to combine.

# Returns
- A lazy query of all rows in the second query bound to the first

# Examples
```julia
julia> db = connect(duckdb());

julia> df1 = DataFrame(id = [1, 2, 3], value = [10, 20, 30]);

julia> copy_to(db, df1, "df1");

julia> df1_table = db_table(db, "df1");

julia> @chain t(df1_table) @union_all(df1_table) @collect
6×2 DataFrame
Row │ id value
│ Int64 Int64
─────┼──────────────
1 │ 1 10
2 │ 2 20
3 │ 3 30
4 │ 1 10
5 │ 2 20
6 │ 3 30
```
"""
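
Review note: the difference between the two docstring examples can be illustrated in plain Julia, with no database involved. This is only an analogy using Base functions on a vector of named tuples; it does not call TidierDB, and the variable names are made up for the illustration.

```julia
# Plain-Julia analogy for UNION ALL versus UNION (no TidierDB or DuckDB involved).
rows = [(id = 1, value = 10), (id = 2, value = 20), (id = 3, value = 30)]

# UNION ALL: append every row of the second input, duplicates included
all_rows = vcat(rows, rows)

# UNION: append, then keep only distinct rows
distinct_rows = unique(vcat(rows, rows))

println(length(all_rows))
println(length(distinct_rows))
```

The row counts line up with the docstring outputs above: six rows for `@union_all` of a table with itself, three for `@union`. Unlike this sketch, SQL does not guarantee row order for either operator unless the query orders explicitly.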
