Write unit test(s) for Spark functions #17
Hi, would the scope for this be the following path?
Hi @lexara-prime-ai, welcome! Great question! I have been trying to mirror the doctests from PySpark 3.5, and all the column unit tests in the docs are technically using the function. It might be worthwhile to move those tests in.
Thanks @sjrusso8! The explanation makes a lot of sense. I appreciate the groundwork you've laid with the existing test cases; they've been incredibly helpful as I piece everything together. I'll do my best to contribute and assist as much as possible.
Hi @sjrusso8, hope you're doing well. Quick question on the `isnotnull` definition. Moving the "definition" from here

```rust
// functions that require exactly two col arguments
generate_functions!(
    two_cols: nvl,
    nullif,
    isnotnull,
    ifnull,
```

to here

```rust
// functions that require a single col argument
generate_functions!(
    one_col: isnan,
    isnull,
    isnotnull, // <---
    sqrt,
```

allows compilation and the test passes:

```
res -> RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "r1", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int64>
[
  null,
  1,
], BooleanArray
[
  false,
  true,
]], row_count: 2 }
res -> RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "r1", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int64>
[
  null,
  1,
], BooleanArray
[
  true,
  false,
]], row_count: 2 }
test functions::tests::test_func_is_not_null ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 117 filtered out; finished in 0.11s
```

Initial compiler diagnostic (before the move):

```
  --> core/src/functions/mod.rs:1180:32
     |
1180 |     .select([col("a"), isnotnull("a").alias("r1")])
     |                        ^^^^^^^^^-----
     |                        ||
     |                        |expected all arguments to be this `&str` type because they need to match the type of this parameter
     |                        an argument of type `&str` is missing
     |
note: function defined here
  --> core/src/functions/mod.rs:468:5
     |
 294 | pub fn $func_name<T: expressions::ToExpr>(col1: T, col2: T) -> Column
     |        -                                  -------  ------- this parameter needs to match the `&str` type of `col1`
     |        |                                  |
     |        |                                  `col2` needs to match the `&str` type of this parameter
     |        `col1` and `col2` all reference this parameter T
...
 468 |     isnotnull,
     |     ^^^^^^^^^
help: provide the argument
     |
1180 |     .select([col("a"), isnotnull("a", /* &str */).alias("r1")])
     |
```
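For context on why the move fixes the build: each arm of a macro like `generate_functions!` fixes the arity of every function listed under it, so a name listed under `two_cols` gets a two-parameter signature no matter how it is called. Below is a minimal sketch of such a macro; `Column`, `expressions::ToExpr`, and `invoke_func` are simplified stand-ins for illustration, not the crate's actual definitions:

```rust
// Minimal stand-ins so the sketch compiles; the crate's real types differ.
pub struct Column(String);

pub mod expressions {
    pub trait ToExpr {
        fn to_expr(&self) -> String;
    }
    impl ToExpr for &str {
        fn to_expr(&self) -> String {
            self.to_string()
        }
    }
}

// Hypothetical helper that builds an unresolved function expression.
fn invoke_func(name: &str, args: Vec<String>) -> Column {
    Column(format!("{name}({})", args.join(", ")))
}

// Sketch of a generate_functions!-style macro: each arm fixes the arity
// of every function listed after it, which is why listing isnotnull
// under two_cols made the one-argument call fail to compile.
macro_rules! generate_functions {
    (one_col: $($func_name:ident),* $(,)?) => {
        $(
            pub fn $func_name<T: expressions::ToExpr>(col: T) -> Column {
                invoke_func(stringify!($func_name), vec![col.to_expr()])
            }
        )*
    };
    (two_cols: $($func_name:ident),* $(,)?) => {
        $(
            pub fn $func_name<T: expressions::ToExpr>(col1: T, col2: T) -> Column {
                invoke_func(stringify!($func_name), vec![col1.to_expr(), col2.to_expr()])
            }
        )*
    };
}

generate_functions!(one_col: isnan, isnull, isnotnull, sqrt);
generate_functions!(two_cols: nvl, nullif, ifnull);
```

With this shape, listing `isnotnull` under `one_col` is exactly what makes the single-argument call `isnotnull("a")` type-check.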
@lexara-prime-ai I see what you mean. It should be only one input argument for the function, so your change is correct. It probably was not caught because of the lack of unit tests :) I'm getting the list of arguments from the docs here -> https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/functions.html
Awesome!
* Add test for ascending order with NULLs first.
* feat(tests): add test for ascending order with NULLs first (#17)
* feat(tests): implement `test_func_asc_nulls_first`
  - Verify sorting functionality for ascending order with NULL values positioned first.
* feat(tests): add test for ascending order with NULLs last (#17)
* feat(tests): implement `test_func_asc_nulls_last`
  - Verify sorting functionality for ascending order with NULL values positioned last.
* feat(tests): add tests for descending order with NULLs first and last (#49)
* feat(tests): implement `test_func_desc_nulls_first`
  - Verify sorting functionality for descending order with NULL values positioned first.
* feat(tests): implement `test_func_desc_nulls_last`
  - Verify sorting functionality for descending order with NULL values positioned last.
* validate robustness and accuracy of the sorting mechanism
* feat(tests): add tests for descending order with NULLs first and last (#17)
* feat(tests): implement `test_func_desc_nulls_first`
  - Verify sorting functionality for descending order with NULL values positioned first.
* feat(tests): implement `test_func_desc_nulls_last`
  - Verify sorting functionality for descending order with NULL values positioned last.
* validate robustness and accuracy of the sorting mechanism
* feat(tests): add test for LIKE filter functionality (#17)
* feat(tests): implement `test_func_like`
  - Verify filter functionality using the LIKE operator.
  - Ensure the filter correctly selects the name "Alice".
* validate robustness and accuracy of the LIKE filter mechanism
* feat(tests): add test for ILIKE filter functionality (#17)
* feat(tests): implement `test_func_ilike`
  - Verify filter functionality using the ILIKE operator.
  - Ensure the filter correctly selects the name "Alice" using a case-insensitive pattern "%Ice".
* validate robustness and accuracy of the ILIKE filter mechanism
* feat(tests): add test for RLIKE filter functionality (#17)
* feat(tests): implement
  - Implemented a test to validate the behavior of the 'rlike' filter on a DataFrame.
  - Ensured that the filter correctly identifies records with names matching the regex pattern `ice`.
  - Added necessary setup and assertions for expected results.
* feat(tests): add test for RLIKE filter functionality (#17)
* feat(tests): implement `test_func_rlike`
  - Implemented a test to validate the behavior of the 'rlike' filter on a DataFrame.
  - Ensured that the filter correctly identifies records with names matching the regex pattern `ice`.
  - Added necessary setup and assertions for expected results.
* feat(tests): add test for EQ filter functionality (#17)
* feat(tests): implement `test_func_eq`
  - Verify filter functionality using the EQ operator.
* feat(tests): add test for OR filter functionality (#17)
* feat(tests): implement `test_func_or`
  - Verify filter functionality using the OR operator.
  - Ensure the filter correctly selects the name "Alice" using a case-insensitive pattern "%Ice".
* validate robustness and accuracy of the ILIKE filter mechanism
* feat(tests): add test for IS_NULL filter functionality (#17)
* feat(tests): implement `test_func_is_null`
  - Implemented a test to validate the behavior of the 'is_null' filter on a DataFrame.
  - Ensured that the filter correctly identifies records with null values.
  - Added necessary setup and assertions for expected results.
* Exclude additional is_null test.
* feat(tests): add test for IS_NOT_NULL filter functionality (#17)
* feat(tests): implement `test_func_is_not_null`
  - Implemented a test to validate the behavior of the 'isnotnull' filter on a DataFrame.
  - Ensured that the filter correctly identifies records with non-null values.
  - Added necessary setup and assertions for expected results.
  - Moved the `isnotnull` function definition into the single-col group: `generate_functions!(one_col: isnan, isnull, isnotnull, ...)`
* feat(tests): add tests for window functions and IS_NAN filter functionality (#17)
* feat(tests): implement `test_func_over`
  - Implemented a test to validate the behavior of window functions on a DataFrame.
  - Ensured that the window functions (`rank` and `min`) correctly compute values within the specified window.
  - Added necessary setup, sorting, and assertions for expected results.
* feat(tests): implement `test_func_isnan`
  - Implemented a test to validate the behavior of the `isnan` function on a DataFrame.
  - Ensured that the `isnan` function correctly identifies NaN values.
  - Added necessary setup and assertions for expected results.
* feat(tests): add tests for column aliasing, casting, and substring functions (#17)
  - feat(tests): implement `test_func_cast`
    - Added a test to validate the behavior of casting an integer column to a string.
    - Ensured that the cast operation correctly transforms the data type and assigns the alias.
  - feat(tests): implement `test_func_alias`
    - Added a test to verify column aliasing functionality.
    - Confirmed that the DataFrame correctly applies aliases to columns.
  - feat(tests): implement `test_func_substr`
    - Added a test to validate the substring function on a string column.
    - Ensured that the substring operation works as expected and assigns the alias.
* feat(fix): Fixed typo in test name: `test_func_substract` to `test_func_subtract`.
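To give a concrete picture of the filter tests listed above, here is a hedged sketch of what a `test_func_like`-style test could look like. The `setup_session()` helper is hypothetical, and the `sql`/`filter`/`like`/`collect` calls are assumptions patterned on the PySpark API and the snippets earlier in this thread, not a verified copy of this crate's test code:

```rust
// Hedged sketch of a LIKE filter test. setup_session() is a hypothetical
// helper; method names are assumptions modeled on PySpark's API.
#[tokio::test]
async fn test_func_like() -> Result<(), Box<dyn std::error::Error>> {
    let spark = setup_session().await?;

    // Two rows: only "Alice" matches the SQL LIKE pattern 'Al%'.
    let df = spark
        .sql("SELECT * FROM VALUES ('Alice', 2), ('Bob', 5) AS t(name, age)")
        .await?;

    let res = df
        .filter(col("name").like("Al%"))
        .select([col("name")])
        .collect()
        .await?;

    // Expect exactly one matching row.
    assert_eq!(res.num_rows(), 1);
    Ok(())
}
```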
Column code coverage was added; keeping this open for now to track other function coverage.
@sjrusso8 I was looking through other modules as well. Are there any areas you'd consider high priority, so I can start from there?
@lexara-prime-ai there are a bunch of functions that have not been implemented. I attempted to make an inventory on issue #60. Any of the functions that do something with UDFs probably can't be implemented; Spark UDFs only really work with Python and Scala right now.
Ok, I just went through the list; I can take a look at the other issues in the meantime.
Description
There are many functions for Spark, and most of them are created via a macro. However, not all of them have unit test coverage. Create additional unit tests based on similar test conditions from the existing Spark API test cases.
I have been mirroring the docstring tests from the PySpark API for reference.
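As an illustration of that mirroring approach, the PySpark doctest for `isnull` could translate into a Rust test along these lines. This is a sketch under assumptions: `setup_session()` is hypothetical, and the DataFrame calls are patterned on the `test_func_is_not_null` output earlier in the thread rather than on verified crate APIs:

```rust
// Abridged PySpark doctest being mirrored (pyspark.sql.functions.isnull):
//   >>> df = spark.createDataFrame([(1,), (None,)], ["a"])
//   >>> df.select("a", isnull("a").alias("r1")).collect()
//
// Hedged Rust sketch; setup_session() is a hypothetical test helper.
#[tokio::test]
async fn test_func_isnull() -> Result<(), Box<dyn std::error::Error>> {
    let spark = setup_session().await?;

    // One nullable Int64 column, matching the RecordBatch shown earlier.
    let df = spark
        .sql("SELECT * FROM VALUES (NULL), (1) AS t(a)")
        .await?;

    let res = df
        .select([col("a"), isnull("a").alias("r1")])
        .collect()
        .await?;

    // Expect two rows: NULL -> true, 1 -> false.
    assert_eq!(res.num_rows(), 2);
    Ok(())
}
```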