Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload verify file #Issue16 #58

Closed
wants to merge 7 commits into from
Closed

Upload verify file #Issue16 #58

wants to merge 7 commits into from

Conversation

JBorrow
Copy link
Member

@JBorrow JBorrow commented Mar 4, 2024

No description provided.

Copy link
Member Author

@JBorrow JBorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to run black and isort before pushing to the repo. You can set up the pre-commit hook if that helps.

if instance is None:
raise HTTPException(status_code=404, detail="File instance not found in the specified store.")

return {"verified": True}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just saying 'yes', please return the newly computed checksums for all of the instances!

# Check if the file exists in the specified store and matches the given properties
instance = session.query(Instance).filter_by(file_id=file.id, store_id=store.id).one_or_none()
if instance is None:
raise HTTPException(status_code=404, detail="File instance not found in the specified store.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great practice to return a 404 here, that is confusing as it is the same thing as the endpoint not being there. You should probably return a 400.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please don't raise a HTTPException; change the response code like in the other endpoints, and return our custom failure model.


store = session.query(StoreMetadata).filter_by(name=request.store_name).one_or_none()
if store is None:
raise HTTPException(status_code=404, detail="Store not found.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as below here.

Comment on lines 149 to 154
request = {
"name": "test_file_to_verify.txt",
"size": get_size_from_path(full_path),
"checksum": get_md5_from_path(full_path),
"store_name": "local_store",
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to exercise the pydantic request model here, instead of writing the JSON yourself.

@JBorrow
Copy link
Member Author

JBorrow commented Mar 4, 2024

In general, I think we also want to make the checksum-checking happen on the client side. We should ask the remote server for fresh checksums (checked against its own internals), that we can check ourselves. We do not want to send 'our' checksum to the remote server.

@@ -134,3 +136,94 @@ def test_add_file_no_store_exists(test_client):
response = AdminRequestFailedResponse.model_validate_json(response.content)

assert response.reason == "Store not_a_store does not exist."


def test_verify_file_success(test_client, test_server, garbage_file, test_orm):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the appropriate fixtures as discussed for these tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JBorrow you mean change the verification request without passing size and checksum because remote librarian check against its own database without requiring these parameters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. The remote librarian has all the information needed to perform this check.

Comment on lines 134 to 146
checksums_and_sizes = []
for instance in instances:
path = instance.path
checksum = get_md5_from_path(path)
size = get_size_from_path(path)
checksums_and_sizes.append(
{
"store_id": str(instance.store_id),
"checksum": checksum,
"size": str(size),
}
)
return {"verified": True, "checksums_and_sizes": checksums_and_sizes}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never return unvalidated JSON. Please write a pydantic model for your return.

Comment on lines 739 to 742
name: str,
size: int,
checksum: str,
store_name: str,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted in the main comment thread, we do not necessarily have access to anything other than name; please have the remote librarian check aganist its own database instead of passing a size and checksum.

Store names are entirely transparent to users (they do not and should not care which store their files have been uploaded to), please remove that option. The remote server should check all instances.

language_version: python3.11
language_version: python3.12
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change before we merge.

Comment on lines 749 to 754
size : int
Size in bytes of the file to verify.
checksum : str
MD5 checksum of the file to verify.
store_name : str
The name of the store where the file resides.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs need to be changed.

hera_librarian/cli.py Outdated Show resolved Hide resolved

file = session.query(File).filter_by(name=request.name).one_or_none()
if file is None:
raise HTTPException(status_code=400, detail="File not found.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor to use the failure response models like the other endpoints.

Comment on lines 119 to 124
instances = (
session.query(Instance)
.join(File, File.name == Instance.file_name)
.filter(File.name == file.name)
.all()
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simply do:

file = session.get(File, file.name)

if file is None:
return 400

instances = file.instances

Comment on lines 143 to 223
Tests that a file's properties match the database record.
"""
setup = test_server[2]
store = setup.store_directory
full_path = store / "test_file_to_verify.txt"
# Create the file in the store
shutil.copy2(garbage_file, full_path)

# Get the session and ORM models from the test_server fixture
session = test_server[1]()

# Create or find a store in the database
db_store = (
session.query(test_orm.StoreMetadata).filter_by(name="local_store").first()
)
if not db_store:
# Create a new store if not found (adjust attributes as necessary)
db_store = test_orm.StoreMetadata(
name="local_store", description="A local store for testing"
)
session.add(db_store)
session.commit()

# Create file and instance records in the database
file_data = open(full_path, "rb").read()
db_file = test_orm.File.new_file(
filename="test_file_to_verify.txt",
size=len(file_data),
checksum=hashlib.md5(file_data).hexdigest(),
uploader="test_uploader", # Adjust as necessary
source="test_source", # Adjust as necessary
)
instance = test_orm.Instance.new_instance(
path=str(full_path),
file=db_file,
store=db_store,
deletion_policy="ALLOWED", # Adjust as necessary
)
session.add_all([db_file, instance])
session.commit()

# Assume the file has been added to the database already; here we simulate the verification request
verify_request = AdminVerifyFileRequest(
name="test_file_to_verify.txt",
)

response = test_client.post_with_auth(
"/api/v2/admin/verify_file", json=verify_request.dict()
)

assert response.status_code == 200

# Clean up: Delete the added records and file
session.delete(instance)
session.delete(db_file)
session.commit()
full_path.unlink() # Remove the file from the filesystem

session.close()
response_data = response.json()
assert response_data["verified"] == True
assert isinstance(response_data["checksums_and_sizes"], list)
assert len(response_data["checksums_and_sizes"]) > 0
assert "checksum" in response_data["checksums_and_sizes"][0]
assert "size" in response_data["checksums_and_sizes"][0]
assert "store_id" in response_data["checksums_and_sizes"][0]


def test_verify_file_failure(test_client, test_server, test_orm):
"""
Tests that verification fails when file properties do not match.
"""
# Assume a file "mismatched_file.txt" exists in the database but with different properties
request = {
"name": "mismatched_file.txt",
}

response = test_client.post_with_auth("/api/v2/admin/verify_file", json=request)

assert response.status_code == 400
assert response.json() == {"detail": "File not found."}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the provided fixtures, this contains too much unnecessary replication.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JBorrow you mean "which provided fixtures"? when you say replication, you mean for the whole file or just this specific function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ones that we discussed in slack; test_server_with_valid_file and test_server_with_invalid_file

Copy link
Member Author

@JBorrow JBorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a number of changes that are required before merging. Thanks for your work on this so far.

language_version: python3.11
language_version: python3.12
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change before we merge.

Comment on lines +1162 to +1164
sp.add_argument("size", type=int, help="Size in bytes of the file to verify.")
sp.add_argument("checksum", help="MD5 checksum of the file to verify.")
sp.add_argument("store_name", help="The name of the store where the file resides.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These items are no longer needed.

Comment on lines +762 to +783
response = self.post(
endpoint="admin/verify_file",
json={
"name": name,
},
)

if response.status_code != 200:
raise LibrarianError(
"Failed to verify the file due to an unexpected error."
)
return response.json()

except LibrarianHTTPError as e:
if e.status_code == 404:
raise LibrarianError("File or store not found for verification.")
elif e.status_code == 400:
raise LibrarianError(
"File verification failed due to mismatched properties."
)
else:
raise LibrarianError(f"Unknown error during file verification. {e}")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before, please make sure you serialize/deserialize everything to pydantic models. In fact, I don't think json is even an accepted parameter in our post function! Please can you also expand your test coverage to make sure that these are checked?

Comment on lines +136 to +145
for instance in instances:
path = instance.path
checksum = get_md5_from_path(path)
size = get_size_from_path(path)
checksums_and_sizes.append(
{
"store_id": str(instance.store_id),
"checksum": checksum,
"size": str(size),
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will want to check here if the instance is available, or it will not be there and we will get a shock! We should probably check that we have at least one instance on the librarian?

We frequently 'remove' instances (for example during SneakerNet transfers)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by Instance? We check the file instance in the above code!!

Comment on lines +79 to +99
def test_verify_file_success(capsys):
# Mock the client and its verify_file_row method
client_mock = MagicMock()
client_mock.verify_file_row.return_value = {"verified": True}

# Mock the get_client function to return the mock client
with patch("hera_librarian.cli.get_client", return_value=client_mock):
# Create a mock args object
args = MagicMock()
args.conn_name = "test_conn"
args.name = "test_name"

# Call the verify_file function
cli.verify_file(args)

# Assert that the verify_file_row method was called with the correct arguments
client_mock.verify_file_row.assert_called_once_with(name=args.name)

# Check the printed output
captured = capsys.readouterr()
assert "File verification successful." in captured.out
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at the other unit tests in the library. You are not correctly using the mocked client objects.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any issue with this!!, can you be more specific?

Comment on lines +141 to +171
def setup_test_file(session, test_orm, full_path):
# Create or find a store in the database
db_store = (
session.query(test_orm.StoreMetadata).filter_by(name="local_store").first()
)
if not db_store:
db_store = test_orm.StoreMetadata(
name="local_store", description="A local store for testing"
)
session.add(db_store)
session.commit()

# Create file and instance records in the database
file_data = open(full_path, "rb").read()
db_file = test_orm.File.new_file(
filename="test_file_to_verify.txt",
size=len(file_data),
checksum=hashlib.md5(file_data).hexdigest(),
uploader="test_uploader",
source="test_source",
)
instance = test_orm.Instance.new_instance(
path=str(full_path),
file=db_file,
store=db_store,
deletion_policy="ALLOWED",
)
session.add_all([db_file, instance])
session.commit()

return db_file, instance
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code already exists; please don't duplicate it. You should look at the fixtures that we have already defined in the various conftest.py files and their usage throughout the test suite.

@JBorrow
Copy link
Member Author

JBorrow commented Apr 3, 2024 via email

@JBorrow
Copy link
Member Author

JBorrow commented Apr 3, 2024 via email

@JBorrow JBorrow closed this Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants