Cluster the cluster data #18

Merged
merged 1 commit into from
Aug 19, 2020

ialarmedalien
Collaborator

Part 1 of the changes from an old PR in the relation_engine_spec repo.

Merge all cluster fields in the djornl_node collection into a single field.
Update parser and tests accordingly.

  • I updated the README.md docs to reflect this change. -- N/A
  • This is not a breaking API change

@@ -43,15 +46,15 @@ def _configure(self):

 _CLUSTER_BASE = os.path.join(configuration['ROOT_DATA_PATH'], 'cluster_data')
 configuration['_CLUSTER_PATHS'] = {
-    'cluster_I2': os.path.join(
+    'markov_i2': os.path.join(
Collaborator Author

rename these to something more informative


self.check_deltas(edge_data=edge_data, node_metadata=node_metadata, cluster_data=clusters)

def check_deltas(self, edge_data={}, node_metadata={}, cluster_data={}):
Collaborator Author

brief dataset summary for sanity checking

Comment on lines +120 to +122
for data_structure in [edge_data, expected]:
    for k in data_structure.keys():
        data_structure[k] = sorted(data_structure[k], key=lambda n: n['_key'])
Collaborator Author

order data as it won't necessarily be sorted when coming out of the parser
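
The ordering step in the comment above can be sketched roughly as follows. Only the `sorted(..., key=lambda n: n['_key'])` idiom comes from the diff; the helper name `sort_by_key` and the sample records are illustrative assumptions, not code from this PR.

```python
def sort_by_key(data_structure):
    """Return a copy in which every list of records is sorted by '_key'."""
    return {
        name: sorted(records, key=lambda n: n['_key'])
        for name, records in data_structure.items()
    }


# Hypothetical parser output and expected data, listed in different orders.
parsed = {'nodes': [{'_key': 'b'}, {'_key': 'a'}]}
expected = {'nodes': [{'_key': 'a'}, {'_key': 'b'}]}

# A direct comparison is order-sensitive, so it fails before sorting.
assert parsed != expected
assert sort_by_key(parsed) == sort_by_key(expected)
```

Returning a sorted copy, rather than sorting in place as the diff does, leaves the test fixtures untouched; either approach makes the comparison independent of parser output order.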

Comment on lines +20 to +28
clusters:
  type: array
  title: Clusters
  description: Clusters to which the node has been assigned
  items:
    type: string
    format: regex
    pattern: ^\w+:\d+$
  examples: [["markov_i2:1", "markov_i4:5"], ["markov_i6:3"]]
Collaborator Author

The important bit
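
As a rough illustration of what the `pattern` in this schema accepts, the check below applies the same regular expression in plain Python. The function name `invalid_cluster_ids` is a hypothetical helper; the repo's actual validation presumably goes through a JSON Schema validator rather than code like this.

```python
import re

# Pattern copied from the schema: "<clustering_system_name>:<cluster_id>".
CLUSTER_ID = re.compile(r'^\w+:\d+$')


def invalid_cluster_ids(clusters):
    """Return the entries that do not match the schema pattern."""
    # fullmatch is used so that a trailing newline is rejected as well.
    return [c for c in clusters if not CLUSTER_ID.fullmatch(c)]


# The schema's own examples pass the check.
assert invalid_cluster_ids(["markov_i2:1", "markov_i4:5"]) == []
# Entries with a missing or non-numeric cluster ID are reported.
assert invalid_cluster_ids(["markov_i6:3", "markov_i2", "markov_i2:x"]) == [
    "markov_i2",
    "markov_i2:x",
]
```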

# results are in the form {"nodes": [...], "edges": [...]}
# nodes are represented as a list of node[_key]
# edges are objects with keys _to, _from, edge_type and score

def test_fetch_phenotypes_no_results(self):
Collaborator Author

the queries with no results have been merged in with the other tests

Base automatically changed from spec_loader_refactor to develop August 19, 2020 19:25
title: Cluster IDs
description: Cluster IDs, in the form "clustering_system_name:cluster_id"
items: {type: string}
examples: [['markov_i2:5', 'markov_i6:2'],['markov_i6:1']]
Contributor

Should this be an object so we don't have to parse these entries?

Contributor

I guess if the client is using string parameters like "markov_i2:5" then it doesn't matter
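
For comparison, the parsing that an object representation would avoid is small. The sketch below splits a string such as "markov_i2:5" into its two parts; the function name `parse_cluster_id` and the output keys `system` and `id` are assumptions made for illustration and are not part of the schema in this PR.

```python
def parse_cluster_id(cluster_id):
    """Split 'clustering_system_name:cluster_id' into its two components."""
    # rpartition splits on the last colon; both parts must be present.
    system, _, number = cluster_id.rpartition(':')
    if not system or not number.isdigit():
        raise ValueError(f"not a valid cluster ID: {cluster_id!r}")
    return {'system': system, 'id': int(number)}


assert parse_cluster_id('markov_i2:5') == {'system': 'markov_i2', 'id': 5}
```

If clients only ever pass the strings through unchanged, as the comment above suggests, this step is never needed and the flat string form is the simpler choice.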

jayrbolton merged commit 7e9165b into develop Aug 19, 2020
jayrbolton deleted the cluster_the_clusters branch August 19, 2020 19:37
2 participants