Design

What are samples called? SampleReplicates?
Do we need replicate groups?
- Case where samples are functionally subsamples but cannot be produced from the same sample as the experimental techniques don't allow it.

Functionality

List / find samples, possibly with...
Search integration
Logging
Workspace @sample integration
- If user has access to sample set, should have access to embedded samples (?)
ACLs:
- cache known good users
- cache user roles
- remove self from acls (read/write)
- change sample owner
  - Probably needs request / accept multistep flow
Cache workspace reponses
Stand alone validator CLI
- Validate without sending data to server
Versioning scheme for validator config
Kafka events
- Improve reliability
  - Currently if the service goes down between DB modification for a new link/sample and kafka reciept of the message the message is lost.
  - Could improve reliability of messaging by putting a sent field or something like that on samples / links in the DB, and not updating the field until the kafka send succeeds.
  - On startup, look for unsent, older messages and resend.
- Tools to recreate events from the DB (backfill new external DBs, handle cases where Kafka messages were lost)
Currently there's no way to list expired links other than setting an effective time on the from-sample search and from-data search
- Maybe that's enough?
  - No way to see the history of the links
- Listing expired links for a sample or data would require sort & paging, but it's not clear how to page. The obvious fields are the creation / expiration date, but since those are not necessarily unique, could return up to 10k documents. That makes paging smaller sizes problematic/impossible.
- Listing expired links for a given sample and data is definitely possible since the links don't overlap in time and can be sorted / paged by the creation date.
  - Needs a sample/data/creation index.
Lots of opportunities for performance improvements if neccessary (bulk reads [and writes, which are a lot harder])

Concerns:

Searching for samples could get very expensive based on the queries.
- Maybe just using a Lucene based solution is the way to go.
Searching for samples may be difficult as the metadata is embedded in the sample node document which limits the possible indexed queries to some extent.
- Separating out the metadata documents means that traversals querying metadata would be more complicated or impossible.
- Lucene as above
If we want more features or constraints, linking data may get more complicated than is already is

Testing

flake8 and bandit on test-sdkless (generated code is poopy)
When python/mypy#6385 is implemented, ditch all the stupid __init__.py files

Misc

The ~10 documents that have been written about samples

Obsolete

Make kb-sdk test run in travis
- full stack is tested without it so nevermind

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.md

TODO.md

Design

Functionality

Concerns:

Testing

Misc

Obsolete

Files

TODO.md

Latest commit

History

TODO.md

File metadata and controls

Design

Functionality

Concerns:

Testing

Misc

Obsolete