A one sentence summary of purpose and methodology. Used for creating an overview tables.
Repository: openproblems-bio/task_cyto_batch_integration
Provide a clear and concise description of your task, detailing the specific problem it aims to solve. Outline the input data types, the expected output, and any assumptions or constraints. Be sure to explain any terminology or concepts that are essential for understanding the task.
Explain the motivation behind your proposed task. Describe the biological or computational problem you aim to address and why it’s important. Discuss the current state of research in this area and any gaps or challenges that your task could help address. This section should convince readers of the significance and relevance of your task.
name | roles |
---|---|
John Doe | author, maintainer |
flowchart TB
file_common_dataset("<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#file-format-common-dataset'>Common Dataset</a>")
comp_data_processor[/"<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#component-type-data-processor'>Data processor</a>"/]
file_unintegrated_censored("<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#file-format-unintegrated-censored'>Unintegrated Censored</a>")
file_unintegrated("<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#file-format-unintegrated'>Unintegrated</a>")
file_validation("<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#file-format-validation'>Validation</a>")
comp_method[/"<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#component-type-method'>Method</a>"/]
comp_control_method[/"<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#component-type-control-method'>Control Method</a>"/]
comp_metric[/"<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#component-type-metric'>Metric</a>"/]
file_integrated("<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#file-format-integrated'>Integrated</a>")
file_score("<a href='https://github.com/openproblems-bio/task_cyto_batch_integration#file-format-score'>Score</a>")
file_common_dataset---comp_data_processor
comp_data_processor-->file_unintegrated_censored
comp_data_processor-->file_unintegrated
comp_data_processor-->file_validation
file_unintegrated_censored---comp_method
file_unintegrated---comp_control_method
file_unintegrated---comp_metric
file_validation---comp_control_method
file_validation---comp_metric
comp_method-->file_integrated
comp_control_method-->file_integrated
comp_metric-->file_score
file_integrated---comp_metric
A subset of the common dataset.
Example file:
resources_test/task_cyto_batch_integration/starter_file/common_dataset.h5ad
Format:
AnnData object
obs: 'cell_type', 'batch', 'sample', 'donor', 'group'
var: 'numeric_id', 'channel', 'marker', 'marker_type', 'to_correct'
layers: 'preprocessed'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
Data structure:
Slot | Type | Description |
---|---|---|
obs["cell_type"] |
string |
Cell type information. |
obs["batch"] |
string |
Batch information. |
obs["sample"] |
string |
Sample ID. |
obs["donor"] |
string |
Donor ID. |
obs["group"] |
string |
Biological group of the donor. |
var["numeric_id"] |
integer |
Numeric ID associated with each marker. |
var["channel"] |
string |
The channel / detector of the instrument. |
var["marker"] |
string |
(Optional) The marker name associated with the channel. |
var["marker_type"] |
string |
Whether the marker is a functional or lineage marker. |
var["to_correct"] |
boolean |
Whether the marker will be batch corrected. |
layers["preprocessed"] |
double |
preprocessed data, e.g. already compensated, transformed and debris/doublets removed. |
uns["dataset_id"] |
string |
A unique identifier for the dataset. |
uns["dataset_name"] |
string |
Nicely formatted name. |
uns["dataset_url"] |
string |
(Optional) Link to the original source of the dataset. |
uns["dataset_reference"] |
string |
(Optional) Bibtex reference of the paper in which the dataset was published. |
uns["dataset_summary"] |
string |
Short description of the dataset. |
uns["dataset_description"] |
string |
Long description of the dataset. |
uns["dataset_organism"] |
string |
(Optional) The organism of the sample in the dataset. |
A data processor.
Arguments:
Name | Type | Description |
---|---|---|
--input |
file |
A subset of the common dataset. |
--output_unintegrated_censored |
file |
(Output) Unintegrated dataset. |
--output_unintegrated |
file |
(Output) Unintegrated dataset. |
--output_validation |
file |
(Output) Hold-out dataset for validation. |
Unintegrated dataset
Example file:
resources_test/task_cyto_batch_integration/cxg_mouse_pancreas_atlas/train.h5ad
Format:
AnnData object
obs: 'batch', 'sample', 'donor'
var: 'numeric_id', 'channel', 'marker', 'marker_type', 'to_correct'
layers: 'preprocessed'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
Data structure:
Slot | Type | Description |
---|---|---|
obs["batch"] |
string |
Batch information. |
obs["sample"] |
string |
Sample ID. |
obs["donor"] |
string |
(Optional) Donor ID. |
var["numeric_id"] |
integer |
Numeric ID associated with each marker. |
var["channel"] |
string |
The channel / detector of the instrument. |
var["marker"] |
string |
(Optional) The marker name associated with the channel. |
var["marker_type"] |
string |
Whether the marker is a functional or lineage marker. |
var["to_correct"] |
boolean |
Whether the marker will be batch corrected. |
layers["preprocessed"] |
double |
preprocessed data, e.g. already compensated, transformed and debris/doublets removed. |
uns["dataset_id"] |
string |
A unique identifier for the dataset. |
uns["dataset_name"] |
string |
Nicely formatted name. |
uns["dataset_url"] |
string |
(Optional) Link to the original source of the dataset. |
uns["dataset_reference"] |
string |
(Optional) Bibtex reference of the paper in which the dataset was published. |
uns["dataset_summary"] |
string |
Short description of the dataset. |
uns["dataset_description"] |
string |
Long description of the dataset. |
uns["dataset_organism"] |
string |
(Optional) The organism of the sample in the dataset. |
Unintegrated dataset
Example file:
resources_test/task_cyto_batch_integration/cxg_mouse_pancreas_atlas/train.h5ad
Format:
AnnData object
obs: 'cell_type', 'batch', 'sample', 'donor', 'group'
var: 'numeric_id', 'channel', 'marker', 'marker_type', 'to_correct'
layers: 'preprocessed'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
Data structure:
Slot | Type | Description |
---|---|---|
obs["cell_type"] |
string |
Cell type information. |
obs["batch"] |
string |
Batch information. |
obs["sample"] |
string |
Sample ID. |
obs["donor"] |
string |
Donor ID. |
obs["group"] |
string |
Biological group of the donor. |
var["numeric_id"] |
integer |
Numeric ID associated with each marker. |
var["channel"] |
string |
The channel / detector of the instrument. |
var["marker"] |
string |
(Optional) The marker name associated with the channel. |
var["marker_type"] |
string |
Whether the marker is a functional or lineage marker. |
var["to_correct"] |
boolean |
Whether the marker will be batch corrected. |
layers["preprocessed"] |
double |
preprocessed data, e.g. already compensated, transformed and debris/doublets removed. |
uns["dataset_id"] |
string |
A unique identifier for the dataset. |
uns["dataset_name"] |
string |
Nicely formatted name. |
uns["dataset_url"] |
string |
(Optional) Link to the original source of the dataset. |
uns["dataset_reference"] |
string |
(Optional) Bibtex reference of the paper in which the dataset was published. |
uns["dataset_summary"] |
string |
Short description of the dataset. |
uns["dataset_description"] |
string |
Long description of the dataset. |
uns["dataset_organism"] |
string |
(Optional) The organism of the sample in the dataset. |
Hold-out dataset for validation.
Example file:
resources_test/task_cyto_batch_integration/cxg_mouse_pancreas_atlas/solution.h5ad
Description:
Samples that were held out and will later be used only to assess whether the batch integration was successful. E.g. if a donor from batch 2 was corrected towards batch 1, but also actually measured in batch 1 (without being used as input to the algorithm).
Format:
AnnData object
obs: 'cell_type', 'batch', 'sample', 'donor', 'group'
var: 'numeric_id', 'channel', 'marker', 'marker_type', 'to_correct'
layers: 'preprocessed'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
Data structure:
Slot | Type | Description |
---|---|---|
obs["cell_type"] |
string |
Cell type information. |
obs["batch"] |
string |
Batch information. |
obs["sample"] |
string |
Sample ID. |
obs["donor"] |
string |
Donor ID. |
obs["group"] |
string |
Biological group of the donor. |
var["numeric_id"] |
integer |
Numeric ID associated with each marker. |
var["channel"] |
string |
The channel / detector of the instrument. |
var["marker"] |
string |
(Optional) The marker name associated with the channel. |
var["marker_type"] |
string |
Whether the marker is a functional or lineage marker. |
var["to_correct"] |
boolean |
Whether the marker will be batch corrected. |
layers["preprocessed"] |
double |
preprocessed data, e.g. already compensated, transformed and debris/doublets removed. |
uns["dataset_id"] |
string |
A unique identifier for the dataset. |
uns["dataset_name"] |
string |
Nicely formatted name. |
uns["dataset_url"] |
string |
(Optional) Link to the original source of the dataset. |
uns["dataset_reference"] |
string |
(Optional) Bibtex reference of the paper in which the dataset was published. |
uns["dataset_summary"] |
string |
Short description of the dataset. |
uns["dataset_description"] |
string |
Long description of the dataset. |
uns["dataset_organism"] |
string |
(Optional) The organism of the sample in the dataset. |
A method.
Arguments:
Name | Type | Description |
---|---|---|
--input |
file |
Unintegrated dataset. |
--output |
file |
(Output) Integrated dataset. |
Quality control methods for verifying the pipeline.
Arguments:
Name | Type | Description |
---|---|---|
--input_unintegrated |
file |
Unintegrated dataset. |
--input_validation |
file |
Hold-out dataset for validation. |
--output |
file |
(Output) Integrated dataset. |
A task template metric.
Arguments:
Name | Type | Description |
---|---|---|
--input_validation |
file |
Hold-out dataset for validation. |
--input_unintegrated |
file |
Unintegrated dataset. |
--input_integrated |
file |
Integrated dataset. |
--output |
file |
(Output) File indicating the score of a metric. |
Integrated dataset
Example file:
resources_test/task_cyto_batch_integration/cxg_mouse_pancreas_atlas/prediction.h5ad
Format:
AnnData object
layers: 'integrated'
uns: 'dataset_id', 'method_id', 'parameters'
Data structure:
Slot | Type | Description |
---|---|---|
layers["integrated"] |
double |
The integrated data as returned by a batch correction method. |
uns["dataset_id"] |
string |
A unique identifier for the dataset. |
uns["method_id"] |
string |
A unique identifier for the method. |
uns["parameters"] |
object |
(Optional) The parameters used for the integration. |
File indicating the score of a metric.
Example file:
resources_test/task_cyto_batch_integration/cxg_mouse_pancreas_atlas/score.h5ad
Format:
AnnData object
uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values'
Data structure:
Slot | Type | Description |
---|---|---|
uns["dataset_id"] |
string |
A unique identifier for the dataset. |
uns["normalization_id"] |
string |
Which normalization was used. |
uns["method_id"] |
string |
A unique identifier for the method. |
uns["metric_ids"] |
string |
One or more unique metric identifiers. |
uns["metric_values"] |
double |
The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |