File Formats

CRAVAT Input Format

OpenCRAVAT has a custom tab-separated input file format that can be used in place of VCF. Each row in a CRAVAT input file describes a genomic variant using the following columns, in order: Chromosome, Position, Strand, Reference-Base, Alternate-Base, and, optionally, Sample and Tag. The table below describes each field:

Columns

Column	Description	Example
Chromosome	The chromosome, prefixed with `'chr'`.	`'chr22'`, `'chrX'`
Position	The 1-based position of the first affected nucleotide.	112501307, 1804372
Strand	The strand the variant is on. Either `'+'` or `'-'`.	`'+'`, `'-'`
Reference-Base	The affected nucleotide(s), or `'-'` for an insertion.	`'G'`, `'AG'`, `'TTCC'`, `'-'`
Alternate-Base	The alternate nucleotide(s), or `'-'` for a deletion.	`'A'`, `'TTC'`, `'-'`
Sample	The sample identifier.	`'s1'`, `'s25'`
Tag	Optional: Arbitrary identifiers or category tags associated with the variant, delimited by semicolons.	`'var001'`, `'TR93;cancer'`

Example

The following is a basic example of a CRAVAT input file:

chr2    112501307   +   C   A   s1    var001
chr14   104770363   +   T   A   s1    var002
chrX    71127984    +   A   G   s2    var003
chr14   91974629    +   T   G   s3    var004
chr12   57094662    +   G   T   s4    var005
...
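As an illustration of the column layout described above, the following is a minimal sketch of a reader for one input row. The names `InputVariant` and `parse_input_line` are hypothetical and are not part of OpenCRAVAT. The sketch assumes the fields are separated by tab characters and that the Sample and Tag columns may be absent.

```python
from typing import List, NamedTuple, Optional


class InputVariant(NamedTuple):
    """One row of a CRAVAT input file (hypothetical helper type)."""
    chrom: str
    pos: int
    strand: str
    ref_base: str
    alt_base: str
    sample: Optional[str]
    tags: List[str]


def parse_input_line(line: str) -> InputVariant:
    """Split one tab-separated row into typed fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, pos, strand, ref_base, alt_base = (f.strip() for f in fields[:5])
    if not chrom.startswith("chr"):
        raise ValueError(f"chromosome must be prefixed with 'chr': {chrom!r}")
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-': {strand!r}")
    # Sample and Tag are optional trailing columns.
    sample = fields[5].strip() if len(fields) > 5 and fields[5].strip() else None
    tags: List[str] = []
    if len(fields) > 6:
        # Tags are delimited by semicolons within a single column.
        tags = [t.strip() for t in fields[6].split(";") if t.strip()]
    return InputVariant(chrom, int(pos), strand, ref_base, alt_base, sample, tags)
```

Calling `parse_input_line` on each line of a file gives one `InputVariant` per variant. Rows without a Sample column get `sample=None` and an empty tag list.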

Internal Files

OpenCRAVAT uses a variety of text-based file formats to pass data internally between modules. Most of these internal files are temporary and are deleted at the end of a successful run. They can be preserved by passing the --temp-files flag to oc run.

In general, OpenCRAVAT files are tab-separated tabular text files with self-defined columns, similar to VCF. They start with a series of comment lines describing the columns in the tabular section, followed by a header row for the table, and then the table itself. A basic example can be seen here:

#column=0,Column0,col0,string
#column=1,Col 1,column_1,int
#column=2,Col-2,c2,float
#Column0    Col 1   Col-2
row1    1   1.0
row2    2   2.0
row3    3   3.0

column definition

Each column definition line contains four comma-separated values:

Index: Which column in the table this column definition refers to.
Title: A display-only title for the column, used as a header when presenting data to the user. It can be changed at any point without affecting OpenCRAVAT.
Name: The internal name of the column, used to refer to it in code. It should only be changed with care.
Type: The type of data in this column. Data is cast to this type when read from the file.

header row

This is a header row for the table, typically using the column titles. It is not needed for OpenCRAVAT to function, and is included for readability.

table

Tab-separated values. Blank columns should be represented by an empty string.
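The three parts described above (column definitions, header row, table) can be read with a short loop. The following is a rough sketch only, not OpenCRAVAT's own reader. The function name `read_cravat_file` and the `CASTS` mapping are hypothetical. It assumes that titles contain no commas, that all column definitions precede the data rows, and that an empty cell should become `None`.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed mapping from the type names used in column definitions to Python casts.
CASTS = {"string": str, "int": int, "float": float}


def read_cravat_file(lines: Iterable[str]) -> Tuple[List[dict], List[Dict[str, object]]]:
    """Return (column definitions, rows keyed by internal column name)."""
    columns: List[dict] = []
    rows: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#column="):
            # Index, title, name, type: four comma-separated values.
            index, title, name, type_ = line[len("#column="):].split(",", 3)
            columns.append(
                {"index": int(index), "title": title, "name": name, "type": type_}
            )
        elif line.startswith("#"):
            # Header row or other comment line: not needed for parsing.
            continue
        else:
            values = line.split("\t")
            row: Dict[str, object] = {}
            for col in columns:
                cell = values[col["index"]] if col["index"] < len(values) else ""
                cast = CASTS.get(col["type"], str)
                # Blank columns are empty strings in the file.
                row[col["name"]] = cast(cell) if cell != "" else None
            rows.append(row)
    return columns, rows
```

Because rows are keyed by the internal name rather than the title, changing a column's title in the file does not affect code that consumes the result.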

.crv Files

crv files (.crv) are basic OpenCRAVAT files that describe variants based on their genomic position and effect. They are produced by OpenCRAVAT converters.

crv example

#column=0,UID,uid,int
#column=1,Chrom,chrom,string
#column=2,Position,pos,int
#column=3,Ref Base,ref_base,string
#column=4,Alt Base,alt_base,string
#UID    Chrom   Position    Ref Base    Alt Base
1   chr19   10156403    G   C
2   chr7    140834746   A   T

crv columns

Name	Description	Type	Example(s)
uid	Unique id of variant.	int	13
chrom	Chromosome	string	chr1, chr17, chrX
pos	Genomic position of first affected base (1-based)	int	1234
ref_base	Reference base(s)	string	A, AT, -
alt_base	Alternate base(s)	string	G, GC, -
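To show how the column definitions, header row, and table fit together for a .crv file, here is a small sketch of a writer. It is illustrative only: `CRV_COLUMNS` and `write_crv` are hypothetical names, and the converters shipped with OpenCRAVAT may emit additional columns or metadata that are not covered here.

```python
from typing import Dict, Iterable, TextIO

# (title, internal name, type) for the five columns documented above.
CRV_COLUMNS = [
    ("UID", "uid", "int"),
    ("Chrom", "chrom", "string"),
    ("Position", "pos", "int"),
    ("Ref Base", "ref_base", "string"),
    ("Alt Base", "alt_base", "string"),
]


def write_crv(handle: TextIO, variants: Iterable[Dict[str, object]]) -> None:
    """Write column definitions, a header row, and one row per variant."""
    for index, (title, name, type_) in enumerate(CRV_COLUMNS):
        handle.write(f"#column={index},{title},{name},{type_}\n")
    # Header row: column titles, prefixed with '#', for readability only.
    handle.write("#" + "\t".join(title for title, _, _ in CRV_COLUMNS) + "\n")
    for variant in variants:
        cells = (str(variant[name]) for _, name, _ in CRV_COLUMNS)
        handle.write("\t".join(cells) + "\n")
```

Passing an open text file or an `io.StringIO` as `handle`, together with dictionaries keyed by `uid`, `chrom`, `pos`, `ref_base`, and `alt_base`, produces output in the same shape as the crv example.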

Deletions are written with a ref of the bases to be deleted, and an alt of '-'.

1  chr1    1234    A   -

Insertions are written with a ref of '-' and an alt of the bases to be inserted.

1  chr1    1234    -   A
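The '-' convention makes it straightforward to tell the variant kinds apart from the ref_base and alt_base columns alone. The helper below is a sketch under that reading. `classify_variant` is a hypothetical name, and treating every row without a '-' as a "substitution" is a simplification made for this example.

```python
def classify_variant(ref_base: str, alt_base: str) -> str:
    """Classify a crv row by its ref/alt columns using the '-' convention."""
    if ref_base == "-" and alt_base == "-":
        raise ValueError("ref_base and alt_base cannot both be '-'")
    if ref_base == "-":
        return "insertion"  # nothing in the reference, bases added
    if alt_base == "-":
        return "deletion"  # reference bases removed, nothing added
    return "substitution"  # reference bases replaced by alternate bases
```

For the two rows shown above, `classify_variant("A", "-")` is a deletion and `classify_variant("-", "A")` is an insertion.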

.crx Files

crx files (.crx) are an extended version of .crv files. They describe variants based on their effect on the genome, and also on genes, transcripts, and proteins. They are produced by OpenCRAVAT mappers.

crx example

#column=0,UID,uid,int
#column=1,Chrom,chrom,string
#column=2,Position,pos,int
#column=3,Ref Base,ref_base,string
#column=4,Alt Base,alt_base,string
#column=5,Hugo,hugo,string
#column=6,Transcript,transcript,string
#column=7,All Mappings,all_mappings,string

#UID    Chrom   Position    Ref Base    Alt Base    Hugo    Transcript  All Mappings
1   chr19   10156403    G   C   DNMT1   ENST00000340748.8   {"DNMT1":[["P26358","P447A","MIS","ENST00000340748.8","C1339G"]]}
2   chr7    140834746   A   T   BRAF    ENST00000288602.10  {"BRAF":[["P15056","S123T","MIS","ENST00000288602.10","T367A"]]}

All Mappings

The All Mappings column contains a JSON object describing the genes, transcripts, and proteins that a variant affects. It has the following schema:

{
  "gene": [
    [
      "protein 1",
      "amino acid change 1",
      "sequence ontology 1",
      "transcript 1",
      "rna change 1"
    ],
    [
      "protein 2",
      "amino acid change 2",
      "sequence ontology 2",
      "transcript 2",
      "rna change 2"
    ]
  ]
}

Sequence ontologies are encoded with three-letter abbreviations.

Abbv	Sequence Ontology
2KD	2 Kb downstream from gene
2KU	2 Kb upstream from gene
UT3	In the 3' UTR
UT5	In the 5' UTR
INT	In an intron
UNK	Unknown sequence ontology
SYN	Synonymous
MIS	Missense
CSS	Complex substitution
IDV	Inframe deletion
IIV	Inframe insertion
STL	Stoploss
SPL	Splice site affected
STG	Stopgain
FD2	2 base frameshift deletion
FD1	1 base frameshift deletion
FI2	2 base frameshift insertion
FI1	1 base frameshift insertion
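Given the schema and abbreviation table above, an All Mappings cell can be flattened into one record per gene and transcript. The following is a minimal sketch: `parse_all_mappings` and `SO_NAMES` are hypothetical names, and the code assumes each entry is exactly the five-element list shown in the schema, with a single abbreviation in the sequence ontology slot.

```python
import json
from typing import Dict, List

# Abbreviations copied from the table above.
SO_NAMES = {
    "2KD": "2 Kb downstream from gene",
    "2KU": "2 Kb upstream from gene",
    "UT3": "In the 3' UTR",
    "UT5": "In the 5' UTR",
    "INT": "In an intron",
    "UNK": "Unknown sequence ontology",
    "SYN": "Synonymous",
    "MIS": "Missense",
    "CSS": "Complex substitution",
    "IDV": "Inframe deletion",
    "IIV": "Inframe insertion",
    "STL": "Stoploss",
    "SPL": "Splice site affected",
    "STG": "Stopgain",
    "FD2": "2 base frameshift deletion",
    "FD1": "1 base frameshift deletion",
    "FI2": "2 base frameshift insertion",
    "FI1": "1 base frameshift insertion",
}


def parse_all_mappings(cell: str) -> List[Dict[str, str]]:
    """Flatten an All Mappings JSON string into one dict per mapping."""
    records = []
    for gene, entries in json.loads(cell).items():
        for protein, aa_change, so, transcript, rna_change in entries:
            records.append({
                "gene": gene,
                "protein": protein,
                "aa_change": aa_change,
                "so": so,
                # Fall back to the raw code if it is not in the table.
                "so_name": SO_NAMES.get(so, so),
                "transcript": transcript,
                "rna_change": rna_change,
            })
    return records
```

Applied to the first row of the crx example, this yields a single record for gene DNMT1 whose sequence ontology "MIS" is expanded to "Missense".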

.var/.gen Files

Every OpenCRAVAT annotator will produce an output file with the suffix [annotatorName].var for a variant-level annotator, and [annotatorName].gen for a gene-level annotator.

As an example, running the vest and go annotators on input.vcf:

oc run input.vcf -a vest go --temp-files

This will produce input.vcf.vest.var and input.vcf.go.gen.
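The naming rule can be written down as a one-line helper, which is handy when locating preserved temporary files from a script. This is a sketch: `annotator_output_path` is a hypothetical name, and the "variant"/"gene" level strings are labels chosen for this example rather than values taken from OpenCRAVAT.

```python
def annotator_output_path(input_path: str, annotator: str, level: str) -> str:
    """Build the expected annotator output file name for an input file."""
    # Variant-level annotators write .var files; gene-level ones write .gen.
    suffix = {"variant": "var", "gene": "gen"}[level]
    return f"{input_path}.{annotator}.{suffix}"
```

With the command above, `annotator_output_path("input.vcf", "vest", "variant")` and `annotator_output_path("input.vcf", "go", "gene")` give the two file names listed.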

The annotator output files contain a header that defines the annotator's internal name, display name, and column definitions. The header is followed by rows of tab-separated data values.

An example snippet from input.vcf.vest.var is as follows:

#name=vest
#displayname=VEST
#column=0,UID,uid,int
#column=1,VEST score transcript,transcript,string
#column=2,VEST score,score,float
#column=3,VEST p-value,pval,float
#column=4,VEST score (missense),score_mis,float
#column=5,VEST score (frameshift),score_fsv,float
#column=6,VEST score (inframe indel),score_inv,float
#column=7,VEST score (stop gain),score_stg,float
#column=8,VEST score (stop loss),score_stl,float
#column=9,VEST score (splice site),score_spl,float
#column=10,All transcripts,all_results,string
#column=11,HUGO,hugo,string
#no_aggregate=hugo
#UID    VEST score transcript   VEST score  VEST p-value    VEST score (missense) ...
1   ENST00000233336.6   0.773   0.0417  0.773 ...
2   ENST00000554848.5   0.707   0.06973 0.707 ...
3   ENST00000374080.7   0.143   0.65145 0.143 ...
4   ENST00000267622.8   0.541   0.16344 0.541 ...
5   ENST00000342556.6   0.321   0.31889 0.321 ...
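The header of an annotator output file mixes `#key=value` metadata lines with `#column=` definitions. The sketch below separates the two. `read_annotator_header` is a hypothetical name, and the code assumes that every metadata line has the form `#key=value`, that column titles contain no commas, and that the header ends at the first line not starting with '#'.

```python
from typing import Dict, Iterable, List, Tuple


def read_annotator_header(
    lines: Iterable[str],
) -> Tuple[Dict[str, str], List[Tuple[str, ...]]]:
    """Return (metadata dict, list of column definition tuples)."""
    meta: Dict[str, str] = {}
    columns: List[Tuple[str, ...]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("#"):
            break  # first data row: the header is finished
        key, sep, value = line[1:].partition("=")
        if not sep or not key.isidentifier():
            continue  # the tab-separated header row of column titles
        if key == "column":
            # (index, title, name, type), kept as strings
            columns.append(tuple(value.split(",", 3)))
        else:
            meta[key] = value
    return meta, columns
```

For the VEST snippet above, the metadata would hold the `name`, `displayname`, and `no_aggregate` entries, and the column list would hold twelve definitions.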
