Silesia Compression Corpus

The Silesia corpus is a set of files with different characteristics, used to test compression algorithms.

It was once available at http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia, but that page has recently been inaccessible.

| Size (bytes) | File | Description |
| ---: | :--- | :--- |
| 10,192,446 | dickens | English novels, ASCII plain text |
| 51,220,480 | mozilla | Program, UNIX executables and others, tar |
| 9,970,564 | mr | 3-D MRI image, DICOM |
| 33,553,445 | nci | Chemical database, text |
| 6,152,192 | ooffice | Windows DLL |
| 10,085,684 | osdb | Database, synthetic data, binary |
| 6,627,202 | reymont | Polish text, uncompressed PDF |
| 21,606,400 | samba | Source code and graphics, tar |
| 7,251,944 | sao | Database, star catalog, binary |
| 41,458,703 | webster | English dictionary, HTML |
| 8,474,240 | x-ray | 16 bit grayscale, DICOM |
| 5,345,280 | xml | XML files, text, tar |
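
Because the original download page may be unreachable, a copy obtained elsewhere can be checked against the sizes listed in the table above. The following is a minimal sketch, not part of the corpus itself: the function names `check_sizes` and `compression_ratio` and the directory name `silesia` are hypothetical, and zlib stands in for whichever compressor is actually under test.

```python
import zlib
from pathlib import Path

# Expected sizes in bytes, copied from the table above.
EXPECTED_SIZES = {
    "dickens": 10_192_446,
    "mozilla": 51_220_480,
    "mr": 9_970_564,
    "nci": 33_553_445,
    "ooffice": 6_152_192,
    "osdb": 10_085_684,
    "reymont": 6_627_202,
    "samba": 21_606_400,
    "sao": 7_251_944,
    "webster": 41_458_703,
    "x-ray": 8_474_240,
    "xml": 5_345_280,
}


def check_sizes(corpus_dir):
    """Return {name: actual_size} for files that differ from the table.

    A missing file is reported with a size of None.
    """
    mismatches = {}
    for name, expected in EXPECTED_SIZES.items():
        path = Path(corpus_dir) / name
        actual = path.stat().st_size if path.is_file() else None
        if actual != expected:
            mismatches[name] = actual
    return mismatches


def compression_ratio(data, level=6):
    """Original size divided by zlib-compressed size (higher is better)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    corpus_dir = Path("silesia")  # hypothetical location of the unpacked files
    for name, actual in check_sizes(corpus_dir).items():
        print(f"{name}: expected {EXPECTED_SIZES[name]}, found {actual}")
    for name in EXPECTED_SIZES:
        path = corpus_dir / name
        if path.is_file():
            print(f"{name}: ratio {compression_ratio(path.read_bytes()):.2f}")
```

A size match does not prove that a file is byte-identical to the original; it only catches truncated or obviously wrong copies.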