The Silesia corpus is a set of files with differing characteristics, used to test compression algorithms.
It was once available here: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia but the site has recently been inaccessible.

Size (bytes) | File | Description |
---|---|---|
10,192,446 | dickens | English novels, ASCII plain text |
51,220,480 | mozilla | Program, UNIX executables and others, tar |
9,970,564 | mr | 3-D MRI image, DICOM |
33,553,445 | nci | Chemical database, text |
6,152,192 | ooffice | Windows DLL |
10,085,684 | osdb | Database, synthetic data, binary |
6,627,202 | reymont | Polish text, uncompressed PDF |
21,606,400 | samba | Source code and graphics, tar |
7,251,944 | sao | Database, star catalog, binary |
41,458,703 | webster | English dictionary, HTML |
8,474,240 | x-ray | 16-bit grayscale, DICOM |
5,345,280 | xml | XML files, text, tar |
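
Since the original download page may be unreachable, copies obtained from mirrors can be sanity-checked against the sizes listed in the table. The following is a minimal sketch, not an official tool: the function name `check_sizes` and the idea of keeping the files in a single local directory are assumptions made for illustration. It only compares byte counts, so it cannot detect a file that has the right size but different content.

```python
from pathlib import Path

# Expected sizes in bytes, copied from the table above.
EXPECTED_SIZES = {
    "dickens": 10_192_446,
    "mozilla": 51_220_480,
    "mr": 9_970_564,
    "nci": 33_553_445,
    "ooffice": 6_152_192,
    "osdb": 10_085_684,
    "reymont": 6_627_202,
    "samba": 21_606_400,
    "sao": 7_251_944,
    "webster": 41_458_703,
    "x-ray": 8_474_240,
    "xml": 5_345_280,
}


def check_sizes(directory, expected=EXPECTED_SIZES):
    """Return {file name: actual size, or None if missing} for every
    file whose on-disk size differs from the expected size."""
    problems = {}
    for name, size in expected.items():
        path = Path(directory) / name
        # A missing file is reported as None rather than raising.
        actual = path.stat().st_size if path.is_file() else None
        if actual != size:
            problems[name] = actual
    return problems
```

Calling `check_sizes("silesia")` (where `silesia` is a hypothetical directory holding the twelve uncompressed files) returns an empty dictionary when every file is present with the expected size. The sizes in the table sum to 211,938,580 bytes.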