Original format idea by Zoltán Bacskó of Falcosoft, further expanded by spessasus. Specification written by spessasus with the help of Zoltán.
Revision 1.19
MIDI files have long faced a significant challenge: different sounds on different devices. SF2 + MIDI combinations address this issue partially by ensuring that playing both files through an SF2-compliant synth results in the same sound being produced. The RMIDI format is not new; it was originally developed by Microsoft as a RIFF wrapper for MIDI files and later expanded by the MIDI Manufacturers Association to support embedding DLS sound banks. However, DLS is not widely used today, whereas the SoundFont2 (SF2) format serves a similar purpose and remains quite popular. The SF2 RMIDI format integrates MIDI and SF2 files into a single file, augmented with additional metadata. This document serves as the official specification for this format. This version of RMIDI was created by Zoltán Bacskó of Falcosoft and implemented in Falcosoft SoundFont Midi Player 6. I am in contact with Zoltán, who granted permission to use this as the official specification.
If you find any part of this specification unclear, please reach out via this thread or file a GitHub issue in this repository. Also feel free to report typos or to suggest expansions to this standard!
This specification assumes familiarity with the SoundFont2 format and the Standard MIDI File (SMF) format. Additional terminology used in this specification includes:
- The software: Refers to software compliant with this specification.
- Bit: The most basic data structure element, either 0 or 1.
- Byte: A data structure element of eight bits, with no defined meaning to those bits.
- SoundFont: A SoundFont2 compliant binary.
- DLS: DownLoadable Sounds. Sound bank format similar to SoundFont. Used in the older RMIDI files, not compliant with this specification.
- Embedded sound bank: The sound bank embedded within an RMIDI file that is used for playing back the sequence.
- Main SoundFont: The regular SoundFont bank loaded by the software before loading the RMIDI file.
- Bank: MIDI controller 0 (Bank Select MSB) and the bank number of a SoundFont preset. It's a 7-bit value, except for the SoundFont's drum presets, which use bank number 128.
- Bank offset: The number which gets added to each wBank field in all presets within the embedded bank.
- RIFF: Resource Interchange File Format. A file container format for storing data in tagged chunks.
- Chunk: The top-level division of a RIFF file.
- Little Endian: Byte ordering in memory with the least significant byte at the lowest address.
- MIDI: Musical Instrument Digital Interface. A technical standard that describes a communication protocol for a wide variety of electronic musical devices.
- MIDI file/SMF: Standard MIDI File. A sequence of MIDI messages, usually a song.
- Note On: A MIDI message indicating that a given note should be pressed.
- Note Off: A MIDI message indicating that a given note should be released.
- GM: General MIDI system, ignoring all Bank select messages.
- XG: Yamaha Extended General MIDI, an extension to the General MIDI standard created by Yamaha.
- GS: Roland General Standard, an extension to the General MIDI standard created by Roland.
- Encoding: Assigning numbers to graphical characters.
- ASCII: American Standard Code for Information Interchange, a character encoding standard for electronic communication.
The file extension is .rmi, and the MIME type is audio/rmid.
Optionally, the extension may be .sfmi to help distinguish these files from the older RMIDI format.
The file type should be referred to as MIDI with embedded SF2, Embedded MIDI or SF2 RMIDI.
The RMIDI format uses RIFF chunks to structure the data.
Each RIFF chunk in an RMIDI file follows this format:
- Four bytes: Chunk header in ASCII (e.g., RIFF)
- Four bytes: Chunk size as a 32-bit unsigned little-endian number
- Chunk data: Optionally, the first 4 bytes of the data represent the chunk type in ASCII (e.g., sfbk)
NOTE: Every chunk must occupy an even number of bytes. If the length of the chunk data is odd, a padding byte of 0 must be added at the end. The chunk's size field does not include this padding byte.
IMPORTANT: This constraint applies only to RIFF chunks within the RMIDI file and does not affect RIFF chunks within the soundfont chunk.
52 49 46 46 05 00 00 00 48 65 6C 6C 6F 00
- 52 49 46 46 - ASCII string "RIFF"
- 05 00 00 00 - 32-bit chunk length: 5
- 48 65 6C 6C 6F - the chunk's data: ASCII string "Hello"
- 00 - a pad byte of 0 to make the total byte count even.
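The chunk layout above can be sketched as a small reader. This is a minimal illustration, not part of the specification: the function name `readRiffChunk` and the shape of the returned object are assumptions made for this example, and no bounds checking is performed.

```javascript
// Hypothetical helper (not defined by the specification): reads one RIFF
// chunk from a Uint8Array, starting at the given byte offset.
function readRiffChunk(bytes, offset = 0) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    // four bytes: chunk header in ASCII
    const header = String.fromCharCode(...bytes.subarray(offset, offset + 4));
    // four bytes: 32-bit unsigned little-endian size (pad byte not included)
    const size = view.getUint32(offset + 4, true);
    // the chunk data itself
    const data = bytes.subarray(offset + 8, offset + 8 + size);
    // the next chunk starts after the data and the optional pad byte
    const next = offset + 8 + size + (size % 2);
    return { header, size, data, next };
}

// the "Hello" example from above
const helloBytes = Uint8Array.from([
    0x52, 0x49, 0x46, 0x46, 0x05, 0x00, 0x00, 0x00,
    0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x00
]);
const helloChunk = readRiffChunk(helloBytes);
console.log(helloChunk.header, helloChunk.size, String.fromCharCode(...helloChunk.data), helloChunk.next);
// RIFF 5 Hello 14
```

Returning the offset of the next chunk keeps the padding rule in one place, so callers that walk several chunks do not need to repeat it.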
An RMIDI file consists of:
- RIFF chunk (main chunk)
  - RMID ASCII string
  - data chunk containing the complete MIDI file (MThd, MTrk, etc.)
  - Optional LIST chunk: Metadata for the file, similar to SF2's INFO chunk
    - INFO ASCII string
    - Inner chunks, described in the INFO chunk section
  - RIFF chunk: Complete soundfont binary. The first four bytes of this chunk should be sfbk, indicating a soundfont2 binary. SoundFont3 format is allowed.
- RIFF chunk
  - RMID ASCII string
  - data chunk
    - The MIDI file data: MThd, MTrk etc...
  - LIST chunk
    - INFO ASCII string
    - INAM chunk
      - Never Gonna Give You Up (UTF-8 string)
    - IART chunk
      - Rick Astley (UTF-8 string)
    - ICRD chunk
      - 1987 (UTF-8 string)
    - IENC chunk
      - utf-8 (ASCII string)
    - DBNK chunk
      - 16-bit integer: 1
  - RIFF chunk - the SoundFont binary: sfbk, LIST, sdta etc...
The file structure above shows that:
- The bank offset is 1.
- Info chunks are encoded using UTF-8 encoding.
- The song's title is "Never Gonna Give You Up."
- The song's artist is "Rick Astley."
- The song's creation date is "1987."
- The song has an embedded sound bank.
When the file structure deviates from the above:
- Any additional chunks after the specified ones should be ignored and preserved as-is.
- If the chunk order differs from this specification, the file should be rejected.
- If no soundfont bank is present, the file should use the main soundfont and assume a bank offset of 0, ignoring the DBNK chunk.
- If the soundfont bank uses the older DLS format, software not capable of reading DLS should reject the file. Software that supports DLS should use the contained DLS and assume a bank offset of 1 or try to detect the bank offset, since the older format does not specify the DBNK chunk.
The last two rules ensure backwards compatibility with the older RMIDI format.
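The last two rules can be sketched as a small classification step. This is only an illustration under stated assumptions: the function name `classifyEmbeddedBank` and its return shape are invented for this example, the `DLS ` form type (with a trailing space) comes from the DLS format rather than from this specification, and rejecting unrecognized form types is a choice made here, not a rule stated above.

```javascript
// Hypothetical helper: inspects the data of the trailing RIFF chunk (if any)
// and reports what kind of sound bank it holds and whether to accept the file.
// `bankData` is a Uint8Array with the chunk data, or null when no bank chunk exists.
function classifyEmbeddedBank(bankData, supportsDls) {
    // no embedded bank: play with the main soundfont
    if (!bankData || bankData.length < 4) {
        return { kind: "none", accept: true };
    }
    // the first four bytes of the chunk data are the RIFF form type
    const formType = String.fromCharCode(...bankData.subarray(0, 4));
    if (formType === "sfbk") {
        return { kind: "sf2", accept: true };
    }
    // assumption: older RMIDI files carry a DLS bank with form type "DLS "
    if (formType === "DLS ") {
        return { kind: "dls", accept: supportsDls };
    }
    // assumption: anything else is treated as unsupported
    return { kind: "unknown", accept: false };
}

const sf2Bank = Uint8Array.from([0x73, 0x66, 0x62, 0x6b]); // "sfbk"
console.log(classifyEmbeddedBank(sf2Bank, false).kind); // sf2
console.log(classifyEmbeddedBank(null, false).kind); // none
```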
The INFO chunk describes file metadata and the soundfont's bank offset.
The INFO chunk may contain the following optional chunks:
- DBNK chunk: Soundfont's bank offset. See DBNK Chunk for details.
- IENC chunk: Encoding used for the metadata chunks: name of the encoding stored as a string. Not case-sensitive, but lowercase is preferred (e.g., utf-8). Software capable of reading the IENC chunk must support the required encodings. Note that this field must use basic ASCII encoding.
- MENC chunk: Encoding hint for the text events within the MIDI file. The same string format as IENC.
- Metadata chunks
Below are the defined chunks containing additional information about the song:
- INAM chunk: Song name/title. String of any length.
- ICOP chunk: Copyright. String of any length.
- IART chunk: Artist (MIDI creator). String of any length.
- ICRD chunk: Creation date. String of any length.
- IPRD or IALB chunk: Album name. String of any length. The two can be used interchangeably. If both exist in the file, the software should use IALB.
- IPIC chunk: Attached picture (e.g., album cover). Binary picture data. PNG or JPEG recommended.
- IGNR chunk: Song genre. String of any length.
- ICMT chunk: Comment/description. String of any length.
- IENG chunk: Engineer (soundfont creator). String of any length.
- ISFT chunk: Software used to create the file. String of any length.
The following rules apply to the INFO chunk:
- The order of chunks within the INFO chunk is arbitrary.
- Chunks of length 0 are illegal and should be discarded.
- Unknown INFO chunks should be ignored and preserved as-is.
- If the IENC chunk is not specified, the software can use any encoding, but assuming utf-8 is recommended.
- If the MENC chunk is not specified, the software decides the MIDI's encoding.
- If the software can display the song's name, it should use the INAM chunk if present, ignoring the MIDI track name.
- Compatible software may ignore all INFO chunks except the DBNK chunk for the most basic level of compatibility.
- The chunk size must be even, as specified in the general RIFF structure.
- The INFO chunk is optional. The software must not assume that the INFO chunk exists.
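A minimal sketch of how the rules above might be applied when walking the INFO list. The function name `readInfoChunks` and the use of a `Map` are assumptions for illustration. The input is expected to be the data of the LIST chunk, starting with the INFO string. Unknown chunks are simply kept in the map, and preserving them when writing a file back is left to the caller.

```javascript
// Hypothetical helper: collects the sub-chunks of a LIST INFO chunk into a Map
// keyed by the four-character chunk header. Order does not matter, zero-length
// chunks are discarded, and unknown chunks are kept untouched.
function readInfoChunks(listData) {
    const view = new DataView(listData.buffer, listData.byteOffset, listData.byteLength);
    const fourCC = (start) => String.fromCharCode(...listData.subarray(start, start + 4));
    if (fourCC(0) !== "INFO") {
        throw new Error("Not an INFO list");
    }
    const chunks = new Map();
    let pos = 4;
    while (pos + 8 <= listData.length) {
        const header = fourCC(pos);
        const size = view.getUint32(pos + 4, true);
        // chunks of length 0 are illegal and are discarded
        if (size > 0) {
            chunks.set(header, listData.subarray(pos + 8, pos + 8 + size));
        }
        // advance past the data and the pad byte, if any
        pos += 8 + size + (size % 2);
    }
    return chunks;
}

// build a tiny INFO list: INAM = "Hi", an empty ICMT, and DBNK = 1
const toBytes = (text) => [...text].map((c) => c.charCodeAt(0));
const infoListBytes = Uint8Array.from([
    ...toBytes("INFO"),
    ...toBytes("INAM"), 2, 0, 0, 0, ...toBytes("Hi"),
    ...toBytes("ICMT"), 0, 0, 0, 0,
    ...toBytes("DBNK"), 2, 0, 0, 0, 1, 0
]);
const infoChunks = readInfoChunks(infoListBytes);
// the map holds INAM and DBNK; the empty ICMT chunk was discarded
console.log([...infoChunks.keys()].join(","));
```

Because the INFO chunk is optional, a caller would only invoke this when a LIST chunk was actually found.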
For Level 3 compatibility, software must support the following encodings (both lowercase and uppercase):
- utf-8
- shift-jis or Shift_JIS (equivalent encodings)
- windows-1250 (Central Europe)
- windows-1251 (Cyrillic)
- windows-1252 (Western)
- windows-1253 (Greek)
- windows-1254 (Turkish)
- windows-1255 (Hebrew)
- windows-1256 (Arabic)
- windows-1257 (Baltic)
- windows-1258 (Vietnamese)
Software may decode other encodings but is not required to.
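One possible way to honor the IENC chunk in JavaScript is the standard `TextDecoder` API, which accepts labels such as utf-8, shift_jis and windows-1250 case-insensitively. The sketch below is an assumption-laden illustration: the function name `decodeInfoText` is invented here, the fallback to utf-8 for unrecognized labels follows the recommendation for a missing IENC chunk rather than an explicit rule, and stripping trailing zero bytes assumes that strings may be zero-terminated, which this specification does not state.

```javascript
// Hypothetical helper: decodes a metadata chunk using the encoding named in IENC.
// `textBytes` and `iencBytes` are Uint8Arrays; `iencBytes` may be undefined.
function decodeInfoText(textBytes, iencBytes) {
    // utf-8 is the recommended assumption when IENC is missing
    let label = "utf-8";
    if (iencBytes && iencBytes.length > 0) {
        // IENC is plain ASCII and not case-sensitive
        label = String.fromCharCode(...iencBytes).replace(/\0/g, "").trim().toLowerCase();
    }
    let decoder;
    try {
        decoder = new TextDecoder(label);
    } catch {
        // assumption: fall back to utf-8 when the runtime rejects the label
        decoder = new TextDecoder("utf-8");
    }
    // assumption: strip trailing zero bytes left by zero-terminated strings
    return decoder.decode(textBytes).replace(/\0+$/, "");
}

const nameBytes = Uint8Array.from([0x5a, 0x6f, 0x6c, 0x74, 0xc3, 0xa1, 0x6e]);
const iencUtf8 = Uint8Array.from([0x55, 0x54, 0x46, 0x2d, 0x38]); // "UTF-8"
console.log(decodeInfoText(nameBytes, iencUtf8)); // Zoltán
```

Which non-UTF-8 labels actually decode depends on the runtime, so software targeting Level 3 would need to verify that every required encoding is available.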
For Level 4 compatibility, software must support the following image formats:
- Portable Network Graphics (PNG)
- Joint Photographic Experts Group (JPEG)
Other formats (e.g., gif, webp, ico) may also be supported but are not required.
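The IPIC chunk holds raw picture data without a separate format field, so a reader has to work out the format itself. One common approach, shown here as an assumption rather than a requirement of this specification, is to check the well-known file signatures of PNG and JPEG. The function name `detectImageFormat` is hypothetical.

```javascript
// Hypothetical helper: guesses the image format of IPIC data from its first bytes.
function detectImageFormat(bytes) {
    const startsWith = (signature) => signature.every((byte, i) => bytes[i] === byte);
    // PNG files begin with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A
    if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) {
        return "png";
    }
    // JPEG files begin with the start-of-image marker FF D8 followed by FF
    if (startsWith([0xff, 0xd8, 0xff])) {
        return "jpeg";
    }
    return "unknown";
}

console.log(detectImageFormat(Uint8Array.from([0xff, 0xd8, 0xff, 0xe0]))); // jpeg
```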
The DBNK chunk is an optional RIFF chunk within the RMIDI INFO List.
It describes the bank offset for the embedded sound bank.
It always has a length of two bytes, with these bytes forming a 16-bit, unsigned, little-endian number. If the chunk's length is not two bytes or the number is out of range, the file should be rejected.
The currently valid range is 0 to 127 inclusive, which fits in the low byte. The high byte is reserved for future use.
If no DBNK is specified, an offset of 1 is assumed by default. If the file does not contain any Sound bank (SF2 or DLS), the offset shall default to 0.
For general use, a bank offset of 0 is recommended as it allows bundling the soundfont and the MIDI without modification.
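The DBNK rules above could be implemented along these lines. This is a sketch only: the function name `readBankOffset` and its parameters are assumptions, and signalling rejection by throwing an error is one possible design among several.

```javascript
// Hypothetical helper: determines the bank offset from the DBNK chunk data.
// `dbnkData` is a Uint8Array, or undefined when the chunk is absent.
function readBankOffset(dbnkData, hasEmbeddedBank) {
    // without any embedded sound bank the offset defaults to 0 and DBNK is ignored
    if (!hasEmbeddedBank) {
        return 0;
    }
    // without a DBNK chunk an offset of 1 is assumed
    if (!dbnkData) {
        return 1;
    }
    if (dbnkData.length !== 2) {
        throw new Error("Invalid DBNK chunk: length must be two bytes");
    }
    // 16-bit unsigned little-endian number
    const offset = dbnkData[0] | (dbnkData[1] << 8);
    if (offset > 127) {
        throw new Error("Invalid DBNK chunk: offset out of range");
    }
    return offset;
}

console.log(readBankOffset(Uint8Array.from([5, 0]), true)); // 5
console.log(readBankOffset(undefined, true)); // 1
console.log(readBankOffset(undefined, false)); // 0
```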
The RMI file may come with an embedded SF2 or DLS SoundFont, usually after the INFO chunk. This sound bank provides the exclusive sounds used within the MIDI sequence, temporarily replacing given MIDI program and bank numbers with the presets contained within the sound bank.
The bank offset adjusts every bank in the embedded sound bank except for bank 128 by adding itself to every patch's wBank field.
For example:
- If a preset named Acoustic Piano 2 with program 0 and bank 1 exists within an RMIDI file which uses a bank offset of 1, it should effectively be interpreted as program 0 and bank 2.
- If a preset named Standard Drum Kit exists within the same RMIDI file with program 0 and bank 128, the bank will remain 128.
If the resulting bank number exceeds 127 (except for drum kits) or is smaller than 0, then it should be turned into 0.
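The offset rules and the two examples above can be illustrated with a short sketch. The preset objects used here, with `name`, `program` and `bank` fields, are a hypothetical shape chosen for this example; a real SoundFont parser will expose presets differently.

```javascript
// Hypothetical helper: returns a copy of the presets with the bank offset applied.
function applyBankOffset(presets, bankOffset) {
    return presets.map((preset) => {
        // drum presets (bank 128) are never shifted
        if (preset.bank === 128) {
            return { ...preset };
        }
        const shifted = preset.bank + bankOffset;
        // results outside 0..127 are turned into bank 0
        const bank = shifted > 127 || shifted < 0 ? 0 : shifted;
        return { ...preset, bank };
    });
}

const examplePresets = [
    { name: "Acoustic Piano 2", program: 0, bank: 1 },
    { name: "Standard Drum Kit", program: 0, bank: 128 }
];
const shiftedPresets = applyBankOffset(examplePresets, 1);
console.log(shiftedPresets[0].bank, shiftedPresets[1].bank); // 2 128
```

Returning new objects instead of mutating the input leaves the embedded bank untouched, which may help software that also needs to write the file back unmodified.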
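Below is simple JavaScript-like code for a Level 1 RMIDI-compatible player. The helpers it calls (readRIFF, readLIST, Player, and so on) are placeholders rather than a real API.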
Note: this code does not perform any checks and assumes that the file is valid and contains all three chunks, for the sake of simplicity.
const file = open("song.rmi");
// read the main RIFF chunk
const chunk = readRIFF(file);
// skip the 'RMID' string
chunk.data.seek(chunk.data.position + 4);
// read the 'data' chunk (the complete MIDI file)
const midiChunk = readRIFF(chunk.data);
const midiFile = midiChunk.data;
// read the 'LIST' INFO chunk
const info = readRIFF(chunk.data);
// skip the 'INFO' string
info.data.seek(info.data.position + 4);
const infoList = readLIST(info.data);
// bank offset is 1 by default
let bankOffset = 1;
// if DBNK exists
const dbnk = infoList.find(infoChunk => infoChunk.header === "DBNK");
if (dbnk) {
    // DBNK is a 2-byte unsigned little-endian integer
    bankOffset = dbnk.data.toUnsignedInt16();
}
// clamp the bank offset
bankOffset = Math.min(Math.max(0, bankOffset), 127);
// read the sound bank (not as a RIFF chunk, but copy the remaining binary content)
const soundFont = chunk.data.slice(chunk.data.position, chunk.data.length);
// initialize the synthesizer
const player = new Player(soundFont);
// adjust the bank numbers, leaving drum presets (bank 128) untouched
for (const preset of player.soundFont.presets) {
    if (preset.bankNumber === 128) continue;
    const shifted = preset.bankNumber + bankOffset;
    // a result above 127 is turned into bank 0
    preset.bankNumber = shifted > 127 ? 0 : shifted;
}
// play the song
player.play(midiFile);
Not all chunks in the file must be read for the file to play correctly. Software compatibility with the RMIDI format is categorized into levels:
Level 1: Minimum requirements for the software to be compliant. The software must:
- Read and interpret the RMID ASCII string as the file indicator.
- Handle the data chunk containing the MIDI data.
- Process the DBNK chunk within the INFO chunk and correctly offset the soundfont (or the bank select messages in the MIDI) based on this value.
- Read the RIFF chunk with the soundfont data.
Level 2: This level requires basic interpretation of the INFO chunk. The software must:
- Read all Level 1 chunks.
- Interpret all metadata chunks (INAM, IPRD, ICRD, ICOP, etc.) as ASCII or utf-8.
Level 3: This level requires support for the IENC chunk. The software must:
- Read all Level 1 and Level 2 chunks.
- Interpret the IENC chunk and support the required encodings.
As of 2024-08-07, Falcosoft Midi Player meets this level of compatibility.
Level 4: This level requires support for the IPIC chunk. The software must:
- Read all Level 1, Level 2, and Level 3 chunks.
- Interpret the IPIC chunk and support the required image formats.
As of 2024-08-06, SpessaSynth meets this level of compatibility.
As of 2024-08-20, foo_midi meets this level of compatibility.
There are currently two distinct types of RMIDI files that vary in their use cases.
Note that these have identical file structure; these vary only in the way they provide sounds for the sequence.
A self-contained file is defined as an SF2 RMIDI file which refers only to its own SoundFont bank, where that bank contains all of the presets needed to play the file and no others.
It is recommended to use DBNK of 0 for writing such files, but it is not required.
Writing self-contained RMIDI files is recommended, but not required.
An external file is defined as an SF2 RMIDI file which relies on a complete sound bank loaded as a fallback, with the embedded sound bank containing only special sound effects specific to the file.
Software not capable of loading two sound banks at once (the main one and the embedded one) may reject the file.
This type of file usually uses bank 1 or greater, but it may use bank 0.
The following recommendations are not required for file validity but are advised:
- Trim the soundfont to include only presets and samples used in the file to save space.
- Write a self-contained file to ensure that it will sound the same in every software.
- Always include the DBNK chunk, even if the offset is 1.
- Include the IENC chunk to ensure correct encoding is used.
- Include the MENC chunk if the encoding is known, to help other software read the MIDI text events correctly.
- Use the utf-8 encoding for the metadata chunks if possible.
- Use SoundFont3 compression if available to save space.
The examples directory contains RMIDI files for testing:
- Field of Hopes and Dreams - complete, level 4, self-contained file with IPIC chunk and metadata. Offset 0. Uses SF3 compression.
- GRABBAG_EmbeddedSF2 - self-contained file with no DBNK chunk. Offset 1.
- offset_5 - self-contained file with offset of 5.
- Rock_test - external file with no DBNK chunk. Offset 1, expects a full GM sound bank loaded at bank 0.
- bachsb - an RMIDI file without an embedded sound bank.
- AWEBLOWN - a DLS RMIDI file with no DBNK chunk. Offset 1, expects a full GM sound bank loaded at bank 0. Software not capable of reading DLS should reject this file.
Below is the SpessaSynth implementation of the format in JavaScript, which may be useful for developers: