Before you start a deep learning project using MIDI files: the differences between MIDI scores and MIDI performances
This article is intended for anyone planning to use, or just getting started with, MIDI files, a format that is widely used in the music community and has attracted the attention of computational music researchers thanks to the availability of datasets.
However, MIDI files can encode different types of information, and in particular there is a big difference between a MIDI score and a MIDI performance. Not being aware of this can lead to wasting time on unnecessary tasks, or to the wrong choice of training data and approach.
Here we give a basic introduction to the two formats, along with practical examples of how to get started using them in Python.
What is MIDI?
MIDI was introduced as a real-time communication protocol between synthesizers. The main idea was to send a message every time a note is pressed on a MIDI keyboard (note on) and a different message when the note is released (note off). The receiving synthesizer then knows what sound to make.
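To make the idea concrete, here is a minimal sketch of what these two messages look like as raw bytes. It relies only on the standard MIDI 1.0 layout (one status byte carrying the message type and channel, followed by two data bytes). The helper names `note_on` and `note_off` are made up for this illustration and do not belong to any library.

```python
# Minimal sketch of MIDI "note on" / "note off" messages as raw bytes.
# The helper names are hypothetical, chosen only for this illustration.

NOTE_OFF_STATUS = 0x80  # upper nibble of the status byte for "note off"
NOTE_ON_STATUS = 0x90   # upper nibble of the status byte for "note on"


def _validate(channel: int, pitch: int, velocity: int) -> None:
    """Channels use 4 bits (0..15); pitch and velocity use 7 bits (0..127)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not 0 <= pitch <= 127:
        raise ValueError("pitch must be in 0..127")
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be in 0..127")


def note_on(pitch: int, velocity: int, channel: int = 0) -> bytes:
    """Build the 3-byte message sent when a key is pressed."""
    _validate(channel, pitch, velocity)
    return bytes([NOTE_ON_STATUS | channel, pitch, velocity])


def note_off(pitch: int, velocity: int = 0, channel: int = 0) -> bytes:
    """Build the 3-byte message sent when a key is released."""
    _validate(channel, pitch, velocity)
    return bytes([NOTE_OFF_STATUS | channel, pitch, velocity])


press = note_on(pitch=60, velocity=64)  # pitch 60 is middle C
release = note_off(pitch=60)
print(press.hex(" "))    # 90 3c 40
print(release.hex(" "))  # 80 3c 00
```

Note that the messages themselves carry no timing: in the real-time protocol, "when" is simply the moment the message arrives.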
Welcome to MIDI files!
If you collect all these messages and save them (making sure to add their time positions), you get a MIDI file that can be used to reproduce a piece. There are many other types of messages besides note on and note off, such as those specifying pedal information and other controllers.
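As a rough sketch of this "collect and save" idea, the snippet below pairs timestamped note-on and note-off events into notes with an onset and a duration. The `Event` and `Note` types and the event values are invented for the example; a real MIDI file stores the same information in a more compact binary form. The sketch also follows the common MIDI convention that a note on with velocity 0 acts as a note off.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    time: float     # seconds since the start of the recording
    kind: str       # "note_on" or "note_off"
    pitch: int      # MIDI pitch number, 0..127
    velocity: int   # 0..127


class Note(NamedTuple):
    onset: float
    duration: float
    pitch: int
    velocity: int


def events_to_notes(events: List[Event]) -> List[Note]:
    """Pair each note on with the next note off of the same pitch."""
    active = {}  # pitch -> (onset time, velocity) of the currently held note
    notes = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "note_on" and ev.velocity > 0:
            active[ev.pitch] = (ev.time, ev.velocity)
        elif ev.pitch in active:
            # note off (or note on with velocity 0) closes the held note
            onset, velocity = active.pop(ev.pitch)
            notes.append(Note(onset, ev.time - onset, ev.pitch, velocity))
    return sorted(notes, key=lambda n: n.onset)


# Made-up recording: two overlapping notes
recorded = [
    Event(0.0, "note_on", 60, 70),
    Event(0.5, "note_on", 64, 65),
    Event(1.0, "note_off", 60, 0),
    Event(1.5, "note_off", 64, 0),
]
notes = events_to_notes(recorded)
for note in notes:
    print(note)
```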
Plotting this information gives a piano roll.
Please note that a piano roll is not a MIDI file, just one possible representation of its contents. Some software (in this example, Reaper) adds a small piano keyboard next to the piano roll to make it easier to interpret visually.
How are MIDI files created?
MIDI files are mainly created in two ways: 1) by playing on a MIDI instrument, 2) by writing notes by hand in a sequencer (Reaper, Cubase, GarageBand, Logic) or in a music notation editor (e.g. MuseScore).
Each way of creating a MIDI file also produces a different kind of file:
- Playing on a MIDI instrument → MIDI performance
- Writing notes manually (sequencer or notation editor) → MIDI score
Here we explain each type in detail and summarize the differences between them.
Disclaimer before we begin: here we focus on what information can be extracted from a file, not on how that information is encoded. For example, when we say “time is expressed in seconds”, we mean that you can obtain the seconds, even though the encoding itself is more complicated.
A MIDI performance contains four types of information:
- When the note starts: the note onset
- When the note ends: the note offset (or the note duration, computed as offset minus onset)
- Which note was played: the pitch
- How “hard” the key was pressed: the note velocity
Note onsets and offsets (and durations) are expressed in seconds, corresponding to the seconds at which the person playing the MIDI instrument pressed and released each note.
Pitch is encoded as an integer between 0 (lowest) and 127 (highest). Note that this allows more notes to be represented than can be played on a piano, whose range corresponds to pitches 21 to 108.
Note velocity is encoded as an integer between 0 (silence) and 127 (maximum intensity).
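To get a feel for these pitch numbers, the following sketch converts a MIDI pitch to a note name and a frequency. It assumes two common conventions that are not encoded in the MIDI file itself: octave numbering in which pitch 60 is C4 (some software calls it C3), and equal temperament with A4 (pitch 69) tuned to 440 Hz. Names are spelled with sharps only, since a MIDI pitch number does not say whether a note is, for instance, C# or Db.

```python
# Sketch: interpreting MIDI pitch numbers.
# Assumptions: pitch 60 is called C4, and A4 (pitch 69) is tuned to 440 Hz.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_to_name(pitch: int) -> str:
    """Return a sharp-only note name such as 'A0' for a MIDI pitch number."""
    if not 0 <= pitch <= 127:
        raise ValueError("MIDI pitch must be in 0..127")
    octave = pitch // 12 - 1
    return f"{NOTE_NAMES[pitch % 12]}{octave}"


def pitch_to_frequency(pitch: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency in Hz: each semitone is a factor of 2**(1/12)."""
    return a4_hz * 2 ** ((pitch - 69) / 12)


print(pitch_to_name(21), pitch_to_name(108))  # A0 C8 (the piano range)
print(pitch_to_name(60))                      # C4
print(pitch_to_frequency(69))                 # 440.0
```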
Most MIDI instruments are MIDI keyboards, so the vast majority of MIDI performances are piano performances. Other MIDI instruments exist (MIDI saxophones, MIDI drums, MIDI sensors for guitar, etc.), but they are less common.
The largest dataset of human MIDI performances (classical piano music) is the MAESTRO dataset from Google Magenta.
Key characteristics of MIDI performances
The main characteristic of a MIDI performance is that no two notes ever have exactly the same onset or duration (this is theoretically possible, but highly unlikely in practice).
Indeed, even if performers really try, they cannot press two (or more) notes at exactly the same time, since there is a limit to the precision that humans can achieve. The same goes for note durations. Moreover, for most musicians this is not even a priority, since timing deviations help to create a more expressive and groovy performance. Finally, there may be silences or partial overlaps between consecutive notes.
For this reason, a MIDI performance is also called unquantized MIDI. The temporal positions are spread over a continuous time scale rather than quantized to discrete positions (for digital encoding reasons the scale is technically discrete, but it is so fine that it can be considered continuous).
Examples in practice
Let’s take a look at a MIDI performance. We will use the ASAP dataset, which is available on GitHub.
In your favorite terminal (I use PowerShell on Windows), navigate to a convenient location and clone the repository:
git clone https://github.com/fosfrancesco/asap-dataset
We also use the Python library Partitura to open the MIDI files, so install it in your Python environment:
pip install partitura
Now that everything is ready, let’s open the MIDI file and print the first 10 notes. Since this is a MIDI performance, we use the load_performance_midi function.
from pathlib import Path

import partitura as pt

# set the path to the asap dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a performance, here we use Bach Prelude BWV 848 in C#
performance_path = Path("Bach/Prelude/bwv_848/Denisova06M.mid")
print("Loading midi file: ", asap_basepath/performance_path)
# load the performance
performance = pt.load_performance_midi(asap_basepath/performance_path)
# extract the note array
note_array = performance.note_array()
# print the dtype of the note array (helpful to know how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program is as follows:
Numpy dtype:
[('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')]
First 10 notes:
[(1.0286459, 0.21354167, 790, 164, 49, 53, 0, 0, 'n0')
(1.03125 , 0.09765625, 792, 75, 77, 69, 0, 0, 'n1')
(1.1302084, 0.046875 , 868, 36, 73, 64, 0, 0, 'n2')
(1.21875 , 0.07942709, 936, 61, 68, 66, 0, 0, 'n3')
(1.3541666, 0.04166667, 1040, 32, 73, 34, 0, 0, 'n4')
(1.4361979, 0.0390625 , 1103, 30, 61, 62, 0, 0, 'n5')
(1.4361979, 0.04296875, 1103, 33, 77, 48, 0, 0, 'n6')
(1.5143229, 0.07421875, 1163, 57, 73, 69, 0, 0, 'n7')
(1.6380209, 0.06380209, 1258, 49, 78, 75, 0, 0, 'n8')
(1.6393229, 0.21484375, 1259, 165, 51, 54, 0, 0, 'n9')]
We can see that we have the onset and duration in seconds, the pitch, and the velocity of each note. The other fields are not really relevant for MIDI performances.
Onsets and durations are also expressed in ticks. This is closer to the way this information is actually encoded in MIDI files: a very short time length (= 1 tick) is chosen, and all time information is encoded as multiples of this length. When dealing with music performances you can usually ignore the ticks and use the values in seconds directly.
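For the curious, the relation between ticks and seconds can be sketched as follows. A MIDI file declares how many ticks make up a quarter note and how long a quarter note lasts (the tempo, in microseconds). The values used below, 384 ticks per quarter note and the default MIDI tempo of 500,000 microseconds per quarter note, are assumptions: they are not stated in the text, but they reproduce the tick and second values printed above (any pair with the same ratio would do so too). The sketch also assumes a constant tempo.

```python
# Sketch: converting MIDI ticks to seconds under a constant tempo.
# ASSUMED values (not given in the article): 384 ticks per quarter note and
# 500,000 microseconds per quarter note (the MIDI default, i.e. 120 BPM).

def ticks_to_seconds(ticks: int,
                     ticks_per_quarter: int,
                     tempo_us_per_quarter: int = 500_000) -> float:
    """Duration of `ticks` in seconds, assuming the tempo never changes."""
    seconds_per_tick = tempo_us_per_quarter / (ticks_per_quarter * 1_000_000)
    return ticks * seconds_per_tick


# First note of the printed output: onset_tick=790, duration_tick=164
onset_sec = ticks_to_seconds(790, ticks_per_quarter=384)
duration_sec = ticks_to_seconds(164, ticks_per_quarter=384)
print(round(onset_sec, 4))     # 1.0286
print(round(duration_sec, 4))  # 0.2135
```

Files with tempo changes need a piecewise version of this computation, which is one reason to let a library such as Partitura do the conversion.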
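Looking at the output, you can see that notes almost never have exactly the same onset or the same duration: among the ten notes printed here, only n5 and n6 share an onset, and no two durations coincide.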
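This kind of check is easy to script. The sketch below is a minimal illustration that looks for repeated onsets and durations among the ten notes printed above. The tick values are copied from that output (ticks are integers, so they compare exactly), and the helper name `shared_values` is made up for this example. On this excerpt it reports a single coincident onset (tick 1103, notes n5 and n6) and no repeated durations.

```python
from collections import Counter

# (onset_tick, duration_tick, id) copied from the printed note array above
printed_notes = [
    (790, 164, "n0"), (792, 75, "n1"), (868, 36, "n2"), (936, 61, "n3"),
    (1040, 32, "n4"), (1103, 30, "n5"), (1103, 33, "n6"), (1163, 57, "n7"),
    (1258, 49, "n8"), (1259, 165, "n9"),
]


def shared_values(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


shared_onsets = shared_values(onset for onset, _, _ in printed_notes)
shared_durations = shared_values(duration for _, duration, _ in printed_notes)
print("Onsets shared by several notes:", shared_onsets)        # [1103]
print("Durations shared by several notes:", shared_durations)  # []
```

With Partitura, the same idea could be applied to the `onset_tick` and `duration_tick` fields of the full note array.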
MIDI scores, on the other hand, use additional MIDI messages to encode information such as time signatures, key signatures, and measure and beat positions.
For this reason, they are similar to sheet music (musical scores), although they still miss important information such as pitch spelling, ties, dots, rests, beams, etc.
Time information is not encoded in seconds, but in more musically abstract units such as quarter notes.
Key characteristics of MIDI scores
The main characteristic of a MIDI score is that all note onsets are aligned to a quantized grid, which is defined first by the bar positions and then by recursive integer divisions (mainly divisions by 2 and 3, but also other divisions such as 5, 7, 11, etc., which are used for tuplets).
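The idea of a grid built by recursive integer divisions can be sketched in a few lines. The function below, whose name `subdivide` is made up for this example, splits a time span into equal parts and then splits each part again, using exact fractions so that divisions by 3 do not introduce rounding errors. Positions are expressed in quarter notes.

```python
from fractions import Fraction
from typing import List, Sequence


def subdivide(start: Fraction, length: Fraction,
              divisions: Sequence[int]) -> List[Fraction]:
    """Grid positions obtained by recursively dividing [start, start + length).

    `divisions` lists the integer division applied at each level,
    e.g. [2, 2] halves the span and then halves each half.
    """
    if not divisions:
        return [start]
    parts, remaining = divisions[0], divisions[1:]
    step = length / parts
    positions = []
    for i in range(parts):
        positions.extend(subdivide(start + i * step, step, remaining))
    return positions


# One quarter note divided by 2, then by 2 again: a sixteenth-note grid
sixteenths = subdivide(Fraction(0), Fraction(1), [2, 2])
# One quarter note divided by 3: an eighth-note triplet grid
triplets = subdivide(Fraction(0), Fraction(1), [3])

print([str(p) for p in sixteenths])  # ['0', '1/4', '1/2', '3/4']
print([str(p) for p in triplets])    # ['0', '1/3', '2/3']
```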
Examples in practice
Now let’s take a look at the score of the performance we just loaded, Bach’s Prelude BWV 848 in C#. Partitura has a dedicated load_score_midi function.
from pathlib import Path

import partitura as pt

# set the path to the asap dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a score, here we use Bach Prelude BWV 848 in C#
score_path = Path("Bach/Prelude/bwv_848/midi_score.mid")
print("Loading midi file: ", asap_basepath/score_path)
# load the score
score = pt.load_score_midi(asap_basepath/score_path)
# extract the note array
note_array = score.note_array()
# print the dtype of the note array (helpful to know how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program is as follows:
Numpy dtype:
[('onset_beat', '<f4'), ('duration_beat', '<f4'), ('onset_quarter', '<f4'), ('duration_quarter', '<f4'), ('onset_div', '<i4'), ('duration_div', '<i4'), ('pitch', '<i4'), ('voice', '<i4'), ('id', '<U256'), ('divs_pq', '<i4')]
First 10 notes:
[(0. , 1.9958333 , 0. , 0.99791664, 0, 479, 49, 1, 'P01_n425', 480)
(0. , 0.49583334, 0. , 0.24791667, 0, 119, 77, 1, 'P00_n0', 480)
(0.5, 0.49583334, 0.25, 0.24791667, 120, 119, 73, 1, 'P00_n1', 480)
(1. , 0.49583334, 0.5 , 0.24791667, 240, 119, 68, 1, 'P00_n2', 480)
(1.5, 0.49583334, 0.75, 0.24791667, 360, 119, 73, 1, 'P00_n3', 480)
(2. , 0.99583334, 1. , 0.49791667, 480, 239, 61, 1, 'P01_n426', 480)
(2. , 0.49583334, 1. , 0.24791667, 480, 119, 77, 1, 'P00_n4', 480)
(2.5, 0.49583334, 1.25, 0.24791667, 600, 119, 73, 1, 'P00_n5', 480)
(3. , 1.9958333 , 1.5 , 0.99791664, 720, 479, 51, 1, 'P01_n427', 480)
(3. , 0.49583334, 1.5 , 0.24791667, 720, 119, 78, 1, 'P00_n6', 480)]
You can see that all note onsets fall exactly on the grid. Looking at onset_quarter (third column), we can see that, as expected, the sixteenth notes start every 0.25 quarter notes.
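The same observation can be checked programmatically. The sketch below uses the `onset_div` and `divs_pq` values copied from the printed output (480 divisions per quarter note, so a sixteenth note spans 120 divisions) and tests whether every onset is a multiple of a sixteenth note. Choosing the sixteenth note as the grid unit is an assumption that fits this excerpt; other pieces may need a finer grid or tuplet divisions.

```python
# onset_div values copied from the printed score note array above
onsets_div = [0, 0, 120, 240, 360, 480, 480, 600, 720, 720]
DIVS_PER_QUARTER = 480                 # the divs_pq column
SIXTEENTH_DIV = DIVS_PER_QUARTER // 4  # 120 divisions per sixteenth note

# An onset is on the grid if it is an exact multiple of the grid unit
off_grid = [onset for onset in onsets_div if onset % SIXTEENTH_DIV != 0]
print("Onsets that fall off the sixteenth-note grid:", off_grid)  # []
```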
Durations are a bit more problematic. For example, in this score a sixteenth note should have a duration_quarter of 0.25. However, the Python output shows that the duration is actually 0.24791667. What happened is that MuseScore, which was used to generate this MIDI file, made every note slightly shorter. Why? To make the audio rendition of the MIDI file sound a little better. While that certainly works, it causes many problems for people who use these files for computational music research. Similar problems exist in other widely used datasets such as the Lakh MIDI Dataset.
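One possible workaround, shown here only as a sketch and not as the approach used by the dataset authors, is to snap each duration to the nearest multiple of a chosen grid unit. The example uses the `duration_div` values from the printed output and assumes a sixteenth-note grid (120 divisions at 480 divisions per quarter note). This assumption is fine for this excerpt but would distort tuplets or shorter notes, so the grid has to be chosen per piece. The function name `snap_duration` is made up for this example.

```python
def snap_duration(duration_div: int, grid_div: int) -> int:
    """Round a duration to the nearest multiple of grid_div (at least one unit)."""
    snapped = ((duration_div + grid_div // 2) // grid_div) * grid_div
    return max(snapped, grid_div)


DIVS_PER_QUARTER = 480
SIXTEENTH_DIV = DIVS_PER_QUARTER // 4  # assumed grid unit for this excerpt

# duration_div values copied from the printed score note array above
durations_div = [479, 119, 119, 119, 119, 239, 119, 119, 479, 119]
snapped = [snap_duration(d, SIXTEENTH_DIV) for d in durations_div]
print(snapped)
# [480, 120, 120, 120, 120, 240, 120, 120, 480, 120]

# Back in quarter notes, the sixteenth notes now last exactly 0.25
print(snapped[1] / DIVS_PER_QUARTER)  # 0.25
```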
Given the differences between MIDI scores and MIDI performances that we have seen, here are some general guidelines to help you set up your deep learning system correctly:
- Prefer MIDI scores for music generation systems, since quantized note positions can be represented with a very small vocabulary and other simplifications become possible, such as considering only monophonic melodies.
- Use MIDI performances for systems that target the way humans play and perceive music, such as beat tracking, tempo estimation, and emotion recognition systems (with an emphasis on expressive performance).
- Use both kinds of data for tasks such as score following (input: performance, output: score) and expressive performance generation (input: score, output: performance).
Additional issues
I have presented the main differences between MIDI scores and MIDI performances. However, as is often the case, things can be more complicated.
For example, some datasets, such as the AMAPS dataset, were originally MIDI scores, but the authors introduced timing changes at every note to simulate the time deviations of a real human player (note that this only happens between notes at different time positions; all notes in a chord are still exactly simultaneous).
Moreover, some MIDI exports, such as those from MuseScore, try to make the MIDI score more similar to a MIDI performance by changing the tempo indication when the tempo of the piece changes, by inserting very short silences between consecutive notes (as we saw in the previous example), or by playing grace notes as very short notes just before the onset of the reference note.
Indeed, grace notes are a very tricky problem in MIDI scores: musical notation does not specify a duration for a grace note, we only know that it should generally be “short”. And while in the score the onset of a grace note is the same as that of the reference note, this would sound very strange in an audio rendition of the MIDI file. Should the previous note be shortened to make room for the grace note, or the following one?
Other ornaments are also problematic because there are no unique rules on how to play them, for example how many notes a trill should contain, or whether a mordent should start on the written note or on the note above.
MIDI files are advantageous because they explicitly provide information about the pitch, onset, and duration of every note. This means that models targeting MIDI data can be smaller and can be trained on smaller datasets than, for example, models working on audio files.
This comes at a cost: MIDI files, and symbolically encoded music in general, are complex formats to work with, because they encode so many different kinds of information in so many different ways.
To use MIDI data properly as training data, it is important to be aware of the kind of data that is encoded. We hope this article has provided a good starting point for learning more about this topic.
[All figures are from the author.]

