IBT — MATLAB Implementation on SP Cup 2017 Database
A real-time tempo induction and beat tracking system implemented in MATLAB.
Based on: “IBT: A Real-Time Tempo and Beat Tracking System” — ISMIR 2010.
Overview
IBT is an agent-based tempo induction and beat tracking system that processes audio causally in real time. It extends the competing-agents strategy of BeatRoot to support continuous, frame-based spectral flux input — making it suitable for live audio, MIR pipelines, and robotic platforms.
In the original paper, the algorithm is evaluated on the ISMIR 2004 Tempo Induction Contest dataset (3199 instances) and a 1360-piece beat tracking benchmark, achieving results competitive with state-of-the-art systems.
Repository Structure
ibt/
├── beatVer1.m # Main script: full IBT pipeline (pre-tracking + beat tracking)
├── specflux.m # Spectral flux onset detection function (STFT-based)
├── peakpick.m # Adaptive peak-picking on onset detection function
├── comp_dgt_fb.m # Filter bank DGT (Gabor transform) — borrowed from LTFAT
├── winfuns.m # Window function generator (Hamming, Hann, Blackman, etc.)
└── open_025.wav # Example audio file for running beatVer1.m
Requirements
- MATLAB R2016b or later (uses designfilt, audioread, findpeaks)
- Signal Processing Toolbox
- No additional toolboxes required (LTFAT helper comp_dgt_fb is bundled)
Quick Start
- Place all .m files and your .wav file in the same directory.
- Update the filename in beatVer1.m if needed: [y, fs] = audioread('open_025.wav');
- Run the script: >> beatVer1
- The script outputs:
  - A plot of the autocorrelation periodicity function with detected tempo peaks
  - beat_final: a vector of estimated beat positions in milliseconds
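Once the script has run, beat_final can be turned into a tempo estimate. The snippet below is a minimal sketch of that post-processing step; the beat_final values are a hypothetical stand-in so the snippet runs on its own, and the variable names ibi_ms, tempo_bpm, and beat_sec are illustrative and not part of the repository.

```matlab
% Hypothetical stand-in for the beat_final vector produced by beatVer1 (ms).
beat_final = [500 1000 1500 2000 2500 3000];

ibi_ms    = diff(beat_final);          % inter-beat intervals in ms
tempo_bpm = 60000 / median(ibi_ms);    % median is robust to a few missed beats
beat_sec  = beat_final / 1000;         % beat times in seconds

fprintf('Estimated tempo: %.1f BPM over %d beats\n', tempo_bpm, numel(beat_final));
```

In a real session, drop the first assignment and use the beat_final left in the workspace by beatVer1.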
Algorithm Pipeline
Audio WAV file
│
▼
┌─────────────────────────────────────┐
│ 1. AUDIO FEATURE EXTRACTION │
│ specflux.m │
│ • Hamming window, 1024 samples │
│ • Hop size: 512 samples │
│ • Spectral flux (half-wave rect)│
│ • Zero-phase Butterworth LP │
│ filter (2nd order, IIR) │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 2. PRE-TRACKING (5s window) │
│ │
│ a) Period Hypotheses Induction │
│ • Autocorrelation of SF │
│ • Peak-pick in [50, 250] BPM │
│ (240–1200 ms at 44100 Hz) │
│ • Threshold: δ × rms(A(τ)) │
│ │
│ b) Phase Hypotheses Selection │
│ • 30 uniformly spaced phases │
│ • Beat train template matching │
│ • Best phase per period: φᵢ │
│ │
│ c) Agent Scoring & Ranking │
│ • Raw score: beat–onset error │
│ • Relational score: rewards │
│ integer metrical ratios │
│ • Final score Sᵢ = Sᵢrel / max │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 3. BEAT TRACKING (causal) │
│ beatVer1.m │
│ │
│ Multi-agent system (max 30 agents) │
│ For each onset event: │
│ • Best agent emits beat to output │
│ • Inner window (±46.4 ms): │
│ period & phase correction │
│ P ← P + 0.25·err │
│ φ ← φ + P + 0.25·err │
│ • Outer window (−0.2P, +0.4P): │
│ spawn 3 child agents (C1, C2, C3)│
│ – C1: phase correction only │
│ – C2: period + phase correction │
│ – C3: half-correction (both) │
│ • Agent culling: redundancy, │
│ score gap > 80%, out-of-range │
└──────────────┬──────────────────────┘
│
▼
beat_final (ms)
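Step 2a of the pipeline (period hypotheses induction) can be illustrated with a short, self-contained sketch. It is not the code from beatVer1.m: the synthetic onset function, the plain-loop autocorrelation, the simple local-maximum test, and the value delta = 0.75 are all assumptions made for illustration. Only the [240, 1200] ms range, the 5 s window, and the delta × rms(A) threshold form come from the description above.

```matlab
Fs = 44100; hop_size = 512;               % assumed sample rate and hop size
frame_ms = 1000 * hop_size / Fs;          % duration of one SF frame in ms
n_frames = round(5 * Fs / hop_size);      % 5 s induction window

% Synthetic onset function: one impulse every 43 frames (about 499 ms).
SF = zeros(1, n_frames);
SF(1:43:end) = 1;

% Autocorrelation A(tau), evaluated only for lags inside [240, 1200] ms.
lags = ceil(240 / frame_ms):floor(1200 / frame_ms);
A = zeros(size(lags));
for k = 1:numel(lags)
    tau = lags(k);
    A(k) = sum(SF(1:end-tau) .* SF(1+tau:end));
end

% Keep local maxima above delta * rms(A); delta is an assumed value.
delta = 0.75;
thr = delta * sqrt(mean(A .^ 2));
is_peak = false(size(A));
for k = 2:numel(A)-1
    is_peak(k) = A(k) > A(k-1) && A(k) >= A(k+1) && A(k) > thr;
end

period_ms = lags(is_peak) * frame_ms;     % period hypotheses in ms
tempo_bpm = 60000 ./ period_ms;           % the same hypotheses in BPM
```

Each surviving lag becomes one period hypothesis; with this synthetic input the hypotheses fall on the impulse spacing and its double, which is the kind of metrically related pair the relational score later rewards.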
Key Parameters
All parameters are set at the top of beatVer1.m and are easy to modify:
| Parameter | Value | Description |
|---|---|---|
| hop_size | 512 | STFT hop size in samples |
| Plow | 240 ms | Minimum beat period (= 250 BPM) |
| Phigh | 1200 ms | Maximum beat period (= 50 BPM) |
| ind_win | 5 × Fs / hop_size frames | Induction window length |
| Tin | 46.4 ms | Inner tolerance window (±) |
| Tl_out | 0.2 × P | Outer tolerance, left margin |
| Tr_out | 0.4 × P | Outer tolerance, right margin |
| phase_hyp | 30 | Number of phase hypotheses |
| child_score factor | 0.9 (non-best), 0.8 (best) | Score inherited by child agents |
| Max agents | 30 | Agent pool size cap |
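Several of these values are given in milliseconds while the onset function is indexed in frames, so it can help to see the conversion spelled out. The sketch below assumes Fs = 44100 Hz; the *_frames variable names are illustrative and may not match the names used in beatVer1.m.

```matlab
Fs = 44100;                          % assumed sample rate (Hz)
hop_size = 512;                      % STFT hop size in samples
frame_ms = 1000 * hop_size / Fs;     % about 11.61 ms per frame

Plow_frames  = 240  / frame_ms;      % shortest period, about 20.7 frames
Phigh_frames = 1200 / frame_ms;      % longest period, about 103.4 frames
Tin_frames   = 46.4 / frame_ms;      % inner tolerance, about 4 frames
ind_win      = 5 * Fs / hop_size;    % 5 s induction window, about 430.7 frames

bpm_range = 60000 ./ [1200 240];     % period limits expressed in BPM
```

Under this assumption the inner tolerance of 46.4 ms corresponds to roughly four hops, so changing hop_size or the sample rate changes the frame resolution of every millisecond parameter.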
File Reference
beatVer1.m
Main script. Runs the full IBT pipeline end-to-end:
- Loads audio, computes spectral flux, applies LP filter
- Detects onsets via peakpick
- Initialises beat agents from period/phase hypotheses
- Runs the causal multi-agent beat tracking loop
- Outputs beat_final (beat positions in ms)
specflux.m
[SF, V0] = specflux(f, win_length, tgap)
Computes the spectral flux onset detection function of signal f using a Hamming-windowed Gabor transform. The magnitude difference between consecutive frames is half-wave rectified, so only increases in spectral energy contribute.
| Argument / output | Description |
|---|---|
| f | Input audio signal (column vector) |
| win_length | STFT window length in samples (e.g. 1024) |
| tgap | Hop size in samples (e.g. 512) |
| SF | Output: spectral flux vector (1 × N frames) |
| V0 | Output: raw STFT coefficients |
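To make the computation concrete, here is a minimal sketch of half-wave-rectified spectral flux that uses plain fft frames in place of comp_dgt_fb. It is an illustration of the technique and not the body of specflux.m: the synthetic test signal, the frame-centre timing convention, and the summation over frequency bins are assumptions.

```matlab
Fs = 44100; win_length = 1024; tgap = 512;

% Synthetic test signal: silence, then a 440 Hz tone starting at 0.5 s.
t = (0:Fs-1)' / Fs;
f = sin(2*pi*440*t) .* (t >= 0.5);

% Hamming window written out explicitly (no toolbox call needed).
w = 0.54 - 0.46 * cos(2*pi*(0:win_length-1)' / (win_length-1));
n_frames = floor((numel(f) - win_length) / tgap) + 1;

SF = zeros(1, n_frames);
prev_mag = zeros(win_length/2 + 1, 1);
for k = 1:n_frames
    idx = (k-1)*tgap + (1:win_length)';   % sample indices of frame k
    X   = fft(f(idx) .* w);
    mag = abs(X(1:win_length/2 + 1));     % non-negative frequencies only
    SF(k) = sum(max(mag - prev_mag, 0));  % half-wave rectify, sum over bins
    prev_mag = mag;
end

[~, k_peak] = max(SF);
onset_ms = 1000 * ((k_peak-1)*tgap + win_length/2) / Fs;  % frame-centre time
```

The flux stays at zero during the silence and peaks in the frames where the tone enters the analysis window, which is the behaviour the onset detector relies on.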
peakpick.m
peaks = peakpick(SF, thre, range, multi)
Picks significant local maxima from the onset detection function. A peak is accepted only if it exceeds the local mean over an asymmetric neighbourhood [−multi×range, +range] by more than thre.
| Arg | Description |
|---|---|
| SF | Onset detection function |
| thre | Threshold above local mean (e.g. 0.35) |
| range | Neighbourhood radius (e.g. 3) |
| multi | Asymmetric extension factor (e.g. 3) |
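The acceptance rule can be sketched in a few lines. This is one plausible reading of the description and not the contents of peakpick.m: in particular, testing the local maximum over ±range frames and clipping the windows at the signal boundaries are assumptions, and the SF values are made up for the example.

```matlab
SF = [0 0.1 0 0.2 1.0 0.1 0 0 0.1 0.3 0.2 0.9 0.1 0 0.05 0];  % toy onset function
thre = 0.35; range = 3; multi = 3;

N = numel(SF);
peaks = [];
for n = 1:N
    lo_max  = max(1, n - range);          % window for the local-maximum test
    lo_mean = max(1, n - multi*range);    % longer look-back for the local mean
    hi      = min(N, n + range);
    is_max   = SF(n) >= max(SF(lo_max:hi));
    above_mu = SF(n) > mean(SF(lo_mean:hi)) + thre;
    if is_max && above_mu
        peaks(end+1) = n; %#ok<AGROW>     % store the frame index of the peak
    end
end
```

The longer look-back on the left makes the local mean depend mostly on past frames, which suits a causal tracker that cannot wait for much future context.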
comp_dgt_fb.m
coef = comp_dgt_fb(f, g, a, M)
Computes the Discrete Gabor Transform using a filter bank approach. Borrowed from the LTFAT toolbox by Peter L. Søndergaard (GPL). Handles boundary conditions periodically for both leading and trailing edges.
winfuns.m
g = winfuns(name, N)
g = winfuns(name, N, L)
g = winfuns(name, x)
Generates standard window functions by name. Used internally by specflux to produce the Hamming window for the STFT.
Supported windows: hann, hamming, blackman, blackharr, nuttall, gauss, cos, rec, tri, and many Nuttall variants (nuttall10 through nuttall03).
Agent Score Functions
Raw score (within induction window)
For each beat prediction bp vs. local spectral flux maximum m:
If m ∈ Tᵢₙ: Δs = (1 − |err|/Tr_out) · (Pᵢ/Pmax) · SF(m)
If m ∈ Tₒᵤₜ: Δs = −(|err|/Tr_out) · (Pᵢ/Pmax) · SF(m)
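The two cases can be written as one small function handle. The sketch below assumes err = m − bp in milliseconds and that m has already been found inside the outer window; the handle names ds_inner, ds_outer, and delta_s and the example numbers are hypothetical.

```matlab
P = 500; Pmax = 1200;      % agent period and maximum period (ms)
Tin = 46.4;                % inner tolerance (ms)
Tr_out = 0.4 * P;          % right outer tolerance (ms)

w = P / Pmax;              % period weighting shared by both cases
ds_inner = @(err, sf)  (1 - abs(err) / Tr_out) * w * sf;   % m inside Tin
ds_outer = @(err, sf) -(abs(err) / Tr_out)     * w * sf;   % m inside Tout
delta_s  = @(err, sf) (abs(err) <= Tin) * ds_inner(err, sf) + ...
                      (abs(err) >  Tin) * ds_outer(err, sf);

s_hit  = delta_s(10, 0.8);    % small error: positive increment
s_miss = delta_s(150, 0.8);   % large error: negative increment
```

A prediction close to a strong onset raises the agent's score, while a prediction that lands far from the nearest flux maximum is penalised in proportion to the error.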
Relational score (agent ranking)
Sᵢrel = 10 · Sᵢraw + Σⱼ≠ᵢ r(nᵢⱼ) · Sⱼraw
⎧ 6 − n, 1 ≤ n ≤ 4
r(n) = ⎨ 1, 5 ≤ n ≤ 8
⎩ 0, otherwise
where nᵢⱼ is the nearest integer ratio between periods Pᵢ and Pⱼ.
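A small worked example may help here. The periods and raw scores below are hypothetical, and computing nᵢⱼ as the rounded ratio of the longer period to the shorter one is an assumption about how "nearest integer ratio" is implemented. The final division by the maximum follows the normalisation shown in the pipeline diagram.

```matlab
P     = [500 1000 333];     % hypothetical agent periods (ms)
S_raw = [4.0 3.0 1.0];      % hypothetical raw scores

% r(n) = 6 - n for 1 <= n <= 4, 1 for 5 <= n <= 8, and 0 otherwise.
r = @(n) (n >= 1 & n <= 4) .* (6 - n) + (n >= 5 & n <= 8) .* 1;

K = numel(P);
S_rel = zeros(1, K);
for i = 1:K
    S_rel(i) = 10 * S_raw(i);
    for j = [1:i-1, i+1:K]
        n_ij = round(max(P(i), P(j)) / min(P(i), P(j)));  % assumed definition
        S_rel(i) = S_rel(i) + r(n_ij) * S_raw(j);
    end
end

S = S_rel / max(S_rel);                 % normalised final scores
[~, ranking] = sort(S, 'descend');      % agent indices, best first
```

Here the 500 ms agent gains support from both the 1000 ms and the 333 ms agents, since each is close to an integer multiple or fraction of its period.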
Child agent generation (outer window)
| Child | Period | Phase |
|---|---|---|
| C1 | P | φ + P + err (phase-only fix) |
| C2 | P + err | φ + (P+err) + err (period + phase fix) |
| C3 | P + 0.5·err | φ + (P+0.5·err) + 0.5·err (half-fix) |
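The table, together with the inner-window update from the pipeline, can be combined into one branch. The sketch below assumes err is the onset time minus the predicted beat time in milliseconds and that φ is the time of the agent's last beat; the example values and the children matrix layout are illustrative only.

```matlab
P   = 500;     % parent period (ms)
phi = 2000;    % parent phase: time of the last beat (ms)
err = 60;      % onset time minus predicted beat time (ms), sign is assumed
Tin = 46.4;    % inner tolerance (ms)

if abs(err) <= Tin
    % Inner window: correct the agent itself, no children.
    P_new   = P + 0.25 * err;
    phi_new = phi + P + 0.25 * err;
    children = [];
elseif err > -0.2 * P && err < 0.4 * P
    % Outer window: rows are C1, C2, C3; columns are [period, next phase].
    children = [P,             phi + P + err;
                P + err,       phi + (P + err) + err;
                P + 0.5 * err, phi + (P + 0.5 * err) + 0.5 * err];
else
    children = [];   % onset outside both windows: ignored in this sketch
end
```

With err = 60 ms the onset falls outside the inner window but inside the outer one, so three children are spawned and the parent continues unchanged alongside them.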