Noise floor & ACX compliance, before you submit.
Drop your audio file here
WAV · MP3 · M4A · FLAC · AIFF, up to 600MB
Use the last N seconds as a room tone reference
Best if your session ends with silence or room tone after mastering. Adjust the sample window slider to match how much tail you typically leave.
→ Best for: narrators who leave room tone at the chapter tail
3.0s
How many seconds to borrow from the start or end of your file.
Drop your room tone file here, or click to browse
3.0×
500ms
Scan Results
Duration
Baseline noise floor
Issues found
Threshold used
ACX compliance
RMS loudness
Target: -23 to -18 dB
Peak level
Target: -3 dBFS max
Noise floor
Target: -60 dB or lower
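
As a rough illustration, the three targets in this panel can be expressed as a simple pass/fail check. This is a minimal sketch, not the tool's actual implementation: `check_targets` and its argument names are hypothetical, and the measurements are assumed to be already computed in decibels.

```python
# Targets as listed in the ACX compliance panel above.
RMS_RANGE_DB = (-23.0, -18.0)   # RMS loudness window
PEAK_MAX_DBFS = -3.0            # peak level ceiling
NOISE_FLOOR_MAX_DB = -60.0      # noise floor ceiling


def check_targets(rms_db: float, peak_dbfs: float, noise_floor_db: float) -> dict:
    """Return a pass/fail flag per target (hypothetical helper)."""
    return {
        "rms": RMS_RANGE_DB[0] <= rms_db <= RMS_RANGE_DB[1],
        "peak": peak_dbfs <= PEAK_MAX_DBFS,
        "noise_floor": noise_floor_db <= NOISE_FLOOR_MAX_DB,
    }


# Example: a chapter measured at -20 dB RMS, -3.5 dBFS peak, -62 dB noise floor.
report = check_targets(-20.0, -3.5, -62.0)
print(report)
```

A file passes only when all three flags are true; a single failing value is enough to warrant another mastering pass.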
Noise floor timeline — flagged sections in red · baseline in amber
Flagged sections
How Audiobook Spec Check works
Audiobook Spec Check listens for sections in your mastered audio where the background noise is louder than it should be. This usually happens when normalization amplifies a blank gap that has no real room tone behind it — a cut that didn't get filled, a silence that wasn't attached to anything. The listener hears it as a burst of static or hiss between sentences. Spec Check finds it before your client or platform does.
1. Drop your mastered file

Upload the final mastered version of your chapter — the file you'd submit to ACX, Findaway, or your client. WAV, MP3, M4A, FLAC, and AIFF are all supported.

Non-WAV files are converted internally before scanning. Your original file is never modified.

2. Choose your baseline mode

Spec Check needs a sample of what your room tone should sound like: a few seconds of quiet background from the same recording session, after mastering.

Pick the mode that matches your workflow. See the baseline guide below for help choosing.

The baseline must be from the same mastering chain as your content. A room tone sample processed differently than your chapter will give inaccurate results.
3. Set your threshold and duration

Threshold (3.0×) — how much louder than baseline a section must be to get flagged. At 3.0×, a section needs to be 3 times louder than your normal room tone. Lower = more sensitive.

Min duration (500ms) — shortest anomaly worth reporting. Prevents brief breath sounds or transients from cluttering the results.
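
The way threshold and min duration interact can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: `flag_sections`, the fixed analysis window (`window_ms`), and the `is_speech` mask are all hypothetical, since the page does not describe how speech is separated from non-speech. Severity is taken here as the loudest window in a run divided by the baseline, which is also an assumption.

```python
def flag_sections(window_rms, is_speech, baseline_rms,
                  window_ms=50, threshold=3.0, min_duration_ms=500):
    """Return (start_ms, end_ms, severity) for loud non-speech runs.

    window_rms   -- RMS level of each consecutive analysis window (linear)
    is_speech    -- one bool per window; True windows are never flagged
    baseline_rms -- RMS level of the room tone reference (linear)
    """
    flagged = []
    run_start = None
    run_peak = 0.0
    limit = threshold * baseline_rms

    # A trailing sentinel window closes any run that reaches the end of the file.
    windows = list(zip(window_rms, is_speech)) + [(0.0, True)]
    for i, (rms, speech) in enumerate(windows):
        loud = (not speech) and rms >= limit
        if loud:
            if run_start is None:
                run_start, run_peak = i, rms
            else:
                run_peak = max(run_peak, rms)
        elif run_start is not None:
            duration_ms = (i - run_start) * window_ms
            # Runs shorter than the minimum duration are dropped.
            if duration_ms >= min_duration_ms:
                flagged.append((run_start * window_ms, i * window_ms,
                                run_peak / baseline_rms))
            run_start = None
    return flagged


# Example: 100 ms windows, a 600 ms noisy gap and a 200 ms blip.
levels = [0.001] * 5 + [0.005] * 6 + [0.001] * 3 + [0.02] * 2 + [0.001]
print(flag_sections(levels, [False] * len(levels), 0.001, window_ms=100))
```

In this example only the 600 ms gap is reported. The 200 ms blip is far louder, but it falls under the 500 ms minimum, which is the behavior described for brief breaths and transients.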

4. Read the results

The timeline shows your file's noise floor at a glance. Grey bars are quiet non-speech sections. Red sections are anomalies. The indigo dashed line is your baseline.

The table lists each flagged section with its timestamp, duration, dB level, and severity. Take those timestamps into your DAW, listen, and fix what's real.

Severity 10× or higher is audible static — fix it. Under 4× at a 3.0 threshold is borderline — use your ears before acting.
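
The rule of thumb above can be written as a small triage helper. This is a sketch only: `triage` is a hypothetical name, the 4× borderline cutoff is stated in the guide only for a 3.0 threshold, and the label for the band between 4× and 10× is an assumption because the guide does not name it.

```python
def triage(severity: float, threshold: float = 3.0) -> str:
    """Map a severity ratio to a suggested action (hypothetical helper)."""
    if severity < threshold:
        return "not flagged"
    if severity >= 10.0:
        return "fix"          # audible static per the guide
    if severity < 4.0:
        return "borderline"   # use your ears before acting
    return "review"           # assumption: the middle band is not named in the guide


for s in (3.5, 6.0, 12.0):
    print(s, triage(s))
```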
Which baseline mode should I use?
End of file
Use the tail

Borrows the last N seconds as its reference. Adjust the sample window slider to match how much room tone you typically leave at the end.

→ Best for: narrators who leave room tone or silence at the chapter tail

Start of file
Use the head

Borrows the first N seconds instead. Use this if you record a few seconds of silence before you start speaking, or if your tail has music or outro content.

→ Best for: narrators who record room tone at the top of the session

Separate file
Upload room tone

A dedicated file from the same session, after mastering. No guesswork — most accurate baseline possible, independent of file length or structure.

→ Best for: critical submissions, production studios, or when head/tail aren't reliable
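
The three modes differ only in where the reference samples come from. A minimal sketch, assuming audio is already decoded to a list of float samples between -1.0 and 1.0; `rms`, `baseline_rms`, and the mode names are hypothetical and not taken from the tool itself.

```python
import math


def rms(samples):
    """Root mean square of a list of linear samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def baseline_rms(samples, sample_rate, mode="tail", seconds=3.0, room_tone=None):
    """Compute the baseline level for one of the three modes (illustrative)."""
    if mode == "file":
        # Separate file: use the whole room tone recording.
        if not room_tone:
            raise ValueError("mode 'file' needs room_tone samples")
        return rms(room_tone)

    n = int(seconds * sample_rate)
    if n <= 0:
        raise ValueError("sample window must be longer than zero")
    if mode == "tail":
        return rms(samples[-n:])   # last N seconds
    if mode == "head":
        return rms(samples[:n])    # first N seconds
    raise ValueError(f"unknown mode: {mode}")


# Example: 1 s of loud content followed by 1 s of quiet tail at 100 Hz.
audio = [0.5] * 100 + [0.01] * 100
print(baseline_rms(audio, 100, mode="tail", seconds=1.0))
print(baseline_rms(audio, 100, mode="head", seconds=1.0))
```

The example shows why the mode matters: pointing the window at a region that contains content rather than room tone produces a baseline that is far too high, which would hide real anomalies.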

Glossary
Noise floor
The level of background sound when no one is speaking — your room's ambient hiss, hum, and air handling. Every recording has one. The problem is when it varies unexpectedly within the same file.
Room tone
A recording of your room's silence. Captured at the start or end of a session, it's used to fill edited gaps so the background sounds consistent throughout the chapter.
Normalization
A mastering step that raises the volume of your whole file to hit a target level (e.g. an RMS level between -23 and -18 dB for ACX). It raises everything uniformly, including any gaps that have no room tone behind them, which become audible as raised hiss or static.
Baseline
The RMS level Spec Check uses as "normal" for your file. All non-speech sections are compared against it. A section at 5× severity means its RMS level is 5 times your baseline.
RMS
Root Mean Square — a measure of average audio level over a short window. More reliable than peak levels for detecting noise, because noise is continuous rather than momentary.
Severity ratio
How many times louder a flagged section is compared to your baseline. 3× is the default threshold. 10× or higher is audible static that will almost certainly fail a platform QC check.
Threshold
The multiplier at which Spec Check flags a section. At 3.0×, a quiet section must be 3 times louder than your baseline to appear in the report. Lower the threshold if you're missing real problems; raise it if you're getting too many false flags.
Min duration
The shortest anomaly worth reporting, in milliseconds. At 500ms, a noise spike shorter than half a second won't appear in the results. Raise this if breath sounds or transients are cluttering your report.
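
Because severity and threshold are ratios of RMS levels while the targets are given in decibels, it can help to convert between the two. A small sketch, assuming the ratios are amplitude (RMS) ratios as the Baseline entry describes; the function names are hypothetical.

```python
import math


def ratio_to_db(ratio: float) -> float:
    """Convert an RMS amplitude ratio to a decibel difference."""
    return 20.0 * math.log10(ratio)


def db_to_ratio(db: float) -> float:
    """Convert a decibel difference back to an RMS amplitude ratio."""
    return 10.0 ** (db / 20.0)


print(round(ratio_to_db(3.0), 2))   # 9.54
print(ratio_to_db(10.0))            # 20.0
```

Under that assumption, a 3.0× threshold flags sections roughly 9.5 dB above the baseline, so a baseline at -60 dB would put the flagging point near -50.5 dB. A 10× severity corresponds to 20 dB above baseline.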