
Two real fraud catches in production

On the same day Module 4 (recording fingerprint via ACRCloud) was deployed, DistroShield caught two operationally significant cases against an indie distributor's incoming catalog. Both are anonymized below; the technical details (scores, match counts, latencies, costs) are real and verifiable on the production API.

Anonymization note. Client identity, uploader identity, and submitted track metadata are abstracted. Technical signals (scores, ISRC structure, latencies, match windows, costs) are real and reproducible on the production API. Reach out for the non-anonymized version under NDA.

Case 1: Cross-distributor identity fraud

Module 4 / Recording Fingerprint via ACRCloud

Situation

A client of an indie distributor submitted a 20.5 MB WAV through the standard upload pipeline with metadata claiming a self-authored track. To DistroShield's per-track AI detector (Module 1), the audio looked human-made (ai_score = 0.0069, confidence = 100%), and indeed the audio is human-made, not AI-generated. To the duplicate-check module (Module 2: string match against Spotify / Deezer / YouTube), the title and artist had no exact-match red flag. Modules 1, 2, and 3 alone would not have escalated.

What it actually was: the same audio recording was already registered on at least three other distributors, under three completely different artist identities. The submitting client was attempting to be the fourth identity that would receive royalties for the same recording.

What DistroShield caught

The newly-deployed Module 4 hashed the first ~5 seconds of audio and queried ACRCloud's catalog (~70M commercial recordings). Three matches at perfect score:

| Score | Match | ISRC issuer code | Released |
|---|---|---|---|
| 100 | Recording A (Artist X) | issuer code Y | 2023 |
| 100 | Recording B (Artist Z) | different issuer | 2022 |
| 100 | Recording C (Artist W) | different issuer | 2022 |

Three different ISRC issuer codes confirm three different distributors. Three different artist names on three different Spotify / Deezer accounts.
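As a rough sketch of the issuer comparison: an ISRC is a 12-character code whose first two characters are a country/agency prefix and whose next three are the registrant code. The snippet below assumes that the "issuer code" in the table corresponds to those first five characters. The function names and the sample ISRCs are hypothetical and invented for illustration; they are not DistroShield's implementation or real catalog data.

```python
import re

# 12 characters: 2-letter prefix, 3-character registrant, 2-digit year, 5-digit designation.
ISRC_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}[0-9]{2}[0-9]{5}$")


def issuer_code(isrc: str) -> str:
    """Return the first five characters of an ISRC (assumed to be the 'issuer code')."""
    normalized = isrc.replace("-", "").upper()
    if not ISRC_PATTERN.match(normalized):
        raise ValueError(f"not a well-formed ISRC: {isrc!r}")
    return normalized[:5]


def distinct_issuers(isrcs: list[str]) -> set[str]:
    """Collect the distinct issuer codes across a list of matched ISRCs."""
    return {issuer_code(isrc) for isrc in isrcs}


# Hypothetical ISRCs, invented for illustration only.
matched = ["US-ABC-23-00001", "GB-XYZ-22-00417", "FR-Q1Z-22-01234"]
print(len(distinct_issuers(matched)))
```

Counting distinct issuer codes rather than distinct full ISRCs is the point of the design: the same registrant can issue many ISRCs, so only a difference in the leading characters suggests a different issuing party.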

DistroShield's response:

recommendation:                      review
review_reason:                       cross_distributor_recording_fraud
recording_fingerprint.matched:       true
recording_fingerprint.highest_score: 100
distinct_artists_at_perfect_score:   3
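
One way the response fields above could be derived from the fingerprint matches, as a minimal sketch. The `Match` type, the `evaluate_matches` function, the `"pass"` value, and the threshold of two or more distinct artists are assumptions made for illustration; the post only shows the resulting fields, and nesting `recording_fingerprint` as a dict is a guess at the payload shape.

```python
from dataclasses import dataclass

PERFECT_SCORE = 100


@dataclass(frozen=True)
class Match:
    score: int
    artist: str
    isrc_issuer: str


def evaluate_matches(matches: list[Match], min_distinct_artists: int = 2) -> dict:
    """Build a response dict from fingerprint matches (hypothetical rule)."""
    perfect = [m for m in matches if m.score == PERFECT_SCORE]
    distinct_artists = {m.artist for m in perfect}
    highest = max((m.score for m in matches), default=0)
    # Assumed rule: several different artists on the same recording is the fraud signal.
    flagged = len(distinct_artists) >= min_distinct_artists
    return {
        "recommendation": "review" if flagged else "pass",
        "review_reason": "cross_distributor_recording_fraud" if flagged else None,
        "recording_fingerprint": {"matched": bool(matches), "highest_score": highest},
        "distinct_artists_at_perfect_score": len(distinct_artists),
    }


# Anonymized placeholders mirroring the three matches reported for Case 1.
result = evaluate_matches([
    Match(100, "Artist X", "issuer Y"),
    Match(100, "Artist Z", "issuer 2"),
    Match(100, "Artist W", "issuer 3"),
])
print(result["recommendation"], result["distinct_artists_at_perfect_score"])
```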

The track was held in the distributor's review queue instead of being delivered to DSPs. The reviewer confirmed via the verdict button that the audio itself is human-made (Module 1 was correct: not AI-generated), and the cross-distributor fraud signal led to the submitting client being identified as the impostor: the uploader, not a victim. The track was blocked from distribution.

Important nuance: the verdict button (human / ai / uncertain / hybrid_confirmed) refers to whether the audio is AI-generated, not to whether the track should ship. A track can be verdict: human (audio is real) AND recommendation: review (held due to recording fraud) simultaneously. They are separate decisions about separate questions.
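
The separation can be pictured as two independent fields on the same record. This is a sketch only: the enum and class names are hypothetical, the verdict values are the four listed above, and the recommendation values are assumed from the ones that appear in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """Reviewer's answer to: is the audio AI-generated?"""
    HUMAN = "human"
    AI = "ai"
    UNCERTAIN = "uncertain"
    HYBRID_CONFIRMED = "hybrid_confirmed"


class Recommendation(Enum):
    """System's answer to: should the track ship? (values assumed)"""
    PASS = "pass"
    REVIEW = "review"


@dataclass
class TrackDecision:
    verdict: Verdict
    recommendation: Recommendation

    def can_ship(self) -> bool:
        # In this sketch, shipping is gated by the recommendation field;
        # the verdict is recorded separately and answers a different question.
        return self.recommendation is Recommendation.PASS


# Case 1: the audio is genuinely human, yet the track is held for recording fraud.
case_1 = TrackDecision(Verdict.HUMAN, Recommendation.REVIEW)
print(case_1.verdict.value, case_1.recommendation.value, case_1.can_ship())
```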

What would have happened without DistroShield

The 4th-identity attempt would have shipped to Spotify / Apple / Deezer / Amazon / YouTube Music. Every play royalty for that recording would split across 4 rights claimants, at least one of whom is the legitimate owner. Eventually one of the legit owners files a DMCA strike against the indie distributor. Possible downstream consequences:

The conservative downside estimate for a single distributor caught delivering one cross-distributor identity fraud incident is $10K to $50K in payout holds, plus uncapped reputational tail.

Cost to catch it

A single ACRCloud query at the post-trial Pay-as-you-go rate (with bucket-fee and 3rd-party-IDs surcharges this project carries): ~$0.0065.
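
To put the per-query price next to the downside estimate quoted earlier, here is a back-of-the-envelope calculation using only figures from this post (~$0.0065 per query, $10K to $50K downside). The names are illustrative, and the result is only as good as those two estimates.

```python
COST_PER_QUERY_USD = 0.0065            # per-query figure quoted above
DOWNSIDE_RANGE_USD = (10_000, 50_000)  # conservative payout-hold estimate from this post


def queries_per_incident(downside_usd: float,
                         cost_per_query: float = COST_PER_QUERY_USD) -> int:
    """How many fingerprint queries cost as much as one incident of this size."""
    return int(downside_usd / cost_per_query)


for downside in DOWNSIDE_RANGE_USD:
    print(f"${downside:,} pays for about {queries_per_incident(downside):,} queries")
```

On these numbers, one avoided incident at the low end of the range covers roughly 1.5 million fingerprint queries.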

One pre-upload API call. Cost: less than a cent. Caught a recording attempting to be registered under a 4th distributor identity. Avoided downside: tens of thousands in payout holds plus reputational damage that doesn't wash out.

Why no competitor catches this


Case 2: Album-coherence false-negative recovery

Module 1 + post-processing

Situation

Same day, same distributor. A client submitted a 5-track EP of bachata-style songs, all generated by the same AI music tool, with the same vocal style, same rhythm, and same arrangement language across all 5 tracks. The detector model (v7c), which runs at ~93% accuracy in production but has a known ~7% false-negative (FN) rate, caught 4 of the 5 tracks as AI:

| Track | AI Score | Classification | Recommendation |
|---|---|---|---|
| 1 | 0.0224 | human | PASS ← false negative |
| 2 | 0.7427 | ai | flagged |
| 3 | 0.7426 | ai | flagged |
| 4 | 0.7418 | ai | flagged |
| 5 | 0.7415 | ai | flagged |

Track 1 is statistically a false negative. Same generator, same artist identity, same EP delivery: the probability that one track of an AI EP is genuinely human is essentially zero.

What the system did

Per-track inference alone would have delivered Track 1 to DSPs while flagging the other 4, fragmenting the release in a way that would itself be a flag to the DSP audit team ("why did 4 of 5 tracks of this EP get pulled?").

The album-coherence post-processing layer cross-checks per-track classifications within a release. The rule: if ≥60% of tracks in a single-artist release are classified ai, force the remaining tracks to pending_review regardless of their per-track score.

In this EP, ratio = 4/5 = 80% > 60%. Track 1 was forced into the review queue with an album_coherence_anomaly flag. The reviewer listened, confirmed it was AI (the vocal section is what gives it away), and emitted a final verdict via the API. All 5 tracks of the EP held.
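
The coherence rule can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: `TrackResult`, `apply_album_coherence`, and the `"pass"` / `"flagged"` status strings are hypothetical names, and the caller is assumed to pass only the tracks of one single-artist release. The 60% threshold, `pending_review`, and `album_coherence_anomaly` come from the text above.

```python
from dataclasses import dataclass, replace
from typing import Optional

AI_PERCENT_THRESHOLD = 60  # rule from the text: at least 60% of tracks classified ai


@dataclass(frozen=True)
class TrackResult:
    track: int
    ai_score: float
    classification: str          # "ai" or "human"
    status: str                  # "flagged", "pass", or "pending_review"
    flag: Optional[str] = None


def apply_album_coherence(tracks: list[TrackResult]) -> list[TrackResult]:
    """Force non-AI tracks to pending_review when most of the release is AI."""
    ai_count = sum(t.classification == "ai" for t in tracks)
    # Integer comparison avoids floating-point edge cases at exactly 60%.
    if not tracks or ai_count * 100 < AI_PERCENT_THRESHOLD * len(tracks):
        return list(tracks)
    return [
        t if t.classification == "ai"
        else replace(t, status="pending_review", flag="album_coherence_anomaly")
        for t in tracks
    ]


# The EP from the table above: 4 of 5 tracks classified ai.
ep = [
    TrackResult(1, 0.0224, "human", "pass"),
    TrackResult(2, 0.7427, "ai", "flagged"),
    TrackResult(3, 0.7426, "ai", "flagged"),
    TrackResult(4, 0.7418, "ai", "flagged"),
    TrackResult(5, 0.7415, "ai", "flagged"),
]
checked = apply_album_coherence(ep)
print(checked[0].status, checked[0].flag)
```

Note that the rule ignores the per-track score of the overridden track entirely: Track 1's 0.0224 plays no part once the release-level ratio crosses the threshold.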

What would have happened without the coherence layer

Track 1 distributes to DSPs as a single. The other 4 stay flagged. From the DSP's perspective, the distributor delivered a fragmented release where 80% was pulled, which is a strong audit signal. Possible outcomes:

The product narrative

The detector is good, not perfect: every ML detector has a per-track FN rate, and at scale a false negative is a mathematical certainty. The system-level architecture compensates: per-track ML + cross-track coherence = layered defense. When the model's per-track decision is wrong, the redundancy catches it.
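
The "certainty at scale" claim can be made concrete with a little arithmetic. Assuming (and this is an assumption, not something the post states) that the ~7% false-negative rate applies independently to each AI-generated track, the chance of at least one miss among n such tracks is 1 - (1 - 0.07)^n. The function name is illustrative.

```python
FN_RATE = 0.07  # approximate per-track false-negative rate quoted for v7c


def prob_at_least_one_miss(n_tracks: int, fn_rate: float = FN_RATE) -> float:
    """P(at least one false negative) among n AI tracks, assuming independence."""
    return 1.0 - (1.0 - fn_rate) ** n_tracks


for n in (1, 5, 50, 500):
    print(f"{n:>4} AI tracks -> {prob_at_least_one_miss(n):.4f}")
```

Under that independence assumption, a 5-track AI EP already has roughly a 30% chance of containing at least one missed track. Tracks from the same generator are probably correlated in practice, so the figure is indicative rather than exact.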

This case also produces gold-quality training data for the next model iteration: an explicit FN with context ("model missed this track, but the other 4 of 5 were AI-confirmed"). Training data sourced from real production verdicts is the substrate the detector improves on.

Even with a 93%-accurate detector, the per-track FN rate is mathematically inevitable at scale. DistroShield's release-level coherence check converts isolated false-negatives back into review queue items. The detector keeps getting smarter; the system stays robust while it learns.

Why no one else catches this

| Layer | Position in pipeline | Catches this? |
|---|---|---|
| Spotify / Apple / Deezer audit | Post-upload | Yes, but after punishing the distributor |
| Audible Magic / Pex | Rights-holder side | Enterprise-priced, not distributor ingest |
| Manual review | Distributor side | Doesn't scale, can't detect cross-distributor identity reuse |
| DistroShield | Pre-upload, distributor-side | ~$0.40/track, four modules |

Postscript: what we learned in the next 24 hours

On the day after Module 4 deployed, a different track from the same distributor was flagged with a single match at score 100, appearing to trigger another fraud signal. On inspection, the match window was only 2.88 seconds out of an audio sample of ~5 seconds. The fingerprint service was matching a generic drum-machine intro shared by two unrelated indie tracks. A re-query 90 minutes later returned a different match at score 100 (different track, same short window), confirming that the algorithm was reporting hash collisions on short generic intros, not real fraud.

Same-day fix: increase audio sample sent to the fingerprint service from 1MB (~5-10s) to 5MB (~30-60s), and add a match_window_ms ≥ 6000 gate for single-match escalation. Multi-artist coincidence (the cross-distributor case above) is independently strong signal and is NOT gated by window length, so the real fraud catch remains intact while generic-intro false positives get suppressed cleanly.
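
The gating logic described in the fix might look like the following sketch. The `FingerprintMatch` type, the `should_escalate` function, and the "two or more distinct artists" definition of multi-artist coincidence are assumptions for illustration; only the 6000 ms threshold and the `match_window_ms` name come from the text. The window lengths in the multi-artist example are placeholders, since the post does not report them for Case 1.

```python
from dataclasses import dataclass

PERFECT_SCORE = 100
MIN_SINGLE_MATCH_WINDOW_MS = 6000  # gate from the fix described above


@dataclass(frozen=True)
class FingerprintMatch:
    score: int
    artist: str
    match_window_ms: int


def should_escalate(matches: list[FingerprintMatch]) -> bool:
    """Decide whether fingerprint matches warrant a fraud review (hypothetical rule)."""
    perfect = [m for m in matches if m.score == PERFECT_SCORE]
    distinct_artists = {m.artist for m in perfect}
    if len(distinct_artists) >= 2:
        # Multi-artist coincidence is a strong signal on its own: no window gate.
        return True
    # Single-artist match: escalate only if the matched window is long enough.
    return any(m.match_window_ms >= MIN_SINGLE_MATCH_WINDOW_MS for m in perfect)


# The false positive from the postscript: one match over a 2.88-second window.
generic_intro = [FingerprintMatch(100, "Unrelated Artist", 2880)]

# Case 1 shape: three artists at perfect score (window lengths are placeholders).
cross_distributor = [
    FingerprintMatch(100, "Artist X", 3000),
    FingerprintMatch(100, "Artist Z", 3000),
    FingerprintMatch(100, "Artist W", 3000),
]

print(should_escalate(generic_intro), should_escalate(cross_distributor))
```

Keeping the two conditions separate is what lets the short-window false positive be suppressed without weakening the original catch.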

A first version that catches fraud will also produce false positives. The product matures by exposing both the catches and the misses honestly, then engineering the system to discriminate. We shipped the fix the same day, against a real production false positive, and validated it against the original real catch; both still behave correctly.

A product that has only success stories at 24h is suspicious. A product that catches a real cross-distributor fraud, surfaces a false positive, and ships the discrimination fix in the same day is mature.


What we DON'T claim from these

Honesty in the pitch is itself a wedge against sales-led pricing and unverifiable claims.

Run this on your ingest pipeline.

One API call before delivery. Four detection modules. Trial 7 days, 50 tracks free.