Before any serious conversation about predictive maintenance in MRI, one idea needs dismantling: that a quench is a sudden event. It rarely is. It is almost always the visible end of a progression that has been developing for hours or days without anyone reacting. The difference between teams that avoid quenches and teams that pay for them is not access to different signals (everyone has the same ones) but reading the progression in time.
This article is not a catalogue of what to monitor. The MRI monitoring cluster page covers that. Here I describe what the 72 hours before an avoidable quench actually look like, chronologically, and why most biomedical engineering departments only find out when there is nothing left to do.
H-72: the drift starts — but "it's still in range"
The classic pattern starts three days before the final failure. Something changes slightly: the compressor cycle gets a few seconds longer, the cryostat temperature climbs half a degree above its historical baseline, the coldhead starts behaving subtly differently. Nothing crosses a threshold, which is why no alarm fires. Reviewed days later, after the incident, the drift is obvious. Live, it looked like acceptable noise.
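To make the "in range but drifting" pattern concrete, here is a minimal sketch of a one-sided CUSUM (cumulative sum) detector, a standard technique for catching small, sustained shifts that a fixed threshold ignores. It assumes hourly compressor cycle durations in seconds and a baseline mean and standard deviation already estimated from that unit's own history; the function name, the `k` and `h` parameters, and all numbers are illustrative assumptions, not vendor values.

```python
from typing import Iterable, Optional


def cusum_upward(
    samples: Iterable[float],
    baseline_mean: float,
    baseline_std: float,
    k: float = 0.5,
    h: float = 5.0,
) -> Optional[int]:
    """Return the index at which a sustained upward drift is first flagged, or None.

    One-sided (upper) CUSUM on standardized values: each sample adds (z - k)
    to a running sum that is floored at zero, so isolated noise decays away
    while a persistent shift of more than k standard deviations accumulates
    until the sum exceeds h.
    """
    s = 0.0
    for i, x in enumerate(samples):
        z = (x - baseline_mean) / baseline_std  # distance from this unit's baseline
        s = max(0.0, s + z - k)
        if s > h:
            return i
    return None


# Hypothetical data: 24 hourly readings at the unit's baseline, then cycles
# 2.5 s longer. Every value stays far below an illustrative fixed limit of 75 s.
history = [60.0] * 24 + [62.5] * 24
print(cusum_upward(history, baseline_mean=60.0, baseline_std=2.0))  # → 30
```

In this toy series the drift is flagged on its seventh reading, at values no fixed threshold would have caught. `k` sets the smallest shift worth caring about, and `h` trades detection speed against false alarms; both would need tuning per variable and per unit.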
There is an important detail here: in many services, the clinical technician could have seen the drift with a view comparing current behaviour against the nominal operation of that specific equipment. Not "a generic MRI", but that specific unit with its six years of history. The vendor console does not provide that view by default.
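The difference between "a generic MRI" and "that specific unit" can be sketched as two checks on the same reading: one against a generic band, one against a baseline built only from the unit's own past readings. The `UnitBaseline` structure, the `assess` function, the 3-sigma rule and the numbers below are hypothetical, chosen for illustration.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class UnitBaseline:
    """Baseline of one variable on one specific scanner (hypothetical structure)."""
    mean: float
    std: float

    @classmethod
    def from_history(cls, values: list[float]) -> UnitBaseline:
        # Built only from this unit's own past readings, not from a fleet average.
        return cls(mean=statistics.fmean(values), std=statistics.stdev(values))

    def z_score(self, value: float) -> float:
        return (value - self.mean) / self.std


def assess(value: float, generic_low: float, generic_high: float,
           baseline: UnitBaseline, z_limit: float = 3.0) -> dict[str, object]:
    """Compare one reading against a generic band and against the unit's own baseline."""
    z = baseline.z_score(value)
    return {
        "within_generic_band": generic_low <= value <= generic_high,
        "z_vs_unit": z,
        "unusual_for_this_unit": abs(z) > z_limit,
    }


# Hypothetical: this unit's compressor cycles have historically sat at 59-61 s,
# while a generic band accepts anything from 40 s to 80 s.
baseline = UnitBaseline.from_history([59.0, 60.0, 61.0] * 10)
result = assess(64.0, generic_low=40.0, generic_high=80.0, baseline=baseline)
print(result["within_generic_band"], result["unusual_for_this_unit"])  # → True True
```

The same 64 s reading is unremarkable for the generic band and several standard deviations out for this unit, which is exactly the view the vendor console does not offer by default.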
H-36: the first alerts appear — and get silenced
A little over a day before the event, readings start touching the vendor thresholds and alerts fire. But they tend to be alerts the team has seen before in situations that came to nothing: a scheduled service day, a room running slightly warm because the air conditioning was acting up. The operational response is the logical one: "We're watching it; if it happens again tomorrow, we'll escalate."
This is the hardest part to fix from the outside: the problem isn't missing information, it's that the information that arrives doesn't stand out enough from the usual noise. An alert without historical context looks like the thousand previous alerts that came to nothing.
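One way to give an alert its own history is to attach, at the moment it fires, how often the same alert has fired on that unit before and how many of those earlier firings had a known benign explanation (a service day, a room-temperature excursion). The sketch below is a hypothetical enrichment step: the `Alert` fields, the context labels, the alert code `CRYO_TEMP_HIGH` and the escalation rule are assumptions for illustration, not any vendor's alert schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    """Minimal alert record (hypothetical fields, not a vendor schema)."""
    unit_id: str
    code: str
    fired_at: datetime
    benign_context: Optional[str] = None  # e.g. "service_day", "room_temp"; None if unexplained


def enrich(alert: Alert, past: list[Alert], window_days: int = 90) -> dict[str, object]:
    """Attach this unit's history for the same alert code, plus a suggested action."""
    since = alert.fired_at - timedelta(days=window_days)
    same = [a for a in past
            if a.unit_id == alert.unit_id and a.code == alert.code
            and since <= a.fired_at < alert.fired_at]
    explained = sum(1 for a in same if a.benign_context is not None)
    last_24h = sum(1 for a in same
                   if a.fired_at >= alert.fired_at - timedelta(hours=24))
    # Hypothetical rule: escalate an unexplained firing if the code is new for
    # this unit or has already fired on it within the previous 24 hours.
    escalate = alert.benign_context is None and (not same or last_24h > 0)
    return {
        "previous": len(same),
        "previous_explained": explained,
        "previous_last_24h": last_24h,
        "escalate": escalate,
    }


# Hypothetical history: three earlier firings explained by service days,
# plus one unexplained firing 20 hours ago on the same unit.
now = datetime(2024, 3, 10, 3, 0)
past = [Alert("MR-1", "CRYO_TEMP_HIGH", now - timedelta(days=d), "service_day")
        for d in (10, 30, 60)]
past.append(Alert("MR-1", "CRYO_TEMP_HIGH", now - timedelta(hours=20)))
print(enrich(Alert("MR-1", "CRYO_TEMP_HIGH", now), past))
# → {'previous': 4, 'previous_explained': 3, 'previous_last_24h': 1, 'escalate': True}
```

The point of the design is that the alert now says what the team would otherwise have to remember: the earlier firings had explanations, and this one, recurring within a day, does not.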
H-12: the system starts to actually strain
In the last hours before the failure, the behaviour changes. Compressor cycles become irregular, thermal drift accelerates, and small, abnormal decouplings start to appear between magnet pressure and cryostat temperature. At this point the quench is almost certain, and it is also the point at which most departments discover they had no visibility into these variables, or that they had them on screens nobody was watching at 3 AM.
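The two late signals in this phase, irregular compressor cycles and a decoupling between magnet pressure and cryostat temperature, can each be reduced to a simple rolling statistic. The sketch below uses the coefficient of variation of recent cycle durations and a trailing-window Pearson correlation between the two series. It assumes both series are sampled on the same time grid; the function names, window length and example data are illustrative assumptions, not validated settings.

```python
from __future__ import annotations

import math
import statistics
from typing import Sequence


def cycle_irregularity(durations: Sequence[float]) -> float:
    """Coefficient of variation (population std / mean) of recent cycle durations."""
    return statistics.pstdev(durations) / statistics.fmean(durations)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; assumes neither is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rolling_coupling(pressure: Sequence[float], temperature: Sequence[float],
                     window: int = 12) -> list[float]:
    """Correlation between pressure and temperature over each trailing window.

    A coupling that has historically been strong and then weakens toward zero,
    or flips sign, is the kind of decoupling worth surfacing.
    """
    return [pearson(pressure[i - window:i], temperature[i - window:i])
            for i in range(window, len(pressure) + 1)]


# Hypothetical cycle durations (s): steady versus irregular.
print(round(cycle_irregularity([60.0, 61.0, 59.0, 60.0]), 3))  # → 0.012
print(round(cycle_irregularity([58.0, 62.0, 45.0, 75.0]), 3))  # → 0.178

# Hypothetical series: pressure tracks temperature, then stops following it.
temperature = [4.20 + 0.01 * i for i in range(24)]
pressure = [1.00 + 0.02 * i for i in range(12)] + [1.24 - 0.02 * i for i in range(12)]
coupling = rolling_coupling(pressure, temperature, window=12)
print(round(coupling[0], 3), round(coupling[-1], 3))  # → 1.0 -1.0
```

Neither statistic needs anything beyond the raw series the console already records; what they add is a number that can be trended and alarmed on instead of read off a screen.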
The most frustrating part of the post-mortem is almost always the same: when the technical team reviews the logs the Monday after the quench, all the data was there. It was always there. What was missing was the layer that makes it operational in real time.
H-0: the quench, and the first invoice
The quench itself lasts seconds. What lasts weeks is everything else:
- The helium refill (~€30k–€100k depending on equipment and country)
- The days or weeks with the scanner out of service while the magnet cools back down
- The manufacturer's review to confirm the system is back to nominal, with site-visit lead times that are rarely short
- Cancelled appointments: between 50 and 300 patients displaced per week of downtime, depending on the center
- For groups with a single MRI, referrals to external centers and their associated cost
- Potential damage to nearby equipment in the room if the venting didn't work as it should
Measured in full rather than as the refill alone, a quench usually costs between five and ten times the refill. That part rarely appears in the math when someone asks whether observability is worth the investment.
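The multiplier is easy to turn into numbers. Using only the figures in this section (a €30k to €100k refill, a total of five to ten times the refill, and 50 to 300 displaced patients per week of downtime), the sketch below computes the implied ranges. It is back-of-the-envelope arithmetic, not a cost model; the two-week downtime is an assumption for illustration.

```python
from __future__ import annotations


def total_quench_cost_range(refill_low: float, refill_high: float,
                            mult_low: float = 5.0,
                            mult_high: float = 10.0) -> tuple[float, float]:
    """Widest total-cost envelope when the full cost is 5-10x the helium refill."""
    return refill_low * mult_low, refill_high * mult_high


def displaced_patients(weeks_down: int, per_week_low: int = 50,
                       per_week_high: int = 300) -> tuple[int, int]:
    """Range of displaced patients for a given downtime, using 50-300 per week."""
    return weeks_down * per_week_low, weeks_down * per_week_high


low, high = total_quench_cost_range(30_000, 100_000)
print(f"Total cost: €{low:,.0f} to €{high:,.0f}")  # → Total cost: €150,000 to €1,000,000
# Two weeks of downtime is an illustrative assumption, not a figure from the text.
print(displaced_patients(2))  # → (100, 600)
```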
The asymmetry that matters
What separates a service that avoids quenches from one that pays for them comes down to three steps of very unequal difficulty:
- Capturing the right signals: feasible for almost any service. Vendor consoles expose them.
- Seeing them in real time with historical context for the specific equipment: that's where the flow breaks.
- Having a technical team that reviews them before they cross threshold: organisational, not technical.
The hard part isn't the first. It's the second and the third. That's why most services have "monitoring" — every manufacturer sells it — and still have quenches that could have been avoided.
If your department operates MRI scanners and you want to see how this translates into actionable alerts with historical context per unit, let's talk. And if you want the panoramic view of what to monitor in MRI, the MRI monitoring cluster covers it in full.