Data Breach Explained: Causes, Types and Real Examples

Last year, a finance worker at some company in the US got a video call from his CEO. The CEO told him to transfer $25 million urgently. The worker did it. The thing is — it wasn’t the CEO. It was a deepfake. The whole call was AI-generated. By the time anyone figured out what happened, the money was gone.

That’s not a plot from a sci-fi movie. That actually happened in 2025. And it’s not even close to the worst data breach story from the past year or so.

So let me break down what a data breach actually is, the main ways attackers get in, and, honestly, how AI has changed all of this in ways most people don’t fully understand yet.

What Is a Data Breach, Actually?

Simple answer: someone accessed information they were not supposed to access. That’s it. Could be your email and password. Could be your medical records, your Social Security number, your bank account. Sometimes it’s 1,000 people affected. Sometimes it’s 16 billion records, like what happened in June 2025 when a massive credential leak exposed login credentials for Google, Apple, and Facebook accounts, among many others.

Sixteen billion. Let that number sit for a second.

The data that gets stolen usually falls into a few categories. Login credentials are the most common — your username and password. Then there’s PII, which is “personally identifiable information,” meaning your name, date of birth, address, phone number. Then financial data — account numbers, routing numbers, credit card details. Medical records are also a big one, especially because healthcare data sells for a lot more on the dark web than credit card numbers do. A stolen credit card is worth maybe $5 to $20. A full medical record? Closer to $250 to $1,000, because it can be used for insurance fraud, fake prescriptions, a dozen other things.

How Attackers Actually Get In

This is the part most articles mess up. They describe hacking like it’s someone in a hoodie frantically typing code to “break through firewalls.” Real breaches don’t usually work like that. Most of the time, attackers don’t break in. They log in. A PKWARE analysis in May 2026 described it plainly: almost every major incident that month started with a person, not a technical flaw.

Phishing is probably still the number one method. You get an email that looks like it’s from your bank, or from HR, or from Microsoft. You click a link, type in your credentials, done. The attacker now has your login. Phishing has been around forever but it’s gotten dramatically harder to spot in the past two years because attackers are now using AI to write the emails. By the end of 2025, cybercriminals were using large language models to craft more convincing fake personas and more effective phishing messages. The grammar used to be a giveaway — bad spelling, weird phrasing. That doesn’t work as a filter anymore. 82.6% of analyzed phishing emails in 2026 show some form of AI assistance.

Social engineering is a broader version of the same idea. Instead of a fake email, an attacker might call your IT helpdesk pretending to be an employee. They sound convincing, they know some basic details about the company, and they talk the helpdesk person into resetting a password or giving them access. The M&S ransomware attack in April 2025 — which disrupted both online and in-store operations at the UK retailer — started with social engineering targeting a third-party vendor and M&S help desk personnel. One phone call. That’s all it took to get inside a company that size.

Third-party vendors are honestly one of the biggest problems right now, and not enough people talk about it. Your bank or hospital might have excellent security. But they use dozens of outside companies for billing, customer service software, and cloud storage. Those vendors have access to your data too, and they might not have the same security standards. Unsecured third-party Salesforce databases played a role in the majority of major data breaches in 2025. The Allianz Life breach in July 2025 happened through a third-party cloud CRM: attackers used social engineering to get in and then simply used the legitimate export functions inside Salesforce to pull out data at scale. They didn’t need to write a single line of exploit code.

Ransomware is different from pure data theft. Here, attackers get into a system, encrypt everything, and demand payment to unlock it. Sometimes they also steal the data and threaten to publish it if you don’t pay. Kettering Health in Ohio found out in May 2025 that their network had actually been compromised a full month earlier — the Interlock ransomware group claimed to have stolen 941 GB of data, and when Kettering refused to pay, Interlock leaked it. By April 2026 the final count showed over 1.69 million patients had their data exposed — names, Social Security numbers, medical records, financial account numbers.

Then there are insider threats. Someone who already works at the company, or used to work there, and uses their access to steal data. This one’s hard to detect because the person is supposed to be in the system. The Coupang breach exposed 33.7 million customers through an insider threat. Credentials were there. Access was there. Nothing looked suspicious until the data was already gone. And that’s kind of the defining feature of insider threats: by the time the damage is visible, months might have passed. Legend Senior Living discovered a breach in March 2026 that had been sitting undetected since July 2025. Roughly eight months of exposure. The incident at Legend only affected around 5,000 people, which is small by modern standards, but it shows how long these things can quietly run before someone notices.

And finally, exposed APIs and misconfigured databases. This is maybe the most embarrassing type of breach because it’s often not really “hacking” — it’s more like leaving your front door wide open. Developers set up a database in the cloud and forget to add proper access controls. Anyone who knows the URL can just… pull the data. No credentials needed. An exposed API was the weak link in the Navia breach that impacted 2.7 million people — data exposed between December 2025 and January 2026 before anyone noticed.
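To make the "open front door" idea concrete, here is a minimal Python sketch of the kind of check a team might run over a list of its own cloud resources. The inventory format and the field names (public, auth_required) are made up for illustration and don’t match any real cloud provider’s API.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One entry in a hypothetical inventory of databases and APIs."""
    name: str
    public: bool          # reachable from the open internet?
    auth_required: bool   # does it demand credentials?


def find_open_doors(resources):
    """Return names of resources that are public AND need no credentials."""
    return [r.name for r in resources if r.public and not r.auth_required]


# Made-up inventory: only the last entry is both public and unauthenticated.
inventory = [
    Resource("billing-db", public=False, auth_required=True),
    Resource("claims-api", public=True, auth_required=True),
    Resource("test-export-bucket", public=True, auth_required=False),
]

print(find_open_doors(inventory))
```

A real audit would pull this inventory from the provider’s own tooling, but the rule being checked is about this simple: public plus no credentials equals a problem.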

The AI Problem

Here’s where things get more complicated and honestly, a bit worrying.

AI is changing what attackers can do, and the change is not small. AI tools now allow attackers to do three things: automate reconnaissance, scanning vast attack surfaces to identify vulnerabilities faster than any manual method could; generate exploits, including polymorphic malware that adapts to evade detection; and coordinate full attack chains from initial access to data theft with minimal human input.

Let me break that down a bit. Reconnaissance used to mean a hacker manually researching a company — looking up employees on LinkedIn, finding email formats, checking for old data dumps. Now you can point an AI tool at a target and it does all of that in minutes. OSINT tools enhanced with AI can build a detailed profile of any person within minutes — pulling public records, social media, data breach databases, and approximate income levels.

Spear phishing — targeted phishing that’s customized for a specific person — used to take real effort. An attacker would have to manually research you, figure out your writing style, know who your boss is. AI can do that now for thousands of targets simultaneously. Attacks are built on behavioral data, trained to mimic writing styles, and increasingly supported by deepfake voice and video. That finance worker who sent $25 million? The call looked real. The CEO’s face looked real. The voice matched. There was no way to tell without some very specific technical verification that almost nobody does on a regular video call.

And then there are autonomous AI attacks. We’re not fully there yet, but we’re getting close. Armis head of threat intelligence Michael Freeman predicted that by mid-2026, at least one major global enterprise would fall to a breach caused or significantly advanced by a fully autonomous agentic AI system. These systems can plan, adapt, and execute an entire attack lifecycle, from finding the vulnerability to stealing the data, without a human giving instructions at each step. The UK’s NCSC is a bit more cautious and says fully automated end-to-end advanced attacks are unlikely to be common before 2027, but the direction is clear.

In early 2026, a campaign targeting about ten Mexican government entities used AI tools throughout the operation. Researchers found the attacker exploited at least 20 vulnerabilities and used AI to accelerate reconnaissance, exploit development, and data theft. Ten government organizations compromised in one coordinated operation. A few years ago that would have required a large, well-funded team. Now it might not.

I think the honest summary here is that AI hasn’t created new types of attacks — phishing, ransomware, social engineering, these are all old concepts. What AI has done is make them faster, cheaper, and more targeted. A low-skilled attacker can now run a sophisticated spear-phishing campaign because an AI handles all the hard parts. That’s the shift.

Who Gets Hit the Most

Healthcare is far and away the most targeted sector. According to the HIPAA Journal, an average of 47 data breaches were reported each month in the healthcare sector between September 2025 and January 2026. Hospitals can’t afford to have their systems down — patients need medication records, surgery schedules, everything. That makes them more likely to pay a ransom fast. Also, medical data is worth more on the dark web than credit card data, so the financial incentive for attackers is higher.

The Conduent breach, initially reported as affecting 42,616 individuals, eventually turned out to have exposed the sensitive data of over 15 million people in Texas alone, plus millions more across other states. That number kept growing for months after the initial disclosure. This happens more than people realize — a company reports a breach, says a few thousand people were affected, and then the real scope comes out months later.

Small businesses and regular people also get hit more than you’d think. The FBI reports that over 70% of AI cyberattack victims in 2025–2026 were individuals and small businesses with fewer than 50 employees. Big companies have security teams. Small businesses often have one IT person who’s also doing five other things.

Big retailers and government systems are targets too. In January 2026, Target employees reported that internal code and developer documentation had been stolen: about 860 GB from several repositories, later released publicly. Same month, the US ICE agency had databases exposed online. These aren’t isolated incidents.

What Happens After a Breach

Your data gets stolen. Then what?

A lot of it ends up on dark web markets and forums. BreachForums — a well-known hacker forum where stolen data gets sold and traded — itself got breached in January 2026, exposing over 324,000 accounts. Bit of karma there, honestly. But most stolen data just quietly gets listed for sale, sometimes within hours of the breach.

From there it goes into credential stuffing attacks. Attackers take your username and password from one breach and try them across hundreds of other websites automatically. AI-powered credential stuffing tools test stolen combinations across hundreds of platforms simultaneously, and the situation is made worse by the fact that 65% of people reuse passwords across multiple accounts.
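Here is a toy Python sketch of why reuse is what makes credential stuffing pay off. Everything in it is invented (the site names, the passwords, the function name) and it only compares strings in memory. It shows how far one leaked password reaches for someone who reuses it versus someone who doesn’t.

```python
def sites_exposed_by_leak(leaked_password, accounts):
    """Return the sites where the leaked password would also work.

    `accounts` maps a site name to the password used there. All of it is
    toy data held in memory; nothing here touches a real website.
    """
    return sorted(site for site, pw in accounts.items() if pw == leaked_password)


# Hypothetical person who reuses one password on three of four sites.
reuser = {
    "old-forum": "Summer2019!",
    "email": "Summer2019!",
    "bank": "Summer2019!",
    "streaming": "x9#Lq2vT",
}

# Same person after switching to a unique password everywhere.
unique = {
    "old-forum": "Summer2019!",
    "email": "m4$Rw8zK",
    "bank": "Jp7!cN3y",
    "streaming": "x9#Lq2vT",
}

leak = reuser["old-forum"]  # the forum is the site that gets breached

print(sites_exposed_by_leak(leak, reuser))  # ['bank', 'email', 'old-forum']
print(sites_exposed_by_leak(leak, unique))  # ['old-forum']
```

With unique passwords, the forum breach stays a forum problem. With reuse, it becomes an email and bank problem too.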

If it’s financial data, it might be used directly for fraud, or sold to someone who specializes in that. Medical data gets used for insurance fraud. SSNs go toward identity theft — opening credit cards, taking out loans, filing fake tax returns in your name. This stuff can follow you for years.

The average cost of a data breach was $4.4 million in 2025, and that’s just the direct cost to the company — investigation, notification, legal fees, credit monitoring for affected customers. That doesn’t count the lost business, the reputation damage, or the class action lawsuits. Yale New Haven Health settled for $18 million after their 2025 breach.

What You Can Actually Do

Some things you have real control over and some things you don’t. Let me be direct about that.

You can’t stop a hospital from getting ransomwared. You can’t control whether some third-party vendor your bank uses has weak security. That part isn’t in your hands.

But credential stuffing works because people reuse passwords. So use a password manager (Bitwarden is free and works fine) and give each account a unique password. This alone cuts your risk significantly. Then turn on two-factor authentication everywhere you can, especially email and bank accounts. SMS-based 2FA is better than nothing, but physical security keys like YubiKey are much stronger, especially since AI-powered SIM-swap attacks have made SMS-based verification easier to bypass.
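If you’re curious what "a unique password per account" looks like under the hood, here is a rough Python sketch using the standard library’s secrets module. It is not how Bitwarden or any particular password manager works internally; the alphabet, the 20-character default, and the function names are my own assumptions.

```python
import secrets
import string

# Assumed character set; real sites differ in which symbols they accept.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_"


def generate_password(length=20):
    """Build a random password using the OS's cryptographic randomness."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(sites, length=20):
    """Give every site its own independently generated password."""
    return {site: generate_password(length) for site in sites}


vault = passwords_for(["email", "bank", "shopping"])
for site, pw in vault.items():
    print(site, pw)
```

The point is that nobody should be inventing or remembering these by hand. That’s the job the password manager does for you.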

Check haveibeenpwned.com. Seriously, just go do it right now. Type in your email and it’ll tell you every known breach where your data showed up. I checked mine last month and found two old breaches I had completely forgotten about.
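The same project also runs a Pwned Passwords service with a range endpoint that lets you check a password without ever sending the password itself: you send only the first five characters of its SHA-1 hash and match the rest locally. Below is a hedged Python sketch of that idea. The function names and the User-Agent string are mine, the sample response body is fabricated, and the network call is defined but not run here.

```python
import hashlib
import urllib.request


def sha1_prefix_suffix(password):
    """Hash the password with SHA-1 and split it into a 5-char prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_from_range_body(suffix, body):
    """Look for our suffix in a 'SUFFIX:COUNT' response body; 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def times_seen_in_breaches(password):
    """Send only the hash prefix to the range endpoint, then match locally."""
    prefix, suffix = sha1_prefix_suffix(password)
    request = urllib.request.Request(
        "https://api.pwnedpasswords.com/range/" + prefix,
        headers={"User-Agent": "breach-article-example"},  # arbitrary label
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return count_from_range_body(suffix, body)


# Offline demo with a made-up response body (both counts are invented).
prefix, suffix = sha1_prefix_suffix("password")
fake_body = "0018A45C4D1DEF81644B54AB7F969B88D65:3\r\n" + suffix + ":42\r\n"
print(prefix, count_from_range_body(suffix, fake_body))
```

Calling times_seen_in_breaches with a password of your choice needs an internet connection. Any number above zero means that password has shown up in a known dump and should be retired.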

Freeze your credit if you’re in the US. It’s free, takes maybe 20 minutes across the three bureaus (Equifax, Experian, TransUnion), and it stops anyone from opening new credit in your name. You can temporarily lift the freeze when you actually need to apply for something.

Be suspicious of anything that asks for urgency. Phishing and social engineering both rely on making you act fast before you think. “Your account will be suspended in 24 hours.” “This payment needs to go out today.” Slow down. Verify through a separate channel before doing anything.

If your company uses shared logins or doesn’t have a clear offboarding process for employees who leave, that’s a real risk. IT teams sometimes forget to revoke access for people who quit or get let go. Former employees — especially ones who left on bad terms — can sometimes still log in months later. This is basic stuff but organizations mess it up more than they should.
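The check that catches this is not complicated. Here is a minimal Python sketch, assuming you can export two lists: the logins that are still enabled and the people who currently work there. The usernames and the function name are hypothetical.

```python
def stale_accounts(active_logins, current_staff):
    """Return logins that are still enabled but belong to nobody on the roster."""
    return sorted(set(active_logins) - set(current_staff))


# Hypothetical exports: one from the identity system, one from HR.
active_logins = ["asmith", "bjones", "cwu", "shared-frontdesk"]
current_staff = ["asmith", "cwu"]

print(stale_accounts(active_logins, current_staff))  # ['bjones', 'shared-frontdesk']
```

Anything on that output list is either a former employee’s account or a shared login, and both deserve a second look.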

One more thing: if a company you use sends you a data breach notification email, take it seriously even if it says “no financial data was affected.” Your name, email, date of birth and phone number together are enough for someone to try to social engineer you. That combination is basically a starter kit for targeted phishing.

An Honest Look at Where This Is Going

Data breaches are not going away. That’s just the reality. The economics are too good for attackers — low cost, low risk, high payout. AI has reduced the skill barrier even further, which means more people can do this now.

But the defenders are also using AI. Automated detection systems can now spot unusual access patterns faster than any human analyst could. The use of advanced automated detection and response tools can reduce breach identification and containment time by roughly 80 days, saving nearly $1.9 million compared to environments without these tools.
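To give a feel for what "spotting unusual access patterns" means, here is a deliberately simple Python sketch: compare today’s activity for an account against that account’s own history and flag big jumps. Real detection products are far more sophisticated. The threshold, the minimum history length, and the sample numbers are all assumptions for illustration.

```python
from statistics import mean, pstdev


def is_unusual(history, today, threshold=3.0):
    """Flag `today` if it is more than `threshold` standard deviations
    above this account's own historical average."""
    if len(history) < 5:
        return False  # not enough history to judge
    avg = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return today > avg
    return (today - avg) / spread > threshold


# Hypothetical records-exported-per-day counts for one account.
normal_days = [40, 55, 38, 60, 47, 52, 45]

print(is_unusual(normal_days, 58))      # False: a slightly busy day
print(is_unusual(normal_days, 25000))   # True: a bulk export
```

This is the kind of signal that matters when attackers use legitimate export functions with valid credentials, because the login itself looks fine and only the volume gives it away.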

The thing that hasn’t changed is that most breaches still involve a person making a mistake. Someone clicking a phishing link, a helpdesk agent giving access they shouldn’t have given, a developer forgetting to lock down a database. Technology helps, but it doesn’t fix the human part of this. That’s still where most breaches start.

So yeah, your data has probably already been in a breach. Maybe more than one. The question is what you do about it on your end, and whether the companies handling your data are treating security as a real priority or just something they’ll deal with after it goes wrong.