The 2009 Data Breach Investigations Report

Wade Baker
April 15th, 2009

Get it free of charge with no sign-up requirements here.

Creating the single-year sequel to a four-year report on over 500 breach investigations is a daunting prospect. While it would be impossible to trump the sheer scope of the original 2008 DBIR, we’ve sought to preserve its strengths and introduce some key enhancements for 2009. Here is some of what you can expect in this release:

First, you’ll notice the report is quite a bit larger than last year. Hopefully it’s worth the extra disk space (which isn’t saying much given current prices) and/or toner (which *is* saying a lot given current prices). Rather than platitudes and pitches, we’ve worked to fill those extra pages with real substance. Everyone loves data.

More data was possible, in part, due to an important methodological change in 2008. Whereas the original DBIR reached back across four years in one massive data collection effort, this data set was assembled periodically throughout the year. This shift from historic to ongoing collection allows for more detail on existing data points and opens the door to new areas of study.

We’ve also listened to your feedback and requests since the last report was released. We couldn’t possibly address everything but we’ve tried to be responsive and accommodating. Your feedback was very much appreciated last year and we covet your input again on this report. Over the next few days, we will roll out posts for each major section in the report. If a section intrigues you, you’ll find pointers in the document to the accompanying post. It’s your opportunity to tell us what you think and what you’d like to see next year.

Finally, 2008 was a crazy year in the world of data breaches. One might argue it was a crazy year in general, but that’s a different discussion. In terms of cybercrime, the bad guys were really busy and, unfortunately, really successful. We saw much of the same in 2008 but new twists and trends undoubtedly emerged. The percentage of breaches in our caseload involving financial service organizations, targeted attacks, and customized malware all doubled in 2008. It’s sure to win me the “Captain Obvious Award” from the Securitymetrics list, but organized crime activity increased and was responsible for over 90% of the 285 million records compromised. The scales continue to tilt more and more toward servers and applications as the point of compromise. I don’t want to spoil the fun so I’ll close this out and let you get to the report.

As with last year, our goal is that the data and analysis presented in the report prove helpful to the planning and security efforts of organizations around the world. Beyond that, we also hope you simply find it an enjoyable read. Cheers.

Comments

  1. I would like to say THANK YOU for providing such a high quality report. EVERY CISO should take a read and re-think/align strategy.

    Adam Mikrut
    CEO
    DigitalStakeout

    Targeted malware evading your controls and stealing your customer data? Find out how DigitalStakeout can help. http://www.digitalstakeout.com

    Posted by: Adam Mikrut on April 15th, 2009 at 10:29 pm
  2. Great report, but perhaps goes a little light on the client-side
    aspects [i.e. how most end-user machines become compromised]. What
    I wanted to ask here, since one way of helping protect against
    such things is to disable hazardous functionality, is if there’s any
    way to convince the website designers behind some of Verizon’s own
    facilities to not rely so heavily on Javascript and other scripting
    functionality being enabled in visitors’ browsing environments?
    Such things may occupy a relatively slim slice of your worldview,
    but I believe it’s becoming a more significant vector and a lot of
    the success is because end users are repeatedly encouraged, nay,
    demanded to leave themselves vulnerable just to see content.

    Verizon has a real chance to be a leader here, with a large base of
    customers that could be advised to tighten up on their sides but then
    the facilities you need them to visit must be reworked to match
    that stance.

    Posted by: Hobbit on April 16th, 2009 at 1:20 pm
  3. @Hobbit

    As to how most end-user machines were compromised:

    Not many were. It’s all about servers. Page 20 in the report gives a breakdown of the 7 instances of malware infection that came via web browsing. Incidentally, malware introduced to client systems contributed to less than 0.02% of all records compromised in 2008. I say contributed because that breed of malware wasn’t the sole cause. The typical scenario was user infected with keylogger/spyware -> authentication credentials stolen -> creds later used to compromise some other system or application.

    As to the use of active content on Verizon web sites: noted.

    Posted by: Wade Baker on April 16th, 2009 at 5:43 pm
  4. I love the rot13 slipped in at the end. Too bad it’s very decipherable!

    Posted by: Mike on April 16th, 2009 at 5:49 pm
  5. Wade -

    Great report, but I have to figure out what you are thinking with pseudo-risk… I am worried because everything I do revolves around r=t*v*c (or my version which is VaR = {threats, vulns, consequences} → VaR = Risk*Consequences) but I can’t get my head around your thought process with pseudo-risk.

    Two big points of interest:

    1) Your population is different than I would use… maybe that is the nature of the application here, but using a % of total incidents doesn’t strike me as legitimate.

    2) There is no uncertainty associated with the number of records – those are actual losses – so no need to discount them further using a likelihood measure.

    If I had my way, I would be using a population of all records available at those entities, and all activity associated with them. That is not particularly practical, but it calls into question the use case for pseudo-risk.

    Pete Lindstrom

    Posted by: Pete on April 16th, 2009 at 5:53 pm
  6. @Mike –

    It is, isn’t it…but what does it mean and what does it point to? Hmmm…

    Posted by: Wade Baker on April 16th, 2009 at 6:00 pm
  7. @Pete

    I’m sure I won’t answer this fully the first time, so please respond and I’ll try to hit what I miss this time.

    As for t*v*c:

    We use that too. The pseudo risk calculation is based on that same equation – at least, as close to it as we can get. We don’t know the threat rate (number of attacks/attempts) and we don’t independently measure vulnerability (1- total effectiveness of all relevant controls) but we do have insight into the product of the two (t*v). That’s likelihood.

    Since it’s beyond the scope of a forensics engagement to fully measure c (see http://securityblog.verizonbusiness.com/2009/04/16/to-dbir-show-me-the-money/), we use the most measurable indicator of consequences available to us – the number of records breached per incident.

    As to #1:

    I know I’m biased, but it seems reasonable to use % of total incidents. If your company suffers a data breach, what’s the likelihood that an insider is involved? According to our sample of 90 breached organizations this year, it is 20%. Do you have another suggestion? Is your suggestion to use the % of all records owned/processed by the company in question? To be completely honest, I’m not sure how to measure that or why we’d need to. We give the median number of records compromised per incident (Fig 7) as well as the total records compromised by source (Fig 8). Those together with the likelihood data give a pretty decent account of which source is the most damaging (I would say risky but I don’t want to conflate things since we don’t have a real measure of C).

    As to #2 above, I’m not sure what you mean.

    Posted by: Wade Baker on April 16th, 2009 at 6:28 pm
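
    The pseudo-risk calculation described in this reply can be sketched in a few lines of Python. This is only an illustration, not code from the report: the `SourceStats` type, the function names, and every number other than the 20% insider share are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Per-source breach figures (hypothetical structure for illustration)."""
    name: str
    incident_share: float  # fraction of breaches involving this source; stands in for t*v
    median_records: int    # median records compromised per incident; stands in for c


def pseudo_risk(stats: SourceStats) -> float:
    """Likelihood (t*v) multiplied by the consequence proxy (records per incident)."""
    return stats.incident_share * stats.median_records


def rank_by_pseudo_risk(sources: list[SourceStats]) -> list[SourceStats]:
    """Order sources from highest to lowest pseudo-risk."""
    return sorted(sources, key=pseudo_risk, reverse=True)


# The 0.20 insider share comes from the reply above; every other number
# below is a made-up placeholder, NOT a figure from the report.
sources = [
    SourceStats("insider", incident_share=0.20, median_records=1_000),
    SourceStats("other (placeholder)", incident_share=0.50, median_records=5_000),
]

for s in rank_by_pseudo_risk(sources):
    print(f"{s.name}: pseudo-risk = {pseudo_risk(s):.0f}")
```

    With real inputs (the per-source incident share and the median records per incident from Fig 7), the same ranking would give one rough view of which source is the most damaging.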
  8. @Wade

    Well I’d say it points to the importance of securing information itself instead of focusing solely on access to it. But it has to be a good form of encryption – easily crackable/outdated encryption algorithms do more harm than good.

    Posted by: Mike on April 16th, 2009 at 6:30 pm
  9. Great report, with some predictable stuff, but also some alerting issues.

    One remark though. Starting at around page 30, references to the figures in the report become mixed up. Too many to mention here but I’m convinced you’ll easily find them.

    Once again, great report. Thanks a lot.

    Dirk

    Posted by: Dirk Pauwels on April 21st, 2009 at 12:12 pm
  10. Data Breach Report findings were insightful. The report identified SQL injection as the second most prevalent attack and consequently (indirectly) identifies a vulnerability to guard against.

    Is there a breakdown or ‘guesstimate’ for most frequent SQLi attack scenarios that would point to specific controls? Are there more 2nd order attacks where data is being modified or ‘embedded’ versus an immediate result (leak)? Were there more automated SQLi attacks?

    I am interested in details if they are available or can be divulged. Thanks very much for publishing these metrics. –Catherine

    Posted by: Catherine Franke on April 23rd, 2009 at 11:53 pm
  11. Catherine,

    Our IR Team says that the nature of the SQL injection attacks mostly involves basic input validation issues. They are generally found on either custom-coded web-facing applications or on COTS packages (like shopping carts).

    In their experience, these attacks are almost always automated. First, a tool like Data Thief (or one of the many, many other legitimate SQL injection testing tools) is utilized to test for the “possible” existence of the vulnerability. In fact, we’re seeing more and more examples where the test sequences utilized by common tools are reverse-engineered and scripted to look for more specific variants of exposure en masse. Once that SQLi point of exposure is identified, the test output is written to a staging point (a hijacked webserver or other system between the perpetrator, usually the author of the script, and the victim company). The attack information is then later harvested and manually reviewed in the hope that data of interest will be found.

    An example:

    Step 1 – Staging point system is hijacked.
    Step 2 – Scripted SQLi test is created and uploaded to staging point.
    Step 3 – A predefined list of target IP addresses (i.e., known business DSL ranges) is configured.
    Step 4 – Scripted attack is launched which runs through each IP in the predefined range. The attack code looks for specific exposure (not specific data types). As possible vulnerable hosts are found by the script, they are written to a file or otherwise stored on the staging point.
    Step 5 – Perpetrator reviews script findings manually to look for targets of interest. Online stores, brokerages, online banking portals, and the like tend to carry data of value (i.e., consumer records or other PII – setting the stage for more insidious forms of identity fraud) and are prioritized.
    Step 6 – Perpetrator manually exploits targets identified in order of perceived priority (read: value).

    The actual attack in Step 6 may come from different IP addresses but inevitably looks the same as the earlier test. For this sort of attack we tend to see months, as opposed to days or weeks, in attack staging and setup (see figure 31 on page 35). So it might be 60 days or more between initial identification of the possible SQLi vulnerability and the subsequent manual attack. Having said that, the lead indicators for this method of testing are quite easily detected and should immediately trigger internal investigation before the security breach blows up and becomes something more damaging, like data compromise. We don’t really recall ever seeing SQL injection exploited when the underlying vulnerability was not detectable through simple scanning utilities (Nessus, Qualys, nCircle, etc.), and usually it’s quite easy to detect both the vulnerability and the exploit while the exploit code is being executed. Unfortunately, our victims _just weren’t looking_.

    Hope this helps…

    Posted by: Alex on April 27th, 2009 at 6:39 pm
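
    Since this reply traces most of these SQL injection cases to basic input validation issues, a parameterized query is one common control. The sketch below uses Python’s standard `sqlite3` module; the `customers` table and the `find_customer` and `make_demo_db` helpers are hypothetical, and nothing here is drawn from an investigated system.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (name) VALUES (?)",
        [("alice",), ("bob",)],
    )
    return conn


def find_customer(conn: sqlite3.Connection, name: str) -> list:
    """Look up a customer by name using a bound parameter.

    The ? placeholder passes `name` to the database as data, so it is
    never spliced into the SQL text itself.
    """
    cur = conn.execute("SELECT id, name FROM customers WHERE name = ?", (name,))
    return cur.fetchall()


conn = make_demo_db()
print(find_customer(conn, "alice"))        # a normal lookup
print(find_customer(conn, "' OR '1'='1"))  # a classic injection test string
```

    Because the injection string is bound as a value, it is compared as a literal name and matches no rows instead of changing the query.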
  12. This report is highly misleading.

    Every other security report shows you that the majority of breaches are the result of insiders who accidentally lose their laptops or memory sticks.

    I work for my state government, and even though that may not be a significant sample size to speak for the entire country, I’ve seen enough newspaper articles to know that the majority of breaches aren’t external.

    I can’t explain why there is a discrepancy with this report but I highly recommend reconciling this report with another report such as http://www.darkreading.com/security/government/showArticle.jhtml?articleID=211201426

    Posted by: James on May 26th, 2009 at 5:27 pm
  13. @James

    By “reconciling”, I assume you want us to ignore our 5 years of data from 600 cases and just go with what others are reporting? Unlikely. We’ve reported what we found – what else can we do? YMMV, but as for me and any organization I advise or steer… I’ll follow the data.

    I think a lot of the discrepancy has to do with definitions rather than actual data. I refer you to a post from last year on this subject: http://securityblog.verizonbusiness.com/2008/07/07/bogus-biased-or-believable/

    TaoSecurity also has an interesting angle on this: http://taosecurity.blogspot.com/2009/05/insider-threat-myth-documentation.html

    Posted by: Wade Baker on May 27th, 2009 at 9:53 pm
  14. @James

    This report is based on Verizon’s experience via their 90 related engagements last year.

    Why would you hire Verizon to investigate a data breach resulting from a lost USB key or stolen laptop (or for that matter from one of the 1/3 of sysadmins who admit to snooping (aka breach) into confidential or sensitive files that they were not mgmt authorized to view)?

    This report is interesting (more as it relates to Verizon than it does the security industry), but it is not representative of any larger trend nor is it statistically valid outside of Verizon’s 90 engagements.

    Posted by: Dan on June 4th, 2009 at 4:17 pm
  15. @James

    I think you have to take this report at face value. The report never claims to be anything more than statistics that VB have gathered from investigating a few cases of data breaches.

    Can you extrapolate this to all companies and all breaches? No. But it does give some good insight as to what is happening out there behind closed doors. It would be amazing for other companies to deliver such informative statistics to the world at large so comparisons can be made.

    You have to take some of the figures with a pinch of salt. Companies don’t spend money when they don’t have to so most investigations that VB do will be where required by law. That means that they will do more PCI than PPI and more PPI than IP (PCI = credit cards, PPI = info about people, IP = business secrets). This is probably more of an indicator as to what breaches get investigated than what information is stolen.

    Some stats are interesting and need to be investigated – like how much information is stolen where the company did not even know they *had* the information. This makes sense and companies should investigate what information they actually have and don’t know about.

    The fact that web breaches lose more information than lost PCs means that full disk encryption is not worth as much as securing websites. More information is lost due to laptops being stolen but the information on the device is not usually used for cybercrime. Don’t stop deploying full disk encryption but don’t believe it is a silver bullet either.

    Basically, you need to use whatever information you can get including the VB Breach report, gut feel, logs and a good knowledge of your own environment. That’s not easy but then again, you ain’t paid for “easy”. ;)

    Posted by: Allen Baranov on June 8th, 2009 at 1:09 pm
