Plane crashes and security breaches

March 11th, 2010

by Christian Moldes

In Outliers, Malcolm Gladwell analyzes how plane crashes result from a combination of errors. I found this analysis very interesting because of its similarity to most security breaches. A brief excerpt from his book:

“Plane crashes rarely happen in real life the same way they happen in the movies. Some engine part does not explode in a fiery bang. The rudder doesn’t suddenly snap under the force of takeoff. The captain doesn’t gasp, “Dear God,” as he’s thrown back against his seat. The typical commercial jetliner – at this point in its stage of development – is about as dependable as a toaster. Plane crashes are much more likely to be the result of an accumulation of minor difficulties and seemingly trivial malfunctions.

The typical accident involves seven consecutive human errors. One of the pilots does something wrong that by itself is not a problem. Then one of them makes another error on top of that, which combined with the first error still does not amount to catastrophe. But then they make a third error on top of that, and then another and another and another and another, and it is the combination of all those errors that leads to disaster.”

Security breaches happen exactly like that. They are the result of a combination of minor or seemingly insignificant errors. Let me illustrate this. A few years ago, a merchant suffered a breach, and the case is one of the best examples for this topic. Their e-commerce website was developed in-house, but some of its components had been developed by a third party. The application had been thoroughly reviewed for security vulnerabilities, and none of the findings had been rated as risky. However, one of the components was never reviewed: it was added a few days after the application review had been completed, and since it was not related in any way to payment transactions, it was deemed non-critical.

The merchant had a network IDS that was maintained and monitored by an MSS (managed security services) vendor. The device had signatures able to recognize SQL injection attempts, and they were supposedly enabled. However, one of the vendor's security analysts had disabled the rules monitoring attacks on ports 80 and 443 for the e-commerce servers. This was probably because they generated many false-positive alerts, and it was most likely intended as a temporary action. As a result, none of the attacks or unusual traffic on those ports was detected by the IDS.

The e-commerce site used a trusted relationship to connect to the database. Credit card numbers had been encrypted in the database a few months earlier. During that process, as a contingency plan, the DBA exported the tables containing sensitive data before encrypting some of the columns. The backup files had been left on the database server ever since.
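Leftover export files like these are the kind of thing a periodic sweep can catch. The following is only a minimal sketch, not what the merchant ran: the function name find_stale_exports, the list of suspect file extensions, and the seven-day threshold are all assumptions chosen for illustration.

```python
import time
from pathlib import Path

# Assumption: extensions that often indicate database dumps or exports.
SUSPECT_SUFFIXES = {".bak", ".dmp", ".sql", ".csv"}


def find_stale_exports(root, max_age_days=7, now=None):
    """Return suspect files under `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUSPECT_SUFFIXES:
            if path.stat().st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

A scheduled job could call find_stale_exports on the data directories of a database server and alert on any result. It would not replace a change procedure that requires contingency backups to be removed or moved to protected storage.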

Hackers found the security vulnerability in the e-commerce website: the third-party component was vulnerable to SQL injection. By exploiting the vulnerability, they were able to create local administrator accounts on the database server and run OS commands with local administrator privileges. Unfortunately, since the IDS was not monitoring traffic on ports 80 and 443, none of the SQL probes was detected, nor was any other unusual traffic on those ports. Remote management tools were installed, and password hashes were cracked off-line. The hackers reviewed every folder on the web server looking for scripts, source code, and data files. They found the backup files left behind by the DBA.
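For readers less familiar with SQL injection, here is a generic illustration of the flaw and its usual fix. It is a sketch under stated assumptions, not the merchant's code: the products table, the function names, and the use of Python's sqlite3 module are all hypothetical, and the actual component and database platform are not known from this case.

```python
import sqlite3


def find_products_unsafe(conn, name):
    # Vulnerable: user input is concatenated directly into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM products WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_products_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    query = "SELECT id, name FROM products WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def make_demo_db():
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (name) VALUES (?)", [("widget",), ("gadget",)]
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(len(find_products_unsafe(conn, payload)))  # every row is returned
    print(len(find_products_safe(conn, payload)))    # no row matches the literal string
```

With the concatenated query, the payload turns the WHERE clause into a condition that is always true. With the parameterized query, the same payload is treated as an ordinary product name that matches nothing.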

The merchant became aware of the intrusion only several months after the fact, when law enforcement agents notified them that their data was on sale on a carders' website.

This case clearly illustrates that even when proper security controls are in place, a breach could happen at any moment. Relying on single controls or single layers of security is never sufficient.

The case also illustrates the need to assess each security control independently of any surrounding controls or other layers of security. QSAs and internal staff in charge of PCI DSS compliance should not consider risk-based discussions until all the security controls have been independently assessed.

Comments

  1. This is an interesting blog and analogy. I’m curious about the example you cite: unlike a plane crash, where a team of two pilots working in concert makes a series of mistakes together, your example is one in which the people making the later mistakes had no hand in the original error. Is it your contention that distributed security management, either internally or by service providers, can increase the potential for errors and, consequently, breaches?

    Posted by: Larry Walsh on March 12th, 2010 at 1:30 pm
  2. At the risk of showing internal disagreement, I’d like to say that, depending on what you mean, I’m not completely on board with the last sentence. We assess risk all the time with a less than perfect understanding of an environment (in life and in security). The unknown is part of the risk assessment. In fact, if there is no uncertainty, then it isn’t risk by definition. Perhaps I’m misunderstanding that last point.

    Posted by: Wade Baker on March 13th, 2010 at 1:20 am
  3. Larry,

    You provided an interesting view of the analogy that I had not considered previously.

    Let me explain my post first. During PCI DSS assessments, clients often argue that even though they don’t meet some of the requirements, they are not at risk because other controls or layers of security compensate for the lack of the initial controls.

    My point in using this analogy is that even after having implemented all the required controls, you are still at risk. That is why QSAs, or any other security auditors, should assess every security control by itself when conducting an assessment. Once all the controls have been assessed, we can determine the reduced risk or exposure and start discussing how to treat it.

    Now to your question: distributed security will always carry a higher risk than centralized security. The more individuals responsible for security, the more opportunities for some of them to make an error that the rest of the team may not be aware of. Unless you can monitor or audit all the parties that have been assigned security responsibilities, you are always at risk of falling into a false sense of security.

    Posted by: Christian J. Moldes on March 13th, 2010 at 3:16 am
  4. Wade,

    I hope the previous post will provide clarification.

    Posted by: Christian J. Moldes on March 13th, 2010 at 3:17 am
  5. So today is the day Cybertrust finally died. When a blog posted on this site says “should not consider risk-based discussions until all the security controls have been independently assessed.”, you know it’s time to get out.

    And since it’s clearly just an advertisement for Verizon’s PCI QSAs (and no other practitioner will do), your old Cybertrust team must have been replaced by the marketing department. This is so full of FUD, you will probably want to change the writer’s diaper before it explodes: “a breach could happen at any moment.” Oh my!

    Rest in peace, ye olde Cybertrust, we hardly knew you.

    Posted by: Jim Jones on March 13th, 2010 at 2:23 pm
  6. Hey Jim,

    Sorry you feel that way. I can assure you we’re still here and thriving. In fact, we’re just now getting used to our snazzy red and black uniforms. You’ll notice that I had the same question as you regarding the last sentence in this post. If we had a blog in which everyone had the same tightly controlled message, it wouldn’t be much of a blog. It’d be a series of mini-whitepapers. We’ve got a collection of different folks giving different perspectives on different topics. You’re bound to disagree with some of them. The day you disagree with everything everyone says on this blog you can put the final nail in the Cybertrust coffin and give a nice eulogy. Until then, we’d sure appreciate you continuing to check in with us and let us prove to you that we’re still kicking.

    Posted by: Wade Baker on March 13th, 2010 at 7:09 pm
  7. A few thoughts…

    Normal Accident Theory. Some of what you describe in the early part of the post falls in line with Normal Accident Theory concepts. A beautiful explanation of this in the context of plane crashes can be found in chapter 18 of the 587-page “The Nimrod Review”. I think normal accident theory has its place in the discussion of some security incidents. From the report… “‘Normal Accident Theory’ holds that, when technologies become very complex and ‘tightly coupled’, accidents become inevitable and therefore, in a sense, ‘normal’.” In our case, incidents (at least a subset of them) could be considered “normal” (unintentional security or operational failures that lead to an incident).

    Threat Modeling. There is absolutely a need for risk-based assessments in the quagmire of PCI compliance. The accuracy of an assessment may improve when more information about surrounding controls is available – but that should not preclude one from taking a risk-based approach. Back to the “Nimrod Review” – on page 465 of the report there is an illustration of the “swiss cheese” model depicting the number of controls that would have to fail to result in a loss. In the world of risk assessments, there is a relationship between any given control and its surrounding controls, in terms of threat event frequency and the capability a threat needs in order to overcome these controls.

    Finally, there has to be a balance between time and accuracy. Most information security folks like to peel back all layers of the onion before coming to a conclusion. In reality, this is not always achievable for each and every security / risk assessment we perform. For some identified issues a thorough risk assessment may be warranted – but not every one of them; else we wind up with analysis paralysis.

    Posted by: Chris Hayes on March 15th, 2010 at 12:18 pm
