No one should expect services to be perfect. Bugs happen, and they get exploited. Employees make mistakes, accept bribes, etc. We know these things and account for them when choosing to use a third-party solution. That said, finding out that one of your providers suffered a breach is every customer’s worst nightmare.
This past week, the tech industry was startled by the announced breach of Okta, a leading identity provider (IDP), by a hacking group called LAPSUS$, known for recently leaking data and code from huge organizations like Microsoft, Samsung, LG, and NVIDIA.
In this article, we’ll briefly review what happened and then look at several lessons learned.
Contact with the Enemy
I knew it would be a hard day for everyone when my morning started with a tweet that a group of hackers, LAPSUS$, had gained administrative access inside Okta. Immediate inspection of the posts showed worrying evidence of compromise, including:
- access to customer accounts, specifically Cloudflare’s users
- access to Okta’s JIRA instance
- access to thousands of Slack channels
Furthermore, the screenshots pointed to access as far back as January, meaning customers could be looking at months of malicious access.
We’ve all heard some variation of the quote “No plan survives first contact with the enemy,” the implicit assumption being that you have a plan in the first place.
Per the plan, we assembled the relevant security and ops personnel and, assuming the worst, began:
- pulling and analyzing logs (This set of queries from Okta on finding IOCs in your logs, as well as these rules from SigmaHQ and these rules from Elastic, might help)
- auditing our configurations
- rotating credentials
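To make the first step concrete, here is a minimal sketch of triaging exported Okta System Log entries for impersonation events. The event types and the sample entries below are illustrative assumptions on my part; treat Okta’s published IOC guidance as the authoritative list.

```python
# Event types associated with support-engineer impersonation.
# NOTE: this set is an assumption -- consult Okta's published IOC
# guidance for the authoritative list of indicators.
SUSPICIOUS_EVENT_TYPES = {
    "user.session.impersonation.initiate",
    "user.session.impersonation.grant",
}

def find_iocs(events):
    """Return log entries whose eventType matches a known indicator."""
    return [e for e in events if e.get("eventType") in SUSPICIOUS_EVENT_TYPES]

# Illustrative entries shaped roughly like Okta System Log output
# (eventType / actor / published fields); values are made up.
sample = [
    {"eventType": "user.session.start",
     "actor": {"alternateId": "alice@example.com"},
     "published": "2022-01-16T10:02:11Z"},
    {"eventType": "user.session.impersonation.initiate",
     "actor": {"alternateId": "support@vendor.example"},
     "published": "2022-01-21T04:15:40Z"},
]

hits = find_iocs(sample)
for hit in hits:
    print(hit["published"], hit["eventType"], hit["actor"]["alternateId"])
```

The same filter works whether the logs come from a bulk export or are streamed into a SIEM; the point is to scan the full retention window, not just the days around the announcement.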
As Okta released more information, the severity of the breach remained unclear. At first, they played it down, claiming that it was only a contained “attempt to compromise” a support engineer’s account. Later communications were adamant that:
- Only 366 customers, not including Forter, could have been affected
- No customer action was required
LAPSUS$, in response to the posts above, added claims of obtaining AWS access keys from the breached Slack channels, implying additional and unaddressed attack vectors.
Insider Attacks are Real
Whether maliciously motivated, financially motivated, or honestly accidental, insider threats are the hardest to detect and the most likely to succeed. Rather than relying on good intentions and wishful thinking, take prophylactic steps to protect your business:
- Hire great people with good moral fiber and compensate them well. That’s the minimum but it’s not enough.
- Implement least privilege and separate concerns like you mean it.
- Require active support tickets, initiated or approved by your customers, for any support operation performed at the user level or involving access to confidential information.
- Interfaces should not reveal PII when tokenization would suffice; e.g., the email addresses of Cloudflare users in the screenshots could have been replaced with internal identifiers.
- Pay particular attention to “boring” areas – provisioning flows, password reset, device enrollment.
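The support-ticket rule above can be enforced mechanically rather than by convention. Here is a hedged sketch of such a gate; the ticket fields and IDs are illustrative, not any real ticketing system’s schema.

```python
# Hypothetical ticket store: user-level support operations are refused
# unless an active, customer-approved ticket is on file.
APPROVED_TICKETS = {
    "SUP-1042": {"customer": "acme", "approved": True},
    "SUP-1043": {"customer": "acme", "approved": False},  # awaiting sign-off
}

def authorize_support_action(ticket_id: str, customer: str) -> bool:
    """Allow a support operation only with a customer-approved ticket."""
    ticket = APPROVED_TICKETS.get(ticket_id)
    return bool(ticket and ticket["approved"] and ticket["customer"] == customer)

print(authorize_support_action("SUP-1042", "acme"))  # True
print(authorize_support_action("SUP-1043", "acme"))  # False: not yet approved
print(authorize_support_action("SUP-9999", "acme"))  # False: no ticket at all
```

Wiring a check like this into the support tooling itself, rather than into a policy document, is what “like you mean it” looks like in practice.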
Implement independent controls
This is the real meaning of Zero Trust Architecture. When Stephen Marsh first coined the term in his 1994 thesis, Formalising Trust as a Computational Concept, he cited the blind trust attributed to users after login as a crucial factor enabling disasters like the Internet Worm of 1988.
More recently, this concept has been popularized as Perimeterless Security, encouraging people to throw away their VPNs, expose their internal services directly to the Internet, and rely on independent authentication to each service for protection.
Imagine such services relying solely on a compromised Okta, or another IDP, to perform that authentication. Adopting this naive interpretation of Zero Trust Architecture only shifts the blind trust Marsh worried about from the VPN to the IDaaS.
For this reason, I still recommend keeping internal services behind another layer, such as an independently authenticated VPN. The VPN establishes the least privilege necessary to attempt connections to protected services, making even a total compromise of the primary IDP insufficient, on its own, to cause a breach. In Marsh’s terms, we don’t use the VPN to attribute trust, only to establish an outer perimeter of distrust.
There are additional variations on the idea, e.g. IP address allow-lists, mTLS, device identities, each with its strengths and weaknesses. However we implement them, the key is using independent controls to give us, and our customers, peace of mind when stuff hits the fan.
Verifiable is Reliable
In Marsh’s work on trust, he posited that trust or distrust requires some prior knowledge. I’d add that knowledge requires facts, not belief or assumption. To facilitate trust in manufacturing, we have quality checkpoints like QA processes testing samples from batches, inspection on delivery, etc.
In the XaaS industry, there are no checkpoints where the customer can verify that everything is up to specification. Services are moving targets, sometimes deployed hundreds of times per day, relying on employees the customer never interviewed, and third-party services the customer never vetted.
As such, to truly trust a provider, customers need continuous evidence on which to base that trust. Sometimes, the customer can obtain that evidence independently. Other times, the provider must supply evidence to prove themselves trustworthy:
Okta has above-average transparency in terms of logging actions in the application, so companies theoretically have the logs they need to identify any compromise at the application level. If, however, Okta was infiltrated at a lower level, e.g. via the AWS access keys the attackers claimed to have found in Okta’s internal Slack channels, application logs are not enough.
If attackers disabled the MFA requirement for a user’s account at the DB level, for example, it could escape notice in application logs but still have the intended malicious effect. Using IaC tools like Terraform can help independently identify changes like these by flagging the undesired state and, hopefully, correcting it.
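The drift-detection idea reduces to comparing a desired, version-controlled state against what the service actually reports. Here is a minimal sketch; the field names are illustrative, not a real Okta or Terraform schema.

```python
# Desired state, as it would live in version control.
desired = {"mfa_required": True, "session_lifetime_hours": 12}

def detect_drift(desired: dict, actual: dict) -> dict:
    """Return fields whose live value differs from the desired state."""
    return {
        key: {"expected": want, "actual": actual.get(key)}
        for key, want in desired.items()
        if actual.get(key) != want
    }

# Simulated live state after an attacker flips the MFA flag at the DB level.
actual = {"mfa_required": False, "session_lifetime_hours": 12}

drift = detect_drift(desired, actual)
print(drift)  # only mfa_required differs
```

Run on a schedule, a comparison like this surfaces exactly the class of change that application logs would miss; `terraform plan` against live infrastructure serves the same role for managed resources.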
If attackers made malicious changes to the code, the application might stop enforcing the policies configured in the DB regardless. Next-level organizations will write tests for their configurations to detect service-level corruption and validate that security policies behave as expected, e.g. blocking impossible-travel logins, requiring MFA, etc.
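As a toy example of what such a policy test might exercise, here is an impossible-travel check: if two logins for the same user imply faster-than-airliner travel, the second should have been challenged. The 900 km/h threshold and the event shapes are my assumptions.

```python
import math
from datetime import datetime

MAX_SPEED_KMH = 900.0  # assumption: roughly airliner cruise speed

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_impossible_travel(login_a, login_b):
    """True when the implied speed between two logins exceeds the threshold."""
    dist = haversine_km(login_a["lat"], login_a["lon"],
                        login_b["lat"], login_b["lon"])
    hours = abs((login_b["time"] - login_a["time"]).total_seconds()) / 3600
    return hours > 0 and dist / hours > MAX_SPEED_KMH

# London at 09:00, then Sydney one hour later: ~17,000 km in one hour.
a = {"lat": 51.5074, "lon": -0.1278, "time": datetime(2022, 3, 22, 9, 0)}
b = {"lat": -33.8688, "lon": 151.2093, "time": datetime(2022, 3, 22, 10, 0)}

print(is_impossible_travel(a, b))  # True -- policy should have challenged this
```

A configuration test suite would feed pairs like these through the real service and assert that the second login is blocked or challenged, catching the case where the policy exists in the DB but the code no longer enforces it.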
For comparison, AWS charges exorbitant fees to log read or write access to S3 objects, which is completely irresponsible. The XaaS industry, as a whole, must stop charging extra for basic security functionality.
On the organizational level, this type of incident brings up questions about the trust that companies place in their providers.
We place a certain level of trust in publicly traded companies because this implies some level of regulatory oversight, but there are limits. Tech governance is immature and, in fact, the SEC only recently proposed amendments to its rules which would require companies to disclose any material cybersecurity incidents. In the meantime, being traded might encourage accountability to the stock price over accountability to the customers.
Others place their trust in the contractual obligations between the customer and the provider, but words, and the associated legal gymnastics, are cheap. What use are contracts if:
- We don’t know the contract was breached
- We can’t prove the contract was breached
- Even knowing the contract was breached, moving to a new provider would cost months of unplanned work
I am not saying we don’t need contracts, just that we need enforceable contracts.
When it comes to transparency and verifiable foundations for trust in the XaaS industry today, I feel that SOC 2 attestations, though not perfect, are our greatest hope. Unfortunately, organizations have no obligation to involve their SOC 2 auditors in these situations. It could be months before their next audit, and there is no guarantee that the auditor will investigate every incident.
Companies committed to transparency and establishing a strong foundation of trust with their customers should go beyond the letter of the law, proactively involving their auditor in their real-time incident response (not at their next review). They should commission the auditor to release a detailed, unbiased review of the incident without weasel words or legal gymnastics.
Love thy customers as thyself
As I wrote above, no one expects services to be perfect. The question is never whether there will be a breach, but how it will be handled when it comes. Ironically, Okta posted a tweet a while back asking what percentage of customers would cut ties with a company if it suffered a data breach.
I think the answer lies in this adaptation of the biblical commandment to love thy neighbor – love thy customer as thyself. If we want our customers to stick with us through the bad times as well as the good:
- First, put ourselves in our customers’ shoes
- Accept responsibility for the breach to come, don’t look for excuses
- Do our best to prevent the breach
- Do our best to minimize the blast radius of the breach
- Do our best to detect the breach so it’s not Twitter telling customers that something’s wrong
- Communicate the risk of a breach to customers as soon as we become aware of it so they can take steps to protect their systems and their customers
- Don’t play games