Failure is unavoidable, and although it might seem counter-intuitive, learning to fail is a good thing. Learning to fail right, that is. Systems and software can fail in various ways: failures can be mechanical (e.g., wear and tear), or they can be caused by bugs in the system. Beyond such accidental failures, attackers will deliberately try to make systems crash in order to expose potential vulnerabilities in their startup routines.

The job of security professionals and security-minded developers is to engineer solutions that fail securely by determining what should happen if one or more components in a system were to fail. This concept, called "Fail Secure," is defined as failing in such a way as to cause no harm, or minimal harm, to the system and the data contained therein. In practice, four areas are commonly considered when evaluating whether a system fails securely: communication channels, access control systems, encryption packages, and memory.

Communication Channels

For a variety of reasons, both users and computers themselves create secure communication channels using an assortment of technologies such as Secure Shell (SSH), Secure Sockets Layer (SSL), and Transport Layer Security (TLS). These technologies use cryptographic methods to encrypt data transmitted between devices, with keys used to encrypt and decrypt the data in the channel. Quantum Key Distribution (QKD) is one emerging approach to distributing such keys.

Strong cryptography is essential here, because a primary method attackers use to hijack a secure communication channel is to terminate the link repeatedly so that new keys must be generated, a strategy that can allow the attacker to predict the next set of keys to be used. Insecure communication channels may expose cryptographic material at the start of communication; an attacker who wants to intercept traffic can therefore force the channel to be recreated in order to view those initial cryptographic strings.

Secure channels, by contrast, use a public and private key pair to set up the initial connection and then share cryptographic information over that protected channel so that new session keys, and new connections, can be established. Public and private keys work in tandem: each communicating device already holds its own private key and never transmits it, so the material needed to decrypt the exchange is not disclosed once the connection is established.
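As a concrete illustration (a minimal sketch, not taken from any particular product), the following Python snippet opens a TLS-protected channel using the standard library; example.com and port 443 are placeholder values. The handshake authenticates the server with public-key cryptography and negotiates session keys, and if that negotiation fails, no channel is created at all rather than falling back to plaintext.

    import socket
    import ssl

    # Refuse certificates that do not verify and refuse legacy protocol
    # versions; a failed handshake means no channel at all.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2

    with socket.create_connection(("example.com", 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname="example.com") as tls_sock:
            # Application data written here is protected by session keys
            # negotiated during the handshake; private keys never cross the wire.
            tls_sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
            print("negotiated protocol:", tls_sock.version())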

Access Control Systems

Access control systems are designed to allow or restrict access to a resource in two steps. The first step is authentication, where the user presents an identity to the system and that identity is verified through credentials such as a user ID and password, a biometric scan, or a key card. The second step is authorization, where the confirmed identity is compared to a list of authorized users. At this point, the system either allows or denies the access request.
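The two steps can be shown in a minimal Python sketch (illustrative only; the in-memory user store and permission list below are hypothetical stand-ins for a real directory or key-card system):

    import hmac

    USERS = {"alice": "correct horse battery staple"}   # user ID -> password (hypothetical)
    PERMISSIONS = {"payroll-db": {"alice"}}              # resource -> authorized user IDs

    def is_access_granted(user_id, password, resource):
        # Step 1: authentication -- verify the claimed identity.
        expected = USERS.get(user_id)
        if expected is None or not hmac.compare_digest(expected, password):
            return False
        # Step 2: authorization -- compare the confirmed identity against
        # the list of users authorized for this resource.
        return user_id in PERMISSIONS.get(resource, set())

    print(is_access_granted("alice", "correct horse battery staple", "payroll-db"))  # True
    print(is_access_granted("alice", "guessed password", "payroll-db"))              # False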

Access control systems should be configured to fail securely: if they are not working properly, access to the resource should be denied by default. Using an electronic door lock as an example, consider two possible scenarios. In scenario one, the lock fails and anyone who tries to open the door gains entrance. In scenario two, when the lock fails, anyone attempting to gain entry is denied access. Clearly, the second case is more secure. However, the resource remains unavailable while the access control system is down; while this is generally an acceptable condition, the sensitivity of the resource being protected needs to be weighed against the need for accessibility. Moreover, because a door is involved in this particular case, scenario one may actually be the preferred outcome in a real failure, since the failure could be caused by a fire or other disaster in which people must be able to exit the locked area immediately. While the protection of assets is paramount, nothing exceeds the worth of human life.
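In software, the fail-secure posture of scenario two can be expressed as a thin wrapper around the access check sketched earlier (again illustrative; a production system would also log the fault and alert an administrator):

    def request_access(user_id, password, resource):
        try:
            return is_access_granted(user_id, password, resource)
        except Exception:
            # The access control system is not working properly (directory
            # offline, corrupted data, etc.): deny by default rather than
            # letting the request through.
            return False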

Encryption Packages

Encryption packages and protocols are designed to make data unreadable to any party who is not part of the communication or who has not been granted access to the resource. If an encryption protocol fails, the system should be designed to cease operation until the failure is repaired; a faulty encryption package should never allow data to be read in plain, unencrypted form.

As computers become more powerful, older encryption methods grow weaker because they can be broken with far less effort. Although new encryption protocols may periodically be put into place, the original, weaker protocols may still be enabled in a default configuration. Additionally, default configurations often contain manufacturer-documented usernames and passwords that an attacker can easily look up and use to compromise the system. For these reasons, systems should be configured so that attackers cannot revert them to such default states.
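One way to enforce this is a startup check that refuses to run while default settings remain in place. The sketch below is illustrative; the credential pairs and protocol names are examples of the kinds of factory defaults described above, not values from any specific vendor.

    import sys

    KNOWN_DEFAULT_CREDENTIALS = {("admin", "admin"), ("admin", "password"), ("root", "toor")}
    DEPRECATED_PROTOCOLS = {"SSLv3", "TLSv1.0", "TLSv1.1"}

    def refuse_default_configuration(username, password, tls_min_version):
        if (username, password) in KNOWN_DEFAULT_CREDENTIALS:
            sys.exit("Refusing to start: factory-default credentials are still configured.")
        if tls_min_version in DEPRECATED_PROTOCOLS:
            sys.exit("Refusing to start: configuration allows a deprecated protocol version.")

    # Example: a system still using the manufacturer's default account never starts.
    refuse_default_configuration("admin", "admin", "TLSv1.2")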

Memory

Systems that detect a security violation should automatically restart or remain in a stopped state until an administrator can address the issue that caused the crash. This is one reason why users sometimes see the dreaded Blue Screen of Death (BSOD). Hard as it may be to believe, the BSOD is actually a feature, not a fault: it protects the computer from further harm by halting the operating system when a critical error is detected. Many of these stops are triggered by memory problems, such as a driver touching protected memory it should not access or kernel structures in memory becoming corrupted. Rather than allowing operation to continue and possibly compromise the machine or its data, the system stops. In other words, it fails right.
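A user-space analogue of this behavior (a minimal sketch, not how the Windows kernel itself works) is a service that keeps a checksum of a critical in-memory structure and halts the moment the structure no longer matches, rather than continuing to operate on possibly tampered state:

    import hashlib
    import os

    critical_config = b"allowed_hosts=10.0.0.0/8;audit=on"       # hypothetical in-memory settings
    expected_digest = hashlib.sha256(critical_config).hexdigest()

    def process_request(config):
        if hashlib.sha256(config).hexdigest() != expected_digest:
            # Fail secure: stop immediately instead of serving requests
            # with corrupted or tampered settings.
            os.abort()
        # ... normal request handling would continue here ...

    process_request(critical_config)   # passes; a modified config would abort the process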

Testing

Testing should be implemented to ensure that system failures fail one way only: securely. Testing also needs to be conducted regularly. The testing process begins with an outline of the methods of failure to be employed, the tests that will be conducted on the failed system, and the desired results; this outline should be captured in a formal testing document. Should test results identify a state in which the system does not fail secure, steps should be taken to correct the situation. It is far better to identify such flaws during the testing phase than after an actual breach occurs.

System failures can provide opportunities for attackers to gain access to sensitive data or key systems within organizations. Organizations therefore need to be proactive in designing and implementing mechanisms that allow systems to fail secure, preventing harm to both data and systems. This article focused on four areas where failing secure is important (communication channels, access control systems, encryption packages, and memory) and provided examples of how systems use such fail-secure strategies. Once such mechanisms are in place, testing procedures should be outlined and performed regularly to maintain security.
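As a closing illustration, here is a minimal sketch of such a fail-secure test. It reuses the hypothetical is_access_granted() and request_access() functions from the access control section and forces the credential check to raise, simulating the kind of backend fault a test plan might specify:

    import unittest
    from unittest import mock

    class FailSecureTest(unittest.TestCase):
        def test_denies_access_when_credential_check_fails(self):
            # Failure method from the test plan: the credential comparison
            # raises instead of answering.
            with mock.patch("hmac.compare_digest", side_effect=RuntimeError("auth backend down")):
                # Desired result: the request is denied, never granted.
                self.assertFalse(request_access("alice", "correct horse battery staple", "payroll-db"))

    if __name__ == "__main__":
        unittest.main()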
———————
Guest Post: Eric Vanderburg