Fix It Fast – Best Practices for Application Errors and Crashes

Application errors and crashes are a part of life. Instead of avoiding them, learn how to handle them in a meaningful way. 

Everyone loves an application that runs perfectly smoothly and never crashes. The users are happy, the clients are happy, you can high-five your colleagues and head home. Unfortunately, this isn’t always the case. Crashes and application errors are simply a part of life, and we have to know how to deal with them. 

Instead of striving for a perfect world, our goal as developers should be to diagnose issues effectively and fix them quickly when they arise.

At Infinum, we’ve worked on hundreds of mobile apps, and fixing crashes is our daily activity. In this article, we will present our learnings and best practices for crash reporting and error handling.

Handling application errors

Errors are a fact of life in software.

Swift has a powerful error-handling mechanism, which can be very helpful for building robust and resilient apps. But, as with all language features, we need to use it correctly.

Generally speaking, there are two types of application errors: expected and unexpected.

Expected application errors

Expected errors are usually handled as part of the regular application flow. Let’s illustrate this with two examples.

Example 1

The user uploads a document to the backend, but the backend needs some time to process it.

In this case, the application might implement API polling. The API can return a 404 error to indicate that the resource is not yet available. The application can retry an API call until it gets a 200 response code.
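A minimal async/await sketch of this polling loop follows. The `fetchStatus` closure, the attempt limit, the one-second interval, and the `PollingTimeoutError` and `UnexpectedStatusError` types are hypothetical; in a real app, the closure would wrap your own networking layer.

```swift
import Foundation

/// Hypothetical error thrown when the resource never becomes available.
struct PollingTimeoutError: Error {
    let attempts: Int
}

/// Hypothetical error for status codes the polling contract doesn't expect.
struct UnexpectedStatusError: Error {
    let statusCode: Int
}

/// Polls until the backend reports the resource as ready (HTTP 200).
/// - Parameters:
///   - maxAttempts: upper bound on requests, so a stuck backend can't keep the app polling forever.
///   - interval: pause between attempts, in nanoseconds.
///   - fetchStatus: performs one request and returns its HTTP status code.
func pollUntilAvailable(
    maxAttempts: Int = 10,
    interval: UInt64 = 1_000_000_000,
    fetchStatus: () async throws -> Int
) async throws {
    precondition(maxAttempts > 0, "maxAttempts must be positive")
    for attempt in 1...maxAttempts {
        let statusCode = try await fetchStatus()
        switch statusCode {
        case 200:
            return // The resource is ready.
        case 404:
            // Expected while the backend is still processing; wait before the next attempt.
            if attempt < maxAttempts {
                try await Task.sleep(nanoseconds: interval)
            }
        default:
            // Any other status is unexpected, so surface it instead of polling again.
            throw UnexpectedStatusError(statusCode: statusCode)
        }
    }
    throw PollingTimeoutError(attempts: maxAttempts)
}
```

The attempt limit is a design choice: a 404 that never turns into a 200 is eventually reported as an error instead of leaving the user waiting indefinitely.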

Example 2

The user attempts to download a file in bad network conditions. The application might implement an automatic retry mechanism to make it more resilient to failures.
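An automatic retry with exponential backoff could look like the sketch below. The function name `withRetry`, its parameters, and the default delays are assumptions for illustration, not an established API. The `shouldRetry` hook lets callers restrict retries to errors that might go away on their own, and once retrying stops, the original error is rethrown unchanged.

```swift
import Foundation

/// Hypothetical retry helper with exponential backoff.
/// - Parameters:
///   - maxAttempts: total number of tries, including the first one.
///   - initialDelay: pause before the first retry, in nanoseconds; doubled after each failure.
///   - shouldRetry: decides whether a given error is worth retrying.
///   - operation: the work to perform, e.g. a file download.
func withRetry<T>(
    maxAttempts: Int = 3,
    initialDelay: UInt64 = 500_000_000,
    shouldRetry: (Error) -> Bool = { _ in true },
    operation: () async throws -> T
) async throws -> T {
    precondition(maxAttempts > 0, "maxAttempts must be positive")
    var delay = initialDelay
    for _ in 1..<maxAttempts {
        do {
            return try await operation()
        } catch let error where shouldRetry(error) {
            // Retryable failure: wait, then try again with a doubled delay.
            try await Task.sleep(nanoseconds: delay)
            delay *= 2
        }
        // Errors rejected by shouldRetry are not caught above and propagate immediately.
    }
    // Final attempt: any error propagates to the caller unchanged.
    return try await operation()
}
```

For example, passing `shouldRetry: { $0 is URLError }` would retry only URL loading failures, while a decoding error would surface immediately.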

Typed errors

Errors are generally propagated and rendered, but rarely handled exhaustively, and are prone to changing over time in a way that types are not.

Whether your APIs use Combine, the Result type, or the recently accepted typed throws (SE-0413), you might be inclined to define something like:

enum MyAppError: Error {
    case network
    case invalidInput
    // ...
}

This would expose explicit error types in your APIs. However, this approach is generally not advisable because it limits composability: every new source of failure forces a change to the shared type, so you'll likely end up constantly adding cases to the MyAppError enumeration.

That said, when applied judiciously to narrow cases, typed errors can produce elegant APIs. For example:

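import Combine

enum ResourceError: Error {
    case notAvailable
    case error(Error)
}

func checkIfResourceAvailable() -> AnyPublisher<Void, ResourceError> {
    // Placeholder: perform the API request and map a 404 response to .notAvailable.
    fatalError("Not implemented")
}

func waitUntilAvailable() -> AnyPublisher<Void, Error> {
    checkIfResourceAvailable()
        .tryCatch { error -> AnyPublisher<Void, Error> in
            switch error {
            case .notAvailable:
                // Expected case: the resource isn't ready yet, so check again
                // (a real client would add a delay between checks).
                return waitUntilAvailable()
            case .error(let error):
                // Anything else is rethrown as an untyped error for upstream code.
                throw error
            }
        }
        .eraseToAnyPublisher()
}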

Reserve this approach for very localized use cases that don't propagate typed error information upstream, because upstream code tends to handle errors generically.

Unexpected application errors


Unexpected errors are the ones that applications cannot handle in a meaningful way. Usually, the application has no better way of dealing with these errors than by displaying them to the user.

Here are some recommendations for handling unexpected errors:

Make illegal states unrepresentable

One of our primary objectives as programmers is to translate business rules into code representation. While direct translation is often impractical, striving for close alignment between the code and business rules is important. This ensures that the code reflects the intended logic and avoids states that are not allowed by business rules.

This principle equally applies to error handling and enables a consistent user experience.

Errors usually occur in response to some user action. More generally, this interaction can be visualized in the following diagram:

The diagram shows three mutually exclusive states of our program:

1. Data loading is in progress.
2. Data is successfully obtained.
3. There was an error during data loading.

To provide a consistent experience, it is useful to represent this idea in code explicitly. In SwiftUI, we can define something like:

import SwiftUI

enum Loadable<Value> {
    case loading
    case failure(error: Error)
    case success(Value)
}

/// Renders the matching view for each state. LoadingView and ErrorView are app-defined views.
struct LoadableView<T, U: View>: View {
    let element: Loadable<T>
    @ViewBuilder let onSuccess: (T) -> U
    let onRetry: (() -> Void)?

    var body: some View {
        switch element {
        case .loading: LoadingView()
        case .failure(let error): ErrorView(error: error, onRetry: onRetry)
        case .success(let value): onSuccess(value)
        }
    }
}

This allows us to handle loading, error, and success states generically and consistently throughout the app.

ErrorView can look something like this:

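struct ErrorView: View {
    let error: Error
    // Optional, so it can receive LoadableView's optional retry action directly.
    let onRetry: (() -> Void)?

    var body: some View {
        VStack {
            Image(.error) // Image resource generated from an "error" asset in the catalog.
            Text("An error has occurred")
            Text(error.localizedDescription)
            // Offer a retry only when the caller supports it.
            if let onRetry {
                Button("Retry", action: onRetry)
            }
        }
    }
}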

This way, users will always see localized descriptions of the error, and they will be able to retry the operation. We also have the flexibility to expose other relevant information such as error code or error user info.

This pattern can either be applied to the whole screen or partially, to some of its elements.
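Screens typically obtain these states from async calls. As a sketch, a hypothetical `capture(_:)` helper (not part of SwiftUI) can turn any throwing async operation into a `Loadable` value; the enum is repeated so the snippet compiles on its own.

```swift
/// Same shape as the Loadable enum above.
enum Loadable<Value> {
    case loading
    case failure(error: Error)
    case success(Value)
}

extension Loadable {
    /// Hypothetical helper: runs `operation` and records its outcome.
    /// The error is stored as-is, so nothing about the original failure is lost.
    static func capture(_ operation: () async throws -> Value) async -> Loadable<Value> {
        do {
            let value = try await operation()
            return .success(value)
        } catch {
            return .failure(error: error)
        }
    }
}
```

In a view, this could run from a `.task` modifier that first sets the state to `.loading` and then assigns the captured result. Because the original error is stored unchanged, ErrorView can still display its localized description.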

Error mappings

To ensure this pattern works well, all your errors should conform to LocalizedError and provide a meaningful message to the user.

Unfortunately, in many cases the error comes from an API in a third-party SDK or the iOS SDK. If it's important to show a human-readable message in those cases, you will need to handle those errors explicitly and provide your own message.

One such case is a missing internet connection. The code that handles it might look like this:

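import Foundation

struct NoInternetError: LocalizedError {
    var errorDescription: String? { "Please check your internet connection" }
}

extension Error {
    var asNoInternet: NoInternetError? {
        let nsError = self as NSError
        // These codes are only meaningful for URL loading errors; other domains may reuse the numbers.
        guard nsError.domain == NSURLErrorDomain else { return nil }
        switch nsError.code {
        case NSURLErrorNotConnectedToInternet, NSURLErrorDataNotAllowed:
            return NoInternetError()
        default:
            return nil
        }
    }
}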
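When several SDK errors need friendly messages, it can help to keep the mapping in one place. The sketch below is one possible shape, with hypothetical names `UserFacingError` and `mapForDisplay(_:)`. It matches on Foundation's `URLError` codes and, unlike a plain replacement, keeps the original error attached for debugging.

```swift
import Foundation

/// Hypothetical user-facing error that keeps the original error for diagnostics.
struct UserFacingError: LocalizedError {
    let message: String
    let underlying: Error

    var errorDescription: String? { message }
}

/// Maps known SDK errors to user-facing ones; anything unrecognized passes through unchanged.
func mapForDisplay(_ error: Error) -> Error {
    guard let urlError = error as? URLError else { return error }
    switch urlError.code {
    case .notConnectedToInternet, .dataNotAllowed:
        return UserFacingError(message: "Please check your internet connection", underlying: error)
    case .timedOut:
        return UserFacingError(message: "The request timed out. Please try again.", underlying: error)
    default:
        return error
    }
}
```

Passing every error through such a function right before it reaches ErrorView keeps the individual call sites free of mapping logic.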

API errors

If the app uses backend APIs, they will most likely be the main source of errors. Try to define an error body contract with the backend team. This transfers ownership of error messages to the originator: the backend has the context required to return a good error message to the user.

struct ErrorResponse: Decodable, Equatable, Sendable {
    let code: String
    let title: String
    let detail: String
}

struct APIError: LocalizedError {
    let code: Int
    let response: ErrorResponse?

    var errorDescription: String? { response?.title }
}

This can be optionally parsed from the response of URLSession and thrown as a more detailed error:

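import Foundation

func fetch(request: URLRequest) async throws -> Data {
    let (data, response) = try await URLSession.shared.data(for: request)
    guard let httpResponse = response as? HTTPURLResponse else { throw URLError(.badServerResponse) }
    guard (200..<300).contains(httpResponse.statusCode) else {
        // The error body is optional, so a missing or malformed body still yields an APIError.
        let errorResponse = try? JSONDecoder().decode(ErrorResponse.self, from: data)
        throw APIError(code: httpResponse.statusCode, response: errorResponse)
    }
    return data
}
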
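Separating the status-code check from the networking call makes the error contract easy to unit test. The sketch below repeats the two types from above and moves the check into a hypothetical `validate(data:statusCode:)` function that works on plain `Data`, so no network access is needed.

```swift
import Foundation

struct ErrorResponse: Decodable, Equatable, Sendable {
    let code: String
    let title: String
    let detail: String
}

struct APIError: LocalizedError {
    let code: Int
    let response: ErrorResponse?

    var errorDescription: String? { response?.title }
}

/// Hypothetical helper: returns the body for 2xx responses and throws an APIError otherwise.
func validate(data: Data, statusCode: Int) throws -> Data {
    guard (200..<300).contains(statusCode) else {
        // The error body is optional; a missing or malformed body still yields an APIError.
        let errorResponse = try? JSONDecoder().decode(ErrorResponse.self, from: data)
        throw APIError(code: statusCode, response: errorResponse)
    }
    return data
}
```

The URLSession-based `fetch` then only needs to extract the status code and delegate to this function.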
Be careful in API design


We all prefer convenient and easy-to-use APIs. But we need to be aware of what goes on in the background.

For example, it might be tempting to expose the Keychain API as a property wrapper, which would allow us to write something like this:

class User {
    @KeychainValue("token") private var token: String?
}

This is very expressive code. In just one line, it is clear that we are dealing with a keychain item that is automatically read from and written to the keychain.

Unfortunately, with such an API design, we fundamentally limit our program's ability to handle keychain errors. The reason is that the underlying keychain functions (such as SecItemCopyMatching) return an OSStatus describing the result of each operation. By exposing the item as String?, we effectively map all errors to nil, so a missing item becomes indistinguishable from a failed read.
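One alternative keeps nil for a genuinely missing item but throws on failure, so callers can tell the two cases apart. The sketch below uses hypothetical names (`KeychainStore`, `KeychainError`, `InMemoryKeychainStore`) and an in-memory implementation instead of real Security framework calls; a production version would wrap the SecItem functions and store their OSStatus in the thrown error.

```swift
import Foundation

/// Hypothetical error type; a real implementation would carry the OSStatus from SecItem* calls.
struct KeychainError: Error, Equatable {
    let status: Int32
}

/// Hypothetical API: `nil` means "no such item", while any failure is thrown.
protocol KeychainStore {
    func string(forKey key: String) throws -> String?
    func set(_ value: String, forKey key: String) throws
}

/// In-memory stand-in that illustrates the contract (and is handy in unit tests).
final class InMemoryKeychainStore: KeychainStore {
    private var items: [String: String] = [:]
    /// When set, every call fails with this error, simulating e.g. a locked device.
    var forcedError: KeychainError?

    func string(forKey key: String) throws -> String? {
        if let forcedError { throw forcedError }
        return items[key]
    }

    func set(_ value: String, forKey key: String) throws {
        if let forcedError { throw forcedError }
        items[key] = value
    }
}
```

Callers pay for this with a `try` at each access, which is exactly the point: a keychain failure can now be logged or shown to the user instead of silently looking like a missing token.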

The same advice applies to choosing libraries: before integrating one, understand what you get, including how it reports errors.

Debugging errors in production

We can summarize our approach in a few simple rules:

1. Always preserve the original error. If you need to transform it, wrap it inside your custom error.
2. Avoid using try? unless the operation is truly optional. This is rarely the case.
3. Avoid returning an optional when an error should be thrown.
4. Conform your errors to LocalizedError.

By following these rules, we also get some positive side effects:

1. Users will always see localized descriptions.
2. We will preserve the original errors with all the details (such as HTTP status code or framework error).
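The rules are easiest to see side by side. In the sketch below, `Profile`, `ProfileError`, and both decoding functions are hypothetical: the first function follows the patterns the rules warn against, and the second keeps the original `DecodingError` wrapped inside a localized error.

```swift
import Foundation

struct Profile: Decodable {
    let name: String
}

/// Hypothetical domain error: wraps the original error instead of replacing it (rule 1).
enum ProfileError: LocalizedError {
    case decodingFailed(underlying: Error)

    // Rule 4: a human-readable message for the user.
    var errorDescription: String? {
        switch self {
        case .decodingFailed:
            return "Your profile could not be loaded."
        }
    }
}

// Discouraged: try? plus an optional return discards why decoding failed (rules 2 and 3).
func decodeProfileOrNil(from data: Data) -> Profile? {
    try? JSONDecoder().decode(Profile.self, from: data)
}

// Preferred: throw a localized error that still carries the DecodingError.
func decodeProfile(from data: Data) throws -> Profile {
    do {
        return try JSONDecoder().decode(Profile.self, from: data)
    } catch {
        throw ProfileError.decodingFailed(underlying: error)
    }
}
```

With the second version, the user still sees a friendly message, while logs and non-fatal reports can include the exact missing key from the wrapped `DecodingError`.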

Error details

Users are generally interested in human-readable descriptions of errors. But for programmers, it is critical to have as many details as possible about the error.

We can expose these details in our ErrorView through an additional Details button. If you can't do that in production, consider enabling detailed error output at least in staging builds. This can make issues easier to find and shorten the feedback loop between the person reproducing the error and you.
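The text behind such a Details button could be produced by a small helper like the hypothetical `errorDetails(_:)` below. It assumes wrapping errors adopt an app-defined `WrappingError` protocol; `String(reflecting:)` is the standard library's debug description, which includes the error's type and stored properties.

```swift
/// Hypothetical protocol for errors that wrap another error.
protocol WrappingError: Error {
    var underlying: Error { get }
}

/// Builds a developer-oriented description of an error and everything it wraps,
/// e.g. for a Details button in staging builds.
func errorDetails(_ error: Error) -> String {
    var lines: [String] = []
    var current: Error? = error
    // Walk the chain of wrapped errors down to the original one.
    while let next = current {
        lines.append(String(reflecting: next))
        current = (next as? WrappingError)?.underlying
    }
    return lines.joined(separator: "\ncaused by: ")
}
```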

Non-fatals

Firebase Crashlytics non-fatal errors (non-fatals) are a great way to get more detailed information about errors that occur in production, and they often provide critical information for debugging user-reported issues. To use and analyze them effectively, we need fine-grained error grouping.

As the Crashlytics guide states:

Unlike fatal crashes, which are grouped via stack trace analysis, logged errors are grouped by domain and code.

This means that, to group Swift errors properly, we need to conform them to CustomNSError. For example, our APIError can conform to CustomNSError in the following way:

extension APIError: CustomNSError {
    var errorCode: Int { code }
    static var errorDomain: String { "APIError" }
}

This will group all API errors with the same status code.

The additional benefit of conforming to CustomNSError is that we can provide additional details that will show up in Firebase:

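extension APIError {
    // Witness for CustomNSError.errorUserInfo; only non-nil values are included.
    var errorUserInfo: [String: Any] {
        var info: [String: Any] = [:]
        if let response {
            info["title"] = response.title
            info["detail"] = response.detail
        }
        return info
    }
}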
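To sanity-check what Crashlytics would group on, the CustomNSError values can be inspected directly in a unit test. The sketch below repeats a trimmed-down APIError (without the LocalizedError part), and the `groupingKey(for:)` helper is hypothetical: it only mirrors the domain-and-code rule quoted above and is not part of the Firebase SDK. In the app itself, the error would be passed to Crashlytics' `record(error:)` method.

```swift
import Foundation

struct ErrorResponse {
    let code: String
    let title: String
    let detail: String
}

struct APIError: Error {
    let code: Int
    let response: ErrorResponse?
}

extension APIError: CustomNSError {
    static var errorDomain: String { "APIError" }
    var errorCode: Int { code }
    var errorUserInfo: [String: Any] {
        var info: [String: Any] = [:]
        if let response {
            info["title"] = response.title
            info["detail"] = response.detail
        }
        return info
    }
}

/// Hypothetical helper: errors with equal keys would land in the same Crashlytics issue.
func groupingKey<E: CustomNSError>(for error: E) -> String {
    "\(E.errorDomain)#\(error.errorCode)"
}
```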

What to do with application crashes

Fail fast

An application crash is usually the worst outcome our users can experience, and we should do everything we can to prevent it. However, we shouldn't avoid crashes at any cost. In some cases, we want to “fail fast” and use those crashes to find issues more effectively.

Generally speaking, when we can’t handle some condition in a meaningful way, we should expose it as an error to the user.

However, we are also limited by the programming language and the ecosystem we operate in. Sometimes the language lacks the expressivity to represent a concept directly.

Let’s look at the following example:

class NamePicker {
    private let names = ["John", "Josh", "Lucy", "Angela"]
    private var selectedName: String

    init() {
        selectedName = names.first!
    }
}

In general, force unwrapping is a pattern we would like to avoid. But in this case, the array is a non-empty literal, so making the initializer throw wouldn't convey the right message and would pollute the code. It would require all upstream functions to be marked with throws, which would obscure the true value of genuine errors.
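Where an extra type pays off, the “make illegal states unrepresentable” idea offers an alternative to the force unwrap: encode non-emptiness in the type itself. Swift's standard library has no non-empty collection, so `NonEmptyArray` below is a hypothetical minimal version.

```swift
/// Hypothetical non-empty collection: the first element is stored separately,
/// so an empty value cannot be constructed and `first` is never optional.
struct NonEmptyArray<Element> {
    let first: Element
    let rest: [Element]

    init(_ first: Element, _ rest: Element...) {
        self.first = first
        self.rest = rest
    }

    var all: [Element] { [first] + rest }
}

final class NamePicker {
    private let names: NonEmptyArray<String>
    private(set) var selectedName: String

    init() {
        let names = NonEmptyArray("John", "Josh", "Lucy", "Angela")
        self.names = names
        selectedName = names.first // No force unwrap needed.
    }
}
```

Whether this is worth an additional type is a judgment call; for a short literal list, the force unwrap above is arguably the clearer option.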

Here is another example:

import CoreVideo

var link: CVDisplayLink?
let status = CVDisplayLinkCreateWithActiveCGDisplays(&link)
guard let link, status == kCVReturnSuccess else {
    fatalError("Could not create display link. Return status: \(status)")
}

In this case, we believe that all preconditions for creating a CVDisplayLink are fulfilled when this code is invoked. Therefore, instead of propagating an optional CVDisplayLink through the codebase, we declare that a failure here crashes the program, which allows us to catch holes in our assumptions quickly.

The decision whether to crash, throw an error, or return an optional value needs to be made case by case. Here are some questions to ask yourself to make this decision easier:

  • How likely is failure?
  • Would ignoring the failure lead to undefined behavior or even worse consequences, such as data loss?
  • Have you ensured that the preconditions hold in other parts of the code?
  • Is there a reasonable default value you can provide in case of failure?
  • Would showing an error message to the user be better than crashing?
  • Is application security at risk in this area of the code?

If you want to explore this topic further, here are some good resources:

Additional crash logs

Looking at the stack trace can very often lead us to a clear root cause of the crash and the conditions under which it happens. Unfortunately, this is not the case for all crashes, and any associated information might prove useful for investigating, understanding, reproducing, and fixing the crash.

We can abstract this idea into the following protocol:

public protocol CrashInfoLogger: Sendable {
    func log(_ message: String)
}

And we can implement Firebase crash info logging:

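import FirebaseCrashlytics
import OSLog

final class FirebaseCrashLogger: CrashInfoLogger {
    func log(_ message: String) {
        // Mirror to the unified log; Logger.app is an app-defined Logger extension.
        Logger.app.log("\(message)")
        // Attach the message to the next Crashlytics report.
        Crashlytics.crashlytics().log(message)
    }
}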

And finally, we can provide a convenient extension for it:

public extension Logger {
    static let crashInfo: any CrashInfoLogger = FirebaseCrashLogger()
}

This way, we can log some key events during the program’s execution. If the app crashes, these key events will be associated with the crash report in Firebase.
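Because callers depend only on the CrashInfoLogger protocol, other implementations can sit behind it. The two below are hypothetical: an in-memory ring buffer that keeps the most recent messages (handy in unit tests or for attaching breadcrumbs to a bug report) and a composite logger that forwards each message to several loggers, for example Firebase and the buffer.

```swift
import Foundation

/// Same protocol as above, repeated so the snippet compiles on its own.
public protocol CrashInfoLogger: Sendable {
    func log(_ message: String)
}

/// Hypothetical logger that keeps only the most recent `capacity` messages.
final class RingBufferCrashLogger: CrashInfoLogger, @unchecked Sendable {
    private let capacity: Int
    private var messages: [String] = []
    private let lock = NSLock() // Guards `messages`; log(_:) may be called from any thread.

    init(capacity: Int = 50) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    func log(_ message: String) {
        lock.lock()
        defer { lock.unlock() }
        messages.append(message)
        // Drop the oldest messages once the buffer is full.
        if messages.count > capacity {
            messages.removeFirst(messages.count - capacity)
        }
    }

    var recentMessages: [String] {
        lock.lock()
        defer { lock.unlock() }
        return messages
    }
}

/// Hypothetical fan-out logger that forwards every message to all wrapped loggers.
struct CompositeCrashLogger: CrashInfoLogger {
    let loggers: [any CrashInfoLogger]

    func log(_ message: String) {
        loggers.forEach { $0.log(message) }
    }
}
```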

Follow the trends

Whether it is crashes or errors, some problems might never be reproducible in your local setup. For these kinds of issues, we often need to experiment with the fix.

In that case, we need some way of confirming whether the fix helped. Firebase can provide valuable information about that. The following graph is an example of an issue that was fixed and confirmed by production data:

Firebase also provides a lot of valuable associated data. For example, this crash is only happening on iOS 15:

This is key information that can help us reproduce and fix the issue.

App Store Connect can also provide valuable information about your crash rate and how you compare to other apps in your category. It is important to monitor this trend and ensure it does not grow.

Legal considerations

Beyond technical concerns, there are legal and regulatory aspects to consider with crash and error handling, including data protection laws such as the General Data Protection Regulation (GDPR).

Furthermore, it’s worth noting that some companies have strict policies prohibiting the transmission of any data to third parties, regardless of the circumstances. These policies may stem from privacy concerns, contractual obligations, or industry regulations.

Logging errors or even crashes to third-party systems can have a legal impact on the product you are building, so always check with your legal department to see what you are allowed to do.

Be proactive about application errors and crashes

Even if we follow all the best practices and invest significant resources into this area, errors and crashes are almost impossible to avoid. Therefore, it is important to be upfront about the situation with your clients and have open discussions about it.

It is more professional to inform your client that a new application version has introduced a crash than to have them come to you with a user complaint.

Clients are generally not interested in individual crashes. Instead, you can set up a cadence where you check the status and inform your client in a public channel. This builds trust and understanding and allows you to handle issues professionally.

Reap the benefits of quality error handling

Robust error handling requires intention and care, but it is fundamentally not difficult.

Following the best practices creates substantial benefits for your organization, including more effective debugging and issue resolution, a deeper understanding of user behavior, stronger client relationships, and more reliable applications. 

We hope this guide has equipped you with the knowledge and tools to enhance your error and crash-handling strategies. Take action today to implement these practices and reap the benefits for your organization’s success.