At Digits, we strive to push the bounds of technology in order to deliver radically more useful, delightful software experiences for our customers. We’re excited to begin sharing a closer look at the technical foundations that underpin our products in a new series of blog posts called Building Digits. Without further ado…
Let’s talk about viewing complex data. One of our primary goals at Digits is to provide business owners with insightful and holistic views of their company’s finances, in substantially real-time.
Achieving this involves three major, independent steps:
- We collect all of their relevant data from various sources, such as their QuickBooks, the financial institutions they bank with, their corporate credit card providers, etc.
- We apply our algorithms and proprietary datasets to extend, interpret, and tease out meaning from all of their data.
- We consolidate and aggregate the results into a holistic view that we then visualize for them on their dashboard.
We refer to the pieces of data that we receive from third-party systems as Facts. This is not a judgement of the credibility or immutability of these systems, but rather a delineation of what is (and what is not) under our control.
For example, if we receive a transaction from an external source that looks like 05/12/20 - Taxi $15.05, we might classify it as Transportation. Later, we may receive another piece of information that leads us to believe that this transaction was actually a client expense, and is better classified as Meals & Entertainment. In this example, the transaction itself, the fact, did not change, but our interpretation of it did.
We refer to insights and analysis that are performed by Digits, based on all of the Facts that we have received, as Computed Data. In the example above, this involved a category classification of a transaction. In other cases, this might involve determining that two external pieces of information actually represent the same physical transaction, or detecting that a particular transaction tends to recur on a regular interval and that it should be treated as a subscription.
Benefits of Recomputation
One of the challenges with this model is that we receive new data from external systems constantly, and each new Fact we receive may shed more light on, or change our understanding of, earlier Facts we already recorded. The arrival of new Facts can impact virtually every aspect of our Computed Data. As a result, determining the subset of Computed Data that needs to be updated as a result of any new Fact arriving is non-trivially complex.
We’ve chosen to avoid this complexity by “recomputing the world.” We reconsider our entire set of Computed Data every time the set of Facts that it is based on changes in any way. This guarantees that Digits uses every piece of knowledge it has access to in order to model your business’s financial health at every moment, so it’s as accurate as it can possibly be.
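As a rough illustration of the Facts versus Computed Data split and of recomputing everything from scratch, here is a minimal Python sketch. The `Fact` fields, the `client_note` kind, and the toy classification rules are all hypothetical; they are not Digits' actual data model or algorithms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """A record received from an external system. Frozen: we never edit it."""
    fact_id: str
    kind: str                        # hypothetical kinds: "transaction", "client_note"
    description: str
    refers_to: Optional[str] = None  # id of another Fact this one sheds light on


def recompute(facts):
    """Derive all Computed Data from the full fact set, ignoring prior results."""
    # Transactions that some later Fact marks as client related.
    client_related = {f.refers_to for f in facts if f.kind == "client_note"}
    categories = {}
    for fact in facts:
        if fact.kind != "transaction":
            continue
        if fact.fact_id in client_related:
            categories[fact.fact_id] = "Meals & Entertainment"
        elif "taxi" in fact.description.lower():
            categories[fact.fact_id] = "Transportation"
        else:
            categories[fact.fact_id] = "Uncategorized"
    return categories
```

Because `recompute` always starts from the complete fact set, there is no need to work out which earlier results a new Fact might affect; the cost is redoing work whose output has not changed.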
Consistent Views of the World
The last core tenet of our architecture is our notion of a View. To us, a view of a customer’s data is the combination of all of the Facts for that customer at time T, as well as all of the Computed Data derived from that fact set. If the set of customer facts changes at time T+1, we’ll create a new view that represents that updated set of Facts and Computed Data.
When a customer loads our dashboard, we retrieve the latest available version of their view, and their experience is based on that view version for the remainder of the customer session.
What is the motivation for keeping their experience tied to a single, static view version? There are several:
- Consistency. Assume that the arrival of a new fact causes us to re-label a subset of transactions as Meals & Entertainment instead of Transportation. It would be confusing to our customers if one or two of the transactions in this subset changed labels while others did not as they browsed the site, and even more confusing if additional transactions became recategorized incrementally as they loaded new pages.
- Atomicity. Assume that we receive an external update that replaces one Fact with another. For example, a pending transaction might be replaced by the actual, confirmed transaction and the date it posted. If a customer were looking at a page of transactions that included the pending transaction, and then clicked Next and loaded a new page that now, all of a sudden, included the confirmed version of the transaction, they would be confused about why the transaction seemed to appear twice.
View Serving Architecture
For these reasons, we decided that our architecture would entail a system that is capable of loading a holistic view V1, serving a customer dashboard based on it, and then, at a later point in time, loading a new holistic view V2 and atomically switching to serving the next customer experience based on it. At yet another later point, after a configurable number of hours has elapsed, we want to unload view V1 as it is no longer needed to serve any customer experiences.
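The lifecycle described here (load a view, pin sessions to it, switch new sessions to a newer view, unload the old one after a retention period) can be sketched in memory as follows. The class and method names are hypothetical, and a real system would keep this state in the database rather than in a Python object.

```python
class ViewRegistry:
    """Tracks loaded view versions for one customer (illustrative only)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._views = {}          # version -> view payload
        self._superseded_at = {}  # version -> time it stopped being the latest
        self._active = None

    def publish(self, version, view, now):
        """Store a fully built view, then switch the active pointer in one step."""
        self._views[version] = view
        if self._active is not None:
            self._superseded_at[self._active] = now
        self._active = version

    def start_session(self):
        """A new session pins whichever version is active at this moment."""
        return self._active

    def read(self, pinned_version):
        """Serve a session from the version it pinned, even if a newer one exists."""
        return self._views[pinned_version]

    def unload_expired(self, now):
        """Drop versions that were superseded at least retention_seconds ago."""
        expired = [v for v, t in self._superseded_at.items()
                   if now - t >= self.retention_seconds]
        for version in expired:
            del self._views[version]
            del self._superseded_at[version]
        return expired
```

A session that started on V1 keeps reading V1 after V2 is published, and V1 only disappears once the retention window has passed.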
Frequency of Recomputation
For a given customer, we recompute their view whenever we detect that any of their Facts or Computed Data have changed. In practice, this varies from customer to customer but can roughly be assumed to be between 1 and 24 times per day.
Why Google Cloud Spanner?
After a detailed analysis of the major cloud platforms in 2018, we made the decision to begin building Digits on Google Cloud, and we have been quite pleased with that decision.
With the rest of our infrastructure within the Google ecosystem, it was natural to evaluate Spanner as a potential database technology, and there are quite a few aspects of Spanner that are beneficial to our use case:
- Spanner is fully-managed and requires effectively zero overhead for database operations.
- Its ability to automatically shard data via table interleaving was an appealing feature for us, as it allows us to prepare for high scale and still get the benefits of relational database features: efficient joins, foreign keys, etc.
- Its ability to perform ACID transactions, as needed, was appealing to us from a financial data perspective.
There is a significant trade-off between pre-computing everything, which reduces read latency (i.e. dashboard load time) but limits development speed, and biasing towards read-time computation, which permits rapid feature iteration on the frontend.
At Digits’ current stage, we want to be in the middle of this spectrum — have low read latencies for a great user experience, but still be able to quickly iterate and perfect new features based on customer feedback. In order to support this model, whichever solution we chose for serving our views would have to support read-time SQL.
Sharding and Interleaving
Once it became clear that read-time SQL support was a requirement, we also wanted to be sure that the database solution we selected would easily scale with us as we grew. Traditional RDBMS systems have trouble scaling join performance once the data set can no longer fit on a single node, and many NoSQL key-value stores address the scalability concern by sacrificing join support entirely.
Spanner’s interleaving/sharding design is a nice balance between these two ends of the spectrum. While all data for a given set of parent-child tables does not need to fit on a single node, rows that share the same root key are guaranteed to be co-located on a node. This allows for fast joins within a parent-child hierarchy and aside from defining the interleaving model in the schema, it happens without the developer’s involvement.
Combined, these constraints of easy scalability, low latency reads, and support for relational SQL eliminate quite a few otherwise appealing solutions. For example:
- Cassandra and Redis are both great for serving precomputed views, but do not support read-time aggregations via SQL. (Cassandra does support a SQL-like API, CQL, but not read-time aggregation via SQL.)
- MySQL and Postgres are great for relational querying and read-time aggregation, but are challenging to scale, as sharding data across clusters is left to the engineer/operator.
- Google BigQuery is great for all kinds of SQL analytics querying but is not designed to serve low latency, customer-facing dashboard reads.
Based on all of these factors, we elected to implement our view architecture on Spanner.
Implementing Views in Spanner
As we developed our view implementation, one of the challenges that we had to overcome was Spanner’s 20,000 cell mutation limit. The limit caps the number of cells (rows * columns-per-row) that can be inserted/updated in a single transaction.
This limit presented challenges on both the view loading side and the view unloading (deleting) side.
On the view loading side it meant that as we computed views for a given customer, we could not guarantee that we could load the entire view into the database atomically. Additionally, it is non-trivial for the implementer to know whether a particular transaction would hit this limit or not as they would need to keep track of all of the cells impacted by the generated mutations or the DML statement.
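The arithmetic behind the limit is simple for plain inserts. The sketch below uses the 20,000 figure quoted above and the rows times columns definition. The safety factor is a hypothetical knob, and the calculation ignores anything else that may count toward the limit (secondary index updates, for example), which is part of why tracking it precisely is hard.

```python
CELL_MUTATION_LIMIT = 20_000  # figure quoted in the text


def cells_for_insert(row_count, columns_per_row):
    """Cells touched by inserting row_count rows of columns_per_row columns."""
    return row_count * columns_per_row


def max_rows_per_transaction(columns_per_row, limit=CELL_MUTATION_LIMIT,
                             safety_factor=0.5):
    """Largest batch that stays under limit * safety_factor cells."""
    if columns_per_row <= 0:
        raise ValueError("columns_per_row must be positive")
    return int(limit * safety_factor) // columns_per_row
```

For a five-column table, a 100-row batch touches 500 cells, far below the limit, which leaves room for whatever this estimate does not account for.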
To address this, we created a view version table that is independent of the tables that actually store the view data. This table is a simple mapping from a customer id to a version identifier. This version column is also set on all rows of the actual view data tables.
For example, a small subset of our view schema may look like this:
The queries that power our dashboard can either serve a view for a particular, known, version or consult the active versions table to see which version is the latest (our version_ids increase monotonically).
Our view loading process, for a given customer, then looks like this:
- Load all parts of the view in independent transactions of roughly 100 rows each (conservative to stay well-clear of the mutation limit).
- Once all parts of the view have been successfully loaded, update the view version table to denote the newest active version.
This process ensures that a new view will be served atomically in its entirety, because no query will be aware of its existence until the view version table has been updated. At the same time, existing customer sessions can continue to experience our dashboard against the view version which was active at the start of their session.
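The two-step load can be sketched with SQLite from Python as a stand-in for Spanner. The table layout here (an `active_versions` pointer plus a `version` column on `payments`) is an assumption inferred from the description, and the function names are hypothetical.

```python
import sqlite3

BATCH_SIZE = 100  # "roughly 100 rows" per transaction, as described above


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE active_versions (
            customer_id TEXT PRIMARY KEY,
            version     INTEGER NOT NULL);
        CREATE TABLE payments (
            customer_id TEXT    NOT NULL,
            version     INTEGER NOT NULL,
            payment_id  INTEGER NOT NULL,
            amount      INTEGER NOT NULL,
            PRIMARY KEY (customer_id, version, payment_id));
    """)


def load_view(conn, customer_id, version, payments, batch_size=BATCH_SIZE):
    """Insert view rows in small independent transactions, then flip the pointer."""
    for start in range(0, len(payments), batch_size):
        batch = payments[start:start + batch_size]
        with conn:  # commits one transaction per batch
            conn.executemany(
                "INSERT INTO payments (customer_id, version, payment_id, amount) "
                "VALUES (?, ?, ?, ?)",
                [(customer_id, version, pid, amount) for pid, amount in batch])
    with conn:  # the only step that makes the new version visible
        conn.execute(
            "INSERT OR REPLACE INTO active_versions (customer_id, version) "
            "VALUES (?, ?)", (customer_id, version))


def read_payments(conn, customer_id, version=None):
    """Read a pinned version, or the latest active one when none is given."""
    if version is None:
        row = conn.execute(
            "SELECT version FROM active_versions WHERE customer_id = ?",
            (customer_id,)).fetchone()
        if row is None:
            return []
        version = row[0]
    return conn.execute(
        "SELECT payment_id, amount FROM payments "
        "WHERE customer_id = ? AND version = ? ORDER BY payment_id",
        (customer_id, version)).fetchall()
```

Rows for a partially loaded version sit in the table but are never returned, because no reader asks for that version until the pointer row changes.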
The same 20,000 cell mutation limit applies to data deletion. Spanner does support a Partitioned DML alternative that was appealing for this use case; however, we found two limitations with it:
- Every time we load a new view, it effectively invalidates a similarly sized older view, and this amounts to tens of thousands of rows in need of deletion. This has a significant impact on CPU load. Spanner tombstones rows that are marked as deleted, which makes them invisible to all queries, and then reclaims the disk space in an asynchronous process. However, in our experience both the tombstoning and the reclamation process place a non-trivial load on the CPU and can thus impact read latency of customer facing queries.
- The partitioned DML alternative that is documented to not be constrained by the 20,000 cell mutation limit still fails intermittently with the 20,000 cell mutation limit error.
Efficient Incremental Views
To address the deletion constraint, we analyzed the insertions and deletions that we were performing as part of view loading/unloading and confirmed what we expected: the vast majority of rows stay the same from one view version to the next. All we had to do was invest in being able to easily identify the rows that actually changed, and then insert and remove only the deltas.
(It is important to note that while determining which pieces of our Computed Data for a given customer need to be updated because we received new Facts is non-trivially complex, comparing two fully computed views to each other and determining which rows have been updated, removed, or created is quite straightforward.)
The output of this comparison can then be used as follows:
Each row in the current active version is determined to:
- Still be relevant
- Be removed for all view versions going forward
Each row in the newly computed version is determined to:
- Already be present in the active view version
- Be added for all view versions going forward
For most normal operations, 99% of rows in both the existing version and the new version are determined to be identical, and thus no-ops.
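The comparison above amounts to set arithmetic over fully computed rows. A minimal sketch, assuming each row can be represented as a hashable tuple of its key and values (the function name is hypothetical):

```python
def diff_views(active_rows, new_rows):
    """Split rows into unchanged, to-invalidate, and to-insert sets.

    A row whose values changed shows up twice: its old form is invalidated
    and its new form is inserted.
    """
    active = set(active_rows)
    new = set(new_rows)
    unchanged = active & new       # no-ops: still relevant, already present
    to_invalidate = active - new   # removed for all versions going forward
    to_insert = new - active       # added for all versions going forward
    return unchanged, to_invalidate, to_insert
```

Only the last two sets result in writes, so when nearly all rows are identical between versions the write volume shrinks accordingly.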
To support this, we modified our view tables to have two version-related columns, version_valid_since and version_invalid_since, and updated all queries with two WHERE conditions. For example, continuing with the example schema above, our modified schema would look like this:
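A minimal sketch of such a schema and one plausible form of the two conditions, using SQLite from Python as a stand-in for Spanner. The exact predicates Digits uses are not given in the text, so the query below is an assumption, as are the function names.

```python
import sqlite3


def create_incremental_schema(conn):
    conn.execute("""
        CREATE TABLE payments (
            customer_id           TEXT    NOT NULL,
            version_valid_since   INTEGER NOT NULL,
            version_invalid_since INTEGER,          -- NULL while still valid
            payment_id            INTEGER NOT NULL,
            amount                INTEGER NOT NULL,
            PRIMARY KEY (customer_id, version_valid_since, payment_id))""")


def apply_delta(conn, customer_id, new_version, to_invalidate, to_insert):
    """Write only the rows that differ between the active and new versions."""
    with conn:
        conn.executemany(
            "UPDATE payments SET version_invalid_since = ? "
            "WHERE customer_id = ? AND payment_id = ? AND amount = ? "
            "AND version_invalid_since IS NULL",
            [(new_version, customer_id, pid, amt) for pid, amt in to_invalidate])
        conn.executemany(
            "INSERT INTO payments (customer_id, version_valid_since, "
            "payment_id, amount) VALUES (?, ?, ?, ?)",
            [(customer_id, new_version, pid, amt) for pid, amt in to_insert])


def payments_at(conn, customer_id, version):
    """Rows visible at a version: valid by then, and not yet invalidated."""
    return conn.execute(
        "SELECT payment_id, amount FROM payments "
        "WHERE customer_id = ? "
        "AND version_valid_since <= ? "
        "AND (version_invalid_since IS NULL OR version_invalid_since > ?) "
        "ORDER BY payment_id",
        (customer_id, version, version)).fetchall()
```

With these predicates, a session pinned to an older version keeps seeing rows that were invalidated later and never sees rows that became valid afterwards.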
Finally, we implemented version diffing in a generic way, such that it can be applied to all of our view tables without additional work.
With this in place, we now have to insert and delete 99% less data from Spanner than we did when we were fully loading/unloading every view.
Schema Design for Scale
One unintuitive aspect of Spanner’s secondary indexes (indexes you explicitly add to tables, as opposed to the index you implicitly get for the primary key) is that if a query which hits the index selects a column that is neither a part of the index nor explicitly stored on the index, Spanner must perform an implicit join from the secondary index back to the base table. This makes sense once you accept the fact that the index is stored as an independent structure from the table that it is indexing.
Unfortunately, this join may be non-trivial in cost, particularly when a lot of data is being selected.
To avoid secondary-index-to-base-table join costs, we have explored two options:
- Carefully designing our tables’ primary keys in such a way that most common queries only require the implicit primary key index.
- Creating secondary indexes that are less generic and more tailored to specific query patterns by including all of the columns selected by that query in their STORING clause.
The second option has a higher maintenance cost as it potentially requires updating indexes when new columns are added to tables (if these new columns are selected by queries which indexes are tailored for) or when query patterns are added for product reasons. As a result, we prefer the first approach whenever possible.
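Whether a query can be answered from a secondary index alone can be reasoned about as a simple set check. The sketch below is only a mental model with a hypothetical function name; it relies on the fact that a Spanner secondary index carries the base table's primary key columns in addition to its own key and stored columns.

```python
def needs_base_table_join(selected_columns, index_key_columns,
                          stored_columns, primary_key_columns):
    """True if some selected column is not available in the index itself."""
    available = (set(index_key_columns)
                 | set(stored_columns)
                 | set(primary_key_columns))
    return not set(selected_columns) <= available
```

For an index on (customer_id, amount) over a payments table keyed by (customer_id, version_valid_since, payment_id), selecting payment_id and amount is covered, while selecting version_invalid_since forces the join back to the base table unless that column is stored on the index.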
For example, if the majority of queries against a table will involve restricting the result set by time, then we consider adding the column that represents time to be part of the primary key, even if it logically is not required to be in the primary key.
Spanner’s interleaving support allows for schema design that makes your database straightforward to scale while still supporting efficient joins on data that is commonly joined together. These two properties are often very difficult to achieve in tandem with relational databases.
Interleaving lets you signal to Spanner that all hierarchical data spanning multiple tables, rooted at a particular root row, should be colocated. Building on our schema examples above, it might be appealing to interleave all of our view data under the customers table. This would mean that all view data for a given customer would be colocated, a property that makes sense since we often want to show a customer various parts of their data, while we never want to join data from multiple customers together.
The schema for the customers table as well as the view tables above may then look like this:
    CREATE TABLE customers (
      customer_id STRING(MAX) NOT NULL,
      name STRING(MAX)
    ) PRIMARY KEY (customer_id);

    CREATE TABLE active_versions (
      customer_id STRING(MAX) NOT NULL,
      version INT64 NOT NULL
    ) PRIMARY KEY (customer_id),
      INTERLEAVE IN PARENT customers ON DELETE CASCADE;

    CREATE TABLE payments (
      customer_id STRING(MAX) NOT NULL,
      version_valid_since INT64 NOT NULL,
      version_invalid_since INT64,
      payment_id INT64 NOT NULL,
      amount INT64 NOT NULL,
    ) PRIMARY KEY (customer_id, version_valid_since, payment_id),
      INTERLEAVE IN PARENT customers ON DELETE CASCADE;

    CREATE TABLE sales (
      customer_id STRING(MAX) NOT NULL,
      version_valid_since INT64 NOT NULL,
      version_invalid_since INT64,
      sale_id INT64 NOT NULL,
      amount INT64 NOT NULL,
    ) PRIMARY KEY (customer_id, version_valid_since, sale_id),
      INTERLEAVE IN PARENT customers ON DELETE CASCADE;
This schema might work well, but there is another factor to keep in mind: the size of all data that is interleaved under a single root has a hard limit in Spanner of 4 GB. To avoid approaching this limit, we might further restrict the data that is co-located.
For example, if we know that there are pieces of data that will never be joined with each other, then there is no reason for them to be co-located on the same node. Building on our scenario above, imagine that payments and sales are shown in totally separate parts of our dashboard and would never need to be joined together. If that were the case, then interleaving and thus colocating the view data for both of these tables, for a given customer, under the same customer_id row would make that row unnecessarily large.
To address this, we could add a table in between customers and payments/sales that would facilitate better sharding. The new table might look like:
    CREATE TABLE customer_view_types (
      customer_id STRING(MAX) NOT NULL,
      view_type STRING(MAX)
    ) PRIMARY KEY (customer_id, view_type),
      INTERLEAVE IN PARENT customers ON DELETE CASCADE;
The payments and sales tables would then be updated to look like:
    CREATE TABLE payments (
      customer_id STRING(MAX) NOT NULL,
      view_type STRING(MAX),
      version_valid_since INT64 NOT NULL,
      version_invalid_since INT64,
      payment_id INT64 NOT NULL,
      amount INT64 NOT NULL,
    ) PRIMARY KEY (customer_id, view_type, version_valid_since, payment_id),
      INTERLEAVE IN PARENT customer_view_types ON DELETE CASCADE;

    CREATE TABLE sales (
      customer_id STRING(MAX) NOT NULL,
      view_type STRING(MAX),
      version_valid_since INT64 NOT NULL,
      version_invalid_since INT64,
      sale_id INT64 NOT NULL,
      amount INT64 NOT NULL,
    ) PRIMARY KEY (customer_id, view_type, version_valid_since, sale_id),
      INTERLEAVE IN PARENT customer_view_types ON DELETE CASCADE;
While there were a few limitations to work around, and special care must be taken in both schema design and query design to maintain high performance, Spanner has performed well in production as our customer base has scaled to billions of dollars in transaction value.
Incremental static views provide an optimal balance between dashboard consistency, read-time latency, continual re-computation based on new data, and developer productivity, and Spanner’s ease of scalability via auto-sharding and interleaving has made this architecture very low-overhead to operate.
Join the Team
Static view serving at scale is just one of countless technical challenges we’ve faced while building Digits, and we’re pushing the boundaries at every layer of the stack.
If you’re interested in crafting the next generation of financial software, we’re hiring, and we’d love to meet you! See our open positions here.
As businesses around the country and around the globe continue to struggle during these unprecedented times, we’ve been working as quickly as possible to scale our technical infrastructure and our Early Access program to keep pace with demand and help those in need.
At Digits, our mission is to provide the real-time visibility, and powerful, actionable intelligence, that business owners require to chart their course in this dynamic environment, and we feel the obligation to do as much as we can to help.
This month, we’ve been focused on two major, parallel tracks: an intense effort to drive the scalability and accuracy of our core financial engine in order to accelerate how quickly we make it through our customer waitlist, and a return to green-field product development as we lay the groundwork for our next major launch, later this year.
This month’s Upgrade is a BIG one. However, much is still under wraps, so stay tuned for more!
- Ability to enable or disable integration sub-accounts.
- Improved automatic validation of transaction reconciliation.
- Enhanced category classification with additional ledger data.
- Automated regression detection for classification algorithms.
- Automated health indicators for ledger data synchronization.
- Major scalability improvements to our data processing pipelines.
- Major query performance improvements to support high transaction volume customers.
- Role-based permissioning and ACL implementation.
▋▋▋▋ ▋▋ ▋
- ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋never been done before ▋▋▋ ▋▋▋
- ▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋
- ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋game changer▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋powerful▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋
- ▋▋Patented ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋gorgeous ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋ ▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋ ▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋ ▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋ ▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋
- ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋▋▋▋▋
- ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋▋▋▋ ▋▋▋▋▋▋
After two years of heavy development, we’ve announced Digits for Expenses! We couldn’t be more humbled by the support and excitement we’ve seen from countless business owners and accountants across the country and around the globe, and we’re working tirelessly to get them into our Early Access program.
Since our launch, we’ve also been solidifying Digits’ underlying financial engine. Businesses come in all shapes and sizes, so we need to be absolutely certain that Digits supports them all.
This month, our biggest upgrades focused on new technology to help ensure that our understanding and your understanding of your business’ financials are the same.
Automated P&L and Balance Sheet Validation
Data quality is at the heart of our mission to provide real-time visibility and actionable insights into how your business is spending money, and we’re constantly iterating on approaches to guarantee both precision and accuracy, while keeping your data private and secure.
Over the past month, we’ve built fully-automated validation using two of the three most common financial reports. Here’s how it works:
We first generate a full Profit & Loss Statement and a Balance Sheet report using Digits’ internal, proprietary representation of your company’s financials. We then export the same P&L and Balance Sheet from your ledger’s API, for each month of your company’s history. Our validation pipeline then automatically compares them—line by line—guaranteeing that Digits’ representation of your data is accurate and that you can trust the numbers we display.
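A line-by-line comparison of this kind can be sketched as follows. The report shape (a mapping from month and line item to an amount) and the function name are assumptions made for illustration, not Digits' internal representation.

```python
from decimal import Decimal


def compare_reports(internal, ledger):
    """Return (key, internal_amount, ledger_amount) for every disagreement.

    Each report maps (month, line_item) -> Decimal. A line present on only
    one side is reported with None for the side that lacks it.
    """
    mismatches = []
    for key in sorted(set(internal) | set(ledger)):
        ours = internal.get(key)
        theirs = ledger.get(key)
        if ours != theirs:
            mismatches.append((key, ours, theirs))
    return mismatches
```

Using `Decimal` rather than floats avoids spurious mismatches from binary rounding when amounts are compared for exact equality.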
Automated Regression Detection
While validation is critical for all historical data, it’s meaningless for real-time transactions that have not yet hit your ledger. That’s where Digits’ online classification engine comes in.
We are constantly iterating and improving our algorithms to give you even better real-time insights into spend (with correct categorization and vendor identification), and it’s critical that those improvements don’t unintentionally regress other data points.
Over the past month, we’ve architected and implemented an automated classification regression monitor, which continuously watches a large and growing set of known-correct transactions from our sample data, and then automatically alerts on any unexpected changes.
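The core of such a monitor is a comparison between a classifier's current output and a set of known-correct labels. A minimal sketch, with hypothetical names and toy classifiers standing in for the real engine:

```python
def find_regressions(known_correct, classify):
    """Return (transaction, expected, actual) for each label that changed.

    known_correct maps a transaction description to its expected category;
    classify is the candidate classifier being checked.
    """
    regressions = []
    for transaction, expected in known_correct.items():
        actual = classify(transaction)
        if actual != expected:
            regressions.append((transaction, expected, actual))
    return regressions


def old_classifier(description):
    return "Transportation" if "taxi" in description.lower() else "Software"


def new_classifier(description):
    # Hypothetical change that improves nothing and breaks the taxi case.
    return "Software"
```

Running `find_regressions` against every candidate algorithm change and alerting when the result is non-empty gives the "watch and alert" behaviour described above.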
This is just the start, and we continue to invest heavily in data validation and data quality assurance.
There’s lots more we’ve been working on, though…
Digits for Expenses
- Improved time-series navigation with “Jump to today” feature.
- Improved support for yearly period views.
- Stabilized transaction deep-link URLs.
- Polished commenting UX.
- Enhanced activity feed with additional event types.
- New UI for identifying unknown/novel transactions.
- Automated P&L and Balance Sheet validation.
- Automated regression detection.
- Improved anomalous transaction detection.
- Improved vendor identification for journal entry line items.
- Productionized automatic vendor deduplication.
- Implemented depreciation and amortization exclusions for expense analysis.
- Improved auto-reconciliation of source transactions across larger time windows.
- Productionized new recurrence detection algorithm.
- Improved status monitoring of 3rd-party data providers.
- Tuned importer pipeline for performance and scale.
- Deployed API upgrades to improve imported data consistency.
- Improved handling of popular expense management software transactions.
We are obsessed with the vision that business finance should be immediately accessible and intuitive. It should learn, in real-time, as the business evolves, and it should empower business owners and operators everywhere, without requiring any prior financial training.
Why? Because today’s business climate requires you to take action in the moment and react rapidly to changing market conditions. That means not waiting weeks to receive a standard packet of black & white financials—which honestly can be quite difficult to interpret—and instead demanding the visibility and insight you need to make decisions, right now.
The Power of Digits
The world’s largest companies have sophisticated finance teams and internal forecasting models that give them these capabilities. What about everyone else? What if synthesized, actionable financial insights were available to every business on earth? What if they were always in real-time, and always up-to-date?
For two years, we’ve been building the technical infrastructure to make this a reality: N-dimensional transaction attribution. Auto-reconciliation. Predictive classification. Vendor identification and profile synthesis. Sub-second, full-ledger search. Statistical analysis and anomaly detection. Per-secret envelope encryption. The list goes on.
Introducing Digits, for Expenses
We’ve been humbled by the glowing feedback from our early customers and we’ve been inundated with requests for access. And, in these unprecedented times, we feel the obligation to help as many business owners as we can, as quickly as possible.
So today, we’re launching a broad early-access program for Digits for Expenses. We’ve taken all of the power of our platform and focused it specifically on helping business owners navigate today’s challenging market dynamics, so they can see and manage how their company is spending money:
Digits is a phenomenal and truly game-changing product. To be able to ask such a wide spectrum of financial questions and get to those answers immediately has been so empowering, and the team’s passion for the space is clear in all the small details.
Thinking back to a finance world before Digits… feels like remembering Netflix as a DVD-by-mail service.
– Kenny Mendes, Head of Finance, People, and Operations
Digits for Expenses is the first real-time, intuitively visual, machine-learned expense monitoring dashboard for small businesses, and in light of current conditions, we are making it Free.
(Seriously. We have other paid products on our roadmap, but Digits for Expenses is and will continue to be free for all small businesses. In this quarantine, it’s the least we can do.)
GV leads $22M Series B
As we’ve built Digits, we’ve been overwhelmed by the energy and excitement for our mission from founders, accountants, and investors alike, and we’ve been honored to have the financial backing of Benchmark and over 70 passionate angel investors.
We see this as a long-term, sustained effort, and Digits for Expenses is just the first chapter: we believe that innovations in technology, algorithms, and design have unlocked a new realm of possibility for financial software, and we are committed to making this vision a reality for businesses around the world.
In support of this, we’re excited to share that we have closed $22M in Series B funding, led by Jessica Verrilli at GV, and we’re thrilled to welcome her to our Board. Jessica’s deep experience in corporate development at Twitter and in early-stage investing through GV and #Angels has given her unique insight into the challenges small businesses face during their most-defining moments, and we’re looking forward to imbuing the product with her knowledge.
Jeff and Wayne are masterful at creating intuitive, high-utility products from complicated data. I saw this up close with Crashlytics and Twitter, and I’m thrilled to partner with them on Digits as they reimagine financial software for startups.
– Jessica Verrilli, General Partner
This round gives us the stability we need to become the partner that businesses can trust: at current burn, Digits’ runway now extends over 10 years.
$8+ Billion and counting…
When we announced our Series A in November, Digits’ production systems saw over $1.5 Billion in transaction value across our early customer base. Today, that number is already over $8 Billion, and growing daily.
Digits for Expenses is available today, for free, for US-based startups and small businesses. We plan to add support for international markets later this year.
Digits takes just a few clicks to set up, and sits on top of your existing ledger and your existing accountant’s work—you change nothing. Sign up here to get started.
(If you’re an investor, startup incubator, or accounting firm and want priority access for your portfolio companies, contact us at firstname.lastname@example.org.)
As founders, we would like to thank the entire Digits team for their tireless work over the past two years to bring us to this point. We still see this as just the beginning of our Digits journey, and we look forward to building powerful tools to help businesses of every shape and size chart their course.
We can’t wait for you to experience Digits, and we can’t wait to hear what you think.
To be sure, working remotely is not for every business, and not for every individual.
If your team needs to huddle together to prototype a physical product, that will be tricky. Or if you work for a biotech startup with special lab equipment, it’s likely infeasible to provision it for every home-office. Or if you’re a big-time extrovert who lives for lunchroom gossip and 5pm team socials, this work-style may not be your cup of tea.
But remarkably, we’re now at the point where most software/digital/service-based businesses, and most knowledge-based employees, are well-suited to going fully-remote, and many will be better off if they do! In fact, remote work has already jumped 159% in the past 12 years, but that’s just the tip of the iceberg.
Wayne and I founded Digits 2 years ago to create the next generation of delightful, powerful business finance software, and we had a choice: how should we structure the business? When we landed on building a fully-remote startup, even some of our closest supporters were skeptical.
Can a fully-distributed team be productive?
Won’t everyone just slack off all day?
How do you brainstorm and make product decisions?
But, but… whiteboarding??
We immediately faced these and countless other questions, but we were convinced that we could not build the scale of business we envisioned in San Francisco, and we had a strong distaste for the known challenges of distributed offices.
With the full benefit of hindsight, we could not be happier with our decision. Working as a fully-remote team has been a joy, and we have relentlessly iterated on the tools and techniques we’ve used to make it so.
Here are just a few of the lessons we’ve learned over the past 2 years building a fully-remote company:
The Key Difference: No HQ
“Remote” isn’t new, and isn’t great. “Fully-remote” is a totally different concept.
Most people’s understanding of remote work has been a scattering of remote employees and a big HQ somewhere. As a result, the remote teams miss out on a lot of the ad hoc conversation/culture that develops in the colocated offices, are frequently left out of some meetings (sorry, who forgot to dial-in?), and inevitably begin to feel like 2nd-class citizens.
Trust me, I’ve been there.
A fully-remote team, by contrast, means no conversations are happening in an office somewhere to miss out on. At Digits, the pulse and culture of the company are pushed online for everyone to partake in: what would normally be hallway conversations now happen in some digital form. Chats are quickly upgraded to voice/video calls when written communication is not enough, and there is no stigma towards, or friction preventing, those who were passively following along in the chat room now asking to join live. Everyone is on a level playing field.
This makes our fully-remote team feel like we aren’t remote at all. We’re all working right next to each other, at adjacent desks, despite being thousands of miles apart. Our remote interactions are a real and meaningful replacement for stopping by a coworker’s desk to chat something out or chasing them down in the hallway.
The moment any part of your team is physically together in an office, all of these critical distinctions begin to break down.
The Buddy System
We aim to run all projects at Digits in micro-teams of 2-3 people. Sometimes 4, but in practice we’ve found they tend to immediately split themselves into pairwise sub-teams.
The major benefit is you always have a buddy. From the moment we kick off the week, you know what your goals are and who you’ll be working with to achieve them, so you can dive right in together.
On the engineering side, this means you always have a designated code-review partner to keep things moving. It varies, but over the last 2 years many of our teams organically started to pair-program (yes, fully remotely) to further align and accelerate their work.
On the business and product sides, it’s the same story. We pair on strategy docs, product wireframes, marketing copy, blog post drafts—you get the idea. And pairing makes the work way more fun! You’re not sitting in your house toiling alone; you’re talking with at least one other colleague constantly, and you’re both working together to achieve a common goal.
It’s easy to think we’re wasting time having at least 2 people tackle every task, but the opposite is true: quality is much higher, each project moves at a faster pace, and in a distributed world, redundancy is critical to increasing the odds that someone knowledgeable about a thing is online and available.
The benefits have been dramatic: nobody is lost in their own world either slacking off or making unilateral (mediocre) decisions. Ideas are iterated and improved rapidly as they are discussed in small groups and then raised for broader awareness. And there is no unwanted overhead or design-by-committee: each micro-team is empowered with their own goals for the week, so they know the direction to head in and are trusted to seek input when and where they need it. If they haven’t, it will become very obvious at our next group check-in 🙂
The Death of 30-Minute Time Slots
To stay in sync and highly aligned as a remote team, we do (short) all-hands meetings every 48 hours. Apart from that, we aim to have no other scheduled meetings.
Of course, interviews do need to be booked, and external customer/partner meetings get scheduled, but the principle holds. Work within one’s micro-team is fluid and synchronous: you’re constantly on and off ad hoc video calls or pairing sessions with your buddy. Work between teams is async: Google Docs to review, large PRs that need broader buy-in, blog drafts that need editing, etc.
It becomes the ultimate Maker’s Schedule, for a single reason:
Digits is fully remote, has no office, and owns no conference rooms. This fact is critical.
Without conference rooms, there are no scarce resources to book, which means meetings don’t need to be planned in advance, which means they don’t need to be 30 minutes or an hour long. Chats can happen when the necessary people are available (which in practice is typically within minutes, because teams are small and independent, and no one else is over-scheduled either). You can add people to meetings when they are needed, and they drop off if the topic moves on. There is none of the awkwardness, friction, or wasted time of traveling between, waiting outside of, entering, or leaving, physical conference rooms. It’s truly remarkable.
This has made our typical internal “meeting” last on the order of 5-7 minutes: you hop on, get your questions answered or share your perspective, and return to execution. With the elimination of Parkinson’s Law, the Maker’s Schedule is complete: meetings usually aren’t long enough to knock you out of flow.
This flips the typical work day on its head: rather than running between recurring meetings and trying to “get stuff done” in between, we’re all free to focus on executing until pinged by someone who needs input. And since those interrupts are typically quite short, they aren’t disruptive—you’re right back at it without forgetting where you left off.
All of this creates an interesting reality: we’ve felt that we’ve reliably had more face-to-face interactions with colleagues than we ever did in physical settings. Everyone is more available. There is less friction to chatting—you don’t need to walk across the building to catch someone at their desk. And because the interactions are shorter, they tend to be much more frequent: I’d much rather chat with my teammates for a few minutes every day than a half hour once a week in a standing meeting!
The Remote Work Toolkit
Productivity tools often border on religious choices for many people and organizations, so I hesitate to make specific recommendations.
Instead, it’s more important to focus on the “jobs to be done”—what use case is each tool meant to solve—and then standardize on an answer for each, so everyone on the team knows where to go.
Over the past 2 years, we’ve converged on the following core roles for our tools:
- Asynchronous & semi-synchronous lightweight chat (no important decisions, no expectation of reading scrollback)
- Decision recording & lightweight knowledge sharing
- Long-form strategy and documentation
- Synchronous all-hands video meetings
- Synchronous 1-1 (or small group) video/audio chats
- Pair programming
- Work-tracking/project management
We have preferred to find specific tools that really excel at each of these distinct use-cases (even if we only use a tiny fraction of the tool’s functionality), rather than consolidating on fewer tools that might be less ideal. In practice, we’ve not had much issue forgetting where something is because the use cases are sufficiently distinct and obvious.
We also constantly evaluate and explore new tools, as the pace of innovation in remote collaboration is currently exploding.
For those who insist on asking, our current toolchain at Digits is:
…but it can and will change as new options come on the scene 🙂
We do not use email for anything other than customer support, and I can count on my hands the total number of internal emails we’ve sent since starting the company.
A New Set of Employee Benefits
As founders, Wayne and I care deeply about showing company personality through employee benefits. Above and beyond health insurance and 401(k)s, we try to embrace the remote-work lifestyle to ensure every team member’s day-to-day is as delightful as possible.
Wherever You Work Best
Not everyone “works remotely” the same way. Some people have a home office they have used for years, but that’s really not typical. Usually it’s a difficult choice—do I take over a spare bedroom or underutilized corner of my house or apartment, or do I find a co-working space nearby? Countless factors go into this decision and it would be unfair for us to motivate one over the other.
So we happily support both!
Every new Digits employee gets the choice: if you’d rather work from home, we give you a $2,000 budget to outfit your home office. A great chair. A new desk. Houseplants! A side-table with a coffee bar. Really anything you feel would make your day-to-day more enjoyable. Conversely, if you’d rather go the co-working route, awesome. Pick your favorite spot and we’ll cover your monthly membership fee for a desk.
The Need for Speed
Regardless of which direction you go, we have another perk up our sleeve. We’ve found that everyone works from home sometimes, for many reasons, even if they prefer a co-working environment. And ISP quality varies widely across the country.
Starting last year, we rolled out a new benefit: Digits pays for every employee’s home Internet service, and we immediately upgrade it to Gigabit (or the equivalent fastest available plan)!
Never have you seen a more butter-smooth video-chat experience from so many different homes across the country…
The New Power of Team Travel
Gone are the days of traveling between distributed offices for alignment sessions or making the monthly trek to HQ.
There’s no office at all, so there is nowhere you have to go just to make an appearance or get face-time! Instead, all internal business travel can be deeply intentional, and optimized for creating amazing shared experiences.
At our current stage, we’ve found the ideal cadence is quarterly. Three times per year, we host our Digits Onsites: we rent a series of Airbnbs (or hotels, as we’ve scaled) and we bring the entire team together somewhere in the country for a jam-packed week of strategic planning, knowledge sharing, in-person work and collaboration, and fun team-building activities and celebratory dinners. During the 4th quarter, we throw our annual holiday party.
And what we’ve learned is that working remotely makes these in-person moments together even more memorable. Time is punctuated by Onsites. Product milestones are marked by the strategy we discussed at each one. Team members’ start dates are recalled by which Onsite they first joined.
There’s always the next one on the calendar to look forward to, and we go to great lengths to make each one special, in some way or another. With no corporate budget dedicated to facilities or inter-office travel (indeed, one month last year our cash burn was 91% payroll-related!), we instead shift those resources to Onsite logistics, with great effect.
The resulting work-life balance has been magical: we are home with our families, with no commute to speak of, day-in and day-out, with flexible schedules and an unlimited vacation policy, and then we’re all together each quarter, exploring someplace new and aligning on our next chapter.
The Future of Work
Without question, we will be forever adapting and refining our approaches to remote work with each new level of scale and degree of business complexity, but we foresee no structural reasons that would cause us to change this basic approach. Indeed, great companies such as GitLab, InVision, Buffer, and many others have pioneered this path with well-recognized success.
At Digits, we’re extremely energized by the growing interest in remote work and the explosion of new tools that are being built to facilitate it, and we’re excited to join the community in sharing and iterating best practices.
These past 2 years have honestly surpassed our wildest expectations: one of our teammates recently expressed that they don’t see themselves returning to an office environment for the rest of their career.
Everyone else nodded in agreement.
If you’re excited to join a passionate, fun-loving, fully-remote team that’s obsessed with building delightful business finance software, we’d love to meet you! See our open positions here.
Six months ago we shared a preview of our next adventure, and of our obsession with building modern, intuitive, intelligent, delightful financial software. And the response has left us overjoyed and even more focused.
We’ve heard from countless business owners, CFOs, and accountants who’ve all lost patience with the status quo and who share our hunger—and our vision—for a better solution. We’ve also been humbled by so many offers of support.
$10.5M Series A
Today, we’re thrilled to announce that we closed $10.5M in Series A funding from Benchmark, alongside 72 incredible angel investors.
We’re also excited to share that Peter Fenton has brought his wealth of board experience from Airtable, Twitter, New Relic, Yelp, and many others to Digits.
$1.5B+ And Counting
We’d like to thank our early customers and partners that have all generously shared their time, knowledge, ideas, and feedback with us over the past year as we’ve built out our core platform technologies. Digits’ production systems now see over $1.5 Billion in transactions across our customer base, and that figure grows daily.
Digits is invite-only. Apply for access.
As builders, there is nothing more gratifying than crafting a product that is used by millions. That saves them countless hours of effort. That turns something complex and frustrating into something accessible, intuitive, even delightful.
As founders, there is nothing more fulfilling than assembling a team of brilliant, passionate, customer-obsessed, kind people who you love working with every day. Who you trust deeply. Who you care for as friends and family.
When we set off to create Crashlytics in 2011, we were struck both by the potential of the nascent mobile ecosystem and by the frustration of actually building for it. How could it be so hard to make an app that didn’t crash? How could the bugs be so tricky to track down and fix, once the app did? We felt there was an opportunity to apply consumer-grade design and engineering to an obscure developer issue; an opportunity to bring enterprise-grade tooling to everyone on earth who aspired to write an app.
We got lucky.
We found a dream-team of like-minded builders who we owe everything to.
And we discovered that we had struck a chord with mobile developers around the world, from established tech companies in Silicon Valley to passionate indie devs in homes, in coffee houses, and in garages in almost every country on Earth.
And the scale left us humbled—today, Crashlytics processes trillions of events, from billions of mobile devices, across millions of apps. Every single month. Today, Crashlytics runs on substantially every active smartphone on Earth.
But that journey wasn’t all smooth.
Building a business is an endless rollercoaster of emotions, of challenges, of long explorations, of setbacks, of celebrations. And we unwittingly found our next project along the way.
The Next Puzzle
As builders, there is nothing more exciting than cracking the next engineering puzzle; than perfecting the next design; than delivering the next capability to customers.
And there is nothing more mind-numbing than the paperwork, and spreadsheets, and financial reports, and inscrutable transaction records that are all required to actually operate the business.
Globally, most entrepreneurs today have no formal training in business finance. We certainly didn’t. Today, you start a company to solve a real problem for real people, or to offer a service you’re skilled at, or to provide a living for you and your family. You don’t start a company because you want to operate a business—but you have to anyway.
Software has unlocked vast capabilities in some areas—you no longer must be a dedicated filmmaker to create a movie, or a professional travel agent to book a plane ticket—but it has stumbled in others. You still must be a trained accountant to understand your company’s financials, and even then they require tedious, manual work to keep updated.
We’ve become obsessed with solving this, but in the right way. Not with bots that replace human accountants, but with software experiences that pair design and machine learning to democratize financial savvy. That empower people of all backgrounds and skillsets to visualize, understand, and manage their businesses, and elevate their interactions with their accountants, investors, and advisors.
We feel there is an opportunity to apply consumer-grade design and engineering to the arcane world of business accounting; an opportunity to bring enterprise-grade tooling to everyone on Earth who aspires to own or operate a business.
From Digits, With Love
We’ve been lucky to reassemble a bit of the core team we loved so much to go build this together, along with some incredible new additions.
From the team that brought you Crashlytics, we hope you’ll wish us luck on this Digits adventure.
We’ll have more to share soon,