A C++ Library for Oblivious Data Structures

Motivation

Cloud computing adoption continues to increase, but consumers have little reason to trust the cloud.

Obviously, there is no motivation for cloud providers to offer consumers protection from the providers themselves: even if a provider is uninterested in client data, the additional protections would result in huge slowdowns and cost increases to their services, making them less attractive to consumers.

Fortunately, academia has taken an interest in technologies to secure the public cloud: secure computing on untrusted hardware is a thriving branch of security research. Emerging technologies such as TEEs (trusted execution environments), FHE (fully homomorphic encryption), and MPC (multi-party computation) enable users to write code which can be executed on an untrusted server to produce a result that is guaranteed to be secret and/or correct. While these technologies come at a high cost (TEEs) or high computational overhead (FHE, MPC), they have shown promise for smaller-scale computations, and for certain specialized workloads (e.g., model training) after task-specific optimization.

ORAM (oblivious random access memory) is a system-level technique which aims to hide memory access patterns and data from an observer able to snoop on the memory device itself. ORAM can be used as a standalone protection or in conjunction with other technologies that protect the computation occurring on the accessed data. Unfortunately, state-of-the-art ORAM schemes introduce high overhead: they typically require a large number of standard memory reads and writes per oblivious memory operation, an overhead that is unacceptable for most developers.

This overhead can be greatly reduced in specific scenarios, but real-world implementations of such efficient, special-purpose ORAMs are not publicly or easily available. For this project, we aim to address this gap by implementing multiple "oblivious data structures".

Background

Secure Program Execution

When considering how to define secure program execution, there are many facets to consider: compute, memory, and network devices can each leak different, potentially sensitive information. Since it is difficult to consider all facets of secure execution at once, researchers segment security definitions, treating each specific way in which data could leak as an individual security guarantee.

In order for the execution of a program to be \textit{memory-trace oblivious}, an adversary observing its memory accesses must not be able to discern any information about sensitive program inputs or outputs (which may or may not include the program itself). This matters because, through memory access patterns, an adversary may use techniques such as frequency analysis, co-occurrence analysis, or read-write distinguishability to infer information about program inputs from data-dependent control flow.

ORAM

ORAM is a system- and/or compiler-level technique for translating programs into a memory-trace oblivious form. Almost all ORAM designs share the same core ideas for hiding data. When blocks are accessed, they are shuffled and remapped, so repeated accesses to the same data point appear to target a sequence of different, random memory locations. Data is also "over-fetched", meaning not all retrieved blocks contain useful data; this prevents an adversary from identifying which of the retrieved blocks is actually of use to the client.

ORAM is an active field of research, with many published variants, each optimized for slightly different use cases and implemented in different ways. Some popular variants include path ORAM, circuit ORAM, and ring ORAM.

Our project uses a non-recursive path ORAM\cite{stefanov2018path} as the basis of our implementation. Figure \ref{fig:path-oram} shows an example of a path ORAM data structure with 4 levels (\(L = 4\)). The structure has \(2^L\) unique "paths" from root to leaf, with each path being identifiable by the index of the leaf node at which it terminates. Blocks of memory are placed into buckets located at nodes along the path their address is currently mapped to, with mappings stored either in a single lookup table, or a smaller, recursive set of path ORAMs (the size of the mapping table can be significant as the number of blocks becomes large). Non-leaf nodes may contain blocks from any path which intersects them.

When a block is read or written, \textit{all} blocks on the associated path are read from the path ORAM tree into the stash (a client-side cache for ORAM blocks). The block data is then decrypted, and the stored address tag is compared with the target address. The found block is then read/written as normal, and its address is remapped to a new, random leaf node. Once the operation is complete, blocks from the stash are written back wherever there is available bucket space on the path, starting from the leaf node and ending with the root bucket. The specific mapping and ordering of blocks will occasionally result in the stash not being fully empty after this step, and any remaining blocks persist until the next write-back able to clear them.
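
The following C++ sketch illustrates this access procedure for a standard non-recursive path ORAM with a full client-side position map. The helper functions \code{read\_path} and \code{write\_path}, the parameter \code{NUM\_LEAVES}, and the block layout are simplified assumptions for illustration, not our library's API.

\begin{verbatim}
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

struct Block {
    uint64_t addr = 0;             // logical address tag (0 == empty)
    uint64_t leaf = 0;             // leaf index this block is mapped to
    std::vector<uint8_t> data;     // decrypted payload
};

// Assumed parameters and server helpers (stubbed for illustration).
constexpr uint64_t NUM_LEAVES = 16;
std::vector<Block> read_path(uint64_t leaf) { return {}; }       // fetch + decrypt path
void write_path(uint64_t leaf, std::vector<Block>& stash) {}     // evict + pad + encrypt

// One oblivious access: read if new_data is empty, write otherwise.
std::vector<uint8_t> access(uint64_t addr,
                            std::optional<std::vector<uint8_t>> new_data,
                            std::vector<uint64_t>& position_map,
                            std::vector<Block>& stash) {
    uint64_t old_leaf = position_map[addr];

    // 1. Read every block on the path to old_leaf into the stash.
    for (Block& b : read_path(old_leaf)) stash.push_back(b);

    // 2. Find the target block by its address tag and remap it to a new leaf.
    static std::mt19937_64 rng{std::random_device{}()};
    position_map[addr] = rng() % NUM_LEAVES;
    std::vector<uint8_t> result;
    for (Block& b : stash) {
        if (b.addr != addr) continue;
        result = b.data;
        if (new_data) b.data = *new_data;      // write case
        b.leaf = position_map[addr];
    }

    // 3. Write back stash blocks that fit on the path, leaf bucket first,
    //    padding remaining bucket slots with encrypted empty blocks.
    write_path(old_leaf, stash);
    return result;
}
\end{verbatim}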

Despite being well studied, all existing ORAMs share a few key problems. The first is the large mapping store. One motivation for the use of ORAM is a large amount of data that cannot fit in a small, trusted client's storage, so it follows that we do not have a large amount of storage to waste on a mapping table. The current technique for resolving this client-side storage overhead is a recursive path ORAM, which stores a small mapping table on the client used to look up increasingly granular mapping tables from the ORAM server. This recursive process introduces extra overhead in terms of communication bandwidth and computation. Unfortunately, current research has been unable to alleviate this problem in the general case. However, most programs do not require massive amounts of random access memory, because large data is typically organized by data structures designed for efficient access. Prior works\cite{wang2014oblivious} have shown that in this special case, the mapping lookup overhead can be almost \textit{entirely} eliminated on the client side.

Oblivious Data Structures

A small number of prior works specifically target oblivious data structures, which are data structures built on top of secure technologies such as MPC or ORAM. Our work is based on the pointer-based ORAM oblivious data structures described by \cite{wang2014oblivious} due to the simplicity and extensibility of their design. The core idea of the work is that data structures restrict which memory locations can be accessed at any point to a small number of addresses, and therefore only a small number of mappings must be known at any point. The authors suggest adding metadata to each ORAM block indicating the addresses reachable from it and their leaf mappings. The ORAM client itself also stores a small number of pointers (e.g., for an oblivious stack, the top pointer and its mapping). The paper also proposes a locality-based strategy for implementing efficient oblivious graph algorithms, but we do not implement this in our project.

How it Works

Client-Server Model

Our design is based on a client-server ORAM model, where communication occurs using serialized command structures over a TCP stream. The client initiates path reads or path write-backs, and then sends or receives a fixed number of encrypted ORAM blocks determined by the path ORAM parameters. For example, a tree with \(L = 4\) levels and \(Z = 8\) blocks per node bucket would read/write \(4 \cdot 8 = 32\) blocks. The buckets are initialized to contain encrypted "empty" blocks, which are replaced during execution with real blocks as the data structure is populated.
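
As a concrete illustration, the per-operation transfer size is fully determined by the tree parameters. The structure and field names below are a hypothetical sketch, not our actual configuration interface.

\begin{verbatim}
#include <cstddef>

// Hypothetical parameter bundle for illustration only.
struct ORAMParams {
    std::size_t levels;       // L: buckets on a root-to-leaf path
    std::size_t bucket_size;  // Z: blocks per bucket

    std::size_t blocks_per_path() const { return levels * bucket_size; }
};

// e.g. ORAMParams{4, 8}.blocks_per_path() == 32 blocks per read/write-back
\end{verbatim}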

Data Structures

Oblivious Stack

Figure \ref{fig:stack} shows an oblivious stack data structure. As in traditional path ORAM, each block has an associated address mapped to a certain leaf index. Unlike traditional path ORAM, the ORAM client only has access to a single one of these mappings: it stores the top pointer and the associated leaf index.

When pushing to the stack, the new top block is assigned an address based on an internal counter variable (i.e., the \(i^{th}\) block has address \(i\)). This counter can also be used to return the current size of the stack. The new block's address is randomly mapped to a leaf index. Alongside the block data and address, the block also stores a pointer to the previous stack top (its address and leaf index) as metadata. This pointer allows backtracking through blocks as they are removed from the stack.

When popping from the stack, the client-side metadata is used to look up the top block from the server as in standard non-recursive path ORAM. The client metadata is then replaced with the fetched block's metadata, which is a pointer to the block just below it on the stack. This new top block can then be popped in an identical fashion.
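
A simplified sketch of this push/pop logic is shown below. The \code{Pointer} type and the \code{oram\_read}/\code{oram\_write}/\code{random\_leaf} helpers are stand-ins for our \code{ORAMClient} operations (path read into the stash, remap, write-back), not the library's actual interface.

\begin{verbatim}
#include <cstdint>
#include <utility>

// Client-side reference to one ORAM block: its logical address and the
// leaf it is currently mapped to.
struct Pointer {
    uint64_t addr = 0;
    uint64_t leaf = 0;
};

template <typename T>
class ObliviousStackSketch {
public:
    void push(const T& value) {
        Pointer blk{++count_, random_leaf()};        // new address, fresh leaf
        oram_write(blk, value, /*prev_top=*/top_);   // store old top as metadata
        top_ = blk;
    }

    T pop() {
        auto [value, prev_top] = oram_read(top_);    // path read + remap
        top_ = prev_top;                             // metadata becomes new top
        --count_;
        return value;
    }

    uint64_t size() const { return count_; }

private:
    Pointer top_{};
    uint64_t count_ = 0;

    // Placeholders for the real client/server interactions.
    uint64_t random_leaf() { return 0; }
    void oram_write(Pointer, const T&, Pointer) {}
    std::pair<T, Pointer> oram_read(Pointer) { return {T{}, Pointer{}}; }
};
\end{verbatim}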

Oblivious Queue

Figure \ref{fig:queue} shows an oblivious queue data structure. The queue client stores two ORAM pointers: one to the queue head and the other to the queue tail. These can be used to push and pop values to/from the queue similarly to the stack client. However, the queue requires a small modification to the push methodology.

When removing a block from the queue, we would like to point the head to the ORAM block that was pushed \textit{after} the block just removed. This means the metadata we include when writing a block must point to another block which \textit{does not yet exist}. We solve this problem by pre-generating an address and leaf index for the next block, which we store on the client side. Then, when the following push request is received from the user, the client is able to create the new block at the existing ORAM pointer, which will then be retrievable after popping the previous block. This "ghost block" technique is shown in figure \ref{fig:queue}.
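
The sketch below illustrates this ghost-block bookkeeping on the client. As before, \code{Pointer}, \code{oram\_read}, \code{oram\_write}, and \code{random\_leaf} are assumed helper names standing in for the real \code{ORAMClient} operations.

\begin{verbatim}
#include <cstdint>
#include <utility>

struct Pointer { uint64_t addr = 0; uint64_t leaf = 0; };

template <typename T>
class ObliviousQueueSketch {
public:
    ObliviousQueueSketch() { next_tail_ = {next_addr(), random_leaf()}; }

    void push(const T& value) {
        Pointer here = next_tail_;                   // pre-generated "ghost" pointer
        next_tail_ = {next_addr(), random_leaf()};   // ghost block for the *next* push
        // The stored block points forward to a block that does not exist yet;
        // a later pop uses this pointer to advance the head.
        oram_write(here, value, /*next=*/next_tail_);
        if (size_ == 0) head_ = here;
        tail_ = here;
        ++size_;
    }

    T pop() {
        auto [value, next] = oram_read(head_);       // path read + remap
        head_ = next;                                // advance head to the next block
        --size_;
        return value;
    }

    uint64_t size() const { return size_; }

private:
    Pointer head_{}, tail_{}, next_tail_{};
    uint64_t size_ = 0;
    uint64_t counter_ = 0;

    // Placeholders for the real client/server interactions.
    uint64_t next_addr() { return ++counter_; }
    uint64_t random_leaf() { return 0; }
    void oram_write(Pointer, const T&, Pointer) {}
    std::pair<T, Pointer> oram_read(Pointer) { return {T{}, Pointer{}}; }
};
\end{verbatim}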

Oblivious AVL Tree (Map/Set)

Figure \ref{fig:avl} shows an oblivious AVL tree, which is the basis of our oblivious map and oblivious set data structures. This mirrors the C++ \code{std::map} and \code{std::set}, which are backed by self-balancing binary search trees for guaranteed fast look-ups.

We store the root node's address and leaf index in the AVL tree-based clients. Each ORAM block is a node in the tree and contains, as metadata, pointers to its child blocks. When looking up an item in the map or set, we always start at the root and search downwards, at each step following the appropriate (left vs. right) child pointer found in the current block, based on the comparison function provided.
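
A simplified look-up over this structure is sketched below. The \code{Node} layout and the \code{oram\_read} helper (one path read, decrypt, and remap per visited block) are assumptions for illustration.

\begin{verbatim}
#include <cstdint>
#include <optional>

struct Pointer { uint64_t addr = 0; uint64_t leaf = 0; bool null = true; };

// Decrypted block payload plus the child-pointer metadata stored with it.
template <typename K, typename V>
struct Node {
    K key{};
    V value{};
    Pointer left, right;
};

// Placeholder for the real ORAM fetch (path read, decrypt, remap).
template <typename K, typename V>
Node<K, V> oram_read(Pointer) { return {}; }

template <typename K, typename V>
std::optional<V> find(Pointer root, const K& key) {
    Pointer cur = root;
    while (!cur.null) {
        Node<K, V> n = oram_read<K, V>(cur);     // one block access per level
        if (key == n.key) return n.value;
        cur = (key < n.key) ? n.left : n.right;  // follow stored child pointer
    }
    return std::nullopt;                         // key not present
}
\end{verbatim}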

When inserting an item, we traverse the tree similarly and then perform any re-balancing operations as we would in a traditional AVL tree. We introduce optimizations during our implementation phase which help reduce the number of client-server interactions necessary for these operations, since they require multiple block accesses with high temporal locality.

Implementation

We implement our library in \(\approx1500\) lines of C++.

The implementation contains two base classes: \code{ORAMClient} and \code{ORAMServer}. These classes communicate via a bidirectional TCP stream. Communication uses a serialized \code{Cmd} structure containing an opcode, an (optional) encrypted ORAM block, and an (optional) leaf index. The \code{Cmd} has one of three opcodes:

\begin{itemize}
\item \code{GET\_BLOCKS}: Used by the client to initiate a read of blocks from a certain path of the ORAM. When sending this opcode the \code{Cmd} must include a valid \code{leaf\_idx}.
\item \code{DUMP\_STASH}: Used by the client to initiate a write-back of blocks to a certain path of the ORAM. When sending this opcode the \code{Cmd} must include a valid \code{leaf\_idx}.
\item \code{BLOCKS}: Used once another command has been initiated to send a single ORAM block between the client and server.
\end{itemize}
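
The following sketch shows one possible in-memory layout matching this description; the field widths, block size, and serialization approach are assumptions rather than our exact wire format.

\begin{verbatim}
#include <array>
#include <cstdint>

enum class Opcode : uint8_t { GET_BLOCKS, DUMP_STASH, BLOCKS };

constexpr std::size_t kBlockBytes = 4096;   // assumed encrypted block size

struct Cmd {
    Opcode op;
    uint64_t leaf_idx;                       // meaningful for GET_BLOCKS / DUMP_STASH
    std::array<uint8_t, kBlockBytes> block;  // meaningful for BLOCKS
};
\end{verbatim}

Because such a struct is fixed-size, it can be serialized directly onto the TCP stream; the fields left unused by each opcode correspond to the null data that the future-work note below proposes eliminating.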

Within our client code, a \code{GET\_BLOCKS} sequence is always followed by a \code{DUMP\_STASH} operation, which keeps the number of blocks stored on a path constant even though a fixed number of blocks is transmitted during each operation. Encrypted empty blocks are used to pad the path buckets when no "real" blocks with a suitable leaf index exist in the stash.

Future work could reduce communication overhead by defining a separate \code{Cmd} layout per opcode, containing only the fields that opcode requires, thereby reducing the amount of null data transferred.

Each of the oblivious data structures described in this paper is implemented as a class which extends \code{ORAMClient}. They expose public functions comparable to those of the equivalent C++ standard data structures. All TCP communication and intermediate read/write optimizations occur in the background, hidden from the user.

The encryption and decryption of block data uses an AES implementation from the OpenSSL\cite{openssl} cryptography library. We encrypt the entire ORAM block structure (including metadata) and send the fully encrypted blob to the server.

Optimizations

We introduce two optimizations during our implementation phase. The first is delayed writes. For stack/queue push operations and AVL tree insertion, we know that the block at the address we would like to write to does not yet exist on the server. Therefore, there is no need for us to communicate with the server: instead, we create a new block locally and place it into the stash. It is eventually propagated to the server when an operation necessitating a write-back occurs. To ensure this technique does not overfill the stash, we introduce a threshold value. Once there are \code{THRESHOLD} blocks in the stash, the client triggers a \code{GET\_BLOCKS/DUMP\_STASH} command pairing to a randomly chosen leaf index, effectively draining the stash to prevent overflow.
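
The sketch below summarizes this delayed-write path. The names used here (\code{stash\_}, \code{THRESHOLD}, \code{random\_leaf}, and the flush helpers) are illustrative assumptions rather than our actual member names.

\begin{verbatim}
#include <cstddef>
#include <cstdint>
#include <vector>

struct Block {
    uint64_t addr = 0;
    uint64_t leaf = 0;
    std::vector<uint8_t> data;
    bool in_use = false;
};

class DelayedWriteSketch {
public:
    // Push-style operations call this instead of a server round trip.
    void create_block_locally(uint64_t addr, uint64_t leaf,
                              std::vector<uint8_t> data) {
        stash_.push_back({addr, leaf, std::move(data)});
        if (stash_.size() >= THRESHOLD) flush();
    }

private:
    static constexpr std::size_t THRESHOLD = 64;   // assumed bound
    std::vector<Block> stash_;

    void flush() {
        uint64_t leaf = random_leaf();
        read_path_into_stash(leaf);   // GET_BLOCKS
        write_back_path(leaf);        // DUMP_STASH: evicts eligible stash blocks
    }

    // Placeholders for the real client/server interactions.
    uint64_t random_leaf() { return 0; }
    void read_path_into_stash(uint64_t) {}
    void write_back_path(uint64_t) {}
};
\end{verbatim}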

The second optimization is the introduction of an \code{in\_use} flag for blocks in the stash. When an ORAM block is retrieved for the first time from the server or accessed from the stash, the flag is set to indicate that it should not be evicted from the stash during a write-back. The function using the block is then responsible for un-setting the flag once the block is no longer needed. The use of this flag is entirely transparent to the user and does not introduce any programming overhead.

Security Analysis

Our implementation is based on prior works\cite{wang2014oblivious, stefanov2018path} which provide an in-depth security analysis of the pointer-based method. We briefly discuss the security of our practical implementation decisions here.

Block data and metadata are encrypted using the AES symmetric cipher, which is considered secure. We include a nonce in each block that is incremented before encryption to prevent the server from knowing which blocks on a path were or were not read/written since their last tenancy on the server. We also initialize the \texttt{data} field of empty blocks to random bytes to prevent them from being identifiable by the server when encrypted.
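
As an illustration, a minimal block-encryption helper using OpenSSL's EVP interface might look like the sketch below. The cipher mode (AES-256-CTR), the key/IV handling, and the function name are assumptions rather than our exact implementation; in our scheme the nonce described above sits inside the serialized plaintext and is incremented before each re-encryption so the ciphertext changes on every write-back.

\begin{verbatim}
#include <openssl/evp.h>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Encrypt one serialized ORAM block (payload + metadata + nonce field).
std::vector<uint8_t> encrypt_block(const std::vector<uint8_t>& plaintext,
                                   const uint8_t key[32],
                                   const uint8_t iv[16]) {
    std::vector<uint8_t> out(plaintext.size() + EVP_MAX_BLOCK_LENGTH);
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    if (!ctx) throw std::runtime_error("EVP_CIPHER_CTX_new failed");

    int len = 0, total = 0;
    if (EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), nullptr, key, iv) != 1 ||
        EVP_EncryptUpdate(ctx, out.data(), &len, plaintext.data(),
                          static_cast<int>(plaintext.size())) != 1) {
        EVP_CIPHER_CTX_free(ctx);
        throw std::runtime_error("encryption failed");
    }
    total = len;
    if (EVP_EncryptFinal_ex(ctx, out.data() + total, &len) != 1) {
        EVP_CIPHER_CTX_free(ctx);
        throw std::runtime_error("finalization failed");
    }
    total += len;
    EVP_CIPHER_CTX_free(ctx);

    out.resize(static_cast<std::size_t>(total));  // CTR: same size as plaintext
    return out;
}
\end{verbatim}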

The server also has minimal knowledge about the usage of the data structure. To begin, we do not explicitly inform the server which of the implemented data structures our client is using. Any access to the server is made through the base \texttt{ORAMClient} class and provides only an opcode, an encrypted block, and a leaf index. The server cannot tell whether an access is a read or a write, since all blocks appear changed when written back due to the nonce.

The delayed write mechanism, in conjunction with random stash flushes, helps hide operations that could potentially reveal the identity of the data structure or its usage (reads vs. writes). Consider the case in which a client continuously pushes to a stack, and then continuously pops from the stack. In this case, without the optimization, the leaf indices seen by the server would appear symmetric (pops revisit, in reverse order, the same paths targeted by the preceding pushes), allowing the server to guess the access pattern. By caching writes on the client and then randomly choosing a path to flush to, we make access patterns appear more random and prevent such information leakage.

There is some information leakage in terms of read/write timing and the overall number of accesses, but this has not been solved by previous ORAM works, and therefore we do not attempt to address it.

Limitations

Due to the short time-frame, our library has many limitations. First, the exposed APIs eliminate the possibility of random access to data, essentially "locking" the information into the data structures. That is, data can only be accessed \textit{if} it is in a position referenceable under correct usage of the data structure. This prevents us from providing some of the utility functions offered by the equivalent C++ containers, and from supporting operations such as peeking at data in the middle of a stack. To solve this, our ORAM client/server interface could be extended to provide recursive lookup functionality when necessary, based on the path ORAM\cite{stefanov2018path} design.

Our implementation is also designed for data structures which store large amounts of data per node. This is because we treat each ORAM block on the path as a single data structure node (i.e., one stack item or AVL tree node), meaning that if the data stored per entry is small, the per-byte read/write costs cannot be amortized well. One possible solution to this is structure packing: a single ORAM block could contain multiple entries. This segmented design is more difficult to implement, but would likely be much more practical for most use cases and would allow much more efficient amortized accesses.

Finally, our use of AES is relatively slow, and AES is likely not the best encryption algorithm for hiding block data. We use it in our implementation because it is widely available in standard cryptography libraries, but to make this implementation practical, it should likely be exchanged for a faster encryption mechanism. We could also likely eliminate the setup cost by using decryption failures to mark initially empty blocks instead of explicitly encrypted null-address blocks.

Beyond addressing the limitations listed above, future work on this project could extend our library to include more data structures.

Author: Noelle Crawford
