July 2, 2019
While a lot of work in theoretical computer science has gone into optimizing the runtime and space usage of data structures, such work very often neglects a very important component of modern computers: the cache. In doing so, very often, data structures are developed that achieve theoretically-good runtimes but are slow in practice due to a large number of cache misses. In 1999, Frigo et al. introduced the notion of a cache-oblivious algorithm: an algorithm that uses the cac...
July 25, 2013
Let $\mathcal{D}$ be a collection of $D$ documents, which are strings over an alphabet of size $\sigma$, of total length $n$. We describe a data structure that uses linear space and and reports $k$ most relevant documents that contain a query pattern $P$, which is a string of length $p$, in time $O(p/\log_\sigma n+k)$, which is optimal in the RAM model in the general case where $\lg D = \Theta(\log n)$, and involves a novel RAM-optimal suffix tree search. Our construction sup...
April 20, 2020
A new array based data structure named black-white array (BWA) is introduced as an effective and efficient alternative to the list or tree based data structures for dynamic data set. It consists of two sub-arrays, one white and one black of half of the size of the white. Both of them are conceptually partitioned into segments of different ranks with the sizes grow in geometric sequence. The layout of BWA allows easy calculation of the meta-data about the segments, which are u...
April 19, 2017
We consider the problem of implementing a space-efficient dynamic trie, with an emphasis on good practical performance. For a trie with $n$ nodes with an alphabet of size $\sigma$, the information-theoretic lower bound is $n \log \sigma + O(n)$ bits. The Bonsai data structure is a compact trie proposed by Darragh et al. (Softw., Pract. Exper. 23(3), 1993, p. 277-291). Its disadvantages include the user having to specify an upper bound $M$ on the trie size in advance (which ca...
September 22, 2023
Augmented B-trees (aB-trees) are a broad class of data structures. The seminal work "succincter" by Patrascu showed that any aB-tree can be stored using only two bits of redundancy, while supporting queries to the tree in time proportional to its depth. It has been a versatile building block for constructing succinct data structures, including rank/select data structures, dictionaries, locally decodable arithmetic coding, storing balanced parenthesis, etc. In this paper, we...
June 4, 2023
A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it returns if $x$ is in the set. Some variants also store values associated to the keys such that given a query $x$, the value associated to $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied for six decades since the introduction of hash tables in 1953. A hash table occ...
July 10, 2015
Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string report the character at that position. In this paper we study the random access problem with the finger ...
January 5, 2022
In this paper, we revisit the question of how the dynamic optimality of search trees should be defined in external memory. A defining characteristic of external-memory data structures is that there is a stark asymmetry between queries and inserts/updates/deletes: by making the former slightly asymptotically slower, one can make the latter significantly asymptotically faster (even allowing for operations with sub-constant amortized I/Os). This asymmetry makes it so that rotati...
September 11, 2022
Subset sum is a very old and fundamental problem in theoretical computer science. In this problem, $n$ items with weights $w_1, w_2, w_3, \ldots, w_n$ are given as input and the goal is to find out if there is a subset of them whose weights sum up to a given value $t$. While the problem is NP-hard in general, when the values are non-negative integer, subset sum can be solved in pseudo-polynomial time $~\widetilde O(n+t)$. In this work, we consider the dynamic variant of sub...
March 20, 2015
In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 198...