Dan Harrison


Compile Times and Code Graphs

Cross-posted on the Materialize Blog.

At Materialize, Rust compile times are a frequent complaint. On one hand, I’m forever anchored by the Scala compile times from my days at Foursquare; a clean build without cache hits took over an hour. On the other, Go’s compile times at Cockroach Labs were great. Rust is in between, but much closer to Go than to Scala.

So far, I’ve mostly insulated myself from this here by carving out an isolated corner where unit tests catch almost all the bugs, so iteration is fast. But recently, I’ve been pitching in on some cross-cutting projects, felt the same pain as everyone else, and was motivated to improve things a bit. Here’s how I did it.

First, a note: there are lots of other ways to improve compile times [1], but today we’re going to talk about dependency graphs in code.

In general, the following will be talking about the smallest compilation unit that d...
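A minimal sketch of why the shape of the dependency graph matters, under simplifying assumptions. Rust compiles a crate at a time, so to a first approximation, editing one crate means rebuilding it plus every crate that transitively depends on it. The helper `rebuild_set` and the crate names are hypothetical, and real Cargo incremental builds have more nuance than this model captures.

```python
from collections import deque


def rebuild_set(deps, changed):
    """Return every crate that must be rebuilt when `changed` is edited.

    `deps` maps each crate to the crates it depends on. This simplified
    model assumes an edit forces a rebuild of the crate itself and of
    everything that transitively depends on it.
    """
    # Invert the edges: crate -> crates that depend directly on it.
    rdeps = {crate: set() for crate in deps}
    for crate, ds in deps.items():
        for d in ds:
            rdeps.setdefault(d, set()).add(crate)

    # Breadth-first walk over the reverse edges, starting at the edited crate.
    seen = {changed}
    queue = deque([changed])
    while queue:
        crate = queue.popleft()
        for dependent in rdeps.get(crate, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical workspace with a widely shared crate ("repr") near the bottom.
deps = {
    "util": [],
    "repr": ["util"],
    "sql": ["repr"],
    "storage": ["repr"],
    "server": ["sql", "storage"],
}
```

With this toy graph, editing `repr` touches four of the five crates, while editing `sql` touches only `sql` and `server`. The further down the graph frequently edited code sits, the larger that set gets.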



Freebase Meets Materialize 3: First Impressions

Previous posts talked about what I’m hoping to do and some background on the Freebase data. Today, we (finally) take Materialize out for a spin.

  • Part 1: Introduction
  • Part 2: The Data
  • Part 3: First Impressions (you’re here)

First, a quick note: one of my motivations for doing this is to get a feel for Materialize as a user, so I’m going to take my developer hat off and put my user hat on. I’ve only been here a couple of weeks, and the things I’ve worked on so far have had more to do with internals than UX, so I’m hoping this will mostly work.

Spoilers from the future: it turns out to have worked pretty well! The following are all my real, unabridged first interactions with Materialize’s docs and Materialize itself. I end up finding some papercuts, as well as some great touch-points where we could have offered more conceptual help to someone transitioning from traditional databases to streaming...



Freebase Meets Materialize 2: The Data

In the last post, I introduced the idea of using Materialize to implement fast reads of highly normalized Freebase data for an API endpoint. Today, we start by downloading the data and doing a bit of preprocessing.

  • Part 1: Introduction
  • Part 2: The Data (you’re here)
  • Part 3: First Impressions

1.9 Billion Triples

The final public copy of the Freebase data can be downloaded at https://developers.google.com/freebase/. It’s a 22 GB gzipped file (250 GB uncompressed) of N-Triples, a text-based data format with a spec and everything. Each line is a <subject, predicate, object> triple, and according to this page, there are 1.9 billion of them.
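To get a feel for the format, here is a minimal Python sketch that streams triples out of the gzipped dump without decompressing it to disk. It assumes one well-formed triple per line and deliberately skips literal unescaping and most of the N-Triples spec; `parse_triple` and `iter_triples` are hypothetical helpers, and the IRIs in any example input are made up.

```python
import gzip
import re
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Simplified pattern (assumption): subject is an IRI or blank node, predicate
# is an IRI, and the object is everything before the terminating "." with no
# validation of literal syntax and no trailing comments.
TRIPLE_RE = re.compile(r"^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$")


def parse_triple(line: str) -> Optional[Triple]:
    """Split one N-Triples line into (subject, predicate, object).

    Returns None for blank lines and comment lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = TRIPLE_RE.match(line)
    if m is None:
        raise ValueError(f"not a triple: {line!r}")
    return m.group(1), m.group(2), m.group(3)


def iter_triples(path: str) -> Iterator[Triple]:
    """Lazily yield triples from a gzipped N-Triples file, one line at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            triple = parse_triple(line)
            if triple is not None:
                yield triple
```

Counting triples is then `sum(1 for _ in iter_triples(path))`, which keeps memory flat but, at 1.9 billion lines, takes a while in pure Python.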

In the interest of fast iteration, I’d like to start with something that comfortably fits in memory. Before we can trim down the data, we have to look at how it’s structured.

Structure of Freebase Data

This is all better explained by the since-removed...



Freebase Meets Materialize 1: Introduction

I recently started working at Materialize. Friday here is called “Skunkworks Friday” and is reserved for personal/professional development, moonshot projects, and other things that don’t get priority as part of the normal product+engineering planning cycle. I’ve decided to use my first few to prototype using Materialize as a generalized replacement for some hand-rolled infrastructure microservices that we had at a previous company.

  • Part 1: Introduction (you’re here)
  • Part 2: The Data
  • Part 3: First Impressions

Background

For several years, I worked at Foursquare, back when it was mostly a consumer tech company. I was on the monetization team, but most people worked on the user-facing app and website. Foursquare, like most apps at the time, kept data in a database but encapsulated it in a REST API. This API is what the mobile apps and the website talked to.

As was (and is) best...



Simplenote

“What note-taking app do you use? Do you like it? I currently use Evernote and kind of hate it.” — co-worker on our internal Slack

“Twitter productivity gurus and tinkerers. What is the best cross-platform, light-weight note-taking app these days?” — Noah Weiss


Allow me to introduce you to my favorite piece of software, Simplenote. As the name suggests, it’s for notes and it’s intentionally simple. The notes are plain text, shareable, and they sync instantly and seamlessly. This means that I never need to think about where I typed a note; it’s available and editable on my computer, my phone, my partner’s phone, and the web. There are a very small number of features built on top of this, but they’re carefully chosen. Omitted features, like inline images, may sound limiting, but there’s a reason people always seem to be looking for an Evernote replacement.

I’ve used Simplenote for nearly...



Easy Bread

My take on making Jim Lahey’s No-Knead Bread even simpler

The final product

I love making bread. It does, however, lead to more bread than I can eat, which leads to me giving away loaves of bread at every opportunity. Occasionally this leads to someone asking me how I make bread.

These days I follow the recipes in Ken Forkish’s book Flour Water Salt Yeast as closely as possible. It’s an excellent book and I’ve had much better results with his recipes than any other source I’ve tried.

For about a year when I was getting started, I used Jim Lahey’s “No-Knead Bread” recipe. It strikes the perfect balance of easy, beginner-friendly, and tasty. There is a delightful tradeoff in bread between fast and easy; you can make good bread in 5 hours with a lot of work, or you can make great bread in 18-24 hours with almost no work. Mr. Lahey’s recipe swings all the way toward easy, eliminating the kneading entirely...



Implementing Backup

Originally published at www.cockroachlabs.com on August 9, 2017.

Almost all widely used database systems include the ability to back up and restore a snapshot of their data. The replicated nature of CockroachDB’s distributed architecture means that the cluster survives the loss of disks or nodes, and yet many users still want to make regular backups. This led us to develop distributed backup and restore, the first feature available in our CockroachDB Enterprise offering.

When we set out to work on this feature, the first thing we did was figure out why customers wanted it. The reasons we discovered included a general sense of security, “Oops I dropped a table”, finding a bug in new code only when it’s deployed, legally required data archiving, and the “extract” phase of an ETL pipeline. So as it turns out, even in a system that was built to never lose your data, backup is still a...



Implementing Column Families in CockroachDB

Originally published at www.cockroachlabs.com on September 29, 2016.

CockroachDB is a scalable SQL database built on top of a transactional key-value store. We don’t (yet) expose the kv layer, but it’s general-purpose enough that we’ve used it to implement SQL without any special trickery.

The particulars of how we represent data in a SQL table, as well as the table metadata, are internally called the “format version”. Our first format version was deliberately simple, causing some performance inefficiencies. We recently improved performance with a technique called column families, which packs multiple columns into one kv entry.

Once implemented, column families produced dramatic improvements in our benchmarks. A table with more columns benefits more from this optimization, so we added a benchmark of INSERTs, UPDATEs, and DELETEs against a table with 20 INT columns and it ran 5 times faster.
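A toy model of the difference, written as a minimal sketch: the key strings below are made up for illustration (CockroachDB’s real encoding is binary and carries details omitted here), the one-entry-per-column layout is assumed as the simpler baseline, and the benchmark row is assumed to be 20 non-key INT columns plus a separate `id` primary key. The point is only how many kv entries a single row touches.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]
KV = Tuple[str, str]


def encode_per_column(table: str, pk: str, row: Row) -> List[KV]:
    """Hypothetical baseline: one kv entry per non-primary-key column."""
    return [
        (f"/{table}/{row[pk]}/{col}", str(val))
        for col, val in row.items()
        if col != pk
    ]


def encode_families(table: str, pk: str, row: Row,
                    families: Sequence[Sequence[str]]) -> List[KV]:
    """Hypothetical family layout: one kv entry per column family,
    with all of the family's columns packed into the value."""
    kvs = []
    for fam_id, cols in enumerate(families):
        packed = ",".join(f"{col}={row[col]}" for col in cols)
        kvs.append((f"/{table}/{row[pk]}/f{fam_id}", packed))
    return kvs


# A row shaped like the benchmark table (assumed): `id` plus 20 INT columns.
row = {"id": 1, **{f"c{i}": i for i in range(20)}}
cols = [f"c{i}" for i in range(20)]
```

Under this model, writing the row touches 20 kv entries with one entry per column and a single entry when all 20 columns share one family. How much of the measured 5x speedup that accounts for is not something the toy model can tell you.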

...
