Type-level programming in Scala

case study:

Tydal

case study - Tydal

A generic programming journey in Scala from type-classes to formal logic to singleton types to heterogeneous lists

Some time ago, inspired by a tiny book about Shapeless, I decided to start working on Tydal, a type-safe SQL DSL for Scala that looks like this:

object book extends TableSchema["book", (
  "isbn" :=: String,
  "title" :=: String,
  "authors" :=: List[String]
)]

val statement: ReadStatement[("isbn?" ~~> String, "author?" ~~> String), BookTitle, List] =
  Select
    .from(book as "b")
    .take1(_("b", "title") as "bt")
    .where(_("b", "isbn") === "isbn?")
    .orWhere(_("b", "authors") contains "author?")
    .sortBy(_("bt").ascending)
    .compile
    .to[BookTitle]
    .as[List]

val books: IO[List[BookTitle]] = statement
  .run((
    "isbn" ~~> "073521610X",
    "author" ~~> "Carlo"
  ))
  .transact(connectionPool)

A question one might have is: why would you even want something like a SQL DSL?
Why not good old plaintext SQL queries?

The answer is, of course, all about type-safety.
The purpose of Tydal, is to prevent (at compile time) mistakes such as referring to a non-existing column, or comparing two columns of unrelated types, or even decoding the results of a query into an incompatible structure, and more.

In the next chapters, I’ll walk you through some of the key concepts that made this possible and progressively build a minimal database table model also available on Scastie and Github. A Scala 3 version of this code is also available here.

Generic types

Tydal was not my first attempt at writing a SQL DSL.
I had modelled a database table schema before, and I had done so in different languages.
However, I had always approached the problem in a similar way:

class Column(name: String)
class Table(name: String, columns: List[Column])

object book extends Table(
  "book", 
  List(
    new Column("title"), 
    new Column("number_of_pages")
  )
)

If you’re familiar with Scala, the code above should be fairly straightforward.
This model might feel natural and intuitive, but there is a fundamental problem: there’s a lot of static information defined at run-time.
At compile-time, we know very little about a generic Table.
We know that it has a name, but we don’t know what that is.
We know that it has some columns, but we don’t know how many they are, or what their names are.

An alternative model could be:

class Column(name: String)
class Table[Columns](name: String, columns: Columns)

object book extends Table(
  "book",
  (
    new Column("title"),
    new Column("number_of_pages")
  )
)

The differences here, are that Table now takes a generic type parameter (Columns), and we’re representing the columns with a tuple rather than a list.
Suddenly, just by looking at the type, we know a little more about the specific book table, in particular we know exactly how many columns are there:

Table[Tuple2[Column, Column]]

This is an interesting start, but we’re still missing something.
Could we be even more precise with our types?
We can, of course, but we need to introduce a new concept: literal singleton types.

Singleton types

A singleton type is essentially a type inhabited by just one single value.
The literal "book", for instance, is obviously of type String, but it’s also - and more specifically - of type "book".

In Scala, singleton types are widely available since version 2.13, which makes the following code perfectly valid:

val theAnswer: "42" = "42"

With this in mind, we could come up with the following:

class Column[Name]()
class Table[Name, Columns]()

object book extends Table[
  "book",
  (
    Column["title"],
    Column["number_of_pages"]
  )
]

This time, we basically have no runtime specific information in our model.
Everything is perfectly static and well expressed in terms of types (note that the entire book table definition is within square brackets, including the table name, and the name of each field).

Type-classes

The question we haven’t answered yet is, why would such a “type-oriented model” be even desirable.
In fact, to some extent, it seems to have made things worse.
Nothing is stopping me from coding meaningless things such as:

object book extends Table[
  99,
  "this is not a list of columns, just a random string"
]

Also, we have lost all the runtime values.
We don’t have a runtime instance representing table name and columns, and even if we did, their types would be generic (Name and Columns), so we wouldn’t really know what to do with them.

Fortunately, the Scala compiler offers a mechanism to answer all this: type-classes.

trait DbIdentifier[T] {
  def value: String
}

object DbIdentifier {
  implicit object book extends DbIdentifier["book"] {
    override def value: String = "book"
  }

  implicit object title extends DbIdentifier["title"] {
    override def value: String = "title"
  }

   implicit object numberOfPages extends DbIdentifier["number_of_pages"] {
      override def value: String = "number_of_pages"
   }
}

What we have here, is an interface named DbIdentifier with a generic type parameter T.
We also have 3 instances of DbIdentifier defined for 3 different concrete types: "book", "title" and "number_of_pages".

What we can do, is to require an implicit parameter in our model to be automatically derived for the specific instance of the type parameter Name ("book" for instance):

class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name])

object book extends Table[
  "book",
  (
    Column["title"], 
    Column["number_of_pages"]
  )
]

println(book.name.value)  // prints book

This is an important achievement, as we managed to keep our type Name completely generic, and at the same time, we can indirectly use some of its properties through the automatically provided type-class instance.
Also, we’re effectively implementing a type constraint without implying inheritance, as Name is “unbounded”, but in practice it can only be either "book", "title" or "number_of_pages".

The Singleton value: ValueOf

Our first type-class, the DbIndentifier, simply enumerates a list of valid database names.
As it is, the model wouldn’t admit a column named "author", for instance:

val author = new Column["author"]

could not find implicit value for parameter name DbIdentifier["author"]

Could we have something more generic? After all, any string is a valid database name.

Thankfully, Scala provides a magic type-class that binds any singleton type T to its runtime value out of the box:

trait ValueOf[T] {
  def value: T
}

The fun part, is that we can use it to build new instances for our DbIdentifier type-class:

trait DbIdentifier[T] {
  def value: String
}

object DbIdentifier {
  implicit def anyString[A <: String](
    implicit
    singleton: ValueOf[A]
  ): DbIdentifier[A] = new DbIdentifier[A] {
    override def value: String = singleton.value
  }
}

Basically, given a type A (subtype of String), and the singleton value for A, then we can build a DbIdentifier for A.

Ambiguous implicit values

Thanks to singleton types, we can now build a DbIdentifier with its singleton runtime value.
However, not every String is a valid identifier.
Some reserved words might exist, you probably don’t want to call a table “table”, for instance.

How can we restrict (at compile time, of course) the range of possible values? How can we make the compiler failing when it is resolving an implicit value for DbIdentifier["table"]? There are essentially 2 ways to achieve this:

  1. not providing any type class instance for DbIdentifier["table"]
  2. providing multiple type class instances for DbIdentifier["table"]

The second solution above, will basically produce an ambiguous implicit values error, and will act as a sort of compile-time exception.

In our case, all we need to do is to add the following rule:

implicit val blacklistTable: DbIdentifier["table"] = new DbIdentifier["table"] { }

This might sound like a counter-intuitive hack, but basically when the compiler looks for a type class instance for DbIdentifier["table"] it will find more than 1: blacklistTable and anyString (as defined above), and therefore fail, which is what we wanted.

Compile-time error messages could be confusing, but they can be customised:

@annotation.implicitNotFound("${T} is not a valid identifier")
@annotation.implicitAmbiguous("${T} is a reserved word, and not a valid identifier")
trait DbIdentifier[T] {
  implicit def anyString[A <: String](
    implicit
    singleton: ValueOf[A]
  ): DbIdentifier[A] = new DbIdentifier[A] {
    override def value: String = singleton.value
  }

  implicit val blacklistTable: DbIdentifier["table"] = new DbIdentifier["table"] { }
}

implicitly[DbIdentifier["table"]]  // <- compile time exception: table is a reserved word, and not a valid identifier
implicitly[DbIdentifier[123]]      // <- compile time exception: 123 is not a valid identifier

Type-classes and logic programming

The ability of deriving new rules based on some facts (or other rules) is a very powerful concept.
Effectively, this is what logic programming is about, and it’s hardly news that the Scala compiler comes packed with a Prolog-ish interpreter.

As a canonical example, let’s consider the following family tree program:

trait IsFather[Dad, Child] { }

trait IsGrandfather[Grandfather, Grandchild] { }

object IsGrandfather {
  implicit def rule[X, Y, Z](
    implicit 
    grandFather: IsFather[X, Y],
    father: IsFather[Y, Z]
  ): IsGrandfather[X, Z] = new IsGrandfather[X, Z]
}

Here we are saying that, given some facts (X is Y’s father and Y is Z’s father) we can infer a new fact (X is Z’d grandfather).

As we eventually instantiate the type parameters, we invoke an implicit resolution that will either succeed or fail:

// facts
implicit object JimsFather extends IsFather["Joe", "Jim"]
implicit object BobsFather extends IsFather["Jim", "Bob"]

// the following expression compiles:
implicit val truthy: IsGrandfather["Joe", "Bob"] = implicitly

// the following expression does not compile:
implicit val falsy: IsGrandfather["Jim", "Bob"] = implicitly

Heterogeneous lists

So far, our model looks like this:

class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name])

We managed to bind the generic type Name to a DbIdentifier instance, accessible at runtime and automatically provided, but the Columns type is still completely generic, and therefore, not very useful.
One question we would like to answer, for example, is: what are the column names for this table?
Once again, type-classes to the rescue:

trait ColumnList[T] {
  def names: List[String]
}

object ColumnList {
  implicit def oneColumn[Name <: String](
    implicit
    name: ValueOf[Name]
  ): ColumnList[Column[Name]] = new ColumnList[Column[Name]] {
    override def names: List[String] = List(name.value)
  }

  implicit def twoColumns[Name1 <: String, Name2 <: String](
    implicit
    name1: ValueOf[Name],
    name2: ValueOf[Name]
  ): ColumnList[(Column[Name1], Column[Name2])] = new ColumnList[(Column[Name1], Column[Name2])] {
    override def names: List[String] = List(name1.value, name2.value)
  }
}

ColumnList is a type-class with 2 simple rules:

  1. Given a type Name and a singleton value for it, then we can build a ColumnList instance for Column[Name]
  2. Given two types Name1 and Name2 and their singleton values, then we can build a ColumnList instance for the tuple (Column[Name1], Column[Name2])

We can now plug this in our model:

class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns])

object book extends Table[
  "book",
  (
    Column["title"],
    Column["number_of_pages"]
  )
]

println(book.columns.names)  // prints List(title, number_of_pages)

This is a start, but what if we added a third column to the book table?

object book extends Table[
  "book",
  (
    Column["title"],
    Column["number_of_pages"],
    Column["author"]
  )
]

As you would expect, it does not compile, as we didn’t define a rule for a 3-columns tuple:

could not find implicit value for parameter columns ColumnList[(Column["title"], Column["number_of_pages"], Column["whatever"])]

The easiest solution to this problem could be to add a third rule to build a list made of 3 columns, but it’s obviously an unsustainable pattern.
Before we dig any further, there are a few observations to be made.

Firstly, why are we dealing with tuples?
Because tuples are very specific.
The size of a tuple is known at compile-time, and the specific type of each element is also known at compile time, whereas if we were using a simple List, the best we could do would be a List[Column[String with Singleton]] (if Column was covariant in its type parameter).

The issue with tuples though, is that a tuple with 2 elements (Tuple2) is a completely different type than a tuple with 3 elements (Tuple3).
As you probably know, in Scala 2 there are 22 different tuples, so a tuple with 23 elements cannot exist (22 is a completely arbitrary number, by the way).

In Scala 3, tuples have been redesigned to overcome this problem, and they essentially became a recursive data structure similar to a list, with a head and a tail.
In short, they became the shapeless HList:

sealed trait Tuple

// non empty tuple case:
case class `*:`[+H, +T <: Tuple](head: H, tail: T) extends Tuple {
  def `*:`[H2](h2: H2): `*:`[H2, `*:`[H, T]] = new `*:`(h2, this)
}

sealed trait EmptyTuple extends Tuple

case object EmptyTuple extends EmptyTuple {
  def `*:`[H](h2: H): `*:`[H, EmptyTuple] = new `*:`(h2, this)
}

val myHList: String *: String *: EmptyTuple = "a" *: "b" *: EmptyTuple

Recursive type-class

We discussed how new type-class instances can be derived given implicit parameters, and how this enables a logic-programming paradigm in Scala.
We also discussed how limited tuples are in Scala 2, and a better data structure to represent heterogeneous lists.

If we combine these two aspects, we can finally design a generic ColumnList type-class capable of dealing with tuples of an arbitrary size:

trait ColumnList[T] {
  def names: List[String]
}

object ColumnList {
  implicit object EmptyColumnList extends ColumnList[EmptyTuple] {
    def names: List[String] = Nil
  }
  
  implicit def nonEmptyColumnList[ColumnName <: String, Tail <: Tuple](
    implicit
    name: ValueOf[ColumnName],
    tail: ColumnList[Tail]
  ): ColumnList[Column[ColumnName] *: Tail] = new ColumnList[Column[ColumnName] *: Tail] {
    override def names: List[String] = name.value :: tail.names
  }
}

What we have here, is a recursive algorithm with the following two rules:

  1. Given an EmptyTuple then we have a ColumnList[EmptyTuple] (base case)
  2. Given the types ColumnName (subtype of String) and Tail (subtype of Tuple), and given a singleton value for ColumnName and a ColumnList instance for Tail (note the recursion here) then we have a ColumnList for Column[ColumnName] *: Tail

This is what our final model will look like:

class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns])

object book extends Table[
  "book",
  Column["title"] *:
    Column["number_of_pages"] *:
    Column["author"] *:
    EmptyTuple
]

println(book.columns.names)  // prints: List(title, number_of_pages, author)

A compile-time search algorithm

In the previous chapters we managed to design an alternative, type-oriented model for database tables.
The question we haven’t answered yet is: why, and what can we do with it that couldn’t be done otherwise?

As I mentioned on the introduction, one of the mistakes that Tydal tries to prevent is “referring to a non-existing column”.
One way of doing this, is to add a function to the generic Table model, that given a name, it returns a Column instance only if a column exists by that name.

Since we chose our new Tuple ADT to represent a collection of table columns, the first thing we need, is a type-class capable of extracting a single column by its name from a Tuple.
Being our specially designed Tuple a recursive data structure, we’ll need a recursive algorithm.
Moreover, we’ll have to express our algorithm following a logic programming style, by defining what it means to find an element in a tuple given some known facts.
We have 2 happy paths:

  1. We find the column in the head of the tuple (base case)
  2. Given that we already found the column in the tail of the tuple (the recursive derivation) then we can find the column in the whole tuple
trait ColumnFinder[Columns, ColumnName] {
  def get: Column[ColumnName]
}

object ColumnFinder {
  implicit def foundInHead[Name, Tail <: Tuple](
    implicit
    dbIdentifier: DbIdentifier[Name]
  ): ColumnFinder[Column[Name] *: Tail, Name] = new ColumnFinder[Column[Name] *: Tail, Name] {
    def get: Column[Name] = new Column[Name]
  }

  implicit def foundInTail[Name, Head, Tail <: Tuple](
    implicit
    finder: ColumnFinder[Tail, Name]
  ): ColumnFinder[Head *: Tail, Name] = new ColumnFinder[Head *: Tail, Name] {
    def get: Column[Name] = finder.get
  }
}

Here’s what it looks like when we plug it in our model:

class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns]) {
  def find[ColumnName](implicit finder: ColumnFinder[Columns, ColumnName]): Column[ColumnName] = {
    finder.get
  }
}

object book extends Table[
  "book",
  Column["title"] *:
    Column["number_of_pages"] *:
    Column["author"] *:
    EmptyTuple
]

book.find["number_of_pages"]  // compiles, returns a new Column["number_of_pages"] instance
book.find["whatever"]         // does not compile: implicit not found

May the compiler be with you

We’ve chosen to model columns with a single type parameter representing the column name.
Turns out that’s not enough. At the bare minimum, we also need to know what the column type is (String, Int, …).
Without a type, we wouldn’t know how to decode values returned by a SELECT query, for instance.
Also, in SQL, a DATE cannot be compared with a FLOAT, and we’d like to model this kind of constraints.
In short, we need a new type parameter for our Column model:

class Column[Name, T](implicit val name: DbIdentifier[Name])

Unfortunately, we’ll have to revisit some of our type-classes to take a ColumnType parameter into account.
Our ColumnFinder, for instance, will have to change slightly:

trait ColumnFinder[Columns, ColumnName, ColumnType] {
  def get: Column[ColumnName, ColumnType]
}

object ColumnFinder {
  implicit def foundInHead[Name, Type, Tail <: Tuple](
    implicit
    dbIdentifier: DbIdentifier[Name]
  ): ColumnFinder[Column[Name, Type] *: Tail, Name, Type] = new ColumnFinder[Column[Name, Type] *: Tail, Name, Type] {
    def get: Column[Name, Type] = new Column[Name, Type]
  }

  implicit def foundInTail[Name, Type, Head, Tail <: Tuple](
    implicit
    finder: ColumnFinder[Tail, Name, Type]
  ): ColumnFinder[Head *: Tail, Name, Type] = new ColumnFinder[Head *: Tail, Name, Type] {
    def get: Column[Name, Type] = finder.get
  }
}

As a consequence, our find method now takes 2 types parameters:

class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns]) {
  def find[ColumnName, ColumnType](implicit finder: ColumnFinder[Columns, ColumnName, ColumnType]): Column[ColumnName, ColumnType] = {
    finder.get
  }
}

object book extends Table[
  "book",
  Column["title", String] *:
    Column["number_of_pages", Int] *:
    Column["author", String] *:
    EmptyTuple
]

book.find["number_of_pages", Int]     // compiles, returns a new Column["number_of_pages", Int] instance
book.find["whatever", Int]            // does not compile: implicit not found
book.find["number_of_pages", Double]  // does not compile: implicit not found

This is not great, because the intention was to find a column just by its name, not a combination of name and type.
There are different approaches we could take (partially applied functions for instance), but the most elegant solution here is based on a different idea: taking an explicit value, and let the compiler work out what the types are:

class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns]) {
  def find[ColumnName <: Singleton, ColumnType](name: ColumnName)(
    implicit
    finder: ColumnFinder[Columns, ColumnName, ColumnType]
  ): Column[ColumnName, ColumnType] = {
    finder.get
  }
}

object book extends Table[
  "book",
  Column["title", String] *:
    Column["number_of_pages", Int] *:
    Column["author", String] *:
    EmptyTuple
]

book.find("number_of_pages")  // compiles, returns a new Column["number_of_pages", Int] instance
book.find("whatever")         // does not compile: implicit not found

Our find method is accepting a runtime value of type ColumnName, and it’s also requiring ColumnName to be a Singleton type, so the concrete ColumnName type can be inferred.
The ColumnType, will also be inferred by the compiler, since there’s only one possible concrete type for which an instance of ColumnFinder can be derived, which is, indeed, an impressive capability of the Scala compiler itself.

Conclusion

With this article, I intended to share part of my experience with the generic programming style, where, quoting wikipedia - “algorithms are written in terms of types to-be-specified-later” and “instantiated when needed for specific types provided as parameters”.

Of course, this is just the tip of an iceberg, and much more can be said and achieved.
Nevertheless, descending into the compiler universe has been a fascinating journey for a software engineer used to deal with a world made of APIs and databases, mutable and ruled by side effects.