Some time ago, inspired by a tiny book about Shapeless, I decided to start working on Tydal, a type-safe SQL DSL for Scala that looks like this:
object book extends TableSchema["book", (
"isbn" :=: String,
"title" :=: String,
"authors" :=: List[String]
)]
val statement: ReadStatement[("isbn?" ~~> String, "author?" ~~> String), BookTitle, List] =
Select
.from(book as "b")
.take1(_("b", "title") as "bt")
.where(_("b", "isbn") === "isbn?")
.orWhere(_("b", "authors") contains "author?")
.sortBy(_("bt").ascending)
.compile
.to[BookTitle]
.as[List]
val books: IO[List[BookTitle]] = statement
.run((
"isbn" ~~> "073521610X",
"author" ~~> "Carlo"
))
.transact(connectionPool)
A question one might have is: why would you even want something like a SQL DSL?
Why not good old plaintext SQL queries?
The answer is, of course, all about type-safety.
The purpose of Tydal, is to prevent (at compile time) mistakes such as referring to a non-existing column,
or comparing two columns of unrelated types,
or even decoding the results of a query into an incompatible structure, and more.
In the next chapters, I’ll walk you through some of the key concepts that made this possible and progressively build a minimal database table model also available on Scastie and Github. A Scala 3 version of this code is also available here.
Generic types
Tydal was not my first attempt at writing a SQL DSL.
I had modelled a database table schema before, and I had done so in different languages.
However, I had always approached the problem in a similar way:
class Column(name: String)
class Table(name: String, columns: List[Column])
object book extends Table(
"book",
List(
new Column("title"),
new Column("number_of_pages")
)
)
If you’re familiar with Scala, the code above should be fairly straightforward.
This model might feel natural and intuitive, but there is a fundamental problem:
there’s a lot of static information defined at run-time.
At compile-time, we know very little about a generic Table
.
We know that it has a name, but we don’t know what that is.
We know that it has some columns, but we don’t know how many they are, or what their names are.
An alternative model could be:
class Column(name: String)
class Table[Columns](name: String, columns: Columns)
object book extends Table(
"book",
(
new Column("title"),
new Column("number_of_pages")
)
)
The differences here, are that Table
now takes a generic type parameter (Columns
),
and we’re representing the columns with a tuple rather than a list.
Suddenly, just by looking at the type, we know a little more about the specific book
table,
in particular we know exactly how many columns are there:
Table[Tuple2[Column, Column]]
This is an interesting start, but we’re still missing something.
Could we be even more precise with our types?
We can, of course, but we need to introduce a new concept: literal singleton types.
Singleton types
A singleton type is essentially a type inhabited by just one single value.
The literal "book"
, for instance, is obviously of type String
, but it’s also -
and more specifically - of type "book"
.
In Scala, singleton types are widely available since version 2.13, which makes the following code perfectly valid:
val theAnswer: "42" = "42"
With this in mind, we could come up with the following:
class Column[Name]()
class Table[Name, Columns]()
object book extends Table[
"book",
(
Column["title"],
Column["number_of_pages"]
)
]
This time, we basically have no runtime specific information in our model.
Everything is perfectly static and well expressed in terms of types
(note that the entire book table definition is within square brackets,
including the table name, and the name of each field).
Type-classes
The question we haven’t answered yet is, why would such a “type-oriented model” be even desirable.
In fact, to some extent, it seems to have made things worse.
Nothing is stopping me from coding meaningless things such as:
object book extends Table[
99,
"this is not a list of columns, just a random string"
]
Also, we have lost all the runtime values.
We don’t have a runtime instance representing table name and columns,
and even if we did, their types would be generic (Name
and Columns
),
so we wouldn’t really know what to do with them.
Fortunately, the Scala compiler offers a mechanism to answer all this: type-classes.
trait DbIdentifier[T] {
def value: String
}
object DbIdentifier {
implicit object book extends DbIdentifier["book"] {
override def value: String = "book"
}
implicit object title extends DbIdentifier["title"] {
override def value: String = "title"
}
implicit object numberOfPages extends DbIdentifier["number_of_pages"] {
override def value: String = "number_of_pages"
}
}
What we have here, is an interface named DbIdentifier
with a generic type parameter T
.
We also have 3 instances of DbIdentifier
defined for 3 different concrete types:
"book"
, "title"
and "number_of_pages"
.
What we can do, is to require an implicit parameter in our model
to be automatically derived for the specific instance of the type parameter Name
("book"
for instance):
class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name])
object book extends Table[
"book",
(
Column["title"],
Column["number_of_pages"]
)
]
println(book.name.value) // prints book
This is an important achievement, as we managed to keep our type Name
completely generic,
and at the same time, we can indirectly use some of its properties
through the automatically provided type-class instance.
Also, we’re effectively implementing a type constraint without implying inheritance,
as Name
is “unbounded”, but in practice it can only be either "book"
, "title"
or "number_of_pages"
.
The Singleton value: ValueOf
Our first type-class, the DbIndentifier
, simply enumerates a list of valid database names.
As it is, the model wouldn’t admit a column named "author"
, for instance:
val author = new Column["author"]
could not find implicit value for parameter name
DbIdentifier["author"]
Could we have something more generic? After all, any string is a valid database name.
Thankfully, Scala provides a magic type-class that binds any singleton type T
to its runtime value out of the box:
trait ValueOf[T] {
def value: T
}
The fun part, is that we can use it to build new instances for our DbIdentifier
type-class:
trait DbIdentifier[T] {
def value: String
}
object DbIdentifier {
implicit def anyString[A <: String](
implicit
singleton: ValueOf[A]
): DbIdentifier[A] = new DbIdentifier[A] {
override def value: String = singleton.value
}
}
Basically, given a type A
(subtype of String
), and the singleton value for A
,
then we can build a DbIdentifier
for A
.
Ambiguous implicit values
Thanks to singleton types, we can now build a DbIdentifier
with its singleton runtime value.
However, not every String
is a valid identifier.
Some reserved words might exist, you probably don’t want to call a table “table”, for instance.
How can we restrict (at compile time, of course) the range of possible values?
How can we make the compiler failing when it is resolving an implicit value for DbIdentifier["table"]
?
There are essentially 2 ways to achieve this:
- not providing any type class instance for
DbIdentifier["table"]
- providing multiple type class instances for
DbIdentifier["table"]
The second solution above, will basically produce an ambiguous implicit values error, and will act as a sort of compile-time exception.
In our case, all we need to do is to add the following rule:
implicit val blacklistTable: DbIdentifier["table"] = new DbIdentifier["table"] { }
This might sound like a counter-intuitive hack,
but basically when the compiler looks for a type class instance for DbIdentifier["table"]
it will find more than 1: blacklistTable
and anyString
(as defined above),
and therefore fail, which is what we wanted.
Compile-time error messages could be confusing, but they can be customised:
@annotation.implicitNotFound("${T} is not a valid identifier")
@annotation.implicitAmbiguous("${T} is a reserved word, and not a valid identifier")
trait DbIdentifier[T] {
implicit def anyString[A <: String](
implicit
singleton: ValueOf[A]
): DbIdentifier[A] = new DbIdentifier[A] {
override def value: String = singleton.value
}
implicit val blacklistTable: DbIdentifier["table"] = new DbIdentifier["table"] { }
}
implicitly[DbIdentifier["table"]] // <- compile time exception: table is a reserved word, and not a valid identifier
implicitly[DbIdentifier[123]] // <- compile time exception: 123 is not a valid identifier
Type-classes and logic programming
The ability of deriving new rules based on some facts (or other rules) is a very powerful concept.
Effectively, this is what logic programming is about,
and it’s hardly news that the Scala compiler comes packed with a Prolog-ish interpreter.
As a canonical example, let’s consider the following family tree program:
trait IsFather[Dad, Child] { }
trait IsGrandfather[Grandfather, Grandchild] { }
object IsGrandfather {
implicit def rule[X, Y, Z](
implicit
grandFather: IsFather[X, Y],
father: IsFather[Y, Z]
): IsGrandfather[X, Z] = new IsGrandfather[X, Z]
}
Here we are saying that, given some facts (X is Y’s father and Y is Z’s father) we can infer a new fact (X is Z’d grandfather).
As we eventually instantiate the type parameters, we invoke an implicit resolution that will either succeed or fail:
// facts
implicit object JimsFather extends IsFather["Joe", "Jim"]
implicit object BobsFather extends IsFather["Jim", "Bob"]
// the following expression compiles:
implicit val truthy: IsGrandfather["Joe", "Bob"] = implicitly
// the following expression does not compile:
implicit val falsy: IsGrandfather["Jim", "Bob"] = implicitly
Heterogeneous lists
So far, our model looks like this:
class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name])
We managed to bind the generic type Name
to a DbIdentifier
instance,
accessible at runtime and automatically provided,
but the Columns
type is still completely generic, and therefore, not very useful.
One question we would like to answer, for example, is: what are the column names for this table?
Once again, type-classes to the rescue:
trait ColumnList[T] {
def names: List[String]
}
object ColumnList {
implicit def oneColumn[Name <: String](
implicit
name: ValueOf[Name]
): ColumnList[Column[Name]] = new ColumnList[Column[Name]] {
override def names: List[String] = List(name.value)
}
implicit def twoColumns[Name1 <: String, Name2 <: String](
implicit
name1: ValueOf[Name],
name2: ValueOf[Name]
): ColumnList[(Column[Name1], Column[Name2])] = new ColumnList[(Column[Name1], Column[Name2])] {
override def names: List[String] = List(name1.value, name2.value)
}
}
ColumnList
is a type-class with 2 simple rules:
- Given a type
Name
and a singleton value for it, then we can build aColumnList
instance forColumn[Name]
- Given two types
Name1
andName2
and their singleton values, then we can build aColumnList
instance for the tuple(Column[Name1], Column[Name2])
We can now plug this in our model:
class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns])
object book extends Table[
"book",
(
Column["title"],
Column["number_of_pages"]
)
]
println(book.columns.names) // prints List(title, number_of_pages)
This is a start, but what if we added a third column to the book table?
object book extends Table[
"book",
(
Column["title"],
Column["number_of_pages"],
Column["author"]
)
]
As you would expect, it does not compile, as we didn’t define a rule for a 3-columns tuple:
could not find implicit value for parameter columns
ColumnList[(Column["title"], Column["number_of_pages"], Column["whatever"])]
The easiest solution to this problem could be to add a third rule to build a list made of 3 columns,
but it’s obviously an unsustainable pattern.
Before we dig any further, there are a few observations to be made.
Firstly, why are we dealing with tuples?
Because tuples are very specific.
The size of a tuple is known at compile-time, and the specific type of each element is also known at compile time,
whereas if we were using a simple List
, the best we could do would be a List[Column[String with Singleton]]
(if Column
was covariant in its type parameter).
The issue with tuples though, is that a tuple with 2 elements (Tuple2
)
is a completely different type than a tuple with 3 elements (Tuple3
).
As you probably know, in Scala 2 there are 22 different tuples, so a tuple with 23 elements cannot exist
(22 is a completely arbitrary number, by the way).
In Scala 3, tuples have been redesigned to overcome this problem,
and they essentially became a recursive data structure similar to a list, with a head
and a tail
.
In short, they became the shapeless HList
:
sealed trait Tuple
// non empty tuple case:
case class `*:`[+H, +T <: Tuple](head: H, tail: T) extends Tuple {
def `*:`[H2](h2: H2): `*:`[H2, `*:`[H, T]] = new `*:`(h2, this)
}
sealed trait EmptyTuple extends Tuple
case object EmptyTuple extends EmptyTuple {
def `*:`[H](h2: H): `*:`[H, EmptyTuple] = new `*:`(h2, this)
}
val myHList: String *: String *: EmptyTuple = "a" *: "b" *: EmptyTuple
Recursive type-class
We discussed how new type-class instances can be derived given implicit parameters,
and how this enables a logic-programming paradigm in Scala.
We also discussed how limited tuples are in Scala 2,
and a better data structure to represent heterogeneous lists.
If we combine these two aspects, we can finally design a generic ColumnList
type-class
capable of dealing with tuples of an arbitrary size:
trait ColumnList[T] {
def names: List[String]
}
object ColumnList {
implicit object EmptyColumnList extends ColumnList[EmptyTuple] {
def names: List[String] = Nil
}
implicit def nonEmptyColumnList[ColumnName <: String, Tail <: Tuple](
implicit
name: ValueOf[ColumnName],
tail: ColumnList[Tail]
): ColumnList[Column[ColumnName] *: Tail] = new ColumnList[Column[ColumnName] *: Tail] {
override def names: List[String] = name.value :: tail.names
}
}
What we have here, is a recursive algorithm with the following two rules:
- Given an
EmptyTuple
then we have aColumnList[EmptyTuple]
(base case) - Given the types
ColumnName
(subtype ofString
) andTail
(subtype ofTuple
), and given a singleton value forColumnName
and aColumnList
instance forTail
(note the recursion here) then we have aColumnList
forColumn[ColumnName] *: Tail
This is what our final model will look like:
class Column[Name](implicit val name: DbIdentifier[Name])
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns])
object book extends Table[
"book",
Column["title"] *:
Column["number_of_pages"] *:
Column["author"] *:
EmptyTuple
]
println(book.columns.names) // prints: List(title, number_of_pages, author)
A compile-time search algorithm
In the previous chapters we managed to design an alternative, type-oriented model for database tables.
The question we haven’t answered yet is: why, and what can we do with it that couldn’t be done otherwise?
As I mentioned on the introduction, one of the mistakes that Tydal tries to prevent is
“referring to a non-existing column”.
One way of doing this, is to add a function to the generic Table
model,
that given a name, it returns a Column
instance only if a column exists by that name.
Since we chose our new Tuple
ADT to represent a collection of table columns,
the first thing we need, is a type-class capable of extracting a single column by its name from a Tuple
.
Being our specially designed Tuple
a recursive data structure, we’ll need a recursive algorithm.
Moreover, we’ll have to express our algorithm following a logic programming style,
by defining what it means to find an element in a tuple given some known facts.
We have 2 happy paths:
- We find the column in the head of the tuple (base case)
- Given that we already found the column in the tail of the tuple (the recursive derivation) then we can find the column in the whole tuple
trait ColumnFinder[Columns, ColumnName] {
def get: Column[ColumnName]
}
object ColumnFinder {
implicit def foundInHead[Name, Tail <: Tuple](
implicit
dbIdentifier: DbIdentifier[Name]
): ColumnFinder[Column[Name] *: Tail, Name] = new ColumnFinder[Column[Name] *: Tail, Name] {
def get: Column[Name] = new Column[Name]
}
implicit def foundInTail[Name, Head, Tail <: Tuple](
implicit
finder: ColumnFinder[Tail, Name]
): ColumnFinder[Head *: Tail, Name] = new ColumnFinder[Head *: Tail, Name] {
def get: Column[Name] = finder.get
}
}
Here’s what it looks like when we plug it in our model:
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns]) {
def find[ColumnName](implicit finder: ColumnFinder[Columns, ColumnName]): Column[ColumnName] = {
finder.get
}
}
object book extends Table[
"book",
Column["title"] *:
Column["number_of_pages"] *:
Column["author"] *:
EmptyTuple
]
book.find["number_of_pages"] // compiles, returns a new Column["number_of_pages"] instance
book.find["whatever"] // does not compile: implicit not found
May the compiler be with you
We’ve chosen to model columns with a single type parameter representing the column name.
Turns out that’s not enough. At the bare minimum, we also need to know what the column type is (String
, Int
, …).
Without a type, we wouldn’t know how to decode values returned by a SELECT query, for instance.
Also, in SQL, a DATE cannot be compared with a FLOAT, and we’d like to model this kind of constraints.
In short, we need a new type parameter for our Column
model:
class Column[Name, T](implicit val name: DbIdentifier[Name])
Unfortunately, we’ll have to revisit some of our type-classes to take a ColumnType
parameter into account.
Our ColumnFinder
, for instance, will have to change slightly:
trait ColumnFinder[Columns, ColumnName, ColumnType] {
def get: Column[ColumnName, ColumnType]
}
object ColumnFinder {
implicit def foundInHead[Name, Type, Tail <: Tuple](
implicit
dbIdentifier: DbIdentifier[Name]
): ColumnFinder[Column[Name, Type] *: Tail, Name, Type] = new ColumnFinder[Column[Name, Type] *: Tail, Name, Type] {
def get: Column[Name, Type] = new Column[Name, Type]
}
implicit def foundInTail[Name, Type, Head, Tail <: Tuple](
implicit
finder: ColumnFinder[Tail, Name, Type]
): ColumnFinder[Head *: Tail, Name, Type] = new ColumnFinder[Head *: Tail, Name, Type] {
def get: Column[Name, Type] = finder.get
}
}
As a consequence, our find
method now takes 2 types parameters:
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns]) {
def find[ColumnName, ColumnType](implicit finder: ColumnFinder[Columns, ColumnName, ColumnType]): Column[ColumnName, ColumnType] = {
finder.get
}
}
object book extends Table[
"book",
Column["title", String] *:
Column["number_of_pages", Int] *:
Column["author", String] *:
EmptyTuple
]
book.find["number_of_pages", Int] // compiles, returns a new Column["number_of_pages", Int] instance
book.find["whatever", Int] // does not compile: implicit not found
book.find["number_of_pages", Double] // does not compile: implicit not found
This is not great, because the intention was to find a column just by its name, not a combination of name and type.
There are different approaches we could take (partially applied functions for instance),
but the most elegant solution here is based on a different idea: taking an explicit value,
and let the compiler work out what the types are:
class Table[Name, Columns](implicit val name: DbIdentifier[Name], val columns: ColumnList[Columns]) {
def find[ColumnName <: Singleton, ColumnType](name: ColumnName)(
implicit
finder: ColumnFinder[Columns, ColumnName, ColumnType]
): Column[ColumnName, ColumnType] = {
finder.get
}
}
object book extends Table[
"book",
Column["title", String] *:
Column["number_of_pages", Int] *:
Column["author", String] *:
EmptyTuple
]
book.find("number_of_pages") // compiles, returns a new Column["number_of_pages", Int] instance
book.find("whatever") // does not compile: implicit not found
Our find
method is accepting a runtime value of type ColumnName
,
and it’s also requiring ColumnName
to be a Singleton
type, so the concrete ColumnName
type can be inferred.
The ColumnType
, will also be inferred by the compiler,
since there’s only one possible concrete type for which an instance of ColumnFinder
can be derived,
which is, indeed, an impressive capability of the Scala compiler itself.
Conclusion
With this article, I intended to share part of my experience with the generic programming style, where, quoting wikipedia - “algorithms are written in terms of types to-be-specified-later” and “instantiated when needed for specific types provided as parameters”.
Of course, this is just the tip of an iceberg, and much more can be said and achieved.
Nevertheless, descending into the compiler universe has been a fascinating journey
for a software engineer used to deal with a world made of APIs and databases, mutable and ruled by side effects.