Transitive Closure in PostgreSQL
At Remind we operate one of the largest communication tools for education in the United States and Canada. We have...
For the first several years at Remind, our authorization model was straightforward. We had three distinct roles that dictated what you could do on the site: teacher, parent and student. For example, teachers could create classes while students couldn’t. While this role-based authorization system served us fine for a while, it eventually broke down as the complexity of our product increased.
Without good guards in place, these rules devolve into spaghetti code, with conditionals and special cases all over the place. It’s hard to read, it’s hard to modify without side effects and it’s hard to reason about correctness. Especially when you have code that was written ages ago by engineers who’ve moved on from the company. We needed to simplify the way we were expressing permissions.
We wanted a way of describing arbitrarily complex authorization rules that any employee could grok from looking at the code. We wanted to be able to reuse common building blocks and separate rules descriptions from the underlying implementation and data fetching. We wanted to have authorization code that was easy to read and more importantly, easy to modify. So we built the permissions service—a declarative style of writing user permissions.
The easiest way to understand the permissions service is to see a couple examples of permissions:
func CanCreateGroup(ctx context.Context) Predicate {
return AllOf(
Not(IsUnderThirteen(ctx)),
Not(IsMuted(ctx)),
)
}
// Admin console is a product for administrators at a school to oversee their school
func CanSeeAdminConsole(ctx context.Context, organizationUuid *string) Predicate {
return AllOf(
AnyOf(
IsOrganizationPaid(ctx, organizationUuid),
IsOrganizationInDemo(ctx, organizationUuid),
)
AnyOf(
IsOrganizationAdmin(ctx, organizationUuid),
IsRemindAccountManager(ctx, organizationUuid),
)
)
}
// Admin stats is one feature within the admin console
func CanSeeAdminStats(ctx context.Context, organizationUuid *string) Predicate {
return AllOf(
CanSeeAdminConsole(ctx, organizationUuid), // Defined above
AnyOf(
DoesPaidPlanIncludeStats(ctx, organizationUuid),
IsRemindAccountManager(ctx, organizationUuid), // Add an override so account managers can demo the product
)
)
}
The goal is that you should be able to understand who can create groups and who can see our admin console from the code above. You should feel comfortable modifying the above code even if you have never written a line of Go, which is what the service is written in. You might even be able to write new rules, repurposing the existing building blocks that you can see in the example.
There’s a lot going on in the code example above so let’s break it down.
You’ll notice every permission returns a Predicate
type. A predicate is what you’d expect:
type Predicate func() (bool, error)
What each permission is actually returning is a function that will evaluate to true or false, instead of having the permission directly return a boolean. Also you’ll notice that the Predicate
above takes no arguments. By returning a Predicate
instead of the boolean, we can separate the steps of binding arguments from actually evaluating the function. This makes it possible for our rules to have arbitrary function signatures, as well as make it possible to evaluate these functions however we want. For instance, we can evaluate these functions in separate goroutines and parallelize work. More on that later.
In the above example, IsUnderThirteen
, IsMuted
, IsRemindAccountManager
and DoesPaidPlanIncludeStats
are all Leaf Rules. That means they are rules that aren’t compositions of other rules (like CanSeeAdminConsole
). In almost all cases, Leaf Rules are doing a fetch to another service or data store. The goal of the design is to be able to isolate these external fetches. They could all just be queries to a single PostgreSQL database or they could be doing HTTP or gRPC calls to a whole bunch of different microservices like a users service, a mute service and a paid accounts service. Even better, we can easily switch from an HTTP call to a gRPC call without touching any rules because the data fetching code is totally isolated from the rules definitions.
An example of one of our Leaf Rules:
func IsOrganizationPaid(ctx context.Context, organizationUuid *string) Predicate {
return func() (bool, error) {
organization, err := store.GetOrganizationByUuid(ctx, organizationUuid)
if err != nil {
return false, err
}
return organization.Paid, nil
}
}
Where the actual data fetch is still abstracted away into a store package. I’ll dig into what that data fetch looks like later.
Leaf Rules are good as reusable building blocks, but the real value comes when there are clean ways of combining Leaf Rules. We currently have three combiners: AllOf
, AnyOf
and Not
. A quick look at the code for Not
:
func Not(predicate Predicate) Predicate {
return func() (bool, error) {
result, err := predicate()
return !result, err
}
}
That one’s really simple. A really basic implementation of AllOf
might look like
func AllOf(predicates ...Predicate) Predicate {
return func() (bool, error) {
for _, predicate := range predicates {
result, err := predicate()
if err {
return false, err
}
if !result {
return false, nil
}
}
return true, nil
}
}
This isn’t how we implement AllOf
though in the permissions service. This evaluates each predicate sequentially meaning the time it takes an AllOf
to evaluate is the sum of the time if takes to evaluate all the predicates. But these Leaf Rules have no dependencies between each other so we can run them concurrently. This is the main reason we wrote the permissions service in Go.
As mentioned before, one of the benefits of using the Predicate
type is to be able to evaluate functions in goroutines. Utilizing that, here’s what our AllOf
really looks like:
func evalPredicate(pred Predicate, resultChan chan bool, errChan chan error) {
result, err := pred()
if err != nil {
errChan <- err
} else {
resultChan <- result
}
}
func AllOf(predicates ...Predicate) Predicate {
return func() (bool, error) {
childChan := make(chan bool)
errChan := make(chan error)
for _, predicate := range predicates {
go evalPredicate(predicate, childChan, errChan)
}
for i := 0; i < len(predicates); i++ {
select {
case res := <-childChan:
// Short-circuit
if !res {
return false, nil
}
case err := <-errChan:
return false, err
}
}
return true, nil
}
}
We heavily use goroutines to be able to parallelize evaluation of predicates. With this change, we now cut the amount of time it takes to evaluate an AllOf to be the maximum of the time it takes to evaluate the predicates. This is a huge improvement on the previous implementation. What’s really cool about this is that if we revisit our rules from above:
func CanSeeAdminConsole(ctx context.Context, organizationUuid *string) Predicate {
return AllOf(
AnyOf(
IsOrganizationPaid(ctx, organizationUuid),
IsOrganizationInDemo(ctx, organizationUuid),
)
AnyOf(
IsOrganizationAdmin(ctx, organizationUuid),
IsRemindAccountManager(ctx, organizationUuid),
)
)
}
func CanSeeAdminStats(ctx context.Context, organizationUuid *string) Predicate {
return AllOf(
CanSeeAdminConsole(ctx, organizationUuid),
AnyOf(
DoesPaidPlanIncludeStats(ctx, organizationUuid),
IsRemindAccountManager(ctx, organizationUuid), // Add an override so account managers can demo the product
)
)
}
Because every expensive fetch is kicked off in its own goroutine, evaluating CanSeeAdminStats
will immediately start fetching IsOrganizationPaid
, IsOrganizationInDemo
, IsOrganizationAdmin
and IsRemindAccountManager
all in parallel, while also utilizing any early exit shortcuts we can. For example, if IsOrganizationAdmin
is taking 100ms but isRemindAccountManager
is a much faster 5ms and evaluates to true, the whole query won’t be blocked by the expensive organization admin call.
While this is all great for evaluating permissions quickly, this results in cases where we’ll fetch the same data multiple times while evaluating a single permission. In the above example, IsRemindAccountManager
will be evaluated multiple times. There are cases that are even more difficult to detect. If you look at how we check paid and demo, you can imagine something like:
func IsOrganizationPaid(ctx context.Context, organizationUuid *string) Predicate {
return func() (bool, error) {
organization, err := store.GetOrganizationByUuid(ctx, organizationUuid)
if err != nil {
return false, err
}
return organization.Paid, nil
}
}
func IsOrganizationInDemo(ctx context.Context, organizationUuid *string) Predicate {
return func() (bool, error) {
organization, err := store.GetOrganizationByUuid(ctx, organizationUuid)
if err != nil {
return false, err
}
return organization.Demo, nil
}
}
In which case we’re going to do a cold fetch for the organization twice in parallel, once in IsOrganizationPaid
and then again in IsOrganizationInDemo
. To avoid doing that, we need to keep some sort of “global” state that is shared between all rules. I put “global” in quotes because we don’t need this state to persist as long as the service is running, but instead just for the duration that a single top-level permission is being evaluated.
We use the context package for this purpose
Package context defines the Context type, which carries deadlines, cancelation signals, and other request-scoped values across API boundaries and between processes
Which is exactly what we want. In the context object, we keep a Store which facilitates deduplicating data fetches across goroutines.
type Store struct {
ResponseCache map[string][]byte
RequestsInFlight map[string][]APIResponseChannels
Mutex *sync.Mutex
Client *http.Client
}
I alluded to the store
package earlier that The store has:
Finally, the last piece is to wrap the logic for all this deduplication into a clean and straightforward interface. For that, we just have a single function:
func Fetch(ctx context.Context, req *http.Request) ([]byte, error)
Under the hood:
All of these pieces sum up to give us a cleanly composable set of permissions that are easy to read and modify, evaluate faster than traditional methods (through parallelization) and avoids some natural pitfalls, like duplicating work.
There are still ways the evaluation steps could be improved. For instance, short circuiting does result in lower latency, but it doesn’t prevent every leaf rule from actually doing the work it needs to produce a result, even if that result isn’t used in the end. That’s because we kick off all leaf rules in parallel so by the time we know that evaluating a leaf rule is unnecessary, it’s already probably waiting on I/O, and the costs are already incurred. This could be prevented by having a more intelligent system, where some Predicate combiners run in parallel, while some run in series, based on weights and empirical pass rates. But in trying to get an MVP out and prove out the concept, these are optimizations that add a great deal of complexity to a service that already has a lot going on.
One of our values is creating simplicity for others. While there is a lot of tech happening under the hood, the primary goal of this project has always been to create an interface for describing permissions that all developers can build on, even if they have no understanding of how the guts of the permissions service work. With this declarative style of describing permissions, engineers can confidently read, understand and modify what people can do on Remind.