Wednesday, February 13, 2008

Maintainting Heirarchies in SQL

I've refined my thinking a little about the optimal way to keep track of tasks in a relational database. Here is the newer version:

CREATE TABLE task (
 id BIGSERIAL PRIMARY KEY,
 parentid BIGINT REFERENCES task (id),
 taskname varchar(50) NOT NULL
 CONSTRAINT name_not_empty
 CHECK (taskname <> '')
);
CREATE TABLE leaftask (
 id BIGINT PRIMARY KEY REFERENCES task (id),
 weight INT4 NOT NULL DEFAULT 1000,
 starttime TIMESTAMPTZ,
 endtime TIMESTAMPTZ,
 CONSTRAINT startbeforefinish
 CHECK ((starttime < endtime) or (endtime IS NULL))
);

I want my tasks to follow certain rules:

  • All sub-tasks of a given parent task should add up to 100% of the parent task.
  • A task can never have exactly one sub-task because - according to the rule above - the subtask would by synonymous with its parent.
  • All tasks that have no descendants - and none that don't - should have a corresponding "leaftask" record.
  • All tasks must have a final root that has no parent - i.e. no looping back on your descendants.

I want to build structures into the database that will prevent invalid states (as defined by the above rules) from occurring. Today I am going to tackle the problem of a task descending from itself. To prevent that case, I need what in PostgreSQL is called a trigger. I'm getting ahead of myself though. First I need a way to list the descendants of a given task. Since I want it to happen all in the database, I need a stored function.

Stored functions in PostgreSQL can be written in many languages. I'm going to write mine in PL/pgSQL, a sort of SQL-related procedural language, because it is included by default with PostgreSQL. (PL/pgSQL is intentionally similar to Oracle's PL/SQL, if you're an Oracle person.)

CREATE OR REPLACE FUNCTION taskdescendants(parenttaskid bigint, OUT decid bigint)
 RETURNS SETOF bigint AS $fun$
BEGIN
 FOR decid IN SELECT id FROM task WHERE parentid = parenttaskid LOOP
  RETURN NEXT;
  FOR decid IN SELECT * FROM taskdescendants(decid) LOOP
   RETURN NEXT;
  END LOOP;
 END LOOP;
 RETURN;
END;
$fun$ LANGUAGE plpgsql;

This function will return the ids (each as a separate row) of any descendants a given task might have. I can use it in my SQL statements as if it were a table. For example, this will select all tasks descending from task 57:

SELECT decid, taskname FROM taskdescendants(57) INNER JOIN task ON decid = id;

That's all well and good for a generic function, but trigger functions have slightly different requirements. This will check that the new parentid doesn't show up in the list of descendants:

CREATE OR REPLACE FUNCTION taskrecursioncheck() RETURNS trigger AS $fun$
BEGIN
 IF NEW.parentid IN (SELECT decid FROM taskdescendants(NEW.id)) THEN
  RAISE EXCEPTION 'task cannot descend from one of its own descendants';
 END IF;
 IF NEW.parentid = NEW.id THEN
  RAISE EXCEPTION 'task cannot descend from itself';
 END IF;
 RETURN NEW;
END;
$fun$ LANGUAGE plpgsql;

CREATE TRIGGER parent_trap BEFORE UPDATE ON task
 FOR EACH ROW EXECUTE PROCEDURE taskrecursioncheck();

There - now it raises an error on invalid relationships and I got a handy function that my front end can call. Later I will have to figure out some way to ensure my other rules are followed.