I've refined my thinking a little about the optimal way to keep track of tasks in a relational database. Here is the newer version:
CREATE TABLE task ( id BIGSERIAL PRIMARY KEY, parentid BIGINT REFERENCES task (id), taskname varchar(50) NOT NULL CONSTRAINT name_not_empty CHECK (taskname <> '') ); CREATE TABLE leaftask ( id BIGINT PRIMARY KEY REFERENCES task (id), weight INT4 NOT NULL DEFAULT 1000, starttime TIMESTAMPTZ, endtime TIMESTAMPTZ, CONSTRAINT startbeforefinish CHECK ((starttime < endtime) or (endtime IS NULL)) );
I want my tasks to follow certain rules:
- All sub-tasks of a given parent task should add up to 100% of the parent task.
- A task can never have exactly one sub-task because - according to the rule above - the subtask would by synonymous with its parent.
- All tasks that have no descendants - and none that don't - should have a corresponding "leaftask" record.
- All tasks must have a final root that has no parent - i.e. no looping back on your descendants.
I want to build structures into the database that will prevent invalid states (as defined by the above rules) from occurring. Today I am going to tackle the problem of a task descending from itself. To prevent that case, I need what in PostgreSQL is called a trigger. I'm getting ahead of myself though. First I need a way to list the descendants of a given task. Since I want it to happen all in the database, I need a stored function.
Stored functions in PostgreSQL can be written in many languages. I'm going to write mine in PL/pgSQL, a sort of SQL-related procedural language, because it is included by default with PostgreSQL. (PL/pgSQL is intentionally similar to Oracle's PL/SQL, if you're an Oracle person.)
CREATE OR REPLACE FUNCTION taskdescendants(parenttaskid bigint, OUT decid bigint) RETURNS SETOF bigint AS $fun$ BEGIN FOR decid IN SELECT id FROM task WHERE parentid = parenttaskid LOOP RETURN NEXT; FOR decid IN SELECT * FROM taskdescendants(decid) LOOP RETURN NEXT; END LOOP; END LOOP; RETURN; END; $fun$ LANGUAGE plpgsql;
This function will return the ids (each as a separate row) of any descendants a given task might have. I can use it in my SQL statements as if it were a table. For example, this will select all tasks descending from task 57:
SELECT decid, taskname FROM taskdescendants(57) INNER JOIN task ON decid = id;
That's all well and good for a generic function, but trigger functions have slightly different requirements. This will check that the new parentid doesn't show up in the list of descendants:
CREATE OR REPLACE FUNCTION taskrecursioncheck() RETURNS trigger AS $fun$ BEGIN IF NEW.parentid IN (SELECT decid FROM taskdescendants(NEW.id)) THEN RAISE EXCEPTION 'task cannot descend from one of its own descendants'; END IF; IF NEW.parentid = NEW.id THEN RAISE EXCEPTION 'task cannot descend from itself'; END IF; RETURN NEW; END; $fun$ LANGUAGE plpgsql; CREATE TRIGGER parent_trap BEFORE UPDATE ON task FOR EACH ROW EXECUTE PROCEDURE taskrecursioncheck();
There - now it raises an error on invalid relationships and I got a handy function that my front end can call. Later I will have to figure out some way to ensure my other rules are followed.