Sunday, November 11, 2007

Creating a Tree-based Task List in a Relational Database

Things we want:

  • tasks
  • subtasks
  • task owners (responsible for that task)
  • task start time
  • task end time
  • prerequisite tasks

How do we represent these things in a relational database? Tasks are represented as a tree. Only leaf tasks will have owners and start/end times. This is because any non-leaf task will be made up entirely of its subtasks. In other words, when all the subtasks are done, the parent task is done. The start time of a non-leaf task is the earliest start time from among its descendants. Likewise, its end time equals the last end time from among its descendants (null if any are not yet finished).

Note: "bigserial" and "serial" are PostgreSQL abstractions for 8-byte and 4-byte integers that have an auto-incrementing default. They're mostly used for primary keys.

CREATE TABLE task (
    taskid BIGSERIAL NOT NULL PRIMARY KEY,
    parentid BIGINT REFERENCES task(taskid),
    jobid INTEGER NOT NULL REFERENCES job(jobid),
    taskname VARCHAR(200) NOT NULL
);

CREATE TABLE taskleaf (
    taskid BIGINT NOT NULL PRIMARY KEY
      REFERENCES task(taskid),
    employeeid INTEGER NOT NULL
      REFERENCES employee(employeeid),
    starttime TIMESTAMPTZ,
    endtime TIMESTAMPTZ
);

Problems:

Every task has a "jobid" - only the root task really needs this. Make the jobs themselves the root tasks? That wouldn't work in my pre-existing job scheme. Also, if performance was database-bound in my application, I could shift the tree-decoding work to the web server by selecting all the tasks by jobid.

Prevent circular references? I could make a trigger or rule in the database that would check the path back to a root node before allowing a parentid to be assigned. While I am at it, I can check to make sure the parent's "jobid" matches.

I looked at the "adjacency list" method, but I don't think it would work well for my needs.

No comments: