Project Portfolio Optimization with Genetic Algorithms

Right before I tackled my keyboard project I turned in my second mandatory project report as part of my graduate program at the university of Duisburg-Essen. As my grade was really good and I liked it as I reread it recently, maybe it has value for someone. The report covers the idea of optimizing the sequence of doing project activities across different projects through the utilization of a genetic algorithm. The report is in German though (here it is nevertheless: ProjektArbeit).

I will give an overview of its content in this post. Additional to the written report exists the implementation freely accessible at squeak source. The sources are in the package: Phase2-POGA. It is written in Pharo Smalltalk, and you should use Monticello to load the sources to your image. Everything else needed should be included in the standard Pharo image. I tested it lately with Pharo 1.4, and it worked fine. If you need any advice in using Pharo and Monticello, there are really good freely accessible books available referenced from the home page of Pharo.

At first it is important to make a case for the topic. So why would one spend time to implement a genetic algorithm (What the hack is a genetic algorithm, by the way?) to order the tasks of projects? Well, at my work place (as maybe the case in many others) there are plenty of things to do and many of them are undoable in the recent future given the existing people and their knowledge. So we have to make the most important things first. But there are around one hundred people and around 50 most important projects with scores of interrelated tasks. No one I know is able to create a plan that ensures, that the tasks are tackled and finished in a given time frame, that in the end provide the most business value.

Genetic Algorithms take techniques and ideas from natural evolution to tackle optimization problems. There are basically the following steps involved:

  1. Create a random initial population of individual solutions.
  2. Rank the solutions regarding some criteria.
  3. Take the best and breed them somehow.
  4. Apply mutation to some degree.
  5. Take the Y best resulting solutions and eventually mix them with the X best solutions from the previous generation so that in the end the size of the initial population is obtained.
  6. Continue with step 2 until some abort criteria is reached.

The challenge is to find a good representation of the optimization problem so that each step can be applied in a meaningful way. The picture of the problem I drew above has to be transformed into a more specific form. I chose this one (there exist and might exist other valid representations):

  • There are a number of resources (individuals or teams) with individual capacities for time frames of individual length.
  • Each of those resources estimates a number of existing work units in an arbitrary unit (e.g. Story Points – maybe this post will help understanding it.) related to there capacity and their time frame length.
  • There can and should be more resources estimating the same unit of work (which will differ, see the post linked in the prior step.)
  • Units of work have an arbitrary number representing their overall business value.
  • Units of work may be related in two ways: one unit of work must be completed to be able to begin with another unit of work (I only implemented the simplest and most often applicable relation, others should easily be addable). The other relationship is the part-of relationship. One unit of work may be composed from an arbitrary number of other units of work (like a project is) and can not be finished until all those “child” elements are finished. Those composite units of work do still have an own business value and estimates from one or more resources. Because the whole can be more than the parts.
  • You specify a time span in which the optimal amount of work units (with the highest business value in sum) should be completed. The amount of open work units should exceed the possible accomplishable work for that time span, otherwise you could simply finish all work.

After deciding on this basic rules and cornerstones it is time to tackle the question of how to represent such a project portfolio in terms of a genetic algorithm. I termed the individual in the genetic algorithm population a scenario. A scenario contains the order of tasks for each individual resource. How that tasks can be outlaid (respecting the tasks relationships) on the timeline defines how many tasks can be accomplished in the timespan and thous how much business value will be generated.

At the beginning, remember step one from above, a (configurable) number of scenarios will be randomly generated. The order of tasks as well as which resource will work on which task (remember: for one task may exist more than one estimating resource) are the variables. The scenarios are ordered by comparing their achievable business value for that timespan respecting the relationships of the tasks and the capacity of the resources. The top scenarios are selected (step 2).

In the breeding process (step 3) the selected scenarios are randomly crossed with each other until the size of the original population is reached. When crossing two scenarios the order of tasks in the child is composed from alternately taking tasks from one of the parents (respecting of course already contained tasks). In the next step (step four – mutation) are certain tasks taken from their current position and moved to a random other position or even given to another resource if applicable. Mutation is generally applied at a relatively low rate (again configurable).

In step five the resulting population is ranked, and the best children are combined with the best parents. The rate at which the parents are kept is also configurable. The process starts over at step two. The genetic algorithm stops when their wasn’t a better solution for a configurable number of generations.

I used morphic to visualize the scenarios in a more or less rudimentary way. That’s basically it. I still think this is a nice approach but their are some shortcomings. Generally you won’t get concurrent estimations, and a project is usually given to a single team. This might change: for example the book “Agile Software Requirements” by Dean Leffingwell (a great book by the way) states an approach that might better fit the optimizing technique described in this post. Furthermore (at least) I notice some trend that fosters internal competition and parallel development. That might further fortify the described approach.

Another shortcoming is the one-number-business-value of the units of work. It would be tough work to get the decision makers to reduce their multiple-factors-world-view to one simple comparable number. Whereas the comparability might be the hardest part.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s