Once upon a time…
…I noticed that what should have been a relatively simple (and, above all, quick) process was taking longer and longer to complete. This wasn’t a system I had written myself; I had inherited it (as you do) on joining a firm which relied on a lot of home-brewed software.
The basic function of this particular program was relatively straightforward:
- clear out the list of unactioned items
- get a list of all those cases for which we were waiting for documentation
- for each case (and each outstanding item), work out what “chaser” needs to be done next – letter, phone call – and when it should be performed
- create an appropriate “diary” entry
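The steps above can be sketched roughly as follows. Every name here is hypothetical (the original was a home-brewed system against a SQL database); the chasing rule is a made-up placeholder, since the real one depended on the referrer and the type of work:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Case:
    case_id: int
    outstanding_items: list  # documents we are still waiting for

diary = []  # the "diary" of chaser actions to perform

def next_chaser(case, item):
    """Decide which chase comes next and when (illustrative rule only)."""
    action = "letter" if item["chases_so_far"] == 0 else "phone call"
    due = date.today() + timedelta(days=7)
    return action, due

def rebuild_diary(cases):
    diary.clear()                                  # clear unactioned items
    for case in cases:                             # cases awaiting documentation
        for item in case.outstanding_items:
            action, due = next_chaser(case, item)  # what to do, and when
            diary.append((case.case_id, item["name"], action, due))

# One case with two outstanding items yields two diary entries.
cases = [Case(1, [{"name": "ID proof", "chases_so_far": 0},
                  {"name": "signed form", "chases_so_far": 2}])]
rebuild_diary(cases)
```

The important structural point is the nested loop: the diary is rebuilt per case, per outstanding item, on every run.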
These diary entries were either cleared by the arrival of the documentation we were expecting, or triggered an appropriate number of days after we had generated the action request; they also depended in part on the particular work referrer’s own rules, and on the type of work being performed.
These last two, as you can imagine, didn’t change very often – perhaps once a month or so. So why the program needed to look up that control data afresh for every case it processed (and it was processing tens of thousands) was beyond me.
I changed the code, replacing the time-wasting part which ran queries along the lines of:
SELECT ClientName FROM tblClients WHERE ClientID = 5
SELECT ProcessName FROM tblProcesses WHERE ProcessID = 3
(Note that each of these tables held at most a couple of dozen records.)
I replaced that code with cached client-side recordsets, each populated once at the start of the program’s main run – half a dozen small(ish) reference tables in all – and replaced the per-case queries with filters on those cached recordsets. It was an easy couple of days’ work to make and test the changes, and suddenly that process’s run duration dropped from two hours to as many minutes. That made m’learned colleagues in the support department happy, because they no longer had to keep popping back to check that everything was still running smoothly with that system.
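The same idea, sketched in Python with an in-memory SQLite database (the original used client-side recordsets, not Python dicts; the table and column names come from the queries above, and the row values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblClients   (ClientID INTEGER PRIMARY KEY, ClientName TEXT);
    CREATE TABLE tblProcesses (ProcessID INTEGER PRIMARY KEY, ProcessName TEXT);
    INSERT INTO tblClients   VALUES (5, 'Acme Ltd');        -- illustrative data
    INSERT INTO tblProcesses VALUES (3, 'Conveyancing');    -- illustrative data
""")

# Slow pattern: one database round-trip per case processed.
def client_name_slow(client_id):
    row = conn.execute(
        "SELECT ClientName FROM tblClients WHERE ClientID = ?", (client_id,)
    ).fetchone()
    return row[0] if row else None

# Fast pattern: load each small reference table once, up front.
def load_cache(table, key_col, val_col):
    return dict(conn.execute(f"SELECT {key_col}, {val_col} FROM {table}"))

clients = load_cache("tblClients", "ClientID", "ClientName")
processes = load_cache("tblProcesses", "ProcessID", "ProcessName")

# Inside the per-case loop, an in-memory lookup replaces the query.
name = clients[5]  # instead of client_name_slow(5)
```

With tens of thousands of cases, swapping a query for a dictionary lookup removes tens of thousands of round-trips per run, at the cost of a cache that can go stale – acceptable here because the reference data changed perhaps once a month.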
So, why was the code crappy?
The code as originally written worked, but not particularly efficiently. However, this had never been highlighted as a potential performance issue because (as is not unusual) the code had never been tested against a large dataset. The production dataset had grown slowly over several years, and the support team had simply got used to the run getting slower and slower.
The original programmer may not have known how the data would behave, and therefore that this sort of caching was perfectly legitimate. Either that, or it was simply laziness: it is easier to go and fetch the data as and when you need it than to work out what could safely be cached.
As a programmer, you need to understand your data in order to write optimal code.