Archive for Software Development

Best Practice: SQL Queries for Searching and Reporting

Imagine you have several pieces of criteria that the user may use to query the database. They probably have a nice dashboard where they can choose from drop-down lists and maybe even search using text!

I’ve seen many creative ways to manage SQL queries for reports and/or searches. Some methods of creating the SQL are better than others. This article describes one anti-pattern and one best practice that can be used in place of the anti-pattern.

The Anti-Pattern:

Imagine that the input parameters for this query are:

  • p_c1
  • p_c2

The anti-pattern code looks something like this:

var x = 'select ColOne, ColTwo from MyTable where';

if(first_condition)
  x+='theColumnIWantToCheck like p_c1'
else if(second_condition)
  x+='and thirdColumn = p_c2 and fourthColumn = p_c2'
exec x --Execute the string using your favorite database...

I’ve seen this pattern more times than I care to mention. It’s also a maintenance nightmare. Over time new criteria are added and the many branches in the if/then logic become convoluted at best.

I once saw a nasty version of this anti-pattern that spanned more than 50 pages when printed! Ouch! …and you guessed it, my job was to add “just one more” criteria to it! The core problem here is that we’re not allowing the database to do what it is good at. Databases LOVE to search and sort data!

A Best Practice:

A better way to handle the same problem utilizes short circuit evaluation. The basic pattern is this:

(p_my_parm==null or (condition_to_test_here))

This allows you to pass NULL in when you don’t want to search on a specific criteria. If you *do* want to search on it, then pass in a value. The second half of the condition will only run when p_my_parm is NOT null. I typically encourage developers to leave the second set of parenthesis in even when they’re not necessary. This tends to make the SQL more readable and understandable for less experienced software engineers.

Example
Select ColOne, ColTwo, ColThree from myTable where
    (p_c1==null or (theColumnIWantToCheck like p_c1))
and (p_c2==null or (thirdColumn = p_c2 and fourthColumn = p_c2))

The end result is a single SQL that is readable. It’s not surrounded by if statements, strings, and other syntax that make it hard to comprehend. Instead it exists as a single, elegant SQL statement.

In almost every case this approach comes out miles ahead of our anti-pattern in terms of both performance and maintainability. Your software engineers will be able to more quickly read and understand the SQL and your database engine can create an awesome query plan to execute the search.

Bottlenecks, Deer Hunting, and Software

I see this a lot at work – someone puts code in a function. No, that part isn’t unusual :) The part that “bugs” *pun intended!) me is that the code really belongs in the function that they have overridden (in the case of OO languages), or in the functions that they call (in the case of procedural and/or OO languages).

It’s really a variation on the classic theme – don’t repeat yourself; write as little code as possible; keep it simple; don’t make me think.

One of the best ways of accomplishing this is to approach creating software like deer hunting. The best hunters know that there are certain natural funnelsĀ  – places where deer tend to run. These are the best places to put up a tree stand. You wait for the deer to come to you. You don’t go running around trying to corral the deer. If there’s a river, you find a place that is narrow and shallow to hunt because it is a natural crossing. Over time you’ll see every deer in the surrounding area.

This technique also applies to software development. Find one place in the code to solve your problem. You have an additional advantage too – you can create the bottlenecks – the places where everyone crosses the river so to speak.

At work we have an OO based framework that all of our web apps use. The people who designed it didn’t take full advantage of the natural bottlenecks. Rather than adding code to our Page_Load base method to check if the application should redirect the user to a “sorry, but we’re down while the batch runs” page, they put code in EVERY page of EVERY application. Worse, they put a class in every application that is around 500 lines long. I did a diff between two of these classes. Less than 10 lines of difference. What does this mean? It means that 490 lines of code could be moved into a base class and the 10 lines of code could simply be overridden to provide the appropriate configuration values.

In short, the total amount of code in each application to support this feature could have been less than 10 lines of code instead of 500! In our current situation, if a bug is found in the duplicated code it would take us more than 100 hours of effort to update every application and deploy them. If this code had been put in a base class, we’d just have to update it in one place.

Mailing Lists and Procmail

I like having procmail sort my mail for me. In the case of mailing lists, the header of choice is the List-ID field. But there’s a problem… notice how each example below is slightly different. I want to pull the bold portion of the mailing list and use that as the folder name:

List-ID: <linux-kernel.vger.kernel.org>
List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>
List-Id: cocci.diku.dk

To get started, let’s state in English what we want to find: Dear procmail; please find the word that immediately precedes the first period in a line that begins with “List-Id:”

Finding these headers is easy with a regular expression… IF… you’re allowed to use look ahead: ^List-Id:.*?( (?!.*<)|<)([^.]*)

BUT, procmail doesn’t do look ahead :(

So let’s try with procmail’s regular expressions. Aside from look ahead/behind, there are two other major differences between procmail’s regular expressions and the rest of the world. First, procmail uses \/ to mark the portion of the expression that will be copied into $MATCH. Secondly, the part of the regular expression to the left of the \/ uses non-greedy matching. So when you write .* procmail treats it like .*? this is the feature that makes matching the three list headers I want to grab quite difficult.

With this in mind:
Matches the linux-kernel list:
^List-Id: *<\/[^.]*

Matches linux-kernel and kernelnewbies:
^List-Id: .*<\/[^<]?[^.]*

Notice the extra [^<]? which tells procmail that we want $MATCH to start after the < character. This is what allows the rule to find kernelnewbies without pulling < into $MATCH. This is necessary because procmail isn’t being greedy when it matches to the left side of \/.

Now, our remaining problem is the cocci mailing list. This one really makes life difficult. I decided that using a single regular expression just isn’t possible, so that means we’ll need two. One to grab the cocci mailing list and one to grab everything else. Here’s the completed procmail rule (note: I use Maildir and not mbox on my mailserver).

:0
* ^List-Id: \/[^.]+
{
        #list with <>
        #e.g. List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>
        #e.g. List-Id: <linux-kernel.vger.kernel.org>
        :0
        * $MATCH ?? ^.*<\/[^<]+
        .MailingLists.$MATCH/

        #list without <>
        #e.g. List-Id: cocci.diku.dk
        :0
        .MailingLists.$MATCH/
}

The Sum of All Sums

Several years ago, we had a foreign contractor write some code for one of our applications. One piece involved create some javascript to sum numbers on a dynamically generated page.

About a year ago, I rewrote it – you see he had coded 4 functions; Sum1, Sum2, Sum3, and Sum4. Sum1, Sum2, and Sum3 were identical except that different bugs had been fixed in each piece of code over time by other developers, so they didn’t “look” identical at first blush. That was confusing. Not to mention it was a little intimidating and scary to fix up each function so that all of the bugs were corrected in a way that didn’t modify the behavior of the framework that depended on these functions. Who knows, maybe someone coded in a way that made Sum3 work correctly for them…

The real surprise though, was Sum4. It presented a completely different problem. One might have assumed that it was simply a fourth sum. Imagine my surprise when I realized that it was not a fourth sum at all! It was in fact, the sum of all sums (Sum1 – Sum3)! A grand total. Even worse, it didn’t use Sum1, Sum2, or Sum3 to do this. Instead, it was simply a (very poor) copy of the code original code for Sum1, Sum2, and Sum3! Of course, completely different bugs had been fixed inside this function when compared to the current versions of Sum1, Sum2, and Sum3. Woah.

Long story short I ended up rewriting the code. We now have a single Sum(x) function that can do the work of sum1, sum2, sum3, etc… and we have a TotalOfAllSums() function that is aware of the number of summations we have on a webpage, and it simply calls sum(x) to determine the total. Simple, and, now we can implement a fourth Sum in the future without having to resort to naming it “Sum5″ and copying and pasting say… Sum3 to get us started.

The life of a Software Engineer….

gotta love it :)

The User Doesn’t Want To Wait

Too often user interfaces are muddied by this fact that seems lost on software developers. When I install software, I want to answer all of the questions UP FRONT and then let the computer do the dirty work. I should be able to leave without wondering if the computer will need babysitting later on! Not only should user interfaces do this, but they should INDICATE when they are doing this so that I know it’s safe to leave :)

Some prime examples:

  • Operating System Install
  • New Software Installation
  • Upgrades
    • Redesign debian apt-get to ask any and all configuration questions up-front! I realize there are technical challenges to this, but we’re software developers, and I’m sure there are harder things to figure out…
    • FreeBSD could do the same with portupgrade, etc…
    • Ironically, Microsoft has this just about right with Windows Update – although Debian is quite similar if you schedule a cron job to download and install the updates.
  • Microsoft Windows – Deleting files should be a background process (at least optionally). The file system should be able to mark a tree for deletion so that explorer doesn’t see it, etc… and then the delete should just happen. I don’t really care when it finishes! If the user wants to create a new file that conflicts with a file scheduled for deletion, then the OS should be able to deal with that too …

The goal should be to let the user do as much as possible in as short amount of time as possible. The software should save the time intensive tasks for later and manage them in a way that does not impact the user.

Good OO is NOT Flat

Good OO (object oriented) design is not flat. It is 3d and sometimes even 4d in nature. The problem with most object designs is that they are based on the concept of an object. Objects are too constrained. Too often, an object is demoted to the thing that holds the data. People need to get over that idea. OO is simply a way to organize your code into tools that can be reused. The power of OO is found when you use it to create building blocks and tools. Objects are nice, but you really should be intentionally creating re-usable, modular, code! An object is a byproduct of that organizational style. You are programming related functionality in a multi-dimensional manner.

Consider interfaces that allow “horizontal” relationships. An interface is considered part of good OO design. With it, you can relate things that are not related “vertically” via traditional inheritance. If your “object” has legs, then implement the legs interface. Now everyone else can use them… e.g. Object.Legs.Walk, Object.Legs.Run. Perhaps your animal doesn’t know how to run? Then let it walk instead :) Now your arthropoda spider and your chordata ape can both walk even though they are from different phylums. It’s cross domain functionality.

Excellent programmers understand this distinction. I’ve seen too many good OO designs muddied by people who still live in a functional/procedural world. This is possible because OO simply extends the procedural model. To be an OO programmer does not imply understanding of OO. In fact, it is possible to program as in the past and completely miss the benefits that OO can give!

Warning signs of bad OO programming … hmmm, sounds just like bad procedural programming…

  • Repeated blocks code
  • A single line of code repeated 10s of times.
  • GoTo … LOL :)

Error Messages

Ideally, error messages should be treated as how-to-fix-it messages – ID, pp. 149

Good Software Design

Some features of good software design:

  • Is deliberate
  • Is well organized
  • Object oriented
  • Makes excellent use of Layering (and even layering inside layers)
    • Careful thought is put into the placement of code in layers
    • Scopes variables and functions appropriately (e.g. public, private, protected, etc…)
  • Minimizes copy and paste
    • if you copy and paste, then it should immediately become a function and shared if at all possible.
    • Your aim should be minimal duplication and maximal reuse. But please do use common sense.
  • Is invisible
    • Guides novice programmers – they should follow the design almost by accident.
  • Each function should work as a generic tool.
    • The user of a function should not need to know how it works.
    • Thought should be given to this when creating any function.
    • Should be as simple as possible in implementation and functionality, but no simpler.
  • Refactor, refactor, refactor
  • Adheres to the principles of
    • High Cohesion
    • Low Coupling

    Makes use of deliberate CODE bottlenecks.

Another funny error message!

When everyone doesn’t speak English as their primary language, funny things start to happen… to make it more ironic, there is a really bad thunderstorm today… so we’re not sure if the computer is trying to report a weather related phenomenon!

The reporting database checking program detects a mistmatch between the application database and the reporting database