Your requirement: to find the things where either A or B is true and C is true. Your first draft might be this:
SELECT ... FROM ... WHERE a OR b AND c
(field, table, function, logic all anonymised to protect the perpetrator)
What’s the problem? Operator Precedence, that’s the problem.
The order in which calculations are done – not just reading from left to right, but remembering that things like multiplication and division happen before addition and subtraction. My son tells me that kids nowadays are being taught something called “BIDMAS” – which stands for “Brackets, Indices, Division, Multiplication, Addition, Subtraction”. Or it can be BODMAS – Brackets, Operations, Division… (Operation is a fancy new way of describing indices – ie xy)
Unsurprisingly, there are similar rules for Boolean operators. (Boolean Operators LogicaL Order of CalculationS? BOLLOCS? So close…) And these all mix in together, with AND being at the same level as multiplication, and OR being at the same as addition. Microsoft has kindly provided a list of the operator precedence rules it uses in SQL Server.
So What? I hear you cry
Well, we had a malfunctioning server. Running
sp_WhoIsActive against the server showed 400-odd queries blocked by five instances of a malformed bit of code like the statement at the top of this post.
We, like many places, have a variety of third-party applications that allow users to write their own queries at a high level, and then the applications fill in the blanks, and add all the close brackets and things, to make the query actually parse… What these apps don’t do, of course, is fix the broken logic for you. Anyway, our Joe Punter had been asked to run a query to find customer results (or something) where condition A or condition B were true, and condition C was also true. And he had come up with the A OR B AND C logic above, and was puzzled when he fired off the report that it didn’t come back instantly, so he fired it off again. And again. And a couple more times for good luck.
Reproducing the Problem For Fun & Blogging Purposes
We are using the AdventureWorks Database, as that’s easy to get hold of. (NB: MS has moved all this stuff to github – you may need to update your bookmarks accordingly.)
This query below is a cut-down hack that illustrates the problem. In the data sense, it’s garbage, but it illustrates the problem albeit with several orders of magnitude less data.
SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail AS sod, Sales.SalesOrderHeader AS soh WHERE soh.OnlineOrderFlag = 1 OR soh.BillToAddressID = soh.ShipToAddressID AND soh.SalesOrderID = sod.SalesOrderID;
The query plan generated is…unpleasant
As you can see, the query plan is dominated by that Nested Loop operator. Let’s have a look what’s going on there:
Let’s rewrite the query properly (ie add a bracket or two):
SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail AS sod, Sales.SalesOrderHeader AS soh WHERE (soh.OnlineOrderFlag = 1 OR soh.BillToAddressID = soh.ShipToAddressID) AND soh.SalesOrderID = sod.SalesOrderID;
And the query plan:
Spot the difference.
Yes, the query returns different answers – that’s kinda the point inasmuch as the original query would not have returned what the user wanted… But we can now see a more appropriate join operator, and one that’s not taking all the query time.
The moral of the story? Get your logic right, and your server won’t fall around your ears.