Access SQL TRANSFORM Statement
Posted by putxi on March 16, 2007
The Access SQL TRANSFORM statement is a non-standard extension to SQL-92 implemented by Microsoft’s JET database engine to create crosstab queries. Crosstab queries calculate a sum, average, count, or some other aggregate on rows, and then group the results in two dimensions: one down the left side (row headings) and the other across the top (column headings). In Excel these are called PivotTables. In SQL Server you’re on your own.Crosstab queries provide a powerful data analysis tool, but can be tricky to use in Access. It’s not that they’re hard to create. Access provides a wizard for creating new ones, and the syntax is pretty straightforward, if under-documented (that TRANSFORM link above is to the Access 97 documentation). It’s using them, like as the basis for a report, that can be tricky.
The problem is that a crosstab query’s columns aren’t generally known until the query is run. After all, a crosstab query’s purpose is to turn row aggregations into columns, and so the number of columns (and their names) will generally vary depending on the data selected. Access reports, on the other hand, really want to know at design time what columns their base queries will return. Without this information, they can’t know what data columns are available for layout and presentation.
Luckily, there are lots of ways to get around this in Access. For instance, you can assemble, parse, and distribute the query’s results as delimited values, or use a multi-column subreport, or use a parallel non-TRANSFORM GROUP BY query and fill the report’s controls programmatically. The folks at Corporate Technologies have prepared a great demonstration of the many ways to do this, so I won’t repeat them here. What I am going to do is ramble some about creating a parameterized crosstab query with a fixed set of columns in order to highlight a couple of gotchas.
I prepared a sample database, in case you want to follow along.
Let’s start with a single table named mytable having the following columns and rows:
column type key long integer project long integer year integer amount currency key project year amount 1 100 2003 $0.50 2 100 2004 $1.00 3 100 2004 $1.50 4 100 2005 $2.00 5 200 2004 $3.00 6 200 2005 $4.00 7 200 2005 $4.50 8 200 2006 $5.00
Using standard GROUP BY SQL, it’s easy to get a total of the amounts by project (see q0a):
SELECT mytable.project, Sum(mytable.amount) AS total FROM mytable GROUP BY mytable.project; project total 100 $5.00 200 $16.50
Or a total of the amounts by year (see q0b):
SELECT mytable.year, Sum(mytable.amount) AS total FROM mytable GROUP BY mytable.year; year total 2003 $0.50 2004 $5.50 2005 $10.50 2006 $5.00
However, to get a total of the amounts by project for each year, we’ll need a crosstab query.
To turn the former GROUP BY query into a crosstab, wrap it in a TRANSFORM statement, using the Sum() of mytable.amount for the TRANSFORM and mytable.year as the PIVOT (see q1a):
TRANSFORM Sum(mytable.amount) AS total SELECT mytable.project FROM mytable GROUP BY mytable.project PIVOT mytable.year; project 2003 2004 2005 2006 100 $0.50 $2.50 $2.00 200 $3.00 $8.50 $5.00
Alternatively, we can GROUP BY year and PIVOT on the project (see q1b):
TRANSFORM Sum(mytable.amount) AS total SELECT mytable.year FROM mytable GROUP BY mytable.year PIVOT mytable.project; year 100 200 2003 $0.50 2004 $2.50 $3.00 2005 $2.00 $8.50 2006 $5.00
Now suppose we need to create an Access report based on the first crosstab query above (i.e., total amounts by project for each year), but we want to restrict the data to a user-specified year and the year after. To do this, let’s add a WHERE clause that uses a numeric parameter named “foryear” (see q2):
PARAMETERS foryear Short; TRANSFORM Sum(mytable.amount) AS total SELECT mytable.project FROM mytable WHERE mytable.year In ([foryear],[foryear]+1) GROUP BY mytable.project PIVOT mytable.year;
When we run this query, we’ll be prompted to enter a value for the “foryear” parameter. If we specify 2004, we get this:
project 2004 2005 100 $2.50 $2.00 200 $3.00 $8.50
Perfect. Unfortunately, we can’t base a report on it. Well, we could, but we would have to bind the columns to a particular set of years, or programmatically establish the column bindings at runtime. Not what we want.
Luckily, in situations like this, where the number and/or type of columns desired are known in advance, there’s a simple solution. We can coerce the PIVOT data to a set of fixed values using an expression. In other words, instead of relying on the varying values of a column to determine our column headings, we’ll calculate fixed column headings from the column data using an expression.
In our query, we know we’ll only ever have two columns: one for the specified “foryear” parameter, and one for the next year. So, we can PIVOT on an expression that converts mytable.year values to the appropriate fixed values of “thisyear” or “nextyear” (see q3):
PARAMETERS foryear Short; TRANSFORM Sum(mytable.amount) AS total SELECT mytable.project FROM mytable WHERE mytable.year In ([foryear],[foryear]+1) GROUP BY mytable.project PIVOT IIf(mytable.year=[foryear],"thisyear","nextyear");
Now when we run this query and specify 2004 for the “foryear” parameter, we get this:
project nextyear thisyear 100 $2.00 $2.50 200 $8.50 $3.00
And if we specify 2005 for the “foryear” parameter, we get this:
project nextyear thisyear 100 $2.00 200 $5.00 $8.50
Now we have a crosstab query with fixed columns to which we can bind a report.
I should point out that if you are going to create a report based on this query, you might want to avoid specifying the report’s RecordSource at design-time. In other words, create an unbound report, and bind the report to the query at run-time by setting the report’s RecordSource property in its Open event (see rep-q3a). If you don’t do this you’ll be prompted for the “foryear” parameter a million times while designing the report. Alternatively, you can bind the query’s parameter to a control on an open form, or wait until you’re done designing the unbound report and assign its RecordSource just before you save it (see rep-q3b). Note that you’ll have to assign the report a RecordSource at design-time if you intend to use it as a subreport.
So, are we done? Not quite, we’ve still got a problem. The problem is that our crosstab query has columns consisting of fixed values, but doesn’t have a fixed number of columns. In other words, we know what the possible column values are (and so can bind form/report controls to them), but we don’t know until the query is run how many of those columns it will actually have.
To illustrate this problem, run our last query again and specify 2002 for the “foryear” parameter:
project nextyear 100 $0.50
Yikes, we lost a column.
This presents an obvious problem for any forms or reports we’ve bound to the query. If those forms or reports always assume the existence of both a “thisyear” and “nextyear” column then we’ll be seeing a 3070 error: “The Microsoft Jet database engine does not recognize <name> as a valid field name or expression.”
The other problem is that we can’t use this query as the basis for a subreport. If we do (see rep-q3bsub), we’ll get an error 2172: “You can’t use a pass-through query or a non-fixed-column crosstab query as a record source for a subform or subreport.” And, contrary to KB 209218, clearing the child report’s LinkChildFields and LinkMasterFields properties will not allow the report to run “showing every row in the subreport.” I also noticed that Access swallows this error if the parent report is opened programmatically using DoCmd.OpenReport, even though the report still doesn’t work (see frm-rep-q3bsub).
The solution is simple: explicitly declare the columns our crosstab query will always have. We do this by adding an IN clause to our PIVOT (see q4):
PARAMETERS foryear Short; TRANSFORM Sum(mytable.amount) AS total SELECT mytable.project FROM mytable WHERE mytable.year In ([foryear],[foryear]+1) GROUP BY mytable.project PIVOT IIf(mytable.year=[foryear],"thisyear","nextyear") IN ("thisyear", "nextyear");
Now if we specify 2002 for the “foryear” parameter we get:
project thisyear nextyear 100 $0.50
Great. We’ve now got a fixed column crosstab query that we can bind to the RecordSource of a report at design-time without causing endless parameter value prompts (see rep-q4), and that we can successfully use as the RecordSource of a linked subreport (see rep-q4sub).
By the way, if you’re wondering why earlier we didn’t do something like this:
PARAMETERS foryear Short; TRANSFORM Sum(mytable.amount) AS total SELECT mytable.project FROM mytable WHERE mytable.year In ([foryear],[foryear]+1) GROUP BY mytable.project PIVOT mytable.year IN ([foryear], [foryear]+1);
It’s because Access doesn’t support the parameterization of IN clauses in a TRANSFORM statement’s PIVOT clause. This will get you an error 3071: “This expression is typed incorrectly, or it is too complex to be evaluated. For example, a numeric expression may contain too many complicated elements. Try simplifying the expression by assigning parts of the expression to variables.”
Now, let’s talk about another TRANSFORM gotcha. It seems that Access can’t see through TRANSFORM SQL the same way it can see through all other SQL statements. Specifically, Access can’t resolve implicit query parameters through a chain of separate nested queries that contain, somewhere in the chain, a TRANSFORM query, unless the TRANSFORM query also uses or declares the same downstream parameters. More simply, TRANSFORM queries terminate implicit parameter bubbling in Access.
An example will probably explain this better. If we have a query named q5 like this:
SELECT mytable.project, mytable.year, mytable.amount FROM mytable WHERE mytable.year In ([foryear],[foryear]+1);
And another query named q6 that relies on q5 like this:
SELECT mytable.project, Sum(mytable.amount) FROM q5 GROUP BY mytable.project;
Running query q6 will cause Access to prompt us for a value for the “foryear” parameter, which is used, but not explicitly declared, in q5 . Likewise, if we programmatically evaluate the Parameters collection of q6’s QueryDef object, we’ll find that it contains a single “foryear” Parameter:
However, if we add another query named q7, which also relies on query q5, but uses a TRANSFORM statement, like this:
TRANSFORM Sum(mytable.amount) AS total SELECT mytable.project FROM q5 GROUP BY mytable.project PIVOT mytable.year;
Running the query will not get us a prompt for the “fordate” parameter. Instead, it will will get us an error 3070: “The Microsoft Jet database engine does not recognize ‘foryear’ as a valid field name or expression.” We’ll also see this error if we try to programmatically evaluate the Parameters collection of q7’s QueryDef object.
The solution is simple: explicitly declare query parameters, either in the queries that use them, or in the TRANSFORM queries that use the queries that use them.
Finally, just to be complete, I should point out that there’s a way to get the same crosstab-style fixed column results illustrated above without using a TRANSFORM statement (see q8):
PARAMETERS foryear Short; SELECT myvalues.project, Sum(myvalues.thisyear) AS thisyear, Sum(myvalues.nextyear) AS nextyear FROM [SELECT mytable.project, IIf(mytable.year=foryear, mytable.amount, Null) as thisyear, IIf(mytable.year=foryear + 1, mytable.amount, Null) as nextyear FROM mytable WHERE mytable.year In (foryear,foryear+1)]. AS myvalues GROUP BY myvalues.project;
Of course, this type of sub-SELECT SQL is always an option when you’re dealing with fixed column presentations. However, I find the TRANSFORM-based approach cleaner and easier to read.