## Beyond Excel: VBA and Database Manipulation

Aug 31 2010   10:16AM GMT

Profile: Craig Hatmaker

By guest contributor: Yoav Ezer

NOTE: This post provides an example spreadsheet: accelerating-excel.xlsm.  Due to concern for your system’s security, macro enabled spreadsheets cannot be stored in this blog.  So to accomodate security and free exchange of ideas, we loaded the spreadsheet as a text file with a “txt” extension.  To use this example, right click the link, select “Save Target As”, change the extension on the file name from .zip (it’s a text file that ITKnowledgeExchange has compressed for you) to .xlsm, click “Save”, scan your local copy for viruses, then open it.

Do your Excel spreadsheets sometimes take too long to calculate? It may be due to formulas that crunch large sets of data (for example, data that comes from large databases – the focus of this blog). This is because Excel recalculates all formulas that depend on a specific cell every time you change that cell. And if those formulas have dependents, Excel will recalculate them, and their dependents, and so on, and so on.

Consider the formula =SUM(A:A). This adds all cells in column A. It recalculates each time you update any cell in column A. Fortunately, the SUM function is very fast and may not cause significant delay even if used a 1,000 times in your workbook. But more advanced functions, like SUMPRODUCT() and array formulas, are not so efficient.

For instance, the following array formula is pretty simple: =SUM(IF(MOD(A:A,2)=1,A:A,0)). It sums all odd numbers in column A. It is much slower than the SUM function. I’ve used this formula only 12 times on sheet1 (See accelerating-excel.xlsm above) in this workbook and on my machine it takes 5 seconds to add a value to column A, which makes this workbook too slow to use.  Fortunately for us, there are ways to make Excel work faster even with advanced formulas.

Strategy #1: Use Limited Ranges
The reason the array formula evaluates so slowly is that it calculates for every cell in column A (that’s more than 1 million cells). One way to make this formula work faster is to limit the range. So instead of using this formula:

`{=SUM(IF(MOD(A:A,2)=1,A:A,0))}`

We can use this formula specifying only the rows needed:

`{=SUM(IF(MOD(A1:A10000,2)=1,A1:A10000,0))}`

NOTE: Excel adds curly brackets when you enter a formula using CTRL+SHIFT+ENTER. CTRL+SHIFT+ENTER tells Excel your formula is an array formula. For more information on array formulas and their power, see: Introducing Array Formulas in Excel by Colin Wilcox and John Walkenbach

Because the revised formula is limited to 10,000 rows it works 100x times faster!

Strategy #2: Use Dynamic Ranges
Strategy #1 works as long as you know how many cells contain data. When you don’t know, you can still limit your ranges using a Dynamic Range. Dynamic Ranges expand automatically to include only cells that contain data.  You can define a dynamic range called ‘ColumnA’ like this:

`=OFFSET(Sheet2!\$A\$1,0,0,COUNT(Sheet2!\$A:\$A),1)`

And then use it in the original formula in the following manner:

`{=SUM(IF(MOD(ColumnA,2)=1,ColumnA,0))}`

NOTE: See How to Set up a Named Range in Microsoft Excel if you need help with this.

This formula calculates only rows in column A that contain data. That’s good for two reasons: It reduces the number of cells calculated if your range contains fewer cells than anticipated; AND, it calculates cells that might otherwise be overlooked if your range contains more cells than anticipated. For more information on Dynamic Ranges see: Introduction to Dynamic Ranges.

To experience the performance difference between these two methods, open the sample file and update data on the first and second sheet. You’ll see a very palpable difference.

Strategy #3: Stopping/Starting Calculation
At times, even limiting the range used in the formula isn’t enough. One of our clients had a workbook with over 12,000 array formulas and although we used dynamic ranges to limit the range size in each of those formulas, the workbook took over a minute to update with 1,000 data rows. For that client we used the following technique.

The workbook was divided into a data entry sheet and ‘data analysis’ sheets which contained the array formulas. We employed a simple macro to stop the formulas on the workbook from automatically updating every time the user entered the ‘data entry’ sheet and a second macro to calculate all the formulas on the workbook when the user left the ‘data entry’ sheet. This way the user was able to update data very quickly and wait only once (when leaving the sheet).  Here is the macro we used whenever the user entered the ‘data entry’ sheet:

`Private Sub Worksheet_Activate()`
`Application.Calculation = xlCalculationManual`
`End Sub`

And this is the macro we used when the user left the sheet:

`Private Sub Worksheet_Deactivate()`
```Application.Calculate
Application.Calculation = xlCalculationAutomatic```
`End Sub`

You can see how stopping and starting the automatic calculations effects performance in the sample file.

Summary
We use Microsoft Excel to improve productivity. We can improve productivity even more by removing unnecessary waits through writing efficient formulas and controlling when Excel does its magic! Look for opportunities to use these techniques to speed results and improve the user experience.

Do you have Excel optimization tips? Please share with us in the comments.