(By Guest blogger Yoav Ezer)
Many times when a workbook is crammed full of numbers, your data can be difficult to read. This is bad enough when you are sure the data is correct, but if you are given a spreadsheet that may contain errors, you really want to be able to detect them so they can be fixed.
A common input error is duplicate records. There are a couple of ways to delete duplicates, but what if you only want to see them rather than delete them? This is where conditional formatting can help. With this little technique you can make the duplicates jump out at you!
Check out this screen grab below.
See how the duplicate rows are highlighted? In this sheet, the highlighting helps us identify duplicate invoices.
First you need to go to the conditional formatting dialog as you normally would.
Then in the “Edit the Rule Description” box:
Enter this formula:
The formula might look complicated, and it kind of is. It relies on a function you might not have seen much called SUMPRODUCT. If you are curious about the function, this article is a great introduction to the topic.
In this formula, SUMPRODUCT returns the number of rows, from row 2 through row 16, whose values in columns A, B and C all equal those in the current row. Because that count includes the current row itself, a result greater than 1 means the row has at least one duplicate, and the format is applied to that row.
As you can see, when you are given a spreadsheet containing problems, you don’t always want to nuke the error rows; sometimes you need to know about them so you can deal with the issues at the source. Conditional formatting can raise your awareness without changing the content of your spreadsheet. Give it a try!
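To see why "greater than 1" marks a duplicate, here is a small model of the rule's logic. It is written in Python purely for illustration and is not an Excel formula. The invoice rows and function names are made up. It assumes, as described above, that rows are compared on columns A, B and C only.

```python
# Python model of the duplicate-highlighting rule (illustration only).
# Each row is a tuple of its column A, B and C values.

def count_matches(rows, current):
    """Mimic SUMPRODUCT((A=A_cur)*(B=B_cur)*(C=C_cur)) over the block."""
    return sum(
        (a == current[0]) * (b == current[1]) * (c == current[2])
        for a, b, c in rows
    )


def duplicate_flags(rows):
    """True for each row the rule would format. The count includes the row
    itself, so a count above 1 means another identical row exists."""
    return [count_matches(rows, row) > 1 for row in rows]


# Hypothetical invoice rows (columns A, B, C).
invoices = [
    ("INV-001", "Acme", 120.00),
    ("INV-002", "Bolt", 75.50),
    ("INV-001", "Acme", 120.00),  # same A, B and C as the first row
    ("INV-003", "Acme", 120.00),  # same B and C only, so not a duplicate
]
print(duplicate_flags(invoices))  # [True, False, True, False]
```

Multiplying the three comparisons works like a logical AND, which is the same reason the Excel formula multiplies its conditions.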
About the author
Yoav Ezer co-authors the technology and productivity blog Codswallop. He is also the CEO of a company that produces PDF to Excel conversion software.
A couple of months ago, guest blogger Yoav Ezer posted a piece that introduced the concept of Dynamic Named Ranges (see Strategies for Speeding Spreadsheets). Dynamic Named Ranges are ranges within Excel that have been named using “Name Manager” or “Define Name”. They can expand or contract without your having to change what the name refers to. Once named, you can use the range’s name in formulas and data validation instead of the common “$A$1:$D$50” cell references. This makes your formulas simpler to read. And thanks to the dynamic nature of Dynamic Named Ranges, you don’t have to change the formula when rows of data are added.
Here is a typical formula that could be added in Name Manager for a named range that starts in cell $A$1, has two columns, and grows to as many rows as there are entries:
Translating this formula for us humans, it says:
- From $A$1
- Go down 0 rows
- Go right 0 columns
- Make it as many rows tall as there are non-empty cells in column A
- Make it 2 columns wide
I “Googled” the subject and found many articles on Dynamic Named Ranges. Obviously there is a lot of interest and many examples of their use; however, in every article I read, including those in Microsoft’s MSDN, I kept coming across the same shortcomings:
- If your range contains empty cells, especially in the first column, you’re likely to get bad results
- If your range has cells holding only spaces below the last row, you’re likely to get bad results.
- If your range contains columns of different lengths, the suggested approaches are very cumbersome.
- If your range starts somewhere other than row 1, the formula gets more complex.
- If another range exists below or to the right of the first named range, you’re likely to get bad results.
Most of these limitations arise because almost everyone seems to want to use COUNT or COUNTA to determine how many rows the range should contain. I found one blogger who used MATCH instead of COUNT. This had the advantage of skipping over empty cells. However, it still worked only for numbers or only for text, not for both (unless you use both versions of the formula and take the MAX of the two).
=OFFSET($A$1,0,0,MATCH("",$A:$A,-1),2)   (finds the last text cell)
=OFFSET($A$1,0,0,MATCH(1E+306,$A:$A,1),2)   (finds the last numeric cell)
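Most of these limitations come from sizing the range with COUNTA, and that is easy to model. The Python sketch below is an illustration only. The sample columns and function names are hypothetical, and None stands for an empty cell.

```python
# Python model of a COUNTA-sized range (illustration only).
# Column A is a list where index 0 is row 1 and None is an empty cell.

def counta(column):
    """Mimic COUNTA: count non-empty cells (a lone space still counts)."""
    return sum(1 for cell in column if cell is not None)


def last_filled_row(column):
    """1-based row number of the last non-empty cell, or 0 if there is none."""
    last = 0
    for row, cell in enumerate(column, start=1):
        if cell is not None:
            last = row
    return last


# A blank cell in row 3 makes COUNTA one short, so row 5 falls outside.
column_a = ["Invoice", "INV-001", None, "INV-003", "INV-004"]
print(counta(column_a), last_filled_row(column_a))  # 4 5

# A stray space in row 5 adds one to the count, pulling in empty row 4.
column_a_space = ["Invoice", "INV-001", "INV-002", None, " "]
print(counta(column_a_space))  # 4, although the table ends at row 3
```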
Seeking A Better Approach
The above approaches all work. But no single one of them works in all circumstances by itself, and none of them deals with a stray space after the table’s last row. That’s not good enough. I want one compact formula that requires as little thought as possible and works in as many situations as possible. While researching and experimenting, I accidentally stumbled on a quirk of formulas stored in names that makes these problems much simpler to overcome. Chip Pearson explains this quirk very well.
Defined Name Formulas And Array Formulas by Charles H. Pearson
If you use a formula in a Defined Name, that formula is evaluated as if it were an array formula. There is no way to force a formula in a Defined Name to be evaluated as a non-array formula.
Brilliant! With this bit of knowledge, we can use logical functions that are insensitive to the type of data used (Numbers vs Characters). Here is an array formula that finds the last row in column A containing anything at all (NOTE! The curly brackets are the result of Shift-Ctrl-Enter. For more information on how to enter array formulas see Array Formulas by Charles H. Pearson).
Building on this, we can find the last row within the first four columns that contains anything at all, regardless of which column is longest.
Because logical operators in Excel return 0 for FALSE and 1 for TRUE, we can shorten the formula up a bit, if that’s your preference.
But one problem remains. In my opinion, entries containing only spaces are the same as totally empty cells and as I said before, the above formula finds cells that contain anything at all, including those with just spaces. No worries, this is simple enough to overcome by trimming cells.
This formula finds the last cell that contains anything other than just spaces. It doesn’t matter if any of the cells in between are empty. It doesn’t matter if any of the cells are numbers or characters. It doesn’t matter which column is longest. To put it all together, you need to adjust for one more thing: the starting row. I often want my ranges to start in row 4 with totals in row 2. So if you want your range to start somewhere other than row 1, you need to subtract the starting row number and add 1 back so that the starting row itself is counted. Here is the final formula that you would enter into Name Manager for a range that starts in $A$4 and has 5 columns (NOTE! Name Manager does not need the curly brackets since it treats ALL formulas as array formulas no matter what).
This works. It’s also slow. It’s cumbersome.
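The final formula does three things. It ignores whitespace-only cells, finds the last non-blank row across every column, and converts that row into a height relative to the starting row. The sketch below models this logic in Python. It does not reproduce the formula itself, and the names and sample table are hypothetical.

```python
# Python model of the "last non-blank row" idea (illustration only).
# grid[i][j] is the cell in worksheet row first_row + i, column j;
# None is an empty cell.

def is_blank(cell):
    """Empty or whitespace-only. Note: str.strip() also removes tabs and
    newlines, whereas Excel's TRIM only removes spaces."""
    return cell is None or str(cell).strip() == ""


def last_nonblank_row(grid, first_row):
    """Worksheet row of the last non-blank cell in any column,
    or first_row - 1 when every cell is blank."""
    last = first_row - 1
    for offset, row in enumerate(grid):
        if not all(is_blank(cell) for cell in row):
            last = first_row + offset
    return last


def range_height(grid, first_row):
    """Subtract the starting row and add 1 back, as described above."""
    return last_nonblank_row(grid, first_row) - first_row + 1


# Rows 4-8 of a hypothetical 3-column table: a gap in column A, a number
# in the longest column (C), and a final row holding only a space.
table = [
    ["INV-001", "Acme", 120],
    [None, "Bolt", 75],
    ["INV-003", None, None],
    [None, None, 42],
    [" ", None, None],
]
print(range_height(table, first_row=4))  # 4 (rows 4 through 7)
```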
Solving with VBA
Formula approaches just don’t work well. There are just too many ways they can fail and too many limitations. So I looked to VBA. VBA has always had a very simple and elegant way of dealing with dynamic ranges:
Set DynamicRange = Range("A4").CurrentRegion
CurrentRegion finds all adjacent non-empty cells. So if “A4” is anywhere inside a table, CurrentRegion will identify the entire table. One minor problem is that it will also pick up any adjacent cells with stray spaces. But a bigger problem is that when used in a UDF (a User Defined Function intended to be used within an Excel formula), CurrentRegion returns only one cell.
A different approach (copied from Andy Pope) can be encapsulated into a VBA routine and used in a UDF context:
Set DynamicRange = Range("A4").Resize( _
    Range("A4").End(xlDown).Row - Range("A4").Row + 1, _
    Range("A4").End(xlToRight).Column - Range("A4").Column + 1)
End() works as long as there is more than one row and column in the range. If, for example, only one row is in the table, the End() method will find the last cell in the worksheet or the next list.
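The single-row problem comes from the way End(xlDown) moves, which is the same as pressing Ctrl+Down. The Python sketch below models that movement in one column so you can see what Resize receives. The 12-row "sheet" stands in for a real worksheet, and the function name is hypothetical.

```python
# Python model of Range.End(xlDown) within one column (illustration only).
# column[0] is row 1, None is an empty cell, and the list length stands in
# for the worksheet's last row.

def end_down(column, start_row):
    """Return the 1-based row End(xlDown) lands on from start_row."""
    last_row = len(column)
    i = start_row - 1
    below_filled = i + 1 < last_row and column[i + 1] is not None
    if column[i] is not None and below_filled:
        # Inside a block: stop at the last non-empty cell of the block.
        while i + 1 < last_row and column[i + 1] is not None:
            i += 1
        return i + 1
    # Otherwise jump to the next non-empty cell below, or the last row.
    for j in range(i + 1, last_row):
        if column[j] is not None:
            return j + 1
    return last_row


# Three hypothetical 12-row columns, each with a table starting in row 4.
multi = [None] * 3 + ["A", "B", "C"] + [None] * 6
single = [None] * 3 + ["A"] + [None] * 8
single_then_list = [None] * 3 + ["A"] + [None] * 4 + ["Other"] + [None] * 3

for col in (multi, single, single_then_list):
    end = end_down(col, 4)
    print(end, end - 4 + 1)  # landing row, row count passed to Resize
# 6 3   correct
# 12 9  runs to the bottom of the sheet
# 9 6   runs into the next list
```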
The “One Best Way”
Going through this exercise was interesting, but ultimately, neither VBA nor complex formulas are required to create dynamic ranges without ANY of the shortcomings of either. We’ll discuss that after another great Excel tip from Yoav Ezer.
PivotChart Drill Down
This seems so basic to me that I was not surprised that Googling “Drilldown Excel Chart” shows lots of interest out there. But the implementation is so simple that I was very surprised that Googling “Drill Down Excel Chart” yielded almost no good suggestions. That ends today.
Video: http://www.youtube.com/v/-Uu2WqDxLdk
For beginners: What is “Drilldown”?
Drilldown is displaying underlying details for a total. This is important because we hope charts and summaries show something we didn’t know and/or expect. When that happens, we want to know why. Displaying what makes up a total helps answer that question.
Doesn’t Excel Support Drilldown Automatically?
In PivotTables and Outline Reports – yes. You can double click any calculated number in these Excel objects and Excel displays the associated rows from their source data range. But if you double click on a Chart/Graph element, the “Format Data Point” dialog box appears. That’s not what my users want. The good news, though, is the very same mechanism that reveals detail beneath PivotTables makes coding drilldown for PivotCharts a snap.
What are PivotCharts?
A PivotChart is a chart over a PivotTable. In the templates provided in this blog, we use PivotTables to summarize data in our extracted rows. PivotTables are extremely flexible and allow the user to slice and dice data in many, many useful ways. The only drawback to PivotTables is that they show numbers, not graphs. This is easily overcome by simply creating a chart over the PivotTable. Charts made from PivotTable data, as opposed to simple rows of data, also let users slice and dice the underlying data just as they can with a PivotTable. The only drawback to the PivotChart is that it lacks drilldown.
Adding Drilldown to PivotTable Charts
This “trick” only works with PivotCharts because it relies on the PivotTable’s ShowDetail property. As mentioned before, you can double click on any calculated result in a PivotTable and it will automatically show the associated detail rows. If you start the Macro Recorder, double click on a PivotTable cell, stop the recorder, and then view the recorded code, you’ll see something like this:
Range("B9").Select
Selection.ShowDetail = True
…where “B9” is the cell you double-clicked. Selection.ShowDetail = True is what causes the detail to display. When you create a PivotChart from a PivotTable, each PivotTable data cell becomes a chart element. So we have to figure out which chart element the user double-clicked and which PivotTable cell it represents. Then all that’s left to do is use that cell’s ShowDetail property to display the data. As it turns out, this is almost easier done than said.
Determining Which Chart Element was Clicked
Excel provides a method that makes this easy: ActiveChart.GetChartElement. Every chart in Excel has this method. You pass it the mouse pointer’s X and Y coordinates plus three variables, and it fills them with the Chart Element Type and two of that element’s properties (Arg1 and Arg2). Chart Element Types can be the chart’s Title, Legend, Axis, … or a Graph Element. We are only interested in Graph Elements, such as a slice in a Pie Chart, a line in a Line Chart, or a bar in a Bar Chart. So if ActiveChart.GetChartElement returns any Chart Element Type other than 3 (xlSeries, a Graph Element), we know to ignore the click and move on. If the user clicked a Graph Element, we want to show the detail. When the Chart Element Type is 3, Arg1 is the series, which corresponds to a PivotTable column. Arg2 is the point within that series, which corresponds to a PivotTable row. Because Cells() takes the row first, we show the detail beneath with:
ActiveChart.PivotLayout.PivotTable.DataBodyRange. _
    Cells(Arg2, Arg1).ShowDetail = True
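Below is a small Python model of that lookup. It is purely illustrative and assumes the default PivotChart layout, in which PivotTable row items become categories (points) and column items become series. The data and function name are hypothetical.

```python
# Python model of the drill-down lookup (illustration only).
XL_SERIES = 3  # GetChartElement's ElementID for a data series element


def drill_target(data_body, element_id, arg1, arg2):
    """Return the DataBodyRange value a click maps to, or None.

    For a series element, Arg1 is the series index (a PivotTable column)
    and Arg2 is the point index (a PivotTable row), hence Cells(Arg2, Arg1).
    Cells() is 1-based, Python lists are 0-based.
    """
    if element_id != XL_SERIES:
        return None  # title, legend, axis, ...: ignore the click
    row, col = arg2, arg1
    return data_body[row - 1][col - 1]


# Hypothetical DataBodyRange: two regions (rows) by three quarters (columns).
data_body = [
    [100, 110, 120],  # East
    [200, 210, 220],  # West
]
print(drill_target(data_body, 3, arg1=2, arg2=1))  # 110 (East, 2nd series)
print(drill_target(data_body, 2, arg1=2, arg2=1))  # None (not a series)
```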
Capturing Chart Double Click and Mouse Pointer’s X and Y
Every Chart also has a Chart_MouseUp event. Chart_MouseUp fires whenever the user clicks (and releases) the mouse on a chart. Excel also passes a few properties to this event. Two are important to us: X and Y.
We now have all of the pieces to the puzzle. All that’s left to do is put it together. Place this code in the Chart Sheet object:
Private Sub Chart_MouseUp(ByVal Button As Long, ByVal Shift As Long, _
                          ByVal x As Long, ByVal y As Long)
' Description: Drill down into the PivotChart's data
' Parameters:  Button  Mouse button that was released
'              Shift   State of SHIFT, CTRL, and ALT keys
'              x       Mouse pointer X coordinate within Chart
'              y       Mouse pointer Y coordinate within Chart
' Example:     *none - This is an event handler
' Date      Init Modification
' 10/04/10  CWH  Initial Programming
    On Error GoTo ErrHandler
    Dim ElementID As Long
    Dim Arg1 As Long
    Dim Arg2 As Long
    ' Pass: x, y. Receive: ElementID, Arg1, Arg2
    ActiveChart.GetChartElement x, y, ElementID, Arg1, Arg2
    ' If a data element (xlSeries = 3) was clicked, show the detail
    If ElementID = 3 Then
        ActiveChart.PivotLayout.PivotTable.DataBodyRange. _
            Cells(Arg2, Arg1).ShowDetail = True
        ActiveSheet.Cells(2, 2).Select
        ActiveWindow.FreezePanes = True
    End If
ErrHandler:
    If Err.Number <> 0 Then MsgBox _
        "Chart_MouseUp - Error#" & Err.Number & vbCrLf & _
        Err.Description, vbCritical, "Error", Err.HelpFile, Err.HelpContext
    On Error Resume Next
    On Error GoTo 0
End Sub
Today’s little routine is called repeatedly throughout Position_Cursor_In_Data (see previous post). It’s a simple routine with not much to talk about except one little trick:
v = Intersect(ActiveWindow.VisibleRange, Selection)
If you look closely at this routine, the variable “v” is never used. So why is it there? Answer: to cause an error. Purposefully causing an error may sound crazy. There may be a better way, but it’s not crazy. Suppose the newly selected cell is outside the visible window. Then Intersect returns Nothing, and assigning that to v fails with error #91 (“Object variable or With block variable not set”). When that happens, we want to shift the window to display the selection with a call to Position_Window_to_Cursor.
Here is the code.
Function Find_UnLocked_Cell(lRowFrom As Long, lRowTo As Long, _
                            lColFrom As Long, lColTo As Long, _
                            lStep As Long) As Boolean
' Description: Find the next unlocked cell
' Parameters:  lRowFrom  Starting Row
'              lRowTo    Ending Row
'              lColFrom  Starting Column
'              lColTo    Ending Column
'              lStep     Direction (-1 = backward)
' Example:     bFound = Find_UnLocked_Cell(Selection.Row, Selection.Row,
'                  Selection.Column + 1, Range("Data").Columns.Count, 1)
' Date      Init Modification
' 01/12/06  CWH  Initial Programming
    On Error GoTo ErrHandler
    Find_UnLocked_Cell = False          'Assume the worst
    Dim lRow As Long
    Dim lCol As Long
    Dim v As Variant
    For lRow = lRowFrom To lRowTo Step lStep
        For lCol = lColFrom To lColTo Step lStep
            If Cells(lRow, lCol).Interior.Color <> CellLocked Then
                Cells(lRow, lCol).Select
                Find_UnLocked_Cell = True
                ' Raises error 91 if the new selection is off screen
                v = Intersect(ActiveWindow.VisibleRange, Selection)
                Exit Function
            End If
        Next lCol
    Next lRow
ErrHandler:
    If Err.Number = 91 Then
        Position_Window_to_Cursor Selection
    ElseIf Err.Number <> 0 Then
        MsgBox "Find_UnLocked_Cell - Error#" & Err.Number & vbCrLf & _
            Err.Description, vbCritical, "Error", Err.HelpFile, Err.HelpContext
    End If
    On Error GoTo 0
End Function
Function Position_Window_to_Cursor(rngCursor As Range) As Boolean
' Description: Positions the window/pane so the cursor is visible
' Parameters:  rngCursor  The cursor's cell/range
' Example:     bResult = Position_Window_to_Cursor(Selection)
' Date      Init Modification
' 12/14/09  CWH  Initial Programming
    On Error GoTo ErrHandler
    Position_Window_to_Cursor = Failure     'Assume the worst
    Dim iPaneRow As Integer
    Dim iPaneCol As Integer
    Dim lRow As Long
    Dim lCol As Long
    lRow = rngCursor.Row
    lCol = rngCursor.Column
    With ActiveWindow
        ' Pick the pane that holds the cursor's row and column
        If lRow > .SplitRow + 1 Then
            iPaneRow = .Panes.Count
        Else
            iPaneRow = 1
        End If
        If lCol > .SplitColumn + 1 Then
            iPaneCol = .Panes.Count
        Else
            iPaneCol = 1
        End If
        ' Scroll so the cursor sits near the bottom/right edge of the pane
        lRow = rngCursor.Row - .Panes(iPaneRow).VisibleRange.Rows.Count + 2
        If lRow <= .SplitRow Then lRow = .SplitRow + 1
        .Panes(iPaneRow).ScrollRow = lRow
        lCol = rngCursor.Column - .Panes(iPaneCol).VisibleRange.Columns.Count + 2
        If lCol <= .SplitColumn Then lCol = .SplitColumn + 1
        .Panes(iPaneCol).ScrollColumn = lCol
    End With
    Position_Window_to_Cursor = Success
ErrHandler:
    If Err.Number <> 0 Then MsgBox _
        "Position_Window_to_Cursor - Error#" & Err.Number & vbCrLf & _
        Err.Description, vbCritical, "Error", Err.HelpFile, Err.HelpContext
    On Error GoTo 0
End Function
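Find_UnLocked_Cell detects an off-screen selection by deliberately triggering error #91. A more explicit alternative in VBA is to test If Intersect(ActiveWindow.VisibleRange, Selection) Is Nothing Then. The rectangle test behind that check is modeled in Python below, purely for illustration. Ranges are simplified to (first row, first column, last row, last column) tuples, and the function names are hypothetical.

```python
# Python model of the visibility check (illustration only). A range is a
# (first_row, first_col, last_row, last_col) tuple; None plays the role of
# VBA's Nothing.

def intersect(a, b):
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    if top > bottom or left > right:
        return None
    return (top, left, bottom, right)


def needs_scroll(visible, selection):
    """True when no part of the selection is inside the visible window."""
    return intersect(visible, selection) is None


visible = (1, 1, 30, 12)  # hypothetical window: rows 1-30, columns A-L
print(needs_scroll(visible, (5, 3, 5, 3)))    # False: C5 is on screen
print(needs_scroll(visible, (45, 3, 45, 3)))  # True: C45 is off screen
```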
The last few posts covered the Worksheet_Change and Worksheet_SelectionChange events. Both rely on a function called Position_Cursor_In_Data. Position_Cursor_In_Data’s job is to jump over ‘locked’ cells and place the cursor in the next ‘unlocked’ cell. This prevents users from inadvertently changing things that won’t get updated, or in other words, from wasting their time.
Since the purpose of the routine is to jump over ‘locked’ cells, the routine first checks to see if the cursor has moved into a ‘locked’ cell. If the cursor is in an ‘unlocked’ cell there is nothing for the routine to do and so, it ends (exits).
Another excuse for the routine to end is if the user has selected a group of cells. This may be a prelude to a copy or paste command. I want to facilitate copy/paste commands so if more than one cell is selected, this routine doesn’t interfere.
There is one last excuse for this routine to end itself: the user used the mouse to navigate to a ‘locked’ cell. If the user really wants to position the cursor in a single ‘locked’ cell, it may also be for the purpose of copy/paste. It could also be because the user really doesn’t know what they’re doing. That’s okay. Remember that if the user tries to change ‘locked’ cells, Worksheet_Change will ‘undo’ their change and restore the ‘locked’ cell values. Neat, huh?
So the user has pressed a key and ended up in a ‘locked’ cell. The key to knowing where to jump is knowing which key the user pressed. If they pressed RIGHT, TAB, ENTER, DOWN, or PAGEDOWN, the system assumes they want the next ‘unlocked’ cell. If there is no ‘unlocked’ cell to the right, the routine searches the rows below, starting in the leftmost position and looking right for the next ‘unlocked’ cell. If they pressed LEFT, SHIFT TAB, UP, or PAGEUP, the system assumes they want the previous ‘unlocked’ cell. In that case, the search moves left and, if need be, up, starting in the last cell of the previous row and looking left. If nothing turns up in the preferred direction, the routine searches the other way.
Below is the code for Position_Cursor_In_Data. It relies on Find_UnLocked_Cell. That will be the topic of our next post.
Function Position_Cursor_In_Data(Cell As Range, _
                                 Entries As Range, _
                                 KeyPressed As String) As Boolean
' Description: Call this from Worksheet_SelectionChange to force
'              cursor positions inside the entry area
' Parameters:  Cell        Current cell or range selected by user
'              Entries     Range to restrict the cursor to
'              KeyPressed  Last key the user pressed
' Example:     bResult = Position_Cursor_In_Data(Target, Range("Data"), KeyPressed)
' Abstract:    If the cursor is moved to a locked cell via
'              keyboard, move the cursor to the next unlocked cell.
' Date      Init Modification
' 01/12/06  CWH  Initial Programming
    On Error GoTo ErrHandler
    Position_Cursor_In_Data = Success       'Assume the best
    ' If more than 1 cell is selected, don't do anything
    If Cell.Rows.Count > 1 Or Cell.Columns.Count > 1 Then Exit Function
    ' If the Cell is unlocked, we're done
    If Cell.Interior.Color <> CellLocked Then Exit Function
    ' From last key pressed, determine direction to search for an unlocked cell
    Dim sLocateMethod As String
    Select Case KeyPressed
        Case "Up", "PageUp", "Left", "ShiftTab"
            sLocateMethod = "Previous"
        Case "Down", "PageDown", "Right", "Tab", "Return"
            sLocateMethod = "Next"
        Case Else
            Exit Function
    End Select
    ' End looking for an excuse to leave early
    Settings "Save"         'Save current application settings
    Settings "Disable"      'Disable events, screen updates & calcs
    Dim bfound As Boolean
    Dim lRight As Long      'Last allowable column
    Dim lBottom As Long     'Last allowable row
    lRight = Entries.Column + Entries.Columns.Count - 1
    lBottom = Entries.Row + Entries.Rows.Count
    If sLocateMethod = "Next" Then
        'Search to the right on same row
        bfound = Find_UnLocked_Cell(Cell.Row, Cell.Row, Cell.Column + 1, lRight, 1)
        'Search rows below
        If Not bfound Then _
            bfound = Find_UnLocked_Cell(Cell.Row + 1, lBottom, 1, lRight, 1)
    End If
    'We're here either because there's nothing below,
    'or because we want to check previous
    'Search to the left on same row
    If Not bfound Then
        If Cell.Column > 1 Then _
            bfound = Find_UnLocked_Cell(Cell.Row, Cell.Row, Cell.Column - 1, 1, -1)
    End If
    'Search rows above
    If Not bfound Then
        If Cell.Row > 1 Then _
            bfound = Find_UnLocked_Cell(Cell.Row - 1, 1, lRight, 1, -1)
    End If
    'We're here because we looked previous & found nothing,
    'or there's just nothing here
    'Search to the right on same row
    If Not bfound Then _
        bfound = Find_UnLocked_Cell(Cell.Row, Cell.Row, Cell.Column + 1, lRight, 1)
    'Search rows below
    If Not bfound Then _
        bfound = Find_UnLocked_Cell(Cell.Row + 1, lBottom, 1, lRight, 1)
    If Not bfound Then Position_Cursor_In_Data = Failure
ErrHandler:
    If Err.Number <> 0 Then MsgBox _
        "Position_Cursor_In_Data - Error#" & Err.Number & vbCrLf & _
        Err.Description, vbCritical, "Error", Err.HelpFile, Err.HelpContext
    Settings "Restore"      'Restore application settings
    On Error GoTo 0
End Function
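The search order that Position_Cursor_In_Data follows can be modeled in Python. This includes its fallback to the opposite direction when the preferred one finds nothing. In this illustrative sketch, a grid of booleans stands in for the worksheet's 'locked' coloring, indices are 0-based, and the names are hypothetical.

```python
# Python model of the unlocked-cell search order (illustration only).
# locked[r][c] is True for a 'locked' cell; indices are 0-based.

def scan(locked, rows, cols):
    """Return the first unlocked (row, col), visiting rows, then cols."""
    for r in rows:
        for c in cols:
            if not locked[r][c]:
                return (r, c)
    return None


def find_unlocked(locked, row, col, direction):
    n_rows, n_cols = len(locked), len(locked[0])

    def forward():  # right on this row, then rows below from the left
        return (scan(locked, [row], range(col + 1, n_cols))
                or scan(locked, range(row + 1, n_rows), range(n_cols)))

    def backward():  # left on this row, then rows above from the right
        return (scan(locked, [row], range(col - 1, -1, -1))
                or scan(locked, range(row - 1, -1, -1),
                        range(n_cols - 1, -1, -1)))

    # The preferred direction first, the other one as a fallback.
    if direction == "Next":
        return forward() or backward()
    return backward() or forward()


grid = [
    [True, False, True, True],
    [True, True, True, False],
    [True, False, True, True],
]
print(find_unlocked(grid, 1, 1, "Next"))      # (1, 3)
print(find_unlocked(grid, 1, 1, "Previous"))  # (0, 1)
print(find_unlocked(grid, 2, 3, "Next"))      # (2, 1), via the fallback
```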
This post returns to controlling the cursor in update spreadsheets.
As the user positions the cursor on the spreadsheet, we want to have it jump over ‘locked’ cells to the next ‘unlocked’ cell. The words ‘locked’ and ‘unlocked’ are quoted because we aren’t exactly using Excel’s notion of ‘locked’ and ‘unlocked’ cells. Excel provides the ability to prevent the cursor from entering locked cells when you protect the worksheet. Unfortunately, protecting the worksheet also prevents other things such as copy/paste if the paste range touches locked cells. Excel’s documentation says you can selectively allow some things within a protected worksheet, but my experiments with this have frustrated me and I’ve never gotten it to work satisfactorily (Maybe smarter minds than mine will contribute to the discussion and show us the way).
Without worksheet protection, Excel has no problem letting users do whatever they want to ‘locked’ cells. To work around this, we use the Worksheet_SelectionChange event to monitor cursor movements. It calls our Position_Cursor_In_Data function to help the user stay in ‘unlocked’/‘open for entry’ cells. Position_Cursor_In_Data is also called from the Worksheet_Change event, and we will cover it shortly. But for now, let’s look at the Worksheet_SelectionChange event.
Below is the code for the Worksheet_SelectionChange event. Almost all of it is devoted to figuring out which key the user pressed. This is important because we need to know which way to ‘jump’. If the user pressed the right arrow, we want to jump to the first unlocked cell to the right. The same code is in the Worksheet_Change event, so I suppose it’s time I explained it.
At the heart of the code is an API called GetAsyncKeyState. GetAsyncKeyState is included in user32.dll. This Windows API tells us what the last key pressed was. Actually, it doesn’t do that. I wish it were that simple. But since groups of keys can be pressed simultaneously, such as the familiar Ctrl-Alt-Delete, the good folks at Microsoft created this API to tell you if a certain key is pressed or not. So if you want to determine if Ctrl, Alt, and Delete were pressed, you have to ask: “Is Ctrl pressed? And if so, is Alt pressed? And if so is Delete also pressed?” If you want more detail on this API, here are some good resources:
- Microsoft Developer’s Network Documentation: GetAsyncKeyState Function
- Chip Pearson’s: Testing Key States
- Answers.Com: GetAsyncKeyState
To use the API, we have to first declare it. I put this code at the top of modGeneral so it is available to all functions in my project.
'API Classes
' Get Key state (PtrSafe is required in 64-bit Office / VBA7)
#If VBA7 Then
Public Declare PtrSafe Function GetAsyncKeyState Lib "user32" _
    (ByVal vKey As Long) As Integer
#Else
Public Declare Function GetAsyncKeyState Lib "user32" _
    (ByVal vKey As Long) As Integer
#End If
Once declared, we can use it as shown in the code below. As you can see, we have to ask GetAsyncKeyState about one key at a time. We pass it the key we want to know about, and it returns a 16-bit number. If the most significant bit is on, the key is down at the moment of the call. &H8000 is the bit mask we use to test the most significant bit; in binary it is 1000000000000000. If you “AND” it with GetAsyncKeyState’s 16-bit result and the result is nonzero (which VBA treats as TRUE), the most significant bit is on and the key is pressed. Based on which key is pressed, we can determine which way to jump.
Here is the code for the Worksheet_SelectionChange event. It must be placed in the worksheet’s class module. The next post will be on the Position_Cursor_In_Data function.
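The bit test is easier to see with concrete values. This illustrative Python sketch mimics VBA's 16-bit signed Integer by masking with 0xFFFF. In VBA itself, the literal &H8000 is an Integer equal to -32768, so the And yields a nonzero value exactly when the high bit is set. The function names are hypothetical.

```python
# Python model of the &H8000 key-state test (illustration only).
KEY_DOWN_MASK = 0x8000  # binary 1000000000000000


def to_unsigned16(value):
    """Reinterpret a signed 16-bit Integer as its unsigned bit pattern."""
    return value & 0xFFFF


def is_key_down(state):
    """True when the most significant of the 16 bits is set."""
    return (to_unsigned16(state) & KEY_DOWN_MASK) != 0


# Sample return values: high bit, high and low bits, low bit only, none.
# (The low bit is separate and is not tested here.)
for state in (-32768, -32767, 1, 0):
    print(state, format(to_unsigned16(state), "016b"), is_key_down(state))
# -32768 1000000000000000 True
# -32767 1000000000000001 True
# 1 0000000000000001 False
# 0 0000000000000000 False
```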
Private Sub Worksheet_SelectionChange(ByVal Target As Range)
' Purpose: Restrict the user to areas open for update
    ' Determine the last key pressed
    Dim sKey As String
    If GetAsyncKeyState(vbKeyTab) And &H8000 Then
        If GetAsyncKeyState(vbKeyShift) And &H8000 Then
            sKey = "ShiftTab"
        Else
            sKey = "Tab"
        End If
    ElseIf GetAsyncKeyState(vbKeyRight) And &H8000 Then
        sKey = "Right"
    ElseIf GetAsyncKeyState(vbKeyLeft) And &H8000 Then
        sKey = "Left"
    ElseIf GetAsyncKeyState(vbKeyPageUp) And &H8000 Then
        sKey = "PageUp"
    ElseIf GetAsyncKeyState(vbKeyUp) And &H8000 Then
        sKey = "Up"
    ElseIf GetAsyncKeyState(vbKeyDown) And &H8000 Then
        sKey = "Down"
    ElseIf GetAsyncKeyState(vbKeyPageDown) And &H8000 Then
        sKey = "PageDown"
    ElseIf GetAsyncKeyState(vbKeyReturn) And &H8000 Then
        sKey = "Return"
    Else
        sKey = "Mouse"
    End If
    Position_Cursor_In_Data Target, Range(sData), sKey
End Sub
By guest contributor: Yoav Ezer
NOTE: This post provides an example spreadsheet: accelerating-excel.xlsm. Out of concern for your system’s security, macro-enabled spreadsheets cannot be stored in this blog. So to accommodate both security and the free exchange of ideas, we loaded the spreadsheet as a text file with a “txt” extension. To use this example, right-click the link and select “Save Target As”. Change the extension on the file name from .zip (it’s a text file that ITKnowledgeExchange has compressed for you) to .xlsm and click “Save”. Then scan your local copy for viruses and open it.
Do your Excel spreadsheets sometimes take too long to calculate? It may be due to formulas that crunch large sets of data (for example, data that comes from large databases – the focus of this blog). This is because Excel recalculates all formulas that depend on a specific cell every time you change that cell. And if those formulas have dependents, Excel will recalculate them, and their dependents, and so on, and so on.
Consider the formula =SUM(A:A). This adds all cells in column A, and it recalculates each time you update any cell in column A. Fortunately, the SUM function is very fast and may not cause significant delay even if used 1,000 times in your workbook. But more advanced functions, like SUMPRODUCT() and array formulas, are not so efficient.
For instance, the following array formula is pretty simple: =SUM(IF(MOD(A:A,2)=1,A:A,0)). It sums all odd numbers in column A, and it is much slower than the SUM function. I’ve used this formula only 12 times on Sheet1 of the sample workbook (accelerating-excel.xlsm, above). On my machine it takes 5 seconds to add a value to column A, which makes this workbook too slow to use. Fortunately for us, there are ways to make Excel work faster even with advanced formulas.
Strategy #1: Use Limited Ranges
The array formula evaluates so slowly because it calculates for every cell in column A (1,048,576 cells in Excel 2007 and later). One way to make this formula work faster is to limit the range. So instead of using this formula:
We can use this formula specifying only the rows needed:
NOTE: Excel adds curly brackets when you enter a formula using CTRL+SHIFT+ENTER. CTRL+SHIFT+ENTER tells Excel your formula is an array formula. For more information on array formulas and their power, see: Introducing Array Formulas in Excel by Colin Wilcox and John Walkenbach
Because the revised formula is limited to 10,000 rows instead of more than a million, it works about 100 times faster!
Strategy #2: Use Dynamic Ranges
Strategy #1 works as long as you know how many cells contain data. When you don’t know, you can still limit your ranges using a Dynamic Range. Dynamic Ranges expand automatically to include only cells that contain data. You can define a dynamic range called ‘ColumnA’ like this:
And then use it in the original formula in the following manner:
NOTE: See How to Set up a Named Range in Microsoft Excel if you need help with this.
This formula calculates only rows in column A that contain data. That’s good for two reasons: It reduces the number of cells calculated if your range contains fewer cells than anticipated; AND, it calculates cells that might otherwise be overlooked if your range contains more cells than anticipated. For more information on Dynamic Ranges see: Introduction to Dynamic Ranges.
To experience the performance difference between these two methods, open the sample file and update data on the first and second sheets. You’ll see a very palpable difference.
Strategy #3: Stopping/Starting Calculation
At times, even limiting the range used in the formula isn’t enough. One of our clients had a workbook with over 12,000 array formulas and although we used dynamic ranges to limit the range size in each of those formulas, the workbook took over a minute to update with 1,000 data rows. For that client we used the following technique.
The workbook was divided into a ‘data entry’ sheet and ‘data analysis’ sheets, which contained the array formulas. We used one simple macro to switch off automatic calculation whenever the user entered the ‘data entry’ sheet. A second macro recalculated all the formulas in the workbook when the user left that sheet. This way the user could update data very quickly and wait only once (when leaving the sheet). Here is the macro we used whenever the user entered the ‘data entry’ sheet:
Private Sub Worksheet_Activate()
    Application.Calculation = xlCalculationManual
End Sub
And this is the macro we used when the user left the sheet:
Private Sub Worksheet_Deactivate()
    Application.Calculate
    Application.Calculation = xlCalculationAutomatic
End Sub
You can see how stopping and starting automatic calculation affects performance in the sample file.
We use Microsoft Excel to improve productivity. We can improve productivity even more by removing unnecessary waits through writing efficient formulas and controlling when Excel does its magic! Look for opportunities to use these techniques to speed results and improve the user experience.
Do you have Excel optimization tips? Please share with us in the comments.
About the author
Yoav Ezer co-authors the technology and productivity blog Codswallop. He is CEO of Cogniview, producer of PDF2XL, a Native PDF to Excel Converter, PDF2XL OCR, a scanned PDF to Excel converter, and PDF2XL Enterprise, a universal format converter to Excel. For more Excel tips from Yoav, join him on: Facebook; or Twitter
Thanks Yoav for those great tips!
Updating databases demands discipline. Excel is about freedom. It’s what your users love about it. Even so, updating databases demands discipline and striking the right balance between freedom and discipline is key to making Excel a great tool for users and DBAs.
We’ve just spent most of this blog discussing ways to ensure users don’t get too free with their data. We not only made sure users didn’t enter bad data, but we went the extra mile and provided tools for helping them find what they need, like pop-up windows to search databases for the right User ID, Customer Code, General Ledger Account Number, Inventory Item, etc. We enforced restrictions and balanced that with tools to make getting it right easy.
We also exploited one of Excel’s great features not found in most database entry programs – cut and paste of multiple rows. This helps us pull data from existing sources (such as the web) and easily incorporate it into our systems in the format we need.
Cut and paste is greatly hampered by one of Excel’s methods designed to restrict users from entering data in places they shouldn’t. That method is known as Worksheet Protection. The idea behind Worksheet Protection is solid – prevent users from straying outside the entry area – allow them to change ONLY unprotected cells and nothing else. Unfortunately, when you try to paste a region that overlaps protected cells, Worksheet Protection rejects the entire paste – not just the cells intruding on protected regions. We avoided Worksheet Protection and overcame that ‘flaw’ in Microsoft’s implementation in the Worksheet_Change event. The price was letting users move anywhere on the worksheet, even to places well outside where entries are supposed to be made – outside where entries can be checked – outside where values can be updated to the database.
We need to add routines to control how the cursor moves from cell to cell in order to help the user stay in the entry region – to protect them from inadvertently typing data where their entries will be wasted.
The next posts will deal with this important aspect of database updates.
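Here is a rough illustration of the kind of routine meant here. It is not the code those posts will present. A Worksheet_SelectionChange handler can pull the cursor back whenever it lands outside the entry area. This sketch assumes a hypothetical name, EntryArea, that refers to one rectangular block on this sheet. The clamping arithmetic sits in a small helper, ClampLong (also hypothetical), so that it can be checked on its own.

```vba
' Hypothetical helper: limit a row or column number to the range lo..hi.
Private Function ClampLong(ByVal value As Long, ByVal lo As Long, ByVal hi As Long) As Long
    If value < lo Then
        ClampLong = lo
    ElseIf value > hi Then
        ClampLong = hi
    Else
        ClampLong = value
    End If
End Function

' Sketch only: assumes a name "EntryArea" (hypothetical) that refers to a
' single rectangular block of cells on this sheet.
Private Sub Worksheet_SelectionChange(ByVal Target As Range)
    Dim rngEntry As Range
    Dim lRow As Long
    Dim lCol As Long

    On Error Resume Next                        'Name may not exist on this sheet
    Set rngEntry = Me.Range("EntryArea")
    On Error GoTo 0
    If rngEntry Is Nothing Then Exit Sub
    If Not Intersect(Target, rngEntry) Is Nothing Then Exit Sub   'Already inside

'   Snap the active cell to the nearest cell inside the entry area
    lRow = ClampLong(ActiveCell.Row, rngEntry.Row, _
                     rngEntry.Row + rngEntry.Rows.Count - 1)
    lCol = ClampLong(ActiveCell.Column, rngEntry.Column, _
                     rngEntry.Column + rngEntry.Columns.Count - 1)

    Application.EnableEvents = False            'Don't re-enter this handler
    Me.Cells(lRow, lCol).Select
    Application.EnableEvents = True
End Sub
```

The handler leaves alone any selection that overlaps the entry area, even partly, so multi-cell selections for copy/paste stay usable.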
Last post we discussed the theory behind the Worksheet_Change event code below. This code must go in the worksheet’s own code module, because it only responds to events on the worksheet that contains it. Here is the code:
Private Sub Worksheet_Change(ByVal Target As Range)
'   Purpose:    Invoke routines to set/check the contents of entry cells
'   Parameters: Target      Range that was changed
'   Example:    None - this is an event handler

    On Error GoTo ErrHandler

    Dim bResult As Boolean

    Settings "Save"                     'Save current application settings
    Settings "Disable"                  'Disable events, screen updates, & calcs

'   Determine the last key pressed
    Dim sKey As String
    Dim lRow As Long
    Dim lCol As Long
    If GetAsyncKeyState(vbKeyTab) And &H8000 Then
        If GetAsyncKeyState(vbKeyShift) And &H8000 Then
            sKey = "ShiftTab"
            lCol = lCol - 1
        Else
            sKey = "Tab"
            lCol = lCol + 1
        End If
    ElseIf GetAsyncKeyState(vbKeyRight) And &H8000 Then
        sKey = "Right"
        lCol = lCol + 1
    ElseIf GetAsyncKeyState(vbKeyLeft) And &H8000 Then
        sKey = "Left"
        lCol = lCol - 1
    ElseIf GetAsyncKeyState(vbKeyPageUp) And &H8000 Then
        sKey = "PageUp"
        lRow = lRow - 1
    ElseIf GetAsyncKeyState(vbKeyUp) And &H8000 Then
        sKey = "Up"
        lRow = lRow - 1
    ElseIf GetAsyncKeyState(vbKeyDown) And &H8000 Then
        sKey = "Down"
        lRow = lRow + 1
    ElseIf GetAsyncKeyState(vbKeyPageDown) And &H8000 Then
        sKey = "PageDown"
        lRow = lRow + 1
    ElseIf GetAsyncKeyState(vbKeyReturn) And &H8000 Then
        sKey = "Return"
        lRow = lRow + 1
    Else
        sKey = "Mouse"
    End If

'   Disallow all total column oriented actions
    If Target.Rows.Count = Me.Rows.Count Then
        Application.Undo                'Throw out the column change
        MsgBox "You may not paste, delete, or insert columns", _
            vbInformation, "Column changes not allowed"

'   Allow without checking all total row oriented actions that are
'   below the header row. Execute the code below for all else.
    ElseIf Target.Columns.Count < Me.Columns.Count _
        Or Target.Row <= Range(sData).Row Then

    '   Handle pasting 'locked' cells
    '   Remember Current Cursor Position
        Dim rngSelection As Range
        Set rngSelection = Selection

    '   Restrict Target to appropriate range
        Set Target = Intersect(Target, _
            Rows(Range(sData).Row + 1 & ":" & Me.Rows.Count))

    '   Remember Target
        Dim rngCell As Range
        Dim colAddress As New Collection
        Dim colValue As New Collection
        If Not Target Is Nothing Then
            For Each rngCell In Target
                colAddress.Add rngCell.Address
                colValue.Add rngCell.Value
            Next rngCell
            Application.Undo            'Undo Changes

        '   Repaste values to unlocked cells as unlocked cells
            Dim i As Long               'Long: large pastes overflow an Integer
            For i = 1 To colAddress.Count
                If Not Range(colAddress.Item(i)).Locked Then _
                    Range(colAddress.Item(i)) = colValue.Item(i)
            Next i
            rngSelection.Select         'Restore Selection

            If NameExists(sData) Then
                bResult = Set_Entry_Defaults(Target, sData, sFields)
                If bResult = Success Then Check_Entry Target, sData, sFields
                Format_New_Line sData, sFields
                If bResult <> Success Then Set Target = Selection
            '   Move the cursor based on sKey. The routine that receives
            '   these arguments is not named in this listing.
                Target.Cells(1 + lRow, 1 + lCol), Range(sData), sKey
            End If
        End If
    End If

ErrHandler:
    If Err.Number <> 0 Then MsgBox _
        Me.Name & ".Worksheet_Change - Error#" & Err.Number & vbCrLf & _
        Err.Description, vbCritical, "Error", Err.HelpFile, Err.HelpContext
    On Error Resume Next
    Settings "Restore"                  'Restore application settings
    On Error GoTo 0
End Sub
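The key-detection block calls the Windows API function GetAsyncKeyState, so the project must declare that function somewhere. The declaration is not part of this listing. If your project does not already declare it, you can put a declaration like the first lines below at the top of the sheet module. API declarations in a sheet or class module must be Private, and 64-bit Office requires PtrSafe. The high bit of the return value (&H8000) is set while the key is held down, and that is the bit each test above checks. KeyOffsets is a hypothetical helper that shows the same key-to-offset mapping as a Select Case, which may be easier to extend than the ElseIf chain.

```vba
#If VBA7 Then
    Private Declare PtrSafe Function GetAsyncKeyState Lib "user32" (ByVal vKey As Long) As Integer
#Else
    Private Declare Function GetAsyncKeyState Lib "user32" (ByVal vKey As Long) As Integer
#End If

' Hypothetical helper: translate the exit key recorded in sKey into the row
' and column offsets used with Target.Cells(1 + lRow, 1 + lCol).
' "Mouse" (or any unrecognized key) leaves both offsets at zero.
Private Sub KeyOffsets(ByVal sKey As String, ByRef lRow As Long, ByRef lCol As Long)
    lRow = 0
    lCol = 0
    Select Case sKey
        Case "Tab", "Right"
            lCol = 1
        Case "ShiftTab", "Left"
            lCol = -1
        Case "Down", "PageDown", "Return"
            lRow = 1
        Case "Up", "PageUp"
            lRow = -1
    End Select
End Sub
```

Inside Worksheet_Change, once sKey is known, the call would be KeyOffsets sKey, lRow, lCol.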
There are two situations in which the system checks the user’s entries:
When the user changes something, and
When they want to post entries to the database (that is, add, change, or delete data).
Last post covered the second situation. This covers the first.
Excel provides a rich user interface. That’s why people like it. They can do just about anything. They can move anywhere on the spreadsheet. Select things. Move things. Copy things. Paste things. Delete things. Insert things.
As developers, we like things controlled. We want to limit what the user can and cannot do. We want to guide them carefully through data entry. We want to check every entry they make. Control is in direct opposition to the freedom Excel offers, the freedom users love, yet control is required to prevent data corruption. Striking a balance between control and freedom is tough. It’s taken me a while to find a balance between the two, and the heart of that balance lies in the Worksheet_Change event.
Here is the basic flow of this routine:
If an entire column is changed (moved, pasted, inserted, or deleted), the change is thrown out.
If an entire row is changed and it is in the target entry area, the change is accepted without checking.
If a cell is locked, any change to it is removed and the original cell value is restored.
Changes that get through these rules then go through six steps:
First: The system attempts to conform the entry to the rules implemented in Set_Entry_Defaults. For example, it converts text to upper case when the validation columns of the Field Definitions Table call for it.
Second: If this is the first entry in a new row, the system attempts to apply default values for every column. For example, it automatically puts today’s date in an “ENTERED” column and the user’s ID in an “ENTERED BY” column.
Third: If a single cell is changed and the Field Definitions Table indicates it is checked by one of the Table Validation rules (XLC, XLT, CUST), the system checks the value using that rule. If the value fails validation, the system displays a Pop-Up validation/selection window to help the user select something appropriate. An example is a Pop-Up validation/selection window showing valid Country Codes for a Country Code entry.
Fourth: If a code or ID entry has associated values that need to be displayed in the entry row, the system attempts to retrieve and display them. An example is the Country Name displayed next to the Country Code.
Fifth: The system performs a final check.
Sixth: The system attempts to move the cursor to the next appropriate field or record based on the key used to exit the cell.
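The first rules in this flow come down to the shape of the changed range. As a sketch, the If/ElseIf test at the top of Worksheet_Change can be restated as a pure function over plain numbers. That makes the decision easy to check without a live worksheet. ClassifyChange and its return strings are hypothetical names used only for illustration.

```vba
' Hypothetical restatement of the branch test in Worksheet_Change.
'   targetRows/targetCols  size of the changed range
'   firstRow               row of the changed range's top-left cell
'   sheetRows/sheetCols    Me.Rows.Count / Me.Columns.Count
'   headerRow              row of the entry area's header (Range(sData).Row)
Private Function ClassifyChange(ByVal targetRows As Long, ByVal targetCols As Long, _
                                ByVal firstRow As Long, ByVal sheetRows As Long, _
                                ByVal sheetCols As Long, ByVal headerRow As Long) As String
    If targetRows = sheetRows Then
        ClassifyChange = "RejectColumn"     'Whole column(s): thrown out
    ElseIf targetCols = sheetCols And firstRow > headerRow Then
        ClassifyChange = "AcceptRow"        'Whole row(s) below header: no checks
    Else
        ClassifyChange = "Check"            'Everything else: repaste, default, validate
    End If
End Function
```

Note the second branch. A whole-row change is accepted unchecked only when it starts below the header row. Whole rows at or above the header still go through the checks.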
And here is a partial list of how the user might change things and how this routine handles each circumstance:
Typing – The most common change is from the user simply typing into a cell, in which case, everything works as it should and nothing special happens.
Copy/Paste – This is another common situation, and one where Excel offers a great benefit. Sometimes you have data in another spreadsheet, in a Word document, or on a web page. If only you could highlight it, copy it, and paste it into a table that would upload to the database, complete with full validation. Wouldn’t that be great! Well, this routine handles that, but in an odd sort of way. The problem is that cells copied from other places are pasted, by default, as locked cells, and locked cells can’t be changed by typing, nor are they validated.
Another problem with copy/paste is that nothing prevents users from pasting into areas outside the entry area. (I do NOT use worksheet protection, so that copy/paste works properly in rows that have some locked cells between fields.) So when cells are pasted, the system 1) remembers the pasted cells’ values, 2) undoes the paste, 3) eliminates cells outside the entry area, and 4) carefully pastes values (no formats or lock states) from the pasted cells into unlocked cells only.
This method works great – but adds processing that slows entry. If your PC is slow, or your users simply never use copy/paste, you can speed data entry by eliminating this capability from your spreadsheet.
Inserting rows - This is allowed but not checked. It will, of course, be checked when the entry is posted.
Moving Rows – This is allowed but has no impact on the database. If you want to allow users to ‘sequence lines’, you need to provide a ‘sequence’ field and handle sequencing in code.
Deleting Rows – This is allowed but has no effect on the database. Users should be warned to use the “D” in the ACD column to delete rows in the database.
Inserting columns – If a user wants more fields, they MUST negotiate with the developer (you) to accommodate everything else that goes with that. If they attempt it anyway, a message is displayed and the insert is removed as though nothing ever happened. That’s the way it should be.
Deleting columns – This is not allowed. If they attempt it, a message is displayed and the deleted column is restored as though nothing ever happened. That’s the way it should be.
Moving Columns – This is not allowed. If a user wants the column order changed, for whatever reason, they should see you. You can easily change the Field Definitions Table to accommodate such requests. It is also possible to code ‘formats’ to accommodate different users or different ‘line types’ within a record, but that requires YOU to code. If a user attempts to move a column, the column is returned to its proper position. This is as it should be.
Inserting cells – This really makes no sense in a traditional row-oriented data entry scenario. If it is attempted, the inserted area is blanked out and the displaced cells are returned. If the Field Definitions Table marks the blanked cells as required, the record(s) will fail validation, and the user must set things right before the system will update the database.
This is an opportunity for improvement, but my cynical nature is holding me back. If a user is really trying hard to muck things up (or really has that poor judgement), I don’t mind if the system makes it hard for them to set things right.
Deleting Cells – Once again, the user is attempting something that doesn’t make sense in a traditional row oriented database. And once again, this is an opportunity for improvement.
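For the column rules above, one way to notice that a column has been moved or inserted is to compare the header captions with the order the Field Definitions Table expects. HeadersMatch below is a hypothetical sketch of that comparison on one-dimensional arrays. Reading the expected captions from the table and repairing the sheet are left to your own code.

```vba
' Hypothetical helper: True when the captions in actual() appear in exactly
' the order listed in expected(). Both must be one-dimensional arrays;
' the comparison ignores case.
Private Function HeadersMatch(ByVal actual As Variant, ByVal expected As Variant) As Boolean
    Dim i As Long
'   Different column counts can never match
    If UBound(actual) - LBound(actual) <> UBound(expected) - LBound(expected) Then Exit Function
    For i = 0 To UBound(expected) - LBound(expected)
        If StrComp(CStr(actual(LBound(actual) + i)), _
                   CStr(expected(LBound(expected) + i)), vbTextCompare) <> 0 Then Exit Function
    Next i
    HeadersMatch = True                         'Every caption matched in order
End Function
```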
That’s the theory behind the Worksheet_Change event code. In our next post, we’ll discuss the code.