Make your own web scraper apps!
A web scraper copies data displayed by web pages into, in this case, Excel. WWWHelper.xlsm is an XL workbook that creates other web scraper apps without coding. WWWHelper.xlsm can be used readily by normal Excel users but it is intended to teach VBA Developers about how to integrate the web with Excel.
Many people don’t realize getting web tables into XL has been simple enough with XL’s Get External Data option and now with PowerQuery. As long as the web data is in an HTML table accessible from a static URL, and does not need to merge with other data, XL’s native tools work really well. When web data does not meet these criteria, this tool works better and easier.
My customer requests are increasing for web data in XL. The data often resides in multiple web pages and may require populating and submitting web forms to get the needed data. I use this tool to automate web scraping and overcome XL’s native limitations. Example uses include:
- Stock values
- Exchange Rates
- CPU Benchmarks
- Applicant data from career sites
- Government carrier safety statistics
- Phone number telecoms for texting from Outlook
- CRM data from social media sites
- Fantasy football player statistics
- And more…
To meet customer demand I created a tool to simplify scraping web elements into XL’s tables, or ranges. Simplifying this task is critical because there are no guarantees that methods used to pull data from a particular website today will work tomorrow. We are completely at the website designers’ whim. The importance of being able to maintain web scraping apps quickly and easily cannot be overstated.
While this tool uses VBA, we don’t need to know VBA to use it. This tool uses normal XL tables to hold paths that map where our web data is and how to get it into XL. Thus, the only skill we need is the ability to enter data into normal XL tables.
For those with VBA skills, this tool provides a wealth of code and examples that make creating web scrapers quick and easy. Included are functions to populate and submit web forms opening up websites that require sign-ons or input parameters for selecting data – such as a Google search.
ITKnowledgeExchange readers can get a copy of this workbook by emailing me at: Craig_Hatmaker At Yahoo Dot Com
My users wanted an easy way to get a current phone directory. The previous method had the receptionist type up new lists, print them and distribute them. Not all of the old printed lists were trashed resulting in multiple versions of the directory scattered throughout the business. No one knew which was current, or even if a current listing was available.
Since our new IP Phone system is integrated with Active Directory, I created a simple Excel workbook that queries LDAP and lists all active users with phone extensions. Now, whenever anyone wants a current directory, they just open the Excel app and click a button. If my users want to save trees, they can chose to not print it. They can sort it, filter it, and re-arrange it. Below is the code.
This LDAP query uses ADO and LDAP’s very restricted form of SQL. If you know SQL, you can read and understand the SQL String below.
NOTE! This routine uses BXL Error Handling described here: https://dl.dropbox.com/u/13737137/Starters/Error_Handling.pdf
Public Sub GetUsers()
' Description:Get User names for a Domain ' Inputs: *None ' Outputs: *None (tblUsers) ' Requisites: *None ' Example: GetUsers ' Notes: https://msdn.microsoft.com/en-us/library/ms684291(v=vs.85).aspx
' Date Ini Modification ' 12/30/13 CWH Initial Programming
' Declarations Const cRoutine As String = "GetUsers" Dim oRootDSE As Object 'RootDSE Dim sDN As String 'Domain Name Dim oCN As Object 'ADO Connection Dim oRS As Object 'ADO RecordSet Dim sSQL As String 'SQL Request String Dim n As Long 'Generic Counter Const sTable As String = "tblUsers"
' Error Handling Initialization On Error GoTo ErrHandler ' Initialize Variables ' Determine DNS domain name. Set oRootDSE = GetObject("LDAP://RootDSE") sDN = oRootDSE.Get("defaultNamingContext") ' Procedure ' Delete table if it exists If Not IsError(Evaluate(sTable)) Then With Evaluate(sTable) If .ListObject Is Nothing Then _ .Delete Else _ .ListObject.Delete End With End If ' Get Data sSQL = "SELECT ipPhone, sn, givenName " & _ "FROM 'LDAP://" & sDN & "' " & _ "WHERE objectClass='Person' " & _ " And proxyAddresses='*' " & _ " And userAccountControl <> 514 " & _ " And userAccountControl <> 546 " & _ " And ipPhone > 0 " Set oCN = CreateObject("ADODB.Connection") Set oRS = CreateObject("ADODB.Recordset") oCN.Provider = "ADsDSOObject" oCN.Open "Active Directory Provider" oRS.Open sSQL, oCN With wksUsers .Activate n = .UsedRange.SpecialCells(xlCellTypeLastCell).row + 2 With .ListObjects.Add(SourceType:=xlSrcQuery, _ Source:=oRS, _ Destination:=.Cells(n, 1)) .QueryTable.Refresh .Name = sTable ' Format Table .ListColumns(1).Name = "First Name" .ListColumns(2).Name = "Last Name" .ListColumns(3).Name = "Extension" .HeaderRowRange.Style = "Accent3" .Range.EntireColumn.AutoFit ' Sort Table .Sort.SortFields.Add .ListColumns(1).Range(1), _ Excel.XlSortOn.xlSortOnValues, _ Excel.XlSortOrder.xlAscending .Sort.Apply ' Freeze Panes ActiveWindow.FreezePanes = False .HeaderRowRange.Cells(2, 1).Select ActiveWindow.FreezePanes = True End With End With
ErrHandler: Select Case Err.Number Case Is = NoError: 'Do nothing Case Else: Select Case DspErrMsg(cModule & "." & cRoutine) Case Is = vbAbort: Stop: Resume 'Debug mode - Trace Case Is = vbRetry: Resume 'Try again Case Is = vbIgnore: 'End routine End Select End Select
We don’t need MS Access to create MS Access databases. Excel’s VBA can create them. Here is sample code:
Dim oCatalog As Object
Set oCatalog = CreateObject("ADOX.Catalog")
oCatalog.Create "provider='Microsoft.ACE.OLEDB.12.0';" & _
Run this code and a new MS Access database called NewDB.accdb will be created, ready for new tables and new data.
Line one declares a generic object variable oCatalog .
Line two uses the CreateObject method to turn our oCatalog object into the specified class: “ADOX.Catalog”
NOTE! This is an example of “Late Binding”. Late binding allows us to create objects without adding reference libraries. We lose Intellesense but or code is more portable and robust. See reference link for more on late binding.
NOTE! I’m using ADOX, not ADODB. ADOX is an extension to ADO that supports creating databases.
Line Three executes the catalog’s create method using an appropriate connection string
Create Object: https://msdn.microsoft.com/en-us/library/office/aa220083(v=office.11).aspx
Late Binding: https://sites.google.com/site/beyondexcel/project-updates/untitledpost
Microsoft.ACE.OLEDB Download: http://www.microsoft.com/en-us/download/details.aspx?id=13255
Create Catalog Method: https://msdn.microsoft.com/en-us/library/windows/desktop/ms676498(v=vs.85).aspx
Connection Strings: https://www.connectionstrings.com/ace-oledb-12-0/
|Drilldown PivotCharts||Add double click drilldown to PivotCharts|
|Drilldown Results Management||Use just one worksheet for all drilldown requests to end drilldown clutter|
|Embedded Chart Expand/Restore||Expand small embedded PivotCharts to fullscreen and restore them to their original position by double clicking|
Executive Dashboards typically have several small graphs on a single page. These graphs are intended to provide a quick overview of a company’s key performance indicators (KPIs). Of course, when executives see something that doesn’t look right, they immediately want a closer look to confirm the graph is really displaying something odd.
The Big Picture
With this add-in, double click a chart to display it fullscreen and get a closer look. Once the odd condition is verified, executives want to know why their KPI is off. To answer that question we need a dive deeper into supporting data.
With this add-in, double click a graph’s element/series to display supporting data. Oddly enough, XL’s pivottables support drilldown but XL’s charts don’t. They should. This add-in adds in what Microsoft left out.
Keep it Clean
While XL’s PivotTables support drilldown each drilldown request creates a new results worksheet. The more drilldown requests we make, the more clutter we create. With this add-in, results are directed to a single drilldown worksheet. Make as many requests as you like. This add-in keeps things tidy.
Like all BXL add-ins, Drilldown is free and the code unprotected. My purpose is to raise awareness within the developer community as to the power of Excel and VBA, and to help raise VBA developer skills. So use this to up your skills, or just use it. Get it here (see link below).
|Download the PDF at: https://dl.dropboxusercontent.com/u/13737137/Projects/Drilldown/Drilldown.pdf|
Excel’s grid is perfect for some types of data entry, but in its startup mode, Excel is too wide open. For data entry we need to restrict entries to specific cells and specific values. Excel’s answer is Worksheet Protection and Data Validation. For me, Excel’s Worksheet Protection falls short.
Problem #1: Restricting Entries to Specific Cells.
Besides being cumbersome to setup, Excel’s Worksheet Protection prevents inserting or adding table rows. Excel’s grid is perfect for repeating rows of data which are required by Journal Entries, Time Sheets, Rate Tables and more. Preventing table row adds cripples Excel’s ability to work with these data sets.
Problem #2: Disabling Features
Excel’s Worksheet Protection also disables some of Excel’s wonderful features such as table sorting and filtering.
Problem #3: Data Validation and Conditional Formatting
Excel’s Worksheet Protection does nothing to prevent copy/paste operations from wiping out Data Validation and Conditional Formatting. Both of these features are absolutely critical to data entry. Good data entry apps MUST restrict entries to valid data and MUST convey errors which is normally done by text and highlighting cells in red or yellow.
This free add-in was designed to address these problems.
Easy Setup – Just format data entry cells with the INPUT style. INPUT, BAD, GOOD and NEUTRAL are all considered input styles. All other styles, including NORMAL are protected.
Table Rows – Cursor Control add’s rows to the bottom of tables when the user selects the last row. It also allows inserting table rows.
Sorting and Filtering – Cursor Control preserves almost all of Excel’s features such as sorting and filtering
Data Validation – Cursor Control prevents copy/paste operations (or autofill) from wiping out data validation.
Cursor Control is available as a free Excel add-in: Cusor.xlam. As an add-in it can be applied to any worksheet without VBA. If you prefer tighter control, Cusor.xlam’s source code is unprotected permitting its code to be copied directly into other projects. This PDF explains how to: download Cursor Control;; add it as an add-in; use it; and directly incorporate its code it into any project.
Download the PDF at: https://dl.dropboxusercontent.com/u/13737137/Projects/Input/Cursor.pdf
Add a ribbon like interface to your Excel project quickly and easily with this free add-in. And for developers, source code is included and open to inspect or modify.
What is the Dynamic Toolbar?
The Dynamic Toolbar is a floating UserForm configured by entries in an XL table. It looks and acts very much like the MS Office Ribbon interface but requires a fraction of the time, effort, and skill to produce. If you need a Ribbon Interface your choices are Dynamic Toolbar or:
Paste Buttons on Worksheets
From Developer tab find Controls group and use Insert to paste a controls to the worksheet, add text, set some properties and assign a macro. It’s easy but takes up worksheet space and limits our user interface to one worksheet or we must replicate all controls to other worksheets. It looks less professional.
The Office Custom UI Editor
This free standalone application modifies any MS Office Ribbon. It uses existing MS Office icons. It looks very professional but requires ribbon configuration using XML in a separate application. We cannot use it to modify the Ribbon from within XL.
AJP’s RibbonX Visual Designer
Andy Pope’s add-in is free and provides a graphical editor. It is a welcome step up from editing XML directly.
Spreadsheet1’s add-in is free for trial use.It does everything The Office Custom UI Editor does with VBA instead of XML and from within XL so our interface can change while our XL apps are running. This is fantastic if you need the power and if you have the skills.
Dynamic Toolbar adds a customizable floating Ribbon-like interface for XL applications like PapaGantt’s toolbar shown above. Dynamic Toolbar is by far the easiest to learn and use because we configure it using normal Excel tables instead of XML or VBA; and because it requires only a handful of properties. It also shares Ribbon Commander’s ability to change the toolbar while our app is running (although, Dynamic Toolbar only works with Excel where as Ribbon Commander works with all MS Office apps).
Dynamic Toolbar cannot do all the MS Office Ribbon does and cannot tap directly into MS Office icons, but it does enough of what the MS Office Ribbon does; is easy to add custom icons to, and is instantly recognizable so users know how to use it. Best of all, it can be assembled with a fraction of the time, effort and skill of alternatives. It also happens to be a great way to learn about adding, placing and manipulating all kinds of controls on a UserForm through VBA.
Start by downloading the user’s guide (pdf).
Learn to convert cross tab data into tabular data to which we can apply PivotTables,
Filters, and Sorts.
A Common Problem
The bottom table at right is an example Cross tab table. It looks great and is easy to add data to when small. But even when small, try summingTerritory sales from it.
If we are good with formulas, we rate this a trivial task until we consider, “what happens when we add a month?” Suddenly traditional formulas start to get a little messy. And what happens when we finish the year? Will we extend the table and re-label all columns to include year and month? Or will we create a new worksheet for the next year and start this table all over again? Hmmm… what would a total formula across multiple worksheets look like then?
Again, those of us with formula skills know this is possible. Good for us. But why go through all that effort when PivotTables sum data without formulas? No matter how quick we are with formulas, we can create PivotTables over tabular data in a fraction of the time with superior results. PivotTables provide slicers, drilldown, PivotCharts and a whole lot more.
The benefits of well-structured data far outweigh the aesthetics of cross tab formats. That does not mean we have to abandon aesthetics when aesthetics count because we can present tabular data in PivotTables easily.
The UnPivot add-in makes converting cross tab data to PivotTable friendly, Auto-Filter friendly, Dynamic Form friendly, formula friendly, database friendly tabular formats quick and effortless. Plus, UnPivot provides a bonus feature that allows us to use cross tabs as input forms and append data entered into these forms to our master table so we can perform XL miracles over tabular data AND have the ease of entry associated with cross tab data..
Here’s how to get UnPivot. And yes, it’s free and the code totally unprotected so those new to VBA can learn from it or modify it as needed.
|Download the PDF at: https://dl.dropboxusercontent.com/u/13737137/Projects/UnPivot/UnPivot.pdf|
Want a data entry form that creates itself?
Forms make presenting and changing information in remote tables possible. Dynamic Forms make creating those forms effortless. Read on to learn how to create forms that design themselves to accommodate any XL table complete with data validation. The user just selects a row, presses SHIFT-CTRL-D, and the form creates itself instantly.
Why I Did This
I created an app to design my kitchen. The app uses XL’s shapes to draw floor plans and cabinet positions. Each shape had to be precisely drawn and positioned which required entering dimensions into a table. But having the table on the drawing’s worksheet was clumsy and ugly. I needed to work with shapes on one worksheet and enter dimension in a table on a different worksheet at the same time. That requires a form.
Options 1 – Create UserForm Manually
I considered creating a form the old fashion way:
- manually adding labels and text boxes
- setting each control’s properties
- coding routines to move data between the table and the controls
- coding validation routines
For the particular table I was working with, that meant 21 labels, 20 text boxes, 1 combo box, and 2 command buttons. That seemed like a lot of duplicate work considering the table had the data validation already.
Option 2 – Use XL’s List Form
Since 2003, XL has had an option to create dynamic forms over lists using a single line of code (see: Add Forms to Edit Tables). Unfortunately XL’s dynamic form has two shortcomings: it only works with the active worksheet and it can’t start on a specified table row.
Option 3 – Create a Dynamic Form
I wanted a method that:
- created the form for me using the table’s data validation, column headings, and cell protection
- worked over tables not necessarily on the same worksheet
- worked only on rows I specified.
Solution – frmData
frmData is a blank user form with:
- OK and Exit command buttons
- a textbox for user messages
- code to add all other controls based on the table sent to it.
frmData is really all I needed but I wanted more.
Enhancement #1 – clsInpMsg
I also wanted the data validation’s input message displayed when editing a table’s cell. For that I created class clsInpMsg. clsInpMsg responds to labels, textboxes, and combo boxes being selected or the mouse hovering over them and puts the associated cell’s data validation input message in frmData’s user messages textbox.
Enhancement #2 – clsForm
I also wanted the form to be a little more dynamic and fun than the standard drab gray user form. So I included class clsForm. clsForm adds a worksheet’s theme colors and “glowing” effects for command buttons when the mouse hovers over them. If you want normal, just remove two lines from frmData and (optionally) remove clsForm.
DynamicForm.xlam is an Add-In with all forms and classes assembled together for easy workbook integration.
AddInDemo.xls demonstrates the DynamicForm Add-in and shows how to dynamically install/uninstall add-ins when projects open/close.
Want it? Get it.
Here is a PDF explaining everything: DynamicForm.PDF
Introducing a new approach to Gantt Charts in XL – BabyGantt.xls
This type creates a stacked bar chart with the first series being the duration from project start to task start. This series’ fill color is set to “No Fill” so it appears that the second series – the actual task – is floating by itself. It is a clever and effective means to small Gantts – about 30 or less tasks. It can leverage XL2007 and later’s improved aesthetics.
This type uses spreadsheet cells and conditional formatting to make colored cells into Gantt Chart bars. This method can accommodate many more tasks but also creates a lot of formulas and calculations.
BXL’s Baby Gantt uses shapes and VBA. It has the advantage of Conditional Formatting’s “many tasks” without the formula overhead. It also shares the Bar Chart’s improved graphics capabilities. But best of all, there’s no setup – at least – not for the chart.
All Gantt Charts require tasks and dates. But that’s all Baby Gantt requires us to enter. We don’t need to add and configure charts or enter and autofill formulas across massive ranges. The VBA in Baby Gantt reads our task entries and generates the graphics on the fly. And because it is Excel, we can use formulas to calculate task start and end dates.
BabyGantt.xls can be downloaded from my DropBox. Here is a link to the instructions:
The instructions include:
User Instructions – Step by Step guides on how to enter your project with examples
Download Instructions – Links to a fully functioning, totally free, unprotected BabyGantt.xls
Technical Documentation – All of the code with explanations and all data validation entries with screenshots.
Getting data from IBM’s iSeries into XL is normally no different than other platforms. There are exceptions. The iSeries’ multi-member files, such as source files, are problematic for SQL. So are data areas. This post deals with simple Data Areas.
To read simple Data Areas from SQL we need a little help from IBM’s iSeries. We need a “stored procedure” which we must create, which means we must program on the iSeries. This post assumes you have access to your iSeries via a “green screen”. If you don’t know what I’m talking about, bale now because this won’t make any sense to you.
I’m going to use an example to help explain things. In this example imagine we have a data area containing a numeric ID. The steps to create this stored procedure are:
- Code CL source to pass parameters and retrive data area’s contents
- Create a CL Module from the code source
- Create a Service Program from the CL Module.
- Code an SQL Script to create a stored procdure using the Service Program
- Run the SQL script
The first step is to write the CL source. Let’s put it in member GetMyData in QGPL/QCLSRC. Here is the example:
PGM PARM(&ID) DCL VAR(&ID) TYPE(*DEC) LEN(10 0) RTVDTAARA DTAARA(MYDATA) RTNVAR(&ID) ENDPGM
The next step is to create the CL Module. At a command prompt type:
CRTCLMOD MODULE(QGPL/GetMyData) SRCFILE(QGPL/QCLSRC)
After the above executes succesfully, type this command:
CRTSRVPGM SRVPGM(QGPL/GetMyData) MODULE(QGPL/GetMyData) EXPORT(*ALL)
Next we write our SQL Script. Create source member GetMyData in QGPL/QDDSSRC:
Create Function GetMyData() Returns DECIMAL(10, 0) Language CL Specific GetMyData Deterministic No SQL Returns Null on Null Input No External Action Not Fenced External Name 'QGPL/GetMyData(GetMyData)' Parameter Style SQL;
Lastly, we run the SQL script from a command line:
RUNSQLSTM SRCFILE(QGPL/QDDSSRC) SRCMBR(GetMyData)
Now all that is left to do is test it. Using command STRSQL to start an SQL session, run this SQL statement:
Select GetMyData() From SYSIBM/SYSDUMMY1
SYSIBM/SYSDUMMY1 is a special one record file we can use for this sort of thing.
As you can see, there’s not a lot of work involved once you know how. And if you look carefully, you can see many possibilities for doing things far beyond reading simple data areas!
- IBM iSeries Redbooks: http://www.redbooks.ibm.com/portals/systemi
- A Fast Path to AS/400 Client/Server Using AS/400 OLE DB Support: http://publib-b.boulder.ibm.com/abstracts/sg245183.html?Open
- Stored Procedures, Triggers, and User-Defined Functions on DB2: http://www.redbooks.ibm.com/redbooks/pdfs/sg246503.pdf