Great Power, Great Responsibility
Microsoft Excel is undoubtedly an extremely powerful, compelling application and has often been described as a killer app for the Windows platform. It is no wonder as the spreadsheet paradigm is certainly the original killer app, and built on top of this foundation is a powerful array of functionality to extend and customize the Excel calculation engine. However, with great power comes great responsibility. An unfortunate, but common, characteristic of Excel’s vast customer base is complex proprietary systems, built up and modified over many years, that are a constant source of anxiety for those unfortunate souls who are tasked with their maintenance.
So what is it about Excel that nurtures these sometimes horrific, ethereal and interdependent creations? What is it that compromises the robustness and recoverability of these solutions?
Excel is often used to consume information, to model complex systems with that information, and then to publish that information to some kind of external repository, where it can be viewed or consumed by other systems.
To import data into Excel we could use:
DDE links
Long since deprecated, nevertheless still supported and often used to stream realtime updates into Excel. The DDE technology is based on data attached to Windows messages and passed between processes, with a clearly defined protocol for requests and responses. Excel acts as a DDEclient when the appropriate links are inserted into Excel, the links themselves rely on the DDE server for flow control and do not support parameter references inside the sheet. Microsoft describes DDE as “not designed for getting real-time data into Excel in a robust and high-performance way” yet it is widely used and DDE servers for proprietry data sources are very common.
RTD links
RTD was developed by Microsoft as a replacement for DDE, a solution that would “overcome the performance and robustness issues with DDE”. This technology is based on an RTD Server object (a proprietry COM DLL) instantiated in process by Excel. The RTD Server object can notify Excel when it has updates available and Excel applies flow control and aggregation on those updates according to global settings. RTD links in Excel support parameter references and can be wrapped with VBA functions to create a consistent namespace for an Excel addin.
Custom Solutions
Then we have proprietary data source solutions based on a spectacular variety of ingenious custom solutions that take advantage of VBA, automation etc…
Cold Excel Spaghetti
Already, we have a fair degree of complexity getting data into the Excel environment. This pales in comparison with the casually created, and impossibly complex, networks of dependant Excel cells, chains of references, inline formulas that concatenate, manipulate, obfuscate data spread across sheets and workbooks far and wide. Anyone who has tried to analyse a spreadsheet created many years ago, by that bearded guy who had the Star Wars theme chess set on his desk and left the company to write spaceflight simulation software for NASA, will know what I mean.
This complexity is again compounded by the limitless variations for exporting or publishing the results of an Excel based system. This ranges from text files, financial services publishing systems, databases or indeed automating Excel to generate additional spreadsheets.
These kind of spreadsheet systems can have any number of performance, reliability and recoverability problems, and often require constant monitoring and maintenance to ensure continuing service for organization critical systems.
Performance
The ongoing increase in market data update rates is a well known phenomenon, and it’s implications for Excel based applications can be dire. Excel applications built without due consideration for update rate flow control, and the implications of the Excel dependency engine, will not scale, and therefore fail at critical moments. Realtime data sources for Excel applications (DDE, RTD and custom) must be aware of throttling for update rates for these applications to continue to function as traffic increases. Systems that snap data from external sources based on Excel dependencies must be aware of the implicit volatility of that dependancy engine. These type of systems need to be carefully driven by other throttled polling mechanisms (timed RTD updates), rather than the dependancy engine.
Reliability and Recoverability
The variety of external data sources that is available to applications built in the Excel environment necessarily results in a great variation in the quality of service these data sources provide. That quality of service is transfered directly into the Excel application. Data sources that are unaware of performance implications can drive Excel to its process bound knees. Workbook dependencies on network drives will predictably fail at inopportune moments and bring critical systems grinding to a halt. Excel flexibility means that developers in the Excel environment need to exhibit constant vigilance when tempted by accessing data resources directly, without the protection of fault tolerance and recoverability systems.
Absolute power corrupts absolutely?
I hope not. It is undeniable that Excel is a powerful and vital tool for any business. I certainly don’t suggest that the flexibility that Excel delivers should be curtailed in any way. Instead, I think that the challenge, as a programmer, is to build frameworks for these Excel-based applications that facilitate creative development, whilst protecting the applications from Excel’s inherent pitfalls.
