Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 19. February 2012 15:37

This post is part of a series providing commentary on the VSTO Stocks project. I initially developed it for theExcel Developers Conference in London, to illustrate some of the benefits or interesting features of VSTO add-ins compared to traditional VBA automation. The add-in is a work in progress, and is by no means production ready, but it is functional; I will update the code and add comments over time. Feel free to ask questions in the comments!

Level: intermediate.

Code version: 39134382823e 

The straightforward way to organize an Add-In solution is to just create a single project of type Excel 2007 Add-In (or whichever Office application / version you may target). So why would I go through the pain of structuring my Solution into four distinct Projects in the VSTO Stocks add-in?

Having a single project is a perfectly valid way to proceed, with things kept simple and tight. However, a drawback of going that route is that it makes it easy to write untestable code, with poor separation of concerns.

Note: in the past, I also ran into testing issues with one single project, because I couldn’t reference the Add-In project itself in my automated tests suite, which was problematic. This problem seems to be gone now.

Specifically, one source of such problems is to directly reference either ThisAddIn or Globals classes to access the Excel object model. They behave more or less as static classes would, which is a testing nightmare. For instance, in order to access the Active worksheet in a method, I could write code like this:

public void DoSomeStuff()
{
   var addIn = Globals.ThisAddIn;
   var excel = addIn.Application;         
   var workbook = excel.ActiveWorkbook;
   var activeSheet = workbook.ActiveSheet;
   // do stuff with the sheet
}

As is, this method is virtually untestable: we do not control the instantiation of Globals, and cannot replace it with fakes to emulate various situations that could arise in the application. What if, for instance, we wanted to test the behavior of that function when no worksheet is active?

That problem would be avoided altogether if instead of accessing Excel through the Add-In, we injected a reference to Excel in the method:

public void DoSomeStuff(Application excel)
{       
   var workbook = excel.ActiveWorkbook;
   var activeSheet = workbook.ActiveSheet;
   // do stuff with the sheet
}

Now we are free to substitute Application with our own version, set it up to emulate whatever scenario we see fit, and test what the behavior of the method should be in that case. I’ll revisit that question later when discussing automated tests, and leave it at that for now. For that matter, note also that we could perfectly well re-use that method anywhere, as there is no dependency on the add-in.

By separating into multiple projects, we can enforce that rule. In our Solution, the add-in acts purely as a bootstrapper: its only role is to kick things off when Excel starts, and initialize the UserInterface project, where the core of the application logic resides.

Note: I may rename the UserInterface project to ExcelApplication later on, the name would be more fitting.

The UserInterface project has a reference to Microsoft.Office.Interop.Excel, but not the Add-In, and therefore cannot use Globals or ThisAddin. In the ThisAddIn_Startup method, the running instance of Excel is retrieved and passed to the TaskPaneBuilder in the UserInterface project. From that point on, the Add-In itself doesn’t do anything: the ExcelController in the UserInterface project holds a reference to Excel, and handles all the interactions with Excel going forward.

So what is the Domain project about, then?

This one may or may not be a luxury item, depending on your project. The intent of the Domain project is to represent the business model you are dealing with; it should have no knowledge of Excel (hence the lack of reference to the Excel interop). For instance, StockHistory or MovingAverage are concepts which exist and can be modeled independently of Excel. Our add-in is currently using Excel as a client, but nothing would preclude developing a web-based version later, for instance. In that case, having a separate library which contains the Domain would prove very valuable, because it could be used as-is: the “only” effort required would be to write a different client / user interface to interact with that Domain model.

The name of the fourth project, UnitTests, should give away its purpose: it hosts the automated tests that verify certain behavior of the application. We’ll revisit it later on.

Of course, you don’t have to follow that organization - I also write add-ins which live in one single project! The right organization for your solution depends on each individual project, their complexity, and how they may grow over time. The structure/ideas I describe here are on the heavy side, but have served me well over the years - I hope they will help you think about your projects as well!

by Mathias 6. February 2012 04:24

This post is part of a series providing commentary on the VSTO Stocks project. I initially developed it for the Excel Developers Conference in London, to illustrate some of the benefits or interesting features of VSTO add-ins compared to traditional VBA automation. The add-in is a work in progress, and is by no means production ready, but it is functional; I will update the code and add comments over time. Feel free to ask questions in the comments!

Level: beginner.

Code version: 346c1bd9394e

One of the key benefits I find in using VSTO for Office automation instead of VBA is that it enables using Source Control tools.

During a development effort, regardless of the technology used, lots of things can go wrong. A code change which initially looked like a great idea progressively degenerates into chaos, something goes awry with a file which becomes irrecoverably corrupt, a hard drive suddenly decides it is time to call it quits – all these happen. When they do, it’s nice to have a safety net, and know that somewhere, safe and warm, a snapshot of the code taken in happier times is waiting and can be restored, giving you a safe point to restart from, with only a few hours of work lost.

What I have often seen done with Excel development goes along these lines: on a regular basis, the developer saves the workbook somewhere “safe” with a time-stamp convention, like “MyWorkbook-2010-12-24.xlsx”.

On the plus side, this is a very lightweight process, which addresses some of the issues. At the same time, it is cumbersome: the developer needs to be diligent, the process is manual and error-prone (messing up the timestamp, or accidentally over-writing archives is very possible), and recovering the right version from a folder that contains multiple versions only identified by a timestamp is impractical.

Developers working in other ecosystems have been facing the same issue, and address it with specialized tools: source control systems. In a nutshell, the idea of source control is to operate like a library: the source code is stored in a “vault” (known as the Repository), developers check out a local copy of the current version on their machine, edit it, and check in/commit the modified code back into the vault if they are happy with the result.

Put differently, whatever code is currently being modified on the developer’s machine is “scratch paper”; it become “real” only once it is committed.

There are a few obvious benefits. The entire history of the project is saved for posterity, and its state at any point in time can be instantly restored. The system generates timestamps automatically, and each commit has comments attached to it, which helps navigation between versions. This encourages experimenting with code ideas: check out the code, spike something – if it works, great, if not, discard it and revert to the previous repository version.

[pic of repo on CodePlex]

image

Overview of the project history with Mercurial + Tortoise on Windows

More interestingly, version control systems typically store the difference between the current version and the previous one, and not the file itself. Besides keeping the size of the repository minimal, it also allows to produce “diffs”, i.e. code differences: the source control system can easily produce a view that highlights all the differences between two versions of the code, which is invaluable.

image

The “Diff” highlights what has been added or removed between versions.

image

Diff view of the changes to a specific file on CodePlex

Why hasn’t the traditional Excel developer community embraced source control?

The main reason, in my opinion, is that Source Control systems are at their best when dealing with text files. While this is the case for most development platforms, Excel is peculiar in that aspect: the code is embedded in the Workbook, in multiple forms (Excel formulas, code-behind worksheets, macro modules…), and the overall project isn’t a collection of text files containing code. Up to version 2003, workbooks were saved as a proprietary binary file, which couldn’t be used to produce meaningful "differences” – and the Open XML format adopted since version 2007 still isn’t very practical for differentiation purposes.

By contrast, a VSTO add-in like VSTO Stocks consists entirely of .NET code, which is ultimately a collection of text files – there is nothing attached to a specific Workbook. As a result, it is a perfect fit for Source Control, with automatic archival of successive versions, and highly detailed “audit” of changes between versions.

Note that nothing prevents using Source Control tools for “classic” Excel development – I do it all the time, even when working with Excel 2003 workbooks. You won’t get the full benefits of source control (no diffs), but you will still get a history of all the code changes in the project. Also, if you tend to re-use VBA utilities like UDFs, .bas files are perfect candidates for source control: store the utilities .bas files in a repository, and import them in workbooks when you need them.

How to get started with Souce Control?

A nice thing about Source Control tools is that some of the best and most widely used systems are open source and totally free. The two systems I use are Subversion (for the past 7 years or so) and Mercurial since a few months – they are both great. The other name that comes up a lot is Git, which as far as I know is very similar to Mercurial.

The main difference is that Subversion has a centralized model (there is a central “source of truth” where the official code resides, and where all changes get committed), whereas Mercurial is a distributed system (developers can work with the full  benefits of version control even disconnected from the “central”, and can merge their work with any other clone of the repository). Both are worth looking into, and offer different advantages. There are plenty of discussions comparing the two approaches, so I won’t go further into it.

On the other hand, regardless of what system you pick, I recommend installing Tortoise (TortoiseHg for Mercurial, TortoiseSVN for Subversion); it’s an extension which integrates source control with Windows, so that you can manage your repositories directly via the graphical user interface. It’s a great way to start without having to struggle with arcane command-line tools.

TortoiseHg in action

Tortoise integrates your Version Control system right into Windows.

by Mathias 14. January 2012 14:16

I am putting together a demo VSTO add-in for my talk at the Excel Developer Conference. I wanted to play with charts a bit, and given that I am working off a .NET model, I figured it would be interesting to produce charts directly from the data, bypassing the step of putting data in a worksheet altogether.

In order to do that, we simply need to create a Chart in a workbook, add a Series to the SeriesCollection of the chart, and directly set the Series Values and XValues as an array, along these lines:

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Tab Name");

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = new double[] {1d, 3d, 2d, 5d};
series.XValues = new string[] {"A", "B", "C", "D"};
series.Name = "Series Name";

This will create a simple Line chart in its own sheet – without any reference to a worksheet data range.

Now why would I be interested in this approach, when it’s so convenient to create a chart from data that is already in Excel?

Suppose for a moment that you are dealing with the market activity on a stock, which you can retrieve from an external data source as a collection of StockActivity .NET objects:

public class StockActivity
{
   public DateTime Day { get; set; }
   public decimal Open { get; set; }
   public decimal Close { get; set; }
}

In this case, extracting the array for the X and Y values would be a trivial matter, making it very easy to produce a chart of, say, the Close values over time:

// Create a few fake datapoints
var day1 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 1), 
                 Open = 100m, 
                 Close = 110m
              };
var day2 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 2), 
                 Open = 110m, 
                 Close = 130m
              };
var day3 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 3), 
                 Open = 130m, 
                 Close = 105m
              };
var history = new List<StockActivity>() { day1, day2, day3 };

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Stock Chart);

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = history.Select(it => (double)it.Close).ToArray();
series.XValues = history.Select(it => it.Day).ToArray();
series.Name = "Stock";

Using LINQ, we Select from the list the values we are interested in, and pass them into an array, ready for consumption into a chart, and boom! We are done.

If what you need to do is explore data and produce charts to figure out potentially interesting relationships, this type of approach isn’t very useful. On the other hand, if your problem is to produce on a regular basis the same set of charts, using data coming from an external data source, this is a very interesting option!

by Mathias 27. November 2011 07:59

Earlier this month, on Simon Murphy’s blog, the project to organize a UK Excel Developer conference 2012 in London quickly took shape. As of now, the planned date is in the January 24 to 26 range, with a pretty cool agenda, covering most of the recent developments Excel developers should know about. I am looking forward to it!

I’ll be talking for about an hour on VSTO (Visual Studio Tools for Office). VSTO can be both awesome and painful: I plan on demoing building an Excel add-in from the ground up, illustrating some of the benefits and drawbacks/pitfalls of that approach, so that you know when it’s the right time to go that route.

If there are any topics or questions you are specifically interested in, please let me know in the comments – I’ll be happy to take requests.

As an aside, I usually talk at .NET developers events, where I need to convince the audience that developing for Excel (or Office) isn’t a terrible idea. This will be my first time with an Excel developers audience, and I expect the opposite challenge, namely why bother with C#/VB.NET and Visual Studio when VBA is free and works just fine?

Check out the Excel Conference page and Simon’s blog for updates.

London-calling

Picture from the amazing series “The Kitten Covers

by Mathias 2. August 2011 16:06

When working with Excel, it is common to use small optimization tricks, like setting Excel.ScreenUpdating to false to avoid un-necessary screen refreshes, during lengthy operations – something along these lines:

public void DoJob(Worksheet  worksheet)
{
   var excel = worksheet.Application;

   var initialScreenUpdating = excel.ScreenUpdating;
   var initialCursor = excel.Cursor;

   excel.ScreenUpdating = false;
   excel.Cursor = XlMousePointer.xlWait;

   // do stuff

   excel.ScreenUpdating = initialScreenUpdating;
   excel.Cursor = initialCursor;
}

This is a good outline of what we would like to happen, but as is, this code has limitations. We would like to be certain that whenever we capture the state, the final two lines, which reset the state to what it was originally, are executed.

More...

Comments

Comment RSS