Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 19. February 2012 15:37

This post is part of a series providing commentary on the VSTO Stocks project. I initially developed it for theExcel Developers Conference in London, to illustrate some of the benefits or interesting features of VSTO add-ins compared to traditional VBA automation. The add-in is a work in progress, and is by no means production ready, but it is functional; I will update the code and add comments over time. Feel free to ask questions in the comments!

Level: intermediate.

Code version: 39134382823e 

The straightforward way to organize an Add-In solution is to just create a single project of type Excel 2007 Add-In (or whichever Office application / version you may target). So why would I go through the pain of structuring my Solution into four distinct Projects in the VSTO Stocks add-in?

Having a single project is a perfectly valid way to proceed, with things kept simple and tight. However, a drawback of going that route is that it makes it easy to write untestable code, with poor separation of concerns.

Note: in the past, I also ran into testing issues with one single project, because I couldn’t reference the Add-In project itself in my automated tests suite, which was problematic. This problem seems to be gone now.

Specifically, one source of such problems is to directly reference either ThisAddIn or Globals classes to access the Excel object model. They behave more or less as static classes would, which is a testing nightmare. For instance, in order to access the Active worksheet in a method, I could write code like this:

public void DoSomeStuff()
{
   var addIn = Globals.ThisAddIn;
   var excel = addIn.Application;         
   var workbook = excel.ActiveWorkbook;
   var activeSheet = workbook.ActiveSheet;
   // do stuff with the sheet
}

As is, this method is virtually untestable: we do not control the instantiation of Globals, and cannot replace it with fakes to emulate various situations that could arise in the application. What if, for instance, we wanted to test the behavior of that function when no worksheet is active?

That problem would be avoided altogether if instead of accessing Excel through the Add-In, we injected a reference to Excel in the method:

public void DoSomeStuff(Application excel)
{       
   var workbook = excel.ActiveWorkbook;
   var activeSheet = workbook.ActiveSheet;
   // do stuff with the sheet
}

Now we are free to substitute Application with our own version, set it up to emulate whatever scenario we see fit, and test what the behavior of the method should be in that case. I’ll revisit that question later when discussing automated tests, and leave it at that for now. For that matter, note also that we could perfectly well re-use that method anywhere, as there is no dependency on the add-in.

By separating into multiple projects, we can enforce that rule. In our Solution, the add-in acts purely as a bootstrapper: its only role is to kick things off when Excel starts, and initialize the UserInterface project, where the core of the application logic resides.

Note: I may rename the UserInterface project to ExcelApplication later on, the name would be more fitting.

The UserInterface project has a reference to Microsoft.Office.Interop.Excel, but not the Add-In, and therefore cannot use Globals or ThisAddin. In the ThisAddIn_Startup method, the running instance of Excel is retrieved and passed to the TaskPaneBuilder in the UserInterface project. From that point on, the Add-In itself doesn’t do anything: the ExcelController in the UserInterface project holds a reference to Excel, and handles all the interactions with Excel going forward.

So what is the Domain project about, then?

This one may or may not be a luxury item, depending on your project. The intent of the Domain project is to represent the business model you are dealing with; it should have no knowledge of Excel (hence the lack of reference to the Excel interop). For instance, StockHistory or MovingAverage are concepts which exist and can be modeled independently of Excel. Our add-in is currently using Excel as a client, but nothing would preclude developing a web-based version later, for instance. In that case, having a separate library which contains the Domain would prove very valuable, because it could be used as-is: the “only” effort required would be to write a different client / user interface to interact with that Domain model.

The name of the fourth project, UnitTests, should give away its purpose: it hosts the automated tests that verify certain behavior of the application. We’ll revisit it later on.

Of course, you don’t have to follow that organization - I also write add-ins which live in one single project! The right organization for your solution depends on each individual project, their complexity, and how they may grow over time. The structure/ideas I describe here are on the heavy side, but have served me well over the years - I hope they will help you think about your projects as well!

by Mathias 30. January 2012 01:32

As happy as I am with how Bumblebee came out so far, there was one sore spot bugging me. I tried my best to write it in a functional style, but failed in one place. The inner part of the Search algorithm, which processes bees coming back in the queue with a new solution, was written as a while loop, using two mutable references to maintain the best solution found so far, and the list of solutions stored by the inactive bees.

And then it hit me yesterday, as I was reading some material on F# Azure worker roles.

Novice:

   Master, I fail to find the path to the immutable way.

Master:

   Immutable way?

   Nothing changes, just call the

   Immutable way.

This is an execrable Haiku, and a reminder to myself. When I get stuck with my old mutable ways in F#, usually the answer is recursion.

In this case, all it took was rewriting the inner loop as a recursive function. The original loop looked along these lines:

while cancel.IsCancellationRequested = false do
   match returnQueue.IsEmpty with
   // the returning bees queue is not empty
   | false -> 
      let success, bee = returnQueue.TryDequeue()
      match success with
      // a returning bee has been found in the queue
      | true -> 
         inactives := waggle !inactives bee
         let candidate = Hive.solutionOf bee
         if candidate.Quality > best.Value.Quality then
            best.Value <- candidate
            foundSolution.Trigger(new SolutionMessage<'a>(candidate))
         else ignore ()
      // more code, irrelevant to the point

The recursive version simply includes in its arguments list all the previously mutable variables, and calls itself, passing along the result of the updates, along these lines:

let rec loop (queue: ConcurrentQueue<Bee<'a>>) best inactives (cancel: CancellationTokenSource) =
   if cancel.IsCancellationRequested then ignore ()
   else
      let success, bee = queue.TryDequeue()
      if success then
      // a returning bee has been found in the queue
         let updatedInactives = waggle inactives bee
         let candidate = Hive.solutionOf bee
         let updatedBest = bestOf candidate best
         if candidate.Quality > best.Quality 
         then foundSolution.Trigger(new SolutionMessage<'a>(candidate))
         
         dispatchBee updatedInactives pickRandom waggle bee

         loop queue updatedBest updatedInactives cancel

      else loop queue best inactives cancel

The structure is essentially the same, but all the mutable variables are now gone; instead, the recursive function passes forward new “updated” values.

I think what tripped me up is the fact that I never used recursion to run an “infinite loop” before – I always saw them done using while true statements. Conversely, I always used recursion to produce an actual result, computing intermediary results until a termination condition was met. This one is a bit different, because the recursion simply passes along some state information, but doesn’t return anything (its return is unit) or terminate until a cancellation happens.

by Mathias 11. December 2011 11:06

Yesterday, I made the first version of Bumblee public on Codeplex. Version 0.1 is an Alpha release, meaning that it’s usable, but still rough around the edges.

What is Bumblebee? It is an Artificial Bee Colony (ABC) algorithm, a randomized search method which mimics the behavior of bee hives. Given a search problem, the algorithm will dispatch Scout bees to look for new solutions, and Active bees to explore around known solutions. Once their search is complete, bees return to the Hive and share information with Inactive bees, and new searches are allocated based on the information available so far.

I have multiple goals with Bumblebee. I came across the algorithm for the first time in this article, which illustrates it on the Traveling Salesman Problem with a C# implementation. I enjoyed the article, but wondered if I could

  • parallelize the algorithm to use multiple cores,
  • provide a general API to run arbitrary problems.

… and I figured it would be a good sample project to sharpen my F# skills.

For the parallelization part, I decided to use the Task Parallel Library: the Hive creates Tasks for each bee beginning a search, which returns to a Queue to be processed and share the search result with the inactive bees in the Hive (see outline here).

Deciding on a design for the API took some back and forth, and will likely evolve in the next release. The API should:

  • accommodate any problem that can be solved by that approach,
  • be reasonably simple to use by default,
  • limit parallelization problems, in particular around random numbers generation,
  • be palatable for F# and C# users.

Regarding the first point, 3 things are needed to solve a problem using ABC: Scouts need to know how to find new random solutions, Active bees need to be able to find a random neighbor of a solution, and solutions need to be comparable. I figured this could be expressed via 3 functions, held together in a Problem class:

  • a generator, which given a RNG returns a new random solution,
  • a mutator, which given a RNG + solution tuple, returns a random neighbor of the solution,
  • an evaluator, which given a solution returns a float, measuring the quality of the solution.

You can see the algorithm in action in the TspDemo console application included in the source code.

I opted to have the Random Number Generator as an argument because the algorithm is responsible for spawning Tasks, and is therefore in a good position to provide a RNG that is safe to use, relieving the client of the responsibility of creating RNGs. I’ll probably rework this a bit, though, because I don’t like the direct dependency on Random; it is not very safe, and I would like to provide the ability to use whatever RNG users may happen to prefer.

The reason I chose to have the evaluator return a float (instead of using IComparable) is because I figured it might be interesting to have a measure which allowed the computation of rates of improvements in solution quality.

As for simplicity, I ended up with a main class Solver with 2 main methods. Search(Problem) initiates a search as a Task, and goes on forever until Stop( ) is called. The Solver exposes an event, FoundSolution, which fires every time an improvement is found, and returns a SolutionMessage, containing the solution, its quality, and the time of discovery. It is the responsibility of the user to decide when to stop the search, based on the information returned via the events.

By default, the Solver is configured with “reasonable” parameters, but if needed, they are exposed via properties, which can be modified before the search is initiated.

No effort has gone for the first release to make this C# friendly – this, and abstracting the RNG, are the goals of the next release.

I would love to get feedback, both on the overall design and the code itself! You can download the project from here.

by Mathias 4. December 2011 15:47

I am in the middle of “Working Effectively with Legacy Code”, and found it every bit as great as it was said to be. In the book, Feathers introduces the concept of Seams and Enabling Points:

a Seam is a place where you can alter behavior in your program without editing it in that place

every seam has an enabling point, a place where you can make the decision to use one behavior or another.

The idea - as I understand it - is that an enabling point is a hook for testability, a place where you can replace the behavior of a piece of code with your own controlled behavior, and validate that the results are as expected.

The reason I am bringing this up is that I have been writing lots of F# lately, and it made me realize that a functional style provides lots of enabling points, and can be much easier to test than object-oriented code.

Here is a simplified, but representative, example of the problem I was looking at: I needed to pick a random item in a list. In C#, a method along these lines would do the job:

public T PickFrom(IList<T> list)
{
   var random = new Random();
   return list[random.Next(list.Count())];
}

However, this code is utterly untestable; it’s also probably a terrible idea to instantiate a new Random every time this is called, so we modify it this way:

public T PickFrom(IList<T> list, Random random)
{
   return list[random.Next(list.Count())];
}

This is much better: now we have a decent Enabling Point, because the list of arguments of the method contains everything that is used inside the method. However, this is still untestable, but for a different reason: by definition, Random.Next() will return different values every time PickFrom is called, and expecting a repeatable result from PickFrom is a bit of a desperate enterprise.

More...

by Mathias 13. November 2011 11:03

We were doing some pair-programming with Petar recently, Test-Driven Development style, and started talking about how figuring out where to begin with the tests is often the hardest part. Petar noticed that when writing a test, I was typically starting at the end, first writing an Assert, and then coding my way backwards in the test – and that it helped getting things started.

I hadn’t realized I was doing it, and suspected it was coming from Kent Beck’s “Test-Driven Development, by Example”. Sure enough, the Patterns section of the book lists the following:

Assert First. When should you write the asserts? Try writing them first.

So why would this be a good idea?

I think the reason it works well, is that it helps focus the effort on one single goal at a time, and requires clarifying what that goal is. Starting with the Assert forces you to imagine one single fact that should be true once you have implemented the feature, and to think about how you are going to verify that the feature is indeed working.

Once the Assert is in place, you can now write the story backwards: what is the method that was called to get the result being checked, and  where does it belong? What classes and setup is required to make that method call? And, now that the story is written, what is it really saying, and what should the test method name be?

In other words, begin with the Assert, figure out the Act part, Arrange the actors, and (re)name the test method.

I think what trips some people is that while a good test will look like a little story, progressing from a beginning to a logical end, the process leading to it follows a completely opposite direction. Kent Beck points the Self-Similarity in the entire process: write stories which describe what the application will do once done, write tests which describe what the feature does once the code is implemented, and write asserts which will pass once the test is complete. Always start with the end in mind, and do exactly what it takes to achieve your goal.

dyn002_original_472_480_pjpeg_2665126_e9a998bfcc0456c55ca52c2c29484609[1]

Comments

Comment RSS