Friday, July 22, 2011

RemoteApp and Passwords

We are in the process of preparing a legacy Windows application for deployment as a Windows Terminal Services RemoteApp over the internet. One of the issues we ran into was the lack of password expiration detection and password change support in the RemoteApp infrastructure. Currently, RemoteApp is positioned primarily for usage within a single Windows domain/organisation boundary. As a workaround for the lack of password management support in RemoteApp, we decided to fill this gap as much as possible ourselves.

We used the following ADSI (System.DirectoryServices in .NET) call to reliably detect the password expiration date, regardless of whether the user account was a local or domain account and regardless of which security policy applies to the given user (see the IADsUser interface reference):

static DateTime PasswordExpiryTime(string domainOrMachine, string userName) {
    using (var directoryEntry = new System.DirectoryServices.DirectoryEntry("WinNT://" + domainOrMachine + "/" + userName + ",user")) {
        // ADSI errors surface wrapped in a TargetInvocationException; rethrow the underlying exception.
        try { return (DateTime)directoryEntry.InvokeGet("PasswordExpirationDate"); }
        catch (System.Reflection.TargetInvocationException e) { throw e.InnerException; }
    }
}

Initially we also used ADSI to perform the password change:

static void ChangePassword(string domainOrMachine, string userName, string oldPassword, string newPassword) {
    using (var directoryEntry = new System.DirectoryServices.DirectoryEntry("WinNT://" + domainOrMachine + "/" + userName + ",user")) {
        // ADSI errors surface wrapped in a TargetInvocationException; rethrow the underlying exception.
        try { directoryEntry.Invoke("ChangePassword", oldPassword, newPassword); }
        catch (System.Reflection.TargetInvocationException e) { throw e.InnerException; }
    }
}

However, after a password change via ADSI we often encountered problems when connecting to, for example, SQL Server databases using integrated security. It seems that certain cached credentials didn't get refreshed. Ultimately, the resolution for this problem was to call the Windows NetUserChangePassword API directly:

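[System.Runtime.InteropServices.DllImport("netapi32.dll", CharSet = System.Runtime.InteropServices.CharSet.Unicode)]
static extern uint NetUserChangePassword(string domainName, string userName, string oldPassword, string newPassword);
static void ChangePassword(string domainOrMachine, string userName, string oldPassword, string newPassword) {
    uint returnValue = NetUserChangePassword(domainOrMachine, userName, oldPassword, newPassword);
    if (returnValue != 0) throw new System.ComponentModel.Win32Exception((int)returnValue); // status is returned directly, not set as the last Win32 error
}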

Thursday, April 05, 2007

NHive project launched

Following on from my previous post, there seem to be more people who'd like a .NET collection library that has the rich functionality and liberal license of the C5 Collections and the .NET Framework integration of the PowerCollections. Enough to start NHive, a new open-source project to achieve that vision. It is very early days yet, but you need to start somewhere!

Thursday, January 18, 2007

About .NET Collection Libraries (PowerCollections and C5)

Summary: Some positive and negative points of the PowerCollections and C5 .NET collection libraries from early experience.

It is no secret that the .NET Base Class Library (BCL) is not overly generous when it comes to collection classes. Though generics have improved life a lot, there still are considerable holes. Fairly well known examples are the lack of set implementations and read-only wrappers around collections/dictionaries.

Various more or less open-source .NET collection library projects try to fill the gap, of which the Wintellect PowerCollections and the C5 Collections seem to be mentioned most. Of the possible other alternatives, the NCollection project at CodePlex has barely started and the NGenLib project on SourceForge has not seen any activity for two years.

The philosophy behind PowerCollections and C5 differs a lot. The PowerCollections approach is pragmatic and aims to stay as close as possible to the existing BCL collections programming model and interfaces, filling the gaps where necessary. A very appealing part of PowerCollections is the static Algorithms class, which defines dozens of generic methods for algorithms such as searching and sorting any object that supports IEnumerable. This class is very useful, because the implementation of similar methods on the BCL collections is either missing or inconsistent.

By contrast, the C5 collection classes have been designed from scratch and show how a more academic approach has resulted in a set of very powerful and consistently implemented collections. Examples of out-of-the-box functionality are fine-grained events and read-only wrappers for all collections. To get the most from the C5 collections, it pays to have a look at the technical paper that outlines their design and also provides a full API reference.
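As a language-neutral illustration of the read-only wrapper idea, here is a small sketch in JavaScript. The helper `readOnlyView` is hypothetical and is not part of C5 or the BCL; it only shows the general pattern: a wrapper forwards reads to the collection it wraps and rejects writes, and unlike a copy it keeps reflecting changes made through the original reference.

```javascript
// Hypothetical helper: a read-only view over a Map (illustration only, not the C5 API).
// Reads are forwarded to the wrapped map; mutating calls throw.
function readOnlyView(map) {
  const reject = () => {
    throw new TypeError("collection is read-only");
  };
  return Object.freeze({
    get: (key) => map.get(key),
    has: (key) => map.has(key),
    get size() {
      return map.size;
    },
    keys: () => map.keys(),
    [Symbol.iterator]: () => map[Symbol.iterator](),
    set: reject,
    delete: reject,
    clear: reject,
  });
}

const prices = new Map([["apple", 1]]);
const view = readOnlyView(prices);
prices.set("pear", 2); // changes made through the owner remain visible in the view
console.log(view.size, view.get("pear")); // prints "2 2"
```

Handing out such a view lets a class expose its internal collection without giving callers a way to modify it.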

Based on approach alone, I would consider PowerCollections first if I needed a patch for missing BCL functionality. However, the one thing I don't like about PowerCollections is its non-standard license, which seems to prohibit use in applications that are distributed under GPL/LGPL-like licenses. As Wintellect (recently bought by Microsoft) invited community participation, I feel this would have deserved a more liberal license, for example a BSD-style license such as the one used for the C5 collections. For me, this has tipped the balance in favour of C5.

Having said that, the C5 library needs a bit of nurturing to gain most of its power in day-to-day applications. Offering an implementation of the BCL IDictionary and IList interfaces is an important facet of this. Otherwise it can be hard to interface with other modules (NHibernate, Castle or Spring.NET for example) that work with the BCL collection interfaces.

So far, my experience in modifying the C5 code has been promising. After some refactoring of the C5 code, I have been able to implement the BCL IDictionary interface on all C5 dictionaries. It is too early to start posting code yet, but feel free to leave a comment on this post if you're interested!

Thursday, July 27, 2006

Hypocrisy of Anti-virus Software Vendors

After using McAfee's VirusScan product for the consumer market for a while, I still get very annoyed that it can only update itself if you're logged on as administrator. Given that running as a normal user is one of the better security measures to take, this seems plain hypocrisy to me: "Please use the most insecure user account possible to keep your PC protected..." All the more so because the business version of VirusScan does update itself without any need for admin logons. And it's not just McAfee: the story is the same for CA's e-Trust and Norton's consumer offering.

Friday, June 30, 2006

From Microsoft Virtual PC/Server to VMWare?

Finally bought my own laptop two days ago and decided to give VMWare Server RC2 a try.

I had used VMWare Workstation in the past, but since VirtualPC and Server were available in our MSDN Subscription it didn't make sense to pay more than necessary to maintain virtual development and testing environments. Now that VMWare has taken the initiative to release their more mature product for free, it was time to see whether it was worth switching back.

So far, I am pretty impressed. The VMWare server management console (a fat client) is much friendlier to use than the Microsoft Virtual Server web application. And to my amazement, VMWare supports running Virtual PC/Server environments within seconds as well as converting them into a proper VMWare environment. The only snag I encountered is that the VMWare Virtual Machine Importer wizard must be run under an admin account...

Update: And be aware that any Windows installations may need to be reactivated because of the change in virtual hardware environment!

Wednesday, January 11, 2006

Skype Video

We have continued to use Skype in a virtual team setting and it works admirably well. This morning I installed the latest version which supports video as well as voice, and the video link worked right away! That is very different from my experience with MSN Messenger video.

Friday, December 09, 2005

Fowler on Validation

I just read Martin Fowler's ContextualValidation entry, in which the following statements caught my attention:

  • "one thing that constantly trips people up is when they think object validity on a context independent way such as an isValid method implies."

  • "Alan Cooper advocated that we shouldn't let our ideas of valid states prevent a user from entering (and saving) incomplete information."

Martin indicates he prefers the use of contextual validation methods such as isValidForCheckIn rather than a single isValid method. I would agree it is wise to separate out various types of validation, but I am far from sure whether that should be in the form of multiple isValidForXXX methods on, say, a Customer class.

I am a strong believer in separating information management, business process management and UI dialog management responsibilities. In my book, information management is about enforcing context-free constraints on business data, to ensure that this data has and maintains certain semantics (e.g. weight is a positive number, end date cannot be earlier than start date, total volume of packages in a warehouse cannot be more than the volume of the warehouse).
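To make the idea of context-free constraints concrete, here is a minimal JavaScript sketch of the three example rules above. The `Package` and `Warehouse` classes, their fields and their units are hypothetical; the point is only that each check depends on the data itself, not on which business process happens to be running.

```javascript
// Hypothetical classes; the rules mirror the example constraints in the text.
class Package {
  constructor(weightKg, volumeM3, storedFrom, storedUntil) {
    this.weightKg = weightKg;
    this.volumeM3 = volumeM3;
    this.storedFrom = storedFrom;
    this.storedUntil = storedUntil;
  }

  // Context-free: must hold whichever business process touches the package.
  isValid() {
    return this.weightKg > 0 && this.storedUntil >= this.storedFrom;
  }
}

class Warehouse {
  constructor(volumeM3) {
    this.volumeM3 = volumeM3;
    this.packages = [];
  }

  // Total package volume may not exceed the warehouse volume,
  // and every stored package must itself be valid.
  isValid() {
    const used = this.packages.reduce((sum, p) => sum + p.volumeM3, 0);
    return used <= this.volumeM3 && this.packages.every((p) => p.isValid());
  }
}
```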

Business process management is about coordinating the automated and manual activities required to gather a piece of meaningful information. Though a business process may take a long time to get all the data it needs, it either finishes with a commit of the gathered information into an information management system (or service/component/object, whichever term you prefer) or it gets cancelled. A business process management system may well store information itself, but it is of a different nature and/or with different semantics. While a business process is running, the collected information only has meaning to the participants in the business process.

For example, while a customer is on the phone to an energy supplier to set up a direct debit, only the customer and the customer service representative on the other end of the line can (to a certain degree) assess the validity of the information collected so far. Only when the customer and the customer service representative both agree that all the required information is there and of sufficient quality is this information committed to the customer database of the energy supplier, where it becomes relevant and meaningful to the rest of the organisation. So business information systems manage data that is regarded as 'fact' by the whole organisation. Business process management systems manage (the collection and storing of) work-in-progress data with limited meaning to anyone other than the people directly involved.

An object-oriented implementation of these ideas would involve the creation of business workflow classes/components/services that coordinate business process execution and business entity classes/components/services that are responsible for information management (terminology borrowed from Application Architecture for .NET: Designing Applications and Services). A CustomerCheckIn class would provide a natural home for the IsValidForCheckIn method, whereas the CustomerEntity class would solely have context-free information management responsibilities, and one isValid method could well do the job. Context-free here means independent of business process state: a Customer object may still have a limited number of states (such as active/non-active/archived) that reflect the long-term life cycle of customer data.
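The split described here might look roughly like the following minimal JavaScript sketch. The class and method names follow the text; every field (`name`, `email`, `confirmed`) and the `commit` step are hypothetical. The check-in object holds work-in-progress data that may be incomplete, and only a complete check-in is turned into a `CustomerEntity`, which in turn enforces nothing but context-free rules.

```javascript
// Long-term life-cycle states mentioned in the text.
const CUSTOMER_STATES = ["active", "non-active", "archived"];

// Information management: context-free rules only (hypothetical fields).
class CustomerEntity {
  constructor(name, email, state = "active") {
    this.name = name;
    this.email = email;
    this.state = state;
  }

  isValid() {
    return typeof this.name === "string" && this.name.trim() !== "" &&
      typeof this.email === "string" && this.email.includes("@") &&
      CUSTOMER_STATES.includes(this.state);
  }
}

// Business process: gathers work-in-progress data that may be incomplete.
class CustomerCheckIn {
  constructor() {
    this.draft = {};
  }

  record(field, value) {
    this.draft[field] = value;
  }

  // Process-specific rule: everything needed for check-in is gathered and confirmed.
  isValidForCheckIn() {
    const { name, email, confirmed } = this.draft;
    return Boolean(name && email && confirmed);
  }

  // Turns the gathered data into a 'fact' for the rest of the organisation.
  commit() {
    if (!this.isValidForCheckIn()) throw new Error("check-in is incomplete");
    const customer = new CustomerEntity(this.draft.name, this.draft.email);
    if (!customer.isValid()) throw new Error("customer data violates entity rules");
    return customer;
  }
}
```

In this arrangement the entity never needs to know about check-in, and the check-in never needs to duplicate the entity's own rules.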

Saturday, November 05, 2005

Upgrading Applications with Microsoft Installer (MSI)

Once we had mastered building a Microsoft Windows Installer (MSI) file (using WiX) and got our application successfully installed, it was time to consider upgrades. We quickly found out that our MSI file would not install if the application already had been installed. It did not matter whether the new install had a higher product version or product id (in which case we want an automatic upgrade) or a lower version (automatic upgrade not desirable).

It turns out that an MSI file in itself cannot bootstrap an automatic upgrade process. When you execute an MSI file (myapp-v0.msi for example) directly, the following command gets executed:
msiexec.exe /i "myapp-v0.msi"
This will install nicely on a clean machine. To allow for upgrades, myapp-v0.msi must specify an upgrade code.

Now let's see what happens when we release the next version of the application and try to install it on the same machine using a myapp-v1.msi installation file. Double-clicking the MSI file in the Windows shell results in a
msiexec.exe /i "myapp-v1.msi"
command. Because the machine already has an application that was installed with an installer using the same upgrade code as we are using now, the default behaviour of Microsoft Installer is not to use the myapp-v1.msi file we started in the first place! Instead it uses a cached version of the MSI file with the same upgrade code. The cached version is a copy of myapp-v0.msi that tends to be saved with a pretty random name in the %windir%\installer directory. So the installation file for a newer product version can never bootstrap the upgrade of a previous version.

Automatically bootstrapping a complete product upgrade requires execution of something like the following command line:
msiexec /i myapp-v1.msi REINSTALLMODE=vomus REINSTALL=ALL
This command will use myapp-v1.msi rather than the cached version of myapp-v0.msi to perform what MSI calls a small or minor update on the installed product, and it will replace the cached copy of myapp-v0.msi with a copy of myapp-v1.msi.

It does not make sense to send end users through command-line hell in order to reach installation nirvana, so what else can be done to automatically upgrade older product versions to new versions? The common answer is to use a wrapper executable that invokes Windows Installer (msiexec.exe) with the correct parameters. Commercial installation products such as InstallShield tend to provide this functionality. However, open-source installer solutions such as WiX (or the NAnt MSI task) don't.
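The core decision such a wrapper makes is small. Below is a hypothetical sketch in JavaScript (Node.js): the function name `msiexecArgs` is made up, and detecting whether an earlier version is installed is assumed to happen elsewhere and is simply passed in. The msiexec switches are the ones shown above.

```javascript
// Hypothetical helper: builds the msiexec.exe argument list for a clean install
// or for the small/minor update described above.
function msiexecArgs(msiPath, earlierVersionInstalled) {
  const args = ["/i", msiPath];
  if (earlierVersionInstalled) {
    // "v" runs from the new package and re-caches it; REINSTALL=ALL covers all features.
    args.push("REINSTALLMODE=vomus", "REINSTALL=ALL");
  }
  return args;
}

// On Windows a wrapper could then launch Windows Installer, for example:
// require("child_process").spawnSync("msiexec.exe", msiexecArgs("myapp-v1.msi", true), { stdio: "inherit" });
console.log(["msiexec.exe", ...msiexecArgs("myapp-v1.msi", true)].join(" "));
// prints: msiexec.exe /i myapp-v1.msi REINSTALLMODE=vomus REINSTALL=ALL
```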

This is where the Microsoft Windows Installer SDK comes to the rescue. The installer SDK can be downloaded as part of the Windows platform SDK and contains the C++ sample code for two utilities that allow you to build your own MSI wrapper executables:

  • setup.exe is a wrapper executable whose installation behaviour is encoded in a number of string resources.

  • msistuff.exe is a command-line utility to modify the string resources in setup.exe.

Using a command such as
msistuff setup.exe /d myapp-v1.msi /n "My Great App 1.0" /o INSTALLUPD /v 200 /w InstMsiW.exe
we get a setup.exe wrapper executable that will either install our application on a clean machine or upgrade a previous version when available. It is pretty easy to add this command to a build script, in our case using a NAnt exec task.

Sunday, October 16, 2005

Installers (a plug for WiX)

I know, it has been very quiet here in the past months. Consider it a symptom of a lot of work in progress. During the last week I spent a considerable amount of time debugging and (re-)building our installation package.

We started off a while ago with an evaluation version of InstallShield. That seemed to do the trick. However, success has an expiry date, and an early one of course when you depend on an evaluation version. Reluctant to spend a lot of money on a tool with little reputation for user-friendliness that does not seem to fit well into our development and build processes, I started looking at the alternatives.

First try was a look at the NAnt MSI task, as we are using NAnt already to build C++ and VB6 code. Google didn't highlight much usage of this task and its documentation is pretty terse as well, so that inspired insufficient confidence to pursue this option any further.

While I was searching for info on the MSI task, a post from Loren Halvorson mentioned WiX (Windows Installer XML), a toolkit that Microsoft open-sourced just over a year ago. Together with the memory of a positive post from Ramon Smits a few months ago, this was enough incentive to take a closer look.

The WiX documentation is not complete yet, but it was certainly enough to get started. Using Gabor Deak Jahn's WiX tutorial, I built and debugged an MSI file that deploys 50+ COM components, a number of system DLLs and an ODBC data source within a couple of days. Along the way I learnt more about the Microsoft Installer, its rules and its weaknesses than I would have done using higher-level tools such as InstallShield and Wise. I consider that a good thing. Smooth deployment (and upgrade/patching) is too important to leave to chance, and there are plenty of pitfalls on the way.

I also like the philosophy behind WiX to make deployment authoring a development task. The deployment definition is captured in a (XML) text format that can be updated as soon as components are added or removed by the responsible developer (me). Changes in the deployment script are easily traced using standard version control tools. Two command-line tools transform the installation script into the desired MSI file, which was very easy to integrate into our build scripts using NAnt Exec tasks. We are now at a point where we can build from source code to MSI in one go. I already look forward to the day that we can automatically run the installer in one or more clean environments for automated testing!

Finally, WiX seems a pretty safe bet for the future: if WiX is good enough to build MS Office (and a number of other MS products), it certainly should be good enough for us. Active use within Microsoft is a reasonable guarantee that the toolkit will remain up-to-date. And the growing open-source community around it is likely to shave off any rough edges that still exist.

Wednesday, June 29, 2005


I have been using Firefox for a good while now and like it. So much, that I even set it as my default browser.

However, that seemed a step too far for MS applications such as the Virtual Server Administration Site, which depend on Internet Explorer to provide a decent user interface. The Firefox IEView extension can come to the rescue here, but in the end I decided to return to Internet Explorer as my default browser to accommodate the culprits and reduce the number of clicks on my way to productivity.

That worked well, until I got fed up with the gmail notifier launching the gmail site in IE as well. That is where the FirefoxView extension for IE now seems to come in handy to launch Firefox pages from IE (update: or just switch the browser setting in the gmail notifier...).

Tuesday, May 17, 2005

First weeks in business

I have been rather silent since leaving LogicaCMG, but starting up on my own has taken more time than expected. First of all, the DIY needed to turn our study into a decent office took a few days and a considerable number of trips to and from IKEA. And setting up computer equipment always seems to take longer than one expects as well. Last but unfortunately not least, there is a considerable amount of red tape to get familiar with (VAT anyone?), along with its administrative consequences.

Anyway, all that is pretty much settled now. So I hope to be able to blog again about some technical topics in the future. Early candidates are experiences with using and running Subversion in a Microsoft operating and development environment, using NAnt and NUnit to build and test VB6 and C++ projects, and migrating VB6/C++ apps to .NET.

Having said that, there is still a business website to be developed, business cards to be printed and regular work to be done... The future will tell!

Sunday, May 08, 2005


I last tried Skype about a year ago, but kicked it off my home system after I started getting random calls from strangers. Bogus and/or commercial calls on my landline are bad enough; more of the same via the internet was one step too far. And my Skype contact list at that time was pretty small anyway.

Lately I have been using MSN Messenger for voice and a bit of video conferencing. However, to get this to work I have to reconfigure the NAT services on my ADSL router to a much less secure configuration (unfortunately, the ADSL router lacks a UPnP implementation). Messenger traffic is not secured either, so it was time to reconsider alternatives.

So I am back with Skype. Only just. As soon as I installed it on my new AMD Athlon 64 box, it reliably crashed immediately on startup with some sort of memory access violation reported in the Windows event log. As pointed out by this article, the reason turned out to be an aggressive data execution prevention (DEP) configuration. Turning DEP off for Skype solved the problem. Tomorrow I'll find out whether it really works across my symmetric NAT configuration.

Update [17/05/2005]: Unlike MSN Messenger, Skype seems pretty happy with my NAT configuration. After some initial problems with my contact list it now works reliably and is likely to stay as my voice client.

Monday, April 04, 2005


An interesting description of a company culture to be avoided:

  • You can't have a company that entirely consists of high ability people, you need a mix of less able people that the high ability people leverage.
  • Intellectuals aren't interested in making money, so a company built around them won't stay viable.
  • It's a harsh world where nice guys finish last - so you can't afford to be nice to employees and customers without an ulterior business reason.
  • High ability people can't collaborate effectively, they intellectualize and self-destruct.
  • Large companies need a strong management structure to avoid falling apart.
  • Intellectuals must be run by B students since intellectuals are idealist and only greedy B students are pragmatic enough to make real decisions
  • Doing things for the long term doesn't work.
  • Being transparent about economics and operations is bad internally, worse externally and certainly won't scale.
  • Don't reveal your weaknesses, especially to outsiders.
  • The purpose of being international is to take advantage of people in weaker countries.
  • Don't give production people powers that can be abused and hurt the company.
  • Culture is secondary - it cannot be a sustainable advantage - you need a superior business model

Thursday, March 24, 2005

Going Independent

I have notified LogicaCMG that I intend to start my own business by the end of April 2005. I am sad to leave a large and international network of professional relationships. I am happy to take the opportunity to do the things that I enjoy and be able to write more about it in the process.

More Rich Internet Application Frameworks (3)

An AJAX thread on TheServerSide.NET has resulted in another smart browser framework supplier: Isomorphic. The same thread also contains some posts on developing online/offline functionality with AJAX.

Wednesday, March 23, 2005

IoC in .NET

The Inversion of Control virus seems to have mutated sufficiently to make the jump from Java to .NET developers. Olaf Conijn writes about his beginning excursion into this territory. However, before writing a new framework it might be worthwhile to have a look at existing .NET efforts. This thread on tackles the subject as well.

Monday, March 21, 2005

AJAX (2)

Sam Ruby has started to write about AJAX best practices. A comment in one of his posts referred to an AJAX list, which provided a pointer to the Ajaxian blog, which has a good amount of useful info about JavaScript libraries and more.

Sunday, March 20, 2005

More Rich Internet Application Frameworks (2)

I like irony and coincidence. I like it even more when the two come together.

At the beginning of this year I wrote that an MVP award and a Gmail account had provided a good start, but that their benefits still had to be determined. Last Friday I received an MVP-related message in my Gmail account, which came with the following sponsored link:
Longhorn Can Wait - Learn to build the future now with the open source Laszlo platform.

I had not heard about Laszlo before, but it turns out to be another rich internet application platform. It uses its own XML document structure in combination with JavaScript and XPath, which at first sight looks pretty intuitive and powerful. A Java backend compiles the Laszlo XML documents into Flash 5+ compatible SWF files that can be displayed in any browser that supports Flash plug-ins. It seems that other display formats could be targeted by the backend, but so far Flash is the only option. I guess XAML/Avalon would be a conceivable option once Longhorn is out.

Laszlo is open source, released under the Common Public License, which allows use in commercial products. Tool support is available as an Eclipse plug-in. For more info and demos see

Synergy or what?

Thursday, March 17, 2005

More Rich Internet Application Frameworks

More people seem to be interested in smart browser applications (my preferred term for rich internet applications...). A colleague pointed to Bindows as a rich browser-side UI framework. It does some impressive things, but I am still far from convinced about its support for Mozilla browsers. Many of the samples don't seem to work as intended in Firefox 1.0.1, but I trust that will change over time.

Sunday, March 13, 2005


Though I have had a soft spot for the Dutch football club for a long time, I expect to spend more time on the "technology".

I saw a first reference to the term AJAX (Asynchronous JavaScript and XML) in Dennis van der Stelt's blog, shortly followed by a remark in an InfoWorld article on the Java vs .NET debate (referenced by where AJAX is mentioned as a likely new feature in JSF 1.2/J2EE 5.

Finally it seems we are getting some traction for the development of rich browser applications. Though browsers nowadays provide a strong platform (DHTML, CSS, XML, JavaScript) to build such applications, there is still a lack of supporting browser-side frameworks and associated development tools that allow for productive development of such interfaces. However, it seems this is starting to change.

I can see this happening in the Java world through the community process and open-source organisations like Apache supporting it. Unfortunately, I have yet to see whether similar traction can be built in the Microsoft world, because Microsoft currently has no direct benefit from promoting richer browser applications. It would come at the expense of the momentum they are trying to build for development of smart clients based on XAML/Avalon and Office. Developing a browser-side framework would be a risk for Microsoft partners (or any other commercial party), because it is practically impossible to protect browser-side script. So I guess it will have to come from either the open-source community or an MS partner who is willing to build both a framework and a VS.NET-integrated toolset.

A business model based on providing the toolset seems to be feasible: in the Netherlands, BackBase is a company with a very interesting offering in this area (based on an open-source, Java-based backend).

Tuesday, January 11, 2005

Microsoft SharePoint roadmap absence

Nobody but the Microsoft product team seems to know where the next release of SharePoint Services/Portal Server will be heading. Customers don't, partners don't, MVPs don't, and even Microsoft's own portal gurus/evangelists such as Mark Harrison don't.

Having had a closer look at the SharePoint externals and internals to find out how to get my way with it after all, I finally dived into the web service interfaces and the related CAML (Collaborative Application Markup Language) documentation in the SharePoint SDK. CAML first of all provides a flexible schema definition facility that underpins SharePoint's ability to create/modify tables (lists in SharePoint speak) at run-time and thus works around the limitations of SQL Server (and any other RDBMS that I know of). The second part of CAML seems to focus on the rendering of list content.

So SharePoint defines data schemas and a flexible data storage mechanism, as well as rendering and searching of this data. Is it just me, or does there seem to be a lot of overlap between the core SharePoint infrastructure and the functionality that was originally intended to be delivered by WinFS in Longhorn? Before Microsoft announced the WinFS delay, it would have made sense for a next version of SharePoint to use WinFS instead of its current data typing and storage mechanisms. If the SharePoint product team was indeed heading that way, they are now faced with some serious questions about how to go forward. That would certainly explain the complete lack of clarity about SharePoint's future direction.

Sunday, January 09, 2005

Proxy (Auto-)Configuration Blues

I got seriously fed up with switching proxy settings when switching location of my laptop between company, home and client/partner networks. For some reason or another automatic configuration would not work everywhere, so I decided to write my own PAC (proxy autoconfiguration) script.

The authoritative reference documentation for PAC scripts by Netscape was easily found by googling for "proxy autoconfiguration script" (interestingly, the same search on MSN did not even return this URL on the first page of search results). A PAC script is nothing more than a bit of JavaScript code that must define a FindProxyForURL function.

My initial script looked like this:
function FindProxyForURL(url, host) {
  if (isInNet(myIpAddress(), "", "")) {
    // Connected to company intranet
    return "PROXY; PROXY";
  }
  return "DIRECT";
}

If the machine's current IP address is in the company intranet subnet, use the proxies that provide internet access from the intranet (company network scenario); otherwise assume direct access is possible (home network scenario).

Unfortunately this script did not work straight away. Writing it was pretty easy, debugging not. An old MIND article provided some help in this area. Attaching a debugger to Internet Explorer didn't seem to work for me - the auto configuration script block didn't show up in the Running Documents. JavaScript alerts in the PAC file helped highlight the problem however (IE6 shows the alerts in popup windows, Firefox writes the alerts to its JavaScript console). It turned out that the myIpAddress function returned the IP Address of a loopback adapter (installed for more complex VirtualPC networking scenarios) instead of the IP Address of the Ethernet adapter connected to the LAN.

Credits to Oliver Presland (Microsoft UK) for pointing out that this problem could be tackled by changing the priority of the network connections. Something that is much more obscure now (open Network Connections applet from Control Panel, Advanced/Advanced Settings... menu, Adapters & Bindings tab, Connections list) than it was in NT4 days. Once the LAN adapter had top priority, the PAC script worked as intended both in the office and at home. That is, until I set up a VPN connection to the company intranet...

The VPN software creates a temporary network adapter that gets an IP address in the company intranet subnet and always seems to claim top connection priority. The result was that once a VPN connection had been set up, all internet traffic would be routed via the company intranet and its proxies. To prevent unnecessary use of company resources I adjusted the PAC script:
function FindProxyForURL(url, host) {
  if (isResolvable("myhomepc")) {
    // Can resolve DNS queries without proxy
    return "DIRECT";
  } else if (isInNet(myIpAddress(), "", "")) {
    // Connected to company intranet
    return "PROXY; PROXY";
  }
  return "DIRECT";
}

This seems to have done the trick. Update: The only problem now is that resolving host names that cannot be found takes an awfully long time. Using subnet tests only is a lot faster, so I will do a bit more work to find out which company subnets are used for VPN clients and which are used for company LAN clients.
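Following the update above, a subnet-only variant might look like this sketch. The subnets, masks and proxy host names are hypothetical placeholders, not the real company values; `myIpAddress` and `isInNet` are the helper functions the browser provides to PAC scripts. The VPN client subnet is tested first because the VPN adapter claims top priority while connected, and it may well lie inside the wider company range.

```javascript
// Hypothetical subnets and proxies: replace with the real company values.
var VPN_CLIENT_NET = "10.99.0.0", VPN_CLIENT_MASK = "255.255.0.0";
var COMPANY_LAN_NET = "10.0.0.0", COMPANY_LAN_MASK = "255.0.0.0";
var COMPANY_PROXIES = "PROXY proxy1.example.com:8080; PROXY proxy2.example.com:8080";

// myIpAddress() and isInNet() are supplied by the browser's PAC runtime.
function FindProxyForURL(url, host) {
  var ip = myIpAddress();
  if (isInNet(ip, VPN_CLIENT_NET, VPN_CLIENT_MASK)) {
    // Connected via VPN from outside the office: bypass the company proxies
    return "DIRECT";
  }
  if (isInNet(ip, COMPANY_LAN_NET, COMPANY_LAN_MASK)) {
    // Office LAN: internet access only via the company proxies
    return COMPANY_PROXIES;
  }
  // Anywhere else (e.g. the home network): direct access
  return "DIRECT";
}
```

Because no DNS lookups are involved, the decision is immediate even when host names cannot be resolved.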

Good start of the year

It has been an interesting first week. I got a gmail account and became a Microsoft MVP (Visual Developer - Solutions Architect)! I need to further explore the benefits of both, but it's already becoming clear that the latter provides another, pretty powerful route into the Microsoft organisation.

Saturday, October 30, 2004

Some .NET Regular Expressions for HTML parsing

Despite the observations in my previous post, my intranet publishing solution is based on a combination of the ASP.NET approach to obtain web content and regular expressions coded in C#. Here are some regular expressions that worked well for me to get specific meta and input tags regardless of letter case and attribute quoting style (single, double or no quotes):
  • Content attribute of meta tag with name content-type:
      ( '(?<Result>[^']*)'
      | "(?<Result>[^"]*)"
      | (?<Result>[^\s>]*)

  • Value of input element with name input-name:
      ( '(?<Result>[^']*)'
      | "(?<Result>[^"]*)"
      | (?<Result>[^\s>]*)
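The patterns above show only the value-capturing tail of each expression. As a hedged JavaScript sketch (not the original C# code), the function below builds a complete pattern around the same three-way quoting alternation. The function and parameter names are hypothetical, and it assumes the identifying attribute (name or http-equiv) appears before the attribute whose value is wanted.

```javascript
// Escape regular-expression metacharacters so literal text can be embedded.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// An attribute value in any quoting style. The .NET patterns reuse one group
// name (Result) in every alternative; older JavaScript engines reject repeated
// group names, so each style gets its own numbered group here:
// 1 = single-quoted, 2 = double-quoted, 3 = unquoted.
const ATTR_VALUE = String.raw`(?:'([^']*)'|"([^"]*)"|([^\s>]*))`;

// Return attribute `wanted` of the first <tag> whose attribute `key` equals
// `keyValue` (case-insensitive), or null if there is no such tag.
// Sketch assumptions: `key` appears before `wanted` inside the tag, and no
// attribute value contains a ">" character.
function getTagAttribute(html, tag, key, keyValue, wanted) {
  const [t, k, v, w] = [tag, key, keyValue, wanted].map(escapeRegExp);
  const pattern = new RegExp(
    String.raw`<${t}\b[^>]*?\b${k}\s*=\s*(?:'${v}'|"${v}"|${v}(?=[\s/>]))` +
      String.raw`[^>]*?\b${w}\s*=\s*` +
      ATTR_VALUE,
    "i" // ignore case, like RegexOptions.IgnoreCase in .NET
  );
  const match = pattern.exec(html);
  if (match === null) return null;
  return match[1] ?? match[2] ?? match[3];
}
```

For example, getTagAttribute(html, "input", "name", "__VIEWSTATE", "value") extracts a hidden field, and getTagAttribute(html, "meta", "http-equiv", "content-type", "content") the content type.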

Friday, October 29, 2004

HTML Parsing/Screen Scraping in .NET

In an e-mail conversation with Pascal Naber the topic of HTML "screen scraping" came up. I dabbled a bit with this a few months ago when writing a command line utility to alleviate the pain of manually publishing content to our intranet. So far I have considered the following solutions for HTML screen scraping in .NET; I have looked at the first two in detail and will test-drive the third at the next opportunity:
  • ASP.NET web page parsing: The ASP.NET web service infrastructure supports WSDL extensions that parse the result of HTTP-GET requests using regular expressions. The WSDL has to be written and/or edited by hand and can be compiled into a web service proxy using the wsdl.exe tool. I honestly cannot recommend this approach. Regular expressions on their own are already a nightmare to decipher; having to XML-escape them to get a valid XML attribute is a solid recipe for no sleep at all. Apart from the occasional MSDN page, documentation is absent, and the few articles that exist on this feature seem to rehash the oversimplified example on MSDN.

  • SgmlReader: Provides an XmlReader interface over arbitrary SGML documents and has native support for HTML DTDs. The most elegant solution I have seen so far:
    1. It allows for processing of HTML streams - it is not necessary to load complete documents into memory.
    2. It is easy to layer an XPathNavigator on top to extract content.
    Disadvantages are that it does not preserve the original HTML and that there seem to be few people who (are able to) support the code. When I ran into problems with complex embedded JavaScript blocks, I decided to look for alternatives.

  • .NET HTML Agility Pack: Reads HTML into a custom object model and is able to convert HTML to XML. Seems to be the most pragmatic approach for its ease of use and ability to deal with invalid HTML (tag soup).

Thursday, October 21, 2004

Visual Studio and C# Multi-line Build Event Editing

I sometimes use build events to invoke tools such as xsd.exe and wsdl.exe for code generation purposes. Typically this requires a build event script with multiple lines. An annoying 'feature' of Visual Studio 2005 is that if you press the Enter key to create a new line in a build script, the OK button is activated and the dialog is closed. Ctrl+Enter seems to do the trick however.

Wednesday, October 20, 2004

Debugging With... VS2005 and TestDriven .NET

I'm rewriting some messaging code from scratch using a strict test-first approach to test-drive some XP practices and a number of (fairly) new technologies:
My first impressions are pretty positive. Everything works together nicely so far. One of my favourite features is the "Test with... Debugging" option to debug code from an arbitrary test within VS.NET. That is, once I got it to work.

I initially created a project in Visual C# Express and subsequently upgraded to VS.NET 2005 Beta 1 with Tech Refresh. When I then tried to debug the project I got the following error:
One or more projects in the solution do not contain user code and cannot be debugged with 'Just My Code' setting enabled. Make sure that all projects in your solution are configured to be built in Debug mode.

To suppress this message from appearing in the future, disable 'Warn if no user code on launch' in the debugger options page. To prevent the debugger running in 'Just My Code' mode, turn off 'Enable Just My Code' setting in the debugger options page.
Turning off both 'Just My Code' flags in the debugger options page indeed got rid of the warning, but my breakpoints were never hit. The important clue in the warning turned out to be the "Make sure that all projects in your solution are configured to be built in Debug mode" sentence. The advanced build settings (project properties|Build tab|Output group|Advanced button) specify the debug info to be generated during builds. Somehow this setting was set to None; changing it to Full solved all my debugging problems.

Wednesday, September 01, 2004

Enterprise Services Bus

Radovan Janecek points to a set of posts (Jump of the bus, take a cab I, II and III) by Jean-Jacques Dubray in which Jean-Jacques indicates that the Enterprise Services Bus (ESB) is far from essential for SOA.

In principle I agree: the implementation of services does not require the use of an ESB. In practice, however, I like to use an ESB to solve a lot of infrastructure issues related to the implementation of services.

My "ideal" service implementation consists of a business logic kernel wrapped by a service infrastructure shell. The business logic, implemented in a business component, contains as little infrastructure logic (logging, authentication, authorisation, encryption, caching, transaction management, message encoding/encryption/compression, etc.) as possible. The service infrastructure may host the business component and must ensure the business component can be found and reached by the outside world. It must also enable the definition and enforcement of infrastructure policy for messages sent to and from the service.
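As a rough illustration of the kernel/shell split, the JavaScript sketch below keeps a hypothetical order-approval rule free of infrastructure code and lets a shell wrap it with policy functions. All names are invented for the example; in the setup described above it would be the service infrastructure, not the business component, that decides which policies apply.

```javascript
// Business logic kernel (hypothetical rule): a pure decision with no logging,
// security or transport concerns.
function approveOrder(order) {
  return { orderId: order.id, approved: order.total <= order.creditLimit };
}

// A policy takes a message handler and returns a new handler wrapping it.
function authorisationPolicy(allowedRoles) {
  return (next) => (message) => {
    if (!allowedRoles.includes(message.senderRole)) {
      throw new Error("Sender not authorised: " + message.senderRole);
    }
    return next(message);
  };
}

function loggingPolicy(log) {
  return (next) => (message) => {
    log.push("received " + message.action);
    const reply = next(message);
    log.push("replied to " + message.action);
    return reply;
  };
}

// The shell: hands only the message body to the kernel and wraps it in the
// given policies, the first policy in the list being the outermost one.
function hostService(kernel, policies) {
  const core = (message) => kernel(message.body);
  return policies.reduceRight((handler, policy) => policy(handler), core);
}

// Usage: logging wraps authorisation, which wraps the kernel.
const auditLog = [];
const orderService = hostService(approveOrder, [
  loggingPolicy(auditLog),
  authorisationPolicy(["sales"]),
]);
```

Because the kernel never sees the message envelope, policies can be added, removed or reordered without touching the business rule.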

The ESB can ease the implementation of service clients as well, by shielding service clients from the implementation details of locating service endpoints, selecting the preferred endpoint, selecting the optimal transport protocol and negotiating the policy for service invocations.
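On the client side, the shielding could start with something as simple as the lookup sketched below: the client names a logical service and the transports it supports, and a registry picks the preferred endpoint. The registry contents, addresses and function names are hypothetical, and policy negotiation, which an ESB would also handle, is left out.

```javascript
// Hypothetical registry: logical service name -> endpoints with transport and
// preference (lower number = preferred). Addresses are placeholders.
const serviceRegistry = {
  OrderService: [
    { address: "tcp://orders.example.com:9000/OrderService", transport: "tcp", priority: 1 },
    { address: "https://orders.example.com/OrderService", transport: "https", priority: 2 },
    { address: "msmq://orders.example.com/OrderQueue", transport: "msmq", priority: 3 },
  ],
};

// Keep only endpoints reachable over a transport the client supports,
// then return the most preferred one.
function resolveEndpoint(registry, serviceName, clientTransports) {
  const usable = (registry[serviceName] || []).filter((endpoint) =>
    clientTransports.includes(endpoint.transport)
  );
  if (usable.length === 0) {
    throw new Error("No usable endpoint for " + serviceName);
  }
  return usable.reduce((best, endpoint) =>
    endpoint.priority < best.priority ? endpoint : best
  );
}
```

The client code only ever mentions "OrderService", so operators can move, add or reprioritise endpoints without changing it.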

As long as an ESB supports WS standards (SOAP, WSDL, WS-Policy, WS-MetadataExchange), using it is not a prerequisite for creating service clients and providers, but it certainly makes the life of the developers creating services a lot easier. It also decouples service consumers and providers from technical implementation details, which provides considerable flexibility for optimisation and management of services.

Phil Wainewright gives an insightful overview of the ESB playing field in the posts Top ESB Choices and More on ESB.

Saturday, July 31, 2004

How Service Clients Affect Service Design

Mike Taulty writes about how assumptions about service clients, for example the protocol they will use (HTTP especially), can influence the web service interface:

"I keep getting bogged down in what I’d call a “technology” gap between web services implemented on top of HTTP and web services implemented (potentially) on top of other protocols. I’ve been having a lot of discussions around this lately and I thought I’d blog what I’ve been thinking about as a means of getting my thoughts in order and, thereby, preserving my sanity.

The problem arises for me because HTTP is implicitly a request-response protocol and the technologies that we have for building web services today take “advantage” of that fact to implement their services in a particular way."
The interesting scenario that Mike discusses is where a service cannot handle a request fast enough to return a response within the timespan of one HTTP connection. How do you return the final response to the original requestor in that case? Mike proposes a service interface design that allows service clients to poll for the final response. I am not sure that is the way to go. To quote from my own comment on his blog:

"If I need to cater for polling clients I would use a similar solution as for polling people. Implement a core service that expects to live in an ideal service world with a WSDL as described in my first comment. Then create a second service that represents the polling client, acts as an adapter for the core service and has a WSDL as you propose. Polling clients would talk to the adapter service, non-polling clients (peer services) would talk directly to the core service without being forced to poll for their information."
See comments on his post for the rest of the discussion.
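A minimal sketch of the adapter idea from that comment, in JavaScript with hypothetical class names: the core service replies through a callback whenever it is done, as a peer service would expect, and a separate adapter offers polling clients a submit/poll interface on top of it. Real services would exchange messages over a transport rather than call each other in-process.

```javascript
// Hypothetical core service: accepts a request plus a reply callback and
// answers whenever the work is done, possibly long after submit returns.
class SlowCoreService {
  constructor() {
    this.pending = [];
  }
  submit(request, replyTo) {
    this.pending.push({ request, replyTo });
  }
  // Simulates the long-running work finishing at some later moment.
  finishAll() {
    for (const { request, replyTo } of this.pending.splice(0)) {
      replyTo({ result: request.value * 2 });
    }
  }
}

// Adapter for polling clients: turns the callback-style core service into a
// ticket-and-poll interface. Peer services keep talking to the core directly.
class PollingAdapter {
  constructor(core) {
    this.core = core;
    this.responses = new Map(); // ticket -> response, or null while pending
    this.nextTicket = 1;
  }
  submit(request) {
    const ticket = this.nextTicket++;
    this.responses.set(ticket, null);
    this.core.submit(request, (response) => this.responses.set(ticket, response));
    return ticket;
  }
  poll(ticket) {
    if (!this.responses.has(ticket)) throw new Error("Unknown ticket " + ticket);
    const response = this.responses.get(ticket);
    if (response === null) return { status: "pending" };
    this.responses.delete(ticket); // each final response is handed out once
    return { status: "done", response };
  }
}
```

The core service's interface stays free of polling concerns; only clients that cannot receive callbacks pay for the extra hop through the adapter.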

Friday, July 02, 2004

Dependencies between services

During an architects' panel at TechEd Europe, one of the questions centred on dependencies between services: isn't there a contradiction between promoting loose coupling and minimal dependencies between services on the one hand, and collaboration between services on the other?

In this respect it is important to separate the business and technical aspects of service orientation. Services should make minimal assumptions in the technical realm, for example about location (may vary from same machine to outsourced on other side of the world over time), communication protocol (HTTP, DCOM, MSMQ/MQSeries, RMI, IIOP), message format (XML, binary XML), and many other aspects that are policy to be determined and optimised by administrators/operators rather than developers.

If IT services (provided by machines) are to mirror business services (provided by people), the dependencies between services will mirror dependencies between people. People must collaborate because they are not able (lack of capability) or allowed (lack of authorisation) to do everything that they are expected to do by themselves. For the same reason, services must collaborate to perform the tasks they are expected to do. If a service tried to do everything, you would end up with the monoliths and application silos that we are slowly trying to get rid of!

The Nerd, The Suit and the Fortune Teller

I am currently being entertained by the nerd (Clemens Vasters), the suit (Rafal Lukawiecki) and the fortune teller (Pat Helland) on the last day of TechEd Europe. The fortune teller is reconciling the nerd and the suit by promoting service orientation as the thing that will bring business and IT together. Services represent aspects of the business and decouple concerns at the technical level in line with how they have been separated at the business level.

The question the fortune teller has not answered is HOW we should decompose the business into maximally independent aspects. What are the rules of thumb to use when analysing the business? During this conference some speakers (Steve Cook, Pat Helland) hint at people/business actors, their commitments to each other and the conversations to reach these commitments as first-order concepts needed to start understanding (and modelling) the business and the automated services that support it. This is very much in line with my thinking, which has been strongly influenced by DEMO.

My advice would be to first determine the roles that are played by people in a business (using communication patterns). As a next step you can define a (candidate) service for each of the identified roles. Each service performs those activities that are the responsibility of the people in this role but can be delegated to software.

Saturday, June 26, 2004

The anatomy of business logic

During TechEd Europe 2004 there will be plenty of guidance by eminent people like Pat Helland, Don Box, Arvindra Sehmi, Clemens Vasters and many others on service orientation and supporting infrastructure that is (coming) out (WSE, Shadowfax, FABRIQ, Indigo).

The majority of this guidance is likely to focus on the technical side of SO: how to glue together heterogeneous software into larger solutions using standard messaging protocols and standard metadata formats for service interfaces (WSDL) and policies (WS-Policy).

What happens in the business logic layer?

Guidance on the business side of SO tends to be lacking. People tend to agree that services should offer a coarse-grained interface to business logic, but that's about as good as it gets. What does coarse-grained mean? How are we to discover these coarse-grained chunks of logic during the analysis and design phase of a development project? These questions resemble the ones that come up when defining the architecture for a layered component-based application.

In a three-tier application (presentation, business logic and data access) the responsibilities of the presentation and data access layers are clear. The presentation layer bridges the gap between humans and the software that is supposed to help them do their job. The data access layer ensures we can store and retrieve data from arbitrary data stores, hiding implementation details such as query language and stored procedure usage. But what does the business logic layer do? And how should we partition this layer into separate components?

The software development and design guidance I have encountered in this area has not been very satisfactory so far. Over the past four years my belief has grown that in general the understanding of the business logic layer - the very heart of our business applications - is pretty poor within software development land. However, during the same period I have been inspired by a number of people who promote a "business process oriented" approach to software development.

Separating data- and process-centric business logic

In a business process oriented approach, business logic is separated into process-centric logic and data-centric logic. The data-centric logic ensures that data can be stored, retrieved and transformed without violating its meaning (semantics). The process-centric logic coordinates who (both persons and software) can perform which activities when.

Typically, business process logic tries to obtain information from multiple sources (users and systems inside and outside of the organisation) with the final objective of committing a business transaction. An order process, for example, obtains customer details and confirmation that the financial status of the client is acceptable. Only when all the required information has been gathered will the order be accepted and actually exist in the eyes of the selling organisation. Up until the acceptance of the order, any collected data is work-in-progress data, which has no real meaning to the selling organisation except to those directly involved in the process of evaluating the order. Once the work-in-progress data is deemed complete, the last action of the business process logic is to mark this information as a fact for the whole organisation: the final business transaction turns work-in-progress data into business information.

While the work-in-progress data is incomplete, the business process logic determines who is to provide which information at any given point in time. If the information is to be obtained from a person, the process logic may prompt the relevant person(s) to provide the required input (ideally using some form of human notification service that shields the process logic from presentation channel issues), but in the end it will have to wait, possibly for a long time, for the input to arrive. If the required information can be provided by software, the process logic can invoke the relevant systems, but it may still have to wait a long time.

Because business process logic may have to run and maintain work-in-progress data (business process state) for a long time, it makes sense to implement this logic in a stateless fashion and store this state in a central database. Because the semantics of business process state differ from the semantics of data managed by the data-centric business logic, it also makes sense to keep the two types of data separate. Mixing the two is likely to cause trouble sooner or later, as you mix data that makes sense to the whole organisation (long-lasting facts managed by data-centric business logic) with data that only makes sense to the participants in a particular business process instance (temporary work-in-progress data managed by process-centric business logic).
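To make the split concrete, here is a hedged JavaScript sketch of the order example: a stateless step that keeps work-in-progress data in one store and only writes to a separate store of business facts once all required information is in. The two in-memory Maps stand in for databases, and the input names are hypothetical.

```javascript
// Information the hypothetical order process must gather before committing.
const REQUIRED_INPUTS = ["customerDetails", "creditApproved"];

// Stateless process step: every call loads the work-in-progress data from
// stores.process and saves it back, so inputs may arrive hours or days apart.
// Only the final business transaction writes to stores.facts.
function handleProcessInput(stores, processId, inputName, value) {
  const wip = Object.assign({}, stores.process.get(processId), { [inputName]: value });
  const missing = REQUIRED_INPUTS.filter((name) => !(name in wip));
  if (missing.length > 0) {
    stores.process.set(processId, wip);
    return { status: "waiting", missing }; // lists the inputs still outstanding
  }
  // All information is in: the work-in-progress data is discarded either way,
  // and only an accepted order becomes a fact for the whole organisation.
  stores.process.delete(processId);
  if (wip.creditApproved !== true) {
    return { status: "rejected" };
  }
  stores.facts.set(processId, { customer: wip.customerDetails });
  return { status: "accepted" };
}
```

Keeping the two stores apart means queries against business facts never see half-finished orders.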

Identifying data- and process-centric business services

In a service-oriented world (as well as in a component-based world) we have to partition data- and process-centric business logic into business process and business information services with minimal mutual dependencies. Of course this raises once more the question of which partitioning rules to use. For business information services (data-centric) it is possible to use a high-level business object or entity model and apply clustering rules such as described by John Cheesman and John Daniels in their book UML Components (recommended reading, a thin book with challenging, to the point guidance on designing component-based software).

For the decomposition of process-centric business logic into business process services, I have trouble finding rules of similar clarity in current software development literature and methodologies. My main inspiration in this area currently comes from work done at the University of Delft by Jan Dietz and colleagues on modelling the structure and business processes of organisations from a communication perspective. Though the presentation of Dietz' methodology (DEMO: Dynamic Essential Modeling of Organisations) is still rather academic, it defines very clearly a number of concepts and patterns that help understand and decompose business processes. As such it provides a good starting point for designing the business process services that must support/implement these business processes.

Find out more at TechEd Europe 2004

If you are around at TechEd Europe 2004 in Amsterdam and are interested in more than the technical view of service orientation, please come and attend Chalk & Talk session CHT047 (Service Identification and Implementation using Communications Patterns, Wednesday morning 8:30-9:45 in room T). And this plug would not be complete if I did not provide a pointer to our article in Journal 1 if you want to come prepared!

Sunday, June 13, 2004


There you go, I have launched into the blogosphere. The process of commenting on other people's posts was painful enough to make me take the hurdle of getting a blog running with the fairly limited means I have available right now.

I intend to write about and comment on my technical topics of interest: software architecture, ranging from business solution design (including buzzword topics such as business process analysis/modelling and service orientation, the cloudy part of this blog) to component design and implementation (the muddy part). Since obtaining my Chemical Engineering degree in 1995 I have mostly spent my time coding, designing, troubleshooting and consulting on Microsoft technology based systems, so the muddy posts will be Microsoft coloured.