As promised, I have posted the 3.0 version of the Measurement Utility for download here: Measurement Utility (Plugin Hardware Abstraction Layer Using Actor Framework).
Note: If you’re not familiar with the system I’m referring to, start with this older blog entry: Designing a LabVIEW Measurement System with Multiple Abstraction Layers.
This latest version has been quite a labor of love over the last several months and I’m very confident in the quality and functionality of this new version – that being said, I am referring to it as ‘Alpha’ because I want to get your feedback.
For those of you familiar with the previous version, you’ll see that the functionality of what I now refer to as the Server Controller is almost identical; however, clicking ‘Start Server’ fires up a TCP listener that will establish connections with remotely running Client Controllers. The connection is established using the Linked Network Actor (LNA), written by Allen Smith.
In the course of developing this system, I refactored a lot of the code because I wanted to derive the server and the client from the same parent controller and to reuse the functionality already written for the UI. As an example, the server’s interface was 90% identical to that of the regular controller, so I wanted to reuse the references stored in the private data of the parent. To do this, I modified the VIs responsible for setting these attributes so that they only set the reference the first time they’re called in a specific instance. That way it’s always the leaf-level UI controls that are stored, and later attempts to set the reference values (such as within the parent’s actor core) are ignored.
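The set-once behavior described above can be sketched in text form. This is only a Python analogy, not the LabVIEW code from the utility: the class names, the method names, and the strings standing in for control references are all hypothetical.

```python
class ParentController:
    """Hypothetical stand-in for the shared parent controller."""

    def __init__(self):
        self._status_ref = None  # reference to a UI control, unset at first

    def set_status_ref(self, ref):
        # Store the reference only on the first call for this instance;
        # later calls are silently ignored.
        if self._status_ref is None:
            self._status_ref = ref

    def get_status_ref(self):
        return self._status_ref

    def actor_core(self):
        # The parent registers its own control, which only takes effect
        # if no child has registered one already.
        self.set_status_ref("parent status control")


class ServerController(ParentController):
    """Hypothetical leaf-level controller derived from the parent."""

    def actor_core(self):
        # The leaf-level class registers its control first...
        self.set_status_ref("server status control")
        # ...so the parent's attempt inside its actor core is ignored.
        super().actor_core()


server = ServerController()
server.actor_core()
print(server.get_status_ref())  # prints "server status control"
```

The design choice here is that the order of calls, not the class hierarchy, decides which reference wins, which is why the child has to set its reference before calling into the parent.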
Perhaps the biggest lesson for me was debugging the delicate timing involved with establishing a network connection. Establishing a connection was easy – making sure that connection was robust was a whole different beast. To test it, I would launch a large number of clients (50+) and then toggle the server on and off as quickly as I could. In the course of this testing I discovered that either a random client or the server would hang for reasons I couldn’t pinpoint. I spent an extremely long time attempting to fix this, but it’s no secret that debugging a large Actor Framework system is non-trivial.
This brings me to one of the most valuable lessons I learned when writing this system: the value of documenting the exact series of events that occur across actors, and in what order. To do this, I set about creating a new and much more detailed diagram of the communication between actors.
I came up with the following notation for my diagram:
I diagrammed the execution of the server and one client establishing communication below. The posted version of the diagram shows how it works now, but the original diagram I created almost immediately revealed the problems I had spent weeks trying to track down. For starters, there were two points in time during which, if one entity ceased to exist, the other side would end up waiting forever. I also had a logical error when sending the ACK from the connected client – I needed this to come from the LNA, not the server, so that a timeout condition could shut down the appropriate LNA. The problems were simple to address once identified, but very difficult to identify – creating this diagram made it easy.
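The "waiting forever" failure mode and its timeout-based fix can be illustrated with a small sketch. This is a Python analogy under stated assumptions, not the actual LNA behavior: the `ConnectionHandler` class, its states, and the `"ACK"` string are hypothetical, and a standard-library queue stands in for the actor's message queue.

```python
import queue


class ConnectionHandler:
    """Hypothetical per-connection handler, loosely playing the role of an LNA."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.state = "connecting"

    def await_ack(self, timeout_s):
        # Wait a bounded time for the acknowledgement instead of blocking
        # indefinitely; if the peer has disappeared, shut this handler down.
        try:
            message = self.inbox.get(timeout=timeout_s)
        except queue.Empty:
            self.state = "closed"
            return False
        if message == "ACK":
            self.state = "connected"
            return True
        # Anything other than an ACK is treated as a failed handshake.
        self.state = "closed"
        return False


# Peer never answers: the handler times out and closes itself.
silent = ConnectionHandler()
silent.await_ack(timeout_s=0.05)
print(silent.state)  # prints "closed"

# Peer answers in time: the handler moves to the connected state.
healthy = ConnectionHandler()
healthy.inbox.put("ACK")
healthy.await_ack(timeout_s=0.05)
print(healthy.state)  # prints "connected"
```

Because the wait lives in the handler that owns the connection, a timeout only tears down that one handler, which mirrors the reason given above for having the ACK arrive at the LNA.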
To create this diagram, I used a combination of the status window in the UI of the Measurement Utility and the Desktop Execution Trace Toolkit. You’ll see that the utility outputs a large amount of information when executing, but there is additional commented-out code in the ‘Send Log Status’ message within the controller that can be used to give even more detail, including where the message was sent from.
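As a rough illustration of why recording the sender matters, here is a minimal sketch of an ordered event log whose entries can be read off as the arrows of a sequence diagram. It is a Python analogy only: `EventLog`, its methods, and the actor and message names are hypothetical and are not taken from the ‘Send Log Status’ message or the Desktop Execution Trace Toolkit.

```python
import itertools
import threading


class EventLog:
    """Hypothetical log that records who sent what to whom, in order."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()
        self.entries = []

    def record(self, sender, receiver, message):
        # The lock keeps numbering and appending atomic, so entries from
        # concurrently running actors still end up in one global order.
        with self._lock:
            sequence = next(self._counter)
            self.entries.append((sequence, sender, receiver, message))

    def render(self):
        # One line per message, readable as a single arrow in a diagram.
        return [
            f"{sequence:03d} {sender} -> {receiver}: {message}"
            for sequence, sender, receiver, message in self.entries
        ]


log = EventLog()
log.record("Server", "LNA", "Connect")
log.record("LNA", "Client", "ACK")
for line in log.render():
    print(line)
```

With both endpoints captured per entry, the log can be transcribed into a diagram line by line rather than reconstructed from memory.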
If you want to create your own like this, you’ll find the PowerPoint slide where I created it in the Project File of the Measurement Utility – PowerPoint works well for this since its graphics are vector-based.
Actor Framework often gets a bad rap for being hard to debug, but I see similar challenges in any other highly asynchronous system. I believe that creating one of these diagrams is perhaps the most effective way to accurately visualize the behavior of a large asynchronous system and understand it for the sake of debugging. Keep in mind that I did not attempt to diagram the entirety of the system, but rather the critical path within the application that was not functioning correctly. Given how many other actors I would otherwise have had to include, the diagram would risk becoming unreadable.
Perhaps one day we’ll be able to render one of these automatically, but for now it’s worth creating if you find yourself in a similarly frustrating debugging situation.
Share your comments and feedback regarding this new Measurement Utility. I’m also interested to hear your thoughts on how you debug large Actor Framework systems and whether a tool to create this type of sequence diagram automatically would be useful.