Saturday, February 25, 2012

Unit tests that give you code coverage, but don't test anything

Back in November 2011, I wrote about how high code coverage may not be enough. Code coverage, even when it is driven by tests that genuinely verify something, may not be enough, as it doesn't exercise all potential paths (like the ones inside a regular expression) - but at least it means something.

This time, the issue is far more serious: unit tests that do not test anything at all, yet still guarantee code coverage.

Let me give you a very simple example:
public static int Sum(int a, int b)
{
    return a + b;
}
Very simple method. Now, let's say we write this test:
public static void TestSum()
{
    Program.Sum(100, 200);
}
I have the desired code coverage, but what are we actually testing here?
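For contrast, here is what a test that actually verifies behavior would look like. This is a minimal sketch in Python (mirroring the C#-style Sum above with a hypothetical stand-in function); the assertions are what make the coverage meaningful:

```python
def sum_values(a, b):
    # Stand-in for the Sum method above.
    return a + b

def test_sum():
    # The assertions are the difference: the same lines are
    # covered, but now a wrong result actually fails the test.
    assert sum_values(100, 200) == 300
    assert sum_values(-1, 1) == 0

test_sum()
```

The coverage numbers are identical to the empty test's, which is exactly why coverage alone cannot distinguish the two.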

I have seen many variations of such tests, which is what compelled me to write this post. I thought we already had enough arguments against anyone who claimed that code has quality simply because it has some measure of code coverage.

If this is the metric management values, and nobody is actually reviewing test cases as they should, this is very likely to happen. Pretty much everybody is happy, as the tests pass and the team has high code coverage. But this is self-deception about product quality, at the very least - assuming, of course, that it was not done intentionally.

When I look back at the different organizations I have worked for, I believe this is more likely to happen when the tests are written by engineers other than the original author. For instance, if a junior engineer is assigned to increase code coverage, s/he may do it without actually testing anything. One reason is that s/he may not know anything about the code; another is simply that this engineer is striving for the valued metric: code coverage.

Another factor that contributes to this kind of practice is that reviewers tend to pay more attention to the product code than to the tests themselves, so these things slip through. I admit that for a long time I assumed that engineers knew what they were doing when they tested, and never looked that carefully.

At the end of the day, this kind of work wastes the company's and the engineers' time and accomplishes nothing. This goes back to my point about not deceiving oneself about quality (as written in "Be serious when providing a web service").

UPDATE: See new post "Code coverage and missed requirements".

Wednesday, February 22, 2012

Distributed computing and partial failures

I read a classic paper on distributed systems tonight, "A Note on Distributed Computing". It is from 1994, but it is still very relevant and a must-read for anyone working with distributed systems. It discusses the proposal of unifying the programming model for local and distributed objects, as proposed in CORBA at the time.

Although it seems we are past this phase of unifying the programming model, I feel that many people still don't get distributed systems as they should, and still don't think they are that different from local programming. In particular, I think they don't get things like partial failures, which is one of the reasons this paper gives for why local and distributed computing cannot be unified under the same model. So this paper helps clarify why there is a difference, and why that difference must be taken into account when developing systems as well as when testing them.

I like that the authors debunk the myth of "Quality of Service" with a good example that fails under partial failures. They also show how concurrency can be a problem through another example. Going beyond the examples, they discuss the popular NFS file system and how, being a distributed system behind the interface of a local file system, it has had major problems. I remember hearing about that since the 90s.

Below I paste some core excerpts on partial failure and concurrency from the paper. I would still encourage you to take the time to read it in its entirety.

The hard problems in distributed computing are not the problems of how to get things on and off the wire. The hard problems in distributed computing concern dealing with partial failure and the lack of a central resource manager. The hard problems in distributed computing concern insuring adequate performance and dealing with problems of concurrency. The hard problems have to do with differences in memory access paradigms between local and distributed entities. People attempting to write distributed applications quickly discover that they are spending all of their efforts in these areas and not on the communications protocol programming interface.

Partial failure is a central reality of distributed computing. Both the local and the distributed world contain components that are subject to periodic failure. In the case of local computing, such failures are either total, affecting all of the entities that are working together in an application, or detectable by some central resource allocator (such as the operating system on the local machine).

This is not the case in distributed computing, where one component (machine, network link) can fail while the others continue. Not only is the failure of the distributed components independent, but there is no common agent that is able to determine what component has failed and inform the other components of that failure, no global state that can be examined that allows determination of exactly what error has occurred. In a distributed system, the failure of a network link is indistinguishable from the failure of a processor on the other side of that link.

These sorts of failures are not the same as mere exception raising or the inability to complete a task, which can occur in the case of local computing. This type of failure is caused when a machine crashes during the execution of an object invocation or a network link goes down, occurrences that cause the target object to simply disappear rather than return control to the caller. A central problem in distributed computing is insuring that the state of the whole system is consistent after such a failure; this is a problem that simply does not occur in local computing.

Being robust in the face of partial failure requires some expression at the interface level. Merely improving the implementation of one component is not sufficient. The interfaces that connect the components must be able to state whenever possible the cause of failure, and there must be interfaces that allow reconstruction of a reasonable state when failure occurs and the cause cannot be determined.

Similar arguments hold for concurrency. Distributed objects by their nature must handle concurrent method invocations. […] One might argue that a multi-threaded application needs to deal with these same issues. However, there is a subtle difference. In a multi-threaded application, there is no real source of indeterminacy of invocations of operations. The application programmer has complete control over invocation order when desired. A distributed system by its nature introduces truly asynchronous operation invocations. Further, a non-distributed system, even when multi-threaded, is layered on top of a single operating system that can aid the communication between objects and can be used to determine and aid in synchronization and in the recovery of failure. A distributed system, on the other hand, has no single point of resource allocation, synchronization, or failure recovery, and thus is conceptually very different.
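The indistinguishability of failures shows up directly in code. A timed-out remote call gives the caller no way to know whether the operation ever executed. Here is a minimal sketch in Python (hypothetical names, raw sockets for illustration) of why the caller's view is ambiguous:

```python
import socket

def call_remote(host, port, payload, timeout=2.0):
    """Attempt a remote invocation over a socket.
    Returns the reply bytes, or None on failure. The key point:
    on a timeout the caller cannot tell whether the request was
    never delivered, or was executed and only the reply was lost."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            return conn.recv(4096)
    except (socket.timeout, OSError):
        # Partial failure: a dead link, a crashed server, and a
        # slow server all look identical from here. The effect of
        # the operation on the remote side is unknown.
        return None
```

A local call never lands in that ambiguous branch: it either returns, raises, or the whole process is gone. Handling the "unknown outcome" case (retries, idempotency, reconciliation) is exactly the interface-level concern the paper argues for.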

Tuesday, February 21, 2012

Be serious when providing a web service


This post is for those who want to get serious about providing a service, whether or not you will be working in the cloud computing space. After working at Amazon Web Services and Microsoft, I learned a few things about running a service, but most importantly, I believe I learned the most suitable mindset, whether you are an engineer who wants to get better at it or a manager/entrepreneur who wants to learn what it takes to do your best.
It was at Amazon that I started understanding what having a services mindset really means, especially since I joined after the experience of running a web site on my own. Being a good developer alone does not make you a good services developer unless you are really open to learning. And there is a mindset for doing the right thing out there, which is primarily customer obsession. That is where I have to give Amazon all the credit: anyone who is an Amazon customer knows how much the company focuses on customer service. More than that, if you are or were an Amazon engineer, you know how deeply ingrained customer focus is in the company's culture - to the point that the company is willing to sacrifice engineers' time for the sake of providing a customer-focused, world-class service.
My first understanding of what that actually meant started when I saw new hires joining Amazon from companies without much of a services background, and how they thought about services in general. If these people had been in charge, we could have deviated from the culture of focusing on the customer experience, but luckily the culture is so strong that they ended up adapting to it and learning the Amazon way of doing things. At Microsoft, it is much more common to see people without the most suitable mindset, since, unlike Amazon, the company has not had a culture of providing services since its inception.
[One note before getting to the lessons: if you know companies that are serious about providing services, please share with me – I’d love to know them]
These are the important lessons or pieces of advice I’d give:
  • No customer must be forgotten: that simply means that every customer matters. This may sound silly, especially if you believe all companies have this goal. But that is not true, so let me repeat: the goal is that not a single customer should have a bad experience. If necessary, engineers or managers must be woken up when one customer is experiencing issues. Having a mechanism to give credit back when something goes bad is not the same as really caring about each and every customer.
    • I’ve seen web service systems without any metrics on how customers are actually interacting with the system. The problem is when management, knowing nothing about the system’s behavior, claims the system is very successful. There were simply no metrics to deliver the bad news.
    • Another common belief is that everything is going well in your system because you did not get any customer support calls (when you provide that channel). The first big mistake here is believing that all customers will call you if something really bad goes wrong. I, for one, am a customer who will probably give up on the company rather than expect that something will be done if I call customer support. And this model is very bad because it does not account for the business you lose when customers walk away from your service - and the bad publicity and word of mouth you get because of that.
  • All customer-impacting operations must be traceable, must be tracked, and must be investigated, so you get better over time - and so you have data to show you are actually getting better.
    • This is a corollary of the first one, but it is very important that the service provider is diligent about tracing customer scenarios, detecting failures, detecting performance degradation, and alerting on them. These must be tracked, and there must be a serious commitment to fixing issues and improving the service.
    • We may take this for granted, but it is not always true that companies are diligent about tracing issues, detecting failures, and detecting performance degradation - and, most importantly, many companies do not necessarily care whether the issues get fixed at all.
  • Don’t deceive yourself over the quality of your service.
    • Only claim that you have a successful or stable service if you have visibility and know what you are talking about. Do you really catch all potential issues and log them? Do you know that your customers are not experiencing something bad? Do you have visibility into the performance data and trends over time? Do you alert in case things start to get bad?
  • Software fails, so everything must be monitored comprehensively. Better safe than sorry here.
    • I’ve seen lots of push back against monitoring components, without a real understanding of why monitoring is important in the first place. One of the counter-arguments is that the component should be reliable and should have been tested before going to production, therefore comprehensive monitoring is not required. It may sound ridiculous to have to say this, but software fails for reasons one doesn’t expect - bugs in your code that you did not catch in your tests, bugs in libraries you use, hardware issues, network issues, among others. If anyone could formally prove that their software is bug-free and will always behave well, so that it doesn’t need monitoring, that person would deserve a prize.
    • On monitoring, one of the most important things is to think about how to avoid customer impact altogether, so you have to be smart and start alerting on any signals that indicate things are going downhill - before they actually do.
  • There is an implicit belief that services will be available, so you had better strive for that.
    • If you are providing a service, especially in the cloud computing space, you had better be available, especially for the core part of your service. You need to understand the system very well from all aspects so as not to neglect anything. Once you neglect something, you are risking availability for some customers and, if you really care about your business, you will not allow that to happen.
    • A consequence of being available is that you need to be scalable if you’re out there and demand grows. Not thinking ahead of time can potentially kill your business. And again, offering credits back will not fix the damage to your brand.
  • Know distributed systems.
    • Services are distributed systems in all cases. At the very least, your customer will be calling you remotely, but typically there are many distributed components internally as well. Do not architect and make a service available without a minimum knowledge of distributed systems. And yes, I am talking about theory here. Know partitioning, know Paxos, know the CAP theorem, know about partial failures. Or have someone on your team who knows, and always run ideas by this person. If you don’t know distributed systems theory well - and especially if you don’t have experience running a service - you will very likely get it wrong. In the best case it may take a while for something bad to happen, and unfortunately your customers will be the ones most impacted.
  • Have passion and be proud of what you deliver. Have ownership.
    • You need an organizational structure where people feel ownership and are proud of what they are doing. If nobody is an owner, and there is always someone else responsible for a component, then you will not get the same dedication and willingness to learn and solve problems as you would if people felt it was their “baby”.
  • Share and learn the lessons
    • Services that are not successful typically don’t run into many problems, but it is easy to see how many lessons successful services have learned. An effective company learns these lessons effectively and shares them broadly. Have a knowledge base; don’t worry about sharing the shame; make sure people really understand what caused the problem and what it takes to fix it. Follow through to make sure changes are driven across the company and these mistakes are not repeated.
  • Be serious about on-call
    • A company that cares about customers will want to be on top of issues. It is not just an out-of-band process where someone will perhaps take a look at the issue; any customer-impacting issue should cause the team to stop what they are doing and go fix it.
    • An on-call rotation, although bad for engineers if badly implemented, is vital for a well-run service. Those on call need to be trained, but most importantly they must have the attitude of wanting to fix the problem and avoid or reduce customer impact.
    • Engineers must know the system well enough to diagnose issues and even potentially fix issues in different components. That is the goal of a well-implemented on-call system.
    • On-call must be reliable - whoever is on call needs to be paged, get an SMS, an email, or whatever it takes, in the most reliable way. You can’t afford to lose alerts.
  • Make your deployment process easy
    • There are many things to consider about deployment, including auditability, but one thing that cannot be traded for anything is simplicity. Deployment must be simple and easy for two reasons: you need to be quick to release new features and updates, but you must be even quicker to fix customer issues. Impacting issues may require fixes right away, and if the process gets in the way, as it does in some cases, who will be the one impacted? The customer.
  • Have people with hands-on experience making the technical decisions
    • People with skin in the game and actual experience running the service must make the decisions, or at least have great influence. If you don’t have service experience, get someone to help you at the beginning and be humble enough to take their advice and learn from them.
  • Take your customers’ feedback
    • Except for some rare cases, your customers typically know more about using the service than you ever will. Don’t be pretentious and assume you will know more. Pay attention to their feedback, incorporate it into your planning, and be appreciative. Customers help so much - and they stop doing so if they notice you don’t care.
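The monitoring advice above - alerting on signals that things are going downhill before customers are impacted - can be made concrete. Here is a hypothetical sketch in Python of an alerting rule that fires either when an error rate breaches a hard customer-impact limit, or earlier, when the rate is trending steadily toward it (the thresholds and window are illustrative assumptions, not a prescription):

```python
def should_alert(error_rates, hard_limit=0.05, window=5):
    """Alert if the latest error rate already breaches the hard
    customer-impact limit, or if the last `window` samples are
    strictly rising and already past half the limit."""
    if not error_rates:
        return False
    if error_rates[-1] >= hard_limit:
        return True  # customers are likely impacted already
    recent = error_rates[-window:]
    if len(recent) < 2:
        return False  # not enough data to call it a trend
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    # Early warning: still under the limit, but going downhill.
    return rising and recent[-1] >= hard_limit / 2
```

The point of the early-warning branch is exactly the "before they actually do" part: a flat 1% error rate is noise, but a rate that climbs every sample toward the limit deserves a page before the hard threshold is crossed.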

Sunday, February 19, 2012

Review - Web Services: Concepts, Architectures and Applications


Just posted this review of Web Services: Concepts, Architectures and Applications on Amazon.

First, it is a very conceptual book, which is not a problem in itself, but it's not a book for those looking for code examples or for how to architect a web service. Given that it was published in 2004, the value of this book today is mostly historical context. It makes no assumptions about the reader's knowledge and starts with a detailed explanation of information systems concepts. From there, it explains the need for middleware, then enterprise application integration, then web technologies. The context explaining how web services came into existence accounts for a big portion of the book.

When it comes to the part on web services, the focus is mostly on B2B integration and does not account for the varied applications we see nowadays. In particular, it naturally does not touch on web services as the foundation of a multi-device world with phones and richer clients (running JavaScript, Ajax, jQuery, etc.), which one would expect from a more modern book on web services.

Also, it's important to note that it focuses primarily on SOAP and spends some time on technologies that failed in the end (like the UDDI registry) or that may not be of interest to readers, like RosettaNet (at least it wasn't of much interest to me). More interesting technologies, like WS-Coordination and WS-Transaction, were not explained at the level of detail I would expect, and WS-* standards like WS-Addressing, WS-Routing, WS-Security, and WS-Policy are barely mentioned. These sections could have used the same attention paid to the first section (web services history). In that sense, the book is a little inconsistent in how detailed it is.

I'd have rated it higher had I read it back in 2004. In 2012 it does have value for the historical context, and it is definitely good for those who want to know how we got where we are, but it doesn't help much if what you're looking for is how to write the API that exposes your web service. For a more modern (and potentially more practical) approach, I would look for other options.

Saturday, February 18, 2012

Right balance between technical and pragmatic: does it exist?

Throughout my career, I have found myself on different sides of a debate that I currently have with myself: being technical vs. being pragmatic. I believe that the more experience an engineer has - as long as s/he keeps an open mind to new lessons - the broader a perspective s/he will have of the work being performed. As a consequence of this broader perspective, s/he is able to think from different angles and tries to do the work in a way that takes care of the different concerns raised through the process. If one doesn't have or doesn't learn these different perspectives, s/he will keep doing things mostly the same way. On the other hand, as one comes across different issues and problems and learns their lessons, this person will know what it takes to avoid them, and this can be invaluable. In the software development world, this spans from being diligent about the theoretical side in order to guarantee correctness for complex problems, all the way to the tests required to declare a piece of software "production ready".

The challenge for this more experienced person is determining where to draw the line while still being able to release products or services. After all, this kind of project typically serves a business purpose. On one hand, if an engineer is too theoretical or too "formal", too much time can be spent on unnecessary formalisms and concerns, without seeing things from a practical perspective. On the other hand, not doing the minimum required analysis will potentially bite one in the future. And if the outcome of this work is on behalf of a big company (as in my work at Amazon and currently at Microsoft), the question is whether one is actually being responsible if s/he lets something like that happen.

Working in a team environment is the other big challenge. Unless there is a generalized culture of spreading knowledge and engineers are genuinely interested in getting better, reaching a consensus on what to do is impossible. Actually, to my surprise I realized early on that companies and groups vary a lot in the "getting better" aspect, and assuming that all people will be interested in learning and improving is just a utopia. Environments will always be heterogeneous, especially if the culture (including, but not limited to, mechanisms like performance appraisals) rewards people mostly for releasing something rather than for releasing something good.

Also, even if a subset of those involved in the development process sees issues, the challenge is how to convince the others. A typical debate is maintainability. People have different opinions about how to write maintainable software, but the harder question is how to convince others that maintainability is important at all. Unless upper management actually values it, it will probably be a secondary priority. On top of that, a typical political game (read Games at Work for more details) is to do something of impact with the goal of standing out, even if there will be a mess to clean up later. And software maintainability is a great example of something that only becomes a problem in the future.

Another example is how to convince others of potential future issues, whether ones that may happen under stress scenarios or ones that arise simply because the team did not work through the scenarios to guarantee the correctness of the solution from a theoretical perspective. Oftentimes such arguments are dismissed if the system behaves well in some functional tests.

At an earlier point in my career I was more of the pragmatic type that wanted to ship and get things done, until I started learning more about the issues that caused real problems in the projects I was involved in. I developed a real appreciation for doing things responsibly - which could mean getting into the theory and involving experts when dealing with something outside one's area of expertise. Once I got to that point, I started questioning whether I am going overboard or whether being that diligent is actually the right thing to do. Is there any way to transmit these experiences to others? Or is it better just to let the process move on and, if the concerns were legitimate, let the organization learn the hard way?

Another aspect of developing this appreciation for a thought-through process is that it becomes harder to be proud of what you are working on if you're involved in the process with others who have different mindsets. Being realistic, however, one can't expect homogeneous environments, as I mentioned above, and to a certain extent this is a challenge that will be part of one's professional career until the end: there will always be someone around you (with or without more power to make decisions) who will require effort on your part to communicate your concerns. And more often than not, unless the person has gone through the experience him/herself, s/he will not get it.

This Slashdot post touches exactly on the same point: "The last few years have become particularly taxing as I struggle to reiterate basic concepts to the same technically illiterate managers and stakeholders who keep turning up in charge. While most are knowledgeable about the industries our software is targeting, they just don't get the mechanics of what we do and never will. After so many years, I'm tired of repeating myself. I need a break."

On the upside, this discussion leads me to think about what the best organizational structure is and who should be in upper management. Based on a trend I see across some companies, I currently believe that engineers with a strong theoretical background and the right practical balance are the right people to be on top. They will fundamentally help spread a culture where the right solutions - from an academic point of view - are encouraged. Brad Calder, a former professor at UC San Diego, moved to Microsoft some years back and is a good example of a person who I believe should be at the top. Albert Greenberg, with a research background, is currently in Windows Azure too (as mentioned here) and seems to me the right person to be at the top as well. Typical non-technical business executives may see releases mostly from the business standpoint, but will not understand that many times the cheap solution has serious drawbacks. And, if one is talking about a service (as in cloud computing), just a few serious production issues can harm a brand in a very significant way.

ASP.NET Web API: ObjectContent constructor changed

WCF Web API was released as ASP.NET Web API (part of the ASP.NET MVC 4 Beta) and, as part of this release, there were a few changes for those using the Web API. For instance, this is how we were using the Web API to serialize our object when making a POST or PUT call:
var oc = new ObjectContent<MyClass>(input, new MediaTypeHeaderValue(Constants.ContentTypeXml));
After this week's release, this constructor no longer exists. This post on the ASP.NET Forums shows that the constructor is now internal, so one needs an alternative. This is the alternative I found to do the same as above:
HttpRequestMessage<T> request = new HttpRequestMessage<T>();
ObjectContent<T> content = request.CreateContent<T>(
    operationInput,
    new MediaTypeHeaderValue(Constants.ContentTypeXml),
    new MediaTypeFormatterCollection() { new XmlMediaTypeFormatter() },
    new FormatterSelector());
Definitely a little more verbose than before, but it solves the problem until the ObjectContent constructor becomes public again in a future release (as mentioned in the forum post).

ASP.NET Web API HttpClient and Client Certificates

I started working with WCF Web API (now released as ASP.NET Web API) and one thing that wasn't obvious was how to add a client certificate on the client side, since the process is not the same as with HttpWebRequest.

To do that, one needs to instantiate a WebRequestHandler, set the client certificate on it, and use it to create the HttpClient:
WebRequestHandler handler = new WebRequestHandler();
X509Certificate2 certificate = GetMyX509Certificate();
handler.ClientCertificates.Add(certificate);
HttpClient client = new HttpClient(handler);

Wednesday, February 15, 2012

Web Applications evolution to support multiple devices as clients


I wrote this a couple of weeks back on the evolution of web applications into web services and the need to support multiple devices as clients. Although there are many accounts of how web services came into existence, such as through the evolution of middleware, this post is mostly about how web applications turned into web services over time.


When the World Wide Web came into existence, web applications targeted only one type of client: the web browser. Although there was some variety among desktop computers and laptops, they typically wouldn’t vary much in terms of screen size or computational power. At that time, clients did not do much, and web applications would issue a synchronous server request for the majority of user actions. The server would process the request and return a view (typically an HTML page) ready to be rendered by the browser.
Over time, however, with the evolution of technologies like JavaScript on the client side, this synchronous model, where each user request had to wait for a response from the server, started to give way to a model where more functionality moved to the client side. Requests were made asynchronously, not necessarily tied to a user action, and did not block while waiting for a response.
Since operations were performed asynchronously and no longer blocked, the logic to display the user interface moved to the client side, which knew how to react to asynchronous responses and update the UI accordingly. That largely relieved the server of producing views; it started being a service that would take requests on some data (the “model” in MVC frameworks) and return status codes and potentially more data (another “model”) to display. At this point, web applications were already moving in a data-oriented direction.
While applications were mostly focused on the web and on one user interface, the way requests and responses transmitted data was not very relevant and could be specific to the application or the framework used. Some web application frameworks had their own specific way of performing asynchronous requests, and as long as a browser was your only client, that was fine.
With mobile devices like smartphones, the initial typical solution was to have a mobile version of the web site, which would perform similar actions to the main website. However, in the past few years we’ve seen the uptake of mobile applications (“apps”), due to the huge success of smartphones, which introduced a different user interface on the client side. A web interface returned by the web server no longer sufficed, as that wouldn’t be the preferred interface on these devices. In order to support both a web site and a mobile application, the way data comes into and out of the service needs to be abstracted so that, irrespective of the client, we can use the same server.
In addition, the fact that the mobile app was built and run on a different platform was another major factor calling for the right type of abstraction. In order to fetch data from the server or submit data back to it, the requirements for talking to the server shouldn’t be too heavy, as at this point our client is no longer only a powerful desktop or laptop computer, but also a mobile device without as much CPU power and memory.
And not only mobile apps: a range of different devices (like tablets, slates, and large screens) and different input methods (like touch screens or sensors) accelerated changes even further for modern applications that wanted to expose their functionality to multiple consumers seamlessly. Even on the same device one can have multiple experiences: on a smartphone one can use an app and also access the web application through the browser, while on a desktop PC one can access a web site, use a regular application, or even have a new experience with Metro-style applications in Windows 8.
The answer was that these web applications would need to be turned into web services consumed by these different UIs, which would perform operations on the data. And that is the context for those who want to make a service available - the direction is to expose the data in an easily consumable format, using a practical protocol, and build the different interfaces around that. If it is only used by a mobile application, only the web service will be used. If there is also a web application, some web resources (like scripts) will need to be hosted. But irrespective of your customers, a good architecture picks the right technologies and abstractions to be ready for these challenges and future evolution.
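The data-oriented direction described above can be sketched concretely: the service returns a model in a consumable format (JSON here), not a rendered view, and each client - browser script, mobile app, desktop application - renders it itself. A minimal, hypothetical sketch in Python (names and routes are illustrative):

```python
import json

# Hypothetical in-memory "model" store.
STORE = {"42": {"id": "42", "name": "widget"}}

def handle_request(path):
    """Return (status, body): the body is data (a model) serialized
    as JSON, not an HTML view. Any client that speaks HTTP and JSON
    can consume it, regardless of device or platform."""
    prefix = "/items/"
    if path.startswith(prefix):
        item = STORE.get(path[len(prefix):])
        if item is not None:
            return 200, json.dumps(item)
        return 404, json.dumps({"error": "not found"})
    return 400, json.dumps({"error": "bad request"})
```

The same endpoint serves the browser's asynchronous script, the smartphone app, and any future client; only the rendering differs, which is exactly the abstraction that supporting multiple devices requires.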