Wednesday, December 28, 2011

The parable of the sadhu

Yesterday I read a Harvard Business Review article called "The parable of the sadhu", by Bowen McCoy.

This is a very popular HBR article in which the author faces a moral dilemma in Nepal. While climbing in the mountains, his group finds an Indian holy man - a sadhu - lying on the ice, suffering from hypothermia. The dilemma between taking care of the man and attending to their own needs ends when they give him some aid and comfort, then carry him down and leave him close to a hut, not knowing for sure whether he made it to the hut or, for that matter, whether he survived.

The discussion that comes up after that is about the limit of their responsibility in a situation like that.
"'Where, in your opinion', I asked, 'is the limit of our responsibility in a situation like this? We had our own well-being to worry about.'"
And after analyzing the situation, the author understands that...
"One of our problems was that as a group we had no process for developing a consensus. We had no sense of purpose or plan. […] Because the group did not have a set of preconditions that could guide its action to an acceptable resolution, we reacted instinctively as individuals. […] We had no leader with whom we could all identify and in whose purpose we believed."
I don't want to give away the entire article, but I must quote these paragraphs that summarize the lesson:
"Individuals who operate from a thoughtful set of personal values provide the foundation for a corporate culture. A corporate tradition that encourages freedom of inquiry, supports personal values, and reinforces a focused sense of direction can fulfill the need to combine individuality with the prosperity and success of the group. Without such corporate support, the individual is lost."

"That is the lesson of the sadhu. In a complex corporate situation, the individual requires and deserves the support of the group. When people cannot find such support in their organizations, they don't know how to act. If such support is forthcoming, a person has a stake in the success of the group and can add much to the process of establishing and maintaining a corporate culture. Management's challenge is to be sensitive to individual needs, to shape them, and to direct and focus them for the benefit of the group as a whole."
It is interesting to look at our own corporations and ask whether we provide the foundation for a corporate culture, if there is one at all. Do we encourage freedom of inquiry? Do we support personal values? Do we have a focused sense of direction? Oftentimes I think that "the individual is lost" in many corporate environments, as the author says above. And this is harmful to the company.

PS: Photo from Flickr (Creative Commons).

Java: how can a 1Gbit/s attack keep up to 100K i7 CPUs busy?


This is a hash table vulnerability found in many web application platforms, like PHP, ASP.NET, Ruby, and Java (but not only Java). In one of the cases, 1Gbit/s can keep up to 1 million CPUs busy!

Microsoft has already posted an advisory on this issue. The video has more information about other platforms.

The video below was published today and contains more details on the vulnerability. It is quite interesting, especially if you're into security.

And here you can find a blog post with more details on the talk:

Monday, December 19, 2011

StackOverflow clone with RavenDB

Today I watched the video below on how to build a StackOverflow clone with RavenDB. This was my first video on a document store database, and it was amazing to see how interesting and easy it is to use for a site like StackOverflow.
I was surprised to learn that RavenDB supports transactions, and by its integration with LINQ. Although it's a .NET-only solution and requires a commercial license if you're not working on an open source project, it was a good way to start learning the benefits of a document store.
It also supports full text search, which is implemented under the hood using Lucene.NET.

If you're interested in getting your StackOverflow running, do the following:
  • Download RavenDB from the RavenDB website
  • Extract RavenDB into a directory and run Server\Raven.Server.exe
  • Download RavenOverflow from its GitHub repository
  • Extract RavenOverflow in a directory and open RavenOverflow.sln in Visual Studio
  • Right-click on "RavenOverflow.Web" and click on "Set as StartUp Project"
  • Hit F5 in Visual Studio
The original post by the presenter can be found here.

Friday, December 09, 2011

DNS domain names: 253 or 255 bytes/octets?

The question of whether DNS domain names are limited to 253 or 255 octets is one that is hard to find a good confirmation on, but I hope to provide the answer in this post.

Let's start by taking a look at some RFCs:

So it seems that the domain names should be up to 255 octets, right? That is not what Wikipedia says:

There is even a long discussion on Wikipedia about the right value here

And even an RFC mentions 253 octets:
  • "When the result of macro expansion is used in a domain name query, if the expanded domain name exceeds 253 characters (the maximum length of a domain name) [...]"

Then you start playing with Microsoft DNS or BIND, and with tools on both Windows and Linux, and you see some interesting behaviors. For instance, nslookup on Windows times out when a 255-character domain name is queried against a BIND server. On Linux, however, you get an explicit error for anything longer than 253 characters (ASCII, hence 253 octets):

host <255-char domain name>
<255-char domain name> is not a legal name (ran out of space)

The answer lies actually in the good old RFC 1035 - thanks to a colleague for finding this definitive answer:
  • "Each label is represented as a one octet length field followed by that number of octets. Since every domain name ends with the null label of the root, a domain name is terminated by a length byte of zero."

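The answer is that, over the wire, a domain name can use up to 255 octets. Every label is preceded by a length octet, which takes the place of the dot that separates labels in the textual form, so only the first length octet has no textual counterpart. There is also a last byte (the zero-length root label) that acts as a terminator. So what is left for the actual domain name is 253 octets - which can represent different numbers of characters depending on your domain.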
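To make the arithmetic concrete, here is a minimal sketch in Python (my own illustration, not code from any DNS library) that counts the octets a textual name occupies under the RFC 1035 label encoding quoted above. The function name `wire_length` and the sample name are assumptions made up for this example, and it only handles plain ASCII names.

```python
def wire_length(name: str) -> int:
    """Octets used by `name` in RFC 1035 wire format (ASCII names only)."""
    trimmed = name.rstrip(".")  # ignore an explicit trailing root dot
    labels = trimmed.split(".") if trimmed else []
    # Each label costs one length octet plus its characters;
    # the root's null label adds the final zero byte.
    return sum(1 + len(label) for label in labels) + 1


# Hypothetical longest name: three 63-octet labels plus one 61-octet label.
longest = ".".join(["a" * 63, "b" * 63, "c" * 63, "d" * 61])
print(len(longest), wire_length(longest))  # prints: 253 255
```

The two-octet gap between the textual and the wire length is the same for any name, which is why 255 on the wire becomes 253 in text form.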

Sunday, December 04, 2011

Microsoft Visual Studio Tips and Tricks

As I am trying to become more productive using Visual Studio (even after some months at Microsoft, I still think I am more productive with Eclipse), I came across this good TechEd talk on Channel 9 on tips and tricks that taught me a bunch of nice things about Visual Studio:

I hope it helps you too.

Saturday, December 03, 2011

Visual Studio add-ins for pasting XML


I compiled and installed a couple of Visual Studio add-ins that I found quite useful for those dealing with XML, in particular for pasting it into VS.

SmartPaster 2010
  • Allows you to paste XML as comments, as strings, or as a StringBuilder, escaping the text as necessary!
  • In my case, I had to paste a long XML from my API document and did not want to escape everything.
  • This is the source site:
  • The binary version did not load in Visual Studio 2010 64-bit (an exception was thrown), so I recompiled it and it worked just fine.

Paste XML as Type
  • This add-in is really cool for REST APIs. You copy an XML document (like from our MSDN documentation) and it pastes it as a serializable type. All classes are automatically generated.
  • It was included in Microsoft WCF REST Starter Kit Preview 2:
  • It doesn't have a binary version, so I compiled this add-in and made it available too.

Sunday, November 20, 2011

Silverlight, cross-domain issues, and self-signed certificates

I've been meaning to post this for quite some time now, as I haven't seen others with exactly the same issue. First, some context: a Silverlight application runs with some special security measures in place to avoid Cross-Site Request Forgery (CSRF). By default, Silverlight only allows site-of-origin communication: an application can access services on the domain it was loaded from, but not on a different domain. In order to allow more than site-of-origin communication, a service owner must have a clientaccesspolicy.xml file in the root configuring which domains are allowed to access that service. If you're interested, this is explained in greater detail on this MSDN site.

The issue I ran into is that I had a Silverlight application and also a service, both running locally. My service had a proper clientaccesspolicy.xml configured to allow access from anywhere. And still my Silverlight would fail with the message:

"An error occurred while trying to make a request to URI 'https://MYDOMAIN/MYSERVICE.svc'. This could be due to attempting to access a service in a cross-domain way without a proper cross-domain policy in place, or a policy that is unsuitable for SOAP services. You may need to contact the owner of the service to publish a cross-domain policy file and to ensure it allows SOAP-related HTTP headers to be sent. This error may also be caused by using internal types in the web service proxy without using the InternalsVisibleToAttribute attribute. Please see the inner exception for more details. ---> System.Security.SecurityException ---> System.Security.SecurityException: Security error..."

After debugging the issue further, I found that the problem was that my service had only a secure endpoint (SSL) and its certificate was self-signed (or did not match the domain, can't remember now). In that case, my Silverlight application would not download the service's clientaccesspolicy.xml and therefore declined access to it. Since I was running code within another larger application that I did not have control of, I did not investigate further whether one can configure it to accept self-signed or mismatched certificates during development. (If you know whether this is possible, please let me know!)

How did I get it solved? If you're running in Internet Explorer:

  1. Before loading your Silverlight application, first access the clientaccesspolicy.xml file. IE will warn about the self-signed or mismatched certificate, but you can opt to proceed.
  2. In the same tab, then access your Silverlight application. It will be able to access your clientaccesspolicy.xml at that point, and the call will go through.

Simple trick, and effective. I'd love to know if other browsers work the same. By the way, this was tested in Internet Explorer 9.

ReadyNas, WebDav, and "Method Not Allowed"

I have a ReadyNas Duo network-attached storage device, which I access via WebDAV only, due to some permission conflicts that arise if I use different protocols to write files to it. Given that Windows does not support WebDAV properly in my case, I installed a WebDAV client called BitKinex. I configured it to point to my share and guess what: an "HTTP: Method Not Allowed (/)" error.

The problem is that BitKinex, by default, points to the root of your server. The ReadyNas has different shares, and you must point to the right share to fix the error. In order to do that, right-click on the WebDAV connection and select "Properties". Go to "Site Map" and update "/" with your share name (in my case, "/documents"). Then it works fine.

Thursday, November 17, 2011

.NET: do not use System.Uri for domain validation

Last time I talked about System.Uri, I was talking about a bug that prevents trailing dots from being used for REST resources. Now the issue is different: how about relying on System.Uri for domain validation?

It's not uncommon to see System.Uri being used to validate an input that is supposed to be a domain name. I've seen code like this trying to validate domains:
public static bool IsDomainValid(string name)
{
    try
    {
        new Uri("http://" + name);
        return true;
    }
    catch (UriFormatException) { return false; }
}
Or, relying on the Host property in addition to UriFormatException, something like this:
public static bool IsDomainValid(string domainName)
{
    try
    {
        if (StringComparer.OrdinalIgnoreCase.Equals(new Uri("http://" + domainName).Host, domainName))
            return true;

        return false;
    }
    catch (UriFormatException)
    {
        return false;
    }
}
Preliminary tests with this code show that completely wrong domains are rejected, so it seems to be great code. And the best part is that we don't have to write any domain validation ourselves.

Now, what about the following domains?

They are all considered valid according to System.Uri(). However, according to RFC 1035 or RFC 1123, they are not. According to RFC 1035, not even a digits-only domain is valid, but System.Uri() is fine with all of them.

I played with some of the internal flags and it seems that, if you use E_HostNotCanonical (256), it starts rejecting some of these invalid domain names, but I really couldn't understand the rules it follows. And since there are different RFCs and different interpretations, it would be really hard for System.Uri() to do a precise validation unless one passed the RFC that the domain is expected to be compliant with.

At the end of the day, you're better off understanding the RFC you want to comply with and implementing the proper regular expression for that. In my case, I wanted it to be compatible with RFC 1123, so this is the regular expression I started with:


And then relaxed it to the following after learning that digits only domains were accepted by RFC 1123 (there are multiple interpretations, but I read the RFC and was convinced that it was fine).


This is the regular expression per domain label (text between the dots). It does not apply to the rightmost label as it must not start with a digit - in order to differentiate a domain name from an IP address.

Also, this regular expression requires an explicit check that the entire domain is less than or equal to 255 characters.
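As an illustration of this approach, here is a minimal sketch, written in Python rather than .NET. The label pattern is an assumption on my part (one common reading of RFC 1123, with digits-only labels allowed), and the names `LABEL`, `is_valid_domain`, and `max_length` are made up for the example.

```python
import re

# Assumed per-label rule: 1-63 characters, letters/digits/hyphens,
# no leading or trailing hyphen. Digits-only labels are allowed.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_domain(name: str, max_length: int = 255) -> bool:
    # Explicit overall length check, since the per-label regex cannot express it.
    # The default mirrors the limit mentioned in the text; pass a stricter value if needed.
    if not name or len(name) > max_length:
        return False
    labels = name.split(".")
    if not all(LABEL.fullmatch(label) for label in labels):
        return False
    # The rightmost label must not start with a digit, so that a domain
    # name cannot be mistaken for an IP address.
    return not labels[-1][0].isdigit()
```

Splitting on the dots first keeps the regular expression small and makes the rightmost-label rule an ordinary check instead of a second pattern.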

Wednesday, November 16, 2011

Regular expressions: backtracking can kill your performance

Or why you should learn atomic grouping...

After my last post on turning off useless backtracking by using atomic grouping, I kept on reading the Regular Expressions Cookbook and ran another experiment to validate the performance difference. Let's start with the results:

Normal Regex - # of loops: 1000, # of matches: 0, time (ms): 66149
Atomic Grouping - # of loops: 1000, # of matches: 0, time (ms): 11196

Note that the normal regex took about 6 times longer than the atomic-grouping version to fail to match.

The example from the book is a regex to match a well-formed html page. I saved a Wikipedia page, made a few adjustments to the program used in the last post (e.g. to read the html contents from file), and ran the tests.

The regular expression used is:
In every test I ran with this regex and with the version without atomic grouping, the normal regex was about 6 times slower. If you want to know more about atomic grouping versus a regular regex, please read my last post.

And this difference was found without setting the regex to be compiled. After setting this flag, these are the values I get:

Normal Regex - # of loops: 1000, # of matches: 0, time (ms): 49319
Atomic Grouping - # of loops: 1000, # of matches: 0, time (ms): 9471

Still a pretty significant difference (about 5 times) between the normal regex and atomic grouping.

Tuesday, November 15, 2011

Regular expressions: turning off useless backtracking

Last time I mentioned the nice feature that lets you make a quantifier greedy or lazy, which helps you match what you really want. This time, it is about how to make your regular expression more efficient. First, we need to remember from the last post that greedy and lazy quantifiers change how backtracking works. Sometimes backtracking just doesn’t make sense. Look at this example:
\b\d+\b
It is supposed to match integers at word boundaries (\b means a word boundary). At first, it may be a bit hard to see, but backtracking is unnecessary here. For instance, try to run this regular expression on the following example:
789abcdef654 123
At the point when the regular expression fails (when it checks for a word boundary between “9” and “a”), it doesn’t make sense to start backtracking to see if there is a word boundary before “9” (or before “8”, for that matter). We should just go ahead and move on to the next token.

This is where it pays off to know the tool you’re using well. Different flavors of regular expressions offer ways to avoid keeping backtracking positions, so when a match fails, the engine just moves on. Java and .NET both support atomic grouping:
\b(?>\d+)\b
Atomic grouping here is represented by “(?>)”. When the regular expression engine leaves the group, all backtracking positions are lost, so a failed match will not have any recourse other than moving on to the next characters to find further matches. It will give up on the current context.

In the example above, when it matches 789, it will leave the atomic group, so when \b fails to be matched there are no backtracking positions to be tried. From this quick analysis, we see that we avoid a lot of extra computation by avoiding this useless backtracking.
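To see the effect in isolation, here is a small sketch in Python (an illustration only, not the .NET benchmark). Python's `re` module only gained `(?>...)` in version 3.11, so the sketch emulates the atomic group with a lookahead plus a backreference, a well-known substitute that relies on the engine not backtracking into a lookahead once it has succeeded. The variable names and the "12345" sample are my own assumptions.

```python
import re

text = "789abcdef654 123"

normal = re.compile(r"\b\d+\b")
# Emulated atomic group: the lookahead captures the digits greedily,
# the backreference consumes them, and the engine cannot backtrack
# into the lookahead afterwards.
atomic = re.compile(r"\b(?=(?P<digits>\d+))(?P=digits)\b")

# Both find the same integer; only the amount of backtracking on "789" differs.
normal_hits = [m.group(0) for m in normal.finditer(text)]
atomic_hits = [m.group(0) for m in atomic.finditer(text)]

# Proof that the backtracking positions are gone: a plain \d+ gives back
# the final "5" so the literal 5 can match, the atomic version cannot.
gives_back = re.search(r"\d+5", "12345")
keeps_all = re.search(r"(?=(?P<digits>\d+))(?P=digits)5", "12345")

print(normal_hits, atomic_hits)          # ['123'] ['123']
print(gives_back is not None, keeps_all)  # True None
```

The second pair of patterns also shows the flip side: an atomic group can change what matches, so it is only safe where, as with the word-boundary example, giving characters back could never help.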

The question is how much time we actually save here. I wrote some test code to benchmark these different options and verify whether we are talking about any substantial savings or not.
static void Main(string[] args)
{
    TestAtomicGrouping(1000);
    TestAtomicGrouping(10000);
    TestAtomicGrouping(100000);
}

static void TestAtomicGrouping(int numLoops)
{
    Regex regex1 = new Regex(@"\b\d+\b", RegexOptions.Compiled);
    Regex regex2 = new Regex(@"\b(?>\d+)\b", RegexOptions.Compiled);

    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 100; i++)
        sb.Append(' ');
    for (int i = 0; i < 100; i++)
        sb.Append("123 "); // ASSUMPTION: the original loop body is missing; any text containing integers fits here

    string testString = sb.ToString();
    int firstMatchCount = 0;
    DateTime start = DateTime.Now;
    for (int i = 0; i < numLoops; i++)
        if (regex1.IsMatch(testString))
            firstMatchCount++;
    TimeSpan firstTest = DateTime.Now - start;

    start = DateTime.Now;
    int secondMatchCount = 0;
    for (int i = 0; i < numLoops; i++)
        if (regex2.IsMatch(testString))
            secondMatchCount++;
    TimeSpan secondTest = DateTime.Now - start;

    Console.WriteLine("Normal Regex - # of loops: {0}, # of matches: {1}, time (ms): {2}",
        numLoops, firstMatchCount, firstTest.TotalMilliseconds);
    Console.WriteLine("Atomic Grouping - # of loops: {0}, # of matches: {1}, time (ms): {2}",
        numLoops, secondMatchCount, secondTest.TotalMilliseconds);
}
Now let's see the results:

Normal Regex - # of loops: 1000, # of matches: 1000, time (ms): 48.0028
Atomic Grouping - # of loops: 1000, # of matches: 1000, time (ms): 30.0017

Normal Regex - # of loops: 10000, # of matches: 10000, time (ms): 386.0221
Atomic Grouping - # of loops: 10000, # of matches: 10000, time (ms): 239.0137

Normal Regex - # of loops: 100000, # of matches: 100000, time (ms): 3145.1799
Atomic Grouping - # of loops: 100000, # of matches: 100000, time (ms): 2079.1189

So, at the end of the day, getting rid of backtracking can be quite significant if you’re matching this regular expression quite often. In this test, we could save between 34% and 38% of the matching time just by “turning off” backtracking with atomic grouping.

Monday, November 14, 2011

Regular expressions: greedy vs. lazy quantifier

I am reading the great book Regular Expressions Cookbook after seeing today a few things that I did not know about regular expressions. I will get to the interesting regular expression I had to work on in a future post, but for now I will share something I found quite interesting: greedy and lazy quantifiers.

Let's start by trying to match a paragraph in HTML. A paragraph is typically surrounded by <p> and </p>. So, I would write the regular expression as:

This should take care of matching the paragraph, right? Partially right. If you have a long HTML document with multiple paragraphs, this will match from the first paragraph start (<p>) to the very last paragraph end (</p>). This "*" is what is called a greedy quantifier.

If you want to make it behave differently, you will want to use what is called a lazy quantifier. This is just a regular question mark placed after another quantifier. Note that, if a question mark is placed after a regex token, it means "zero or once". This is not what we are talking about here - a question mark placed after a quantifier changes that quantifier's behavior.

In the example above, it matches the first paragraph only, not the entire text.

Under the covers, the regular expression engine uses backtracking to match the expression. For a greedy quantifier, it eats up all the content that matches the current token and then moves to the next token. In the case of the paragraph matching example, it reads the entire text until the very end. Then it moves on to the next token (in this case <) - and since that fails because the document has ended, it backtracks one character and tries to match < again. It keeps going back one character at a time until it matches.

For the lazy quantifier, it repeats as few times as it can, moving to the next regex token (here <). If the token is not matched, then it backtracks, consumes one more character, and checks again whether the token is matched.
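The difference is easy to try out. Below is a minimal sketch in Python; the two patterns and the sample HTML are my own assumptions, chosen to mirror the paragraph example, and the same quantifier syntax works in .NET and Java.

```python
import re

html = "<p>first</p><div>other</div><p>second</p>"

# Greedy: .* grabs as much as possible, then backtracks to the last </p>.
greedy = re.search(r"<p>.*</p>", html).group(0)
# Lazy: .*? grabs as little as possible, stopping at the first </p>.
lazy = re.search(r"<p>.*?</p>", html).group(0)

print(greedy)  # <p>first</p><div>other</div><p>second</p>
print(lazy)    # <p>first</p>
```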

I was happy to learn this, as I had always asked myself how to control this behavior. It is also quite interesting to understand the regular expression engine's behavior, which can come in handy. Just be careful when using the question mark. As said above, it can serve two purposes depending on where it's placed.

Sunday, November 13, 2011

Moving Channel 9 to Azure: good design principles

Today I read a great article about Microsoft Channel 9 moving to Azure. It talks about the sound design principles in place and the lessons the Channel 9 team shares about moving a web site to run in the cloud.

One of the things that caught my attention was seeing a Microsoft project use a distributed cache fleet running Memcached. Using a caching layer is definitely the right thing to do in many cases to make the site more scalable. I wonder why they didn't use Windows Azure AppFabric Caching. Also, having worked on Amazon ElastiCache before joining Azure, I'd be curious how they monitor their Memcached instances.

I was very glad to see modular code, coding to interfaces, and above all dependency injection being used. While dependency injection is pretty popular in the Java world, it's still not as popular among Microsoft developers. They mention dependency injection being used "for testing purposes but also to isolate you from very specific platform details". Very well done.

Division of labor is the right principle for environments where machines are not reliable. This is the proper mindset about machines in the cloud: "In practice they tend to run a very long time, but you can’t depend on that fact." And breaking down the tasks and using worker roles to pick them up, connecting them via queues, seems a smart strategy (assuming you have proper monitoring of the queue depths in place). In particular, I like the fact that the Channel 9 team did not just assume that instances run for a long time and release an architecture based on that, leaving potential problems to be addressed in the future. Unfortunately I've seen a lot of people with that mindset, and Channel 9 did very well here.
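As a rough illustration of the queue-and-workers idea (not Channel 9's actual code, which the article does not show), here is a minimal in-process sketch in Python. Threads stand in for worker roles and `queue.Queue` stands in for a cloud queue; `process_all`, `num_workers`, and the upper-casing "task" are all assumptions made up for the example.

```python
import queue
import threading


def process_all(items, num_workers=3):
    """Fan items out to workers through a queue and collect the results."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:           # sentinel: no more work for this worker
                return
            results.put(item.upper())  # stand-in for the real task

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results.get() for _ in range(results.qsize()))


print(process_all(["encode", "thumbnail", "publish"]))
```

Because no worker owns a task until it pulls it from the queue, any worker can pick up the next item, and `tasks.qsize()` is the in-process analogue of the queue depth you would want to monitor.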

From the article, though, the one thing that could have been done better was thinking about database sharding. Although SQL Azure will provide Federations, there are many things that service owners need to think about: what the database partition key will be, what queries will need to go across partitions and potentially hurt scalability, what queries will need to be federated, etc. I am not very familiar with SQL Azure Federations and don't know if it will automatically repartition hot partitions, but if it doesn't, that's another task service owners need to prepare for. With all that said, you don't need to shard right away, but you need to think about it before your service's version 1 goes out, otherwise scaling can be a major headache - and if you can't afford downtime, then that can be an almost impossible task to accomplish in some cases.

All that said, I was very glad to read about their work and their sharing the architecture and lessons publicly.

Link to the InfoQ article:

Saturday, November 12, 2011

Why high code coverage is not enough

Managers typically like high "code coverage", and oftentimes think that this means that the code quality is good. I agree that low code coverage definitely means that one doesn't have enough unit tests, but high code coverage may not mean much either. It's required but not sufficient. To prove this, let's take a look at one example.

Once upon a time, I saw the following regular expression in production code. I will write it in C#, but the language or platform doesn't matter much.

public static bool IsValidIp(string ipAddress)
{
    return new Regex(@"^([0-2]?[0-5]?[0-5]\.){3}[0-2]?[0-5]?[0-5]$").IsMatch(ipAddress);
}
Let's say now that you have one unit test to make sure that your "boundary case" is accepted.

Now you are happy, get the code checked in, and brag that you have 100% code coverage for that IsValidIp method. And so what? A simple "" IP address is not considered a valid address. Completely buggy code, but 100% code coverage.

That is why managers that really understand what is being developed and have the chance to spend time looking at the code can make a total difference in the final product's quality.

Note: in the case above, it's amazing that the developer did not Google for the right regular expression for IP validation, did not write data-driven unit tests to make sure different IPs are validated correctly, and that code reviewers did not review it properly.
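To show what a data-driven test would have caught, here is a minimal sketch in Python rather than C#. The regular expression is the one from the post; the replacement check `is_valid_ip`, the function names, and the sample addresses in `CASES` are my own assumptions for illustration.

```python
import re

# The flawed pattern from the post: every digit position is limited to 0-5.
BUGGY = re.compile(r"^([0-2]?[0-5]?[0-5]\.){3}[0-2]?[0-5]?[0-5]$")


def is_valid_ip_buggy(address: str) -> bool:
    return BUGGY.match(address) is not None


def is_valid_ip(address: str) -> bool:
    """Assumed fix: four dot-separated decimal parts, each in 0..255."""
    parts = address.split(".")
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
        for p in parts
    )


# Data-driven cases: (input, expected validity). Example values only.
CASES = [
    ("255.255.255.255", True),
    ("10.0.0.1", True),
    ("192.168.1.9", True),
    ("256.1.1.1", False),
    ("1.2.3", False),
]

buggy_failures = [ip for ip, ok in CASES if is_valid_ip_buggy(ip) != ok]
fixed_failures = [ip for ip, ok in CASES if is_valid_ip(ip) != ok]
print(buggy_failures, fixed_failures)  # ['192.168.1.9'] []
```

A single test on "255.255.255.255" already executes every line of the buggy function, so coverage reports 100%; it takes the extra rows of the table to expose the bug.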

On Zynga and its "give back stock or get fired" story

This week the news has been all over the place about Zynga's CEO and executives demanding that employees either give back their not-yet-vested stock or face termination. One of the ways that startup companies lure employees into taking the risk is to offer equity - that's the currency startups use to pay for the risk taking as well as for the lower salaries and long hours. Once you offer this equity, I think companies must honor their contracts. At the same time, this can tell you a lot about the company, its values, and whether other people will want to join it in the future.

However, Zynga is correct in trying to be meritocratic. Those who contributed more should have a bigger piece than those who are around but did not contribute much to the company's success. A good compromise would be to have policies stating that the stock grants are dependent on your performance evaluation. Some sort of multiplier would be applied in this case - if you meet expectations, you will get 1x your stock grants, if you're a rock star, you could get up to Nx (e.g. 2x), and if you're an underperformer, you may get nothing at all. That is much more fair than just demanding stock back or threatening termination. Of course no system is entirely fair, as it can be subjective and politics always plays a part, but it's better than luring people into thinking that they will get their stock if they stick around long enough and then firing them primarily for this reason.

On the other hand, though, how many other companies may be contemplating or actually firing people with unvested stock to accomplish the same goal? At least one can say that Zynga was transparent about the reason why it would fire its employees. But this is the kind of transparency that one doesn't see very often, because it lowers all employees' morale and calls the company's moral values into question. A company that does the same, but not that openly, seems to be much better off, as employees tend to still believe that the company abides by its moral principles and that it's worth putting in all the effort to make the company grow.

Thursday, November 10, 2011

Coding guidelines and readability

The more experience I get in the industry, the more I value great developers who know how to distinguish great code from good code. And I am glad that I had a fantastic experience reading and working with the Linux kernel - most of the code I saw at the time falls into the category of great code.

There are many aspects of a great developer that one can think of, but I'd like to focus here on code readability and maintainability. First, one quite important distinction, especially for those used to following strict "code guidelines": code readability is not so much about where you place the brackets, or any style that can be verified by a static analysis tool. In my opinion, these are usually the things that don't matter much.

The real readability is about the art in writing your code, not its science. Things like how to properly break your code into methods, how to name variables, classes, and methods, how to use spaces properly, how to make proper use of the fixed-width characters or align/indent code.

In my opinion, great engineers know that the code must not just work, but it must be a work of art. Something that you and others can read in the future and maintain. It is NOT just about getting it to work. A lot of the code I read works, but it is not readable, not elegant, and oftentimes not efficient at all.

Digressing a bit, I miss systems where you actually need to get the maximum performance out of a machine. That seemed to require better engineers than nowadays. Now, with web services and cloud computing, oftentimes one doesn't care much about performance, as you can always get a faster box to run your code. After working for two cloud providers, I see that this can happen especially with those providing these cloud services. I wonder whether the engineers who just throw more CPU power or memory at a problem have ever thought about the cost of running a service, and that this is one of the factors that will make the difference between being profitable or not. Or, on a more philosophical side, whether they really take pride in the engineering of the code they write.

But back to coding guidelines, I really recommend that you read the article below (published in the ACM Queue magazine) if you want to get better at that. It has a few tables that you should print out and put up on the wall, where you can peek at them when writing your code.

Coding Guidelines: Finding the Art in the Science

Today I also came across a new O'Reilly book on the topic called "The Art of Readable Code", which seems to be quite interesting and probably delves into this topic in much greater detail.

The Art of Readable Code

URI segments, dots, REST, and .NET bug

These days I learned about a bug in the System.Uri() class that strips trailing dots from URI segments. See an example:




That happens if your client or your server is .NET. If your client believes you support the URL RFC correctly, it may send the request with trailing dots, and when it gets to your code, these dots are gone.

The implication is that, if this URI segment is actually a resource name, you may be in trouble. Let me show you a concrete example:
  1. Resource is created by posting to URL: http://host/addresses. At this point, the resource name is passed in the payload and your service will correctly accept these trailing dots. For example, let's say we create an address named "home." So far, so good.
  2. User tries to perform a REST operation on this resource. It could be something as simple as a GET on http://host/addresses/home. (dot included)
  3. In the case you have a .NET client, the request will go out as http://host/addresses/home (no dot). Of course your server will return the wrong data or an error (like 404 - not found)
  4. In case you have a non-.NET client, the request will go out correctly, but if your server is .NET-based, then you may have an issue. For instance, a WCF REST service will have this resource name parsed as "home" (no dots), which will also return the wrong data or an error.
The consequence is that your .NET REST service should not allow dots in resource names, or at least not trailing dots. However, allowing dots everywhere but at the end is not desirable, so quite possibly you will forbid dots altogether.

There's a workaround for this issue if you control both client and server code. However, in case your customers are generating client proxies, then you must document what they need to do.
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic);
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
    foreach (string scheme in new[] { "http", "https" })
    {
        UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { scheme });
        if (parser != null)
        {
            int flagsValue = (int)flagsField.GetValue(parser);
            // Clear the CanonicalizeAsFilePath attribute
            if ((flagsValue & 0x1000000) != 0)
            {
                flagsField.SetValue(parser, flagsValue & ~0x1000000);
            }
        }
    }
}
The code above clears a flag that tells the parser to canonicalize a URL as a file path. Yes, by default all URLs are treated as if they were Windows file locations.

Unfortunately, this bug has been known since 2008, but the fix has never made it into a .NET release. It is marked as fixed, but as of .NET 4 we are still waiting for the fix to be released.

Here you can find more details about this issue:

Internet Explorer vs. Chrome: Recover Session

Working at Microsoft, I use Internet Explorer, but when I want to separate some sessions, I run Chrome at the same time. And I wanted to tell you about an experience I had last week.

Last weekend, after I plugged in my Garmin heart monitor to download my bike ride, my PC just rebooted (that has happened more than once with this monitor). After it came back up, I expected Chrome to recover my session, but not Internet Explorer (from what I recall, IE hasn't been very good at this). And guess what? Chrome did not recover a single one of my tabs, while IE recovered the entire session correctly. That was clearly unexpected and a pleasant surprise.

In the end, I had to go through Chrome's history to recover some tabs that I hadn't read yet, which I did not need to do with IE.

Monday, September 26, 2011

WCF + REST + PUT/DELETE = 405 (Method not allowed)

In short: uninstall WebDAV module from IIS.

If you are getting a 405 (Method not allowed) error back when hitting your WCF REST endpoint, check your IIS configuration to make sure WebDAV is not installed. In my case, WebDAV was intercepting requests and returning 405 for PUT and DELETE. Note, though, that disabling WebDAV is not enough; you have to uninstall it.

Sunday, September 25, 2011

The design of the Domain Name System


Good posts on DNS, potential issues and limitations with it, and how to design applications to use DNS.

The design of the Domain Name System (Part I)

The design of the Domain Name System (Part II) - Exact and approximate name matching

The design of the Domain Name System (Part III) - Name structure and delegation

The design of the Domain Name System (Part IV) - Global consistency

The design of the Domain Name System (Part V) - Large data

The design of the Domain Name System (Part VI) - Overloaded record types

The design of the Domain Name System (Part VII) - Related names are not related

The design of the Domain Name System (Part VIII) - Names Outside the DNS

Saturday, August 06, 2011

Flickr interestingness downloader in Ruby

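This time, here is Ruby code that uses the flickraw gem to download large-size versions of Flickr's interesting photos.

require 'flickraw'
require 'net/http'

# FlickRaw.api_key and FlickRaw.shared_secret must be set here before calling the API.

photos = flickr.interestingness.getList( :per_page => 500 )

frob = flickr.auth.getFrob
auth_url = FlickRaw.auth_url :frob => frob, :perms => 'read'

photos.each do |pic|
  # The right-hand side of this line was lost in the original listing;
  # looking the photo up by pic.id is an assumption.
  photo_info = flickr.photos.getInfo :photo_id => pic.id
  photo_url = FlickRaw.url_b(photo_info)

  puts "Downloading #{photo_url}"

  # The file name and the body of this block were also lost; naming the
  # file after pic.id and writing the downloaded bytes are assumptions.
  File.open("flickr/" + pic.id + ".jpg", "wb") { |file|
    file.write(Net::HTTP.get(URI.parse(photo_url)))
  }
end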

S3 file bucket downloader in Ruby

Today I wanted to download files from a website that, as I happened to find out, stored all of its files in S3. When I accessed the website root, I realized that the response was just the result of an S3 ListBucket API call. For instance:

<ListBucketResult xmlns="">

In order to download all files more quickly, I wrote the following Ruby program that downloads all files from this website, and I hope it can be useful for others:
require 'net/http'
require 'rexml/document'

baseurl = ''

# get the XML data as a string
xml_data = Net::HTTP.get_response(URI.parse("http://" + baseurl)).body

# parse the bucket listing and download every key it contains
doc = REXML::Document.new(xml_data)
Net::HTTP.start(baseurl) do |http|
  doc.elements.each('ListBucketResult/Contents/Key') do |ele|
    puts "Downloading " + ele.text
    resp = http.get("/" + ele.text)
    # flatten the key into a single file name under images/
    File.open("images/" + ele.text.gsub("/", "_") + ".jpg", "wb") { |file|
      file.write(resp.body)
    }
    puts "Done"
  end
end

Tuesday, July 26, 2011

link_to data-method delete not working in IE9?

I was testing a Ruby on Rails 3 application with devise authentication library and to my surprise, link_to with data-method was not working in Internet Explorer 9 (IE9). In that case, the method was set to delete. It works perfectly in Chrome, though.

There are some bugs reported on that, but the solution I found was
  1. Make my application depend on jquery
    • Edit your Gemfile and add:
      • gem 'jquery-rails', '>= 1.0.12'
  2. Run rails generate to install jquery and get rid of other .js files (like prototype.js)
    • rails generate jquery:install
  3. Run application
I did not see these instructions anywhere, just decided to follow what I had done in another project to test Ajax support for Ruby and it just worked. I hope it works for you.

By the way, I've seen several mentions of this, but since my Rails 3 application already had it, I did not bother much. In your case, also make sure that your application.html.erb has these tags in its <head>:

<%= javascript_include_tag :defaults %>
<%= csrf_meta_tag %>

Saturday, July 23, 2011

Thoughts on "growing as a developer"

I just read the following blog post written by Robert Bowen:

These are the highlights:
  • We should always be moving forward, reaching for the next plateau
  • Find a way to grow, actively pursuing opportunities for growth
  • Make sure:
    • Your schedule allows for that
    • You don't think you have mastered the field
    • You think outside of your day-to-day job world, which may have become stagnant and monotonous
  • Ways to grow:
    • Step outside of your comfort zone and experiment with something new, which guarantees that you learn something and pushes your skills to new heights
    • Study the work of those you admire
    • Keep up with the field and how it is evolving
    • Reach out to others for feedback
    • Collaborate on projects with others that will push you to challenge yourself
    • Be active in the community, like running a blog, contributing to blogs and discussions
This is a great post and reflects my motto of always getting better and pushing myself. I believe that, by getting better, you will grow in your field and, no matter what the current circumstances are, over time that will pay off.

Is it worthwhile?
I've started challenging myself in this regard by asking: is my motto actually something worth living by? The pay-off is definitely worth it in terms of personal satisfaction. As with hobbies and personal projects, the work is often done for personal satisfaction more than for some special reward in the future.

Provided value
Outside of personal projects, whether you are an employee, a freelancer, or even an entrepreneur, this growth pays off to the extent that it provides value to those paying for your time, service, or product. That is very important to realize sooner rather than later.

Multiplying effect
Even if you keep getting better technically, you typically hit a ceiling at some point, because no matter how capable you are, you can accomplish only so much as an individual contributor, unless you have a multiplying effect in the organization. A multiplying effect is not easy to define, and it can be subject to politics and subjectivity, but the general idea is that one can make the entire organization better. Only by doing that can you grow as an individual contributor; otherwise you will not grow beyond a certain point in your career.

Is getting better valued?
Besides that, another very important point is how much getting better is actually valued by the organization you are part of. If you are surrounded by people who do not see value in getting better (they may think that the current level is good enough, or just don't value improvement as long as things get done), you will grow frustrated, as your growth will not be recognized and you may not be rewarded for it at all. In some cases, you can even be penalized for it. All of this can lead to stagnation, and it takes a major effort to keep one's sanity in such an environment. Notice that, if you are dealing with customers (as a freelancer or entrepreneur), it can be the same.

Address actual needs
The key is how you can leverage your getting better to provide more value - and find or create the environment that nurtures that mindset. You can make your program/code/design more efficient, more resilient, more secure, etc, and that is where your growth will manifest itself. In reality, the trick is to find the right set of people that really want more efficient, more resilient, more secure, etc, as oftentimes they think the current state of affairs is good enough. It just could be that the challenges faced by the company cannot be fixed or substantially improved by your skills, in which case the relationship can grow sour over time.

Being ahead of the curve
Eventually, if one keeps getting better, it becomes increasingly difficult to find the right opportunities to leverage this knowledge, and this can be incredibly frustrating for the individual. The personal satisfaction is still there, but just as we say that some people are ahead of their time, some people are simply ahead of the curve and must find out how to apply their knowledge. Taken to the extreme, only a few places and a few people in the world may actually value these people, and spending their days doing the "wrong" thing or in the "wrong" place may prove to be a complete waste of time.

More than technical skills
As part of growing as a developer, everybody should work on their interpersonal skills if these are among their weaknesses. I don't think that weaknesses must necessarily be completely fixed. One should build on their strengths with the following rule: don't let the weaknesses get in the way. In our field in particular, these weaknesses are typically related to soft skills, and bright people can often work on them. By doing that, nobody will dismiss great ideas or contributions on the basis of soft skills. As part of finding the right environment and the right set of people to work with, improving soft skills can be quite beneficial.

Steps to install MySQL2 GEM (Ruby on Rails) on Windows 7 (64 bit)

I ran into many issues trying to get it compiled and installed on Windows 7 64-bit and wished I had an "apt-get" that would have fixed all these issues.

The trick here is: you need MySQL 32-bit to have Ruby on Rails MySQL2 gem compiling on Windows 7 64-bit. In short, this is what I had to do:

  1. Install Ruby Development Kit in order to be able to compile C-bindings for Ruby
    • Error: The 'mysql2' native gem requires installed build tools.
  2. Install MySQL Server 64-bit
  3. Download MySQL Server 32-bit .zip file
  4. Add MySQL 32-bit lib directory to PATH (or copy libmysql.dll to %RUBY_HOME%\bin)
  5. Install MySQL2 2.0.6 GEM specifying --with-mysql-lib and --with-mysql-include options pointing to the 32-bit lib and include directories
    • gem install mysql2 -- '--with-mysql-lib="c:\Development\MySQL Server 5.5\lib\opt" --with-mysql-include="c:\Development\MySQL\MySQL Server 5.5\include"'
  6. Install Rake 0.9.2
    • Error: uninitialized constant Rake::DSL

Friday, July 22, 2011

Why use regular expression?

I have never posted code I've come across before, but while reading some random code today, I started wondering why one couldn't use a regular expression to accomplish the same result as the code below. Maybe I am just missing some good reason for it, but I thought it might be amusing to share:

string[] stringArray = str.Replace("Jan, ", "Jan ").Replace("Feb, ", "Feb ").Replace("Mar, ", "Mar ").Replace("Apr, ", "Apr ").Replace("May, ", "May ").Replace("Jun, ", "Jun ").Replace("Jul, ", "Jul ").Replace("Aug, ", "Aug ").Replace("Sep, ", "Sep ").Replace("Oct, ", "Oct ").Replace("Nov, ", "Nov ").Replace("Dec, ", "Dec ").Split(new[] { ',' });
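
As a rough idea of what the regular-expression version could look like, here is a minimal sketch in Ruby (the same pattern should carry over to .NET's Regex.Replace with "$1 " as the replacement). The name split_fields and the sample input are made up for illustration, and the sketch assumes the intent is simply to drop the comma after a month abbreviation before splitting on the remaining commas.

```ruby
MONTHS = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec].freeze

# Matches a month abbreviation followed by ", " and captures the month.
MONTH_COMMA = /(#{MONTHS.join('|')}), /

# Drops the comma after each month abbreviation, then splits on the
# remaining commas. The -1 limit keeps trailing empty fields.
def split_fields(str)
  str.gsub(MONTH_COMMA, '\1 ').split(',', -1)
end

puts split_fields("Smith,Jan, 2011,Seattle").inspect
# => ["Smith", "Jan 2011", "Seattle"]
```

One pattern replaces the twelve chained Replace calls, and adding or renaming an abbreviation means touching only the MONTHS list.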

Monday, July 18, 2011

The top 9+7 things every programmer or architect should know

Great list of things programmers and architects should know:

About learning to estimate, I recommend the following book:

These are three top things for me:
1. The Boy Scout Rule - Robert C. Martin (Uncle Bob)
"You don’t have to make every module perfect before you check it in. You simply have to make it a little bit better than when you checked it out."

To be honest this is not something I have followed throughout my career, and although I certainly try to improve code where I can, I never did it per check-in. I do however feel that it is an awesome principle and should be something that is actually part of a code review process. It is all too easy to just say:
"It was like that already"
"that nasty code was there for years, I am not going to touch it."
"It never had any tests"

I work in a corporate environment where applications often last for 4-10 years. If part of the process is always to just make something a little better, everything from deleting unused code to writing a single extra unit test, year after year... it will end up saving a lot of people a lot of time and money.

This seems obvious, but we do not always make things better when checking in. It takes a lot of diligence to do that, and agreement within the team on what "better" actually means.

4. Continuous Learning - Clint Shank
This is a very important topic. We are in an industry that is constantly growing, changing, and shifting, and as a programmer you need to be learning and improving yourself wherever you can. It's very easy to get into a comfort zone and just rest on your laurels. I did that for a couple of years, and I do regret it now.
Things I am trying to do to keep up and would recommend:

  1. Get a Kindle... then buy & read books.
  2. Use Google Reader: add the popular blogs and website RSS feeds for your specific field, as well as a couple outside your field that interest you.
  3. Start a blog. By putting my code and thoughts out there, I put in more effort, knowing that it's going to be visible, than if I just wrote the code/article for myself. I also force myself to do 1 - 2 posts a week, ensuring that I must always find new content to learn about.
  4. Join an open source community. We generally don't get to do enough "technical" development in our corporate environments.
Related to the first one: keep getting better - nothing will stop you. Build on your strengths, but don't let your weaknesses get in the way.

5. Record your rationale - Timothy High
This is something that I feel is often neglected. Quite recently a project that I had been involved in for a long time hit the spotlight for all the wrong reasons: customer, user and management dissatisfaction. The first thing to be questioned was not the analysis, requirements, testing, management or expectations, but rather the architecture. Documentation discussing all the decisions, the options looked at, and the reasons for the options taken would have been valuable. When things go fine, no one will even know about the document, but when things turn bad, as they sometimes do, having justification and documentation for all the major decisions will be a lifesaver.

That has happened a lot in my experience. People will disagree, or some people will not have been involved in some discussions, and then you need to be able to articulate why certain decisions were made. Most of the time the new arguments and ideas had already been considered, but without a recorded rationale you may not be able to discuss them, especially in public forums like large meetings. I'd say that one should write the rationale down and make sure it's fresh in one's head when the subject comes up, and obviously this assumes that you communicate effectively.

Saturday, July 16, 2011

Publishing to Blogger from Microsoft OneNote

I was trying to post something today and was wondering how it can still be so complicated to post and format something on a blog nowadays. One of the integrations I wanted was from OneNote, where I've been keeping many of my notes for quite some time, to Blogger. Looking it up on the web, I found that it's already a reality and works very well. You may need to go over the formatting to remove some additional spaces, but it reduces the work substantially compared to a regular copy and paste.

For more info, this is the link:

Design Principles and Design Patterns


Object Oriented Class Design

  • Open-Closed Principle (OCP)
    • We should write our modules so that they can be extended, without requiring them to be modified.
    • Techniques:
      • Dynamic polymorphism
      • Static polymorphism (templates or generics)
  • Liskov Substitution Principle (LSP)
    • Derived classes should be substitutable for their base classes.
    • Derives from the concept of Design by Contract
    • In terms of contracts, a derived class is substitutable for its base class if:
      • Its preconditions are no stronger than those of the base class method.
      • Its postconditions are no weaker than those of the base class method.
      • (In other words, derived methods should expect no more and provide no less)
    • There are subtleties: canonical example is the Circle/Ellipse dilemma
  • Dependency Inversion Principle (DIP)
    • Strategy of depending upon interfaces or abstract functions and classes, rather than upon concrete functions and classes.
    • Motivation behind DIP is to prevent you from depending upon volatile modules. The DIP makes the assumption that anything concrete is volatile.
    • Object creation: abstract factory
  • Interface Segregation Principle (ISP)
    • Many client specific interfaces are better than one general purpose interface
Package Architecture
  • Release Reuse Equivalency Principle (REP)
    • One criterion for grouping classes into packages is reuse
  • Common Closure Principle (CCP)
    • Group together classes that we think will change together
  • Common Reuse Principle (CRP)
    • Classes that aren't reused together should not be grouped together
    • "Changes to a class that I don't care about will still force a new release of the package, and still cause me to go through the effort of upgrading and revalidating."
  • Acyclic Dependencies Principle (ADP)
    • The dependencies between packages must not form cycles.
    • Breaking cycles
      • Creating a new package
      • Make use of DIP and ISP
  • Stable Dependencies Principle (SDP)
    • Depend in the direction of stability
    • Stability is related to the amount of work required to make a change
    • Should all software be stable? We greatly desire that portions of our software be instable. We want certain modules to be easy to change so that when requirements drift, the design can respond with ease.
  • Stable Abstractions Principle (SAP)
    • Stable packages should be abstract packages
    • The more packages that are hard to change, the less flexible our overall design will be. Highly stable packages at the bottom of the dependency network may be very difficult to change, but according to the OCP they do not have to be difficult to extend!
    • SAP is just a restatement of the DIP - it states that the packages that are the most depended upon (i.e. stable) should also be the most abstract
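
To make the LSP notes above a bit more concrete, here is a minimal Ruby sketch of the Circle/Ellipse dilemma. All class and method names are made up for illustration; the point is only that Circle weakens a postcondition that callers of Ellipse may rely on.

```ruby
class Ellipse
  attr_reader :width, :height

  def initialize(width, height)
    @width = width
    @height = height
  end

  # Contract: changes the width and leaves the height untouched.
  def width=(value)
    @width = value
  end
end

class Circle < Ellipse
  def initialize(diameter)
    super(diameter, diameter)
  end

  # Keeps the circle round, but breaks the "height is untouched"
  # postcondition promised by Ellipse#width=.
  def width=(value)
    @width = value
    @height = value
  end
end

# Client code written against Ellipse's contract.
def doubling_width_keeps_height?(ellipse)
  old_height = ellipse.height
  ellipse.width = ellipse.width * 2
  ellipse.height == old_height
end

puts doubling_width_keeps_height?(Ellipse.new(2, 3))  # => true
puts doubling_width_keeps_height?(Circle.new(2))      # => false
```

The client function is correct for every Ellipse, yet it fails when handed a Circle, which is exactly the "provide no less" part of the rule being violated.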

Monday, February 28, 2011

Hacker's Delight: reversing bits

From the book "Hacker's Delight", how to reverse the bits of an unsigned 32-bit integer x elegantly:
x = (x & 0x55555555) <<  1 | (x & 0xAAAAAAAA) >>  1;
x = (x & 0x33333333) <<  2 | (x & 0xCCCCCCCC) >>  2;
x = (x & 0x0F0F0F0F) <<  4 | (x & 0xF0F0F0F0) >>  4;
x = (x & 0x00FF00FF) <<  8 | (x & 0xFF00FF00) >>  8;
x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;

A slightly more efficient version:
x = (x & 0x55555555) <<  1 | (x & 0xAAAAAAAA) >>  1;
x = (x & 0x33333333) <<  2 | (x & 0xCCCCCCCC) >>  2;
x = (x & 0x0F0F0F0F) <<  4 | (x & 0xF0F0F0F0) >>  4;
x = (x << 24) | ((x & 0xFF00) << 8) | 
      ((x >> 8) & 0xFF00) | (x >> 24);

And this last line gives you byte reversal:
x = (x << 24) | ((x & 0xFF00) << 8) |
      ((x >> 8) & 0xFF00) | (x >> 24);
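
To check the snippets above, here is a small Ruby sketch of both operations. Ruby integers are arbitrary precision, so the value is masked to 32 bits explicitly; the function names and MASK32 are my own, not from the book.

```ruby
MASK32 = 0xFFFFFFFF

# Swaps adjacent bits, then pairs, nibbles, bytes and half-words.
def reverse_bits32(x)
  x &= MASK32
  x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
  x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
  x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
  x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
  ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
end

# Reverses the order of the four bytes; the extra mask stands in for the
# 32-bit overflow that the C version gets for free.
def reverse_bytes32(x)
  x &= MASK32
  ((x << 24) & MASK32) | ((x & 0xFF00) << 8) | ((x >> 8) & 0xFF00) | (x >> 24)
end

puts format("%08X", reverse_bits32(0x00000001))   # => 80000000
puts format("%08X", reverse_bytes32(0x12345678))  # => 78563412
```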

Sunday, February 20, 2011

SEDA architecture

Not long ago, I worked on a system that, in a way, resembled the SEDA architecture. Although I had been at SOSP '01 and attended Matt Welsh's presentation on SEDA, I couldn't remember much about it so many years later. At the time, I did not connect the dots, but a Principal Engineer mentioned it and I quickly recalled SEDA. Today I read an article explaining it, and also Matt's retrospective on SEDA, which I wanted to share with you:

Matt's retrospective is interesting: he still sees the architecture as valid and worth considering for modern systems, but he would group stages together to reduce latency. I would say that, from my experience, the issue is how to get proper visibility into each stage and its queues in order to understand how the system is behaving. Sometimes people just butt heads because they lack an understanding of the overall architecture, and because there are no metrics showing where the backlog is and therefore where the bottleneck is.

One great question about this retrospective is how Matt views Actors (as in Scala). The poster asks whether each actor could be a sort of micro-stage in a SEDA architecture. Unfortunately Matt hasn't replied to this question (yet).
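
As a toy illustration of the idea of stages connected by queues, and of exposing each queue's backlog as a metric, here is a minimal Ruby sketch. It is not Matt Welsh's implementation or any real SEDA framework: the Stage class, its methods, and the three example stages are all made up for this post, and it leaves out everything that makes SEDA interesting in practice (admission control, dynamic thread pool sizing, batching).

```ruby
require 'thread'

# A stage owns an input queue and a small pool of worker threads.
class Stage
  attr_reader :name

  def initialize(name, workers: 1, downstream: nil, &handler)
    @name = name
    @queue = Queue.new
    @downstream = downstream
    @handler = handler
    @threads = Array.new(workers) { Thread.new { run } }
  end

  def enqueue(event)
    @queue << event
  end

  # The metric to watch: how many events are waiting in this stage.
  def backlog
    @queue.size
  end

  # Sends one stop marker per worker and waits for the queue to drain.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each(&:join)
  end

  private

  def run
    while (event = @queue.pop) != :stop
      result = @handler.call(event)
      @downstream&.enqueue(result)
    end
  end
end

results = Queue.new
collect = Stage.new("collect") { |line| results << line }
upcase  = Stage.new("upcase", workers: 2, downstream: collect) { |line| line.upcase }
trim    = Stage.new("trim", downstream: upcase) { |line| line.strip }

[" a ", " b ", " c "].each { |line| trim.enqueue(line) }
[trim, upcase, collect].each do |stage|
  puts "#{stage.name}: backlog=#{stage.backlog}"
  stage.shutdown
end
puts "processed #{results.size} events"
```

Because every stage has its own queue, a growing backlog number on one of them points directly at the bottleneck, which is the kind of visibility I was missing.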

Tuesday, February 08, 2011

Queueing Theory Books On Line

Good list of books online on queueing theory:

Tuesday, January 25, 2011

How to find topmost frequent items from a data stream?

I was asked this question in an interview a long time back: if you have a huge amount of data and don't have enough memory to keep all of it, how do you get the top items, even if you have to accept some error? Answering this precisely is not simple, and requiring the interviewee to provide a 100% correct answer may be too much, but the goal is more about the discussion, and luckily it seems I did well at that time. Anyway, below you can find two good references on the topic, in case you are interested in learning more:

Manku, Motwani - "Approximate Frequency Counts over Data Streams" [pdf]

Data Streams: Algorithms and Applications
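
To give a flavor of the approach, here is a simplified Ruby sketch of Lossy Counting, the algorithm from the Manku and Motwani paper, as I understand it. The class and method names are mine, and the sketch assumes the whole stream is fed one item at a time through add. With error parameter epsilon, every stored count undercounts the true frequency by at most epsilon * n, where n is the number of items seen so far.

```ruby
class LossyCounter
  attr_reader :n

  def initialize(epsilon)
    @epsilon = epsilon
    @width = (1.0 / epsilon).ceil   # bucket width
    @n = 0                          # items seen so far
    @entries = {}                   # item => [count, max possible undercount]
  end

  def add(item)
    @n += 1
    bucket = (@n.to_f / @width).ceil
    if (entry = @entries[item])
      entry[0] += 1
    else
      @entries[item] = [1, bucket - 1]
    end
    # At each bucket boundary, drop entries that cannot be frequent.
    prune(bucket) if (@n % @width).zero?
  end

  # Items whose true frequency may be at least support * n,
  # as [item, stored_count] pairs, most frequent first.
  def frequent(support)
    threshold = (support - @epsilon) * @n
    @entries.select { |_, (count, _)| count >= threshold }
            .map { |item, (count, _)| [item, count] }
            .sort_by { |_, count| -count }
  end

  private

  def prune(bucket)
    @entries.delete_if { |_, (count, delta)| count + delta <= bucket }
  end
end

counter = LossyCounter.new(0.01)
1000.times do |i|
  item = if i.even? then "a" elsif [1, 3, 5].include?(i % 10) then "b" else "u#{i}" end
  counter.add(item)
end
puts counter.frequent(0.2).inspect  # => [["a", 500], ["b", 300]]
```

The memory used depends on epsilon rather than on the number of distinct items, which is the whole point: the 200 one-off items in the example are thrown away at bucket boundaries.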

Sunday, January 23, 2011

First program in Scala: porting binary search add/remove from Java

Today I worked on my first program in Scala, after reading most of Programming Scala. This is the first time in a very long time that I have tried to learn a new programming language, and I took the approach of reading the book first. After I started writing the code, though, I noticed how important it is to actually code something in the language, as without coding you can't remember much.

So, what I did first was to port to Scala the Java code available here that deletes nodes from a binary tree. The outcome was almost identical, except for some Scala-specific things, like trying to use Option.

Second, I tried to make it more Scala-like ("scalafy my code") and, since it had been a couple of weeks since I last read "Programming Scala", I thought that there wasn't a whole lot I could do for this kind of program. However, after trying to implement pattern matching in some sections, I realized that I could get rid of my if statements (which were long, with a bunch of elses) using only pattern matching. Also, some of the "get" calls on the Option objects ended up being unnecessary with pattern matching. I thought the result was quite elegant compared to the Java version. Open the Java version in a different tab and compare the code, and let me know what you think. Also, if you know Scala, please let me know what suggestions you have for making use of more Scala features.

case class Node(value: Int, var right: Option[Node], var left: Option[Node])

object TreeUtils {
  // Prints the values in ascending order (left subtree, node, right subtree).
  def printInOrder(node: Option[Node]): Unit = {
    node match {
      case Some(node) => {
        printInOrder(node.left);
        Console.print(" " + node.value);
        printInOrder(node.right);
      }
      case _ => ;
    }
  }

  // Inserts valueToAdd and returns the (possibly new) root of this subtree.
  def addNode(node: Option[Node], valueToAdd: Int): Option[Node] = {
    node match {
      case None =>
        Option(new Node(valueToAdd, None, None));
      case Some(currentNode) => {
        currentNode.value match {
          case v if v < valueToAdd =>
            currentNode.right = addNode(currentNode.right, valueToAdd);
          case v if v > valueToAdd =>
            currentNode.left = addNode(currentNode.left, valueToAdd);
          case _ => ; // value already present
        }
        node;
      }
    }
  }

  // Removes valueToDelete and returns the new root of this subtree.
  def deleteNode(node: Option[Node], valueToDelete: Int): Option[Node] = {
    node match {
      case None => None
      case Some(currentNode) => {
        currentNode.value match {
          case v if v < valueToDelete =>
            currentNode.right = deleteNode(currentNode.right, valueToDelete);
            node;
          case v if v > valueToDelete =>
            currentNode.left = deleteNode(currentNode.left, valueToDelete);
            node;
          case _ => {
            (currentNode.left, currentNode.right) match {
              case (None, None) => None
              case (None, _) => currentNode.right
              case (_, None) => currentNode.left
              // The right child has no left subtree: it replaces the deleted node.
              case (Some(leftNode), Some(rightNode)) if rightNode.left == None => {
                rightNode.left = currentNode.left;
                currentNode.right;
              }
              // Otherwise the leftmost node of the right subtree replaces it.
              case (Some(leftNode), Some(rightNode)) => {
                var q = rightNode;
                var p = rightNode;
                while (p.left.get.left != None)
                  p = p.left.get;
                q = p.left.get;
                p.left = q.right;
                q.left = currentNode.left;
                q.right = currentNode.right;
                Option(q);
              }
            }
          }
        }
      }
    }
  }
}

object BinaryTree {
  def main(args: Array[String]) = {
    var root: Option[Node] = None;
    val numbers: Array[Int] = Array(56,86,71,97,82,99,65,36,16,10,28,52,46);
    for (i <- numbers) {
      Console.println("Inserting " + i);
      root = TreeUtils.addNode(root, i);
    }
    Console.print("Tree: ");
    TreeUtils.printInOrder(root);
    Console.println();
    for (i <- numbers) {
      Console.println("Removing " + i);
      root = TreeUtils.deleteNode(root, i);
      Console.print("Tree: ");
      root match {
        case None => Console.println("  ");
        case _ =>
          TreeUtils.printInOrder(root);
          Console.println();
      }
    }
  }
}

Friday, January 07, 2011


It's been a while since I last posted, but this deserves a post in capital letters: DO NOT BUY from Frys. I placed an order on 11/25/2010, and their system recorded two orders, charged me twice, and shipped two packages. When I called them, they told me to refuse the second package when it was delivered by USPS. That is what I did on 12/07/2010. It's been a month since I refused the package, and after several calls to their customer service number and one email sent through their web site (without a response after a week), I still haven't received any refund. Except for the last associate I talked to, they did not seem to care a bit about the problem. Most of them say that I should keep waiting, because it can TAKE MONTHS. One of them said that he would talk to the manager and call me back: HE NEVER DID. Now, after a month of following their directions, they will investigate further without any guarantee whatsoever. Yes, it's possible that they will never refund me.

I think I've been really spoiled by and other companies that care about their customers and forgot about companies like Frys. Frys clearly doesn't care about their customers, provide a horrible experience, and I will definitely recommend all my friends against buying anything from this company.

After a quick search on Google, this is the first link I get about customers reviews of - 950 reviews, 1 star out of 5. It seems I am not the only one: