Hacking Your Way Through Codebases

You’ve joined a new project, and you’re trying to fix a bug. You try to make sense of the classes, how they interact, what they abstract. Looking at the design documents, you realize that they’re older than your son. Previous project members seem to be vacationing in dense jungles in Zanzibar.

It’s a weekend and you’ve nothing to do. So you decide to add that feature you always wanted into your favourite open source library. You check out the SVN trunk and try to figure out where to start hacking away.

Does any of this sound familiar?

It comes as a surprise to many industry newcomers that “software development” involves more reading than writing of code. Books and university courses teach how to design, implement and test software. How to model, simulate and measure systems. How to design, analyze and improve algorithms.

Yeah, well. You don’t start off your career like that, kid. First, you get to make tiny fixes. Which requires you to go through a lot of code, figure out how the entire damn thing works, pinpoint the exact root cause of that sporadic bug; and then use all your education to make the most minimal code change possible to get rid of that bug. Oh, and the bug appears only when there are more than 50,000 people accessing the 4-million-row database table simultaneously, so don’t think about attaching your cute debugger and stepping through the code. Did I say it was sporadic?

If you pass out of the university having read nothing but the example code listings in the textbooks, then be ready for a rough time when you join the industry. Even if you’ve written your own nice little software. Because reading code is very different from writing it.

You can really judge how good a developer is by seeing how he responds when asked to fix a bug in a codebase that he’s seeing for the first time. The response is either “What? In that? I didn’t get a knowledge transfer session for that!” or “By when?”.

So how do you do it? It’s many things, but mainly it’s experience and tools. “Ah, so you’ve a box of magical tools!” Well, no. You tend to end up using only what’s generally available on most platforms, and by default at that. Things like ctags, vim, grep and a debugger. Tools that you’ve learned to use really well. But mainly, it’s the experience, and the intuition and patience that come with it.

The best reward of reading code is that you learn more from it than from textbooks. Ever seen what a Hindley-Milner system looks like in code?

Go read code. Grow up.

Ah, Life.

What a cruel irony, that you never know what you have, until you have it no more.

That is life. Grok it fast. Before it kills you.

Driving In India

[Just to destress a bit, here is a humorous post.]

Ever driven in India? No? Add it to your to-do-before-I-die list. Seriously. You’ll yawn during roller coaster rides after that.

Here is a mail that was doing the rounds a looong time ago. I’ve no idea who the author is or whether it is copyrighted(!) or not. Do let me know if you know otherwise.

—————————————————————————

For the benefit of every Tom, Dick and Harry visiting India and daring to drive on Indian roads, I am offering a few hints for survival.

They are applicable to every place in India, except the state of Bihar, where life outside a vehicle is only marginally safer. Indian road rules broadly operate within the domain of karma where you do your best, and leave the results to your insurance company.

The hints are as follows:

Do we drive on the left or right of the road? The answer is “both”. Basically you start on the left of the road, unless it is occupied. In that case, go to the right, unless that is also occupied. Then proceed by occupying the next available gap, as in chess. Just trust your instincts, ascertain the direction, and proceed.

Adherence to road rules leads to much misery and occasional fatality. Most drivers don’t drive, but just aim their vehicles in the intended direction. Don’t get discouraged or underestimate yourself: except for a belief in reincarnation, the other drivers are not in any better position.

Don’t stop at pedestrian crossings just because some fool wants to cross the road. You may do so only if you enjoy being bumped in the back. Pedestrians have been strictly instructed to cross only when traffic is moving slowly or has come to a dead stop because some minister is in town. Still some idiot may try to wade across, but then, let us not talk ill of the dead.

Blowing your horn is not a sign of protest as in some countries. We horn to express joy, resentment, frustration, romance and bare lust (two brisk blasts), or, just to mobilize a dozing cow in the middle of the bazaar.

Keep informative books in the glove compartment. You may read them during traffic jams, while awaiting the chief minister’s motorcade, or waiting for the rainwaters to recede when overground traffic meets underground drainage.

Night driving on Indian roads can be an exhilarating experience (for those with the mental makeup of Genghis Khan). In a way, it is like playing Russian roulette, because you do not know who amongst the drivers is loaded. What looks like the premature dawn on the horizon turns out to be a truck, attempting a land speed record. On encountering it, just pull partly into the field adjoining the road until the phenomenon passes.

Our roads do not have shoulders, but occasional boulders. Do not blink your lights expecting reciprocation. The only dim thing in the truck is the driver, and with the peg of illicit arrack he had at the last stop, his total cerebral functions add up to little more than naught. Truck drivers are the James Bonds of India, and are licensed to kill.

Often you may encounter a single powerful beam of light about six feet above the ground. This is not a super motorbike, but a truck approaching you with a single light on, usually the left one. It could be the right one, but never get too close to investigate. You may prove your point posthumously. Of course, all this occurs at night, on trunk roads.

During the daytime, trucks are more visible, except that the drivers will never show any signal. (And you must watch for the absent signals; they are a greater threat.) Only, you will often observe that the cleaner that sits next to the driver, will project his hand and wave hysterically. This is definitely not to be construed as a signal for a left turn. The waving is just an expression of physical relief on a hot day, or a gesture to a fellow trucker.

Occasionally you might see what looks like a UFO with blinking colored lights and sounds emanating from within. This is an illuminated bus, full of happy pilgrims singing bhajans. These pilgrims go at breakneck speed, seeking contact with the Almighty, often meeting with success.

One-way Street - These boards are put up by traffic people to add jest to their otherwise drab lives. Don’t stick to the literal meaning and proceed in one direction. In metaphysical terms, it means that you cannot proceed in two directions at once. So drive as you like, in reverse throughout, if you are the fussy type.

Lest I sound hypercritical, I must add a positive point also. Rash and fast driving in residential areas has been prevented by providing a “speed breaker”; two for each house. This mound, incidentally, covers the water and drainage pipes for that residence and is left untarred for easy identification by the corporation authorities, should they want to recover the pipe for year-end accounting.

If, after all this, you still want to drive in India, have your lessons between 8 pm and 11 am - when the police have gone home. The citizens then are free to enjoy the ‘FREEDOM OF SPEED’ enshrined in our constitution.

—————————————————————————

In case you’re wondering: yes, the description is quite accurate.

And let me leave you with this clip — in case you’re not among the 4,049,128 people who’ve already watched it.

http://www.youtube.com/watch?v=RjrEQaG5jPM

[If you're a true Indian, you'd be left scratching your head after seeing it through -- "what's the catch...?"]

The Beginning of The End

I’m really not the kind of person who makes doomsday predictions, but I simply have to make this one: it is the beginning of the end for software development outsourcing to India. There is not one factor but many that are now coming to a head; in fact, it looks uncannily like a Seldon crisis.

[Note: this post is too long and probably controversial. Flames welcome. Comments are not moderated.]

The story goes like this: India started to become a successful destination for software development outsourcing somewhere in the latter half of the ’90s. By 1998-99, it became apparent to the lay public that software engineers drew salaries like king’s ransoms and traveled abroad in business class, things that the previous generation couldn’t have done even with a lifetime’s dedicated employment.

The engineering graduate education situation, however, was a tough one in those days. A state would turn out only maybe 450-500 computer science engineering graduates a year, and literally millions would write an entrance exam to qualify for a seat every year. Passing the engineering entrance exam (and with a good rank at that) was possible only for talented, committed, hard-working students. And this was reflected in the quality of the engineering professionals who filled the ranks of software shops in Bangalore and elsewhere. They obviously did a good job, resulting in success stories of outsourced software development that made profit-minded capitalists half a world away sit up and take notice.

The meager annual supply of professionals, however, was not sufficient to meet the demands of the booming software industry. Enterprising people were quick to set up engineering colleges, and in a few years, the number of CS graduates jumped ten-fold, to 5000 per year. Yup. Ten-fold. However, (i) where did the extra people come from, and (ii) where did they get the teaching staff to teach these extra people?

The extra people came in not (significantly) because more people wrote the engineering entrance, but simply because the threshold was lowered (since there were now ten times as many seats per year).

The extra teaching staff reality is a bit more pathetic. Mostly, the previous year’s graduates who could not land a job anywhere took up this task. Of course, this was not always the case, but every new college still had a few such “young” lecturers.

The end result was that you now had 5000 graduates a year, of which 500 would actually be talented, committed and hard-working. Some of the rest could be trained and would be useful in the long run. Most of the rest would end up being a liability to the organization, having to be “hidden” “on field”.

Thus organizations evolved from 1998 to 2008, with their core, competent teams getting diluted by a constant influx of progressively less competent, less committed graduates. Organizations now need to have extensive, academic-like training/induction programmes. Senior (“old”) developers need to spend more time mentoring/directing/instructing “new” developers than before. The senior devs act as “seeds” around which teams get built. All the while, the ratio of pre-boom-era developers to gen-x-era developers keeps decreasing.

This process has now reached a point where it is no longer profitable for organizations to continue in this fashion. Teams have become unmanageable, unproductive and the business is on the brink of being unprofitable. Organizations that can come up with and employ radically different approaches towards building and sustaining productive, balanced, high-performance teams would probably survive. Big, bureaucratic, process-driven organizations would probably flounder. In less than 2 years. Yeah. That’s a prediction with my name on it.

But there are other factors too that support the prediction. Today, more than ever, the talent of Eastern Europe and China is accessible to the western world. The breakup of the USSR and the “opening up” of China (to whatever extent) made this possible. It is now possible to offer to Hungary a dev job that would earlier definitely have gone to India. It pains me to accept that they’d probably do a better job of it.

The US economy, after many years, is in recession again. This will further reduce the number of jobs that get outsourced from the US at all. And probably from a few other countries too.

There are some more factors too, for which I don’t have space in this post: how pioneering organizations have become bureaucracies, how people-orientation has given way to process-orientation, how Bangalore infrastructure has collapsed totally, how the support professional groups (HR, IT, administration) are struggling, how dismal the hiring/head-hunting scene has become.

And therefore, my prediction, again:

This is the beginning of the end. Within two years, major earthquakes will spoil the dream run of the Great Indian Outsourcing saga. The industry will become sober, mature, and will leave quite a few in tears in the process.

Algorithms + Data Structures + Technology = Programs

Niklaus Wirth famously titled his seminal book on Pascal “Algorithms + Data Structures = Programs”. It is, of course, implied that the programs were written in Pascal, the technology used to realize the programs. Technology, then, is an essential ingredient in creating programs.

Technology has come a long way since Prof. Wirth’s time. Developers do not need to write their own quicksort implementations; they can use std::sort (or qsort or java.util.Collections.sort or System.Collections.Generic.List<T>.Sort or whatever). They need not implement hash tables, carefully analyze possible keys or devise optimal hash functions; they can just use the friendly hash class from the standard library. You no longer need to worry about creating efficient file formats to store your data, just use XML! It is an industry standard, you know. What’s that? Not efficient? Hmm, why not try Darby! Or Berkeley DB? Or shall we choose a programming language that can pickle or marshal or serialize everything in sight? Download a file given its URL? Surely there must be an API for that somewhere.

“[...] LAMP - the software platform comprised of Linux, Apache, MySQL and PHP/Perl often viewed as the foundation of the Internet” said someone recently.

Upcoming software developers and computer science students are often blinded and misled by technology. They tend to concentrate on learning about various technologies and how to use them. Being able to use a programming language and a few libraries, along with some decent debugging and analytical skills, can get you a job. Jobs. Good ones. You would, however, be wrong if you thought that that was what computer science is all about.

Here is a quick check: write a program to list all the prime numbers between 1 and 1 million. Think about how you’d write this program.

Thought about it? Well then, did you use the Sieve of Eratosthenes? What’s that, you ask? Why, that’s a 22-century-old algorithm (and a reasonably efficient one) that was developed because there was no API to list out all the primes!
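
For the curious, here is a minimal sketch of the sieve in Python. The function name primes_up_to is just a name chosen for this illustration; the idea is to cross out the multiples of each prime in turn, so that whatever survives is prime.

```python
def primes_up_to(limit):
    """Return a list of all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[n] is 1 while n is still a candidate, 0 once it is crossed out.
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p: smaller multiples
            # were already crossed out by smaller primes.
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytes(count)
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    primes = primes_up_to(1_000_000)
    print(len(primes), "primes; the first few are", primes[:10])
```

The table costs one byte per number, so a million entries fit comfortably in memory.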

Here’s another one: Your train goes over a hilly terrain, stopping at railway stations that are at varying altitudes. Which was the steepest climb? (Say station1 was at a height of 400 MSL (meters above sea level), station2 at 350, station3 at 500, station4 at 550, station5 at 480, station6 at 520; then the steepest climb was station2-station3-station4.)

The answer? Well, google for “longest monotonic subsequence”.
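
One way to read the train question, and this is an assumption about what “steepest” means, is: find the unbroken stretch of ascending stations with the largest total gain in altitude. Under that reading a single pass over the heights is enough. The function name steepest_climb is made up for this sketch.

```python
def steepest_climb(heights):
    """Return (start, end, gain) for the contiguous ascending run with the
    largest total gain. start and end are 0-based indices into heights.
    Returns (0, 0, 0) if the route never climbs."""
    best = (0, 0, 0)
    start = 0  # index where the current ascending run began
    for i in range(1, len(heights)):
        if heights[i] <= heights[i - 1]:
            # The climb is broken; a new run can only start here.
            start = i
        else:
            gain = heights[i] - heights[start]
            if gain > best[2]:
                best = (start, i, gain)
    return best


if __name__ == "__main__":
    # Altitudes of station1..station6 from the example, in meters.
    start, end, gain = steepest_climb([400, 350, 500, 550, 480, 520])
    print(f"station{start + 1} to station{end + 1}, gain {gain} m")
```

With the example altitudes this picks the run from station2 to station4, a gain of 200 meters.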

How about this one: How fast can you count the number of ‘1’s in the binary representation of a given 32-bit integer?
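
As a starting point, here is one well-known trick, sketched in Python: x & (x - 1) clears the lowest set bit, so the loop runs once per ‘1’ rather than once per bit. The name popcount32 is just for this illustration, and there are faster approaches (lookup tables, parallel bit sums, dedicated CPU instructions) that this sketch does not cover.

```python
def popcount32(x):
    """Count the 1 bits in the low 32 bits of x."""
    x &= 0xFFFFFFFF  # treat x as an unsigned 32-bit value
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count


if __name__ == "__main__":
    for value in (0, 1, 0b1011, 0xFFFFFFFF):
        print(f"{value:#x} has {popcount32(value)} one-bits")
```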

A Google search for “rivers of somalia” returns “Results 1 - 100 of about 340,000 for rivers of somalia. (0.43 seconds)”. How do you think Google searches the contents of billions of pages to choose 340k of them in just 0.43 seconds?

How do you think this works?

How is an mp3 encoder able to compress a 50 MB WAV file into a 4 MB mp3 file that still sounds the same?

What is the minimum required number of colors to paint a political map of a country, such that no two states that share a border have the same color?

What is the fastest way, using the NYC subway, of getting to Coney Island from Central Park?

Why can’t a C++ parser understand “vector<vector<int>> vvi;” ?

How does a spell checker suggest alternatives for misspelled words?
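
To take just the last question: one classic ingredient (not necessarily what any particular spell checker does) is edit distance, the number of single-character insertions, deletions and substitutions needed to turn one word into another. Below is a minimal Python sketch of the standard dynamic-programming computation, plus a naive suggest helper that scans a word list. Both function names and the tiny word list are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    words = ["hello", "help", "world"]  # hypothetical word list
    print(suggest("helo", words))
```

Scanning the whole dictionary for every misspelled word is slow for a real word list; doing it fast is where the interesting algorithms come in.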

Algorithms lie at the heart of innovation. That’s how two university students beat Yahoo. Faster, more efficient algorithms change the way we work and live. Technology is a means, a platform to implement your new algorithm. It is not an end in itself.

Talent, Skill and Opportunity

There are four kinds of software engineers that I’ve come across:

  • Talented: These are the people who would intuitively guess how something works, and of course, it’d turn out that yes, that’s exactly how it does work. People who come up with a solution so simple and elegant that others feel a bit behind the evolutionary churn. But these people are often not as productive as one would expect, for various reasons: software is not their primary area of interest, they lack the discipline that engineering demands, or sometimes they’re just plain lazy. You might still want them on your team just for the flashes of insight they provide, if they’re affable enough.
  • Skilled: You can spot them by their notebooks. They work hard, they are motivated. They keep themselves up-to-date. They know their job, their goals, and they have the determination to do whatever it takes to get to the goal. They may not come up with an original creative solution, no innovative breakthroughs; and often they know that they would not, too. Skilled people invariably travel far ahead of their talented counterparts. These people you definitely want on your team.
  • Talented + Skilled: They go very, very far. A natural flair, honed by hard work and discipline quite often ensures success in all endeavors. You want to be on the team of such a person.
  • Talented + Skilled + an opportunity: A lucky break, a hand of God. Given to a person with talent and skill, the spark turns into a forest fire. These are the names familiar to all people within the organization; names that you’d recognize around the Internet. People whose potential was exploited by an opportunity. A blessing, offered and utilized, by the best recipient. You’d be lucky to know one in person.

So what, you ask? Well, nothing much, to be honest. Just some food for thought.

5 Ideas for Open Source Projects

I love open source. I’ve a project over at SourceForge, doing moderately well. Life is still short, however, so here are a few projects that I’d love to do myself if there were more hours in a day:

  1. TextMate clone for Windows: There are a couple of $$$ ones: E-TextEditor and InType. However, with a high-quality text editing component like Scintilla, which has nearly all that you can wish for from a source code editor component, there really is no excuse for not adding bundle support and rolling up a full-fledged editor. Scintilla also provides support for call tips and intellisense, so that should be easy to do too. A tree pane that can display any directory and its children, explorer-like; SVN-awareness and build-output windows would be killer additions.
  2. Generic Backend for Duck-typed Languages: This one’s a bit of a having-fun-reinventing-the-wheel thing. Rather than having stack/register based VMs that eat bytecode (CLR, JVM, YARV, parrot), perhaps we could have a nice C library with APIs that allow creation and manipulation of objects? With the caller managing namespaces, symbol tables, the scope in which methods would execute?
  3. A Helper Library in C/C++ for Apache Modules: The regular Apache/APR APIs are the minimum required, and are good in that respect. However, all I want to do 99% of the time is to have a content handler, and for that I also need the parsed query string and POST data. libapreq is there, but is a pain to use. Moreover, the helper library, or rather a base-class-module, should already parse the input, and the actual module should be able to just fill in the content handler, which can access the parsed input. And please, don’t expose those buckets and brigades and pools.
  4. Embeddable Apache: Apache is quite a decent piece of software. Can’t it be made embeddable? A nice library? We’ll provide the configuration info programmatically. And once we have this, let’s throw out those BaseHTTPServer, wsgiref.simple_server, mongrel implementations and put in an embedded Apache!
  5. A Decent Network Client Library in C: A one-stop shop for base64, qp, uu encoding/decoding, rfc2822 headers, MIME messages, HTTP Cookie jars; along with HTTP, NNTP, SMTP, POP3 and IMAP clients.

Yeah well. Have fun.

Oh and don’t pull the anti-Microsoft crap because of TextMate. After all, I’m writing this post from a Gutsy Gibbon.

Solution Domain Artifacts

You’re in Munich, and want to travel to Frankfurt. Given the time and place constraints, let’s say your best choice is to take a train. That is the problem. The Deutsche Bahn is your solution provider. And the solution provider decides that you need to learn a concept in order to avail of the solution: the concept of a train ticket. The train ticket is a solution domain artifact. It is a concept that is not present in the problem domain (travelling by train from one place to another). In order to use the solution, though, you need to learn about the solution domain artifact (the ticket). This post is about solution domain artifacts in software design.

[Disclaimer: this is what I call a solution domain artifact. You may call it by another name. Romeo and Juliet -- II. 2.]

Let’s take an example relevant to us. Let’s say you want to embed a JavaScript interpreter in your application. Since we are, of course, from the old school, the application is necessarily written in C (and command-line based; more about this application in a later post), and our JavaScript engine of choice is SpiderMonkey. While building it, you realize that you need to know, apart from the JavaScript language itself, at least the following concepts so that you can get the interpreter to do any work:

  • a runtime, represented by a JSRuntime object
  • a context, represented by a JSContext object
  • an error reporter, which is a callback function to handle errors

The remaining artifacts are not introduced by the solution domain itself — things like standard classes, objects, properties and functions are artifacts in the problem domain. Their manipulation was what you set out to do originally.

As another example, consider some solution domain artifacts (SDAs) that your grandma has to learn just to send an email to her son:

  • web browser
  • menu
  • scroll bar
  • button
  • URL
  • login credentials
  • e-mail address

Pause.

Does the tired architect want to tell you what he thinks about solution domain artifacts? No. He wants you to think about them.

Resume.

He’ll give you more fodder for thought, though.

As a designer of a component, you should minimize the artifacts that you expose your (potential) clients to. An artifact represents a cost, an investment that has to be made by your potential client. To the designer, it represents the cost of documentation and tech support.

Designers and architects often get carried away by building castles in the solution domain — mostly in the name of generalization and flexibility. A newcomer trying to develop a component would first have to know about the 33 types of components, the 12 categories they fall into, the 9 different interfaces that they can implement, the 10 different ways in which they can communicate, and of course the 7 XML files that have to be “suitably edited” to deploy the component.

Is it just a matter of the learning curve? No. Human minds and bodies haven’t evolved much in the past 1000 or so years. In other words, the human mind is designed for the requirements of a world many centuries past. There is a limit to the complexity that an average human mind can handle. It’s an engineering limit. The more abstractions you create, and the more complex they are, the further you deviate from the average client. Your clients now have to be super-humans, or trained, disciplined super-professionals, who can juggle all this complexity within their brilliant heads. And probably none of them are available in the development team that is maintaining the product.

Components with fewer and simpler artifacts tend to be better maintained, widely used and more profitable. An ideal solution domain artifact is that which cannot be simplified further. An ideal solution presents only the minimally necessary and sufficient artifacts to the user.

The tired architect thinks that spidermonkey is fun to use, and that grandmas will forever be baffled by computers.

The Second System

This first post is not going to be a cheery “hello, world”. I’m not a blogger. I’m not going to introduce myself. This is just yet another blog, and you know what to do with a blog, so just do it. This post is about the second system effect.

Many a year ago, somewhere around the start of my career, I read the book The Mythical Man-Month by Frederick Brooks. If you create software, in any form of it, you should have read this. There’s no point in explaining why, and I’m not here to sell the book — if you love your profession, read it.

Brooks devotes five pages of the book to the phenomenon of the second system. That’s a bit too much really; for the concept of the second system is something simple and so fundamentally human, that it can be understood by just a few paragraphs, maybe even a blog post. Why it is so interesting, however, is its omnipresence in just about all aspects of software development.

Consider: a person develops a piece of software. A tool, let’s say, that solves a problem faced by his team daily. He shows it to his colleagues. They like it. They love the fact that it saves them time and effort. They’ve a few suggestions, however. A few things that’d make it more convenient for them. Sure, says the developer, happy to help. He realizes some very cool and helpful features cannot be added without changing basic assumptions and algorithms in the code. So those features will have to wait.

Days go by. The software is a hit. The developer has incorporated quite a few suggestions from his happy users. There are some cool features, however, that’d really make it a blockbuster. It’d take time though, since the core design will have to be changed substantially to implement these. He realizes how the design should have been in the first place: more general, more flexible, like this, not like that, use this, not that. The scribbles on napkins and doodles on whiteboards develop into UML diagrams. He researches similar tools offered by others, and realizes that with the new features his software would simply be the best in the market.

It’s not easy, he realizes. He shouldn’t really have had to do this, if the original design was more open and flexible. And more generalized. Backends, what about backends — of course, it should be able to work with any database! Yes, that’s what has to be done. Quite generic, of course. That’s how it should have been, always.

Most of the old code was thrown away, since it couldn’t be refactored into the new design, or had assumptions that tied it to specific systems and situations. His faithful colleagues still used the old software, however, since it solved many of their problems. The developer worked all his spare time on building the second system.

Stop.

At this point, the tired architect would like to interrupt the gentle reader. Because the story does not end here, or ever. The building of the second system goes on. And on, and on. Until it is forcibly put to death by emptied budgets, tired management, or burned-out developers. Second systems are tempting visions painted by the devil himself; take the path and you’ll find yourself in purgatory sooner or later.

The second system originates from a deep, basic need of humans — to make things better. To correct mistakes. To live life one more time, without making the same mistakes again. The Undo button. Do it till you get it right. Citius, Altius, Fortius.

The second system is not a disease, it is a symptom. It indicates that the people involved have not worked on a second system before, or are not aware of what the second system is.

If you haven’t worked on a second system, you just can’t appreciate it. As they say, good judgement comes from experience, and experience from bad judgement. It makes you a better architect, like how pain makes you appreciate health better.

Look around your world today — do you see people engaged in second systems? People trying to build the “next version” of their software that will be the “biggest and most light-trippin’-fantastic” ever? Versions that keep on getting written? Rewritten? Refactored? People trying to create their own software that will solve all the problems of the previous one? Programming languages? Web frameworks? IDEs?

If you haven’t been there, be aware enough to catch yourself from falling into the trap. Good luck.

All The Bits Are Volatile

In the colourful land of India, this is a week of festivals. Where I come from, we celebrate today as Saraswathi Pooja. Saraswathi is the goddess of learning.

It is a time to remind myself that I’m just one of 6,000,000,000, on a planet that is one of 9, in a solar system that is one of 200,000,000,000, in a galaxy that is one in 125,000,000,000. That in the grand scheme of things, I matter less than a single bit of my hard-drive. That while I argue and debate to establish my intellectual superiority over others, I forget that I stand on the shoulders of giants who had the humility to let me. That while I scorn others, I forget that I matter less than a single bit of my hard-drive.

It reminds me that I’m just a speck in the journey called time. That two generations hence nobody would ever know who I was. That in a few decades I’ll probably be a senile old man who keeps forgetting how to use templates in C++. That there is a time to draw ASCII-art in .nfo files, and there is a time to smile gently when the young firebrand challenges your design. That I’ll never reclaim the days and nights I burn away in front of a monitor.

May this festival bring humility and tolerance, remind us to appreciate and acknowledge, and help us remember that while we reach for the stars our feet must still remain on the ground.
