5 Ideas for Open Source Projects

I love open source. I’ve a project over at sourceforge, doing moderately well. Life is still short however, so here are a few projects that I’d love to do myself if there were more hours in a day:

  1. TextMate clone for Windows: There are a couple of $$$ ones: E-TextEditor and InType. However, with a high-quality text editing component like Scintilla, which has nearly all that you can wish for from a source code editor component, there really is no excuse for not adding bundle support and rolling up a full-fledged editor. Scintilla also provides support for call tips and intellisense, so that should be easy to do too. A tree pane that can display any directory and its children, explorer-like; SVN-awareness and build-output windows would be killer additions.
  2. Generic Backend for Duck-typed Languages: This one’s a bit of a having-fun-reinventing-the-wheel thing. Rather than having stack/register based VMs that eat bytecode (CLR, JVM, YARV, Parrot), perhaps we could have a nice C library with APIs that allow creation and manipulation of objects? With the caller managing namespaces, symbol tables, the scope in which methods would execute?
  3. A Helper Library in C/C++ for Apache Modules: The regular Apache/APR APIs are the minimum required, and are good in that respect. However, all I want to do 99% of the time is to have a content handler, and for that I also need the parsed query string and POST data. libapreq is there, but is a pain to use. Moreover, the helper library, or rather a base-class module, should already parse the input, and the actual module should be able to just fill in the content handler, which can access the parsed input. And please, don’t expose those buckets and brigades and pools.
  4. Embeddable Apache: Apache is quite a decent piece of software. Can’t it be made embeddable? A nice library? We’ll provide the configuration info programmatically. And once we have this, let’s throw out those BaseHTTPServer, wsgiref.simple_server, mongrel implementations and put in an embedded Apache!
  5. A Decent Network Client Library in C: One-stop shop for base64, qp, uu encoding/decoding, rfc2822 headers, MIME messages, HTTP cookie jars; along with HTTP, NNTP, SMTP, POP3 and IMAP clients.

Yeah well. Have fun.

Oh and don’t pull the anti-Microsoft crap because of TextMate. After all, I’m writing this post from a Gutsy Gibbon.

Solution Domain Artifacts

You’re in Munich, and want to travel to Frankfurt. Given the time and place constraints, let’s say your best choice is to take a train. That is the problem. The Deutsche Bahn is your solution provider. And the solution provider decides that you need to learn a concept in order to avail of the solution: the concept of a train ticket. The train ticket is a solution domain artifact. It is a concept that is not present in the problem domain (travelling by train from one place to another). In order to use the solution though, you need to learn about the solution domain artifact (the ticket). This post is about solution domain artifacts in software design.

[Disclaimer: this is what I call a solution domain artifact. You may call it by another name. Romeo and Juliet -- II. 2.]

Let’s take an example relevant to us. Let’s say you want to embed a Javascript interpreter in your application. Since we are, of course, from the old school, the application is necessarily written in C (and command line based — more about this application in a later post), and our javascript engine of choice is spidermonkey. While building it, you realize that you need to know, apart from the Javascript language itself, at least the following concepts so that you can get the interpreter to do any work:

  • a runtime, represented by a JSRuntime object
  • a context, represented by a JSContext object
  • an error reporter, which is a callback function to handle errors

The remaining artifacts are not introduced by the solution domain itself — things like standard classes, objects, properties and functions are artifacts in the problem domain. Their manipulation was what you set out to do originally.

As another example, consider some SDAs that your grandma has to learn just to send an email to her son:

  • web browser
  • menu
  • scroll bar
  • button
  • URL
  • login credentials
  • e-mail address

Pause.

Does the tired architect want to tell you what he thinks about solution domain artifacts? No. He wants you to think about them.

Resume.

He’ll give you more fodder for thought, though.

As a designer of a component, you should minimize the artifacts that you expose your (potential) clients to. An artifact represents a cost, an investment that has to be made by your potential client. To the designer, it represents the cost of documentation and tech support.

Designers and architects often get carried away by building castles in the solution domain — mostly in the name of generalization and flexibility. A newcomer trying to develop a component would first have to know about the 33 types of components, the 12 categories they fall into, the 9 different interfaces that they can implement, the 10 different ways in which they can communicate, and of course the 7 XML files that have to be “suitably edited” to deploy the component.

Is it just a matter of the learning curve? No. Human minds and bodies haven’t evolved much in the past 1000 or so years. In other words, the human mind is designed for the requirements of a world many centuries past. There is a limit to the complexity that an average human mind can handle. It’s an engineering limit. The more abstractions you create, and the more complex they are, the further you deviate from the average client. Your clients now have to be super-humans, or trained, disciplined, super-professionals, who can juggle all this complexity within their brilliant heads. And probably none of them are available on the development team that is maintaining the product.

Components with fewer and simpler artifacts tend to be better maintained, more widely used and more profitable. An ideal solution domain artifact is one that cannot be simplified further. An ideal solution presents only the minimally necessary and sufficient artifacts to the user.

The tired architect thinks that spidermonkey is fun to use, and that grandmas will forever be baffled by computers.

The Second System

This first post is not going to be a cheery “hello, world”. I’m not a blogger. I’m not going to introduce myself. This is just yet another blog, and you know what to do with a blog, so just do it. This post is about the second system effect.

Many a year ago, somewhere around the start of my career, I read the book The Mythical Man-Month by Frederick Brooks. If you create software, in any form of it, you should have read this. There’s no point in explaining why, and I’m not here to sell the book — if you love your profession, read it.

Brooks devotes five pages of the book to the phenomenon of the second system. That’s a bit too much really; for the concept of the second system is something simple and so fundamentally human, that it can be understood by just a few paragraphs, maybe even a blog post. Why it is so interesting, however, is its omnipresence in just about all aspects of software development.

Consider: a person develops a piece of software. A tool, let’s say, that solves a problem faced by his team daily. He shows it to his colleagues. They like it. They love the fact that it saves them time and effort. They’ve a few suggestions, however. A few things that’d make it more convenient for them. Sure, says the developer, happy to help. He realizes some very cool and helpful features cannot be added without changing basic assumptions and algorithms in the code. So those features will have to wait.

Days go by. The software is a hit. The developer has incorporated quite a few suggestions from his happy users. There are some cool features, however, that’d make it really a blockbuster. It’d take time though, since the core design will have to be changed substantially to implement these. He realizes how the design should have been in the first place — more general, more flexible, like this, not like that, use this, not that. The scribbles on napkins and doodles on whiteboard develop into UML diagrams. He researches similar tools offered by others, and realizes that with the new features his software would simply be the best in the market.

It’s not easy, he realizes. He shouldn’t really have had to do this, if the original design was more open and flexible. And more generalized. Backends, what about backends — of course, it should be able to work with any database! Yes, that’s what has to be done. Quite generic, of course. That’s how it should have been, always.

Most of the old code was thrown away, since it couldn’t be refactored into the new design, or had assumptions that tied it to specific systems and situations. His faithful colleagues still used the old software, however, since it solved many of their problems. The developer worked all his spare time on building the second system.

Stop.

At this point, the tired architect would like to interrupt the gentle reader. Because the story does not end here, or ever. The building of the second system goes on. And on, and on. Until it is forcibly put to death by emptied budgets, tired management, or burned-out developers. Second systems are tempting visions painted by the devil himself; take the path and you’ll find yourself in purgatory sooner or later.

The second system originates from a deep, basic need of humans — to make things better. To correct mistakes. To live life one more time, without making the same mistakes again. The Undo button. Do it till you get it right. Citius, Altius, Fortius.

The second system is not a disease, it is a symptom. It indicates that the people involved have not worked on a second system, or are not aware of what the second system is.

If you haven’t worked on a second system, you just can’t appreciate it. As they say, good judgement comes from experience, and experience from bad judgement. It makes you a better architect, like how pain makes you appreciate health better.

Look around your world today — do you see people engaged in second systems? People trying to build the “next version” of their software that will be the “biggest and most light-trippin’-fantastic” ever? Versions that keep on getting written? Rewritten? Refactored? People trying to create their own software that will solve all the problems of the previous one? Programming languages? Web frameworks? IDEs?

If you haven’t been there, be aware enough to catch yourself from falling into the trap. Good luck.

All The Bits Are Volatile

In the colourful land of India, this is a week of festivals. Where I come from, we celebrate today as Saraswathi Pooja. Goddess Saraswathi is the goddess of learning.

It is a time to remind myself that I’m just one of 6,000,000,000, on a planet that is one of 9, in a solar system that is one of 200,000,000,000, in a galaxy that is one in 125,000,000,000. That in the grand scheme of things, I matter less than a single bit of my hard-drive. That while I argue and debate to establish my intellectual superiority over others, I forget that I stand on the shoulders of giants who had the humility to let me. That while I scorn others, I forget that I matter less than a single bit of my hard-drive.

It reminds me that I’m just a speck in the journey called time. That two generations hence nobody would ever know who I was. That in a few decades I’ll probably be a senile old man who keeps forgetting how to use templates in C++. That there is a time to draw ASCII-art in .nfo files, and there is a time to smile gently when the young firebrand challenges your design. That I’ll never reclaim the days and nights I burn away in front of a monitor.

May this festival bring humility and tolerance, remind us to appreciate and acknowledge, and help us remember that while we reach for the stars our feet must still remain on the ground.

Floating point is slow, you say?

And Bresenham’s algorithm is fast because it uses only integer operations?

Wake up. Modern processors crunch floating-point numbers even faster than integers. Don’t believe me? Read on.

Here’s a small test application, in good ol’ C:

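/* Floating-point version: 10000 multiply-adds on a double. */
void f1()
{
    double d = 0;
    int i;

    for (i=0; i<10000; ++i)
        d = d * i + 1;
}

/* Integer version: the same loop on an int. */
void f2()
{
    int d = 0;
    int i;

    for (i=0; i<10000; ++i)
        d = d * i + 1;
}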

int main()
{
    f1();
    f2();

    return 0;
}

Let’s compile this with a Microsoft VC 7.1 compiler with this command line:

cl -Zi tm.c

The -Zi is so that .pdb (program database) files are generated. These contain symbol information and are used by the profiler to instrument the binaries. My profiler of choice is AQTime. Rather cool one, that. So anyway, let’s run this in AQTime and see what it reports:

f1: 6639.89 μs, 18545395 machine cycles
f2: 196.82 μs, 549727 machine cycles

[The machine is a 2.8GHz Intel Pentium with HT.]

Hrmph. Nowhere near. Let’s try again with a different compiler option.

cl -Zi -arch:SSE2 tm.c

The “-arch:SSE2” instructs the compiler to generate SSE2 instructions. (Not sure what SSE2 is? Read the Intel developer’s manual, over here.)

OK, what does the profiler say now?

f1: 224.12 μs, 625973 machine cycles
f2: 278.28 μs, 777237 machine cycles

Hah! Surprised? Well don’t be. These are the new rules of the old game.

Let’s try again, with one more option:

cl -Zi -G7 -arch:SSE2 tm.c

The “-G7” is to “optimize for Pentium 4 or Athlon”. The results:

f1: 159.31 μs, 444965 machine cycles
f2: 265.11 μs, 740473 machine cycles

Surprised? Sweating? There’s-something-wrong-with-his-profiler?

Know your hardware. It pays.

Three Steps To Becoming a Better Programmer

There are people who love programming. Who enjoy the challenge and pleasure of creating something. Of watching it grow and flourish. Of seeing it do what it was destined to. Such people start off with a spark in them. When the spark is fanned into a flame, they become creators of software that makes a difference in our lives.

I’m asked, once in a while, by promising coders, about how to become better at the job they love. This is my advice to them:

  1. Code. If you’re a painter, and you want to better yourself, what would you do? Paint more? Paint things you haven’t painted before? Use painting techniques that you haven’t used before? Surely yes. And why should these steps not be applicable to our profession? To become a better programmer, program. Write programs. Complete ones, that work. Not one, not two. Just keep doing it. The more you do, the better you become. Rewrite it in Haskell. Port it to Solaris. Add a garbage collector. Support another user-interface language. Publish your work as open source. Write a patch for gcc.
  2. Look around you. Things happen. Things change. It’s always exciting out there. So many things to learn. So many things you never knew existed. Read about elegant solutions to challenging problems. See how people create, use, misuse and abuse technologies. Find out what Brainf*ck is. What IOCCC is. Why Ariane 5 failed. Why jwz started a nightclub. Why Python has a GIL. Why VirtualAlloc returns 64k-aligned pointers.
  3. Reflect. Connect your input and output. Think about what you read. What you learned. How you can apply it to your work. Reflect and ruminate. Why did your code work the very first time with Python? Why is it faster to iterate through std::vector than std::list? How should duck-typed languages handle primitive types? When is it best to use frameworks and when libraries? Why should this routine here create so many page faults? Shall I build a generic backend for duck-typed languages?

If you’re reading this, you’re probably one of the gifted few. You are one among a rare and endangered species. Keep the spark, the flame, the passion alive. May the force be with you.

42! 42!!

No, it’s not. Calm down. You haven’t found The Answer. What you claim as the best thing that can happen to software development isn’t.

Over the rather few years since (software) programming ventured out of universities and military establishments and into the hands of the average denizen, more and more people have claimed to have found the elixir for the woes of developing perfect programs, on time and under budget. Day in, day out, blogs proclaim and expound the virtues of the latest solutions, and they sound not quite unlike this. One day it is a new programming language, the next it is a new development methodology. Oh, and new versions of IDEs too.

Don’t get me wrong. I like that IDE too. Well, sort of. And also Ruby and agile and Scrum and whatever. Just don’t tell me that your programming language is the best because everything is an object. Or even because it is not, for that matter. You see, creating great software is rarely a result of the development methodology your team adopts, or the programming language they choose. Or even the size of offices they sit in (with apologies to Joel). Or because they used Ruby on Rails. Or Glassfish. Or Flex. Or Spring. Or whatever.

No, there is no 42. Every piece of software is a solution to a problem. Each problem is different. The constraints they pose are different. A good designer is one who makes the best trade-off, the best compromise between competing qualities. Tools, libraries, frameworks, IDEs are all tools of the trade, like jackhammers and chainsaws. What makes or breaks a piece of software is the skill and discipline of the people who use them.

Good software is a skillful blend of creativity and engineering. It requires people who have a passion for programming, a desperation to improve their skills, a sharp and intuitive mind to analyze problems, and the creativity and innovation to go with it. It’s what is between your ears that matters, not what is in your hands.

You want to create software that people would remember you for? Then learn, and apply. Write code, read about everything related to creating software, and reflect upon it. Repeat. Ad nauseam. Until you get calluses on the balls of your wrist.

If you’re a good programmer, you’ll be one whether you use NetBeans or Notepad. Whether you use GTK+ or MFC. Whether you are married or a geek.

There is no 42. There’s just you.