Wednesday, May 19, 2004

Why separate bin and src distributions?

Something struck me as I was preparing the latest HiveMind release just now: why do we in the open-source world bother separating the binary and source distributions?

Take HiveMind. The binary distribution follows standard procedure: it includes all sorts of documentation. Because of the use of Maven, the documentation set is out of control, but even so, what we have is a 281KB (uncompressed) JAR distributed inside 16,526KB (uncompressed) of documentation. Meanwhile, the source code is just another 1,257KB (uncompressed).

The binary distributions are 3.1MB/1.5MB (.zip vs. .tar.gz) and the source distributions are 556KB/229KB. In other words, adding the source to the binary distribution would not be particularly noticeable ... just an additional second or two at broadband speeds.

If I had my say (which, come to think of it, I largely do), I would produce a combined binary/src distribution and have the documentation as the add-on. A combined binary/source distribution would be approximately 50%/100% larger than the source distribution alone (since the JAR file is already itself compressed). If you assume that most people download the binaries and source together but largely read the documentation on-line (at least until they get serious about a package) ... then a combined bin/src distro is a win.
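The size estimate can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it uses the figures quoted in the post, reads "50%/100% larger" as growth relative to the source-only archive, and assumes the JAR adds its full 281KB because it will not compress further. The function name `combined_growth` is made up for illustration.

```python
# Sizes in KB, taken from the figures quoted in the post.
JAR_KB = 281                             # HiveMind JAR (already compressed)
SRC_KB = {"zip": 556, "tar.gz": 229}     # source distributions


def combined_growth(jar_kb, src_kb):
    """Return (combined size in KB, percent growth over the source-only archive).

    Assumption: the JAR does not shrink when re-archived, so it adds its
    full size to the download.
    """
    combined = jar_kb + src_kb
    growth = 100.0 * jar_kb / src_kb
    return combined, growth


for fmt, src in SRC_KB.items():
    size, pct = combined_growth(JAR_KB, src)
    print(f"{fmt}: {size} KB combined, about {pct:.0f}% larger than source alone")
```

Under those assumptions the .zip comes out at 837KB (about 51% larger) and the .tar.gz at 510KB (about 123% larger), which is in line with the rough 50%/100% estimate and still far smaller than the current binary distributions.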

Certainly when I've used other packages, I've wasted a lot of time unpacking the binary distribution, using the jar, then having to get the source jar and connect it up inside Eclipse so I could actually debug code that uses the library.

This approach would be better for users on slow connections as well; they would get what they need to work (the binary and the source) and could cherry-pick the documentation they need from a live web site. Certainly, anyone serious about a package would want the full documentation on their own hard drive ... but why pay that cost just to take a peek? Distributing binaries with (full) documentation makes every user pay that download cost ... or keeps some users from bothering to evaluate the package at all.

It's open-source. The point is to buck tradition and think for ourselves.

4 comments:

Unknown said...

The reason for requiring an account is to reduce the amount of comment spam.

thechrisproject said...

I agree. The comment about people not caring about the source is ridiculous. I think that most users of a project like this will use the source fairly frequently. To have it connected up in Eclipse saves so much time, as I can just control+click my way right into the relevant Tapestry classes. Every time a new version comes out I download the bin and src and continue to use the on-line docs.

Unknown said...

You don't seem to use stuff the way I do. For example, I was just writing an Ant task. I have the Ant documentation handy, but since I'm in Eclipse, it's easier to connect the ant.jar to the source and get documentation right from the source code. I think a lot of people work that way.

In the Tapestry world, I'm amazed at the number of people who don't read even the README.html, never mind the rest of the documentation. So we see the exact same questions, answered in the readme and the FAQs, constantly. No point in packaging the docs if no-one's going to read them anyway.

sitio said...

It is not unprecedented to include source in the dist -- for example, shortly after reading this blog, I updated our project's Hibernate and noted that they include source (and documentation) in the main distribution.

This is probably a statement most easily made by somebody with a cable connection (let them eat cake), but I don't see why distributions are broken into pieces at all. Bin, source, documentation, examples -- put 'em all together.

I doubt I'm alone in always downloading all of them when they're separate and I'm sure I'm not alone in sometimes failing to keep them in sync.

The repetitious beginner questions are probably inevitable, but keep in mind, there is no way to tell how many people didn't post the same question because it was in the (included) documentation.