I’m pleased to announce that, beginning in July 2005, Componentized Linux will become a fully supported Progeny product. This note is meant to serve as a roadmap for the product and an overview of what’s coming and when it should be here. For the latest version of this roadmap (which, as with any roadmap, is bound to change at least slightly over time), please see http://componentizedlinux.org/roadmap/.
Background
For those who aren’t familiar with Componentized Linux, it is a highly customizable Linux distribution that can be used to build customized versions of the Linux operating system (at Progeny, we call them “custom distributions”).
The fundamental unit of Componentized Linux is called a component (hence the name “Componentized Linux”). A component is an internally consistent collection of packages (i.e., the packages exhibit dependency closure), along with associated metadata such as a description of the component and a list of the external dependencies needed to satisfy the dependency closure requirement. Collectively, the components form a set of reusable building blocks that can easily be assembled into a wide variety of configurations and customized as necessary to build a custom distribution.
Until early last year, Componentized Linux was largely an internal Progeny technology, used to build custom distributions for our Platform Services customers. A little over a year ago, Jeff Licquia and I launched something of a skunkworks project at Progeny to “productize” the concept of a componentized Linux, and that project eventually evolved into the Componentized Linux project now hosted at http://componentizedlinux.org/.
Over the past year, several projects building localized distributions have chosen to base their work on Componentized Linux, and numerous others have adopted individual Componentized Linux technologies (the Anaconda for Debian installer, the Discover hardware detection system, etc.). Of particular note, LinEx is based on Componentized Linux. LinEx is a Linux distribution built and maintained by the government of Extremadura, Spain, and deployed on 80,000 computers in Extremadura’s public schools and government offices. In addition to growing the installed user base of Componentized Linux, the LinEx developers have become invaluable partners in Componentized Linux development over the past year.
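To make the dependency closure requirement concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model (`ComponentSpec`, `unsatisfied_deps`) in which each package lists only the names of the packages it depends on; real Debian dependencies also carry version constraints and alternatives, which are ignored here. Under this model, a component satisfies closure when every dependency is met either by one of its own packages or by a declared external dependency.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    """Hypothetical, simplified model of a CL component (not the real format)."""
    name: str
    description: str
    # package name -> names of the packages it depends on
    packages: dict[str, set[str]]
    # dependencies the component expects to be provided from outside it
    external_deps: set[str] = field(default_factory=set)


def unsatisfied_deps(component: ComponentSpec) -> set[str]:
    """Return dependencies met neither inside the component nor by its
    declared external dependencies; an empty set means closure holds."""
    available = set(component.packages) | component.external_deps
    needed = set().union(*component.packages.values())
    return needed - available


# Illustrative package names only.
ntp_support = ComponentSpec(
    name="ntp-support",
    description="Network Time Protocol support",
    packages={"ntp": {"libc6", "ntpdate"}, "ntpdate": {"libc6"}},
    external_deps={"libc6"},
)
print(unsatisfied_deps(ntp_support))  # prints set()
```

Dropping `libc6` from `external_deps` would make the function report `{'libc6'}`, i.e., the component would no longer be closed.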
Given the success we’ve experienced with the Componentized Linux project, we allocated a product team in January to take the skunkworks project and move it forward, and we’ve been making tremendous progress on building the next generation of the Componentized Linux platform. We’re far enough along now that I’m comfortable sharing some details of the new platform and even publicly committing to a release schedule.
Particularly in platform technologies, it is important to have transparency. After all, we are asking other projects to base their work on ours, so it is only reasonable to give those projects (and others that might be considering our platform) some idea of where it is going. Accordingly, we now have a published roadmap and schedule, which will be updated over time to reflect any changes. Please email cl-workers@lists.progeny.com with any comments or questions.
(From here on out, I will abbreviate Componentized Linux as “CL” to save unnecessary wear and tear on my keyboard. You can read more about the goals and motivations of CL here.)
External factors
With the transition from a skunkworks project to a supported product, a number of important changes will be taking place (fortunately, most of them are good).
First of all, we’re shifting focus away from LSB 2.0 to the upcoming LSB 3.0, which is due for release at the end of Q2 2005. This means we will not pursue LSB 2.0 certification; instead, we will move directly to LSB 3.0.
Beginning with 3.0, the LSB is adopting an 18-month release cycle, with point releases as necessary that do not break compatibility or certification. We will closely track the LSB with CL Core (a.k.a. the LSB component), adopting a synchronized 18-month release cycle and a version numbering scheme that matches the LSB specification CL implements. Thus, we will release and LSB-certify CL Core 3.0 in July 2005.
Another external factor we are taking into consideration is the continued uncertainty over the release date of Debian sarge, on which CL’s Debian components are based. We will continue to track sarge, incorporating package updates weekly until late June 2005. We hope a sarge freeze is right around the corner, in which case the number of updates should slow considerably in the coming weeks.
With respect to the interaction between the sarge release and the CL Core release, we will play it by ear. If, in late June, it appears the sarge release is imminent, we will likely postpone the CL Core release slightly; if not, we will release CL Core 3.0 based on a late June snapshot of sarge and incorporate the final sarge release into a later point release.
We also hope to include in CL Core 3.0 the RPM platform being built by us and our Linux Core Consortium (LCC) partners. This effort hasn’t been as open or made as much progress as I had hoped (the recent merger of Mandrakesoft and Conectiva has understandably slowed things a bit), but we’re hoping to release the first version of the LCC RPM core in the July timeframe, to coincide with the release of the LSB 3.0 specification. If the LCC development team can hit its schedule, CL Core 3.0 should include the LCC RPM platform as well as the current sarge-based Debian platform.
Componentized Linux Core
In summary, the next version of CL Core will be version 3.0 and is scheduled for release in July 2005. Leading up to the final release, we are planning to make four “preview releases” on the following schedule:
Version | Release Date |
3.0 Preview Release 1 (PR1) | 04/22/05 |
3.0 Preview Release 2 (PR2) | 05/16/05 |
3.0 Preview Release 3 (PR3) | 06/13/05 |
3.0 Preview Release 4 (PR4) | 07/04/05 |
3.0 Release 1 (R1) | 07/25/05 |
3.0 PR1 will essentially be the same as 2.0 RC2, with packages updated to sarge as of April 2005. The preview releases that follow (PR2, PR3, and PR4) will continue to track sarge, with a variety of incremental platform improvements and bug fixes in each new preview release.
Progeny Debian
As with CL Core, Progeny Debian’s version number will be bumped to 3.0 to match the version of the LSB specification it implements (via CL Core).
As in 2.0, Progeny Debian 3.0 Developer Edition (DE) will continue to drive CL development and serve as a demonstration of the CL platform. Also as before, Progeny Debian 3.0 DE will bundle the CL development tools, so it will continue to be an excellent development platform for folks building custom distros. Finally, because Progeny Debian 3.0 DE is based on sarge, it is an ideal distribution for anyone who wants an easy-to-install distribution that is fully configured and ready to use out of the box, yet fully compatible with standard Debian.
Because it is based on CL Core 3.0, Progeny Debian 3.0 DE will be released on the same schedule as CL Core. We will provide a migration path from Progeny Debian 2.0 DE RC2 to Progeny Debian 3.0 DE PR1.
What’s different between CL 2 and CL 3?
CL 3 development focuses primarily on improvements to the component model and the component management tools.
Hierarchical component model and component descriptors
With CL 3, we have adopted a hierarchical component model: in addition to packages, a component may now contain other components. Under the new model, a “coarse-grained” component can be built from a collection of “finer-grained” components (for that matter, a custom distro itself, a “product” in CL parlance, is technically a component as well). We are using this feature to subdivide the relatively coarse-grained LSB component into a number of finer-grained components, which will make CL 3 a better platform than CL 2 for building small-footprint distros for resource-constrained or embedded environments.
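The hierarchical model can be pictured as a tree in which each node holds packages and child components. The sketch below is an illustration under stated assumptions, not the CL 3 implementation: the `Component` class, its `all_packages` method, and the component names are all hypothetical. It collects every package reachable from a product, depth first, listing a package only once even if several sub-components contain it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical node in a component hierarchy."""
    name: str
    packages: list[str] = field(default_factory=list)
    children: list[Component] = field(default_factory=list)

    def all_packages(self) -> list[str]:
        """Depth-first list of packages in this component and every
        component it contains, with duplicates removed (first one wins)."""
        seen: set[str] = set()
        result: list[str] = []

        def walk(comp: Component) -> None:
            for pkg in comp.packages:
                if pkg not in seen:
                    seen.add(pkg)
                    result.append(pkg)
            for child in comp.children:
                walk(child)

        walk(self)
        return result


# A coarse-grained "lsb" component assembled from finer-grained pieces,
# and a product that is itself just the outermost component.
lsb = Component("lsb", children=[
    Component("lsb-base", ["libc6", "coreutils"]),
    Component("lsb-net", ["netbase", "libc6"]),
])
product = Component("my-distro", ["my-branding"], [lsb])
print(product.all_packages())  # ['my-branding', 'libc6', 'coreutils', 'netbase']
```

In this model, a small-footprint product could reference only `lsb-base` rather than the whole `lsb` component, which is the kind of flexibility the finer-grained split is meant to provide.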
In CL 3, each component is described by an RDF-based component descriptor. (By convention, we refer to the outermost component descriptor in the hierarchy as the product descriptor, although they are technically the same file format.) The component descriptor includes the list of packages and/or components contained within it, along with supporting metadata.
In addition to the obvious bits of metadata (name, description, etc.), a component descriptor can also include configuration information. For example, an “NTP (Network Time Protocol) Support” component might include the tag default-ntp-server, which allows the custom distro builder to specify the default NTP server without having to modify any packages or make any distribution-specific changes (i.e., the default will be applied to both sarge-based and LCC-based distros, albeit using different mechanisms).
Because of the containment properties of our hierarchical component model, metadata in “outer” components (such as products) overrides metadata in “inner” components. For example, the product descriptor for a custom distro might override the default NTP server value by including its own default-ntp-server tag. This makes it easy to provide reasonable defaults at the component level without sacrificing flexibility at the product level.
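The override rule (outer descriptors win over the components they contain) amounts to a recursive dictionary merge. This Python sketch models descriptors as plain dictionaries of tags rather than RDF; `Descriptor`, `effective_metadata`, and the server names are hypothetical, and only the `default-ntp-server` tag comes from the text above. Inner values are merged first and outer values last, so the outermost setting prevails.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical stand-in for a component or product descriptor."""
    name: str
    metadata: dict[str, str] = field(default_factory=dict)
    children: list[Descriptor] = field(default_factory=list)


def effective_metadata(desc: Descriptor) -> dict[str, str]:
    """Merge metadata so outer descriptors override the components they
    contain. Sibling conflicts resolve to the later sibling (an assumption)."""
    merged: dict[str, str] = {}
    for child in desc.children:
        merged.update(effective_metadata(child))  # inner defaults first
    merged.update(desc.metadata)  # outer values applied last, so they win
    return merged


ntp = Descriptor("ntp-support", {"default-ntp-server": "pool.ntp.org"})
product = Descriptor("my-distro",
                     {"default-ntp-server": "ntp.example.com"},
                     [ntp])
print(effective_metadata(ntp)["default-ntp-server"])      # pool.ntp.org
print(effective_metadata(product)["default-ntp-server"])  # ntp.example.com
```

A product that sets no `default-ntp-server` tag of its own would simply inherit the component's default.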
Component compiler
Component descriptors (and product descriptors) are processed with a tool called a component compiler. In essence, the component compiler translates a component descriptor into an APT repository. Because it can be run against any component or product descriptor, it supports CL 2-style components (i.e., one APT repository per component) and also allows end products to be delivered as a single, unified APT repository.
As in CL 2, PICAX is used to create installable ISO and/or runtime images suitable for distribution or factory installation. In the new toolkit, PICAX is being extended to interpret certain tags in the product descriptor, so that branding, installation customization (e.g., enabling or disabling certain screens during installation), and similar settings can be specified in one central location, without having to create or modify -defaults packages as was required in CL 2.
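One way to picture the compiler's output is the `Packages` index at the heart of every APT repository: a list of stanzas in Debian control-file format, one per package. The Python sketch below is hypothetical (`PackageRecord`, `render_packages_index`, and `compile_unified` are not PDK APIs, and the paths and versions are made up). It omits fields a real index normally carries, such as `Size`, `MD5sum`, `Depends`, and `Description`, which tools like `dpkg-scanpackages` or `apt-ftparchive` usually generate. It shows the two output modes described above: one index per component, or one unified index for a whole product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageRecord:
    """Hypothetical, minimal description of one binary package."""
    name: str
    version: str
    architecture: str
    filename: str  # path of the .deb relative to the repository root


def render_packages_index(records: list[PackageRecord]) -> str:
    """Render a minimal APT 'Packages' index: one stanza per package,
    stanzas separated by a blank line. Sorting (version compared as plain
    text) is only for deterministic output."""
    stanzas = []
    for rec in sorted(records, key=lambda r: (r.name, r.version)):
        stanzas.append(
            f"Package: {rec.name}\n"
            f"Version: {rec.version}\n"
            f"Architecture: {rec.architecture}\n"
            f"Filename: {rec.filename}\n"
        )
    return "\n".join(stanzas)


def compile_unified(components: dict[str, list[PackageRecord]]) -> str:
    """Product-level output: a single index covering every contained
    component, with packages shared between components listed once."""
    merged = {rec for recs in components.values() for rec in recs}
    return render_packages_index(list(merged))


ntp = PackageRecord("ntp", "1.0-1", "i386", "pool/main/n/ntp/ntp_1.0-1_i386.deb")
bash = PackageRecord("bash", "1.0-1", "i386", "pool/main/b/bash/bash_1.0-1_i386.deb")
print(render_packages_index([ntp]))  # CL 2 style: one index per component
print(compile_unified({"base": [bash], "ntp-support": [ntp, bash]}))
```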
One important side effect of this change is that Progeny Debian 3.0 DE will be delivered as a single APT repository rather than as multiple APT repositories representing its constituent components. The main reason for this change is that the APT-repo-per-component model caused numerous problems. By supplying a unified APT repo for the product, we will be able to use standard APT tools for component management.
One outstanding issue with software management on a CL-based distro is how to install packages from sarge and keep them up to date over time. We are evaluating two potential solutions to this problem, one involving APT’s pinning mechanism and the other involving Debian tasks. We will provide more details as we progress toward a solution.
Another important side effect of these changes is that the current CL data formats (comps.xml) and tools (comp-get and comp2repo) are being deprecated in favor of the new component descriptor and component compiler. Note, however, that the output of the compiler in CL 3 is still a standard APT repository, so CL 2 components built using comp2repo will be fully compatible with CL 3. In other words, all work on CL 2 components will be reusable in CL 3, and CL developers may continue to use comp2repo while the new tools are still under development without fear of their work being made obsolete by CL 3.
Also note that, while comps.xml is being deprecated as the component specification format in favor of the new component descriptor, PICAX is being extended to generate a merged comps.xml for Anaconda to use; we are not modifying Anaconda to interpret component descriptors directly. Furthermore, when the new tools become available, we will provide a migration path from comps.xml to the new component descriptor format, so it will be easy for CL developers to transition.
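Generating a merged comps.xml from component information could look roughly like the Python sketch below, which turns each component into one comps `<group>` listing its packages. The element names (`group`, `id`, `name`, `packagelist`, `packagereq`) follow the Red Hat/Fedora comps.xml format that Anaconda reads, but the one-group-per-component mapping, the hypothetical `merged_comps_xml` function, and the choice of `type="default"` are assumptions, not PICAX's actual behavior. Real comps files usually carry more per-group elements (descriptions, visibility flags) than this sketch emits.

```python
import xml.etree.ElementTree as ET


def merged_comps_xml(groups: dict[str, list[str]]) -> str:
    """Hypothetical helper: build one comps.xml-style document with a
    <group> per component, listing each package as a default requirement."""
    comps = ET.Element("comps")
    for group_id, packages in sorted(groups.items()):
        group = ET.SubElement(comps, "group")
        ET.SubElement(group, "id").text = group_id
        ET.SubElement(group, "name").text = group_id  # placeholder display name
        pkglist = ET.SubElement(group, "packagelist")
        for pkg in sorted(set(packages)):
            ET.SubElement(pkglist, "packagereq", type="default").text = pkg
    return ET.tostring(comps, encoding="unicode")


print(merged_comps_xml({"ntp-support": ["ntp", "ntpdate"], "base": ["bash"]}))
```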
Component ecosystem
Up to now, CL components have largely evolved in lockstep, primarily because we had complex problems to solve in the component model before we could do otherwise. One of the original ideas behind CL was that it would allow components to closely track “upstream” projects (e.g., so that a “GNOME 2.10” component could be made available without changing the sarge core) while also giving custom distro builders flexibility (e.g., developers could choose among GNOME 2.6, GNOME 2.8, and GNOME 2.10, depending on their stability and feature needs).
With the new CL platform, we are taking a step closer to this ambitious goal. While the LSB core (alternatively called “CL Core” or the LSB component) will have a feature set and release cycle to match the LSB specification, the components that live above the LSB core will have their own release cycles synchronized with the release cycles of the upstream projects themselves. Each component will be built against the oldest supported version of the component beneath it (as determined by its component dependencies), allowing maximum compatibility with standard Debian and the LCC core.
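Choosing the build target this way is essentially taking the minimum over the supported versions of the underlying component. The Python sketch below uses hypothetical function names and assumes simple dotted numeric versions, e.g., for a component sitting on top of a GNOME component. Comparing the version strings directly would be wrong, since "2.10" sorts before "2.6" as text; real Debian versions, with epochs and revisions, need `dpkg --compare-versions` or an equivalent implementation.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Hypothetical helper: turn '2.10' into (2, 10) so versions compare
    numerically. Assumes purely numeric parts (not full Debian syntax)."""
    return tuple(int(part) for part in version.split("."))


def build_target(supported_versions: list[str]) -> str:
    """Oldest supported version of the component underneath, i.e. the one a
    dependent component would be built against under this policy."""
    return min(supported_versions, key=version_key)


gnome_versions = ["2.10", "2.6", "2.8"]  # illustrative
print(build_target(gnome_versions))  # 2.6
print(min(gnome_versions))           # 2.10 (string comparison gets it wrong)
```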
Ideally, the CL component for a given technology will be updated shortly after each upstream release, and users of CL-based distros will be able to mix and match from a broad, distributed universe of components and component versions without sacrificing compatibility. How close we can get to this ideal remains an open question. There are many tricky issues involved that have been discussed before, such as the potential for a combinatorial explosion of component dependencies: supporting three versions of X and three versions of GNOME would mean supporting nine different combinations of X and GNOME, and extending that to a few dozen or even a few hundred components gives an idea of the potential complexity. We have ideas on how to mitigate this complexity (basing dependencies on interfaces, supporting only certain combinations of components, etc.), but numerous problems still need to be solved before we can declare victory.
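The arithmetic behind the combinatorial explosion is a product of version counts, and one of the mitigations mentioned above, supporting only certain combinations, is a filter over those products. In this Python sketch the component and version labels are placeholders and `count_stacks`, `all_stacks`, and `supported_stacks` are hypothetical helpers; interface-based dependencies are not modeled.

```python
from itertools import product
from math import prod
from typing import Callable, Iterator


def count_stacks(versions: dict[str, list[str]]) -> int:
    """Number of distinct stacks if every version of every component had to
    work with every version of every other component."""
    return prod(len(v) for v in versions.values())


def all_stacks(versions: dict[str, list[str]]) -> Iterator[dict[str, str]]:
    """Enumerate every combination as a {component: version} mapping."""
    names = list(versions)
    for combo in product(*(versions[n] for n in names)):
        yield dict(zip(names, combo))


def supported_stacks(versions: dict[str, list[str]],
                     allowed: Callable[[dict[str, str]], bool]) -> list[dict[str, str]]:
    """Keep only the combinations a support policy accepts."""
    return [stack for stack in all_stacks(versions) if allowed(stack)]


versions = {"X": ["x1", "x2", "x3"], "GNOME": ["2.6", "2.8", "2.10"]}
print(count_stacks(versions))  # 9

# Thirty components with three versions each:
print(count_stacks({f"c{i}": ["a", "b", "c"] for i in range(30)}))  # 205891132094649

# A hand-maintained support matrix pairing each GNOME with one X version.
matrix = {("x1", "2.6"), ("x2", "2.8"), ("x3", "2.10")}
ok = supported_stacks(versions, lambda s: (s["X"], s["GNOME"]) in matrix)
print(len(ok))  # 3
```

Restricting support to a matrix like this keeps the tested set linear in the number of supported pairings rather than exponential in the number of components, at the cost of some mix-and-match freedom.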
Status
As previously mentioned, this note describes ongoing work that is not yet ready for release.
According to the current schedule, CL Core and Progeny Debian will be released in July 2005, with four “preview releases” scheduled between now and then.
The new CL toolkit, which we’re calling the Componentized Linux Platform Development Kit (PDK), is still in the early stages of development, so we’re not publishing a roadmap or release schedule for the PDK at this time. However, we can say that the PDK will be bundled with Componentized Linux when it is released, and that a preview release will very likely be made available to the CL community in the mid- to late-summer timeframe.
It is our explicit goal to fundamentally change the way custom Linux distributions are created and maintained, and we think we’re onto something pretty interesting with Componentized Linux and the Componentized Linux PDK. We believe that the future of the Linux distribution market is custom distributions; after all, Linux is customizable by its very nature as a free/open-source platform. Tim O’Reilly and other thought leaders in the open source community share this view. If you share our belief, or if your company or organization creates and maintains custom Linux distributions, I encourage you to keep an eye on what we are doing with Componentized Linux. Exciting times are ahead.