Is there a future for software versioning?

Versioning in software is as old as software itself, so one would assume we handle versions pretty well by now, right? Well, you couldn’t be more wrong. Versioning (and this holds true for ALL programming languages and concepts I know of) hasn’t been done even remotely right so far. You ask why? Well, that’s easy to explain…

Currently, versions are little more than timestamps with a sophisticated syntax. You might be able to read the provider’s intentions from them, but beyond that they are useless. The syntax of a version scheme might indicate an intent to express compatibility, but this is never enforced, so you can’t really trust it! You ALWAYS have to check yourself (manually or with integration tests) whether a release really fulfills the contract. As a result, when a version gets increased, all you can be reasonably sure of is that some bits changed, whatever that means. For me, this is the ultimate problem with dynamism in applications. As soon as you open yourself up to third parties, you can’t test all possible installation scenarios of your clients. Someone will break your perfect little world, so better anticipate it!

In OSGi, things get even harder from a tester’s perspective (and a developer’s, for that matter). It is the only system I know of that defines a way to describe the future versions a component (aka bundle) can be compatible with. For instance, you can specify that you require a certain version of a package your module depends upon. To do so, you add the required dependency to your MANIFEST.MF file as a range:

Import-Package: dictionary;version="[1.0.0,2.0.0)"
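
To make the range notation concrete, here is a minimal, self-contained sketch of how membership in such a range could be tested. The class and method names (RangeSketch, parseVersion, includes) are my own illustration and not part of any OSGi API; in a real framework the resolver performs this check for you, and the qualifier segment is ignored here for brevity.

```java
// Hypothetical sketch, not the OSGi API: checks whether a version lies inside a range.
final class RangeSketch {

    /** Parses "major[.minor[.build]]" into three ints; missing segments default to 0. */
    static int[] parseVersion(String text) {
        String[] parts = text.trim().split("\\.");
        int[] version = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            version[i] = Integer.parseInt(parts[i]);
        }
        return version;
    }

    /** Compares two parsed versions segment by segment, most significant first. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** Returns true if the version lies inside a range such as "[1.0.0,2.0.0)". */
    static boolean includes(String range, String version) {
        String r = range.trim();
        // '[' and ']' mean the bound is included, '(' and ')' mean it is excluded.
        boolean lowInclusive = r.charAt(0) == '[';
        boolean highInclusive = r.charAt(r.length() - 1) == ']';
        String[] bounds = r.substring(1, r.length() - 1).split(",");
        int[] v = parseVersion(version);
        int low = compare(v, parseVersion(bounds[0]));
        int high = compare(v, parseVersion(bounds[1]));
        return (lowInclusive ? low >= 0 : low > 0)
            && (highInclusive ? high <= 0 : high < 0);
    }

    public static void main(String[] args) {
        String range = "[1.0.0,2.0.0)";
        System.out.println("1.0.0 in range: " + includes(range, "1.0.0"));
        System.out.println("1.9.3 in range: " + includes(range, "1.9.3"));
        System.out.println("2.0.0 in range: " + includes(range, "2.0.0"));
    }
}
```

The half-open form is what makes the range a statement about the future: every 1.x release yet to come is accepted, while the first 2.0.0 release is not.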

A typical usage scenario in OSGi might look like this:

OSGi dependency versioning


Here the dictionary “user” (cl.client) defines a usage dependency on the dictionary API (dictionary.api). As you can see, the dependency definition differs between a “user” and an “implementer”, because the implementer of the functionality (dictionary.impl) has to be not only binary compatible but also source compatible (for example, it has to implement any new method added to an interface). Fortunately, OSGi at least allows you to express such a dependency. The OSGi version scheme implies a specific usage. If you’re not binary compatible, bump the major number; if you are binary compatible, bump the minor number. If you fix bugs without affecting the API, bump the build number (which OSGi itself calls the micro number) and, last but not least, if you want to mark something like a release or an iteration, use the qualifier. This is what it should look like:

[major[.minor[.build[.qualifier]]]]
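
The bump rules just described can be written down in a few lines. This is only a sketch of the convention under my own assumptions: the names (BumpSketch, Change, next) are invented for illustration, the qualifier is left out, and lower segments are reset to zero, as is common practice.

```java
// Hypothetical sketch of the conventional bump rules; names are illustrative only.
final class BumpSketch {

    /** The three kinds of change the version scheme distinguishes. */
    enum Change { BINARY_INCOMPATIBLE, BINARY_COMPATIBLE, BUGFIX }

    /** Returns the next {major, minor, build}; lower segments are reset to 0. */
    static int[] next(int[] current, Change change) {
        switch (change) {
            case BINARY_INCOMPATIBLE:
                return new int[] { current[0] + 1, 0, 0 };
            case BINARY_COMPATIBLE:
                return new int[] { current[0], current[1] + 1, 0 };
            default:
                // BUGFIX: the API is untouched, only the build number moves.
                return new int[] { current[0], current[1], current[2] + 1 };
        }
    }

    /** Formats {major, minor, build} as "major.minor.build". */
    static String format(int[] version) {
        return version[0] + "." + version[1] + "." + version[2];
    }

    public static void main(String[] args) {
        int[] current = { 2, 4, 2 };
        for (Change change : Change.values()) {
            System.out.println(change + " -> " + format(next(current, change)));
        }
    }
}
```

Note that nothing in this code can decide which Change applies. That classification is exactly the part that is left to the author's judgment.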

You may think this is a neat approach, and you’re probably right. It is great compared to everything else we have right now. Unfortunately, it is not the silver bullet. Not at all, but it is a scheme that is explicit and, to some extent, well defined. Someone reading such a version will at least have an idea of what the component author intended.

Now, what is the catch? Plain and simple: this versioning scheme is not enforced. Everyone can version their OSGi artifacts however they want. Go to any bundle repository: none of the bundles are verified and, frankly, I don’t know how they ever will be. Of course, we can go all fancy and do API checks as Eclipse[1] does (doing a pretty good job so far, by the way) or use the approach introduced by Premek Brada[2] at the last OSGi DevCon in Zurich. There is even the possibility of rerunning all the former unit tests of a bundle against the latest version that is supposed to be compatible, to check for semantic inconsistencies. But who is doing this, and who guarantees that it covers all edge cases? No one can! Well, at least not yet. Maybe in some years, with fancy self-learning AI on quantum computers, but for now we have to stick with what we’ve got.

I understand, my post has been quite dark and negative up to now: nothing will work, we will ultimately fail, software reuse is an epic fail, blah blah blah. It’s true, we are not going to be 100% perfect, but maybe we don’t need to be. Maybe we should just take what we’ve got and try to make the best of it. Evolve slowly but consistently and fail gracefully…

For one, testing is inevitable. Take, for instance, the small and innocent build number that indicates just a bugfix. Now, what is a bugfix? It can be a performance improvement; OK, that’s fine. It can also be the correction of a miscalculation, or returning true instead of false in some odd case. If you developed against the old behavior and expect it, having it fixed in a release that only increases the build number is certainly not a compatible change for you. On the other hand, bumping the major number because the fix does in fact affect the client is a pretty heavy step, rendering all users incompatible unless they raise their version range. The only way out, in my opinion, is to write test code that checks whether the behavior is still there, and to implement ways in your code to handle the fact that at some point it will be different. This is painful, but there will always be that one edge case screwing with you.

Another thing we can do is rethink the version scheme itself. For one, does a bump in the minor or major version actually affect the implementer of the API? Well, not necessarily, not in all cases. So why do we restart the lower segments from 0 every time a higher one gets increased? Aren’t the segments distinct from each other? Well, maybe they have to be increased by one, but why reset them to zero?

2.4.2 becomes 3.5.3 or 3.4.2 instead of 3.0.0.
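
The two alternatives in the example above can be sketched as follows. This is my reading of the idea, not an established scheme: bumpKeep leaves the lower segments untouched (2.4.2 becomes 3.4.2), while bumpCascade also increases every lower segment by one (2.4.2 becomes 3.5.3). All names are hypothetical.

```java
// Hypothetical sketch: version segments treated as independent counters that never reset.
final class IndependentSegments {

    /** Increases only the segment at the given index (0 = major, 1 = minor, 2 = build). */
    static int[] bumpKeep(int[] current, int index) {
        int[] next = current.clone();
        next[index]++;
        return next;
    }

    /** Increases the segment at the given index and every lower segment by one. */
    static int[] bumpCascade(int[] current, int index) {
        int[] next = current.clone();
        for (int i = index; i < next.length; i++) {
            next[i]++;
        }
        return next;
    }

    /** Formats {major, minor, build} as "major.minor.build". */
    static String format(int[] version) {
        return version[0] + "." + version[1] + "." + version[2];
    }

    public static void main(String[] args) {
        int[] current = { 2, 4, 2 };
        System.out.println("keep:    " + format(bumpKeep(current, 0)));
        System.out.println("cascade: " + format(bumpCascade(current, 0)));
    }
}
```

Either way, each segment stays a monotonically increasing counter of its own, so a lower segment keeps its history across a major release.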

The same holds true for version ranges. Why is the consumer in charge of defining the range? Ultimately, the provider knows which older versions the current library is still compatible with (from a client’s point of view). Having both the client’s and the provider’s version range allows for a very flexible system.

Inverse Versioning
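
One possible way to combine both ranges is sketched below. The matching rule is an assumption of mine, since the idea does not prescribe one: a wire is allowed if either the consumer's range accepts the provider's version, or the provider vouches for the version the consumer was built against. Version, Consumer, Provider and canWire are hypothetical names.

```java
// Hypothetical sketch of two-sided ("inverse") version matching; not an OSGi API.
final class InverseVersioningSketch {

    /** Minimal three-segment version (not the OSGi Version class). */
    static final class Version implements Comparable<Version> {
        final int major;
        final int minor;
        final int build;

        Version(int major, int minor, int build) {
            this.major = major;
            this.minor = minor;
            this.build = build;
        }

        @Override
        public int compareTo(Version other) {
            if (major != other.major) {
                return Integer.compare(major, other.major);
            }
            if (minor != other.minor) {
                return Integer.compare(minor, other.minor);
            }
            return Integer.compare(build, other.build);
        }
    }

    /** Consumer view: the version it was built against and the range [low, high) it accepts. */
    static final class Consumer {
        final Version builtAgainst;
        final Version low;
        final Version high;

        Consumer(Version builtAgainst, Version low, Version high) {
            this.builtAgainst = builtAgainst;
            this.low = low;
            this.high = high;
        }

        boolean accepts(Version provider) {
            return provider.compareTo(low) >= 0 && provider.compareTo(high) < 0;
        }
    }

    /** Provider view: its own version and the oldest version whose clients it still supports. */
    static final class Provider {
        final Version version;
        final Version compatibleSince;

        Provider(Version version, Version compatibleSince) {
            this.version = version;
            this.compatibleSince = compatibleSince;
        }

        boolean supports(Version clientBuiltAgainst) {
            return clientBuiltAgainst.compareTo(compatibleSince) >= 0
                && clientBuiltAgainst.compareTo(version) <= 0;
        }
    }

    /** Assumed rule: either side may vouch for the combination. */
    static boolean canWire(Consumer consumer, Provider provider) {
        return consumer.accepts(provider.version) || provider.supports(consumer.builtAgainst);
    }
}
```

With this rule, a client that conservatively declared [2.4.0,3.0.0) can still be wired to a 3.1.0 provider, as long as that provider states it remains compatible with clients built against 2.x.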


Last but not least: if the version segments get increased independently of each other, as explained before, should version ranges incorporate this as well, allowing ranges on each version segment and not only on the version as a whole?

Inverse versioning with distinct version segments
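
As a rough illustration of ranges on individual segments, the sketch below gives each segment its own inclusive interval and accepts a version only if every segment falls into its interval. Interval and matches are hypothetical names, and the choice of inclusive bounds is my assumption.

```java
// Hypothetical sketch: one inclusive interval per version segment instead of one range overall.
final class SegmentRangeSketch {

    /** Inclusive interval for a single version segment. */
    static final class Interval {
        final int low;
        final int high;

        Interval(int low, int high) {
            this.low = low;
            this.high = high;
        }

        boolean contains(int value) {
            return value >= low && value <= high;
        }
    }

    /** True if each segment of the version lies within the interval declared for it. */
    static boolean matches(Interval[] perSegment, int[] version) {
        for (int i = 0; i < perSegment.length; i++) {
            if (!perSegment[i].contains(version[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Interval[] range = {
            new Interval(2, 3),                 // major 2 or 3
            new Interval(4, Integer.MAX_VALUE), // minor at least 4
            new Interval(0, Integer.MAX_VALUE)  // any build
        };
        System.out.println("3.4.2 matches: " + matches(range, new int[] { 3, 4, 2 }));
        System.out.println("3.0.0 matches: " + matches(range, new int[] { 3, 0, 0 }));
    }
}
```

This only makes sense if segments are not reset: a requirement like "minor at least 4" would reject 3.0.0 under the conventional scheme, even when the needed feature is still there.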


I know, I am describing a maintenance nightmare. How could one stay sane with so many variants? Short answer: humans can’t! Fortunately, there are tools and best practices that can help. With tooling you can get the versions you depend upon right. This is already done. Approaches like running old unit tests that specifically target the API level against the new API can verify that (most of) the behavior didn’t change. It is not easy, no doubt, but it is a chance we should take, at least if we want to get to a point where we can actually reuse 3rd party components.

Truth be told, most of the things I said are just ideas I picked up over the last few years, and there is currently a lot of discussion going on.

If you want to get the full picture, I urge you to read through all of those sources, because ultimately the worst thing we can do is repeat the mistakes of our past!

Cheers,
Mirko

References:

[1] PDE API Tools: http://www.eclipse.org/pde/pde-api-tools/
[2] Safe Bundle Updates: http://www.slideshare.net/pbrada/safe-bundle-updates

Update[1]: I added the links Alex mentioned. I certainly read his blog, which eventually influenced me as well. Unfortunately, none of these concepts come from me personally; I can’t take any credit for them. Sorry for not giving you the credit you deserved in the first place, Alex. Btw., the order is arbitrary, coming off the top of my head. No semantics are applied whatsoever! I hope I haven’t forgotten someone else as well!


OSGi vs. Jigsaw – Why can’t we TALK?

Before starting, I just want to make clear that I am neither a member of the OSGi Alliance nor a participant in any EG. I just happen to have used OSGi since Eclipse started to investigate it as the componentization model at its core. Since then I have become more and more attached to OSGi and I don’t want to give up any of its features, so I guess you can call me a fanboy if you like. Of course, I have been following the dispute between OSGi and Jigsaw since project Jigsaw was announced, and I have to admit that I was and am not happy with the approach Sun took, not using a JSR while hoping to establish a modularization standard beyond the JVM. I already expressed my feelings in an older blog post called [Componentization Wars Part II]. Anyway, I guess there are some things I can’t change, so I’ll try my best to at least help the Jigsaw community benefit from my experiences with OSGi, and I hope the input will help to create a better system (one we can all benefit from). In the end, only the quality of the technology should count, and we should all work together to progress. So please consider the rest of this post my humble contribution. Take it or leave it, it’s just an offer.

I thought a lot about the resistance of parts of the Java community to reusing the OSGi standard and starting from what we already have, and my conclusion is that the problem is twofold. First, OSGi initially came from a source outside the JSR process. Well, I know this is debatable, because OSGi was one of the first JSRs and, with JSR 291, it now in fact is one, but it is not “developed” within the JSR process, so one can say this is a valid point. Second, modularization is in fact a tough call. Looking around, there is no other language or standard (even beyond Java itself) that has entirely tackled the problem, and boy, quite some time has passed since David Parnas introduced the concept of modularity in 1972. “Fan” or not, one has to admit that OSGi has already made a name for itself in terms of modularizing the JVM, and it took over 10 years to get that far. So I think it is fair to say that modularization is not easy to accomplish and to get right. In fact, I had my problems with OSGi as well (and partially still have). Let me explain…

My first contact with OSGi

When I first encountered OSGi I was working at IBM Research, trying to explore ways to improve the modularization approach of [UIMA]. UIMA itself already had the notion of components, or modules if you prefer, but it left certain features open for improvement. In particular, the isolation of each module wasn’t enforced, so introducing 3rd party modules into the runtime was risky business, especially given the fact that these modules had access to basically all resources. You never know what they are using or how they behave in the system. That, for instance, was a reason to move the processing into another Java process: if that one fails, the entire system is not affected. OSGi in this context wasn’t THE savior, but it helped to make the system more robust. No Java-based model can provide real isolation of modules at the JVM level; isolation without the need to create a separate JVM instance is still entirely missing in Java (one can ALWAYS provoke an OutOfMemoryError if he or she intends to). The great benefit of OSGi was simple: hiding internal APIs, making dependencies explicit, and potentially creating a repository of reusable artifacts that are usable without pages of manuals describing how to set up a system suitable for these modules.

How OSGi is received (on first encounter)

As you may imagine, one of the most important goals was to hide OSGi as much as possible. No services, no Import-Package (but Require-Bundle on one aggregation bundle); basically, reduce OSGi to its minimum. Well, the appreciation was… limited. All the people I was working with were exceptionally bright and determined researchers in MLP. They developed highly sophisticated algorithms for analyzing unstructured data, but only a few were software engineers. So the code was… working, but not production ready. Forcing them to apply rules that ultimately made it harder for them to get things done, and as a result slowed them down, wasn’t something they welcomed very much. This is something I can totally understand and relate to! What is a system worth if you don’t gain anything? Well, the problem here is that they didn’t gain anything because they just used their own code and, to a limited extent, the code of others, so the overhead of integrating with others was rather limited. The real benefit becomes visible when you start reusing multiple artifacts, potentially in different versions. This is something we (as a community) have been trying to achieve for decades but have failed at so far, if you ask me.

Some reflections about our history

Looking at the history of Java, this feels like a common problem. One tends to address the immediate problems. In principle there is nothing wrong with that, but often, after introducing a new standard, you realize that the actual problem is way more complicated, or that you haven’t anticipated all vital use cases, and the trivial approach that looked so attractive at the beginning won’t work in the long run. The problem then is that you don’t want to break backwards compatibility, and eventually this will drive the design of your solution, ultimately ruling out many otherwise possible ways to go. Talking about history: when we look back, jars were introduced as a distribution format for applets, surely solving the immediate problem. After a while, Java on the server emerged and yet another problem arose: the separation of different web applications. Well, we know how it ended. Eventually we got a different class loading solution from each J2EE vendor, because this part of the specification was left out.

My point is…

Now we have Jigsaw addressing a subset of OSGi’s modularization capabilities, which by itself is perfectly fine (even if it is not OSGi that they are going to use). My fear is only that there will soon be more requirements that render the current approach unfit. In particular, the great benefit of modules should be the ability to reuse them without the need to understand all their internals. It was stated that OSGi’s restrictive class loading approach is not suitable for many applications. I tend to agree. I experienced this pain and it wasn’t pleasant. I heard that not allowing split packages is not acceptable, because they are a necessity. Why? Is it because they are good design, or just because current systems are using them? If you want to create modules, you want to make them robust, so apart from the exposed (public) API, it shouldn’t matter how they are implemented. If modules are not forced to have their own class loader, you can never be sure whether or not you’ll have collisions. Yes, it is a pain in the **** to rethink and rewrite existing code! And I also have better things to do than to retrofit working libraries, no doubt! The question we should ask ourselves, however, is: what do we really want? A system that integrates seamlessly with existing code, or a new way of thinking that might not be as simple as hoped but gives us what we need in the long run? Do we want reusable entities, or just a better provisioning system than plain jars? Don’t get me wrong here, I am not saying OSGi is the solution or should be used at all! I am just saying that the requirements dictate what you should get, and if you intend to create “real” modules, the lessons OSGi learned are most valuable, even if you eventually take a completely different approach. In fact, OSGi has its shortcomings and flaws like any other system.

OSGi is not perfect either

Working with OSGi for a while, I came across several places where I noticed the need for improvement. For instance, OSGi provides a pretty strict module system, which is well defined once it is running. Unfortunately, there is a problem with how to get to this state, or even with knowing when this state is reached. The idea behind its model is that, when using services, there should not be any dependency on the start order, because everything can change at any time. This is a nice idea, but in the real world it is impossible to achieve. For instance, I am currently working with embedded devices, and due to the limited hardware the start-up can take minutes. Because of the dynamics, however, a user might already see a web interface popping up that seems perfectly usable. But when OSGi’s ConfigAdmin service is used to gather all configurations, it can happen that bundles that have not yet started don’t provide the required information, so the resulting configuration is incomplete. Similar things apply to the start order when using security (no, start levels are not sufficient). It is just not standardized how to ensure that certain bundles are loaded before everything else in order to start a particular runtime with security on. OSGi limits itself to defining the runtime behavior, leaving out configuration issues that arise when moving from one container to another. Basically, the configuration of the start levels of each bundle getting loaded is an implementation detail. Doesn’t this look familiar to you from using JEE ;-) ? Also, tools to inspect the runtime behavior are hard to provide: how the container resolves dependencies is not sufficiently defined, so you have to jump through a lot of hoops or use details of a particular implementation. That is not really helpful or desirable from a tooling provider’s perspective. I could go on and on, as with every other technology. “Work in progress” is written all over every one of them.
So, you see, there is so much that can still be done to improve our situation. No one is perfect or ever will be. We can just try to do our best to progress and learn from our faults.

Properties of a true module system

Well, after talking so much about what’s not right and what can be improved, I guess it is only fair to also draw a rough picture of a system providing the features of true (meaning reusable) modules. To help me out a little, I took the liberty of quoting one of the experts on componentization/modularization (which are not quite the same thing). Clemens Szyperski is currently working for Microsoft (I think) and so has no real relation to Java at all, but his observations and remarks hold true even in our space.

„A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties.“ [Szyperski et. al. 2002, p.41]

… and a briefer definition of the required properties.

„The characteristic properties of a component are that it:
• is a unit of independent deployment;
• is a unit of third-party composition;
• has no (externally) observable state.“ [Szyperski et. al. 2002, p.36]

I think this sets the stage for further investigation. In particular, the citations above are nice, but pretty vague and open to interpretation. For instance, what is “independent” when we’re talking about deployment? What does composition mean in this context? Is it just a descriptor, is it a zip or rpm file containing everything? I don’t know, and I think it doesn’t actually matter. What matters is that the resulting system is compelling, concise and valid within itself. Of course, some features seem better than others, but we also need freedom to explore new ways of thinking.

So, thinking about the core features of a (potentially) new module system, I identify the following as the most important:

  • Isolation: When talking about isolation, I am basically talking about trust. A module system that defines isolation at the module level (which currently can only be achieved with a custom class loader) gives me, as the module designer, the trust that I can create something without the fear of breaking something on the user’s side just because of an implementation detail that is not part of the public API. With isolation at the module level, I know what I will have to deal with. As they say: good fences make good neighbors.
  • Information hiding: Yes, this is as old as it gets in software engineering, but it is the most critical part in so many ways. First of all, I understand modules as a means of abstraction. Zoom out of the implementation details and just focus on the API. That’s what I want! Black boxes, with only interfaces (or maybe factories) visible. Of course, API contracts defined purely in Java interfaces are not precise enough to provide all required information (like ranges, value lists, limits, …), but they are a starting point. The right documentation should, or for now has to, do the rest.
  • Enabling reuse: Currently the silver bullet for decoupling is dependency injection. Although it is a great way of doing so, it is not enforced. Anyone can just wire up classes programmatically. As a result you get bound to an implementation, which again makes reuse and updates way harder. If I know I have to use a certain API, I usually don’t care who is providing the implementation, and that’s key for every robust system.
  • Predictability: Well, even in JEE we usually assume we know how an application will behave once deployed, but in reality this is just not true. Resolution is based on the class path, which, depending on where you deploy your application, can contain basically any classes and libraries. Now you have many hard-to-manage factors affecting what gets loaded when. For instance, there can be multiple logging frameworks present in different versions, interfering with the one provided by your application. Depending on when they are found on the class path, they might cause problems or not. A deterministic system that declaratively defines its dependencies will only see the required ones. Everything else is hidden and serves other applications if they need it. No interference with each other!
  • Flexible, yet safe binding: This is increasingly important and also not accomplished satisfyingly by any module system I know of (so far). Basically, what one wants is to create an application based on the known dependencies and to be able to fix problems that appear later without the need to change and redeploy the whole application. Take, for instance, a security vulnerability detected in one module. The latest version should be deployable and should be able to tell the runtime that it is a fixed version of some of the existing ones within the runtime, so that those versions can be replaced.
  • Robustness: Currently no in-JVM approach can guarantee any runtime behavior like the consumed memory or the allocated CPU cycles. If you get a malicious module, it can bring down the entire JVM. There are already research projects out there providing such features, so in theory it should be possible to achieve this at the JVM level.
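
To illustrate the “flexible, yet safe binding” property from the list above, here is a minimal sketch of a runtime replacing vulnerable versions with a declared fix. No module system I know of works exactly like this; Module, fixesFrom and apply are hypothetical names, and the rule that a patch covers the versions from fixesFrom up to (but excluding) its own version is my assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a patched module announces which installed versions it replaces.
final class SafeRebindingSketch {

    static final class Module {
        final String name;
        final int[] version;   // {major, minor, build}
        final int[] fixesFrom; // oldest version this module declares itself a fix for, or null

        Module(String name, int[] version, int[] fixesFrom) {
            this.name = name;
            this.version = version;
            this.fixesFrom = fixesFrom;
        }
    }

    /** Compares two versions segment by segment, most significant first. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** Returns the installed modules with every version covered by the patch replaced by it. */
    static List<Module> apply(List<Module> installed, Module patch) {
        List<Module> result = new ArrayList<Module>();
        boolean patchAdded = false;
        for (Module module : installed) {
            boolean covered = patch.fixesFrom != null
                && module.name.equals(patch.name)
                && compare(module.version, patch.fixesFrom) >= 0
                && compare(module.version, patch.version) < 0;
            if (!covered) {
                result.add(module);
            } else if (!patchAdded) {
                // The first covered module is swapped for the patch, further ones are dropped.
                result.add(patch);
                patchAdded = true;
            }
        }
        return result;
    }
}
```

The point of the design is that the knowledge sits with the provider of the fix: consumers do not have to widen their ranges or be redeployed for the replacement to happen.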

Of course, there is more, but I think the list contains the most important parts. You might have noticed that there is no mention of OSGi or the way OSGi does it. I believe there is always more than one way to accomplish your goals, so maybe the OSGi community overlooked a possibility; I honestly don’t know. So if you’re able to come up with any solution that fulfills these requirements, I would be more than happy. Maybe, and only maybe, you can consider looking at some of the approaches OSGi took to tackle parts of these problems.

Yours,
Mirko

References:

[UIMA]: http://incubator.apache.org/uima/
[Componentization Wars Part II]: http://osgi.mjahn.net/2008/12/04/componentization-wars-part-ii-guerrilla-tactics/
[dmeuima]: http://www.alphaworks.ibm.com/tech/dmeuima/
[Szyperski et. al. 2002]: Szyperski, Clemens ; Gruntz, Dominik ; Murer, Stephan: Component Software. Beyond Object-Oriented Programming. Addison-Wesley Professional, 2002 – ISBN 0201745720


The Quest for Software Reuse

Inspired by Peter’s comment, I am following up on my last post, “The Myth about Software Reuse”. I received quite a lot of feedback on the topic, and I felt I should share some of my thoughts and visions in order to answer some of the questions and dispel the concerns I may have provoked.

In my last post I concluded that OSGi has all the features necessary to create great and reusable artifacts, or bundles if you prefer. As Richard pointed out, it is THE technology if you’re trying to build modular applications and, if done right, even reusable modules. The only flaw here is the “if done right” part. People make mistakes; I do it all the time and I bet you do too. It’s in our nature, we can’t help it. Unfortunately, when talking about mistakes in the context of reusable artifacts, the implications can be disastrous. If you rely on the correctness of an incorrectly versioned artifact, it will most certainly break some functionality or even the whole application. OSGi by itself neither enforces any constraints on versioning your bundles nor gives you a detailed guideline for doing so (neither does Java). This is the reason why the folks from LinkedIn chose to rely on only one distinct version in their dependencies, the safest call for sure. If there are no rules, you can’t check for them or predict the behavior of future bundles. Everything is custom made. This vacuum of control renders bundles unpredictable in their versioning behavior and almost impossible to use in the forward-compatible way usually applied in Software Product Lines, for instance.

If you will, one can say that the lack of control is the root of our problem: enforce control and everything falls into place, right? Unfortunately, control is a double-edged sword. On the one hand, you have a controlled environment where you know exactly what is going on and what to expect; on the other hand, it limits your possibilities and hinders exploring new ways of thinking. Especially if you feel you don’t have enough information or knowledge about the problem domain, this is the ultimate killer for progress, not exactly what we desire. Picking up Peter’s comment on my last blog post, this pretty much holds true for versioning policies in OSGi so far: we just don’t know enough yet. However, I gave it some thought and I think there is a way around this problem. Hold on a bit and I’ll explain what I am thinking.

The core of the problem, from my point of view, is the way we receive our dependencies. A good OSGi citizen should use packages to express dependencies, of course, but that’s just one part of the story. The other part is the bundles contributing these dependencies. As mentioned in my previous post, there are multiple repositories for 3rd party bundles one can use (like [spring-repo], [orbit-repo] or [osgi-repo]). The problem, however, is that you have no guarantees about what you’re getting from there. Of course you get the source you require (hopefully), but not necessarily the correct meta data you are looking for or, even worse, rely on (see the bundlor bug report, for instance). The core problem here is specifying versions, and version ranges in particular. There are no fixed rules and, as Peter stated in a comment on my previous post, it is a field that needs more exploration; I couldn’t agree more. However, I think there is a way to satisfy the need for room for further exploration as well as the need for more control (the double-edged sword I was talking about earlier). Let me elaborate on this a little bit more…

In my opinion, all we actually need is a repository we can trust. Trust in the sense that we know for certain that the artifacts provided follow certain rules. The rules, however, shouldn’t be hard coded or hard wired, so that they can evolve and provide extra information as our understanding of the topic evolves. Another important feature (for me at least) is the “no lock-in” option. I don’t want to lock myself into some vendor-specific rules if I don’t have to or don’t agree with them. It would be nice if certain vendors provided me with some of their artifacts, but ultimately I want to be in control of what is going into my application and how.

Now, I think all this (and even more) can be accomplished with the right repository design. The OSGi Alliance is currently working on RFP 122 for repositories and, as far as I can tell, this would be a great opportunity to consider the following additions.

Imagine that, while uploading artifacts to the repository, one can also provide additional meta data and go through a verification process in which certain features are tested. For instance, assuming a base version is already present, the provider can check what actually changed between the last and the current version. Assuming certain rules are deployed to check for API changes, the person uploading the artifact can be guided through a process in which the correct version information is assigned. This goes so far that not only the exported packages can be checked but also the version ranges of the imports, because all artifacts known to the repository go through the same process (assuming proper baselining, of course). So what could these checks be?

  • Check for the minimal version to apply to an exported package (ensuring, for instance, that API breaks result in a major version increase). Of course, semantic changes can’t be picked up this way, but here human interaction comes into play.
  • Check for the smallest possible matching version of a package import known to the repository, to ensure maximal compatibility. Again, human interaction or API test cases can assist with semantic incompatibilities.
  • Multiple exporters of the same package can be identified and, if appropriate, an optional property like the provider, purpose, etc. can be added to make a provider selection possible.
  • Even errors like missing import statements can be detected here.
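
The first check in the list above could be sketched as a baseline comparison. The sketch assumes the repository can extract the public API of a package as a set of signature strings (how that is done is left open here), and it takes the client's point of view only: for an implementer, an added interface method is already a breaking change. BaselineCheckSketch and suggest are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: derive the minimal next version of an exported package from its baseline.
final class BaselineCheckSketch {

    /** Suggests the smallest version allowed for the new API, given the baseline version and API. */
    static String suggest(int[] baseline, Set<String> oldApi, Set<String> newApi) {
        boolean somethingRemoved = !newApi.containsAll(oldApi);
        boolean somethingAdded = !oldApi.containsAll(newApi);
        if (somethingRemoved) {
            // Existing clients may break: major increase.
            return (baseline[0] + 1) + ".0.0";
        }
        if (somethingAdded) {
            // Existing clients keep working, new ones may need the addition: minor increase.
            return baseline[0] + "." + (baseline[1] + 1) + ".0";
        }
        // Same API surface: at most a bugfix. Semantic changes need a human or API tests.
        return baseline[0] + "." + baseline[1] + "." + (baseline[2] + 1);
    }

    public static void main(String[] args) {
        Set<String> oldApi = new HashSet<String>(Arrays.asList("lookup(String)", "size()"));
        Set<String> newApi = new HashSet<String>(Arrays.asList("lookup(String)", "size()", "clear()"));
        System.out.println(suggest(new int[] { 1, 2, 3 }, oldApi, newApi));
    }
}
```

The result is a lower bound, not a verdict: the uploader can always choose a higher version, for instance when a behavioral change is known that the signatures do not reveal.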

Now, after checking for these and potentially other things, the bundle can be altered to contain the defined meta data. It can even be signed to express its validity, that is, its compliance with these “rules”. The resulting bundle can now be downloaded or stored on the server for further use.

Of course, this brings some more problems. First of all, not everyone wants to have their components uploaded to some server. In that case, the information on how to alter the bundle can be used as transformation guidelines while the actual artifacts remain on another server (to protect IP, for instance). The repository is, so to speak, just a proxy. On request, it takes the bundle, alters it and provides it to the requester (given the correct access rights). Now, of course, not every “jar” is allowed to be altered. We need some sort of proof that the uploader/provider is the author or has the right to do so. I can think of many ways to achieve this, like verifying domain ownership or manual approval processes, but this will not be the topic of this post.

Another very important problem is hosting. One might not be able to use an open, freely available repository because the bundles in question are commercial with protected IP. In that case, an instance of this very repository must be available for local installation, so it can be used within companies as well. Of course, chaining of those repositories must be possible as well. This brings me to the next point.

Rules valid for the whole world might not hold true for a certain company. Even more importantly, as the knowledge about how to handle these reusable artifacts evolves, finer and more advanced checks may become necessary, or other languages may need to be supported. The verification process must therefore be pluggable and updatable to one’s (evolving) needs. With this, we don’t have to buy into a solution that has to be correct forever. We can evolve as we go. Because the rules on how to alter the original bundle are stored, they can be changed, enhanced or removed at any later time if necessary. Of course, this can potentially cause other problems, but at least it would be possible.
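
A pluggable verification process of this kind might look roughly like the following sketch. Everything in it is an assumption for illustration: the Rule interface, the Repository class and the deliberately crude example rule (it only looks for the text "version=" in the Import-Package header) are not taken from RFP 122 or any existing repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a repository whose verification rules can be added and removed.
final class RulePipelineSketch {

    /** A verification rule inspects bundle meta data and reports the problems it finds. */
    interface Rule {
        List<String> check(Map<String, String> manifest);
    }

    /** Holds the currently deployed rules and runs all of them against a manifest. */
    static final class Repository {
        private final List<Rule> rules = new ArrayList<Rule>();

        void addRule(Rule rule) {
            rules.add(rule);
        }

        void removeRule(Rule rule) {
            rules.remove(rule);
        }

        List<String> verify(Map<String, String> manifest) {
            List<String> problems = new ArrayList<String>();
            for (Rule rule : rules) {
                problems.addAll(rule.check(manifest));
            }
            return problems;
        }
    }

    /** Example rule (crude on purpose): an Import-Package header must mention a version. */
    static final Rule IMPORTS_NEED_VERSION = new Rule() {
        @Override
        public List<String> check(Map<String, String> manifest) {
            List<String> problems = new ArrayList<String>();
            String imports = manifest.get("Import-Package");
            if (imports != null && !imports.contains("version=")) {
                problems.add("Import-Package without version range: " + imports);
            }
            return problems;
        }
    };
}
```

Because the rules are plain objects held by the repository instance, a company-internal instance can deploy a stricter or looser set than the central one without changing the verification code itself.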

With this flexibility, one of course needs to know for certain what one will receive when requesting an artifact. In fact, one might even want to have only certain rules applied, or have a special set of rules available only in “beta” mode. This should also be possible through a distinct request API.
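
To make the idea of pluggable rules and a request API a bit more tangible, here is a minimal sketch in plain Java. Everything in it is hypothetical: the class `RuleRepositorySketch`, the rule names and the `X-Repo-Verified` header are made up for illustration and do not belong to any existing repository. It only assumes that a rule can be modelled as a function from manifest headers to altered manifest headers, and that the requester names the rules to be applied.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch: stored, replaceable rules applied on request.
class RuleRepositorySketch {

    // Rules keyed by name; each rule maps manifest headers to altered headers.
    private final Map<String, UnaryOperator<Map<String, String>>> rules = new LinkedHashMap<>();

    // Rules can be added, replaced or removed at any later time.
    void addRule(String name, UnaryOperator<Map<String, String>> rule) {
        rules.put(name, rule);
    }

    void removeRule(String name) {
        rules.remove(name);
    }

    // Applies only the requested rules, in registration order, to a copy of the headers.
    Map<String, String> request(Map<String, String> originalHeaders, Set<String> requestedRules) {
        Map<String, String> headers = new LinkedHashMap<>(originalHeaders);
        for (Map.Entry<String, UnaryOperator<Map<String, String>>> entry : rules.entrySet()) {
            if (requestedRules.contains(entry.getKey())) {
                headers = entry.getValue().apply(headers);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        RuleRepositorySketch repo = new RuleRepositorySketch();
        // "X-Repo-Verified" and "X-Beta" are made-up headers, used only for this demo.
        repo.addRule("mark-verified", h -> { h.put("X-Repo-Verified", "true"); return h; });
        repo.addRule("beta-checks", h -> { h.put("X-Beta", "true"); return h; });

        Map<String, String> original = new LinkedHashMap<>();
        original.put("Bundle-SymbolicName", "dictionary.api");
        original.put("Bundle-Version", "1.2.0");

        // The requester asks for the default rule only, so the beta rule is skipped.
        System.out.println(repo.request(original, Collections.singleton("mark-verified")));
    }
}
```

Because the original headers are copied before any rule runs, the stored artifact stays untouched and a different set of rules can be requested for it later.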

With the ability to change bundles on the fly, it is also possible to reuse existing infrastructures like Maven repositories, OBR or p2. A Maven repository, for instance, could theoretically provide the metadata necessary to create the correct bundles by shipping a rule-set in a distinct file. With something like this, a Maven repo can be used as a data source for the bundle repository. Pretty much the same holds true for any other repository I can think of.

The beauty of such a repository is that no one is forced to go with the mainstream. Everyone can override the default behavior for their own bundles in their own repository instances and, for instance, limit the versions chosen to exactly one instead of a range. The central repository, however, enforces certain rules, so everyone can trust the output and alter it as needed. Even the decision whether a bundle may be altered or may only be re-wrapped in a new bundle can be expressed as a rule defined by the bundle provider. You basically get all the freedom to do what you want locally and can rely on common rules from the central repo.

There is even plenty of space for service providers to make money by offering their own repositories with enterprise support. Porting the latest OSS libraries to the repo, ensuring test coverage of the released bundles, or running advanced checks to detect semantic changes are just a few possible enterprise features.

However, I have only scratched the surface here. There are so many more things I could add, but I think you got the basic idea. The remaining question now is: Are we ready for something like this? Is anyone interested in such a repository? Speaking for myself, I have been looking for something similar for quite a while, and whenever I talked to someone about this, they agreed that it even has a business case worth spending money on. Don’t get me wrong. I don’t think this is the silver bullet (there is no such thing), but I believe it can be the basis to propel real software reuse and form a coalition between vendors and open source: a common standard with a tool-set capable of pushing us further.

Currently I am thinking about proposing a talk for the upcoming OSGi DevCon in Zurich and was wondering if someone would be interested in this topic as a talk, a BOF or even just a bunch of people getting together over a beer. My company and I are currently at a point where we need something like this, and I would very much like to share my ideas and get some other views and experiences on this one. Let me know what you think!

Cheers,
Mirko

References (in chronological order):

[last post]: http://osgi.mjahn.net/2009/04/02/the-myth-of-software-reuse/
[LinkedIn]: http://blog.linkedin.com/2009/02/17/osgi-at-linkedin-bundle-repositories/
[spring-repo]: http://www.springsource.com/repository/app/
[orbit-repo]: http://www.eclipse.org/orbit/
[osgi-repo]: http://www.osgi.org/Repository/HomePage/
[bundlor bug report]: https://issuetracker.springsource.com/browse/BNDLR-196
[RFP 122]: http://www.tensegrity.hellblazer.com/2009/03/osgi-rfp-122—the-osgi-bundle-repository.html
[OSGi DevCon]: http://www.osgi.org/DevConEurope2009/HomePage


The myth of software reuse

Are we fooling ourselves? Or is it real?

Those of you who know me know that I believe in software reuse and have been trying to evangelize it for quite some time. Methods, Classes, Aspects, Components, Modules, Software Product Lines, you name it: everything is focused on reuse to some extent. With OSGi we are, for the first time, able to create truly reusable software artifacts and theoretically mix and match them on an as-needed basis. Great new opportunities are coming, and with the recently held OSGi Tool Summit chances are good that we’ll soon see more and better integrated tooling support in our daily work with OSGi. So, we all should be happy bees, right?

Well, what we are producing right now is a massive amount of bundles, no question about this. Virtually everyone is now shipping with at least some basic OSGi headers. Several projects are trying to provide bundle repositories for the not (yet) converted projects (like [spring-repo], [orbit-repo], [osgi-repo]) and some go even further and try to wrap incompatible APIs in a compatibility layer (see [DynamicJava] for instance). Great work and very much appreciated, but this is not the point. What we have so far are “just” explicit software artifacts. We only know what is behind bundle X in version Y, a subjectively taken SNAPSHOT somewhere in the infinity of time. This doesn’t answer the question of how this code can and will evolve or how future versions will fit into my universe.

The problem domain

Too abstract? Let’s examine an arbitrary example. Assume you are trying to create a web application similar to flickr, but on some sort of embedded device to use at home. Taking state-of-the-art tools, you might choose to develop your app with Spring DM or any other framework you like, take your pick. For now, we stick to this one. As a starter we use the following assets: Spring DM 1.1.3, Spring 2.5.6 and Maven 2 as the build system. So far so good. Once you start developing, you realize that you also need a file upload feature for your pictures. Of course someone has already developed something to upload files and we want to reuse it, so you look around. Soon you’ll find commons-fileupload. The Spring guys are putting a lot of work into building up a repository of reusable OSGi bundles, and we can just use the OSGi-ified version from there: [spring comFileUpload], which is really great! But then you soon realize that you can’t use that version. Take a look at its (simplified) header:

Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Apache Commons File Upload
Bundle-Version: 1.2.0
Bundle-SymbolicName: com.springsource.org.apache.commons.fileupload
Bundle-Vendor: SpringSource
Export-Package: org.apache.commons.fileupload;version="1.2.0";
  uses:="javax.servlet.http",
 org.apache.commons.fileupload.disk;version="1.2.0";
  uses:="org.apache.commons.fileupload",
 org.apache.commons.fileupload.portlet;version="1.2.0";
  uses:="javax.portlet,org.apache.commons.fileupload",
 org.apache.commons.fileupload.servlet;version="1.2.0";
  uses:="javax.servlet,javax.servlet.http,org.apache.commons.fileupload",
 org.apache.commons.fileupload.util;version="1.2.0"
Import-Package: javax.servlet;version="[2.5.0, 3.0.0)",
 javax.servlet.http;version="[2.5.0, 3.0.0)",
 javax.portlet;version="[1.0.0, 2.0.0)";resolution:=optional,
 org.apache.commons.io;version="[1.4.0, 2.0.0)",
 org.apache.commons.io.output;version="[1.4.0, 2.0.0)"

As you can see, it requires the Servlet API in version 2.5, which in our case is not available on the embedded device. What now? Well, if you know the bundle, you also know that in fact it doesn’t require 2.5! You can re-bundle it and give the bundle another version, and here is the problem. The Spring guys did great work bundling this artifact for us, but they created it based on THEIR environment and not on the minimal requirements of that specific bundle. Even worse, there is not even a real specification stating what version to take and how to version changes correctly. Forward compatibility is a beautiful dream rather than an actual fact. If you look at well-known companies adopting OSGi, like LinkedIn, you’ll see what I mean. Where are we heading when one is forced to define dependencies like this:

Import-Package:
 com.linkedin.colorado.helloworld.api;version="[1.0.0,1.0.1)",
 com.linkedin.colorado.helloworld.client;version="[1.0.0,1.0.1)",
 org.osgi.framework;version="[1.4.0,1.4.1)"
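
To see how little such a range admits, here is a minimal sketch of the interval check behind a version range, written in plain Java without any OSGi classes. The class and method names are made up for this post, and the sketch deliberately ignores the qualifier segment, so it is a simplification of the real comparison rules, not a replacement for the framework's own implementation.

```java
// Simplified sketch of an OSGi-style version range check (qualifiers are ignored).
class VersionRangeSketch {

    // Parses "major.minor.micro" into three ints; missing segments default to 0.
    static int[] parseVersion(String text) {
        String[] parts = text.trim().split("\\.");
        int[] version = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            version[i] = Integer.parseInt(parts[i]);
        }
        return version;
    }

    // Compares two versions segment by segment, most significant first.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    // Returns true if the version lies inside a range such as "[1.0.0,2.0.0)".
    // '[' and ']' mean inclusive bounds, '(' and ')' mean exclusive bounds.
    static boolean includes(String range, String version) {
        String r = range.trim();
        boolean lowInclusive = r.charAt(0) == '[';
        boolean highInclusive = r.charAt(r.length() - 1) == ']';
        String[] bounds = r.substring(1, r.length() - 1).split(",");
        int cmpLow = compare(parseVersion(version), parseVersion(bounds[0]));
        int cmpHigh = compare(parseVersion(version), parseVersion(bounds[1]));
        boolean aboveLow = lowInclusive ? cmpLow >= 0 : cmpLow > 0;
        boolean belowHigh = highInclusive ? cmpHigh <= 0 : cmpHigh < 0;
        return aboveLow && belowHigh;
    }

    public static void main(String[] args) {
        System.out.println(includes("[1.4.0,1.4.1)", "1.4.0"));  // true
        System.out.println(includes("[1.4.0,1.4.1)", "1.4.1"));  // false
        System.out.println(includes("[2.5.0, 3.0.0)", "2.4.0")); // false
    }
}
```

Under these assumptions a range like [1.4.0,1.4.1) accepts nothing but 1.4.0 itself, so even the smallest bug-fix release of the framework package falls outside it.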

Doesn’t this feel totally odd and just plain wrong? Shouldn’t we be able to trust the authors or “something else” to take care of compatibility issues? Someone who knows the code, its changes and its backwards compatibility better than we do? I don’t want to change all of my 200 bundles just because we have a new OSGi framework version that, for instance, now provides a new final static property I am not using anyway! I want to express that I don’t care what changed in the provided bundle as long as my code still works as expected, and I want to be sure that the provider has the exact same concept of “compatible code changes”. The Eclipse community already realized this and created an API for baselining components and a version schema to apply versions. Along with several detailed instructions on what to consider a change and how this change should be treated in terms of its version ([eclipse-vers1], [eclipse-vers2], [eclipse-vers3]), they have created the first usable guide on defining version numbers for bundles. However, this is just a small step towards reusable artifacts. We also need to agree on such rules and be able to enforce them. Unfortunately, a lot of questions remain unanswered. For instance, what about semantic changes? How do we treat them and, more importantly, how do we identify them? Code analysis won’t always help here.
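
As a small illustration of what an agreed-upon rule set could look like, here is a sketch that maps a kind of change to a version increment, roughly following the convention of bumping the major segment for breaking changes, the minor segment for compatible API additions and the third segment for bug fixes. The class name and the three categories are my own simplification for this example; the Eclipse documents distinguish many more cases.

```java
// Hypothetical sketch: deriving the next version from the kind of change.
class VersionBumpSketch {

    // Simplified change categories, named here for illustration only.
    enum Change { BREAKING, COMPATIBLE_API_ADDITION, BUG_FIX }

    // Expects a "major.minor.micro" version and returns the incremented version.
    static String bump(String version, Change change) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int micro = Integer.parseInt(parts[2]);
        switch (change) {
            case BREAKING:                // clients may break: new major, lower segments reset
                major++;
                minor = 0;
                micro = 0;
                break;
            case COMPATIBLE_API_ADDITION: // existing clients keep working: new minor
                minor++;
                micro = 0;
                break;
            case BUG_FIX:                 // no API change at all
                micro++;
                break;
        }
        return major + "." + minor + "." + micro;
    }

    public static void main(String[] args) {
        // A new constant in an API package would count as a compatible addition here.
        System.out.println(bump("1.4.0", Change.COMPATIBLE_API_ADDITION)); // 1.5.0
        System.out.println(bump("1.4.0", Change.BUG_FIX));                 // 1.4.1
        System.out.println(bump("1.4.0", Change.BREAKING));                // 2.0.0
    }
}
```

If provider and consumer both followed such a scheme, a consumer could safely import [1.4.0,2.0.0) and would not have to touch its bundles when 1.5.0 arrives. The hard part, as said above, is agreeing on the categories and enforcing them.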

So what do we learn from this? Basically that we can’t trust any bundle provided by a third party just yet. It doesn’t matter how good the provider’s intentions are to deliver the perfect bundle. Right now, we are basically left alone. Every time we update to a new version it is like a game. You never know what you’ll be facing.

Ok, we are screwed. Who to blame?

Now you might ask yourself whose fault it is. It is certainly not the fault of the Spring guys, who provide us with the software in this example, but who is to blame instead? The tools? Most people use either BND, Eclipse or the newly created bundlor from Spring. All of these are pretty dumb. They can’t possibly know which version to take (although they are trying hard to guess). There is no baseline and no knowledge about the domain or infrastructure. Too many questions are unanswered and the tool authors are left alone, so I think the tools are the last ones to blame. OK, so what about the OSGi specification? It is so vaguely written when it comes to versioning your bundles that you can’t possibly draw any universal conclusion about what version to apply. Everyone can have their own interpretation of “compatible” changes, and these interpretations are not compatible with each other. All true, but I don’t think that a simple specification will be enough, nor is OSGi suitable for that. The issue is too big to be solved by only one company or organization all by itself. Sun might be the only one fitting, but after all the problems with JSR 277 and project Jigsaw, I have no confidence in their ability and willingness anymore. To be fair, one has to admit that the Java Language Specification does provide a chapter about binary compatibility, but it is not much of a help, because not all cases are covered and there is of course no notion of a bundle (I would love to rant about not treating packages as first-class citizens in Java, but that’s an entirely different post). Sun also has a sigtest tool to check for API breaks, but with the given functionality it is pretty much worthless for what we need.

What next?

Is it the job of an external organization to define rules everyone has to apply while developing reusable artifacts for a specific language? I don’t think so. I think this should be the job of all of us. Maybe as a joint project, maybe umbrella’d by the JCP, I don’t know, but definitely as an open and community-driven effort. I don’t want to lock myself into any vendor or proprietary standard I might get stuck with. I dream of a central repository (maybe based on RFP 122, maybe something completely different) where I have a one-stop shop for all the 3rd party artifacts I need, while at the same time being able to do in-house development with the same reliable system without having to expose anything to the outside world. Open, reliable and trustworthy software with a healthy community of open source artifacts: does it have to remain a dream or can we make it real? I already have ideas how this can come true, but I would be very much interested to hear if you feel the same or even have concrete projects concerning something similar that I haven’t referenced here. Is there a potential for collaboration? What do you think?

My 2 cents,
Mirko

References (in chronological order):

[OSGi]: http://www.osgi.org/
[OSGi Tool Summit]: http://www.osgi.org/Event/20090327/
[spring-repo]: http://www.springsource.com/repository/app/
[orbit-repo]: http://www.eclipse.org/orbit/
[osgi-repo]: http://www.osgi.org/Repository/HomePage/
[DynamicJava]: http://www.dynamicjava.org/projects/jsr-api/
[Flickr]: http://www.flickr.com/
[Spring DM 1.1.3]: http://www.springsource.org/osgi/
[Spring 2.5.6]: http://www.springsource.org/download/
[Maven 2]: http://maven.apache.org/download.html
[commons-fileupload]: http://commons.apache.org/fileupload/
[spring comFileUpload]: http://www.springsource.com/repository/app/bundle/version/detail?name=com.springsource.org.apache.commons.fileupload&version=1.2.0
[Servlet API]: http://java.sun.com/products/servlet/
[LinkedIn]: http://blog.linkedin.com/2009/02/17/osgi-at-linkedin-bundle-repositories/
[baselining]: http://www.ibm.com/developerworks/opensource/library/os-eclipse-api-tools/index.html
[version schema]: http://wiki.eclipse.org/Version_Numbering
[eclipse-vers1]: http://wiki.eclipse.org/Evolving_Java-based_APIs
[eclipse-vers2]: http://wiki.eclipse.org/Evolving_Java-based_APIs_2
[eclipse-vers3]: http://wiki.eclipse.org/Evolving_Java-based_APIs_3
[BND]: http://www.aqute.biz/Code/Bnd/
[eclipse]: http://www.eclipse.org/
[bundlor]: http://www.springsource.org/bundlor/
[JSR 277]: http://jcp.org/en/jsr/detail?id=277
[jigsaw]: http://osgi.mjahn.net/2008/12/04/componentization-wars-part-ii-guerrilla-tactics/
[Java Language Specification]: http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html
[binary compatibility]: http://java.sun.com/docs/books/jls/third_edition/html/binaryComp.html
[sigtest tool]: https://sigtest.dev.java.net/
[RFP 122]: http://www.tensegrity.hellblazer.com/2009/03/osgi-rfp-122—the-osgi-bundle-repository.html


The Dynamic Import Issue

After some less OSGi-technology-centric posts I am now getting back to the meat and will highlight the problems involved with dynamic imports. In well-designed OSGi applications the “DynamicImport-Package” header should be obsolete. Unfortunately, not all code is OSGi aware. Actually, it is not the missing OSGi awareness that is the core of the problem. In 90% of the cases I saw, it is rather wrong assumptions about the visibility of classes and the misuse of context class loaders. But what does the problem look like in a concrete example?

Well, a common example is JAXB. You have a library/API bundle which by itself provides no real functionality, but needs to be “enhanced”. You need actual implementations to provide functionality, and your library needs access to those classes. OSGi is very strict with its class path definition. It is defined statically and has to be set at build time of your bundle (Import-Package header in the manifest), but your library provider has no idea about the implementation bundles (and it shouldn’t!). As you can see, there is no way of defining such dependencies at build time. In OSGi the only way to enable this without modifying the bundle code is to include a dynamic import header.

Eclipse soon realized that this is an issue. Especially the undirected import forces, in many cases, a binding to bundles that are not really related to the bundle defining the import. This causes issues with restarts and updates of the library bundle, because a lot of bundles, if not all, can be affected by that. Even worse, it can mean that you end up finding the wrong classes (and I am not yet talking about versions). Eclipse, while migrating to OSGi, had a huge amount of legacy code not aware of OSGi concepts, and so they were the first to really feel the pain. That’s why the Eclipse community introduced the concept of buddy class loading. Although it is a better way because it provides a scope (you have to declare that you’re a contributor on the one side and that you’re open for contributions on the other side), it doesn’t solve the problem of different versions or provider selection in general.

Quite a few people assume this is the only way to work around this kind of problem, and they are correct when you don’t have the possibility of adding your own mechanisms or the API prevents other measures. However, if you have access to the source, it is likely that you actually can do something that solves the problem.

While working on my thesis some years ago, I came across this problem, and in some cases you are able to use basic OSGi mechanisms to get what you need. The core idea, and the requirement, is that the library needs the implementation class and offers a way to either inject the object or tweak the class loading mechanism to actually find the class you are looking for. So the question is how to gain access to the implementation object, or its class loader for that matter. For those familiar with OSGi this should be obvious. We instantiate the implementation in our implementation bundle and offer it as a service. We can register that object under the interface(s) these classes implement or the annotation class they are annotated with. This is a universal key to identify the correct service. The nice thing about this is that OSGi will also handle the version problem for free. With Spring or the upcoming RFC 124 we don’t even have to write a single line of code in most cases; we don’t even have to touch the bundle. All we need to provide is a fragment attached to the implementation bundle with a Spring configuration inside that creates a bean and publishes it as a service.

Ok, we now have our implementation objects flying around in the ServiceRegistry. The next step is providing access to those objects in our library bundle. This is the actual tricky part and the point where you’ll most likely face problems. Your API needs to provide a way of injecting these service objects and should allow you to manage them dynamically (add/remove/update). Otherwise you’ll end up with a static solution that, once started, won’t be able to adapt to the dynamic nature of OSGi. If your API provides or uses a class loader lookup that can be tweaked, you’re the lucky one. Here you can provide a delegating class loader that dynamically queries the ServiceRegistry for the fitting object and returns its class loader. This is what we did in our OSGi-ified UIMA implementation. The drawback here is that once you have provided the defining class loader, you lose control over how other classes are resolved unless your library has some mechanism for that, which is unfortunately not very likely.
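
The following is a minimal sketch of such a delegating class loader in plain Java. It is not the code from our UIMA port. To keep it runnable without an OSGi framework, a small hypothetical `ProviderRegistry` stands in for the ServiceRegistry lookup; in a real bundle the add/remove calls would be driven by service events, for example through a service tracker. All names are made up for this example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: resolving classes through the loaders of registered providers.
class DelegatingLoaderSketch {

    // Stand-in for the ServiceRegistry: providers can come and go at any time.
    static class ProviderRegistry {
        private final List<Object> providers = new CopyOnWriteArrayList<>();

        void add(Object provider) {
            providers.add(provider);
        }

        void remove(Object provider) {
            providers.remove(provider);
        }

        List<Object> current() {
            return providers;
        }
    }

    // Asks the parent first (standard delegation), then the currently registered providers.
    static class DelegatingClassLoader extends ClassLoader {
        private final ProviderRegistry registry;

        DelegatingClassLoader(ClassLoader parent, ProviderRegistry registry) {
            super(parent);
            this.registry = registry;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            for (Object provider : registry.current()) {
                ClassLoader loader = provider.getClass().getClassLoader();
                if (loader == null) {
                    continue; // bootstrap classes were already covered by the parent lookup
                }
                try {
                    return loader.loadClass(name);
                } catch (ClassNotFoundException e) {
                    // this provider cannot see the class, try the next one
                }
            }
            throw new ClassNotFoundException(name);
        }
    }

    public static void main(String[] args) throws Exception {
        ProviderRegistry registry = new ProviderRegistry();
        // A null parent hides all application classes, roughly like a library
        // bundle that does not import the implementation packages.
        ClassLoader loader = new DelegatingClassLoader(null, registry);

        // The anonymous class is loaded by the application class loader, so
        // registering it makes that loader reachable for the lookup.
        registry.add(new Object() { });
        System.out.println(loader.loadClass(DelegatingLoaderSketch.class.getName()));
    }
}
```

The lookup walks the registry on every `findClass` call, so providers registered later are picked up without restarting the library. Which provider wins when several can see the same class name is decided here simply by registration order, which is exactly the kind of control you give up with this approach.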

As you’ve seen, you don’t always have to go for a dynamic import. All the mentioned approaches have their advantages and disadvantages and none of them is “perfect”, but at least you have the choice of picking the one that fits the specific problem context best.

If you have found other ways of migrating problematic legacy code, please let me know. Maybe there is a solution out there that I was just not aware of. You never know ;-)

Cheers,
Mirko
