Tilting at windmills

Duncan Mac-Vicar P.'s blog



On Java, Maven, JPP and rpm

Java on Linux has always been a “special” topic. The two don’t mix well.

The mindset of Linux distributions is very different from that of the Java world when it comes to building software. This is understandable, as they have different requirements.

In the Java world, there is the concept of artifacts. You build org.foo.bar:bar-moo:1.1 once and it stays there forever, archived for anyone to use. Tools like maven and ivy let developers declare in their source tree the specific dependencies of their components. Those dependencies are grabbed from the network, the software is built, and the output is published as a new artifact that others can grab.

Linux distributions, on the other hand, bootstrap the complete stack from source. They don’t take the binary artifact from upstream but build it, and then use the binary they built to build the next one. This seems to work pretty well for C and C++; for Ruby and Python, let’s say it “works”.

When it comes to packaging Java software, Linux distributors find themselves in the following situation:

  • If a buildable source tarball is provided then you are lucky.
  • If the buildable tarball is provided, it will either include a directory full of binary jars (the build dependencies) or it will have a very automatic build system grabbing them from the network.

This clashes with Linux distributions on several fronts:

  • Linux distributions normally have one version of each component. The Java method works well if you bundle your dependencies inside your application, but not if everything is a reusable component. I have mixed feelings here. I think bundling your dependencies in the application for everything that is not part of the “base” system is the right approach. Updating them can break the application, and trying to control this via QA only moves QA from the application developers to the distribution itself. When I ship apache-commons-collection as a Java package as part of my Foobar Java application, I am inviting everyone to use it, forcing myself to support it outside the context of the application.
  • Distributions need to build from source. Even if you get rid of the above requirement and bundle all your dependencies, distributions want to build everything from source. This has technical and legal reasons. The SUSE build system does very complex checks on every package that it builds. Those checks are part of the quality we sell to our customers. Other reasons are legal: I am still trying, for example, to build the Play! framework. Even if it is BSD licensed, it includes some .jars of unknown origin. What would happen if one of these jars turned out to be proprietary? Michael Vyskocil had a similar issue with openproj and its bundled dependencies.

    Another reason to build from source is support. Enterprise distributions sell support, and if a customer has a problem, we will fix it on our own and not wait for upstream to release a new version. Having a standardized way to build from source with our own fixes allows us to serve our customers. We can bundle jars in our application, but if a bug traces back to a jar we included, we would need to change the complete build description of the product in order to build this component ourselves, if we are able to rebuild it at all. It already happened to us once with an XML-RPC library, and we were glad that it could be fixed by adding a patch to the rpm build description.

  • Grabbing dependencies from the network just does not work. All packages are built without network access for security reasons.

Because the Linux distributions know that they are not the center of the universe, they adapted. At the beginning things were still OK. Ant was very popular, and basically you recursively packaged all build dependencies until you could build your package, always in the same way:

  • unpack
  • delete all binary jars
  • set CLASSPATH to the jars provided by the packaged build dependencies
  • call ant
  • install the jars

Something like this:


%prep
%setup -q

# remove all third party jars
find . -iname '*.jar' -print0 | xargs -0 rm -rf

%build
export CLASSPATH=$(build-classpath foo)
ant

As long as this was true, the world was still fine. Ant needed bootstrapping, but this was doable.

Until Maven…

Maven is at the same time revolutionary and one of the biggest atrocities I have seen when building software.

On the positive side:

  • it defined a common convention for modules: groupId, artifactId, version.
  • it defined a standard layout for the source tree
  • it started a wave of convention over configuration that Java was always lacking

On the negative side:

  • it requires itself to build itself
  • it can’t build much itself, so it requires plugins to build anything
  • plugins require maven to be built, plus more dependencies, which usually require maven to build, plus… plugins.

All the above means that maven basically requires all the software it is supposed to build. Not the best design for a build system.

To make things worse, maven grabs dependencies from the network, which is what is disabled in our distro build process.

Fedora has made quite some progress providing a maven stack by extending the conventions the JPackage project started for maven packages. This is implemented using what is called a “dependency map”.

The approach works by installing some xml files per-package that map the maven artefact identifiers (groupId, artifactId) to a local jar in the system. Then maven itself is patched to include a resolver for artefacts with some properties:

  • ignores multiple versions (usually in Linux you have one version installed)
  • resolves the artefact names to local installed jars. Every package uses macros to add stuff to the dependency map
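To make the idea concrete, here is a minimal sketch of what such a map boils down to. It is not Fedora's actual file format or resolver code: the `DEPMAP` table, its entries and the `resolve_local` function are hypothetical names made up for the illustration. The point is only that the lookup key is (groupId, artifactId) and the requested version is thrown away.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the per-package XML fragments:
# (groupId, artifactId) -> jar installed on the system.
DEPMAP = {
    ("commons-logging", "commons-logging"): "/usr/share/java/commons-logging.jar",
    ("org.apache.maven.plugins", "maven-compiler-plugin"): "/usr/share/java/maven-compiler-plugin.jar",
}


def resolve_local(coordinate: str) -> Optional[str]:
    """Resolve groupId:artifactId[:version] to a local jar, ignoring the version."""
    parts = coordinate.split(":")
    if len(parts) < 2:
        raise ValueError(f"expected groupId:artifactId[:version], got {coordinate!r}")
    group_id, artifact_id = parts[0], parts[1]
    # Any version maps to the single jar installed on the system.
    return DEPMAP.get((group_id, artifact_id))
```

Whatever version a pom asks for, the answer is the one jar the distribution ships, or nothing at all if the package is not installed.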

What I don’t like about this approach is:

Why would anyone in their right mind use XML files to create mappings to files when you are on a UNIX-like OS and you have the filesystem and symbolic links?

It is very explicit. It does not rely on a simple convention.

A second issue is how packages are built, and this part is SUSE specific. Fedora can bootstrap packages with circular dependencies by introducing a binary package A and building the other dependencies until it can build a real A. Once a package is built, it stays frozen in the build system as an artefact (just like in the Java world).

In the openSUSE Build Service, the repository is always ready to bootstrap. For circular dependencies you create a package A-bootstrap that provides A and set the project config to prefer A. When A does not exist, A-bootstrap is grabbed, but as soon as A is there, it is preferred and used. When a package changes, the packages depending on it are rebuilt automatically. This approach has several advantages, but makes it hard to bootstrap a collection of packages where everything depends on everything.
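The selection rule described above can be sketched in a few lines. This is only an illustration of the behaviour, not how the Build Service implements it; `pick_provider` and its arguments are hypothetical names.

```python
from typing import Dict, List, Set


def pick_provider(capability: str,
                  available: Dict[str, Set[str]],
                  prefer: List[str]) -> str:
    """Pick the package that satisfies `capability`.

    available: package name -> set of capabilities it provides
    prefer:    package names to favour when several candidates exist
    """
    candidates = sorted(name for name, provides in available.items()
                        if capability in provides)
    if not candidates:
        raise LookupError(f"nothing provides {capability}")
    for name in prefer:
        if name in candidates:
            return name
    # No preferred candidate is available yet: take what is there.
    return candidates[0]
```

With only A-bootstrap in the repository, A-bootstrap is picked. As soon as the real A has been built and also provides A, the preference makes it win.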

In openSUSE, we have successfully built many maven-dependent packages in the Java:base project without having a maven package, by using the maven ant plugin to generate a tarball with ant build files.

This method does not work for every package, especially when files are generated during the build, because then one also needs to include those. But it may be good enough for solving our specific bootstrapping problem. The question is how many bootstrap packages we would need.

Another idea is to use packages with binary jars for bootstrapping.

Fedora is not very happy with the current situation either, and they have been researching adding native support to Koji to build maven packages.

In any case, I think there is room for improvement everywhere. I think the Maven infrastructure can be simplified, taking into account that what maven contributed to the world was a (now) popular way to identify a module, and this is now being used outside of Maven too. Apache Ivy, SBT, Gradle, etc. all support maven-style repositories and support referring to an artefact as groupId:artifactId:version.

Why not instead of a depmap just have:

/usr/share/java/foo.jar
/usr/share/java/org.bar/foo.jar -> /usr/share/java/foo.jar
/usr/share/java/org.bar/foo.pom

And have the patched Maven resolver just look there?

If you need parallel versions, then just:

/usr/share/java/foo1.jar
/usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.jar -> /usr/share/java/foo1.jar
/usr/share/java/org.bar/2.0/foo.jar -> /usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.pom
/usr/share/java/org.bar/2.0/foo.pom

Or use the standard alternatives:

/usr/share/java/foo.jar -> /etc/alternatives/foo.jar

The resolver would first look for the specific version described in the .pom file as /usr/share/java/$groupId/$version/$artifactId.ext. If it is not found, it could fall back to looking for /usr/share/java/$groupId/$artifactId.ext. This supports most cases, where we just have one version for the system, as well as the exceptions where some packages also need a specific version provided in parallel. If the same jar is also known under a different groupId, well, then you create another symlink.
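The lookup order is simple enough to sketch. This is an illustration of the convention proposed here, not code from any existing resolver; the function name `resolve` and the `root` parameter (which would be /usr/share/java on a real system) are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def resolve(root: Path, group_id: str, artifact_id: str,
            version: str, ext: str = "jar") -> Optional[Path]:
    """Find an artifact under `root`, preferring the versioned location."""
    versioned = root / group_id / version / f"{artifact_id}.{ext}"
    unversioned = root / group_id / f"{artifact_id}.{ext}"
    for candidate in (versioned, unversioned):
        # exists() follows symlinks, so a dangling link counts as "not found".
        if candidate.exists():
            return candidate
    return None
```

A request for org.bar:foo:2.0 finds the parallel 2.0 jar if one is installed. A request for any other version quietly gets the single system-wide jar.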

Then, build-classpath is enhanced so that in addition to being able to say ‘build-classpath commons-logging’ you can also call ‘build-classpath commons-logging:commons-logging’. Every module is identified by this convention.
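A rough sketch of how the two key styles could map to paths under the layout proposed above. This is not the real build-classpath script: `jar_path` and `build_classpath` are hypothetical Python stand-ins that only construct paths and do not check that the jars exist.

```python
from pathlib import PurePosixPath

JAVA_DIR = PurePosixPath("/usr/share/java")


def jar_path(key: str, java_dir: PurePosixPath = JAVA_DIR) -> PurePosixPath:
    """Map a plain name or a groupId:artifactId key to a jar path."""
    if ":" in key:
        group_id, artifact_id = key.split(":", 1)
        return java_dir / group_id / f"{artifact_id}.jar"
    # Legacy style: a bare jar name directly under the Java directory.
    return java_dir / f"{key}.jar"


def build_classpath(*keys: str) -> str:
    """Join the resolved jar paths into a CLASSPATH-style string."""
    return ":".join(str(jar_path(key)) for key in keys)
```

Old spec files that pass bare names keep working, while new ones can use the same identifiers that appear in the pom.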

The same with Provides: java(commons-logging:commons-logging). Fedora is already doing this as mvn(..), but is this maven specific?

Why do we need xml files with maps, fragments of XML files that need to be updated at install and uninstall time?

Looking for a new solution…

I discussed this with Fedora developers Alexander Kurtakov and Stanislav Ochotnicky and they mostly agreed with my concerns. They pointed me to Carlo de Wolf’s work on a similar solution, but using a standard maven repository layout.

Carlo’s solution does not touch maven. It is implemented as a plugin that gets loaded through a custom config file, which is used when you call the wrapper script fmvn (for Fedora-Maven) instead of mvn.

The whole solution as they described it has some extras, like macros to symlink the maven repository artifacts so that they can be found as artifacts in the JPP layout. I am not sure if we need this. What I like about the solution on its own:

  • It does what you expect: uses only the local repository and ignores versions (uses latest) if the requested version is not found.
  • It does not require macros. We need to build stuff on released distros and it is no fun to introduce new rpm macros.
  • It does not require patching maven. fmvn is a separate package, providing the plugins and the wrapper script.
  • As soon as Carlo gets “mvn install” working, there is no need to manually install the jar/pom in the spec file. Just calling “fmvn install” should build and install it.

I have been playing with Carlo’s plugins and it looks very promising. Fedora would need time to switch to a solution like this, but at SUSE we don’t have maven in our stack, so we have nothing to lose, and at the same time we can help by serving as a test bed.

The current plan…

Not having the need to patch maven allows us to use a vanilla build of Maven for bootstrapping.

maven-bootstrap (upstream binary release, Provides: maven)
fmvn-bootstrap (binary jars built locally with maven, Provides: fmvn)

Note: If you have more than one package with the same capability and want to use it in (Build)Requires, you will need to set up “Prefer:” in prjconf.

We would now like to build maven using fmvn. Here is where the circular dependencies start. We need maven (provided by maven-bootstrap) and its dependencies, like plexus and a big bunch of maven plugins.

Here is where pom2spec comes to the rescue. This script lets you quickly create bootstrap packages from search.maven.org. It is based on Pascal Bleser’s script.

So let’s say I need a bootstrap package for maven-compiler-plugin:


org.apache.maven.plugins:maven-compiler-plugin : using version 2.3.2
Writing maven-compiler-plugin-bin.spec
Done
Downloading maven-compiler-plugin-2.3.2.pom...
######################################################################## 100.0%
Downloading maven-compiler-plugin-2.3.2.jar...
######################################################################## 100.0%
t http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.3.2/maven-compiler-plugin-2.3.2.pom
_ http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.3.2/maven-compiler-plugin-2.3.2.jar

Which generates the following files:


maven-compiler-plugin-2.3.2.jar
maven-compiler-plugin-2.3.2.pom
maven-compiler-plugin-bin.spec

If I build it, I get an rpm with the following layout:


/usr/share/java/maven-compiler-plugin.jar
/usr/share/maven/repository/org/apache/maven/plugins/maven-compiler-plugin/maven-compiler-plugin-2.3.2.jar
/usr/share/maven/repository/org/apache/maven/plugins/maven-compiler-plugin/maven-compiler-plugin-2.3.2.pom

/usr/share/java/maven-compiler-plugin.jar is just a symlink to the real jar. This layout is enough for fmvn to find the artifact and also for legacy packages to just use build-classpath. It would still be better to enhance build-classpath to also accept groupId:artifactId keys and return the path to the jar.

The -bin suffix allows the real package (built from source) to coexist in the same repository. The package with the -bin suffix also "Provides:" the package without the suffix so it can be used by dependent packages. Actually, both "Provide:" java(org.apache.maven.plugins:maven-compiler-plugin), which is what a package that depends on it should "BuildRequire:".

Once Carlo's resolver works with "mvn install" I will try to build a repository following this method.



The greatest unknown openSUSE 11.0 package management feature

During the development of openSUSE 11.0, we have been reporting in real time cool improvements like the [fast installation][4], [how YaST became sexy][5], [how YaST/ZYpp/zypper became fast][1], [how YaST/ZYpp/zypper performs better than others][2] and even [that our solver is also really smart][3].

However, there is something else…

## Interoperability


### Background

One of the features of our stack is the availability of patches and patterns. The former provide updates in the sense of a “fix for a problem” (which can mean several updated packages, or none), while patterns are intelligent groups that can recommend, require and suggest packages in order to make certain functionality available, without being too strict about the specific packages to install.

Unlike in other systems where groups and updates are handled as special entities, ZYpp patterns and patches are just objects with dependencies like packages, and the solver treats them in the same way.

Because the rpm database of installed packages does not know about patterns and patches, in openSUSE 10.x (and SLE10) those objects are installed in a separate database, only visible to libzypp. This is hidden from the users, but does not allow for easy management using 3rd party tools.

In addition to that, the patch metadata format is our own extension to the [metadata][7] handled by [yum][6], the tool used by Fedora and CentOS. That means that even if other distributions provide similar concepts, they will mostly ignore our extended metadata.

This is sad. If we share 90% of the metadata format, why not go further? Sometimes it is not worth waiting for others to take steps towards becoming more interoperable with you, so what about taking those steps ourselves?

At this time, Fedora was implementing update metadata using a yum plugin and an updateinfo.xml description. Metadata for deltarpm availability is handled via the yum-presto plugin.

### Sharing tools and data, a step for Interoperability

#### Metadata format

In openSUSE 11.0, ZYpp reads patches from updateinfo.xml too! ([check 11.0 update repo!][17]). Not only that, our delta rpm availability metadata will be in the same format as [yum-presto][9] (with some modifications agreed with its author).

How will this benefit you?

* You will be able to use the yum stack out of the box with updates and deltarpms, and they will just work.
* You will be able to generate custom patches for openSUSE using existing tools like [Bodhi][8], from Fedora, or generate custom deltarpm information using the yum-presto included tools.
* We are working hard to get the [ZYpp/YaST stack to build on Fedora][10] (and other distros). In the near future, you will be able to enjoy ZYpp performance and features on Fedora with their own repositories!
* We decoupled the delta-rpm information from patches, so we may start adding delta-rpm to normal factory packages and it will start to work out of the box!
* Much more!

#### Handling of patches and patterns

As we mentioned, in the 10.x codebase we used to install patterns and patches in a special database. This is no longer the case.

Patterns and patches are no longer installed, which means your system is rpm only! Patches are now shown as (un)satisfied (and (ir)relevant). A satisfied patch is one for which your system already meets all the requirements to consider it present.

All the information about patches and patterns (and products) is extra information that openSUSE applications use to add more value for you. So if you, for example, remove a repository offering patches, you just lose the information about which patches you have; the real information is the rpm packages you have installed. When you re-add the update repository, you can immediately see which published patches affect you, which ones are irrelevant, and which ones are relevant but not needed because your packages are up to date.

Patterns and patches become “advice” and “value”, not extra non-compatible information.
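As a rough illustration of how a patch status can be derived from nothing but the installed rpm packages, consider the sketch below. It is not ZYpp code: ZYpp evaluates patches through the solver and real rpm version comparison, while here `patch_status` is a hypothetical function, versions are plain integer tuples, and a patch is reduced to a map of package name to the version that contains the fix.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]


def patch_status(patch: Dict[str, Version],
                 installed: Dict[str, Version]) -> str:
    """Classify a patch against the installed packages.

    patch:     package name -> version that contains the fix
    installed: package name -> version currently installed
    """
    affected = {name: fixed for name, fixed in patch.items()
                if name in installed}
    if not affected:
        # None of the packages the patch fixes is installed.
        return "irrelevant"
    if all(installed[name] >= fixed for name, fixed in affected.items()):
        # Relevant, but the installed packages are already up to date.
        return "satisfied"
    return "needed"
```

Nothing is stored about the patch itself. Remove the update repository and the classification disappears; add it back and it can be recomputed from the same installed packages.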

How will this benefit you?

* Simpler system.
* No conflicts from mixing 3rd party tools, “rpm by hand” and our native tools.

#### openSUSE, PackageKit enabled

[PackageKit][11] is the new actor in the package management world. It is a thin layer that provides applications access to the package management system as a [DBUS][14] service. You may have heard about it because Fedora 9 is coming PackageKit enabled. How does it benefit you?

* Role based (non-root) package management, via [PolicyKit][15].
* Sharing of upstream tools across distributions.
* Gives the desktop the chance to integrate with software manipulation.

So, openSUSE 11.0 is fully PackageKit enabled. You will be able to use all [PackageKit compatible applications][16] on openSUSE and they will use the ZYpp stack underneath. Not only is it enabled, but our hackers [Scott Reeves][13] and [Stefan Haas][12] did an amazing job on the backend. I would dare to say it is one of the most robust backend implementations, and it fully benefits from the ZYpp speed and features.

### Future

All these improvements are available now. Maybe you are already enjoying them in Factory. However, this opens the door to new possibilities. Just a few examples come to mind:

* The openSUSE Build Service, the great software building platform from our openSUSE team, has been building packages for all major distributions for a long time. The build service could let you enter and generate patch information for fixed bugs, and the update/patch information would be compatible across yum/Fedora and openSUSE. Same with deltarpm information.
* We could extend the ZYpp parser to understand Fedora groups stored in comps.xml and treat them as patterns.
* Do you have more ideas?

### Community involvement

We welcome any help on creating more interoperability possibilities, especially about building the ZYpp stack and YaST on Fedora, Mandriva and others. There are already some packages building in the build service, but we still have a long way to go.

[1]: http://duncan.mac-vicar.com/blog/archives/296
[2]: http://duncan.mac-vicar.com/blog/archives/309
[3]: http://duncan.mac-vicar.com/blog/archives/311
[4]: http://www.kdedevelopers.org/node/3385
[5]: http://www.kdedevelopers.org/node/3143
[6]: http://en.wikipedia.org/wiki/Yellow_dog_Updater%2C_Modified
[7]: http://linux.duke.edu/projects/metadata/
[8]: http://fedorahosted.org/bodhi
[9]: http://hosted.fedoraproject.org/presto
[10]: http://download.opensuse.org/repositories/zypp:/Backport/Fedora_8/
[11]: http://www.packagekit.org/
[12]: http://en.opensuse.org/User:Haass
[13]: http://en.opensuse.org/User:Sreeves1
[14]: http://freedesktop.org/wiki/Software/dbus
[15]: http://hal.freedesktop.org/docs/PolicyKit/index.html
[16]: http://www.packagekit.org/pk-screenshots.html
[17]: http://download.opensuse.org/update/11.0



openSUSE 11.0 beta

openSUSE 11.0 beta is coming. This means that all the pieces of the update stack have to be in place by then.

While reading [Review: Hat Trick For Fedora 9 Beta][11], it caught my attention that the author says “Fedora continues to wear the innovation hat” because of some of the changes described there. I did not see any killer feature that openSUSE 11.0 does not have too, and a few are worth highlighting:

* KDE 4 integration: No distro will ever beat openSUSE here ;-) you know that. openSUSE is a multi desktop distribution and both the Gnome and the KDE team are unbeatable. Period. (I am objective today ;-) )
* The installation experience and look essentially remained the same from Fedora 8: Sorry guys, but YaST installer [looks far cooler][12] than in our last release. The [experience is also much different][13].
* except there is now support for resizing ext2, ext3, and NTFS partitions from the installer: zzzzzzz. YaST has been resizing partitions since before Linux was invented.
* The installer also checks for password strength for the root account: YaST in the 80′s ?
* The new package management solution, PackageKit, is another interesting feature: openSUSE 11.0 will have native PackageKit support (the backend is upstream) and it works really well. The yum backend does not have a good reputation. The ZYpp backend in 11.0 inherits all the unbeatable speed of ZYpp 4.x and it is robust at the same time.

Sorry guys, the innovation hat is green. OK, enough with articles. Let’s get back to the 11.0 beta.

We talked about [package management speed][7], we talked about [new looks and features already][8]. However our work around patches and patterns was still missing.

During the last weeks, we have been working on this and now all the pieces are starting to fall into place. Click on any image to see it in full size. Also note that the ugly scrollbar in the disk usage view has already been fixed.

You may remember the pattern selector:

[![Old Pattern Selector][9]][10]

New pattern selector:

[![New Pattern Selector][3]][4]

If you go to the repository view, it was a little boring. openSUSE has the Build Service, which has generated a big community of repositories. Visually, we want it to be obvious at a glance whether a repo is a normal repository, the home project of a friend, a well known repository, or maybe an update repository. The result?

[![New repo view][1]][2]

The old patch view was confusing. The category column never had space for itself.

[![Old patch selector][14]][15]

So we introduced categories just like in the patterns view.

[![New patch selector][5]][6]

There are still some details to sort out. In that view the reboot patch should not be shown in the default filter, because I don’t have the package the patch fixes installed. For some reason isRelevant() is not working there. Same with the security patch: it correctly shows that the patch is satisfied, but for that very reason it should be hidden.

In the next post I will write a bit about how patterns, products and patches are handled now, plus other features.

[1]: http://files.opensuse.org/opensuse/en/thumb/3/3f/Repos.png/783px-Repos.png
[2]: http://files.opensuse.org/opensuse/en/3/3f/Repos.png
[3]: http://files.opensuse.org/opensuse/en/thumb/e/e6/Patterns.png/750px-Patterns.png
[4]: http://files.opensuse.org/opensuse/en/e/e6/Patterns.png
[5]: http://files.opensuse.org/opensuse/en/thumb/8/8c/Patches.png/800px-Patches.png
[6]: http://files.opensuse.org/opensuse/en/8/8c/Patches.png
[7]: http://duncan.mac-vicar.com/blog/archives/296
[8]: http://duncan.mac-vicar.com/blog/archives/303
[9]: http://files.opensuse.org/opensuse/de/thumb/b/b8/Selecting-patterns-de.png/75…
[10]: http://files.opensuse.org/opensuse/de/b/b8/Selecting-patterns-de.png
[11]: http://www.crn.com/software/207200137
[12]: http://news.opensuse.org/2008/03/19/announcing-opensuse-110-alpha-3/
[13]: http://www.kdedevelopers.org/node/3385
[14]: http://files.opensuse.org/opensuse/de/thumb/e/e7/Yast-gui-online-update.png/5…
[15]: http://de.opensuse.org/Bild:Yast-gui-online-update.png



ZYpp stack on other distributions?

Keeping a permanent build of svn on the build service motivated me to try to build it on other non-SUSE based distros. On the way I discovered not only simple things such as different package names, but also linking problems, compile errors, etc.

In the past it did not make much sense to try libzypp outside of SUSE, as we were trying to catch up with the speed of other tools. But now that libzypp outperforms any other tool in speed while keeping its complete set of features, we may start thinking about taking over the world. Maybe other distros want to use libzypp or, why not, the small and powerful sat-solver library alone.

So, as of today I have the first successful build of [sat-solver/libzypp/zypper][4] on [Fedora 8][1]. Mandriva is also close, but I still get a compiler error I should not get when compiling the rpm backend.

At the beginning it may be a time-consuming effort, but once it is a continuous process, our software should adopt a more agnostic view of the world and just compile there out of the box. If anyone has the time to test it, I would be interested in the results ;-)

By the way, our colleagues at internal tools finished processing the videos of the FOSDEM talks we had in the openSUSE Developer’s room. You can find the ogg video files [here][2]. You can also watch them on [google video here][3].

[1]: http://fedoraproject.org/
[2]: http://tube.opensuse.org/
[3]: http://video.google.de/videosearch?q=FOSDEM2008+site%3Avideo.google.com&a…
[4]: https://build.opensuse.org/project/monitor?project=zypp%3Asvn
