On Java, Maven, JPP and rpm
Java on Linux has always been a “special” topic. The two don’t mix well.
The mindset of Linux distributions is very different from that of the Java world when it comes to building software. This is understandable, as they have different requirements.
In the Java world, there is the concept of artifacts. You build org.foo.bar:bar-moo:1.1 once and it stays there forever, archived for anyone to use. Tools like maven and ivy allow developers to declare in their source tree the specific dependencies of their components; those are grabbed from the network, the software is built, and the output is published as a new artifact that others can grab.
Linux distributions, on the other hand, bootstrap the complete stack from source. They don’t take the binary artifact from upstream; they build it, and then use the binary they built to build the next package. This seems to work pretty well for C and C++; for Ruby and Python, let’s say it “works”.
When it comes to package Java software, Linux distributors find themselves in the following situation:
- If a buildable source tarball is provided at all, you are lucky.
- If a buildable tarball is provided, it will either include a directory full of binary jars (the build dependencies) or have a highly automated build system that grabs them from the network.
This clashes with Linux distributions in several ways:
- Linux distributions normally have only one version of each component. The Java approach works well if you bundle your dependencies inside your application, but not if everything is a reusable component. I have mixed feelings here. I think bundling your dependencies in the application, for everything that is not part of the “base” system, is the right approach. Updating them can break the application, and trying to control this via QA only moves QA from the application developers to the distribution itself. When I ship apache-commons-collection as part of my Foobar Java application as a separate package, I am inviting everyone to use it, forcing myself to support it outside the context of the application.
- Distributions need to build from source. Even if you drop the above requirement and bundle all your dependencies, distributions want to build everything from source, for both technical and legal reasons. The SUSE build system does very complex checks on every package it builds, and those checks are part of the quality we sell to our customers. Then there are the legal reasons: I am still trying, for example, to build the Play! framework. Even though it is BSD-licensed, it includes some .jars of unknown origin. What would happen if one of these jars turned out to be proprietary? Michael Vyskocil had a similar issue with openproj and its bundled dependencies.
Another reason to build from source is support. Enterprise distributions sell support, and if a customer has a problem, we fix it ourselves instead of waiting for upstream to release a new version. Having a standardized way to build from source with our own fixes allows us to serve our customers. We could bundle jars in our application, but if a bug traced back to one of those jars, we would need to change the complete build description of the product in order to rebuild that component, if we were able to rebuild it at all. This already happened to us once with an XML-RPC library, and we were glad that it could be fixed by adding a patch to the rpm build description.
- Grabbing dependencies from the network just does not work. All packages are built without network access for security reasons.
Because Linux distributions know that they are not the center of the universe, they adapted. At the beginning things were still OK. Ant was very popular, and basically you recursively packaged all build dependencies until you could build your package, always in the same way:
- unpack
- delete all binary jars
- set CLASSPATH to the jars grabbed by the packaged build dependencies
- call ant
- install the jars
Something like this:
%prep
%setup -q
# remove all third party jars
find . -iname '*.jar' -delete
%build
export CLASSPATH=$(build-classpath foo)
ant
As long as this held true, the world was still fine. Ant needed bootstrapping, but that was doable.
Until Maven…
Maven is at the same time revolutionary and one of the biggest atrocities I have seen when building software.
On the positive side:
- it defined a common convention for modules: groupId, artifactId, version.
- it defined a standard layout for the source tree
- it started a wave of convention over configuration that Java was always lacking
On the negative side:
- it requires itself to build
- it can’t build much itself, so it requires plugins to build anything
- plugins require maven to be built, plus more dependencies, which usually require maven to build, plus… plugins.
All the above means that maven basically requires all the software it is supposed to build. Not the best design for a build system.
To make things worse, maven grabs dependencies from the network, which is exactly what is disabled in our distro build process.
Fedora has made quite a lot of progress providing a maven stack by extending the conventions the JPackage project started for maven packages. This is implemented using what is called a “dependency map”.
The approach works by installing some XML files per package that map the maven artefact identifiers (groupId, artifactId) to a local jar on the system. Then maven itself is patched to include a resolver for artefacts with some properties:
- ignores multiple versions (usually in Linux you have one version installed)
- resolves the artefact names to locally installed jars. Every package uses macros to add entries to the dependency map
What I don’t like about this approach is:
Why would anyone in their right mind use XML files to create mappings to files when you are on a UNIX-like OS and have the filesystem and symbolic links?
It is very explicit. It does not rely on a simple convention.
A second issue is how packages are built, and this one is SUSE specific. Fedora can bootstrap packages with circular dependencies by introducing a binary package A and building other dependencies until it can build a real A. Once a package is built, it stays frozen in the build system as an artefact (just like in the Java world).
In the openSUSE Build Service, the repository is always ready to bootstrap. For circular dependencies you create a package A-bootstrap that provides A and set the project config to prefer A. When A does not exist, A-bootstrap is used, but as soon as A is there, it is preferred. When a package changes, the packages depending on it are rebuilt automatically. This approach has several advantages, but it makes it hard to bootstrap a collection of packages where everything depends on everything.
In openSUSE, we have successfully built many maven-based packages in the Java:base project without having a maven package, by using the maven ant plugin to generate a tarball with ant build files.
This method does not work for every package, especially when files are generated during the build, as those need to be included too. But it may be good enough to solve our specific bootstrapping problem. The question is how many bootstrap packages we would need.
Another idea is to use packages with binary jars for bootstrapping.
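To make the A / A-bootstrap dance concrete, here is a toy model in Python of how a preference can pick between two providers of the same capability. It is only a sketch under my own assumptions: pick_provider and its inputs are hypothetical names, and this is not how the Build Service scheduler is actually implemented.

```python
from typing import Dict, List, Optional, Set


def pick_provider(capability: str,
                  provides: Dict[str, List[str]],
                  built: Set[str],
                  prefer: List[str]) -> Optional[str]:
    """Toy model: choose which built package satisfies `capability`.

    `provides` maps a capability to the packages providing it, `built` is the
    set of packages currently in the repository, and `prefer` mimics a
    project-level preference list (all hypothetical).
    """
    candidates = [pkg for pkg in provides.get(capability, []) if pkg in built]
    if not candidates:
        return None
    # A preferred package wins as soon as it has been built...
    for pkg in prefer:
        if pkg in candidates:
            return pkg
    # ...otherwise use whatever provider exists (e.g. A-bootstrap).
    return candidates[0]


provides = {"A": ["A", "A-bootstrap"]}
print(pick_provider("A", provides, {"A-bootstrap"}, prefer=["A"]))       # A-bootstrap
print(pick_provider("A", provides, {"A", "A-bootstrap"}, prefer=["A"]))  # A
```

Once the real A shows up in the repository, everything depending on it switches over, which in the Build Service also triggers the automatic rebuilds described above.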
Fedora is not very happy with the current situation either, and they have been researching adding native support to Koji to build maven packages.
In any case, I think there is room for improvement everywhere. The Maven infrastructure can be simplified by taking into account that Maven’s real contribution to the world was a (now) popular way to identify a module, which is also used outside of Maven. Apache Ivy, SBT, Gradle, etc. all support maven-style repositories and support referring to an artefact as groupId:artifactId:version.
Why not instead of a depmap just have:
/usr/share/java/foo.jar
/usr/share/java/org.bar/foo.jar -> /usr/share/java/foo.jar
/usr/share/java/org.bar/foo.pom
And have the patched Maven resolver just look there?
If you need parallel versions, then just
/usr/share/java/foo1.jar
/usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.jar -> /usr/share/java/foo1.jar
/usr/share/java/org.bar/2.0/foo.jar -> /usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.pom
/usr/share/java/org.bar/2.0/foo.pom
Or use the standard alternatives:
/usr/share/java/foo.jar -> /etc/alternatives/foo.jar
The resolver would first look for the specific version described in the .pom file as /usr/share/java/$groupId/$version/$artifactId.ext. If it is not found, it could fall back to /usr/share/java/$groupId/$artifactId.ext. This covers the common case where we have just one version on the system, plus the exceptions where some packages need a specific version installed in parallel. If the same jar is also known under a different groupId, well, then you create another symlink.
Then, build-classpath is enhanced so that, in addition to ‘build-classpath commons-logging’, you can also call ‘build-classpath commons-logging:commons-logging’. Every module is identified by this convention.
The same goes for Provides: java(commons-logging:commons-logging). Fedora is already doing this as mvn(..), but is this really maven specific?
Why do we need XML files with maps, and fragments of XML files that need to be updated at install and uninstall time?
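A minimal Python sketch of that lookup order, assuming the symlink layout proposed above under /usr/share/java. The function name resolve is purely illustrative; no existing resolver works this way today.

```python
from pathlib import Path
from typing import Optional

# Root of the proposed layout (an assumption of this sketch).
JAVA_ROOT = Path("/usr/share/java")


def resolve(group_id: str, artifact_id: str, version: str,
            ext: str = "jar", root: Path = JAVA_ROOT) -> Optional[Path]:
    """Look up an artifact in the proposed symlink layout.

    1. $root/$groupId/$version/$artifactId.$ext  (parallel-installed version)
    2. $root/$groupId/$artifactId.$ext           (the system's single version)
    """
    candidates = (
        root / group_id / version / f"{artifact_id}.{ext}",
        root / group_id / f"{artifact_id}.{ext}",
    )
    for path in candidates:
        # exists() follows symlinks, so a dangling link counts as missing.
        if path.exists():
            return path
    return None
```

Calling resolve("org.bar", "foo", "2.0", ext="pom") would return the versioned .pom when the 2.0 directory exists and fall back to /usr/share/java/org.bar/foo.pom otherwise.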
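The key handling for such an enhanced build-classpath could look like this hypothetical Python sketch. The existing build-classpath does not accept groupId:artifactId keys (which is the point of the proposal); the paths follow the layout proposed above, and the function names are made up.

```python
from pathlib import Path

JAVA_ROOT = Path("/usr/share/java")  # assumed install root


def jar_for_key(key: str, root: Path = JAVA_ROOT) -> Path:
    """Map a build-classpath style key to a jar path (hypothetical).

    'name'                -> $root/name.jar                (legacy JPP name)
    'groupId:artifactId'  -> $root/groupId/artifactId.jar  (proposed layout)
    """
    parts = key.split(":")
    if len(parts) == 1:
        return root / f"{parts[0]}.jar"
    if len(parts) == 2:
        group_id, artifact_id = parts
        return root / group_id / f"{artifact_id}.jar"
    raise ValueError(f"expected 'name' or 'groupId:artifactId', got {key!r}")


def build_classpath(*keys: str, root: Path = JAVA_ROOT) -> str:
    """Join the jars for all keys into a CLASSPATH-style string."""
    return ":".join(str(jar_for_key(k, root)) for k in keys)


print(build_classpath("commons-logging", "commons-logging:commons-logging"))
# /usr/share/java/commons-logging.jar:/usr/share/java/commons-logging/commons-logging.jar
```

Both spellings keep working side by side, so legacy spec files would not need to change.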
Looking for a new solution…
I discussed this with Fedora developers Alexander Kurtakov and Stanislav Ochotnicky and they mostly agreed with my concerns. They pointed me to Carlo de Wolf’s work on a similar solution, but using a standard maven repository layout.
Carlo’s solution does not touch maven but is implemented as a plugin that gets loaded using a custom config file that is used when you call the wrapper script fmvn instead of mvn (for Fedora-Maven).
The whole solution, as they described it, has some extras like macros to symlink the maven repository artifacts so that they can be found in the JPP layout. I am not sure we need this. What I like about the core solution:
- It does what you expect: uses only the local repository and ignores versions (uses latest) if the requested version is not found.
- It does not require macros. We need to build stuff on released distros and it is no fun to introduce new rpm macros.
- It does not require patching maven. fmvn is a separate package, providing the plugins and the wrapper script.
- As soon as Carlo gets “mvn install” working, there is no need to manually install the jar/pom in the spec file. Just calling “fmvn install” should build and install it.
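To illustrate the first point, here is a small Python sketch that resolves only against a local repository in the standard Maven layout (groupId path / artifactId / version / artifactId-version.jar) and falls back to the newest available version. It is not Carlo's plugin, and the version ordering is a deliberate simplification of Maven's real comparison rules.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def _version_key(v: str) -> List[Tuple[int, int, str]]:
    """Simplified ordering: numeric parts compare as ints, other parts as
    text sorted after numbers. Not Maven's full version rules."""
    key = []
    for part in v.replace("-", ".").split("."):
        key.append((0, int(part), "") if part.isdigit() else (1, 0, part))
    return key


def resolve_local(repo: Path, group_id: str, artifact_id: str,
                  version: str, ext: str = "jar") -> Optional[Path]:
    """Use only the local repository; if the requested version is not
    installed, use the latest one that is."""
    base = repo.joinpath(*group_id.split("."), artifact_id)
    exact = base / version / f"{artifact_id}-{version}.{ext}"
    if exact.is_file():
        return exact
    if not base.is_dir():
        return None
    available = [d.name for d in base.iterdir()
                 if (d / f"{artifact_id}-{d.name}.{ext}").is_file()]
    if not available:
        return None
    latest = max(available, key=_version_key)
    return base / latest / f"{artifact_id}-{latest}.{ext}"
```

With 2.3.2 and 2.10 installed, asking for 3.0 yields 2.10, since the numeric comparison does not treat versions as plain strings.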
I have been playing with Carlo’s plugins and they look very promising. Fedora would need time to switch to a solution like this, but at SUSE we don’t have maven in our stack, so we have nothing to lose, and at the same time we can help by serving as a test bed.
The current plan…
Not needing to patch maven allows us to use a vanilla build of Maven for bootstrapping:
maven-bootstrap (upstream binary release, Provides: maven)
fmvn-bootstrap (binary jars built locally with maven, Provides: fmvn)
Note: If you have more than one package with the same capability and want to use it in (Build)Requires, you will need to set up “Prefer:” in prjconf.
Now we would like to build maven itself using fmvn. This is where the circular dependencies start: we need maven (provided by maven-bootstrap) and its dependencies, like plexus and a big bunch of maven plugins.
This is where pom2spec comes to the rescue. This script allows you to quickly create bootstrap packages from search.maven.org. It is based on Pascal Bleser’s script.
So let’s say I need a bootstrap package for maven-compiler-plugin:
org.apache.maven.plugins:maven-compiler-plugin : using version 2.3.2
Writing maven-compiler-plugin-bin.spec
Done
Downloading maven-compiler-plugin-2.3.2.pom...
######################################################################## 100.0%
Downloading maven-compiler-plugin-2.3.2.jar...
######################################################################## 100.0%
t http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.3.2/maven-compiler-plugin-2.3.2.pom
_ http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.3.2/maven-compiler-plugin-2.3.2.jar
Which generates the following files:
maven-compiler-plugin-2.3.2.jar
maven-compiler-plugin-2.3.2.pom
maven-compiler-plugin-bin.spec
If I build it, I get an rpm with the following layout:
/usr/share/java/maven-compiler-plugin.jar
/usr/share/maven/repository/org/apache/maven/plugins/maven-compiler-plugin/maven-compiler-plugin-2.3.2.jar
/usr/share/maven/repository/org/apache/maven/plugins/maven-compiler-plugin/maven-compiler-plugin-2.3.2.pom
/usr/share/java/maven-compiler-plugin.jar is just a symlink to the real jar. This layout is enough for fmvn to find the artifact, and also for legacy packages to just use build-classpath. It would still be better to enhance build-classpath to also accept groupId:artifactId keys and return the path to the jar.
The -bin suffix allows the real package (built from source) to coexist in the same repository. The package with the -bin suffix also "Provides:" the package name without the suffix, so it can be used by dependent packages. In fact, both packages "Provide:" java(org.apache.maven.plugins:maven-compiler-plugin), which is what a package that depends on it should "BuildRequire:".
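Since that key comes straight from the pom, a spec generator can compute the Provides line itself. Here is a hedged Python sketch, assuming the pom declares the standard POM 4.0.0 namespace; provides_from_pom is a hypothetical helper, not part of pom2spec. It mirrors Maven's rule that a missing groupId is inherited from the parent element.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def provides_from_pom(pom_xml: str) -> str:
    """Return an rpm 'Provides: java(groupId:artifactId)' line for a pom."""
    project = ET.fromstring(pom_xml)
    # Direct children only, so the parent's artifactId is not picked up.
    artifact_id = project.findtext("m:artifactId", namespaces=POM_NS)
    group_id = project.findtext("m:groupId", namespaces=POM_NS)
    if not group_id:
        # Like Maven, inherit groupId from <parent> when it is not declared.
        group_id = project.findtext("m:parent/m:groupId", namespaces=POM_NS)
    if not artifact_id or not group_id:
        raise ValueError("pom lacks groupId or artifactId")
    return f"Provides: java({group_id.strip()}:{artifact_id.strip()})"


pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-plugins</artifactId>
  </parent>
  <artifactId>maven-compiler-plugin</artifactId>
</project>"""
print(provides_from_pom(pom))
# Provides: java(org.apache.maven.plugins:maven-compiler-plugin)
```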
Once Carlo's resolver works with "mvn install" I will try to build a repository following this method.
Poor man’s rollback
At Andrew Wafaa’s request.
Save the package list:
dmacvicar@piscola:~> rpm -qa --queryformat="%{name}\n" > 1
Do something… like uninstalling what was cool last week and is not cool anymore:
dmacvicar@piscola:~> sudo zypper rm erlang
Loading repository data...
Reading installed packages...
Resolving package dependencies...
The following packages are going to be REMOVED:
erlang rabbitmq-server
2 packages to remove.
After the operation, 51.3 MiB will be freed.
Continue? [y/n/?] (y): y
Removing rabbitmq-server-2.2.0-1.2 [done]
Removing erlang-R14B-1.2 [done]
There are some running programs that use files deleted by recent upgrade. You may wish to restart some of them. Run 'zypper ps' to list these programs.
Save the new state:
dmacvicar@piscola:~> rpm -qa --queryformat="%{name}\n" > 2
Now you need to know that zypper accepts + and - prefixes in its input, so you can install and uninstall packages in one go:
zypper in -- +pkg1 -pkg2 +pkg3 ...
So we can diff both files:
dmacvicar@piscola:~> diff -u 1 2
--- 1 2012-01-19 17:23:26.640180000 +0100
+++ 2 2012-01-19 17:24:43.196248000 +0100
@@ -420,7 +420,6 @@
gnome-themes-accessibility
libeet1
icc-profiles-mini
-rabbitmq-server
kde4-filesystem
gpg-pubkey
libiptcdata-lang
@@ -3561,7 +3560,6 @@
perl-Config-General
PolicyKit-devel
gtk2-engine-aurora
-erlang
libeet-devel
cyrus-sasl-gssapi
libimobiledevice2
Close to what we need. We drop the context lines by using -u0, and filter out the diff header and hunk lines:
dmacvicar@piscola:~> diff -u0 1 2 | grep -Ev '^(@@|\+\+|--)'
-rabbitmq-server
-erlang
Now feed zypper with this to get your packages back:
zypper in -- $(diff -u0 2 1 | grep -Ev '^(@@|\+\+|--)' | xargs)
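The same operation can be written as a small Python helper if you want to keep the snapshots around and script it. A minimal sketch; the script name rollback.py and the function names are hypothetical. Unlike diff, it compares sets, so the order of the rpm -qa output does not matter, and duplicate names such as several gpg-pubkey entries collapse into one.

```python
import sys
from typing import List


def rollback_args(before: List[str], after: List[str]) -> List[str]:
    """Arguments for `zypper in --` that turn state `after` back into `before`.

    Names only in `before` get '+' (install again); names only in `after`
    get '-' (remove). Like the diff approach, versions are ignored.
    """
    old, new = set(before), set(after)
    return sorted("+" + p for p in old - new) + sorted("-" + p for p in new - old)


def read_names(path: str) -> List[str]:
    """Read one package name per line, as written by rpm -qa --queryformat."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 rollback.py 1 2  ->  prints the zypper command to run
    args = rollback_args(read_names(sys.argv[1]), read_names(sys.argv[2]))
    print("zypper in -- " + " ".join(args))
```

For the example above, rollback_args applied to the contents of files 1 and 2 gives ["+erlang", "+rabbitmq-server"].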
Of course, this will only work if all the needed repositories are still available. It is also useful for syncing packages across computers (say you get a new laptop and need to set it up in a similar way).
openSUSE 12.1 rollback is implemented using btrfs via snapper, plus a zypp plugin that records a snapshot on every commit. It should be possible to write a poor man’s version by recording the package list on every commit and then performing the above operation to go one or more steps back.
new ruby RPM bindings
The ruby-rpm bindings were originally written around 2002 for the Kondara Linux distribution. David Lutterkort adopted them to power various systems management pieces written in ruby, and later I did a couple of releases.
After openSUSE 12.1 was released, the gem stopped building against the current rpm (4.9.x) and something needed to be done. After studying the code a lot, I figured out that:
- API compatibility was important, as the goal was to keep some software running.
- I did not want to add more #ifdefs to the code, as it was already supporting ancient rpm versions.
- I wanted to avoid C where ruby could be used instead
- I wanted good documentation (I had added some to ruby-rpm in the latest releases)
I decided to start fresh: target rpm 4.9.x first and later see if older rpms can be supported. I want to introduce an early release of the new rpm gem.
What does this milestone implement?
- Querying the rpm database
- Querying packages
How is this gem different from the original ruby-rpm?
- It is written in pure ruby, and uses FFI to access librpm
- It is documented
- It will be compatible with ruby-rpm. The testsuite is a continuation of the original
- It is MIT licensed instead of GPL. Kenta Murata gave me permission to re-license all the code I studied while writing the pure-ruby version
What is missing?
- Only rpm 4.9.0 is supported for now. Older versions may work, but I do know that 4.4.x does not.
- Not all APIs are covered yet, for example the RPM::Source class or methods to execute transactions (install, remove).
What I learned along the way
The RPM API and compatibility
The RPM API is quite scary, partly because of the ground it covers, but also because older rpm versions exposed a lot of unnecessary stuff in the API. Comparing the API across versions shows that Panu is doing an awesome job cleaning it up.
Because of the mess in rpm 4.4.x, supporting older rpm versions will not be trivial. I realized very late that functions like headerNew are not even exposed as symbols in 4.4.x.
FFI is great, but it is not there yet
FFI is advocated as a better way to access native code from ruby interpreters. The truth is that ruby never had a real API for that; when you write ruby C extensions you are just playing with the MRI interpreter’s guts.
With FFI, each interpreter provides the FFI API and implements it. For example, JRuby may use JNI, and MRI may implement it as a C extension using the API we all already know.
However, after being unable to run the gem on rubinius because its FFI does not implement enums, I realized the C compatibility layer most interpreters provide may be even more mature than FFI itself.
Also, because you are accessing the library symbols, you inherit another set of problems, like the inability to refer to anything that is not a symbol, like macros.
It is still better, but it needs time to mature. I am happy that I can write ruby code that interacts with the operating system without having to tie it to a specific interpreter or platform.
Go grab the source code and send me a pull request!
bicho 0.0.3 released
Bicho is a ruby gem implementing access to bugzilla. It is a library but comes with a simple command line client.
This release fixes some bugs and adds support for named queries.
From the API, you can give a bug number or named query, or a combination of many of them:
server.get(127043, 432423) => [....]
server.get("Named list") => [....]
server.get("Named list", 4423443) => [....]
Named queries will be “expanded” to a list of bugs.
Or from the command line:
bicho -b https://user:pw@bugzilla.domain.com show query-name
If you are using Novell's bugzilla, Bicho includes a plugin that automatically authenticates using your .oscrc credentials.
Be sure to also check out Klaus’s bugzilla adapter for data-mapper, which is also powered by Bicho.
Picasa 3.8 on Linux (and fix web albums login on the way)
Today I found myself with Picasa for Linux (3.0 beta) not allowing me to login to web albums, even if I could login without problems from the web browser.
After googling a bit, it seems that Picasa 3.0 does not work anymore due to some Google+ related changes. On the other hand, it looks like Picasa for Linux has been abandoned (3.0 on Linux vs. 3.8 on Windows).
Picasa 3.8 has some interesting new features. Why no Linux version? Picasa for Linux is no more than Picasa for Windows bundled with wine plus some minor changes. OK, let’s make one ourselves.
I started by unpacking the original rpm and replacing the “Picasa3” folder in “Program Files” with the tarred content of “Program Files/Picasa3” resulting from installing Picasa 3.8 with wine. That worked, but it requires you to create this tarball yourself.
Then I went one step further: why not install the newer Picasa 3.8 inside the build section of the .spec file? Thanks to the wpkg project I figured out how to run the installer in unattended mode. The Build Service Tips & Tricks page explains how to run something that requires an X server using Xvfb.
So the spec file first unpacks the original Linux rpm in the buildroot. Then it runs the Windows-based installer in unattended mode using a temporary wine prefix, and copies the newly installed Picasa over to the buildroot, replacing the files from the original rpm. We use the official rpm as a base because it contains a custom wine and some other Linux integrations; however, it would be worth checking whether it behaves better with newer wine versions.
You can find the resulting .spec file in my home project.
I can’t redistribute the original rpm and the windows installer, so I include a fetch.sh (just like the spec file for the nvidia driver does) that will fetch those binary files.
To build it:
osc co home:dmacvicar picasa
cd home:dmacvicar/picasa
# get the files I can't redistribute
./fetch
osc build openSUSE_Factory
Now install it and you should have Picasa 3.8 on Linux, which also solves the issue of logging in to Picasaweb:
Factory and package guidelines
I see some changes going into Factory that apply current packaging guidelines to packages.
To summarize some of the cleanups you can find as submit requests (no guarantee that those are actual guidelines, check for yourself):
-# spec file for package dom4j (Version 1.6.1)
+# spec file for package dom4j

-# norootforbuild

-AutoReqProv: on

-Authors:
----------
- Bob Esponja
- Peter Parker

-make %{?jobs:-j%jobs}
+make %{?_smp_mflags}

+%check
+make check

%install
-rm -rf $RPM_BUILD_ROOT

-%clean
-rm -rf $RPM_BUILD_ROOT
-
proprietary drivers broken with Factory
ld.so.conf seems to have changed to include /etc/ld.so.conf.d/*.conf.
Unfortunately, proprietary drivers like the NVIDIA one installed a file without the .conf extension. Therefore Xorg did not find the custom OpenGL library included by the driver.
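Why the file was missed: the include line is a glob, so only names ending in .conf are read. A tiny Python illustration using fnmatch; the file names below are made up for the example and are not the real names shipped by the driver package.

```python
import fnmatch
from typing import List

INCLUDE_PATTERN = "/etc/ld.so.conf.d/*.conf"


def included(paths: List[str], pattern: str = INCLUDE_PATTERN) -> List[str]:
    """Return the paths that a glob-style include pattern would pick up."""
    return [p for p in paths if fnmatch.fnmatch(p, pattern)]


# Hypothetical contents of /etc/ld.so.conf.d/
files = [
    "/etc/ld.so.conf.d/example-driver",       # no .conf suffix: ignored
    "/etc/ld.so.conf.d/example-driver.conf",  # picked up
]
print(included(files))  # ['/etc/ld.so.conf.d/example-driver.conf']
```

Installing the driver's file with a .conf suffix is enough for it to be picked up again.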
Tracked as bug 718734.
home:dmacvicar:branches:X11:Drivers:Video contains a fixed nvidia-gfxG02 (Also updated to 280.13).
Realtime syntax checking with emacs
Note: this post is a web cache recover of a November 2010 post that got lost with the blog crash. I updated it to add C/C++ autocompletion.
One of the nice features of fat IDEs is that you get real time syntax checking. Some languages make it easy, some not.
For example Eclipse has access to the compiler as a service inside the IDE, and it checks the code as you type, even suggesting fixes.
I started to research what could be done on the emacs side to get a better experience, as when I am coding, I am usually thinking at the same time I write, and this means I make more syntax errors than the average guy.
So I found flymake, which is included by default in the emacs package. Copy-pasting some snippets from the emacs wiki, I got really nice real-time checking for ruby:
Going forward with python was easy. A flymake extension that uses pyflakes (a python checking program) was already available.
It was with C/C++ that things started to get more complicated. Flymake’s support for those is hardcoded to invoke make with a special target, and I wanted to check the current file only. Also, gcc’s syntax error messages are not that good.
So I started to modify the extensions I had seen to use clang (and clang++) from the LLVM project. Once it worked, I got nice error messages on C/C++ files:
I was so happy to have this working on C++ that I needed something more to challenge my newly acquired mastery. So I decided to try it with ycp. Not that I code with it that often, but I have colleagues who do. After adapting the extensions, here is the result:
You can find all the required files on my emacs setup repository. Follow custom.el which goes to custom-$lang.el to site-lisp/flymake-$lang.el.
Update 20.11.2010:
Thanks to this post I also managed to get Rails .erb templates working. Those are trickier because they can’t be parsed directly by ruby; they first have to go through erb -x and then through ruby -c. I ported the script to the loading style (init and load functions) I was already using.
Update 30.08.2011:
Thanks to this autocomplete extension, I was able to set up autocompletion using clang for C/C++. Here is how it looks:
Next! ruby autocompletion.
NVIDIA nightmare
NVIDIA continues to be a nightmare on my T410 hardware.
The proprietary driver does not resume correctly after the second suspend to RAM. Not sure if it has something to do with the dock.
The nouveau driver is still affected by the famous bug 26980, which has been causing random freezes on a set of GPUs for a while.
There was some hope in this comment, which points to this patch and states that using 2.6.38 with that patch improves the situation. I am trying to build Kernel:stable plus this patch here.
openSUSE clients with Spacewalk
My team has been busy over the last months with the release of SUSE Manager, which was received with very good reviews. There is a lot of room for improvement, some of it specific to SUSE products/integration, but some in Spacewalk itself.
There is a lot of work to do and a lot of patches being reviewed. Many of them are already upstream.
One common question is: if I already have a Spacewalk server, how do I setup openSUSE clients?
Thanks to Michael Calmer, we submitted the required packages to a repository in the openSUSE Build Service. You can find the instructions looking for the SUSE section in the “Registering Clients” page of the official Spacewalk wiki.