Duncan Mac-Vicar

Just another WordPress.com weblog



A better Java: Scala or Xtend?

I have been playing with two languages recently: Scala and Xtend.

Xtend and Scala have some similarities, but don’t let this make you think they are that similar. Both are JVM-based languages offering a refresh over Java, but the itch they scratch is different and the culture behind them differs even more.

Xtend focuses on fixing Java so that it is good enough to do the stuff you are already doing with Java without so much pain. It is like bringing Java to the state where C# is, which is something Sun and Oracle haven’t been able to do. Xtend compiles to Java source code, not bytecode. If you use it in Eclipse, you will see an xtend-gen folder with the generated code, which is in turn compiled to bytecode. Everything is transparent to the developer. Xtend is built on top of Xtext, a framework to write Domain Specific Languages and get Java/IDE support for free (example: a Cucumber-like DSL).

Scala is also a “better Java”, but it focuses more on providing a “Scalable Language”: a language that can be used in different paradigms or to build Domain Specific Languages (e.g. the Play! web framework features type-safe HTML templates thanks to Scala). It heavily mixes the object-oriented paradigm with functional programming. The functional aspect of Scala aims to make it easier to design concurrent programs. This goes beyond simple lambdas: the philosophy favors immutable objects and expressions over statements, and the library provides immutable collections and actors.

For developers just looking for a refresh, both provide:

Optional Semicolons

Yes. No need to write them in most cases.

val and var

In Scala:

val someVariable : String = "This will not change"
var someVariable : String = "This can change later"

In Xtend:

val String someVariable = "This will not change"
var String someVariable = "This can change later"

While “val” is not different from “final”, I really like this syntax. In Scala it is heavily used to make your brain always think about whether you really need to change a variable later. You soon realize that you don’t, and most variables are calculations that need to be initialized once.

Type inference

In the example above, you don’t need to specify the types. They will be inferred:

val someVariable = "This will not change"
var someVariable = "This can change later"

You want to specify the type for field members that are not initialized, or if you assign a concrete class (e.g. HashMap) but want the variable to be a Map.

Scala also has a nice feature, called “lazy val”:

lazy val someVariable = new VerySlowToConstructClass()

In this case, the variable will be initialized the first time it is accessed.

Lambdas

In Xtend

val lambda = [String s | s.length]

In Scala

val lambda = (s : String) => { s.length }

Both languages have shorter versions for special cases (no parameters, one parameter, etc). I am not going to describe them here.

Something really cool in Xtend: if you have an interface with one method used as a callback, the compiler will convert a lambda automatically to the interface type.

final JTextField textField = new JTextField();
textField.addActionListener(new ActionListener() {
  @Override
  public void actionPerformed(ActionEvent e) {
    textField.setText("Something happened!");
  }
});

Can be described using a lambda in Xtend like:

val textField = new JTextField
textField.addActionListener([ ActionEvent e |
  textField.text = "Something happened!"
])

Or, using the shorter lambda version I did not describe:

val textField = new JTextField
textField.addActionListener [
  textField.text = "Something happened!"
]

Lambdas are closures, so they capture variables from the current scope. Lambdas are not only useful for callbacks. I do miss Ruby’s each/map/collect in Java:

movies.filter[ (1980..1989).contains(year) ].sortBy[ rating ].last.year

Or map/reduce

val charCount = strings.map[s|s.length].reduce[sum, size | sum + size]

Extending libraries

Xtend has a really cool feature called extensions.

"hello".toFirstUpper // calls StringExtensions.toFirstUpper(String)
listOfStrings.map[ toUpperCase ] // calls ListExtensions.<T, R>map(List<T> list, Function<? super T, ? extends R> mapFunction)

Xtend already includes quite a lot of extensions for Java classes like String. Some are convenience methods, but some support language features, like the map example above. It is not very useful to support lambdas if your collections do not have forEach or map. I like how this feature is implemented. Simple and elegant. There is more awesomeness in this area. You can find out more about it in the guide.

Scala, on the other hand, has something called implicit conversions. They can be used with a pattern called “pimp-my-library” to extend existing APIs.

For example, to add a method headOr to the List class, one first creates a “wrapper” class with the method:

class ListExtensions[A](xs : List[A]) {
    def headOr(f : => A) : A = xs match {
        case h :: _ => h
        case Nil    => f
    }
}

Then you add an implicit conversion to this wrapper class:

implicit def listExtensions[A](xs : List[A]) = new ListExtensions(xs)

And then this should work:

println(List(1,2,3).headOr(0))     // ==> 1

Implicit conversions in Scala can be used to use lambdas as event listeners:

implicit def function2ViewOnClickListener(f: View => Unit) : View.OnClickListener = {
   new View.OnClickListener() {
     def onClick(view: View) {
       f(view)
     }
   } 
}

Then this works:

loginButton.setOnClickListener((v: View) => { println("CLICK!") })

This is not automatic as in Xtend. When I was playing with Scala & Android what I did was to create a “trait ScalaActivity extends Activity” with the implicit conversion and then make my activity “class MainActivity extends Activity with ScalaActivity”. Traits are cool, give them a look.

There is more stuff in both languages I am not going to spend time on, but you can go to the respective documentation.

For Xtend go to the documentation or this document called “20 Facts about Xtend”.

For Scala, I bought “the book”. However, you can also find more on the documentation site.

IDE support

I first tried Scala about a year ago (I can’t remember exactly) and the IDE support was so bad that I did not go further. This is no longer true. There has been quite a lot of investment in it lately and it is good enough already.

Xtend is part of Eclipse now. Therefore you can expect good IDE support. I found some glitches and weird messages, but in general it works fine.

Android

I tried both languages on Android.

Scala worked fine, but you have to use ProGuard to reduce the size of the application by removing unused methods and classes.

This guide was a good start. However I hit weird errors with the Treeshaker Proguard plugin I was using. This Stackoverflow answer put me back on the right track with the right ProGuard plugin.

I don’t feel comfortable with Scala on Android. Without an external tool to trim the jar, the generated code goes over the limit of methods that can be handled, even when developing. Not only that: the Scala runtime library is 8.7M:

-rw-r--r--  1 duncan users  8.7M Sep 17 00:01 org.scala-ide.scala.library_2.10.0.v20120820-123254-M7-1ab4994990.jar

Xtend was a surprise. Theoretically, as it just generates Java code, it should kind of work out of the box. This was not the case. It was harder than with Scala.

I spent quite some time trying to use the “R” Android class with Xtend. Basically:

setContentView(R.layout.activity_main);

And the answer was not intuitive for me.

setContentView(R$layout::activity_main)

This is a syntax issue. It is documented. But it was just too unexpected for me.

There is another issue you should know about. No debugging. The Dalvik VM does not support JSR-45 which is why debugging Xtend (and other Xbase languages) doesn’t work.

Once I accepted not being able to debug, I ran my program:

09-29 21:26:02.794: I/dalvikvm(11174): Failed resolving Lduncan/test/MainActivity$5; interface 514 'Lorg/eclipse/xtext/xbase/lib/Functions$Function1;'

Ok. The Xtend runtime libraries need to be put into the apk. I went to the build settings “Export and Order” and marked the Xtend library.

[2012-09-29 23:40:08 - Myapp] Error generating final archive: Found duplicate file for APK: about.html
Origin 1: /space/sw/eclipse/plugins/org.eclipse.xtend.lib_2.4.0.v201208210511.jar
Origin 2: /space/sw/eclipse/plugins/com.google.guava_10.0.1.v201203051515.jar

Ok, if I delete these files, everything will work (of course it will break again when I update Xtend with Eclipse):

zip -d /space/sw/eclipse/plugins/org.eclipse.xtend.lib_2.4.0.v201208210511.jar about.html
deleting: about.html
zip -d /space/sw/eclipse/plugins/org.eclipse.xtext.xbase.lib_2.4.0.v201208210511.jar about.html

Run again:

[2012-09-29 23:44:07 - MyApp] Error generating final archive: Found duplicate file for APK: plugin.properties
Origin 1: /space/sw/eclipse/plugins/org.eclipse.xtend.lib_2.4.0.v201208210511.jar
Origin 2: /space/sw/eclipse/plugins/org.eclipse.xtext.xbase.lib_2.4.0.v201208210511.jar

Ok. Now I get it. These are Eclipse plugins. There will always be duplicate files. This is not the solution.

What I did was to uncheck the “Export and Order” mark I had set for “Xtend Libraries”, copy the xtend.lib, xbase.lib and guava jars from the plugins into libs/ of my project, and remove about.html and plugin.properties from these 3 jars. Basically I made an un-OSGi version of the jars.

Also, a good tip is to change the “xtend-gen” folder in the Xtend settings to use the “gen” folder Android already uses to dump generated files from resources and others.

After that, everything worked fine and I got my application running on Android. The size of the Xtend and Xbase libraries is 7K and 90K. Guava is the heaviest dependency at 1.2M, but that is nothing compared to Scala. My application was 2M installed without ProGuard processing (which happens by default in release mode).

Criticism

Scala

In my opinion, Scala is much more mature. The syntax is well thought out. I was annoyed at using [] for generics and () to index arrays until I saw the explanation in the book. I am enjoying reading the book and learning the “Scala way”.

Scala has a unified type system. Unlike Java, everything is an object. You can write:

"some text".length
1.abs

The problem starts when you realize there is Null, null, Nil, Nothing, None, and Unit (a.k.a. void). Also Any, AnyRef and AnyVal. If you want to learn about those you can read this post. Or you look at scala.collection.immutable.List for more information about the type:

sealed abstract class List[+A] extends LinearSeq[A] with Product with GenericTraversableTemplate[A, List] with LinearSeqOptimized[A, List[A]]

Oh, and there is a generic called “Either[+A, +B]”.

I know there is a reason for this. I know at some point a page in the Scala book will explain it. But the learning curve is steep. I was writing very basic code and kept asking myself how to do a lot of stuff. The biggest problem in my opinion is that Scala, being “Scalable”, allows you to write the code in various ways. One is the Scala way. The other is not. Sometimes one is the intuitive one. Sometimes the other is too slow.

An example of this is the Quick Pimp Library Pattern. This allows you to write the implicit conversion more easily by not having to create a “wrapper” type.

implicit def string2toInt(s: String): { def toInt: Int } = new {
  def toInt: Int = java.lang.Integer.parseInt(s)
}

But for some reason (see the link) this is slow. There was also some speed issue with “for” loops if you write the “for” in the wrong way (I forgot the details).

The Scala culture gives me a feeling very similar to the one I get from the C++ subworld that does heavy template metaprogramming and policy-based design. I like Boost. But I have to be very awake to decode the signatures and understand how to use a library. The design and the quality are top-notch, but it is an advanced device. You need to invest quite a bunch of time in it and make sure those features will actually pay off. I am not against a language that requires some CS background and a type system that requires you to think a bit. But if you are going to master it, you have to be aware of the costs and benefits.

Xtend

Xtend was a pleasure to write and read. I got used to it very fast. The documentation is a single page guide. The type system is the same as Java.

However, Xtend is not fully mature yet (despite the version being 2.x). I was playing with the JFugue library and I tried to do this:

rhythm.addSubstitution('O', "[BASS_DRUM]i")

But I got an error, because the char was interpreted as a string. Then I realized that Xtend does not have character literals yet. The solution:

rhythm.addSubstitution('O'.charAt(0), "[BASS_DRUM]i")

Then when playing with Android, I tried to use one of the greatest features: automatic conversion of lambdas to interfaces with one callback method:

button.onClickListener =  [
  this.textView.text = "Foo" 
] 

But it did not compile! The simplest example did not work. Disclaimer: I was not using the released version. But I was not using the nightly builds either. I was using the milestones. I got an “Incompatible types” error. I updated the Eclipse plugin to the current milestone, restarted Eclipse, and everything was working. (Facepalm).

Conclusions

Both are great languages. If you are doing servers and services, I’d pick Scala. But be prepared to invest some time learning the culture behind it.
Learning Scala gives you the most from the Play! Framework. It can also be used from Java, but it is not the same.

If you are only looking for a Java refresh, go with Xtend. You will learn it in one night and will benefit from the value it provides. Once the quirks with Android get resolved (especially the debugger), it will be a great addition to Android development. Being an Eclipse project means the IDE will be a first-class citizen and releases will be stable enough to ship with Eclipse itself.



C++ does have automatic memory management.

Some weeks ago someone was discussing what language to use to write a small library. He wanted to go with C. I suggested that using C++ for the implementation while keeping a pure C API is an interesting alternative. He said C++ added complexity but no benefits like the automatic memory management Go would bring, and in that case C would be the best option.

This is a common misunderstanding. Modern C++ does have automatic memory management. It does not have a Garbage Collector, which is only one form of automatic memory management.

Consider the following:

void someFunction()
{
    SomeClass *obj = new SomeClass();
    ...
    delete obj;
}

Here you have to take care that obj gets deleted before leaving the function. When the function ends, only the pointer is destroyed, because it is a local variable allocated on the stack, but not the object the pointer is pointing to.

The problem begins with exceptions. If the code between the allocation and the delete throws an exception, delete is not called and the object is not deleted. Of course you can catch the exception, delete the object and then rethrow the exception. Not very nice.
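The catch, delete and rethrow approach could look roughly like the following sketch. The instance counter in SomeClass, the doWork() stand-in for the elided body, and the throwsWithoutLeak() helper are all hypothetical and only there to make the cleanup observable.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class that counts live instances so leaks become visible.
struct SomeClass {
    static int alive;
    SomeClass() { ++alive; }
    ~SomeClass() { --alive; }
};
int SomeClass::alive = 0;

// Hypothetical stand-in for the "..." in the function body; it always throws.
void doWork() { throw std::runtime_error("something failed"); }

void someFunction()
{
    SomeClass *obj = new SomeClass();
    try {
        doWork();
    } catch (...) {
        delete obj;   // clean up by hand on the error path
        throw;        // rethrow the original exception to the caller
    }
    delete obj;       // clean up again on the normal path
}

// Returns true if someFunction() threw and still left no object behind.
bool throwsWithoutLeak()
{
    assert(SomeClass::alive == 0);   // nothing is alive before we start
    try {
        someFunction();
    } catch (const std::runtime_error &) {
        return SomeClass::alive == 0;
    }
    return false;
}
```

It works, but the cleanup has to be written twice, and every new exit path needs the same care.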

Enter RAII

RAII is a name I dislike. It stands for “Resource Acquisition Is Initialization”, and it is a powerful concept in C++: the language guarantees that the destructor gets called for an object that is allocated on the stack when it goes out of scope.

// global mutex
Mutex mutex;

void someFunction()
{
     mutex.lock();
     ...
     mutex.unlock();
}

Here we have the same problem. If an exception is thrown, the mutex is never unlocked! Let’s create a solution based on the concept of RAII.

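class Lock
{
public:
    // Locks the mutex as soon as the Lock is constructed
    Lock(Mutex *m)
      : _mutex(m)
    {
        _mutex->lock();
    }

    // Unlocks the mutex when the Lock goes out of scope
    ~Lock()
    {
        _mutex->unlock();
    }
private:
    Mutex *_mutex;
};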

And now incorporate it into the code:

void someFunction()
{
    Lock lock(&mutex);
    ...
}

As soon as the lock is constructed, it locks the mutex. If an exception is thrown or the function ends, lock goes out of scope and its destructor is called. The destructor calls unlock() on the mutex, to which we stored a pointer when constructing the lock. The lock itself does not need to be deleted as it is a local variable on the stack.

Now imagine you want to extend this to various classes that can provide lock() and unlock(). You would create a template for the class.
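A minimal sketch of such a template, assuming only that the type offers lock() and unlock(). The names ScopedLock and FakeMutex are made up for this illustration, and copying is disabled in the pre-C++11 style because a copied guard would unlock twice.

```cpp
#include <cassert>

// Generic RAII guard: works with any type that has lock() and unlock().
template<class Lockable> class ScopedLock
{
public:
    explicit ScopedLock(Lockable *l)
      : _lockable(l)
    {
        _lockable->lock();
    }

    ~ScopedLock()
    {
        _lockable->unlock();
    }
private:
    // Declared but not defined: copying a guard would unlock twice.
    ScopedLock(const ScopedLock &);
    ScopedLock &operator=(const ScopedLock &);

    Lockable *_lockable;
};

// Hypothetical lockable type that only records its state.
struct FakeMutex {
    bool locked;
    FakeMutex() : locked(false) {}
    void lock() { locked = true; }
    void unlock() { locked = false; }
};

// Returns true if the mutex was held inside the scope and released after it.
bool guardLocksAndUnlocks()
{
    FakeMutex m;
    assert(!m.locked);               // starts unlocked
    bool heldInside = false;
    {
        ScopedLock<FakeMutex> guard(&m);
        heldInside = m.locked;
    }                                // guard's destructor runs here
    return heldInside && !m.locked;
}
```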

So how can this concept be used for memory management? Let’s take the Lock and implement a Deleter from it. We will use a template:

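template<class T> class Deleter
{
public:
    // Takes ownership of the pointer
    Deleter(T *ptr)
      : _ptr(ptr)
    {
    }

    // Deletes the owned object when the Deleter goes out of scope
    ~Deleter()
    {
        delete _ptr;
    }
private:
    T *_ptr;
};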

See? It is the same as the Lock, but without the lock() call, and instead of calling unlock() we delete the object.

Now, you may ask yourself what the use is of having the pointer wrapped in another object. True. Let’s provide access to it. Thanks to C++ operator overloading, we can overload the -> operator, which is the one we already use when working with pointers.

template<class T> class Deleter
{
    // ...
    // ...

public:
    // Forwards member access to the wrapped pointer
    T* operator-> ()
    {
        return _ptr;
    }

private:
    T *_ptr;
};

Now we can access the underlying pointer in a natural way. Exactly the same as we would use it if it was naked and not wrapped in this smart deleter:

void someFunction()
{
    Deleter<SomeClass> ptr(new SomeClass());
    // hello() is a method of SomeClass
    ptr->hello();
    ...
}

When we call ptr->hello() the -> operator of the Deleter returns the wrapped pointer, so we can directly call methods on the object.

Now if an exception is thrown or the function returns, ptr goes out of scope. Deleter’s destructor is called and the object is deleted.

This is called a smart pointer. This implementation is the most basic one. If you try to return it from the function it will not work as it would create a copy of the Deleter, pointing to the same underlying pointer and it would delete the object twice. But from here you can extend the concept by implementing copy constructors that add reference counting and much more.
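As a rough illustration of that extension, here is a minimal reference-counted variant. It is only a sketch under simplifying assumptions: it is not thread-safe, it leaks the object if allocating the counter throws, and the names CountedPtr, useCount(), Tracked and makeTracked() are invented for the example.

```cpp
#include <cassert>

template<class T> class CountedPtr
{
public:
    explicit CountedPtr(T *ptr)
      : _ptr(ptr), _count(new int(1))
    {
    }

    // Copying shares the object and bumps the shared counter.
    CountedPtr(const CountedPtr &other)
      : _ptr(other._ptr), _count(other._count)
    {
        ++*_count;
    }

    CountedPtr &operator=(const CountedPtr &other)
    {
        if (this != &other) {
            release();
            _ptr = other._ptr;
            _count = other._count;
            ++*_count;
        }
        return *this;
    }

    ~CountedPtr() { release(); }

    T *operator->() { return _ptr; }
    int useCount() const { return *_count; }

private:
    // Drops one reference; the last owner deletes the object and the counter.
    void release()
    {
        if (--*_count == 0) {
            delete _ptr;
            delete _count;
        }
    }

    T *_ptr;
    int *_count;
};

// Hypothetical class that counts live instances.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returning by value is now safe: copies share one counter.
CountedPtr<Tracked> makeTracked()
{
    CountedPtr<Tracked> p(new Tracked());
    return p;
}

// Returns true if the object survived the return and was deleted exactly once.
bool returnedPointerDeletesOnce()
{
    assert(Tracked::alive == 0);
    {
        CountedPtr<Tracked> a = makeTracked();
        CountedPtr<Tracked> b = a;
        if (Tracked::alive != 1 || a.useCount() != 2)
            return false;
    }
    return Tracked::alive == 0;
}
```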

This is all now built-in

Originally, you could get smart pointer implementations from the Boost smart pointer library. However, C++11 already incorporates most of them in the standard library.

auto_ptr and unique_ptr

The basic implementation I showed you before is provided by the standard library as std::auto_ptr. The only difference is that auto_ptr is more robust: if it is copied, the original one loses the pointer (it gets set to 0) in order to avoid double deletion:

#include <iostream>
#include <memory>
using namespace std;

// placeholder class so the example is complete
class SomeClass {};

int main(int argc, char **argv)
{
    SomeClass *c = new SomeClass();
    auto_ptr<SomeClass> x(c);
    auto_ptr<SomeClass> y;

    // assignment transfers ownership from x to y
    y = x;

    cout << x.get() << endl; // this one is 0
    cout << y.get() << endl; // this one is some memory address

    return 0;
}

In C++11 auto_ptr was deprecated (but is still available) and replaced by std::unique_ptr, which can’t be copied, but whose pointer can be transferred.

std::unique_ptr<SomeClass> p1(new SomeClass());
std::unique_ptr<SomeClass> p2 = p1; // this will not compile
std::unique_ptr<SomeClass> p3 = std::move(p1); // You can manually transfer it though, and p1 will be set to 0

shared_ptr

std::shared_ptr adds reference counting, so if you copy it, the copy points to the same object but the reference count gets increased. When the last one gets destructed the object is deleted.

std::shared_ptr<SomeClass> p1(new SomeClass());
std::shared_ptr<SomeClass> p2 = p1;
// the pointer will be deleted when both go out of scope and the reference count goes to 0

The problem with reference counting is that if two objects reference each other, the counts never go to zero. What you do in this case is have one object hold a shared_ptr while the other holds a weak_ptr. A weak_ptr does not increase the reference count of the shared_ptr. When you want to access the underlying object, you can obtain a shared_ptr from the weak_ptr. But you will need to test the pointer first, as the object may have been deleted.
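A small sketch of that arrangement using the C++11 standard library. The Parent and Child types and the weakReferenceExpires() helper are hypothetical; the parent owns the child through a shared_ptr and the child only keeps a weak_ptr back to the parent.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
    std::shared_ptr<Child> child;   // owning reference
};

struct Child {
    std::weak_ptr<Parent> parent;   // non-owning back reference, breaks the cycle
};

// Returns true if the back reference works while the parent lives
// and reports expiry once the parent is gone.
bool weakReferenceExpires()
{
    std::shared_ptr<Child> child = std::make_shared<Child>();
    {
        std::shared_ptr<Parent> parent = std::make_shared<Parent>();
        parent->child = child;
        child->parent = parent;

        // The weak_ptr did not increase the parent's reference count.
        assert(parent.use_count() == 1);

        // lock() yields a shared_ptr; test it before using it.
        std::shared_ptr<Parent> p = child->parent.lock();
        if (!p)
            return false;
    }
    // The parent was destroyed at the end of the scope above,
    // because only the weak reference remained.
    return child->parent.expired() && child->parent.lock() == nullptr;
}
```

Had Child held a shared_ptr to Parent instead, neither count would ever reach zero and both objects would leak.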

Conclusion

C++ does provide automatic memory management. It requires more thinking than a garbage collector but it is more deterministic and it is built on principles that can be used for general resource management like files, locks, temporary files, etc.

In addition to that you get real strings, containers, etc. plus full C interoperability. You can keep the complexity away from your programs by choosing carefully which idioms and concepts your program will use. The complexity can stay with library developers (e.g. Boost) who need to use all C++ features in order to provide flexible components.

Nothing new?

  • Objective-C implements reference counting in two ways. Also there is garbage collection, but it is deprecated.
  • Vala and gobject (the underlying technology), use reference counting, and also have the concept of a “weak” reference.
  • Ruby implements RAII using blocks. This is used across the stdlib: File.open, Dir.mktmpdir, etc. Once you escape the block the resource is closed, deleted, cleaned, etc, depending on the resource type.
  • You can simulate some RAII in Java using “finally” in the function body, but it does not work for objects that are passed around.



Be aware of the Garbage Collector when accessing C/C++ from Ruby

When you wrap C/C++ code into a language like Ruby which has a garbage collector, you have to be very careful because the GC knows about the Ruby objects referencing each other, but not about the underlying C objects. You need to manually hint to the GC that two Ruby objects are connected through their underlying pointers, so that the GC does not deallocate one, freeing the whole pointer chain and leaving another Ruby object with a dangling pointer.

As an example: you have a Ruby object r1 that points to a Widget *w1, and r2 points to a Widget *w2. The parent of w2 is w1, and if you delete the w1 pointer all its children will be deleted.

In Ruby’s mark & sweep GC this is achieved by implementing the mark() hook correctly and calling rb_gc_mark() from there. For the example above it would be something like:

void
ui_widget_mark(YWidget *wg)
{
  // ...
  // mark our child _ruby_ objects by finding the C
  // ptrs and looking the ruby counterparts in the hash
  for (YWidgetListConstIterator it = wg->childrenBegin();
       it != wg->childrenEnd();
       ++it) {
    YWidget *child = *it;
    VALUE rb_child = widget_object_map_for(child);
    if (!NIL_P(rb_child))
      rb_gc_mark(rb_child);
  }
}

This can sometimes be tricky, and you may need to keep some extra metadata in the binding’s code in order to figure out the dependencies between objects.

For example, libyui keeps a dialog stack, but does not provide access to it. You can’t delete the dialogs in a different order than the stack one. However, if you have Ruby references to multiple dialogs, the GC may kick in starting with any dialog. You have to tell the GC to mark all parent dialogs when marking the Ruby object corresponding to a given dialog. As I don’t have access to the stack, I have to keep my own stack (implemented with a list) in order to implement mark() correctly.



SUSE Manager, a year later retrospective II (Backstage)

I posted a retrospective of what we did for our customers in SUSE Manager during the last year. This is a continuation of that post, focusing on what is going on behind the scenes.

Testing

At SUSE we love to test a lot. We sell mission-critical Linux, right? If a package .spec file manages to pass the very strict suite of checks we have enabled in the build service, that is only the first step. We also do testing at various other levels. Our internal Jenkins instance has hundreds of jobs testing the code SUSE contributes before it is packaged.

For SUSE Manager we had the challenge of various inter-dependent components that we were not very familiar with at the beginning. We decided to test the end-user functionality and model the tests as you would do when using Behavior-Driven-Development, describing every test-case in a human-friendly way. The test-case implementation prepares some of the required environment based on the test-case descriptions.

One of the first things we open-sourced was this testsuite we created based on Cucumber, Capybara and Selenium.

For every commit in our git tree we use Jenkins to send packages to the build service. A few minutes later the build service not only rebuilds the changed packages but also creates the appliance images. A job takes those and deploys them on a server and a client.

After that the testsuite is launched and more than 200 regression and feature tests are run against the server.

We also added screenshots for failed tests. If a test fails, we can see exactly where the problem was, and fix it.

At the beginning, this test-suite allowed us to build an agile process while assimilating a big code-base we were not very familiar with. Today, now that we are contributing to the upstream project, the test-suite is one of the building blocks of our continuous integration process, allowing us to have a shippable product every night and add features with more confidence. For our customers it means enjoying SUSE Manager with the SUSE level of quality they are used to.

Security

SUSE products are continuously audited for security problems. Thomas has done a great job in the last years introducing security into the development process and educating developers about the topic. We found some problems and they were reported upstream using the standard procedures, most of the time with patches attached.

Upstream

You may already know that SUSE Manager is built from the Spacewalk project source code. When we shipped, we had done a lot of porting work which resulted in a lot of patches. It was a challenge for us at that point to become contributors to the project. Engineering and Project Management in products based on open-source components involve a lot of trade-offs between custom patches and up-streaming, between re-basing and forking. None is right. You play with them in order to satisfy your schedules, lower the overhead of custom patches (fork-debt) and add features without sacrificing stability.

We are very happy with the results. We have contributed back everything that was possible and feel very welcome in the upstream community. We are very grateful to the Spacewalk developers: Miroslav Suchý, Jan Pazdziora, Thomas Lestach, Stephen Herr and others, who reviewed our patches and gave very insightful feedback.

Of course our git tree is different. There are patches that need some work in order to be up-streamed. But I think we made clear that we want to support the project. Where possible we are trying to upstream patches before we release them into our tree.

I hope I was able to give you an insight into how SUSE Manager is being developed. We have more topics to share, maybe in a future post.



SUSE Manager, a year later retrospective

It has been more than a year. Around March 2011 we shipped SUSE Manager 1.2 and enhanced the management story for our customers. Since then we have been very busy! Time to look back and see what we have done. This first post will describe the features we have been working on. In a future post I will address more details about our development process and relationship with Spacewalk.


Setup reinvented

SUSE shines not only in the number of certified enterprise applications but also in the appliances area with tools like SUSE Studio. We allow our customers to build custom SUSE-based distributions with a few clicks.

When we set out to build SUSE Manager as a product we decided to eat our own dog-food. After looking at the installation procedures of Spacewalk we found a natural way to make setting up SUSE Manager simple by using our existing technologies.

  • Appliance form-factor: SUSE Manager is a simple bare-metal or virtual appliance. Just boot it, answer a few questions and you have a SUSE Manager server running.
  • YaST-based setup and migration: a first-boot work-flow assists you with any configuration and data migration.

Creation of SUSE Manager-ready appliances from SUSE Studio

Not all the cool stuff happens in SUSE Manager itself. The Studio team added a feature that allows you to create appliances in SUSE Studio that are SUSE Manager-ready. This means once the image boots, it will automatically register itself to your SUSE Manager server and be ready to be managed.

James did a very nice demo at BrainShare creating an image in SUSE Studio, deploying it to a private OpenStack cloud directly from the Studio user interface, and having the machine automatically register itself to SUSE Manager after booting. Watch it here.

Audit logging

Regulatory and corporate auditing requirements oblige our customers to record which actions were performed on the managed systems, and by whom. We introduced an audit logging feature that allows you to record actions to a remote log, a database, XML files, etc.

Audit Log Keeper, the buffer that receives the actions from the application, is not specific to SUSE Manager and any application can be integrated using XML-RPC. Keeper is open-source and available on GitHub.

Deploying images from SUSE Studio

SUSE Manager can deploy images to a physical host so that they run as virtual machines. If you are a SUSE customer, you will use Studio to create images. Create the image in Studio, download it, upload it to SUSE Manager, deploy…? No way.

We added a feature to deploy the images from Studio directly in the SUSE Manager user interface. The code is already being reviewed upstream.

Code10 client support

For our customers running SLE-10 we back-ported the Code11 ZYpp stack (including a very fast zypper using the SAT solver). The Code11 stack includes a plugin architecture that we use to hook into the Spacewalk agent in order to get the server-side repositories and keep the managed server’s software inventory up-to-date.

SUSE Linux Enterprise Point of Service

Joe has been working with the SLEPOS team making sure that there is a story for them to work together. Check his blog post to know more.

SUSE Manager Mobile (Android)

During the last hack-week, part of the team took on the mission of thinking about how we could bring some of the SUSE Manager functionality to your mobile phone. We went beyond thinking and completed a prototype, which was presented at BrainShare.

Today, we are releasing it, and you can get it for free from Google Play. Have fun with it!



kvm setup for laptops with NetworkManager using bridges or openvswitch and NAT

On my workstation I have a static network setup: I don’t give an IP to eth0 but configure DHCP to give the IP to a bridge br0 attached to eth0. Then qemu-kvm creates tap devices attached to the bridge, getting IPs in the same network as the host.

On my laptop I run NetworkManager, which does not play well with bridges. It seems that in other distributions you can tell the network configuration to use NetworkManager for certain interfaces only (which is still not exactly what I want). After a lot of reading, I found a configuration that fits my needs.

The idea is to create a bridge, and instead of having it attached to eth0 (which is controlled by NetworkManager), we create the bridge in a separate network and use NAT to have the VMs access the internet.

I found a script by Amos Kong that sets up the network (adapted to the paths of brctl in openSUSE):

#!/bin/bash

# Script used to add/remove setup of private bridge and dnsmasq
# @author Amos Kong 

brname='br0'

add_br()
{
    echo "add new private bridge"
    /sbin/brctl addbr $brname
    echo 1 > /proc/sys/net/ipv6/conf/$brname/disable_ipv6
    echo 1 > /proc/sys/net/ipv4/ip_forward
    /sbin/brctl stp $brname on
    /sbin/brctl setfd $brname 0
    ifconfig $brname 192.168.58.1
    ifconfig $brname up
    # add iptable entry as libvirt, then guest can access public network
    iptables -t nat -A POSTROUTING -s 192.168.58.254/24 ! -d 192.168.58.254/24 -j MASQUERADE
    /etc/init.d/dnsmasq stop
    /etc/init.d/tftpd-hpa stop 2>/dev/null
    dnsmasq --strict-order --bind-interfaces --listen-address 192.168.58.1 --dhcp-range 192.168.58.2,192.168.58.254 $tftp_cmd
}

del_br()
{
    echo "cleanup bridge setup"
    kill -9 `pgrep dnsmasq|tail -1`
    ifconfig $brname down
    /sbin/brctl delbr $brname
    iptables -t nat -D POSTROUTING -s 192.168.58.254/24 ! -d 192.168.58.254/24 -j MASQUERADE
}

# clean original setup first
del_br 2>/dev/null

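# any argument: set up the bridge; exactly two arguments: also enable tftp
if [[ $# -gt 0 ]]; then
    if [[ $# -eq 2 ]]; then
        # setup tftp function: $1 is the tftp root, $2 the boot file
        tftp_cmd=" --enable-tftp --tftp-root $1 --dhcp-boot $2 --dhcp-no-override"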
    fi
    add_br
fi

Calling the script with no arguments will remove the bridge. Calling it with any single argument (for example “1”) sets up the bridge and NAT and also runs dnsmasq (DHCP server and DNS cache: zypper install dnsmasq) on the bridge. Calling it with two arguments (the TFTP root directory and the boot file) will also set up a TFTP server in the dnsmasq process.

Then you need a pair of scripts for qemu, which are called with the tap device as a parameter.

#!/bin/sh
switch='br0'
/sbin/ifconfig $1 0.0.0.0 up
/sbin/brctl addif ${switch} $1
/sbin/brctl setfd ${switch} 0
/sbin/brctl stp ${switch} off

And for bringing down the interface:

#!/bin/sh
switch='br0'
/sbin/ifconfig $1 0.0.0.0 down
/sbin/brctl delif ${switch} $1

Then you run the VM like:

qemu-kvm -boot c -drive file=./disk.qcow2,if=virtio -m 2500 -net nic,macaddr=XX:XX:XX:XX:XX:XX -net tap,script=script-ifup,downscript=script-ifdown "$@"
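The macaddr=XX:XX:XX:XX:XX:XX placeholder needs a real address that is unique per VM. A minimal sketch for generating one, assuming the 52:54:00 prefix conventionally used for QEMU/KVM guests; the helper name random_qemu_mac is made up for this example and is not part of the scripts above:

```shell
#!/bin/sh
# Hypothetical helper: print a random MAC address with the 52:54:00 prefix
# conventionally used for QEMU/KVM guests.
random_qemu_mac() {
    # Read three random bytes as two-digit hex values, e.g. " 3f a0 07".
    # The unquoted substitution is intentional: word splitting fills $1..$3.
    set -- $(od -An -N3 -tx1 /dev/urandom)
    printf '52:54:00:%s:%s:%s\n' "$1" "$2" "$3"
}
```

It could then be used as -net nic,macaddr=$(random_qemu_mac) in the command above. Keep in mind that a new address on every start usually means a new DHCP lease from dnsmasq, so you may prefer to generate it once per VM and store it.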

You can use the same setup with openvswitch. There is a package in the network project of the build service, but the package is tied to the xenserver configuration so I did not get it running. I redid the package based on the Debian one, which not only is separated in subpackages but does not assume you are running Xen. The package is available here until the submit request is accepted.

Then, in the setup script, change the brctl addbr line to use ovs-vsctl:

add_br()
{
    echo "add new private bridge"
    ovs-vsctl add-br $brname
    echo 1 > /proc/sys/net/ipv6/conf/$brname/disable_ipv6
    echo 1 > /proc/sys/net/ipv4/ip_forward
    /sbin/brctl stp $brname on
    /sbin/brctl setfd $brname 0
    ...

And for the qemu scripts, use ovs-vsctl add-port instead of brctl addif:

#!/bin/sh
switch='br0'
/sbin/ifconfig $1 0.0.0.0 up
ovs-vsctl add-port ${switch} $1
/sbin/brctl setfd ${switch} 0
/sbin/brctl stp ${switch} off

And for bringing down the interface:

#!/bin/sh
switch='br0'
/sbin/ifconfig $1 0.0.0.0 down
ovs-vsctl del-port ${switch} $1



On Java, Maven, JPP and rpm

Java on Linux has always been a “special” topic. They don’t mix well.

The mindset of Linux distributions is very different from that of the Java world when it comes to building software. This is understandable as they have different requirements.

In the Java world, there is the concept of artifacts. You build org.foo.bar:bar-moo:1.1 once and it stays there forever, archived for anyone to use. Tools like maven and ivy allow developers to specify in their source tree the specific dependencies of their components. Those are grabbed from the network, the software is built, and the output is published as a new artifact that others can grab.

Linux distributions on the other hand, bootstrap the complete stack from source. They don’t take the binary artifact from upstream but build it, and then use the binary they built to build the next. This seems to work pretty well for C, C++, and for Ruby, Python, let’s say it “works”.

When it comes to packaging Java software, Linux distributors find themselves in the following situation:

  • If a buildable source tarball is provided then you are lucky.
  • If the buildable tarball is provided, it will either include a directory full of binary jars (the build dependencies) or it will have a very automatic build system grabbing them from the network.

This clashes with Linux on several fronts:

  • Linux distributions have normally one version of each component. The Java method works well if you bundle your dependencies inside your application, but not if everything is a reusable component. I have mixed feelings here. I think bundling your dependencies in the application for everything that is not part of the “base” system is the right approach. Updating them can break the application and trying to control this via QA only moves QA from the application developers to the distribution itself. When I ship apache-commons-collection as part of my Foobar Java application as a Java package, I am inviting everyone to use it, forcing myself to give support for it out of the context of the application.
  • Distributions need to build from source. Even if you get rid of the above requirement and you bundle all your dependencies, distributions want to build everything from source. This has technical and legal reasons. The SUSE build system does very complex checks on every package that it builds. Those checks are part of the quality we sell to our customers. Other reasons are legal: I am still trying, for example, to build the Play! framework. Even if it is BSD, it includes some .jars of unknown origin. What would happen if one of these jars turns out to be proprietary? Michael Vyskocil had a similar issue with openproj and its bundled dependencies.

    Another reason to build from source is support. Enterprise distributions sell support and if a customer has a problem, we will fix it on our own and not wait for upstream to release a new version. Having a standardized way to build from source with our own fixes allows us to serve our customers. We can bundle jars in our application, but if a bug traces back to a jar we included, we would need to change the complete build description of the product in order to build this component ourselves, if we are able to rebuild it at all. It already happened to us once with an XML-RPC library, and we were glad that it could be fixed by adding a patch to the rpm build description.

  • Grabbing dependencies from the network just does not work. All packages are built without network access for security reasons.

Because the Linux distributions know that they are not the center of the universe, they adapted. At the beginning things were still ok. Ant was very popular and basically you recursively packaged all build dependencies until you could build your package, always in the same way:

  • unpack
  • delete all binary jars
  • set CLASSPATH to the jars grabbed by the packaged build dependencies
  • call ant
  • install the jars

Something like this:


%prep
%setup -q

# remove all third party jars
find . -iname '*.jar' | xargs rm -rf

%build
export CLASSPATH=$(build-classpath foo)
ant

As long as this was true, the world was still fine. ant needed bootstrapping, but this was doable.

Until Maven…

Maven is at the same time revolutionary and one of the biggest atrocities I have seen when building software.

On the positive side:

  • it defined a common convention for modules: groupId, artifactId, version.
  • it defined a standard layout for the source tree
  • it started a wave of convention over configuration that Java was always lacking

On the negative side:

  • it requires itself to build itself
  • it can’t build much itself, so it requires plugins to build anything
  • plugins require maven to be built, plus more dependencies, which usually require maven to build, plus… plugins.

All the above means that maven basically requires all the software it is supposed to build. Not the best design for a build system.

To make things worse, maven grabs dependencies from the network, which is what is disabled in our distro build process.

Fedora has made quite some progress providing a maven stack by improving and extending the conventions the JPackage project started for maven packages. This is implemented using what is called a “dependency map”.

The approach works by installing some xml files per-package that map the maven artefact identifiers (groupId, artifactId) to a local jar in the system. Then maven itself is patched to include a resolver for artefacts with some properties:

  • ignores multiple versions (usually in Linux you have one version installed)
  • resolves the artefact names to local installed jars. Every package uses macros to add stuff to the dependency map

What I don’t like about this approach is:

Why would anyone in their right mind use XML files to create mappings to files when you are in a UNIX-like OS and you have the filesystem and symbolic links?

It is very explicit. It does not rely on a simple convention.

A second issue is how packages are built. This is SUSE specific. Fedora can bootstrap packages with circular dependencies by introducing a binary package A and building other dependencies until it can build a real A. Once a package is built, it stays frozen in the buildsystem as an artefact (just like in the Java world).

In the openSUSE Build Service, the repository is always ready to bootstrap. For circular dependencies you create a package A-bootstrap that provides A and set the project config to prefer A. When A does not exist, A-bootstrap is grabbed, but as soon as A is there, it is preferred and used. When a package changes, the packages depending on it are rebuilt automatically. This approach has several advantages, but makes it hard to bootstrap a collection of packages where everything depends on everything.

In openSUSE, we have successfully built many maven dependent packages in the Java:base project without having a maven package by using the maven ant plugin to generate a tarball with ant build files.

This method does not work for every package, especially when files are generated, because then one also needs to include those. But it may be good enough for solving our specific bootstrapping problem. The question is how many bootstrap packages we would need.

Another idea is to use packages with binary jars for bootstrapping.

Fedora is not very happy with the current situation either, and they have been researching adding native support to Koji to build maven packages.

In any case, I think there is room for improvement everywhere. I think the Maven infrastructure can be simplified taking into account that what maven contributed to the world was a (now) popular way to identify a module, and this is now being used also outside of Maven. Apache Ivy, SBT, Gradle, etc. all support maven-style repositories and support referring to an artefact as groupId:artifactId:version.

Why not instead of a depmap just have:

/usr/share/java/foo.jar
/usr/share/java/org.bar/foo.jar -> /usr/share/java/foo.jar
/usr/share/java/org.bar/foo.pom

And have the patched Maven resolver just look there?

If you need parallel versions, then just

/usr/share/java/foo1.jar
/usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.jar -> /usr/share/java/foo1.jar
/usr/share/java/org.bar/2.0/foo.jar -> /usr/share/java/foo2.jar
/usr/share/java/org.bar/1.0/foo.pom
/usr/share/java/org.bar/2.0/foo.pom

Or use the standard alternatives:

/usr/share/java/foo.jar -> /etc/alternatives/foo.jar

The resolver would first look for the specific version described in the .pom file as /usr/share/java/$groupId/$version/$artifactId.ext. If it is not found, it could fall back to just looking for /usr/share/java/$groupId/$artifactId.ext. This supports most cases where we just have one version for the system and exceptions for some packages where providing a specific version in parallel is also required. If the same jar is also known under a different groupId, well, then you create another symlink.
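The lookup order described above could be sketched in a few lines of shell. This only illustrates the convention and is not an existing resolver; the names resolve_artifact and JAVA_DIR are assumptions made for the example:

```shell
#!/bin/sh
# Sketch of the proposed lookup order. JAVA_DIR and resolve_artifact are
# illustrative names, not part of any existing tool.
JAVA_DIR="${JAVA_DIR:-/usr/share/java}"

# usage: resolve_artifact groupId artifactId version [extension]
resolve_artifact() {
    group=$1 artifact=$2 version=$3 ext=${4:-jar}

    # 1. Try the exact version requested in the .pom.
    candidate="$JAVA_DIR/$group/$version/$artifact.$ext"
    if [ -e "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi

    # 2. Fall back to the single system-wide version.
    candidate="$JAVA_DIR/$group/$artifact.$ext"
    if [ -e "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi

    echo "cannot resolve $group:$artifact:$version" >&2
    return 1
}
```

Because the test is a plain file existence check, a symlink created for a second groupId is found the same way as a real file.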

Then, build-classpath is enhanced so that in addition to being able to say ‘build-classpath commons-logging’ you can also call ‘build-classpath commons-logging:commons-logging’. Identify every module by this convention.
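To make the idea concrete, here is a rough sketch of how such a key could be mapped to a path. It is not the real build-classpath, only an illustration; jar_for_key and JAVA_DIR are hypothetical names:

```shell
#!/bin/sh
# Illustrative only: not the real build-classpath, just a sketch of how a
# groupId:artifactId key could be accepted next to a plain jar name.
JAVA_DIR="${JAVA_DIR:-/usr/share/java}"

# usage: jar_for_key commons-logging
#        jar_for_key commons-logging:commons-logging
jar_for_key() {
    case $1 in
        *:*)
            # groupId:artifactId -> $JAVA_DIR/groupId/artifactId.jar
            printf '%s/%s/%s.jar\n' "$JAVA_DIR" "${1%%:*}" "${1#*:}"
            ;;
        *)
            # plain name -> $JAVA_DIR/name.jar
            printf '%s/%s.jar\n' "$JAVA_DIR" "$1"
            ;;
    esac
}
```

Old spec files that pass a plain name keep working unchanged, while new ones can use the module identifier.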

The same with Provides: java(commons-logging:commons-logging). Fedora is already doing this as mvn(..), but is this maven specific?

Why do we need xml files with maps, fragments of XML files that need to be updated at install and uninstall time?

Looking for a new solution…

I discussed this with Fedora developers Alexander Kurtakov and Stanislav Ochotnicky and they mostly agreed with my concerns. They pointed me to Carlo de Wolf’s work on a similar solution, but using a standard maven repository layout.

Carlo’s solution does not touch maven but is implemented as a plugin that gets loaded using a custom config file that is used when you call the wrapper script fmvn instead of mvn (for Fedora-Maven).

The whole solution as they described it has some extras like macros to symlink the maven repository artifacts so that they can be found as artifacts in the JPP layout. I am not sure if we need this. What I like from the solution alone:

  • It does what you expect: uses only the local repository and ignores versions (uses latest) if the requested version is not found.
  • It does not require macros. We need to build stuff on released distros and it is no fun to introduce new rpm macros.
  • It does not require patching maven. fmvn is a separate package, providing the plugins and the wrapper script.
  • As soon as Carlo gets “mvn install” working, there is no need to manually install the jar/pom in the spec file. Just calling “fmvn install” should build and install it.
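The first point, “use the requested version, else the latest one”, can be illustrated with a small shell sketch. This is not how Carlo’s plugin is implemented; pick_version is a made-up name, and the sketch assumes a standard repository layout with one directory per version below the artifact directory:

```shell
#!/bin/sh
# Sketch of "requested version if present, otherwise the latest available".
# pick_version is a hypothetical helper; it only mimics the behaviour
# described in the list above.
# usage: pick_version /path/to/repository/group/path/artifactId wanted-version
pick_version() {
    artifact_dir=$1 wanted=$2

    if [ -d "$artifact_dir/$wanted" ]; then
        printf '%s\n' "$wanted"
        return 0
    fi

    # Only consider directories, then take the highest version.
    # Note: sort -V (version sort) is a GNU extension.
    latest=$(
        for dir in "$artifact_dir"/*/; do
            [ -d "$dir" ] && basename "$dir"
        done | sort -V | tail -n 1
    )
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

The function returns a non-zero status when no version is installed at all, which a caller could turn into a build failure.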

I have been playing with Carlo’s plugins and it looks very promising. Fedora would need time to switch to a solution like this, but at SUSE we don’t have maven in our stack so we have nothing to lose and at the same time we can help serving as a test bed.

The current plan…

Not having the need to patch maven allows us to use a vanilla build of Maven for bootstrapping.

maven-bootstrap (upstream binary release, Provides: maven)
fmvn-bootstrap (binary jars built locally with maven, Provides: fmvn)

Note: If you have more than one package with the same capability and want to use it in (Build)Requires, you will need to set up “Prefer:” in prjconf.

We would now like to build maven using fmvn. Here is where the circular dependencies start. We need maven (provided by maven-bootstrap) and its dependencies, like plexus and a big bunch of maven plugins.

Here is where pom2spec comes to the rescue. This script lets you quickly create bootstrap packages from search.maven.org. It is based on Pascal Bleser’s script.

So let’s say I need a bootstrap package for maven-compiler-plugin:


org.apache.maven.plugins:maven-compiler-plugin : using version 2.3.2
Writing maven-compiler-plugin-bin.spec
Done
Downloading maven-compiler-plugin-2.3.2.pom...
######################################################################## 100.0%
Downloading maven-compiler-plugin-2.3.2.jar...
######################################################################## 100.0%
t http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.3.2/maven-compiler-plugin-2.3.2.pom
_ http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.3.2/maven-compiler-plugin-2.3.2.jar

Which generates the following files:


maven-compiler-plugin-2.3.2.jar
maven-compiler-plugin-2.3.2.pom
maven-compiler-plugin-bin.spec

If I build it, I get an rpm with the following layout:


/usr/share/java/maven-compiler-plugin.jar
/usr/share/maven/repository/org/apache/maven/plugins/maven-compiler-plugin/maven-compiler-plugin-2.3.2.jar
/usr/share/maven/repository/org/apache/maven/plugins/maven-compiler-plugin/maven-compiler-plugin-2.3.2.pom

/usr/share/java/maven-compiler-plugin.jar is just a symlink to the real jar. This layout is enough for fmvn to find the artifact and also for legacy packages to just use build-classpath. It would still be better to enhance build-classpath to also accept groupId:artifactId keys and return the path to the jar.
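As an illustration, the layout above could be produced by something like the following in the install step of a spec file. This is a sketch and not what pom2spec actually generates; install_artifact and its arguments are names invented for the example:

```shell
#!/bin/sh
# Sketch of the layout shown above. install_artifact is a hypothetical helper.
# usage: install_artifact buildroot groupId artifactId version jar pom
install_artifact() {
    root=$1 group=$2 artifact=$3 version=$4 jar=$5 pom=$6

    # org.apache.maven.plugins -> org/apache/maven/plugins
    group_path=$(printf '%s' "$group" | tr . /)
    repo_dir="/usr/share/maven/repository/$group_path/$artifact"

    mkdir -p "$root$repo_dir" "$root/usr/share/java"
    cp "$jar" "$root$repo_dir/$artifact-$version.jar"
    cp "$pom" "$root$repo_dir/$artifact-$version.pom"

    # Unversioned symlink so legacy packages can still find the jar by name.
    # The target is the final installed path, not the buildroot path.
    ln -sf "$repo_dir/$artifact-$version.jar" "$root/usr/share/java/$artifact.jar"
}
```

In a spec file the first argument would be the build root, so the symlink points to the right place once the package is installed.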

The -bin suffix allows the real package (built from source) to coexist in the same repository. The package with the -bin suffix also "Provides:" the package without the suffix so it can be used by dependent packages. Actually both "Provides:" java(org.apache.maven.plugins:maven-compiler-plugin), which is what a package that depends on it should "BuildRequires:".

Once Carlo's resolver works with "mvn install" I will try to build a repository following this method.
