Sorting, and impatience – our dev meeting for 16th June

Chris talked about things that he has learned from the new 3rd edition of the “Scala for the Impatient” book… that we received several months after Amazon had initially told us it would be delivered. He was impatient.

Chris discussed:

  • String interpolation – while he knew about the s”” format for interpolating variables, he learned about the f”” prefix to do printf style interpolation, and that you can define your own string interpolation operators
  • Scala ARM – provides resource management in a similar way to the try/resources construct in Java
  • Renaming classes on import – while you might normally import individual files under different names, for example to deal with the clash between “Files” in Guava, and “Files” in Java, you can also import the whole of a package but block out an import with -> _ to avoid it conflicting.
  • Values classes – you can use the AnyVal class when you’re wrapping a simple type – like wrapping an integer inside a class to make it strongly typed – and the compiler will not create a class
  • Stream creation via functions – you can use the the #:: operator on streams to create a lazily evaluated stream based on a function. This is useful for all the times when you need to, for example, create an infinite sequence of Fibonacci numbers.
  • The @Elidable annotation allows you to strip out methods at compile-time at particular levels – which potentially allows you to remove things like calls to logging in a production environment.

Then, Rodney talked about generic sorting and discrimination – the problem of sorting a list of things, such as a pack of cards. He talked about recent research on how to improve the traditional ways of sorting. Normally you might implement something like “Comparable” or “Equals” on your “Card” class – but that only gives you one way of ordering, whereas you might want to group your cards by number or by suit. Using just an equals method means that finding unique values in a set is hard – it takes O(N^2) comparisons, as you compare every item against every other item.

To improve this, “Discrimination” generalizes partitioning and sorting, and defines a language for doing this.

Rodney demonstrated sorting a pack of cards, using the bucket sort. If you have 52 buckets, then you can put each card in the right place on a table in linear time. If you have 13 buckets and don’t care about suit, you can sort the cards into buckets purely by value, as a linear operation. If you’re sorting an arbitrarily large list of integers, you just need a potentially infinite amount of buckets, stored in a potentially infinite amount of memory, and you can get linear time sorting – no problem…

The problem comes if you don’t have enough memory for all the buckets you want. However, if you have 2^32 numbers to sort, then you can split each into a pair of words, and then sort them all initially by the high word and then by the low word – using multiple bucket sorts where you only need 2^16 buckets not 2^32.

The language of ordering is a DSL that defines various ways of ordering – natural ordering puts things in their natural order; product ordering sorts by one feature such as the suit then another such as the number; sum ordering sorts by type such as whether the card is a joker or not.

You can use this generically by defining a discriminator for your type, such as a card, in terms of the language of ordering, and then you can apply the rules of sorting those terms to split the ordering into an appropriate sequence of buckets for your type, and this can be significantly more efficient than using a single bucket sort.

In conclusion, using a DSL is an effective technique for simplifying and solving problems, and we should use them more widely.

The price of Some(everything), the value of None

This week, Loic talked about how to measure the value of software, and Reece talked about category theory.

Loic based his talk on an academic study of software value from Gio Wiederhold, published in ACM. He talked about how the value of software changes over time. Typically, revenue from a software product will rise until it reaches a peak, but the price of software is usually not expected to increase significantly over time. Maintenance costs, particular for keeping old versions available, can rise over time. In practice, 5% of code is deleted per year, and a new version of a product consists of less than 30% more code – with most effort spent on keeping everything running smoothly. Loic discussed rational design decisions about where to focus the effort in software development, and making sure there is sufficient investment in maintenance.

Everyone else was generally interested in the topic, but found the research it was based on somewhat dated. It’s very much based on the “large software company producing a big product” model, like sales of MS Office. However, the current market is much more about web delivery of systems that can be updated very frequently in small ways, and about apps, and about Open Source. We discussed how these sales models impacted software value.

Then, Reece talked about category theory, and its relevance to software development – in a talk called “Monads are not (always) Optional”. He talked about the value of using standard mathematical terminology for the concepts of programming, and defined “category”, “object”, and “morphism”. Then we discussed composition of morphisms, and how classes of numbers (e.g. integers, rational numbers, natural numbers, complex numbers) could be transformed into each other via various operations, but other operations (such as addition and multiplication) changed the values of numbers but kept them within the same category. He discussed the relevance to programming, and how category theory and its concepts provided a more unified mechanism for structuring solutions than other approaches such as design patterns.

We then discussed monads, and monads in C# via Linq and in Scala via sequences and optional. We discussed that the “Advanced Scala with Cats” had recently been made available for free, and several people planned to read it. No-one was clear of the difference between Scalaz and Cats – both seem to cover similar ground (some subsequent research seems to show that they’re similar, and Cats might generally be preferred).