Jumble sale developer meeting

At our developer meeting this week, we had a “jumble sale” theme, where several of us talked briefly about interesting technologies or things we’d been working with recently.

Inigo talked about Runnertrack and Heroku. Runnertrack is an app that he’s written that tracks runners running marathons, such as the London Marathon. Heroku is a very convenient hosting option for applications that are run for brief periods with low traffic like this. It’s easy to set up, and applications can be launched based on pushing to a Git repo.

Dan discussed Project Euler – he’s been working through the (currently 546) Euler problems and has just completed the first 100 problems. The early ones are all easy – the later ones are more challenging. The maths is not hugely complicated. You can get statistics for how many have been completed, and in what languages. Dan is using Scala. It’s a fun thing to do if you like maths and programming.

Chris talked about Serverless Cloud Computing. We’re using Amazon AWS for managing our servers – but that still means that there’s a server somewhere that needs to be managed, and have security updates applied, and apps installed, and so on. However, if you’re using one of their more specific services like Route 53, or the Simple Queue Service, it’s just a service. API Gateway pro. Amazon Lambda is a mechanism for running code. A Serverless app has a single-page webapp hosted as HTML in S3, and then it triggers code running on Lambda, using data from Dynamo. The major disadvantage is that it’s entirely tied in to Amazon.

Simon talked about BioJS. It has a good set of visualizations, such as tree structures, heatmaps of interactions, graph plots, and so on. Some of these are very life science specific, but some are more widely applicable. The BioJS group is also trying to promote good practice in writing biological code – such as version control, common structure, and having demonstrations. Because of this, the BioJS site is essentially a list of references to other GitHub repos.

Nikolay talked about parsing Word documents. One of our clients has supplied us with documents without any real structure, because they’ve been converted from PDF. We’ve written a set of macros to add appropriate structure and reformat those documents, using Word search and replace. Word search and replace is surprisingly powerful – it can match and replace formatting and styles. Annoyingly, there are two forms of search-and-replace available in Word.

Rhys talked about mocking in Specs2 tests for Scala – and capturing how many times a given call has been made and with what arguments. There is a way of doing this easily – using the “capture” method. This allows you to capture the calls made to the mock, and then you can retrieve the actual values from the captor.

Loic talked about units, and using strong typing in Scala to attach units to your numeric quantities. There is a Squants library that defines a number of standard units, and the type system can then apply dimensional analysis to check that you are performing legal operations on them – not adding a distance to a time, for example.
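
As a rough illustration of the idea, here is a hand-rolled sketch. It is not the Squants API: the Metres, Seconds and MetresPerSecond types are made up for this example, and a real library covers far more units and conversions.

```scala
object UnitsSketch {
  // Each unit is its own type, so the compiler keeps them apart.
  final case class Metres(value: Double) {
    def +(other: Metres): Metres = Metres(value + other.value)
    // Dividing a distance by a time yields a speed.
    def /(time: Seconds): MetresPerSecond = MetresPerSecond(value / time.value)
  }

  final case class Seconds(value: Double) {
    def +(other: Seconds): Seconds = Seconds(value + other.value)
  }

  final case class MetresPerSecond(value: Double)

  val total: Metres = Metres(100) + Metres(50)
  val speed: MetresPerSecond = total / Seconds(30)
  // Metres(100) + Seconds(30)   // does not compile: a time is not a distance
}
```

The commented-out line is the point of the exercise: the mistake is rejected at compile time rather than showing up as a wrong number at runtime.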

Command line parsing with Scala dynamic and macros

In our developer meeting this week, we talked about using the Scala features of “dynamic” and macros to simplify code for command-line parsing.

The problem we were trying to solve is a recent project in which we have lots of separate command-line apps, with a lot of overlap between their arguments, but with a need for app-specific arguments and to apply different default parameters to each app.
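
To give a flavour of the “dynamic” half of this, the sketch below shows one way that per-app defaults and shared arguments could be read through Scala’s Dynamic trait. It is an assumption about the general shape rather than the code from the project: the Args class, the parse helper and the argument names are invented for illustration, and macros are not covered.

```scala
import scala.language.dynamics

object DynamicArgsSketch {
  // Looks up `args.someName` in the parsed arguments, falling back to per-app defaults.
  class Args(parsed: Map[String, String], defaults: Map[String, String]) extends Dynamic {
    def selectDynamic(name: String): String =
      parsed.get(name).orElse(defaults.get(name)).getOrElse(
        throw new IllegalArgumentException(s"Missing argument: --$name"))
  }

  // Parses "--key value" pairs; anything else is ignored in this sketch.
  def parse(argv: Seq[String]): Map[String, String] =
    argv.sliding(2, 2).collect {
      case Seq(key, value) if key.startsWith("--") => key.drop(2) -> value
    }.toMap

  val args = new Args(parse(Seq("--input", "data.xml")), defaults = Map("format" -> "json"))
  val input: String = args.input    // rewritten by the compiler to args.selectDynamic("input")
  val format: String = args.format  // not supplied, so the app default is used
}
```

The trade-off of plain Dynamic is that a misspelt argument name is only caught at runtime.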

Developer meeting – mobile development

In our developer meeting, we discussed mobile development. We’re not primarily mobile developers, but modern web applications need to work effectively on mobile devices, and we have done some mobile-specific projects.

We talked about who cared about mobile browsers and responsive design – and came to the conclusion that everyone did. It’s not just people actively using a mobile device, but it’s also things like displaying on projectors at client meetings, docking browsers, and Chromebooks and other very small laptops.

We mostly use a responsive design CSS framework like Bootstrap, which does a lot for us but not everything. There is a “bootlint” tool that checks whether we are using Bootstrap classes effectively. We can also use Chrome and Firefox to simulate mobile devices. Actually accessing with real mobile devices is useful – but a bit of a pain for C# projects, because you have to configure the embedded IIS to allow access from outside the computer it’s running on, which isn’t the default.

We’ve had to do work in the past to fix display of items via “hover” – e.g. making them work on-click instead – but this may conflict with existing behaviour. Right-click is effectively unavailable too. Gestures are available – but not widely used.

Testing with Selenium

Inigo talked about testing with Selenium in our developer meeting.

The detail is in the slides – but the high-level overview is that Selenium is a useful tool for producing automated tests for web application behaviour, but it can be hard to write solid tests that aren’t flakey and dependent on timing. It helps a lot to use an API approach – writing a class for each page in the web application and the operations it exposes, and then separately writing the tests to exercise that API. This leads to much cleaner and easier to maintain code than if you just write tests directly against the pages.
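
A minimal sketch of that page-API structure follows. To keep it self-contained it does not use Selenium itself: the Browser trait and FakeBrowser are hypothetical stand-ins for the real driver, and the page names and element ids are invented.

```scala
object PageObjectSketch {
  // Hypothetical stand-in for the browser driver, so the sketch is self-contained.
  trait Browser {
    def typeInto(fieldId: String, text: String): Unit
    def click(buttonId: String): Unit
    def textOf(elementId: String): String
  }

  // The page API: one class per page, exposing operations rather than element lookups.
  class LoginPage(browser: Browser) {
    def logInAs(user: String, password: String): HomePage = {
      browser.typeInto("username", user)
      browser.typeInto("password", password)
      browser.click("login")
      new HomePage(browser)
    }
  }

  class HomePage(browser: Browser) {
    def greeting: String = browser.textOf("greeting")
  }

  // A fake browser used here in place of a real one.
  class FakeBrowser extends Browser {
    private var fields = Map.empty[String, String]
    private var loggedInUser: Option[String] = None
    def typeInto(fieldId: String, text: String): Unit = fields += fieldId -> text
    def click(buttonId: String): Unit =
      if (buttonId == "login") loggedInUser = fields.get("username")
    def textOf(elementId: String): String =
      loggedInUser.map(u => s"Hello, $u").getOrElse("")
  }

  // The test reads in terms of the application, not the page markup.
  val home: HomePage = new LoginPage(new FakeBrowser).logInAs("inigo", "secret")
}
```

If the login form changes, only LoginPage needs updating; the tests that call logInAs stay as they are.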

Dev meeting Selenium slides

Agile on the Beach conference report

Chris was recently at the Agile on the Beach 2015 conference, and this Friday he reported back to the rest of the dev team on some of the talks that he went to there.

The death of continuous integration

Steve Smith talked about “the death of continuous integration” – CI is a cultural thing, not just Jenkins. The key questions are whether everyone commits something to trunk every day, and whether problems in the build get fixed quickly. He described two build models, synchronous and asynchronous: in the former, you wait for the build to complete before continuing; in the latter, you carry on working while the build runs, and switch back if the build breaks.

He discussed three branching models:

* “Long lived feature branches” – these are generally a nightmare.
* “Short-lived feature branches” – fine if everything works as it should, but reviewers aren’t always available immediately, and you might need to switch context back to an old branch if reviewing takes a while. There’s also less of an incentive to fix the build, because it doesn’t immediately affect you. It’s also more painful to change something that affects many, many files.
* “Trunk based development” – apparently Google do this. Everything is committed to trunk, all the time, there are no separate code reviews, and half-developed features are disabled via feature toggles. Large scale changes can be handled by putting the change behind an interface, and then switching code behind the interface.
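
The last model combines two techniques, feature toggles and switching code behind an interface. A small sketch of how they fit together is below; the Search interface, the toggle name and the Toggles class are all hypothetical, chosen only to show the shape.

```scala
object FeatureToggleSketch {
  // The interface that callers depend on.
  trait Search { def find(term: String): String }

  class OldSearch extends Search {
    def find(term: String): String = s"old results for $term"
  }

  // Half-developed replacement, committed to trunk but switched off by default.
  class NewSearch extends Search {
    def find(term: String): String = s"new results for $term"
  }

  // Hypothetical toggle store; in practice this might be read from configuration.
  final case class Toggles(enabled: Set[String]) {
    def isOn(feature: String): Boolean = enabled.contains(feature)
  }

  // Callers ask for a Search and never see which implementation is behind it.
  def searchFor(toggles: Toggles): Search =
    if (toggles.isOn("new-search")) new NewSearch else new OldSearch
}
```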

We discussed these models. Several people expressed scepticism about trunk-based development – in their experience, committing directly to trunk has generally led to long pauses between commits, rather than shorter development cycles. We use short-lived feature branches – and while we do sometimes have problems with huge changes, we don’t typically have a problem with people not dealing with broken builds (because our culture is that the build should be fixed quickly).

Testing in Production

In a talk about Testing in Production, Unruly described a process in which teams are responsible for specific features from end to end. This encourages developers to make the code more robust, because the dev team are the people responsible for it in production and will be woken up by pagers if it goes wrong.

They test their code in production, because it is too much work to maintain an exact mirror of the live environment. They have no QA environments. Half-developed features are kept out of production via “feature toggles” and “data toggles”. They have lots of monitoring on their live servers – they monitor things that are of value to the company – like “are we making money” rather than “have the servers gone down”. “Monitoring driven development” is about first writing a monitor to check your new feature, and then developing the code for it – similar to test-driven development but much more so. They also use a chaos-monkey style approach for testing – with badly behaved client code, load testing with extreme events – because if someone is going to break the live environment, it might as well be them. They also use mob programming.

We discussed this – many of these approaches seem interesting and appropriate for a company that has very granular income, from a very large number of clients, that is coming in quickly and very responsively. They seem less relevant to more traditional companies.

Management 3.0

Pia-Maria Thorens spoke about delegation poker. Chris showed a set of “delegation poker” cards, that showed the continuum of delegation decisions between “the manager makes the decision on his own” and “the team makes the decisions entirely without the manager”.

We discussed this as an interesting way of thinking about management and delegation.

Debugging whitespace issues in MarkLogic

I’ve just resolved a very confusing bug in some of our MarkLogic XQuery code. It ultimately turned out to be very simple to fix – just adding an ‘indent=no’ parameter to the XSLT.

However, there were three unexpected bits of behaviour that I encountered along the way, that made fixing it much harder than it should have been.

First unexpected thing – MarkLogic was defaulting to ‘indent=yes’ in its XSLT processing.

Second unexpected thing – MarkLogic is applying its indent setting to the results of <xsl:copy-of select="."/>. I expected <textarea><xsl:copy-of select="."/></textarea> to give me an exact copy of the input, but it doesn’t, because indent is applied to it. Saxon doesn’t apply indent here, even if indent is set to yes.

Third unexpected thing – MarkLogic’s QConsole can’t be trusted to display whitespace accurately. I displayed two almost identical documents in exactly the same way, via doc('blah.xml'), in the console, and one of them was indented, and the other one wasn’t. These were the before-and-after documents for our ‘splitDocument’ function, so I naturally thought that splitDocument was applying indentation. It was only when I wrote Scala code to retrieve the XML that I saw what was happening.

Lessons:

1) Always set indent="no" in stylesheets used in MarkLogic
2) Don’t trust QConsole to debug whitespace issues. Instead, use code.

Reactive Functional Programming

In our dev meeting this week, we discussed the Reactive Functional Programming Scala course from Martin Odersky, that Chris has been doing. Several others of us have signed up for it, but haven’t been very good at actually doing the course. Chris told us what he had been doing, and what he had learned (aside from being impressed by Erik Meijer’s shirts).

Reactive Programming is:

  • Event-driven – asynchronous with no blocking
  • Scalable – across multiple cores and machines
  • Resilient – against failure
  • Responsive – it is responsive to the user

The RFP course has been about techniques for doing reactive programming that wrap these concepts neatly so the complexity of them is hidden and they can easily be composed.

Values can be separated along two axes – single/multiple and synchronous/asynch. Hence, in terms of Scala types:

  • a single synchronous value – a Try[T]
  • a single asynch value – a Future[T]
  • multiple synchronous values – Iterable[T]
  • multiple asynch values – Observable[T]
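
The first three of these are in the Scala standard library, so they can be shown side by side in a short sketch. Observable is left out here because it comes from a separate library, and the values themselves are arbitrary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Try}

object ValueKindsSketch {
  def demo(): (Try[Int], Int, Int) = {
    // Single, synchronous: the result (or the failure) is available immediately.
    val parsed: Try[Int] = Try("42".toInt)

    // Single, asynchronous: the result arrives later, computed on another thread.
    val eventually: Future[Int] = Future(6 * 7)

    // Multiple, synchronous: the caller pulls each value when it wants it.
    val several: Iterable[Int] = List(1, 2, 3)

    // Blocking with Await is for demonstration only; reactive code would
    // compose the Future with map or flatMap instead of waiting for it.
    (parsed, Await.result(eventually, 5.seconds), several.sum)
  }
}
```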

The “multiple asynch” area is where RFP fits. An “Observable” is something to which you can give an Observer, and then it will send callbacks to your Observer. For example, a set of full pint glasses (i.e. a Round) on a table might be an Observable – and then the Round might call back to your Observer to inform it about each pint. You could combine this with another Observable – for example each pint can itself be an Observable that informs you when it is empty. Hence, you can identify when a new round needs to be bought by composing the Round observables with each Pint observable, and receiving all the messages from each.
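
The pint example can be sketched with a hand-rolled Observable. This is a simplification and not the API of the library used on the course: the Observer and Observable traits, the flatMap implementation and the names are written for this example, and events are delivered synchronously, whereas a real Observable would deliver them as they happen.

```scala
object ObservableSketch {
  trait Observer[T] { def onNext(value: T): Unit }

  // Minimal hand-rolled Observable: it pushes values to whoever subscribes.
  trait Observable[T] { self =>
    def subscribe(observer: Observer[T]): Unit

    // Composition: turn each value into its own Observable and forward all their events.
    def flatMap[U](f: T => Observable[U]): Observable[U] = new Observable[U] {
      def subscribe(observer: Observer[U]): Unit =
        self.subscribe(new Observer[T] {
          def onNext(value: T): Unit = f(value).subscribe(observer)
        })
    }
  }

  final case class Pint(drinker: String)

  // A Round emits each pint on the table.
  def round(pints: Pint*): Observable[Pint] = new Observable[Pint] {
    def subscribe(observer: Observer[Pint]): Unit = pints.foreach(observer.onNext)
  }

  // Each pint reports when it is empty (immediately, in this sketch).
  def emptied(pint: Pint): Observable[String] = new Observable[String] {
    def subscribe(observer: Observer[String]): Unit =
      observer.onNext(s"${pint.drinker}'s glass is empty")
  }

  // Compose the Round with each Pint and collect every message received.
  def emptyGlasses(): List[String] = {
    var messages = List.empty[String]
    round(Pint("Ann"), Pint("Bob")).flatMap(emptied).subscribe(new Observer[String] {
      def onNext(value: String): Unit = messages :+= value
    })
    messages
  }
}
```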

Another part of the course is using Akka. Akka is an actors library for Scala (used in Play). Actors are autonomous pieces of code that each individually run synchronously, and communicate via message passing. The core of an Akka actor is a receive method that takes a message and carries out some activity. It’s also possible to use “become” on an Akka actor to transform the actor to have different behaviour – this is a way of better managing the mutable state within the actor (you can “unbecome” as well…). Error handling for actors is interesting, because actors are responsible for their children – if a child actor dies, then the parent actor’s supervision strategy determines what should happen (e.g. throwing the child away and recreating a new one, or entering a suicide pact to have the parent actor die if the child actor dies, or sending poison pills to destroy other actors, and so on). It all gets quite macabre.
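
The receive and “become” ideas can be shown without Akka itself. The sketch below is a plain Scala stand-in, not an Akka actor: the Glass class, its messages and its send method are invented for illustration, messages are handled synchronously rather than through a mailbox, and there is no supervision.

```scala
object BecomeSketch {
  // Messages as case objects, so they can be pattern matched.
  case object Fill
  case object Drink

  // Stand-in for an actor: it handles one message at a time, with swappable behaviour.
  class Glass {
    type Receive = PartialFunction[Any, Unit]

    var events: Vector[String] = Vector.empty
    private var behaviour: Receive = empty

    // Replace the current behaviour, instead of keeping an "is it full?" flag.
    private def become(next: Receive): Unit = behaviour = next

    private def empty: Receive = {
      case Fill  => events :+= "filled"; become(full)
      case Drink => events :+= "nothing to drink"
    }

    private def full: Receive = {
      case Drink => events :+= "drunk"; become(empty)
    }

    // Messages the current behaviour does not handle are dropped in this sketch.
    def send(message: Any): Unit =
      if (behaviour.isDefinedAt(message)) behaviour(message)
  }
}
```

Sending Drink, Fill, Drink to a new Glass records “nothing to drink”, “filled”, “drunk”: the same message is handled differently depending on which behaviour is current.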

One of the course projects was to build a binary tree in which each node is an actor. If a node is removed, then it instead marks itself as removed, and a separate garbage collection process actually takes nodes out. Akka manages the ordering of messages, so once you get the basic logic right, it works very quickly. But – it’s hard to debug.

A surprising feature of Akka is that you can create messages of any type – this is in contrast to Scala’s normal philosophy of making everything as strongly typed as possible. It’s recommended to use case classes as messages to provide better typing.

Selenium screenshots

We use Selenium for testing front-end web pages. My colleague Chris Rimmer just added code to take advantage of a Selenium feature I hadn’t previously been aware of – now, the tests will automatically take a screenshot of the browser if they fail, which should help us in tracking down issues.

Scala Stream.cons for creating streams from functions

A feature of Scala that I hadn’t used before was Stream.cons. This allows you to create a stream (essentially, a lazily evaluated list) from a function. So, for doing some work on files based on their position in the directory hierarchy, we can create a list of their parents:

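import java.io.File

// Lazily lists the ancestors of a file, nearest first.
def fileParents(file: File): Stream[File] = {
  val parent = file.getParentFile
  // The tail of Stream.cons is only evaluated on demand, so the recursion is lazy.
  if (parent == null) Stream.empty
  else Stream.cons(parent, fileParents(parent))
}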

and then use standard Scala functionality for filtering and finding things in lists, rather than having to write code that iterates through the parent files manually.
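
For instance, a usage sketch might look like the following. The file path is made up, and the function is repeated so that the example stands alone. On Scala 2.13 and later, Stream is deprecated in favour of LazyList, which can be used in the same way.

```scala
import java.io.File

object FileParentsUsage {
  def fileParents(file: File): Stream[File] = {
    val parent = file.getParentFile
    if (parent == null) Stream.empty else Stream.cons(parent, fileParents(parent))
  }

  // Hypothetical relative path; the file does not need to exist.
  val file = new File("site/src/main/Page.scala")

  // Nearest ancestor directory called "src", if any; the walk stops once it is found.
  val srcDir: Option[File] = fileParents(file).find(_.getName == "src")

  // Names of every ancestor, nearest first.
  val names: List[String] = fileParents(file).map(_.getName).toList
}
```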

Parser combinators

In our developer meeting this week, we discussed parsing, and particularly parser combinators.

We’ve used the Scala parser combinator library in the past for parsing search query syntax – for example, to support a custom search syntax used by a legacy system and convert it into an XQuery for searching XML. We’ve also used Parboiled, a Java/Scala parser library, for parsing geographic latitude and longitude values from within scientific journal articles about geology. We’ve done simpler parsing with regular expressions in C# to identify citations within text like “(Brown et al, 2012)” and “(Brown and Smith, 2010; Jones, 2009)”.
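
The citation case is simple enough to sketch with a regular expression. The version below is in Scala rather than C#, and the pattern is an assumption that covers only the forms quoted above, not the expression used in the project.

```scala
object CitationSketch {
  // One author-year citation: "Brown", "Brown et al" or "Brown and Smith",
  // followed by a comma and a four-digit year.
  private val Citation = """([A-Z][a-z]+(?: et al| and [A-Z][a-z]+)?), (\d{4})""".r

  // Returns each (authors, year) pair found in the text, in order of appearance.
  def citations(text: String): List[(String, Int)] =
    Citation.findAllMatchIn(text).map(m => (m.group(1), m.group(2).toInt)).toList
}
```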

The parser combinator approaches are typically better than using a traditional parsing method like Lex and YACC or JavaCC, because they’re written in the host language (e.g. Java or Scala), and so it’s much easier to write unit tests for them and to update them. They’re particularly approachable in Scala, because Scala’s support for domain-specific languages means that you can write code that looks like:

  "{" ~ ( comment | directive ) ~ "}"

where the symbols like ~ and | are Scala method invocations – which means that you can focus on the parsing, rather than the parser library syntax.
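
To make the “method invocations” point concrete, here is a toy combinator written from scratch. It is not the Scala parser combinator library: the Parser class, lit helper and the comment and directive rules are invented, and results are plain strings where a real library would build structured values.

```scala
object CombinatorSketch {
  // A parser takes input and returns the matched text plus the remaining input, or None.
  final case class Parser(run: String => Option[(String, String)]) {
    // Sequence: this parser, then the next one on whatever input is left.
    def ~(next: Parser): Parser = Parser { in =>
      for {
        (a, rest1) <- run(in)
        (b, rest2) <- next.run(rest1)
      } yield (a + b, rest2)
    }

    // Alternative: try this parser, and fall back to the other if it fails.
    def |(other: Parser): Parser = Parser(in => run(in).orElse(other.run(in)))
  }

  // Matches a literal string at the start of the input.
  def lit(s: String): Parser =
    Parser(in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  val comment: Parser = lit("#note")
  val directive: Parser = lit("@include")
  val block: Parser = lit("{") ~ (comment | directive) ~ lit("}")
}
```

Because each rule is an ordinary value, a unit test can call block.run on a sample string and check the result directly.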

We briefly discussed where it makes sense to use regular expressions for parsing, and where it makes sense to use a more powerful parsing approach. We agreed that there was a danger of creating overly complex regular expressions by incremental “boiling a frog” extensions to an initially simple regex, rather than stopping to rewrite using a parser library.

For further processing of the content once it’s been parsed, we discussed using the Visitor pattern. For example, having created an abstract syntax tree from a search query, it’s useful to use a visitor approach to turn that tree into a pretty printed form, or into an HTML form for display, or into a query language form suitable for the underlying datastore.
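
A compact sketch of that arrangement is below. The Query tree, the Visitor trait and the two visitors are hypothetical and much smaller than a real search syntax. In Scala, pattern matching over a sealed trait is a common alternative to an explicit visitor.

```scala
object QueryVisitorSketch {
  // A tiny abstract syntax tree for a search query.
  sealed trait Query { def accept[R](visitor: Visitor[R]): R }

  final case class Term(word: String) extends Query {
    def accept[R](visitor: Visitor[R]): R = visitor.term(this)
  }
  final case class And(left: Query, right: Query) extends Query {
    def accept[R](visitor: Visitor[R]): R = visitor.and(this)
  }
  final case class Or(left: Query, right: Query) extends Query {
    def accept[R](visitor: Visitor[R]): R = visitor.or(this)
  }

  trait Visitor[R] {
    def term(t: Term): R
    def and(a: And): R
    def or(o: Or): R
  }

  // One visitor per output form; this one pretty-prints the query.
  object PrettyPrinter extends Visitor[String] {
    def term(t: Term): String = t.word
    def and(a: And): String = s"(${a.left.accept(this)} AND ${a.right.accept(this)})"
    def or(o: Or): String = s"(${o.left.accept(this)} OR ${o.right.accept(this)})"
  }

  // Another visitor renders the same tree as HTML.
  object HtmlRenderer extends Visitor[String] {
    def term(t: Term): String = s"<b>${t.word}</b>"
    def and(a: And): String = s"${a.left.accept(this)} <i>and</i> ${a.right.accept(this)}"
    def or(o: Or): String = s"${o.left.accept(this)} <i>or</i> ${o.right.accept(this)}"
  }

  val query: Query = And(Term("granite"), Or(Term("basalt"), Term("gabbro")))
}
```

A visitor that emits the datastore’s query language can be added the same way, without touching the tree classes.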