VocabHunter – A tool for learners of foreign languages

A Java-based application, developed as open source.

Working here at King you meet people from all over the world. In my case I’m English but living and working in Barcelona. I’m busy learning Catalan and, being a techie, I’ve developed some software to help in the learning process. One of the important steps in learning a language is starting to read in that language. However, a problem that many of us face is that we don’t have enough vocabulary to understand the material we want to read. My idea was to make this easier by providing a simple way to find new vocabulary in a document so that someone can learn it before reading. Enter VocabHunter…

VocabHunter is an Open Source desktop application that runs on Mac, Linux, or Windows and can be used for analysing documents. It can read a variety of document formats and the user can easily step through the words in a document to find the important words that they don’t yet know. Here’s an example of VocabHunter in use, in this case looking at the English vocabulary in Great Expectations:


Java 8

Apart from wanting a tool to help with learning foreign languages, I wanted to try building something from the ground up in Java 8. Lambdas and streams are two of the stand-out features of this version of Java, and internally at King we’re starting to use them right across our Java codebase. Analysing text provides a great chance to play with these new features, and creating VocabHunter was the perfect opportunity to put these skills into action.

Java 8 streams provide facilities for mapping, filtering and grouping. You can think of them as being a bit like working with SQL for Java collections. One of the tasks that VocabHunter performs while analysing a document is to take an individual sentence and find the word uses within that sentence. These uses are then combined with those from all the other sentences in the document to form the results of the textual analysis.
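To make the SQL comparison a little more concrete, here is a minimal sketch of filtering, mapping and grouping on an ordinary Java list. It is not taken from VocabHunter: the class name `StreamBasics`, the method `wordsByLength` and the sample words are made up purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example class, not part of VocabHunter.
class StreamBasics {
    // Group the distinct lower-cased words of at least minLetters letters by length.
    static Map<Integer, List<String>> wordsByLength(List<String> words, int minLetters) {
        return words.stream()
            .filter(w -> w.length() >= minLetters)   // like WHERE length(word) >= ?
            .map(String::toLowerCase)                // like SELECT lower(word)
            .distinct()                              // like SELECT DISTINCT
            .collect(Collectors.groupingBy(          // like GROUP BY length(word)
                String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Pip", "the", "marshes", "convict", "a", "Pip");
        // prints {3=[pip, the], 7=[marshes, convict]}
        System.out.println(wordsByLength(words, 3));
    }
}
```

Each step describes what should happen to the data rather than how to loop over it, which is the declarative style that the rest of this section relies on.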

When we work with Java 8 streams, we work in a declarative way. It is often helpful to think of the use of streams as being like a recipe made from a sequence of steps. In order to find the word uses in a specific sentence, VocabHunter uses a chain of Java 8 stream operations which can be visualised and understood as follows:

String: A sentence from the document being analysed
 | Split: Break the sentence into words using a regular expression
 Stream<String>: The words in the sentence, possibly including blanks
 | Filter: Remove any blanks and words with fewer than the minimum letters
 Stream<String>: The words in the sentence
 | Group: Create a map from each word to the number of times it is used
 | in the sentence
 Map<String, Long>: A use count for each word in the sentence
 | Map: Create a WordUse object for each of the counted words from the sentence
 Stream<WordUse>: A Java 8 stream of WordUse objects for the sentence, all ready to be combined

The Java 8 code that does this is expressed in the source code as follows:

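// Assumes static imports of Collectors.groupingBy and Collectors.counting.
private Stream<WordUse> uses(String line, int minLetters) {
  // Split the line into words, drop blanks and short words, then count each word
  Map<String, Long> counts = PATTERN.splitAsStream(line)
    .filter(w -> !w.isEmpty() && w.length() >= minLetters)
    .collect(groupingBy(Function.identity(), counting()));

  // Create one WordUse per distinct word, keeping the line it came from
  return counts.entrySet().stream()
    .map(e ->
      new WordUse(e.getKey(), e.getValue().intValue(), line));
}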

You can see this code in its context in the following class: SimpleAnalyser.java
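If you would like to experiment with the method above outside the project, the following self-contained sketch may help. It rests on several assumptions that are not from the VocabHunter source: the `PATTERN` shown here (split on runs of non-letters) is a guess, the nested `WordUse` class is a simplified stand-in for the real one, and the `totals` method is only one possible way of combining the uses from several sentences.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not the real SimpleAnalyser.
class WordUseSketch {
    // Assumption: words are separated by runs of non-letter characters.
    private static final Pattern PATTERN = Pattern.compile("\\P{L}+");

    // Simplified stand-in for VocabHunter's WordUse class.
    static final class WordUse {
        private final String word;
        private final int count;
        private final String line;

        WordUse(String word, int count, String line) {
            this.word = word;
            this.count = count;
            this.line = line;
        }

        String getWord() { return word; }
        int getCount() { return count; }
        String getLine() { return line; }
    }

    // Same shape as the method in the article: count each word in one line.
    static Stream<WordUse> uses(String line, int minLetters) {
        Map<String, Long> counts = PATTERN.splitAsStream(line)
            .filter(w -> !w.isEmpty() && w.length() >= minLetters)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
            .map(e -> new WordUse(e.getKey(), e.getValue().intValue(), line));
    }

    // One possible way to combine the uses from many lines: sum the counts per word.
    static Map<String, Integer> totals(List<String> lines, int minLetters) {
        return lines.stream()
            .flatMap(line -> uses(line, minLetters))
            .collect(Collectors.groupingBy(
                WordUse::getWord, TreeMap::new, Collectors.summingInt(WordUse::getCount)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat saw the dog", "the dog ran");
        // prints {cat=1, dog=2, ran=1, saw=1, the=3}
        System.out.println(totals(lines, 3));
    }
}
```

The `TreeMap` is used here only so that the printed result comes out in alphabetical order; the real program may well combine and sort its results differently.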

The above is the heart of VocabHunter! In a small class we have the majority of the logic for analysing documents. Much of the rest of the program code is taken up with building the Graphical User Interface.


JavaFX

VocabHunter has a simple and, I hope, user-friendly GUI written in JavaFX. This rich client platform now comes as standard with Java 8 and is intended to replace Swing. If you’ve worked with Swing in the past you’ll really appreciate the more modern approach of JavaFX. To start with, as I hope you can see from the screenshot at the beginning of this blog post, JavaFX has a fresh new look. The framework uses CSS to style the user interface and this is put to good effect in VocabHunter to transform a boring button like this:

to something much brighter and easier to understand, like this:


Working with CSS in this way, instead of embedding the styles in the Java code, provides a great separation of concerns. It means that someone more skilled than me at design can help to improve the look of the program without needing to be a Java programmer. The VocabHunter logo shown on the front screen is simple text styled with CSS to impressive effect:


This is achieved with the following CSS:

.title.left {
  -fx-fill: linear-gradient(
    from 0% 0% to 100% 200%, repeat, #EBFFF0 0%, #008000 90%);
}

.title.right {
  -fx-fill: linear-gradient(
    from 0% 0% to 100% 200%, repeat, #FFF0ED 0%, #a43030 90%);
}

In a similar vein, the user interface is laid out using FXML, an XML-based layout language that avoids the need to describe layout in Java as people used to do with Swing. There is even a free graphical editor for FXML, the Scene Builder, to help put this markup together.

JavaFX has a lot of other features that are worth exploring and in VocabHunter I’ve made extensive use of the Property Binding feature to connect up parts of the UI. Please feel free to have a look around the source code on GitHub to see how it all works.

Open Source

At King we use a lot of Open Source software and it seemed natural to make this personal project Open Source too. As a Java developer it was the perfect chance for me to expand my understanding of Java 8 while creating a tool that my colleagues, friends and the wider Open Source community could use regularly and build on as they wish. You can find it on GitHub here: https://github.com/VocabHunter/VocabHunter.

One of the great things about Open Source is the wealth of libraries available out there. VocabHunter for example makes use of the Apache Tika library to read documents. Using this fantastic software the program is able to read all sorts of different file formats, including Microsoft Word, PDF, OpenDocument, and many more!

If you think you can help out, your code contributions would be most welcome. Or perhaps you’re a graphic designer and can improve on my layouts, icons, colour schemes and so forth. Please feel free to send me a pull request.

Adam Carroll

About Adam Carroll

Adam is a Software Engineer working at King. He’s originally from London but now lives and works in Barcelona. Adam works in Java on the server-side systems that King uses to communicate with the players of the mobile games. These games have millions and millions of players so this makes for a really interesting technical challenge! Adam's interests include travel and learning languages. You might well find him walking around in the sunshine in Barcelona or perhaps sitting reading a book in one of the city’s many cafes.

11 thoughts on “VocabHunter – A tool for learners of foreign languages”

  1. This is a brilliant tool! I am going to suggest it to some of my Spanish friends living in London 🙂

  2. I am a Java dev learning Chinese.
    I also thought about something similar.
    Haven’t you based your engine on Lucene? If not, can you tell why?

    1. Thanks for the feedback! I haven’t looked at Lucene yet. I decided to take a very simple approach to text analysis for the first version of the system. I’m busy adding features (at the moment I’m adding filters to make it easier for users to focus on the words that are important to them). In the future I hope to look at more sophisticated language analysis and at making the system work better with a more diverse set of writing systems. To keep up to date with the progress, you can follow VocabHunter on twitter https://twitter.com/vocabhunterapp

    1. Thanks Jim! I hope you find VocabHunter useful. Good luck with the Catalan study – I’m sure you’re enjoying the learning process in this great city just as I am.
