Tuesday, October 1, 2013

How Many Threads Should I Run?

Running operations sequentially (i.e. one after another) often leaves one or more of your CPU cores sitting idle. This can lead to poor performance, or at the very least, not optimum performance. Why not make use of those idle processors and get your work done quicker?

This leads us to spawning multiple threads in our application to run operations concurrently. But how do you know how many threads to run? If you've ever written concurrent applications, you've probably found yourself increasing the thread count, watching your application finish quicker along the way, but then get slower once your thread count exceeds a certain number. So, you reduce the number of threads until you find that "magic number" of threads that seems to yield the best performance. But, how do you find that "magic number" of threads without so much experimentation? Knowing the number of CPU cores on the machine and the type of work that each thread is doing, we can accurately estimate the number of threads we need.

Let's say that we have four CPU cores on our machine. To take full advantage of the CPU, we should run at least four threads. However, we can increase the number of threads by quite a bit, if the threads spend most of their time waiting on a response from outside calls (e.g. web service requests). If the threads are sitting in a blocking/wait state, why not spawn another thread and put another service call in that external service queue? However, if your threads are very busy crunching numbers and chewing up CPU cycles, then you'll actually pay a penalty by spawning more threads. This is because the CPU cores are already busy, so throwing another thread at it, will only make the CPU switch back and forth between jobs (i.e. context switching).

When we talk about type of work that your threads are performing, we're referring to the "blocking coefficient". A job that spends most of it's time blocking/waiting for some external service has a very high blocking coefficient. A job that spends its time crunching numbers and performing calculations has a very low blocking coefficient, because it doesn't block (or blocks very minimally).

When determining the number of threads to run in your application, use the following formula:

Number of Threads = Number of CPU Cores / (1 - Blocking Coefficient Percentage)


For example, if the job that you're planning to multi-thread spends 90% of its time blocking/waiting on an external service, the "Blocking Coefficient Percentage" = .90. This means that on a quad core machine, you would want 40 threads to ensure that you are taking full advantage of your CPU.

There are a few things to keep in mind when using this algorithm. First of all, don't forget that your development machine won't necessarily have the same number of CPU cores as the production server that will run your code. This will require you to calculate the number of available CPU cores at runtime. Also, don't forget that there may be other applications running on that production server, so don't be a CPU hog!

Writing concurrent applications is much more involved than this simple formula, but this should give you a good starting point when trying to figure out how many threads you should spawn.

Friday, September 27, 2013

Scala Parser Combinators - JavaOne 2013 BOF

In this post, I wanted to share my working example that I demonstrated in my JavaOne 2013 BOF session on "Building Small Languages with Scala Parser Combinators". In the interest of a simple demo, I kept the language features to a minimum. For example, the language types are only integer values. In other words, there are no floating point values, booleans, strings, or chars. Also, the language is interpreted, which means that I'm not generating byte codes.

These simplifications aside, there are some cool things that I present in this demonstration, all of which form a good foundation for you to use as a starting point for your own language. For example, there are complex expressions, supporting multiplication, division, addition, subtraction, and modulo. There are also several things you'd expect in a small language, such as if-else statements, loops, variable references, and functions.

Here's an example script that the language can process:

// an example of a function that returns a value
func subtractNumbers( x, y, z )
 return x - y - z
endfunc

// an example of a function that doesn't return anything
func doSomething()
 println 100
endfunc

// main program routine
main:
 var a = 10
 var b = 20
 
 if (a > b) then
  loop 3 times
   var d = subtractNumbers( x = 10, y = 5, z = 3 )
   println d
  endloop
 else
  if (b > 0) then
   loop 2 times
    println b
   endloop
  endif
 endif
 
 println 4
 
 doSomething()
 
 println 1
 println 2


The architecture consists of the following components:

  • A parser written with the Scala parsing library
    • The output of the parser is an AST which describes the structure of the input script
  • An interpreter that walks the resulting AST and processes the nodes accordingly
Here's the source code for the parser:

package net.travisdazell.parsers

import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import net.travisdazell.parsers.model._

// small interpreted language with the following features:
//   - variable definitions and references
//  - if-else statements
//  - loops
//  - error handling
//  - scoping
//  - functions with named arguments
class SmallLanguageParser extends StandardTokenParsers {
  lexical.reserved += ("var", "println", "loop", "times", "endloop", "if",
    "then", "else", "endif", "func", "return", "endfunc", "main")
  lexical.delimiters += ("*", "/", "%", "+", "-", "(", ")", "=",
    "<", ">", "==", "!=", "<=", ">=", ",", ":")

  def program: Parser[Program] = (rep(function) <~ ("main" ~ ":")) ~ codeblock ^^ {
    case f ~ c => new Program(f, c)
  }

  def function: Parser[Function] = ("func" ~> ident) ~ ("(" ~> arguments) ~
    (")" ~> codeblock) ~ opt(returnStatement) <~ "endfunc" ^^ {
      case a ~ b ~ c ~ None => new Function(a, b, c, Number(0))
      case a ~ b ~ c ~ d => new Function(a, b, c, d.get)
    }

  def returnStatement: Parser[Expr] = "return" ~> expr ^^ {
    e => e
  }

  def arguments: Parser[Map[String, Int]] = repsep(ident, ",") ^^ {
    argumentList =>
      {
        (for (a <- argumentList) yield (a -> 0)) toMap
      }
  }

  def codeblock: Parser[List[Statement]] = rep(statement) ^^ { a => a }

  def statement: Parser[Statement] = positioned(variableAssignment | outStatement |
    loopStatement | ifStatement | functionCall | outStatement) ^^ { a => a }

  def variableAssignment: Parser[VariableDefinition] = "var" ~> ident ~ "=" ~
    positioned(functionCall | expr) ^^ { case a ~ "=" ~ b => { new VariableDefinition(a, b) } }

  def outStatement: Parser[PrintStatement] = "println" ~> positioned(expr) ^^ {
    case a => new PrintStatement(a)
  }

  def loopStatement: Parser[LoopStatement] = ("loop" ~> iterations <~ "times") ~
    codeblock <~ "endloop" ^^ {
      case i ~ s => {
        new LoopStatement(i, s)
      }
    }

  def ifStatement: Parser[IfStatement] = conditional ~ codeblock ~
    opt("else" ~> codeblock) <~ "endif" ^^ {
      case a ~ b ~ c => {
        c match {
          case None => new IfStatement(a, b, List())
          case _ => new IfStatement(a, b, c.get)
        }
      }
    }

  def conditional: Parser[Condition] = "if" ~ "(" ~> condition <~ ")" ~ "then"

  def condition: Parser[Condition] = positioned(expr) ~
    ("<" | ">" | "==" | "!=" | "<=" | ">=") ~ positioned(expr) ^^ {
      case a ~ b ~ c => {
        new Condition(b, a, c)
      }
    }

  def iterations: Parser[Int] = numericLit ^^ { _ toInt }

  def functionCall: Parser[FunctionCall] = ((ident) <~ "(") ~
    functionCallArguments <~ ")" ^^ {
      case a ~ l => new FunctionCall(a, l)
    }

  def functionCallArguments: Parser[Map[String, Expr]] =
    repsep(functionArgument, ",") ^^ {
      _ toMap
    }

  def functionArgument: Parser[(String, Expr)] = (ident <~ "=") ~ expr ^^ {
    case a ~ b => (a, b)
  }

  def expr: Parser[Expr] = term ~ rep(("+" | "-") ~ term) ^^ {
    case a ~ List() => a
    case a ~ b => {
      def appendExpression(c: Operator, p: Operator): Operator = {
        p.left = c
        p
      }

      var root: Operator = new Operator(b.head._1, a, b.head._2)

      for (f <- b.tail) {
        var parent =
          f._1 match {
            case "+" => new Operator("+", null, f._2)
            case "-" => Operator("-", null, f._2)
          }

        root = appendExpression(root, parent)
      }

      root
    }
  }

  def term: Parser[Expr] = multiplydividemodulo ^^ { l => l } | factor ^^ {
    a => a
  }

  // note that "rep" returns a List
  def multiplydividemodulo: Parser[Expr] = factor ~ rep(("*" | "/" | "%") ~ factor) ^^ {

    case a ~ List() => a
    case a ~ b => {
      def appendExpression(e: Operator, t: Operator): Operator = {
        t.left = e.right
        e.right = t
        t
      }

      var root: Operator = new Operator(b.head._1, a, b.head._2)
      var current = root

      // for each of these, i'm just building up the parse tree
      for (f <- b.tail) {
        var rightOperator =
          f._1 match {
            case "*" => Operator("*", null, f._2)
            case "/" => Operator("/", null, f._2)
            case "%" => Operator("%", null, f._2)
          }

        current = appendExpression(current, rightOperator)
      }

      root
    }
  }

  def factor: Parser[Expr] = numericLit ^^ { a => Number(a.toInt) } |
    "(" ~> expr <~ ")" ^^ { e => e } |
    ident ^^ { new Identifier(_) }

  def parseAll[T](p: Parser[T], in: String): ParseResult[T] = {
    phrase(p)(new lexical.Scanner(in))
  }
}

Next, let's take a look at the interpreter:

package net.travisdazell.parsers.runtime
import net.travisdazell.parsers.model._

class Interpreter(program: Program) {
  var currentScope = new Scope("global", null)

  def run() {
    walk(program.statements)
  }

  private def getVariable(ident: Identifier): Expr = {
    var s: Scope = currentScope

    while ((!s.name.equals("global")) && !s.variables.contains(ident.name)) {
      s = s.parentScope
    }

    if (s.variables.contains(ident.name)) s.variables(ident.name)
    else {
      sys.error("Error: Undefined variable " + ident.name +
        " at position [" +
        ident.pos.column + "] on line: " +
        ident.pos.line)
    }
  }

  private def calculateExpr(e: Expr): Int = {
    e match {
      case Number(value) => value
      case Identifier(name) => {
        calculateExpr(getVariable(e.asInstanceOf[Identifier]))
      }
      case Operator(op, left, right) => {
        op match {
          case "*" => calculateExpr(left) * calculateExpr(right)
          case "/" => calculateExpr(left) / calculateExpr(right)
          case "%" => calculateExpr(left) % calculateExpr(right)
          case "+" => calculateExpr(left) + calculateExpr(right)
          case "-" => calculateExpr(left) - calculateExpr(right)
        }
      }
    }
  }

  private def isConditionTrue(condition: Condition): Boolean = {
    val a = calculateExpr(condition.left)
    val b = calculateExpr(condition.right)

    condition.op match {
      case "==" => (a == b)
      case "!=" => (a != b)
      case "<=" => (a <= b)
      case "<" => (a < b)
      case ">=" => (a >= b)
      case ">" => (a > b)
    }
  }

  private def executeFunction(f: Function, arguments: Map[String, Expr]) {
    currentScope = new Scope(f.name, currentScope)

    for (v <- arguments) currentScope.variables(v._1) = v._2

    walk(f.statements)

    currentScope = currentScope.parentScope
  }

  private def walk(tree: List[Statement]) {
    if (!tree.isEmpty) {
      tree.head match {
        case FunctionCall(name, values) => {
          val f = program.functions.filter(x => x.name == name)

          if (f.size < 1) sys.error("Error: Undefined function '" +
            name + "' being called at position [" +
            tree.head.pos.column + "] on line: " +
            tree.head.pos.line)
          else {
            executeFunction(f(0), values)

            walk(tree.tail)
          }
        }
        case VariableDefinition(name, value) => {
          // push this variable into scope
          if (value.isInstanceOf[FunctionCall]) {
            val functionCall = value.asInstanceOf[FunctionCall]
            val function = program.functions.filter(x => x.name == functionCall.name)

            // check if proc is defined and if not throw an error
            if (function.size < 1) sys.error("Error: Undefined function '" +
              functionCall.name + "' being called at position [" +
              tree.head.pos.column + "] on line: " +
              tree.head.pos.line)
            else {
              executeFunction(function(0), functionCall.values)

              currentScope = currentScope.parentScope

              // assign the return value of the function to the variable
              currentScope.variables(name) = function(0).returnValue
            }
          } else {
            currentScope.variables(name) = value
          }

          walk(tree.tail)
        }
        case PrintStatement(value) => {
          println(calculateExpr(value))
          walk(tree.tail)
        }
        case LoopStatement(iterations, statements) => {
          currentScope = new Scope("", currentScope)

          for (i <- 0 until iterations) walk(statements)

          currentScope = currentScope.parentScope

          walk(tree.tail)
        }
        case IfStatement(condition, trueBranch, falseBranch) => {
          currentScope = new Scope("", currentScope)

          if (isConditionTrue(condition)) walk(trueBranch) else walk(falseBranch)

          currentScope = currentScope.parentScope

          walk(tree.tail)
        }
        case _ => ()
      }
    }
  }
}


The last thing we need is a main routine that will read our input script and execute our parser:

package net.travisdazell.parsers.runtime

import scala.io.Source

import net.travisdazell.parsers.SmallLanguageParser

object SmallLanguage {
  def main(args: Array[String]) {
    val inputFile = Source.fromFile("scripts/program.small")
    val inputSource = inputFile.mkString

    val parser = new SmallLanguageParser
    parser.parseAll(parser.program, inputSource) match {
      case parser.Success(r, n) => {
        val interpreter = new Interpreter(r)

        try {
          interpreter.run
        } catch {
          case e: RuntimeException => println(e.getMessage)
        }
      }
      case parser.Error(msg, n) => println("Error: " + msg)
      case parser.Failure(msg, n) => println("Error: " + msg)
      case _ =>
    }
  }
}


The one thing I haven't included is the model classes that represent the AST nodes. However, as always, I've provided the full source code on GitHub: https://github.com/travisdazell/javaone-2013-BOF-small-language

Monday, July 29, 2013

JavaOne 2013

For those of you following my blog, you've probably noticed I haven't been actively posting tutorials as usual. I apologize for my slow posts lately. I'll be back to posting helpful tutorials soon. In the meantime, I'm glad to announce that I'll be at JavaOne this year. I'm speaking on the following topics:
  • Building Small Languages with Scala Parser Combinators
  • Developing Domain-Specific Languages for the JVM
I hope to see you at JavaOne this fall!

Friday, May 10, 2013

Scala Lists Flatten

The flatten method on Scala lists is used to remove one level of nested structure. For example, if we have a list that contains two lists, we can flatten that structure and result in a single one-dimensional list. Here's an example.

val listOfLists = List(List(1, 3, 5), List(2, 4, 6))
listOfLists flatten    //   List(1, 3, 5, 2, 4, 6)


Scala Lists Zip and Unzip

Zipping is the process of taking two lists and merge them into a single list of tuples, where the first item in each list comprises the first tuple, the second item in each list comprises the second tuple, etc. It's best to think of this operation like a zipper on a jacket where the first teeth on each side of the zipper become merged, the second teeth on each side of the zipper become merged, etc.

Here's an example of two lists that are zipped.

val digits = List(1, 2, 3)
val chars = List("a", "b", "c")
val result = digits zip chars    // result : List((1,"a"), (2,"b"), (3,"c"))


The opposite of the zip method is unzip. Here's the result of unzipping the result from above.

result unzip    // (List(1, 2, 3),List("a", "b", "c"))


Thursday, May 9, 2013

Scala - Transform All Elements in a Scala Collection

Often times, you need to transform all of the elements in a Scala collection. This is where the map method comes in handy. With the map method, you can apply a function to all of the elements in a collection and obtain a new collection with the results. Here's an example that uses the map method to create a new list where all of the elements in the list are converted to uppercase.

val places = List("Atlanta", "Chicago", "Dallas", "New York City")
val uppercasePlaces = places.map(_.toUpperCase)

This has the same effect as iterating over all of the elements and calling toUpperCase on each element individually.

Sometimes you'll have a collection of various types. For example, your collection might contain Int and String values and you want to convert all of the String values to uppercase and still keep the Int values in the resulting list. For this, we have the collect method, along with case classes. Here's an example.

val numbersAndStrings = List(1, "Atlanta", 10, "Boston", "Dallas")
val numbersAndUppercaseStrings = numbersAndStrings.collect {
   case s: String => s.toUpperCase; case i: Int => i
}


Wednesday, May 1, 2013

Throttling Service Requests Using Camel

This tutorial shows how to use Camel to throttle the sending of messages to a service. This is necessary when you don't want to overload the service. For example, the service might have a published service level agreement (SLA) requiring you to only send a certain number of requests per second or minute. If you're not familiar with Camel, read my blog post on Implementing Pipes and Filters Using Camel.

The source code for this example is available on GitHub: https://github.com/travisdazell/camel-throttler

Let's consider an integration scenario where we receive orders via an incoming file drop. We then pick up the file, process it with some custom Java code, and then place a message on a JMS queue. Here's the Camel route, with a throttler that only allows one message to be sent to the JMS queue every five seconds.

<camelcontext xmlns="http://camel.apache.org/schema/spring">
    <route>
        <from uri="file://incoming" />
  
        <to uri="bean:orderProcessor?method=processIncomingOrder" />
  
        <throttle timePeriodMillis="5000">
            <constant>1</constant>
            <to uri="activemq:queue:ORDERS.INCOMING" />
        </throttle>
    </route>
</camelcontext>


Monday, April 29, 2013

Scala Lists - Partition

Two very common list functions are filter and filterNot, where the former function returns a new list where all of the elements in the list hold true for a specified predicate and the latter function returns a new list where all of the elements in the list do not hold true for a specified predicate. For example, we can use these two functions to traverse a list of integers and generate a list of all positive integers, as well as a list of all negative integers.

    val l: List[Int] = List(-12, 2, -2, -3, 4, 5, 6)
    
    val positiveList = l.filter(x => x > 0)
    val negativeList = l.filterNot(x => x > 0)


The list function partition combines both of these operations into a single traversal of the original list and returns two lists, one list of values where the predicate holds true and another list of values where the predicate does not hold true.

val l: List[Int] = List(-12, 2, -2, -3, 4, 5, 6)

val positiveAndNegativeLists = l.partition(x => x > 0)


We can then access each list using positiveAndNegativeLists._1 and positiveAndNegativeLists._2, where the former is the list of all elements where the predicate is true and the latter contains all elements where the predicate is false.

    for (y <- positiveAndNegativeLists._1) println(y)
    for (y <- positiveAndNegativeLists._2) println(y)

Scala Lists - Map and Filter

Two very common List operations in Scala are map and filter. Let's start by looking at map. This function is used when you need to map all of the elements in the list to some over value. For example, if we have a list of integer values and we need to produce a new list where each value in the original list is multiplied by 3, we can write the following map implementation in Scala.

    val l: List[Int] = List(1, 2, 2, 3, 4, 5, 6)
    
    val newList = l.map(x => x * 3)


Another useful function is filter. Filter is like map, in that it traverses every element in the list. However, filter will return a new list where all of the elements in the original list hold true for a particular predicate/test. For example, if we wanted to filter out all odd integer values in a list, we can write the following in Scala.

    val l: List[Int] = List(1, 2, 2, 3, 4, 5, 6)
    
    val newList = l.filter(x => (x %2 == 0))


Of course, we can combine both map and filter. Here's an example.

    val l: List[Int] = List(1, 2, 2, 3, 4, 5, 6)
    
    val newList = l.map(x => x * 3).filter(x => (x % 2) == 0)


Scala Parser Combinators - Part 3

In a previous post on Scala parser combinators, I demonstrated how to use Scala's parser library to write a simple arithmetic parser. This example resulted in a parser that parsed an arithmetic expression and returned the results of the operation. In this tutorial, I'm going to expand on this parser by generating a parse tree, instead of simply evaluating the expression. For the expression "2 * 8 + 6", the parser will output the following parse tree.

      *
    /   \
   2     +
       /   \
      8     6


Notice that every node in our tree is either an expression consisting of an operator and a left and right side, or a single digit. For example, "8+6" is the addition operator (+) with a left side of 8 and a right side of 6. 8 and 6 are the other type of tree node (i.e. number). We'll start by defining these two types of nodes in our parse tree with a case class.

class Expr
case class Number(value: Int) extends Expr
case class Operator(symbol: String, left: Expr, right: Expr) extends Expr


Next, we'll define the rules of our parser. Note that if you read my previous posts on Scala parser combinators, I've simply added the "term" and "factor" class members.

class ExprParser extends RegexParsers {
  val number = "[1-9][0-9]*".r

  def expr: Parser[Int] = (number ^^ { _.toInt }) ~ opt(operator ~ expr) ^^ {
    case a ~ None => a
    case a ~ Some("*" ~ b) => a * b
    case a ~ Some("/" ~ b) => a / b
    case a ~ Some("+" ~ b) => a + b
    case a ~ Some("-" ~ b) => a - b
  }

  def operator: Parser[String] = "+" | "-" | "*" | "/"

  def term: Parser[Expr] = (factor ~ opt(operator ~ term)) ^^ {
    case a ~ None => a
    case a ~ Some(b ~ c) => Operator(b, a, c)
  }
  
  def factor: Parser[Expr] = number ^^ (n => Number(n.toInt))
}

Let's test the parser and evaluate the results.

  def main(args: Array[String]) {
    val parser = new ExprParser
    val result = parser.parseAll(parser.term, "9*8+2")

    println(result.get)
  }

The output from this test is the following parse tree.

Operator(*,Number(2),Operator(+,Number(8),Number(6)))


In order to look at this response and recognize a parse tree, notice that a tree of the form "8+6" is printed as "Operator(+, Number(8), Number(6))". This is because the "+" is the root of the node, so it's listed first. Thus, we have a subtree that looks like this.


         +
       /   \
      8     6



After applying this to the rest of our parse tree response of "Operator(*, Number(2), ...)", our resulting parse tree looks like this.

      *
    /   \
   2     +
       /   \
      8     6


Sunday, April 28, 2013

Scala Nested Functions

In Scala, we have the ability to define a function within another function. Here's an example of a function named "appliesToMoreThanHalf" that determines whether a certain predicate holds true for more than half of the values in a list. The function contains a nested function named "iter" that processes the list.

  def appliesToMoreThanHalf(s: List[Int], p: Int => Boolean): Boolean = {
    var count = 0

    def iter(startingIndex: Int): Boolean = {
      if (startingIndex < s.length) {
        if (p(s(startingIndex))) {
          count += 1
        }
        iter(startingIndex + 1)
      }

      count >= (s.length / 2 + 1)
    }

    iter(0)
  }

Here's an example of this function being called to determine if more than half of the elements in a list are even (i.e. evenly divisible by 2).

  def main(args: Array[String]) {
    val l: List[Int] = List(1, 2, 2, 3, 4, 5, 6)

    if (appliesToMoreThanHalf(l, p => (p % 2 == 0))) {
      println("the function applies to more than half of the elements")
    } else {
      println("the function does not apply to more than half of the elements")
    }
  }


Granted, there is a much simpler way to determine if more than half of the elements hold true for a particular predicate, but this serves as a demonstration of nested functions.

Scala Tail Recursion

When you write a recursive function, every call to the recursive function results in another method call being placed on to the call stack. If the stack grows too much, you'll get a stack overflow error. For example, look at the following Scala code that uses recursion to calculate the sum of all integers in a list.

  def sum(s: Seq[Int]): BigInt = {
    if (s.isEmpty) 0 else s.head + sum(s.tail)
  }


When we execute this method for very large lists, we get a stack overflow error.

Exception in thread "main" java.lang.StackOverflowError
 at scala.collection.AbstractTraversable.(Traversable.scala:105)
 at scala.collection.AbstractIterable.(Iterable.scala:54)
 at scala.collection.AbstractSeq.(Seq.scala:40)
 at scala.collection.immutable.Range.(Range.scala:44)
 at scala.collection.immutable.Range$Inclusive.(Range.scala:330)
 at scala.collection.immutable.Range$Inclusive.copy(Range.scala:333)
 at scala.collection.immutable.Range.drop(Range.scala:170)
 at scala.collection.immutable.Range.tail(Range.scala:196)
 at scala.collection.immutable.Range.tail(Range.scala:44)
        ...


In Scala, we can use tail recursion to tell the compiler to turn our recursive call into a loop to avoid a stack overflow error. To do this, simply add a "tailrec" annotation to the method call.

@tailrec def sum(s: Seq[Int]): BigInt = { if (s.isEmpty) 0 else s.head + sum(s.tail) }

However, if we add the annotation and re-run our example, we get the following compiler error.

could not optimize @tailrec annotated method sum: it contains a recursive call not in tail position


This error is the result of the Scala compiler not being able to utilize tail recursion due to the structure of our code. Why is that? Well, if we take the "else" path in our sum method, the first step is the recursive call to "sum", passing in the tail of the list. The result of the recursive call is added to the head of the list, making the addition operation the last step in the computation. In order to utilize tail recursion, we need to refactor our code to make the recursive call the last step of the computation. Here's the same algorithm, after refactoring to make the recursive call to "sum" be the last step in the computation.

  @tailrec def sum(s: Seq[Int], p: BigInt): BigInt = {
    if (s.isEmpty) p else sum(s.tail, s.head + p)
  }

  def main(args: Array[String]) {
   val result = sum(1 to 1000000, 0)
   println(result)
  }


Now, after running the example, we get a successful result.

500000500000


Note that the Scala compiler will try to use tail recursion when it can; however, it's a good practice to annotate any methods where you expect this optimization to be done (i.e. the last step in your recursive algorithm is the call to the recursive function). That way, you'll be warned at compile-time that there was an issue applying the optimization, preventing surprises at run-time.

Tuesday, April 9, 2013

Consider Dichotomies When Writing Unit Tests

When writing unit tests, it's one thing to have code coverage (i.e. the paths through the code are being exercised by your unit tests), but it's another art to make sure that your tests are exercising the important properties of your function. One of these important properties is the dichotomies (i.e. classifications) of data that the function processes. For example, consider the following Scala function that simply returns a filtered list of strings where the first character is a digit.


  def getStringsStartingWithDigit(s: List[String]): List[String] = {
    s.filter(p => p.charAt(0).isDigit)
  }


In this example, the following dichotomies come to mind for unit tests:

  • An empty list
  • A list that contains strings beginning with a digit, as well as strings that do not begin with a digit
  • A list that doesn't contain any strings beginning with a digit
  • A list that where all strings begin with a digit

The following unit tests will cover these scenarios.


@RunWith(classOf[JUnitRunner])
class StringListTest extends FunSuite {
 test("filtering an empty list should return an empty list") {
   val expected = List()
   val actual = getStringsStartingWithDigit(List())
   assert(expected === actual)
 }
 
 test("filtering a list that contains strings starting with a digit") {
   val expected = List("1alpha", "2bravo")
   val actual = getStringsStartingWithDigit(List("1alpha", "2bravo", "echo"))
   assert(expected === actual)
 }

 test("filtering a list with no strings that start with a digit") {
   val expected = List()
   val actual = getStringsStartingWithDigit(List("alpha", "bravo", "echo"))
   assert(expected === actual)
 }

 test("filtering a list where all strings start with a digit") {
   val expected = List("1alpha", "2bravo", "3echo")
   val actual = getStringsStartingWithDigit(List("1alpha", "2bravo", "3echo"))
   assert(expected === actual)
 }
}

Dichotomies are just the classifications/groups of data that are interesting for the tests. Here are some examples of dichotomies that crop-up quite often during development, so keep them in mind when writing your unit tests:
  • Empty and Full Lists
  • Even and Odd Integers
  • Lists of Even and Odd Lengths
  • Positive and Negative Numbers

Friday, April 5, 2013

Higher-Order Functions in Scala

The functions that most of us are familiar with take some type(s) as parameters and return some type as a result. These are called first order functions. Higher-order functions, however, are functions that take other functions as parameters and/or return a function as a result. In other words, higher-order functions act on other functions.

To demonstrate with a simple example, let's look at how we might define a first order function that takes two integer values and returns the sum of both values squared.


def sumOfSquares(a: Int, b: Int): Int = {
   a * a + b * b
}


Next, let's look at how we could refactor this to use a higher-order function. Here is a function that takes 3 parameters:
  • a function that takes an Int and returns an Int
  • an Int named a
  • an Int named b

def sumOfTwoOperations(f: Int => Int, a: Int, b: Int): Int = {
   f(a) + f(b)
}


Next, let's call the sumOfTwoOperations function, passing a "squared" function as the first parameter.


def squared(x: Int): Int = x * x

val result = sumOfTwoOperations(squared, 2, 5)   // result = 29


The beauty of higher-order functions is that now we can define another operation, such as "cubed", and pass that to the sumOfTwoOperations.

def cubed(x: Int): Int = x * x * x

val result = sumOfTwoOperations(cubed, 2, 5)   // result = 133


Friday, March 29, 2013

Scala Call-By-Value Versus Call-By-Name

This tutorial demonstrates the difference between call-by-value and call-by-name method parameters. By default, method arguments are passed call-by-value. This is the evaluation that you're probably already familiar with from Java. With call-by-value, the arguments that are passed to the method are evaluated as an expression before the control flow is passed to the method. Here are two methods with the exact same code in the method body. They simply execute a loop for five iterations, referencing the parameter "x" within each iteration. The only difference is that the first method takes a call-by-value parameter (notice the => syntax), while the second method takes a call-by-name parameter.

  def callByValue(x : Unit) = {
    for (i <- 0 until 5) {
      print(x)
    }
  }
  
  def callByName(x : => Unit) = {
    for (i <- 0 until 5) {
      print(x)
    }
  }


Now, let's see what happens to the value of the parameter when we call these two methods with an expression that increments the function value by one (i = i + 1). When we call the callByValue method, the result of the value after the function call is simply i = i + 1.


  def main(args: Array[String]) {
    var i = 0
    
    callByValue(i = i + 1)
    
    print(i)  // "1"
  }


What's happening here is that the expression i = i + 1 is being evaluated one time, before the control flow is passed to the callByValue method. However, after calling the callByName method instead of callByValue, the result of the variable "i" is 5.


  def main(args: Array[String]) {
    var i = 0
    
    callByName(i = i + 1)
    
    print(i)  // "5"
  }


The reason this is happening is because when a parameter is call-by-name, the value of the parameter is evaluated every time it's referenced in the method. It's being referenced 5 times, so the expression i = i + 1 is being executed 5 times.

Tuesday, March 26, 2013

JVM Code Generation with Jasmin

When you write your own programming language, you're tasked with writing a lexer and parser. But, once that's complete, you still have to execute the code somehow. If you're looking to target the JVM as a runtime, you need to compile your source code down to JVM bytecode, which can seem like a daunting task. However, Jasmin, which has been around for many years, is a very useful framework that makes this much easier. This tutorial takes you through the steps of creating a simple Hello World program with Jasmin. The result is a HelloWorld.class file that you can run at the command-prompt with Java. In future tutorials, I'll go through more complex examples, but you have to start somewhere. I put the full code on GitHub: https://github.com/travisdazell/Jasmin-HelloWorld


Step 1: Download Jasmin

You can download Jasmin here: http://jasmin.sourceforge.net.

Download and extract the ZIP file to your PC. I have it installed on my root, so the result is a C:\jasmin-2.4\ directory on my PC.


Step 2: Write the Jasmin code for the HelloWorld class.

Let's create a simple Hello World program and then discuss the code in more detail. Here's the complete example:

; class definition
.class public HelloWorld

; extends java.lang.Object
.super java/lang/Object

; default constructor -- public HelloWorld() { super(); }
.method public ()V
 aload_0
 invokenonvirtual java/lang/Object/()V
 return
.end method

; main method -- public static void main(String args[]) { ... }
.method public static main([Ljava/lang/String;)V
 .limit stack 2     ; we're allocating 2 items we plan to put on the stack
 
 ; push System.out onto the stack
 getstatic java/lang/System/out Ljava/io/PrintStream;
 
 ; push the constant "Hello World" string onto the stack
 ldc "Hello World"
 
 ; print the result that's on the top of the stack
 invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
 return
.end method


Let's review the Jasmin code so that you get an understanding of what's involved. The first two lines are fairly self-explanatory. We're just defining the name of the Java class and its superclass. In this example, the superclass is just java.lang.Object.


; class definition
.class public HelloWorld

; extends java.lang.Object
.super java/lang/Object


Note the use of the slash (/) in the package structure. That's one of the differences between actual Java code and Jasmin. The next step is to define the default constructor.


; default constructor -- public HelloWorld() { super(); }
.method public()V
 aload_0
 invokenonvirtual java/lang/Object/()V
 return
.end method

The last section of code is our main method. Within this method, the first thing we do is specify the size of the stack for the method. In our case, we only need a stack size of 2, because we're only going to be putting System.out and the constant "Hello World" string on the stack. The last step is to print the result that's on top of the stack and return.


; main method -- public static void main(String args[]) { ... }
.method public static main([Ljava/lang/String;)V
 .limit stack 2     ; we're allocating 2 items we plan to put on the stack
 
 ; push System.out onto the stack
 getstatic java/lang/System/out Ljava/io/PrintStream;
 
 ; push the constant "Hello World" string onto the stack
 ldc "Hello World"
 
 ; print the result that's on the top of the stack
 invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
 return
.end method


Step 3: Save the file and compile it to bytecode

Save the file as HelloWorld.j and then run the following command from the command-prompt:


C:\> java -jar "C:\jasmin-2.4\jasmin.jar" HelloWorld.j

The output will be a HelloWorld.class file that you can run using Java!


Wednesday, March 20, 2013

Implementing a Content-Based Router Using Camel

Often times, we're faced with the problem of routing a message at run-time based on the content of the message. Consider the following example. Let's say we receive all incoming orders in a single queue. However, we have two order processing services. The first service processes our normal customer orders, but we have a separate service that processes orders placed internally by employees. How do we solve this problem? A content-based router pattern solves this problem of determining where to route a message at run-time. Here's an example of a Content-Based Router implementing the example I described (the image is credited to http://www.enterpriseintegrationpatterns.com/ContentBasedRouter.html).




Thankfully, it's extremely easy to implement a content-based router pattern in Camel. If you're not familiar with using Camel, refer to my tutorial on Implementing Pipes and Filters Using Camel. It's a different integration pattern, but it walks through the basics of using Camel.

Now that you're familiar with Camel, let's look at how you would define a content-based router in your Camel route.


    <!-- content based router -->
    <route>
        <!-- get message from JMS ORDERS.INCOMING queue -->
        <from uri="activemq:queue:ORDERS.INCOMING">

            <!-- route the order to the correct queue -->
            <choice>
                <when>
                    <simple>${header.type} == 'customer'</simple>
                    <to uri="activemq:queue:ORDERS.CUSTOMER"></to>
                </when>
                <when>
                    <simple>${header.type} == 'employee'</simple>
                    <to uri="activemq:queue:ORDERS.EMPLOYEE"></to>
                </when>
            </choice>
        </from>
    </route>

Tuesday, March 19, 2013

Web Service Client in Java Using JAX-WS

This tutorial describes how to consume a SOAP web service in Java using JAX-WS. For this tutorial, we'll create a client application that makes a call to a SOAP service hosted on wsf.cdyne.com.

  1. The first step is to import the WSDL of the web service to create the client proxy code. From the command prompt, navigate to the directory where you want to generate the client code and type: "wsimport -s . http://wsf.cdyne.com/WeatherWS/Weather.asmx?WSDL". After running this command, you'll find a \com\cdyne\ws\weatherws directory has been created with a number of .java and .class files.


  2.  
  3.  Next, we'll write a simple Java class and main method to invoke the web service.
  4.   public class WeatherTest {
      
        public static void main(String args[]) {
            Weather weather = new Weather();
          
            WeatherSoap weatherSoap = weather.getWeatherSoap();
          
            ForecastReturn forecastReturn = weatherSoap.getCityForecastByZIP("56701");
          
            ArrayOfForecast arrayOfForecast = forecastReturn.getForecastResult();
            List<Forecast> forecasts = arrayOfForecast.getForecast();
          
            for (Forecast forecast : forecasts) {
                System.out.println(forecast.getDate().toString() + " " + forecast.getDesciption());
            }
        }

    }


  5. The output from this program is:
    2013-03-14T00:00:00 Mostly Cloudy
    2013-03-15T00:00:00 Snow
    2013-03-16T00:00:00 Partly Cloudy
    2013-03-17T00:00:00 Partly Cloudy
    2013-03-18T00:00:00 Snow
    2013-03-19T00:00:00 Partly Cloudy
    2013-03-20T00:00:00 Partly Cloudy

Monday, March 18, 2013

AES CBC Mode Encryption

If you've used javax.crypto to do encryption, then you're familiar with the code below that obtains an instance of a Cipher (note that the code below is written in Scala).

val encryptCipher = Cipher.getInstance("AES/CBC/PKCS5Padding", BouncyCastleProvider.PROVIDER_NAME)

The first parameter to getInstance is where you specify the type of encryption you want to use. In the example above, I'm using AES encryption in CBC mode This tutorial explains what CBC mode really means and what's happening under the hood.

AES encryption starts by dividing the plaintext message into 128-bit blocks. The result looks something like this:


Once the message has been divided into 128-bit blocks, the message is encrypted using CBC mode. CBC mode stands for "Cipher Block Chaining" mode. In pseudo-code, the steps of the CBC algorithm are as follows:

1. A random initialization vector (IV) is generated and then XORed with Message[0].
2. The result of the XOR operation is fed into the encryption function E(k, m) where k is the encryption key and m is result of the XOR operation.
3. The result of the encryption function becomes the first 128-bit block of the ciphertext (ciphertext[0]).
4. ciphertext[0] is then XORed with Message[1].
5. The result of the XOR operation is fed into the encryption function E(k, m) where k is the encryption key and m is the result of the XOR operation.
6. This continues until the entire message has been encrypted.

Here's what CBC mode AES encryption looks like with a picture:




You can see why it's called cipher block "chaining", as the result of each encryption function is chained to the XOR of the next plaintext message block. Note that if the message can't be divided evenly into 128-bit blocks, the last plaintext block is padded to make it 128-bits. If, however, the message divides evenly into 128-bits, the plaintext message is appended with a dummy block of 128-bits before encryption.

In a future blog post, I'll describe how AES encryption with CTR mode works, and how it differs from CBC mode.

Sunday, March 17, 2013

Implementing Pipes and Filters Using Camel

Pipes and Filters is an integration pattern that solves the problem of having to process a job or task in a series of sub-tasks. If you're not familiar with pipes and filters, read my post on describing the design: Pipes and Filters - Integration Pattern.

This tutorial describes how to implement the pipes and filters integration pattern using Apache Camel. The source code is on GitHub: https://github.com/travisdazell/camel-pipes-and-filters

The key component of this application is the Camel route that's defined in spring-context.xml:

<camelContext xmlns="http://camel.apache.org/schema/spring">

  <!-- pipes and filters example -->
  <route>
 <!-- get message from JMS FOO.BAR queue -->
 <from uri="activemq:queue:FOO.BAR" />

 <!-- use the simple language to transform the input to upper case -->
 <transform>
  <simple>${body.toUpperCase()}</simple>
 </transform>
 
 <!-- route the input to the messageReceiver bean, which will modify the message slightly -->
 <to uri="bean:messageReceiver?method=processMessage"/>
 
 <!-- route the output from the messageReceiver bean to the output console -->
 <to uri="stream:out"/>
  </route>

</camelContext>


This route has four steps/filters. The pipeline begins when a JMS message is posted on the FOO.BAR queue. Next, I'm using Camel to convert the message body to upper-case. The third step in the pipeline sends the result of the upper-case transformation to a Spring bean with an ID of "messageReceiver". Notice the use of "?method=processMessage". This allows us to route the message to the processMessage() method on the messageReceiver bean. The last step of the pipeline is to output the results of the processMessage() method to the console.

In summary, the following items are the main components when implementing the pipes and filters integration pattern in Camel:

  1. Defining the Camel route. In this example, I'm using Spring to define the route.
  2. The result of the first step is the input to the second step. The result of the second step is the input to the third step. And so on...
  3. Use the following Camel URI to route a specific message in a Java class, where {beanId} is substituted with the ID of the Spring bean and {methodName} is the name of the method in the bean.
uri="bean:{beanId}?method={methodName}"

Wednesday, March 13, 2013

Scala Parser Combinators - Part 2

Scala Parser Combinators - Part 3 >


In Part 1 of Scala Parser Combinators, I introduced Scala parsers and gave an example of a trivial DSL that parsed arithmetic operations. I also demonstrated how to evaluate the language and print the results of the parsing operation. I'm going to expand on this example by showing how to transform the flat result string into a useful set of structures.

Rather than simply printing the results of the parsing operation, let's look at how we would evaluate the arithmetic operation to compute the result. For example, rather than parsing "9*8+21/7" and printing "(9~Some((*~(8~Some((+~(21~Some((/~(7~None))))))))))", we would like the parser to return "75".

Before we begin, though, it's important to understand what something like "(9~Some((*~(8~Some((+~(21~Some((/~(7~None))))))))))" really means in the world of Scala. Let's look at a subset of this string, "(9~Some((*~(8~None))))". This is the result of parsing "9*8". The first part that looks interesting is the "9~Some(...)". In the previous tutorial, we defined the following rule in our parser:

def expr: Parser[Any] = number ~ opt(operator ~ expr )

It's clear that "number" is evaluating to "9" and "~" is being printed out verbatim (which you should recall is used in Scala parsers to join parts of the grammar). However, what's with "Some(...)"? Well, whenever Scala parses an opt(x) statement, it will evaluate it as either Some(...) or None, both of which are subclasses of Option. That makes since... the opt(x) statement evaluates to an Option.

Instead of having our parser return a bunch of ~ and options, let's look at transforming the parser results into something more useful. Again, looking at our current parser rule:

def expr: Parser[Any] = number ~ opt(operator ~ expr )

We need to modify this parser definition to have it return an Int instead of Any. This is simple:

def expr: Parser[Int] = ...

The next step is to compute the result of the arithmetic operation. Our grammar rule allows for either a single number or a number followed by an arithmetic operator and another number. If we're dealing with a single number, we need to tell the parser to convert the result to an Int. To do this, we make the following modification to our parser rule:

def expr: Parser[Int] = (number ^^ { _.toInt }) ...

The ^^ just tells the parser to execute the code that follows it, contained in {...}. All we're doing is converting it to an Int.

Next, we need to tell the parser what to do when it encounters a number, or when it encounters a number followed by an operator and another number. For this, we need to define the integer operation for each situation (single integer value, addition of two values, subtraction of two values, division of two values, and multiplication of two values).

Again, we start by adding ^^ to our rule to include the Scala code that we want to execute in these situations. The result is:

def expr: Parser[Int] = (number ^^ { _.toInt }) ~ opt(operator ~ expr ) ^^ {
    case a ~ None => a
    case a ~ Some("*" ~ b) => a * b
    case a ~ Some("/" ~ b) => a / b
    case a ~ Some("+" ~ b) => a + b
    case a ~ Some("-" ~ b) => a - b
}

There are five cases we're handling. The first is the situation where we have just a single integer (a ~ None). When we have an Int with None after it, we simply evaluate the integer value as-is. The second situation is when we have an integer being multiplied by another integer (a ~ Some("*" ~ b)). In this case, we simply perform a * b. We then proceed to define the rules for division, addition, and subtraction.

The key take-aways from this tutorial are:
  • You define the type that your parser rule is returning within the brackets of the Parser[ ] definition. In this example, it's an Int.
  • You can add custom Scala code to operate on the parser results with ^^ { ... }

This works great for this simple arithmetic DSL. However, for anything more involved than this simple example, we most likely need to build a parse tree (or better yet, an AST) that we can process. I cover an introduction to parse trees in the next tutorial on Scala Parser Combinators.

Thursday, March 7, 2013

Scala Parser Combinators - Part 1

Scala Parser Combinators - Part 2  >


Scala ships with a parser library for writing your own lexers and parser from within Scala. It's a really nifty mechanism for writing DSLs embedded within a Scala program. For this example, let's write a parser that parses simple mathematical expressions, such as "1+9*8" and "4*6/2-5". An EBNF grammar for this language would look like this:

digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

number ::= digit | digit number

operator ::= "+" | "-" | "*" | "/"

expr ::= number (operator expr)?


To start writing a parser with the Scala parsing library, we write a class that extends the Parsers trait. Here's an example of a class that extends RegexParsers, a subtrait of Parsers.

class ExprParser extends RegexParsers {
    val number = "[1-9][0-9]+".r

    def expr: Parser[Any] = number ~ opt(operator ~ expr )

    def operator: Parser[Any] = "+" | "-" | "*" | "/"
}


The only difference between the Scala definition of the valid tokens and the EBNF definition is the following:
  • Scala utilizes a "~" between each token
  • Instead of using a "?" like you would in EBNF, Scala uses the keyword "opt"
  • Although this isn't used in the Scala code shown above, instead of using a "+" to denote repetition as you would in EBNF, Scala uses the keyword "rep"
 To run this parser, we simply invoke the inherited parse method that's part of the inherited Parsers trait.

 def main(args : Array[String]) {
    val parser = new ExprParser

    val result = parser.parseAll(parser.expr, "9*8+21/7")

    println(result.get)
}


The result of this println will be:

(9~Some((*~(8~Some((+~(21~Some((/~(7~None))))))))))


Of course,this isn't a very useful result at this point. However, it sets the foundation for writing a parser in Scala using the parser library. Check out Part 2 on Scala Parser Combinators, where we expand on this topic and look at how to modify this result into something that's more useful.

Using ActiveMQ as an XMPP (Jabber) Messaging Server

This tutorial demonstrates how to use ActiveMQ as a message broker for XMPP (Jabber). In this tutorial, we'll install ActiveMQ and use it for our message broker (i.e. server). We'll then use the Spark instant messaging client to create a chat room so that we can send and receive messages between IM clients.

The first step is installing ActiveMQ. For this example, we'll install everything on a Windows PC. Follow these steps to install ActiveMQ:

1. Go to http://activemq.apache.org and click the download link on the right-hand side of the page.
2. Click the link to download most recent stable version (5.7.0 at the time of this writing).
3. Click on the download link for the Windows distribution.



4. Unzip the download to a directory on your PC. I have it unzipped on the root of my C: drive.
5. Open the \apache-activemq-5.7.0\conf\activemq.xml file in an editor.
6. Edit the "transportConnectors" section of the configuration file by adding the XMPP transport.

        <transportConnectors>
               ...
               <transportConnector name="xmpp" uri="xmpp://0.0.0.0:61222"/>
        </transportConnectors>


7. From the command prompt, run \apache-activemq-5.7.0\bin\activemq



Now that you have ActiveMQ installed and the XMPP transport running on port 61222, open the following URL in your browser to view the ActiveMQ web console: http://localhost:8161/admin/



At this point, ActiveMQ is running and waiting for XMPP messages to arrive. We just need an IM client installed to start sending messages. Download and install Spark (http://www.igniterealtime.org/downloads/download-landing.jsp?file=spark/spark_2_0_0.exe). Note that I'm using v2.0.0.

After installing Spark, open it and enter the following at the login screen. Of course, you can use your own name, but put @localhost after your username and put "localhost" for the server.


Next, click on the "Advanced" link at the bottom of the login dialog. Within this screen, un-check the box for "Automatically discover host and port" and enter "localhost" for the Host and "61222" for the Port. Click OK.


Now click the "Login" button. After logging in, you'll see the following screen.






Next, we need to create a chat room. The chat room will auto-magically become a JMS Topic on the ActiveMQ server. From within Spark, there is a button underneath the "Online" indicator to "Join Conference Room". After clicking that button, you'll see the following dialog.



Next, click on the button for "Create or Join Room". You'll see the following dialog where you'll enter the name of the JMS Topic. I chose to name it CHAT.ROOM.





After clicking the "Create" button, click the "Join" button on the next dialog.


The next dialog will display the chat room window and show that you've joined the room.





The next step is to publish an IM message and watch it appear in our Spark client. To test this, go back to your browser where you have your ActiveMQ web console (http://localhost:8161/admin/). Click on the menu link that says "Topics". You'll see a list of JMS Topics on your ActiveMQ server and one of them will be the CHAT.ROOM Topic that you just created. Click on the link for that Topic.


After clicking on that link, you can post a message to the Topic by entering a message in the text area and clicking the "Send" button.


Now, if you go back to your Spark IM client, you'll see the message has been received by the client. That's it!

Wednesday, March 6, 2013

Pipes and Filters - Integration Pattern

Arguably one the simplest integration problems we encounter is having to process a complex job in a sequence of steps. Most of these problems are part of a workflow requirement. Examples include processing an order, approving a purchase order, and applying a credit to an account. Each of these business problems usually requires a series of steps to be performed in a certain sequence. This integration problem can be easily solved using Pipes and Filters. Here's an example of Pipes and Filters being applied to the processing of an incoming order (image is credited to http://www.eaipatterns.com/PipesAndFilters.html).





Essentially, Pipes and Filters is just a breakdown of a requirement into a series of connected steps. Each step is called a "filter" and the transitions between two filters is called a "pipe". When faced with having to process a large job in a series of processing steps, you should consider implementing the Pipes and Filters pattern.

It's important to implement a Pipes and Filters pattern in such a way that makes it easy to add and remove filters at any location in the process. To achieve this flexibility, I recommend using a framework such as Camel. Camel makes it very easy to add, remove, replace, or rearrange the processing steps with very little work and risk to your project. Read my tutorial on Implementing Pipes and Filters Using Camel to learn how Apache Camel makes this easy to implement and maintain.

Thursday, February 28, 2013

Creating a Maven Project Site

Maven is more than just a build tool. It's also a project management tool, dependency manager, a standards informer, and much more. As a project management tool, one of the things you can do is create a project website to track project dependencies, issues, manage the development team, etc.

If you don't have a Maven project created yet, go to my tutorial on Creating a Maven Project in Eclipse. Now that you have a project to work with, open a command prompt and go to the root directory of your project, where your pom.xml file is located. Type "mvn site:run" at the command prompt and hit Enter. You'll see a bunch of JARs downloaded and eventually a message saying "Starting Jetty on http://localhost:8080/".



You can now view your project website by going to http://localhost:8080/ in your web browser.


On the left menu, you'll see a number of links to information about your project, such as the project team, issue tracking, dependencies, etc. Outside of the things that Maven can infer from your pom.xml file, such as dependencies, there isn't much information on the site yet. All of this information must be configured in your pom.xml file. Things such as issue tracking are pulled from your bug tracking tool (e.g. Bugzilla). Static information, such as the name of the project, the project team, and the developers are all things that you can just add as static content to your pom.xml file. I won't go through all of these things in this tutorial, but to get an idea how it works, you can add the following information to your pom.xml file within your <project> element to list your development team.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
...
  <developers>
    <developer>
      <id>travis</id>
      <name>Travis</name>
      <email>travisdazell@gmail.com</email>
      <url>http://travisdazell.blogspot.com</url>
      <organization>Travis</organization>
      <organizationUrl>http://travisdazell.blogspot.com</organizationUrl>
      <roles>
        <role>Architect</role>
        <role>Developer</role>
      </roles>
      <timezone>-6</timezone>
      <properties>
      </properties>
    </developer>
  </developers>
...
</project>

Creating a Maven Project in Eclipse

This tutorial demonstrates how to create a simple Maven project in Eclipse. I assume you already have Eclipse installed. If you don't have Maven installed yet, read my article on installing Maven, before proceeding. This tutorial also uses Eclipse and the m2eclipse plugin. Read my tutorial on setting up Maven in Eclipse, if you haven't setup the m2eclipse plugin.

Now that you have Eclipse, Maven, and the m2eclipse plugin installed, let's get started creating a simple Maven project in Eclipse. Within Eclipse, click the "File -> New -> Project..." menu option. Expand the Maven option, select "Maven Project" and click "Next >".



You'll be presented with the "New Maven Project" dialog. Leave the default options as-is and click "Next >".


The next dialog shows you a list of the archetypes available. An archetype is a packaged application framework that is reusable and redistributable. For this example, choose the "maven-archetype-quickstart" archetype and click "Next >".


Next, we need to provide a group id and artifact id for our Maven project. The Group ID will be used to identify your project across all of your Maven projects. The Group ID naming scheme is the same as a package name in Java. Typically, your Group ID will be your domain name followed by the name of you project, and possibly followed by a sub-project name, if you plan to have multiple sub-projects or subsystems within your main application. The Artifact ID is the name you want for your JAR file when your project is packaged and deployed. For this tutorial, enter "org.examples.quickstart" for the Group Id and "quickstart-app" for the Artifact Id. Leave the version number as-is and enter whatever you'd like for your initial package name. I chose org.examples.quickstart for my project. Click the "Finish" button.


After clicking "Finish", you'll find your Maven project has been created and viewable in your Eclipse project explorer.



Setting Up Maven in Eclipse

The easiest way to use any tool (in my opinion) is to have it integrated with your IDE. This tutorial shows you how to setup Maven in your Eclipse IDE. If you don't already have Maven installed, read my tutorial on installing Maven before proceeding with this tutorial.

With Maven and Eclipse installed on your PC, the next step is to install the m2eclipse plugin. We'll use the Eclipse marketplace to install the m2eclipse plugin. From within Eclipse, click the Help -> Marketplace menu option. Next, click on the "Search" tab. Enter "maven integration" in the search field and click the "Go" button. When the search is complete, there will be a list of options. Scroll down to find "Maven Integration for Eclipse" and click the "Install" button.


The next dialog will ask you to confirm the installation. Select just the "m2e - Maven Integration for Eclipse" option and click the "Next >" button.




You'll be prompted to accept the terms of the license agreement. Accept the terms and click the "Finish" button. At the end of the installation process, you'll be prompted to restart Eclipse. Proceed by restarting your Eclipse IDE. To begin using m2eclipse, read my tutorial on creating a simple Maven project.

Installing Maven on Windows

It goes without saying, that Maven depends on Java. This tutorial assumes that you're running at least version 1.5, so before proceeding with this tutorial, run "java -version" from your command prompt to verify the version of your Java installation.

You can download Maven from the Apache website: http://maven.apache.org/download.cgi

Simply download the binary ZIP of the most recent version to your desktop.



After downloading the ZIP file, unzip it to whatever directory you like. I'll assume (for simplicity), that you've unzipped it to the root of your C: drive, resulting in C:\apache-maven-3.0.5\

After you've unzipped the binary, there are two environment variables you need to set. The first is M2_HOME. The other is to append the PATH variable.





To test your Maven installation, open a new command prompt and type the "mvn -v" command. If you've installed it correctly, you should get a description of your Maven installation. If you don't see the Maven description, double-check that your M2_HOME and PATH environment variables are set correctly.


Thursday, February 14, 2013

Introduction to Java Bytecode

When you compile your Java applications, the Java compiler compiles your code down to Java bytecode that is run on the Java Virtual Machine (JVM). Java bytecode is similar to assembly language in its set of instructions. This is a very simple introduction to Java bytecode, along with an example of how a simple arithmetic expression will look when compiled down to bytecode. To keep this tutorial at a "bytecode 101" level, let's only focus on the following four bytecode instructions.
  1. ldc  integer-constant
  2. This instruction will push a constant integer value onto the stack.
  3. imul
  4. This instruction will multiply the top two integer values on the stack, replacing the two integers with the result of the multiplication operation.
  5. iadd
  6. This instruction will add the top two integer values on the stack, replacing the two integers with the result of the addition operation.
  7. isub 
  8. This instruction will subtract the top two integer values on the stack, replacing the two integers with the result of the addition operation.

Take the following arithmetic expression:

7 + 8 * 9 - 24

This expression will be compiled down to the following Java bytecode:

ldc   24
ldc   7
ldc   8
ldc   9
imul
iadd
isub

As you can see, the JVM places information and data onto the stack and performs stack operations on the information. Obviously, this is not a complete coverage of bytecode operations and the JVM, but I hope this helps provide a basic understanding of what Java bytecode looks like, while also showing how it works. In future tutorials, I'll provide more complicated examples. In the meantime, if you're interested in more information about the JVM and some of the optimizations it performs, read my blog post on JVM optimizations.

Tree Grammars - Abstract Syntax Trees in ANTLR

This tutorial shows how to create an abstract syntax tree (AST) from a combined grammar in ANTLR. Let's start with a simple grammar in ANTLR.

grammar someGrammar;

options {
  language = Java;
  output = AST;
}


program

    :
        variableDeclaration+
    ;
   
variableDeclaration

    :
        type ID ';'

    ;

type
    :
        'int' | 'char'
    ;

ID
    :
        ('a'..'z' | 'A'..'Z' | '_') ('0'..'9' | 'a'..'z' | 'A'..'Z' | '_')*
    ;

WS : ('\t' | '\r\' | '\n' | ' ')+ { $channel = HIDDEN; } ;


The first step in creating an AST is to make sure that we have output = AST; defined in the options section of your grammar. However, this alone, will not create an AST. Alone, this will create a parse tree, which is essentially a general form of an AST. A parse tree will consist of everything in our source code, including noise that doesn't provide any semantic meaning to your language, such as semicolons that terminate statements. Moreover, the parse tree will be a flat tree (i.e. just a flat list of tree nodes). What we'd like, however, is a tree with depth and structure, like this.

 

To remove the noise that's inherent in a parse tree, give it depth/structure, and generate an AST that only includes the relevant information we need, the easiest method is to add rewrite rules to our grammar. These rewrite rules tell ANTLR how we want our AST to be structured. Here's how we would add a rewrite rule to the variableDeclaration definition to tell ANTLR to restructure variableDeclaration to a more useful AST node. Note that I've bolded the two changes that I made to the grammar. I added a tokens section, in addition to a rewrite rule to the variableDeclaration.

grammar someGrammar;

options {
  language = Java;
  output = AST;
}


tokens {
  VAR;
}

program

    :
        variableDeclaration+
    ;

variableDeclaration
    :
        type ID ';'

        ->   ^(VAR type ID)
    ;

type
    :
        'int' | 'char'
    ;

ID
    :
        ('a'..'z' | 'A'..'Z' | '_') ('0'..'9' | 'a'..'z' | 'A'..'Z' | '_')*
    ;

WS : ('\t' | '\r\' | '\n' | ' ')+ { $channel = HIDDEN; } ;



The tokens section defines an imaginary token/node called VAR that we later use in the rewrite rule. This was created because we want to know that it's a variable declaration; however, there is no VAR keyword in the grammar. The rewrite rule is defined as ^(VAR type ID). Note that the terminating semicolon isn't part of this rule. That's because we don't want the semicolon in our AST. The semicolon is important to the syntax of our language to terminate statements, but once we've parsed the input source code, we don't need the semicolon, because it doesn't provide any semantic meaning.

In summary, an AST is a tree with depth and structure, that eliminates anything from the input source code that doesn't provide semantic meaning. In addition, it can (as in this example), add imaginary tokens to help add necessary structure to support the semantic meaning of the source code. In a later tutorial, I'll demonstrate how to write a tree walker to process the resulting AST.