Formatting & Tokenizing

In our final lesson of the API Contents section we look at formatting and tokenizing our data. We begin by looking at formatting our output, for which Java offers several options. In this lesson we look at formatting data using the java.util.Formatter class and using the static format() method of the java.lang.String class. We finish off our look at formatting output with the printf() method of the java.io.PrintStream and java.io.PrintWriter classes.

We finish off our tour of the Java API by looking at tokenizing our data. For this we first look at the split() method of the String class, which uses a regular expression delimiter to tokenize our data. After this we look at the java.util.Scanner class. Objects of this class break input into tokens using a delimiter pattern, which defaults to whitespace or can be set using a regular expression.

Formatting Overview

All the methods we look at here that produce formatted output take a format string and an argument list. The format string may contain fixed text as well as one or more embedded format specifiers. Each specifier describes how an argument from the argument list (or, for a few specifiers, some fixed content) is converted and inserted into the resulting String. The argument list can be empty, for example when the format string contains only fixed text.

Format specifiers that do not correspond to an argument, such as %n (the platform line separator) and %% (a literal percent sign), have the following syntax:


// Format specifier syntax for specifiers that do not correspond to an argument
%[flags][width]conversion
  • The optional flags are a set of characters that modify the output format, where the set of valid flags depends on the conversion.
  • The optional width is a non-negative decimal integer indicating the minimum number of characters to be written to the output.
  • The required conversion is a character indicating content to be inserted in the output.
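The snippet below is a minimal sketch of specifiers that take no argument. It uses %% for a literal percent sign and %n for the platform line separator. It also applies a width to %%, with and without the '-' (left-justify) flag. The class and method names are our own and exist only for illustration.

```java
import java.util.Locale;

// Specifiers that do not correspond to an argument: %% and %n.
class NoArgSpecifiers {

    // %% writes a literal '%'; a width pads it on the left, or on the right with the '-' flag.
    static String percentExamples() {
        return String.format(Locale.UK, "[%%] [%5%] [%-5%]");
    }

    // %n writes the platform-specific line separator, e.g. "\n" or "\r\n".
    static String lineSeparatorExample() {
        return String.format(Locale.UK, "first%nsecond");
    }

    public static void main(String[] args) {
        System.out.println(percentExamples()); // [%] [    %] [%    ]
        System.out.println(lineSeparatorExample());
    }
}
```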

Format specifiers used to represent date and time types have the following syntax:


// Format specifier syntax with argument list for date and time types
%[argument_index$][flags][width]conversion
  • The optional argument_index is a decimal integer indicating the position of the argument in the argument list. The first argument is referenced by "1$", the second by "2$" and so on.
  • The optional flags and width are defined as above.
  • For date/time types the required conversion is a two-character sequence. The first character is 't' or 'T' (the latter converts the result to upper case), and the second character indicates the format to be used.
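As a sketch of the date/time syntax, the class below formats a single java.time.LocalDate. Formatter also accepts Date, Calendar and long values. The format string reuses the first argument with 1$ and contrasts the t prefix with its upper-case T variant. The class name and the sample date were chosen purely for illustration.

```java
import java.time.LocalDate;
import java.util.Locale;

// Date/time specifiers: 't' gives the normal result, 'T' converts it to upper case.
class DateTimeSpecifiers {

    static String describe(LocalDate date) {
        // 1$ selects the first (and only) argument for every specifier
        return String.format(Locale.UK, "%1$tA %1$td %1$tB %1$tY | %1$TA %1$TB", date);
    }

    public static void main(String[] args) {
        // 9 March 2015 fell on a Monday
        System.out.println(describe(LocalDate.of(2015, 3, 9)));
        // Monday 09 March 2015 | MONDAY MARCH
    }
}
```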

Format specifiers for general, character, and numeric types have the following syntax:


// Format specifier syntax with argument list for general, character, and numeric types
%[argument_index$][flags][width][.precision]conversion
  • The optional argument_index, flags and width are defined as above.
  • The optional precision is a non-negative decimal integer generally used to restrict the number of characters but specific behavior depends on the conversion.
  • The required conversion is a character indicating how the argument should be formatted, where the set of valid conversions for a given argument depends on the argument's data type.
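Precision means different things for different conversions, so a small sketch helps. Here precision rounds a floating-point value but truncates a string. The '-' and '0' flags change how the width is padded. The class name is hypothetical.

```java
import java.util.Locale;

// Width and precision: the meaning of precision depends on the conversion.
class PrecisionExamples {

    static String examples() {
        return String.format(Locale.UK,
                "[%8.3f] [%.3s] [%-8s] [%08.2f]",
                Math.PI,   // %8.3f: rounded to 3 digits after the decimal point, padded to width 8
                "abcdef",  // %.3s: precision keeps only the first 3 characters of the string
                "left",    // %-8s: '-' flag left-justifies the value within width 8
                3.5);      // %08.2f: '0' flag pads with zeros instead of spaces
    }

    public static void main(String[] args) {
        System.out.println(examples()); // [   3.142] [abc] [left    ] [00003.50]
    }
}
```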

The table below lists the conversions used in this lesson with their descriptions. You can find the complete list of flags and conversions in the API documentation for the java.util.Formatter class.

Conversion  Description
b           Formats the argument as true or false
c           Formats the argument as a Unicode character
d           Formats the argument as a decimal integer
f           Formats the argument as a floating-point decimal
o           Formats the argument as an octal integer
s           Formats the argument as a string
x           Formats the argument as a hexadecimal integer
A           Date/time (after t or T): locale-specific full name of the day of the week, "Monday", "Tuesday"...
B           Date/time (after t or T): locale-specific full month name, "January", "February"...
Y           Date/time (after t or T): year, formatted as at least four digits with leading zeros as necessary
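The sketch below applies several general, character and numeric conversions to one argument each. It uses ordinary (sequential) indexing rather than explicit argument indices. The class name and argument values are illustrative only.

```java
import java.util.Locale;

// One argument per specifier, consumed in order (ordinary indexing).
class ConversionExamples {

    static String examples() {
        return String.format(Locale.UK, "%b %c %d %.2f %o %s %x",
                true,    // b: boolean
                'J',     // c: Unicode character
                42,      // d: decimal integer
                2.5,     // f: floating-point decimal, here with 2 decimal places
                8,       // o: octal integer
                "text",  // s: string
                255);    // x: hexadecimal integer
    }

    public static void main(String[] args) {
        System.out.println(examples()); // true J 42 2.50 10 text ff
    }
}
```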

The java.util.Formatter Class

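The java.util.Formatter class provides a wide variety of constructors. These let formatted output be sent to destinations such as an Appendable object (a StringBuilder, for example), a file or an output stream, optionally using a specific locale. The API documentation is extremely detailed, so we just show one example here to give you the idea: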


package info.java8;
/*
  java.util.Formatter Example
*/
import java.util.Date; // Import the Date class from java.util package
import java.util.Formatter; // Import the Formatter class from java.util package
import java.util.Locale; // Import the Locale class from java.util package

class TestFormatter {
    public static void main(String[] args) {
        // Some types for formatting
        Date a = new Date(); 
        double b = 123456789.345678; 
        // Create appendable StringBuilder object to output to
        StringBuilder sb = new StringBuilder();
        // Send all output to Appendable object sb using UK locale
        Formatter f = new Formatter(sb, Locale.UK);
        // Output to sb and display on console.
        f.format("Formatted output: %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA | %2$,.3f", a, b);
        // Rearrange output using indices.
        f.format("...Rearranged output: %2$,.3f | %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA", a, b);
        System.out.println(sb);
        // Create appendable StringBuilder object to output to
        StringBuilder sb2 = new StringBuilder();
        // Send all output to Appendable object sb2 using GERMANY locale
        Formatter f2 = new Formatter(sb2, Locale.GERMANY);
        // Output to sb2 and display on console.
        f2.format("Formatted output: %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA | %2$,.3f", a, b);
        // Rearrange output using indices.
        f2.format("...Rearranged output: %2$,.3f | %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA", a, b);
        System.out.println(sb2);
    }
}

Save, compile and run the TestFormatter test class in directory c:\_APIContents2 in the usual way.

[Screenshot: compiling and running the TestFormatter class]

The above screenshot shows the output of compiling and running the TestFormatter class. First off we create a Date object and a double to be formatted for output, and a StringBuilder object to receive our formatted data. We then pass the StringBuilder object and the UK locale as arguments to our Formatter constructor, and format some output using the format() method. Let's go through the format specifiers used:

Format Specifier  Part  Description
%1$tA             1$    Use the first argument in the argument list.
                  t     Prefix indicating a date/time conversion.
                  A     Locale-specific full name of the day of the week.
%1$tB             1$    Use the first argument in the argument list.
                  t     Prefix indicating a date/time conversion.
                  B     Locale-specific full month name.
%1$tY             1$    Use the first argument in the argument list.
                  t     Prefix indicating a date/time conversion.
                  Y     Year, formatted as at least four digits.
%2$,.3f           2$    Use the second argument in the argument list.
                  ,     Flag: include locale-specific grouping separators.
                  .3    Precision: round to 3 digits after the decimal separator.
                  f     Floating-point decimal conversion.

Using the format specifiers described above we display the date components in different orders and rearrange the output using the argument indices. We then produce the same output for the GERMANY locale.
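Repeating 1$ in every date specifier works, but Formatter also supports relative indexing: the '<' flag reuses the argument of the previous specifier. The sketch below assumes we want output similar to the TestFormatter class. It also uses the Formatter(Locale) constructor, which writes to an internal StringBuilder whose contents toString() returns. The class name and the use of LocalDate instead of Date are our own choices.

```java
import java.time.LocalDate;
import java.util.Formatter;
import java.util.Locale;

// Relative indexing with '<', using a Formatter that has no explicit destination.
class RelativeIndexExample {

    static String format(LocalDate date, double amount) {
        // Formatter(Locale) writes to an internal StringBuilder
        try (Formatter f = new Formatter(Locale.UK)) {
            // %<tB and %<tY reuse the argument of the previous specifier (argument 1)
            f.format("%1$tA-%<tB-%<tY | %2$,.3f", date, amount);
            return f.toString(); // read the result before the formatter is closed
        }
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDate.of(2015, 3, 9), 123456789.345678));
        // Monday-March-2015 | 123,456,789.346
    }
}
```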

The String.format() Method

The String.format() static method allows us to format an output string and is overloaded to accept a format string and argument list or a locale, format string and argument list. In our example we will use the second overloaded method which accepts a locale, format string and argument list:


package info.java8;
/*
  String.format() Example
*/
import java.util.Locale; // Import the Locale class from java.util package

class TestStringFormat {
    public static void main(String[] args) {
        // Some types for formatting
        int a = 123456789; 
        boolean b = true; 
        char c = 65;
        // Create a formatted String object using UK locale
        // %2$b selects the second argument; %2b would instead mean a width of 2 for the next argument
        String s = String.format(Locale.UK, "UK Dec: %1$,d %1$s Bool: %2$b Char: %3$c", a, b, c);
        System.out.println(s);
        // Create a formatted String object using GERMANY locale
        String s2 = String.format(Locale.GERMANY, "GER Dec: %1$,d %1$s Bool: %2$b Char: %3$c", a, b, c);
        System.out.println(s2);
    }
}

Save, compile and run the TestStringFormat test class in directory c:\_APIContents2 in the usual way.

[Screenshot: compiling and running the TestStringFormat class]

The above screenshot shows the output of compiling and running the TestStringFormat class. First off we create some primitives with values. We then use the format() method with a UK locale to produce a formatted String object, before printing the result. Let's go through the format specifiers used:

Format Specifier  Part  Description
%1$,d             1$    Use the first argument in the argument list.
                  ,     Flag: include locale-specific grouping separators.
                  d     Decimal integer conversion.
%1$s              1$    Use the first argument in the argument list.
                  s     String conversion.
%2$b              2$    Use the second argument in the argument list.
                  b     Boolean conversion.
%3$c              3$    Use the third argument in the argument list.
                  c     Character conversion.

Using the format specifiers described above we display the formatted primitives, and then produce the same output for the GERMANY locale.
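To see how much the locale affects the same format string, the hypothetical helper below formats identical values for the UK and GERMANY locales. It also shows two further flags that the lesson's example does not use: '+' (always include a sign) and '0' (pad with zeros).

```java
import java.util.Locale;

// The same format string rendered for different locales.
class LocaleComparison {

    static String withLocale(Locale locale) {
        // ',' adds locale-specific grouping; the decimal separator also follows the locale
        return String.format(locale, "%,d | %,.2f | %+d | %05d", 1234567, 1234.5, 42, 42);
    }

    public static void main(String[] args) {
        System.out.println(withLocale(Locale.UK));      // 1,234,567 | 1,234.50 | +42 | 00042
        System.out.println(withLocale(Locale.GERMANY)); // 1.234.567 | 1.234,50 | +42 | 00042
    }
}
```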

The printf() Method

The printf() method allows us to send formatted output to a java.io.PrintStream or java.io.PrintWriter stream. Both classes also contain a format() method that produces the same results, so whatever you read here about the printf() method also applies to the format() method. For our example we use the printf() method of the PrintStream class. System.out is of type PrintStream (see the Java I/O Overview lesson), so we will use it for convenience:


package info.java8;
/*
  printf() Example
*/
import java.io.PrintStream; // Import the PrintStream class from java.io package (System.out is a PrintStream)

class TestStringf {
    public static void main(String[] args) {
        // Some types for formatting
        int a = 1234; 
        // Send formatted output to the PrintStream System.out; %n ends the line
        System.out.printf("Dec: %1$,d  Octal: %1$o  Hex: %1$x%n", a);
    }
}

Save, compile and run the TestStringf test class in directory c:\_APIContents2 in the usual way.

[Screenshot: compiling and running the TestStringf class]

The above screenshot shows the output of compiling and running the TestStringf class. First off we create an int primitive with a value, then output it to the console using the printf() method. There is also an overloaded printf() method that accepts a Locale as its first argument. Let's go through the format specifiers used:

Format Specifier  Part  Description
%1$,d             1$    Use the first argument in the argument list.
                  ,     Flag: include locale-specific grouping separators.
                  d     Decimal integer conversion.
%1$o              1$    Use the first argument in the argument list.
                  o     Octal integer conversion.
%1$x              1$    Use the first argument in the argument list.
                  x     Hexadecimal integer conversion.
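The claim that printf() and format() behave identically is easy to check. The sketch below writes through a PrintWriter to a StringWriter instead of System.out, so the output can be inspected. It also uses the overload that takes a Locale as its first argument. The '#' flag, not used in the lesson's example, prefixes octal output with 0 and hexadecimal output with 0x. The class and method names are illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

// printf() and format() on a PrintWriter, using the Locale overloads.
class PrintfVersusFormat {

    static String[] bothLines(int value) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        // Same locale, format string and argument for both methods
        pw.printf(Locale.GERMANY, "%1$,d %1$#o %1$#x%n", value);
        pw.format(Locale.GERMANY, "%1$,d %1$#o %1$#x%n", value);
        pw.flush();
        // %n writes System.lineSeparator(), so split on it to get one entry per call
        return sw.toString().split(System.lineSeparator());
    }

    public static void main(String[] args) {
        for (String line : bothLines(1234)) {
            System.out.println(line); // 1.234 02322 0x4d2 (printed twice)
        }
    }
}
```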

Tokenizing Our Data

In this part of the lesson we look at splitting our data into separate tokens. We first look at the split() method of the String class, which uses a regular expression delimiter to tokenize our data. We then look at the java.util.Scanner class, whose objects break input into tokens using a delimiter pattern that defaults to whitespace or can be set using a regular expression.

The split() Method

The split() method will split a string around matches of the given regular expression, returning the results in a String array. The split() method is overloaded and will accept a regex string and a limit argument of type int denoting the number of times the pattern is to be applied. The second form just requires a regex string and in this form it is the same as invoking the split() method with the limit set to zero. An explanation of how values passed to the limit parameter affect the number of times the pattern is to be applied follows:

  • limit < 0
    The pattern is applied as many times as possible and the output array can have any length.
  • limit = 0
    The pattern is applied as many times as possible, the output array can have any length and trailing empty strings are discarded.
  • limit > 0
    The pattern is applied at most limit - 1 times, the output array's length is no greater than limit, and the array's last entry contains all input beyond the last matched delimiter.
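The effect of the limit is easiest to see on input that ends with delimiters, where trailing empty strings appear. The sketch below is a hypothetical helper that splits a comma-separated string with three different limits.

```java
import java.util.Arrays;

// How the limit argument of split() changes the resulting array.
class SplitLimits {

    static String show(String input, int limit) {
        return Arrays.toString(input.split(",", limit));
    }

    public static void main(String[] args) {
        String csv = "a,b,,";
        System.out.println(show(csv, 0));  // [a, b]        trailing empty strings discarded
        System.out.println(show(csv, -1)); // [a, b, , ]    trailing empty strings kept
        System.out.println(show(csv, 2));  // [a, b,,]      at most 1 split; rest stays in last entry
    }
}
```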

For our example we use the split() method to split our str1 String object around regular expressions matching whitespace and the word "and".


package info.java8;
/*
  The split() method
*/
public class TestSplit {
    public static void main(String[] args) {
        String str1 = "1 and 2 and 3 and 4";
        // Whitespace delimiter
        String[] sOut = str1.split("\\s", 0);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("\\s", -1);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("\\s", 3);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        // "and" delimiter
        sOut = str1.split("and", 0);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("and", -3);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("and", 3);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
    }
}

[Screenshot: running the TestSplit class]

The screenshot above shows the results of running the TestSplit class. We output the results of several different limits to show how this parameter affects the output String array. Note that any negative limit, such as the -3 used above, behaves the same as -1.
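Splitting on "and" alone leaves the surrounding spaces inside each token. One way round this, sketched below with a hypothetical helper, is to fold optional whitespace into the regular expression. This is the same pattern the Scanner example in this lesson passes to useDelimiter().

```java
import java.util.Arrays;

// Splitting on "and" together with any whitespace around it.
class SplitTrimmed {

    static String[] numbers(String input) {
        // \\s* on each side consumes the spaces, so the tokens come out clean
        return input.split("\\s*and\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(numbers("1 and 2 and 3 and 4"))); // [1, 2, 3, 4]
    }
}
```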

The java.util.Scanner Class

The java.util.Scanner class is a simple text scanner that can parse primitive types and strings using regular expressions. Objects of this class break input into tokens using a delimiter pattern. The resulting tokens can then be converted into values of different types using the class's next methods, such as nextInt() and nextFloat(). In our example we show how to use the Scanner class with the default delimiter of whitespace and also with a delimiter created using a regular expression.


package info.java8;
/*
  Scanner Examples
*/
import java.util.Locale; // Import the Locale class from java.util package
import java.util.Scanner; // Import the Scanner class from java.util package

class TestScanner {
    public static void main(String[] args) {
        String input1 = "1 2.0 3.1 4";
        String input2 = "1 and 2.0 and 3.1 and 4";
        // Using default delimiter (whitespace); UK locale so "2.0" parses whatever the default locale
        Scanner s1 = new Scanner(input1).useLocale(Locale.UK);
        System.out.println(s1.nextInt());
        System.out.println(s1.nextFloat());
        System.out.println(s1.nextFloat());
        System.out.println(s1.nextInt());
        s1.close(); 
        // Using "and", with any surrounding whitespace, as the delimiter
        Scanner s2 = new Scanner(input2).useDelimiter("\\s*and\\s*").useLocale(Locale.UK);
        System.out.println(s2.nextInt());
        System.out.println(s2.nextFloat());
        System.out.println(s2.nextFloat());
        System.out.println(s2.nextInt());
        s2.close(); 
    }
}

Save, compile and run the TestScanner test class in directory c:\_APIContents2 in the usual way.

[Screenshot: compiling and running the TestScanner class]

The above screenshot shows the output of compiling and running the TestScanner class. The examples use the default delimiter and a custom delimiter to break our input strings into tokens. We then use the nextInt() and nextFloat() methods to read the tokens as numbers and print them to the console.
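When the types of the tokens are not known in advance, the hasNextInt() and hasNextFloat() methods can test the next token before it is read. The sketch below uses a hypothetical class and input. It also calls useLocale(Locale.UK) so that "2.0" is read with a '.' decimal separator whatever the machine's default locale.

```java
import java.util.Locale;
import java.util.Scanner;

// Classifying tokens of unknown type with the Scanner hasNext methods.
class ScannerLoop {

    static String classify(String input) {
        StringBuilder sb = new StringBuilder();
        // Default whitespace delimiter; UK locale fixes the decimal separator as '.'
        try (Scanner sc = new Scanner(input).useLocale(Locale.UK)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sb.append("int:").append(sc.nextInt()).append(' ');
                } else if (sc.hasNextFloat()) {
                    sb.append("float:").append(sc.nextFloat()).append(' ');
                } else {
                    sb.append("word:").append(sc.next()).append(' ');
                }
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(classify("1 2.0 three 4")); // int:1 float:2.0 word:three int:4
    }
}
```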

There are other ways to use the java.util.Scanner class, such as using the match() method, which returns the match result of the last scanning operation performed by this Scanner object. I will leave it as an exercise for you to investigate this method of the Scanner class.

Lesson 6 Complete

In our final look at the Java API we examined formatting and tokenizing our data.

What's Next?

We start a new section on Java I/O, beginning the first of three lessons on the subject with an overview of the various streams available in Java.