com.partnersoft.io
Interface IterableInput<T>

Type Parameters:
T - type of the object returned by the iteration methods
All Superinterfaces:
java.lang.Iterable<T>
All Known Subinterfaces:
RoverBytesInput
All Known Implementing Classes:
AbstractIterableInput, CsvDataRecordSource, DataRecordSource, DbfDataRecordSource, DemReader, DxfGroupReader, DxfStructureReader, LineTextDataRecordSource, RandomAccessDataRecordSource, RoverBytesSqlInput, SequentialIterableInput, SequentialRoverBytesInput, ShapefileDataRecordSource, ShapefileReader, ShpReader, SqlDataRecordSource, TextGrep, TextLineReader

public interface IterableInput<T>
extends java.lang.Iterable<T>

An Iterable data input source with open/close and exception handling behavior.

The enhanced for loop in Java and BeanShell makes the Iterable interface very attractive for purposes other than just rooting through Collections. For example, you could iterate through a text file, looping once per line, through an XML file, with different kinds of objects returned as you navigate its contents, or through an SQL query, looping once per record.

However, Iterable has some irksome limitations when applied to such tasks. For one thing, it has no mechanism for throwing exceptions other than as RuntimeExceptions or Errors. Also, it has no cleanup method. When you are reading from a file, database, socket, or similar resource, a close method is required, and exceptions are... not that exceptional.

Also, Iterators over such resources tend to be single pass, and generally have to do a one-item lookahead to determine whether an item will be available to satisfy the hasNext() method. We can take advantage of this property to provide the ability to look at the cached value, without calling next(), and make decisions (perhaps passing the input or iterator to another method to continue reading). This provides two alternative ways of traversing the input: one using an Iterator, the other using the lower-level (and potentially higher-performance) methods.

So, we have the IterableInput, which:

Implementations must adhere to the following rules:

The stages an IterableInput goes through are sufficiently complex that we've represented them via the nested class com.partnersoft.io.IterableInput.Status. Each of the statuses is documented there.

While it is appealing to add in some sort of auto-closing behavior, this is problematic in practice. The client of an IterableInput may bail out of a pass at any time, so we can't guarantee that the entire contents are traversed and thus close when the end of input is reached. A bug in the implementation may allow a RuntimeException to escape, which would similarly break the iteration before the end is reached. Generally a close() method is called after other exception handling is done, in order to ensure that the underlying resources are in fact freed, and it seems best to require that close() always be called so that developers do it out of habit and don't have to consider whether or not it needs to be called.

The implementation of Iterable is trivial given the methods here; but consider using AbstractIterableInput which provides a good default implementation.
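
To illustrate why the Iterable implementation is described as trivial, here is a minimal sketch of an Iterator built on fetch() and getFetched() with one-item lookahead. FetchSource, FetchIterator and ListFetchSource are hypothetical names invented for this example; this is not the AbstractIterableInput code, and status checks and exception capture are left out.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, cut-down stand-in for the fetch-related part of IterableInput.
interface FetchSource<T> {
    boolean fetch();   // advance to the next item; false at end of input
    T getFetched();    // the item loaded by the last successful fetch()
}

// Iterator built from fetch()/getFetched() using one-item lookahead.
final class FetchIterator<T> implements Iterator<T> {
    private final FetchSource<T> source;
    private boolean lookaheadDone; // true once fetch() has been called for the next item
    private boolean hasItem;       // result of that fetch()

    FetchIterator(FetchSource<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (!lookaheadDone) {
            hasItem = source.fetch();
            lookaheadDone = true;
        }
        return hasItem;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        lookaheadDone = false; // consume the lookahead
        return source.getFetched();
    }
}

// In-memory source used only to exercise the iterator.
final class ListFetchSource<T> implements FetchSource<T> {
    private final List<T> items;
    private int index = -1;

    ListFetchSource(List<T> items) {
        this.items = items;
    }

    @Override
    public boolean fetch() {
        if (index + 1 >= items.size()) {
            return false;
        }
        index++;
        return true;
    }

    @Override
    public T getFetched() {
        return items.get(index);
    }
}
```

Once fetch() has returned false, the iterator keeps answering hasNext() from its cached flag and never calls fetch() again, which matters because fetch() is documented to throw IllegalStateException in the END_OF_INPUT status.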

Subclasses may choose to provide direct access to internal buffers and fetched data (perhaps even through public-access variables). This should only be done if substantial performance benefits result. Any such facility must be clearly labelled as dangerous, since it will be easy for clients to corrupt the IterableInput's state. However, it may be necessary to reduce object allocation or provide low-level access, and implementations must often scale to allow millions of objects per fetch pass.

Here are some examples of casual use in a BeanShell script.

  source = new CsvDataRecordSource("data/example.csv");
  for (record : source) 
      log.info(record.get("someField"));
  source.close();
 

Here's a more careful example:

  source = new CsvDataRecordSource("data/example.csv");
  for (record : source) 
      log.info(record.get("someField"));
  source.close();
  if (source.getException() != null)
      throw source.getException();
 

And here's one for the performance hounds, in Java. Note the use of public variables to eliminate the creation of new Naming objects for each record. We also log the exception instead of re-throwing it.

 CsvDataRecordSource source = new CsvDataRecordSource("data/example.csv");
 while (source.fetch())
        log.info(source.currentValues[3]);
 source.close();
 if (source.getException() != null)
        log.error("Error reading data/example.csv", source.getException());
 

Copyright 2006 Partner Software, Inc.

Version:
$Id: IterableInput.java 2328 2010-01-06 15:38:22Z paul $
Author:
Paul Reavis

Nested Class Summary
static class IterableInput.Status
          Representation of the internal status of an IterableInput, as returned by getStatus().
 
Method Summary
 void close()
          Performs any housekeeping measures required to release the underlying input (closing files or connections or what-have-you).
 boolean fetch()
          Attempts to fetch the next item from the input source.
 java.lang.Exception getException()
          Returns the last exception encountered during open(), fetch(), or close() (or other methods that call these, like the Iterator implementation).
 T getFetched()
          Returns the last fetched item (if immutable) or a copy of it (if mutable).
 IterableInput.Status getStatus()
          Returns the current status of this input.
 boolean isFetchValid()
          Returns true if the last call to fetch() was successful and got a valid result.
 java.util.Iterator<T> iterator()
          This is the same as the super-interface's method, but it has the side effect of opening (via open()) the IterableInput if it is currently closed.
 void open()
          Initialize the input, opening the underlying file or other resource.
 

Method Detail

getStatus

IterableInput.Status getStatus()
Returns the current status of this input. This status determines which methods will or will not throw IllegalStateException, as documented by the methods themselves.


open

void open()
Initialize the input, opening the underlying file or other resource. Note that this does not start the actual fetch, and getFetched() will throw IllegalStateException until you do a fetch().

This method can only be called if the current status is CLOSED. Calling it in any other state will result in an IllegalStateException.

If the input is opened successfully, the status is changed to OPENED.

If an exception occurs during the open, the exception is made available via getException(), and the status is changed to END_OF_INPUT. This means close() should still be called.

Throws:
java.lang.IllegalStateException - if called during an inappropriate state.

fetch

boolean fetch()
Attempts to fetch the next item from the input source. If one is available, returns true; otherwise returns false, meaning the end of input has been reached, either because all the input has been consumed or because an exception occurred.

This method may only be called when the status is CLOSED, OPENED or FETCHING. Calling it in any other status will result in an IllegalStateException.

If the current status is CLOSED, calling this method will automatically call open(), then it will start the fetch.

If the current status is OPENED, calling this method will automatically start the fetch.

If the fetch is successful, this method will return true, and isFetchValid() will return true afterward. The item fetched is made available via getFetched(). The status will then be FETCHING.

If the fetch fails, either due to running out of input or an exception, this method will return false, as will isFetchValid() afterward. The status will then be END_OF_INPUT. Any exception occurring will be available via getException().

Does not throw any checked or common unchecked exceptions (e.g. NullPointerException). May throw Errors or other low-level unchecked exceptions.

Returns:
true if fetch was successful (same as returned afterward by isFetchValid()).
Throws:
java.lang.IllegalStateException - if called during an inappropriate state.
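
The status transitions described for open(), fetch(), isFetchValid(), getFetched() and close() can be sketched with a small in-memory model. StatusSketchInput is a hypothetical class written for this illustration; it assumes the four statuses named in this document and does not model exception capture via getException().

```java
import java.util.List;

// Hypothetical in-memory model of the documented status transitions.
final class StatusSketchInput {
    enum Status { CLOSED, OPENED, FETCHING, END_OF_INPUT }

    private final List<String> lines;
    private Status status = Status.CLOSED;
    private int index;
    private String fetched;

    StatusSketchInput(List<String> lines) {
        this.lines = lines;
    }

    Status getStatus() {
        return status;
    }

    void open() {
        if (status != Status.CLOSED) {
            throw new IllegalStateException("open() requires CLOSED, was " + status);
        }
        index = 0;
        status = Status.OPENED;
    }

    boolean fetch() {
        if (status == Status.END_OF_INPUT) {
            throw new IllegalStateException("fetch() is not allowed in END_OF_INPUT");
        }
        if (status == Status.CLOSED) {
            open(); // fetch() opens the input on demand
        }
        if (index < lines.size()) {
            fetched = lines.get(index++);
            status = Status.FETCHING;
            return true;
        }
        fetched = null;
        status = Status.END_OF_INPUT;
        return false;
    }

    boolean isFetchValid() {
        if (status != Status.FETCHING && status != Status.END_OF_INPUT) {
            throw new IllegalStateException("isFetchValid() not allowed in " + status);
        }
        return status == Status.FETCHING;
    }

    String getFetched() {
        if (status != Status.FETCHING) {
            throw new IllegalStateException("getFetched() requires FETCHING, was " + status);
        }
        return fetched; // String is immutable, so no copy is needed
    }

    void close() {
        fetched = null;
        status = Status.CLOSED; // allowed from any status
    }
}
```

In this model a second pass is started simply by calling close() and then fetch() again, which mirrors the re-use described for close().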

close

void close()
Performs any housekeeping measures required to release the underlying input (closing files or connections or what-have-you).

This method may be called from any status, since client code may abandon a fetch before even opening the input, and since we want to encourage client code to call close() defensively to prevent memory or file handle leaks. Calling it when the status is CLOSED simply does nothing.

Status after calling is always CLOSED.

No exceptions are thrown; if any occur they are made available via getException().

Once the IterableInput has been closed, it may be re-used for another fetch pass.

You should ALWAYS call close() if there is ANY chance this input was opened. Not doing so risks generating memory leaks, locked files, and other resource-exhaustion bugs because the input is unable to release them.
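
One way to make the call to close() unconditional is a try/finally block, sketched below. PassInput, PassReader and CountingInput are hypothetical names made up for this example, and wrapping the stored exception in a RuntimeException is just one possible choice; the examples in the class description re-throw or log it instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal view of an IterableInput, for illustration only.
interface PassInput<T> {
    boolean fetch();
    T getFetched();
    void close();
    Exception getException();
}

final class PassReader {
    // Collects at most 'limit' items and always closes the input afterwards.
    static <T> List<T> readUpTo(PassInput<T> input, int limit) {
        List<T> result = new ArrayList<>();
        try {
            while (result.size() < limit && input.fetch()) {
                result.add(input.getFetched());
            }
        } finally {
            input.close(); // runs on early exit and on unexpected RuntimeExceptions
        }
        Exception failure = input.getException();
        if (failure != null) {
            throw new RuntimeException("Input failed", failure);
        }
        return result;
    }
}

// Fake endless input that counts close() calls so the pattern can be checked.
final class CountingInput implements PassInput<Integer> {
    int closeCalls; // visible so callers can verify close() was invoked
    private int current;

    @Override public boolean fetch() { current++; return true; }
    @Override public Integer getFetched() { return current; }
    @Override public void close() { closeCalls++; }
    @Override public Exception getException() { return null; }
}
```

Because the loop can stop before the end of input, the finally block is what guarantees the release of resources here, not the end of the iteration.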


isFetchValid

boolean isFetchValid()
Returns true if the last call to fetch() was successful and got a valid result. If true, getFetched() will return the last fetched item (or a copy; see getFetched()).

This method may only be called when the input is in the FETCHING or END_OF_INPUT statuses. Calling it from any other status will result in an IllegalStateException. In fact, isFetchValid() is always true when the status is FETCHING and always false if it is END_OF_INPUT.

Returns:
true if last fetch was successful
Throws:
java.lang.IllegalStateException - if called during an inappropriate state.

getFetched

T getFetched()
Returns the last fetched item (if immutable) or a copy of it (if mutable). Note that this means multiple calls to getFetched() may return different objects for the same fetch, but that the return value is always safe to share since modifying it won't affect the IterableInput's internal state.

The last fetched item is that loaded by the most recent call to fetch().

Implementations may prefer to instantiate this lazily; in other words, the underlying IterableInput may know that it has a valid fetch (e.g. loaded the characters for a String into an internal buffer) but may not have created this object yet (e.g. created a new String with the characters from the buffer). This allows skipping items without instantiating them, or accessing the underlying state without instantiating an actual item.

Implementations may also wish to provide unsafe access to the fetched item via other methods or public variables. The requirement that a safe copy be returned does prevent object reuse or other optimizations with this method, so feel free to use other means to provide a high-performance alternative.

This method may only be called when the input is in the FETCHING status. Calling it from any other status will result in an IllegalStateException.

Returns:
object representing the most recent fetch
Throws:
java.lang.IllegalStateException - if called during an inappropriate state.
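
The difference between the safe copy returned by getFetched() and unsafe direct access can be sketched as follows. CopyingFetchSketch is a hypothetical class with made-up data; the public currentValues array only imitates the kind of public variable mentioned in the class description, and status checks are omitted for brevity.

```java
// Hypothetical holder illustrating the copy-on-read contract of getFetched().
final class CopyingFetchSketch {
    // Unsafe, high-performance access: this buffer is reused between fetches,
    // so clients must not modify it or keep references to it.
    public final int[] currentValues = new int[3];
    private int row;

    boolean fetch() {
        if (row >= 2) {
            return false; // two rows of made-up data
        }
        for (int i = 0; i < currentValues.length; i++) {
            currentValues[i] = row * 10 + i;
        }
        row++;
        return true;
    }

    // Safe access: the copy is created lazily, only when a caller asks for it.
    int[] getFetched() {
        return currentValues.clone();
    }
}
```

Each call to getFetched() here yields a distinct array, so a caller can change or keep the result without touching the internal buffer, while skipping a row costs no allocation at all.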

getException

java.lang.Exception getException()
Returns the last exception encountered during open(), fetch(), or close() (or other methods that call these, like the Iterator implementation). Returns null if no exception has occurred.

The exception is cleared by other calls to these methods; thus an exception that occurred during a fetch is cleared by closing and reopening the input.

Returns:
last exception encountered, if any

iterator

java.util.Iterator<T> iterator()
This is the same as the super-interface's method, but it has the side effect of opening (via open()) the IterableInput if it is currently closed. This makes it just a bit more convenient to use in scripts by eliminating the open() step.

Another unusual behavior is that the Iterator returned will start at the current position in the input. In other words, it does NOT necessarily start at the beginning of input, and multiple calls to this method will return Iterators that are identical in behavior and state (and may even be the same object). The only way to restart at the beginning is to call close() and then begin a new pass.

Remember, after looping through this Iterator's contents, to call close() and free any underlying resources. You should ALWAYS call close, or risk having hanging file handles or other resources.

Specified by:
iterator in interface java.lang.Iterable<T>
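
The resume-at-current-position behavior can be sketched with an in-memory stand-in. ResumingInput is a hypothetical class for illustration only; it keeps a single shared position, so every Iterator it hands out continues where the previous one stopped until close() resets the pass.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical input whose iterators all share one read position.
final class ResumingInput implements Iterable<String> {
    private final List<String> items;
    private int position; // shared by every Iterator this input hands out

    ResumingInput(List<String> items) {
        this.items = items;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return position < items.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return items.get(position++);
            }
        };
    }

    // Ends the current pass; the next iterator starts from the beginning.
    void close() {
        position = 0;
    }
}
```

A caller that breaks out of an enhanced for loop after the first item and then asks for a new Iterator gets the second item next, not the first.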