com.partnersoft.io
Class DataRecordSource

java.lang.Object
  extended by com.partnersoft.io.AbstractIterableInput<Naming<java.lang.Object>>
      extended by com.partnersoft.io.DataRecordSource
All Implemented Interfaces:
Coggable, IterableInput<Naming<java.lang.Object>>, java.lang.Iterable<Naming<java.lang.Object>>
Direct Known Subclasses:
DbfDataRecordSource, LineTextDataRecordSource, RandomAccessDataRecordSource, ShapefileDataRecordSource, SqlDataRecordSource

public abstract class DataRecordSource
extends AbstractIterableInput<Naming<java.lang.Object>>
implements Coggable

A source for data records, which are retrieved sequentially.

"Data record" is used here in the classic, tabular sense: a structure with predefined field names and per-record field values. "Record" is often used synonymously with "row"; "field" is often used synonymously with "column".

Many data sources follow this convention: spreadsheets, SQL queries, dBase files, and so on. Often the number of records cannot be determined efficiently ahead of time. For example, determining the number of lines in a CSV file requires prescanning the file, and determining the number of rows in an SQL query's results requires some kind of count() query. The number of records is therefore not provided as part of this framework.

We have implemented the standard Iterable interface, which is very nice for scripts. However, Iterable has some irksome limitations. For one thing, it has no mechanism for throwing exceptions other than as RuntimeExceptions or Errors. Also, it has no cleanup method. Since we are generally reading from a file, database, socket, or similar resource, a close method is required. This is handled via the IterableInput interface implementation.
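
How an Iterable can work around these two limitations can be sketched in plain Java. The class below is a hypothetical stand-in, not the Partner implementation: TrappingLines, its fields, and its methods are invented for illustration. It traps a checked IOException during iteration, ends the loop, and leaves the exception for the caller to inspect after close().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in (not part of com.partnersoft.io): iterates over the
// lines of a Reader, trapping any IOException instead of throwing it.
class TrappingLines implements Iterable<String> {
    private final BufferedReader reader;
    private IOException exception; // set if a read or close fails
    private String pending;        // line read ahead by hasNext()

    TrappingLines(Reader in) {
        this.reader = new BufferedReader(in);
    }

    // Iteration state lives in this object, so only one loop at a time.
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            public boolean hasNext() {
                if (pending != null) return true;
                if (exception != null) return false;
                try {
                    pending = reader.readLine(); // null at end of input
                } catch (IOException e) {
                    exception = e;               // remember it, end the loop
                }
                return pending != null;
            }

            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String line = pending;
                pending = null;
                return line;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // The cleanup step that Iterable itself has no place for.
    void close() {
        try {
            reader.close();
        } catch (IOException e) {
            if (exception == null) exception = e;
        }
    }

    IOException getException() {
        return exception;
    }

    // Counts lines; returns -1 if an exception was trapped along the way.
    static int countLines(Reader in) {
        TrappingLines lines = new TrappingLines(in);
        int count = 0;
        for (String line : lines) count++;
        lines.close();
        return lines.getException() == null ? count : -1;
    }

    public static void main(String[] args) {
        System.out.println(countLines(new StringReader("a\nb\nc")));
    }
}
```

The caller's loop stays a plain for-each; the price is that the caller must remember to check for a trapped exception afterward.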

Since the DataRecordSource is often loaded from configuration and stored in memory, we need to be able to use it more than once. However, reading in records does tie up some kind of external resource, which is usually not sharable. Thus, you can re-use a DataRecordSource, but only after calling AbstractIterableInput.close() on a previous use, and you cannot use it for more than one iteration at a time. The copy() method provides a means of copying the source for use by another thread or object.

So, this class provides the close method, a place for exceptions to go, and, while we're at it, direct access to the field name and content arrays. This direct access allows the efficiency-minded to avoid the performance penalty of creating a Naming object for each record. Public variables are, in general, a no-no and against standard Partner coding practice. However, having made some of the variables public, it makes little sense to hide others.

Here's a standard BeanShell usage example:

 source = new CsvDataRecordSource("data/example.csv");
 for (record : source) 
     log.info(record.get("someField"));
 source.close();
 

Here's a more careful example:

 source = new CsvDataRecordSource("data/example.csv");
 for (record : source) 
     log.info(record.get("someField"));
 source.close();
 if (source.getException() != null)
     throw source.getException();
 

And here's one for the performance hounds:

 CsvDataRecordSource source = new CsvDataRecordSource("data/example.csv");
 source.open();
 while (source.fetch()) {
     // fetch() returns false at end of input or after an exception
     log.info(source.currentValues[3]);
 }
 source.close();
 if (source.getException() != null)
     throw source.getException();
 

Copyright 2003-2007 Partner Software, Inc.

Version:
$Id: DataRecordSource.java 2474 2010-03-13 14:28:43Z paul $
Author:
Paul Reavis

Nested Class Summary
 
Nested classes/interfaces inherited from interface com.partnersoft.io.IterableInput
IterableInput.Status
 
Field Summary
 java.lang.Object[] currentValues
          Values for the current record's fields.
 java.lang.String[] fieldNames
          Names of the record fields.
 boolean verbose
          If true, prints debugging information to the log.
 
Constructor Summary
protected DataRecordSource()
           
  DataRecordSource(Cog state)
           
  DataRecordSource(java.util.List<java.lang.String> fieldNames)
           
  DataRecordSource(java.lang.String... fieldNames)
           
 
Method Summary
abstract  DataRecordSource copy()
          Makes a copy of this DataRecordSource that you can iterate over separately.
 boolean fetch()
          Attempts to fetch the next item from the input source.
 java.lang.Object[] getCurrentValues()
          Gets the field values array for the current record.
 Naming<java.lang.Object> getFetched()
          Returns the last fetched item (if immutable) or a copy of it (if mutable).
 java.lang.String[] getFieldNames()
          Gets the field names array.
 boolean isVerbose()
          Accessor for the verbose flag.
 void setVerbose(boolean tizit)
          Sets the verbose flag.
 Cog toCog()
          Returns the complete internal state of this object in the form of a Cog.
 
Methods inherited from class com.partnersoft.io.AbstractIterableInput
close, closeImp, fetchImp, getException, getStatus, isFetchValid, iterator, open, openImp
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

verbose

public boolean verbose
If true, prints debugging information to the log. Feel free to toggle it on and off as needed during a fetch cycle.


fieldNames

public java.lang.String[] fieldNames
Names of the record fields. These are the same for all records fetched. This variable is set by the AbstractIterableInput.open() method.

This variable is public for performance reasons; however, only subclasses should change it or its contents.


currentValues

public java.lang.Object[] currentValues
Values for the current record's fields. These are in the same order as the names in the fieldNames array. This variable will be null if the current record is not valid; e.g. after end of input, or if AbstractIterableInput.open() hasn't been called.

This variable is public for performance reasons; however, only subclasses should change it or its contents.
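
Because fieldNames and currentValues are parallel arrays, a caller using direct access would typically resolve a field's index once and reuse it for every record. A minimal sketch, with plain arrays standing in for the two public fields; FieldIndexSketch, indexOf and column are hypothetical helpers, not part of this class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.partnersoft.io. Plain arrays stand in
// for the public fieldNames and currentValues fields.
class FieldIndexSketch {

    // Returns the position of name in fieldNames, or -1 if absent.
    static int indexOf(String[] fieldNames, String name) {
        for (int i = 0; i < fieldNames.length; i++) {
            if (fieldNames[i].equals(name)) return i;
        }
        return -1;
    }

    // Collects one column from a sequence of per-record value arrays.
    static List<Object> column(String[] fieldNames, Object[][] records, String name) {
        int index = indexOf(fieldNames, name); // resolved once, not per record
        if (index < 0) throw new IllegalArgumentException("No such field: " + name);
        List<Object> result = new ArrayList<Object>();
        for (Object[] currentValues : records) {
            result.add(currentValues[index]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] fieldNames = { "id", "name", "city" };
        Object[][] records = { { 1, "Ann", "Athens" }, { 2, "Bob", "Macon" } };
        System.out.println(column(fieldNames, records, "city"));
    }
}
```
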

Constructor Detail

DataRecordSource

protected DataRecordSource()

DataRecordSource

public DataRecordSource(java.util.List<java.lang.String> fieldNames)

DataRecordSource

public DataRecordSource(java.lang.String... fieldNames)

DataRecordSource

public DataRecordSource(Cog state)

Method Detail

toCog

public Cog toCog()
Description copied from interface: Coggable
Returns the complete internal state of this object in the form of a Cog.

Specified by:
toCog in interface Coggable
Returns:
Cog representing the internal state of this object

getFetched

public Naming<java.lang.Object> getFetched()
Description copied from interface: IterableInput
Returns the last fetched item (if immutable) or a copy of it (if mutable). Note that this means multiple calls to getFetched() may return different objects for the same fetch, but that the return value is always safe to share since modifying it won't affect the IterableInput's internal state.

The last fetched item is that loaded by the most recent call to fetch().

Implementations may prefer to instantiate this lazily; in other words, the underlying IterableInput may know that it has a valid fetch (e.g. loaded the characters for a String into an internal buffer) but may not have created this object yet (e.g. created a new String with the characters from the buffer). This allows skipping items without instantiating them, or accessing the underlying state without instantiating an actual item.

Implementations may also wish to provide unsafe access to the fetched item via other methods or public variables. The requirement that a safe copy be returned does prevent object reuse or other optimizations with this method, so feel free to use other means to provide a high-performance alternative.

This method may only be called when the input is in the FETCHING status. Calling it from any other status will result in an IllegalStateException.

Specified by:
getFetched in interface IterableInput<Naming<java.lang.Object>>
Returns:
object representing the most recent fetch
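
The safe-copy contract can be illustrated with a small stand-in. This is a sketch under assumptions, not the Partner implementation: SafeCopySketch is hypothetical, and a java.util.Map stands in for Naming, whose API is not described here. The internal values array is reused between fetches, while getFetched() builds a fresh map on each call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not part of com.partnersoft.io. A Map is used in
// place of Naming purely for illustration.
class SafeCopySketch {
    private final String[] fieldNames;
    private final Object[] currentValues; // reused for every record

    SafeCopySketch(String... fieldNames) {
        this.fieldNames = fieldNames;
        this.currentValues = new Object[fieldNames.length];
    }

    // Simulates a fetch by overwriting the shared values array in place.
    void load(Object... values) {
        System.arraycopy(values, 0, currentValues, 0, currentValues.length);
    }

    // Builds a new map per call, so callers may keep or modify the result.
    Map<String, Object> getFetched() {
        Map<String, Object> copy = new LinkedHashMap<String, Object>();
        for (int i = 0; i < fieldNames.length; i++) {
            copy.put(fieldNames[i], currentValues[i]);
        }
        return copy;
    }

    public static void main(String[] args) {
        SafeCopySketch source = new SafeCopySketch("id", "name");
        source.load(1, "Ann");
        Map<String, Object> first = source.getFetched();
        first.put("name", "changed"); // affects only the caller's copy
        source.load(2, "Bob");        // overwrites the internal array
        System.out.println(first);
        System.out.println(source.getFetched());
    }
}
```

The per-call allocation is the cost that direct access to currentValues avoids.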

copy

public abstract DataRecordSource copy()
Makes a copy of this DataRecordSource that you can iterate over separately.

Returns:
copy of this source
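
What "iterate over separately" means can be sketched with a stand-in that shares its configuration between copies but keeps a separate cursor in each. CopySketch is hypothetical, and the sketch assumes that a copy starts from the beginning of the input, which this description does not state.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not part of com.partnersoft.io.
class CopySketch {
    private final List<String> rows; // configuration shared by all copies
    private int position;            // per-instance iteration state

    CopySketch(List<String> rows) {
        this.rows = rows;
    }

    // Returns the next row, or null at end of input.
    String next() {
        return position < rows.size() ? rows.get(position++) : null;
    }

    // New instance with the same configuration and a fresh cursor
    // (assumption: copies do not inherit the original's position).
    CopySketch copy() {
        return new CopySketch(rows);
    }

    public static void main(String[] args) {
        CopySketch original = new CopySketch(Arrays.asList("r1", "r2", "r3"));
        original.next();                      // original has consumed r1
        CopySketch duplicate = original.copy();
        System.out.println(duplicate.next()); // the copy's own cursor
        System.out.println(original.next());  // original continues unaffected
    }
}
```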

isVerbose

public boolean isVerbose()
Accessor for the verbose flag.

Returns:
true if verbosity is enabled.

setVerbose

public void setVerbose(boolean tizit)
Sets the verbose flag.

Parameters:
tizit - true to enable verbosity, false to disable it.

getFieldNames

public java.lang.String[] getFieldNames()
Gets the field names array. The names are in the same order as the corresponding values in getCurrentValues().

Returns:
array of String field names

getCurrentValues

public java.lang.Object[] getCurrentValues()
Gets the field values array for the current record. Values are in the same order as the field names returned from getFieldNames().

Returns:
array of field values for current record, null if no current record

fetch

public boolean fetch()
Description copied from interface: IterableInput
Attempts to fetch the next item from the input source. If one is available, returns true; otherwise returns false, meaning the end of input has been reached, either because all the input has been consumed or because an exception occurred.

This method may only be called when the status is CLOSED, OPENED or FETCHING. Calling it in any other status will result in an IllegalStateException.

If the current status is CLOSED, calling this method will automatically call open(), then it will start the fetch.

If the current status is OPENED, calling this method will automatically start the fetch.

If the fetch is successful, this method will return true, and isFetchValid() will return true afterward. The item fetched is made available via IterableInput.getFetched(). The status will then be FETCHING.

If the fetch fails, either due to running out of input or an exception, this method will return false, as will isFetchValid() afterward. The status will then be END_OF_INPUT. Any exception occurring will be available via getException().

Does not throw any checked or common unchecked exceptions (e.g. NullPointerException). May throw Errors or other low-level unchecked exceptions.

Specified by:
fetch in interface IterableInput<Naming<java.lang.Object>>
Overrides:
fetch in class AbstractIterableInput<Naming<java.lang.Object>>
Returns:
true if fetch was successful (same as returned afterward by isFetchValid()).
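
The status transitions described for fetch() can be sketched as a small state machine. This is an illustration under assumptions, not the AbstractIterableInput source: FetchStatusSketch is hypothetical, it reads from an in-memory list, and its Status enum borrows only the four status names mentioned in this description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not part of com.partnersoft.io.
class FetchStatusSketch {
    enum Status { CLOSED, OPENED, FETCHING, END_OF_INPUT }

    private final List<String> items;
    private Status status = Status.CLOSED;
    private int position;
    private String fetched;

    FetchStatusSketch(List<String> items) {
        this.items = items;
    }

    void open() {
        position = 0;
        fetched = null;
        status = Status.OPENED;
    }

    boolean fetch() {
        if (status == Status.END_OF_INPUT) {
            throw new IllegalStateException("fetch() after end of input");
        }
        if (status == Status.CLOSED) open(); // auto-open on first fetch
        if (position < items.size()) {
            fetched = items.get(position++);
            status = Status.FETCHING;
            return true;
        }
        fetched = null;
        status = Status.END_OF_INPUT;
        return false;
    }

    String getFetched() {
        if (status != Status.FETCHING) {
            throw new IllegalStateException("no valid fetch");
        }
        return fetched;
    }

    void close() {
        status = Status.CLOSED; // ready for another iteration
    }

    Status getStatus() {
        return status;
    }

    public static void main(String[] args) {
        FetchStatusSketch source = new FetchStatusSketch(Arrays.asList("a", "b"));
        while (source.fetch()) {
            System.out.println(source.getFetched() + " " + source.getStatus());
        }
        System.out.println(source.getStatus());
        source.close();
    }
}
```

Calling close() returns the sketch to CLOSED, so a later fetch() reopens it and starts a new iteration.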