NetCDF-Java (version 2.2) User’s Manual

 

John Caron, August 18, 2004

 

1. Introduction

2. Data Layer: NetcdfFile

Data Layer Object Model

Object Names

Array section syntax

The NetCDF API

2.1 ucar.nc2.Dimension

2.2 ucar.nc2.DataType

2.3 ucar.nc2.Attribute

2.4 ucar.nc2.Group

2.5 ucar.nc2.Variable

2.6 ucar.nc2.Structure

2.7 ucar.nc2.NetcdfFile

2.8 ucar.nc2.NetcdfFileWriteable

3. Semantic Layer: NetcdfDataset

3.1 Standard Attributes

3.2 Coordinate Systems

3.3 NetcdfDataset API

4 Scientific Data Types

4.1 GeoGrids

4.2 PointData

5. NetCDF Markup Language (NcML)

5.1 NcML Coordinate Systems

5.2 NcML Dataset

6. Multidimensional Arrays

6.1 ucar.ma2.Array: in-memory multidimensional arrays

6.2 ucar.ma2.Index

6.3 ucar.ma2.IndexIterator

6.4 Type and rank specific Arrays

7 Implementation

7.1 Remote access to netCDF files through an HTTP server

7.2 Reading HDF5 Files

7.3. NetCDF – OpenDAP Interface

IOServiceProvider

Read data from a Variable

Read data sections from a Variable

Read Variable data into a Java array

Create a netCDF File

Print data from a netCDF dataset

Appendix B: Example OpenDAP to netCDF Conversion

Example: Scalars and Arrays of Primitives

Example: Grids and Structures

Example: Arrays of Structures

Example: Sequences

Appendix C

ISO 8601 Date Types

References


1. Introduction

 

This is user documentation for version 2.2 of the ucar.nc2, ucar.ma2 and related Java packages, also known as "NetCDF-Java version 2" and "MultiArray version 2" packages.

 

NetCDF-Java version 2.2 provides an Application Programmer Interface (API) for a scientific data model called the Common Data Model (CDM).  The CDM is the result of merging the NetCDF (version 3), OpenDAP (version 2), and the HDF5 (version 1.6) data models. NetCDF and HDF5 define standard file formats, while OpenDAP is a network data access protocol. These handle the details of data access, and (for NetCDF and HDF5) file layout and data writing.

 

The Common Data Model has several layers, which build on top of each other to add successively richer semantics:

  1. The data layer, also known as the syntactic layer, handles data reading and writing. The base data type for this layer is the multidimensional array.
  2. The standard attribute layer knows about some of the meanings that humans associate with scientific data: units, missing data values, coordinate systems, data topology, etc. This layer provides standard methods for common tasks in order to make the application programmer's task easier.
  3. The coordinate system layer identifies the coordinates of the data arrays. Coordinates are a completely general concept for scientific data; we also identify specialized georeferencing coordinate systems, which are important to the Earth Science community.
  4. The scientific data type layer identifies specific types of data, such as grids, images, and point data, and adds specialized methods for that kind of data.

 

NetCDF-Java version 2.2 currently provides read/write access to NetCDF-3 files, read access to most HDF5 files, and read access to OpenDAP datasets. It will also provide read/write access to NetCDF-4 files as that file format becomes available. We are also working on read access to GRIB files, as well as integration with THREDDS dataset annotation.

 

All of these packages are freely available, and the source code is released under the GNU Lesser General Public License [LGPL]. They require Java version 1.4 or above.


2. Data Layer: NetcdfFile

Data Layer Object Model

It is useful to understand the Common Data Model as an abstract data model (a.k.a. an object model) independent of its APIs, which are language dependent, or its file format, which is really an implementation detail. Here is the object model for the CDM data layer:

 

 

 

 

Fig 1. NetCDF-Java 2.2 Abstract Data Model in UML

 

A Dataset is a generalization of a netCDF file. It may be a netCDF file, an HDF5 file, an OpenDAP dataset, a collection of files, or anything else which can be accessed through the netCDF API.

 

A Variable is a container for data. It has a dataType, a set of Dimensions that define its array shape, and optionally a set of Attributes.

 

A Group is a logical collection of Variables. The Groups in a Dataset form a hierarchical tree, like directories on a disk. A Group has a name and optionally a set of Attributes. There is always at least one Group in a dataset, the root Group, whose name is the empty string.

 

A Dimension has an index length, and is used to define the array shape of a Variable. It may be shared among Variables, which provides a simple yet powerful way of associating Variables. When a Dimension is shared, it has a unique name within the Dataset. It may have a coordinate Variable, which gives each index a coordinate value.

 

An Attribute has a name and a value, used for associating arbitrary metadata with a Variable or a Group. The value can be a one dimensional array of Strings or numeric values.

 

A Structure is a type of Variable that contains other Variables, analogous to a struct in C. In general, a Structure's data are physically stored close together on disk, so that it is efficient to retrieve all of the data in a Structure at the same time.

 

A Sequence is a one dimensional Variable whose length is not known until you actually read the data. All other Variable types know what their array lengths are without having to read the data. You can have sequences of sequences, which is equivalent to ragged arrays.

 

An Array contains the actual data for a Variable, read from the disk or network, and stored in memory. You get an Array from a Variable by calling read() or its variants.

 

StructureData contains the actual data for a Structure, like an Array does for a Variable.

 

A String is a variable length array of UTF-8 encoded Unicode characters.

 

The primitive types are boolean, byte, char, short, int, long, float and double, the same as in the Java language. Together with String and Structure, these correspond to a Variable's DataType. (Open question: how are signed and unsigned types handled?)

Object Names

 

Groups, Variables, Dimensions and Attributes can be located by name. The full name of a Group or Variable includes the parent group names separated by a "/", as in file names in directories, e.g. "group1/group2/varName". (Note that the root group name is an empty name, rather than "/". This makes objects in the root group look like they are directly contained in the Dataset, for backwards compatibility). When a Variable is a member of a Structure, a "." is used to separate the structure names, e.g. "group1/group2/struct1.struct2.varName".  These rules imply that a Variable's short name is unique within its containing Group, or within its containing Structure.

 

An Attribute's short name is unique within the Group or Variable it belongs to. Its full name uses an "@" as separator e.g. "group1/varName@attName". 
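The naming rules above can be illustrated with a small parser. This is a minimal sketch, not part of the ucar.nc2 API: the class name FullNameSketch and its fields are hypothetical, and it assumes a well-formed full name that uses "/" for groups, "." for Structure members, and "@" for an attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of ucar.nc2): splits a full name into its parts. */
final class FullNameSketch {
    final List<String> groups;      // parent group names, outermost first
    final List<String> structures;  // enclosing Structure names, outermost first
    final String shortName;         // short name of the Variable
    final String attribute;         // attribute name after '@', or null if none

    private FullNameSketch(List<String> groups, List<String> structures,
                           String shortName, String attribute) {
        this.groups = groups;
        this.structures = structures;
        this.shortName = shortName;
        this.attribute = attribute;
    }

    static FullNameSketch parse(String fullName) {
        // An '@' separates the object path from the attribute name.
        String attribute = null;
        String path = fullName;
        int at = fullName.indexOf('@');
        if (at >= 0) {
            attribute = fullName.substring(at + 1);
            path = fullName.substring(0, at);
        }
        // Everything before the last '/' names the parent groups.
        List<String> groups = new ArrayList<>();
        int slash = path.lastIndexOf('/');
        if (slash >= 0) {
            groups.addAll(Arrays.asList(path.substring(0, slash).split("/")));
        }
        // Within the last path element, '.' separates Structure names
        // from the member Variable's short name.
        String tail = path.substring(slash + 1);
        List<String> parts = new ArrayList<>(Arrays.asList(tail.split("\\.")));
        String shortName = parts.remove(parts.size() - 1);
        return new FullNameSketch(groups, parts, shortName, attribute);
    }
}
```

For "group1/group2/struct1.struct2.varName" the sketch yields groups [group1, group2], structures [struct1, struct2], and short name varName. An object in the root group has an empty group list, consistent with the root group's empty name.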

 

[ A Dimension's short name is unique within the Group it belongs to. Its full name uses the usual "/" as group separator, e.g. "group1/group2/dimName". ]

 

-OR- (hard to do both)

 

[ Dimensions are scoped by the Group they belong to. When a Variable refers to a Dimension by name, the Dimension is looked for in the Variable's parent group, and if not found, in its parent, etc. ]

 

The character set for Object names is restricted. A name must start with a letter or underscore (however, names starting with an underscore are reserved for system defined objects). Each remaining character must be alphanumeric, a dash '-', or an underscore '_'. [What about escaping other characters? It does make life harder.]
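The character rule can be expressed as a regular expression. The following is a minimal sketch, assuming that "letter" and "alphanumeric" mean ASCII characters (the text does not say); the class ObjectNameSketch and its methods are hypothetical and are not part of the library.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for object names (assumes ASCII letters and digits). */
final class ObjectNameSketch {
    // A letter or underscore, then any number of letters, digits, '-' or '_'.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_-]*");

    /** True if the name satisfies the restricted character set. */
    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    /** True if the name is valid and falls in the space reserved for system objects. */
    static boolean isReserved(String name) {
        return isValid(name) && name.charAt(0) == '_';
    }
}
```

Under these assumptions "wind-speed" and "temp_2m" are valid, "2m_temp" is not (it starts with a digit), and "_FillValue" is valid but reserved.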

Array section syntax

 

Array sections can be specified with Fortran 90 array section syntax, using zero-based indexing. For example, varName(12:22,0:100:2,:,17) specifies an array section for a four dimensional variable. The first dimension includes all the elements from 12 to 22 inclusive, the second dimension includes the elements from 0 to 100 inclusive with a stride of 2, the third includes all the elements, and the fourth includes just the element at index 17 (the 18th element). For structures, you can specify nested selectors, e.g. record(12).wind(1:20,:,3) selects a section of the wind member variable in the record structure at index 12. If you omit the section, the entire variable is read, e.g. record.wind indicates all the wind variables in all the record structures.

 

Formally:

 

variableSection := selector | selector '.' selector

selector := varName ['(' sectionSpec ')']

varName := STRING

 

sectionSpec:= dim | dim ',' sectionSpec

dim := ':' | slice | start ':' end | start ':' end ':' stride

slice := INTEGER

start := INTEGER

stride := INTEGER

end := INTEGER

 

   where:

     varName = valid variable name

     ':' = all indices

     slice = the single index equal to the given value

     start:end = all indices from start to end inclusive

     start:end:stride = all indices from start to end inclusive with given stride

 

This notation is used in the NetcdfFile.read( String variableSection, boolean flatten), Variable.read( String sectionSpec), and NCdump methods.
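To make the sectionSpec grammar concrete, here is a minimal sketch of a parser for the part inside the parentheses. It is an illustration only and is not the library's parser: the class SectionSpecSketch, its methods, and the choice to represent ':' as null are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the sectionSpec production, e.g. "12:22,0:100:2,:,17". */
final class SectionSpecSketch {

    /** Returns one entry per dimension: {start, end, stride}, or null for ':' (all indices). */
    static List<int[]> parse(String sectionSpec) {
        List<int[]> dims = new ArrayList<>();
        for (String raw : sectionSpec.split(",", -1)) {
            String dim = raw.trim();
            if (dim.equals(":")) {
                dims.add(null);          // all indices of this dimension
                continue;
            }
            String[] p = dim.split(":", -1);
            if (p.length > 3) {
                throw new IllegalArgumentException("bad dimension spec: " + dim);
            }
            // slice: one number; start:end: two numbers; start:end:stride: three.
            int start = Integer.parseInt(p[0].trim());
            int end = (p.length >= 2) ? Integer.parseInt(p[1].trim()) : start;
            int stride = (p.length == 3) ? Integer.parseInt(p[2].trim()) : 1;
            if (start < 0 || end < start || stride < 1) {
                throw new IllegalArgumentException("bad dimension spec: " + dim);
            }
            dims.add(new int[] {start, end, stride});
        }
        return dims;
    }

    /** Number of indices selected: start, start+stride, ... up to and including end. */
    static int count(int[] dim) {
        return (dim[1] - dim[0]) / dim[2] + 1;
    }
}
```

Parsing "12:22,0:100:2,:,17" gives four entries: 11 indices in the first dimension, 51 in the second, all indices in the third, and the single index 17 in the fourth.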

 

 


The NetCDF API

 

The following is an overview of the important public interfaces of the ucar.nc2 classes. Consult the javadoc for complete details.

 2.1 ucar.nc2.Dimension

 

 

A Dimension object specifies the length of an array dimension. If the Dimension is shared, then it has a name that is unique within its Group. Otherwise it is an anonymous dimension that is local to the Variable that uses it, and it doesn’t have a name. If the Dimension is unlimited, then the length can increase; otherwise, it is immutable. A sequence Variable can have a Dimension of unknown length; the length can differ for each Variable that uses it and can be determined only by actually reading the Variable's data.

 

 public class Dimension {

  public String getName();

  public int getLength();

  public boolean isUnlimited();

  public boolean isUnknown();

  public boolean isShared();

  public Variable getCoordinateVariable(); // null if none

 }

 

The method getCoordinateVariable() returns the associated coordinate variable or null if none exists. A coordinate variable is defined as a Variable with the same name as the Dimension, whose single dimension is the Dimension, for example: float lat(lat);

 

When a variable is displayed in NCdump or an equivalent program, its shape is indicated by its dimensions, e.g. varName(time=12, lat=114, lon=288). When a variable uses non-shared (anonymous) dimensions, just the dimension lengths are shown, e.g. varName(21, 114, 288). When the variable is a sequence, so the length is unknown, a "*" is used, e.g. varName(*).

 2.2 ucar.nc2.DataType

 

DataType is an enumeration of the possible data types that a Variable can take.

 

public class DataType {

  public static final ucar.nc2.DataType BOOLEAN;

  public static final ucar.nc2.DataType BYTE;

  public static final ucar.nc2.DataType CHAR;

  public static final ucar.nc2.DataType SHORT;

  public static final ucar.nc2.DataType INT;

  public static final ucar.nc2.DataType LONG;

  public static final ucar.nc2.DataType FLOAT;

  public static final ucar.nc2.DataType DOUBLE;

  public static final ucar.nc2.DataType STRING;

  public static final ucar.nc2.DataType STRUCTURE;

 

  public static DataType getType(String name); // find by name

  public static DataType getType(Class c); // find by class

 

  public String toString();          // eg "double"

  public Class getPrimitiveClassType(); // eg double.class

  public Class getClassType();      // eg Double.class

}

 2.3 ucar.nc2.Attribute

 

An Attribute is a (name, value) pair, where the value may be a scalar or one dimensional array of String or Number. An Attribute is attached to a Variable or a Group.

 

public class Attribute {

  public String getName();

  public DataType getDataType();

  public boolean isString();

  public int getLength();      // = 1 for scalars

 

  public Array getValues(); 

  public String getStringValue();

  public String getStringValue(int elem);

  public Number getNumericValue();

  public Number getNumericValue(int elem);

}

 2.4 ucar.nc2.Group

 

A Group is a container for variables, attributes, dimensions, or other groups. Groups form a tree of nested groups, similar to file directories. There is always at least one group in a NetcdfFile, the root group, which has an empty string as its name. The full name of a group uses "/" as separator, so "group1/group2/group3" names group3 with parent group2 with parent group1, with parent the root group.

 

public class Group {

  public String getName();      // full name, starting from root group

  public String getShortName(); // name local to parent group

 

  public List getVariables();   // list of Variables directly in this group

  public Variable findVariable(String shortName); // find specific Variable

 

  public List getDimensions();   // list of Dimension directly in this group

  public Dimension findDimension(String name); // find specific Dimension

 

  public List getAttributes();   // list of Attributes directly in this group

  public Attribute findAttribute(String name); // find specific Attribute

  public Attribute findAttributeIgnoreCase(String name);

 

  public List getGroups();   // list of Groups directly in this group

  public Group findGroup(String shortName); // find specific Group

  public Group getParentGroup();

}

 2.5 ucar.nc2.Variable

 

A Variable is a multidimensional array of primitive data, Strings or Structures (a scalar variable is a rank-0 array). It has a name and a collection of dimensions and attributes, as well as logically containing the data itself. The rank of a variable is its number of dimensions, and its shape is the lengths of all of its dimensions. The dimensions are returned in getDimensions() in order of most slowly varying first (leftmost index for Java and C programmers). The data elements have type getDataType().
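The relationship between rank, shape, and total size can be sketched as follows. This is an illustration under the stated ordering (first dimension varies most slowly); the class ShapeSketch is hypothetical, and the linear offset shown is ordinary row-major arithmetic, not a description of how the library stores data.

```java
/** Hypothetical helper illustrating shape arithmetic for a Variable. */
final class ShapeSketch {

    /** Total number of elements: the product of the dimension lengths (1 for a rank-0 scalar). */
    static long size(int[] shape) {
        long n = 1;
        for (int len : shape) {
            n *= len;
        }
        return n;
    }

    /** Row-major linear offset of an index: the leftmost index varies most slowly. */
    static long offset(int[] shape, int[] index) {
        long off = 0;
        for (int d = 0; d < shape.length; d++) {
            off = off * shape[d] + index[d];
        }
        return off;
    }
}
```

For a variable with shape (12, 114, 288), size() returns 393984. A scalar has an empty shape array and a size of 1.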

 

Each Variable is contained in a Group. The full name of a variable uses "/" as separator from its parent groups, so "group1/group2/varName" names varName contained in group2 with parent group1 in the root group.

 

public class Variable {

  public String getName();      // full name, starting from root group

  public String getShortName(); // name local to parent group

  public Group getParentGroup();

 

  public int getRank();           // rank of the array

  public int[] getShape();        // shape of the array

  public long getSize();          // total number of elements

  public int getElementSize();       // byte size of one element

  public DataType getDataType();    // data type of elements

 

  public List getDimensions();       // get ordered list of Dimensions

  public Dimension getDimension(int i);  // get the ith Dimension

  public int findDimensionIndex(String name); // find named Dimension

 

  public List getAttributes();    // get list of Attributes

  public Attribute findAttribute(String attName);

  public Attribute findAttributeIgnoreCase(String attName);

 

  public boolean isUnlimited();   // if any Dimension is unlimited

  public boolean isSequence();    // is a Sequence

  public boolean isCoordinateVariable();

 

  // read the data

  public Array read();  // read all data

  public Array read(int[] origin, int[] shape); // read a section   

  public Array read(List section); // list of ucar.ma2.Range objects

  public Array read(String sectionSpec); // fortran90 syntax, eg "3,0:100:10,:"

 

  // for scalar data

  public byte readScalarByte();

  public double readScalarDouble();

  public float readScalarFloat();

  public int readScalarInt();

  public long readScalarLong();

  public short readScalarShort();

  public String readScalarString();

 

  // for members of Structures : see section 2.6 below

  public boolean isMemberOfStructure();

  public Structure getParentStructure();

  public Array readAllStructures(List section, boolean flatten);

  public Array readAllStructuresSpec(String sectionSpec, boolean flatten);

 

  // for Variable sections

  public boolean isSection();

  public Variable section(List section);

  public List getRanges();

}

 

Data access is handled by calling the various read methods (all of which throw IOException), which return the data in a memory-resident Array object (further manipulation can be done on the Array object; see section 6 below). Each read call potentially causes a disk or network access.

 

The read() method reads all the data and returns it in a memory resident Array which has the same shape as the Variable itself.

 

The read(int[] origin, int[] shape), read(List section), and read(String sectionSpec) methods specify a section of the Variable to read; the returned Array has the shape of the requested section. The origin and shape parameters specify a starting index and number of elements to read, respectively, for each of the Variable's dimensions. The section parameter is a list of Range objects, one for each dimension. A null Range in the list means to use the entire shape for that dimension. A Range is constructed with a first and last (inclusive) index value, and optionally a stride. (The origin and shape form cannot express a stride; use Range objects or a sectionSpec string for strided access.)

 

public class ucar.ma2.Range {

  Range(int first, int last);

  Range(int first, int last, int stride);

  int length();

}
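The following sketch shows how the inclusive first/last/stride convention relates to the origin/shape form. It is a stand-alone illustration, not the ucar.ma2.Range source: the class RangeSketch and its fromOriginShape() and element() methods are hypothetical, and the argument checks are assumptions.

```java
/** Hypothetical stand-in for a Range: first and last are inclusive. */
final class RangeSketch {
    final int first;
    final int last;
    final int stride;

    RangeSketch(int first, int last) {
        this(first, last, 1);
    }

    RangeSketch(int first, int last, int stride) {
        if (first < 0 || last < first || stride < 1) {
            throw new IllegalArgumentException("invalid range");
        }
        this.first = first;
        this.last = last;
        this.stride = stride;
    }

    /** Number of indices selected: first, first+stride, ... not exceeding last. */
    int length() {
        return (last - first) / stride + 1;
    }

    /** The i-th selected index, for 0 <= i < length(). */
    int element(int i) {
        return first + i * stride;
    }

    /** Unit-stride range equivalent to one dimension of an (origin, shape) pair. */
    static RangeSketch fromOriginShape(int origin, int shape) {
        return new RangeSketch(origin, origin + shape - 1);
    }
}
```

For example, an origin of 12 with a shape of 11 corresponds to the range 12 to 22, and a range from 0 to 100 with stride 2 selects 51 indices.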

 

The sectionSpec parameter specifies the section with a String, using Fortran 90 array section syntax. For example, "10, 1:3, :, 0:1000:100".  See "Array section syntax" in section 2 for details.

 

When the Variable is a scalar, you can use the scalar read routines; for example readScalarDouble() returns a scalar's value as a double.

 

When the Variable is a member of an array of Structures, things are somewhat more complicated. The standard read methods operate only on the first element of the Structure array. To get data from other Structure elements, or from multiple Structure elements, you must use the readAllStructures() and related methods. See next section for details.

 

A Variable can be created as a section of another Variable by using the section() method. A section Variable is a "first class object" that can be treated like any other Variable. If you call read( section) on a section Variable, the section parameter will refer to the new Variable's shape. Generally, a section Variable is a logical section, meaning that no data is read until a read method is called.

 

A sequence Variable is indicated by isSequence() returning true. Sequences are one dimensional arrays of any DataType whose length is not determined until the data is actually read, so getShape() and getSize() return 0. You cannot call section() or read(section) on a Sequence. (Open question: is this restriction correct?)

 2.6 ucar.nc2.Structure

 

A Structure is a subclass of Variable that contains member Variables, similar to a struct in the C language. A Structure represents a multidimensional array whose elements have type DataType.STRUCTURE.  A read on a Structure returns an array of StructureData objects, which contains data Arrays for the member Variables.

 

All of a Structure's Variables' data may be assumed to be stored "close" to each other, so it may be more efficient to read all the data in one Structure at once using the Structure.read() method, rather than the read() method on its individual member Variables. This is generally true for remote access protocols like OpenDAP, since each read() call typically incurs the cost of a round-trip to a server.

 

Both a Structure and a Group contain collections of Variables, but a Group is a logical collection, and a Structure can be considered a physical collection of data stored close together. A Group can contain shared Dimensions, but a Structure cannot. Both can contain Attributes. Most importantly, a Structure can be an array, but a Group cannot.

 

public class Structure extends ucar.nc2.Variable {

  public List getVariables();     // list of Variables

  public List getVariableNames(); // list of String : members' short names

  public Variable findVariable(String shortName);

 

  public StructureData readStructure();         // for scalar data

  public StructureData readStructure(int elem); // for rank 1 data

  public Structure.Iterator getStructureIterator(); // iterator returns StructureData

}

 

public class Structure.Iterator {

  public boolean hasNext();

  public StructureData next() throws java.io.IOException;

}

 

To access the data in a structure, you typically use the read() and read(section) methods inherited from the Variable superclass. These return arrays of StructureData objects. The readStructure() and readStructure(int elem) methods can be used when the structure is scalar or one-dimensional, respectively. For a structure of any shape, use getStructureIterator(), which returns an iterator that reads one structure in the array each time iterator.next() is called and returns it as a StructureData object.

 

public class StructureData {

  public List getMembers(); // StructureData.Member objects

  public java.util.List getMemberNames();

  public StructureData.Member findMember(String shortName);

  public StructureData.Member findNestedMember(String fullName);

  public Array findMemberArray(String memberName);

 

 

  // for scalar members

  public byte getScalarByte(String memberName);

  public double getScalarDouble (String memberName);

  public float getScalarFloat(String memberName);

  public int getScalarInt(String memberName);

  public long getScalarLong(String memberName);

  public short getScalarShort(String memberName);

  public String getScalarString(String memberName); // ok for 1D char type, too

}

 

public class StructureData.Member {

  public ucar.nc2.Variable v;

  public ucar.ma2.Array data;

}

 

A StructureData object contains the data of the member variables of the structure, after the data has been read into memory with a read method.  Each member variable is associated with its data Array in a StructureData.Member object. The getScalarXXX() methods are convenience methods for extracting the data when the member Variable is a scalar.

Structure Member Variables

The read methods on variables that are members of structures are complicated by the fact that the parent structure may be a non-scalar array. The member variable's read method is therefore executed within the context of some element of its parent structure. For this reason, a member variable is not truly a "first class" citizen: if you use the ordinary read methods on a variable for which isMemberOfStructure() is true, they return data only from the first structure.

 

To read data from a member variable across multiple structures, use the member variable's readAllStructures() method.  However, if you need data from more than one member variable, it is probably more efficient to call the read methods on the structure itself, and extract the member data from the StructureData object. You can use the getStructureIterator() method to access all of the data in an array of structures sequentially.

 

If a Variable is a member of a Structure, its full name uses the "." as a separator from the Structure name, so e.g. "group1/record.varName" names varName as a member of the record Structure, which is in group1 whose parent is the root group.

 2.7 ucar.nc2.NetcdfFile

 

A NetcdfFile provides read-only access to datasets through the netCDF API (to write data, use NetcdfFileWriteable, below). Use the static NetcdfFile.open methods to open a local netCDF (version 3 or 4) file, and most HDF5 files. See NetcdfDataset.open for more general reading capabilities, including OpenDAP and NcML.

 

You can look at all the variables, dimensions and global/group attributes through the top level getVariables(), getDimensions(), and getGlobalAttributes() methods. Alternatively, you can explore the contained objects using groups, starting with the root group by calling getRootGroup().

 

Generally, reading NetCDF version 3 files through the NetCDF-Java 2.2 API is backwards compatible with programs using the NetCDF-Java 2.1 API (there are some name changes), since there are no Groups or Structures in netCDF version 3 files. However, when working with NetCDF version 4 files, HDF5 files, OpenDAP and other kinds of datasets, you will need to deal with the possible presence of Groups, Structures, Sequences, Strings, etc., that are new to the NetCDF-Java 2.2 data model.

 

public class NetcdfFile {

  static NetcdfFile open(String location);

  static NetcdfFile open(String location, String id, String title, boolean useV3,
                         CancelTask cancel);

  public void close();    

 

  public String getLocation();   // dataset location

  public String getId();         // globally unique id, may be null

  public String getTitle();      // human readable title, may be null

 

  public Group getRootGroup();   

 

  public List getVariables();                  // all Variables in all Groups

  public Variable findVariable(String fullName);   // find specific Variable by full name

  public Variable findTopVariable(String fullName); // no Structure member names

 

  public List getDimensions();       // shared Dimensions

  public Dimension findDimension(String dimName); // get specific Dimension

 

  public List getGlobalAttributes();          //get list of global Attributes

  public Attribute findGlobalAttribute(String attName); //specific Attribute

  public Attribute findGlobalAttributeIgnoreCase(String attName);

  public String findAttValueIgnoreCase(Variable v, String attName, String def);

 

  public Array read(String sectionSpec, boolean flatten);

  public void writeCDL(java.io.OutputStream os);

  public void writeNcML(java.io.OutputStream os);

}

 

The getVariables() method returns all Variables in all Groups, but does not include members of Structures, since those are considered to be part of the Structure Variable. Similarly, findTopVariable() only searches over these "non-nested" Variables. The findVariable() method will also look for Variables that are members of Structures, e.g. "group1/struct.member".

 

The findAttValueIgnoreCase() method is a convenience method for String-valued attributes which returns the specified default value if the attribute name is not found. If you specify a null for the variable then it will search for global attributes, otherwise it will search for the attribute in the given variable.
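The lookup rule just described (case-insensitive name, fall back to a default, null variable means global attributes) can be sketched with plain maps. This is an illustration only: the class AttributeLookupSketch is hypothetical, and it identifies a variable by a name String, whereas the real method takes a Variable object.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of attribute lookup with a default value. */
final class AttributeLookupSketch {
    // Attribute names compare case-insensitively; variable names do not.
    private final Map<String, String> globalAtts =
        new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
    private final Map<String, Map<String, String>> variableAtts =
        new TreeMap<String, Map<String, String>>();

    void addGlobal(String attName, String value) {
        globalAtts.put(attName, value);
    }

    void addVariableAtt(String varName, String attName, String value) {
        Map<String, String> atts = variableAtts.get(varName);
        if (atts == null) {
            atts = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
            variableAtts.put(varName, atts);
        }
        atts.put(attName, value);
    }

    /** A null varName searches the global attributes; def is returned when nothing is found. */
    String findAttValueIgnoreCase(String varName, String attName, String def) {
        Map<String, String> atts = (varName == null) ? globalAtts : variableAtts.get(varName);
        if (atts == null) {
            return def;
        }
        String value = atts.get(attName);
        return (value != null) ? value : def;
    }
}
```

Returning the caller's default rather than null keeps callers free of null checks when an optional attribute such as a title is absent.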

 

The read(sectionSpec, flatten) method is a general way to read from any variable in the NetcdfFile. It accepts a String like varname(0:1,13,:). See the "Array section syntax" section above for details on the syntax of the sectionSpec string. The flatten parameter is only used when the requested variable is a member of a structure.

 

 2.8 ucar.nc2.NetcdfFileWriteable

 

A NetcdfFileWriteable allows a NetCDF version 3 file to be created or written to.

 

A NetCDF file can only have dimensions, attributes and variables added to it at creation time. Thus, when a file is first opened, it is in "define mode" where these may be added. Once create() is called, the dataset structure is immutable. After create() has been called you can then write the data values. See example below.
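The define-mode rule can be pictured as a small two-state object. The sketch below is a stand-alone illustration and does not use or reproduce NetcdfFileWriteable: the class DefineModeSketch is hypothetical, and throwing IllegalStateException on a misplaced call is an assumption about how such a rule might be enforced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of define mode: structure first, then create(), then data. */
final class DefineModeSketch {
    private final Map<String, Integer> dimensions = new LinkedHashMap<String, Integer>();
    private boolean defineMode = true;   // true until create() is called

    /** Allowed only in define mode. */
    void addDimension(String name, int length) {
        if (!defineMode) {
            throw new IllegalStateException("structure is immutable after create()");
        }
        dimensions.put(name, length);
    }

    /** Ends define mode; the structure can no longer change. */
    void create() {
        defineMode = false;
    }

    /** Allowed only after create(); the data itself is omitted from this sketch. */
    void write(String varName) {
        if (defineMode) {
            throw new IllegalStateException("call create() before writing data");
        }
    }

    int dimensionCount() {
        return dimensions.size();
    }
}
```

The intended order of calls is therefore: add dimensions, attributes and variables; call create(); then write data values.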

      

public class NetcdfFileWriteable extends NetcdfFile {

  public NetcdfFileWriteable();         // create new file

  public void setName(String filename);

  public void setFill(boolean fill); // default false

 

  public Dimension addDimension(String dimName, int dimLength);

 

  public void addGlobalAttribute(String attName, String value);

  public void addGlobalAttribute(String attName, Number value);

  public void addGlobalAttribute(String attName, Array values);

 

    // add new Variable and variable Attributes

  public void addVariable(String varName, DataType varType, List dims);

  public void addVariableAttribute(String varName, Attribute att);

 

    // finish structure definition, create file

  public void create();

 

    // open existing file and allow writing data

  public NetcdfFileWriteable(String filename);

 

    // write data to file

  public boolean write(String varName, Array data);

  public boolean write(String varName, int[] origin, Array data);

 

  public void flush(); // flush to disk

  public void close();

}

3. Semantic Layer: NetcdfDataset

 

The ucar.nc2.dataset package and related packages are an extension to the NetCDF API that recognizes standard attributes, supports general and georeferencing coordinate systems, and supports the NetCDF Markup Language (NcML). Opening a NetcdfDataset is also the general way to open any dataset as a NetcdfFile.

 

NcML is an XML document format that allows you to create "virtual" netCDF datasets, including combining multiple netCDF files into one dataset. This section focuses on the extension of the netCDF API to include coordinate systems. Section 5 below explains NcML and how to create virtual datasets.

3.1 Standard Attributes

 

A standard attribute is a NetCDF attribute that has a defined name and meaning. Several have become widespread and important enough to be handled by the library instead of at the application level. A variable's description is a sentence that describes the meaning of the variable. The units indicate the variable's scientific data units. Missing data values are special values that indicate that the data is not present at that location. Floating point data is often packed into bytes or shorts using scale and offset values. Coordinate systems are also often specified through standard attributes.

 

The VariableEnhanced interface is implemented by classes in the ucar.nc2.dataset package to provide standard methods to access standard attributes. In addition, dataset annotation systems like NcML and THREDDS can add this information to datasets that lack it. It's useful for application developers to use this interface in order to take advantage of both the standard implementations and the opportunity for annotation systems to add value.

 

The getDescription() and getUnitsString() methods are simple convenience routines for getting the variable's description and units, respectively. More complicated is the handling of packed data and missing data:

 

  · Packed data. If the variable has the scale_factor and/or add_offset attributes, hasScaleOffset() will return true, and the variable will automatically be converted to type double or float and its data unpacked using the formula:

        unpacked_value = scale_factor * packed_value + add_offset

    (scale_factor defaults to 1.0 and add_offset defaults to 0.0 if they are missing.)

  · Missing data. If the variable has any of the invalid or missing data attributes (_FillValue, missing_value, valid_min, valid_max, or valid_range), hasMissing() will return true. To test if a specific value is missing, call isMissing(val). Note that the data is converted and compared as a double. When the variable element type is float or double (or is set to double because it is packed), then missing data values are set to NaN (IEEE not-a-number), which makes further comparisons more efficient.
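The unpacking formula and the missing-value test can be sketched in a few lines. This is a simplified illustration, not the library's implementation: the class PackedValueSketch is hypothetical, and it assumes a single fill value plus a valid range, ignoring the other attribute combinations listed above.

```java
/** Hypothetical helper illustrating unpacking and a simplified missing-value test. */
final class PackedValueSketch {

    /** unpacked_value = scale_factor * packed_value + add_offset */
    static double unpack(double packedValue, double scaleFactor, double addOffset) {
        return scaleFactor * packedValue + addOffset;
    }

    /**
     * Simplified test: a value is missing if it is NaN, equals the fill value,
     * or lies outside [validMin, validMax]. All comparisons are done as doubles.
     */
    static boolean isMissing(double val, double fillValue, double validMin, double validMax) {
        return Double.isNaN(val) || val == fillValue || val < validMin || val > validMax;
    }
}
```

With scale_factor 0.5 and add_offset 20.0, a packed value of 10 unpacks to 25.0; with the default scale_factor 1.0 and add_offset 0.0 a value is returned unchanged.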

 

 

public interface VariableEnhanced {

  public String getDescription();   // long_name attribute

  public String getUnitsString();   // units attribute

 

  public boolean hasScaleOffset();  // is packed

  public boolean hasMissing();

  public boolean isMissing( double val);   // check specific value

}

 

To use this, open the file through the NetcdfDataset.open method, described below.

3.2 Coordinate Systems

 

A well-established convention with netCDF files is the use of coordinate variables to name the coordinate values of a dimension. A coordinate variable is a one-dimensional variable with the same name as a dimension, e.g. float lat(lat) . It must not have any missing data (for example, no _FillValue or  missing_value attributes) and must be strictly monotonic (values increasing or decreasing). Many programs that read netCDF files recognize and use any coordinate variables that are found in the file.

 

Because coordinate variables must be one-dimensional, they cannot represent some commonly used coordinates. For example, float lat(x,y) and float lon(x,y) assign latitude and longitude coordinates to points on a projection plane. A coordinate axis is a generalization of a coordinate variable. It must be a variable with no missing data and no extra dimensions besides its coordinate dimensions (i.e. it may not be vector valued). It may have any number of dimensions. A variable that uses a coordinate axis must have all of the dimensions that the coordinate axis has. For example, the variable T(time, z, y, x) can have a coordinate axis lat(x,y) because x and y are both used by T, but the variable R(time, y) does not use the x dimension, so it cannot have lat as a coordinate axis.
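
The dimension rule for coordinate axes can be expressed as a simple subset test. The following is a minimal sketch in plain Java; the class AxisDimensionCheck and its method are hypothetical, and dimensions are represented only by their names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration; not part of the ucar.nc2 API.
class AxisDimensionCheck {

  // A variable can use a coordinate axis only if every dimension of the
  // axis is also a dimension of the variable.
  static boolean canUseAxis(List<String> variableDims, List<String> axisDims) {
    return new HashSet<>(variableDims).containsAll(axisDims);
  }

  public static void main(String[] args) {
    List<String> latDims = Arrays.asList("x", "y");   // lat(x, y)
    // T(time, z, y, x) uses both x and y
    System.out.println(canUseAxis(Arrays.asList("time", "z", "y", "x"), latDims));
    // R(time, y) does not use x
    System.out.println(canUseAxis(Arrays.asList("time", "y"), latDims));
  }
}
```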

 

A coordinate system is a set of coordinate axes used by a variable. If S is a coordinate system for a variable V, and there are n coordinate axes in S, then S assigns to each value of V n coordinate values, one from each coordinate axis. For example if the variable T(time, z, y, x)  has a coordinate system S consisting of the coordinate axes lat(x, y), lon(x,y), level(z), and time(time), then the coordinate values for T(t, k, j, i)  in S are (lat(i,j), lon(i,j), level(k), time(t)). Note that the order of the indices in the lat and lon axes is important! Another interesting example is assigning positions to a trajectory, for example lat(sample), lon(sample), altitude(sample) might be a coordinate system for variable O3(sample). A coordinate transformation is a function that transforms the values in one coordinate system to the values in another coordinate system.
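
As an illustration of how a coordinate system assigns coordinate values to each element, the sketch below looks up the coordinates of T(t, k, j, i) for the example axes lat(x, y), lon(x, y), level(z), and time(time). It uses plain Java arrays rather than the library classes; the class CoordinateLookupSketch and the sample values are hypothetical.

```java
// Hypothetical illustration; not part of the ucar.nc2 API.
class CoordinateLookupSketch {
  final double[][] lat;   // lat(x, y)
  final double[][] lon;   // lon(x, y)
  final double[] level;   // level(z)
  final double[] time;    // time(time)

  CoordinateLookupSketch(double[][] lat, double[][] lon, double[] level, double[] time) {
    this.lat = lat;
    this.lon = lon;
    this.level = level;
    this.time = time;
  }

  // Coordinate values for T(t, k, j, i), where i indexes x and j indexes y:
  // (lat(i,j), lon(i,j), level(k), time(t)).
  // The index order (i, j) follows the declared order lat(x, y).
  double[] coordinatesOf(int t, int k, int j, int i) {
    return new double[] {lat[i][j], lon[i][j], level[k], time[t]};
  }

  public static void main(String[] args) {
    // made-up values: 3 points along x, 2 points along y
    double[][] lat = {{40.0, 41.0}, {40.1, 41.1}, {40.2, 41.2}};
    double[][] lon = {{-105.0, -105.1}, {-104.0, -104.1}, {-103.0, -103.1}};
    CoordinateLookupSketch s = new CoordinateLookupSketch(
        lat, lon, new double[] {1000, 850}, new double[] {0, 6});
    System.out.println(java.util.Arrays.toString(s.coordinatesOf(1, 0, 1, 2)));
  }
}
```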

 

In general, coordinate systems can be used by more than one variable, and a variable can have 0, 1, or more coordinate systems.

 

Coordinate System Object Model

 

 

Fig 2. Coordinate System Data Model in UML

 

Georeferencing

 

In the earth sciences, special consideration is given to coordinate systems that locate data in physical space and time, called georeferencing coordinate systems. In the simplest case, for example, a georeferencing coordinate system identifies the latitude, longitude, height, and time of the data. For numerical model data often the horizontal coordinates are specified on a projection plane, whose (latitude, longitude) coordinates can be found by using a mathematical formula from projective geometry. Satellite data can be extremely complicated, and the position and extent of each datum must be calculated using sensor-dependent empirical algorithms.

 

Implicit in georeferencing coordinate systems are various standards, for example the "Clarke ellipsoidal earth model", "polar stereographic projection", "mean sea level" and "Gregorian calendar". These are part of "well-known" reference coordinate systems. Abstractly, a georeferencing coordinate system is one which allows us to calculate the latitude, longitude, height, and time of each point in its domain. Concretely, the ability to make that calculation might depend on the availability of a library or service which implements a coordinate transformation from the coordinate system specified in the netCDF dataset to a (latitude, longitude, height, time) reference coordinate system.

 

The specification of the reference coordinate system may be arbitrarily complicated, depending upon how accurately the position must be calculated. Importing data into geographic information systems (GIS) can also require much information that may not be explicitly stored in the netCDF file. The netCDF Dataset API focuses on a minimal set of information needed for implementing georeferencing coordinate systems. Future extensions will define more complete metadata requirements for GIS interoperability.

 

For our purposes, a georeferencing coordinate system is one in which we have identified the lat, lon, height, and time axes of the coordinate system, or have identified coordinate transformations from which latitude, longitude, height, and time can in principle be calculated.

NetCDF Conventions

 

Many netCDF files follow naming and attribute Conventions that allow readers to understand what the variables and dimensions in the files mean, and in particular that identify the coordinate systems in use. The classes in the ucar.nc2.dataset.conv package implement some of the important Conventions that Unidata is familiar with. In this context, implementing the Convention means identifying the coordinate systems and the georeferencing information that are present in the netCDF file, and creating a NetcdfDataset object that holds that information.

 

The logic of each Convention is coded in a class that extends ucar.nc2.dataset.conv.Convention, whose augmentDataset() method adds additional information to a NetcdfDataset:

 

     public abstract void augmentDataset( NetcdfDataset ncDataset);

 

We highly recommend that you document and register your naming and attribute Conventions. Unidata maintains a web page of such Conventions. If you are using Conventions that are not included in this package, you are encouraged to extend the abstract class ucar.nc2.dataset.conv.Convention, implement your Convention, and register it (using ucar.nc2.dataset.conv.Convention.register()). The factory method will look in the netCDF file for a global attribute named "Conventions", and match its value against registered names. If it finds a match, it will instantiate an object of the registered class (using the no-argument constructor, so make sure you have such a constructor if you are implementing your Convention).
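
The register-then-match mechanism described above can be sketched with a name-to-class map and reflection. This is a self-contained illustration only: ConventionRegistrySketch, ConventionSketch, and MyConvention are hypothetical stand-ins, and the actual signatures in ucar.nc2.dataset.conv.Convention may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of name-based registration; the real
// ucar.nc2.dataset.conv.Convention class may differ in detail.
class ConventionRegistrySketch {

  // Stand-in for the abstract Convention class.
  public abstract static class ConventionSketch {
    public abstract String describe();
  }

  // An example implementation with the required no-argument constructor.
  public static class MyConvention extends ConventionSketch {
    public MyConvention() {}
    public String describe() { return "MyConvention"; }
  }

  private static final Map<String, Class<? extends ConventionSketch>> registry =
      new HashMap<>();

  static void register(String name, Class<? extends ConventionSketch> c) {
    registry.put(name, c);
  }

  // Match the value of the "Conventions" global attribute against the
  // registered names; return null if nothing is registered under that name.
  static ConventionSketch factory(String conventionsAttribute) {
    Class<? extends ConventionSketch> c = registry.get(conventionsAttribute);
    if (c == null) return null;
    try {
      // uses the no-argument constructor
      return c.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + c.getName(), e);
    }
  }

  public static void main(String[] args) {
    register("MyConvention", MyConvention.class);
    System.out.println(factory("MyConvention").describe());
    System.out.println(factory("Unregistered"));
  }
}
```

A class without a no-argument constructor would fail at instantiation time in this sketch, which is why the requirement is stated in the text above.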

 

3.3 NetcdfDataset API

 

The following presents the important public methods of the ucar.nc2.dataset classes. See the javadoc for complete details.

 

A NetcdfDataset is a NetcdfFile and so can be used wherever a NetcdfFile object is used. Use these factory methods to open any kind of dataset through the NetCDF API, whether you want to enhance it or not.

 

NetcdfDataset.openFile() returns a NetcdfFile: the uriString can start with prefix dods: for OpenDAP files, http: for netCDF files on an HTTP server, or file: for a local netCDF file. If it does not have one of those prefixes, then it should be a local file name. If it has a suffix of .xml or .ncml it is assumed to be an NcML file.

 

The NetcdfDataset.open() factory methods open the dataset, optionally enhance it by reading conventions and attributes, and return a NetcdfDataset.  This contains ucar.nc2.dataset.VariableDS objects instead of ucar.nc2.Variable objects. If enhanced, the getCoordinateAxes() method returns a list of all of the variables used as coordinate axes, and the getCoordinateSystems() method returns a list of all of the coordinate systems used in the dataset.  Note that the coordinate axes are still variables that are included in the getVariableIterator() and findVariable() methods.

 

public class NetcdfDataset extends ucar.nc2.NetcdfFile {

 public static NetcdfFile openFile(String location, String id, String title,                                
      boolean useV3, CancelTask cancelTask);
 

 public static NetcdfDataset open(String location, String id, String title);

 public static NetcdfDataset open(String location, String id, String title,                                   
      boolean useV3, boolean addCoordinates, CancelTask cancelTask);
 

 public NetcdfDataset(String ncmlLocation);      // create from NcML XML doc

 

 public List getCoordinateSystems(); // list of CoordinateSystem objects

 public List getCoordinateAxes();    // list of CoordinateAxis objects

}

 

A VariableDS object extends Variable, and so can be used wherever a Variable object is used. It implements all of the VariableEnhanced methods shown above, along with the getCoordinateSystems() method that returns a list of all of the coordinate systems used for the variable. Note that this list may be empty unless a ucar.nc2.dataset.conv class (or equivalent) has been able to identify the coordinate systems.

 

public class VariableDS extends Variable implements VariableEnhanced {

  ...

  public List getCoordinateSystems();

}

 

To get the VariableDS objects out of a NetcdfDataset, you must cast, e.g.:

 

  for (Iterator iter = ncd.getVariables().iterator(); iter.hasNext(); ) {
    VariableDS varDS = (VariableDS)  iter.next();
    ...
  }

 

A CoordinateAxis extends VariableDS, so it is also a Variable, one that is used as part of a Coordinate System. The getAxisType() method optionally indicates the type of the coordinate axis, which can be Lat, Lon, Height, Pressure, Time, GeoX, GeoY, or GeoZ. The getPositive() method returns the String POSITIVE_DOWN or POSITIVE_UP, and is used only for vertical axes. If POSITIVE_UP, then increasing values of that coordinate go vertically up.

 

public class CoordinateAxis extends VariableDS {

  ...

  public AxisType getAxisType(); // null if not special type

  public String getUnitsString(); // coordinate units

  public String getDescription(); // description

  public String getPositive(); // up or down

 

  public boolean isNumeric();     // numeric or String valued

  public boolean isContiguous(); // contiguous vs disjoint edges

  public double getMinValue();  // minimum coordinate value

  public double getMaxValue();  // maximum coordinate value

}

 

A CoordinateAxis1D is a one-dimensional CoordinateAxis. This is the common case that implies a one-to-one correspondence between a dimensional index and a coordinate value. If isContiguous() is true, you can use the double[len+1] edge array from getCoordEdges(), where value[i] is contained in the interval [edge[i], edge[i+1]]; otherwise you must use the more general getCoordEdges(int i), which returns the 2 edges for the ith coordinate. If isRegular() is true, then value[i] = getStart() + i * getIncrement().
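
The relationship between coordinate values, contiguous edges, and regular spacing can be sketched as follows. This is plain Java, not the CoordinateAxis1D implementation: the class Axis1DSketch is hypothetical, and placing interior edges at the midpoints between neighboring values is an assumption made for illustration, since the manual does not state how the library derives edges.

```java
// Hypothetical illustration; not part of the ucar.nc2 API.
class Axis1DSketch {

  // For a regular axis, value[i] = start + i * increment.
  static double regularValue(double start, double increment, int i) {
    return start + i * increment;
  }

  // Build len+1 contiguous edges from len coordinate values.
  // Assumption: interior edges are midpoints between neighboring values and
  // the two outer edges are extrapolated by half a cell.
  static double[] contiguousEdges(double[] values) {
    int n = values.length;
    if (n < 2) throw new IllegalArgumentException("need at least two coordinate values");
    double[] edges = new double[n + 1];
    for (int i = 1; i < n; i++) {
      edges[i] = (values[i - 1] + values[i]) / 2;
    }
    edges[0] = values[0] - (edges[1] - values[0]);
    edges[n] = values[n - 1] + (values[n - 1] - edges[n - 1]);
    return edges;
  }

  // Return the first index i such that value lies in [edges[i], edges[i+1]],
  // or -1 if there is none. Assumes the edges are increasing.
  static int findElement(double[] edges, double value) {
    for (int i = 0; i < edges.length - 1; i++) {
      if (value >= edges[i] && value <= edges[i + 1]) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    double[] edges = contiguousEdges(new double[] {0, 10, 20});
    System.out.println(java.util.Arrays.toString(edges));
    System.out.println(findElement(edges, 12));
  }
}
```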

 

public class CoordinateAxis1D extends CoordinateAxis {

  public String getCoordName(int index); // String name

  public double getCoordValue(int index);       // value from index

  public int findCoordElement(double value);  // index from value

 

  public double[] getCoordValues(); // double array length len

  public double[] getCoordEdges(); // double array length len+1

  public double   getCoordEdge(int index);      // edge value from index

  public double[] getCoordEdges(int i); // edges for ith coord

 

  public boolean isRegular(); // if evenly spaced

  public double getStart();   //  value = start + i * increment

  public double getIncrement();

}

 

A CoordinateAxis2D is a two-dimensional CoordinateAxis, for example float lat(i,j) and float lon(i,j). Currently it just has convenience routines for fetching the coordinate values.

 

public class CoordinateAxis2D extends CoordinateAxis {

    ...

    public double getCoordValue(int i, int j); // get i, j coordinate

    public double[] getCoordValues(); // get coordinates as 1D array

}

 

A CoordinateSystem has a list of coordinate axes and an optional list of coordinate transforms, along with various convenience routines for extracting georeferencing information.

 

public class CoordinateSystem {

  ...

  public List getCoordinateAxes();        // list of CoordinateAxis

  public List getCoordinateTransforms();  // list of CoordinateTransform

  public boolean isProductSet();           // all axes CoordinateAxis1D

 

  public boolean isGeoReferencing();

  public boolean isGeoXY();

  public boolean isLatLon();

  public boolean hasVerticalAxis();

  public boolean hasTimeAxis();

  public ucar.unidata.geoloc.ProjectionImpl getProjection();

 

  public CoordinateAxis getXaxis(); // look for AxisType

  public CoordinateAxis getYaxis();

  public CoordinateAxis getZaxis();

  public CoordinateAxis getTaxis();

  public CoordinateAxis getLatAxis();

  public CoordinateAxis getLonAxis();

  public CoordinateAxis getHeightAxis();

  public CoordinateAxis getPressureAxis();

}                   

A CoordinateTransform represents a transformation from the containing CoordinateSystem to a “well known” reference system. It has a name, a naming authority, a set of name/value parameters used by the transformation, and an optional TransformType.  A transform type of Projection indicates that it transforms from GeoX, GeoY coordinates to Lat, Lon coordinates.

 

public class CoordinateTransform {

  public String getName();  

  public String getAuthority();  

  public List getParameters();   // list of Attribute objects

  public Attribute findParameterIgnoreCase(String name);

  public TransformType getTransformType();

}

 

4 Scientific Data Types

4.1 GeoGrids

 

Note: these classes should be considered experimental and may be refactored in the next release.

 

The ucar.nc2.dataset objects handle coordinate systems in a general way. The classes in the ucar.nc2.dataset.grid package are specialized for georeferencing coordinate systems in which the x, y, z, and time axes are explicitly recognized.

 

In order for it to be georeferencing, a coordinate system must have a lat/lon coordinate axis or a geoX/geoY coordinate axis and a projection that transforms it to lat/lon.  It may optionally have vertical and time axes. Currently, vertical and time axes, if they exist, must be one-dimensional, and x/y or lat/lon axes must be 1 or 2 dimensional. Variables that have georeferencing coordinate systems are made into GeoGrids.

 

A GridDataset is the collection of GeoGrids found in a netCDF Dataset. You can wrap an already opened netCDF Dataset into a GridDataset, or you can use the static factory method to open a netCDF file, read its Conventions and extract the GeoGrids all at once:

 

  GridDataset gds = ucar.nc2.dataset.grid.GridDataset.factory(uriString);

 

is equivalent to:

 

  String uriString = ; // file:, dods:, http:, or local filename

  NetcdfFile ncfile = ucar.nc2.dataset.NetcdfDataset.factory( uriString, null);

  NetcdfDataset ds = ucar.nc2.dataset.conv.Convention.factory( ncfile);

  GridDataset gds = new GridDataset ( ds);

 

 

GridDataset has the following public interface:

 

public class GridDataset {

 public static GridDataset factory(String uriString);

 public GridDataset(NetcdfDataset dset); // wrap a dataset

 public void close();

 

 public String getName();

 public List getGrids(); // list of GeoGrids

 public Collection getGridSets(); // sorted by coordinate system

 public GeoGrid findGridByName(String name);

 

 public NetcdfDataset getNetcdfDataset(); // underlying dataset

}

 

A GridCoordSys wraps a georeferencing coordinate system. It always has 1D or 2D XHoriz and YHoriz axes, and optionally 1D vertical and time axes. The XHoriz/YHoriz axes will be lat/lon if isLatLon() is true, otherwise they will be GeoX,GeoY with an appropriate Projection. The getBoundingBox() method returns a bounding box from the XHoriz/YHoriz corner points. The getLatLonBoundingBox() method returns the smallest lat/lon bounding box that contains getBoundingBox().

 

If there is a vertical axis, isZPositive() is true if increasing values of the vertical axis should be displayed “up”. The getLevels() and getTimes() methods return the levels and times, if they exist, as lists of NamedObjects that are convenient for display. If there is a time axis, and it can be converted (via ucar.units) into a Date, isDate() is true, and getTimeDates() returns the time coordinates as Date objects. The findTimeCoordElement() method does a reverse lookup, finding the time index that corresponds to a given Date.

 

public class GridCoordSys extends CoordinateSystem {

 public GridCoordSys(CoordinateSystem sys);

 public CoordinateAxis getXHorizAxis(); // GeoX or Lon

 public CoordinateAxis getYHorizAxis(); // GeoY or Lat

 public CoordinateAxis1D getVerticalAxis(); // Height,Pressure,or GeoZ

 public CoordinateAxis1D getTimeAxis();

 

 public ArrayList getLevels(); // list of ucar.nc2.util.NamedObject

 public ArrayList getTimes();  // list of ucar.nc2.util.NamedObject

 public String getLevelName(int idx);

 public String getTimeName(int idx);

 

 public ucar.unidata.geoloc.ProjectionImpl getProjection();

 public ProjectionRect getBoundingBox();

 public LatLonRect getLatLonBoundingBox();

 public boolean isLatLon();

 public boolean isZPositive(); // increasing means ‘up’

 

 public boolean isDate(); // has Date time axes

 public java.util.Date[] getTimeDates(); // get time coords as Dates

 public int findTimeCoordElement(java.util.Date d);

}

 

interface ucar.nc2.util.NamedObject{

  public String getName();

  public String getDescription();

}

 

A GeoGrid wraps a VariableDS in a VariableStandardized, and also has a GridCoordSys. You can think of it as a specialized Variable that explicitly handles X,Y,Z,T dimensions, which are put into canonical order: (t, z, y, x). It has various convenience routines that expose methods from the GridCoordSys and VariableDS objects.

 

public class GeoGrid implements ucar.nc2.util.NamedObject {

 public String getName();

 public GridCoordSys getCoordinateSystem();

 

 public int getRank();

 public List getDimensions();

 public Dimension getDimension(int idx);

 public Dimension getTimeDimension();

 public Dimension getZDimension();

 public Dimension getYDimension();

 public Dimension getXDimension();

 public int getTimeDimensionIndex();

 public int getZDimensionIndex();

 public int getYDimensionIndex();

 public int getXDimensionIndex();

 

 // from VariableEnhanced

 public String getDescription();

 public String getUnitString();

 public boolean hasMissingData();

 public boolean isMissingData(double val);

 public float[] setMissingToNaN(float[] vals);

 public ucar.nc2.Attribute findAttributeIgnoreCase(String attName);

   

 // from GridCoordSys

 public ucar.unidata.geoloc.ProjectionImpl getProjection();

 public java.util.ArrayList getLevels();

 public java.util.ArrayList getTimes();

   

 public ucar.ma2.MAMath.MinMax getMinMaxSkipMissingData(ucar.ma2.Array data);

 

 // read data

 public ucar.ma2.Array readVolumeData(int t);       // z,y,x volume

 public ucar.ma2.Array readYXData (int t,int z);  // y,x slice

 public ucar.ma2.Array readZYData (int t,int x);  // z,y slice

 public ucar.ma2.Array getDataSlice(int t,int z,int y,int x); // any

}

 

4.2 PointData

 

coming soon


5. NetCDF Markup Language (NcML)

 

The NetCDF Markup Language (NcML) is an XML representation of the metadata in a netCDF file. It is described formally by a schema model expressed in XML Schema, see http://www.unidata.ucar.edu/schemas/netcdf.xsd.

See http://www.unidata.ucar.edu/packages/netcdf/ncml/index.html for more information.

 

Here is an example netCDF file in CDL:

 

netcdf example.nc {
 dimensions:
   lat = 3;   // (has coord.var)
   lon = 4;   // (has coord.var)

 variables:
   int rh(lat, lon);
    :long_name = "relative humidity";
    :units = "percent";
   float lat(lat);
    :units = "degrees_north";
   float lon(lon);
    :units = "degrees_east";

 // Global Attributes:
    :title = "Example Data";
}

 

which has the following NcML representation:

 

<?xml version="1.0" encoding="UTF-8"?>
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.ucar.edu/schemas/netcdf
 http://www.unidata.ucar.edu/schemas/netcdf-cs.xsd"
 uri="example.nc">
  <nc:dimension name="lat" length="3" />
  <nc:dimension name="lon" length="4" />
  <nc:attribute name="title" type="string" value="Example Data" />
  <nc:variable name="rh" shape="lat lon" type="int">
    <nc:attribute name="long_name" type="string" value="relative humidity" />
    <nc:attribute name="units" type="string" value="percent" />
  </nc:variable>
  <nc:variable name="lat" shape="lat" type="float">
    <nc:attribute name="units" type="string" value="degrees_north" />
  </nc:variable>
  <nc:variable name="lon" shape="lon" type="float">
    <nc:attribute name="units" type="string" value="degrees_east" />
  </nc:variable>
</nc:netcdf>

The NcML base schema simply reflects the netCDF data model:

 

 

5.1 NcML Coordinate Systems

 

NcML Coordinate Systems is an extension of the NcML base schema. It significantly extends the netCDF data model in order to capture the semantics of general coordinate systems, and georeferencing coordinate systems used in the earth sciences. The important additions to the base schema are:

 

 

See http://www.unidata.ucar.edu/schemas/netcdf-cs.xsd  for the schema.

See http://www.unidata.ucar.edu/packages/netcdf/ncml/AnnotatedNetcdfCS.html for more information.

 

5.2 NcML Dataset

 

NcML Dataset is an extension of the NcML base schema, which defines the public metadata of a netCDF Dataset. A netCDF Dataset is a generalization of a NetCDF file. Its purpose is to allow

 

See http://www.unidata.ucar.edu/schemas/netcdf-cs.xsd  for the schema.

See http://www.unidata.ucar.edu/packages/netcdf/ncml/NetcdfDataset.html for more information.


6. Multidimensional Arrays       

 

The ucar.ma2 package implements multidimensional arrays of arbitrary rank and element type. Actual data storage is done with Java 1D arrays and stride index calculations. This makes our Arrays rectangular, i.e. these cannot be "ragged arrays" where different elements can have different lengths as in Java multidimensional arrays, which are arrays of arrays.
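
The stride index calculation mentioned above can be illustrated with a short sketch. It shows how a multidimensional index maps to a position in a 1D Java array using row-major strides; the class StrideIndexSketch is hypothetical and is not the ucar.ma2 implementation.

```java
// Hypothetical illustration; not the ucar.ma2 implementation.
class StrideIndexSketch {

  // Row-major strides: the rightmost dimension has stride 1, and each
  // dimension to the left has the product of the lengths to its right.
  static int[] rowMajorStrides(int[] shape) {
    int[] strides = new int[shape.length];
    int product = 1;
    for (int d = shape.length - 1; d >= 0; d--) {
      strides[d] = product;
      product *= shape[d];
    }
    return strides;
  }

  // Position of a multidimensional index in the 1D backing array.
  static int elementOffset(int[] strides, int[] index) {
    int offset = 0;
    for (int d = 0; d < index.length; d++) {
      offset += strides[d] * index[d];
    }
    return offset;
  }

  public static void main(String[] args) {
    int[] strides = rowMajorStrides(new int[] {2, 3, 4});
    System.out.println(java.util.Arrays.toString(strides));
    System.out.println(elementOffset(strides, new int[] {1, 2, 3}));
  }
}
```

Because every element position is computed from one fixed shape, all "rows" necessarily have the same length, which is why ragged arrays cannot be represented this way.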

 

The ucar.ma2 package is independent of the ucar.nc2 package, and is intended for general multidimensional array use.  Its design is motivated by the needs for NetCDF data to be handled in a general, arbitrary rank, type independent way, and also by the requirements of the JavaGrande numeric working group [Caron2000].

 

It is often critically important for performance that the movement of data between memory and disk is carefully managed.  To obtain the data in a ucar.nc2.Variable object you must call read() to bring the data into memory in the form of an Array.  Any method that potentially makes an IO call will have an IOException in its signature. Note that none of the methods on Array do. The fact that a Variable can throw an IOException but an Array object cannot may in fact be a critical factor in how these objects are used [Waldo94]. 

 

The following is an overview of the important public interfaces of the ucar.ma2 classes. Consult the javadoc for more complete and recent details.

6.1 ucar.ma2.Array: in-memory multidimensional arrays

 

Array is the abstraction for multidimensional arrays with data stored in memory. Arrays can have arbitrary rank, and there are concrete implementations for arrays of rank 0-7 for efficiency. The underlying storage will be a 1 dimensional Java array of any of the Java primitive types (double, float, long, int, short, byte, char, boolean) or a 1 dimensional Java array of Objects, for any reference type. The data can be accessed in a type independent way, for example getDouble() can be called on an Array of any numeric type. The implementing class casts the data to the requested type (and throws a runtime ForbiddenConversionException if the cast is illegal), or uses a direct assignment when the requested type is the same as the data type.

 

The data type, rank, and shape of an array are immutable, while the data values themselves are mutable. Generally this makes Arrays thread-safe, and no synchronization is done in the Array package. (There is the possibility of non-atomic read/writes on 64 bit primitives (long, double). In this case the user should add their own synchronization if needed. Presumably 64-bit CPUs will make those operations atomic also.)

 

public abstract class Array {

 

    // array shape and type

  public long   getSize();              // total # elements

  public int    getRank();              // array rank

  public int[]  getShape();             // array dimension sizes    

  public Class  getElementType();       // data type of backing array

 

    // accessor helpers

  public Index         getIndex();          // random access 

  public IndexIterator getIndexIterator();  // sequential access

  public IndexIterator getRangeIterator(Range[] ranges); // access subset

  public IndexIterator getIndexIteratorFast();  // arbitrary order

 

    // accessors: for each data type (double, float, long, int, short,     

    // byte, char, boolean) there are methods of the form eg:

  public double getDouble(Index ima);

  public void   setDouble(Index ima, double value);

  ...

 

    // create new Array, no data copy

  public Array flip( int dim);           // invert dimension 

  public Array permute( int[] dims);     // permute dimensions

  public Array reduce();          // rank reduction for any dims of length 1

  public Array reduce(int dim);   // rank reduction for specific dimension

  public Array section( Range[] ranges); // create logical subset

  public Array sectionNoReduce( Range[] ranges); // no rank reduction

  public Array slice(int dim, int val);         // rank-1 subset

  public Array transpose( int dim1, int dim2);  // transpose dimensions

 

    // create new Array, with data copy

  public Array copy();                  

  public Array sectionCopy( Range[] ranges);    // subset

  public Array reshape( int [] shape);   // total # elements must be the same

 

    // conversion to Java arrays

  public java.lang.Object copyTo1DJavaArray();

  public java.lang.Object copyToNDJavaArray();

}

 

The getShape() method returns an integer array containing the length of the Array in each dimension. The getRank() method returns the number of dimensions, and getSize() returns the total number of elements in the Array. The getElementType() method returns the data type of the backing store, e.g. double.class, float.class, etc.

 

Data element access is described in the sections following this one.

 

Logical "views" of the array are created in several ways. The section() method creates a subarray of the original array. The slice() method is a convenience routine for the common section() operation of rank-1 section of the array. The  transpose() method transposes two dimensions, while permute() is a general permutation of the indices. The flip() method flips the index of the specified dimension so that it logically runs from n-1 to 0, instead of from 0 to n-1. The reduce() method allows user control over rank-reduction.  All of these logically reorder or subset the data without copying.

 

Methods that create new Arrays by copying the data are copy(), sectionCopy() and reshape().

 

The data can be copied into a Java array using the copyTo1DJavaArray() and copyToNDJavaArray() methods. In the first case, a 1D Java array of the appropriate primitive type is created and the data is copied to it in logical order (rightmost indices varying fastest). In the second case, an N-dimensional Java array is created that matches the Array shape, and the data is copied into it. The user must cast the returned Object to the appropriate Java array type.
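
The need to cast the returned Object can be illustrated with a stand-alone sketch for the rank-2 case. The class CopyToJavaArraySketch and its method are hypothetical; they only mimic the idea of copying row-major storage into a nested Java array and returning it as an Object.

```java
// Hypothetical illustration; not the ucar.ma2 implementation.
class CopyToJavaArraySketch {

  // Copy row-major storage of shape (rows, cols) into a nested Java array.
  // The result is returned as Object, so the caller must cast it.
  static Object copyTo2DJavaArray(double[] storage, int rows, int cols) {
    double[][] result = new double[rows][cols];
    for (int i = 0; i < rows; i++) {
      System.arraycopy(storage, i * cols, result[i], 0, cols);
    }
    return result;
  }

  public static void main(String[] args) {
    double[] storage = {1, 2, 3, 4, 5, 6};
    // the caller knows the element type and rank, and casts accordingly
    double[][] a = (double[][]) copyTo2DJavaArray(storage, 2, 3);
    System.out.println(java.util.Arrays.deepToString(a));
  }
}
```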

6.2 ucar.ma2.Index

 

Accesses to specific array elements are made using an Index object, for example:

 

   // A is a rank-3 ucar.ma2.Array
   double sum = 0.0;
   Index index = A.getIndex();
   int[] shape = A.getShape();
   for (int i = 0; i < shape[0]; i++)
     for (int j = 0; j < shape[1]; j++)
       for (int k = 0; k < shape[2]; k++)
         sum += A.getDouble(index.set(i, j, k));

 

Note that in this example, A can be of any type convertible to a double.  Index has various convenience methods for setting the element index:

 

public class Index {

    // general

  public Index set(int [] index);

  public void setDim(int dim, int value);

 

    // convenience methods for rank 0-7

  public Index set(int v0);            // set index 0

  public Index set(int v0, int v1);          // set index 0,1

  public Index set(int v0, int v1, int v2); // set index 0,1,2

  ...                                        // ..up to dimension 7

 

  public Index set0(int v);      // set index 0

  public Index set1(int v);      // set index 1

  public Index set2(int v);      // set index 2

  ...                             // ..up to dimension 7

}

 

Because an Index object stores state, threads that share an Array object must obtain their own Index from the Array.

 

6.3 ucar.ma2.IndexIterator

 

An IndexIterator is used to sequentially traverse all data in an Array in logical (row-major) order. For example, logical order for A(i,j,k) has k varying fastest, then j, then i. Note that because of the possibility that A is a flipped or permuted view, logical order may not be the same as physical order. Example:

double sum = 0.0;

IndexIterator iter = A.getIndexIterator();

while (iter.hasNext())

        sum += iter.getDoubleNext();

 

Note that in the above example A can be of arbitrary rank.

 

public interface IndexIterator  {

  public boolean hasNext();

 

    // for each data type

  public double getDoubleNext();

  public double getDoubleCurrent();

  public void setDoubleNext(double val);

  public void setDoubleCurrent(double val);

  ...

}

There are two special kinds of iterators: Array.getIndexIteratorFast() returns an Iterator that iterates over the array in an arbitrary order. It can be used to make iteration as fast as possible when the order of the returned elements is immaterial, for example in the summing example above. Array.getRangeIterator() returns a "range" iterator that iterates over a subset of an array, in logical order. This is an alternative (and equivalent) to first creating an array section, and then obtaining an Iterator. In the following example, the sum is made only over the first 10 rows, and all columns, of the array:

 

  int sum = 0;

  IndexIterator iter = A.getRangeIterator(new Range[] {new Range(0,9), null});

  while (iter.hasNext())

    sum += iter.getIntNext();

 

6.4 Type and rank specific Arrays

 

For each data type, there is a concrete class that extends Array, e.g., ArrayDouble, ArrayFloat, ArrayByte, etc. ArrayObject is used for all reference types. For each of these, there is a concrete subclass for each rank 0-7, for example ArrayDouble.D3 is a concrete class specialized for double arrays of rank 3. These rank-specific classes are static inner classes of their superclass. This design allows handling arrays completely generally (through the Array class), in a rank-independent way (through the Array<Type> classes), or in a rank and type specific way for ranks 0-7 (through the Array<Type>.D<rank> classes).

 

The most general way to create an Array is to use the static factory method in Array:

 

  public abstract class Array {

    static public Array factory( Class type, int [] shape);

    ...

  }

Array a = Array.factory( double.class, new int[] {128} );

 

This creates a 1D double array analogous to new double[128].

 

The type-specific subclasses can be instantiated directly with an arbitrary rank. These also add type-specific get/set accessors, for example:

 

  public class ArrayDouble extends Array {

      // constructor

    public ArrayDouble(int [] dimensions);

 

       // type-specific accessors

    public double get(Index i);

    public void set(Index i, double value);

       ...

  }

ArrayDouble a = new ArrayDouble ( new int[] {128, 64} );

 

This creates a 2D double array analogous to new double[128][64].

 

If you create your own Array objects, you should usually use the rank and type specific subclasses, which will provide the most efficient access. These classes also add rank-specific get/set routines, for example:

 

  public static class D3 extends ArrayDouble {

      // constructor

    public D3 (int len0, int len1, int len2);

 

      // type and rank specific accessors

    public double get(int i, int j, int k);

    public void set(int i, int j, int k, double value);

}

ArrayDouble a = new ArrayDouble.D3 ( 128, 64, 32);

 

This creates a 3D double array analogous to new double[128][64][32].

 

 

You may also create an Array from an N-dimensional Java array:

 

   public static Array factory(java.lang.Object javaArray);

 

   Array a = Array.factory( new short[] {128, 123, 43} );

 

This creates a 1D short array of length 3, with the given values.

 

Note that in this case, the data elements are copied out of the Java array into the private Array storage.

 

IndexImpl is a concrete, general-rank implementation of Index, and is extended by rank-specific subclasses for efficiency (we have rank 0-7 implementations).  Array and its subclasses hold an IndexImpl of the appropriate rank, to which all rank-specific functions are delegated.  This orthogonal design keeps the number of classes small, and makes adding new ranks or data types quite simple.

 

The "logical view" operations (flip, section, transpose, slice and permute) are implemented by manipulating the index calculation within the IndexImpl object.  These operations are affine, as is the operation that transforms the n-dim index into the 1-dim element index, and therefore any composition is an affine transformation.  The resulting transformation is immutable, and can be computed during the IndexImpl object construction. Therefore there is no extra cost associated with the index calculation for these operations (or any composition of them) during element access. These operations do logical data reordering; physical reordering can be done by making an array copy.
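The affine composition described above can be sketched in plain Java. This is a minimal illustration, not the ucar.ma2 implementation: the class AffineIndexSketch and its methods are hypothetical names, and it assumes row-major storage with one stride per dimension plus a constant offset. Each view operation only produces a new (shape, stride, offset) triple, so element access costs the same for any composition of views.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an index object; NOT the ucar.ma2 IndexImpl class. */
final class AffineIndexSketch {
    final int[] shape;   // logical length of each dimension
    final int[] stride;  // change in element index per unit step in each dimension
    final int offset;    // element index of the logical origin (0, 0, ..., 0)

    private AffineIndexSketch(int[] shape, int[] stride, int offset) {
        this.shape = shape;
        this.stride = stride;
        this.offset = offset;
    }

    /** Row-major (C order) index for freshly allocated storage. */
    static AffineIndexSketch rowMajor(int... shape) {
        int[] stride = new int[shape.length];
        int s = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            stride[d] = s;
            s *= shape[d];
        }
        return new AffineIndexSketch(shape.clone(), stride, 0);
    }

    /** Affine map from the n-dim index to the 1-dim element index. */
    int element(int... index) {
        int e = offset;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("dimension " + d);
            }
            e += index[d] * stride[d];
        }
        return e;
    }

    /** Reverse the direction of dimension d. */
    AffineIndexSketch flip(int d) {
        int[] st = stride.clone();
        st[d] = -stride[d];
        return new AffineIndexSketch(shape.clone(), st, offset + (shape[d] - 1) * stride[d]);
    }

    /** Swap dimensions a and b. */
    AffineIndexSketch transpose(int a, int b) {
        int[] sh = shape.clone();
        int[] st = stride.clone();
        sh[a] = shape[b];
        sh[b] = shape[a];
        st[a] = stride[b];
        st[b] = stride[a];
        return new AffineIndexSketch(sh, st, offset);
    }

    /** Keep 'length' elements of dimension d, starting at 'origin'. */
    AffineIndexSketch section(int d, int origin, int length) {
        if (origin < 0 || length < 1 || origin + length > shape[d]) {
            throw new IllegalArgumentException("section out of range");
        }
        int[] sh = shape.clone();
        sh[d] = length;
        return new AffineIndexSketch(sh, stride.clone(), offset + origin * stride[d]);
    }

    public static void main(String[] args) {
        AffineIndexSketch base = rowMajor(3, 4);  // backing storage holds 12 elements
        AffineIndexSketch view = base.transpose(0, 1).flip(0).section(1, 1, 2);
        System.out.println(Arrays.toString(view.shape) + " " + view.element(0, 0)); // prints "[4, 2] 7"
    }
}
```

The composed view is computed once, when the view object is built; element() does the same multiply-and-add work for the base index and for the view.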

 

An IndexIterator traverses array elements in logical order, which we have defined as row-major (as in C).  An iterator can in principle be more efficient than other element accesses because 1) the index values cannot be out of range, and therefore do not need to be bounds checked, and 2) the element calculation usually changes by a fixed stride each time.  We take advantage of these facts in our implementation, as well as package-private accessor methods, to reduce the number of method calls per data access from 3 to 2.
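As a rough illustration of the two points above, the following self-contained sketch walks a strided view in row-major logical order with an odometer-style counter. The class RowMajorIteratorSketch is a hypothetical name, not part of ucar.ma2, and the real IndexIterator may be organized differently. Note that next() needs no bounds check and usually just adds one stride to the current element index.

```java
import java.util.NoSuchElementException;

/** Hypothetical sketch of row-major iteration over a strided view; NOT the ucar.ma2 IndexIterator. */
final class RowMajorIteratorSketch {
    private final int[] shape;
    private final int[] stride;
    private final int[] counter;   // current logical index, one slot per dimension
    private int element;           // element index of the next value to return
    private long remaining;        // number of values not yet returned

    RowMajorIteratorSketch(int[] shape, int[] stride, int offset) {
        this.shape = shape.clone();
        this.stride = stride.clone();
        this.counter = new int[shape.length];
        this.element = offset;
        long n = 1;
        for (int len : shape) n *= len;
        this.remaining = n;
    }

    boolean hasNext() {
        return remaining > 0;
    }

    /** Returns the 1-dim element index of the next value in logical order. */
    int next() {
        if (remaining <= 0) throw new NoSuchElementException();
        int result = element;
        remaining--;
        // Advance the odometer: the last dimension varies fastest.
        for (int d = shape.length - 1; d >= 0; d--) {
            element += stride[d];
            if (++counter[d] < shape[d]) break;   // common case: one fixed-stride step
            element -= counter[d] * stride[d];    // roll over: undo this dimension's steps
            counter[d] = 0;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 2 x 3 row-major array seen through a transpose: shape [3, 2], strides [1, 3].
        RowMajorIteratorSketch it =
            new RowMajorIteratorSketch(new int[] {3, 2}, new int[] {1, 3}, 0);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) sb.append(it.next()).append(' ');
        System.out.println(sb.toString().trim());   // prints "0 3 1 4 2 5"
    }
}
```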

 


7 Implementation

 

The NetcdfFile.open methods expect a location that starts with http: for files served from an HTTP server, or a location that starts with file: or is a local file name, for files to be opened as local files. These files must be in one of the following formats:

 

The NetcdfDataset.openFile method opens:

·         anything that NetcdfFile.open does

 

The NetcdfDataset.open method calls NetcdfDataset.openFile, then wraps the NetcdfFile object in a NetcdfDataset if it is not already one. It will optionally identify the coordinate systems if it can. If it is a THREDDS dataset, it will add THREDDS information into the NetcdfDataset.

 

Each scientific data type has a factory method which uses the NetcdfDataset.open method to get a NetcdfDataset, then wraps that in the specialized object:

 

7.1 Remote access to netCDF files through an HTTP server

 

NetCDF-3 files can be made accessible over the network by simply placing them on an HTTP (web) server, like Apache. The server must be configured to set the "Content-Length" and "Accept-Ranges: bytes" headers.

 

The client that wants to read these files just uses the usual NetcdfFile.open(String location, …) method to open a file. The location contains the URL of the file, for example: http://www.unidata.ucar.edu/staff/caron/test/mydata.nc.  In order to use this option you need to have the HttpClient.jar file in your classpath.

 

The ucar.nc2 library uses the HTTP 1.1 "Range" request header to fetch ranges of bytes from the remote file [HTTP]. The efficiency of remote access depends on how the data is accessed. Reading large contiguous regions of the file generally performs well, while skipping around the file and reading small amounts of data performs poorly. In many cases, reading data from a Variable gives good performance because a Variable's data is stored contiguously, and so can be read with a minimal number of server requests. A record Variable, however, is spread out across the file, so reading it can incur a separate request for each record index.  In that case you may do better copying the file to a local drive, or putting the file on a DODS server, which can subset the file more efficiently on the server.
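To make the difference in request counts concrete, here is a small sketch that builds the "Range" header values a client might send. It is only an illustration under stated assumptions: the class and method names are hypothetical, the byte offsets are made up, and the actual library may buffer or combine requests differently. It relies on the netCDF-3 layout in which the slabs of a record variable are separated by the full record size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of byte-range planning; not the ucar.nc2 implementation. */
final class RangeRequestSketch {

    /** Range header value for 'length' bytes starting at 'start' (the last byte is inclusive). */
    static String rangeHeader(long start, long length) {
        if (start < 0 || length < 1) throw new IllegalArgumentException("bad range");
        return "bytes=" + start + "-" + (start + length - 1);
    }

    /** A variable stored contiguously can be fetched with a single request. */
    static List<String> contiguousVariable(long dataStart, long totalBytes) {
        List<String> requests = new ArrayList<>();
        requests.add(rangeHeader(dataStart, totalBytes));
        return requests;
    }

    /**
     * A record variable has one slab per record; consecutive slabs are
     * 'recordSize' bytes apart, so each record needs its own range.
     */
    static List<String> recordVariable(long firstSlabStart, long recordSize,
                                       long slabBytes, int nRecords) {
        List<String> requests = new ArrayList<>();
        for (int r = 0; r < nRecords; r++) {
            requests.add(rangeHeader(firstSlabStart + r * recordSize, slabBytes));
        }
        return requests;
    }

    public static void main(String[] args) {
        System.out.println(contiguousVariable(1000, 4800));   // one request
        System.out.println(recordVariable(1000, 400, 40, 3)); // one request per record
    }
}
```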

 

7.2 Reading HDF5 Files

 

NetCDF-Java version 2.2 can read much of the information in files written through the HDF5 interface.

 

HDF5 data type          NetCDF data type          Restrictions

0 = Fixed-Point         byte, short, int, long    8, 16, 32, 64 bits only; signed only
1 = Floating-Point      float, double             32, 64 bit IEEE only
2 = Time                String                    Not implemented
3 = String              char
4 = Bitfield                                      Not supported
5 = Opaque                                        Not supported
6 = Compound            Structure
7 = Reference           Variable                  Read-only
8 = Enumeration                                   Not supported
9 = Variable-Length     String, Sequence
10 = Array              Variable                  No index permutations

Object Modification Date and Time Messages are put into a String-valued Attribute called "_LastModified" using Date format.

 

Object Comment Messages are put into an attribute called "_description".

 

Not implemented: external data storage, file drivers, slib compression (only deflate compression is supported), and the data types reference, bitfield (other than ??), enum, opaque, and time.

7.3. NetCDF – OpenDAP Interface

 

The ucar.nc2.dods package provides access to OpenDAP (a.k.a. DODS) datasets, using the OpenDAP protocol, a specialized version of the HTTP protocol for network access to scientific data. OpenDAP datasets can be accessed transparently by passing an OpenDAP dataset URL to the NetcdfFile constructor. See [OpenDAP] for details of the OpenDAP protocol. Also see Appendix B for more detailed examples.

 

As of version 2.2, the ucar.nc2 package uses the Common Data Model API to provide access to all OpenDAP data, so no specialized code needs to be written.  The implementing libraries are reasonably efficient in minimizing network latency for common data access patterns. However, an application may sometimes want to handle remote OpenDAP datasets differently from local netCDF files. To do so, you can cast the ucar.nc2 objects into their corresponding ucar.nc2.dods objects; for example, DODSNetcdfFile is a subclass of NetcdfFile.

 

Table 1 shows how OpenDAP primitive data types are mapped to netCDF primitive Java types. Note that the OpenDAP unsigned integer types are widened so that they can be represented correctly with signed Java primitive types.

 

OpenDAP primitive                     NetCDF primitive

DBoolean                              boolean
DByte                                 byte
DFloat32                              float
DFloat64                              double
DInt16                                short
DInt32                                int
DUInt16                               int
DUInt32                               long

Table 1.
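The widening of the unsigned types in Table 1 can be illustrated with ordinary Java bit masks. This is a sketch of the idea only; the class and method names are hypothetical and are not part of the ucar.nc2.dods package. The raw bits of an unsigned 16-bit value are held in a short and those of an unsigned 32-bit value in an int, and masking while widening recovers the non-negative value.

```java
/** Hypothetical helper showing why DUInt16 maps to int and DUInt32 maps to long. */
final class UnsignedWideningSketch {

    /** Interpret the 16 bits of 'raw' as an unsigned value in the range 0..65535. */
    static int widenUInt16(short raw) {
        return raw & 0xFFFF;
    }

    /** Interpret the 32 bits of 'raw' as an unsigned value in the range 0..4294967295. */
    static long widenUInt32(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        short raw16 = (short) 0xFFFF;  // reads as -1 when treated as a signed short
        int raw32 = 0xFFFFFFFF;        // reads as -1 when treated as a signed int
        System.out.println(widenUInt16(raw16));  // prints 65535
        System.out.println(widenUInt32(raw32));  // prints 4294967295
    }
}
```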

 

 

OpenDAP Type                     NetCDF Object

BaseType scalar                  DODSVariable (rank 0)
DArray of primitive              DODSVariable
DArray of structures             DODSStructureArray  (*)
DGrid                            DODSGrid (*)
DList                            ignored
DSequence                        DODSSequence  (**)
DString, DURL                    DODSVariable of type char[n]
DStructure                       DODSStructure  (*)

Table 2.

(*  extension: data also available from plain netCDF API)
(** extension: data not available in plain netCDF API)

 

 

 

OpenDAP Grids and Shared Dimensions

OpenDAP (version 2) does not have explicit shared Dimensions in its object model, except for Map arrays inside of Grids. Other than for Grids, unnamed OpenDAP dimensions are mapped to anonymous Dimension objects, while named dimensions are mapped to named Dimensions that are local to the Variable.

 

A Grid object is made into a Structure containing the grid array and map Variables. The maps are made into coordinate variables which share their Dimension with the array Variable. These shared Dimensions are added to the containing Group, which is the root Group since OpenDAP doesn't have groups. If more than one Grid has the same map name, then there is a potential conflict. First the coordinate arrays' values are checked for equality. If they are equal, then the Grids will share the Dimension (and thus the coordinate variable). If not, a new Dimension and coordinate variable are created, whose name is qualified by the Grid name.
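The conflict rule for map names can be sketched as follows. This is a hypothetical illustration, not the DODS layer's code: the class name, the use of double[] for coordinate values, and in particular the "gridName.mapName" form of the qualified name are assumptions, since the text above does not say how the name is qualified.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of sharing a Dimension between Grids whose maps have the same name. */
final class SharedMapDimensionSketch {
    // Coordinate values already registered, keyed by Dimension name.
    private final Map<String, double[]> coordinates = new HashMap<>();

    /** Returns the name of the Dimension (and coordinate variable) this map should use. */
    String resolve(String gridName, String mapName, double[] values) {
        double[] existing = coordinates.get(mapName);
        if (existing == null) {
            coordinates.put(mapName, values.clone());   // first Grid to use this map name
            return mapName;
        }
        if (Arrays.equals(existing, values)) {
            return mapName;                             // equal values: share the Dimension
        }
        // Values differ: create a new Dimension qualified by the Grid name.
        // The "." separator is an assumption made for this sketch.
        String qualified = gridName + "." + mapName;
        coordinates.put(qualified, values.clone());
        return qualified;
    }

    public static void main(String[] args) {
        SharedMapDimensionSketch dims = new SharedMapDimensionSketch();
        System.out.println(dims.resolve("temp", "lat", new double[] {10, 20, 30}));
        System.out.println(dims.resolve("salt", "lat", new double[] {10, 20, 30}));
        System.out.println(dims.resolve("wind", "lat", new double[] {15, 25}));
    }
}
```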

 

Mapping netCDF to OpenDAP on the server

The semantic mismatch between OpenDAP and netCDF has led to several possible conventions in mapping from netCDF to OpenDAP on the server, and from OpenDAP to netCDF on the client.  One design goal is that a netCDF file on the server should appear semantically equivalent when seen by a client through the netCDF API. Another goal is to make the common case of a rank 1 char array in netCDF map to an OpenDAP String.

 

Older versions of the netCDF - OpenDAP C++ library mapped char[n] arrays to a DArray of element type DString, where each DString has length 1. Currently, the following convention is the "correct" way for an OpenDAP server to represent netCDF char[n] arrays:

 

1.        A char[n] array maps to a DString (this is the common case).

2.        A rank k char[n, m, …, p, q] NetCDF array maps to a rank k-1 OpenDAP DArray[n, m, …, p] of element type DString, where each DString has length q. An attribute "strlen" is added to the variable, inside an attribute table called "DODS" (to distinguish it as an attribute added by the DODS layer). The strlen attribute means that all of the DString data elements have the same data length (in this example, length q).

 

Attributes {
    var1 {
        DODS {
            Int32 strlen 54;
        }
    }
}
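The char array convention above amounts to regrouping a flat character buffer into fixed-length strings and back. The sketch below shows this for the rank 2 case, char[n, strlen]. The class and method names are hypothetical, and the zero padding of short strings on the way back is an assumption made for the sketch.

```java
/** Hypothetical sketch of the char[n, strlen] <-> array-of-strings convention. */
final class CharArrayToStringsSketch {

    /** Row-major char data of shape [n, strlen] becomes n Strings of length strlen. */
    static String[] toStrings(char[] flat, int strlen) {
        if (strlen < 1 || flat.length % strlen != 0) {
            throw new IllegalArgumentException("data length is not a multiple of strlen");
        }
        String[] out = new String[flat.length / strlen];
        for (int i = 0; i < out.length; i++) {
            out[i] = new String(flat, i * strlen, strlen);
        }
        return out;
    }

    /** Inverse mapping; Strings shorter than strlen are zero padded (an assumption). */
    static char[] toChars(String[] strings, int strlen) {
        char[] flat = new char[strings.length * strlen];   // Java zero-fills new arrays
        for (int i = 0; i < strings.length; i++) {
            String s = strings[i];
            if (s.length() > strlen) throw new IllegalArgumentException("string longer than strlen");
            s.getChars(0, s.length(), flat, i * strlen);
        }
        return flat;
    }

    public static void main(String[] args) {
        String[] names = toStrings("abcdef".toCharArray(), 3);
        System.out.println(names[0] + " " + names[1]);   // prints "abc def"
    }
}
```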

Attribute Assignment.

 

  1. If the OpenDAP Attribute Table has the same name as a variable, sequence, or structure, the attributes of the table are added to the variable, sequence, or structure.
  2. If the OpenDAP Attribute Table name = NC_GLOBAL or HDF_GLOBAL, the attributes of the table are added to the global attributes.
  3. If the OpenDAP Attribute Table name = DODS_EXTRA and it has an attribute with key Unlimited_Dimension, then the attribute value names the unlimited dimension.
  4. All other OpenDAP Attribute Tables are added as global attributes, with the table name prepended (separated by ".") to the attribute name.
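The four rules can be read as a simple dispatch on the table name. The sketch below is a hypothetical illustration (the class, enum, and method names are not from the library), and it assumes the rules are tried in the order listed.

```java
import java.util.Set;

/** Hypothetical sketch of the attribute table assignment rules, tried in the listed order. */
final class AttributeTableRulesSketch {

    enum Target { VARIABLE, GLOBAL, UNLIMITED_DIMENSION, GLOBAL_WITH_PREFIX }

    /**
     * @param tableName       name of the OpenDAP Attribute Table
     * @param objectNames     names of the dataset's variables, sequences, and structures
     * @param hasUnlimitedKey true if the table holds an attribute with key Unlimited_Dimension
     */
    static Target classify(String tableName, Set<String> objectNames, boolean hasUnlimitedKey) {
        if (objectNames.contains(tableName)) {
            return Target.VARIABLE;                                  // rule 1
        }
        if (tableName.equals("NC_GLOBAL") || tableName.equals("HDF_GLOBAL")) {
            return Target.GLOBAL;                                    // rule 2
        }
        if (tableName.equals("DODS_EXTRA") && hasUnlimitedKey) {
            return Target.UNLIMITED_DIMENSION;                       // rule 3
        }
        return Target.GLOBAL_WITH_PREFIX;                            // rule 4
    }

    /** Rule 4: the global attribute name is the table name, a ".", then the attribute name. */
    static String prefixedName(String tableName, String attributeName) {
        return tableName + "." + attributeName;
    }

    public static void main(String[] args) {
        Set<String> names = new java.util.HashSet<>(java.util.Arrays.asList("var1"));
        System.out.println(classify("var1", names, false));
        System.out.println(classify("NC_GLOBAL", names, false));
        System.out.println(prefixedName("History", "created"));
    }
}
```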

 

 

NetCDF API Compatibility

 

Java netCDF-OpenDAP aims to make as much of an OpenDAP dataset as possible available through the standard netCDF API, so that programs can view a dataset as a NetcdfFile. The main issues are: 1) OpenDAP Strings, which are described in detail in section 2.1 under "String Data Type"; 2) nested data arrays in DODSStructure, DODSGrid and DODSStructureArray objects, which are “flattened” into global variables; 3) high latency to the server, which can degrade performance. All of these problems are reasonably dealt with, and can be explicitly controlled using this constructor:

 

public DODSNetcdfFile(String datasetURL, boolean flatten,   
  int stringArrayPreloadLimit, int stringArrayDefaultSize, int preloadLimit);

 

The flatten parameter controls whether Arrays inside of Structures are made into global variables. The stringArrayPreloadLimit value controls whether arrays of Strings are preloaded during construction. If the value is < 0, arrays are always preloaded; if it is 0, arrays are never preloaded; otherwise an array is preloaded if the size of the Array (the number of Strings) is less than stringArrayPreloadLimit. When an array of Strings is not preloaded, the string lengths are set to the value of stringArrayDefaultSize. When those arrays are read, longer Strings will be truncated and shorter Strings will be zero-padded.  When the OpenDAP dataset is opened, all variables whose size is less than preloadLimit are read and cached. When using the DODSNetcdfFile(String datasetURL) constructor, the defaults are flatten=true, stringArrayPreloadLimit=200, stringArrayDefaultSize=100, preloadLimit=100.
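The preload rule and the truncate-or-pad behaviour described above can be written down as two small functions. This is a sketch of the stated rules only: the class and method names are hypothetical, and nothing here calls or reproduces DODSNetcdfFile itself.

```java
/** Hypothetical sketch of the String array preload rule and fixed-size string fitting. */
final class StringPreloadSketch {

    /** Decide whether an array holding 'nStrings' Strings is preloaded at construction. */
    static boolean shouldPreload(long nStrings, int stringArrayPreloadLimit) {
        if (stringArrayPreloadLimit < 0) return true;    // always preload
        if (stringArrayPreloadLimit == 0) return false;  // never preload
        return nStrings < stringArrayPreloadLimit;       // preload only small arrays
    }

    /** Fit a String into exactly 'size' chars: truncate if longer, zero pad if shorter. */
    static char[] fit(String s, int size) {
        char[] out = new char[size];                     // Java zero-fills new arrays
        s.getChars(0, Math.min(s.length(), size), out, 0);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shouldPreload(150, 200));     // prints true
        System.out.println(shouldPreload(200, 200));     // prints false
        System.out.println(new String(fit("abcdef", 4))); // prints abcd
    }
}
```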

 

One issue that is not solved through the standard netCDF API is access to DODSSequence data variables. In this case, you must explicitly work with the DODSNetcdfFile object.

 

 

IOServiceProvider

 

The NetCDF API can be used to access other kinds of scientific data files. The general mechanism for this is to define a class that implements IOServiceProvider, and register it with the NetcdfFile class. When a file is opened through NetcdfFile.open(), it is first checked to see if it is a "native" (netcdf3, netcdf4, hdf5) file, then any registered IOServiceProviders are checked by calling isValidFile(). If this returns true, then a new instance of the class is used for all I/O to the file.

 

To register, call this static method in NetcdfFile:

 

   static public void registerIOProvider( Class iospClass);

 

The iospClass must have a no-argument constructor, and it must implement the IOServiceProvider  interface.
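The registration mechanism can be pictured with the following self-contained sketch. It does not use the real IOServiceProvider interface, which has more methods and a different isValidFile() signature. The Provider interface, the byte[] header argument, the registry class, and the "XYZ" file format are all hypothetical and exist only to show why a no-argument constructor is required.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a provider registry; NOT the NetcdfFile implementation. */
final class ProviderRegistrySketch {

    /** Much-reduced, hypothetical stand-in for the IOServiceProvider interface. */
    interface Provider {
        boolean isValidFile(byte[] header);
    }

    private static final List<Class<? extends Provider>> registered = new ArrayList<>();

    /** Instances are created reflectively, hence the no-argument constructor. */
    private static Provider newInstance(Class<? extends Provider> c) {
        try {
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(c.getName() + " needs a no-argument constructor", e);
        }
    }

    static void register(Class<? extends Provider> c) {
        newInstance(c);       // fail early if the class cannot be instantiated
        registered.add(c);
    }

    /** Ask each registered class in turn; return a fresh instance of the first match, or null. */
    static Provider open(byte[] header) {
        for (Class<? extends Provider> c : registered) {
            if (newInstance(c).isValidFile(header)) {
                return newInstance(c);   // a new instance handles all I/O for this file
            }
        }
        return null;
    }

    /** Example provider for a made-up format whose files start with the bytes "XYZ". */
    static final class XyzProvider implements Provider {
        public boolean isValidFile(byte[] h) {
            return h.length >= 3 && h[0] == 'X' && h[1] == 'Y' && h[2] == 'Z';
        }
    }

    public static void main(String[] args) {
        register(XyzProvider.class);
        System.out.println(open(new byte[] {'X', 'Y', 'Z', '1'}) != null);  // prints true
        System.out.println(open(new byte[] {'C', 'D', 'F', 1}) != null);    // prints false
    }
}
```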

Appendix A. Examples

Read data from a Variable

 

Variable v = ncFile.findVariable("Pressure");
Array a = v.read();             // does the actual I/O
Index ima = a.getIndex();
double p = a.getDouble(ima.set(0,0)); // get first value

// looping
double sum = 0.0;
for (int i=0; i<a.getShape(0); i++)
  for (int j=0; j<a.getShape(1); j++)
    sum += a.getDouble(ima.set(i, j));   // accumulate each element
double avg = sum/a.size();

// another way to loop
IndexIterator iter = a.getIndexIterator();
sum = 0.0;                               // reuse the variables declared above
while (iter.hasNext())
  sum += iter.getDoubleNext();
avg = sum/a.size();

// another way to get the sum
sum = MAMath.sumDouble( a);

 

If you know the type and rank, a convenient variant is:

 

Variable v = ncFile.findVariable("Pressure");

ArrayDouble.D2 a = (ArrayDouble.D2) v.read();

double p = a.get(0, 0);

 

// looping

for (int i=0; i<a.getShape(0); i++)

  for (int j=0; j<a.getShape(1); j++)

    p = a.get(i, j);

 

Read data sections from a Variable

 

Here we read the entire data for a Variable, but only one “slice” at a time, by looping over the first (outermost) dimension:

 

Variable v = ncFile.findVariable("Pressure");

int[] shape = v.getShape();      // get copy of shape array

int[] origin = new int[ v.getRank()]; // start with all zeroes

int outerDimSize = shape[0];           // outer Dimension size

 

for (int i=0; i<outerDimSize; i++) {
  origin[0] = i;   // fix first index
  shape[0] = 1;    // read one slice along the outer dimension
  try {
    Array a = v.read(origin, shape);    // read data from file
    a = a.reduce();               // reduce rank to n-1
    myProcessData( a);
  } catch( IOException ioe) {
    ...
  } catch( InvalidRangeException ire) {
    ...   // can't happen
  }
}

 

 

Read Variable data into a Java array

 

Variable v = ncFile.findVariable("ozone");

Array a = v.read();        // read entire data array

double[][][] ozone3Darray = (double[][][]) a.copyToNDJavaArray();

double[] ozone1Darray = (double[]) a.copyTo1DJavaArray();

 

Note that you need to know the type and rank in order to make the correct type casts.

 

Create a netCDF File

 

import ucar.ma2.*;

import ucar.nc2.*;

import java.io.IOException;

 

/**

 * Simple example to create a new netCDF file corresponding to the following

 * CDL:

 *  netcdf example {

 *  dimensions:

 *     lat = 3 ;

 *     lon = 4 ;

 *     time = UNLIMITED ;

 *  variables:

 *     int rh(time, lat, lon) ;

 *              rh:long_name="rela