NetCDF-Java (version 2.2) User’s Manual
John Caron, August 18, 2004
2.8 ucar.nc2.NetcdfFileWriteable
3. Semantic Layer: NetcdfDataset
5. NetCDF Markup Language (NcML)
6.1 ucar.ma2.Array: in-memory multidimensional arrays
6.4 Type and rank specific Arrays
7.1 Remote access to netCDF files through an HTTP server
7.3. NetCDF – OpenDAP Interface
Read data sections from a Variable
Read Variable data into a Java array
Print data from a netCDF dataset
Appendix B: Example OpenDAP to netCDF Conversion
Example: Scalars and Arrays of Primitives
This is user documentation for version 2.2 of the ucar.nc2, ucar.ma2 and related Java packages, also known as "NetCDF-Java version 2" and "MultiArray version 2" packages.
NetCDF-Java version 2.2 provides an Application Programmer Interface (API) for a scientific data model called the Common Data Model (CDM). The CDM is the result of merging the NetCDF (version 3), OpenDAP (version 2), and the HDF5 (version 1.6) data models. NetCDF and HDF5 define standard file formats, while OpenDAP is a network data access protocol. These handle the details of data access, and (for NetCDF and HDF5) file layout and data writing.
The Common Data Model has several layers, which build on top of each other to add successively richer semantics:
NetCDF-Java version 2.2 currently provides read/write access to NetCDF-3 files, read access to most HDF5 files, and read access to OpenDAP datasets. It will also provide read/write access to NetCDF-4 files as that file format becomes available. We are also working on read access to GRIB files, as well as integration with THREDDS dataset annotation.
All of these packages are freely available and the source code is released under the GNU Lesser General Public License [LGPL]. They require Java version 1.4 or above.
It is useful to understand the Common Data Model as an abstract data model (a.k.a. an object model) independent of its APIs, which are language dependent, or its file format, which is really an implementation detail. Here is the object model for the CDM data layer:
Fig 1. NetCDF-Java 2.2 Abstract Data Model in UML
A Dataset is a generalization of a netCDF file. It may be a netCDF file, an HDF5 file, an OpenDAP dataset, a collection of files, or anything else which can be accessed through the netCDF API.
A Variable is a container for data. It has a dataType, a set of Dimensions that define its array shape, and optionally a set of Attributes.
A Group is a logical collection of Variables. The Groups in a Dataset form a hierarchical tree, like directories on a disk. A Group has a name and optionally a set of Attributes. There is always at least one Group in a dataset, the root Group, whose name is the empty string.
A Dimension has an index length, and is used to define the array shape of a Variable. It may be shared among Variables, which provides a simple yet powerful way of associating Variables. When a Dimension is shared, it has a unique name within the Dataset. It may have a coordinate Variable, which gives each index a coordinate value.
An Attribute has a name and a value, used for associating arbitrary metadata with a Variable or a Group. The value can be a one dimensional array of Strings or numeric values.
A Structure is a type of Variable that contains other Variables, analogous to a struct in C. In general, a Structure's data are physically stored close together on disk, so that it is efficient to retrieve all of the data in a Structure at the same time.
A Sequence is a one dimensional Variable whose length is not known until you actually read the data. All other Variable types know what their array lengths are without having to read the data. You can have sequences of sequences, which is equivalent to ragged arrays.
An Array contains the actual data for a Variable, read from the disk or network, and stored in memory. You get an Array from a Variable by calling read() or its variants.
StructureData contains the actual data for a Structure, like an Array does for a Variable.
A String is a variable length array of UTF-8 encoded Unicode characters.
The primitive types are boolean, byte, char, short, int, long, float and double, same as in the Java language. Together with String and Structure, these correspond to a Variable's DataType. (LOOK: what about signed / unsigned?)
Groups, Variables, Dimensions and Attributes can be located by name. The full name of a Group or Variable includes the parent group names separated by a "/", as in file names in directories, e.g. "group1/group2/varName". (Note that the root group name is an empty name, rather than "/". This makes objects in the root group look like they are directly contained in the Dataset, for backwards compatibility). When a Variable is a member of a Structure, a "." is used to separate the structure names, e.g. "group1/group2/struct1.struct2.varName". These rules imply that a Variable's short name is unique within its containing Group, or within its containing Structure.
An Attribute's short name is unique within the Group or Variable it belongs to. Its full name uses an "@" as separator e.g. "group1/varName@attName".
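The naming rules above can be illustrated with a small, self-contained sketch. The class FullName and its fields are hypothetical helpers written for this manual, not part of the ucar.nc2 API, and the sketch assumes the name refers to a Variable (optionally followed by one of its Attributes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of ucar.nc2): splits a full name such as
// "group1/group2/struct1.struct2.varName@attName" into its parts.
class FullName {
    final List<String> groups = new ArrayList<>();     // parent Groups, outermost first
    final List<String> structures = new ArrayList<>(); // enclosing Structures, outermost first
    final String shortName;                            // short name of the Variable
    final String attribute;                            // Attribute short name, or null

    FullName(String fullName) {
        String path = fullName;
        String att = null;
        int at = path.indexOf('@');
        if (at >= 0) {                       // "@" separates the Attribute name
            att = path.substring(at + 1);
            path = path.substring(0, at);
        }
        attribute = att;

        int slash = path.lastIndexOf('/');
        if (slash >= 0) {                    // "/" separates nested Groups
            groups.addAll(Arrays.asList(path.substring(0, slash).split("/")));
            path = path.substring(slash + 1);
        }

        String[] members = path.split("\\."); // "." separates Structure members
        for (int i = 0; i < members.length - 1; i++) {
            structures.add(members[i]);
        }
        shortName = members[members.length - 1];
    }
}
```

A name with no "/" leaves the groups list empty, which matches the convention that the root group has an empty name.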
[ A Dimension's short name is unique within the Group it belongs to. Its full name uses the usual "/" as group separator e.g. "group1/group2/dimName". ]
-OR- (hard to do both)
[ Dimensions are scoped by the Group they belong to. When a Variable refers to a Dimension by name, the Dimension is looked for in the Variable's parent group, and if not found, in its parent, etc. ]
The character set for object names is restricted. A name must start with a letter or underscore (however, starting with an underscore is reserved for system defined objects). Each remaining character must be alphanumeric, a dash '-' or an underscore '_'. [what about escaping other characters? it does make life harder ]
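As a rough illustration of the name restriction, the rule can be written as a regular expression. The class ObjectNames is a hypothetical helper, not part of the library, and it assumes that "letter" and "alphanumeric" mean ASCII characters, which the text above does not state:

```java
import java.util.regex.Pattern;

// Hypothetical validator (not part of ucar.nc2) for the naming rule stated above.
class ObjectNames {
    // First character: ASCII letter or underscore.
    // Remaining characters: ASCII letters, digits, '-' or '_'.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_-]*");

    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    // Names starting with an underscore are reserved for system defined objects.
    static boolean isReserved(String name) {
        return isValid(name) && name.charAt(0) == '_';
    }
}
```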
Array sections can be specified with Fortran 90 array section syntax, using zero-based indexing. For example, varName(12:22,0:100:2,:,17) specifies an array section for a four dimensional variable. The first dimension includes all the elements from 12 to 22 inclusive, the second dimension includes the elements from 0 to 100 inclusive with a stride of 2, the third includes all the elements, and the fourth includes just the 18th element. For structures, you can specify nested selectors, e.g. record(12).wind(1:20,:,3) does a selection on the wind member variable on the record structure at index 12. If you don’t specify a section, it means read the entire variable, e.g. record.wind indicates all the wind variables in all the record structures.
Formally:
variableSection := selector | selector '.' selector
selector := varName ['(' sectionSpec ')']
varName := STRING
sectionSpec := dim | dim ',' sectionSpec
dim := ':' | slice | start ':' end | start ':' end ':' stride
slice := INTEGER
start := INTEGER
stride := INTEGER
end := INTEGER
where:
varName = valid variable name
':' = all indices
slice = one index, equal to the given value
start:end = all indices from start to end inclusive
start:end:stride = all indices from start to end inclusive with given stride
This notation is used in the NetcdfFile.read( String variableSection, boolean flatten), Variable.read( String sectionSpec), and NCdump methods.
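To make the sectionSpec part of the grammar concrete, here is a minimal sketch of a parser for it. The class SectionSpec is hypothetical and is not the library's implementation; it handles only the text between the parentheses, and it needs the variable's shape to resolve ':'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser (not the library's implementation) for a sectionSpec.
// Each dimension becomes {first, last, stride}: zero-based, "last" inclusive.
class SectionSpec {
    static List<int[]> parse(String sectionSpec, int[] shape) {
        String[] dims = sectionSpec.split(",");
        if (dims.length != shape.length) {
            throw new IllegalArgumentException(
                "expected " + shape.length + " dimensions, found " + dims.length);
        }
        List<int[]> ranges = new ArrayList<>();
        for (int i = 0; i < dims.length; i++) {
            String dim = dims[i].trim();
            int first;
            int last;
            int stride = 1;
            if (dim.equals(":")) {                 // ':' = all indices
                first = 0;
                last = shape[i] - 1;
            } else {
                String[] parts = dim.split(":");
                if (parts.length < 1 || parts.length > 3) {
                    throw new IllegalArgumentException("bad dimension spec: " + dim);
                }
                first = Integer.parseInt(parts[0].trim());
                // A single integer is a slice: first and last are the same index.
                last = (parts.length == 1) ? first : Integer.parseInt(parts[1].trim());
                if (parts.length == 3) {
                    stride = Integer.parseInt(parts[2].trim());
                }
            }
            if (first < 0 || last < first || last >= shape[i] || stride < 1) {
                throw new IllegalArgumentException("out of range: " + dim);
            }
            ranges.add(new int[] {first, last, stride});
        }
        return ranges;
    }
}
```

For example, SectionSpec.parse("12:22,0:100:2,:,17", new int[] {30, 101, 5, 20}) resolves the third dimension to the range 0 to 4. The shape used here is invented for the illustration.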
The following is an overview of the important public interfaces of the ucar.nc2 classes. Consult the javadoc for complete details.
A Dimension object specifies the length of an array dimension. If the Dimension is shared, then it has a name that is unique within its Group. Otherwise it is an anonymous dimension that is local to the Variable that uses it, and it doesn’t have a name. If the Dimension is unlimited, then the length can increase; otherwise, it is immutable. A sequence Variable can have a Dimension whose length is unknown: the length varies for each Variable that uses it, and can only be determined by actually reading the Variable's data.
public class Dimension {
public String getName();
public int getLength();
public boolean isUnlimited();
public boolean isUnknown();
public boolean isShared();
public Variable getCoordinateVariable(); // null if none
}
The method getCoordinateVariable() returns the associated coordinate variable or null if none exists. A coordinate variable is defined as a Variable with the same name as the Dimension, whose single dimension is the Dimension, for example: float lat(lat);
When a variable is displayed in NCdump or equivalent program, its shape is indicated by its dimensions, e.g. varName(time=12, lat=114, lon=288). When a variable uses non-shared (anonymous) dimensions, just the dimension lengths are shown, e.g. varName(21, 114, 288). When the variable is a sequence, so the length is unknown, a "*" is used, e.g. varName(*).
DataType is an enumeration of the possible data types that a Variable can take.
public class DataType {
public static final ucar.nc2.DataType BOOLEAN;
public static final ucar.nc2.DataType BYTE;
public static final ucar.nc2.DataType CHAR;
public static final ucar.nc2.DataType SHORT;
public static final ucar.nc2.DataType INT;
public static final ucar.nc2.DataType LONG;
public static final ucar.nc2.DataType FLOAT;
public static final ucar.nc2.DataType DOUBLE;
public static final ucar.nc2.DataType STRING;
public static final ucar.nc2.DataType STRUCTURE;
public static DataType getType(String name); // find by name
public static DataType getType(Class c); // find by class
public String toString(); // eg "double"
public Class getPrimitiveClassType(); // eg double.class
public Class getClassType(); // eg Double.class
}
An Attribute is a (name, value) pair, where the value may be a scalar or one dimensional array of String or Number. An Attribute is attached to a Variable or a Group.
public class Attribute {
public String getName();
public DataType getDataType();
public boolean isString();
public int getLength(); // = 1 for scalars
public Array getValues();
public String getStringValue();
public String getStringValue(int elem);
public Number getNumericValue();
public Number getNumericValue(int elem);
}
A Group is a container for variables, attributes, dimensions, or other groups. Groups form a tree of nested groups, similar to file directories. There is always at least one group in a NetcdfFile, the root group, which has an empty string as its name. The full name of a group uses "/" as separator, so "group1/group2/group3" names group3 with parent group2 with parent group1, with parent the root group.
public class Group {
public String getName(); // full name, starting from root group
public String getShortName(); // name local to parent group
public List getVariables(); // list of Variables directly in this group
public Variable findVariable(String shortName); // find specific Variable
public List getDimensions(); // list of Dimensions directly in this group
public Dimension findDimension(String name); // find specific Dimension
public List getAttributes(); // list of Attributes directly in this group
public Attribute findAttribute(String name); // find specific Attribute
public Attribute findAttributeIgnoreCase(String name);
public List getGroups(); // list of Groups directly in this group
public Group findGroup(String shortName); // find specific Group
public Group getParentGroup();
}
A Variable is a multidimensional array of primitive data, Strings or Structures (a scalar variable is a rank-0 array). It has a name and a collection of dimensions and attributes, as well as logically containing the data itself. The rank of a variable is its number of dimensions, and its shape is the lengths of all of its dimensions. The dimensions are returned in getDimensions() in order of most slowly varying first (leftmost index for Java and C programmers). The data elements have type getDataType().
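The relationship between rank, shape, size and dimension order can be sketched in a few lines. The class ShapeMath is a hypothetical helper, not part of ucar.nc2, and the flat offset it computes only illustrates what "most slowly varying first" means; it makes no claim about how the library stores data internally.

```java
// Hypothetical helper (not part of ucar.nc2) relating shape, size and
// element order when the first dimension varies most slowly.
class ShapeMath {
    // Total number of elements: the product of the dimension lengths.
    // A scalar (rank 0, empty shape) has size 1.
    static long size(int[] shape) {
        long n = 1;
        for (int len : shape) {
            n *= len;
        }
        return n;
    }

    // Position of the element at "index" when all elements are laid out in a
    // single row, with the last (rightmost) index varying fastest.
    static long offset(int[] shape, int[] index) {
        long off = 0;
        for (int d = 0; d < shape.length; d++) {
            off = off * shape[d] + index[d];
        }
        return off;
    }
}
```

With shape {12, 114, 288}, stepping the rightmost index by one moves one element, while stepping the leftmost index by one skips a whole 114 x 288 slab.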
Each Variable is contained in a Group. The full name of a variable uses "/" as separator from its parent groups, so "group1/group2/varName" names varName contained in group2 with parent group1 in the root group.
public class Variable {
public String getName(); // full name, starting from root group
public String getShortName(); // name local to parent group
public Group getParentGroup();
public int getRank(); // rank of the array
public int[] getShape(); // shape of the array
public long getSize(); // total number of elements
public int getElementSize(); // byte size of one element
public DataType getDataType(); // data type of elements
public List getDimensions(); // get ordered list of Dimensions
public Dimension getDimension(int i); // get the ith Dimension
public int findDimensionIndex(String name); // find named Dimension
public List getAttributes(); // get list of Attributes
public Attribute findAttribute(String attName);
public Attribute findAttributeIgnoreCase(String attName);
public boolean isUnlimited(); // if any Dimension is unlimited
public boolean isSequence(); // is a Sequence
public boolean isCoordinateVariable();
// read the data
public Array read(); // read all data
public Array read(int[] origin, int[] shape); // read a section
public Array read(List section); // list of ucar.ma2.Range objects
public Array read(String sectionSpec); // fortran90 syntax, eg "3,0:100:10,:"
// for scalar data
public byte readScalarByte();
public double readScalarDouble();
public float readScalarFloat();
public int readScalarInt();
public long readScalarLong();
public short readScalarShort();
public String readScalarString();
// for members of Structures : see section 2.7 below
public boolean isMemberOfStructure();
public Structure getParentStructure();
public Array readAllStructures(List section, boolean flatten);
public Array readAllStructuresSpec(String sectionSpec, boolean flatten);
// for Variable sections
public boolean isSection();
public Variable section(List section);
public List getRanges();
}
Data access is handled by calling the various read methods (all of which throw IOException), which return the data in a memory-resident Array object. Further manipulation can be done on the Array object; see section 5.2 below. Each read call potentially causes a disk or network access.
The read() method reads all the data and returns it in a memory resident Array which has the same shape as the Variable itself.
The read(int[] origin,int[] shape), read(List section), and read(String sectionSpec) methods specify a section of the Variable to read, whose returned shape is the shape of the requested section. The origin and shape parameters specify a starting index and number of elements to read, respectively, for each of the Variable's dimensions. The section parameter is a list of Range objects, one for each dimension. A null Range in the list means to use the entire shape for that dimension. A Range is constructed with a first and last (inclusive) index value, and optionally a stride. (This is the only way to do strided access)
public class ucar.ma2.Range {
Range(int first, int last);
Range(int first, int last, int stride);
int length();
}
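The two ways of describing a section, (origin, shape) and (first, last, stride), are related by simple arithmetic. The class IndexRange below is a hypothetical stand-in written to show that arithmetic; it is not ucar.ma2.Range and its method fromOriginShape is invented for the illustration.

```java
// Hypothetical stand-in for the idea behind ucar.ma2.Range (not the library
// class): a first and last (inclusive) index plus a stride.
class IndexRange {
    final int first;
    final int last;
    final int stride;

    IndexRange(int first, int last) {
        this(first, last, 1);
    }

    IndexRange(int first, int last, int stride) {
        if (first < 0 || last < first || stride < 1) {
            throw new IllegalArgumentException("invalid range");
        }
        this.first = first;
        this.last = last;
        this.stride = stride;
    }

    // Number of indices visited: first, first+stride, ... without passing last.
    int length() {
        return (last - first) / stride + 1;
    }

    // One dimension of read(origin, shape): origin is the starting index and
    // shape is the number of elements, so the last index is origin + shape - 1.
    static IndexRange fromOriginShape(int origin, int shape) {
        return new IndexRange(origin, origin + shape - 1);
    }
}
```

For instance, a range from 0 to 1000 with stride 100 visits 11 indices, and an origin of 10 with a shape of 3 covers indices 10 through 12.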
The sectionSpec parameter specifies the section with a String, using Fortran 90 array section syntax. For example, "10, 1:3, :, 0:1000:100". See "Array section syntax" in section 2 for details.
When the Variable is a scalar, you can use the scalar read routines; for example readScalarDouble() returns a scalar's value as a double.
When the Variable is a member of an array of Structures, things are somewhat more complicated. The standard read methods operate only on the first element of the Structure array. To get data from other Structure elements, or from multiple Structure elements, you must use the readAllStructures() and related methods. See next section for details.
A Variable can be created as a section of another Variable by using the section() method. A section Variable is a "first class object" that can be treated like any other Variable. If you call read(section) on a section Variable, the section parameter will refer to the new Variable's shape. Generally, a section Variable is a logical section, meaning that no data is read until a read method is called.
A sequence Variable is indicated by isSequence() returning true. Sequences are one dimensional arrays of any DataType whose length is not determined until the data is actually read. So getShape() and getSize() return 0. You cannot call section() or read(section) on a Sequence. (IS THIS TRUE?)
A Structure is a subclass of Variable that contains member Variables, similar to a struct in the C language. A Structure represents a multidimensional array whose elements have type DataType.STRUCTURE. A read on a Structure returns an array of StructureData objects, which contains data Arrays for the member Variables.
All of a Structure's Variables' data may be assumed to be stored "close" to each other, so it may be more efficient to read all the data in one Structure at once using the Structure.read() method, rather than the read() method on its individual member Variables. This is generally true for remote access protocols like OpenDAP, since each read() call typically incurs the cost of a round-trip to a server.
Both a Structure and a Group contain collections of Variables, but a Group is a logical collection, and a Structure can be considered a physical collection of data stored close together. A Group can contain shared Dimensions, but a Structure cannot. Both can contain Attributes. Most importantly, a Structure can be an array, but a Group cannot.
public class Structure extends ucar.nc2.Variable {
public List getVariables(); // list of Variables
public List getVariableNames(); // list of String : members' short names
public Variable findVariable(String shortName);
public StructureData readStructure(); // for scalar data
public StructureData readStructure(int elem); // for rank 1 data
public Structure.Iterator getStructureIterator(); // iterator returns StructureData
}
public class Structure.Iterator {
public boolean hasNext();
public StructureData next() throws java.io.IOException;
}
To access the data in a structure, typically you use the read() and read(section) methods inherited from the Variable superclass. These return arrays of StructureData objects. The readStructure() and readStructure(int elem) methods can be used when the structure is scalar or one-dimensional, respectively. For any shape structure, use getStructureIterator() which returns an iterator that reads one structure in the array each time iterator.next() is called, and returns a StructureData object.
public class StructureData {
public List getMembers(); // StructureData.Member objects
public java.util.List getMemberNames();
public StructureData.Member findMember(String shortName);
public StructureData.Member findNestedMember(String fullName);
public Array findMemberArray(String memberName);
// for scalar members
public byte getScalarByte(String memberName);
public double getScalarDouble (String memberName);
public float getScalarFloat(String memberName);
public int getScalarInt(String memberName);
public long getScalarLong(String memberName);
public short getScalarShort(String memberName);
public String getScalarString(String memberName); // ok for 1D char type, too
}
public class StructureData.Member {
public ucar.nc2.Variable v;
public ucar.ma2.Array data;
}
A StructureData object contains the data of the member variables of the structure, after the data has been read into memory with a read method. Each member variable is associated with its data Array in a StructureData.Member object. The getScalarXXX() methods are convenience methods for extracting the data when the member Variable is a scalar.
The read methods on variables that are members of structures are complicated by the fact that the parent structure may be a non-scalar array. The member variable's read method therefore is within the context of some element of its parent structure. For this reason, a member variable is not truly a "first class" citizen. Therefore if you use the ordinary read methods on a variable with isMemberOfStructure() true, these will return the data only from the first structure.
To read data from a member variable across multiple structures, use the member variable's readAllStructures() method. However, if you need data from more than one member variable, it is probably more efficient to call the read methods on the structure itself, and extract the member data from the StructureData object. You can use the getStructureIterator() method to access all of the data in an array of structures sequentially.
If a Variable is a member of a Structure, its full name uses the "." as a separator from the Structure name, so e.g. "group1/record.varName" names varName as a member of the record Structure, which is in group1 whose parent is the root group.
A NetcdfFile provides read-only access to datasets through the netCDF API (to write data, use NetcdfFileWriteable, below). Use the static NetcdfFile.open methods to open a local netCDF (version 3 or 4) file or most HDF5 files. See NetcdfDataset.open for more general reading capabilities, including OpenDAP and NcML.
You can look at all the variables, dimensions and global/group attributes through the top level getVariables(), getDimensions(), and getGlobalAttributes() methods. Alternatively, you can explore the contained objects using groups, starting with the root group by calling getRootGroup().
Generally, reading NetCDF version 3 files through the NetCDF-Java 2.2 API is backwards compatible with programs using the NetCDF-Java 2.1 API (there are some name changes), since there are no Groups or Structures in netCDF version 3 files. However, when working with NetCDF version 4 files, HDF5 files, OpenDAP and other kinds of datasets, you will need to deal with the possible presence of Groups, Structures, Sequences, Strings, etc, that are new to the NetCDF-Java 2.2 data model.
public class NetcdfFile {
static NetcdfFile open(String location);
static NetcdfFile open(String location, String id, String title, boolean useV3, CancelTask cancel);
public void close();
public String getLocation(); // dataset location
public String getId(); // globally unique id, may be null
public String getTitle(); // human readable title, may be null
public Group getRootGroup();
public List getVariables(); // all Variables in all Groups
public Variable findVariable(String fullName); // find specific Variable by full name
public Variable findTopVariable(String fullName); // no Structure member names
public List getDimensions(); // shared Dimensions
public Dimension findDimension(String dimName); // get specific Dimension
public List getGlobalAttributes(); // get list of global Attributes
public Attribute findGlobalAttribute(String attName); // specific Attribute
public Attribute findGlobalAttributeIgnoreCase(String attName);
public String findAttValueIgnoreCase(Variable v, String attName, String def);
public Array read(String sectionSpec, boolean flatten);
public void writeCDL(java.io.OutputStream os);
public void writeNcML(java.io.OutputStream os);
}
The getVariables() method returns all Variables in all Groups, but does not include members of Structures, since those are considered to be part of the Structure Variable. Similarly, findTopVariable() only searches over these "non-nested" Variables. The findVariable() method will also look for Variables that are members of Structures, e.g. "group1/struct.member".
The findAttValueIgnoreCase() method is a convenience method for String–valued attributes which returns the specified default value if the attribute name is not found. If you specify a null for the variable then it will search for global attributes, otherwise it will search for the attribute in the given variable.
The read(sectionSpec, flatten) method is a general way to read from any variable in the NetcdfFile. It accepts a String like varname(0:1,13,:). See the "Array section syntax" section above for details on the syntax of the sectionSpec string. The flatten parameter is only used when the requested variable is a member of a structure.
A NetcdfFileWriteable allows a NetCDF version 3 file to be created or written to.
A NetCDF file can only have dimensions, attributes and variables added to it at creation time. Thus, when a file is first opened, it is in "define mode" where these may be added. Once create() is called, the dataset structure is immutable. After create() has been called you can then write the data values. See example below.
public class NetcdfFileWriteable extends NetcdfFile {
public NetcdfFileWriteable(); // create new file
public void setName(String filename);
public void setFill(boolean fill); // default false
public Dimension addDimension(String dimName, int dimLength);
public void addGlobalAttribute(String attName, String value);
public void addGlobalAttribute(String attName, Number value);
public void addGlobalAttribute(String attName, Array values);
// add new Variable and variable Attributes
public void addVariable(String varName, DataType varType, List dims);
public void addVariableAttribute(String varName, Attribute att);
// finish structure definition, create file
public void create();
// open existing file and allow writing data
public NetcdfFileWriteable(String filename);
// write data to file
public boolean write(String varName, Array data);
public boolean write(String varName, int[] origin, Array data);
public void flush(); // flush to disk
public void close();
}
The ucar.nc2.dataset package and related packages are an extension to the NetCDF API which recognizes standard attributes, provides support for general and georeferencing coordinate systems, and provides support for the NetCDF Markup Language (NcML). Opening a NetcdfDataset is also the general way to open any dataset as a NetcdfFile.
NcML is an XML document format that allows you to create "virtual" netCDF datasets, including combining multiple netCDF files into one dataset. This section focuses on the extension of the netCDF API to include coordinate systems. Section 4 below explains NcML and how to create virtual datasets.
A standard attribute is a NetCDF attribute that has a defined name and meaning. Several have become widespread and important enough to be handled by the library instead of at the application level. A variable's description is a sentence that describes the meaning of the variable. The units indicate the variable's scientific data units. Missing data values are special values that indicate that the data is not present at that location. Floating point data is often packed into bytes or shorts using scale and offset values. Coordinate systems are also often specified through standard attributes.
The VariableEnhanced interface is implemented by classes in the ucar.nc2.dataset package to provide standard methods to access standard attributes. In addition, dataset annotation systems like NcML and THREDDS can add this information to datasets that lack it. It's useful for application developers to use this interface in order to take advantage of both the standard implementations and the opportunity for annotation systems to add value.
The getDescription() and getUnitsString() methods are simple convenience routines for getting the variable's description and units, respectively. More complicated is the handling of packed data and missing data:
· Packed data. If the variable has the scale_factor and/or add_offset attributes, hasScaleOffset() will return true, and the variable will automatically be converted to type double or float and its data unpacked using the formula:
unpacked_value = scale_factor * packed_value + add_offset
(scale_factor will default to 1.0 and add_offset will default to 0.0 if they are missing)
· Missing data. If the variable has any of the invalid or missing data attributes (_FillValue, missing_value, valid_min, valid_max, or valid_range), hasMissing() will return true. To test if a specific value is missing, call isMissing(val). Note that the data is converted and compared as a double. When the variable element type is float or double (or is set to double because it is packed), then missing data values are set to NaN (IEEE not-a-number), which makes further comparisons more efficient.
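The two cases above can be sketched with plain arithmetic. The class PackedData is hypothetical and is not the VariableEnhanced implementation. As a simplifying assumption that the text above does not confirm, it tests for missing data on the raw (packed) value before unpacking, and it considers only a fill value and a valid range.

```java
// Hypothetical sketch (not the VariableEnhanced implementation) of the
// unpacking formula and a missing-value test.
class PackedData {
    // unpacked_value = scale_factor * packed_value + add_offset
    static double unpack(double packedValue, double scaleFactor, double addOffset) {
        return scaleFactor * packedValue + addOffset;
    }

    // The value is converted to and compared as a double.
    // Assumption: only a fill value and a valid range are checked here.
    static boolean isMissing(double val, double fillValue, double validMin, double validMax) {
        return Double.isNaN(val) || val == fillValue || val < validMin || val > validMax;
    }

    // Unpacks shorts into doubles, writing NaN wherever the raw value is missing.
    static double[] unpackAll(short[] packed, double scaleFactor, double addOffset,
                              double fillValue, double validMin, double validMax) {
        double[] out = new double[packed.length];
        for (int i = 0; i < packed.length; i++) {
            double raw = packed[i];
            out[i] = isMissing(raw, fillValue, validMin, validMax)
                    ? Double.NaN
                    : unpack(raw, scaleFactor, addOffset);
        }
        return out;
    }
}
```

Once missing values are stored as NaN, later code can test for them with Double.isNaN() without consulting the attributes again.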
public interface VariableEnhanced {
public String getDescription(); // long_name attribute
public String getUnitsString(); // units attribute
public boolean hasScaleOffset(); // is packed
public boolean hasMissing();
public boolean isMissing( double val); // check specific value
}
To use this, open the file through the NetcdfDataset.open method, described below.
A well-established convention with netCDF files is the use of coordinate variables to name the coordinate values of a dimension. A coordinate variable is a one-dimensional variable with the same name as a dimension, e.g. float lat(lat) . It must not have any missing data (for example, no _FillValue or missing_value attributes) and must be strictly monotonic (values increasing or decreasing). Many programs that read netCDF files recognize and use any coordinate variables that are found in the file.
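The "strictly monotonic" requirement is easy to check once the coordinate values are in memory. The class CoordinateCheck below is a hypothetical helper, not part of the library; it treats NaN as a violation, in keeping with the rule that a coordinate variable has no missing data.

```java
// Hypothetical helper (not part of ucar.nc2): checks the monotonicity rule
// for the values of a coordinate variable.
class CoordinateCheck {
    // True if the values are strictly increasing or strictly decreasing.
    static boolean isStrictlyMonotonic(double[] values) {
        if (values.length < 2) {
            return true;
        }
        boolean increasing = values[1] > values[0];
        for (int i = 1; i < values.length; i++) {
            double diff = values[i] - values[i - 1];
            // Equal neighbours, NaN, or a change of direction all fail the test.
            if (Double.isNaN(diff) || diff == 0 || (diff > 0) != increasing) {
                return false;
            }
        }
        return true;
    }
}
```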
Because coordinate variables must be one-dimensional, they cannot represent some commonly used coordinates. For example, float lat(x,y) and float lon(x,y) assign latitude and longitude coordinates to points on a projection plane. A coordinate axis is a generalization of a coordinate variable. It must be a variable with no missing data, and no extra dimensions besides its coordinate dimensions (i.e. it may not be vector valued). It may have any number of dimensions. A variable that uses a coordinate axis must have all of the dimensions that the coordinate axis has. For example, the variable T(time, z, y, x) can have a coordinate axis lat(x,y) because x and y are both used by T, but the variable R(time, y) does not use the x dimension, so it cannot have lat as a coordinate axis.
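The dimension rule for coordinate axes amounts to a subset test on dimension names. The class AxisCheck is a hypothetical helper, not part of the library, and it identifies dimensions by name only:

```java
import java.util.List;

// Hypothetical helper (not part of ucar.nc2): a variable can use a coordinate
// axis only if it has every dimension that the axis has.
class AxisCheck {
    static boolean canUseAxis(List<String> variableDims, List<String> axisDims) {
        return variableDims.containsAll(axisDims);
    }
}
```

With the examples from the text, T(time, z, y, x) passes the test for lat(x, y), while R(time, y) does not.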
A coordinate system is a set of coordinate axes used by a variable. If S is a coordinate system for a variable V, and there are n coordinate axes in S, then S assigns to each value of V n coordinate values, one from each coordinate axis. For example if the variable T(time, z, y, x) has a coordinate system S consisting of the coordinate axes lat(x, y), lon(x,y), level(z), and time(time), then the coordinate values for T(t, k, j, i) in S are (lat(i,j), lon(i,j), level(k), time(t)). Note that the order of the indices in the lat and lon axes is important! Another interesting example is assigning positions to a trajectory, for example lat(sample), lon(sample), altitude(sample) might be a coordinate system for variable O3(sample). A coordinate transformation is a function that transforms the values in one coordinate system to the values in another coordinate system.
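The index bookkeeping in the T(time, z, y, x) example can be written out directly. The class CoordinateLookup is a hypothetical illustration using plain Java arrays rather than library objects; it assumes lat and lon are dimensioned (x, y), as in the example above.

```java
// Hypothetical illustration (plain arrays, not library objects) of how a
// coordinate system assigns coordinate values to one element of T(time, z, y, x).
class CoordinateLookup {
    // Returns {lat, lon, level, time} for the element T(t, k, j, i).
    // lat and lon are dimensioned (x, y), so they are indexed [i][j], not [j][i].
    static double[] coordinates(double[][] lat, double[][] lon,
                                double[] level, double[] time,
                                int t, int k, int j, int i) {
        return new double[] {lat[i][j], lon[i][j], level[k], time[t]};
    }
}
```

Swapping i and j in the lat and lon lookups would silently return the coordinates of a different grid point, which is why the order of the indices matters.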
In general, coordinate systems can be used by more than one variable, and a variable can have 0, 1, or more coordinate systems.
Fig 2. Coordinate System Data Model in UML
In the earth sciences, special consideration is given to coordinate systems that locate data in physical space and time, called georeferencing coordinate systems. In the simplest case, for example, a georeferencing coordinate system identifies the latitude, longitude, height, and time of the data. For numerical model data often the horizontal coordinates are specified on a projection plane, whose (latitude, longitude) coordinates can be found by using a mathematical formula from projective geometry. Satellite data can be extremely complicated, and the position and extent of each datum must be calculated using sensor-dependent empirical algorithms.
Implicit in georeferencing coordinate systems are various standards, for example the "Clarke ellipsoidal earth model", "polar stereographic projection", "mean sea level" and "Gregorian calendar". These are part of "well-known" reference coordinate systems. Abstractly, a georeferencing coordinate system is one which allows us to calculate the latitude, longitude, height, and time of each point in its domain. Concretely, the ability to make that calculation might depend on the availability of a library or service which implements a coordinate transformation from the coordinate system specified in the netCDF dataset to a (latitude, longitude, height, time) reference coordinate system.
The specification of the reference coordinate system may be arbitrarily complicated, depending upon how accurately the position must be calculated. Importing data into geographic information systems (GIS) can also require much information that may not be explicitly stored in the netCDF file. The netCDF Dataset API focuses on a minimal set of information needed for implementing georeferencing coordinate systems. Future extensions will define more complete metadata requirements for GIS interoperability.
For our purposes, a georeferencing coordinate system is one in which we have identified the lat, lon, height, and time axes of the coordinate system, or have identified coordinate transformations from which latitude, longitude, height, and time can in principle be calculated.
Many netCDF files follow naming and attribute Conventions that allow readers to understand what the variables and dimensions in the files mean, and in particular that identify the coordinate systems in use. The classes in the ucar.nc2.dataset.conv package implement some of the important Conventions that Unidata is familiar with. In this context, implementing the Convention means to identify the coordinate systems, and the georeferencing information that are present in the netCDF file, and create a NetcdfDataset object that holds that information.
The logic of each Convention is coded in a class that extends ucar.nc2.dataset.conv.Convention, whose augmentDataset() method adds additional information to a NetcdfDataset:
public abstract void augmentDataset( NetcdfDataset ncDataset);
We highly recommend that you document and register your naming and attribute Conventions. Unidata maintains a web page of such Conventions. If you are using Conventions that are not included in this package, you are encouraged to extend the abstract class ucar.nc2.dataset.conv.Convention, implementing your Convention, and register it (using ucar.nc2.dataset.conv.Convention.register()).
The factory method will look in the netCDF file for a global attribute named "Conventions", and match its value against registered names. If it finds a match, it will use the registered class to instantiate an object of that class (using the no-argument constructor, so make sure you have such a constructor if you are implementing your Convention).
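The register-then-instantiate pattern described above can be sketched in a few lines of plain Java. This is not the library's implementation: the names ConventionSketch, MyConventionSketch, forName() and describe() are hypothetical, and the code is written for a current Java version rather than Java 1.4. It shows only why a no-argument constructor is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Convention base class with a name registry.
abstract class ConventionSketch {
    private static final Map<String, Class<? extends ConventionSketch>> REGISTRY =
        new HashMap<>();

    /** Associates a "Conventions" attribute value with an implementing class. */
    static void register(String name, Class<? extends ConventionSketch> impl) {
        REGISTRY.put(name, impl);
    }

    /** Returns a new instance for the attribute value, or null if none is registered. */
    static ConventionSketch forName(String conventionsValue) {
        Class<? extends ConventionSketch> impl = REGISTRY.get(conventionsValue);
        if (impl == null) return null;
        try {
            // Reflection can only call the no-argument constructor here.
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + impl.getName(), e);
        }
    }

    abstract String describe();
}

// A hypothetical user-defined Convention with the required no-argument constructor.
class MyConventionSketch extends ConventionSketch {
    MyConventionSketch() {}

    String describe() { return "MyConvention"; }
}
```

A class without a no-argument constructor would fail inside forName() with an IllegalStateException in this sketch.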
The following presents the important public methods of the ucar.nc2.dataset classes. See the javadoc for complete details.
A NetcdfDataset is a NetcdfFile and so can be used wherever a NetcdfFile object is used. Use these factory methods to open any kind of dataset through the NetCDF API, whether you want to enhance it or not.
NetcdfDataset.openFile() returns a NetcdfFile: the uriString can start with prefix dods: for OpenDAP files, http: for netCDF files on an HTTP server, or file: for a local netCDF file. If it does not have one of those prefixes, then it should be a local file name. If it has a suffix of .xml or .ncml it is assumed to be an NcML file.
The NetcdfDataset.open() factory methods open the dataset, optionally enhance it by reading conventions and attributes, and return a NetcdfDataset. This contains ucar.nc2.dataset.VariableDS objects instead of ucar.nc2.Variable objects. If enhanced, the getCoordinateAxes() method returns a list of all of the variables used as coordinate axes, and the getCoordinateSystems() method returns a list of all of the coordinate systems used in the dataset. Note that the coordinate axes are still variables that are included in the getVariableIterator() and findVariable() methods.
public class NetcdfDataset extends ucar.nc2.NetcdfFile {
public static NetcdfFile openFile(String location, String id, String title,
boolean useV3, CancelTask cancelTask);
public static NetcdfDataset open(String location, String id,
String title);
public static NetcdfDataset open(String location, String id, String title,
boolean useV3, boolean addCoordinates, CancelTask cancelTask);
public NetcdfDataset(String ncmlLocation); // create from NcML XML doc
public List getCoordinateSystems(); // list of CoordinateSystem objects
public List getCoordinateAxes(); // list of CoordinateAxis objects
}
A VariableDS object extends Variable, and so can be used wherever a Variable object is used. It implements all of the VariableEnhanced methods shown above, along with the getCoordinateSystems() method that returns a list of all of the coordinate systems used for the variable. Note that this list may be empty unless a ucar.nc2.dataset.conv class (or equivalent) has been able to identify the coordinate systems.
public class VariableDS extends Variable implements VariableEnhanced {
...
public List getCoordinateSystems();
}
To get the VariableDS objects out of a NetcdfDataset, you must cast, e.g.:
for (Iterator iter = ncd.getVariables().iterator(); iter.hasNext(); ) {
VariableDS varDS = (VariableDS) iter.next();
...
}
A CoordinateAxis extends VariableDS, so is also a Variable, one that is used as part of a Coordinate System. The getAxisType() method optionally indicates the type of the coordinate axis, which can be Lat, Lon, Height, Pressure, Time, GeoX, GeoY, or GeoZ. The getPositive() method returns the String POSITIVE_DOWN or POSITIVE_UP, and is used only for vertical axes. If POSITIVE_UP, then increasing values of that coordinate go vertically up.
public class CoordinateAxis extends VariableDS {
...
public AxisType getAxisType(); // null if not special type
public String getUnitString(); // coordinate units
public String getDescription(); // description
public String getPositive(); // up or down
public boolean isNumeric(); // numeric or String valued
public boolean isContiguous(); // contiguous vs disjoint edges
public double getMinValue(); // minimum coordinate value
public double getMaxValue(); // maximum coordinate value
}
A CoordinateAxis1D is a one dimensional CoordinateAxis. This is the common case that implies a one to one correspondence between a dimensional index and a coordinate value. If isContiguous() is true, you can use the double[len+1] edge array from getCoordEdges(), where value[i] is contained in the interval [edge[i], edge[i+1]]; otherwise you must use the more general getCoordEdges(int i), which returns the 2 edges for the ith coordinate. If isRegular() is true, then value[i] = getStart() + i * getIncrement().
public class CoordinateAxis1D extends CoordinateAxis {
public String getCoordName(int index); // String name
public double getCoordValue(int index); // value from index
public int findCoordElement(double value); // index from value
public double[] getCoordValues(); // double array length len
public double[] getCoordEdges(); // double array length len+1
public double getCoordEdge(int index); // edge value from index
public double[] getCoordEdges(int i); // edges for ith coord
public boolean isRegular(); // if evenly spaced
public double getStart(); // value = start + i * increment
public double getIncrement();
}
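To make the relationship between values, edges, and reverse lookup concrete, here is a small self-contained sketch of a regularly spaced, contiguous axis. It is not the library's CoordinateAxis1D: the class RegularAxisSketch is hypothetical, and placing each edge midway between neighboring values is an assumption made for illustration.

```java
// Illustrative only: a regularly spaced 1D axis with contiguous edges.
final class RegularAxisSketch {
    final double start;
    final double increment;
    final int len;

    RegularAxisSketch(double start, double increment, int len) {
        this.start = start;
        this.increment = increment;
        this.len = len;
    }

    /** value[i] = start + i * increment */
    double coordValue(int i) {
        return start + i * increment;
    }

    /** Returns len+1 edges. ASSUMPTION: edges lie midway between neighboring values. */
    double[] coordEdges() {
        double[] edges = new double[len + 1];
        for (int i = 0; i <= len; i++) {
            edges[i] = start + (i - 0.5) * increment;
        }
        return edges;
    }

    /** First index i whose interval [edge[i], edge[i+1]] contains value, or -1. */
    int findCoordElement(double value) {
        double[] e = coordEdges();
        for (int i = 0; i < len; i++) {
            double lo = Math.min(e[i], e[i + 1]);  // handles a negative increment
            double hi = Math.max(e[i], e[i + 1]);
            if (value >= lo && value <= hi) return i;
        }
        return -1;
    }
}
```

For an axis with start 10, increment 2 and length 4, the values are 10, 12, 14, 16 and the five edges run from 9 to 17.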
A CoordinateAxis2D is a two dimensional CoordinateAxis, for example float lat(i,j) and float lon(i,j). Currently it just has convenience routines for fetching the coordinate values.
public class CoordinateAxis2D extends CoordinateAxis {
...
public double getCoordValue(int i, int j); // get i, j coordinate
public double[] getCoordValues(); // get coordinates as 1D array
}
A CoordinateSystem has a list of coordinate axes and an optional list of coordinate transforms, along with various convenience routines for extracting georeferencing information.
public class CoordinateSystem {
...
public List getCoordinateAxes(); // list of CoordinateAxis
public List getCoordinateTransforms(); // list of CoordinateTransform
public boolean isProductSet(); // all axes CoordinateAxis1D
public boolean isGeoReferencing();
public boolean isGeoXY();
public boolean isLatLon();
public boolean hasVerticalAxis();
public boolean hasTimeAxis();
public ucar.unidata.geoloc.ProjectionImpl getProjection();
public CoordinateAxis getXaxis(); // look for AxisType
public CoordinateAxis getYaxis();
public CoordinateAxis getZaxis();
public CoordinateAxis getTaxis();
public CoordinateAxis getLatAxis();
public CoordinateAxis getLonAxis();
public CoordinateAxis getHeightAxis();
public CoordinateAxis getPressureAxis();
}
A CoordinateTransform represents a transformation from the containing CoordinateSystem to a “well known” reference system. It has a name, a naming authority, a set of name/value parameters used by the transformation, and an optional TransformType. A transform type of Projection indicates that it transforms from GeoX, GeoY coordinates to Lat, Lon coordinates.
public class CoordinateTransform {
public String getName();
public String getAuthority();
public List getParameters(); // list of Attribute objects
public Attribute findParameterIgnoreCase(String name);
public TransformType getTransformType();
}
Note: these classes should be considered experimental and may be refactored in the next release.
The ucar.nc2.dataset objects handle coordinate systems in a general way. The classes in the ucar.nc2.dataset.grid package are specialized for georeferencing coordinate systems in which the x, y, z, and time axes are explicitly recognized.
In order for it to be georeferencing, a coordinate system must have a lat/lon coordinate axis or a geoX/geoY coordinate axis and a projection that transforms it to lat/lon. It may optionally have vertical and time axes. Currently, vertical and time axes, if they exist, must be one-dimensional, and x/y or lat/lon axes must be 1 or 2 dimensional. Variables that have georeferencing coordinate systems are made into GeoGrids.
A GridDataset is the collection of GeoGrids found in a netCDF Dataset. You can wrap an already opened netCDF Dataset into a GridDataset, or you can use the static factory method to open a netCDF file, read its Conventions and extract the GeoGrids all at once:
GridDataset gds = ucar.nc2.dataset.grid.GridDataset.factory(uriString);
is equivalent to:
String uriString = ; // file:, dods:, http:, or local filename
NetcdfFile ncfile = ucar.nc2.dataset.NetcdfDataset.factory( uriString, null);
NetcdfDataset ds = ucar.nc2.dataset.conv.Convention.factory( ncfile);
GridDataset gds = new GridDataset ( ds);
GridDataset has the following public interface:
public class GridDataset {
public static GridDataset factory(String uriString);
public GridDataset(NetcdfDataset dset); // wrap a dataset
public void close();
public String getName();
public List getGrids(); // list of GeoGrids
public Collection getGridSets(); // sorted by coordinate system
public GeoGrid findGridByName(String name);
public NetcdfDataset getNetcdfDataset(); // underlying dataset
}
A GridCoordSys wraps a georeferencing coordinate system. It always has 1D or 2D XHoriz and YHoriz axes, and optionally 1D vertical and time axes. The XHoriz/YHoriz axes will be lat/lon if isLatLon() is true, otherwise they will be GeoX,GeoY with an appropriate Projection. The getBoundingBox() method returns a bounding box from the XHoriz/YHoriz corner points. The getLatLonBoundingBox() method returns the smallest lat/lon bounding box that contains getBoundingBox().
If there is a vertical axis, isZPositive() is true if increasing values of the vertical axis should be displayed “up”. The getLevels() and getTimes() methods return the lists of levels and times, if they exist, as lists of NamedObjects that are convenient for display. If there is a time axis, and it can be converted (via ucar.units) into a Date, isDate() is true, and getTimeDates() returns the time coordinates as Date objects. The findTimeCoordElement() method does a reverse lookup, finding the time index that corresponds to a given Date.
public class GridCoordSys extends CoordinateSystem {
public GridCoordSys(CoordinateSystem sys);
public CoordinateAxis getXHorizAxis(); // GeoX or Lon
public CoordinateAxis getYHorizAxis(); // GeoY or Lat
public CoordinateAxis1D getVerticalAxis(); // Height,Pressure,or GeoZ
public CoordinateAxis1D getTimeAxis();
public ArrayList getLevels(); // list of ucar.nc2.util.NamedObject
public ArrayList getTimes(); // list of ucar.nc2.util.NamedObject
public String getLevelName(int idx);
public String getTimeName(int idx);
public ucar.unidata.geoloc.ProjectionImpl getProjection();
public ProjectionRect getBoundingBox();
public LatLonRect getLatLonBoundingBox();
public boolean isLatLon();
public boolean isZPositive(); // increasing means ‘up’
public boolean isDate(); // has Date time axes
public java.util.Date[] getTimeDates(); // get time coords as Dates
public int findTimeCoordElement(java.util.Date d);
}
interface ucar.nc2.util.NamedObject{
public String getName();
public String getDescription();
}
A GeoGrid wraps a VariableDS in a VariableStandarized, and also has a GridCoordSys. You can think of it as a specialized Variable that explicitly handles X,Y,Z,T dimensions, which are put into canonical order: (t, z, y, x). It has various convenience routines that expose methods from the GridCoordSys and VariableDS objects.
public class GeoGrid implements ucar.nc2.util.NamedObject {
public String getName();
public GridCoordSys getCoordinateSystem();
public int getRank();
public List getDimensions();
public Dimension getDimension(int idx);
public Dimension getTimeDimension();
public Dimension getZDimension();
public Dimension getYDimension();
public Dimension getXDimension();
public int getTimeDimensionIndex();
public int getZDimensionIndex();
public int getYDimensionIndex();
public int getXDimensionIndex();
// from VariableEnhanced
public String getDescription();
public String getUnitString();
public boolean hasMissingData();
public boolean isMissingData(double val);
public float[] setMissingToNaN(float[] vals);
public ucar.nc2.Attribute findAttributeIgnoreCase(String attName);
// from GeoCoordSys
public ucar.unidata.geoloc.ProjectionImpl getProjection();
public java.util.ArrayList getLevels();
public java.util.ArrayList getTimes();
public ucar.ma2.MAMath.MinMax getMinMaxSkipMissingData(ucar.ma2.Array data);
// read data
public ucar.ma2.Array readVolumeData(int t); // z,y,x volume
public ucar.ma2.Array readYXData (int t,int z); // y,x slice
public ucar.ma2.Array readZYData (int t,int x); // z,y slice
public ucar.ma2.Array getDataSlice(int t,int z,int y,int x); // any
}
coming soon
The NetCDF Markup Language (NcML) is an XML representation of the metadata in a netCDF file. It is described formally by a schema model expressed in XML Schema, see http://www.unidata.ucar.edu/schemas/netcdf.xsd.
http://www.unidata.ucar.edu/packages/netcdf/ncml/index.html
Here is an example netCDF file in CDL:
netcdf example.nc {
dimensions:
  lat = 3; // (has coord.var)
  lon = 4; // (has coord.var)
variables:
  int rh(lat, lon);
    :long_name = "relative humidity";
    :units = "percent";
  float lat(lat);
    :units = "degrees_north";
  float lon(lon);
    :units = "degrees_east";
// Global Attributes:
  :title = "Example Data";
}
which has the following NcML representation:
<?xml version="1.0" encoding="UTF-8"?>
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ucar.edu/schemas/netcdf http://www.unidata.ucar.edu/schemas/netcdf-cs.xsd"
    uri="example.nc">
  <nc:dimension name="lat" length="3" />
  <nc:dimension name="lon" length="4" />
  <nc:attribute name="title" type="string" value="Example Data" />
  <nc:variable name="rh" shape="lat lon" type="int">
    <nc:attribute name="long_name" type="string" value="relative humidity" />
    <nc:attribute name="units" type="string" value="percent" />
  </nc:variable>
  <nc:variable name="lat" shape="lat" type="float">
    <nc:attribute name="units" type="string" value="degrees_north" />
  </nc:variable>
  <nc:variable name="lon" shape="lon" type="float">
    <nc:attribute name="units" type="string" value="degrees_east" />
  </nc:variable>
</nc:netcdf>
The NcML base schema simply reflects the netCDF data model:
NcML Coordinate Systems is an extension of the NcML base schema. It significantly extends the netCDF data model in order to capture the semantics of general coordinate systems, and georeferencing coordinate systems used in the earth sciences. The important additions to the base schema are:
See http://www.unidata.ucar.edu/schemas/netcdf-cs.xsd for the schema.
See http://www.unidata.ucar.edu/packages/netcdf/ncml/AnnotatedNetcdfCS.html for more information.
NcML Dataset is an extension of the NcML base schema, which defines the public metadata of a netCDF Dataset. A netCDF Dataset is a generalization of a NetCDF file. Its purpose is to allow
See http://www.unidata.ucar.edu/schemas/netcdf-cs.xsd for the schema.
See http://www.unidata.ucar.edu/packages/netcdf/ncml/NetcdfDataset.html for more information.
The ucar.ma2 package implements multidimensional arrays of arbitrary rank and element type. Actual data storage is done with Java 1D arrays and stride index calculations. This makes our Arrays rectangular, i.e. these cannot be "ragged arrays" where different elements can have different lengths as in Java multidimensional arrays, which are arrays of arrays.
The ucar.ma2 package is independent of the ucar.nc2 package, and is intended for general multidimensional array use. Its design is motivated by the needs for NetCDF data to be handled in a general, arbitrary rank, type independent way, and also by the requirements of the JavaGrande numeric working group [Caron2000].
It is often critically important for performance that the movement of data between memory and disk is carefully managed. To obtain the data in a ucar.nc2.Variable object you must call read() to bring the data into memory in the form of an Array. Any method that potentially makes an IO call will have an IOException in its signature. Note that none of the methods on Array do. The fact that a Variable can throw an IOException but an Array object cannot may in fact be a critical factor in how these objects are used [Waldo94].
The following is an overview of the important public interfaces of the ucar.ma2 classes. Consult the javadoc for more complete and recent details.
The data type, rank, and shape of an array are immutable, while the data values themselves are mutable. Generally this makes Arrays thread-safe, and no synchronization is done in the Array package. (There is the possibility of non-atomic read/writes on 64 bit primitives (long, double). In this case the user should add their own synchronization if needed. Presumably 64-bit CPUs will make those operations atomic also.)
public abstract class Array {
// array shape and type
public long getSize(); // total # elements
public int getRank(); // array rank
public int[] getShape(); // array dimension sizes
public Class getElementType(); // data type of backing array
// accessor helpers
public Index getIndex(); // random access
public IndexIterator getIndexIterator(); // sequential access
public IndexIterator getRangeIterator(Range[] ranges); // access subset
public IndexIterator getIndexIteratorFast(); // arbitrary order
// accessors: for each data type (double, float, long, int, short,
// byte, char, boolean) there are methods of the form, e.g.:
public double getDouble(Index ima);
public void setDouble(Index ima, double value);
...
// create new Array, no data copy
public Array flip( int dim); // invert dimension
public Array permute( int[] dims); // permute dimensions
public Array reduce(); // rank reduction for any dims of length 1
public Array reduce(int dim); // rank reduction for specific dimension
public Array section( Range[] ranges); // create logical subset
public Array sectionNoReduce( Range[] ranges); // no rank reduction
public Array slice(int dim, int val); // rank-1 subset
public Array transpose( int dim1, int dim2); // transpose dimensions
// create new Array, with data copy
public Array copy();
public Array sectionCopy( Range[] ranges); // subset
public Array reshape( int[] shape); // total # elements must be the same
// conversion to Java arrays
public java.lang.Object copyTo1DJavaArray();
public java.lang.Object copyToNDJavaArray();
}
The getShape() method returns an integer array containing the length of the Array in each dimension. The getRank() method returns the number of dimensions, and getSize() returns the total number of elements in the Array. The getElementType() method returns the data type of the backing store, e.g. double.class, float.class, etc.
Data element access is described in the sections following this one.
Logical "views" of the array are created in several ways. The section() method creates a subarray of the original array. The slice() method is a convenience routine for the common section() operation of fixing one dimension at a single index, which yields an array of rank one less than the original. The transpose() method transposes two dimensions, while permute() is a general permutation of the indices. The flip() method flips the index of the specified dimension so that it logically runs from n-1 to 0, instead of from 0 to n-1. The reduce() method allows user control over rank-reduction. All of these logically reorder or subset the data without copying.
Methods that create new Arrays by copying the data are copy(), sectionCopy() and reshape().
The data can be copied into a Java array using the copyTo1DJavaArray() and copyToNDJavaArray() methods. In the first case, a 1D Java array of the appropriate primitive type is created and the data is copied to it in logical order (rightmost indices varying fastest). In the second case, an N-dimensional Java array is created that matches the Array shape, and the data is copied into it. The user must cast the returned Object to the appropriate Java array type.
Accesses to specific array elements are made using an Index object, for example:
double sum = 0.0;
Index index = A.getIndex();
int[] shape = A.getShape();
for (int i = 0; i < shape[0]; i++)
  for (int j = 0; j < shape[1]; j++)
    for (int k = 0; k < shape[2]; k++)
      sum += A.getDouble(index.set(i,j,k));
Note that in this example, A can be of any type convertible to a double. Index has various convenience methods for setting the element index:
public class Index {
// general
public Index set(int[] index);
public void setDim(int dim, int value);
// convenience methods for rank 0-7
public Index set(int v0); // set index 0
public Index set(int v0, int v1); // set index 0,1
public Index set(int v0, int v1, int v2); // set index 0,1,2
... // ..up to dimension 7
public Index set0(int v); // set index 0
public Index set1(int v); // set index 1
public Index set2(int v); // set index 2
... // ..up to dimension 7
}
Because an Index object stores state, threads that share an Array object must obtain their own Index from the Array.
An IndexIterator is used to sequentially traverse all data in an Array in logical (row-major) order. For example, logical order for A(i,j,k) has k varying fastest, then j, then i. Note that because of the possibility that A is a flipped or permuted view, logical order may not be the same as physical order. Example:
double sum = 0.0;
IndexIterator iter = A.getIndexIterator();
while (iter.hasNext())
sum += iter.getDoubleNext();
Note that in the above example A can be of arbitrary rank.
public interface IndexIterator {
public boolean hasNext();
// for each data type
public double getDoubleNext();
public double getDoubleCurrent();
public void setDoubleNext(double val);
public void setDoubleCurrent(double val);
...
}
There are two special kinds of iterators: Array.getIndexIteratorFast() returns an Iterator that iterates over the array in an arbitrary order. It can be used to make iteration as fast as possible when the order of the returned elements is immaterial, for example in the summing example above. Array.getRangeIterator() returns a "range" iterator that iterates over a subset of an array, in logical order. This is an alternative (and equivalent) to first creating an array section, and then obtaining an Iterator. In the following example, the sum is made only over the first 10 rows, and all columns, of the array:
int sum = 0;
IndexIterator iter = A.getRangeIterator(new Range[] {new Range(0,9), null});
while (iter.hasNext())
  sum += iter.getIntNext();
For each data type, there is a concrete class that extends Array, e.g., ArrayDouble, ArrayFloat, ArrayByte, etc. ArrayObject is used for all reference types. For each of these, there is a concrete subclass for each rank 0-7, for example ArrayDouble.D3 is a concrete class specialized for double arrays of rank 3. These rank-specific classes are static inner classes of their superclass. This design allows handling arrays completely generally (through the Array class), in a rank-independent way (through the Array<Type> classes), or in a rank and type specific way for ranks 0-7 (through the Array<Type>.D<rank> classes).
The most general way to create an Array is to use the static factory method in Array:
public abstract class Array {
static public Array factory( Class type, int [] shape);
...
}
Array a = Array.factory( double.class, new int[] {128} );
will create a 1D double array analogous to new double[128].
The type-specific subclasses can be instantiated directly with an arbitrary rank. These also add type-specific get/set accessors, for example:
public class ArrayDouble extends Array {
// constructor
public ArrayDouble(int [] dimensions);
// type-specific accessors
public double get(Index i);
public void set(Index i, double value);
...
}
ArrayDouble a = new ArrayDouble ( new int[] {128, 64} );
will create a 2D double array analogous to new double[128][64].
If you create your own Array objects, you should usually use the rank and type specific subclasses, which will provide the most efficient access. These classes also add rank-specific get/set routines, for example:
public static class D3 extends ArrayDouble {
// constructor
public D3 (int len0, int len1, int len2);
// type and rank specific accessors
public double get(int i, int j, int k);
public void set(int i, int j, int k, double value);
}
ArrayDouble a = new ArrayDouble.D3 (128, 64, 32);
will create a 3D double array analogous to new double[128][64][32].
You may also create an Array from an N-dimensional Java array:
public static Array factory(java.lang.Object javaArray);
Array a = Array.factory( new short[] {128, 123, 43} );
will create a 1D short array of length 3, with the given values.
Note that in this case, the data elements are copied out of the Java array into the private Array storage.
IndexImpl is a concrete, general rank implementation of Index, and is extended by rank specific subclasses for efficiency (we have rank 0-7 implementations). Array and its subclasses have an IndexImpl of the appropriate rank, to which all rank-specific functions are delegated. This orthogonal design keeps the number of classes small, and makes adding new ranks or data types quite simple.
The "logical view" operations (flip, section, transpose, slice and permute) are implemented by manipulating the index calculation within the IndexImpl object. These operations are affine, as is the operation that transforms the n-dim index into the 1-dim element index, and therefore any composition is an affine transformation. The resulting transformation is immutable, and can be computed during the IndexImpl object construction. Therefore there is no extra cost associated with the index calculation for these operations (or any composition of them) during element access. These operations do logical data reordering; physical reordering can be done by making an array copy.
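The idea that logical views are affine index calculations can be sketched with plain Java. The class StrideViewSketch below is hypothetical and is not the library's IndexImpl. It keeps a shape, a stride per dimension, and an offset into a 1D backing array, and implements flip and transpose by producing a new immutable set of strides without touching the data.

```java
// Illustrative only: a strided view over a 1D backing array.
final class StrideViewSketch {
    final int[] shape;
    final int[] stride;
    final int offset;

    private StrideViewSketch(int[] shape, int[] stride, int offset) {
        this.shape = shape;
        this.stride = stride;
        this.offset = offset;
    }

    /** Row-major layout: the rightmost index varies fastest. */
    static StrideViewSketch rowMajor(int... shape) {
        int[] stride = new int[shape.length];
        int s = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            stride[d] = s;
            s *= shape[d];
        }
        return new StrideViewSketch(shape.clone(), stride, 0);
    }

    /** Affine map from an n-dim index to the 1-dim element index. */
    int element(int... index) {
        int e = offset;
        for (int d = 0; d < index.length; d++) {
            e += index[d] * stride[d];
        }
        return e;
    }

    /** Reverses one dimension: start at its last element and negate its stride. */
    StrideViewSketch flip(int dim) {
        int[] st = stride.clone();
        st[dim] = -stride[dim];
        int off = offset + (shape[dim] - 1) * stride[dim];
        return new StrideViewSketch(shape.clone(), st, off);
    }

    /** Swaps two dimensions by swapping their shape and stride entries. */
    StrideViewSketch transpose(int d1, int d2) {
        int[] sh = shape.clone();
        int[] st = stride.clone();
        sh[d1] = shape[d2];
        sh[d2] = shape[d1];
        st[d1] = stride[d2];
        st[d2] = stride[d1];
        return new StrideViewSketch(sh, st, offset);
    }
}
```

Because each operation only yields another (shape, stride, offset) triple, any composition such as flip followed by transpose costs the same per element access as the original view.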
An IndexIterator traverses array elements in logical order, which we have defined as row-major (as in C). An iterator can in principle be more efficient than other element accesses because 1) the index values cannot be out of range, and therefore do not need to be bounds checked, and 2) the element calculation usually changes by a fixed stride each time. We take advantage of these facts in our implementation, as well as package-private accessor methods, to reduce the number of method calls per data access from 3 to 2.
The NetcdfFile.open methods expect a location that starts with http: for files served from an HTTP server, or a location that starts with file: or is a local file name, for files to be opened as local files. These files must be in one of the following formats:
The NetcdfDataset.openFile method opens:
· anything that NetcdfFile.open does
The NetcdfDataset.open method calls NetcdfDataset.openFile, then wraps the NetcdfFile object in a NetcdfDataset if it is not already. It will optionally identify the coordinate systems if it can. If it is a THREDDS dataset, it will add THREDDS information into the NetcdfDataset.
Each scientific data type has a factory method which uses the NetcdfDataset.open method to get a NetcdfDataset, then wraps that in the specialized object:
NetCDF-3 files can be made accessible over the network by simply placing them on an HTTP (web) server, like Apache. The server must be configured to set the "Content-Length" and "Accept-Ranges: bytes" headers.
The client that wants to read these files just uses the usual NetcdfFile.open(String location, …) method to open a file. The location contains the URL of the file, for example: http://www.unidata.ucar.edu/staff/caron/test/mydata.nc. In order to use this option you need to have the HttpClient.jar file in your classpath.
The ucar.nc2 library uses the HTTP 1.1 protocol's "Range" request header to get ranges of bytes from the remote file [HTTP]. The efficiency of the remote access depends on how the data is accessed. Reading large contiguous regions of the file should generally be good, while skipping around the file and reading small amounts of data will be poor. In many cases, reading data from a Variable should give good performance because a Variable's data is stored contiguously, and so can be read with a minimal number of server requests. A record Variable, however, is spread out across the file, so can incur a separate request for each record index. In that case you may do better copying the file to a local drive, or putting the file into a DODS server which will more efficiently subset the file on the server.
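For readers unfamiliar with HTTP byte ranges, the following sketch shows what a single range request looks like. It uses the standard java.net.HttpURLConnection purely for illustration, not the HttpClient library that ucar.nc2 relies on, and the class and method names (HttpRangeSketch, rangeHeader, readRange) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Arrays;

// Illustrative only: one HTTP/1.1 byte-range request.
final class HttpRangeSketch {

    /** Range header value for length bytes starting at offset (both ends inclusive). */
    static String rangeHeader(long offset, int length) {
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Reads one byte range; a server that honors the request replies 206. */
    static byte[] readRange(String url, long offset, int length) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        try (InputStream in = conn.getInputStream()) {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("server did not honor the Range request");
            }
            byte[] buf = new byte[length];
            int n = 0;
            while (n < length) {
                int r = in.read(buf, n, length - n);
                if (r < 0) break;  // server sent fewer bytes than requested
                n += r;
            }
            return Arrays.copyOf(buf, n);
        } finally {
            conn.disconnect();
        }
    }
}
```

Each call to readRange() is one round trip to the server, which is why many small scattered reads are much slower than a few large contiguous ones.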
NetCDF-Java version 2.2 can read much of the information in files written through the HDF5 interface.
HDF5 data type | NetCDF data type | Restrictions
0 = Fixed-Point | byte, short, int, long | 8, 16, 32, 64 bits only; signed only
1 = Floating-Point | float, double | 32, 64 bit IEEE only
2 = Time | String | Not implemented
3 = String | char |
4 = Bitfield | | Not supported
5 = Opaque | | Not supported
6 = Compound | Structure |
7 = Reference | Variable | Read-only
8 = Enumeration | | Not supported
9 = Variable-Length | String, Sequence |
10 = Array | Variable | No index permutations
Object Modification Date and Time Messages are put into a String-valued Attribute called "_LastModified" using Date format.
Object Comment Messages are put into an Attribute called "_description".
Not implemented: external data storage, file drivers, slib compression (deflate compression only), data type reference, bitfields (other than ??), enums, opaque, time.
The ucar.nc2.dods package provides access to OpenDAP (a.k.a. DODS) datasets, using the OpenDAP protocol, a specialized version of the HTTP protocol for network access to scientific data. OpenDAP datasets can be accessed transparently by passing an OpenDAP dataset URL to the NetcdfFile constructor. See [OpenDAP] for details of the OpenDAP protocol. Also see Appendix B for more detailed examples.
As of version 2.2, the ucar.nc2 package uses the Common Data Model API to provide access to all OpenDAP data, so no specialized code needs to be written. The implementing libraries are reasonably efficient in minimizing network latency for common data access patterns. However, there may be times when an application may want to handle remote OpenDAP datasets differently from local netCDF files. To do so, you can cast the ucar.nc2 objects into their corresponding ucar.nc2.dods objects, e.g. DODSNetcdfFile is a subclass of NetcdfFile.
Table 1 shows how OpenDAP primitive data types are mapped to netCDF primitive Java types. Note that the OpenDAP unsigned integer types are widened so that they can be represented correctly with signed Java primitive types.
OpenDAP primitive | NetCDF primitive
DBoolean | boolean
DByte | byte
DFloat32 | float
DFloat64 | double
DInt16 | short
DInt32 | int
DUInt16 | int
DUInt32 | long
Table 1.
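The widening of unsigned types in Table 1 can be illustrated with plain Java bit masking. This is a minimal sketch that uses only the standard library; the class and method names are hypothetical and are not part of ucar.nc2.dods. It assumes the 16 or 32 raw bits have already arrived in a signed Java short or int.

```java
final class UnsignedWidening {

    /** Reinterprets 16 raw bits (held in a signed short) as an unsigned value in an int. */
    static int widenUInt16(short raw) {
        return raw & 0xFFFF;
    }

    /** Reinterprets 32 raw bits (held in a signed int) as an unsigned value in a long. */
    static long widenUInt32(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println(widenUInt16((short) -1)); // prints 65535
        System.out.println(widenUInt32(-1));         // prints 4294967295
    }
}
```

Without the widening, the bit pattern 0xFFFF would read back as -1 in a Java short, which is why DUInt16 maps to int and DUInt32 maps to long.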
OpenDAP Type | NetCDF Object
BaseType scalar | DODSVariable (rank 0)
DArray of primitive | DODSVariable
DArray of structures | DODSStructureArray (*)
DGrid | DODSGrid (*)
DList | ignored
DSequence | DODSSequence (**)
DString, DURL | DODSVariable of type char[n]
DStructure | DODSStructure (*)
Table 2.
(*) extension: data also available from plain netCDF API
(**) extension: data not available in plain netCDF API
OpenDAP (version 2) does not have explicit shared Dimensions in its object model, except for Map arrays inside of Grids. Other than for Grids, unnamed OpenDAP dimensions are mapped to anonymous Dimension objects, while named dimensions are mapped to named Dimensions that are local to the Variable.
A Grid object is made into a Structure containing the grid array and map Variables. The maps are made into coordinate variables which share their Dimension with the array Variable. These shared Dimensions are added to the containing Group, which is the root Group since OpenDAP doesn't have groups. If more than one Grid has the same map name, then there is a potential conflict. First the coordinate arrays' values are checked for equality. If they are equal, then the Grids will share the Dimension (and thus the coordinate variable). If not, a new Dimension and coordinate variable is created, whose name is qualified by the Grid name.
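The conflict rule for Grid maps that share a name can be sketched as follows. This is an illustration only, not the library's code: the class GridMapSharing is hypothetical, coordinate values are modelled as double arrays, and the "." separator used to qualify a name with the Grid name is an assumption, since the text does not say how the qualified name is formed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class GridMapSharing {

    // Coordinate values already registered in the (root) group, keyed by dimension name.
    private final Map<String, double[]> sharedCoords = new LinkedHashMap<>();

    /** Registers one map of a Grid and returns the Dimension name that the Grid ends up using. */
    String addMap(String gridName, String mapName, double[] values) {
        double[] existing = sharedCoords.get(mapName);
        if (existing == null) {
            // First Grid to use this map name: it defines the shared Dimension.
            sharedCoords.put(mapName, values.clone());
            return mapName;
        }
        if (Arrays.equals(existing, values)) {
            // Same name and equal coordinate values: share the Dimension.
            return mapName;
        }
        // Same name but different values: create a new, Grid-qualified Dimension.
        String qualified = gridName + "." + mapName; // separator is an assumption
        sharedCoords.put(qualified, values.clone());
        return qualified;
    }

    public static void main(String[] args) {
        GridMapSharing group = new GridMapSharing();
        System.out.println(group.addMap("temp", "lat", new double[] {10, 20})); // prints lat
        System.out.println(group.addMap("salt", "lat", new double[] {10, 20})); // prints lat
        System.out.println(group.addMap("wind", "lat", new double[] {15, 25})); // prints wind.lat
    }
}
```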
The semantic mismatch between OpenDAP and netCDF has led to several possible conventions in mapping from netCDF to OpenDAP on the server, and from OpenDAP to netCDF on the client. One design goal is that netCDF files on the server should be semantically equivalent as seen by a client using the netCDF API. Another goal is to make the common case of a rank 1 char array in netCDF map to an OpenDAP String.
Older versions of the netCDF - OpenDAP C++ library mapped char[n] arrays to a DArray of element type DString, where each DString has length 1. Currently, the following convention is the "correct" way for an OpenDAP server to represent netCDF char[n] arrays:
1. A char[n] array maps to a DString. (This is the common case.)
2. A rank k char[n, m, …, p, q] NetCDF array maps to a rank k-1 OpenDAP DArray[n, m, …, p] of element type DString, where each DString has length q. An attribute "strlen" is added to the variable, inside an attribute table called "DODS" (to distinguish it as an attribute added by the DODS layer). The strlen attribute means that all of the DString data elements have the same data length (in this example, length q).
Attributes {
    var1 {
        DODS {
            Int32 strlen 54;
        }
    }
}
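The two mapping rules can be mimicked on the client side with a few lines of plain Java. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the char data is assumed to be stored as a flat row-major array whose innermost dimension is the string length.

```java
final class CharArrayToStrings {

    /**
     * Splits a flat, row-major char array of the given shape into Strings.
     * The innermost dimension becomes the string length ("strlen"); the
     * remaining dimensions determine how many Strings are produced.
     */
    static String[] toStrings(char[] flat, int[] shape) {
        int strlen = shape[shape.length - 1];
        int count = 1;
        for (int d = 0; d < shape.length - 1; d++) {
            count *= shape[d];
        }
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            out[i] = new String(flat, i * strlen, strlen);
        }
        return out;
    }

    public static void main(String[] args) {
        // Rule 1: a rank 1 char[3] array becomes a single String.
        System.out.println(toStrings("abc".toCharArray(), new int[] {3})[0]); // prints abc
        // Rule 2: a rank 2 char[2, 3] array becomes two Strings of length 3.
        String[] two = toStrings("abcdef".toCharArray(), new int[] {2, 3});
        System.out.println(two[0] + " " + two[1]); // prints abc def
    }
}
```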
Java netCDF-OpenDAP aims to make as much of an OpenDAP dataset as possible available through the standard netCDF API, so that programs can view a dataset as a NetcdfFile. The main issues are: 1) OpenDAP Strings, which are described in detail in section 2.1 under "String Data Type"; 2) nested data arrays in DODSStructure, DODSGrid and DODSStructureArray objects, which are "flattened" into global variables; 3) high latencies to the server, which can degrade performance. All of these problems are reasonably dealt with, and can be explicitly controlled using this constructor:
public DODSNetcdfFile(String datasetURL, boolean flatten,
    int stringArrayPreloadLimit, int stringArrayDefaultSize, int preloadLimit);
The flatten parameter controls whether Arrays inside of Structures are made into global variables. The stringArrayPreloadLimit value controls whether arrays of Strings are preloaded during construction. If the value is < 0, arrays are always preloaded; if 0, arrays are never preloaded; otherwise an array is preloaded if the size of the Array (the number of Strings) is less than stringArrayPreloadLimit. When an array of Strings is not preloaded, the string lengths are set to the value of stringArrayDefaultSize. When those arrays are read, longer Strings will be truncated and shorter Strings will be zero padded. When the OpenDAP dataset is opened, all variables whose size < preloadLimit are read and cached. When using the DODSNetcdfFile(String datasetURL) constructor, the defaults are flatten=true, stringArrayPreloadLimit=200, stringArrayDefaultSize=100, preloadLimit=100.
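The preload decision and the truncate-or-pad behaviour described for String arrays can be written out as a small standalone sketch. The class StringArrayPolicy and its methods are hypothetical and only restate the rules in the text; they are not the DODSNetcdfFile implementation. Zero padding is assumed to mean padding with the NUL character.

```java
final class StringArrayPolicy {

    /** Decides whether a String array with arraySize elements is preloaded. */
    static boolean shouldPreload(int stringArrayPreloadLimit, int arraySize) {
        if (stringArrayPreloadLimit < 0) {
            return true;   // negative limit: always preload
        }
        if (stringArrayPreloadLimit == 0) {
            return false;  // zero limit: never preload
        }
        return arraySize < stringArrayPreloadLimit;
    }

    /** Fits a String into a fixed number of chars: longer values are truncated, shorter ones zero padded. */
    static char[] fit(String value, int stringArrayDefaultSize) {
        char[] out = new char[stringArrayDefaultSize]; // new char arrays are filled with '\0'
        value.getChars(0, Math.min(value.length(), stringArrayDefaultSize), out, 0);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shouldPreload(200, 199));        // prints true
        System.out.println(shouldPreload(200, 200));        // prints false
        System.out.println(new String(fit("abcdef", 3)));   // prints abc
    }
}
```

With the default stringArrayPreloadLimit of 200, an array of 199 Strings would be preloaded and an array of 200 would not.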
One issue that is not solved through the standard netCDF API is access to DODSSequence data variables. In this case, you must explicitly work with the DODSNetcdfFile object.
The NetCDF API can be used to access other kinds of scientific data files. The general mechanism for this is to define a class that implements IOServiceProvider, and register it with the NetcdfFile class. When a file is opened through NetcdfFile.open(), it is first checked to see if it is a "native" (netcdf3, netcdf4, hdf5) file; then any registered IOServiceProviders are checked by calling isValidFile(). If this returns true, then a new instance of the class is used for all I/O to the file.
To register, call this static method in NetcdfFile:
static public void registerIOProvider(Class iospClass);
The iospClass must have a no-argument constructor, and it must implement the IOServiceProvider interface.
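The registration pattern (register a Class, instantiate it through its no-argument constructor, and ask each candidate whether it recognises the file) can be sketched independently of the library. Everything below is hypothetical: SketchProvider is a simplified stand-in for IOServiceProvider, its isValidFile takes the first bytes of a file rather than an open file, and the "DEMO" magic number is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for an I/O service provider (hypothetical, not the ucar.nc2 interface). */
interface SketchProvider {
    boolean isValidFile(byte[] header);
}

/** Example provider that recognises files starting with the invented magic number "DEMO". */
final class MagicNumberProvider implements SketchProvider {
    public boolean isValidFile(byte[] header) {
        return header.length >= 4
            && header[0] == 'D' && header[1] == 'E' && header[2] == 'M' && header[3] == 'O';
    }
}

final class ProviderRegistry {

    private static final List<Class<? extends SketchProvider>> registered = new ArrayList<>();

    /** Creates an instance through the no-argument constructor, failing if there is none. */
    private static SketchProvider newInstance(Class<? extends SketchProvider> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(cls.getName() + " needs a no-argument constructor", e);
        }
    }

    /** Registers a provider class after checking that it can be instantiated. */
    static void register(Class<? extends SketchProvider> cls) {
        newInstance(cls);
        registered.add(cls);
    }

    /** Returns a new instance of the first registered provider that accepts the header, or null. */
    static SketchProvider findProvider(byte[] header) {
        for (Class<? extends SketchProvider> cls : registered) {
            SketchProvider candidate = newInstance(cls);
            if (candidate.isValidFile(header)) {
                return candidate;
            }
        }
        return null;
    }
}
```

Registering a Class rather than an instance is what makes the no-argument constructor necessary: a fresh provider object can then be created for each file that is opened.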
Appendix A. Examples
Variable v = ncFile.findVariable("Pressure");
Array a = v.read(); // does the actual I/O
Index ima = a.getIndex();
double p = a.getDouble(ima.set(0, 0)); // get first value

// looping
double sum = 0.0;
for (int i=0; i<a.getShape(0); i++)
  for (int j=0; j<a.getShape(1); j++)
    sum += a.getDouble(ima.set(i, j)); // accumulate each value
double avg = sum/a.size();

// another way to loop
IndexIterator iter = a.getIndexIterator();
sum = 0.0;
while (iter.hasNext())
  sum += iter.getDoubleNext();
avg = sum/a.size();

// another way to get the sum
sum = MAMath.sumDouble(a);
If you know the type and rank, a convenient variant is:
Variable v = ncFile.findVariable("Pressure");
ArrayDouble.D2 a = (ArrayDouble.D2) v.read();
double p = a.get(0, 0);

// looping
for (int i=0; i<a.getShape(0); i++)
  for (int j=0; j<a.getShape(1); j++)
    p = a.get(i, j);
Here we read the entire data for a Variable, but only one “slice” at a time, by looping over the first (outermost) dimension:
Variable v = ncFile.findVariable("Pressure");
int[] shape = v.getShape(); // get copy of shape array
int[] origin = new int[ v.getRank()]; // start with all zeroes
int outerDimSize = shape[0]; // outer Dimension size
for (int i=0; i<outerDimSize; i++) {
  origin[0] = i; // fix first index
  shape[0] = 1; // read a single slice along the outer dimension
  try {
    Array a = v.read(origin, shape); // read data from file
    a = a.reduce(); // reduce rank to n-1
    myProcessData( a);
  } catch( IOException ioe) { ...
  } catch( InvalidRangeException ire) { ... // can't happen
  }
}
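To make the meaning of origin and shape concrete, the following sketch selects a section from an in-memory array using only the standard library. It is a conceptual illustration, not the library's implementation: the class SectionSketch is hypothetical, and the data is assumed to be stored in row-major order (last index varying fastest).

```java
final class SectionSketch {

    /** Copies the section that starts at origin and extends by shape out of a row-major array. */
    static double[] readSection(double[] data, int[] fullShape, int[] origin, int[] shape) {
        int rank = fullShape.length;

        // Row-major strides: the last dimension has stride 1.
        int[] stride = new int[rank];
        int step = 1;
        for (int d = rank - 1; d >= 0; d--) {
            stride[d] = step;
            step *= fullShape[d];
        }

        int count = 1;
        for (int len : shape) {
            count *= len;
        }

        double[] out = new double[count];
        int[] idx = new int[rank]; // position within the section, starts at all zeroes
        for (int k = 0; k < count; k++) {
            int offset = 0;
            for (int d = 0; d < rank; d++) {
                offset += (origin[d] + idx[d]) * stride[d];
            }
            out[k] = data[offset];
            // Advance idx like an odometer, last dimension fastest.
            for (int d = rank - 1; d >= 0; d--) {
                if (++idx[d] < shape[d]) {
                    break;
                }
                idx[d] = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4, 5}; // a 2 x 3 array
        double[] row = readSection(data, new int[] {2, 3}, new int[] {1, 0}, new int[] {1, 3});
        System.out.println(java.util.Arrays.toString(row)); // prints [3.0, 4.0, 5.0]
    }
}
```

Setting origin[0] = i and shape[0] = 1, as in the loop over the outer dimension, selects exactly one such slice per iteration.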
Variable v = ncFile.findVariable("ozone");
Array a = v.read(); // read entire data array
double[][][] ozone3Darray = (double[][][]) a.copyToNDJavaArray();
double[] ozone1Darray = (double[]) a.copyTo1DJavaArray();
Note that you need to know the type and rank in order to make the correct type casts.
import ucar.ma2.*;
import ucar.nc2.*;
import java.io.IOException;
/**
 * Simple example to create a new netCDF file corresponding to the following
 * CDL:
 *  netcdf example {
 *  dimensions:
 *    lat = 3 ;
 *    lon = 4 ;
 *    time = UNLIMITED ;
 *  variables:
 *    int rh(time, lat, lon) ;
* rh:long_name="rela