Each SEACOOS partner installed, tested, and evaluated DODS/OPeNDAP software. Below, we provide each institution's evaluation of its experience implementing DODS/OPeNDAP software as a form of data sharing.
University of South Florida / COMPS
We installed the NetCDF and Matlab servers on our Dell PowerEdge running Red Hat Linux 7.3. Extensive testing was performed on the NetCDF server using files of various types and sizes. Testing of the Matlab server was more an exercise in trial and error. The results of this testing, as well as our decisions on server usage here at USF, are as follows:
- Installation of this server (as well as the Matlab server) is quite easy, as is the setup of the data file paths. This is a very efficient method for the transfer of small to medium sized files. Larger files seem to be much slower to transfer through the OPeNDAP interface than through FTP. There also exist a number of client programs that were written to work quite easily with the OPeNDAP NetCDF server.
- Installed and tested the OPeNDAP Matlab Client software on both PC and Linux/UNIX workstations. Once installed, it became very easy to directly read NetCDF files from an OPeNDAP server. The only drawback to this method is that the user needs to know the OPeNDAP server to be used as well as the fully qualified path to the data. There are no browser utilities associated with this software.
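As a concrete illustration of that drawback, a DAP client must assemble the full dataset URL itself. A minimal sketch (in Python rather than Matlab, with a hypothetical server and data path) of how a fully qualified OPeNDAP data URL with an index-range constraint is typically composed:

```python
# Sketch: composing a fully qualified OPeNDAP dataset URL.
# The server host and dataset path below are hypothetical examples.

def opendap_url(server, path, variable=None, ranges=None):
    """Build a DAP data URL, optionally with a constraint expression.

    ranges is a list of (start, stop) index pairs, one per dimension,
    using DAP's inclusive [start:stop] hyperslab syntax.
    """
    url = f"{server.rstrip('/')}/{path.lstrip('/')}.dods"
    if variable:
        ce = variable
        for start, stop in (ranges or []):
            ce += f"[{start}:{stop}]"
        url += "?" + ce
    return url

# Request the first 10 time steps of 'sst' from a (hypothetical) server:
print(opendap_url("http://dods.example.edu/cgi-bin/nph-dods",
                  "data/seacoos/sst.nc", "sst", [(0, 9)]))
# -> http://dods.example.edu/cgi-bin/nph-dods/data/seacoos/sst.nc.dods?sst[0:9]
```

Without a browse utility, both the server address and the path string passed to such a helper must be known in advance.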
(DODS) Matlab Server
- While this server was quite easy to set up, using it was another matter. We performed extensive tests with many different types of our stored data products, and not a single one was recognized as containing a single valid data type. It was finally determined that none of the newer Matlab data types (e.g. structured or cell arrays) are valid in OPeNDAP; this server was written based on the data types that were available as of the Matlab version 4.x product line. Use of this product would require a total reformatting of all of our Matlab data files.
1) OPeNDAP is an excellent way to share small to medium files among all SEACOOS member organizations. Larger files are slower using this method, but the process still works.
2) Due to the need for additional client software, OPeNDAP is not the best method for providing our data to Educational Outreach partners and end users.
3) The OPeNDAP Matlab Server will not be used at USF. It would require the reformatting of all of our data files or the re-coding of our processing software.
4) Using the OPeNDAP Matlab Client on both PC and Linux/UNIX workstations will allow the direct reading of NetCDF data from an OPeNDAP server into a Matlab program, as long as the fully qualified OPeNDAP server address and data path are known.
5) It would be very helpful in the future to find some way of combining an NVODS/LAS-type browser with Matlab so that users can search for a file to load and use when it is needed, rather than needing to know its location ahead of time.
University of South Carolina / CaroCOOPS
The two main issues which give rise to the use of OPeNDAP servers are the ability of these servers to provide a common data selection interface to mixed underlying data formats (netCDF, HDF, relational databases, ASCII files) and the use of a common transport layer (HTTP) to run queries and return selected results between distributed systems.
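Those two issues come together in the protocol itself: a DAP client issues plain HTTP GETs against a single dataset URL, with different suffixes selecting structure, attributes, or data, regardless of the underlying storage format. A minimal sketch, with an illustrative dataset URL:

```python
# Sketch: the DAP protocol exposes any underlying format through three
# HTTP responses derived from one dataset URL. The URL below is an example.

DAP_SERVICES = {
    "structure":  ".dds",   # Dataset Descriptor Structure (types, dimensions)
    "attributes": ".das",   # Dataset Attribute Structure (metadata)
    "data":       ".dods",  # binary data, filtered by a constraint expression
}

def dap_requests(dataset_url, constraint=""):
    """Return the three HTTP GET URLs a DAP client would issue."""
    query = "?" + constraint if constraint else ""
    return {name: dataset_url + suffix + query
            for name, suffix in DAP_SERVICES.items()}

for name, url in dap_requests("http://server.example.edu/buoy.nc",
                              "water_temp").items():
    print(f"{name:10s} {url}")
```

The common selection interface is the constraint expression appended to the query string; the common transport is ordinary HTTP.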
The issue of OPeNDAP providing a common interface to many underlying data formats is subject to interpretation. There is much overlap between the OPeNDAP development group at URI (University of Rhode Island) and the netCDF development group at Unidata. netCDF is the best supported of all the listed formats, partly due to the popularity and ease of use of the netCDF file format in combination with active communication between the netCDF and OPeNDAP development groups.
Different data formats facilitate different users and different data types. Some users are not familiar with the range of possible storage formats and may store their information in ASCII files as an expedient choice. Some users set their systems up long ago before the advent of currently available formats. Some data formats are better suited to process raster, vector or binary data types of varying structure.
The strengths of netCDF are several. It is an easy (in relative terms) data format to implement, which includes metadata (information about the data) in the header portion of the file itself. It has a strong set of functional libraries which can be used to compress, decompress, describe, subset and transform netCDF files. There are also many additional outside-developed tools which are designed to visualize or transform netCDF files into other data products.
The OPeNDAP group has been successful in enabling HTTP as a transport protocol for the netCDF format and its libraries.
My review of several OPeNDAP issues is as follows:
1) If your data storage format is netCDF, an OPeNDAP server is an easy way to facilitate sharing your data. Conversely, data storage formats other than netCDF are not as easily supported by, or as transparent to, OPeNDAP operations as netCDF.
2) OPeNDAP will benefit from Unidata's pursuit of a better merging of the netCDF (Unidata) and HDF (NCSA) function libraries. OPeNDAP will be able to absorb technical improvements in handling the HDF format via continued Unidata interaction.
3) OPeNDAP support for other formats (relational databases, Matlab) will depend on the project's ability to receive funding for further development of those formats in particular.
A common complaint, and a discouragement for data formats other than netCDF, is that an OPeNDAP server will not 'automatically' be able to interact with these types of sources. The developer must first manually create a mapping layer or other intermediate process which allows the scheme to work. If the data description/layout is unchanging, then this is usually a one-time process, but if the data description/layout changes, then the intermediate process must be updated to reflect those changes.
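A minimal sketch of such a hand-built mapping layer, using a hypothetical relational table of CTD casts; the table, column and variable names are invented for illustration:

```python
# Sketch of the kind of hand-written mapping layer described above:
# translating a relational table's schema into a DAP-style dataset
# description. Table and column names are hypothetical.

COLUMN_MAP = {          # SQL column -> (DAP variable name, DAP type)
    "obs_time": ("time",       "String"),
    "temp_c":   ("water_temp", "Float64"),
    "sal_psu":  ("salinity",   "Float64"),
}

def describe_sequence(table, column_map):
    """Render a DDS-like Sequence declaration from the mapping."""
    lines = ["Sequence {"]
    for var, dap_type in column_map.values():
        lines.append(f"    {dap_type} {var};")
    lines.append("} " + table + ";")
    return "\n".join(lines)

print(describe_sequence("ctd_casts", COLUMN_MAP))
# If a column is added or renamed, COLUMN_MAP must be edited by hand --
# the mapping does not track schema changes automatically.
```

The point of the sketch is the maintenance burden: the mapping is a one-time cost only as long as the underlying schema never changes.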
4) DODS changed its name to OPeNDAP to place the focus on the technology (Data Access Protocol) rather than on any one specific scientific field of study (oceanography). If OPeNDAP hopes to grow beyond its current limits, it will need to release some of its control, more actively pursue attempts to open-source the project (via something like sourceforge.net), and provide better communication, and communication forums, with developers outside of the current core group. Even if the project were more openly promoted and developed, there are existing parallel efforts by many other groups which will compete with the goals of this technology (providing universal access and selection to distributed datasets). Projects which deal with the 'Semantic Web' and the establishment of web ontologies and resource description frameworks (RDF) are also attempting to allow human and automated agents to describe, discover and retrieve data across the web. At this point in time, though, these projects are experimental and do not offer a clearly advantageous alternative.
5) Moving data description, attribute and other files into an XML format is good in that it provides a standard programmatic syntax to other programs which may want to utilize this data.
6) Whether through 'Ancillary Information Services', ncML (netCDF Markup Language) or some other method, OPeNDAP will need to better support metadata use and the semantic resolution of various metadata and convention standards across differing datasets.
7) The choice of C++ or Java technology for most OPeNDAP projects is a boon to those who work in those frameworks, but excludes developers who are working in Perl/PHP, Python/Zope or other development languages/frameworks. Developers may also turn away from the developed code because of real or perceived issues of code quality or security.
The question of application framework by itself is a very interesting one, as it affects many issues. Much debate around OPeNDAP functionality centers on what resources (system ports or channels, memory) or system state (how many files are opened, which data elements have been processed) the OPeNDAP server is allowed to utilize as provided by the environment. Generally, I think the less the OPeNDAP system controls of the user environment (the smaller the footprint) the better, since the solution space will vary widely depending on the particular server resources and environment at hand. Perhaps a messaging scheme or API between the OPeNDAP server and the application framework or system environment would be the best compromise.
8) For utilization in the broadest way, tools which leverage OPeNDAP or netCDF should be capable of being run as a service, so that client browsers can interact with these tools without having to install additional software. For example, both the ODC (OPeNDAP Data Connector) and ncBrowse would be of greater utility to us as services which we could set up for client browsers. This is echoed in the IOOS guidelines as 'browse and visualize the data through standard Web browsers.'
9) It would be helpful if federal agencies (USGS, NWS, NOAA, NASA) made their data available via OPeNDAP or other standard data services (say, similar to WSDL) or formats (say, column-oriented with documentation), as opposed to clients having to screen-scrape web page results for data to feed to client programs. Questions concerning catalog registries for data discovery, network usage/mirroring and security will need further addressing.
LAS (Live Access Server) review
LAS is being promoted as an IOOS first step in promoting visualization of data products available using IOOS data management standards. While this is an admirable first step, I think of LAS as one of many possible tools/approaches for browsing and visualizing data. LAS does not require that data providers utilize an OPeNDAP client, just that the data be in COARDS-compliant netCDF format. LAS is also flexible in that it has a default visualization tool of Ferret, but the data output can be rerouted to other visualization tools such as Matlab. The core functionality of LAS is that it provides a way to select a bounding-box geographic area, variables of interest and a time slice for display as a variety of output products.
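That core selection can be sketched in a few lines. The tiny in-memory grid below is a stand-in for a COARDS-compliant netCDF file; the coordinates and values are invented:

```python
# Sketch of the selection LAS performs: reduce a gridded dataset to a
# bounding box, a variable of interest and a time slice. The grid is a
# made-up stand-in for a COARDS netCDF file.

lons = [-82.0, -80.0, -78.0]
lats = [30.0, 32.0]
grid = {                      # grid[var][time][lat_index][lon_index]
    "sst": [[[20.1, 20.5, 21.0],
             [19.8, 20.2, 20.7]],
            [[20.3, 20.6, 21.2],
             [19.9, 20.4, 20.9]]],
}

def select(var, t, lon_min, lon_max, lat_min, lat_max):
    """Return values of `var` at time index t inside the bounding box."""
    li = [i for i, x in enumerate(lons) if lon_min <= x <= lon_max]
    lj = [j for j, y in enumerate(lats) if lat_min <= y <= lat_max]
    return [[grid[var][t][j][i] for i in li] for j in lj]

print(select("sst", 0, -81.0, -77.0, 29.0, 31.0))   # -> [[20.5, 21.0]]
```

LAS then hands a subset like this to Ferret (or another tool) to render the chosen output product.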
My review points on LAS are as follows:
1) The technician who supports LAS must manually add new netCDF datasets to the catalog in the left-hand selection column and recompile. There is no way for application users to dynamically search, discover and add new netCDF files to this list. A user may select a maximum of two datasets for some basic types of comparisons.
2) LAS should be given credit for running as a service. I'd personally prefer to have something more like ncBrowse running as a service, but currently LAS is the only service which quickly facilitates netCDF visualization (depending on how easily your netCDF file conforms to COARDS, LAS and Ferret syntax). Understanding the syntax demands, and debugging the errors, of the netCDF COARDS convention, LAS and Ferret can be a source of frustration.
3) LAS is also given credit for being flexible in its implementation and supporting a broad range of outputs; a developer with knowledge of netCDF and the scripts which LAS uses to produce these outputs should not have too difficult a time utilizing this code within other applications.
4) LAS is limited in its application design and makes for a good 'beginning' tool, but it also overlaps with other visualization tools which are better suited to specific application needs and audiences. For example, GIS is specifically designed to facilitate two-dimensional spatial overlays, zooming, panning and dynamic displays of resolution, and has data libraries in its own GIS format which are immediately broad and useful. Other tools are better at displaying graphs or three-dimensional data. LAS addresses an initial need, but the visualization and application needs of many different audiences will likely require other additional services which are outside of the LAS domain.
University of North Carolina / NC-COOS
The University of North Carolina (UNC) at Chapel Hill established a DODS server as part of the activities and tasks associated with SEACOOS Information Management. While we had a server up and running in very little time, we had limited time and expertise to test and evaluate it beyond experiencing the server in production mode. We describe our experience here.
Additionally, we attempted some DODS-client development tasks but were met with difficulties, if not complete roadblocks. Below we detail the specific stumbling blocks and roadblocks that we encountered.
We admit our software development expertise is minimal and our time to test, evaluate and problem-solve is limited. However, our failures were frustrating in coordinating our data efforts and forced us to revert to the guerrilla practice of downloading whole files and processing them locally. This does not take advantage of OPeNDAP, and it was a backward approach taken in order to produce the results we needed quickly.
The easy stuff
We installed the NetCDF-DODS server (v.3.2) on a Sun V880. The NetCDF-DODS server install was easy using the binary package install with an established Apache webserver. The webserver hardware and DODS installation were handled by UNC's Academic and Technology Network group, which provides networking and system administration services for the whole university. At this writing, I am not aware of any major difficulties or stumbling blocks encountered by this group during the installation. They had it up and running within several hours of the request. This server has been running since December 2002, including through a hardware move and system rebuild in May 2003.
While most of our data processing is done in MATLAB, we did not install and test a MATLAB-DODS server, since our SEACOOS partner institution, the University of South Florida, met with difficulties with regard to version incompatibilities with MATLAB data types. Since we can generate netCDF files using the MATLAB NetCDF Toolbox (which runs on top of mexCDF), we chose to only install a NetCDF-DODS server. This was the path of least resistance to get up and running using OPeNDAP.
Once the webserver and the DODS server were installed, data directories were easily set up and the data stored in netCDF could be served. Access to data is straightforward if the complete server name and data path are known. We are able to access our data and others' easily, either through the web-browser pages served by the DODS server or through other DODS clients such as the MATLAB Command Line Tool.
The not-so-easy stuff
We developed a data client to directly access oceanographic time-series data, which was then reformatted to an encoded card file and pushed to the National Data Buoy Center (NDBC). Although we were able to get this code to compile and link to the OPeNDAP libraries, it took much wrangling and imposed limitations. This code was called "D2N", short for DODS-to-NDBC. It is C++ code requiring the DODS-dap-library.
We also attempted to develop a MATLAB-based "data scout" that would poke around a set of DODS URLs and return data matching a standard naming convention for a provided variable attribute. After many difficulties we ran out of time to continue development and have not completed the code. This code was called my_loaddods. It is MATLAB code using the MATLAB Command Line Tool. Parallel development was conducted under i686-pc-linux and sparc-sun-solaris2.6, both running MATLAB version 6.5.1 (R13).
The stumbling blocks
The difficulties faced with the development of D2N:
1) We could only get D2N to compile and link under Red Hat Linux 7.3, using gcc (v 2.96.110) and the OPeNDAP libraries (DODS-dap-library-3.3.0 and DODS-packages-3.3.0), when the libraries for ssl and crypto were locally installed on the system.
2) D2N failed to link under a similar Red Hat Linux system using the same gcc version and OPeNDAP libraries but with the distribution-supplied libraries for ssl and crypto. We assume there is a versioning conflict beyond our understanding.
3) D2N failed to link after upgrading the OPeNDAP libraries (DODS-dap-library-3.4.8 and DODS-packages-3.4.4).
The limitations and difficulties faced with the development of my_loaddods:
1) There is a limitation with loaddods (version 3.4.1): it only returns data into a MATLAB structure that matches the hierarchy and structure of the DDS. There is no mechanism to get a similar return of the DAS-like information into MATLAB. This forces us to build a work-around to get at a particular variable attribute if the variable is buried in the DDS as a dimension variable.
2) While we did not run into problems running the sparc-sun-solaris2.6 MATLAB Command Line Tool, we did get errors submitting the same command under i686-pc-linux. This was reported to support@unidata.ucar.edu. We do not know the status of the bug report.
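The work-around mentioned in point 1 amounts to fetching the .das response separately and parsing the attribute text by hand. A rough sketch (in Python for illustration, with a canned DAS string standing in for the server response; the variable and attribute names are examples):

```python
# Sketch of the DAS work-around: since loaddods returns only the DDS
# hierarchy, fetch the .das response separately and pull one variable
# attribute out with a small text parse. SAMPLE_DAS is a canned example.

SAMPLE_DAS = """\
Attributes {
    water_temp {
        String units "degrees_C";
        String long_name "water temperature";
    }
}
"""

def das_attribute(das_text, variable, attribute):
    """Return the quoted value of `attribute` under `variable`, or None."""
    in_var = False
    for line in das_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(variable + " {"):
            in_var = True
        elif in_var and stripped == "}":
            break
        elif in_var and f" {attribute} " in " " + stripped:
            return stripped.split('"')[1]
    return None

print(das_attribute(SAMPLE_DAS, "water_temp", "units"))   # -> degrees_C
```

A real client would first retrieve the .das text over HTTP from the dataset URL; the parse itself is the part loaddods does not provide.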
No further combinations were tried for D2N, and no further investigation of my_loaddods was made, since we ran out of time and have not been able to get back to either coding project.
- 10 Dec 2004
Skidaway Institute of Oceanography / SABSOON
University of Miami / Explorer of the Seas
- 26 Jan 2005