DODS Review

Each SEACOOS partner installed, tested, and evaluated DODS/OPeNDAP software. Below, we provide each institution's evaluation of its experience implementing DODS/OPeNDAP software as a form of data sharing.

University of South Florida / COMPS

We have installed the NetCDF and Matlab servers on our Dell PowerEdge server running Red Hat Linux 7.3. Extensive testing was performed on the NetCDF server using files of various types and sizes; testing of the Matlab server was more an exercise in trial and error. The results of this testing, as well as our decisions regarding server usage here at USF, are as follows:

A. OPeNDAP (DODS) NetCDF Server - Installation of this server (as well as the Matlab server) is quite easy, as is the setup of the data file paths. This is a very efficient method for the transfer of small to medium-sized files; larger files seem to be much slower to transfer through the OPeNDAP interface than through FTP. There are also a number of client programs that were written to work quite easily with the OPeNDAP NetCDF server.

- We installed and tested the OPeNDAP Matlab Client software on both PC and Linux/UNIX workstations. Once installed, it became very easy to read NetCDF files directly from an OPeNDAP server. The only drawback to this method is that the user needs to know the OPeNDAP server to be used as well as the fully qualified path to the data; there are no browser utilities associated with this software.
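
A minimal sketch of that workflow, using loaddods from the OPeNDAP Matlab Client (the server name and data path below are hypothetical, not an actual SEACOOS address):

    % Fully qualified OPeNDAP URL to a netCDF dataset (hypothetical server/path).
    url = 'http://dods.example.edu/cgi-bin/nph-dods/buoys/cape_fear_2004.nc';

    % loaddods fetches the dataset over HTTP and places the DODS variables,
    % as named in the dataset's DDS, directly into the MATLAB workspace.
    loaddods(url);

    % The variables can then be used like any other workspace arrays, e.g.
    % plot(time, sea_temp) if the dataset defines 'time' and 'sea_temp'.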

B. OPeNDAP (DODS) Matlab Server - While this server was quite easy to set up, using it was another matter. We performed extensive tests with many different types of our stored data products, and not a single one was recognized as containing a single valid data type. It was finally determined that none of the newer Matlab data types (e.g. structure or cell arrays) are valid in OPeNDAP; this server was written for the data types available as of the Matlab 4.x product line. Use of this product would require a total reformatting of all of our Matlab data files.

Conclusions:

1) NetCDF is an excellent way to share small to medium-sized files among all SEACOOS member organizations. Larger files are slower using this method, but the process still works.

2) Due to the need for additional client software, OPeNDAP is not the best method for providing our data to Educational Outreach partners and end users.

3) The OPeNDAP Matlab Server will not be used at USF; it would require reformatting all of our data files or re-coding our processing software.

4) Using the OPeNDAP Matlab Client on both PC and Linux/UNIX workstations will allow the direct reading of NetCDF data from an OPeNDAP server into a Matlab program as long as the fully qualified OPeNDAP server address and data path are known.

5) It would be very helpful in the future to find some way of combining an NVODS/LAS-type browser with Matlab so that users can search for a file to load and use when it is needed, rather than needing to know its location ahead of time.

--Main.JeffDonovan --Main.VembuSubramanian

University of South Carolina / CaroCOOPS

The two main motivations for using OPeNDAP servers are their ability to provide a common data selection interface over mixed underlying data formats (netCDF, HDF, relational databases, ASCII files) and their use of a common transport layer (HTTP) to carry queries and selected results between distributed systems.
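
As a rough illustration of what that common interface looks like in practice (the server name and dataset path here are hypothetical), an OPeNDAP URL can be suffixed to request different views of a dataset, and a constraint expression appended after '?' performs the data selection on the server side:

    % Hypothetical OPeNDAP dataset URL.
    base = 'http://dods.example.edu/cgi-bin/nph-dods/data/ctd_2004.nc';

    % Appending a suffix asks the server for a particular view of the dataset:
    %   [base '.dds']    structure description (variable names, types, shapes)
    %   [base '.das']    attribute (metadata) description
    %   [base '.ascii']  data rendered as plain text
    %   [base '.dods']   binary data stream read by DODS/OPeNDAP clients
    %
    % A constraint expression subsets the data on the server, so only the
    % selected variables and index ranges travel over HTTP. For example, the
    % first 100 temperature values and the matching time values:
    loaddods([base '?temperature[0:99],time[0:99]']);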

The claim that OPeNDAP provides a common interface to many underlying data formats is subject to interpretation. There is much overlap between the OPeNDAP development group at URI (University of Rhode Island) and the netCDF development group at Unidata. Of all the listed formats, netCDF is the best supported, partly because of the popularity and ease of use of the netCDF file format and partly because of active communication between the netCDF and OPeNDAP developer groups.

Different data formats suit different users and different data types. Some users are not familiar with the range of possible storage formats and store their information in ASCII files as an expedient choice. Some set up their systems long ago, before the advent of currently available formats. Some data formats are better suited to raster, vector or binary data types of varying structure.

The strengths of netCDF are several. It is a relatively easy data format to implement, and it includes metadata (information about the data) in the header portion of the file itself. It has a strong set of function libraries which can be used to compress, decompress, describe, subset and transform netCDF files. There are also many outside-developed tools designed to visualize netCDF files or transform them into other data products.
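
As one example of that describe-and-subset capability, a minimal sketch using the MATLAB NetCDF Toolbox, one such outside-developed tool (the file and variable names are hypothetical):

    % Open an existing netCDF file read-only (file/variable names hypothetical).
    nc = netcdf('adcp_2004.nc', 'nowrite');

    % Describe: list the variable names and read an attribute from the header.
    names = ncnames(var(nc));
    temp_units = nc{'temperature'}.units(:);

    % Subset: read only the first ten values of one variable.
    temp = nc{'temperature'}(1:10);

    close(nc);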

The OPeNDAP group has been successful in enabling HTTP as a transport protocol for the netCDF format and its libraries.

My review of several OPeNDAP issues is as follows:

1) If your data storage format is netCDF, an OPeNDAP server is an easy way to share your data. Conversely, data storage formats other than netCDF are not as easily supported by, or as transparent to, OPeNDAP operations.

2) OPeNDAP will benefit from Unidata's efforts to better merge the netCDF (Unidata) and HDF (NCSA) function libraries. OPeNDAP will be able to absorb technical improvements in handling the HDF format through continued interaction with Unidata.

3) OPeNDAP support for other formats (relational databases, Matlab) will depend on the project's ability to obtain funding for further development of those formats in particular.

A common complaint and discouragement for data formats other than netCDF is that an OPeNDAP server will not 'automatically' be able to interact with these kinds of sources. The developer must first manually create a mapping layer or other intermediate process which allows the scheme to work. If the data description/layout is unchanging, this is usually a one-time effort, but if the data description/layout changes, the intermediate process must be updated to reflect those changes.

4) DODS changed its name to OPeNDAP to place the focus on the technology (the Data Access Protocol) rather than any one specific field of study (oceanography). If OPeNDAP hopes to grow beyond its current limits, it will need to release some of its control, more actively pursue open-sourcing the project (via something like sourceforge.net), and provide better communication and communication forums with developers outside of the current core group. Even if the project were more openly promoted and developed, there are existing parallel efforts by many other groups which will compete with the goals of this technology (providing universal access to, and selection from, distributed datasets). Projects which deal with the 'Semantic Web' and the establishment of web ontologies and resource description frameworks (RDF) are also attempting to allow human and automated agents to describe, discover and retrieve data across the web. At this point in time, though, these projects are experimental and do not offer a clearly advantageous alternative.

5) Moving data description, attribute and other files into an XML format is good in that it provides a standard programmatic syntax for other programs which may want to utilize this data.

6) Whether through 'Ancillary Information Services', ncML (netCDF Markup Language) or some other method, OPeNDAP will need to better support metadata use and the semantic resolution of various metadata and convention standards across differing datasets.

7) The choice of C++ or Java for most OPeNDAP projects is a boon to those who work in those frameworks, but it excludes developers working in Perl/PHP, Python/Zope or other languages/frameworks. Developers may also turn away from the existing code because of real or perceived issues of code quality or security.

The question of application framework is by itself a very interesting one, as it affects many issues. Much debate around OPeNDAP functionality centers on what resources (system ports or channels, memory) or system state (how many files are open, which data elements have been processed) the OPeNDAP server is allowed to use from the environment. Generally, I think the less the OPeNDAP system controls of the user environment (the smaller the footprint) the better, since the solution space will vary widely depending on the particular server resources and environment at hand. Perhaps a messaging scheme or API between the OPeNDAP server and the application framework or system environment would be the best compromise.

8) For the broadest utilization, tools which leverage OPeNDAP or netCDF should be capable of running as a service so that client browsers can interact with them without having to install additional software. For example, both the ODC (OPeNDAP Data Connector) and ncBrowse would be of greater utility to us as services which we could set up for client browsers. This is echoed in the IOOS guidelines as 'browse and visualize the data through standard Web browsers.'

9) It would be helpful if federal agencies (USGS, NWS, NOAA, NASA) made their data available via OPeNDAP or other standard data services (say, something similar to WSDL) or formats (say, column-oriented with documentation), as opposed to requiring client programs to screen-scrape web page results for data. Questions concerning catalog registries for data discovery, network usage/mirroring and security will need further attention.

LAS (Live Access Server) review

LAS is being promoted as an IOOS first step toward visualization of data products available using IOOS data management standards. While this is an admirable first step, I think of LAS as one of many possible tools/approaches for browsing and visualizing data. LAS does not require that data providers use an OPeNDAP client, just that the data be in COARDS-compliant netCDF format. LAS is also flexible in that its default visualization tool is FERRET, but the data output can be rerouted to other visualization tools such as Matlab. The core functionality of LAS is that it provides a way to select a geographic bounding box, variables of interest and a time slice for display as a variety of output products.

My review points on LAS are as follows:

1) The technician who supports LAS must manually add new netCDF datasets to the catalog in the left-hand selection column and recompile it. There is no way for application users to dynamically search, discover and add new netCDF files to this list. A user may select a maximum of two datasets for some basic types of comparisons.

2) LAS should be given credit for running as a service. I'd personally prefer to have something more like ncBrowse running as a service, but currently LAS is the only service which quickly facilitates netCDF visualization (depending on how easily your netCDF file conforms to COARDS, LAS and FERRET syntax). Understanding the syntax demands and debugging errors of the netCDF COARDS convention, LAS and FERRET can be a source of frustration.

3) LAS is also given credit for being flexible in its implementation and for supporting a broad range of outputs; a developer with knowledge of netCDF and of the scripts which LAS uses to produce these outputs should not have too difficult a time reusing this code within other applications.

4) LAS is limited in its application design and makes a good 'beginning' tool, but it also overlaps with other visualization tools which are better suited to certain application needs and audiences. For example, GIS is specifically designed to facilitate two-dimensional spatial overlays, zooming, panning and dynamic displays of resolution, and it has data libraries in its own GIS formats which are immediately broad and useful. Other tools are better at displaying graphs or three-dimensional data. LAS addresses an initial need, but the visualization and application needs of many different audiences will likely require additional services outside the LAS domain.

-- JeremyCothran

University of North Carolina / NC-COOS

The University of North Carolina (UNC) at Chapel Hill established a DODS server as part of activities and tasks associated with SEACOOS Information Management. While we had a server up and running in very little time, our time and expertise for testing and evaluation were limited beyond experiencing the server in production mode. We describe our experience here.

Additionally, we attempted some DODS-client development tasks but met with difficulties, if not complete road blocks. Below we detail the specific stumbling blocks and road blocks that we encountered.

We admit our software development expertise is minimal and our time for testing, evaluation, and problem-solving is limited. However, our failures were frustrating in coordinating our data efforts and forced us to revert to guerrilla practices of downloading whole files and processing them locally. This does not take advantage of OPeNDAP and is a backward approach, adopted in order to produce the results we needed quickly.

The easy stuff

We installed the NetCDF-DODS server (v.3.2) on a Sun V880. The install was easy using the binary package on an established Apache web server. The web server hardware and DODS installation were handled by UNC's Academic and Technology Network group, which provides networking and system administration services for the whole university. At this writing, I am not aware of any major difficulties or stumbling blocks encountered by this group during the installation; they had it up and running within several hours of the request. This server has been running since December 2002, including through a hardware move and system rebuild in May 2003.

While most of our data processing is done in MATLAB, we did not install and test a MATLAB-DODS server, since our SEACOOS partner institution, the University of South Florida, met with difficulties regarding version incompatibilities with MATLAB data types. Since we can generate netCDF files using the MATLAB NetCDF Toolbox (which runs on top of mexCDF), we chose to install only a NetCDF-DODS server. This was the path of least resistance to get up and running with OPeNDAP technologies.
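
For reference, a minimal sketch of how such a file can be written with that toolbox; the file name, variable names, sizes and attribute values below are illustrative only, not our production code:

    % Create a new netCDF file (NetCDF Toolbox object syntax).
    nc = netcdf('station_temps.nc', 'clobber');

    nc.title = 'Near-surface temperature, example station';   % global attribute
    nc('time') = 24;                                           % dimension

    nc{'time'} = ncdouble('time');                             % coordinate variable
    nc{'time'}.units = 'hours since 2004-01-01 00:00:00';

    nc{'temperature'} = ncfloat('time');
    nc{'temperature'}.units = 'degrees_Celsius';

    nc{'time'}(:) = 0:23;                                      % write the data
    nc{'temperature'}(:) = 20 + rand(1, 24);
    close(nc);                                                 % ready to serve via DODS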

Once the web server and the DODS server were installed, data directories were easily set up and data stored in netCDF to be served. Access to data is straightforward if the complete server name and data path are known. We are able to access our data and others' easily, either through the web-browser pages served by the DODS server or through other DODS clients such as the MATLAB Command Line Tool.

The not-so-easy stuff

We developed a data client that directly accesses oceanographic time-series data, reformats it into an encoded card file, and pushes the data to the National Data Buoy Center (NDBC). Although we were able to get this code to compile and link to the OPeNDAP libraries, it took much wrangling and forced limitations. This code was called “D2N”, short for DODS-to-NDBC; it is C++ code requiring DODS-dap-library.

We also attempted to develop a MATLAB-based “data scout” that would poke around a set of DODS URLs and return data matching a standard naming convention for a provided variable attribute. After many difficulties we ran out of time to continue development and have not completed the code. This code was called my_loaddods; it is MATLAB code using the MATLAB Command Line Tool. Parallel development was conducted on i686-pc-linux and sparc-sun-solaris2.6, both running MATLAB version 6.5.1 (R13).
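
The intent of the scout was roughly as follows; this is only a sketch of the idea, not the unfinished my_loaddods code, and the URLs and variable name are placeholders:

    % Candidate OPeNDAP URLs and the agreed-upon standard variable name to
    % look for (both placeholders, not real SEACOOS servers or conventions).
    urls = {'http://dods.example.edu/cgi-bin/nph-dods/a/station1.nc', ...
            'http://dods.example.edu/cgi-bin/nph-dods/b/station2.nc'};
    wanted = 'sea_water_temperature';

    found = {};
    for k = 1:length(urls)
        try
            loaddods([urls{k} '?' wanted]);   % request only the wanted variable
        catch
            continue                          % server error or variable absent
        end
        if exist(wanted, 'var')               % did the dataset supply it?
            found{end+1} = urls{k};
            clear(wanted);
        end
    end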

The stumbling blocks

The difficulties faced in the development of D2N:

1) We could only get D2N to compile and link under Red Hat Linux 7.3 using gcc (v 2.96.110) and the OPeNDAP libraries (DODS-dap-library-3.3.0 and DODS-packages-3.3.0) when the libraries for ssl and crypto were locally installed on the system.

2) D2N failed to link under a similar Red Hat Linux system using the same gcc version and OPeNDAP libraries but with the distributed libraries for ssl and crypto. We assume there is a versioning conflict beyond our understanding.

3) D2N failed to link after upgrading the OPeNDAP libraries (DODS-dap-library-3.4.8 and DODS-packages-3.4.4).

The limitations and difficulties faced in the development of my_loaddods:

1) There is a limitation in loaddods (version 3.4.1): it only returns data into a MATLAB structure that matches the hierarchy and structure of the DDS. There is no mechanism to get a similar return of the DAS-like information into MATLAB. Without it, we must build a work-around to get at a particular variable attribute if the variable is buried in the DDS as a dimension variable (a sketch of one possible work-around follows this list).

2) While we did not run into problems running the sparc-sun-solaris2.6 MATLAB Command Line Tool, we did get errors submitting the same command under i686-pc-linux. This was reported to support@unidata.ucar.edu. We do not know the status of the bug report.
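
One possible shape for the work-around mentioned in point 1, sketched here under the assumption that the MATLAB installation provides urlread (the server URL, variable and attribute names are placeholders): fetch the dataset's .das response as plain text and pull the attribute out of it.

    % Fetch the DAS (attribute) response as plain text and search it for one
    % attribute; the URL and attribute name below are placeholders.
    url = 'http://dods.example.edu/cgi-bin/nph-dods/data/station1.nc';
    das = urlread([url '.das']);

    % Crude extraction of, e.g.,   String units "degrees_Celsius";
    tok = regexp(das, 'units\s+"([^"]*)"', 'tokens', 'once');
    if ~isempty(tok)
        units = tok{1};
    end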

No further combinations were tried for D2N, and no further investigation was done on my_loaddods, since we ran out of time and have not been able to get back to either of these coding projects.

-- SaraHaines - 10 Dec 2004

Skidaway Institute of Oceanography / SABSOON

University of Miami / Explorer of the Seas


-- SaraHaines - 26 Jan 2005