Seacoos XML and related web services (xml metadata header + column oriented data -> seacoos netcdf, etc)
To accommodate other potential data providers who wish to provide their data for aggregation without the effort of using netCDF formats or installing software, a secondary seacoos convention XML format has been specified, which may be more convenient or expedient for data providers or aggregators to use.
The example below, for the fixed-point data format category, shows an xml metadata header file which describes the data files (.dat) that share the same filename prefix. I'd like to provide similar examples for the other 5 data format categories (fixed-profiler, fixed-map, moving-point-2D, moving-point-3D and moving-profiler). The header file is derived from the existing seacoos netCDF header file, so its xml elements and attributes could mirror netCDF development.
For data providers with Excel/column oriented data, a web form based tool has been designed which returns an XML data descriptive file once the web page is submitted successfully. This XML file would be associated with data upload streams using a similar filename prefix. Data streams could be push (ftp) or pull (http) oriented.
The current web form
has been designed to be useful as a stand-alone application without tying into any extensive databases, but it will be worked into the Meta-door product, allowing it to tie into previously collected information for platform, sensor_group, sensor, observed variables (standard_names) and units. Because the header file is in xml, it gains the advantages of xml schema and xml namespaces in terms of better documentation, validation and exchange.
The following website
was used to generate the attached
xml schema for the sample xml metadata header document.
Data uploaded could be organized as separate tables by observation type
or as a single table with an observation type index. It would probably be good to split either implementation into recent and archived tables, so that queries on the current or recent state respond faster and re-indexing of the archived observations can be scheduled or buffered at off-peak times. Table design and spatial indexing to accommodate GIS presentation are also considerations.
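As a rough illustration of the single-table option with a recent/archive split, the sketch below uses Python's built-in sqlite3 module. The table and column names (obs_recent, obs_archive, obs_type and so on) are hypothetical and are not part of any Seacoos schema; a production database would likely differ.

```python
import sqlite3

# Hypothetical layout: the same table definition, indexed by observation
# type and time, created once as a small "recent" table and once as a
# large "archive" table.
SCHEMA = """
CREATE TABLE {name} (
    platform_id TEXT NOT NULL,
    obs_type    TEXT NOT NULL,
    obs_time    TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL,
    value       REAL
);
CREATE INDEX {name}_type_time ON {name} (obs_type, obs_time);
"""


def create_tables(conn):
    """Create the recent and archive tables with identical columns."""
    for name in ("obs_recent", "obs_archive"):
        conn.executescript(SCHEMA.format(name=name))


def archive_older_than(conn, cutoff_iso):
    """Move rows measured before cutoff_iso from obs_recent to obs_archive.

    Times are assumed to be stored as same-format ISO 8601 UTC strings,
    so plain string comparison orders them correctly.
    """
    with conn:  # single transaction: copy first, then delete
        conn.execute(
            "INSERT INTO obs_archive SELECT * FROM obs_recent WHERE obs_time < ?",
            (cutoff_iso,),
        )
        moved = conn.execute(
            "DELETE FROM obs_recent WHERE obs_time < ?", (cutoff_iso,)
        ).rowcount
    return moved
```

Queries on the current state then only touch obs_recent, while the archive move can run from a scheduled job.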
Example xml header file
<?xml version="1.0" encoding="UTF-8"?>
<!-- Format for header filename is <provider_id>_<platform_id>_<package_id>_hdr.xml
This metadata header file applies to all submitted data files (.dat for example) within the same directory
Suggested filename suffixes for data files are ..._<iso8601_style_format_datetime_of_last_measurement>_latest.dat
For example, the metadata header file carocoops_CAP2_buoy_hdr.xml
would apply against the file carocoops_CAP2_buoy_20041122_130000_latest.dat -->
<!-- For standard_name and units elements reference similar columns in the
seacoos data dictionary at
Units are same as units supported by udunits listed at -->
<!-- conventions listing may evolve to an xml schema namespace
reference, for now represents expected elements and possible values
seacoos netcdf documentation at -->
<!-- repeatable element -->
<!-- platform information -->
<!-- format_category list -->
<contact_info>Jeremy Cothran (email@example.com)</contact_info>
<institution_desc>Baruch Institute, University of South Carolina at Columbia</institution_desc>
<title>carocoops data for buoy CAP2</title>
<!-- latitude +/-90.0 degrees north -->
<!-- longitude +/-180.0 degrees east -->
<!-- other information -->
<!-- file information -->
<reference>mean sea level (MSL)</reference>
<!-- positive list [up,down] -->
<!-- column information -->
<!-- repeatable element -->
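The filename convention described in the header comments can be sketched as a small matching routine. This is only an illustration of the prefix rule, assuming the header and data files sit in the same directory; the function name data_files_for_header is hypothetical.

```python
import re

# A header file is named <provider_id>_<platform_id>_<package_id>_hdr.xml;
# everything before "_hdr.xml" is the shared filename prefix.
HEADER_RE = re.compile(r"^(?P<prefix>.+)_hdr\.xml$")


def data_files_for_header(header_name, filenames):
    """Return the .dat files in `filenames` that share the header's prefix."""
    match = HEADER_RE.match(header_name)
    if match is None:
        raise ValueError(f"not a header filename: {header_name!r}")
    prefix = match.group("prefix") + "_"
    return sorted(
        name
        for name in filenames
        if name.startswith(prefix) and name.endswith(".dat")
    )
```

With the names used in the header comments, carocoops_CAP2_buoy_hdr.xml would select carocoops_CAP2_buoy_20041122_130000_latest.dat and skip files belonging to other platforms.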
Example data file
Note that the time (measurement time) field is traditionally listed first, in an ISO 8601 format (see http://www.w3.org/TR/NOTE-datetime) that includes the time zone but does not require the character 'T' to mark the beginning of the time element (2003-11-11 20:00:00-04 for example).
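A timestamp in this relaxed form (a space instead of 'T', and an hour-only zone offset) is not accepted by every date parser, so a reader may need to normalize it first. The sketch below is one possible way to do that in Python; the function name parse_measurement_time is hypothetical, and the accepted pattern is an assumption based only on the example value given here.

```python
import re
from datetime import datetime, timedelta, timezone

# Date, then a space or 'T', then time, then an offset of +/-HH with
# optional minutes (for example -04, -0400 or -04:00).
_TS = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})([+-]\d{2})(?::?(\d{2}))?$"
)


def parse_measurement_time(text):
    """Parse e.g. '2003-11-11 20:00:00-04' into a timezone-aware datetime."""
    match = _TS.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized measurement time: {text!r}")
    date_part, time_part, off_hours, off_minutes = match.groups()
    naive = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    sign = -1 if off_hours.startswith("-") else 1
    offset = sign * timedelta(
        hours=abs(int(off_hours)), minutes=int(off_minutes or 0)
    )
    return naive.replace(tzinfo=timezone(offset))
```

Converting the result with astimezone(timezone.utc) gives a common reference time for aggregation across providers.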
Web form to Seacoos XML document
An online web form is available which returns an XML data descriptive file once the form is submitted successfully. The link to the web form is
The code which runs this application is attached below.
- 'seacoos_xml.html' is the html webpage where the user enters their information
- 'seacoos_xml_validate.php' is the php page which validates the input data and forwards the validated form to the perl script 'seacoos_xml.pl'
- 'seacoos_xml.pl' fills in the XML text template 'seacoos_xml_template.txt', substituting the web form field values to create the final XML document
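The template substitution step can be illustrated with a short Python sketch. This is not the Perl script itself: the template text and the $placeholder syntax below are assumptions, since the contents of 'seacoos_xml_template.txt' are not shown here. Only the element names are taken from the example header.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template text standing in for seacoos_xml_template.txt.
TEMPLATE = Template(
    "<contact_info>$contact_info</contact_info>\n"
    "<institution_desc>$institution_desc</institution_desc>\n"
    "<title>$title</title>\n"
)


def fill_template(form_fields):
    """Substitute XML-escaped form values into the template text."""
    safe = {key: escape(str(value)) for key, value in form_fields.items()}
    # substitute() raises KeyError if a placeholder has no matching field,
    # which acts as a basic check that the form was complete.
    return TEMPLATE.substitute(safe)
```

Escaping the form values before substitution keeps characters such as & and < from producing an invalid XML document.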
There are several improvements which could be made to the web form and XML document.
- stronger error checking on the entered form
- currently only a single pre-chosen unit is available for each standard name; this should be as flexible as the udunits library allows in terms of conversion between units
- a template parameter could be added to the http address, allowing the user to point to another XML document to pre-fill the form when entering similar information multiple times
Note also that the same approach and code could be used for creating other community XML documents as well.
The advantage of having this XML document is that now other groups can use XML oriented tools and mappings such as XSLT transformations to more easily discover, aggregate/archive and process the underlying data.
Seacoos XML document to Seacoos netCDF
The web service 'seacoos_xml_netcdf.php' allows the user to specify the following http parameters
- xml_source= (currently needs to be a Seacoos XML document)
- data_source= (needs to be a column oriented ascii file described by the xml_source)
- return_type=txt (optional - a text file is returned for inspection if 'txt' is specified, the default return is a binary netcdf (.nc file))
and uses the xml_source information to transform the column oriented ascii file into a Seacoos netcdf.
Viewable text file (.txt)
Binary netcdf file (.nc)
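For scripted use, a request to the service can be assembled from the three parameters listed above. The sketch below only builds the request URL; the base address is a placeholder (the actual service location is not given here) and the function name build_request_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder address: substitute the actual location of the service.
SERVICE_URL = "http://example.org/seacoos_xml_netcdf.php"


def build_request_url(xml_source, data_source, as_text=False):
    """Build a request URL from the xml_source and data_source locations.

    return_type=txt is added only when a viewable text file is wanted;
    otherwise the service default (a binary .nc file) applies.
    """
    params = {"xml_source": xml_source, "data_source": data_source}
    if as_text:
        params["return_type"] = "txt"
    return SERVICE_URL + "?" + urlencode(params)
```

The resulting URL can be opened in a browser or fetched with any http client.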
The code which runs this application is attached below.
- 'seacoos_xml_netcdf.php' is the php page which forwards the http arguments and displays the returned product
- 'seacoos_xml_netcdf.pl' uses the XML text template 'seacoos_xml_netcdf_template.txt' performing substitutions based on the processing of the data file as guided by the associated xml file to create the final Seacoos netcdf.
XML, Web Services (http+xml): Making data/processing more available/transparent
The combination of these two tools will hopefully allow new data providers to participate more easily in the Seacoos/IOOS aggregation/integration efforts, with the only effort needed from the data provider being to describe their underlying data using the online web form. Having these data sources described in XML, relating the key dimensions of time, space and observation type, also opens up other processing opportunities. The ability to easily mix and match various community developed tools via web services is something we should also pursue in our processing efforts, such as qa/qc and database population scripts.
- 12 May 2005
Update: June 1, 2005
In working through an actual dataset, a filter service was developed to transform the data from one format to another. A filter uses the same parameters as above (xml descriptor file and data file) in conjunction with a requested filter service and filter arguments to transform one or several columns in the dataset into a more usable form. The dataset below needed its date broken out into a more standard format before being processed into the seacoos netcdf format. The unfiltered file below has a date reading in the third column
which is transformed to a better format using the appended filter arguments on a call to seacoos_filter.php
'_SEP_' becomes the field separator string for the file.
The argument 'file_ref=true' is appended to give a reference to the transformed file instead of the file itself for use in chaining the output as input to a further step.
The example filter code is fairly generic, so changing how given columns are transformed should be relatively straightforward, producing other filters which can be chained together in a data pre-processing/scrubbing operation.
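The idea of a generic, chainable column filter can be sketched in Python as follows. This is not the code behind seacoos_filter.php: the function names are hypothetical, and the MM/DD/YYYY input date format is purely an assumption for illustration, since the real dataset's format is not reproduced here.

```python
from datetime import datetime


def filter_column(lines, column, transform, sep=","):
    """Apply `transform` to one 0-based column of each separated line."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        fields[column] = transform(fields[column])
        out.append(sep.join(fields))
    return out


def us_date_to_iso(value):
    """Convert an assumed MM/DD/YYYY date into YYYY-MM-DD."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").strftime("%Y-%m-%d")
```

Because each call returns lines in the same shape it received, filters can be chained, for example filter_column(filter_column(rows, 2, us_date_to_iso), 0, str.strip).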
Harvesting ASCII to Seacoos netCDF
The harvesting process is based on the existing model for seacoos netcdf format file processing. A preceding harvesting step pre-processes raw column oriented ASCII data into the Seacoos netcdf format, so that the existing harvester can then work with those files as a second stage.
The shell process called regularly by cron is get_outside.sh
This in turn calls get_latest_data.pl
with a subcall to get_latest_listing.pl
Processes use working subdirectories seacoos_xml, netcdf_latest and fetch_logs.
Update: July 27, 2005 python csv transform to wfs service
Charlton clued me in to the following link, which uses python to create a WFS service from a csv file