Main.CodeRepositorySeacoosXMLServices r1.3 - 27 Jul 2005 - 22:03 - JeremyCothran

Seacoos XML and related web services (xml metadata header + column oriented data -> seacoos netcdf, etc)

To accommodate data providers who wish to contribute data for aggregation without the effort of adopting netCDF formats or installing software, a secondary Seacoos convention, an XML format, has been specified. It may be more convenient or expedient for data providers or aggregators to use.

The example below, for the fixed-point data format category, shows an XML metadata header file that describes the data files (.dat) sharing its filename prefix. I'd like to provide similar examples for the other 5 data format categories (fixed-profiler, fixed-map, moving-point-2D, moving-point-3D and moving-profiler). The header file is derived from the existing Seacoos netCDF header, so its XML elements and attributes could mirror netCDF development.

For data providers with Excel/column oriented data, a web form based tool has been designed which returns an XML data descriptive file once the form is submitted successfully. This XML file would be associated with data upload streams using a similar filename prefix. Data streams could be push (ftp) or pull (http) oriented.
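As a rough illustration of the filename-prefix association, the sketch below maps a data filename to the header filename that would describe it. The regular expression is an assumption based on the suggested naming pattern (`<provider_id>_<platform_id>_<package_id>_<date>_<time>_latest.dat`), and `header_for` is a hypothetical helper, not part of the attached code.

```python
import re
from typing import Optional

# Assumed data filename pattern: <prefix>_<YYYYMMDD>_<HHMMSS>_latest.dat
DATA_FILE_RE = re.compile(
    r"^(?P<prefix>.+)_(?P<date>\d{8})_(?P<time>\d{6})_latest\.dat$"
)


def header_for(data_filename: str) -> Optional[str]:
    """Return the header filename sharing the data file's prefix, or None."""
    match = DATA_FILE_RE.match(data_filename)
    if match is None:
        return None  # the name does not follow the suggested convention
    return match.group("prefix") + "_hdr.xml"


print(header_for("carocoops_CAP2_buoy_20041122_130000_latest.dat"))
```

A harvester could use a lookup like this to decide which header applies to each incoming file, whether the file arrived by ftp push or http pull.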

The current web form has been designed to be useful as a stand-alone application without tying into any extensive databases, but it will be worked into the Meta-door product, allowing it to tie into previously collected information for platform, sensor_group, sensor, observed variables (standard_names) and units. Because the header file is XML, it gains the advantages of XML schema and XML namespaces: better documentation, validation and exchange.

The following website was used to generate the attached xml schema for the sample xml metadata header document.

Uploaded data could be organized as separate tables by observation type or as a single table with an observation type index. It would probably be good to split either implementation into recent and archived tables, so that queries on the current or recent state respond faster and re-indexing of the archived observations can be scheduled at off-peak times or buffered. Table design and spatial indexing to accommodate GIS presentation are also considerations.
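The single-table variant with a recent/archive split could look roughly like the sketch below. It uses Python's built-in sqlite3 module only for illustration; the table names, column names and the `archive_older_than` helper are all assumptions, and spatial indexing is left out.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create recent and archive tables with one shared layout (hypothetical)."""
    for table in ("obs_recent", "obs_archive"):
        conn.execute(
            f"""CREATE TABLE IF NOT EXISTS {table} (
                    station_id    TEXT NOT NULL,
                    standard_name TEXT NOT NULL,  -- observation type
                    obs_time      TEXT NOT NULL,  -- ISO 8601, one fixed format
                    z             REAL,
                    value         REAL
                )"""
        )
        # Index on observation type and time, used by "latest value" queries.
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS {table}_type_time "
            f"ON {table} (standard_name, obs_time)"
        )


def archive_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows with obs_time before cutoff into obs_archive; return the count."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "INSERT INTO obs_archive SELECT * FROM obs_recent WHERE obs_time < ?",
            (cutoff,),
        )
        cursor = conn.execute(
            "DELETE FROM obs_recent WHERE obs_time < ?", (cutoff,)
        )
    return cursor.rowcount
```

Keeping the recent table small means the frequent queries touch little data, while the move into the archive table can run on whatever schedule suits re-indexing.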

fixed-point

Example xml header file

http://trident.baruch.sc.edu/storm_surge_data/latest/test/carocoops_CAP2_buoy_hdr.xml


<?xml version="1.0" encoding="UTF-8"?>
<xml>
<!-- Format for header filename is <provider_id>_<platform_id>_<package_id>_hdr.xml

This metadata header file applies to all submitted data files (.dat for example) within the same directory

Suggested filename suffixes for data files are ..._<iso8601_style_format_datetime_of_last_measurement>_latest.dat

For example, the metadata header file carocoops_CAP2_buoy_hdr.xml
would apply against the file carocoops_CAP2_buoy_20041122_130000_latest.dat
-->

<!-- For standard_name and units elements reference similar columns in the
seacoos data dictionary at
http://nautilus.baruch.sc.edu/seacoos_dd

Units are same as units supported by udunits listed at
http://my.unidata.ucar.edu/content/software/udunits/udunits.txt
-->

<global_attributes>
        <!-- conventions listing may evolve to an xml schema namespace
reference, for now represents expected elements and possible values

seacoos netcdf documentation at
http://nautilus.baruch.sc.edu/twiki_dmcc/pub/Main/WebHome/SEACOOSNetCDFStandardv2.0.doc
 -->
        <!-- repeatable element -->
        <conventions>CF-1.0</conventions>
        <conventions>SEACOOS-NETCDF-2.0</conventions>
        <conventions>SEACOOS-XML-1.0</conventions>

   <!-- platform information -->

        <!-- format_category list
[fixed-point,fixed-profiler,fixed-map,moving-point-2D,moving-point-3D,moving-profiler]
-->
        <format_category>fixed-point</format_category>

        <contact_info>Jeremy Cothran (jcothran@carocoops.org)</contact_info>

        <institution_desc>Baruch Institute, University of South Carolina at Columbia</institution_desc>
        <institution_url>http://carocoops.org</institution_url>

        <institution_id>carocoops</institution_id>
        <platform_id>CAP2</platform_id>
        <package_id>buoy</package_id>

   <station_id>carocoops_CAP2_buoy</station_id>
   <title>carocoops data for buoy CAP2</title>

        <!-- latitude +/-90.0 degrees north -->
        <latitude>32.80</latitude>

        <!-- longitude +/-180.0 degrees east -->
        <longitude>-79.62</longitude>

   <!-- other information -->
        <dods_url>http://trident.baruch.sc.edu/dods</dods_url>
        <comment></comment>

   <!-- file information -->

        <data_url>http://trident.baruch.sc.edu/storm_surge_data/latest</data_url>
        <filename_search></filename_search>
        <file_row_start>2</file_row_start>
        <file_row_comment></file_row_comment>
        <file_field_separator>_SEP_</file_field_separator>
        <file_field_missing_value></file_field_missing_value>
       
   <column_time></column_time>
   <measurement_time_zone></measurement_time_zone>

</global_attributes>

<independent_variables>
        <z>
                <reference>mean sea level (MSL)</reference>

                <!-- positive list [up,down] -->
                <positive>up</positive>

                <units>m</units>
        </z>
</independent_variables>

   <!-- column information -->

<dependent_variables>

        <!-- repeatable element -->
        <variable>
                <column_number>2</column_number>
                <standard_name>wind_speed</standard_name>
                <units>m s-1</units>
                <z>3.0</z>
        </variable>

        <variable>
                <column_number>3</column_number>
                <standard_name>wind_from_direction</standard_name>
                <units>degrees_true</units>
                <z>3.0</z>
        </variable>

        <variable>
                <column_number>4</column_number>
                <standard_name>sea_surface_temperature</standard_name>
                <units>degree_Celsius</units>
                <z>-1.0</z>
        </variable>

        <variable>
                <column_number></column_number>
                <standard_name></standard_name>
                <units></units>
                <z></z>
        </variable>

</dependent_variables>
</xml>
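To show how a consumer might read such a header, here is a minimal sketch using Python's standard xml.etree.ElementTree. The `read_header` function and the inline `SAMPLE_HEADER` (a trimmed copy of the example) are illustrative assumptions, as are the fallback values of row 1 and a comma separator when those elements are empty.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example header, kept inline so the sketch runs offline.
SAMPLE_HEADER = """<xml>
<global_attributes>
    <station_id>carocoops_CAP2_buoy</station_id>
    <file_row_start>2</file_row_start>
    <file_field_separator>_SEP_</file_field_separator>
</global_attributes>
<dependent_variables>
    <variable>
        <column_number>2</column_number>
        <standard_name>wind_speed</standard_name>
        <units>m s-1</units>
        <z>3.0</z>
    </variable>
    <variable>
        <column_number></column_number>
        <standard_name></standard_name>
        <units></units>
        <z></z>
    </variable>
</dependent_variables>
</xml>"""


def read_header(xml_text):
    """Collect the file layout and the described columns from a header document."""
    root = ET.fromstring(xml_text)
    attrs = root.find("global_attributes")
    header = {
        "station_id": (attrs.findtext("station_id") or "").strip(),
        # Fallbacks below are assumptions, not part of the format description.
        "row_start": int(attrs.findtext("file_row_start") or "1"),
        "separator": attrs.findtext("file_field_separator") or ",",
        "variables": [],
    }
    for var in root.iter("variable"):
        column = (var.findtext("column_number") or "").strip()
        if not column:
            continue  # skip the empty placeholder <variable> block
        z_text = (var.findtext("z") or "").strip()
        header["variables"].append({
            "column": int(column),
            "standard_name": (var.findtext("standard_name") or "").strip(),
            "units": (var.findtext("units") or "").strip(),
            "z": float(z_text) if z_text else None,
        })
    return header


print(read_header(SAMPLE_HEADER))
```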

Example data file

Note that the time (measurement time) field is traditionally listed first, in an ISO 8601 format (see http://www.w3.org/TR/NOTE-datetime ) that includes the time zone but does not require the character T to mark the beginning of the time element (2003-11-11 20:00:00-04 for example).

http://trident.baruch.sc.edu/storm_surge_data/latest/test/test_data.dat


time,wind_speed,wind_from_direction,sea_surface_temperature
2004-10-22 14:00:00+00_SEP_5.0_SEP_120.0_SEP_12.0
2004-10-22 15:00:00_SEP_6.0_SEP_125.0_SEP_13.0
2004-10-22 16:00:00_SEP_7.0_SEP_130.0_SEP_14.0
2004-10-22 17:00:00_SEP_8.0_SEP_135.0_SEP_15.0
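The sketch below reads rows laid out like the sample, under a few assumptions that the page does not state outright: row and column numbers are 1-based (so file_row_start 2 skips the header line and column 1 is the time), and a timestamp without an offset is treated as UTC. `parse_time` and `parse_rows` are hypothetical helpers.

```python
from datetime import datetime

SAMPLE_DATA = """time,wind_speed,wind_from_direction,sea_surface_temperature
2004-10-22 14:00:00+00_SEP_5.0_SEP_120.0_SEP_12.0
2004-10-22 15:00:00_SEP_6.0_SEP_125.0_SEP_13.0
"""


def parse_time(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' followed by an optional zone offset."""
    text = text.strip()
    stamp, offset = text[:19], text[19:]  # date and time take 19 characters
    if not offset:
        offset = "+0000"  # assumption: treat a missing offset as UTC
    elif len(offset) == 3:
        offset += "00"  # '+00' or '-04' becomes '+0000' or '-0400' for %z
    return datetime.strptime(stamp + offset, "%Y-%m-%d %H:%M:%S%z")


def parse_rows(text, columns, separator="_SEP_", row_start=2, time_column=1):
    """Return one dict per data row; columns maps column number to name."""
    records = []
    for line in text.splitlines()[row_start - 1:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(separator)
        record = {"time": parse_time(fields[time_column - 1])}
        for number, name in columns.items():
            record[name] = float(fields[number - 1])
        records.append(record)
    return records


columns = {2: "wind_speed", 3: "wind_from_direction", 4: "sea_surface_temperature"}
for row in parse_rows(SAMPLE_DATA, columns):
    print(row)
```

In practice the separator, starting row and column mapping would come from the header file rather than being passed by hand.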

Web form to Seacoos XML document

An online web form is available which returns an XML data descriptive file once the form is submitted successfully. The link to the web form is

http://nautilus.baruch.sc.edu/services/seacoos_xml.html

The code which runs this application is attached below.

  • 'seacoos_xml.html' is the html webpage where the user enters their information
  • 'seacoos_xml_validate.php' is the php page which validates the input data and forwards the validated form to the perl script 'seacoos_xml.pl'
  • 'seacoos_xml.pl' uses the XML text template 'seacoos_xml_template.txt' performing substitutions based on the web form fields to create the final XML document
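The template-substitution step can be pictured with the short sketch below. It is written in Python rather than perl, and the `$name` placeholder syntax, the template fragment and the `fill_template` helper are assumptions for illustration; the attached 'seacoos_xml_template.txt' may use a different placeholder style.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical fragment of a header template with $-style placeholders.
HEADER_TEMPLATE = Template("""<global_attributes>
        <institution_id>$institution_id</institution_id>
        <platform_id>$platform_id</platform_id>
        <package_id>$package_id</package_id>
        <station_id>${institution_id}_${platform_id}_${package_id}</station_id>
        <comment>$comment</comment>
</global_attributes>""")


def fill_template(form):
    """Substitute validated form fields into the template, escaping XML characters."""
    safe = {key: escape(str(value)) for key, value in form.items()}
    return HEADER_TEMPLATE.substitute(safe)  # raises KeyError if a field is missing


print(fill_template({
    "institution_id": "carocoops",
    "platform_id": "CAP2",
    "package_id": "buoy",
    "comment": "wind & water temperature",
}))
```

Escaping the form values before substitution keeps characters such as & and < from producing a malformed document.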

There are several improvements which could be made to the web form and XML document.

  • stronger error checking on the entered form
  • currently only a single pre-chosen unit is available for each standard name; this should be as flexible as the udunits library allows in terms of conversion between units
  • a template parameter could be added to the http address allowing the user to point to another XML document for filling out similar form info multiple times

Note also that the same approach and code could be used for creating other community XML documents as well.

The advantage of having this XML document is that now other groups can use XML oriented tools and mappings such as XSLT transformations to more easily discover, aggregate/archive and process the underlying data.

Seacoos XML document to Seacoos netCDF

The web service 'seacoos_xml_netcdf.php' allows the user to specify the following http parameters

  • xml_source= (currently needs to be a Seacoos XML document)
  • data_source= (needs to be a column oriented ascii file described by the xml_source)

  • return_type=txt (optional - a text file is returned for inspection if 'txt' is specified, the default return is a binary netcdf (.nc file))

and uses the xml_source information to transform the column oriented ascii file into a Seacoos netcdf.

Viewable text file (.txt) http://nautilus.baruch.sc.edu/services/seacoos_xml_netcdf.php?xml_source=http://trident.baruch.sc.edu/storm_surge_data/latest/test/carocoops_CAP2_buoy_hdr.xml&data_source=http://trident.baruch.sc.edu/storm_surge_data/latest/test/test_data.dat&return_type=txt

Binary netcdf file (.nc) http://nautilus.baruch.sc.edu/services/seacoos_xml_netcdf.php?xml_source=http://trident.baruch.sc.edu/storm_surge_data/latest/test/carocoops_CAP2_buoy_hdr.xml&data_source=http://trident.baruch.sc.edu/storm_surge_data/latest/test/test_data.dat
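A client could assemble these request addresses programmatically, as in the sketch below. The `build_request_url` helper is hypothetical; note that urlencode percent-encodes the nested URLs, which differs from the unencoded links shown above but is the conventional way to pass a URL as a query parameter.

```python
from urllib.parse import urlencode

SERVICE = "http://nautilus.baruch.sc.edu/services/seacoos_xml_netcdf.php"


def build_request_url(xml_source, data_source, return_type=None):
    """Build a request URL; return_type is left out to get the binary default."""
    params = {"xml_source": xml_source, "data_source": data_source}
    if return_type:
        params["return_type"] = return_type
    return SERVICE + "?" + urlencode(params)


base = "http://trident.baruch.sc.edu/storm_surge_data/latest/test/"
print(build_request_url(
    base + "carocoops_CAP2_buoy_hdr.xml",
    base + "test_data.dat",
    return_type="txt",
))
```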

The code which runs this application is attached below.

  • 'seacoos_xml_netcdf.php' is the php page which forwards the http arguments and displays the returned product
  • 'seacoos_xml_netcdf.pl' uses the XML text template 'seacoos_xml_netcdf_template.txt' performing substitutions based on the processing of the data file as guided by the associated xml file to create the final Seacoos netcdf.
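One way a template-driven conversion like this could work, not necessarily how 'seacoos_xml_netcdf.pl' does it, is to emit CDL, the text notation that ncdump prints and ncgen compiles into a binary .nc file. The sketch below builds a small CDL document; the `to_cdl` function, the integer time axis and the variable layout are assumptions and do not follow the Seacoos netCDF standard in detail.

```python
def to_cdl(name, times, variables):
    """Render CDL text.

    times: list of integer seconds since the epoch (assumed time encoding).
    variables: list of (standard_name, units, values) tuples, one per column.
    """
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\ttime = {len(times)} ;",
        "variables:",
        "\tint time(time) ;",
        '\t\ttime:units = "seconds since 1970-01-01 00:00:00" ;',
    ]
    for standard_name, units, _ in variables:
        lines.append(f"\tfloat {standard_name}(time) ;")
        lines.append(f'\t\t{standard_name}:standard_name = "{standard_name}" ;')
        lines.append(f'\t\t{standard_name}:units = "{units}" ;')
    lines.append("data:")
    lines.append("\ttime = " + ", ".join(str(t) for t in times) + " ;")
    for standard_name, _, values in variables:
        rendered = ", ".join(repr(float(v)) for v in values)
        lines.append(f"\t{standard_name} = {rendered} ;")
    lines.append("}")
    return "\n".join(lines) + "\n"


print(to_cdl("carocoops_CAP2_buoy", [0, 3600],
             [("wind_speed", "m s-1", [5.0, 6.0])]))
```

Text output of this kind is easy to inspect by eye, which matches the purpose of the return_type=txt option.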

XML, Web Services (http+xml): Making data/processing more available/transparent

The combination of these two tools will hopefully allow new data providers to participate more easily in the Seacoos/IOOS aggregation/integration efforts; the only effort required of the data provider is describing their underlying data using the online web form. Having these data sources described in XML, relating the key dimensions of time, space and observation type, also opens up other processing opportunities. The ability to easily mix and match various community developed tools via web services is something which should also be pursued in our own processing efforts, such as qa/qc and database population scripts.

-- JeremyCothran - 12 May 2005

Update: June 1, 2005

Filters/Pipeline processes

While working through an actual dataset, a filter service was developed to transform data from one format to another. A filter takes the same parameters as above (xml descriptor file and data file) plus a requested filter service and filter arguments, and transforms one or several columns in the dataset into a more usable form. The dataset below needed its date broken out into a more standard format before it could be processed to Seacoos netcdf format. The unfiltered file has a date reading in the third column

http://www.dnal.gatech.edu/wavebuoy/status/status_200411010030.csv

which is transformed to a better format using the appended filter arguments on a call to seacoos_filter.php (filter=time_filter_1&filter_args=breakoutYYYYMMDDHHMM)

'_SEP_' becomes the field separator string for the file.

http://nautilus.baruch.sc.edu/services/seacoos_filter.php?xml_source=http://trident.baruch.sc.edu/storm_surge_data/seacoos_xml/seacoos_GTSAV1_buoy_status_hdr.xml&data_source=http://www.dnal.gatech.edu/wavebuoy/status/status_200411010030.csv&filter=time_filter_1&filter_args=breakoutYYYYMMDDHHMM
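The kind of transformation this filter performs can be sketched as follows. The details are assumptions: the input is taken to be comma-separated with a compact 12-digit YYYYMMDDHHMM timestamp in the third column, and the sample row is made up. The attached 'time_filter_1.pl' is the authoritative version.

```python
import csv
import io
from datetime import datetime


def breakout_yyyymmddhhmm(csv_text, time_column=3, out_separator="_SEP_"):
    """Rewrite a compact timestamp column and join fields with out_separator."""
    out_lines = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields:
            continue  # skip empty lines
        raw = fields[time_column - 1].strip()
        try:
            stamp = datetime.strptime(raw, "%Y%m%d%H%M")
            fields[time_column - 1] = stamp.strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            pass  # leave header rows and unparseable values unchanged
        out_lines.append(out_separator.join(fields))
    return "\n".join(out_lines)


# Made-up row for illustration only.
print(breakout_yyyymmddhhmm("GTSAV1,ok,200411010030,12.5\n"))
```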

The argument 'file_ref=true' is appended to give a reference to the transformed file instead of the file itself for use in chaining the output as input to a further step.

http://nautilus.baruch.sc.edu/services/seacoos_filter.php?xml_source=http://trident.baruch.sc.edu/storm_surge_data/seacoos_xml/seacoos_GTSAV1_buoy_status_hdr.xml&data_source=http://www.dnal.gatech.edu/wavebuoy/status/status_200411010030.csv&filter=time_filter_1&filter_args=breakoutYYYYMMDDHHMM&file_ref=true

The design of the example filter code is fairly generic and changing the functionality to transform given columns should be relatively straightforward to produce other filters which can be chained together in a data pre-processing/scrubbing operation.
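Chaining can be pictured as applying a list of small column filters to each row in turn. The sketch below is a local, in-process analogue of the idea rather than the web service itself; `column_filter` and `run_pipeline` are hypothetical names.

```python
from functools import reduce
from typing import Callable, Iterable, List

Row = List[str]
Filter = Callable[[Row], Row]


def column_filter(column: int, transform: Callable[[str], str]) -> Filter:
    """Build a filter that applies transform to one 1-based column."""
    def apply(row: Row) -> Row:
        changed = list(row)  # copy so the input row is not modified
        changed[column - 1] = transform(changed[column - 1])
        return changed
    return apply


def run_pipeline(rows: Iterable[Row], filters: Iterable[Filter]) -> List[Row]:
    """Pass every row through each filter in order."""
    steps = list(filters)
    return [reduce(lambda row, step: step(row), steps, row) for row in rows]


pipeline = [column_filter(1, str.strip), column_filter(1, str.upper)]
print(run_pipeline([[" cap2 ", "5.0"]], pipeline))
```

Because every filter has the same row-in, row-out shape, new transformations can be added to a pre-processing chain without changing the ones already there.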

Harvesting ASCII to Seacoos netCDF

The harvesting process is based on the existing model for Seacoos netcdf format file processing. A preliminary harvesting step pre-processes raw column oriented ASCII data into the Seacoos netcdf format, so that the secondary harvester can work with those files.

The shell process called regularly by cron is get_outside.sh

This in turn calls get_latest_data.pl with a subcall to get_latest_listing.pl and mk_netcdf_latest.pl

Processes use working subdirectories seacoos_xml, netcdf_latest and fetch_logs.

Update: July 27, 2005 python csv transform to wfs service

Charlton clued me in to the following link, which uses python to create a WFS service from a csv file
http://zcologia.com/news/31

Attachment                        Size     Date                  Who            Comment
seacoos_xml_v1.0.xsd              4.5 K    01 Jun 2005 - 04:18   JeremyCothran  NA
seacoos_xml.html                  12.4 K   01 Jun 2005 - 03:19   JeremyCothran  NA
seacoos_xml_validate.php.txt      8.0 K    01 Jun 2005 - 03:19   JeremyCothran  NA
seacoos_xml.pl.txt                2.6 K    01 Jun 2005 - 03:20   JeremyCothran  NA
seacoos_xml_template.txt          4.2 K    01 Jun 2005 - 03:20   JeremyCothran  NA
seacoos_xml_netcdf.php.txt        0.5 K    12 May 2005 - 14:54   JeremyCothran  NA
seacoos_xml_netcdf.pl.txt         8.5 K    01 Jun 2005 - 03:20   JeremyCothran  NA
seacoos_xml_netcdf_template.txt   1.7 K    12 May 2005 - 14:55   JeremyCothran  NA
seacoos_filter.php.txt            0.9 K    01 Jun 2005 - 03:51   JeremyCothran  NA
time_filter_1.pl.txt              4.6 K    01 Jun 2005 - 03:51   JeremyCothran  NA
get_latest_data.pl.txt            2.0 K    01 Jun 2005 - 04:05   JeremyCothran  NA
get_latest_listing.pl.txt         2.1 K    01 Jun 2005 - 04:05   JeremyCothran  NA
mk_netcdf_latest.pl.txt           0.6 K    01 Jun 2005 - 04:05   JeremyCothran  NA
get_outside.sh.txt                0.2 K    01 Jun 2005 - 04:06   JeremyCothran  NA

Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.