This page was excerpted from an email(July 2004) to another tech asking for sample code of how I go from our platform data streams to populating the relational database.
Attached 'buoy2.sql' which is postgresql sql used to create one of our buoy tables (this is buoy2, 1 table per buoy) a year and a half ago. Also attached a sample buoy2 input file that is received from the modem every 2 hours along with the perl script 'processBuoys.pl' used to parse and store the data fields to the database. 'processBuoys.ini' is the support text file used for setting environment parameters across scripts. Also included a telnet script 'telnetWLS.pl used to poll our water level stations just to give more ideas about how quick it is to write perl script for grabbing and parsing data for insertion to the database.
use the above sql 'by platform' kind of table layout, but I'd say to go instead with a 'by variable' approach to your tables with separate tables for variables, and support tables platforms and sensors. So for example, you might have this postgresql for a 'water_temperature' table:
CREATE TABLE water_temperature
(row_id SERIAL PRIMARY KEY,
row_update_date DATETIME, /* optional? */
station_id VARCHAR(32) NOT NULL,
The SERIAL type is like an autocount datatype and will create a row_id_seq sequence that will get autoincremented and used to store in that field for every insert.
I added 'row_update_date' even though I don't think it should be necessary to update what are essentially final measurements.
The water level station tables are set up on a 'by variable' basis with station_id a unique key for each station. The label for station_id can serve as a foreign key into a station table which lists each of the stations and their attributes. Note that a station could potentially be a fixed or moving platform.
I added sensor_id because some stations will measure the same variable at different points like ADCP where you might have 10 bins each measuring water current and direction at different depths. The definition of sensor_id on a sensor table is dependent on the station_id. You could have one sensor table, but you'd want the station id as part of the primary key to distinguish between identical sensors at different stations.
Note that a variable doesn’t always have to be a singular measurement - it might be a combination of two measurands like 'wind' which might include 'speed' and 'direction'.
qc_level and qc_flag are similar to what I mentioned was discussed at the QARTOD meeting http://www.qartod.org
. qc_level is simply an integer in the set [0,1,2,3,-9] to denote what the level of qc processing has been performed on the data (good, bad, suspect, ...). qc_flag could hold one or several qc flags to denote the qc results as performed on the data using a given flagging scheme such as that offered by NDBC http://www.ndbc.noaa.gov/handbook.pdf
There are obviously lots of other support tables you can build, but if you have this kind of basic schema broken down by station/sensor/variable, it should make getting the data into models or GIS easier since those tend to be more variable than platform focused.
- 28 Jan 2005