Difference between revisions of "Data Warehouse"

From HiveTool

Revision as of 02:43, 3 April 2015

Since HiveTool is open source/open notebook, the entire primary record is publicly available online as it is recorded. Two databases are used: Operational and Research.

Each hive sends in data every five minutes, 288 times a day, inserting over 100,000 rows a year (288 × 365 = 105,120) into the Operational Database. One thousand hives would generate over 100 million rows per year. In addition to the measured data, there are external factors that need to be systematically and consistently documented. This metadata includes hive genetics, manipulations, mite treatments, and data conversion and calibration formulas.
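The row counts above follow from simple arithmetic, sketched below. The function and constant names are illustrative only and are not part of HiveTool; the sketch assumes a 365-day year and no missed samples.

```python
# Back-of-the-envelope row volume for the Operational Database.
# Assumes one row per sample, a 365-day year, and no missed samples.
MINUTES_PER_DAY = 24 * 60
SAMPLE_INTERVAL_MIN = 5  # each hive reports every five minutes


def rows_per_year(hives: int = 1, days: int = 365) -> int:
    """Return the number of rows inserted per year for the given hive count."""
    samples_per_day = MINUTES_PER_DAY // SAMPLE_INTERVAL_MIN  # 288
    return hives * samples_per_day * days


print(rows_per_year())      # 105120
print(rows_per_year(1000))  # 105120000
```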

The Data Center for Honeybee Research stores, organizes and provides access to the Research Database.

Operational and Research Databases

The procedures that move the data from the Operational Database to the Research Database should:

  • Structure the data so that it makes sense to the researcher.
  • Structure the data to optimize query performance.
  • Make research and decision-support queries easier to write.
  • Maintain data and conversion history.
  • Improve data quality by flagging and fixing bad data and assigning quality codes and descriptions.
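As a rough illustration of the last goal, a quality code and description could be attached to each reading as it moves to the Research Database. Everything in this sketch is an assumption made for the example: the `Reading` fields, the numeric codes, and the plausible-weight range are hypothetical and do not describe HiveTool's actual schema or rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Hypothetical record layout; not the actual HiveTool schema."""
    hive_id: str
    weight_lb: Optional[float]
    quality_code: int = 0      # 0 = no problem found (assumed convention)
    quality_note: str = "ok"


def flag_reading(r: Reading, min_lb: float = 0.0, max_lb: float = 500.0) -> Reading:
    """Assign a quality code and description instead of silently dropping bad data.

    The range limits are placeholder values chosen for illustration.
    """
    if r.weight_lb is None:
        r.quality_code, r.quality_note = 1, "missing value"
    elif not (min_lb <= r.weight_lb <= max_lb):
        r.quality_code, r.quality_note = 2, "out of plausible range"
    return r


flagged = flag_reading(Reading("hive-001", 1200.0))
print(flagged.quality_code, flagged.quality_note)  # 2 out of plausible range
```

Keeping the flagged row, rather than deleting it, preserves the primary record while letting researchers filter on the quality code.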

The data needs to be:

  • cleaned up,
  • converted (lb <=> kg, Fahrenheit <=> Celsius),
  • partitioned into yearly or seasonal periods,
  • transformed (manipulation changes filtered out),
  • summarized (daily weight changes),
  • cataloged and tied into the metadata (foundation type, hive orientation, mite treatment, etc.),
  • tracked and controlled with version control software,
  • released for use by researchers for data mining, online analytical processing, research and decision support.
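Two of the steps above, unit conversion and daily summarization, can be sketched briefly. The function names and the input layout (timestamp and weight pairs) are assumptions for this example. The daily change is computed here simply as the last reading minus the first reading of each calendar day, which is only one possible definition and does not filter out manipulation changes.

```python
from collections import defaultdict
from datetime import datetime

LB_TO_KG = 0.45359237  # exact by definition of the international pound


def lb_to_kg(lb: float) -> float:
    return lb * LB_TO_KG


def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0


def daily_weight_change(readings):
    """Map each calendar day to (last weight - first weight) for that day.

    `readings` is an iterable of (datetime, weight) pairs in any order.
    """
    by_day = defaultdict(list)
    for ts, weight in sorted(readings):  # sort chronologically
        by_day[ts.date()].append(weight)
    return {day: ws[-1] - ws[0] for day, ws in by_day.items()}


sample = [
    (datetime(2015, 4, 3, 0, 0), 100.0),
    (datetime(2015, 4, 3, 23, 55), 101.5),
    (datetime(2015, 4, 4, 0, 0), 101.5),
    (datetime(2015, 4, 4, 23, 55), 101.0),
]
for day, change in daily_weight_change(sample).items():
    print(day, change)
```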

Data Warehouse