Data Warehouse
From HiveTool
Revision as of 08:43, 9 May 2014

HiveTool is an open source, open notebook project. The entire primary record is publicly available online as it is recorded; there is no 'insider information'. This transparent approach to research includes making failed and otherwise unpublished experiments available.

Storing, organizing and providing access to the data for research is challenging. The measurements bring in large amounts of data: each hive sends in data every five minutes (288 times a day), inserting over 100,000 rows a year into the Operational Database.
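The row counts above follow directly from the five-minute sampling interval; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the row counts quoted above.
MINUTES_PER_DAY = 24 * 60

# One reading every five minutes.
readings_per_day = MINUTES_PER_DAY // 5   # 288 readings a day
rows_per_year = readings_per_day * 365    # "over 100,000 rows a year"

print(readings_per_day, rows_per_year)    # 288 105120
```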

In addition to the measured data, other external factors need to be systematically documented. Metadata includes hive genetics, manipulation records, which mite treatment is used, etc.
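A metadata record of this kind might be kept per hive alongside the measurements. The field names and values below are illustrative assumptions, not HiveTool's actual schema:

```python
# Sketch of one hive's metadata record. All field names and values
# are illustrative assumptions, not HiveTool's actual schema.
hive_metadata = {
    "hive_id": 42,
    "genetics": "Carniolan",            # hive genetics
    "manipulations": [                  # manipulation records
        {"date": "2014-04-12", "action": "added honey super"},
    ],
    "mite_treatment": "oxalic acid",    # which mite treatment is used
}

print(sorted(hive_metadata))
```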

[Image: Operational and Research Databases]

The procedures that move the data from the Operational Database to the Research Database should:

  • Structure the data so that it makes sense to the researcher.
  • Structure the data to optimize query performance, even for complex analytic queries, without impacting the operational systems.
  • Make research and decision-support queries easier to write.
  • Maintain data and conversion history.
  • Improve data quality with consistent quality codes and descriptions, flagging and fixing bad data.
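The last procedure, flagging bad data with consistent quality codes, could be sketched as follows. The column names, thresholds, and codes here are assumptions for illustration, not HiveTool's actual rules:

```python
# Sketch: assign a quality code to each raw reading before it is moved
# into the Research Database. Thresholds and codes are illustrative
# assumptions, not HiveTool's actual rules.

GOOD, SUSPECT, BAD = "G", "S", "B"

def quality_code(reading):
    """Return a quality code for one raw reading (a dict)."""
    weight = reading.get("weight_kg")
    temp = reading.get("temp_c")
    if weight is None or temp is None:
        return BAD                      # missing measurement
    if not (0 <= weight <= 200):
        return BAD                      # physically implausible weight
    if not (-40 <= temp <= 60):
        return BAD                      # implausible hive temperature
    if abs(reading.get("delta_kg", 0)) > 5:
        return SUSPECT                  # sudden jump: a manipulation?
    return GOOD

readings = [
    {"weight_kg": 35.2, "temp_c": 34.1, "delta_kg": 0.1},
    {"weight_kg": 30.0, "temp_c": 34.0, "delta_kg": -8.0},  # super removed?
    {"weight_kg": None, "temp_c": 33.9},                    # sensor dropout
]
codes = [quality_code(r) for r in readings]
print(codes)  # ['G', 'S', 'B']
```

Suspect readings can then be reviewed (or filtered as manipulation changes) rather than silently discarded, which preserves the conversion history the procedures call for.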

The data needs to be:

  • cleaned up
  • converted (lb <=> kg, Fahrenheit <=> Celsius)
  • transformed (manipulation changes filtered out)
  • cataloged and
  • made available for use by researchers for data mining, online analytical processing, research and decision support
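The conversions in the list above (lb <=> kg, Fahrenheit <=> Celsius) use standard constants; the function names below are just illustrative:

```python
# Standard unit conversions applied when loading readings into the
# Research Database. Function names are illustrative.

LB_PER_KG = 2.20462262185   # pounds per kilogram

def lb_to_kg(lb):
    return lb / LB_PER_KG

def kg_to_lb(kg):
    return kg * LB_PER_KG

def f_to_c(f):
    return (f - 32.0) * 5.0 / 9.0

def c_to_f(c):
    return c * 9.0 / 5.0 + 32.0

print(round(lb_to_kg(100), 2))  # 45.36 (kg)
print(f_to_c(95.0))             # 35.0 (degrees C)
```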

[Image: Data Warehouse]