goodygracious.com

Capturing the Data and Making It Useful

 
Author: Aaron Hall
 

Redesigning GDB and GSDB

The explosive growth of information and the challenges of acquiring, representing, and providing access to data pose new and monumental tasks for the large public databases. Ken Fasman [Genome Database (GDB)] and Gifford Keen [Genome Sequence Data Base (GSDB)] discussed the restructuring of GDB and GSDB to handle the flood of data and make it useful for downstream biology.

GDB

Observing that one can't scroll or BLAST through 3 billion base pairs in a meaningful way, Fasman defined GDB's future role as the coordination site for the complete electronic description of the human genome (http://www.gdb.org/). The map, he asserted, provides an ideal framework for jumping into the sequence.

Fasman described the extensive changes made to GDB over the last 2 years that have culminated in the enhanced representation of genomic maps and gene information in GDB V6.0, which was released early this year [HGN 7(3-4), 13-14 and 7(5), 15].

The redesigned database schema and front-end interfaces now provide true graphical representation of genetic and physical maps; direct community editing and curation, including third-party annotation; and an improved model for gene information that includes links to databases describing function, structure, products, expression, and associated phenotypes. A user can create a link from any GDB object to any other entity on the Internet. GDB plans to become the focal point for accessing information about the human genome.

Under the Hood

New technologies used in developing V6.0 include an object-oriented data model, an object broker, a data-driven WWW interface, and graphical interfaces for the most popular computer platforms. The new GDB architecture depends heavily on the Object-Protocol Model (OPM) developed by Victor Markowitz and colleagues at LBNL (see "GDB-LBNL"). The GDB V6.0 data representation is captured in a schema file that drives all other pieces of software. This new architecture will enable GDB to adapt more quickly to changes in biological knowledge and in the representation of maps, genes, and other structures.
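
As a rough illustration of what it can mean for one schema file to drive the other pieces of software, the Python sketch below derives both a table definition and a set of form labels from a single schema description. The schema contents, field names, type mapping, and helper functions are all hypothetical; this is not OPM syntax and not GDB's actual schema.

```python
# Hypothetical schema description; not OPM syntax and not GDB's real schema.
SCHEMA = {
    "Gene": {
        "symbol": "text",
        "chromosome": "text",
        "start_position": "integer",
    },
}

# Assumed mapping from schema types to SQL column types.
SQL_TYPES = {"text": "VARCHAR(255)", "integer": "INT"}


def create_table_sql(class_name: str) -> str:
    """Generate a CREATE TABLE statement for one schema class."""
    columns = ", ".join(
        f"{field} {SQL_TYPES[kind]}" for field, kind in SCHEMA[class_name].items()
    )
    return f"CREATE TABLE {class_name} ({columns})"


def form_fields(class_name: str) -> list[str]:
    """Generate the labels an editing form would show for the same class."""
    return [field.replace("_", " ").title() for field in SCHEMA[class_name]]


print(create_table_sql("Gene"))
print(form_fields("Gene"))
```

Adding a field to SCHEMA changes both the generated table and the generated form without touching either function, which is the kind of adaptability the schema-driven design is aiming for.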

At the heart of the system is a Sybase database server that communicates in SQL, the relational query language. Everything from that point forward deals in complex objects, rather than in the rows and tables of a relational database.
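
The split between a relational server and an object view can be sketched as follows. This is a minimal Python illustration under stated assumptions: an in-memory SQLite database stands in for the Sybase server, and the table names, columns, gene symbol, and URLs are invented for the example rather than taken from GDB.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Gene:
    """A complex object assembled from rows in two relational tables."""
    symbol: str
    chromosome: str
    links: list[str] = field(default_factory=list)


def load_gene(conn: sqlite3.Connection, symbol: str) -> Gene:
    """Query the server in SQL, then hand back an object instead of rows."""
    row = conn.execute(
        "SELECT id, symbol, chromosome FROM gene WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row is None:
        raise KeyError(symbol)
    gene_id, sym, chromosome = row
    link_rows = conn.execute(
        "SELECT url FROM gene_link WHERE gene_id = ? ORDER BY url", (gene_id,)
    ).fetchall()
    return Gene(sym, chromosome, [url for (url,) in link_rows])


# In-memory SQLite stands in for the Sybase server; the tables are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT);
    CREATE TABLE gene_link (gene_id INTEGER, url TEXT);
    INSERT INTO gene VALUES (1, 'EXAMPLE1', '7');
    INSERT INTO gene_link VALUES (1, 'http://example.org/function');
    INSERT INTO gene_link VALUES (1, 'http://example.org/expression');
    """
)
gene = load_gene(conn, "EXAMPLE1")
print(gene.symbol, gene.chromosome, gene.links)
```

Code that calls load_gene never sees a row or a join; it works only with the Gene object and its list of external links.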

Goals

Future enhancements will include improved map editing, an integrated editing environment, improved polymorphism and mutation representation, and integration with the specialized GSDB Sequence Annotator and Mouse Genome Database interfaces. To tie GDB to the evolving sequence databases, an interface is being developed to represent gene structure maps (maps of introns, exons, and regulatory regions associated with genes).
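
One possible way to represent a gene structure map is as an ordered list of typed regions. The Python sketch below is only an assumption about how such a map could look; the Region class, the coordinate convention, and the coordinates themselves are invented and do not describe the interface GDB is developing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str   # "exon", "intron", or "regulatory"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def introns_between(exons: list[Region]) -> list[Region]:
    """Derive introns as the gaps between consecutive, non-overlapping exons."""
    ordered = sorted(exons, key=lambda r: r.start)
    return [
        Region("intron", left.end + 1, right.start - 1)
        for left, right in zip(ordered, ordered[1:])
        if right.start - left.end > 1
    ]


# Invented coordinates for a hypothetical gene.
promoter = Region("regulatory", 1, 100)
exons = [Region("exon", 101, 250), Region("exon", 401, 520)]
gene_map = sorted([promoter, *exons, *introns_between(exons)], key=lambda r: r.start)
for region in gene_map:
    print(region.kind, region.start, region.end, region.length)
```

Because every region carries coordinates, a map like this can be tied to a sequence database entry by position alone.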

GSDB

Keen identified data acquisition, representation, and access as major issues for sequence databases.

Capturing and Annotating Data

Data acquisition is a two-part challenge, he said. Vast quantities of sequence data will be captured with custom software for bulk-submission processes; future plans include database-to-database communication so that data can be downloaded directly from laboratories into GSDB. The more difficult task in data acquisition, he noted, is capturing the follow-on sequence annotation, which is usually published in print journals and subsequently "lost." This data will be crucial for studying gene expression, variation, and function. GSDB Annotator, a graphical browser and editor, is being developed to facilitate community annotation of the database. Researchers are also working to provide access to such common analysis algorithms as BLAST and GRAIL.

Data Representation: Building Whole Chromosomes

In addition to captured sequences and annotations, information needs to be generated about relationships between sequences. The data must be maintained in a form capable of supporting complex, ad hoc queries. For the near future, GSDB is working toward a model of 24 representative sequences for humans, one for each chromosome. As data comes in, it will be aligned to the representative sequence, which initially will have many gaps. Keen likened GSDB to a community laboratory information-management system supporting what is essentially a multiyear, multilaboratory, multiorganism shotgun-assembly process. Feature accession numbers will enable annotation to be separated from sequences.
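
The idea of a gapped representative sequence with separately stored annotation can be sketched in Python as below. Everything here is an assumption made for illustration: the "N" gap symbol, the toy 20-base chromosome, the accession "F0001", and the premise that each fragment's offset is already known. Real alignment and conflict resolution are not shown.

```python
GAP = "N"  # assumed placeholder for positions with no sequence yet


def place_fragment(representative: list[str], offset: int, fragment: str) -> None:
    """Write a fragment into the representative sequence at a 0-based offset.

    Assumes the offset is already known (for example, from a map position).
    """
    if offset < 0 or offset + len(fragment) > len(representative):
        raise ValueError("fragment falls outside the representative sequence")
    representative[offset:offset + len(fragment)] = fragment


def gap_count(representative: list[str]) -> int:
    """Count positions that are still unsequenced."""
    return representative.count(GAP)


# A toy 20-base "chromosome" that starts as all gaps.
chromosome = [GAP] * 20
place_fragment(chromosome, 2, "ACGT")
place_fragment(chromosome, 10, "GGCCA")

# Annotation lives in its own table, keyed by a made-up feature accession,
# and refers to the sequence only by 0-based, end-exclusive coordinates.
features = {"F0001": {"kind": "exon", "start": 10, "end": 15}}

print("".join(chromosome), gap_count(chromosome))
hit = features["F0001"]
print("".join(chromosome[hit["start"]:hit["end"]]))
```

Keeping features in their own table means an annotation can be added, corrected, or withdrawn by accession without rewriting the sequence it points to.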

Data Access

Although GSDB has the tools and the structure (normalized and atomized data) to answer robust queries, such as those about annotation relationships, problems with data quality and consistency prevent it from answering them well. GSDB is now mounting a major effort to develop software for rationalizing the data stream as it enters the database.

GSDB has also developed an object-oriented access library that sits on top of the database. Almost all GSDB applications and the software that imports data from other databases work through this object layer. GSDB will make the object libraries and an application programming interface available to the public. Programmatic access will be through assigned accounts, and the database can be accessed either through the object libraries or directly on the table, row, and column level.

Availability

The new GSDB schema is complete and should be operational later this year. After fairly extensive alpha and beta testing, GSDB Annotator should be released at the same time on Mac and Sun, with Windows to follow. Software will be available via FTP from NCGR's Web site.

 
 
 

Copyright © www.goodygracious.com - All Rights Reserved Worldwide.