May 08, 2018

ADAM

ADAM: Open-Source Software in Healthcare Series – 8

 

  • Name: ADAM
  • Category: Genomics Sequencing
  • Programming Language: Scala (runs on Apache Spark)

Overview

ADAM is an open-source library and command-line toolkit for analyzing genomic data. It is a set of formats, APIs, and processing stage implementations focused on improving cross-platform portability, which can ultimately lead to performance improvements. ADAM was designed to address problems with distributing sequence data and parallelizing its processing, including issues with scalability and compression.[i]
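The idea of parallelizing the processing of sequence data can be illustrated with a minimal sketch. It does not use ADAM or Spark: the reads, the function names, and the use of a local thread pool are all assumptions made for illustration, standing in for the partition-then-merge pattern that a distributed engine applies across a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy reads (hypothetical); real input would come from sequencing files.
READS = ["ACGTAC", "GGGTCA", "TTACGA", "CCGTAA", "ACACAC", "GTGTGT"]


def partition(items, n_parts):
    """Split items into n_parts slices, assigning items round-robin."""
    return [items[i::n_parts] for i in range(n_parts)]


def count_bases(reads):
    """Per-partition step: count how often each base occurs."""
    counts = Counter()
    for read in reads:
        counts.update(read)
    return counts


def parallel_base_counts(reads, n_parts=3):
    """Count each partition independently, then merge the partial counts."""
    parts = partition(reads, n_parts)
    # A thread pool stands in for cluster workers; it is an illustration only.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_bases, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    print(dict(parallel_base_counts(READS)))
```

Because each partition is counted without reference to the others, the same two steps (independent per-partition work, then a merge) carry over to settings where the partitions live on different machines.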

As a growing field in medical research, genomics deals with vast amounts of data as well as rapidly evolving sequencing technologies. Until relatively recently, software for working with genetic sequence data was difficult to use: the field was filled with individually developed tools, relied on files rather than databases and on file formats rather than data models, and offered little to no parallelism across computing environments,[ii] even though parallelism is vital in genomic sequencing work.

Pros

The biggest benefit of ADAM is that it allows genomic sequence data to be analyzed at a scale that was not previously practical. It addresses needs that existing genomics applications left unmet, including scalability and interoperability. Built on Scala and Spark, it tackles scalability in a way that also provides better compression. It also offers efficient encoding of nested type structures, optimized distributed storage, efficient schema-based serialization, and the ability to process in parallel the random 100-150 base reads produced by massively parallel sequencing. ADAM structures the data in a way that is indexable and creates libraries.[iii] This combination of features makes it a unique platform in the open-source genomics field.
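To make schema-based serialization and compression-friendly storage more concrete, here is a small sketch using only the Python standard library. ADAM itself builds on Apache Avro schemas and Apache Parquet storage; this example uses neither. The schema, field names, and synthetic records below are hypothetical and only illustrate the general idea of validating records against a fixed schema and storing them column by column.

```python
import json
import zlib

# Hypothetical, simplified read schema: field names and types are fixed up front.
SCHEMA = (("contig", str), ("start", int), ("mapq", int), ("sequence", str))


def make_records(n=1000):
    """Generate synthetic alignment-like records (illustration only)."""
    bases = "ACGT"
    return [
        {
            "contig": "chr1",
            "start": 100 + i,
            "mapq": 60,
            "sequence": "".join(bases[(i + j) % 4] for j in range(20)),
        }
        for i in range(n)
    ]


def validate(record):
    """Reject records whose fields or types do not match the schema."""
    if set(record) != {name for name, _ in SCHEMA}:
        raise ValueError("record fields do not match the schema")
    for name, typ in SCHEMA:
        if not isinstance(record[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")


def to_columns(records):
    """Pivot row records into one list of values per schema field."""
    for record in records:
        validate(record)
    return {name: [r[name] for r in records] for name, _ in SCHEMA}


def from_columns(columns):
    """Rebuild row records from the column lists."""
    names = [name for name, _ in SCHEMA]
    return [dict(zip(names, row)) for row in zip(*(columns[n] for n in names))]


def compressed_size(obj):
    """Size in bytes of the zlib-compressed JSON encoding of obj."""
    return len(zlib.compress(json.dumps(obj).encode("utf-8")))


if __name__ == "__main__":
    rows = make_records()
    cols = to_columns(rows)
    print("raw rows:          ", len(json.dumps(rows).encode("utf-8")))
    print("compressed rows:   ", compressed_size(rows))
    print("compressed columns:", compressed_size(cols))
```

Grouping values of the same field together tends to place similar data side by side, which is one reason columnar layouts often compress well on repetitive data; the exact sizes printed depend on the input.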

ADAM is also relatively easy to use. Its code is much simpler than that of its predecessors, and the result is a platform that is much more effective at organizing large genomics analysis pipelines and workflows. Because it is compatible with Hadoop, it is easy to deploy and support in most existing bioinformatics infrastructures. The learning curve is relatively low, with beginners needing only a basic familiarity with Spark. The data science tools it works with include R, Python, Tableau, and Spotfire, which are very common and user-friendly.

Cons

One of the bigger factors that makes ADAM appealing is its ability to function as a database, although it has only recently shown compatibility with SQL. Despite Spark SQL’s introduction in 2015, it did not work with ADAM until recent changes were made to the platform. Those changes now allow queries against data, provided it is in one of the more common genomics file formats. This capability does not extend to all formats.
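As a rough illustration of what querying genomic data with SQL looks like, the sketch below runs a region query over a handful of alignment-like rows. It deliberately swaps in Python's built-in sqlite3 module for Spark SQL and does not use ADAM; the table layout, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical, simplified alignment rows: (read_name, contig, start, end, mapq).
ALIGNMENTS = [
    ("read1", "chr1", 100, 250, 60),
    ("read2", "chr1", 240, 390, 20),
    ("read3", "chr2", 100, 250, 60),
    ("read4", "chr1", 500, 650, 55),
]


def load(alignments):
    """Create an in-memory table and fill it with the alignment rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alignments ("
        "read_name TEXT, contig TEXT, start_pos INTEGER, "
        "end_pos INTEGER, mapq INTEGER)"
    )
    conn.executemany("INSERT INTO alignments VALUES (?, ?, ?, ?, ?)", alignments)
    return conn


def reads_overlapping(conn, contig, start, end, min_mapq=0):
    """Return names of reads overlapping the half-open region [start, end)."""
    rows = conn.execute(
        "SELECT read_name FROM alignments "
        "WHERE contig = ? AND start_pos < ? AND end_pos > ? AND mapq >= ? "
        "ORDER BY start_pos",
        (contig, end, start, min_mapq),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = load(ALIGNMENTS)
    print(reads_overlapping(conn, "chr1", 200, 300))
    print(reads_overlapping(conn, "chr1", 200, 300, min_mapq=30))
```

The appeal of a SQL interface is that a filter such as "reads on this contig, overlapping this region, above this mapping quality" becomes a single declarative statement rather than custom file-parsing code.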

ADAM also has limited API support across programming languages. While it is a leap forward in cross-platform compatibility, some languages still cannot be used with the software. The developers acknowledge this shortcoming and intend to broaden access to the platform over time.[iv]

Similar Open-Source Software:

[OpenAPS : https://blueehr.com/blogs/openaps/]

[Carekit : https://blueehr.com/blogs/carekit/]

The Future

ADAM was initially developed by a group at the University of California, Berkeley, but the project is now supported, in part, by a National Institutes of Health (NIH) BD2K award as well as an NIH Cancer Cloud Pilot award. Its network of contributors has also grown to include individuals from UC Santa Cruz, the Icahn School of Medicine at Mount Sinai, Microsoft Research, Cloudera, and the Broad Institute.[v] The platform also appears to have a robust group of users and contributors who regularly post to the wiki. ADAM appears to be a critical open-source tool in genomics research, one that will likely be used, and improved upon, extensively in the coming years.

 


[i] Massie, M., Nothaft, F., Hartl, C., Kozanitis, C., Schumacher, A., Joseph, A. D., & Patterson, D. A. (2013). ADAM: Genomics formats and processing patterns for cloud scale computing. [Technical Report]. Electrical Engineering and Computer Sciences, University of California at Berkeley.

[ii] Danford, T. (2016). Cancer genomics analysis in the cloud with Spark and ADAM. Strata + Hadoop World.

[iii] Petrella, A. (2014). Lightning fast genomics with Spark, Adam, and Scala. [Presentation].

[iv] Massie, M., Nothaft, F., Hartl, C., Kozanitis, C., Schumacher, A., Joseph, A. D., & Patterson, D. A. (2013). ADAM: Genomics formats and processing patterns for cloud scale computing. [Technical Report]. Electrical Engineering and Computer Sciences, University of California at Berkeley.

[v] Big Data Genomics. (2018). About Us.

Publisher: ZH Healthcare