Open Source Genomic Insights at Scale

Integration with Spark

Leverage the Spark execution engine, the Scala and Python SDKs, and SQL, together with the Parquet columnar storage format.

Advanced query tool for bioinformaticians

Work with genomic and phenotypic tabular data using a declarative relational query language on a parallel execution engine.

Genomic ordered data architecture

Efficient data structures and commands for genomic analysis use cases, such as range queries and table joins.
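To illustrate why genomic ordering helps range queries, here is a minimal Python sketch. It is not GOR code: the rows, the `range_query` helper, and the use of plain string order for chromosomes are all assumptions made for the example. When rows are kept sorted by (chromosome, position), a range query becomes two binary searches and a slice instead of a full scan.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (chromosome, position, reference allele, alternate allele),
# kept sorted by (chromosome, position). Plain string order for chromosome
# names is an assumption of this sketch.
ROWS = [
    ("chr1", 10_500, "A", "G"),
    ("chr1", 20_250, "C", "T"),
    ("chr1", 35_000, "G", "A"),
    ("chr2", 5_000, "T", "C"),
    ("chr2", 18_400, "G", "C"),
]

# Sort keys extracted once so the binary search compares (chrom, pos) only.
KEYS = [(chrom, pos) for chrom, pos, _, _ in ROWS]


def range_query(chrom, start, end):
    """Return rows on `chrom` with start <= position <= end (hypothetical helper)."""
    lo = bisect_left(KEYS, (chrom, start))   # first row at or after the range start
    hi = bisect_right(KEYS, (chrom, end))    # one past the last row within the range
    return ROWS[lo:hi]


hits = range_query("chr1", 15_000, 40_000)
```

The same ordering is what lets two tables be joined on position by walking both in step, rather than by hashing or re-sorting either side.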

GORpipe query syntax

Combines the best of SQL and Unix shell pipe syntax, supporting seekable nested queries, materialized views, and a rich set of commands and functions.

Support for external commands

Define new commands in any JVM language or as shell scripts.

Compatible with standard formats

BAM, CRAM, VCF, Tabix, TSV, CSV.

Stored procedures

Set up parameterized functions using YAML and FreeMarker scripts.