Through the Galaxy into the bwGrid
This presentation was designed to be given live, so the live demo parts are not shown here. Sorry for that.

May 2012 • Björn Grüning

  • An analysis and data integration tool

  • Open-source, community-driven software that makes it simple to integrate your own tools

  • Part of GMOD

http://galaxyproject.org

"Enable accessible, reproducible, and transparent computational research."
accessible - Web interface
accessible - REST API
  • data upload and download

  • create and delete histories

  • import and execute workflows

  • connect Galaxy to your sequencer

  • manage users and roles
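
The operations listed above are ordinary HTTP calls. The following is a minimal sketch of how such a request could be assembled in Python. It assumes that resources live under `/api/` (for example `/api/histories`) and that the API key is passed as a `key` query parameter. The server address, the key and the history name are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_galaxy_request(base_url, path, api_key, payload=None):
    """Build (but do not send) an HTTP request for a Galaxy API resource.

    Assumption: resources live under <base_url>/api/ and the API key is
    passed as the 'key' query parameter.
    """
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{base_url.rstrip('/')}/api/{path.lstrip('/')}?{query}"
    if payload is None:
        # No payload: read the resource.
        return urllib.request.Request(url, method="GET")
    # With a payload: create a resource from a JSON body.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server, key and history name, for illustration only.
list_histories = build_galaxy_request(
    "http://localhost:8080", "histories", "MY_API_KEY"
)
create_history = build_galaxy_request(
    "http://localhost:8080", "histories", "MY_API_KEY",
    payload={"name": "bwGrid test"},
)

print(list_histories.get_method(), list_histories.full_url)
print(create_history.get_method(), create_history.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it to a running instance. Check the exact paths and payload fields against the API documentation of your Galaxy version.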

"Enable accessible, reproducible, and transparent computational research."
reproducible - Reload experiments
reproducible - Workflows
"Enable accessible, reproducible, and transparent computational research."
transparent - share everything
transparent - for other researchers
tools
  • Text Manipulation
  • Format Converters
  • Filtering and Sorting
  • Join, Subtract, Group
  • Sequence Tools
  • Multi-species Alignment Tools
  • Genomic Interval Operations
  • Summary Statistics
  • Graphing / Plotting
  • Regional Variation
  • EMBOSS
  • Evolution / Phylogeny
  • RNA-seq
  • ChIP-seq
  • GATK
  • RGenetics
  • CADDSuite
  • ...and more

    Integrating your own tools in Galaxy

    utilizing open source standards
    • XML-based

    • Cheetah template language

    • reStructuredText (docutils) syntax

    • built-in unit tests

    • automatic requirements check

    basic example
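
The example shown on the original slide is not part of this text version. As a stand-in, here is a minimal sketch of what a tool definition could look like, wrapped in a few lines of Python that only check that the XML is well-formed. The tool id, the `coreutils` requirement and the test file names are hypothetical. The element names follow the usual Galaxy tool layout, but should be checked against the tool syntax documentation of your Galaxy version.

```python
import xml.etree.ElementTree as ET

# Hypothetical line-counting tool. $input and $output are template
# variables that Galaxy is assumed to fill in when the job is created.
TOOL_XML = """\
<tool id="line_count" name="Count lines" version="0.1">
    <description>in a text dataset</description>
    <requirements>
        <requirement type="package">coreutils</requirement>
    </requirements>
    <command>wc -l &lt; "$input" &gt; "$output"</command>
    <inputs>
        <param name="input" type="data" format="txt" label="Dataset to count"/>
    </inputs>
    <outputs>
        <data name="output" format="txt"/>
    </outputs>
    <tests>
        <test>
            <param name="input" value="three_lines.txt"/>
            <output name="output" file="three_lines_count.txt"/>
        </test>
    </tests>
    <help>
**What it does**

Counts the lines of the selected dataset.
    </help>
</tool>
"""

# Parsing fails loudly if the definition is not well-formed XML.
tool = ET.fromstring(TOOL_XML)
sections = [child.tag for child in tool]
print(tool.get("id"), sections)
```

Each bullet above maps to one section: `command` holds the template, `help` the reStructuredText, `tests` the unit test and `requirements` the dependency check.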
    advanced features
    • conditionals

    • multiple outputs with different formats

    • built-in parallelism

    • input-dependent selection lists

    • repeat

    • validator

    • config file creation
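
Three of the features above (conditionals, repeats and validators) could be combined as in the following sketch. The parameter names and values are hypothetical, and the `in_range` validator with `min`/`max` attributes is assumed from the Galaxy tool syntax. The small Python helper is not part of Galaxy. It only illustrates the rule that every option of a conditional's selector should have a matching `<when>` block.

```python
import xml.etree.ElementTree as ET

# Hypothetical <inputs> section using a conditional, a validator and a repeat.
INPUTS_XML = """\
<inputs>
    <conditional name="mode">
        <param name="selector" type="select" label="Mode">
            <option value="simple">Simple</option>
            <option value="advanced">Advanced</option>
        </param>
        <when value="simple"/>
        <when value="advanced">
            <param name="threshold" type="float" value="0.5" label="Threshold">
                <validator type="in_range" min="0" max="1"/>
            </param>
        </when>
    </conditional>
    <repeat name="extra_inputs" title="Additional dataset">
        <param name="dataset" type="data" format="txt" label="Dataset"/>
    </repeat>
</inputs>
"""


def unmatched_options(inputs):
    """Return selector option values that lack a corresponding <when> block."""
    missing = []
    for cond in inputs.iter("conditional"):
        # Only the selector param is a direct child of <conditional>.
        options = [o.get("value") for o in cond.findall("./param/option")]
        whens = {w.get("value") for w in cond.findall("when")}
        missing.extend(v for v in options if v not in whens)
    return missing


inputs = ET.fromstring(INPUTS_XML)
print(unmatched_options(inputs))
```

An empty list means every selector option is covered by a `<when>` block.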

    community is key
    • ~1000 additional tools

    • ~315 publications, 150 in 2012

    • 21 known public instances

    • 3000 registered users (galaxy-users)

    • 600 registered developers (galaxy-dev)

    • 400 mails per month

    centralized - Galaxy main
    • ~500 new users per month

    • ~100 TB of user data

    • ~140,000 analysis jobs per month


    A centralized solution cannot scale to meet data analysis demands

    http://usegalaxy.org

    http://omicsmaps.com/

    local Galaxy instances
    • completely self-contained

    • easy to deploy

    • run jobs on existing compute clusters


    http://galaxy.pharmaceutical-bioinformatics.org

    pharmaceutical bioinformatics
    • 37 users

    • 20 jobs per day

    • 1 TB of user data

    • ~50 additional tools

      • genomic annotation pipeline

      • exploring the chemical universe

    • GBrowse integration



    downsides
    • investment in training users

    • missing tools

    • reproducibility and transparency are storage-intensive


    What else?
    The vision!
    • enter the cluster!
    • join forces
    • get more users on the cluster
    • extension beyond a local cluster?

    Marek Dynowski

    Michael Janczyk

    /me

    http://galaxy.bfg.uni-freiburg.de

    http://wiki.g2.bx.psu.edu/Community/GalaxyCzars

    Questions?

    Thank You!

    Björn Grüning

    bjoern.gruening@pharmazie.uni-freiburg.de

    http://www.pharmaceutical-bioinformatics.org