Managing data in your own lab - Ian Berry Software Developer Oxford Protein Production Facility BIOXHIT Working Group 1 Coordinator
Overview • What is information management? • Types of laboratory processes • Benefits of a LIMS • What are the potential pitfalls • The ideal world… and reality • Tools: – PiMS – xtalPiMS – eHTPX – Data Processing and Structure Solution • Data mining
What is information management? • A process for storing information where it can be retrieved later – Processes include human memory and paper systems as well as sophisticated relational database systems – Purpose of retrieval can vary from supporting the next experiment to depositing data – Automated systems may require electronic information management – In a laboratory setting it can be considered a branch of bioinformatics
Types of Information Management • Paper-based records – Well suited to independent research – Long-term archive • Electronic Laboratory Notebooks (ELN) – Central repository of information – Electronic version of paper systems • Laboratory Information Management Systems (LIMS) – Relational database – Model for laboratory processes – Snapshot of current state of laboratory
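To make the "relational database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from any real LIMS; the point is only that samples and experiments become linked rows that can be queried later.

```python
import sqlite3

# In-memory database; a real system would use a persistent server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    input_sample_id INTEGER REFERENCES sample(id),
    output_sample_id INTEGER REFERENCES sample(id)
);
""")

# Hypothetical example rows.
conn.executemany(
    "INSERT INTO sample (id, name, location) VALUES (?, ?, ?)",
    [(1, "plasmid-1", "Freezer A"), (2, "protein-1", "Fridge B")],
)
conn.execute(
    "INSERT INTO experiment (name, input_sample_id, output_sample_id) "
    "VALUES (?, ?, ?)",
    ("Expression 1", 1, 2),
)


def provenance(sample_name):
    """Return (experiment, input sample) pairs that produced a sample."""
    return conn.execute(
        """
        SELECT e.name, s_in.name
        FROM experiment e
        JOIN sample s_in ON s_in.id = e.input_sample_id
        JOIN sample s_out ON s_out.id = e.output_sample_id
        WHERE s_out.name = ?
        """,
        (sample_name,),
    ).fetchall()
```

Calling provenance("protein-1") answers "which experiment, using which input, made this sample?", the kind of retrieval a paper notebook makes slow.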
Types of laboratory process • Full projects (studying one protein?) – Long workflow with many decision points – No two projects are identical • One-off experiments (specific assays?) – Planned for each experiment – Number of inputs and outputs undetermined • Routine experiments (purification?) – Performed the same way for several targets • High-throughput experiments (cloning?) – Using specialised equipment including robots – Tracking samples becomes paramount
Benefits of LIMS • Distributed projects – Information can be accessed anywhere • Collaborative projects – Different people record into same store • Miniaturized projects – Labelling of samples becomes impossible • Automated projects – Handling layouts in plates etc. • High-throughput processes – System managed by computer with automated sample tracking
In an ideal world… • When depositing your new structure you would be able to include: – Exactly how you processed bioinformatics – Exactly how you created the protein • What chemicals, sources, batch numbers, etc. – Exactly how this protein crystallized • All conditions with hits, which were used for collection (and why) – What Ligands / Soaks were used
In an ideal world… • Exactly how the data were collected – Which synchrotron / home source – Which beamline (beamline parameters) • Exactly how the data were processed – Which programs / arguments • Plus… – A summary of the failed experiments to get the final result! i.e. the methods section of your structure paper!
Back to reality… • Good Information management is important: – It means we can do better science – It will mean that you have more time to do the science as more things are recorded automatically* * eventually!
Potential pitfalls of LIMS • Data loss – Hardware failure – manageable – Data corruption – potentially catastrophic • Data integrity – Data need to be described properly – LIMS can default to being ELN • Extra burden of recording data – Takes time for no immediate benefit – Need easy and intuitive input – risk of sloppiness • Compliance – Unrecorded data are lost – Incomplete data may break data “chain”
Potential pitfalls of LIMS • Different Lab practices: – Through the development of these tools it has become obvious…. • Everyone works differently! • Every lab has different processes! • Every lab has a different focus! – Do we create the LIMS according to the processes of one lab and force another to fit with that or make it so generic that it does not model any system perfectly?
What tools are we using? • PiMS for Protein Production • xtalPiMS for Crystallization • eHTPX for Managing Synchrotron trips • ISPyB for managing data collection at the Synchrotron • CCP4, Xtrack (and others) for managing data processing
How do they fit together? [Diagram: PiMS (Protein Production), Crystallization (xtalPiMS), eHTPX, Synchrotron (ISPyB), Data Processing (CCP4), Data Management (Xtrack) and Data Deposition (PDB), with machine integration at several stages.]
What is PiMS? • A software development project aiming to develop an easy-to-use Laboratory Information Management System (LIMS) suitable for tracking the complex and rapidly evolving laboratory practices associated with protein production in the context of structural biology. • PiMS is being developed to commercial software standards of reliability and usability and will be freely available to academic laboratories.
Funding and Usage • UK Funded development with input and support from European labs. • It is being used or evaluated in several labs around the UK and Europe, e.g. Oxford, St Andrews, NKI Amsterdam. • Interest has been shown as far afield as China as well as by several major pharmaceutical companies.
Basic concepts of PiMS PiMS uses a few simple key concepts which can be linked together to model complex workflows • Targets – Description of sequences, store annotations • Constructs – Starting points for real experiments, link to targets • Samples – Tracked samples made & used by experiments – Samples have types, owners, locations etc. • Experiments – Take one (or more) samples, produce new sample(s) as outputs
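The four concepts above can be pictured as plain linked records. The following is a minimal Python sketch, not PiMS's actual schema or API; every class name, field name and data value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    """A described sequence plus free-text annotations."""
    name: str
    sequence: str
    annotations: List[str] = field(default_factory=list)


@dataclass
class Construct:
    """Starting point for real experiments; links back to its target."""
    name: str
    target: Target


@dataclass
class Sample:
    """A tracked sample with a type, an owner and a location."""
    name: str
    sample_type: str
    owner: str = ""
    location: str = ""


@dataclass
class Experiment:
    """Takes one or more input samples and produces output samples."""
    name: str
    inputs: List[Sample]
    outputs: List[Sample] = field(default_factory=list)


# Hypothetical example data, for illustration only.
target = Target("T001", "MKTAYIAK", ["kinase domain"])
construct = Construct("T001-c1", target)
plasmid = Sample("T001-c1 plasmid", "plasmid", owner="ib", location="Freezer 2")
protein = Sample("T001-c1 protein", "protein", owner="ib")
expression = Experiment("Expression 1", inputs=[plasmid], outputs=[protein])
```

Because an experiment holds references to its input and output samples, chaining experiments is enough to describe a longer workflow.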
Experiments and protocols • A protocol is a reusable user-defined template describing what you record for your experiments. • Parameters – Numerical values, free text values, T/F. E.g. incubation temperature or the number of PCR cycles; details of incubation conditions; was reagent added? • Input Samples – Samples or reagents used when performing an experiment that you wish to track • Output Samples – Samples or reagents produced when performing an experiment that you wish to track
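One way to read "a protocol is a template" is that it declares which parameters an experiment should record and of what kind. The sketch below illustrates that idea only; the Protocol class, the kind names ("number", "text", "boolean") and the example parameters are assumptions, not the PiMS implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


def matches(value: object, kind: str) -> bool:
    """Check a recorded value against a declared parameter kind."""
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "text":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    raise ValueError(f"unknown parameter kind: {kind}")


@dataclass
class Protocol:
    """Reusable template: parameter kinds plus input/output sample types."""
    name: str
    parameters: Dict[str, str]  # parameter name -> kind
    input_types: List[str]
    output_types: List[str]

    def validate(self, values: Dict[str, object]) -> List[str]:
        """Return a list of problems with recorded values (empty if fine)."""
        problems = []
        for name, kind in self.parameters.items():
            if name not in values:
                problems.append(f"missing: {name}")
            elif not matches(values[name], kind):
                problems.append(f"wrong type: {name}")
        return problems


# Hypothetical PCR protocol, for illustration only.
pcr = Protocol(
    name="PCR",
    parameters={
        "Number of cycles": "number",
        "Annealing temperature": "number",
        "Notes": "text",
        "Reagent added": "boolean",
    },
    input_types=["template DNA", "primer"],
    output_types=["PCR product"],
)
```

Validating each experiment record against its protocol is one way to catch incomplete entries at the moment of recording rather than at deposition time.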
More about protocols
Typing of PiMS items Typing helps PiMS offer sensible choices: only a plasmid can be used for transfection experiments… • Samples – Typed to show what they are • Input/Output samples for protocols – State what type of sample can be used and what is produced • Experiments and protocols – An experiment type is defined by its protocol. A protocol type links similar protocols together
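The "sensible choices" idea can be sketched as a filter: given the sample type a protocol accepts, offer only samples of that type. The inventory and function below are hypothetical and only illustrate the principle.

```python
from typing import Dict, List

# Hypothetical inventory: sample name -> sample type.
inventory: Dict[str, str] = {
    "plasmid-1": "plasmid",
    "plasmid-2": "plasmid",
    "protein-1": "protein",
    "primer-1": "primer",
}


def eligible_samples(samples: Dict[str, str], accepted_type: str) -> List[str]:
    """Return the names of samples whose type matches the accepted type."""
    return sorted(
        name for name, sample_type in samples.items()
        if sample_type == accepted_type
    )
```

For a transfection protocol that accepts only plasmids, eligible_samples(inventory, "plasmid") would offer the two plasmid samples and nothing else.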
Experiments & samples → Workflows [Diagram: Samples A, B, C, D, E1 and E2 linked through Experiments 1 to 4 to form a workflow.]
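Because each experiment records its input and output samples, a workflow is a graph that can be walked backwards from any sample. The sketch below shows this in Python; the links between samples and experiments are invented for illustration and are not the wiring shown on the slide.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical workflow: experiment -> (input samples, output samples).
experiments: Dict[str, Tuple[List[str], List[str]]] = {
    "Expt 1": (["Sample A"], ["Sample B"]),
    "Expt 2": (["Sample B"], ["Sample C"]),
    "Expt 3": (["Sample A"], ["Sample D"]),
    "Expt 4": (["Sample C", "Sample D"], ["Sample E1", "Sample E2"]),
}


def ancestors(sample: str) -> Set[str]:
    """Return every sample that the given sample was derived from."""
    found: Set[str] = set()
    stack = [sample]
    while stack:
        current = stack.pop()
        for inputs, outputs in experiments.values():
            if current in outputs:
                for parent in inputs:
                    if parent not in found:
                        found.add(parent)
                        stack.append(parent)
    return found
```

Walking the graph this way is what lets a final sample be traced back to its starting material.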
The PiMS holder (plate experiments) A holder groups samples. This allows PiMS to perform plate experiments in groups • Samples – For plate experiments output samples of previous experiment are mapped to input samples of next. (Provided sample type matches!) • User interface for plate experiments – Gives graphical and spreadsheet views. Allows editing, reformatting and spreadsheet upload
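The well-to-well mapping with a type check can be sketched as follows. This is an illustration under stated assumptions, not the PiMS algorithm: the function name, the data layout and the choice to report mismatched wells instead of raising an error are all hypothetical.

```python
from typing import Dict, List, Tuple

# well position -> (sample name, sample type)
PlateSamples = Dict[str, Tuple[str, str]]


def map_outputs_to_inputs(
    previous_outputs: PlateSamples, required_type: str
) -> Tuple[Dict[str, str], List[str]]:
    """Map each output well to the same input well of the next experiment.

    Wells whose sample type does not match are returned separately.
    """
    mapped: Dict[str, str] = {}
    rejected: List[str] = []
    for well, (name, sample_type) in sorted(previous_outputs.items()):
        if sample_type == required_type:
            mapped[well] = name
        else:
            rejected.append(well)
    return mapped, rejected


# Hypothetical outputs of a PCR plate, for illustration only.
pcr_plate: PlateSamples = {
    "A01": ("T001 PCR product", "PCR product"),
    "A02": ("T002 PCR product", "PCR product"),
    "A03": ("water blank", "buffer"),
}
```

Mapping by well position keeps the plate layout intact between steps, so a whole plate can be carried forward in one operation.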
Can attach files to samples & experiments
What is xtalPiMS? • An extension to PiMS to cover crystallization, crystal handling and data collection • Will integrate with automatic and manual imaging systems • Will integrate with liquid handling robots • Provides a single interface for viewing images from multiple imagers
Funding and usage • Funded for two years by BIOXHIT until June 2008. • Three developers: – Ian Berry (OPPF, UK) – Gael Seroul (EMBL Grenoble, France) – Diederick de Vries (NKI, The Netherlands) • Current version in full time use at the OPPF (20,000 plates, 50,000,000 images)
Basic concepts of xtalPiMS • Liquid Handling robot integration • Imager integration • Image processing / analysis • Web-based interface – Create experiments – Monitor experiments – Screen Management – Optimisation – Trip Management (merging with eHTPX) – Data Collection results
OPPF Crystallization Facility Robots
Concepts • Plate Experiment – Each plate can contain 1 or more plate experiments (either separated by subposition or location) • Plate Inspections – Whenever a plate is inspected by an imager or human on a microscope a new plate inspection is created • Annotations – An annotation is a “score” for an image, e.g. crystal
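A small sketch of how inspections and annotations might relate: each inspection is a timestamped record carrying per-well scores, and the current view of a plate is the most recent score for each well. The class, field names and score vocabulary are assumptions for illustration, not the xtalPiMS data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class PlateInspection:
    """One pass over a plate by an imager or a person at a microscope."""
    plate_barcode: str
    inspected_at: datetime
    inspector: str
    annotations: Dict[str, str] = field(default_factory=dict)  # well -> score


def latest_scores(inspections: List[PlateInspection]) -> Dict[str, str]:
    """Return the most recent score recorded for each well."""
    latest: Dict[str, str] = {}
    # Apply inspections oldest first so later scores overwrite earlier ones.
    for inspection in sorted(inspections, key=lambda i: i.inspected_at):
        latest.update(inspection.annotations)
    return latest


# Hypothetical inspections of one plate, for illustration only.
first = PlateInspection(
    "PLATE-0001", datetime(2008, 1, 1, 9, 0), "imager-1",
    {"A1": "clear", "B1": "precipitate"},
)
second = PlateInspection(
    "PLATE-0001", datetime(2008, 1, 8, 9, 0), "microscope (human)",
    {"A1": "crystal"},
)
```

Keeping every inspection rather than overwriting scores preserves the time course of each drop.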
What is eHTPX? • Simple answer: – A great many things! • Longer answer… – A client at the home lab for managing: • Crystal Handling. • Synchrotron trips. • Shipments between sites. • Data collection information. – A server at a remote site • For receiving information from the home sites. • Providing the “service” (e.g. data collection). • Providing access to the results / data about what happened.
What is eHTPX? • For a scientist it provides a management tool for: – Lab hardware (pins, pucks, etc) – Crystal handling – Mounting crystals – Shipments to synchrotrons – Return shipments – Beamline metadata retrieval – Upload of data via Excel Spreadsheet
Funding and usage • Funding came from UK BBSRC • eHTPX clients have been used by: – York Structural Biology Lab, University of York – OPPF, Oxford – University of Oulu, Finland – Adam Mickiewicz University, Poznan, Poland – University of Crete, Crete • Servers (ISPyB) at ESRF and Diamond
Basic Concepts • Crystal Drops • Mounted Drops • Pins • Pucks / Canes • Dewars • Plate Storage • Shipping Agents • Locations • Diffraction Metadata
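Several of these concepts nest inside one another. The sketch below assumes the usual arrangement for a synchrotron shipment (a crystal mounted on a pin, pins held in a puck, pucks carried in a dewar) and builds a simple shipping manifest. All class names, field names and barcodes are hypothetical and do not reflect the eHTPX data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pin:
    barcode: str
    crystal: str  # name of the crystal mounted on this pin


@dataclass
class Puck:
    barcode: str
    pins: List[Pin] = field(default_factory=list)


@dataclass
class Dewar:
    barcode: str
    pucks: List[Puck] = field(default_factory=list)

    def manifest(self) -> List[str]:
        """List every crystal in the dewar with its full container path."""
        return [
            f"{self.barcode}/{puck.barcode}/{pin.barcode}: {pin.crystal}"
            for puck in self.pucks
            for pin in puck.pins
        ]


# Hypothetical shipment contents, for illustration only.
dewar = Dewar("DEWAR-1", [
    Puck("PUCK-A", [Pin("PIN-001", "T001 xtal 1"), Pin("PIN-002", "T001 xtal 2")]),
    Puck("PUCK-B", [Pin("PIN-003", "T002 xtal 1")]),
])
```

A manifest like this is the sort of information that has to travel with a shipment so that collected data can be matched back to the right crystal.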
“Speaks eHTPX” • Every Synchrotron has a different data collection database. • Every home source has a different data collection database. • But… as long as they “Speak eHTPX”, you will be able to submit your crystal data and get your data collection metadata home again to store in a local database. • We are working with Synchrotrons to integrate eHTPX messaging into their systems.
The future of eHTPX • The current version is available for use – Not straightforward to install at this stage • eHTPX will be integrated into xtalPiMS to provide the seamless integration of crystallization and data collection information for the home lab.
Data Processing and Structure Solution
Data Processing with XIA2 • xia2 is an automated data reduction system designed to work from raw diffraction data and a little metadata, and produce usefully reduced data in a form suitable for immediately starting phasing and structure solution, e.g. through MrBUMP or your favourite experimental phasing suite.
Structure Solution • Many data processing pipelines available • Several systems for storing data collection information… • Existing solutions: – Harvesting tools exist within CCP4. – pdb_extract suite – HKL 3000 – The XTRACK database • A system based on these will be included in xtalPiMS
Loading from ISPyB to XTrack
XTrack http://xray.bmc.uu.se/xtrack/
Data Mining
Data Mining • The bonus of having everything recorded is the ability to feed information back, improve techniques and get better science! • Example: OPPF Glycosylated Protein Screen – conditions taken from standard crystallization sparse-matrix screens and reformatted to provide a good first pass at getting crystals based on prior knowledge
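As an illustration of this kind of feedback, the sketch below counts how often each crystallization condition produced a hit and keeps the most successful ones as candidates for a new first-pass screen. The function name and condition names are invented for the example; this is not the method used to build the OPPF screen.

```python
from collections import Counter
from typing import List, Tuple


def rank_conditions(hits: List[str], top: int) -> List[Tuple[str, int]]:
    """Return the `top` conditions with the most recorded hits.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(hits)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical record of which condition gave each hit.
recorded_hits = [
    "Screen1-A3", "Screen1-A3", "Screen1-A3",
    "Screen2-C7", "Screen2-C7",
    "Screen1-B1",
]
```

This only works if failed and successful experiments are both recorded consistently, which is the point made earlier about compliance.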
Acknowledgements • BIOXHIT • OPPF – Robert Esnouf – Jon Diprose – Dave Stuart • NKI – Tassos Perrakis – Diederick de Vries (xtalPiMS developer) • EMBL Grenoble – Josan Marquez – Gael Seroul (xtalPiMS developer) • All the PiMS Developers • All the eHTPX Developers • All the ESRF ISPyB Developers
More information • PiMS – http://www.pimslims.org • xtalPiMS – http://www.oppf.ox.ac.uk/xtalpims • eHTPX – http://www.oppf.ox.ac.uk/ehtpx • XIA2 – http://www.ccp4.ac.uk/xia/ • XTrack – http://xray.bmc.uu.se/xtrack/