Driving research for new drug therapies using big data storage

The Van Andel Institute (VAI) has a 20-year legacy of biomedical research and scientific education focused on improving health and enhancing the lives of current and future generations. Formed in 1996, the organisation has evolved into a premier centre for research and education, where more than 360 scientists, educators and staff are working to determine the origins of cancer, Parkinson’s and other diseases.

For example, scientists in VAI’s Center for Epigenetics are shedding light on the mechanisms that control how genes are regulated to help understand what happens when a cell transitions from a normal state into a rapidly dividing cancer cell.

The Institute’s scientists are working to translate these discoveries into highly innovative and effective diagnostics and treatments. Epigenetics is a rapidly emerging and hugely important area of cancer research.

Zachary Ramjan joined the organisation in 2014 and is the Institute’s HPC Solutions Architect. One of his roles is to ensure there is ample compute and storage capacity to keep pushing the research envelope.

With that in mind, he led the design, implementation and operation of a high-performance compute and storage solution that would meet both current and future research needs. The goal was to create a progressive computing platform anchored by powerful, scalable storage.

Storage to meet researchers’ needs

Initially, the Institute sought to replace its fragmented storage silos with primary shared storage for all instrument and other research data. Centralised storage delivers major cost savings and adds an extra measure of data protection by moving irreplaceable research and instrument data from individual hard drives onto a single system.

Centralised storage and computing also meant the Institute could accommodate major growth, allowing it to expand its structural biology research programme, which is now home to a cryo-electron microscopy (cryo-EM) facility. These microscopes allow scientists to see the structure of molecules at a resolution of one ten-thousandth the width of a human hair.

Sufficient compute and storage capacity is critical, as these high-end microscopes generate about 20TB of data every three days. The resulting load on compute and storage can present challenges, especially since multiple users need to access the images simultaneously.
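To put the cryo-EM figure in storage terms, the short sketch below converts "about 20TB every three days" into a daily volume and an average write rate. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and data arriving evenly around the clock; both are illustrative assumptions, and real instrument output is burstier, so peak rates would be higher.

```python
# Back-of-the-envelope ingest rate for the cryo-EM figure (about 20TB every three days).
# Assumptions (not from the article): decimal units and perfectly even arrival of data.

TB = 10**12          # bytes per terabyte (decimal)
MB = 10**6           # bytes per megabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(terabytes: float, days: float) -> float:
    """Average data volume per day, in TB."""
    return terabytes / days


def sustained_rate_mb_per_s(terabytes: float, days: float) -> float:
    """Average write rate in MB/s needed to absorb `terabytes` over `days`."""
    return terabytes * TB / (days * SECONDS_PER_DAY) / MB


if __name__ == "__main__":
    per_day = daily_volume_tb(20, 3)
    rate = sustained_rate_mb_per_s(20, 3)
    print(f"{per_day:.1f} TB/day, about {rate:.0f} MB/s sustained")
```

Running it prints roughly 6.7TB per day, or about 77MB/s of continuous writes, before accounting for several users reading the same images at once.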

‘Gut instinct’

As a result, the Institute began a thorough evaluation of next-generation HPC and storage solutions, including cluster and cloud computing as well as parallel file and object storage. On the one hand, the organisation wanted to take advantage of the performance and scalability delivered by a traditional Spectrum Scale-based parallel file system.

On the other hand, it also wanted the flexibility to implement a private cloud for storing and processing large amounts of data efficiently and cost-effectively.

Ramjan commented: “I had a gut instinct that a hybrid on-premise and cloud computing solution was the best option for the Institute. The approach gave us the flexibility to grow and scale computing on demand.”

The Institute reviewed several file-system solutions before choosing DDN Storage’s GRIDScaler appliance, which offered all the required features in an easily expandable platform.

In addition, the Institute wanted to present a single, federated namespace across file and object storage. This was a key consideration in selecting DDN’s object storage solution, which serves both as an active archive for ever-increasing amounts of unstructured data and as a collaboration platform that makes it easy for researchers to share data.

Storage was actually one of the Institute’s easiest decisions. The Institute can offer HPC users a place to work on their data and archive results, while also ingesting massive amounts of instrument data from dozens of next-generation sequencers and electron microscopes.

13TB of data per day

The new storage solution makes it easy for Van Andel to put data where it belongs, all within a single system. The ability to store data in the most performance- and cost-efficient place gives the Institute the flexibility to grow as research needs dictate, which is a “major breakthrough”.

Centralising storage for data-intensive research and a dozen data-hungry scientific instruments is expected to save hundreds of thousands of dollars. A single yet highly scalable storage platform also allows the Institute to raise standards of data protection, improve compliance and push the boundaries of science.

Up to 13TB of data is expected to be generated each day. These extremely large datasets are transferred to storage automatically, without the Institute’s scientists having to worry about it at all.
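For a sense of scale, the sketch below turns the "up to 13TB per day" expectation into an average write rate and a one-year total. It deliberately takes the upper bound every day and assumes decimal units (1PB = 1,000TB) with no deletion or compression; these are illustrative assumptions, not figures from the Institute.

```python
# Rough capacity planning from the "up to 13TB per day" expectation.
# Assumptions (not from the article): every day hits the 13TB upper bound,
# decimal units, and nothing is deleted or compressed.

DAILY_TB = 13.0
SECONDS_PER_DAY = 86_400


def sustained_mb_per_s(daily_tb: float) -> float:
    """Convert TB/day into an average write rate in MB/s (decimal units)."""
    return daily_tb * 1e12 / SECONDS_PER_DAY / 1e6


def accumulated_pb(daily_tb: float, days: int) -> float:
    """Total data written after `days`, in PB (1PB = 1,000TB)."""
    return daily_tb * days / 1000


if __name__ == "__main__":
    print(f"~{sustained_mb_per_s(DAILY_TB):.0f} MB/s sustained")
    print(f"~{accumulated_pb(DAILY_TB, 365):.1f} PB after one year")
```

Under these assumptions the platform would absorb about 150MB/s around the clock and close to 4.7PB a year, which is why growing capacity on demand matters.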

Ramjan concluded: “It’s quite amazing that we can keep our data there and everyone can access it at the same time. This will allow our scientists to conduct even more groundbreaking research and accelerate the pace of major scientific discoveries.”


Sourced by Zachary Ramjan, research computing architect, Van Andel Institute