Bioprocessing engineers know that they can benefit from machine learning, mainly through improved control over quality and performance parameters, but they also believe that machine learning requires a lot of work—the installation, configuration, and maintenance of various data systems and tools. In fact, this work looks so daunting that most bioprocessing engineers are inclined to postpone any moves to machine learning.
“A data system typically becomes wedded to the process that existed when the IP was created, resulting in a rigid system that prevents process improvements,” says Timothy Gardner, PhD, the founder and CEO of Riffyn. “So, you live with it.”
Overcoming a data system’s limitations is especially difficult if it is approached as a large, one-time task, rather than as a succession of smaller tasks. The latter approach reminded Gardner of his son’s musicianship. Instead of writing a guitar part ahead of time, all in one go, Gardner’s son works with riffs, refining and elaborating them over and over.
AI aggregates decision-making data
“The DMAIC cycle—define, measure, analyze, improve, control—is the essence of what we do in process development, but it’s hard to do in bioprocessing,” Gardner explains, “because there are so many unit operations, process variables, instruments, and data sources that are siloed and need to be brought together.”
Aggregating them all can take months, and bioprocessing engineers lack the time. Instead, they resort to expedients. “They look at bits of data, make inferences, and hope for the best,” says Gardner.
Another approach is possible: the Scientific Development Environment (SDE), Riffyn’s machine learning system. According to Gardner, it has halved process development time for many of its customers, while halving the effort required from customers’ personnel. Riffyn’s SDE system flexibly integrates siloed data with visual drag-and-drop process design, automated data context and integration, user-configurable file parsing, and programmatic interfaces that integrate with third-party applications.
Gardner insists that the system is based on long-proven engineering techniques. “Linear regression,” he points out, “is a form of machine learning and the foundation of good engineering processes, and so is predictive statistical modeling for closed-loop controls. But they aren’t widely used. Most engineers still use basic hand calculations of small sets of batch data to determine their conclusions.” Consequently, data is often incomplete and insufficient.
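To make the point about linear regression concrete, the short sketch below fits a straight line to a handful of batch results by ordinary least squares. It is only an illustration: the function name `fit_line`, the variables `feed_rate` and `titer`, and the numbers themselves are hypothetical and do not come from Riffyn or any of its customers.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of the x-y cross products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical batch data: one feed rate and one final titer per batch.
feed_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
titer = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(feed_rate, titer)
print(f"titer ~ {slope:.2f} * feed_rate + {intercept:.2f}")
```

With only five batches the fit is fragile, which is consistent with Gardner’s warning about drawing conclusions from small sets of batch data.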
If machine learning is to be effective, it needs properly aligned and annotated data from multiple experiments. Even then, process variability may obscure any patterns. “Other industries overcome that by averaging the data from multiple batches or experiments, but biotech companies often don’t have that data,” Gardner says. Instead, they have point-in-time data from many instruments.
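One common way to align point-in-time readings from different runs is to interpolate each run onto a shared time grid and then average across runs. The sketch below shows that idea under simplifying assumptions: linear interpolation, a grid that lies inside every run’s time range, and made-up readings. The names `interpolate` and `average_profile` are hypothetical, and nothing here describes how Riffyn’s SDE works internally.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate a reading at time t (times sorted ascending)."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time range")
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def average_profile(batches, grid):
    """Average several batches after aligning them on a common time grid."""
    return [
        sum(interpolate(ts, vs, t) for ts, vs in batches) / len(batches)
        for t in grid
    ]


# Hypothetical runs sampled at different times: (hours, readings).
batches = [
    ([0.0, 4.0, 8.0], [0.0, 2.0, 6.0]),
    ([0.0, 2.0, 8.0], [0.0, 1.0, 7.0]),
]
grid = [0.0, 2.0, 4.0, 8.0]

mean_profile = average_profile(batches, grid)
print(mean_profile)
```

Averaging aligned runs in this way is what lets batch-to-batch variability cancel out, provided enough comparable runs exist.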
Seeing is believing
Gardner saw the benefits of using machine learning in bioprocessing firsthand, working for an industrial biotech company. “We mined 500 historical fermentation profiles from unrelated experiments on a family tree of cell lines, extracting process factor correlations that led to a breakthrough in process modeling,” he recalls. “We scaled from 0.5-L reactors to 200,000-L reactors in just 3 months while attaining perfect performance replication. Previously, a comparable scaleup project took 12 months, and performance was still inferior to that achieved at laboratory scale.”
In 2014, he founded Riffyn. “Initially, we had to educate people on the basics of statistics (beyond calculating p-values) and to structure data for analysis,” he says. Then Riffyn had to teach clients about design and implementation. Today, potential clients already understand the benefits of machine learning. Now they are focused on implementation.
Riffyn’s clients are seeing benefits. Gardner cites a biologics developer that optimized its cell line development and thereby achieved a 20-fold increase in bioprocessing throughput. This improvement allowed the developer to introduce four new, category-leading products within 18 months using half the previous effort.
Map the process
“Before you start, you need a blueprint of your process data system,” Gardner advises. “Otherwise, this is like engineering a vehicle with 100 sensors, but not specifying whether the vehicle is a motorcycle or truck.”
“Riffyn’s SDE system creates a model of the complete process that captures all the data and tracks each design change,” he continues. This enables data to be analyzed and correlated to changes in outcomes, thus enabling continuous improvements.
The system populates the model with real data and automates the integration of design and data within seconds, so users can see the data and its implications within minutes. This process-centered approach, says Gardner, sets Riffyn’s SDE apart from its competitors’ systems.
Bringing the data together provides the context, and automatic integration lays the foundation for process changes. “We focus on the capture, context, and preparation of integrative data for machine learning,” he explains. “We put it all together so clients can run analytics. We make machine learning routine.”
Typical applications include:
- Process variance analysis.
- Multivariate root cause analysis.
- Time series smoothing and interpolation.
- Deviation and outlier analysis.
- Pattern search.
- Model parameterization and prediction.
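As an illustration of one item on this list, deviation and outlier analysis, the sketch below flags readings whose modified z-score, based on the median absolute deviation, exceeds a cutoff. The technique is a generic statistical one chosen for the example, not a description of Riffyn’s implementation. The function name `mad_outliers`, the sample readings, and the cutoff of 3.5 (a widely used convention) are assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical readings of one variable from six comparable batches.
readings = [5.1, 4.9, 5.0, 5.2, 4.8, 9.0]

flagged = mad_outliers(readings)
print(flagged)
```

A median-based score is used here instead of a mean-based one because a single extreme batch would otherwise inflate the spread estimate and hide itself.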
Tips to start
Implementing machine learning isn’t about doing the same thing with a new tool. Instead, it’s about changing the way bioprocess engineers work. To achieve the gains promised by machine learning, it is necessary to change processes, workflows, and data collection and analysis procedures, as well as to provide ample workforce training.
For anyone integrating machine learning into their operations, the most important thing, Gardner says, is to identify what you want to achieve. Then you can figure out the metrics you need to follow to determine whether you’ve succeeded. “People often have a vague sense of improving, but don’t determine what that really means,” Gardner notes. “Focus on quality. Know your finish line. If you can’t define success, you’ll never achieve it.”
He advises clients to first map out all their bioprocesses and all the variables and materials driving them. Subsequently, clients should “perform a prospective risk analysis to estimate the key drivers of both value and risk,” he says. “These are the critical aspects of the process they should begin tracking.” With this foundation, they can use machine learning to improve their process performance.
Gardner describes what happens once a client signs with Riffyn: “Our scientific teams work side by side with the client to build the first example (building a process, collecting data against it, and analyzing it using machine learning). Then the client’s team builds another example with our guidance. Finally, the client’s team teaches other client personnel to build one.” Depending on the scope and complexity of the process, the training exercise typically takes 3–12 weeks.
Next: Expanding capabilities
As a self-aware startup company, Riffyn recognizes that it may, like many startups, be tempted to overextend itself. Consequently, the company is careful to guard against this temptation. “In the early days, some team members literally worked round the clock,” Gardner admits, “which caused stress and frustration.” The balance is better now, and the company is building out its operational capability to help customers.
“The industry needs a data interchange standard that translates like the BatchML and XML standards that are used in manufacturing but are not yet used between R&D and manufacturing,” Gardner says. When such a standard is developed, implementing machine learning will be simpler than ever.