
Tuesday, January 17, 2017

Fast Acquisition of Diverse Unstructured Data Sources Makes IDOL API Tools a Star at LogitBot

Transcript of a discussion on how high-performing big-data analysis powers an innovative artificial intelligence-based investment tool.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the Hewlett Packard Enterprise (HPE) Voice of the Customer podcast series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on digital transformation. Stay with us now to learn how agile businesses are fending off disruption -- in favor of innovation.

Our next case study highlights how high-performing big-data analysis powers an innovative artificial intelligence (AI)-based investment opportunity and evaluation tool. We'll learn how LogitBot in New York identifies, manages, and contextually categorizes truly massive and diverse data sources.

By leveraging entity recognition APIs, LogitBot not only provides investment evaluations from across these data sets, it also delivers the analysis as natural-language information directly into spreadsheets as the delivery endpoint. This is a prime example of how complex cloud-to-core-to-edge processes and benefits can be managed and exploited using the most responsive big-data APIs and services.

To describe how a virtual assistant for targeting investment opportunities is being supported by cloud-based big-data services, we're joined by Mutisya Ndunda, Founder and CEO of LogitBot in New York. Welcome.

Mutisya Ndunda: Thank you so much for having us.

Gardner: We're also here with Michael Bishop, CTO of LogitBot. Welcome, Michael.

Michael Bishop: Thank you for having us. It's good to be here.

Gardner: Let's look at some of the trends driving your need to do what you're doing with AI and bots, bringing together data, and then delivering it in the format that people want most. What's the driver in the market for doing this?

Ndunda: LogitBot is all about trying to eliminate friction between people who have very high-value jobs and some of the more mundane things that could be automated by AI.

Today, in finance, the industry, in general, searches for investment opportunities using techniques that have been around for over 30 years. What tends to happen is that the people who are doing this should be spending more time on strategic thinking, ideation, and managing risk. But without AI tools, they tend to get bogged down in the data and in the day-to-day. So, we've decided to help them tackle that problem.

Gardner: Let the machines do what the machines do best. But how do we decide where the demarcation is between what the machines do well and what the people do well, Michael?

Bishop: We believe in empowering the user and not replacing the user. So, the machine is able to go in-depth and do what a high-performing analyst or researcher would do at scale, and it does that every day, instead of once a quarter, for instance, when research analysts would revisit an equity or a sector. We can do that constantly, react to events as they happen, and replicate what a high-performing analyst is able to do.

Gardner: It's interesting to me that you're not only taking a vast amount of data and putting it into a useful, qualitative format, but you're also delivering it in a way that's demanded in the market, that people want and use. Tell me about this core value and then the edge value, and how you came to decide on doing it the way you do?

Evolutionary process

Ndunda: It's an evolutionary process that we've embarked on or are going through. The industry is very used to doing things in a very specific way, and AI isn't something that a lot of people are necessarily familiar with in financial services. We decided to wrap it around things that are extremely intuitive to an end user who doesn't have the time to learn technology.

So, we said that we'll try to leverage as many things as possible in the back via APIs and all kinds of other things, but the delivery mechanism in the front needs to be as simple or as friction-less as possible to the end-user. That’s our core principle.

Bishop: Finance professionals generally don't like black boxes and mystery, and obviously, when you're dealing with money, you don’t want to get an answer out of a machine you can’t understand. Even though we're crunching a lot of information and  making a lot of inferences, at the end of the day, they could unwind it themselves if they wanted to verify the inferences that we have made.

We're wrapping up an incredibly complicated amount of information, but it still makes sense at the end of the day. It’s still intuitive to someone. There's not a sense that this is voodoo under the covers.

Gardner: Well, let’s pause there. We'll go back to the data issues and the user-experience issues, but tell us about LogitBot. You're a startup, you're in New York, and you're focused on Wall Street. Tell us how you came to be and what you do, in a more general sense.

Ndunda: Our professional background has always been in financial services. Personally, I've spent over 15 years in financial services, and my career led me to what I'm doing today.

In the 2006-2007 timeframe, I left Merrill Lynch to join a large proprietary market-making business called Susquehanna International Group. They're one of the largest providers of liquidity around the world. Chances are whenever you buy or sell a stock, you're buying from or selling to Susquehanna or one of its competitors.

What had happened in that industry was that people were embracing technology, but it was algorithmic trading, what has become known today as high-frequency trading. At Susquehanna, we resisted that notion, because we said machines don't necessarily make decisions well, and this was before AI had been born.

Internally, we went through this period where we had a lot of discussions around, are we losing out to the competition, should we really go pure bot, more or less? Then, 2008 hit and our intuition of allowing our traders to focus on the risky things and then setting up machines to trade riskless or small orders paid off a lot for the firm; it was the best year the firm ever had, when everyone else was falling apart.

That was the first piece that got me to understand or to start thinking about how you can empower people and financial professionals to do what they really do well and then not get bogged down in the details.

Then, I joined Bloomberg and I spent five years there as the head of strategy and business development. The company has an amazing business, but it's built around the notion of static data. What had happened in that business was that, over a period of time, we began to see the marketplace valuing analytics more and more.

Make a distinction

Part of the role that I was brought in to do was to help them unwind that and decouple the two things -- to make a distinction within the company about static information versus analytical or valuable information. The trend that we saw was that hedge funds, especially the ones that were employing systematic investment strategies, were beginning to do two things: to embrace AI or technology to empower their traders, and to look deeper into analytics versus static data.

That was what brought me to LogitBot. I thought we could do it really well, because the players themselves don't have the time to do it and some of the vendors are very stuck in their traditional business models.

Bishop: We're seeing a kind of renaissance here, or we're at a pivotal moment, where we're moving away from analytics in the sense of business reporting tools or understanding yesterday. We're now able to mine data, get insightful, actionable information out of it, and then move into predictive analytics. And it's not just statistical correlations. I don’t want to offend any quants, but a lot of technology [to further analyze information] has come online recently, and more is coming online every day.

For us, Google had released TensorFlow, and that made a substantial difference in our ability to reason about natural language. Had it not been for that, it would have been very difficult one year ago.

At the moment, technology is really taking off in a lot of areas at once. That enabled us to move from static analysis of what's happened in the past and move to insightful and actionable information.

Ndunda: What Michael kind of touched on there is really important. A lot of the traditional ways of looking at financial investment opportunities say that historically, this has happened, so history should repeat itself. We're in markets where nothing that's happening today has really happened in the past. So, relying on a backward-looking mechanism of trying to interpret the future is kind of really dangerous, versus having a more grounded approach that can actually incorporate things that are nontraditional in many different ways.

So, unstructured data, what investors are thinking, what central bankers are saying, all of those are really important inputs that weren't part of any model 10 or 20 years ago. Without machine learning and some of the things that we are doing today, it's very difficult to incorporate any of that and make sense of it in a structured way.

Gardner: So, if the goal is to make outlier events your friend and not your enemy, what data do you go to to close the gap between what's happened and what the reaction should be, and how do you best get that data and make it manageable for your AI and machine-learning capabilities to exploit?

Ndunda: Michael can probably add to this as well. We do not discriminate as far as data goes. What we like to do is have no opinion on data ahead of time. We want to get as much information as possible and then let a scientific process lead us to decide what data is actually useful for the task that we want to deploy it on.

As an example, we're very opportunistic about acquiring information about who the most important people at companies are and how they're connected to each other. Does this guy work on a board with that one, or how do they know each other? It may not have any application at that very moment, but over the course of time, you end up building models that are actually really interesting.

We scan over 70,000 financial news sources. We capture news information across the world. We don't necessarily use all of that information on a day-to-day basis, but at least we have it and we can decide how to use it in the future.

We also monitor anything that companies file and what management teams talk about at investor conferences or on phone conversations with investors.

Bishop: Conference calls, videos, interviews.

Audio to text

Ndunda: HPE has a really interesting technology that they have recently put out. You can transcribe audio to text, and then we can apply our text processing on top of that to understand what management is saying in a structural, machine-based way. Instead of 50 people listening to 50 conference calls you could just have a machine do it for you.
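For readers who want to see the shape of what Ndunda is describing, here is a minimal sketch of that audio-to-text-to-analytics chain in Python. The endpoint names, parameters, and response fields are assumptions modeled on the Haven OnDemand REST style of the time, and the API key and file name are placeholders; check the API documentation before relying on any of it.

```python
# A minimal sketch of the pattern described above: send recorded audio to a
# speech-to-text service, then run text analytics on the transcript. Endpoint
# names and response fields are assumptions, not confirmed API details.
import requests

API_KEY = "YOUR_HAVEN_ONDEMAND_KEY"  # hypothetical placeholder
BASE = "https://api.havenondemand.com/1/api/sync"

def transcribe_call(audio_path):
    """Upload an earnings-call recording and return the transcript text."""
    with open(audio_path, "rb") as audio:
        resp = requests.post(
            f"{BASE}/recognizespeech/v1",          # assumed endpoint name
            files={"file": audio},
            data={"apikey": API_KEY},
        )
    resp.raise_for_status()
    return resp.json().get("transcript", "")       # assumed response field

def score_sentiment(text):
    """Run sentiment analysis over the transcript text."""
    resp = requests.post(
        f"{BASE}/analyzesentiment/v1",             # assumed endpoint name
        data={"apikey": API_KEY, "text": text},
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    transcript = transcribe_call("q3_earnings_call.wav")  # placeholder file
    print(score_sentiment(transcript))
```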

Gardner: Something you can do there that you couldn't have done before is apply something like sentiment analysis, which you couldn't have done if it was just a document, and that can be very valuable.

Bishop: Yes, even tonal analysis. There are a few theories on that, that may or may not pan out, but there are studies around tone and cadence. We're looking at it and we will see if it actually pans out.

Gardner: And so do you put this all into your own on-premises data-center warehouse or do you take advantage of cloud in a variety of different means by which to corral and then analyze this data? How do you take this fire hose and make it manageable?

Bishop: We do take advantage of the cloud quite aggressively. We're split between SoftLayer and Google. At SoftLayer we have bare-metal hardware machines and some power machines with high-power GPUs.

On the Google side, we take advantage of Bigtable and BigQuery and some of their infrastructure tools. And we have good, old PostgreSQL in there, as well as DataStax Cassandra, with their Graph as the graph engine. We make liberal use of HPE Haven APIs as well and TensorFlow, as I mentioned before. So, it's a smorgasbord of things you need to corral in order to get the job done. We found it very hard to find all of that wrapped in a bow with one provider.

We're big proponents of Kubernetes and Docker as well, and we leverage that to avoid lock-in where we can. Our workload can migrate between Google and the SoftLayer Kubernetes cluster. So, we can migrate between hardware or virtual machines (VMs), depending on the horsepower that’s needed at the moment. That's how we handle it.

Gardner: So, maybe 10 years ago you would have been in a systems-integration capacity, but now you're in a services-integration capacity. You're doing some very powerful things at a clip and probably at a cost that would have been impossible before.

Bishop: I certainly remember placing an order for a server, waiting six months, and then setting up the RAID drives. It's amazing that you can just flick a switch and you get a very high-powered machine that would have taken six months to order previously. In Google, you spin up a VM in seconds. Again, that's of a horsepower that would have taken six months to get.

Gardner: So, unprecedented innovation is now at our fingertips when it comes to the IT side of things, unprecedented machine intelligence, now that the algorithms and APIs are driving the opportunity to take advantage of that data.

Let's go back to thinking about what you're outputting and who uses that. Is the investment result that you're generating something that goes to a retail type of investor? Is this something you're selling to investment houses or a still undetermined market? How do you bring this to market?

Natural language interface

Ndunda: Roboto, which is the natural-language interface into our analytical tools, can be custom tailored to respond, based on the user's level of financial sophistication.

At present, we're trying them out on a semiprofessional investment platform, where people are professional traders, but not part of a major brokerage house. They obviously want to get trade ideas, they want to do analytics, and they're a little bit more sophisticated than people who are looking at investments for their retirement account.  Rob can be tailored for that specific use case.

He can also respond to somebody who is managing a portfolio at a hedge fund. The level of depth that he needs to consider is the only differential between those two things.

In the back, he may do an extra five steps if the person asking the question worked at a hedge fund, versus if the person was just asking about why is Apple up today. If you're a retail investor, you don’t want to do a lot of in-depth analysis.

Bishop: You couldn’t take the app and do anything with it or understand it.

Ndunda: Rob is an interface, but the analytics are available via multiple venues. So, you can access the same analytics via an API, a chat interface, the web, or a feed that streams into you. It just depends on how your systems are set up within your organization. But, the data always will be available to you.

Gardner: Going out to that edge equation, that user experience, we've talked about how you deliver this to the endpoints, customary spreadsheets, cells, pivots, whatever. But it also sounds like you are going toward more natural language, so that you could query, rather than a deep SQL environment, like what we get with a Siri or the Amazon Echo. Is that where we're heading?

Bishop: When we started this, trying to parameterize everything that you could ask into enough checkboxes and forms was polluting the screen. The system has access to an enormous amount of data that you can't create a parameterized screen for. We found it was a bit of a breakthrough when we were able to start using natural language.

TensorFlow made a huge difference here in natural language understanding, understanding the intent of the questioner, and being able to parameterize a query from that. If our initial findings here pan out or continue to pan out, it's going to be a very powerful interface.
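As a rough illustration of the intent step Bishop mentions (not LogitBot's actual model), the toy sketch below trains a tiny text classifier that maps a question to an assumed intent label, which would then select the query template to parameterize. It assumes TensorFlow 2.x; the API available in 2016 differed, and the training data and labels are invented.

```python
# Toy intent classifier: map a natural-language question to an intent, which
# then selects which parameterized back-end query to run. Illustrative only.
import tensorflow as tf

questions = [
    "why is apple up today",
    "what moved tesla this morning",
    "what is the likely return of apple over six months",
    "forecast microsoft returns for next quarter",
]
intents = [0, 0, 1, 1]  # 0 = explain_move, 1 = forecast_return (assumed labels)

vectorize = tf.keras.layers.TextVectorization(output_mode="tf_idf")
vectorize.adapt(questions)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(tf.constant(questions), tf.constant(intents), epochs=50, verbose=0)

# The predicted intent decides which query template gets parameterized.
probs = model.predict(tf.constant(["how will apple do over the next six months"]))
print("intent:", int(probs.argmax()))
```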

I can't imagine having to go back to a SQL query if you're able to do it in natural language, and it really pans out this time, because we've had a few turns of the handle of alleged natural-language querying.

Gardner: And always a moving target. Tell us specifically about SentryWatch and Precog. How do these shake out in terms of your go-to-market strategy?

How everything relates

Ndunda: One of the things that we have to do to be able to answer a lot of questions that our customers may have is to monitor financial markets and what's impacting them on a continuous basis. SentryWatch is literally a byproduct of that process where, because we're monitoring over 70,000 financial news sources, we're analyzing the sentiment, we're doing deep text analysis on it, we're identifying entities and how they're related to each other, in all of these news events, and we're sticking that into a knowledge graph of how everything relates to everything else.

It ends up being a really valuable tool, not only for us, but for other people, because, while we're building models, there are also a lot of hedge funds that have proprietary models or proprietary processes that could benefit from that very same organized, relational data store of news. That's what SentryWatch is and that's how it's evolved. It started off as something that we were doing as an input, and it's actually now a valuable output, or a standalone product.
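For a concrete sense of that pipeline, here is a minimal sketch: it assumes the entities have already been extracted from each story (the records below are invented), and it accumulates them into a co-occurrence graph, with networkx standing in for the production graph engine.

```python
# Accumulate extracted entities from monitored news into a knowledge graph of
# what co-occurs with what. The stories here are hand-written examples.
import itertools
import networkx as nx

stories = [
    {"headline": "Apple fined by EU", "entities": ["Apple", "European Commission"]},
    {"headline": "Apple supplier update", "entities": ["Apple", "Foxconn"]},
]

graph = nx.Graph()
for story in stories:
    for a, b in itertools.combinations(story["entities"], 2):
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
            graph[a][b]["headlines"].append(story["headline"])
        else:
            graph.add_edge(a, b, weight=1, headlines=[story["headline"]])

# Everything that has been linked to Apple across the monitored sources.
print(list(graph.neighbors("Apple")))
```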

Precog is a way for us to showcase the ability of a machine to be predictive and not be backward looking. Again, when people are making investment decisions or allocation of capital across different investment opportunities, you really care about your forward return on your investments. If I invested a dollar today, am I likely to make 20 cents in profit tomorrow or 30 cents in profit tomorrow?

We're using pretty sophisticated machine-learning models that can take into account unstructured data sources as part of the modeling process. That will give you these forward expectations about stock returns in a very easy-to-use format, where you don't need to have a PhD in physics or mathematics.

You just ask, "What is the likely return of Apple over the next six months," taking into account what's going on in the economy.  Apple was fined $14 billion. That can be quickly added into a model and reflect a new view in a matter of seconds versus sitting down in a spreadsheet and trying to figure out how it all works out.
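To make that workflow concrete, here is an illustrative sketch, not LogitBot's model, of a forward-return estimate that mixes a structured market feature with a news-sentiment feature; every number below is invented purely to show the shape of the calculation.

```python
# Illustrative forward-return model: one structured feature (trailing return)
# plus one unstructured-data feature (news sentiment). Numbers are made up.
import numpy as np
from sklearn.linear_model import Ridge

# Columns: trailing 1-month return, average news sentiment (-1 to 1).
X = np.array([
    [0.02,  0.4],
    [-0.01, -0.6],
    [0.03,  0.1],
    [-0.02, -0.3],
])
y = np.array([0.015, -0.020, 0.010, -0.012])  # realized 6-month forward returns

model = Ridge(alpha=1.0).fit(X, y)

# A new headline (a large fine, say) updates the sentiment feature, and the
# forward-return estimate refreshes in seconds rather than a manual respreadsheet.
today = np.array([[0.01, -0.8]])
print("expected 6-month return:", model.predict(today)[0])
```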

Gardner: Even for Apple, that's a chunk of change.

Bishop: It's a lot of money, and you can imagine that there were quite a few analysts on Wall Street in Excel, updating their models around this so that they could have an answer by the end of the day, where we already had an answer.

Gardner: How do the HPE Haven OnDemand APIs help Precog when it comes to deciding on those sources and getting them into the right format, so that you can exploit them?

Ndunda: The beauty of the platform is that it simplifies a lot of development processes that an organization of our size would have to take on themselves.

The nice thing about it is that a drag-and-drop interface is really intuitive; you don't need to be specialized in Java, Python, or whatever it is. You can set up your intent in a graphical way, and then test it out, build it, and expand it as you go along. The Lego-block structure is really useful, because if you want to try things out, it's drag and drop, connect the dots, and then see what you get on the other end.

For us, that's an innovation that we haven't seen with anybody else in the marketplace and it cuts development time for us significantly.

Gardner: Michael, anything more to add on how this makes your life a little easier?

Lowering cost

Bishop: For us, lowering the cost in time to run an experiment is very important when you're running a lot of experiments, and the Combinations product enables us to run a lot of varied experiments using a variety of the HPE Haven APIs in different combinations very quickly. You're able to get your development time down from the week or two, whatever it is, that it would take to wire up a single API.

In the same amount of time, you're able to wire the initial connection and then you have access to pretty much everything in Haven. You turn it over to either a business user, a data scientist, or a machine-learning person, and they can drag and drop the connectors themselves. It makes my life easier and it makes the developers’ lives easier because it gets back time for us.

Gardner: So, not only have we been able to democratize the querying, moving from SQL to natural language, for example, but we’re also democratizing the choice on sources and combinations of sources in real time, more or less for different types of analyses, not just the query, but the actual source of the data.

Bishop: Correct.

Ndunda: Again, the power of a lot of this stuff is in the unstructured world, because valuable information typically tends to be hidden in documents. In the past, you'd have to have a team of people to scour through text, extract what they thought was valuable, and summarize it for you. You could miss out on 90 percent of the other valuable stuff that's in the document.

The ability now to drag and drop, and then go through a document in five different iterations just by tweaking a parameter, is really useful.

Gardner: So those will be IDOL-backed APIs that you are referring to.

Ndunda: Exactly.

Bishop: It's something that would be hard for an investment bank, even a few years ago, to process. Everyone is on the same playing field here or starting from the same base, but dealing with unstructured data has traditionally been a very difficult problem. You have a lot of technologies coming online as APIs; at the same time, they're also coming out as traditional on-premises [software and appliance] solutions.

We're all starting from the same gate here. Some folks are a little ahead, but I'd say that Facebook is further ahead than an investment bank in their ability to reason over unstructured data. In our world, I feel like we're starting basically at the same place that Goldman or Morgan would be.

Gardner: It's a very interesting reset that we're going through. It's also interesting that we talked earlier about the divide between where the machine and the individual knowledge worker begins or ends, and that's going to be a moving target. Do you have any sense of how the characterization of the right combination of machine intelligence and the best of human intelligence will change?

Empowering humans

Ndunda: I don’t foresee machines replacing humans, per se. I see them empowering humans, and to the extent that your role is not completely based on a task, if it's based on something where you actually manage a process that goes from one end to another, those particular positions will be there, and the machines will free our people to focus on that.

But, in the case where you have somebody who is really responsible for something that can be automated, then obviously that will go away. Machines don't eat, they don’t need to take vacation, and if it’s a task where you don't need to reason about it, obviously you can have a computer do it.

What we're seeing now is that if you have a machine sitting side by side with a human, and the machine can pick up on how the human reasons with some of the new technologies, then the machine can do a lot of the grunt work, and I think that’s the future of all of this stuff.

Bishop: What we're delivering is that we distill a lot of information, so that a knowledge worker or decision-maker can make an informed decision, instead of watching CNBC and being a single-source reader. We can go out and scour the best of all the information, distill it down, and present it, and they can choose to act on it.

Our goal here is not to make the next jump and make the decision. Our job is to present the information to a decision-maker.

Gardner: It certainly seems to me that the organization, big or small, retail or commercial, that can make the best use of this technology, machine learning, will in the end win.

Ndunda: Absolutely. It is a transformational technology, because for the first time in a really long time, the reasoning piece of it is within grasp of machines. These machines can operate in the gray area, which is where the world lives.

Gardner: And that gray area can almost have unlimited variables applied to it.

Ndunda: Exactly. Correct.

Gardner: I'm afraid we'll have to leave it there. We've been exploring how high-performing big-data analysis powers an innovative artificial intelligence-based investment opportunity and evaluation tool, and we've learned how LogitBot in New York identifies, manages, and contextually categorizes truly massive and diverse data sources.

So please join me in thanking our guests, Mutisya Ndunda, Founder and CEO of LogitBot in New York. Thank you, sir.

Ndunda: It was a pleasure. Thank you so much.

Gardner: We've also been here with Michael Bishop, CTO of LogitBot. Thank you, Michael.

Bishop: Thank you, Dana.

Gardner: And a big thank you as well to our audience for joining us for this Hewlett Packard Enterprise Voice of the Customer digital transformation discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE sponsored interviews. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how high-performing big-data analysis powers an innovative artificial intelligence-based investment opportunity. Copyright Interarbor Solutions, LLC, 2005-2016. All rights reserved.


Tuesday, October 11, 2016

How Propelling Instant Results to the Excel Edge Democratizes Advanced Analytics

Transcript of a discussion on how HTI Labs in London provides the means and governance with their Schematiq tool to bring critical data to the spreadsheet interface users want most.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the Hewlett Packard Enterprise (HPE) Voice of the Customer podcast series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on digital transformation. Stay with us now to learn how agile businesses are fending off disruption in favor of innovation.

Our next case study highlights how powerful and diverse financial information is delivered to the ubiquitous Excel spreadsheet edge. We'll explore how HTI Labs in London provides the means and governance with Schematiq to bring critical data to the interface users want.

By leveraging the best of instant cloud-delivered information with spreadsheets, Schematiq democratizes end-user empowerment while providing powerful new ways to harness and access complex information.

To describe how complex cloud core-to-edge processes and benefits can be managed and exploited, we're joined by Darren Harris, CEO and Co-Founder of HTI Labs in London.

Welcome, Darren.

Darren Harris: Thank you. It's great to be here.

Gardner: We're also here with Jonathan Glass, CTO and Co-Founder of HTI Labs. Welcome, Jonathan.

Jonathan Glass: Hi. Thank you.

Gardner: Let's put some context around this first. What major trends in the financial sector led you to create HTI Labs, and what are the problems you're seeking to solve?

Harris: Obviously, in finance, spreadsheets are widespread and are being used for a number of varying problems. A real issue started a number of years ago, when spreadsheets got out of control. People were using them everywhere, causing lots of operational risk. Firms wanted to get their hands around the processes for governance, and there were loads of Excel-type issues that we needed to eradicate.

That led to the creation of centralized teams that locked down rigid processes and effectively took away a lot of the innovation and discovery process that traders are using to spot opportunities and explore data.

Through this process, we're trying to help with governance to understand the tools to explore, and [deliver] the ability to put the data in the hands of people ... [with] the right balance.

So by taking the best of regulatory scrutiny around what a person needs, and some innovation that we put into Schematiq, we see an opportunity to take Excel to another level -- but not sacrifice the control that’s needed.

Gardner: Jonathan, are there technology trends that allowed you to be able to do this, whereas it may not have been feasible economically or technically before?

Upstream capabilities

Glass: There are a lot of really great back-end technologies that are available now, along with the ability to either internally or externally scale compute resources. Essentially, the desktop remains quite similar. Excel has remained quite the same, but the upstream capabilities have really grown.

So there's a challenge. Data that people feel they should have access to is getting bigger, more complex, and less structured. So Excel, which is this great front-end to come to grips with data, is becoming a bit of a bottleneck in terms of actually keeping up with the data that's out there that people want.

Gardner: So, we're going to keep Excel. We're not going to throw the baby out with the bathwater, so to speak, but we are going to do something a little bit different and interesting. What is it that we're now putting into Excel and how is that different from what was available in the past?

Harris: Schematiq extends Excel and allows it to access unstructured data. It also reduces the complexity and technical limitations that Excel has as an out-of-the-box product.

We have the notion of a data link that's effectively in a single cell that allows you to reference data that’s held externally on a back-end site. So, where people used to ingest data from another system directly into Excel, and effectively divorce it from the source, we can leave that data where it is.

It's a paradigm of take a question to the data; don’t pull the data to the question. That means we can leverage the power of the big-data platforms and how they process an analytic database on the back-end, but where you can effectively use Excel as the front screen. Ask questions from Excel, but push that query to the back-end. That's very different in terms of the model that most people are used to working with in Excel.
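A small sketch of that paradigm, with sqlite3 standing in for whatever back-end analytic store is actually in play: the front-end holds a reference to the question and only the answer comes back, rather than every row being pulled into the spreadsheet. The table and figures below are invented for illustration.

```python
# "Take the question to the data": push an aggregate query to the back-end and
# return only the result, instead of importing the raw rows into Excel.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("power", 1.2e6), ("power", 0.8e6), ("gas", 2.5e6)],
)

# The "data link" in a cell would hold a reference to this query, not the rows.
question = "SELECT desk, SUM(notional) FROM trades GROUP BY desk"
for desk, total in conn.execute(question):
    print(desk, total)
```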

Gardner: This is a two-way street. It's a bit different. And you're also looking at the quality, compliance, and regulatory concerns over that data.

Harris: Absolutely. An end-user is able to break down or decompose any workflow process with data and debug it the same way they can in a spreadsheet. The transparency that we add on top of Excel’s use with Schematiq allows us to monitor what everybody is doing and the function they're using. So, you can give them agility, but still maintain the governance and the control.

In organizations, lots of teams have become disengaged. IT has tried to create some central core platform that’s quite restrictive, and it's not really serving the users. They have gotten disengaged and they've created what Gartner referred to as the Shadow BI Team, with databases under their desk, and stuff like that.

By bringing in Schematiq we add that transparency back, and we allow IT and the users to have an informed discussion -- a very analytic conversation -- around what they're using, how they are using it, where the bottlenecks are. And then, they can work out where the best value is. It's all about agility and control. You just can't give the self-service tools to an organization and not have the transparency for any oversight or governance.

To the edge

Gardner: So we have, in a sense, brought this core to the edge. We've managed it in terms of compliance and security. Now, we can start to think about how creative we can get with what's on that back-end that we deliver. Tell us a little bit about what you go after, what your users want to experiment with, and then how you enable that.

Glass: We try to be as agnostic to that as we can, because it's the creativity of the end-user that really drives value.

We have a variety of different data sources, traditional relational databases, object stores, OLAP cubes, APIs, web queries, and flat files. People want to bring that stuff together. They want some way that they can pull this stuff in from different sources and create something that's unique. This concept of putting together data that hasn't been put together before is where the sparks start to fly and where the value really comes from.

Gardner: And with Schematiq you're enabling that aggregation and cleansing ability to combine, as well as delivering it. Is that right?

Harris: Absolutely. It's that discovery process. It may be very early on in a long chain. This thing may progress to be something more classic, operational, and structured business intelligence (BI), but allowing end-users the ability to cleanse, explore data, and then hand over an artifact that someone in the core team can work with or use as an asset. The iteration curve is so much tighter and the cost of doing that is so much less. Users are able to innovate and put together the scenario of the business case for why this is a good idea.

The only thing I would add to the sources that Jon has just mentioned is with HPE Haven OnDemand, [you gain access to] the unstructured analytics, giving the users the ability to access and leverage all of the HPE IDOL capabilities. That capability is a really powerful and transformational thing for businesses.

They have such a set of unstructured data [services] available in voice and text, and when you allow business users access to that data, the things they come up with, their ideas, are just quite amazing.

Technologists always try to put themselves in the minds of the users, and we've all historically done a bad job of making the data more accessible for them. When you allow them the ability to analyze PDFs without structure, to share that, to analyze sentiment, to include concepts and entities, or even enrich a core proposition, you're really starting to create innovation. You've raised the awareness of all of these analytics that exist in the world today in the back-end, shown end-users what they can do, and then put their brains to work discovering and inventing.

Gardner: Many of these financial organizations are well-established, many of them for hundreds of years perhaps. All are thinking about digital transformation, the journey, and are looking to become more data-driven and to empower more people to take advantage of that. So, it seems to me you're almost an agent of digital transformation, even in a very technical and sophisticated sector like finance.

Making data accessible

Glass: There are a lot of stereotypes in terms of who the business analysts are and who the people are that come up with ideas and intervention. The true power of democratization is making data more accessible, lowering the technical barrier, and allowing people to explore and innovate. Things always come from where you least expect them.

Gardner: I imagine that Microsoft is pleased with this, because there are some people who are a bit down on Excel. They think that it's manual, that it's by rote, and that it's not the way to go. So, you, in a sense, are helping Excel get a new lease on life.

Glass: I don't think we're the whole story in that space, but I love Excel. I've used it for years and years at work. I've seen the power of what it can do and what it can deliver, and I have a bit of an understanding of why that is. It's the live nature of it, the fact that people can look at data in a spreadsheet, see where it's come from, see where it's going, they can trust it, and they can believe in it.

That's why what we're trying to do is create these live connections to these upstream data sources. There are manual steps -- download, copy/paste, move around the sheet -- and that is where errors creep in. It's where the bloat, the slowness, and the unreliability can happen, but by changing that into a live connection to the data source, it becomes instant and it goes back to being trusted, reliable, and actionable.

Harris: There's something in the DNA, as well, of how people interact with data and so we can lay out effectively the algorithm or the process of understanding a calculation or a data flow. That’s why you see a lot of other systems that are more web-based or web-centric and replicate an Excel-type experience.

The user starts to use it and starts to think, "Wow, it’s just like Excel," and it isn’t. They hit a barrier, they hit a wall, and then they hit the "export" button. Then, they put it back [into Excel] and create their own way to work with it. So, there's something in the DNA of Excel and the way people lay things out. I think of [Excel] almost like a programing environment for non-programers. Some people describe it as a functional language very much like Haskell, and the Excel functions they write were effectively then working and navigating through the data.


Gardner: No need to worry that if you build it, will they come; they're already there.

Harris: Absolutely.

Gardner: Tell us a bit about HTI Labs and how your company came about, and where you are on your evolution.

Cutting edge

Harris: HTI Labs was founded in 2012. The core backbone of the team actually worked for the same tier 1 investment bank, and we were building risk and trading systems for front-office teams. We were really, I suppose, at the cutting edge of all the big-data technologies that were being used at the time -- real-time, distributed graphs and cubes, and everything.

As a core team, it was about taking that expertise and bringing it to other industries: using Monte Carlo farms for risk calculations, exporting data at speed, and real-time risk. These things were becoming more central to other organizations, which was an opportunity.

At the moment, we're focusing predominately on energy trading. Our software is being used across a number of other sectors and our largest client has installed Schematiq on 120 desktops, which is great. That’s a great validation of what we're doing. We're also a member of the London Stock Exchange Elite Program, based in London for high-growth companies.

Glass: Darren and I met when we were working for the same company. I started out as a quant doing the modeling, the math behind pricing, but I found that my interest lay more in the engineering. Rather than doing it once, can I do it a million times, can I do these things reliably and scale them?

Because I started in a front-office environment, it was very spreadsheet-dominated and very VBA-dominated. There's good and bad in that, and a lot of lessons came out of it. Darren and I met up, and we crossed the divide together, from the top-down, big IT systems to the bottom-up, end-user-developed spreadsheets, and so on. We found a middle ground together, which we feel is a quite powerful combination.

Gardner: Back to where this leads. We're seeing more and more companies using data services like Haven OnDemand and starting to employ machine learning, artificial intelligence (AI), and bots to augment what the humans do so well. Is there an opportunity for that to play here, or maybe it already is? The question basically is, how does AI come to bear on what you can deliver out to the Excel edge?

Harris: I think what you see is that, out of the box, you have a base unit of capability. The algorithms are built, but the key to making them so much better is the feedback loop between your domain users, your business users, and how they can enrich and effectively train these algorithms.

So, we see a future where the self-service BI tools that they use to interact with data and explore would almost become the same mechanism where people will see the results from the algorithms and give feedback to send back to the underlying algorithm.
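As a rough sketch of that feedback loop (the learner and data here are stand-ins, not anything HPE or HTI Labs ships), an incremental model can fold user corrections back into the underlying algorithm:

```python
# Base capability plus incremental user feedback, using partial_fit so each
# correction updates the model without a full retraining cycle. Data invented.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()

# Base capability: an initial batch of labeled examples.
X_base = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y_base = np.array([1, 1, 0, 0])
clf.partial_fit(X_base, y_base, classes=np.array([0, 1]))

# Domain-user feedback arrives later: "this prediction was wrong, relabel it."
X_feedback = np.array([[0.55, 0.45]])
y_feedback = np.array([0])
clf.partial_fit(X_feedback, y_feedback)

print(clf.predict(np.array([[0.6, 0.4]])))
```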

Gardner: And Jonathan, where do you see the use of bots, particularly perhaps with an API model like Haven OnDemand?

The role of bots

Glass: The concept for bots is replicating an insight or a process that somebody might already be doing manually. People create these data flows and analyses that they maybe run once, because they're quite time-consuming to run. The really exciting possibility is that you make these things run 24×7. So, you start receiving notifications, rather than having to pull from the data source. You start receiving notifications in your mailbox from the process you have created. You look at those and you decide whether that's a good insight or a bad insight, and you can then start to train it and refine it.

The training and refining is that loop that potentially goes back to IT, gets back through a development loop, and it’s about closing that loop and tightening that loop. That's the thing that really adds value to those opportunities.

Gardner: Perhaps we should unpack Schematiq a bit to understand how one might go back and do that within the context of your tool. Are there several components of the tool, one of which might lend itself to going back and automating?

Glass: Absolutely. You can imagine the spreadsheet has some inputs and some outputs. One of the components within the Schematiq architecture is the ability to take a spreadsheet, to take the logic and the process that’s embedded in our spreadsheet, and turn it into an executable module of code, which you can host on your server, you can schedule, you can run as often as you like, and you can trigger based on events.

It's a way of emitting code from a spreadsheet. You take the insight -- without a business-analysis loop and a development loop -- the exact thing that the user, the analyst, has programmed, and you make it into something that you can run, commoditize, and scale. That's quite an important way in which we reduce that development loop. We create a cycle that's tight and rapid.
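Schematiq's actual mechanism isn't public, but a much-simplified, hypothetical sketch of the idea might look like this: capture each cell's logic as a small function of the other cells, and wrap the whole graph as one callable unit that a server can host, schedule, or trigger.

```python
# Hypothetical illustration only: turn "spreadsheet" logic (cells defined in
# terms of other cells) into a single callable module.
def build_module(cell_formulas):
    """cell_formulas maps a cell name to a function of the other cell values."""
    def run(**inputs):
        values = dict(inputs)
        remaining = dict(cell_formulas)
        # Naive evaluation loop: keep resolving cells until all are computed.
        while remaining:
            for cell, formula in list(remaining.items()):
                try:
                    values[cell] = formula(values)
                    del remaining[cell]
                except KeyError:
                    continue  # depends on a cell that has not been computed yet
        return values
    return run

# "Spreadsheet" logic: margin depends on price and cost, flag depends on margin.
module = build_module({
    "margin": lambda v: v["price"] - v["cost"],
    "flag":   lambda v: v["margin"] > 10,
})
print(module(price=120.0, cost=95.0))
```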

Gardner: Darren, would you like to explain the other components that make-up Schematiq?

Harris: There are four components to the Schematiq architecture. There's the workbench that extends Excel and adds the ability to do large-scale structured-data analytics. We have the asset manager, which is really all about governance. So, you can think of it like source control for Excel, but with a lot more around metadata control, transparency, and analytics on what people are using and how they're using it.

There's a server component that allows you to off-load and scale analytics horizontally, if you need to, and to build repeatable or overnight processes. The last part is the portal. This is really about allowing end-users to instantly share their insights with other people. Picking up on Jon's point, the compound executable that's defined in Schematiq can be off-loaded to a server and exposed as another API, to a computer, a mobile device, or even a function.

So, it’s very much all about empowering the end-user to connect, create, govern, share instantly and then allow consumption from anybody on any device.

Market for data services

Gardner: I imagine, given the sensitive nature of the financial markets and activities, that you have some boundaries that you can’t cross when it comes to examining what’s going on in between the core and the edge.

Tell me about how you, as an organization, can look at what’s going on with the Schematiq and the democratization, and whether that creates another market for data services when you see what the demand entails.

Harris: It’s definitely the case that people have internal datasets they create and that they look after. People are very precious about them because they are hugely valuable, and one of the things that we strive to help people do is to share those things.

Across the trading floor, you might effectively have a dozen or more different IT infrastructures, if you think of what exists on the desk as a miniature infrastructure that's been created. So, it's about making it easy for people to share these things, to create master datasets that they gain value from, and to see that they gain mutual value from that, rather than feeling closed in and not wanting to share with their neighbors.

If we work together and if we have the tools that enable us to collaborate effectively, then we can all get more done and we can all add more value.

Gardner: It's interesting to me that the more we look at the use of data, the more it opens up new markets and innovation capabilities that we hadn’t even considered before. And, as an analyst, I expect to see more of a marketplace of data services. You strike me as an accelerant to that.

Harris: Absolutely. As the analytics come online and are exposed by APIs, the underlying store that's used is becoming a bit irrelevant. If you look at what the analytics can do for you, that's how you consume the insight, and you can connect to other sources. You can connect to Twitter, you can connect to Facebook, you can connect to PDFs; whether it's NoSQL, structured, columnar, or rows, it doesn't really matter. You don't see that complexity. The fact that you can just create an API key, access it as a consumer, and start to work with it is really powerful.

There was the recent example in the UK of a report on the Iraq War. It’s 2.2 million words, it took seven years to write, and it’s available online, but there's no way any normal person could consume or analyze that. That’s three times the complete works of Shakespeare.

Using these APIs, you can start to pull out mentions, you can pull out countries and locations, and really start to get into the data and provide anybody with Excel at home, in our case, or any other tool, the ability to analyze and get in there and share those insights. We're very used to media where we get just the headline, and that spin comes into play. People turn things on their head, and you really never get to delve into the underlying detail.

What’s really interesting is when democratization and sharing of insights and collaboration comes, we can all be informed. We can all really dig deep, and all these people that work there, the great analysts, could start to collaborate and delve and find things and find new discoveries and share that insight.

Gardner: All right, a little light bulb just went off in my head. Whereas before we would go to a headline and a news story and might have a hyperlink to a source, now I could get a headline and a news story, open up my Excel spreadsheet, get to the actual data source behind the entire story, and then probe and plumb and analyze that any which way I wanted to.

Harris: Yes, exactly. I think the most savvy consumer now, the analyst, is starting to demand that transparency. We've seen it in the UK with election messages, quotes, and even financial stats, where people just don't believe the headlines. They're demanding transparency in that process, and so governance can really only be a good thing.

Gardner: I'm afraid we will have to leave it here. We've been exploring how powerful and diverse financial information is delivered to the ubiquitous Excel spreadsheet edge. And we have learned how HTI Labs in London provides the means and governance with their Schematiq tool to bring critical data to the interface that users want most.

So please join me in thanking our guests, Darren Harris, CEO and Co-Founder of HTI Labs, and Jonathan Glass, CTO and Co-Founder of HTI Labs.

And a big thank you to our audience as well for joining us for this Hewlett Packard Enterprise Voice of the Customer digital transformation discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing series of HPE-sponsored interviews. Thanks again for listening, and please do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how HTI Labs in London provides the means and governance with their Schematiq tool to bring critical data to the spreadsheet interface that users want most. Copyright Interarbor Solutions, LLC, 2005-2016. All rights reserved.


Tuesday, August 23, 2016

How HudsonAlpha Innovates on IT for Research-Driven Education, Genomic Medicine and Entrepreneurship

Transcript of a discussion on how HudsonAlpha leverages modern IT infrastructure and big data analytics to power research projects as well as pioneering genomic medicine findings.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the Hewlett Packard Enterprise (HPE) Voice of the Customer podcast series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT Innovation -- and how it's making an impact on people's lives.

Our next IT infrastructure thought leadership case study explores how the HudsonAlpha Institute for Biotechnology engages in digital transformation for genomic research and healthcare paybacks.

We'll now hear how HudsonAlpha leverages modern IT infrastructure and big-data analytics to power a pioneering research project incubator and genomic medicine innovator.

To describe new possibilities for exploiting cutting-edge IT infrastructure and big data analytics for healthcare innovation, we're joined by Dr. Liz Worthey, Director of Software Development and Informatics at the HudsonAlpha Institute for Biotechnology in Huntsville, Alabama. Welcome, Liz.

Dr. Liz Worthey: Thanks for inviting me.

Gardner: It seems to me that genomics research and IT have a lot in common. There's not much daylight between them -- two different types of technology, but highly interdependent. Have I got that right?

Worthey: Absolutely. It used to be that the IT infrastructure was fairly far away from the clinic or the research, but now they're so deeply intertwined that it necessitates many meetings a week between the leadership of both in order to make sure we get it right.

Gardner: And you have background in both. Maybe you can tell us a little bit about that.

Worthey: My background is primarily on the biology side, although I'm Director of Informatics and I've spent about 20 years working in the software-development and informatics side. I'm not IT Director, but I'm pretty IT savvy, because I've had to develop that skill set over the years. My undergraduate degree was in immunology, and since then, my focus has really been on genetics informatics and bioinformatics.

Gardner: Please describe what genetic informatics or genomic informatics is for our audience.

Worthey: Since 2003, when we received the first version of a human reference genome, there's been a large field involved in the task of extracting knowledge that can be used for society and health from genomic data.

A [human] genome is 3.2 billion nucleotides in length, and in there, there's a lot of really useful information. There's information about which diseases that individual may be more likely to get and which diseases they will get.

There's also information about which drugs they should and shouldn't take; information about which types of procedures they should have -- surveillance procedures, such as colonoscopies. And so, the clinical aspects of genomics are really about developing the analytical capabilities to extract that data in real time so that we can use it to help an individual patient.

On top of that, there's also a lot of research. A lot of that is in large-scale studies across hundreds of thousands of individuals to look for signals that are more difficult to extract from a single genome. Genomics, clinical genomics, is all of that together.

Parallel trajectory

Gardner: Where is the societal change potential in terms of what we can do with this information and these technologies?

Worthey: Genomics has existed for maybe 20 years, but the vast majority of that was the first step. Over the last six years, we've taken maybe the second or third step in a journey that’s thousands of steps long.

We're right on the edge. We didn’t used to be able to do this, because we didn't have any data. We didn't have the capability to sequence a genome cheaply enough to sequence lots. We also didn't have the storage capabilities to store that data, even if we could produce it, and we certainly didn't have enough compute to do the analysis, infrastructure-wise. On top of that, we didn’t actually have the analytical know-how or capabilities either. All of that is really coalescing at the same time.

As we've been doing genomics, and the technology on the sequencing side has come up, the compute and the computing technologies have come up at the same time. They're feeding each other, and genomics is now driving IT to think about things in a very different way.

Gardner: Let's dive into that a little bit. What are the hurdles technologically for getting to where you want to be, and how do you customize that or need to customize that, for your particular requirements?

Worthey: There are a number of hurdles. Certainly, there are simpler hurdles that we have to get past, like storage, storage tied with compression. How do you compress that data to where you can store millions of genomes at a price that's affordable.

A bigger hurdle is the ability to query information at a lot of disparate sites. When we think about genomic medicine, one of the things that we really want to do is share data between institutions that are geographically diverse. And the data that we want to share is millions of data points, each of which has hundreds or thousands of annotations or curations.

Those are fairly complex queries, even when you're doing it in one site, but in order to really change the practice of medicine, we have to be able to do that regionally, nationally, and globally. So, the analytics questions there are large.

We have 3.2 billion data points for each individual. The data is quite broad, but it’s also pretty deep. One of the big problems is that we don’t have all the data that we need to do genomic medicine. There's going to be data mining -- generate the data, form a hypothesis, look at the data, see what you get, come back with a new hypothesis, and so on.
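To picture that hypothesis-driven loop, here is a hypothetical sketch: the table, column names, and thresholds are all invented, and a real installation would run similar queries against whatever variant store the institution actually uses.

```python
# Hypothesis-driven mining over annotated variants: ask a question, inspect the
# result, tighten the hypothesis, and ask again. Schema and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE variants
                (patient_id TEXT, gene TEXT, impact TEXT, allele_freq REAL)""")
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", [
    ("P1", "BRCA2", "missense", 0.0001),
    ("P1", "TP53",  "synonymous", 0.2),
    ("P2", "BRCA2", "frameshift", 0.00001),
])

# Hypothesis 1: rare, protein-changing variants in a gene of interest.
rows = conn.execute("""SELECT patient_id, gene, impact FROM variants
                       WHERE gene = 'BRCA2' AND allele_freq < 0.001
                       AND impact != 'synonymous'""").fetchall()
print(rows)  # inspect, refine the thresholds, and query again
```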

Finally, one of the problems that we have is that a lot of the algorithms that you might use only exist in the brains of MDs, other clinical folks, or researchers. There's really a lot of human-computer interaction work to be done so that we can extract that knowledge.

There are lots of problems. Another big one is that we really want to put this knowledge in the hands of the doctor during the seven minutes they have to see the patient. So it's also about delivering answers at that point in time, and about giving the person doing the analysis -- ideally an MD -- the ability to query the data.

Cloud technology

Gardner: Interestingly, the emergence of cloud methods and technology over the past five or 10 years would address some of those issues about distributing the data effectively -- and also perhaps getting actionable intelligence to a physician in an actual critical-care environment. How important is cloud to this process and what sort of infrastructure would be optimal for the types of tasks that you have in mind?

Worthey: If you had asked me that question two years ago, on the genomic medicine side, I would have said that cloud wasn't really part of the picture -- and not for technical reasons, but for business reasons. There were a lot of questions around privacy and the sharing of healthcare information, and hospitals didn't like the idea.

They were very reluctant to move to the cloud. Over the last two years, that has started to change. Enough of them had to decide to do it before everybody would view it as something that was permissible.

Cloud is absolutely necessary in many ways, because we have periods where lots of data has to be computed and analytics have to be run, and then periods where new information is coming off the sequencer. So, it's that perfect crest and trough.

If you don't have the ability to deal with that sort of fluctuation -- if you buy a certain amount of hardware and you only have it available in-house -- your pipeline gets overwhelmed during the crests and then sits idle for a long time during the troughs.

But it's also important to have things in-house, because sometimes you want to do things in a different way, and sometimes you want to do things in a more secure manner.

That's kind of the poster child for many of the new technologies that are coming out that address both of those needs -- they allow you to run things in-house and also to run the same jobs on the same data in the cloud. So, it's key.
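
One way to picture that hybrid arrangement is as a per-job routing decision: keep sensitive or steady work on the in-house cluster and burst the overflow to the cloud during the crests. The sketch below is a toy dispatcher with made-up thresholds and placeholder submit functions, not any real scheduler's API.

```python
"""Toy dispatcher for hybrid (in-house plus cloud) genomics workloads."""
from dataclasses import dataclass

LOCAL_CORES = 2_000          # assumed size of the in-house cluster
LOCAL_BUSY_THRESHOLD = 0.85  # burst to cloud once the local cluster is ~85% busy

@dataclass
class Job:
    name: str
    cores: int
    sensitive: bool          # e.g., identifiable patient data that must stay in-house

def submit_local(job: Job) -> str:
    return f"{job.name}: submitted to in-house cluster"      # placeholder

def submit_cloud(job: Job) -> str:
    return f"{job.name}: submitted to cloud burst capacity"  # placeholder

def dispatch(job: Job, cores_in_use: int) -> str:
    """Keep sensitive work local; burst everything else when the cluster is saturated."""
    utilization = cores_in_use / LOCAL_CORES
    if job.sensitive or utilization < LOCAL_BUSY_THRESHOLD:
        return submit_local(job)
    return submit_cloud(job)

if __name__ == "__main__":
    print(dispatch(Job("align-sample-042", cores=64, sensitive=True), cores_in_use=1_900))
    print(dispatch(Job("population-burden-scan", cores=512, sensitive=False), cores_in_use=1_900))
```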

Gardner: That brings me to the next question about this concept of genomics as a service or a platform to support genomics as a service. How do you envision that and how might that come about?

Worthey: When we think about the infrastructure to support that, it has to be something flexible and it has to be provided by organizations that are able to move rapidly, because the field is moving really quickly.

It has to be infrastructure that supports this hypothesis-driven research, and it has to be infrastructure that can deal with these huge datasets. Much of the data is ordered, organized, and well-structured, but because it's healthcare, a lot of the information that we use as part of the interpretation phase of genomic medicine is completely unstructured. There needs to be support for extraction of data from silos.

My dream is that the people who provide these technologies will also help us deal with some of these boundaries, the policy boundaries, to sharing data, because that’s what we need to do for this to become routine.

Data and policy

Gardner: We've seen some of that when it comes to other forms of data, perhaps in the financial sector. More and more, we're seeing tokenization, authentication, and encryption, where data can exist for a period of time with a certain policy attached to it, and then something happens to the data as a result of that policy. Is that what you're referring to?

Worthey: Absolutely. It's really interesting to come to a meeting like HPE Discover, because you get to see what everybody else is doing in different fields. Many of the things that people in my field have regarded as very difficult are actually not that hard at all; they happen all the time in other industries.

A lot of this -- the encryption, the encrypted data sharing, the ability to set access controls that apply only to a particular set of users and only for a certain amount of time -- seems complex, but it happens all the time in other fields. A big part of this is talking to people who have a lot of experience in a regulated environment -- just not this regulated environment -- learning the language they use to talk to the people who set policy there, transferring that to our policy makers, and ideally getting them together to talk to one another.
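
A deliberately simplified sketch of that kind of control -- access limited to named users and expiring after a set period -- might look like the following. Real deployments would layer this on top of proper key management, encryption, and audit infrastructure; the dataset name, users, and 90-day window here are invented for illustration.

```python
"""Toy time-limited access policy for a shared genomic dataset (illustrative only)."""
from datetime import datetime, timedelta, timezone
from typing import Optional

# Each policy names the users allowed to query a dataset and when access expires.
POLICIES = {
    "shared-cohort-2016": {
        "allowed_users": {"dr.smith@site-a.example.org", "analyst@site-b.example.org"},
        "expires_at": datetime.now(timezone.utc) + timedelta(days=90),
    }
}

def may_access(dataset_id: str, user: str, when: Optional[datetime] = None) -> bool:
    """Allow access only to named users, and only before the policy expires."""
    policy = POLICIES.get(dataset_id)
    if policy is None:
        return False
    now = when or datetime.now(timezone.utc)
    return user in policy["allowed_users"] and now < policy["expires_at"]

if __name__ == "__main__":
    print(may_access("shared-cohort-2016", "dr.smith@site-a.example.org"))  # True
    print(may_access("shared-cohort-2016", "someone.else@example.org"))     # False
```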

Gardner: Liz, you mentioned the interest in getting your requirements in front of the technology vendors, cloud providers, and network providers. Is that under way? Is that something that's yet to happen? Where is the synergy between the genomic research community and the technology-vendor and platform-provider community?

Worthey: This is happening fast. For genomics, there's been a shift in the volume of genomic data that we can produce, with some new sequencing technology that's coming. If you're a provider of hardware, services, or solutions to deal with big data, you should be looking at genomics, because we're probably going to overtake many of those other industries in terms of the volume and complexity of the data that we have.

The reason that's really interesting is that you then get invited to come and talk at forums where there are lots of technology companies. You make them aware of the work that has to be done in the field of medicine and in genomic research, and then you can start having those discussions.

A lot of the things that those companies are already doing, the use cases, are similar and maybe need some refinement, but a lot of that capability is already there.

Gardner: It's interesting that you’ve become sort of the “New York” of use cases. If you can make it there, you can make it anywhere. In other words, if we can solve this genomic data issue and use the cloud fruitfully to distribute and gather -- and then control and monitor the data as to where it should be under what circumstances -- we can do just about anything.

Correct me if I am wrong, though. We're using data in the genomic sense for population groups. We're winnowing those groups down into particular diseases. How farfetched is it to think about individuals having their own genomic database that would follow them like an authenticated human design? Is that completely out of the bounds? How far would that possibly be?

Technology is there

Worthey: I've had my genome sequenced, and it's accessible. I could pull it up and look at it, using the tools that I developed, on my phone sitting here on the table. In terms of the ability to do that, a lot of the technology is already here.

The number of people being sequenced is increasing rapidly. We're already using genomics to make diagnoses in patients and to understand their drug interactions. So, we're here.

One of the things that we're talking about just now is at what point in a person's life you should sequence their genome. I, and a number of other people in the field, believe that it should be earlier rather than later, before they get sick. Then we have that information to use when they get those first symptoms; you're not waiting until they're really ill before you do it.

I can’t imagine a future where that's not what's going to happen, and I don’t think that future is too far away. We're going to see it in our lifetimes, and our children are definitely going to see it in theirs.

Gardner: The inhibitors, though, would be more of an ethical nature, not a technological nature.

Worthey: And policy, and society; the societal impact of this is huge.

The data that we already have, clinical information, is really for that one person, but your genome is shared among your family, even distant relatives that you’ve never met. So, when we think about this, there are many very hard ethical questions that we have to think about. There are lots of experts that are working on that, but we can’t let that get in the way of progress. We have to do it. We just have to make sure we do it right.

Gardner: To come back down a little bit toward the technology side of things, seeing as so much progress has been made and that there is the tight relationship between information technology and some of the fantastic things that can happen with the proper knowledge around genomic information, can you describe the infrastructure you have in place? What’s working? What do you use for big-data infrastructure, and cloud or hybrid cloud as well?

Worthey: I'm not on the IT side, but I can tell you about the other side, and I can speak a little bit to the IT side as well. In terms of the technologies that we use to store all of that varying information, we're currently using Hadoop and MongoDB. We've finished our proof of concept with HPE, looking at their Vertica solution.

We have to work out what the next steps might be after our proof of concept. Certainly, we're very interested in looking at the solutions that they have here; they fit with our needs. The issue being addressed on that side is lots of variants and complex queries that you need to answer really fast.
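
To give a flavor of that workload, here is a sketch of the sort of analytical query a columnar store such as Vertica is designed to answer quickly. The table and column names are an invented schema, and the snippet assumes the open-source vertica-python client is installed and pointed at a reachable cluster.

```python
"""Sketch of an analytical variant query against a columnar store (hypothetical schema)."""
import vertica_python   # assumes the vertica-python client is installed

CONN_INFO = {
    "host": "vertica.example.org",   # placeholder connection details
    "port": 5433,
    "user": "genomics",
    "password": "********",
    "database": "variants_db",
}

# Hypothetical table: one row per (sample, variant) with annotation columns.
QUERY = """
    SELECT v.gene_symbol,
           COUNT(DISTINCT v.sample_id)  AS affected_samples,
           AVG(v.population_frequency)  AS mean_pop_freq
    FROM   annotated_variants v
    WHERE  v.predicted_impact = 'HIGH'
      AND  v.population_frequency < 0.001
      AND  v.phenotype_term = %s
    GROUP BY v.gene_symbol
    ORDER BY affected_samples DESC
    LIMIT 25
"""

def rare_high_impact_genes(phenotype_term):
    """Return genes carrying rare, high-impact variants in samples with a phenotype."""
    with vertica_python.connect(**CONN_INFO) as connection:
        cursor = connection.cursor()
        cursor.execute(QUERY, (phenotype_term,))
        return cursor.fetchall()

if __name__ == "__main__":
    for gene, n_samples, freq in rare_high_impact_genes("HP:0001638"):
        print(gene, n_samples, round(freq, 6))
```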

On the other side, one of the technological hurdles that we have to meet is the unstructured data. We have electronic health record (EHR) information that’s coming in. We want to hook up to those EHRs and we want to use systems to process that data to make it organized, so that we can use it for the interpretation part.

In-house solution

We've developed in-house solutions that we're using right now that allow humans to come in, look at that data, and select the terms from it -- so you'd select disease terms. Then we have in-house solutions to map them to the genomic side. We're looking at things like HPE's IDOL in a proof-of-concept (POC) on that side. We're also talking to some EHR companies about how to hook up the EHR, through those solutions, to our software to make it a seamless product that would give us all of that.
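
The term-selection step can be pictured as mapping phrases found in unstructured clinical text to ontology-style identifiers and, from there, to candidate genes. The sketch below is a toy dictionary matcher for illustration only; it is not HPE IDOL or HudsonAlpha's in-house tooling, and the term and gene tables are simplified examples.

```python
"""Toy extraction of disease terms from unstructured clinical text (illustration only)."""
import re

# Tiny example vocabulary mapping phenotype phrases to ontology-style identifiers.
PHENOTYPE_TERMS = {
    "dilated cardiomyopathy": "HP:0001644",
    "seizure": "HP:0001250",
    "developmental delay": "HP:0001263",
}

# Hypothetical mapping from phenotype identifiers to candidate genes to review.
CANDIDATE_GENES = {
    "HP:0001644": ["TTN", "LMNA", "MYH7"],
    "HP:0001250": ["SCN1A", "KCNQ2"],
}

def extract_phenotypes(note: str):
    """Return identifiers for known phrases found in a clinical note (plurals tolerated)."""
    found = []
    for phrase, term_id in PHENOTYPE_TERMS.items():
        pattern = r"\b" + re.escape(phrase) + r"s?\b"
        if re.search(pattern, note, flags=re.IGNORECASE):
            found.append(term_id)
    return found

def genes_to_review(note: str):
    """Map extracted phenotype terms to a de-duplicated list of candidate genes."""
    genes = []
    for term_id in extract_phenotypes(note):
        genes.extend(CANDIDATE_GENES.get(term_id, []))
    return sorted(set(genes))

if __name__ == "__main__":
    note = ("Five-year-old with developmental delay and new-onset seizures; "
            "echocardiogram shows dilated cardiomyopathy.")
    print(extract_phenotypes(note))
    print(genes_to_review(note))
```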

In terms of hardware, we do have HPE hardware in-house; I think we have 12 petabytes of their storage. We also have DataDirect Networks hardware with a general parallel file system solution. We even have graphics processors for some of the analysis that we do. We have a large bank of GPUs, because in some cases they're much faster for certain types of problems that we have to solve. So we're pretty IT-rich, with a lot of heavy investment on the IT side.

Gardner: And cloud -- any preference to the topology that works for you architecturally for cloud, or is that still something you are toying with?

Worthey: We're currently looking at three different solutions that are all cloud solutions. We not only do the research and the clinical, but we also have a lab that produces lots of data for other customers, a lab that produces genomic data as a service.

They have a challenge of getting that amount of data returned to customers in a timely fashion, so there are solutions that we're looking at there. There are also, as we talked about at the start, solutions to help us with the inflow of data coming off the sequencers and the compute. So we're looking at a number of different cloud-based solutions to solve some of those challenges.

Gardner: Before we close, we've talked about healthcare and population impacts, but I should think there's also a commercial aspect to this. That kind of information will lend itself to entrepreneurial activities, and to products and services in great demand in the marketplace. Is that something you're involved with as well, and wouldn't that help foot the bill for some of these costly IT infrastructure investments?

Worthey: HudsonAlpha Institute was set up with just that model in mind. We have a research, not-for-profit side, but we also have a number of affiliate companies that are for-profit, where intellectual property and ideas can go across to that side and be used to generate revenue that funds the research and keeps us moving and on the cutting edge.

We do have a services lab that does genomic sequencing and analytics; you can order that from them. We also serve a lot of people who have government contracts for this type of work. And then we have an entity called Envision Genomics -- for disclosure, I'm one of the founders of that entity. It's focused on empowering people to do genomic medicine and on working with lots of different solution providers to get genomic medicine done everywhere it's applicable.

Gardner: Well, it's been a fascinating discussion. Thank you for sharing that, and I look forward to tracking the close relationship between IT and genomics, because, as you say, they're mutually reinforcing and both very powerful in their impact on society.

We've been learning how the HudsonAlpha Institute for Biotechnology engages in digital transformation for genomic research and for healthcare paybacks. We've heard how HudsonAlpha leverages modern IT and cloud infrastructures and big-data analytics to power research projects as well as pioneering medicine uses and companies.

So please join me in thanking our guest, Dr. Liz Worthey, Director of Software Development and Informatics at the HudsonAlpha Institute for Biotechnology in Huntsville, Alabama. Thank you so much, Liz.

Worthey: Thank you very much. Thanks for inviting me.

Gardner: And I will also thank our audience as well for joining us for this Hewlett Packard Enterprise Voice of the Customer Podcast. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how HudsonAlpha leverages modern IT infrastructure and big data analytics to power research projects as well as pioneering genomic medicine findings. Copyright Interarbor Solutions, LLC, 2005-2016. All rights reserved.
