
Tuesday, June 15, 2010

Delta Air Lines Improves Customer Self-Service Apps Quickly Using Quality Assurance Tools

Transcript of a BriefingsDirect podcast with Delta Air Lines development leaders on gaining visibility into application testing to improve customer self-service experience.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to a special BriefingsDirect podcast series, coming to you from the HP Software Universe 2010 Conference in Washington, D.C. We're here the week of June 14, 2010, to explore some major enterprise software and solutions trends and innovations making news across HP’s ecosystem of customers, partners, and developers.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, and I'll be your host throughout this series of HP-sponsored Software Universe Live discussions.

Our customer case study today focuses on Delta Air Lines and the use of HP quality assurance products for requirements management as well as mapping the test cases and moving into full production. We are here with David Moses, Manager of Quality Assurance for Delta.com and its self-service efforts. Thanks for joining us, David.

David Moses: Thank you, very much. Glad to be here.

Gardner: We're also here with John Bell, a Senior Test Engineer at Delta. Welcome John.

John Bell: Thank you.

Gardner: Tell me about the market drivers. What is the problem set when it comes to managing the development process around requirements and then quality and test out through your production? What are the problems that you're generally facing these days?

Moses: Generally, the airline industry, along with a lot of other industries I'm sure, is highly competitive. We have a very, very quick, fast-to-market type environment, where we've got to get products out to our customers. We have a lot of innovation that's being worked on in the industry and a lot of competing channels outside the airline industry that would also like to get at the same customer set. So, it's very important to be able to deliver the best products you can as quickly as possible. "Speed Wins" is our motto.

Gardner: What is it about the use of some of the quality assurance products that helps you pull off that dual trick of speed, but also reliability and high quality?

Moses: The one thing I really like about the HP Quality Center suite especially is that your entire software development cycle can live within that tool. Whenever you're using different tools to do different things, it becomes a little bit more difficult to get the data from one point to another. It becomes a little bit more difficult to pull reports and figure out where you can improve.

Data in one place

What you really want to do is get all your data in one place and Quality Center allows you to do that. We put our requirements in in the beginning. By having those in the system, we can then map to those with our test cases, after we build those in the testing phase.

Not only do we have the QA engineers working on it in Quality Center, we also have the business analysts working on it, whenever they're doing the requirements. That also helps the two groups work together a bit more closely.

Gardner: Do you have anything to add to that, John?

Bell: The one thing that's been very helpful is the way that the Quality Center tabs are set up. It allows us to follow a specific process, looking at the release level all the way down to the actual cycles, and that allows us to manage it.

It's very nice that Quality Center has it all tied into one unit. So, as we go through our processes, we're able to go from tab to tab and we know that all of that information is interconnected. We can ultimately trace a defect back to a specific cycle or a specific test case, all the way back to our requirement. So, the tool is very helpful in keeping all of the information in one area, while still maintaining the consistent process.
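The traceability Bell describes, from a defect back to its cycle, test case, and requirement, amounts to a chain of links between records. The sketch below illustrates that idea only; the record types and field names are hypothetical and are not Quality Center's actual data model or API.

```python
from dataclasses import dataclass

# Hypothetical record types for illustration only; these are not
# Quality Center's real entities, fields, or API.
@dataclass
class Requirement:
    req_id: str
    title: str

@dataclass
class TestCase:
    test_id: str
    covers: Requirement   # the requirement this test case is mapped to

@dataclass
class Cycle:
    cycle_id: str
    release: str          # the release this test cycle belongs to

@dataclass
class Defect:
    defect_id: str
    found_by: TestCase    # the test case that exposed the defect
    found_in: Cycle       # the cycle in which the defect was logged

def trace(defect: Defect) -> dict:
    """Follow the links from a defect back to its release, cycle, test case, and requirement."""
    return {
        "defect": defect.defect_id,
        "release": defect.found_in.release,
        "cycle": defect.found_in.cycle_id,
        "test_case": defect.found_by.test_id,
        "requirement": defect.found_by.covers.req_id,
    }

# Hypothetical sample data.
req = Requirement("REQ-12", "Kiosk check-in: reprint boarding pass")
tc = TestCase("TC-301", covers=req)
cyc = Cycle("C2", release="R2010.06")
bug = Defect("D-77", found_by=tc, found_in=cyc)
print(trace(bug))
```

Because every defect holds a reference to the test case and cycle, and every test case to its requirement, answering "which requirement does this defect affect?" is a simple walk along those links rather than a cross-tool search.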

Gardner: Can you give us a sense of how much activity you process or how many applications there are -- the size of the workload you’ve got these days?

Bell: There is a lot. I look back to metrics we pulled for 2008. We were doing fewer than 70 projects. By 2009, after we had fully integrated Quality Center, we did over 129 projects. That also included a lot of extra work, which you may have heard about us doing related to a merger.

Gardner: With that increase in the number of applications that you're managing and dealing with, did you have any metrics in terms of the quality that you were able to manage, even though that volume increased so dramatically?

Moses: We were able to do that. That's one of the nice things. You can use your dashboard in Quality Center to pull those metrics up and see those reports. You can point out the projects that were your most troublesome children and look at the projects where you did really well.

Best-case scenario

You can go back and do a best-case scenario, and see what you did great and what you could improve. Having that view into it really helps. It’s also beneficial, whenever you have another project similar to one that was such an issue. You can have a heads up to say, "Okay, we need to treat this one differently this time."

Gardner: It’s the visibility to have repeatability when things go well, and, I suppose, visibility to avoid repeatability when things didn't go well.

Moses: Exactly.

Gardner: Let’s take a look at some of the innovation you've done. Tell me a bit about what you've worked with in terms of Quality Center in some of your own integration or tweaking?

Bell: One thing that we've been able to do with Quality Center is connect it with QuickTest Pro, and we have both Quality Center 10 and QuickTest Pro 10. We've been able to build our automated tests and store them in the Test Plan tab of Quality Center.

This has really been beneficial for us, when we go into our test labs and build our test set. We're able to take all of these automated pieces and combine them into a test set. What this has allowed us to do is run all of our automation as one test set. We've been able to run those on a remote box. It's taken our regression test time from one person for five days, down to zero people and approximately an hour and 45 minutes.
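To put those regression figures in perspective, here is a rough calculation. It assumes an 8-hour working day, which the interview does not state, and treats the hour and 45 minutes as unattended machine time.

```python
HOURS_PER_WORKDAY = 8  # assumption: the interview does not give a workday length

manual_person_hours = 1 * 5 * HOURS_PER_WORKDAY   # one tester for five days
automated_person_hours = 0                        # unattended run on a remote box
automated_elapsed_hours = 1.75                    # "approximately an hour and 45 minutes"

manual_elapsed_hours = 5 * HOURS_PER_WORKDAY      # five working days of elapsed time
speedup = manual_elapsed_hours / automated_elapsed_hours

print(f"Tester effort freed per regression pass: {manual_person_hours - automated_person_hours} person-hours")
print(f"Elapsed-time speedup: about {speedup:.0f}x")
```

Under these assumptions, each regression pass frees roughly 40 person-hours and finishes about 23 times faster in working hours. This covers only the regression suite; the "over 50 percent" reduction John mentions refers to overall testing time.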

Also, with the Test Lab tab, we're able to schedule these test sets to run during off hours. A lot of times our automation for things such as regression or sanity can run on off hours. We schedule those to run at perhaps 6 o'clock in the morning. Then, when we come in at 8 o'clock in the morning, all of those tests would have already run.
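Quality Center's Test Lab handles this scheduling itself. Purely to illustrate the off-hours window John describes, the hypothetical helper below computes the next 6 a.m. start time, assuming a fixed daily slot and ignoring time zones, weekends, and any real scheduler API.

```python
from datetime import datetime, timedelta

def next_off_hours_run(now: datetime, run_hour: int = 6) -> datetime:
    """Hypothetical helper: return the next run_hour:00 strictly after `now`."""
    candidate = now.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate

# Scheduled at 5:30 p.m.; a run of about 1 h 45 min (the regression time
# quoted above) starting at 6:00 finishes before testers arrive at 8:00.
start = next_off_hours_run(datetime(2010, 6, 15, 17, 30))
finish = start + timedelta(hours=1, minutes=45)
print(start, finish)
```

The design point is simply that a fixed early-morning slot plus a known run length lets results be ready before the working day begins.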

That frees up our testers to be doing more of the manual functional testing and that allows us to know that we have complete coverage with the automation, as well as our sanity pieces. So, that's a unique way that we've used Quality Center to help manage that and to reduce our testing times by over 50 percent.

Gardner: Thank you, John. David, there have been some ways in which your larger goals as a business have been either improved upon or perhaps better aligned with the whole development process. I guess I'm looking for whether there is some payback here in terms of your larger business goals?

Moses: It definitely is. It goes back to speed to market with new functionality and making the customer's experience better. In all of our self-service products, it's very important that we test from the customers’ point of view.

We deliver those products that make it easier for them to use our services. That's one of the things that always sticks in my mind, when I'm at an airport, and I'm watching people use the kiosk. That's one of the things we do. We bring our people out to the airports and we watch our customers use our products, so we get that inside view of what's going on with them.

A lot on the line

I'll see people hesitantly reaching out to hit a button. Their hand may be shaking. It could be an elderly person. It could be a person with a lot on the line. Say it’s somebody taking their family on vacation. It's the only vacation they can afford to go on, and they’ve got a lot of investment into that flight to get there and also to get back home. Really there's a lot on the line for them.

A lot of people don’t know a lot about the airline industry and they don’t realize that it's okay if they hit the wrong button. It's really easy to start over. But, sometimes they would be literally shaking, when they reach out to hit the button. We want to make sure that they have a good comfort level. We want to make sure they have the best experience they could possibly have. And, the faster we can deliver products to them, that make that experience real for them, the better.

Gardner: I should think the whole notion of self-service is usually important. It's important for the customer to be able to move through and do things their way, and I suppose there are some great cost savings and efficiencies on your end as well.

Dave, could you highlight a little bit how the whole notion of self-service gets embedded into applications, and how some of the quality assurance tools and processes have helped there?

Moses: I go back to any time you have to give up when you're having an issue with a product while you're online. You're on a website, and you have to call customer service. I think most people just sort of feel defeated at that point. People like to handle things themselves. You need a channel there for the customer to go to, if they need additional help.

So many clients and customers these days are so tech savvy. They know the industry they are in, and they know the tools they're working with, especially frequent flyers. I'd venture to say that most frequent flyers can hit the airport, check-in, get through security, and get to their plane really quickly. They just know their airports and they know everything they need to know about their flight, because this is where they live part of their lives.

You don't want to make them wait in line. You don't want to make them wait on a phone tree, when they make a phone call. You want them to be able to walk into the airport, hit a couple of buttons, get through security, and get to their gate.

By offering these types of products to the customers, you give them the best of both worlds. You give them a fast pass to check in. You give them a fast pass to book. But, you can also give the less-experienced customer an easy-to-understand path to do what they need as well.

Gardner: And getting those business benefits, those customer loyalty benefits, is really a function of good software development overall, isn't it?

Moses: Exactly. You have to give the customer the right tools that they want to get the job done for them.

Gardner: For other enterprises that are perhaps going to be working towards a higher degree of quality in their software, but probably also interested in reducing the time to develop and time to value, do you have any suggestions, now that you’ve gone through this, that you might offer to them?

Interim approach

Bell: In using Quality Center, we've used an interim approach. Initially, we just used the Defects tab of Quality Center. Then, we slowly began to add the Requirements piece, and then Test Cases, and ultimately the Releases and Cycles.

One thing that we've found to be very beneficial with Quality Center is that it shows the development organization that this isn't just a QA tool that the QA team uses. What we've been able to do by bringing the requirements piece into it and by bringing the defects and other parts of it together, is bring the whole team on board to using a common tool.

In the past, a lot of people have always thought of Quality Center as just a little tool that the QA people use in the corner and nobody else needs to be aware of. Now, we have our business analysts, project managers, and developers, as well as the QA team and even managers, using it, because each person can get a different view of different information.

From the Dashboard, your managers can look at your trends and what type of overall development lifecycle is coming through. Your project managers can be very involved in pulling the number of defects and seeing which ones are still outstanding and what their criticality is. The developers can be involved by entering information on defects when those issues have been resolved.

We've found that Quality Center is actually a tool that has drawn together all of the teams. They're all using a common interface, and they all start to recognize the importance of tying all of this together, so that everyone can get a view as to what's going on throughout the whole lifecycle.

Moses: John hits on a really good point there. You have to realize the importance of it, and we did a long time ago. We've realized the importance of automating and we've realized the importance of having multiple groups using the same tool.

In all honesty, we were just miserable in our own history of trying to get those to work. You really take certain shots at it. For the past eight years, if we can go back that far, we've been using Quality Center and its predecessor, TestDirector, just trying to get things automated, using the tools we had at the time.

The one thing that we never actually did was dedicate the resources. It's not just a tool. There are people there too. There are processes. There are concepts you're going to have to get in your head to get this to work, but you have to be willing to buy in by having the people resources dedicated to building the test scripts. Then, you're not done. You've got to maintain them. That's where most people fall short and that's where we fell short for quite some time.

Once we were able to finally dedicate the people to the maintenance of these scripts to keep them active and running, that's where we got a win. If you look at a web site these days, it's following one of two models. You either have a release schedule, that’s a more static site, or you have a highly dynamic site that's always changing and always throwing out improvements.

We fit into that "Speed Wins" model, where we get the product out to the customers and improve the experience as often as possible. So, we’re a highly dynamic site. We'll break up to 20 percent of all of our test scripts, all of our automated test scripts, every week. That's a lot of maintenance, even though we're using a lot of reusable code. You have to have those resources dedicated to keep that going.

Gardner: Well, I appreciate your time. We've been talking about the quality assurance process and the use of some HP tools. We've been learning about experiences from Delta Air Lines development executives. I want to thank our guests today, David Moses, Manager of Quality Assurance for Delta.com in the self-service function there. Thank you, David.

Moses: Thank you, very much.

Gardner: We've also been joined by John Bell, Senior Test Engineer there at Delta Air Lines. Thanks to you too, John.

Bell: It's been a pleasure.

Gardner: And, thanks to our audience for joining us for this special BriefingsDirect podcast coming to you from the HP Software Universe 2010 conference in Washington, DC.

Look for other podcasts from this HP event on the hp.com website, as well as via the BriefingsDirect Network.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this series of Software Universe Live Discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast with Delta Air Lines development leaders on gaining visibility into application testing to improve customer self-service experience. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.


McKesson Shows Bringing Testing Tools on the Road Improves Speed to Market and Customer Satisfaction

Transcript of a BriefingsDirect podcast from the HP Software Universe 2010 Conference in Washington, DC on field-testing software installations using HP Performance Center products.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to a special BriefingsDirect podcast series, coming to you from the HP Software Universe 2010 Conference in Washington, D.C. We're here the week of June 14, 2010, to explore some major enterprise software and solutions trends and innovations making news across HP’s ecosystem of customers, partners, and developers.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, and I'll be your host throughout this series of HP sponsored Software Universe Live discussions.

Our customer case study today focuses on McKesson Corp., a provider of certified healthcare information technology, including electronic health records, medical billing, and claims management software. McKesson is a user of HP’s project-based performance testing products used to make sure that applications perform in the field as intended throughout their lifecycle.

To learn more about McKesson’s innovative use of quality assurance software, please join me in welcoming Todd Eaton, Director of Application Lifecycle Management Tools in the CTO’s office at McKesson. Welcome to the show, Todd.

Todd Eaton: Thank you.

Gardner: Todd, tell me a little bit about what's going on in the market that is making the performance-based testing, particularly onsite, such an important issue for you.

Eaton: Well, looking at McKesson’s businesses, one of the things that we do is provide software for sale for various healthcare providers. With the current federal government regulations that are coming out and some of these newer initiatives that are planned by the federal government, these providers are looking for tools to help them do better healthcare throughout their enterprises.

With that in mind, they're looking to add functionality, they're looking to add systems, and they look to McKesson, as the leader in healthcare, to provide those solutions for them. With that in mind, our group works with the various R&D organizations within McKesson, to help them develop software for the needs of those customers.

Gardner: And what is it about performance-based testing that is so important now? We've certainly had lots of opportunity to trial things in labs and create testbeds. What is it about the real-world delivery that's important?

Eaton: It's one thing to test within McKesson. It's another thing when you test out at the customer site, and that's a main driver of this new innovation that we’re partnering with HP on.

When we build an application and sell that to our customers, they can take that application, bring it into their own ecosystem, into their own data center and install it onto their own hardware.

Controlled testing

The testing that we do in our labs is a little more controlled. We have access to HP and other vendors with their state-of-the-art equipment. We come up with our own set of standards, but when they go out to the site and get put into those hospitals, we want to ensure that our applications act at the same speed and same performance at their site that we experience in our controlled environment. So, being able to test on their equipment is very important for us.

Gardner: And I suppose it's difficult for you to anticipate exactly what you're going to encounter, until you're actually in that data center?

Eaton: Exactly. Just knowing how many different healthcare providers there are out there, you could imagine all the different hardware platforms, different infrastructures, and the needs or infrastructure items that they may have in their data centers.

Gardner: This isn’t just a function of getting set up, but there's a whole life-cycle of updates, patches, improvements, and increased functionality across the application set. Is this something that you can do over a period of time?

Eaton: Yes, and another very important thing is using their data. The hospitals themselves will have copies of their production data sets that they keep control of. There are strict regulations. That kind of data cannot leave their premises. Being able to test using the large amount of data or the large volume of data that they will have onsite is very crucial to testing our applications.

Gardner: Todd, tell me the story behind gaining this capability of that performance-based testing onsite -- how did you approach it, how long has it been in the making, and maybe a little bit about what you’re encountering?

Eaton: When we started out, we had some discussion with some of the R&D groups internally about our performance testing. My group actually provides a performance-testing service. We go out to the various groups, and we’re doing the testing.

We always look to find out what we can do better. We’re always doing lessons learned and things like that and talking with these various groups. We found that, even though we did a very good job of performance testing internally, we were still finding defects and performance issues out at the site, when we brought that software out and installed it in the customer’s data center.

After further investigation, it became apparent to us that we weren’t able to replicate all those different environments in our data center. It’s just too big of a task.

The next logical thing to do was to take the testing capabilities that we had and bring it all out on the road. We have these different services teams that go out to install software. We could go along with them and bring the powerful tools that we use with HP into those data centers and do the exact same testing that we did, and make sure that our applications were running as expected on their environments.

Gardner: Getting it right the first time is always one of the most important things for any business activity. Any kind of failure along the way is always going to cost more and perhaps even jeopardize the relationship with the customer.

Speed to market

Eaton: Yeah, it jeopardizes the relationship with the customer, but one of the things that we also drive is speed to market. We want to make sure that our solutions get out there as fast as possible, so that we can help those providers and those healthcare entities in giving the best patient care that they can.

Gardner: What was the biggest hurdle in being able to, as you say, bring the testing capability out to the field? What were some of the hang-ups in accomplishing that?

Eaton: Well, the tool that we use primarily within McKesson is Performance Center, and Performance Center is an enterprise-based application. It’s usually kept where we have multiple controllers, and we have multiple groups using those, but it resides within our network.

So, the biggest hurdle was how to take that powerful tool and bring it out to these sites? So, we went back to our HP rep, and said, "Here’s our challenge. This is what we’ve got. We don’t really see anything where you have an offering in that space. What can you do for us?"

Gardner: How far and wide have you been able to accomplish this? Are you doing it in terms of numbers of facilities, in what kind of organizations?

Eaton: Right now we have it across the board in multiple applications. McKesson develops numerous applications in the healthcare space, and we’ve used those across the board. Currently, we have two engagements going on simultaneously with two different hospitals, testing two different groups of applications, and even the applications themselves.

I’ve got one site that’s using it for 26 different applications and another that’s using it for five. We’ve got two teams going out there, one from my group and one from one of the internal R&D groups that are assisting the customer and testing the applications on their equipment.

Gardner: From these experiences so far, are there metrics of success, paybacks, not only for you and McKesson, but also for the providers that you service?

Eaton: The first couple of times we did this, we found that we were able to reduce the performance defects dramatically. We’re talking something like 40-50 percent right off the bat. Some of the timing that we had experienced internally seemed to be fine, well within SLAs. But as soon as I got out to a site and onto different hardware configurations, it took some application tuning to get it down. We were finding 90 percent increases with the help of continual testing and performance tweaks.

Items like that are just so powerful, when you are bringing that out to the various customers, and can say, "If you engage us, and we can do this testing for you, we can make sure that those applications will run in the way that you want them to."

Gardner: How about for your development efficiency? Are you learning some lessons on the road that you wouldn’t have had before that you can now bring into the next rep? Is there a feedback loop of sorts?

Powerful feedback

Eaton: Yes. It’s a pretty powerful one back to our R&D groups, because getting back to that data scenario, the volume and types of data that the customers have can be unexpected. The way customers use systems, while it works perfectly fine, is not one of the use cases that is normally found in some applications, and you get different results.

So, finding them out in the field and then being able to bring those back to our R&D groups and say, "This is what we’re seeing out in the field and this is how people are using it," gives them a better insight and makes them able to modify their code to fit those use cases better.

Gardner: Todd, is there any advice that you would give to those considering doing this, that is to say, taking their performance testing out on the road, closer to the actual site where these applications are going to reside?

Eaton: The main one is to work with your HP rep on what they have available for this. We took a product that everybody is familiar with, LoadRunner, and tweaked it so it became portable. The HP reps know a lot more about how they packaged that up and what’s best for different customers based on their needs. Working with a rep would be a big help in trying to roll this out to various groups.

Gardner: Okay, great. We’ve been learning about how McKesson is bringing performance-based testing products out to their customers’ locations and gaining a feedback capability as well as reducing time to market and making the quality of those applications near 100 percent right from the start.

I want to thank our guest. We’ve been joined by Todd Eaton, Director of Application Lifecycle Management Tools in the CTO’s office at McKesson. Thank you so much Todd.

Eaton: You’re welcome. Nice talking to you.

Gardner: And, thanks to our audience for joining us for this special BriefingsDirect podcast, coming to you from the HP Software Universe 2010 Conference in Washington, DC.

Look for other podcasts from this HP event on the hp.com website under HP Software Universe Live podcast, as well as through the BriefingsDirect Network.

I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this series of HP-sponsored Software Universe Live Discussions. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: HP.


Transcript of a BriefingsDirect podcast from the HP Software Universe 2010 Conference in Washington, DC on field-testing software installations using HP Performance Center products. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.


Tuesday, May 04, 2010

Confluence of Global Trends Ups Ante for Improved IT Governance to Prevent Costly Business 'Glitches'

Transcript of a sponsored BriefingsDirect podcast on the growing danger from faulty software and how to overcome it.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: WebLayers.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on the nature of, and some possible solutions for, a growing parade of enterprise-scale glitches. The headlines these days are full of big, embarrassing corporate and government "gotchas."

These complex snafus cost a ton of money, severely damage a company’s reputation, and most importantly, can hurt or even kill people.

From global auto recalls to bank failures, and the cyber crime that can uproot the private information from millions of users, the scale and damage that technology-accelerated glitches can inflict on businesses and individuals has probably never been higher. So what is at the root?

Is it a technology run amok problem, or a complexity spinning out of control issue -- and why is it seemingly worse now?

A new book is coming out this summer that explores the relationship between glitches and technology, specifically the role of software use and development in the era of cloud computing.

It turns out the role and impact of governance over people, process, and technology comes up again and again in the new book.

We have with us here today the author of the book as well as a software expert from IBM to delve into the causes and effects of glitches and how governance relates to the problem and fixes.

Please join me in welcoming our guests, Jeff Papows, President and CEO of WebLayers, and the author of Glitch: The Hidden Impact of Faulty Software. Welcome to the show, Jeff.

Jeff Papows: Thanks, Dana. Thanks for having us on.

Gardner: We're also here with Kerrie Holley, IBM fellow and Chief Technology Officer for IBM’s SOA Center of Excellence. Welcome to the show, Kerrie.

Kerrie Holley: Thank you, very much.

Gardner: Jeff, let me start with you. Now, the general trends around these complex issues are affecting business and probably affecting just about everyone’s lives. How is this different? Is there an inflection point? Is there something different now, compared with 20 years ago, in terms of the intersection of business with technology?

Papows: There is. I’ve done a lot of research in the past 10 months and what we're actually seeing is the confluence of three primary factors that are creating an information technology perfect storm of sorts. Some of these are obvious, but it’s the convergence of the three that’s creating problems on the scale that you are describing here.

The first is a loss of intellectual capital. For the first time in our careers -- the three of us have all been at this for a long time now -- we saw, between 2000 and 2007, the first drop in computer science graduates. That's the other side of the dot-com implosion.

Mainframe adoption patterns

While it’s not always popular or glamorous to talk about, 70 percent of the world’s critical infrastructure still runs on IBM mainframes. Yet, the focus of most of our new computer science graduates and early life professionals is on Java, XML, and the open and more modern development languages.

For the first time in our lifetimes and careers, the preponderance of that COBOL-based analytical community is retiring and/or -- God forbid -- aging and dying. That’s created a significant problem, concurrent with a time when the merger and consolidation activity -- the other side of the recession of 2008 -- has created this massive complexity in these giant mash-ups and critical back-office systems. For example, the merger between Bank of America and Countrywide, and on and on.

The third factor is just the sheer ubiquity of the technological complexity curve. It’s the magnitude of technology that’s now part of our social fabric, whether it’s literally one million transistors that now exist for every human being on the planet or the six billion network devices that exist in the world today, all of which are accessing the same critical, in many cases, back-office structures.

It's reached the point, Dana, from a consumer standpoint, where 60 percent of the value of our automobiles now consists of networked electronic components -- not the drive trains, engines, and the other things. Look at the recent glitches you have seen at places like Toyota.

You take those three meta-level factors and put them together and we're making the morning broadcast news cycles now on a daily basis with, as you said, more and more of these embarrassing things coming to light. They're not just inconvenient, but there are monumental economic consequences -- and we're killing people.

Gardner: Kerrie Holley, we've looked at some of these issues -- society issues, organizational issues, and the technology behind them -- but technology has also been part of the solution, or of the ability to scale, manage, and automate. I think service-oriented architecture (SOA) has had a major impact on that.

So, are we at a point where the ability of technology to keep up with the rate of growth is out of whack? What do you sense is behind some of this and why hasn't the technology been there to fix it along the way?

Holley: Jeff brought up some excellent points, which are spot-on. The other thing that we see is that we've had this growth of distributed computing. The easy stuff we've actually accomplished already.

If we look at a lot of what businesses are trying to accomplish today, whether it’s a new business model, differentiation, or whatever they're trying to do to compete, what we are finding is that the complexity of that solution is pretty significant.

It's something that we obviously can do. If we look at a lot of technologies that are out in the market place, unfortunately, in many cases they are siloed. They repair or they help with a part of the problem, but perhaps they're not holistic in dealing with the whole life-cycle that is necessary to create some of this value.

Secondly -- this is a point-in-time statement -- we're seeing rapid improvements in the technology to solve this. With Jeff's company and other organizations, we are seeing that today. It hasn’t caught up, but I think it will. In summary, Jeff brought up several points in terms of the fact that we have ubiquitous devices and a tremendous amount of computing power. We have programming available to the masses. We have eight-year-olds, grandmothers, and everyone in between, writing software.

Connecting devices

We have a tremendous need to connect mobile devices and front-ends. We have 3D Internet. We just have an explosion of technologies that we have to integrate. Along with that comes some of the challenges in terms of how we make this agile, and how we make it such that it doesn't break. How do we make sure that we actually get the value propositions that we see? Clearly, SOA is a part of the solution, but it's certainly not the end-all in terms of how we repair and how we get better.

Gardner: One of the things that intrigues me about SOA is the emphasis on governance. To get the best out of a distributed services-orientation, you need to think at the very beginning and throughout the process about how to manage, automate, and reuse, as well as the feedback loops into the process -- all on an ongoing basis.

It strikes me that if that works for SOA, it probably also works for management and organizations, and it works for the relationship between workers and customers. Let me take this back to you, Jeff. Is governance also in catch-up mode? Do we have a sense of how to govern the technology, but not necessarily the process? Is that what's behind some of it?

Papows: You're right, Dana. There's a cultural maturation process here. Let's look at a couple of the broad economic planks that have affected how we got here, because I've been in the software industry for 30 years now. Remember that the average computer scientist, at least in North America, makes 32 percent more than the average in the U.S. economy. And software, computer services, and infrastructure have accounted for about 37 percent of the growth in the gross domestic product in the United States and Asia in the last decade.

So the economic impact and success of our industry almost can’t be overstated. Because of that, we've grown up for decades now where we just threw more and more bodies at the problem, as the technological curve grew.

There was always this never-ending economic rosy horizon, where you would just add more IT professionals and you would acquire and you’d merge systems, but rarely would you render portions of those workforces redundant.

In 2008, the economic malaise that we’re managing our way through changed all of that. Now, the only way out of this complexity curve that we’ve created, to use Kerrie's terms, is turning the innovation that has been the hallmark of our industry back on ourselves.

That means automating and codifying all of the best practices and human capital that’s been in place and learning for decades, in the form of active policy management and inference engines, in what we typically think of as SOA and design-time governance.

Really, all that means is automating those best practices and turning them inward, so that we’re governing ourselves as an industry in the same way that we would automate or govern many things. But now it’s no longer a "nice to have." I would argue that it’s critical, because the complexity curve and the economics have crossed and there is no way to put this genie back in the bottle. There is no way to go backward.

Gardner: Kerrie, any thoughts about what’s perhaps now a critical role for governance, perhaps governance up and down the technology spectrum, design time, runtime, but also governance in terms of how the people and processes come together?

Holley: Absolutely. One of the nice things that the attention to SOA has brought to our marketplace is the recognition that we do need to focus on governance. I don’t know of a single client who’s got an SOA implementation who has not, as a minimum, thought about governance. They may not be doing everything they want to do or should be doing, but governance is clearly on the attention span of everyone in terms of recognizing that it needs to be done.

So, when we look at governance, and when we look at it around SOA, IT governance is something that we’ve had for a long time. SOA governance is a subset, you could say. It complements IT governance, but at the same time, it focuses our attention on the deltas SOA has brought to the marketplace that require improved governance.

Services lifecycles

That governance is not only around the technology. It’s not only around the life-cycle of services. It’s not only around addressing processes and application development. Governance also focuses on the convergence that’s required between business and IT.

The synergistic relationship that we seek will be promoted through the use of governance. Change management specifically brings about a pretty significant focus, meaning that there will be a focus on the part of the business and the IT organizations and teams to bring about the results that are sought.

Examples of problems

Gardner: Jeff, in your book you identify some examples. Are there any that really stand out that we can trace back to some root cause in the software lifecycle?

Papows: There are, and it’s unfortunate. The ones that make the greatest memory points and often the national headlines, characteristically are the ones that affect the consumer broadly as opposed to the corporate ones.

Obviously, Toyota is in the headlines every day now. Actually, there was another news cycle recently about Toyota’s Lexus vehicles. The new models apparently have a glitch in the software that controls the balance system.

One of the most heartbreaking things in the research for the book involved software that controls the radiation devices in our hospitals for cancer treatment. I ran across a bunch of research where, because of some software glitches and policy problems in the way those updates were distributed, people with fairly nominal cancers received massive overdoses of radiation.

The medical professionals running these machines -- like much of our culture, because something is computerized -- just assume that it’s infallible. Because of the problems in governance or lack of governance policy, people were being over-radiated. Instead of targeting small tumors in a very targeted way, people’s entire upper torsos, and unfortunately, in one case, head and neck were targeted.

There are lots of examples like that in the book that may not be as ubiquitous as Toyota, but there are many cases of widespread health, power, energy, and security risks as a consequence of the lack of policy management or governance that Kerrie was speaking to just a few minutes ago.

Gardner: Well, these examples certainly are very poignant and clearly something to avoid. I wonder if these are also perhaps just the tip of the iceberg. In addition to things that are problematic at a critical level, is there also a productivity hit? Are large aspects of work in process not nearly as optimal as they could be or are plagued by mistakes that drag down the process?

I want to take this over to Kerrie. IBM has its Smarter Planet approach. I think they're talking about the issue that we're just not nearly as efficient as we could be. What makes the headlines are these terrible issues, but what we're really talking about is a tremendous amount of waste. Aren’t we?

Things we could do better

Holley: We are. That’s exactly what inefficiency is. It speaks to a lot of waste and a lot of things we could do better. A lot of what we’ve been talking about from a Smarter Planet standpoint is actually the exact issues that Jeff has talked about, which is that the world is getting more instrumented. There are more sensors. There is a convergence of a lot of different technologies: SOA, business process management, mobile computing, and cloud computing.

Clearly, on one end of the spectrum, it’s increasing the complexity. On the other end of the spectrum, it’s adding tremendous value to businesses, but it mandates this attention to governance.

Gardner: Jeff, in your book do you offer up some advice or solutions about what companies ought to be doing in this governance arena to deal with these glitches?

Papows: We do. We talk about what I call the IT Governance Manifesto, for lack of another catchy phrase. I make the argument that it’s almost reached the point now where we need to lobby for legislation that requires more stringent reporting of software glitches in cases where there is human health and life at stake. Or, alternately, that we impose fines upon individuals or organizations responsible for cover-ups that put people at risk. Or, we simply require a level of IT governance at organizations that produce products that directly affect productivity and quality of life issues.

Kerrie said this really well, Dana. Remember that about 70 percent of our computer scientists in a given year are basically contending with maintaining the existing application inventories that run all of our financial transactions in core sub-systems and topologies. So, 70 percent of our human capital is there to basically keep the stuff that’s in place running.

Concurrently, we have this smarter planet, where we’ve got billions of RFID tags in motion and 64-bit microprocessors have reached a price point where they are making their way into our dishwashers. We’ve got this plethora of hand-held devices and applications that’s exploding.

All of that is against the backdrop of this more difficult economy, where we can’t just hire more people without automation. We haven't a prayer of keeping our noses above water here.

So, God forbid that we ask the federal government, which moves at a dinosaur’s pace relative to Internet speed, to intercede and insist on some of the stuff. But, if we don’t police our own industry, if we don’t get more serious about this governance, whether it’s IBM or WebLayers or some other technological help, we run the risk of seeing the headlines we’re seeing today become completely ubiquitous.

Gardner: Kerrie, I understand that you’re also penning a book, and it’s focused on SOA. First, could you tell us about it, but then are there any aspects of it that address this issue of governance, maybe from a self-help perspective and of not waiting for some legislation or external direction on it.

Holley: The book that’s going to be out later this year is 100 SOA Questions: Asked and Answered. What distinguishes the book my co-author [Ali Arsanjani] and I are writing from other SOA books in the marketplace is that it’s based on thousands of questions that we’ve encountered over the decade in hundreds of projects in which we’ve had first-hand roles as consultants, architects, and developers. We provide the audience with a hands-on, prescriptive understanding of some of the more difficult questions, and rather than offering platitudes as answers, we give the reader answers they can act on.

We’ve organized the content in a way that you can go by domain. If you’re a business stakeholder, you can go to particular areas. That gets back to your question, because business clearly has a big role to play here. The convergence or the relationship between business and IT has a big role to play.

You can go directly into those sections. We do talk about governance. The book is not about governance, but a good percentage of the questions are on governance. What we try to do is help organizations, clients, practitioners, and executives understand what works and what doesn’t work.

Always a choice

One of the examples, a small example, is that we always have a choice when we do a project. We can do it in a multitude of ways, but we have a lot of evidence that when governance is not applied, when it’s not automated, when it’s not thought about upfront, the expense on the back-end side is enormous. That expense could be the cost of not having the agility that you foresaw.

The expense could be not having the cost reduction that you foresaw. The expense could be the defects that Jeff has spoken about -- the glitches. There is a tremendous downside to not focusing on governance on the front-side, not looking at it in the beginning. The book really tries to ask and answer the toughest SOA questions that we’ve seen in the marketplace over the last decade.

Gardner: We’ll certainly look forward to that. Back to you Jeff. When we think about governance, it has a bit of a siloed history itself. There's the old form of management, the red-light, green-light approach to IT management. We’ve seen design-time governance, but it seems to be somewhat divorced from, even on a different plane than, runtime or operational governance.

What needs to happen in order to make governance more holistic, more end-to-end?

Papows: It’s a good question, Dana. It’s like everything else in our industry. We’re sometimes our own worst enemy and we get hung up on language, and God forbid, we create yet another acronym headache.

There's an old expression, "Everybody wants governance, but nobody wants to be governed." We run the risk, and I think we’ve tripped over it several times, where we get to the point where developers don’t want to be slowed down. There is this Big Brother-connotation at times to governance. We’ve got to explore a different cultural approach to it.

Governance, whether it’s design-time or run-time, is really about automating and codifying best practices, and it’s not done generically as was once taught. It can be, in my experience, very specific. The things we see Ford Motor Co. doing are very different. They're germane to their IT culture and organization, and very different than what we see the Bank of America do, as an example.

To Kerrie’s point about the cost of a lack of automated best practices, if we can use the new verb, it isn’t always quantitative. Look at the brand damage to a bank when they shut customers out of their ATM network, the other side of turning the switch when they merged back-office systems. Look at the number of people whose automated payment systems and whatnot were knocked out of kilter.

The brand damage affecting major corporations is a consequence of having these inane debates about whether SOA is alive or dead, whether you need design-time governance or run-time governance. What you need is a way to automate what you are doing, so that your best practices are enforced throughout the development lifecycle.

Kerrie answered your question well when he said it really is about waste. It’s not just about wasted human capital or wasted productivity or cycles. It’s about wasted go-to-market opportunity. Remember, we're now living in the era of market-facing systems. For almost every major business enterprise, our digital footprint is directly accessible in the marketplace, whether it’s an ATM network or a hand-held device. The line between our back-office infrastructure and our consumer experience is being obliterated.

I'd argue that rather than making distinctions between design and run-time governance, companies simply, one way or another, need to automate their best practices. The business mandates of the corporations need to be reflected in an automated way that makes it manageable across the information technology life-cycle -- or you exist at your own peril.

Gardner: Kerrie, any thoughts on this concept of governance and how we make it more ubiquitous and more enforced as the pain and the problems grow evident? The solution at a high level seems pretty clear. It seems to be the implementation where we stumble.

Governance mindset

Holley: You hit it on the head, and Jeff made the point as well. A lot of people think governance is onerous, that it’s a structure that forces people to do things a certain way. They look at it as rigid, inflexible, unforgiving. They think it just gets in the way.

That’s a mindset that people find themselves in, and it’s a reason not to do something. But when you think about the goals that you're seeking, most goals have something to do with efficiency, lower cost, customers, and making the company more agile. When you think about this, pretty much everybody in the marketplace knows that you don’t get those goals for free. There is some cultural change that’s often necessary to bring those goals about, some organizational change.

There's automation. You don’t start with automation. You actually start with the problem, the processes, and picking the right tool. But, automation has to be a part of that solution. One end of the spectrum, we’ve got to address this mindset that governance gets in the way, that it’s overhead, and that it’s unnecessary.

When we peel the onion back on organizations that are very successful, that are achieving many of their goals, we see them focused on governance. One piece of advice that we all know is that you shouldn’t boil the ocean, that you should make incremental changes. We also need to do this in governance.

We need to have these incremental successes, where we are focused on automation holistically and looking at the life-cycle, not just looking at the part-of-the-problem space.

Gardner: Jeff, it sounds like governance needs a makeover. Is there an opportunity? You are going to be discussing this book at the IBM Impact Conference 2010, their SOA conference? Is this a good opportunity? You'll have a lot of IT executives and software executives from a variety of enterprises on hand, but what would you tell them in terms of how to make governance a bit more attractive?

Papows: We all need to say, "I am a computer science professional. We have reached a point in the complexity curve where I no longer scale." You have to start with an admission of fact. And the reality is that the demands placed on today's IT organizations, the magnitude of the existing infrastructure that needs to continue to be cared for, the magnitude of application demands for new systems and access points from all of this new technology, simply is not going to correlate without a completely different highly automated approach.

Kerrie is right. You can't boil the ocean and you can’t do it at once, but you have to start with an honest self-assessment that, as an industry, we can't continue to go forward at the rate and pace that we have grown, given everything we know and that we see, without finally eating our own cooking.

We have to look to automation as a way out of a hole that is a consequence of the industry’s own success. We didn't get here because we failed. To be fair to all of those developers in the audience, they're going to listen to this and say, "Why am I the bad guy?" They're not the bad guys.

The reality is, as I said, that we're responsible for the greatest percentage of growth in the gross domestic product. We're responsible for the greatest percentage of workforce productivity gains. We've changed the way civilization lives and works. We've dealt with a quantum leap -- and the texture of human existence is a consequence of this technology.

It's time that we simply admit that we need to turn back on ourselves in order to continue to manage this or we, literally, I believe, are on the precipice of that digital equivalent of a Pearl Harbor, and the economic and productivity consequences of failing are extreme.

Gardner: Well, we'll have to leave it there. We're about out of time. We've been discussing how glitches in business have highlighted a possible breakdown in the continuity of technology, and how governance is an important factor in keeping technology on its productivity curve without it collapsing, to some degree, under its own weight.

I want to thank our guests. We have been joined today by Jeff Papows, President and CEO of WebLayers, and the author of the new book, Glitch: The Hidden Impact of Faulty Software. Thank you so much, Jeff.

Papows: Thank you, Dana, and thank you, Kerrie.

Gardner: And, we have been joined also by Kerrie Holley, an IBM Fellow as well as the CTO for IBM’s SOA Center of Excellence. Thanks for your input, and we will look forward to your book as well.

Holley: Thank you, Dana, and thank you, Jeff.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: WebLayers.

Transcript of a sponsored BriefingsDirect podcast on the growing danger from faulty software and how to overcome it. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.

Wednesday, February 03, 2010

CERN’s Evolution to Cloud Computing Portends Revolution in Extreme IT Productivity?

Transcript of a BriefingsDirect podcast on the move to cloud computing for data-intensive operations, focusing on the work being done by the European Organization for Nuclear Research.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: Platform Computing.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Today, we present a sponsored podcast discussion on some likely directions for cloud computing based on the exploration of expected cloud benefits at a cutting edge global IT organization.

We are going to explore the thinking on how cloud computing, both the private and public varieties, might be useful at CERN, the European Organization for Nuclear Research in Geneva.

CERN has long been an influential bellwether on how extreme IT problems can be solved. Indeed, the World Wide Web owes a lot of its usefulness to early work done at CERN. Now the focus is on cloud computing. How real is it, and how might an organization like CERN approach cloud?

In many ways CERN is quite possibly the New York of cloud computing. If cloud can make it there, it can probably make it anywhere. That's because CERN deals with fantastically large data sets, massive throughput requirements, a global workforce, finite budgets, and an emphasis on standards and openness.

So please join us, as we track the evolution of high-performance computing (HPC) from clusters to grid to cloud models through the eyes of CERN, and with analysis and perspective from IDC, as well as technical thought leadership from Platform Computing.

Join me in welcoming our panel today, Tony Cass, Group Leader for Fabric Infrastructure and Operations at CERN. Welcome, Tony.

Tony Cass: Pleased to meet you.

Gardner: We’re also here with Steve Conway, Vice President in the High Performance Computing Group at IDC. Welcome, Steve.

Steve Conway: Thanks. Welcome to everyone.

Gardner: And, we're also here with Randy Clark, Chief Marketing Officer at Platform Computing. Welcome Randy.

Randy Clark: Thank you. Glad to be here.

Gardner: Over the last several years, we've seen cloud computing become quite popular as a concept. It remains largely confined to experimentation, but this notion of private cloud computing is being scoped out by many large and influential enterprises as well as large early adopters like CERN.

Let me go to you, Steve Conway. What's the difference between private and public cloud, and how far away are any tangible benefits of cloud computing from your perspective?

Already here

Conway: Private cloud computing is already here, and quite a few companies are exploring it. We already have some early adopters. CERN is one of them. Public clouds are coming. We see a lot of activity there, but it's a little bit further out on the horizon than private or enterprise cloud computing.

Just to give you an example, we just did a piece of research for one of the major oil and gas companies, and they're actively looking at moving part of their workload out to cloud computing in the next 6-12 months. So, this is really coming up quickly.

Gardner: So, this notion of having a cohesive approach to computing and blending what you do on premises with these other providers isn't just pie in the sky. This is really something people are serious about.

Conway: Well, CERN is clearly serious about it in their environment. As I said, we're also starting to see activity pick up with cloud computing in the private sector with adoption starting somewhere between six months from now and, for some, more like 12-24 months out.

Gardner: Randy Clark, from your perspective, how many customers of Platform Computing would you consider to be seriously evaluating what we now refer to as public or private cloud?

Clark: We have formally interviewed over 200 customers out of our installed base of 2,000. A significant portion -- I wouldn’t put an exact number on that, but it's higher than we initially anticipated -- are looking at private-cloud computing and considering how they can leverage external resources such as Amazon, Rackspace and others. So, it's easily a third and possibly more.

Gardner: Tony Cass, let's go to you at CERN. Tell us first a little bit about CERN for those of our readers who don’t know that much or aren't that familiar. Tell us about the organization and what it does, and then we can start to discuss your perceptions about cloud.

Cass: We're a laboratory that exists to enable, initially Europe’s and now the world’s, physicists to study fundamental questions. Where does mass come from? Why don’t we see anti-matter in large quantities? What's the missing mass in the universe? They're really fundamental questions about where we are and what the universe is.

We do that by operating an accelerator, the Large Hadron Collider, which collides protons thousands of times a second. These collisions take place in certain areas around the accelerator, where huge detectors analyze the collisions and take something like a digital photograph of the collision to understand what's happening. These detectors generate huge amounts of data, which have to be stored and processed at CERN and the collaborating institutes around the world.

We have something like 100,000 processors around the world, 50 petabytes of disk, and over 60 petabytes of tape. The tape is in just a small number of the centers, not all of the hundred centers that we have. We call it "computing at the terra-scale," that's terra with two R's. We’ve developed a worldwide computing grid to coordinate all the resources that we have with the jobs of the many physicists that are working on these detectors.

Gardner: So, let's look at the IT problem and unpack it a little bit. You're dealing with such enormous amounts of data, and you’ve been distributing these workloads for quite some time. Maybe you could explain a little bit about how the way you distribute and manage such extreme workloads has evolved?

No central management

Cass: If you look at the past, in the 1990’s, we had people collaborating, but there was no central management. Everybody was based at different institutes and people had to submit the workloads, the analysis, or the Monte Carlo simulations of the experiments they needed.

We realized in 2000-2001 that this wasn’t going to work and also that the scale of resources that we needed was so vast that it couldn’t all be installed at CERN. It had to be shared between CERN, a small number of very reliable centers we call the Tier One centers and then 100 or so Tier Two centers at the universities. We were developing this thinking around the same time as the grid model was becoming popular. So, this is what we’ve done.

What a lot of the grid academics have done is explore what could be done with the grid as an idea. What we've been focusing on is making it work: not pushing the envelope in terms of the technology, but pushing the envelope in terms of scale, to make sure that it works for the users. We connect the sites. We run tens of thousands of jobs a day across this, and gradually we’ve run through a number of exercises to distribute the data at gigabytes a second while running tens of thousands of jobs a day.

We've progressively deployed grid technology, not developed it. We've looked at things that are going on elsewhere and made them work in our environment.

Gardner: As I understand it, the interest you have in cloud isn’t strictly a matter of ripping and replacing, but augmenting what you're already doing vis-a-vis these grid models.

Cass: Exactly. The grid solves the problem in which we have data distributed around the world, and it will send jobs to the data. But, there are two issues around that. One is that if the grid sends my job to site A, it does so because it thinks that a batch slot will become available at site A first. But maybe a slot becomes available at site B first, while my job is waiting at site A. Somebody else who comes along later actually gets to run their job first.

Today, the experiment team submits a skeleton job to all of the sites in order to detect which site becomes available first. Then, my job is pulled down to whichever site that is. You have lots of schedulers involved in this -- in the experiment, the grid, and the site -- and we're looking at simplifying that.

These skeleton jobs also install software, because they don’t really trust the sites to have installed the software correctly. So, there's a lot of inefficiency there. This is symptomatic of a more general problem. Batch workers are good at sharing resources that are relatively static, but not when the demand for resource types changes dynamically.
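To make the scheduling issue Cass describes concrete, here is a minimal Python sketch contrasting early binding (the grid commits the job to the site it predicts will free up first) with the late-binding "skeleton job" approach (a placeholder goes to every site, and whichever starts first pulls the real job). It is an illustration under stated assumptions, not CERN's grid middleware: the `Site` class, its predicted and actual wait times, and both function names are hypothetical, and the software-installation step the skeleton jobs perform is left out.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical grid site with a scheduler estimate and a real slot time."""
    name: str
    predicted_wait: float  # when the grid *thinks* a batch slot frees up
    actual_wait: float     # when a batch slot really frees up


def early_binding_start(sites):
    """Grid commits the job to the site it predicts will free up first.

    The job then waits for that site's *actual* slot, even if another
    site frees up sooner.
    """
    chosen = min(sites, key=lambda s: s.predicted_wait)
    return chosen.name, chosen.actual_wait


def late_binding_start(sites, job_queue):
    """Send a placeholder ('skeleton') job to every site.

    Whichever placeholder starts first pulls the next real job from the
    central queue; placeholders that start after the queue is empty just exit.
    """
    pending = list(job_queue)  # copy so the caller's list is not mutated
    starts = [(s.actual_wait, s.name) for s in sites]
    heapq.heapify(starts)      # order sites by when a slot actually frees up
    assignments = {}
    while starts and pending:
        start_time, site_name = heapq.heappop(starts)
        assignments[pending.pop(0)] = (site_name, start_time)
    return assignments


if __name__ == "__main__":
    sites = [Site("A", 1.0, 5.0), Site("B", 2.0, 3.0), Site("C", 4.0, 4.0)]
    print(early_binding_start(sites))             # ('A', 5.0)
    print(late_binding_start(sites, ["my_job"]))  # {'my_job': ('B', 3.0)}
```

In this toy example the grid bets on site A, but site B actually frees up first, so early binding starts the job at t=5 while the skeleton-job approach starts it at t=3. The price, as Cass points out, is extra placeholder jobs and several layers of schedulers.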

So, we’re looking at virtualizing the batch workers and dynamically reconfiguring them to meet the changing workload. This is essentially what Amazon does with EC2. When they don’t need the resources, they reconfigure them and sell the cycles to other people. This is how we want to work in virtualization and cloud with the grid, which knows where the data is.
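To make "dynamically reconfiguring" virtualized batch workers concrete, here is a minimal sketch, assuming a fixed pool of VM slots that is periodically re-split among experiments in proportion to their queued jobs. The function name `reallocate_slots`, the proportional (largest-remainder) policy, and the experiment names in the example are illustrative assumptions, not the policy CERN or Platform actually uses.

```python
def reallocate_slots(total_slots, queued):
    """Split a fixed pool of VM slots across experiments by queued demand.

    Hypothetical policy for illustration. If total demand fits in the pool,
    every experiment gets exactly what it has queued. Otherwise slots are
    shared proportionally, using the largest-remainder method so the integer
    allocations sum to total_slots and no experiment exceeds its demand.
    """
    demand = sum(queued.values())
    if demand == 0:
        return {name: 0 for name in queued}
    if demand <= total_slots:
        return dict(queued)  # enough capacity for everyone
    quotas = {name: total_slots * n / demand for name, n in queued.items()}
    alloc = {name: int(q) for name, q in quotas.items()}  # floor of each quota
    leftover = total_slots - sum(alloc.values())
    # Give the remaining slots to the largest fractional remainders.
    by_remainder = sorted(queued, key=lambda name: quotas[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Re-run each scheduling cycle as the mix of queued work changes.
print(reallocate_slots(100, {"ATLAS": 300, "CMS": 100}))
```

Calling this every scheduling cycle means the split follows demand as it shifts between experiments, which is the behavior Cass contrasts with statically configured batch workers; spare capacity, when demand falls below the pool size, is what a provider like Amazon would resell.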

Gardner: Steve Conway, you’ve been tracking HPC for some time at IDC. Maybe you have some perceptions on how CERN has been a leading adopter of IT over the years, the types of problems they're solving now, or the types of problems other organizations will be facing in the future. Could you tell us about this management issue, and do you think that this is going to become a major requirement for cloud computing?

World technology leader

Conway: Starting with CERN, their scientists have earned multiple Nobel prizes over the years for their work in particle physics. As you said before, CERN is where Tim Berners-Lee and his colleagues invented the World Wide Web in the 1980s.

More generally, CERN is a recognized world leader in technology innovation. What’s been driving this, as Tony said, are the massive volumes of data that CERN generates along with the need to make the data available to scientists, not only across Europe, but across the world.

For example, CERN has two major particle detectors. They're called CMS and ATLAS. ATLAS alone generates a petabyte of data per second, when it’s running. Not all that data needs to be distributed, but it gives you an idea of the scale or the challenge that CERN is working with.

In the case of CERN and Platform’s collaboration, as Tony said, the idea is not just to distribute the data but also the applications and the capability to run the scientific problem.

CERN is definitely a leader there, and cloud computing is really confined today to early adopters like CERN. Right now, cloud computing services constitute about $16 billion as a market.

That’s just about four percent of mainstream IT spending. By 2012, which is not so far away, we project that spending for cloud computing is going to grow nearly threefold to about $42 billion. That would make it about 9 percent of IT spending. So, we predict it’s going to move along pretty quickly.

Gardner: How important is this issue that Tony brought up about being able to manage in a dynamic environment and not just more predictable static batch loads?

Conway: It’s the single biggest challenge we see, not only for cloud computing; it has affected the whole idea of managing these increasingly complex environments -- first clusters, then grids, and now clouds. Software has been at the center of that.

That’s one of the reasons we're here today with Platform and CERN, because that’s been Platform’s business from the beginning, creating software to manage clusters, then grids, and now clouds, first for very demanding HPC sites like CERN and, more recently, also for enterprise clients.

Gardner: Randy Clark, as you look at the marketplace and see organizations like CERN changing their requirements, what, in your thinking, is the most important missing part from what you would do in management with HPC and now cloud? What makes cloud different, from a management perspective?

Dynamic resources

Clark: It’s what Tony said, which is having the resources be dynamic not static. Historically, clusters and grids have been relatively static, and the workloads have been managed across those. Now, with cloud, we have the ability to have a dynamic set of resources.

The trick is to marry and manage the workloads and the resources in conjunction with each other. Last year, we announced our cloud products -- Platform LSF and Platform ISF Adaptive Cluster -- to address that challenge and to help this evolution.

Gardner: Let’s go back to Tony Cass. Tell me what you’re doing with cloud in terms of exploration. I know you’re not in a position to validate, or you haven’t put in place, any large-scale implementation or solutions that would lead the market. But, I’m very curious about what the requirements are. What are the problems that you're trying to solve that you think cloud computing specifically can be useful in?

Cass: The specific problem that we have is to deliver the most physics we can within the fixed budget and the fixed amount of resources. These are limited either by money or by data-center cooling and generally are much less than the experiment wants. The key aim is to deliver the most cycles we can and the most efficient computing we can to the physicists.

I said earlier that we're looking at virtualization to do this. We’ve been exploring how to make sure that the jobs can work in a virtual environment, that we can instantiate virtual machines (VMs) as necessary according to the different experiments that are submitting workloads at any one time, and how to integrate the instantiation of VMs with the batch system.

Once we got that working, we figured that the real problem was managing the number of VMs. We have something like 4,000 boxes, but if you have a VM per core, plus a few spare, then it can easily get to 60,000, 70,000, or 80,000 VMs. Managing these is the problem that we are trying to explore now, moving away from “can we do it” to “can we do it on a huge scale?”

Gardner: Are you yet at the point where you want to manage the VMs that you have under your own control, and perhaps start deploying virtualized environments and workloads in someone else’s cloud, making them managed and complementary?

Cass: There are two aspects to that. The resources in our community are at other sites, and all of the sites are very independent. They are also academic environments. So, they are exploring things in their own way as well. At the moment, we're looking at how you can reliably send a virtual image that's generated at one place to another site.

Amazon does this, but there are tight constraints in the way they manage that cluster, because they built it with this in mind. Universities may not have built their clusters in a way that separates this out from some of the other computing they're doing. So, there are security and trust implications there that we are looking at. That will be a thing to collaborate on long-term.

More cost effective

Certainly, if we configure things in our own way, when we look at a cloud environment, perhaps it will be more cost-effective for us to purchase only the equipment we need for the average workload and then buy resources from Amazon or other providers. But there are interesting things you have to explore about the fact that the data is not at Amazon, even if they have the cycles.

There are so many things that we’re thinking about. The one we’re focusing on at the moment is effectively managing the resources that we have here at CERN.

Gardner: Steve Conway, it sounds as if CERN, with its partner network, has a series of what we might call private-cloud implementations, and they're trying to get them to behave in concert at what we might call a public-cloud level. That exercise could, as with the World Wide Web, create some de-facto standards and approaches that might, in fact, help what we call hybrid cloud computing moving forward. Does that fairly summarize where we are?

Conway: That’s right. There are going to have to be more rigorous open standards for the clouds. What Tony was talking about at CERN is something that we see elsewhere. People are turning to public clouds today -- "turning to" just meaning exploring at this point -- for a way to handle overload work and surge workloads.

The Internet itself is a pretty high-latency network, if you think of it that way. People are looking to send portions of the workload that don't have a lot of communication dependencies, particularly inter-processor communication dependencies, because the latency doesn't support that.

But, we're seeing some smaller and medium-size businesses looking to public clouds as a way to avoid having to purchase their own internal resources, clusters for example, and also as a way of avoiding having to hire experts who know how to operate them. For example, engineering services firms don't have those experts in house today.

Gardner: Back to you, Tony Cass. I know this is still a bit hypothetical, but if there were standards in place, and you were able to go to a third-party cloud provider for spikes or occasional dynamically generated workloads that exceed your current on-premises capabilities, would this be a financial boon to you, where you could protect your pricing and decide the right supply-and-demand fit when it comes to these extreme computing problems?

Cass: It would certainly be a boon. The possibility is being demonstrated by experiments based at Brookhaven that run CPU-intensive simulations, where they don't need much data transfer or data access. They have been able to run these simulations cost-effectively with EC2.

Although their cycles, compared to some of the things we're doing, are more expensive, if we don't have to buy all of the resources, we could certainly save money. Another aspect is that it is beyond money in some sense. If you need to get something fixed for a conference, and you are desperately trying to decide whether or not you’ve discovered the Higgs, then it's not a case of “money's no object,” but you can get the resources from a cloud much more quickly than you can install capacity at CERN. So both aspects are definitely of interest.

Gardner: Randy Clark, this makes a great deal of sense from the perspective of a large research organization. But we're not just talking about specific workloads. We're talking about workloads that will be common across many other vertical industries or computing environments. Can you name a few, or mention some from your experience, where we should expect the same sorts of economic benefits to play out?

Different use cases

Clark: What we're seeing is across industries. Financial services is certainly taking a leadership role. There's a lot going on in the semiconductor or electronics industry. Business intelligence (BI) is across industries and government. So, across industries, we see different use cases.

To your point, these use cases are enterprise applications to run the business, and we're seeing that in Java applications, test and development environments, and traditional HPC environments.

That's something driven by the top of the organization. Tony and Steve laid it out well. They look at the public/private cloud economically, and say, "Architecturally, what does this mean for our business?" Without any particular application in mind they're asking how to evolve to this new model. So, we're seeing it very horizontally and, to your point, in enterprise and HPC applications.

Gardner: Steve Conway, thinking about these large datasets, Randy brought up BI, and that, of course, means warehousing, data analytics, and advanced analytics. A lot of organizations are creating datasets at a scale never anticipated, never mind seen before, things from sensors, mobile devices, network computing, or social networking.

How do we bring together these compute resources, the raw power, with these large data sets? I think this is an issue that CERN might also be a bellwether on, in somehow managing these large data sets and the compute power and bringing them architecturally into alignment.

Conway: BI is one of those markets that, in its attributes, straddles the world of HPC and enterprise computing, just as financial services does, in the sense that they have workloads that don't have a whole lot of communications dependencies. For the most part, they don't need networks with very low latency.

You see organizations like the University of Phoenix, which has 280,000 online students, that have already made this evolution -- in this case, with Platform helping them out -- from clusters to grid computing today. Now, they're looking toward cloud computing as a way to take them further.

You also see that on the public-sector side, not just the private sector. One of the other active customers that's really looking in that same direction is the Centers for Disease Control (CDC), which has moved from clusters to grid computing.

What you're seeing here is people who have already stepped through the earlier stages of this evolution. They've gone from clusters to grid computing for the most part and now are contemplating the next move to cloud computing. It's an evolutionary move. It could have some revolutionary implications, but, from a technological standpoint, sometimes evolutionary is much safer and better than revolutionary.

Gardner: Tell us about some of the solutions that you now need to bring to market or are bringing to market around management and other issues? Where have you found that the rubber hits the road, in terms of where people can take this in real time? What's the current state of the art? Rather than talking about hypothetical, what's now possible, when it comes to moving from cluster and grid to the revolution of cloud?

Interaction of technologies

Clark: What Platform sees is the interaction of distributed computing and new technologies like virtualization requiring management. What I mean by that is the ability, in a large farm or shared environment, to share resources and then make those resources dynamic. It's the ability to add virtualization into those on the resource side, and then, on the server side, to make it Internet accessible, have a service catalog, and move from providing IT support to truly IT as a competitive service.

The state of the art is that you can get the best of Amazon (ease of use, cost, accessibility) with the enterprise configuration, scale, and dependability of the enterprise grid environment.

There isn't one particular technology or implementation that I would point to, to say "That is state of the art," but if you look across the installations we see in our installed base, you can see best practices in different dimensions with each of those customers.

Gardner: Randy, what are some typical ways that you're seeing people getting started, when they want to make these leaps from evolutionary progression to revolutionary paybacks? Where do they start making that sort of catalytic difference?

Clark: The evolution is the technology, as Steve said. The revolution is in the approach architecturally to how to get to that new spot.

Taking a step back, we see customers thinking architecturally about how they want to have that management layer. What is that management layer going to mean to them going forward? And can they quickly identify a set of applications and resources and get started?

So, there is an architecture piece to it, thinking about what the future will hold, but then there is a very pragmatic piece -- let's get going, let's engage, let's build something and be able to scale that out over time. We saw that approach in grid computing. We're encouraging folks to think, but then also to get started.

Gardner: Tony Cass at CERN, what are your next steps? Where would you expect to be heading next as you explore the benefits and possible real-world opportunities?

Cass: We’re definitely concentrating for the moment on how we effectively exploit the resources we have here. The wider benefits we'll have to discuss with our community.

Gardner: What would you like to see happen next?

Focusing on delivery

Cass: What I would like to see happen next is a definite cloud environment at CERN, where we move from something that we're thinking about to something that is in operation, where we have the ability to use resources that aren’t primarily dedicated to physics computing to deliver cycles to the experiments. I'd like to see a cloud, a dynamically evolving environment in our computer center. We’re convinced it's possible, but delivering that is what we’re focusing on.

Gardner: Steve Conway, where do you see things headed next? What are the next steps that we should look for, as we move from that evolutionary progression to more of a revolutionary productivity?

Conway: It's along a couple of dimensions. One is the dimension of people actually working in these environments. In that sense, the CERN-Platform collaboration is going to help drive the whole state of the art forward over the next period of time.

The other one, as Randy mentioned before, is that the evolution of standards is going to be important. For example, right now, one of the barriers to public-cloud computing is vendor lock-in, where the clouds of the Amazons, the Yahoos, and so forth are not necessarily interoperable. People are a little bit concerned about testing their data there. The evolution of standards is going to accelerate this trend.

Gardner: Why don’t I give the last word today to Randy? Tell us about some information that's available out there for folks who are looking to explore and take some first steps toward this more revolutionary benefit.

Clark: I'd encourage everybody to visit our website. There are a number of white papers, webinars, and webcasts that we've done with other customers to highlight some other use cases within development, test, and production environments. I'd point people to the resource page on our website www.platform.com.

Gardner: I want to thank our guests. This has been a very interesting discussion, and I certainly look forward to following what CERN does, because I do think that they’re going to be a leader in terms of what many others will end up doing in B2B cloud computing.

Thank you to Tony Cass, Group Leader for Fabric Infrastructure and Operations at CERN. Thank you, sir.

Cass: Thank you.

Gardner: And also a good, big thank you to Steve Conway, Vice President in the High Performance Computing Group at IDC. Thank you, Steve.

Conway: Thanks.

Gardner: And also, of course, thank you to Randy Clark, Chief Marketing Officer at Platform Computing.

Clark: Thank you for the opportunity.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast on what likely outcomes we can expect from cloud computing and architecture, on the progression from grid to cloud computing, and moving into a more revolutionary set of benefits. Thanks for listening and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: Platform Computing.

Transcript of a BriefingsDirect podcast on the move to cloud computing for data-intensive operations, focusing on the work being done by the European Organization for Nuclear Research. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.

You may also be interested in: