Friday, February 08, 2008

New Eclipse-Based Tools Offer Developers More Choices, Migrations and Paths to IBM WebSphere

Transcript of BriefingsDirect podcast on Eclipse-based tool choices for IBM WebSphere shops.

Listen to the podcast here. Sponsor: Genuitec.


Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Today, a sponsored podcast discussion about choices developers have when facing significant changes or upgrades to deployment environments. We'll be looking at one of the largest global installed bases of application servers, the IBM WebSphere platform.

Eclipse-oriented developers and other developers will be faced with some big decisions, as their enterprise architects and operators begin to adjust to the arrival of the WebSphere Application Server 6.1. That has implications for tooling and infrastructure in general.

Tooling for the platform centers largely on Rational Application Developer (RAD), formerly known as WebSphere Studio Application Developer. The recent release is designed to ease implementations of Service-Oriented Architecture (SOA) and improve speed for Web services.

However, the new Rational toolset comes with a significant price tag and some significant adjustments. Into this changeable environment, Genuitec, the company behind the MyEclipse IDE, is offering a stepping-stone approach to help with this WebSphere environment tools transition.

The MyEclipse Blue Edition arrives on March 15, after a lengthy beta run, and may be of interest to developers and architects in WebSphere shops as they move and adjust to the WebSphere Application Server 6.1.

To help us understand this transition, the market, and the products, we are joined by Maher Masri, president of Genuitec. Welcome to the show, Maher.

Maher Masri: Thank you, Dana.

Gardner: Also, James Governor, a co-founder and industry analyst at RedMonk. Welcome, James.

James Governor: Hi, Dana.

Gardner: James, let’s start with you. We’re looking at a pretty dynamic marketplace around tools. There are certainly lots of different frameworks and approaches floating around. Folks are dealing with SOA, with Software as a Service (SaaS), with agile development. They are dealing with mashups and Enterprise 2.0 issues. We’re seeing increased use of REST and SOAP. This is just a big, fluid, dynamic environment.

On the other hand, we’re also seeing some consolidation around runtimes. Organizations looking to cut cost and infrastructure and trying to bring their data centers under as few runtime environments as possible. So, we’re left with somewhat of a conundrum, and into this market IBM is introducing a major upgrade.

Maybe you could paint a picture for us of what you see from the enterprises and developers you speak to on how they deal with, on one hand, choice and, on the other hand, consolidation.

Governor: It's a great question. In this industry we can expect continuing change. If anything is certain, it's that. When we look at this marketplace, if we go back to the late 1990s, there was a truism that you could not make money as a tools company. The only way you could really sustain a business was to be connected to, and interwoven with, the application server and the deployment environment. So it's interesting that now, some years later, we're beginning to rethink that.

If you look at a business like Genuitec, the economics are somewhat different. The Eclipse economics, in terms of open source and the change there, where there is a shared code base being worked on, have meant that it's actually easier to maintain yourself as an independent and work on a specific set of problems.

In terms of your question about Web 2.0, agile development, and so on, there are an awful lot of changes going on. That does create some opportunities for the third parties. Frankly, when you look at the very largest firms, it's actually quite difficult for them to maintain the sorts of innovation that we’re seeing from some of the smaller players.

In terms of the new development environments, we're seeing more Ruby on Rails, and the "P" scripting languages continue to be used in the enterprise. So, supporting those is really important, and you are not always going to get that from the lead vendors.

I'll leave it up to Genuitec to pitch what they do, but one of the interesting things they did, which you certainly wouldn't have seen from IBM, was a while back, when they bridged the Eclipse world with NetBeans' Matisse GUI builder.

Crossing some of those boundaries and being able to deal with that complexity and work on the customer problems, it's not surprising to me that we’ve seen this decoupling, largely driven by open source. Open source is re-enabling companies to focus on one thing, rather than saying, "Okay, we've got to be end-to-end."

Gardner: So, we've got a dynamic environment. We have seen amazing uptake of Eclipse over the past several years, becoming the dominant Java-oriented IDE. We have WebSphere as the dominant deployment platform.

As you pointed out, the economics around tools have shifted dramatically. It seems that the value add is not so much in the IDE now, but in building bridges across environments, making framework choices easier for developers, and finding ways of mitigating some of these complexity issues, when it comes to the transition on the platform side.

Let’s go to Maher. Tell me a little bit about why Eclipse has been so successful, and do you agree that it's the value add to the IDE where things are at right now?

Masri: Let me echo James’ point regarding the tools environment, and software companies not being able to make money at that. I think that was based on some perceived notion that people refuse to pay money for software. In fact, what we've found is that people don’t mind paying for value, and perceived value, when it’s provided at their own convenience and at their own price point.

That’s why we set the price for the MyEclipse Enterprise Workbench at such a low point that it could be purchased anywhere in the world without a series of internal financial company decisions, or even a heartbreaking personal decision.

Although the product was just a JSP editor when it was first launched, today it's a fully integrated development environment that rivals any Tier 1 product. It's that continuity of adding value with every release, multiple releases within the same year, to make sure that, a) we listen to our customer base, and b) they get the value they perceive they need for the cost we charge them.

Eclipse obviously has become the default standard for the development environment and for building tools on top of it. I don’t think you need to go very far to find the numbers that support those kinds of claims, and those numbers continue to increase on a year-to-year basis around the globe.

When it started, it was not a one-company project but a true consortium model, a foundation that includes companies that compete against each other and companies in different spaces, growing in the number of projects and trying to maintain a level of quality that people can build software on top of, from a tools standpoint.

A lot of people forget that Eclipse is not just a tools platform. It's actually an application framework. So it could be, as we describe it internally, a floor wax and a dessert topping.

The ability for it to become that motherboard for applications in the future makes it possible for it to move above and beyond a tools platform into what a lot of companies already use it for -- a runtime platform.

The next Ganymede (3.4) release and the 4.0 line of Eclipse are pushing it in exactly that direction. OSGi adoption is making a lot of people reconsider their thinking in terms of, "Which applications do I write for internal productivity, which tools do I provide to my internal and external customers, and for which client implementations?"

It's forcing quite a bit of rethinking in terms of the traditional client/server models, or the Web-only application model, because of productivity requirements and so on.

IBM led the way here, with the WebSphere implementation and many of its internal implementations. A lot of its technologies are now based on Eclipse and on the Eclipse runtime.

Gardner: So, we have this big bear, Eclipse, in the market and we have this big bear, WebSphere, in the market. Why is there a need for someone like you to come in between and help developers?

Masri: The story that we hear from our own customers is pretty consistent, and it starts with the following: "We love you guys. You provide great value, great features, great support, except I cannot use you beyond a certain point." Companies, for whatever internal reasons, are making the choice today to move forward with WebSphere 6.1 from a vendor standpoint, and that's really the story we keep hearing.

"I am moving into 6.1, and the reason for that is I am re-implementing or have a revival internally for Web services, SOA, Rich-net applications, and data persistence requirements that are evolving out of the evolution of the technology in the broader space, and specifically as implemented into the new technology for 6.1."

Gardner: They need to modernize it.

Masri: But their challenge is similar. Every one of them tells us exactly the same story. "I cannot use your Web services implementation because, a) I have to use the Web services stack within WebSphere or I lose support, and b) I have invested quite a bit of money in my previous tools, like WebSphere Studio Application Developer (WSAD), and that is no longer supported now.

"I have to transition into, not only a runtime requirement, but also a tools requirement." With that comes a very nice price tag that not only requires them to retool their development and their engineers, but also reinvest into that technology.

But the killer for almost all of them is, "I have to start from scratch, in the sense that every project that I have created historically -- my legacy model -- I can no longer support, because of the different project model inside the new tools."

For example, Rational Application Developer 7.0 is one of the few toolsets that supports WebSphere 6.1 and all of the standards for Web services, AJAX, and the persistence requirements they need in order to modernize. They have to implement it, but they cannot take, for example, an existing WSAD project, import it into Rational 7.0, and continue development. They pretty much start from scratch.

Gardner: Let’s go to James for a moment. James, you’re familiar with the IBM stack and their road map. Why are they doing this? It seems to me that there is an application lifecycle management (ALM) set of benefits that the Rational toolset and platform bring that IBM is trying to encourage people to take advantage of. It does require transition, but they have a larger goal in mind. Perhaps we should address this ALM, or do you have other thoughts about this transition?

Governor: From an IBM perspective, it's a classic case of running ahead of the stack. If you see commoditization further down the stack, you want to move on up. So IBM looks at the application developer role and the application development function and thinks to itself, "Hang on a second. We really need to be moving up in terms of the value, so we can charge a fair amount of money for our software," or what they see as a fair amount of money.

From an IBM standpoint, I think they really looked at players such as Genuitec, looked at where Eclipse was going, and they thought, "Wait a second. We really do need to be moving forward with this notion of software development."

If you talk to a lot of developers, they don't really think of the world that way, but many of their managers do. So, the idea of moving to a situation where there is better integration of the different datasets, where you've got one repository of metadata to move forward with -- that's certainly the approach they are taking.

The idea is that you've got "auditability" as you build applications. You've still got classic distributed development, but you're doing a better job of centralizing, managing, and maintaining all the data that's associated with it.

The fact that IBM is making that change is indicative of the fact that when they look at the market more broadly, they think to themselves, "Well, where is our margin coming from?"

IBM's strategy is very much to look at business process, as opposed to focusing on just technical innovation. That certainly explains some of the change that's being made. They want to drive an inflection point. They can't afford to see orders-of-magnitude cheaper software doing the same thing that their products do.

Gardner: As we mentioned earlier, there are so many complexities involved in decision making now, different approaches to creating services, that the operators and the vice presidents of engineering are saying, “Wow, we need to manage this complexity.”

They are looking for life cycle approaches, ways of bridging design time and runtime. IBM is addressing some of these needs, but, as you point out, developers are often saying, "Hey, I just want my tool. I want to stick with what I know." So we’re left with a little bit of a disconnect.

I’m assuming, Maher, that this is where you’re stepping in and saying, "Aha, perhaps we can let the developers have it their way for a time to mitigate the pain of the transition, at the same time recognizing that these vice presidents of engineering and development are going to need to look at a much more holistic life-cycle approach. So, perhaps we can play a role in satisfying both." Am I reading too much into that?

Masri: No. We understand internally that different technologies have different adoption life cycles behind them. ALM is no different. It's going to take a number of years for it to become the standard throughout the industry, but it is the right direction, one that almost every company is going to have to face at some time in the future.

The challenge for everybody, us and IBM, is the bottom-up sales process: to provide the tools and capabilities for companies and people to embrace those technologies and, at the same time, to put the infrastructure in place for managers to be able to manage projects to success.

Our decision is very simple. We looked at the market, and our customers looked back at us and basically gave us the same input: "If you provide us this delta of functionality -- specifically, if you're able to make my life a little easier in terms of importing projects that exist inside WebSphere Studio Application Developer into your tool environment, and if you can support the Web services standards provided by WebSphere.

"If you can integrate better with ClearCase from a code-management standpoint, and if you can provide a richer deployment model into WebSphere so my developers can feel as if they're deploying from within the IBM toolset, then I don't need to move outside of your toolset. I can continue to develop, deploy, and run all my applications from a developer's standpoint, not from an administrator's."

Obviously if you are an administrator and have one to three people within the company that maintain a runtime version of WebSphere, you will need specific tools for that. We’re not targeting those one to three people. We’re targeting the 10 to 500 developers internally that need to build those applications. That’s really where Blue is coming from.

Governor: Maher, can you be a little bit more specific? You framed that in terms of top-down versus bottom-up. Can you talk a little more to that, and to how you sell?

Certainly, from RedMonk’s standpoint, we do tend to be more aligned with the bottom-up, just in terms of our customer and community base. But, in terms of what you’re seeing and saying, how is what you do different from IBM? I didn’t quite get that from your last comments.

Masri: I'll give you a very simple example. Just take the experience of a developer installing MyEclipse or installing RAD from ground zero. MyEclipse you can install from a two-megabyte root installer. It installs a 600-megabyte version on your desktop that contains all the tools. You no longer need to buy additional tools from somewhere else. If you need to do UML development or UI design, all of that is included as one bundle within MyEclipse.

If you install RAD, you need a multi-DVD set, six or seven gigabytes, I understand, just to begin the installation. The configuration is a nightmare. Everyone tells us that it's a very difficult configuration process just to get started.

MyEclipse comes as a very rich, simple profile that a user can download directly through the MyEclipse site or through our managed application environment inside Pulse. You can be up and running with tools, runtime configurations, and examples literally within minutes, as opposed to hours or days.

On the issue of simplicity, the feedback that we keep getting is that, in terms of requests for features, innovations, and technologies, we can deliver within months, as opposed to the many months or years you see from the competition. All of that becomes internalized, from the developer's standpoint, into, "I like this better, if it can bridge the gap to the technology I now have to use in order to satisfy my business requirements."

Gardner: Perhaps another way of asking a similar question is: you are in beta now. You’re going to be coming out on March 15 with MyEclipse Blue Edition. What's the difference between MyEclipse and MyEclipse Blue Edition?

Masri: Excellent point. MyEclipse Blue Edition includes all MyEclipse Professional features -- roughly on the order of 1,000 to 1,500 features above and beyond what the Eclipse platform provides -- as well as the highly targeted functionality that I mentioned. It can import and manage an existing project that you had previously inside WebSphere Studio Application Developer, and it can develop to the Web services and SOA standards specified in the WebSphere runtime.

It has much better integration with IBM code management, the ClearCase technology, an almost identical implementation of the deployment model you would see inside Rational, and the ability to debug an existing or new project in the runtime environment.

Gardner: Developers, of course, are hard to come by in a lot of regions around the globe. There’s a lot of competition. Organizations like to keep their developers happy and productive. At the same time, they need to deal with some of the complexity issues of moving to SOA. If they're WebSphere shops, they know that they are going to be tied into that for some period of time. It does sound like you are trying to give both of these parties something to be a little bit cheery about.

Governor: One of the things that I think is important about open source, and understanding open source in the enterprise but also more broadly: sometimes you can think of open source as a personal trainer for proprietary software companies. You've got these fat, flabby tools, and they need to get a life. They need to get on the treadmill. They need to get thinner and more agile. They need to get more effective. Frankly, it was ever thus with IBM. IBM is a pretty big beast.

Let me go back to the old mainframe times to think about Amdahl as a third party. When the IBM salesperson came in, you always made sure you had an Amdahl mug on the desk, right in front of the salesperson. Obviously, we’re a few years on now, but that dynamic remains important. As much as organizations balance BEA WebLogic and WebSphere against one another, or WebLogic and JBoss Application Server against one another, you would also want a balance in your toolsets.

One interesting thing here is that because you've got the specificity around WebSphere, and the sort of value prop the third party is putting forward, you're able to start that balance, that conversation to drive innovation, to drive price down. That’s one of the really useful things that Eclipse has enabled and delivered in the marketplace. It helps to keep some of the bigger vendors honest.

Gardner: So, the need to support heterogeneity is going to remain in both tools and runtimes, but we're also facing a time when heterogeneity is going to include hybrid approaches to deployment. And so, we're seeing more people interested, particularly if they are ISVs or perhaps small- to medium-size businesses, in taking advantage of some of these cloud-computing options. I'm thinking, of course, of Amazon and some others. Tell us, Maher, how this choice in tools and heterogeneity plays into some of these hybrid approaches of deploying in a cloud of some sort.

Masri: Let me expand on James' point, and then I'll add to it. I just want to make sure that we're not trying to present MyEclipse Blue as if we're competing with IBM, which is how it could easily be perceived. What we see is an under-served market and people who are trying to make the decision, but cannot afford to make it.

There are companies that are always going to be a pure IBM shop, and no one is going to change their mind. The ability to provide choice is very important for those that need to make those decisions going forward, but they need some form of affordability to make that decision possible. I believe we provide that choice in spades with our current pricing model and our ability to continue support without an additional premium.

Going forward, I fully agree with you that the hybrid model is very interesting, and we see it in the way that companies come back to us with very specific feedback on either MyEclipse or our Pulse product. There's quite a bit of confusion out there about what Web 2.0, Rich Internet Applications (RIA), and rich client applications are designed and geared to provide, and about all the underlying runtime technology needed to support them.

There seems to be a dichotomy. I could go into the Web 2.0 world and provide very rich, all-Web-enabled, all-Web-centric technologies for my end-users, because I need to control my environment. The other side of that is the rich client application, where I have to have some form of rich client implementation with full productivity applications for certain people, and I have to divorce the two, because there is no way I can rely on the Web alone, or on the technologies, or on anything else.

Everyone that we've talked to so far has a problem with that model. They have to have some form of a very strong, rich implementation -- not necessarily a very fat client, but some form of a client on the end-user's desktop. They need to be able to control that, whether they are using a very specific implementation of Web services, talking to somebody else's Web services, relying on a very specific persistence architecture, or integrating with other specific architectures. It gets very dicey very quickly.

That’s really where we saw the future of the market. This is probably not the right time to talk about this specifically, since the topic is Blue, but that’s why we also moved into the managed-application space and into our other product line called Pulse. This is for end-users who are using Eclipse-based technology right now, and in the future far more than that. They'll be able to assemble, share, deploy and manage a stack of applications, regardless of where those applications reside and regardless of the form of technology itself.

Take, for example, a rich-client runtime of Eclipse running on someone's desktop. All of a sudden, you have a version of software that you can deploy and manage, but it already has an interface into a browser. You can provide other Web 2.0 and RIA models, as well as other rich Internet technology, such as Flex and Flash. These technologies are merging very quickly, and companies have to be right there to make sure they meet those growing demands.

Gardner: It sounds like you're really talking about risk mitigation, trying to find some focal point that allows you to support your legacy, move to the rich-client and SOA activities, as well as be ready to go to what some people call Web Oriented Architecture, and take advantage of these new hybrid deployment options. Does that sound like what you're doing?

Masri: That's a fair statement.

Gardner: James, is this something that we can expect to shake out soon, or are companies going to be dealing with heterogeneity -- not just in terms of technology, but in approaches -- for some time?

Governor: We actually see an acceleration in this area -- tools and apps that span clients and the Web. I’ve taken to calling it the "synchronized Web." How can you have two different sets of services talk to one another? In terms of how you develop in that environment, you’ve got to develop conversationally. It’s about message passing. Because of that, we all are going to see some changes around the language choices.

We're seeing interest from developers in some interesting new development languages, such as Erlang and Haskell.

It's like enterprise software companies not having an open-source strategy. Basically, you need one. From an economic standpoint, you just don't have a choice. Any software company that doesn't have a thoroughgoing strategy for understanding and developing for both Web modes and offline modes is really missing the point.

Whether we're thinking of offline clients built on Google Gears, or of clients using an environment like Adobe's Integrated Runtime (AIR), formerly codenamed Apollo, we're already thinking about spanning clients and websites.

From an enterprise standpoint, the same choices need to be made. User expectations now are that they're going to get some of those benefits of centralization, but they're also going to have the rich experiences they're used to on desktop clients.

This is a very important transition, and whether it's Pulse or any number of the Web apps we're seeing, we're definitely seeing this in enterprise Web development. It's really important for us to be thinking about the implications, in terms of language support and in terms of runtimes. We've already mentioned the Amazon Web Services back end. We're going to be seeing more and more of that.

There's a little company called Coghead that's really focused on these kinds of areas, and it's an excellent example. They've chosen Amazon Web Services as a back end, and they've chosen Adobe Flex as a front end to give that interactivity. The Amazon model teaches, or should teach, a lot of software companies some important lessons. When I look at developers, certainly grassroots developers, it has almost become a badge of honor to say, "This is what Amazon charged me this week."

The notion of the back end in the cloud is growing in importance again. That’s probably why IBM just announced yet another one of its, "Hey, we're going to take a billion dollars and move it towards cloud-computing" kind of initiatives.

Gardner: Right. We've obviously seen a lot of change in the market. Organizations and enterprises that depend on the ongoing evolution of a single-stack approach need to come up with the tooling, frameworks, and environment that allow them to accomplish what they need from a backwards-compatibility perspective. They also need to put themselves in as low-risk a position as possible for taking advantage of these dynamic environments and the changes in the economics and the landscape.

We've been talking about the transition to WebSphere Application Server 6.1 and the implications for tooling, the pending arrival of MyEclipse Blue Edition from Genuitec, helping companies find some additional choices to manage these transitions.

Helping us weed through some of this -- and I have enjoyed the conversation -- we have been joined by Maher Masri, president of Genuitec. Any last words, Maher?

Masri: Just a reminder that the Blue Edition first milestone releases will be available in February. There will be a number of milestone releases that will be available for immediate access and we encourage people to download and try it.

Gardner: Very good. And, also James Governor, co-founder and industry analyst at RedMonk. What's your parting shot on this one, James?

Governor: Let's get specific again. Some of this has been a little bit blue-sky. I think it's very interesting that IBM has posted a pretty good set of financial results today.

Gardner: They're not going away, are they?

Governor: They are not going away. That's exactly right. It used to be said that IBM is not the competition; it is the environment in which you compete. It seems to me that Genuitec and many others are probably pretty good examples of that. As you put it, IBM isn't going away.

Gardner: Well, thanks. This is Dana Gardner, principal analyst at Interarbor Solutions. You’ve been listening to a sponsored BriefingsDirect podcast. Thanks, and come again next time.

Listen to the podcast here. Sponsor: Genuitec.

Transcript of BriefingsDirect podcast on tool choices for WebSphere shops. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.

Tuesday, February 05, 2008

New Ways Emerge to Improve IT Operational Performance While Heading Off Future Datacenter Reliability Problems

Transcript of BriefingsDirect podcast on IT operational performance using Integrien Alive.

Listen to the podcast here. Sponsor: Integrien.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, a sponsored podcast discussion about new ways to improve IT operational performance, based on real-time analytics and the ability to effectively compare data-center performance from a normal state to something that is going to be a problem. We’re going to look at the ability to get a “heads-up” that something is about to go wrong, rather than going into firefighting mode.

Today’s complexity in IT systems is making previous error prevention approaches for operators inefficient and costly. IT staffs are expensive to retain, and are increasingly hard to find. So even when operators have a sufficient staff, a quality staff, it simply takes too long to interpret and resolve IT failures and glitches, given the complexity of distributed systems.

There is also insufficient information about what’s going on in the context of an entire systems setup, and operators are using manual processes -- in firefighting mode -- to maintain critical service levels.

IT executives are therefore seeking more automated approaches to not only remediate problems, but also to get earlier detection. These same operators don’t want to replace their systems management investments, they want to better use them in a cohesive manner to learn more from them, and to better extract the information that these systems emit.

To help us better understand the problems and some of the new solutions and approaches to remediation and detection of IT issues, we’re joined by Steve Henning, the Vice President of Products for Integrien. Welcome to the show, Steve.

Steve Henning: Thanks a lot, Dana.

Gardner: Let’s take a look at some of the real-life issues that are affecting IT operators, drill down into them a bit, look at some of the solutions and benefits, and perhaps some examples of what these bring in terms of relief and increased savings of time and energy.

Tell me a little bit about complexity and problems. How do you view the current state of affairs in the datacenter operations field?

Henning: It's a dichotomous situation for the vice president of IT operations at this point. On one hand, they're working at growing companies. They need to manage more things in their environment -- devices and resources. Also, given the changes in how people are deploying applications today, they are dealing with more complexity as well.

Service oriented architecture (SOA) and virtualization increase the management problem by at least a factor of three. So you can see that this is a more complex and challenging environment to manage.

On the other side of this equation is the fact that IT operations is being told to keep its budget static or to reduce it. Traditionally, the way the vice president of IT operations has kept problems from occurring in these environments has been to throw more people at them. We now see 70-plus percent of the IT operations budget spent on labor costs.

Just the other day, I was talking to the vice president of IT operations of a large online financial company. He told me that he had 10 people on staff just to understand the normal behavior of their systems. They are literally cutting out graphs and holding them up to the light to compare them against what they have seen in previous incarnations of the system, trying to see when the behavior of this system is normal.

He told me that this is just not scalable. There is no way -- given the fact that he has to scale his infrastructure by a factor of three over the next two years -- that he can possibly hire the people that he would need to support that. Even if he had the budget, he couldn’t find the people today.

So it’s a very troubling environment these days. It’s really what’s pushing people toward looking at different approaches, of taking more of a probabilistic look, measuring variables, and looking at probable outcomes -- rather than trying to do things in a deterministic way, measuring every possible variable, looking at it as quickly as possible, and hoping that problems just don’t slip by.

Gardner: It seems as if we're looking at both a quality and a quantity issue here. We've got a quantity of outputs from these different systems, many times in different formats, but what we really need to do is find that “needle in the haystack” to detect the true issue that’s going to create a failure.

Do you agree that we are dealing with both quality and quantity issues?

Henning: Absolutely. If you look at most of the companies that we talk to today, they are mired in these monitoring events. Most of the companies we talk to have multiple monitoring tools, and they're siloed. You've got the network guys using one tool. You've got the OS and hardware guys using another. The app guys and database guys have their tools, and there is no place where all of this data is analyzed holistically.

Each system emits sets of events typically based on arbitrary hard thresholds that have been set in the environment. There's this massive manual effort of looking at these individual events that are coming from these systems and trying to determine whether they are the actual precursors to real problems, or if they're just a normal behavior of the system that can be ignored. It’s very difficult to keep your hands around that.

Gardner: I suppose it wasn't that long ago when you could have specialists who would oversee specific aspects of the IT infrastructure, and they would just be responsible for maintaining that particular part. But, as you mentioned, we have SOA, virtualization, datacenter consolidation, and cost-cutting efforts that, in effect, accelerate the interdependencies. I suppose we need more specialization, but -- at the same time -- those specialists need to communicate with the rest of the environment, or the people running it.

Henning: If you look at the applications that are being delivered today, monitoring everything from a silo standpoint and hoping to be able to solve problems in that environment is absolutely impossible. There has to be some way for all of the data to be analyzed in a holistic fashion, understanding the normal behaviors of each of the metrics that are being collected by these monitoring systems. Once you have that normal behavior, you’re alerting only to abnormal behaviors that are the real precursors to problems. That’s where Integrien comes in.

Gardner: You mentioned that you've got reams and reams of events pouring in, and that, in many cases, people are sifting through these manually, charting them, and then comparing them in sort of a haphazard way. What sort of solutions or alternatives are there?

Henning: One of the alternatives is separating the wheat from the chaff and learning the normal behavior of the system. If you look at Integrien Alive, we use sophisticated, dynamic thresholding algorithms. We have multiple algorithms looking at the data to determine that normal behavior and then alerting only to abnormal precursors of problems.

It’s really the hard-threshold-based monitoring that’s the issue here, because hard-threshold-based monitoring does two things. One, it results in alert storms for perfectly normal behavior. Two, it masks real problem behavior that you just can't catch with hard thresholds.

For example, let's say that at 9 p.m. the normal behavior for a set of servers in some online system is 10 percent CPU utilization, but right now they're running at 60 percent. If you have your hard threshold set at 80 percent, you've got a pending problem that you have no idea about. That's why it's so important to have an adaptive learning mechanism for determining normal behavior and for deciding when something is important enough to raise to an operator.
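
To make that contrast concrete, here is a minimal sketch, in Python, of a learned, time-aware baseline next to a hard threshold. The HourlyBaseline class, the per-hour mean and standard deviation, and the three-sigma band are illustrative assumptions for this example, not Integrien's actual algorithms, which layer multiple approaches.

    from collections import defaultdict
    from statistics import mean, stdev

    class HourlyBaseline:
        """Learns a metric's normal behavior, bucketed by hour of day."""
        def __init__(self, sigmas=3.0):
            self.history = defaultdict(list)  # hour of day -> observed values
            self.sigmas = sigmas

        def observe(self, hour, value):
            self.history[hour].append(value)

        def is_abnormal(self, hour, value):
            samples = self.history[hour]
            if len(samples) < 2:
                return False  # not enough data to judge yet
            mu, sd = mean(samples), stdev(samples)
            return abs(value - mu) > self.sigmas * max(sd, 1e-9)

    # The 9 p.m. example: normal CPU for these servers is about 10 percent.
    baseline = HourlyBaseline()
    for night in range(30):
        baseline.observe(21, 10.0 + (night % 3))  # ~10% CPU, slight variation

    reading = 60.0
    print(reading > 80.0)                     # False: the hard threshold is silent
    print(baseline.is_abnormal(21, reading))  # True: the learned baseline alerts

The hard threshold stays quiet until 80 percent; the learned baseline flags 60 percent immediately, because it knows that roughly 10 percent is normal at that hour.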

Gardner: When you're able to do this comparison on the basis of, "Hey, this is deviating from a pattern," rather than a binary-basis, on-off problem, what kind of benefits can people derive?

Henning: Well, you're automating this massive manual effort that I was talking about. If you look at that vice president of IT operations of the online financial company I talked about earlier, he has 10 guys who are sitting around doing nothing but analyzing this data all day.

Now, that data analysis can be completely automated with sophisticated dynamic thresholding. These 10 guys are freed up to do real problem solving, rather than just looking at these event storms, trying to figure out what’s important and what’s not, when the company is having an issue with one of their mission-critical systems.

Gardner: Do you have any examples of how effective this has been for companies, if they start to take that manpower and focus it where it's most effective? What kind of paybacks are we talking about?

Henning: We see up to a 95 percent reduction in the manual effort around setting thresholds and dealing with events. So it's a huge reduction in time. We also see up to a 50 percent reduction in the time it takes to solve problems, because of this kind of information, and because we consolidate alerts based on topology, which makes it much quicker to get down to the root cause of the problem and to focus efforts there.

Gardner: You mentioned getting to this "normal state" by gathering enough information and putting it in the context of usage scenarios. How do operators do that? How do they know what's going to lead to problems by virtue of detecting a baseline?

Henning: If you look at most IT environments today, the IT people will tell you that three or four minutes before a problem occurs, they will start to recognize the little pattern of events that leads to the problem.

But most of the people that I speak to tell me that’s too late. By the time they identify the pattern that repeats and leads to a particular problem -- for example, a slowdown of a particular critical transaction -- it’s too late. Either the system goes down or the slowdown is such that they are losing business.

We found these abnormal behaviors are the earliest precursors to problems in the IT environment -- either slowdowns or applications actually going down. Once you've learned the normal behavior of the system, these abnormal behaviors far downstream of where the problem actually occurs are the earliest precursors to these problems. We can pick up that these problems are going to occur, sometimes an hour before the problem actually happens.

If you think about a typical IT environment, you're talking about tens of thousands of servers and hundreds of thousands, even millions, of metrics to correlate all that data and understand the relationships between different metrics and which lead up to problems. It’s really a humanly unsolvable problem. That’s where this ability to “connect the dots” -- this ability to model problems when they occur -- is a really important capability.

Gardner: I suppose we’re talking about some fairly large libraries of models to compare and contrast -- something that is far beyond the scale of 5 or 10 people.

Henning: Yes, but these models are learned based on the environment, understanding the normal behaviors of all the metrics in a particular IT operation, and understanding what the key indicators of business performance are.

For example, you might say that if this transaction ever takes more than five seconds, then I know I have a problem. Or you could say that if this database metric, open cursors, goes above 1,000, I know I have a problem. Once you understand what those key indicators are, you can set them. And when you have those, you can actually capture a model of what this problem looks like when that key indicator is exceeded.

That's the key thing: building this model, having the analytic capability to connect the dots and understand the precursors that lead up to problems, even an hour before the problem occurs. That's one of the things that Integrien Alive can do.
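
Here is a hedged sketch of that capture step, under the assumption that a "model" is simply the set of metrics that were behaving abnormally in the hour before a key indicator was breached. The names, thresholds, and dictionary shapes are invented for illustration; the product's models are certainly richer than a set of metric names.

    import time

    # Key indicators of business performance, as in the examples above.
    KEY_INDICATORS = {
        "transaction_seconds": 5.0,   # transaction taking over five seconds
        "db_open_cursors": 1000,      # open cursors going above 1,000
    }

    problem_library = []  # captured models of previously seen problems

    def capture_model(abnormal_events, breach_time, window_secs=3600):
        """Record which metrics were abnormal in the hour before the breach."""
        precursors = {e["metric"] for e in abnormal_events
                      if breach_time - window_secs <= e["time"] <= breach_time}
        return {"captured_at": breach_time, "precursors": precursors}

    def on_sample(metric, value, now, abnormal_events):
        limit = KEY_INDICATORS.get(metric)
        if limit is not None and value > limit:
            # Key indicator exceeded: capture what this problem looks like.
            problem_library.append(capture_model(abnormal_events, now))

    now = time.time()
    events = [{"metric": "app_heap_pct", "time": now - 1800},
              {"metric": "db_lock_waits", "time": now - 900}]
    on_sample("db_open_cursors", 1200, now, events)
    print(sorted(problem_library[0]["precursors"]))  # the problem's fingerprint

In this toy run, the captured model records app_heap_pct and db_lock_waits as the precursors of the open-cursors breach.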

Gardner: What sort of benefits do we get from this deeper correlation of what’s good, what’s bad, and what’s gray and that could become bad? Are we talking about minutes or days? What sort of impact does this have on the business?

Henning: We see a couple of things. One is that it’s solving this massive data correlation issue that right now is very limited in the IT operations that we go into. There are just a few highly trained experts who have “tribal knowledge” of the application, and who know even the beginnings of what these correlations are. With a product like Integrien Alive you can solve that kind of massive data correlation issue.

The second benefit of it is that the first time a problem occurs, the capture of a model of the problem, with all the abnormal behaviors that led up to it, can often target for you the places in the applications that are performing abnormally and are likely to be the causes of the problem.

For example, you might find that a particular problem shows abnormal behavior in the application server tier and the database tier. Now, there's no reason to get on the phone with the network guy, the Web server guy, and other people who can't contribute to the resolution of that problem. Targeting and understanding which metrics are behaving abnormally can get you to a much quicker mean time to identify and repair the problem. As I said, we see up to a 50 percent reduction in the time it takes to resolve problems.

The final thing is the ability to get predictive alerts, and that’s kind of the nirvana of IT operations. Once you’ve captured models of the recurring problems in the IT environment, a product like Integrien Alive can see the incoming stream of real-time data and compare that against the models in the library.

If it sees a match with a high enough probability, it can let you know ahead of time, up to an hour ahead of time, that you are going to have a particular problem that has previously occurred. You can also record exactly how you diagnosed the problem and what you did to solve it, so that you can solve it again.
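
Continuing the sketch above, matching the live stream against the library might, in its simplest form, score the overlap between the currently abnormal metrics and each stored fingerprint. Jaccard similarity standing in for "probability" is my simplification, not the product's actual statistics.

    def match_probability(current_abnormal, model):
        """Overlap between live abnormal metrics and a captured model."""
        stored = model["precursors"]
        union = current_abnormal | stored
        if not union:
            return 0.0
        return len(current_abnormal & stored) / len(union)

    def predictive_alerts(current_abnormal, library, threshold=0.85):
        """Yield stored models that the live situation resembles."""
        for model in library:
            p = match_probability(current_abnormal, model)
            if p >= threshold:
                yield model, p

    library = [{"captured_at": 0,
                "precursors": {"app_heap_pct", "db_lock_waits",
                               "db_open_cursors"}}]
    live = {"app_heap_pct", "db_lock_waits"}  # abnormal right now
    for model, p in predictive_alerts(live, library, threshold=0.5):
        print(f"Matched a known problem with probability {p:.0%}")

Here the two live abnormal metrics cover two-thirds of the stored fingerprint, so the sketch reports a 67 percent match; a real deployment would tune the threshold much higher.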

Gardner: Then, you can share that. Now, you mentioned “tribal knowledge.” It sounds like we are taking what used to reside in wetware -- in people’s minds and experience. Instead of having to throw those people at a problem without knowing the depth of the problem, or even losing that knowledge if they walk out the door, we're saying, "Enough of that. Let’s go and instantiate this knowledge into the systems and become less dependent on individual experienced people."

Henning: The way I look at it is that we're actually enhancing the expertise of these folks. You're always going to need experts in there. You’re always going to need the folks who have the tribal knowledge of the application. What we are doing, though, is enabling them to do their job better with earlier understanding of where the problems are occurring by adding and solving this massive data correlation issue when a problem occurs.

Even the tribal experts will tell you that just a few minutes before a problem occurs they can start to see the problem. We are offering them a solution that allows them to see this problem forming up to an hour ahead of time, notifying them of abnormal behavior and patterns of behavior that would be seemingly unrelated to them based on their current knowledge of the application.

Gardner: When you do resolve a problem and capture that and make that available for future use, that sounds more like a collaboration issue. How do we deal with so many inputs, so much information, not only on the receiving end, but on the outgoing end, after a resolution?

Henning: This is what we were talking about before. You’ve got all of the siloed sources of monitoring data and alerts, and there's currently no way to consolidate that data for holistic problem solving. So, it’s very important that any kind of solution can integrate a wide variety of monitoring tools, so that all the data can be in one place and available for this kind of collaborative problem solving.

For example, in one environment that we went into, an alert went to an application server administrator. He happened to notice that there was a prediction that a database key indicator was going out of its normal range, which would have caused a crash of the database, with 85 percent probability, in 15 minutes. Armed with that information, he got the alert over to the database administrator, who was able to make some configuration changes that staved off the problem.

Being able to analyze this data holistically and being able to share the data that’s typically been in the siloed monitoring solutions allows quicker and more collaborative problem resolution. We're really talking about centralizing and automating data analysis across the silos of IT.

Gardner: It also reminds me, conceptually, of SOA, where you want to transform the information into a form that can be used generally. It sounds like you are doing that and applying it to this whole notion of IT management and remediation.

Henning: Very much so. There are seemingly unrelated things happening within an application infrastructure that can result in a problem. The fact that all the data is analyzed in a single place, holistically, through these statistical algorithms allows us to provide an interface where people can work together and collaborate. This makes the team more effective and makes it much easier for people to solve problems quickly.

Gardner: So, we standardize gathering and managing the information. We also standardize the way in which people can access it and use it, so that they are not fixing the same broken wheel over and over again at different times. It can recognize when they are going to need to do it and have it fixed ready to go. This sounds like a real big saver when it comes to labor and lowering costs for your staff, but also gets that root saving around no downtime or reduced downtime.

Henning: Right. When we work with customers, most of the IT operations folks we talk to are concerned with reducing labor costs and reducing the time to identify and resolve problems. In truth, though, the real benefit to the business is removing the downtime and the application slowdowns that cause you to lose business.

So although we see major benefits of real-time analytic solutions in providing reduction in labor costs, we also say that it’s a very big boon to the business, in terms of keeping the applications effectively generating revenue.

Gardner: Another current trend is the ability to gather interface views, graphical views of the system. There are a lot of dashboards out there for business issues. What do we get in terms of visibility for end-to-end operations, even in a real-time or close to real-time setting, from the Integrien Alive that you are describing?

Henning: Once again, it’s still a real issue when you have siloed monitoring tools. Even though a lot of companies have a manager of managers, that’s typically used by the level-one operations folks to filter through the alerts and determine who they need to be passed off to, who can actually take a look at them and resolve them. But, we find that most of the companies that we talk to don’t have any tools that allow them to be efficient in role-based problem solving.

One of the things that Integrien Alive provides is this idea of customizable role-based dashboards, this library of custom analysis widgets that allows people to slice and dice the data in whatever way is most effective for that particular individual in problem solving. We talked earlier about the holistic data analysis that was really enabling effective teamwork. When we talk about role-based dashboards for problem solving showing the database administrator exactly what they need, we are really talking about making each team member more effective.

That’s one of the benefits of the role-based dashboards. The other thing is giving visibility all the way up to the CIO and the vice president of operations who are concerned with much different views. They want it filtered in a much different way, because they are more concerned about business performance than any individual server or resource problems that might be occurring in the environment.

Gardner: What sort of views do those business folks prefer over what the outputs of some of these monitoring tools might be?

Henning: You want to look at things from a business-service perspective, how are my critical business services performing? If I have an investment banking solution, and I’ve got a couple of other mission-critical applications that are outward facing, I want to know how those are performing right now, in terms of the critical transaction performance.

I want to be able to accommodate business data as well. So, if I see that from an IT performance level the transaction seemed to be performing well and I can see that I am also processing a consistent number of transactions that are enabling my business, I have a good view that things are going well in my operation at this point. So, it’s really a higher level view.

I am going to be much more concerned with any kind of alerts that are affecting my entire business service. If we see an alert that’s been consolidated all the way up to the investment banking business-service level, that’s going to be something that’s very important for the VP of IT operations, because he’s got a problem now that’s actually affecting his business.
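
As a rough sketch of what "consolidated all the way up" might mean mechanically: map each low-level resource to the business service it supports and group alerts accordingly. The topology, resource names, and service names here are hypothetical stand-ins, not Integrien's actual data model.

    # Hypothetical topology: resource -> owning business service.
    PARENT = {
        "db-server-07": "investment-banking",
        "app-server-03": "investment-banking",
        "web-server-12": "investment-banking",
    }

    def consolidate(alerts):
        """Group low-level alerts under the business service they affect."""
        by_service = {}
        for resource, message in alerts:
            service = PARENT.get(resource, "unassigned")
            by_service.setdefault(service, []).append((resource, message))
        return by_service

    alerts = [("db-server-07", "open cursors abnormal"),
              ("app-server-03", "heap usage abnormal")]
    for service, items in consolidate(alerts).items():
        # What the VP-level dashboard shows instead of raw device alerts.
        print(f"{service}: {len(items)} abnormal resources")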

Gardner: I suppose from the IT side the more that we can show and tell to the business folks about how well we are doing the better. It makes us seem less like we are firefighters and that we're proactive and on top of things. If there are any decisions several months or years out about outsourcing, we have a nice trail, a cookie-crumb trail, if you will, of how well things are going and how costs are being managed.

Henning: That's absolutely true. I was talking to the CIO of a large university the other day. One thing that was very frustrating for him was that he was in a meeting with the president of the university, and the president was saying that it seemed like some of the critical applications were down a lot.

This CIO was very frustrated, because he knew that wasn’t the case, but he didn’t have effective reporting tools to show that it was not the case. That was one of the things that he was very excited about, when he took a look at our product.

Gardner: We know that complexity is substantial. It’s pretty clear that that complexity is going to continue as we see organizations move toward SOA and software as a service, and hybrid issues, where a holistic business process could be supported by your systems, partner systems, or perhaps third-party systems.

I can just imagine there is going to be finger pointing when things go wrong. You’re going to want to be able to say, "Hey, not my problem, but I am ready, willing and able to help you fix it. In fact, I've got more insight into your systems than you do."

Henning: That’s absolutely the case.

Gardner: Give me a sense of where Integrien and Alive, as a product set, are going in the future. I know you can't pre-announce things, but as these new complexities of permeable organizational boundaries kick in and virtualization kicks in, what might we expect?

Henning: One of the things that you're going to see from us is a comprehensive solution for the virtualized environment. Several other companies claim to have solutions in this space, but from what we have been able to see so far, the motion of virtual machines (VMs) between different servers is still an issue for all of these solutions.

We’re working extremely diligently to solve the issue of how to deal with performance monitoring in a virtualized environment, where you have got the individual VMs moving all over the place, based on changes in capacity, and things like that. So, look out for that solution coming from Integrien in the coming months.

Gardner: So we're talking about instances of entire stacks, provisioning and moving dynamically among systems. That sounds like a whole other level of complexity that we are adding to an already difficult situation.

Henning: Yes, it's a big math problem. You can compound that with the fact that when a VM moves from one physical server to another, it might be allocated a different percentage of resources. So, when you think about the whole hard-threshold-based monitoring paradigm that IT is in now, what does a hard threshold really mean in an environment like that? It makes absolutely no sense at all.

If you don't have some way to understand normal behavior, to provide context, and to quickly learn and adapt to changes in the environment, managing the virtualized environment is going to be an absolute nightmare. Based on spending some time with the folks at VMware and attending the VMworld show this year, you could certainly see this concern among their customers about how to deal with this complex management problem.
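
As a rough illustration of why adaptive learning beats hard thresholds here, consider a baseline that re-learns after a VM lands on a host with a different resource allocation. The exponentially weighted moving average and its parameters are stand-ins I chose for brevity, not Integrien's method.

    class AdaptiveBand:
        """Flags values outside a band around an exponentially weighted mean."""
        def __init__(self, alpha=0.1, width=0.5):
            self.alpha, self.width = alpha, width
            self.level = None  # the learned "normal" level

        def update(self, value):
            if self.level is None:
                self.level = value
                return False
            abnormal = abs(value - self.level) > self.width * self.level
            self.level += self.alpha * (value - self.level)  # keep learning
            return abnormal

    band = AdaptiveBand()
    readings = [20.0] * 50    # VM on a large host: ~20% CPU is normal
    readings += [45.0] * 50   # after migration to a smaller host: ~45% is normal
    flags = [band.update(v) for v in readings]
    print(flags[50], flags[-1])  # True right after the move, False once re-learned

A hard threshold pinned to the old host's capacity would either fire constantly or stay silent after the move; the adaptive band complains briefly and then accepts the new normal.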

Gardner: The old manual wetware approaches just aren’t going to cut it in that environment?

Henning: That’s correct.

Gardner: I appreciate your candor and I look forward to seeing some of these newer solutions focused on virtualization.

We have been talking about remediation and the ability to get in front of problems for IT operators, using predictive, analytic, algorithmic approaches. To help us understand this, we have been joined by Steve Henning, the Vice President of Products at Integrien. Thank you, Steve.

Henning: Thank you very much, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You have been listening to BriefingsDirect. Thanks and come back next time.

Listen to the podcast here. Sponsor: Integrien.

Transcript of BriefingsDirect podcast on IT operational performance using Integrien Alive with Integrien's Steve Henning. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.