
Wednesday, June 11, 2008

Live TIBCO Panel Examines Role and Impact of Service Performance Management in Enterprise SOA Deployments

Transcript of BriefingsDirect podcast on service performance management recorded live at TUCON 2008 in San Francisco on April 30, 2008.

Listen to the podcast here. Sponsor: TIBCO Software.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Today, a sponsored podcast discussion about service performance management in support of service-oriented architecture (SOA).

We are here live at the TUCON 2008 conference, TIBCO Software’s user event in San Francisco, to look into the issues around SOA integrity, particularly in the context of widespread enterprise use, the myriad demands that are going to be put on services, and how infrastructure is going to need to adapt and perform in a way that probably has not been the case for infrastructure up until now.

Helping us to weed through service performance management and how it relates to SOA governance and other issues of total architecture, we are joined by a panel of industry analysts, experts and representatives from TIBCO.

Let's start by introducing our panel. We are joined by Joe McKendrick, an independent analyst and SOA blogger. Welcome to the show, Joe.

Joe McKendrick: Hi, Dana, happy to be here.

Gardner: We are also joined by Sandy Rogers, the program director for SOA, Web services and integration at IDC. Welcome, Sandy.

Sandy Rogers: Thanks, Dana.

Gardner: We are also joined by Anthony Abbattista, the vice president of enterprise technology strategy and planning for Allstate Insurance Co. Welcome to the show, Anthony.

Anthony Abbattista: It’s good to be here. Thanks.

Gardner: And also joining us, Rourke McNamara, director of product marketing for TIBCO Software. Welcome, Rourke.

Rourke McNamara: Thank you, Dana.

Gardner: We saw and listened to some presentations this morning at the TIBCO conference. One of the things that struck me is this notion of pulling together not only what had been disparate technology silos, but also what had been functional and organizational silos. In particular, I mean the design and process-creation phases that we have heard so much about with SOA.

And then how that relates to the secondary aspect of functional activities, which is the operations -- keeping the trains running on time, and making sure that service-level agreements (SLAs) are met. That means making sure that users get very fine-grained services coming through uninterrupted in aggregated applications -- without hiccups, without slowdowns.

These processes are moving to mission-critical, and so there needs to be more opportunity for these two aspects of SOA, design and operations, to work together. Performance management of services gives more insight into what takes place beneath those services, and is, therefore, becoming essential.

First, let's take a look at this landscape of what's going on in SOA, and why, as we move toward enterprise-wide deployment, service performance management becomes so important.

Sandy at IDC, what do you see in terms of enterprises that are early adopters of SOA? When they throw the switch, so to speak, on these composite business processes -- made up of services from a variety of different sources with a variety of different support infrastructure -- how confident are they that this is going to hold up in real-world situations?

Rogers: What I find interesting is that even if you have deployed only one service, you need to have as much information as possible about how it is being used and how its consumption is trending upward across different applications and different processes.

So, most organizations need to create an environment where individuals and stakeholders in the company feel more comfortable relying on services, and also allowing others to handle the operational dynamics of those services once they are in production.

They need a lot more visibility into, and understanding of, the strains on the system, and they need to build up a level of trust. Once they increase the number of individuals who have that visibility, trust starts to develop, more reuse happens, and it starts to take off.

Eventually they get to a stage where they are concerned about scalability and how far they can push the limits of these deployments. That could be because of the way they’ve designed it architecturally, or just because they are still getting familiar with the new technologies that support SOA infrastructure.

Gardner: It seems that at the very time SOA is putting more emphasis on a diversified portfolio of services -- repurposing those services, extending visibility -- IT as an organization is being tasked with behaving more maturely, as a business within a business.

The Information Technology Infrastructure Library (ITIL) and other compliance standards are being placed on IT departments, so they behave more like we would expect a human resources department to behave. Let's go to Joe McKendrick.

Joe, now that we are looking at the need for IT to perform like a mature business, is there a risk of finger-pointing? When something goes wrong and so many constituents are involved in supporting a service, will no one really be able to take responsibility?

McKendrick: Yes, that’s been a problem all along. There is always a lot of finger-pointing, and IT tends to get blamed for everything. Sandy made an excellent point that the foundation of SOA is trust.

The business units are being asked to sign on to SOA, to provide support and, in some cases, even funding. They are looking for the services they will consume to be scalable and to have the uptime to be available, perhaps 24x7. If this trust is not there, the whole foundation of the SOA breaks down and IT will get the blame again.

It very much hinges on IT and performance management. We're actually talking about two levels here, governance and performance management. They are integrated, and they need each other. But governance deals more with how the business addresses SOA. Performance management is an IT challenge and is rightly put into the IT "sphere of influence."

Gardner: Anthony, I was struck, when I heard your presentation this morning, by your example of what things used to be like: you would get 40 people on a conference call when something went wrong, and you would be yelling out URLs in order to find the right server to either shut down or replace.

From your perspective, what does it take now to solve problems when things go wrong? Is this something we can rely on people to solve, or do we need to move more toward a systems-based approach?

Abbattista: First, I'd like to go back to an earlier question you asked. When we went to SOA, when we put in our enterprise service bus (ESB), and when we chose TIBCO for our bus, a lot of people thought of SOA as, "Well, I am just going to construct some WSDL and call some SOAP or HTTP, and that’s SOA."

But the first thing we did was talk through the governance part: why we want to "get on the toll road" and "pay a toll for the bus." Really, that became about consistency in measurement and governance, and it lets us operate things once we have created them.

So the first thing we had to do was get through the whole idea of that. It was worth it, and it wasn’t a matter of whether the bus would work or not. For the first year-and-a-half that we put our ESB in and we started to market services on it, we would hear the words, "TIBCO is down."

It didn’t matter whether the back-end service was down. It didn’t matter whether the mainframe was broken; they would say "TIBCO is down." We finally started to get to the root cause, saying, "No, such-and-such service is down." What gives us good performance measurement is "paying the toll" of getting on the bus and having measurement points that are well understood.

I also don’t agree exactly that governance is a business-unit thing. For us, governance is also very much about the SLAs around the services: having good expectations up front about how they will behave and how they will be called. That way, we have a benchmark or baseline to compare ourselves against. If, all of a sudden, we get 100,000 calls a day to something that was designed for, or is expected to have, 1,000, we at least understand what to look for.
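The baseline Abbattista describes can be expressed as a simple comparison of observed call volume against the volume a service was designed for. Below is a minimal Python sketch, assuming daily call counts are already being collected per service; the `ServiceBaseline` type, the service names, and the 10x tolerance are hypothetical choices for illustration, not Allstate's actual practice.

```python
from dataclasses import dataclass


@dataclass
class ServiceBaseline:
    """Expected daily call volume agreed when a service is designed (hypothetical type)."""
    name: str
    expected_calls_per_day: int  # assumed to be greater than zero


def over_baseline(baselines, observed_calls, tolerance=10.0):
    """Return (name, ratio) pairs for services whose observed daily volume exceeds
    their expected volume by more than `tolerance` times, largest ratio first."""
    flagged = []
    for b in baselines:
        observed = observed_calls.get(b.name, 0)  # a service with no recorded traffic counts as 0
        ratio = observed / b.expected_calls_per_day
        if ratio > tolerance:
            flagged.append((b.name, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    baselines = [
        ServiceBaseline("quote-lookup", 1_000),  # designed for about 1,000 calls a day
        ServiceBaseline("policy-search", 50_000),
    ]
    observed = {"quote-lookup": 100_000, "policy-search": 48_000}
    print(over_baseline(baselines, observed))  # [('quote-lookup', 100.0)]
```

Using a ratio rather than an absolute count is one possible design choice: it catches the 1,000-versus-100,000 case without alerting on ordinary day-to-day variation in a high-volume service.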

Gardner: Let's provide a level-set for our listeners. You are representing Allstate, which is a very large organization, with 17 million customers and $156 billion in assets. Give us a sense of the scale we are talking about in terms of your IT organization.

Abbattista: Our claims organization, for example, has an IT shop of about 400 employees, and that's not counting offshore or other people who support it. Each of our business units is a substantial IT shop in and of itself, each with 500 to 1,000 people.

Then, what we choose to federate becomes an issue, because they need to talk to each other. They need to talk to themselves. They need to talk to the outside world. So what we layer then in my area is an infrastructure of components on how to do those tasks.

The massiveness of it comes down to this: how do you measure and monitor all that to get end-to-end composite services that we can really monitor, and that supply a good customer experience? The scale is amazing. We have about 5,000 servers -- UNIX, Windows, mainframes, AS400s -- we have them all at this point.

Gardner: How many services do you have that have to "pay their toll" on the service bus, so to speak?

Abbattista: About 750.

Gardner: Wow! That’s pretty good.

Abbattista: Our document management services actually collapse into Oracle, but we fronted them with TIBCO. We did that so that we would have measurement from day one, and it’s worked amazingly well.

People argued it would be just as easy to shove the document into the database and make an HTTP-SOAP call, but this governed ESB approach has paid off a thousand times over, because we now know predictively when something is going awry.

Gardner: All right, now let's go to Rourke. We understand that enterprises are hesitant about moving toward SOA on a holistic basis if they haven’t got performance backstops in place. We are also a little wary of finger-pointing, because there is such a complex stew of components and services that it's very difficult, after the fact, to point to who is responsible.

And we're dealing with organizations like Allstate, which have massive size and scale, with 750 services. What do people need to be considering as we move into yet more complexity with virtualization, cloud computing, and utility grids? Give us a bit of a level-set about what's important to consider before the fact, when moving toward a solution.

McNamara: SOA, virtualization, and governance -- all of these technologies have pluses and minuses. And, on the whole, when you finish computing out the equation, you definitely come out on the positive side.

But, you need to make sure that, as you move from the older ways of doing things -- from the siloed applications, the siloed business unit way of doing things -- to the SOA, services-based way of doing things, you don’t ignore the new complexities you are introducing.

Don’t ignore the new problems that you are introducing. Have a strategy in place to mitigate those issues. Make sure you address that, so that you really do get the advantage, the benefits of SOA.

What I mean by that is with SOA you are reusing services. You are making services available, so that that functionality, that code, doesn’t need to be rewritten time and time again. In doing so you reduce the amount of work, you reduce the cost of building new applications, of building new functionality for your business organization.

You increase agility, because you have reduced the amount of time it takes to build new functionality for your business organization. But, in so doing, you have taken what was one large application, or three large applications, and you have broken them down into dozens of separate, smaller units that all need to intercommunicate, play nice with each other, and talk the same language.

Even once you have that in production, there is now a greater possibility for finger-pointing, because if the business functionality goes down, you can’t simply say that the application we just put in is down.

The big question now is what part of that application is down? Whose service is it? Your service, or someone else’s service? Is it the actual servers that support that? Is it the infrastructure that supports that? If you are using virtualization technology, is it the hardware that’s down, or is it the virtualization layer? Is it the software that runs on top of that?

You have this added complexity, and you need to make sure that doesn’t prevent you from seeing the real benefit of doing SOA.

Gardner: So if, after a failure, you are doing forensics and root-cause analysis and putting more agents and agentless systems in place, and it's all telling you what's wrong only after it has gone wrong, it’s probably too late.

McNamara: Absolutely.

Gardner: How do we get to this vision of proactive, anticipatory systems awareness via service performance management? Let me first take this to Sandy. How important is it for us to get to this sense that something isn't quite right, in advance of it failing?

Rogers: Obviously, there are different use cases and different companies that are really interested in that dynamic, autonomic type of environment, where you can adjust to the demands of the environment, but we are also becoming much more Web-based.

What we are seeing is that, as services are exposed externally to customers, partners, and other systems, it affects the ability to fail over, to have redundant services deployed, to track trends, to plan what needs to be supported in the infrastructure going forward, and even to go back to issues of funding. How are you going to prove what's being used, and by whom, to understand what's happening?

So, first, yes, it is visibility. But from there, it has to be about receiving the information as it happens, and being able to adjust the behavior of the services and of the infrastructure supporting them. That starts to become very important. Right now, different services, and the infrastructure supporting them, have different levels of importance and criticality.

But if we want to move to being able to deploy anywhere and leverage virtualization technologies, we need to break away from the static configuration of everything from the hardware to the databases to where all this is being stored now, and to have more dynamic resourcing. To leverage services that are deployed outside an organization, you need more real-time communication.

Gardner: So, the question remains: how do you do that? It’s clear that you want to get out in front of these problems, but with so many interdependencies, the large number of services, and different environments, probably both inside and outside the organization, it raises questions. How do we move up in abstraction to understand the context of an entire business process, so we can go back and look for the signals that tell us when something is approaching a breakdown, or when we need to provision more hardware and software resources?

Let me take this to you, Anthony. Where do you think that abstraction needs to be in order to forecast appropriately issues of SOA integrity?

Abbattista: I'll go back to the point on having some expectations or benchmarks of how the service should run when it’s designed and deployed in the first place. Then, you can understand if your baseline is correct and then, over time, you can look for fragmented behavior. But, I do think you need some level of end-to-end view of the process and of who is the customer on the end.

Ultimately, where these things show up en masse is at the end-points, and typically that’s in the consumer space, as we are frustrating an employee or someone on a website with a bad client experience. Those are unforgivable.

So, starting with the customer at the end-point of that business process and looking at some of those interactions, is part and parcel of deploying the service in the first place. If you don’t do that, you will be chasing your tail for the rest of your life in operations, until you go back and do that mapping. So I think it pays to do it upfront.

Gardner: You mentioned in your presentation that the "Walls must come down" between IT operations and development-deployment-requirements-test functions. It sounds like you're also saying it needs to go from end-to-end, beyond just that wall, but also across the entire event-processing landscape.

Abbattista: In that respect, I view our function in running the applications and supplying the applications as a utility. It's our job to point back to the groups that deploy the stuff. If I let them deploy junk, I am as complicit in that junk being delivered as anybody else. That’s a responsibility we take seriously. If you're going to put it in the shop and expect us to run it, I won't take junk.

Gardner: Right. So there is the adage of, "Garbage in, garbage out." Now, if garbage appears anywhere in the context of a complex process, it's garbage out. That’s even more difficult.

Let's go to Joe McKendrick. Tell us about the concept of complex event processing (CEP). How do you get any handle on a process? Do you look for the description of the process from a modeling perspective, through what's been done on the ESB, all of the above?

McKendrick: Definitely all of the above. CEP is something that’s just coming into the SOA realm. It is said that that’s the next phase for SOA. As was pointed out this morning, real time is not enough for a business. Business needs to be able to react and predict.

Rourke and I were talking about that a little bit earlier. You need to be able to predict what's going to happen, not only in the business, but in the systems. TIBCO is making some progress in this area in terms of being able to predict when the system may go down or when there will be spikes in demand. Predictive analytics, which is a subset of business intelligence (BI), is now moving into the systems management space.

Gardner: We're actually moving above the systems management space by an abstraction level or two. Let's go back to Rourke. You had a couple of product enhancement announcements today here at the TIBCO conference. You are getting out in front of service performance management, and your aim is to accomplish some of the things we have been describing, and to provide what the market is demanding in order for SOA to be trusted.

Tell us about CEP and why that is an important part of this predictive solution.

McNamara: One of our customers said it best last night over dinner, when I introduced the concept of the product I am going to mention in just a second. They saw immediately what problem it solved for them.

They said that their biggest fear is that their SOA initiative will be a victim of its own success. A service will be reused so many times so rapidly that the hardware it's deployed on, the manner in which it was deployed, won't be able to handle the load. That service, which is now used in a dozen different business applications, or exposed in a dozen different business applications, will go down or will degrade in its performance level.

That could make SOA a victim of its own success. They will have successfully sold the service, had it reused over and over and over and over again. But, then, because of that reuse, because they were successful in achieving the SOA dream, they now are going to suffer. All that business users will see from that is that "SOA is bad," it makes my applications more fragile, it makes my applications slow down because so many people are using the same stuff.

Gardner: What is it about CEP that gives us more visibility at the right abstraction, so that we can predict among all of these different complex components and assets where a problem is developing?

McNamara: The key is that we can’t simply wait for the problem to develop or to happen, because it will happen very quickly. We won't have a week’s warning, a month’s warning, or even necessarily a few hours’ warning. And when we deploy a service, we won't understand all the places or all the manners in which it will be used. So, we need to be able to predict these problems before they occur and do something to prevent them.

TIBCO is taking our CEP technology, the business events technology that we have, and applying it to the problem of monitoring our own infrastructure software, the same way our customers apply it to their business problems.

We are using business events to monitor what's going on with service load and performance: what the load profiles look like in a given organization, with the ability to understand some of the programs and marketing efforts going on within that company. Then, when it sees that a service's load is approaching a dangerous level, or when it sees, based on the events that are occurring, that the service will become overloaded and violate its SLAs, it’s able to tell other parts of the infrastructure to take action to prevent that problem.
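McNamara describes the predictive idea, not a programming interface. As a rough, generic illustration (this is not TIBCO BusinessEvents code, which the panel does not show), the Python sketch below keeps a sliding window of request-rate samples for one service, fits a least-squares linear trend to them, and reports whether the projected load will reach a capacity limit within a look-ahead horizon. The linear model, the window size, and the `LoadTrendMonitor` name and parameters are all assumptions; a real CEP engine would correlate many more kinds of events.

```python
from collections import deque


class LoadTrendMonitor:
    """Hypothetical per-service monitor: keeps a sliding window of
    (timestamp_seconds, requests_per_second) samples and predicts whether
    load will reach `capacity_rps` within `horizon_s` seconds."""

    def __init__(self, capacity_rps, horizon_s, window=5):
        self.capacity_rps = capacity_rps
        self.horizon_s = horizon_s
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically

    def observe(self, t, rps):
        """Record one sample and return True if a capacity breach is predicted."""
        self.samples.append((t, rps))
        if len(self.samples) < 2:
            # Not enough history for a trend; only flag an existing breach.
            return rps >= self.capacity_rps
        # Least-squares slope of request rate over time across the window.
        n = len(self.samples)
        mean_t = sum(s[0] for s in self.samples) / n
        mean_r = sum(s[1] for s in self.samples) / n
        num = sum((s[0] - mean_t) * (s[1] - mean_r) for s in self.samples)
        den = sum((s[0] - mean_t) ** 2 for s in self.samples)
        slope = num / den if den else 0.0
        # Project from the latest sample to the end of the look-ahead horizon.
        projected = rps + slope * self.horizon_s
        return projected >= self.capacity_rps
```

A True result marks the point at which, in McNamara's description, other parts of the infrastructure would be told to act, for example by provisioning another instance of the service; that hand-off is deliberately left out of the sketch.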

Gardner: Let me see if I understand this. This sounds like a schematic about a business process and, by reverse engineering from that process down to the constituent ingredients to support it, you can predict where the loads will be building or will become erratic. Therefore, you can also detect what's going on within that system, put the two together, and come up with a heads-up?

McNamara: That’s exactly right. You need to understand what the interdependencies are between your services and what the load characteristics of the different component parts in that dependency graph in that environment are. Then, based on that, you need to understand what sort of events in your business or in your IT infrastructure will cause performance problems or overload conditions.
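The dependency-graph reasoning can be illustrated with a small calculation. The sketch below assumes that the calls per second arriving at each entry-point service directly from business processes are known, and that each caller makes a fixed number of downstream calls per invocation. It then pushes that load through an acyclic dependency graph to estimate the traffic each shared service will see. The function name, service names, and fan-out figures are hypothetical.

```python
from collections import defaultdict


def propagate_load(entry_calls, dependencies):
    """Estimate calls per second reaching each service.

    entry_calls:  {service: calls/s arriving directly from business processes}
    dependencies: {caller: [(callee, calls_per_invocation), ...]}
    Assumes the dependency graph has no cycles (a simplification).
    """
    # Depth-first post-order lists every callee before its callers.
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for callee, _ in dependencies.get(node, []):
            visit(callee)
        order.append(node)

    for node in list(entry_calls) + list(dependencies):
        visit(node)

    load = defaultdict(float)
    for service, rate in entry_calls.items():
        load[service] += rate
    # Reversed post-order is a topological order: each caller's total load
    # is complete before it is passed on to its callees.
    for caller in reversed(order):
        for callee, fanout in dependencies.get(caller, []):
            load[callee] += load[caller] * fanout
    return dict(load)


if __name__ == "__main__":
    entry = {"claims-portal": 10, "quote-app": 5}
    deps = {
        "claims-portal": [("customer-lookup", 2), ("document-store", 1)],
        "quote-app": [("customer-lookup", 1)],
        "customer-lookup": [("mainframe-gateway", 1)],
    }
    print(propagate_load(entry, deps))
```

In this example, the hypothetical `customer-lookup` service ends up carrying 25 calls per second, more than either application that uses it. That is the kind of hidden concentration of load behind the "victim of its own success" scenario described earlier.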

Gardner: Let's go back to Sandy. You mentioned earlier about how to automate toward these goals. It sounds like it’s going to be a bit of journey to get to full automation. On the other hand, having 40 people on a conference call to try to manually bear-wrestle these problems down doesn’t work either. How do we find a balance between too much automation, automation that can’t be attained, and purely manual, after-the-fact approaches?

Rogers: Everyone has to walk before they run with any type of new technology implementation. But we are finding that most organizations are keying in on the services that are most important, and making sure that they are instrumented appropriately, both with the technologies that support management and with the ability to define what the thresholds are.

Being able to correlate those thresholds to real business needs and business value is one of the interesting things about what we are seeing at the service level. We can start to associate the services that are most relevant with where they are going to have the most impact.

We can make sure that the information contained either in the payload or from the service itself is provided, so you have that insight. I think organizations are starting to realize that, in order to prove the value of the services, and to prove the value of having this level of coordination around management, they need to be able to make that association.

From an eventing point of view, what’s interesting is that there is a lot of parallel processing going on in this environment. Rather than waiting until something happens in some linear, straight-through process, we're seeing the ability to watch and correlate those events against the thresholds, understand which thresholds are the most important, and start automating the definition of how the system is going to react to those conditions, doing it from a cost-benefit perspective going forward.
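Rogers's point about correlating events against thresholds, rather than reacting to one metric at a time in a linear sequence, can be sketched as a simple correlation rule. The Python below raises an alert for a service only when every monitored metric has breached its threshold within a shared time window, and then ranks alerts by business criticality. The metric names, threshold values, criticality scores, and the 60-second window are all hypothetical; real deployments would derive them from their own SLAs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    service: str
    metric: str       # hypothetical metric names, e.g. "latency_ms" or "error_rate"
    value: float
    timestamp: float  # seconds


THRESHOLDS = {"latency_ms": 800.0, "error_rate": 0.05}  # hypothetical limits
CRITICALITY = {"claims-intake": 3, "quote-lookup": 1}    # higher = more business impact


def correlated_alerts(events, window_s=60.0):
    """Return services where all monitored metrics breached their thresholds
    within `window_s` seconds of each other, most business-critical first."""
    breaches = {}  # service -> {metric: timestamp of latest breach}
    alerts = set()
    for e in sorted(events, key=lambda ev: ev.timestamp):
        limit = THRESHOLDS.get(e.metric)
        if limit is None or e.value <= limit:
            continue  # unmonitored metric, or within its threshold
        seen = breaches.setdefault(e.service, {})
        seen[e.metric] = e.timestamp
        # Alert only when every metric has breached and the breaches are close in time.
        if len(seen) == len(THRESHOLDS) and e.timestamp - min(seen.values()) <= window_s:
            alerts.add(e.service)
    return sorted(alerts, key=lambda s: CRITICALITY.get(s, 0), reverse=True)
```

Requiring several correlated breaches trades some sensitivity for fewer false alarms. The criticality ranking reflects the idea of focusing first on the services that matter most to the business.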

Gardner: Okay, so companies can take this approach, use a moderate pace, learn as they go, and use complex event processing to offer insights into the context of what’s going on. But, if human nature is any indication, people usually react to whatever the rules are about their job, and for IT this is going to be the view from the SLAs.

It strikes me that what’s going to happen is that a lot of these organizations are going to reverse-engineer from the SLA, and that the rules and models in the SLA become extremely important. Am I going out on a limb here, Anthony, or do you think it will pan out that the SLAs will be the rules that service performance management then needs to line up around?

Abbattista: That’s right. Again, it's back to what do you expect, and are you living up to it? You talk about failure not coming from the SOA, but we could have a case where a service got deployed, people learned about it, and, before you know it, we are taking 100,000 hits a day on a service that nobody ever gave any design thought to.

I would have to reach in there and get some agent information once in a while. And all of a sudden, the supplier of the service, who did us a favor, put this on the bus, and then did a point-to-point interface, calls up and says, "Help!"

Someone might publish this thing with no modeling, because they thought it was some low-volume thing that wasn’t important. All of a sudden, it becomes important because everybody found it. So, as we get to composite services, SOA performance is about the service expectation.

Gardner: And the governance?

Abbattista: The governance, and do you let them do it? Do you have governors? Do you have a cost model that burdens the caller, rather than the supplier? These are real questions we'll get into, and they are why I was talking about breaking down the walls. If it's truly a valuable service, then it's my job to figure out and pay for upgrades -- or to help you redesign it.

We take very much an advocate approach to, "Okay, if you come on the bus, we will help you with being successful." And the SLA is the baseline for that. But it also sets up that, "Hey, did you do a good enough job? And what if you are wildly successful?"

Gardner: Right. Let's throw this back to TIBCO. There's clearly a need in the market for a full lifecycle approach, feedback loops, many moving parts. What can you do from a product perspective to help get to that level of automation? What, in a sense, fills in the cracks about who and what performs some of these necessary communications between the operations side and those associated with the ongoing requirements?

McNamara: Taking a step back, TIBCO offers a single user interface from the business analyst all the way through to the operational administrators who are running our applications. The idea is that, when you sit down to build out your services, when you sit down to build out your business processes, you use one tool to define what the business processes look like, what the touch points are between folks. Then, that diagram gets handed off from the business analyst to the implementer, who sits down and actually builds the services or builds the business process management (BPM) process that meets those requirements.

There is a direct link between the two. There is a round-tripping built into that tool, largely because it's a single data model and a single user interface with different views for people with different roles in your enterprise. That’s one major thing we do to help facilitate that communication, and that’s part of what we call the TIBCO ONE initiative. The product in question is the TIBCO Business Studio product, which forms that single user interface.

Gardner: And you’ve got hooks in a lot of the other parts of the SOA infrastructure for service enablement and delivery. How do you pull these parts together in a concerted effort?

McNamara: The other side of things is that, even once you’ve built things out and deployed them to production, you need to make sure you can keep track of exactly what’s going on, as a number of the folks on this panel have said. Ideally, you want to identify early on, as Sandy and Anthony said, which services are important to your enterprise and which services will have heavy load.

Unfortunately, you can’t always do that. Sometimes a little service that you think is merely helpful, as Anthony said, turns out to be used in 60 percent of the applications you are deploying. All of a sudden, you’ve got an issue.

You need to understand what the usage characteristics are on your services, not just the designed usage characteristics on your services. We’ve embedded both policy and performance management capabilities in our underlying service infrastructure. All the TIBCO ActiveMatrix products, all of our SOA enablement products, will transparently monitor for performance and usage of the services deployed in that environment.
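The idea of monitoring that lives in the infrastructure, rather than being written into each service, can be illustrated with a Python decorator. This is only an analogy: it is not how the ActiveMatrix products work internally, and the `monitored` decorator, the `STATS` table, and the `validate_address` service are hypothetical. The point is that the service code itself does nothing to record its own usage.

```python
import functools
import time
from collections import defaultdict

# Per-service usage statistics gathered by the wrapper (illustrative only).
STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_s": 0.0})


def monitored(service_name):
    """Record call count, error count, and cumulative latency for a service
    function without changing the function's own code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                STATS[service_name]["errors"] += 1
                raise  # monitoring must not swallow the service's own errors
            finally:
                STATS[service_name]["calls"] += 1
                STATS[service_name]["total_s"] += time.perf_counter() - start
        return wrapper
    return decorator


@monitored("address-validation")
def validate_address(zip_code):
    """Hypothetical 'little service' that nobody expected to become popular."""
    if not zip_code.isdigit():
        raise ValueError("bad zip code")
    return True
```

Because the statistics are collected whether or not anyone looks at them, usage data already exists on the day a service turns out to be far more popular than expected, which is the situation described above.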

Anything that you build in TIBCO ActiveMatrix BusinessWorks, ActiveMatrix Service Bus, ActiveMatrix Service Grid, and so on, is automatically monitored. And, you can automatically do some things around policy and access and control and rules.

So, even if you build that little service and you don’t think it's important, and you don’t want to go to the extra trouble to build some governance into it, it's there. It's already been embedded in that infrastructure. When you need it, you can just turn it on and make use of it, and you will automatically have some information about how people are using it, with a fairly nice visual dashboard.

The key here is not just the ability to see some numbers in a report, because people miss things. As Anthony said, you can have about 750 services running. If you go through the performance numbers on each of those services on a regular basis, things get lost if it's just numbers. You need a very good visualization tool, so you can see in "living color" what's going on with those services and how that relates to the SLAs and the rules you’ve set, the expectations you’ve set for those services.

Gardner: All right, let's go back to Allstate. Anthony you’ve heard the announcements today, you’ve understood this vision, and you understand the need very well. Do you think that we are getting very close to realizing more of an automated approach to service performance management in an SOA environment?

Abbattista: Yes, we are getting closer and making rapid strides. We need to be careful though. We are being careful to manage the service deployment, the service bus grid, and the parts about how to operate it. What makes me a little nervous or restless is the idea that we start taking all that back into the system parameters and the Java environments and Oracle databases, and that sort of thing. I would hate to see us not solve this first.

I really don’t think we’re at a stage where I want to automatically be adjusting heap sizes in Java virtual machines, or Oracle database parameters, which could be a next logical extension. I did see a little twinkle in people's eyes today, when they looked at products like the BMC Suite and Matrix. I don’t know that I want to have system programmer types around, trying to debug the debugging environment. I think it could become very complicated, very quickly.

Gardner: So we need to keep this at that higher abstraction in order to appreciate the whole and not get down into the weeds?

Abbattista: That’s my belief. I would say that if this service is not performing, then maybe we get three people on the phone -- the database administrator, the platform person, and the network person -- and we take a look at it. But I don’t think we should drill too far into that until we solve the other layer.

Gardner: I suppose the good news and bad news about all of this is that the metrics for success or failure will be quite evident. You are not going to be able to cover this up across a service-support environment and the business processes that those contribute to, if it doesn’t work. Any failures are going to be readily apparent, not just to a systems administrator, but also to the entire organization that’s affected.

Joe, let's go to you on this whole notion of metrics of success. We have seen some caution, but we also see great promise around SOA. If we got into an economic environment where the pressure becomes higher for better productivity -- of doing more with less -- it's likely we are going to see more companies look to virtualization, outsourced services, software as a service. When do you think the switch on wider SOA use will get thrown, and to what degree does service performance management contribute to that?

McKendrick: Wow, that’s the $64-billion question. It's interesting, I was speaking with the enterprise architect for a major distribution company a little bit earlier. She pointed out to me that, when they started out their service enablement years ago, even before Web services came on the scene and evolved over the past 10 years, they built their infrastructure to be service-enabled from the get-go.

There was no effort to identify what could be service-enabled, build a service around it, and try to get acceptance for it. And I asked her, "Well, what do you consider to be success in terms of adoption of the SOA, and in terms of reuse -- and do you even measure reuse success?"

Basically, to that company, if a service gets reused, fine. If it doesn’t get reused at all, that’s fine too. It doesn’t matter. The reason I'm bringing that up is because reuse is often cited as the ultimate metric for SOA success, or, I should say, as the most tangible metric. But I think the best approach is to design applications or pieces of applications from the start to be service-enabled and to employ the latest standards.

Gardner: Okay, so the risks are high, the rewards are high. It sounds like we are getting closer to a less manual, more automated approach, something that has visibility and hooks up and down, deep and wide. Let's wrap up with some last thoughts on this subject.

Sandy, if you are a CIO, a decision maker in the enterprise, and you are listening to this, what do you think that you want to hear that’s going to make you confident, given that you’ve already made a lot of investments in services-enablement? You have to recognize that this is the way for the future, but what are you going to want to put in place in order to start protecting yourself when it comes to your performance management?

Rogers: What we are seeing with IT executives today is a real interest in leveraging what they have, in speed of deployment, in not having to worry about all of the issues, and in having people on board who understand all of the technical dynamics of how everything needs to be implemented from an infrastructure point of view.

So, they need to be able to support fast time to market, and not worry about throwing something out there. When you are deploying it, you have to step back and make sure all of the resources that you need are lined up to make that happen. You want to have an automated way to handle deployment, to handle governance, to handle all of these different issues.

There is also the self-service nature that’s starting to happen -- the ability to create services and allow anyone in the enterprise to be able to get at the information they need as quickly as possible, not have to have a whole army of developers out there. That means you need to feel comfortable that you are creating an infrastructure that could be consumed by multiple parties.

Setting up that infrastructure is really going to save cost. It's going to save time to market, and you need that level of assurance, so that you don’t need to babysit every single service. There is also an issue of being able to outsource to different parties. You want to be able to leverage that cost-effectively.

You need to set up an infrastructure, all the processes and rules that everyone needs to follow. And by doing that, you can now leverage whatever resources you want to develop and create what's necessary, and not have to worry about everyone falling in line and having their own infrastructure and having all of that reference architecture put together at each different resource.

It’s really that whole concept of creating a centralized type of platform and a framework to consume all these services. It’s going to be very, very important going forward. Everyone is talking about the issues of the economy, and it’s really the trade-offs of what do you need to do in order to move forward and think about things in more of a total cost of ownership (TCO) manner versus that of direct return on investment (ROI) -- that immediate cost-per-service type of measurement.

Gardner: It sounds like we are describing what could be thought of as insurance. You’ve already gone on the journey of SOA. It’s like going on a plane ride. Are you going to spend the extra few dollars and get insurance? And wouldn't you want to do that before you get into the plane, rather than afterward? Is that how you look at this? Is service performance management insurance for SOA? I am throwing that out to Anthony at Allstate.

Abbattista: It’s interesting to think of it as insurance. I think it’s a necessary operational device, for lack of a better word.

Gardner: Service performance management -- not an option?

Abbattista: I don’t think it’s an option, because what will hurt if you fall down has been proven over and over again. As the guy who has to run an SOA now that's on insurance -- it’s not an option not to do it.

Gardner: Last words from you, Rourke? Do you view this as an insurance policy? I guess you have the choice of different insurers, right?

McNamara: I do. I actually do look at service performance management as insurance -- but along the lines of medical insurance. Anthony said people fall down and people get hurt. You want to have medical insurance. It shouldn't be something that is optional. It shouldn't be something you consider optional.

It’s something that you need to have, and something that people should look at from the beginning when they go on this SOA journey. But it is insurance, Dana. That’s exactly what it does. It prevents you from running into problems. You could theoretically go down this SOA path, build out your services, deploy them, and just get lucky. Nothing will ever happen. But how many people go through life without ever needing to see a doctor?

Gardner: Okay, now we are going to take some questions from the audience.

Tony Baer: This is Tony Baer with OnStrategies. I want to seize on something that Anthony Abbattista from Allstate had mentioned before, which is that you hope that service performance management doesn’t degrade into getting down to "Java heap sizes." I surely don’t blame you on that one, but what I am wondering is, at what point does this become an IT service management issue?

Abbattista: Because we have gathered that responsibility together, I guess it all falls under one roof in our particular organization. I would think the same applies to external services. One thing we are doing is measuring some of our external providers outside the organization. I guess it’s sort of the same phone call. You are calling yourself, or you are calling the person who is responsible and holding him accountable. So, I don’t know that it changes much.

McNamara: I would like to add something to that. With something like a Tivoli or a BMC solution, something like a business service management technology, your operational administrators are monitoring your infrastructure.

They are monitoring the application at the application layer, and they understand, based on those things, when something is wrong. The problem is that’s the wrong level of granularity to automatically fix problems. And it’s the wrong level of granularity to know where to point the finger, to know whom to call to resolve the problem.

It’s the right level if what's wrong is a piece of your infrastructure or an entire application. But if it’s a service that’s causing the problem, you need to understand which service -- and those products and that sort of technology won’t do that for you. So, the level of granularity required is at the service level. That’s really where you need to look.

Rogers: What I find is that it’s inevitable that we are going to go down that path, but standards between the systems that do IT management traditionally and this level of detail really haven’t been fleshed out. Most organizations are looking for a single, unified type of dashboard on some of the key indicators. They might want to have that for the operations team that has traditionally run IT service management.

A lot of the initiatives around ITIL Version 3.0 are starting to get some of those teams thinking in terms of how to associate the business requirements for how services are being supported by the infrastructure, and how they are supported by the utility of the team itself. But, we're a long way away from having everything all lined up, and then having it automatically amend itself. People are very nervous about relinquishing control to an automated system.

So, it is going to be step-by-step, and the first step is getting that familiarity, getting those integrations started, and then starting to let loose. What's interesting is some of the areas of virtualization technologies, where you might have some level of management that’s abstracted from the physical infrastructure, and then you have this level of abstracted management of services and how they come together. It hasn't really been defined in the industry, but down the road -- two, three, four, five years from now -- I think you will be seeing a lot more around that.

McKendrick: Let me add that we're still in the very early stages of SOA. In fact, a lot of companies out there think they have SOA, when they actually have just a bunch of Web services, a JBOWS architecture, and point-to-point types of interfaces and implementations. A lot of companies are just starting to get their arms around exactly what SOA is and what it isn't.

Gardner: Very good. We have been discussing the issues around service performance management for SOA environments. We are talking with a panel of industry analysts and practitioners. I want to thank our panelists, Joe McKendrick, Sandy Rogers, Anthony Abbattista, and Rourke McNamara. Thanks.

This is Dana Gardner, principal analyst at Interarbor Solutions, and you have been listening to a sponsored BriefingsDirect podcast. Thanks and come back next time.


Transcript of BriefingsDirect podcast on service performance management recorded live at TUCON 2008 in San Francisco on April 30, 2008. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.