Monday, May 26, 2008

Interop Lab Testing for VoIP Devices (2008 edition)


In an Interoperation (Interop) Lab, devices are made to work together. When they seem to, the vendors of the devices claim that they "interop with" each other. This is necessary, but not sufficient, to know things will work together.

Background

Suppose you make a telephone softswitch or application server, such as BroadSoft BroadWorks, Sylantro, MetaSwitch, or the Alcatel-Lucent Network Gateway (aka Lucent Compact Switch). The customers of your several-hundred-thousand-dollar device want to use VoIP telephones with it, but they've tried a few cheap ones and had trouble. And they had another one they liked, but when they upgraded it, call-hold stopped working. After opening several tickets with you and with the phone vendor, they give up and realize that neither of you did anything wrong.

The phone was always doing something reasonable -- though not quite what they hoped -- and the softswitch was doing something reasonable -- though not quite what they hoped. This is the curse of the VoIP Implementor: everybody can follow the rules, and working together, get nothing done.

After complaints from customers about this difficulty, you (the softswitch vendor) launch an interop testing program. Depending on the sort of technical staff you have, you might do it in-house, or outsource it.

Your goal is to confirm that specific products "work with" your product. You want your customers to be able to choose products confidently. It's a laudable goal.

A common interop lab setup might look like this, for a softswitch vendor's lab, when testing a new VoIP phone:



You've got the new phone (the Device Under Test, or DUT) and your piece of gear, connected by an ordinary Ethernet switch. And you have another "gold" phone that you know works with your platform, because phone calls require two phones.

Advantages and Limitations of Interop Testing



Interop testing of this variety can be very useful. SIP VoIP telephony is an evolving body of standards. There is more than one way to do something (such as put a call on hold, or make one phone number appear on two different phones, known as shared call appearance (SCA)). It is very useful to identify fundamental incompatibilities. Often, there are certain settings required on both sides: to use this device, the "rfc2543_hold" option must be turned on. Somebody has to figure this out: it might as well be the two vendors involved.
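To make the hold example concrete: the older RFC 2543 style of hold sets the SDP connection address to 0.0.0.0, while the RFC 3264 style uses a direction attribute such as a=sendonly or a=inactive. Here is a minimal Python sketch that classifies an SDP body by hold style. The function name and the sample SDP bodies are made up for illustration, and it only looks at exact attribute lines, so treat it as a starting point rather than a parser.

```python
def classify_hold(sdp: str) -> str:
    """Return 'rfc2543', 'rfc3264', 'both', or 'none' for an SDP body."""
    lines = [line.strip() for line in sdp.splitlines() if line.strip()]
    # RFC 2543 style: the connection line carries an all-zeros address.
    zero_addr = any(
        line.startswith("c=") and line.split()[-1] == "0.0.0.0"
        for line in lines
    )
    # RFC 3264 style: a media direction attribute signals the hold.
    direction = any(line in ("a=sendonly", "a=inactive") for line in lines)
    if zero_addr and direction:
        return "both"
    if zero_addr:
        return "rfc2543"
    if direction:
        return "rfc3264"
    return "none"


# Hypothetical SDP bodies, using documentation addresses.
OLD_STYLE = """v=0
o=alice 1 2 IN IP4 192.0.2.10
s=-
c=IN IP4 0.0.0.0
t=0 0
m=audio 49170 RTP/AVP 0
"""

NEW_STYLE = """v=0
o=alice 1 3 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=sendonly
"""

print(classify_hold(OLD_STYLE))
print(classify_hold(NEW_STYLE))
```

A check like this is the sort of thing an interop test plan can run against captured traces to record which hold style each device actually sends.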

This testing is only as good as the test plan. If the test plan doesn't include something that you need to do, then the testing isn't quite complete for you.

In addition, no two devices have all the same features. The vendor may test for T.38, but if the DUT doesn't do T.38, they don't want to mark it as failed for T.38. Instead, the testing engineer is likely just to mark that feature as Not Supported. Depending on policy, or haste, the testing engineer may just mark anything that doesn't work as Not Supported. There is no "one true standard" for interoperability. Ultimately, if the DUT and the softswitch work together at all, the two are "certified" as interoperable. It is to neither vendor's advantage to anger the other by claiming they're not interoperable.

Finally, interop lab testing only tests the interoperation between a pair of devices. Devices X and Y work together. But real systems are made of devices A through Z.

Real Integration Testing



A network for a VoIP Service Provider using SIP peering might look like this:

A real network for an SS7-connected CLEC might look more like this:

There are lots of devices. And many of them may affect the success you'll have with the DUT.

Take, for example, a simple SIP phone. It's easily possible that the signaling path for a PSTN call, in a CLEC case, is
  • SIP Phone
  • Customer Premise ALG
  • Session Border Controller
  • Softswitch
  • PSTN Gateway
  • SS7 STP networks
  • PSTN Class 5 Telephone Switch
  • PBX
  • PBX handset


It's definitely possible that the SIP phone could work with the softswitch, but irritate one of the other components. For example,
  • The SIP phone and the softswitch may use Require: for a specific SIP feature package, but the SBC doesn't allow it or support it.
  • Or perhaps yet-another form of caller ID is dreamed up, and the PSTN gateway can't support it, so that all calls from this phone appear to have no caller ID.
  • Or maybe the phone signals some sort of ISUP calling party category in SIP that the PSTN gateway passes through, and confuses a downstream PSTN telephone switch.
  • What if the phone needs SIP over TCP, but the ALG doesn't support it properly?
  • Or the DUT signals RFC2833 support properly, but the PSTN gateway expects telephone-events to have one specific payload type number (that is, it tickles a bug that was already present)?
  • Perhaps it works fine if the SBC is configured with a "classic" configuration, but breaks miserably if you switch to the "new" configuration.
  • Maybe it can download configurations from the lab FTP server, but chokes if the FTP server is a little slow.
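The first failure mode in the list above can be sketched in code. In SIP, an element acting as a user agent that does not understand an option tag listed in Require: rejects the request with 420 Bad Extension. This small Python sketch walks a signaling path and reports the first hop that would reject. The capability table is hypothetical, and it assumes each hop behaves as a back-to-back user agent that processes Require: itself, which is common for SBCs but not universal.

```python
def first_rejecting_hop(required, path):
    """Return (hop_name, missing_tags) for the first hop lacking a required
    option tag, or None if every hop supports all of them."""
    for name, supported in path:
        missing = sorted(set(required) - set(supported))
        if missing:
            return name, missing
    return None


# Hypothetical capabilities; real values come from each device's
# documentation or from the Supported: header it sends.
PATH = [
    ("ALG", {"100rel", "replaces", "timer"}),
    ("SBC", {"replaces", "timer"}),
    ("Softswitch", {"100rel", "replaces", "timer"}),
]

result = first_rejecting_hop(["100rel"], PATH)
if result is None:
    print("every hop accepts the request")
else:
    hop, missing = result
    print(f"{hop} would answer 420 Bad Extension, Unsupported: {', '.join(missing)}")
```

In a pairwise lab with only the phone and the softswitch, this check passes. It fails only once the SBC is in the path.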


My point is that we have to test components in an end-to-end environment to really get confidence that they work. This may change, but only if the complexity here is accidental (a side-effect of today's state of things) and not essential complexity (fundamental to the job). With all the new features that VoIP telephony systems try to support, and the upgradability the protocol designers intend, I'm not sure I can distinguish which it is now.


Show me a "simple SIP phone", and I'll show you a phone that nobody likes because it's not flexible enough to be configured to do crazy things.

Why won't anybody build far-end echo cancellation into their VoIP phones and ATAs?

No VoIP phone or ATA on the market offers far-end (talker) echo cancellation. They should.

Natural Rock Formation that amplifies echo.


Background



Some background: echo is when you hear yourself talking. It's usually caused by the device on the other end of the call, but it's exacerbated in VoIP networks because they have long delays.

Suppose you have a phone call that includes a VoIP device (such as a Polycom SoundPoint IP 650 SIP phone, or a Cisco 7260 SIP phone, or a Cisco/LinkSys/Sipura ATA, or an Aastra 57i).



There's a VoIP phone plugged into some sort of IP network. In many cases, it's an Ethernet network with a DS1 (T1) connection to the VoIP service provider / ISP. There's a VoIP-PSTN gateway connected there, such as a MetaSwitch VP3510, or a Lucent Compact Switch (LCS), or an AudioCodes, or a Cisco AS5400. It has an Ethernet interface and TDM interfaces, such as DS3, DS1. Maybe it does ISUP and has SS7 A-Links, or ISDN PRI. Then there's the PSTN, which includes traditional telephone switches -- and actually a lot of VoIP hidden in there too. Finally, there's a PSTN phone.

When the VoIP phone user says something, he may hear an echo of his own voice coming back.



Cisco has a nice WAV file demonstrating what echo sounds like. (This page is known to some insiders as the Cisco Duck Quack page.)

Why does C3P0 hear his own voice coming back?

The PSTN Phone isn't perfect. Some of the electrical voice signal that enters it may be "reflected" back. This is sometimes called the "Two-wire-to-four-wire" conversion: a normal phone line is a two-wire electrical circuit with both sides of the conversation on it, but the handset itself has two wires for the speaker, and two wires for the microphone. If this isn't built just perfectly, some of the sound signal that's sent to the speaker will be reflected back down the wire. Many phones are not perfect.

The PSTN Phone may be on speakerphone, or pick up other acoustic echo. Normal room walls echo sound back to us. It's there, but we don't normally notice it.

The VoIP Network is long. We won't normally notice echo in a room, unless we're in a big room. Our brain is pretty good at filtering out sound that's echoed back very quickly; if we hear an echo less than 100 [ms] or so after we say it, we don't even notice it normally. So if the room is big enough that sound takes a long time to echo back, then we might notice it.

(How big would a room need to be? Around 55 feet / 17 meters across would work nicely to reflect off the far wall. Sound travels 340 meters / second, and we'll hear an echo if our voice takes around 100 [ms] to get back to us. That means it needs to be 0.1 * 340 meters from my mouth to my ear, or half that from one side of the room to the other because the sound travels that distance twice.)
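The room-size arithmetic above, written out as a small Python calculation. The constants are the ones quoted in the text; the variable names are mine.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # value used in the text
ECHO_THRESHOLD_S = 0.100        # roughly when an echo becomes noticeable
METERS_PER_FOOT = 0.3048

# Distance the sound covers in 100 ms: mouth -> far wall -> ear.
round_trip_m = SPEED_OF_SOUND_M_PER_S * ECHO_THRESHOLD_S
# The room only needs to be half that wide, since the sound crosses it twice.
room_width_m = round_trip_m / 2
room_width_ft = room_width_m / METERS_PER_FOOT

print(f"round trip: {round_trip_m:.1f} m")
print(f"room width: {room_width_m:.1f} m ({room_width_ft:.1f} ft)")
```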

It can take a long time for a sound signal carried via VoIP to get from the talker C3P0 back to his ear. 100 millisecond round-trip-time is easily achievable. Why? Because packets sit in buffers along the way. In traditional non-VoIP TDM networks, digital voice data is rarely "buffered" to any extent. But in VoIP networks, buffering always happens. This "buffering" means that packets are sitting inside devices -- phones, routers, switches -- effectively adding to the delay between the talker and his echo.
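A rough delay budget shows how quickly buffering adds up. Every figure in this Python sketch is an illustrative assumption, not a measurement, and it assumes the return path costs the same as the outbound path.

```python
# All figures are illustrative assumptions, in milliseconds, one way.
ONE_WAY_DELAY_MS = {
    "packetization (one 20 ms frame)": 20,
    "sender queuing": 5,
    "network transit": 15,
    "receiver jitter buffer": 40,
}
ECHO_NOTICEABLE_MS = 100  # threshold quoted in the text

one_way_ms = sum(ONE_WAY_DELAY_MS.values())
round_trip_ms = 2 * one_way_ms  # assumes a symmetric return path

for stage, ms in ONE_WAY_DELAY_MS.items():
    print(f"{stage:35s} {ms:3d} ms")
print(f"{'one way':35s} {one_way_ms:3d} ms")
print(f"{'round trip':35s} {round_trip_ms:3d} ms")
print("echo likely noticeable:", round_trip_ms >= ECHO_NOTICEABLE_MS)
```

With these assumed numbers the jitter buffer alone accounts for half of the one-way delay, which is why the talker's echo arrives late enough to be heard.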


Put echo cancellation in the VoIP Phone.




This is all very annoying for the talker who hears himself. VoIP users suffer from this much more than PSTN users, because of this delay. Technically, the PSTN phone is creating the echo, but it seems silly to just point blame. Technically, VoIP networks should be free of jitter (variation in packet transit delay) -- but networks aren't perfect, so VoIP phones have jitter buffers built in. It's standard equipment, like seat belts in cars.

But echo cancellation -- the sort that's really needed, that would cancel out the echo received back across the VoIP network -- just is not put into VoIP phones normally. Some phones, such as Polycom, have limited acoustic echo cancellation capability to prevent the VoIP phone itself from echoing back. That's nice, but that's not the main problem.

It seems VoIP phone vendors are spending their time building in G.722 wideband codec support ("HD Voice"), so conversations within your office building will sound nice. Whoop-ee.

VoIP Echo Cancellation is possible. Vendors are routinely building limited echo-cancellation ability into the PSTN-VoIP gateway device. (E.g., General Bandwidth G6, MetaSwitch). But it can't always be used, and it's not always effective. For example, if you connect to the PSTN through Level(3), then you don't get a gateway with echo cancellation. And on some gateways, using echo cancellation limits the number of calls you can make through the box.

But is echo cancellation in the VoIP gateway sufficient? Apparently not. If it were, I wouldn't hear so many complaints from my client base. And Ditech would have no reason to make a VoIP-only echo cancellation box. But as it is, VoIP carriers have lots of trouble with echo.

And, in general, centralizing the echo-cancellation capability doesn't seem optimal either. Putting lots of work and intelligence in one place tends to make one really-expensive place.

VoIP Phone and ATA Vendors should add echo cancellation to their devices. Polycom, Cisco, Aastra, Linksys, Snom, Adtran, etc. listen up: your device should cancel out the echo received back in the RTP stream from the VoIP network. It can't be that hard! I know there are DSPs that can help with this. Add this feature to your top-of-the-line phone! Charge more for it! Make it a premium add-on license if you must!
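To give a sense of what such a feature involves, here is a toy Python sketch of a normalized least-mean-squares (NLMS) adaptive filter, one common textbook approach to echo cancellation. I am not claiming any vendor uses exactly this. The phone keeps a copy of what it sent, learns how that signal comes back in the received stream, and subtracts the estimate. The echo path here (a 10-sample delay at half amplitude), the filter length, and the step size are all assumptions chosen to keep the example small. A real canceller would need a tail long enough to cover the whole network round trip, and would have to cope with the far-end party talking at the same time.

```python
import random


def nlms_cancel(sent, received, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `sent` from `received`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent sent samples, newest first
    residual = []
    for x, d in zip(sent, received):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what is left after cancellation
        power = sum(h * h for h in history) + eps
        step = mu * error / power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def energy(samples):
    return sum(s * s for s in samples)


# Toy signals: white noise stands in for the talker's voice.
random.seed(1)
sent = [random.uniform(-1.0, 1.0) for _ in range(4000)]
DELAY, GAIN = 10, 0.5  # hypothetical echo path
received = [0.0] * DELAY + [GAIN * s for s in sent[:-DELAY]]

residual = nlms_cancel(sent, received)
before = energy(received[-1000:])
after = energy(residual[-1000:])
print(f"echo energy before: {before:.3f}")
print(f"echo energy after:  {after:.6f}")
```

The cost grows with the number of taps, and a 100 ms tail at 8 kHz means 800 taps per call, which is why this kind of work usually lands on a DSP.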

Some people just want a super-cheap VoIP phone. But some people are trying to replicate the services of traditional phone systems, but using VoIP. For them, the echo problem is real, and serious. It can't always be solved with a gateway. And an extra 25 percent in the cost of the CPE might be hard to swallow, but not as hard as having no solution at all.

Wednesday, May 14, 2008

Call Transfer Scenarios

The IETF Call Control - Transfer draft seems to have the best and latest info on call transfer scenarios. But I can't find a good summary of the cases involving transfer-like scenarios. Here's my stab at a complete list, although I haven't taken the time to try to prove this is comprehensive.


-- Blind transfer, original recipient is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Bob transfers to Charles,
Charles answers,
Charles and Alice talk.

-- Blind transfer, originator is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Alice transfers to Charles,
Charles answers,
Charles and Bob talk.

-- Blind transfer before call answer, original recipient is facilitator:
Alice calls Bob,
Bob's phone rings,
Bob diverts the call to Charles,
Charles answers,
Charles and Alice talk.

-- Blind transfer before call answer, originator is facilitator:
Alice calls Bob,
Bob's phone rings,
Alice diverts the call to Charles,
Charles answers,
Charles and Bob talk.

-- Attended transfer, original recipient is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Bob puts Alice on hold,
Bob makes call to Charles,
Charles answers,
Bob and Charles talk,
Bob completes transfer,
Charles and Alice talk.

-- Attended transfer, originator is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Alice puts Bob on hold,
Alice makes call to Charles,
Charles answers,
Alice and Charles talk,
Alice completes transfer,
Charles and Bob talk.

-- Busy-Failed Blind transfer, original recipient is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Bob transfers to Charles,
Charles is busy,
Alice and Bob talk.

-- Busy-Failed Blind transfer, originator is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Alice transfers to Charles,
Charles is busy,
Alice and Bob talk.

-- No-Answer-Failed Blind transfer, original recipient is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Bob transfers to Charles,
Charles never answers,
Alice and Bob talk.

-- No-Answer-Failed Blind transfer, originator is facilitator:
Alice calls Bob,
Bob answers,
Alice and Bob talk,
Alice transfers to Charles,
Charles never answers,
Alice and Bob talk.

I don't list the busy-failed attended transfer case, or the no-answer-failed attended transfer case, because those never get to the transfer (REFER message) step. So those just look like a call that gets put on hold: the facilitator makes another call, it fails, then the original call is retrieved from hold.
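One way to check a list like this for gaps is to generate it from its dimensions. Here is a minimal Python sketch that does so. The three dimensions (kind of transfer, who facilitates, outcome) are my reading of the list above, and the exclusion rule is the one just described for attended transfers.

```python
from itertools import product

KINDS = ["blind", "blind before answer", "attended"]
FACILITATORS = ["original recipient", "originator"]
OUTCOMES = ["success", "busy", "no answer"]


def is_transfer_case(kind, outcome):
    # A failed attended transfer never reaches the REFER step, so it is
    # an ordinary hold/retrieve rather than a transfer scenario.
    return not (kind == "attended" and outcome != "success")


scenarios = [
    (kind, who, outcome)
    for kind, who, outcome in product(KINDS, FACILITATORS, OUTCOMES)
    if is_transfer_case(kind, outcome)
]

for kind, who, outcome in scenarios:
    print(f"{kind} transfer, {who} is facilitator: {outcome}")
print(len(scenarios), "scenarios")
```

Generated this way, the cross product also includes busy and no-answer variants of the before-answer blind transfer, which the list above does not spell out.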