With the recent buzz about cell phone worms, awareness of possible VoIP-specific attacks is growing. A commonly used VoIP signaling protocol, Session Initiation Protocol (SIP), reminds me of the “Simple” in SNMP and how vulnerable it was in its early, non-secure incarnations. SIP is very lightweight and relies on an intermediary to locate the other party and place a VoIP call. Once the endpoints are discovered and connected, the call goes peer-to-peer, unless it is using an advanced feature like multi-party conferencing.
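To make the “lightweight” point concrete, here is a minimal Python sketch of what a SIP INVITE looks like on the wire: plain text with a request line and headers, much like HTTP. The addresses, tags, and the `build_invite` and `parse_request` helpers are hypothetical and exist only for illustration; a real client would also carry an SDP body describing the media.

```python
CRLF = "\r\n"


def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Return a minimal SIP INVITE request as text (no SDP body)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        # Hypothetical client host and branch/tag values, for illustration only.
        "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Headers end with a blank line, exactly as in HTTP.
    return CRLF.join(lines) + CRLF + CRLF


def parse_request(message: str) -> tuple[str, str, dict[str, str]]:
    """Split a SIP request into method, request URI, and a header dict."""
    head = message.split(CRLF + CRLF, 1)[0]
    request_line, *header_lines = head.split(CRLF)
    method, uri, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, headers


invite = build_invite(
    "alice@example.com", "bob@example.org", "a84b4c76e66710@client.example.com"
)
method, uri, headers = parse_request(invite)
print(method, uri, headers["Call-ID"])
```

Everything in the message, including who is calling whom, is readable by anyone on the path unless the signaling itself is protected.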
One industry pundit noted that “VoIP phones are more computer than phone.” How true. As a corollary, I would state that “VoIP is more protocol than voice.” The various protocols associated with VoIP are therefore susceptible to the same hacks and attacks as the IP world at large – worms, denial of service (DoS) attacks, Man-in-the-Middle (MITM or MTM) attacks, and so on.
It seems silly that a hacker would want to mount a MITM attack to modify the RTP voice data by changing pitch or content, inserting silence, adding background noise, or adding something more incriminating (placing you at a location you shouldn’t be at!). But it is possible. A more likely scenario is simple eavesdropping on a conversation, for which simpler tools are available.
SIP does have one thing going for it – it typically runs over UDP, which is a connectionless protocol. This makes it less vulnerable to DoS attacks that work by exhausting connections on a server. Sending out a rash of INVITEs to a SIP router or PBX is probably just an annoyance – you really aren’t chewing up “connections” the way you would with TCP-based apps such as HTTP.
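The transport-level point can be sketched in Python on the loopback interface. One UDP socket receives datagrams from several short-lived senders without any handshake or `accept()` call, so the transport holds no per-peer connection. This is only an illustration with made-up payloads, not a SIP server; a real one still keeps transaction state at the application layer.

```python
import socket

# One receiving socket; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
addr = server.getsockname()

received = []
for i in range(3):
    # Each "client" is a fresh socket that sends one datagram and goes away.
    # There is no connect/accept handshake and nothing to tear down.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.sendto(f"datagram {i}".encode(), addr)
    data, _peer = server.recvfrom(2048)
    received.append(data.decode())

server.close()
print(received)
```

With TCP, each of those senders would have cost the server a connection slot until it closed or timed out, and that slot is the resource a connection-exhaustion attack goes after.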
More interesting things to watch out for include someone masquerading as a user already logged into a SIP server (or inserting a worm into a PC-based SIP client) and then initiating high-speed “war dialing” of SIP numbers or attempting bogus 911 calls where supported. Even worse, those “free” SIP servers often use plaintext identifiers and passwords inside of HTTP packets. We definitely need better authentication, along with encryption of voice-carrying RTP packets, especially in the public space, to protect against not only DoS attacks and such, but identity theft as well.