Networked video communication [1] is achieved by determining the available bandwidth along an end-to-end path and adapting the encoded video rate accordingly. Volatile traffic load conditions on the Internet create the need for video-specific measures [2] that can determine the network state accurately and in a timely manner. Packet loss has traditionally signalled congestion to TCP, which has achieved remarkable success in avoiding excessive Internet congestion. Nevertheless, reliance on packet loss as the congestion signal has been identified as a performance bottleneck in TCP, and enhancements to the protocol with a packet delay-based indicator have been proposed [3]. However, TCP emulators [4] for real-time transport over UDP (used to avoid the unbounded delivery delay of TCP) do include a packet loss factor in their models. While packet loss is acceptable in file transfer, as lost packets are simply re-transmitted through TCP’s reliable transport, playout and decode deadlines must be met when streaming video. Under congestion control over UDP without avoidance measures, packet losses do occur and, in the absence of retransmission, degrade the delivered video quality. This paper shows that, because of reduced fluctuations in the sending rate, a fuzzy logic congestion controller (FLC) of video over UDP can more nearly approach an optimal regime, in which the available bandwidth is closely tracked with minimal packet loss. Input to the FLC is from a packet dispersion measure, which is a form of delay-based congestion control.
In [5], delay-based congestion control with the delay gradient was employed for applications such as video conferencing to arrive at low average end-to-end delay with minimal restriction of throughput. In the process it was found that output oscillations were reduced along with delay variance. The method also avoided the ‘phase effect’ [6], whereby packet-loss probe-based congestion control introduces unfairness between streams across the same link, as the same stream may repeatedly suffer packet loss at the congested link. Congestion control of a video stream can be achieved through a rate-adaptive transcoder. Video transcoders, including ours [7] adapted for variable bit rate (VBR) streams, open up the possibility of sending a pre-encoded video bitstream at the maximum possible rate without overly exceeding the available network bandwidth. Hence, subsequent router buffering is able to cope with the output packet stream. In fact, it is quite possible to arrive at fewer packet losses, or even to avoid loss altogether, by re-compressing an already compressed bitstream by means of a transcoder. Although in this paper we have applied fuzzy logic to a rate-adaptive transcoder, direct fuzzy logic control of the encoder quantizer step sizes is also possible. However, as pre-encoded video, rather than live video, comprises the majority of video streams on the Internet, the description concentrates on transcoding.
Fuzzy logic, which has from its inception been extensively used for industrial and commercial control applications [8], is for us simply a convenient tool for handling un-modelled network congestion states. Within video coding it has found an application [9] in maintaining a constant video rate by varying the encoder quantization parameter according to the output buffer state, which is a complex control problem without an analytical solution. Fuzzy logic control of congestion is a sender-based system for unicast flows. The receiver returns a feedback message that indicates time-smoothed and normalized changes to packet inter-arrival time. These allow the sender to compute the network congestion level through pre-designed fuzzy models. The sender then applies a control signal to the transcoder’s quantization level, as a reflection of the anticipated congestion. Thus, congestion control without packet loss feedback is achieved by measuring packet stream dispersion arising when busy router queues are encountered, especially at tight links, representing the point of minimum available bandwidth on the network path. Fuzzy control is thus able to function in low packet loss environments.
A well-engineered FLC for transcoded video should:
1) Be TCP-friendly so that, in the event of proliferation of FLC streams within the Internet, there is a limited risk of congestion collapse.
2) Coexist with typical Internet traffic, consisting of long-term file transfer flows and short-term Web server connections.
3) Track the available bandwidth as closely as possible, while at the same time reducing or eliminating packet loss.
4) Achieve an optimally smooth stream, to avoid fluctuations in delivered video quality.
Items one and two are a measure of the quality of the solution, as without these stipulations a controller could simply greedily acquire bandwidth from other traffic. Items three and four are highly desirable for video traffic.
The remainder of the paper is organized as follows. The details of the system architecture and the FLC are given in Section II. Section III reports a set of simulation experiments. Finally, Section IV draws some conclusions, explaining why this paper proposes an FLC for video streams.
Fig. 1 shows a video streaming architecture in which fuzzy logic is utilized to control the bitrate. A video transcoder at the server [7] is necessary for pre-encoded video-rate adaptation. The client-side timer unit monitors the dispersion of incoming packets and relays this information to the congestion level determination (CLD) unit. The timer unit measures the inter-packet gaps (IPGs) of arriving packets before finding a time-smoothed and normalized estimate of the packet dispersion. An IPG is the time duration between the receipt of the end of one packet and the arrival of the next. The CLD unit monitors the outgoing packet stream, especially the packet sizes, and combines this information with feedback from the client, as a basis for determining the network congestion level, CL. This unit also computes the congestion-level rate of change, δCL. The FLC takes CL and δCL as inputs and computes a sending rate that reflects the network’s state. The appropriate change in the transcoder quantization level is then calculated. Transported packets are received by the client, de-packetized, decoded and displayed at video rate.
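The timer unit's role can be illustrated with a minimal Python sketch. The text does not give the smoothing or normalization formulas, so the exponentially weighted moving average, the normalization against the sender's constant IPG, the class name DispersionEstimator and the weight alpha are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DispersionEstimator:
    """Hypothetical client-side timer unit: EWMA-smoothed, normalized IPG."""
    sender_ipg: float                  # constant IPG used at the sender (s)
    alpha: float = 0.1                 # assumed EWMA weight, not from the paper
    _last_end: Optional[float] = None
    _smoothed: Optional[float] = None

    def on_packet(self, start: float, end: float) -> Optional[float]:
        """Record one packet; start/end are first- and last-bit receive times."""
        if self._last_end is not None:
            # IPG as defined in the text: end of one packet to arrival of the next.
            ipg = start - self._last_end
            if self._smoothed is None:
                self._smoothed = ipg
            else:
                self._smoothed += self.alpha * (ipg - self._smoothed)
        self._last_end = end
        return self.dispersion()

    def dispersion(self) -> Optional[float]:
        """Relative growth of the smoothed IPG over the sender's IPG."""
        if self._smoothed is None:
            return None
        return (self._smoothed - self.sender_ipg) / self.sender_ipg
```

Under these assumptions a value near zero indicates that packets arrive with the spacing they were sent with, while a positive value indicates that queueing along the path has spread the stream.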
At the server, the video transcoder inputs the pre-encoded video and reduces its bit-rate in response to the control signal from the FLC. The lower bound to the sending rate was set to be 10% of the input sending rate. For the average input sending rate of 2 Mb/s in the simulations in Section III, a lower limit of 200 kb/s is sufficient for an acceptable video quality. The transcoded video is packetized, with one slice per packet, and sent across the network within a UDP packet. Apart from error resilience due to decoder synchronization markers, per-slice packetization also reduces delay at the server. Transcoded video packets are subsequently output with a constant IPG at the point of transmission. Ensuring a constant IPG reduces packet inter-arrival jitter at the client and also renders the streamed video more robust to error bursts.
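The lower bound on the sending rate can be expressed as a simple clamp. The sketch below is illustrative: the function name clamp_target_rate is hypothetical, and the upper limit at the input rate is an assumption based on the transcoder only ever reducing the bit-rate.

```python
def clamp_target_rate(requested_bps: float, input_bps: float,
                      floor_fraction: float = 0.10) -> float:
    """Limit the transcoder target to [floor_fraction * input, input]."""
    floor = floor_fraction * input_bps      # 10% of the input rate in the text
    # Assumed ceiling: a transcoder cannot usefully exceed the input rate.
    return max(floor, min(requested_bps, input_bps))
```

For the 2 Mb/s input of Section III, any request below 200 kb/s is raised to that floor.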
Fuzzy logic emulates a control process, as if a human expert were regulating the transmission rate. Multiple fuzzy membership functions model the uncertainty in that expert’s perception of the feedback, whereas an output rate decision is made precise by the process of defuzzification, which translates uncertainty in the output to a crisp value, i.e. a specific control signal value.
Fig. 2 is a block diagram of an FLC. Fuzzifiers convert the inputs CL and δCL into suitable linguistic variables. A knowledge base encapsulates expert knowledge of the application with the required control goals. It defines the labels that help specify a set of linguistic rules. The inference engine block is the intelligence of the controller, with the capability of emulating the human decision making process, based on fuzzy logic, by means of the knowledge database and embedded rules for making those decisions. Lastly, the defuzzification block converts the inferred fuzzy control decisions from the inference engine to a crisp value, which is converted to a control signal, CT in Fig. 2, to the transcoder, which then outputs a re-compressed bitstream.
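The chain of fuzzification, rule inference and defuzzification can be sketched in a few lines of Python. Everything specific here is a placeholder and not the paper's design: the three labels per input, the assumed input ranges (CL in [0, 1], δCL in [-1, 1]), the nine-entry rule table, the min operator for rule strength, and the weighted-average defuzzification over singleton outputs.

```python
def tri(x, a, b, c):
    """Triangular membership: feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical label sets; inputs are assumed to be normalized beforehand.
CL_SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0),
           "high": (0.5, 1.0, 1.5)}
DCL_SETS = {"falling": (-2.0, -1.0, 0.0), "steady": (-1.0, 0.0, 1.0),
            "rising": (0.0, 1.0, 2.0)}

# Placeholder rule base: (CL label, dCL label) -> crisp output singleton.
# A positive output is taken to mean "raise the sending rate".
RULES = {
    ("low", "falling"): 0.5, ("low", "steady"): 0.25, ("low", "rising"): 0.0,
    ("medium", "falling"): 0.25, ("medium", "steady"): 0.0,
    ("medium", "rising"): -0.25,
    ("high", "falling"): 0.0, ("high", "steady"): -0.25,
    ("high", "rising"): -0.5,
}


def flc(cl, dcl):
    """Map (CL, dCL) to a crisp control value in [-0.5, 0.5]."""
    cl = min(max(cl, 0.0), 1.0)          # keep inputs inside the label ranges
    dcl = min(max(dcl, -1.0), 1.0)
    mu_cl = {k: tri(cl, *v) for k, v in CL_SETS.items()}      # fuzzification
    mu_dcl = {k: tri(dcl, *v) for k, v in DCL_SETS.items()}
    num = den = 0.0
    for (cl_label, dcl_label), out in RULES.items():
        w = min(mu_cl[cl_label], mu_dcl[dcl_label])           # rule strength
        num += w * out
        den += w
    return num / den                      # weighted-average defuzzification
```

With this placeholder rule base, a high and rising congestion level yields the strongest reduction, and a medium, steady level leaves the rate unchanged.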
Fuzzification is the term given to the application of a membership function, µ, to a data value to find its membership possibility, i.e. µ(x) yields the possibility of membership of the fuzzy subset for which µ is the membership function. The input variables were fuzzified by means of triangular-shaped membership functions, the usual compromise, which reduces computation time at the expense of a sharper transition from one state to another. Choosing the number of membership functions is important, as it determines the granularity, and hence the smoothness, of the bit-rate adjustments. However, computation time is directly proportional to the number of membership functions. The congestion level, the rate at which it changes, and the control output were each partitioned into a set of overlapping triangular membership functions, with the overlap such that the extent of any one triangle reached the midpoint of the base of another.
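The overlap described above, in which each triangle's foot reaches the midpoint of its neighbor's base, can be sketched as follows. The function names make_partition and fuzzify, the evenly spaced peaks and the choice of five sets over [0, 1] are illustrative assumptions; the paper's actual ranges and set counts are not restated here.

```python
def tri(x, a, b, c):
    """Triangular membership: feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def make_partition(lo, hi, n):
    """Return n triangles (a, b, c) with evenly spaced peaks on [lo, hi].

    Each triangle's feet sit at the peaks of its neighbors, so its extent
    reaches the midpoint of the neighboring triangle's base. Requires n >= 2.
    """
    step = (hi - lo) / (n - 1)
    return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
            for i in range(n)]


def fuzzify(x, partition):
    """Membership degree of x in every set of the partition."""
    return [tri(x, a, b, c) for a, b, c in partition]


if __name__ == "__main__":
    sets = make_partition(0.0, 1.0, 5)
    print(fuzzify(0.3, sets))
```

With this layout at most two adjacent sets are active for any input, and their degrees sum to one, which keeps the per-update cost small and proportional to the number of sets.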
The algorithm was simulated with the well-known ns-2 network simulator (version 2.30). The simulated network, with a typical ‘dumbbell’ topology, had a bottleneck link between two routers, and all side-link bandwidths were provisioned such that congestion would occur only at the bottleneck link. That is, the access links from the senders and to the receivers were set to 100 Mb/s. The default buffer size of the bottleneck link routers was configured to be twice the bandwidth-delay product, as is normal in such experiments to avoid packet losses from too small a buffer. The one-way delay of the bottleneck link was set to 5 ms and the side links’ delays were set to 1 ms. The bottleneck link routers’ queueing policy was FIFO (drop-tail) by default. Random Early Detection (RED) routers, as an alternative to drop-tail, are reported [14] to be difficult to configure in a simulation in such a way that the behavior is uniform across a range of background traffic intensities. Though it is expected that RED will improve congestion control if and when it is widely deployed, [14] reports that for lightly loaded links there is a danger of error bursts if RED is applied.
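The buffer sizing rule can be checked with a short calculation. In the sketch below the function name buffer_packets is hypothetical, and the bottleneck bandwidth and packet size in the usage note are placeholders, since this passage does not state them; only the link delays and the factor of two come from the text.

```python
def buffer_packets(bottleneck_bps: int, one_way_delays_ms: list[int],
                   packet_bytes: int, multiple: int = 2) -> int:
    """Queue limit in packets for `multiple` bandwidth-delay products.

    Integer arithmetic is used so that exact multiples do not round up.
    """
    rtt_ms = 2 * sum(one_way_delays_ms)            # propagation RTT only
    bits_needed = multiple * bottleneck_bps * rtt_ms
    bits_per_packet_ms = 1000 * 8 * packet_bytes   # scaled to match ms units
    return max(1, -(-bits_needed // bits_per_packet_ms))  # ceiling division
```

With the stated delays of 1 ms, 5 ms and 1 ms, the propagation round-trip time is 14 ms; for an assumed 2 Mb/s bottleneck and 700 B packets, buffer_packets(2_000_000, [1, 5, 1], 700) gives 10 packets.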
In all experiments, when under fuzzy control, the IPG at the sender was set to 2.2 ms, which corresponds to the video packetization characteristics of Section II-A. Network-state decisions were fed back from the receiver to the fuzzy controller after every frame or frame transmission interval (e.g. 40 ms). An MPEG-2 video ‘news clip’ with moderate motion and a Group of Pictures (GOP) structure of N=12, M=3 was selected, originally pre-encoded at an average rate of 2.0 Mb/s.
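As a rough consistency check on these settings, the sketch below relates the IPG, the feedback interval and the average rate. It treats the IPG as the packet period, which ignores the packet transmission time, and the function names are hypothetical.

```python
def packets_per_interval(interval_s: float, ipg_s: float) -> float:
    """Approximate number of packets sent between two feedback messages."""
    return interval_s / ipg_s


def mean_packet_bytes(rate_bps: float, ipg_s: float) -> float:
    """Average packet size that sustains rate_bps at a constant packet period."""
    return rate_bps * ipg_s / 8.0


if __name__ == "__main__":
    print(packets_per_interval(0.040, 0.0022))   # packets per 40 ms interval
    print(mean_packet_bytes(2_000_000, 0.0022))  # bytes per packet at 2 Mb/s
```

Under that approximation, roughly 18 packets are sent per 40 ms feedback interval, and the un-transcoded 2.0 Mb/s stream averages about 550 B per packet.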
Comparison was made with the TCP-friendly Rate Control (TFRC) protocol, the subject of an RFC [15] and a prominent method of congestion control from the originators of the ‘TCP-friendly’ concept. To ensure fairness, the publicly available TFRC ns-2 simulator model¹ (in the form of object tcl scripts to drive the simulator) was employed. In TFRC, the sending rate is made a function of the measured packet loss rate during a single round-trip-time (RTT) duration measured at the receiver. The sender then calculates the sending rate according to the TCP throughput equation given in [16]. In the TFRC experiments, a TFRC controller dispatched fixed-size UDP packets across the same network tight link, varying the IPG according to the available bandwidth, as estimated by the TFRC feedback mechanism. In the TCP throughput equation employed by TFRC, notice that the packet length explicitly appears as a linear scaling factor, allowing TFRC to adjust its behavior according to a constant packet size. Packet loss and RTT appear as non-linear factors in the equation. In [17], it was found that the median packet length for UDP streaming was 640 B, which is similar to the fixed 700 B packet length of TFRC in these tests.
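For orientation, the throughput equation in the form published in the TFRC specification (RFC 3448, later RFC 5348) can be written as a short Python function. Whether [16] states it in exactly this simplified form is not confirmed here; the function name and the defaults b = 1 and t_RTO = 4·RTT follow the RFC's recommendations rather than anything stated in this paper.

```python
import math
from typing import Optional


def tfrc_rate_bytes_per_s(s: float, rtt: float, p: float,
                          b: int = 1, t_rto: Optional[float] = None) -> float:
    """TCP throughput equation as given in the TFRC specification.

    s: packet size (bytes), rtt: round-trip time (s), p: loss event rate,
    b: packets acknowledged per ACK, t_rto: retransmission timeout (s).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("loss event rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4.0 * rtt                      # simplification suggested by the RFC
    denom = (rtt * math.sqrt(2.0 * b * p / 3.0)
             + t_rto * 3.0 * math.sqrt(3.0 * b * p / 8.0)
             * p * (1.0 + 32.0 * p * p))
    return s / denom
```

The packet size s appears only in the numerator, which is the linear scaling noted above, whereas the loss rate and RTT enter through the non-linear denominator.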
Additionally, comparison was made with RAP [14], which is an alternative to equation-based modeling. RAP varies the IPG between fixed-size packets to allow its average sending rate to approach TCP’s for a given available bandwidth. Every smoothed RTT, RAP implements an Arithmetic Increase Multiplicative Decrease (AIMD)-like algorithm [18], with the same thresholds and increments as TCP. Because this would otherwise result in TCP’s ‘sawtooth’-like rate curve, with obvious disruption to multimedia streams, RAP introduces fine-grained smoothing (turned on in this paper’s tests), which takes into account short- and long-term RTT trends. Because of its pioneering role and its close resemblance to TCP, RAP has frequently served as a point of comparison for congestion controllers. To ensure fairness to RAP, publicly available ns-2 models² were used.
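The coarse-grained AIMD behavior described above can be illustrated on the inter-packet gap directly. This is a generic sketch and not the ns-2 RAP model: it assumes an increase of one packet per smoothed RTT, a halving of the rate on loss, and it omits RAP's fine-grained smoothing; the function name aimd_step is hypothetical.

```python
def aimd_step(ipg_s: float, srtt_s: float, loss: bool,
              beta: float = 0.5) -> float:
    """One AIMD update per smoothed RTT, expressed on the inter-packet gap.

    Additive increase: the packet rate 1/ipg grows by 1/srtt, i.e. one extra
    packet per round trip. Multiplicative decrease: the rate is scaled by beta.
    """
    if loss:
        return ipg_s / beta                       # rate * beta
    return ipg_s * srtt_s / (ipg_s + srtt_s)      # 1/ipg + 1/srtt, inverted
```

Starting from a 10 ms gap (100 packets/s) with a 100 ms smoothed RTT, one loss-free step raises the rate to 110 packets/s, while a loss doubles the gap to 20 ms; repeated over time this produces the sawtooth that RAP's fine-grained smoothing is intended to soften.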
Internet measurement studies [19] have demonstrated a typical Internet traffic mix to consist of longer term flows, ‘tortoises’, representing file transfers, and transient HTTP connections, ‘dragonflies’. In the set of experiments herein, one FLC video source and up to ten TCP sources were passed across the link. The first five TCP sources were ‘dragonflies’, with an on duration of between one and five seconds and an off duration of between one and five seconds, both randomly generated from a uniform distribution. The remaining five TCP sources were configured as ‘tortoises’, with an on duration of between five and twenty seconds and an off duration of between one and five seconds, again randomly generated from a uniform distribution. In the first experiment, only one TCP source was present as background traffic, in the second, two TCP sources acted as background traffic, and so on, until all ten TCP sources were on as background traffic in the tenth experiment. All experiments were repeated ten times with different seeds and the mean result was taken.
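The background traffic pattern can be sketched as a seeded generator of on/off intervals. The function names, the 120 s horizon and the choice to start every source in its on state are assumptions for illustration; only the duration ranges, the uniform distributions and the split into five dragonflies followed by tortoises come from the text.

```python
import random


def on_off_schedule(rng, on_range, off_range, horizon_s):
    """Return (start, stop) on-intervals of an alternating on/off source."""
    t, intervals = 0.0, []
    while t < horizon_s:
        on = rng.uniform(*on_range)
        intervals.append((t, min(t + on, horizon_s)))   # truncate at horizon
        t += on + rng.uniform(*off_range)
    return intervals


def background_sources(n_sources, seed, horizon_s=120.0):
    """First five sources are dragonflies, any further ones are tortoises."""
    rng = random.Random(seed)            # one seed per repetition of a run
    sources = []
    for i in range(n_sources):
        if i < 5:
            kind, on_range = "dragonfly", (1.0, 5.0)
        else:
            kind, on_range = "tortoise", (5.0, 20.0)
        schedule = on_off_schedule(rng, on_range, (1.0, 5.0), horizon_s)
        sources.append((kind, schedule))
    return sources
```

Calling background_sources(k, seed) for k from one to ten, each with ten different seeds, mirrors the structure of the ten experiments and their repetitions.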