QAVF Audio Sharing: SWAM (System-Wide Audio Management)

The audio framework uses the System-Wide Audio Management (SWAM) layer to share audio-management information between the hypervisor host and guests. This information gives applications awareness of audio conditions external to their OS so they can perform audio ducking, suspending, or pausing based on the host’s and the guest-local audio policies.

These actions provide system-wide coordination of audio management between different OSs. This ensures that end users such as vehicle drivers and/or occupants don’t experience interfering music playback and can hear important sounds such as warning chimes when necessary.

This software layer is considered mature because the specification that defines its interface is no longer changing.

Key capabilities

SWAM provides these key capabilities:

  • Pausing audio in a guest or the host when another media player begins playing, whether the new player is in the same or a different OS.
  • Suspending and resuming audio when transient audio is played in the same or a different OS.
  • Making an OS aware of ducking being applied by the host’s audio management policy. This allows OS-local audio management policies or applications to decide what to do while their audio is being ducked.

Understanding the SWAM capabilities and design requires an explanation of several key terms; this is given in “Audio terminology”.

Architecture

The SWAM architecture is based on SWAM processes that run in the host and guests and that interact with the native (OS-local) audio services. There are two kinds of SWAM processes: the host audio management agent (HAMA) and the guest audio management agent (GAMA). The HAMA learns about audio events from the host’s audio service, manipulates the host’s audio environment, and communicates audio information to guests. The GAMA or an equivalent audio-management service communicates the guest’s audio information to the HAMA and manipulates the guest’s audio environment in response to the host-reported information.

The diagram below shows the interaction between the SWAM processes, guest applications, sound drivers used by the guest, host audio service (io-audio), and audio hardware.
[Figure: SWAM architecture diagram]

The SWAM layer requires additional services for its operation. Currently, io-audio with audio management enabled is needed for backend support in the host. For the HAMA to communicate with the Android Audio HAL, a virtio-net or virtio-vsock vdev is also needed. These vdevs provide the host-guest communications (sockets or vsockets) that the HAMA uses to communicate with Android guest audio-management services, through gRPC and other mechanisms. They must be configured within the guest’s VM and the hypervisor host. See the “Virtual Socket” chapter for details.

In an Android guest, the Android Audio HAL offers remote audio-management capabilities. The HAMA agent can send audio event information to the HAL (through a GAMA module it loads) and thus, no GAMA agent is required in the guest. Based on the information it receives, the HAL notifies the guest applications about audio events so they can duck, suspend, or pause their audio channels as needed. This activity affects the discrete audio streams that these applications send through the VirtIO sound driver to the virtio-snd vdev. There can be one or many vdev instances, depending on the audio types configured for the guest; for details, see “HAMA configuration”. The vdev instances then send the audio data (e.g., navigation prompts, music) to io-audio.

Conversely, in a QNX Neutrino or Linux guest, a GAMA agent is required to exchange audio-management information with the host and update the audio environment on the guest side.

To use shared audio and perform audio management in QNX Neutrino or Linux guests, you must work with QNX Engineering Services to implement the front ends for audio sharing and SWAM.

These types of guests would use some kind of VirtIO audio device (which is not shown in the diagram for simplicity) to send audio data to io-audio in the host. The guest applications can use audio channels of various types provided they are enabled for the VirtIO audio device. If the native audio service doesn’t support routing audio based on type, the audio channels used must be of the default type.

The audio management master for SWAM is io-audio on the host. The io-audio service reads the audio policy configuration file (e.g., audio-policy.conf) to learn about the various audio types defined for the system and their attributes such as priority and preemptability. In a virtualization frameworks system, this file must define all audio types needed by applications in the host or in any guest that will be run. For an explanation of the contents of this file, see “Syntax of the audio policy configuration file” in the Audio Developer’s Guide.

The HAMA agent monitors the audio zones (i.e., playback regions) containing the devices used by the host and guests, and receives information about audio events affecting these zones from io-audio. Based on this information and on its configuration, the HAMA agent sends along any event information that affects guest audio channels to the corresponding guests. If the events affect audio channels used by host applications (which aren’t shown in the diagram for simplicity), the agent updates the host’s audio environment by ducking, suspending, or pausing channels.


HAMA configuration

The HAMA agent is configured through a JSON file that specifies the audio zones, the GAMA modules and the guests they support, and the properties and audio types associated with each guest.

You can store your configuration file anywhere; /etc/hama.conf is a convenient location. You must specify this file’s path when starting the HAMA agent process, by using the -c option; otherwise, the process won’t start. The HAMA agent reads the configuration at startup and loads the required GAMA modules to communicate with the Android Audio HAL or the GAMA agents in guests.

Any fields that aren’t relevant for a given object are simply ignored. This allows you to include comments in a configuration by using JSON items whose names the HAMA agent doesn’t recognize. JSON doesn’t support duplicate item names, so if you want to put more than one comment in a section, each comment needs a unique name. You can write a multiline comment to describe an object by giving the item an array value with as many entries as you need. The example configuration makes use of comments to describe the options and their use.
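As a rough illustration of this comment convention, the Python sketch below parses a hypothetical configuration fragment with the standard json module. The item names `_comment` and `_comment_zones`, the zone values, and the `strip_unknown` helper are all assumptions made for this example. The source only states that irrelevant fields are ignored, not which names to use.

```python
import json

# Hypothetical configuration fragment. "_comment" and "_comment_zones" are
# illustrative names only; the assumption is that any unrecognized name works.
CONFIG_TEXT = """
{
    "_comment": ["Example HAMA configuration.",
                 "A multiline comment is an array of strings."],
    "_comment_zones": "Zones that SWAM should monitor.",
    "zones": [
        {
            "_comment": "Front speakers",
            "name": "front",
            "card": 0,
            "devno": 0,
            "audiomgmt_id": "front_profile",
            "ducking_vol_ramp_ms": 100
        }
    ],
    "modules": []
}
"""

# Field names documented for objects in the zones array.
KNOWN_ZONE_FIELDS = {"name", "card", "devno", "audiomgmt_id", "ducking_vol_ramp_ms"}


def strip_unknown(zone):
    """Keep only the documented zone fields, mimicking how unknown fields are ignored."""
    return {key: value for key, value in zone.items() if key in KNOWN_ZONE_FIELDS}


config = json.loads(CONFIG_TEXT)
zones = [strip_unknown(zone) for zone in config["zones"]]

# Python's parser keeps only the last value for a repeated name, which shows
# why each comment in a section needs its own unique name.
collapsed = json.loads('{"_comment": "first", "_comment": "second"}')
```

How the HAMA agent's own parser treats repeated names isn't stated in the source, so the last line only demonstrates the general JSON pitfall.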

The top-level objects in the configuration file are:

  • zones — an array of objects, each describing a zone that SWAM should monitor or otherwise interact with
  • modules — an array of objects, each describing a GAMA module that SWAM should load to communicate with guests, and any module-specific settings

The zones array

Each object in the zones array contains the following fields:

| Name | Type | Content |
| --- | --- | --- |
| name | string | The human-readable name for this zone |
| card | number | The number of the io-audio card containing the audio device associated with this zone |
| devno | number | The io-audio device number for the audio device associated with this zone |
| audiomgmt_id | string | The name of the audio-management profile to use. This should match the audiomgmt_id field in the io-audio configuration file’s [ctrl] section for the specified device |
| ducking_vol_ramp_ms | number | The number of milliseconds that an expected ramp down from full volume to mute will take. This should match the value in the io-audio audio-policy configuration file’s [vol_ramp], name=ducking entry |

The order in which the zones are defined in this object determines the indexes used to identify them in subsequent configuration objects.
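The following Python sketch shows one way to picture this ordering rule: a zone's index is simply its position in the zones array. The zone names, card and device numbers, and profile names are hypothetical, and zero-based indexing is an assumption, since the source doesn't state the index base.

```python
import json

# Hypothetical zones array with two zones; all values are illustrative.
ZONES_TEXT = """
{
    "zones": [
        {"name": "front", "card": 0, "devno": 0,
         "audiomgmt_id": "profile_front", "ducking_vol_ramp_ms": 100},
        {"name": "rear", "card": 0, "devno": 1,
         "audiomgmt_id": "profile_rear", "ducking_vol_ramp_ms": 100}
    ]
}
"""


def zone_indexes(config):
    """Map each zone name to its index (assumed zero-based position in the array)."""
    return {zone["name"]: index for index, zone in enumerate(config["zones"])}


indexes = zone_indexes(json.loads(ZONES_TEXT))
```

Reordering the entries in the zones array would change these indexes, so any configuration object that refers to zones by index would need to be updated to match.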

The modules array

Each modules array entry is an object with at least the first two of the following fields:

| Name | Type | Content |
| --- | --- | --- |
| dll | string | The name of the DLL to load |
| guests | object array | A set of objects describing characteristics of individual guests |
| port | number | The port that the module will listen to for guest GAMA client connections. This field is present only for the GAMA module used with guests that run a GAMA agent. |

The guests array

Each object in the guests array contains at least the first three of the following fields, and one of the last two:

| Name | Type | Content |
| --- | --- | --- |
| name | string | The name of the guest to connect with |
| class | number | The class of the guest, which is a number from 1 to 4. The class tells SWAM which discrete audio channels and level of audio-management event messaging it can expect from the guest. |
| connect_to_zones | array of numbers | A mapping from host zone indexes to guest zone identifiers. This array must have the same number of entries as the zones array. Entries for host zones that are NOT to be connected to a guest zone must be set to -1. |
| server_addr | string | The address and port to use when connecting to the Android Audio HAL, in the format address:port. This field is present only in guests objects defined for the GAMA module used with Android guests. The address token is either an IP address if a standard socket is used, or the string vsock if the virtual socket (virtio-vsock) is used. |
| host_channel_types | object | An object describing the audio types of channels associated with a given guest’s transient, non-transient, and ducking audio. This object is present for class 1, 2, or 3 guests. |
| host_watch_channel_types | object | An object describing the audio types that should cause messages to be sent to the guest when the associated channels become active or become inactive again. This object is present for class 4 guests. |
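The rules in this table can be checked mechanically before a configuration is deployed. The Python sketch below is one possible checker, not part of SWAM: the function names `check_guest` and `connected_zones`, the guest names, the audio type names, and the port value are all hypothetical.

```python
def check_guest(guest, zone_count):
    """Return a list of problems found in one guests entry (empty if none)."""
    problems = []
    for field in ("name", "class", "connect_to_zones"):
        if field not in guest:
            problems.append(f"missing required field: {field}")
    guest_class = guest.get("class")
    if guest_class not in (1, 2, 3, 4):
        problems.append("class must be a number from 1 to 4")
    if len(guest.get("connect_to_zones", [])) != zone_count:
        problems.append("connect_to_zones must have one entry per host zone")
    # Classes 1 to 3 use host_channel_types; class 4 uses host_watch_channel_types.
    expected = "host_watch_channel_types" if guest_class == 4 else "host_channel_types"
    if guest_class in (1, 2, 3, 4) and expected not in guest:
        problems.append(f"class {guest_class} guest needs {expected}")
    return problems


def connected_zones(guest):
    """Map host zone index to guest zone identifier, skipping entries set to -1."""
    return {host: guest_zone
            for host, guest_zone in enumerate(guest["connect_to_zones"])
            if guest_zone != -1}


# Hypothetical Android guest connected to the first of two host zones.
android_guest = {
    "name": "android1",
    "class": 1,
    "connect_to_zones": [0, -1],
    "server_addr": "vsock:50051",  # address:port; the port is illustrative
    "host_channel_types": {"non-transient": "multimedia",
                           "transient": "navigation",
                           "duck": "notification"},
}

# Hypothetical class 4 guest with two mistakes: wrong array length and
# no host_watch_channel_types object.
broken_guest = {"name": "legacy", "class": 4, "connect_to_zones": [0]}
```

Calling `check_guest(android_guest, 2)` returns an empty list, while `check_guest(broken_guest, 2)` reports both problems.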

The host_channel_types object

The host_channel_types object contains the following fields:

| Name | Type | Content |
| --- | --- | --- |
| pause_proxy | string | The audio type of the channel that SWAM should open just to cause host or other guest media playback to pause. This field is used only for class 2 guests, because these guests can’t route transient and non-transient audio to different audio busses (and thus, there’s no non-transient audio type used by the guest’s virtio-snd vdev instances). |
| non-transient | string | The audio type used by the virtio-snd vdev stream through which the guest plays non-transient audio (e.g., music) |
| transient | string | The audio type used by the virtio-snd vdev stream through which the guest plays transient audio (e.g., navigation announcements) |
| duck | string | The audio type used by the virtio-snd vdev stream through which the guest plays audio that should duck but otherwise not interfere with other audio streams |
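To make the difference between guest classes concrete, the Python sketch below builds two hypothetical host_channel_types objects and checks the one rule stated above: pause_proxy applies only to class 2 guests. The audio type names are assumptions (real names come from the audio policy configuration file), and `pause_proxy_is_consistent` is an illustrative helper, not a SWAM function.

```python
import json

# Hypothetical audio type names; real ones are defined in audio-policy.conf.
CLASS_1_TYPES = {
    "non-transient": "multimedia",
    "transient": "navigation",
    "duck": "notification",
}

# A class 2 guest has no separate non-transient stream, so it names a
# pause_proxy type instead.
CLASS_2_TYPES = {
    "pause_proxy": "multimedia",
    "transient": "navigation",
    "duck": "notification",
}


def pause_proxy_is_consistent(guest_class, channel_types):
    """Return True unless pause_proxy appears for a guest that isn't class 2."""
    return "pause_proxy" not in channel_types or guest_class == 2


# Serialize the class 2 object the way it might appear in the JSON file.
class_2_json = json.dumps({"host_channel_types": CLASS_2_TYPES}, indent=4)
```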

The host_watch_channel_types object

The host_watch_channel_types object is used only by class 4 guests, and contains the following fields:

| Name | Type | Content |
| --- | --- | --- |
| suspend_on | string array | An array of audio type names, as defined in the audio policy configuration file (audio-policy.conf). When a channel of one of these types transitions to or from actively playing audio, suspend and resume notifications are sent to the guest. |
| duck_on | string array | An array of audio type names, as defined in the audio policy configuration file (audio-policy.conf). When a channel of one of these types transitions to or from actively playing audio, duck notifications are sent to the guest. |
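A minimal Python sketch of how these two arrays could drive notifications for a class 4 guest is shown below. The notification names ("suspend", "resume", "duck", "unduck"), the audio type names, and the `notifications_for` function are assumptions for illustration. The source doesn't describe the actual message format.

```python
# Hypothetical host_watch_channel_types object for a class 4 guest.
WATCH = {
    "suspend_on": ["navigation", "chime"],
    "duck_on": ["notification"],
}


def notifications_for(audio_type, became_active, watch):
    """List the notifications to send when a channel of audio_type changes state.

    became_active is True when the channel starts playing audio and False
    when it stops.
    """
    messages = []
    if audio_type in watch.get("suspend_on", []):
        messages.append("suspend" if became_active else "resume")
    if audio_type in watch.get("duck_on", []):
        messages.append("duck" if became_active else "unduck")
    return messages
```

For example, `notifications_for("navigation", True, WATCH)` yields a suspend notification, and the same call with `False` yields the matching resume. An audio type that appears in neither array produces no notifications.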

Guest classes

There are four classes of guests. These classes reflect their audio-management capabilities and the level of audio information they provide to the host.

Class 1

These guests:

  • Send audio streams of different purposes to different devices, each of which is implemented by its own vdev instance and has a different audio type for its channel (i.e., connection to io-audio)
  • Route transient and non-transient audio to different devices
  • Don’t inform the host of their audio-management activity. The host derives the guest’s audio-management state from the activity on the guest’s audio channels.

Class 2

These guests:

  • Send audio streams of different purposes to different devices, each of which is implemented by its own vdev instance and has a different audio type for its channel (i.e., connection to io-audio)
  • Route transient and non-transient audio to the same device
  • Inform the host when they begin playing non-transient audio
  • Don’t inform the host of their activity for managing transient audio. The host derives the guest’s audio-management state from the activity on the guest’s audio channels for transient audio.

Class 3

These guests:

  • Mix all audio streams from local applications into a single stream that gets routed to a single vdev instance. This means all of the guest’s audio is streamed through that vdev’s channel (i.e., connection to io-audio), with the same audio type.
  • Inform the host of their audio-management activity.
  • In their vdev configuration, use an audio type that’s lower priority than the multimedia type in the audio policy configuration file of the host. The default audio type is a good choice.

Class 4

These guests:

  • Mix all audio streams from local applications into a single stream that gets routed to a single vdev instance. This means all of the guest’s audio is streamed through that vdev’s channel (i.e., connection to io-audio), with the same audio type.
  • Don’t have a mechanism to inform the host of their audio-management activity or state.
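The four classes can be summarized as a lookup on three traits: whether the guest mixes all audio into one stream, whether it routes transient and non-transient audio to different devices, and what it reports to the host. The Python sketch below encodes that summary. The trait labels and the `guest_class` function are hypothetical names chosen for this example.

```python
# Key: (mixes all audio into a single stream,
#       routes transient and non-transient audio to different devices,
#       what the guest reports to the host)
GUEST_CLASSES = {
    (False, True, "nothing"): 1,
    (False, False, "non_transient_start"): 2,
    (True, False, "all_activity"): 3,
    (True, False, "nothing"): 4,
}


def guest_class(single_mixed_stream, separate_transient_device, reports):
    """Return the guest class for a combination of traits, or None if no class matches."""
    return GUEST_CLASSES.get((single_mixed_stream, separate_transient_device, reports))
```

A combination that doesn't appear in the table, such as a guest with separate devices that also reports all of its activity, returns None because the source doesn't define a class for it.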

Audio terminology

SWAM is based on several interrelated concepts, which we explain here.

Audio channel

  • A connection to io-audio over which an audio stream flows. Each channel is assigned an audio type when created, which determines the effect that the channel’s going active (the audio starts flowing) or inactive (the audio stops flowing) has on other channels.

Audio policy

  • The mapping of audio types to audio attributes. These attributes are specified in a policy configuration file that’s read by io-audio in the hypervisor host and by the equivalent audio services in guests. SWAM enables enforcement of the host’s audio policy across OSs, which results in ducking, suspending, and pausing of channels as needed.

Audio stream

  • A flow of audio data from an application to io-audio, which sends the data to hardware (e.g., speakers).

Audio type

  • An audio channel attribute that maps it to other attributes that determine the channel’s priority, transient status, preemptability, and ducking effect on other channels.

Audio zone

  • A playback region that contains specific devices, such as the front speakers or rear speakers in a car.

Ducking

  • Reducing the volume of audio channels when another high-priority audio channel becomes active. When an audio channel is fully reduced in volume (i.e., ducked to 0%, or muted), the data flow may be suspended, depending on the policy configuration. For more details about audio ducking, see “Understanding audio ducking” in the Audio Developer’s Guide.

Non-transient audio

  • Long duration audio that’s driven by user interaction; typically this refers to media playback.

Pausing

  • Suspending audio flow indefinitely, so it won’t be automatically resumed when any high-priority audio finishes playing. Paused audio must be explicitly restarted in the controlling application.

Suspending and resuming

  • Preventing audio data from flowing when its audio channel is ducked to 0% but then allowing it to flow again (resuming it) when the high-priority audio finishes playing.

Transient audio

  • Short duration audio that’s driven by external events rather than user interaction, such as navigation announcements or warning chimes.
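The difference between ducking, suspending, and pausing is easiest to see as a small state model. The Python sketch below is a simplified illustration based only on the definitions above. The state names, function names, and the idea that a policy selects one of three effects are assumptions, not a description of how io-audio is implemented.

```python
from enum import Enum


class State(Enum):
    PLAYING = "playing"
    DUCKED = "ducked"        # volume reduced, data still flowing
    SUSPENDED = "suspended"  # ducked to 0%, data flow stopped
    PAUSED = "paused"        # stopped until explicitly restarted


def on_high_priority_start(state, effect):
    """Apply the policy's effect ('duck', 'suspend', or 'pause') to a channel."""
    if state is State.PAUSED:
        return state  # already stopped until the application restarts it
    effects = {"duck": State.DUCKED, "suspend": State.SUSPENDED, "pause": State.PAUSED}
    return effects[effect]


def on_high_priority_end(state):
    """Ducked and suspended audio recovers automatically; paused audio doesn't."""
    if state in (State.DUCKED, State.SUSPENDED):
        return State.PLAYING
    return state


def on_explicit_restart(state):
    """Only an explicit restart in the controlling application ends a pause."""
    return State.PLAYING if state is State.PAUSED else state
```

In this model, a suspended channel returns to playing as soon as the high-priority audio ends, whereas a paused channel stays paused until `on_explicit_restart` is called.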
