How Gong identifies speakers
Gong analyzes calls using multiple methods to determine who was on the call and when they spoke. This information is used to calculate stats, and is shown on the call page to help you navigate to the relevant parts of the call.
The first step in speaker identification is dividing the call into segments, each associated with a single speaker. Gong does this in one of two ways:
- Conference calls: When Gong records a web conference call, we check the participant list during the call to get a rough estimate of who is present and when each participant speaks. Conferencing systems tend to lag noticeably when reporting speaker switches, so the timing information they provide is often inaccurate. To compensate, we apply a proprietary speaker-separation algorithm that detects short speech segments (for example, "Yes" or "OK") and attributes them to the right speaker, even when the conferencing system reported a switch late or not at all.
- Telephony calls: When Gong receives stereo recordings, we use the two channels to determine the speakers. We assume each channel carries one speaker, so we do not divide the call further.
  When Gong receives mono recordings, we split the single audio channel into as many tracks as there are speakers, based on the differences between their voices. This process is known as speaker diarization.
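The two segmentation paths above can be illustrated with a minimal Python sketch. It is not Gong's proprietary implementation: it assumes audio has already been decoded into lists of float samples and, for mono calls, that each short segment has already been turned into a voice embedding by some speaker-embedding model. The names `segment_stereo` and `diarize`, the frame length, the silence threshold, and the clustering threshold are all hypothetical. The stereo path labels each frame with whichever channel is louder; the mono path uses simple greedy clustering, which is only one of several possible diarization techniques.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: int    # first frame index (inclusive)
    end: int      # last frame index (exclusive)
    speaker: str  # channel name for stereo calls


def frame_energy(samples: list[float], frame_len: int) -> list[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_stereo(left: list[float], right: list[float],
                   frame_len: int = 160, silence: float = 1e-4) -> list[Segment]:
    """Stereo sketch: attribute each frame to the louder channel, then merge
    consecutive frames with the same speaker into segments. The frame length
    and silence threshold are hypothetical values."""
    segments: list[Segment] = []
    pairs = zip(frame_energy(left, frame_len), frame_energy(right, frame_len))
    for i, (e_left, e_right) in enumerate(pairs):
        if max(e_left, e_right) < silence:
            continue  # neither side is speaking in this frame
        speaker = "channel_1" if e_left >= e_right else "channel_2"
        if segments and segments[-1].speaker == speaker and segments[-1].end == i:
            segments[-1].end = i + 1  # extend the current run
        else:
            segments.append(Segment(i, i + 1, speaker))
    return segments


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def diarize(embeddings: list[list[float]], threshold: float = 0.75) -> list[str]:
    """Mono sketch: greedy clustering of per-segment voice embeddings. Each
    segment joins the most similar existing speaker if the similarity reaches
    the (hypothetical) threshold; otherwise it starts a new speaker."""
    centroids: list[list[float]] = []  # running sum of embeddings per speaker
    labels: list[str] = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(f"speaker_{len(centroids)}")
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(f"speaker_{best + 1}")
    return labels
```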
Once the call is segmented, Gong identifies the participants, applying different methods depending on the type of call.
In mono call recordings, we only store voice identification for users who have opted into the voice identification feature. Voice identification is not stored for any other call participants.
- Gong collects up to 5 short recordings of each subscribed user who has opted into the feature. For best results, we look for calls that:
  - Are mono telephony calls
  - Include at least 2 minutes of recorded speech
- Typically, Gong can accurately identify individuals from their second recorded call, based on the sample collected during their first call.
- Gong replaces these samples on an ongoing basis to keep them current and to increase recognition accuracy. This helps us identify the Gong user under varying conditions, for example when they start the call from a different environment, use a different telephony system, or use a new headset.
- As soon as we have enough samples for an individual, we revisit earlier calls in which recorded team members were not identified and use the samples to rerun voice identification. All of this analysis is performed on the fly, so no file containing a user’s voice identification is retained.
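The voice identification flow described in this list can be sketched in Python. This is an illustration only, not Gong's actual system: it assumes each diarized speaker in a call is already summarized as a voice embedding (a list of floats), and the `Call` and `VoiceProfiles` names, the embedding format, and the 0.8 similarity threshold are hypothetical. Only the limits of 5 samples and 2 minutes of speech come from this article. A bounded `deque` keeps the most recent samples, mirroring the ongoing replacement described above, and `backfill` reruns matching on earlier calls.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

MAX_SAMPLES = 5           # "up to 5 short recordings" (from the article)
MIN_SPEECH_SECONDS = 120  # "at least 2 minutes of recorded speech" (from the article)
MATCH_THRESHOLD = 0.8     # hypothetical similarity cutoff


@dataclass
class Call:
    call_id: str
    is_mono_telephony: bool
    speech_seconds: float
    # hypothetical: one voice embedding per diarized speaker label
    speaker_embeddings: dict[str, list[float]]
    identified: dict[str, str] = field(default_factory=dict)  # label -> user


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class VoiceProfiles:
    def __init__(self, opted_in: set[str]):
        self.opted_in = opted_in
        # deque(maxlen=...) drops the oldest sample when a new one is added,
        # so each user's samples stay fresh
        self.samples: dict[str, deque] = {}

    def enroll(self, user: str, call: Call, label: str) -> bool:
        """Store a sample only for opted-in users and only from qualifying calls."""
        if user not in self.opted_in:
            return False
        if not call.is_mono_telephony or call.speech_seconds < MIN_SPEECH_SECONDS:
            return False
        self.samples.setdefault(user, deque(maxlen=MAX_SAMPLES)).append(
            call.speaker_embeddings[label])
        return True

    def identify(self, embedding: list[float]) -> Optional[str]:
        """Return the user whose closest sample is most similar, or None if
        no sample reaches the threshold."""
        best_user, best_sim = None, MATCH_THRESHOLD
        for user, samples in self.samples.items():
            sim = max(cosine(embedding, s) for s in samples)
            if sim >= best_sim:
                best_user, best_sim = user, sim
        return best_user

    def backfill(self, earlier_calls: list[Call]) -> None:
        """Rerun identification for speakers in earlier calls that are
        still unidentified."""
        for call in earlier_calls:
            for label, emb in call.speaker_embeddings.items():
                if label not in call.identified:
                    user = self.identify(emb)
                    if user is not None:
                        call.identified[label] = user
```

Matching against each user's closest sample, rather than an average of all samples, is one way to tolerate a new headset or environment; it is a design choice of this sketch, not something the article specifies.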
For info on how to enable Voice Identification, see this