Many contact centers provide multilingual services, so monitoring the quality of the language their agents use becomes a priority, especially when they rely on agents who speak the target language but are not native speakers of it. Customers frequently complain about these agents’ language skills, criticizing their accents, grammar and choice of words.
The challenge for contact centers is that their supervisors are unlikely to be linguists themselves, even if they are native speakers of the language, so they have little or no expertise in language testing or evaluation.
How can supervisors be given at least a framework to evaluate an agent’s language quality?
The first stage is to define levels. The highest level will be that of a native speaker. This does not necessarily mean no mistakes: how many native speakers actually make no grammatical mistakes at all when speaking their own language? The lowest level will be that of a complete inability to convey any meaning at all.
“Define the levels – from native speakers to non-speakers.”
It’s essential to put one or two levels between the two extremes so that supervisors can allocate their non-native-speaking agents to realistic levels. The upper level should be based on the agent being understood without effort by a native speaker, without necessarily reaching a native speaker’s level of accuracy. The lower level should be based on the agent being understood by the native-speaking customer, but only with some effort.
In addition to defining levels, we can complete the matrix by defining evaluation criteria within the language framework. The most important will be an overall assessment of comprehensibility. The reason why our agent speaks or writes is to be understood by the customer. If our agent cannot achieve this, then having a richer vocabulary than Shakespeare or Dickens won’t be much use to our customer.
Next in importance comes grammar. Again, the native speaker level does not necessarily mean that the agent’s language is error free, but it must have no more errors than a native speaker’s would. The two levels below are more difficult to quantify and may depend on how long an average phone call is likely to be. One method might be to count the number of errors per minute and decide what numbers you can use for each of the two lower levels. For written communication, you could set a limit based on the number of errors divided by the total number of words in the e-mail. This does not mean that your evaluators will count the words in every e-mail, but if a rough sample of 20 e-mails shows an average length of 100 words per e-mail, then you can decide how many mistakes per e-mail are acceptable at each level.
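The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names and the threshold values are assumptions made for the example, not figures from any standard, and each contact center would need to set its own limits from its own sample of calls and e-mails.

```python
# Hypothetical limits: maximum errors per 100 words for each level.
# These numbers are placeholders; calibrate them on your own sample.
WRITTEN_THRESHOLDS = [
    ("native", 1.0),
    ("upper", 3.0),
    ("lower", 6.0),
]


def errors_per_100_words(error_count: int, word_count: int) -> float:
    """Error rate for a written message, normalised to 100 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * error_count / word_count


def errors_per_minute(error_count: int, call_seconds: float) -> float:
    """Error rate for a phone call, normalised to one minute."""
    if call_seconds <= 0:
        raise ValueError("call_seconds must be positive")
    return 60.0 * error_count / call_seconds


def level_for_rate(rate: float, thresholds=WRITTEN_THRESHOLDS) -> str:
    """Return the first level whose limit the error rate does not exceed."""
    for name, max_rate in thresholds:
        if rate <= max_rate:
            return name
    return "below_lower"  # more errors than the lower level allows


# Example: 3 grammar errors in a 100-word e-mail.
rate = errors_per_100_words(3, 100)
print(rate, level_for_rate(rate))  # 3.0 upper
```

The same `level_for_rate` function could be reused for phone calls by passing a separate list of per-minute thresholds.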
A similar rationale can be adopted for vocabulary. Here, the test is whether each word is appropriate in its context. Appropriacy is not always easy to judge, and for this reason regular and frequent calibration meetings will be needed to ensure that all evaluators are on the same page.
“Sum up those mistakes and calculate the agent’s score.”
For pronunciation, the test used to define the levels will mirror the test for comprehensibility: how much effort does it take a native speaker to understand what the person is saying? This does not mean that every native speaker’s accent is easy to understand. In the United Kingdom, many people find it hard to understand what people from Scotland or the North East of England are saying, yet they are all native speakers of English.
In the written language, pronunciation can be replaced by spelling. Here, a numerical approach similar to that for grammar and vocabulary can be adopted. Evaluators can afford to be stricter with agents here, since many aspects of e-mails can be handled with templated answers or parts of answers, which should reduce the scope for spelling errors.
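One way to picture the resulting matrix of levels and criteria is as a small scorecard. The sketch below is a hypothetical illustration: the numeric values attached to the levels and the decision to count comprehensibility double are assumptions chosen to reflect its importance in this framework, and any real weighting would need to be agreed in calibration meetings.

```python
from typing import Dict

# Levels from the framework, mapped to illustrative numeric values.
LEVELS = {0: "non-speaker", 1: "lower", 2: "upper", 3: "native"}

# Hypothetical weights: comprehensibility counts double because it is
# treated as the most important criterion. For written work, the
# "pronunciation" slot would hold the spelling rating instead.
WEIGHTS = {
    "comprehensibility": 2,
    "grammar": 1,
    "vocabulary": 1,
    "pronunciation": 1,
}


def overall_score(ratings: Dict[str, int]) -> float:
    """Weighted average of the per-criterion levels (0 to 3)."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for criterion in WEIGHTS:
        if ratings[criterion] not in LEVELS:
            raise ValueError(f"invalid level for {criterion}")
    total = sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Example: easily understood, but with weaker pronunciation.
agent = {
    "comprehensibility": 3,
    "grammar": 2,
    "vocabulary": 2,
    "pronunciation": 1,
}
print(overall_score(agent))
```

A single number like this is only a summary; the per-criterion ratings remain the more useful feedback for the agent.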
This framework, while greatly simplified, is based on the same principles that language examiners, such as the University of Cambridge Local Examinations Syndicate, use when evaluating how well examination candidates write and speak English and other languages. They use their frameworks for the same reasons: to define levels to measure their candidates against, and to provide a common yardstick so that all examiners judge according to the same standards.














