June 08, 2016 | Antivirus for Windows

Endurance Test: Do security packages constantly generate false alarms?

You trust your installed security solution: when its alarm goes off, your system is at risk – or was it a false alarm after all? Whether it is for private users or in companies – false positives should not occur. In a 14-month endurance test, AV-TEST evaluated which protection solutions are prone to upset the user for no reason, and which ones are totally reliable.

False alarm or not?

AV-TEST tested 33 applications over a 14-month period.

Perhaps you have already experienced it yourself: a blazing red message window pops up on the screen, sounding an alarm. This was triggered by a file that you copied or an application you just launched. But what if the security software falsely suspects the Chrome browser or a Windows system file of being a dangerous intruder? That's when you have a classic false positive, which rattles the private user and sends an administrator scurrying through the company.

Usability endurance test for consumer users

The overview shows all the false alarms of the individual products – truly not that many.

Hardly any false positives

Falsely categorized files with software for consumer users

Usability endurance test for corporate solutions

High point average per test

Friend or foe detection

A security solution is required to offer perfect friend or foe detection. That is why the AV-TEST laboratory spent 14 months – from January 2015 to February 2016 – testing how well security solutions handle this issue. In total, 19 packages for private users and 14 solutions for corporate users were tested.

The testers defined four test items for the "Usability" category, listed here in the "lab speak" the engineers use in their logs:

- False warnings or blockages when visiting websites
- False detections of legitimate software as malware during a system scan
- False warnings concerning certain actions carried out while installing and using legitimate software
- False blockages of certain actions carried out while installing and using legitimate software

Here is how the test worked

In all test categories, only benign websites, clean files and innocuous, well-known applications were evaluated. After all, the set of clean files and applications in the IT world is more clearly defined than the set of infected ones. That's why all solutions work with so-called whitelist databases, in which clean and benign files are stored with their fingerprints and hash values. If a program does not immediately recognize a file, it queries the cloud to check whether that file is already registered. If that is not the case, the file is tagged and categorized as good or bad.
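The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not how any tested product actually works: the use of SHA-256, the function names `sha256_hex` and `classify`, and the `cloud_lookup` callback are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def classify(data: bytes,
             local_whitelist: Set[str],
             cloud_lookup: Callable[[str], bool]) -> str:
    """Check the local whitelist first, then the (hypothetical) cloud registry."""
    digest = sha256_hex(data)
    if digest in local_whitelist:
        return "known-clean"
    if cloud_lookup(digest):
        # Cache the answer so the next scan of this file stays local.
        local_whitelist.add(digest)
        return "known-clean"
    # Unknown everywhere: the file still has to be analyzed and categorized.
    return "needs-analysis"


# Hypothetical data: one file the cloud knows, one it has never seen.
known = b"contents of a well-known installer"
local = set()
cloud = {sha256_hex(known)}

first = classify(known, local, cloud.__contains__)
second = classify(b"never seen before", local, cloud.__contains__)
print(first, second)
```

A false positive in this picture is a clean file that ends up on the "needs-analysis" path and is then wrongly categorized as bad, which is why a complete and current whitelist matters so much.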

In order to obtain a truly relevant test result in terms of false positives, it is necessary to have a large volume of benign test data. The lab fully met this standard:

For each individual solution, over a period of 14 months, some 7,000 websites were visited and 7.7 million files put to the test. In addition, 280 applications were launched twice and it was noted whether a false alarm was indicated, or whether the application was even blocked.

The set of 7.7 million test files naturally contained all new files from popular programs, such as Windows 7 to 10 and Office. If security software wrongly classified a new Windows system file as malicious, the repercussions would be fatal. That is why the latest versions of these important files are always included in the tests.

Consumer user software: some work perfectly

Despite the high standards in the test, involving large volumes of data, some applications manage to get through without triggering a single false alarm. Avira AntiVirus Pro and Kaspersky Internet Security performed error-free.

These were followed by the applications with fewer than 10 false positives in the endurance test: those from Intel Security, Bitdefender, AVG and Microsoft. But even the security packages at the bottom of the list did not deliver any dramatic results in the endurance test; on the contrary.

The results in detail:

- "False warnings or blockages when visiting websites" did not occur at all throughout the entire test when visiting 7,000 websites.

- "False detections of legitimate software as malware during a system scan". The worst score, from Ahnlab V3 Internet Security, involved 98 false positive files out of 7.7 million test files. That is a rate of roughly 0.001% – an acceptable value, even if there is room for improvement.

- "False warnings concerning certain actions carried out while installing and using legitimate software". 11 out of 19 applications did not trigger any false alarms. 6 solutions indicated only 1 to 3 false warnings in 280 tests. Only the suite from K7 Computing issued a total of 10 false warnings.

- "False blockages of certain actions carried out while installing and using legitimate software". Even here, most of the 19 programs did a good job. While 7 solutions did not falsely block anything, 11 of the security packages did so for 1 to 6 innocuous applications. The Comodo Internet Security Premium blocked 29 times.
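To put these counts in proportion, the percentages can be recomputed from the figures quoted in the list. The helper below is only an illustrative sketch; the function name `false_positive_rate` is an assumption, and the 280 tests are taken as the denominator for the warning count as stated in the text.

```python
def false_positive_rate(false_positives: int, samples: int) -> float:
    """Return the share of clean samples that were wrongly flagged, in percent."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return 100.0 * false_positives / samples


TEST_FILES = 7_700_000   # clean files scanned per product
APP_TESTS = 280          # install-and-use tests per product

# Worst consumer result in the file scan: 98 files wrongly detected.
file_scan_worst = false_positive_rate(98, TEST_FILES)
# Most false warnings in the application tests: 10 in 280 tests.
warnings_worst = false_positive_rate(10, APP_TESTS)

print(f"file scan: {file_scan_worst:.4f}%")   # file scan: 0.0013%
print(f"warnings:  {warnings_worst:.2f}%")    # warnings:  3.57%
```

The file scan figure matches the roughly 0.001% quoted above once rounded. The application tests use a far smaller sample, so a handful of errors there weighs much more heavily in percentage terms.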

Enterprise software: only Kaspersky did an error-free job

In the usability test of the 14 security solutions for corporate users, nearly all the products likewise showed that they make hardly any errors in friend or foe detection. Only the two Kaspersky solutions, Endpoint Security and Small Office Security, remained error-free in the tests. However, they only took part in 6 out of 7 test rounds.

Throughout the 14 months, the solutions from Sophos, Intel Security and Bitdefender recorded fewer than 10 errors overall. Yet all the other products also had only a low rate of false positives relative to the volume of test data.

The results in detail:

- "False warnings or blockages when visiting websites" did not occur in the entire test of enterprise security software when visiting 7,000 websites.

- "False detections of legitimate software as malware during a system scan". The worst score throughout all tests in this category came from F-Secure with only 49 false positive files for 7.7 million test files. Good, but not excellent like the Sophos solution, which only falsely detected 2 files.

- "False warnings concerning certain actions carried out while installing and using legitimate software". Only 4 out of 14 programs issued 1 or 2 false warnings on the respective 280 tests. A result that ought to please corporate administrators in particular.

- "False blockages of certain actions carried out while installing and using legitimate software". In this test category too, the results are extremely good. After 280 tests for each program, the findings were as follows: 8 products were error-free, and 6 had only 1 to 5 false blockages. For comparison: among the consumer solutions, the worst result was 29!

Enterprise software solutions generate fewer false alarms

If we put both tests with security software for corporate users and private users side-by-side, we can see that corporate solutions on average generate fewer false alarms. However, it is worth noting that the manufacturers with products for both groups also have the best test results in both cases. This applies to the products from Kaspersky, Bitdefender, Microsoft, Trend Micro, Symantec and F-Secure.

Overall, however, all the products received a high quality rating in terms of usability. While the lab does always subtract a few points for false alarms, strictly speaking, this is criticism at what is already a very high level of performance.

Some manufacturers will be taken aback when they see which programs caused the false alarms. A summary is listed in the top 15 and top 30 tables. These include popular applications, such as Notepad++, Yahoo Messenger or WinRAR. These really are not exotic applications, but rather standard software that everyone ought to know. Given the low number of false alarms, the manufacturers should be able to quickly get a handle on these errors in upcoming program versions.

Flare project: daily reloading of thousands of new test files

Maik Morgenstern, CTO AV-TEST GmbH

Every day, the IT world produces new, clean files that are not allowed to be falsely flagged by security products. Thus, with its Flare Project, AV-TEST collects the new files on a daily basis and also uses them in the test.

Security products detect benign and clean files by their unique hash values and fingerprints, which they store in their cloud, for example. Thanks to the databases with hash values and fingerprints, the scan is completed at lightning speed. As a result, known files generally don't trigger any false alarms.

In order to test the security products professionally in terms of false positives, the test laboratory is required to have on-site the latest files produced by the IT world: applications, program updates, plug-ins, descriptions, drivers, etc. For this, AV-TEST uses its internal Flare Project.

Flare combs the Internet every day for new files, e.g. on manufacturers' websites or download portals. If there are new products, these are automatically downloaded and installed. Afterwards, everything is tested for malware or unwanted applications. If the application, along with the corresponding files, is clean, it ends up in the database. Using this method, the database fills up with tens of thousands of new files every day! The Flare database currently lists just under 40 million clean files with a total size approaching 25 terabytes.
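The daily collection step could look something like the sketch below. It is a simplified illustration under stated assumptions, not AV-TEST's actual Flare implementation: the function `ingest`, the `is_clean` callback standing in for the malware check, and the dictionary used as a database are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple


def ingest(candidates: Iterable[Tuple[str, bytes]],
           is_clean: Callable[[bytes], bool],
           database: Dict[str, str]) -> int:
    """Add clean, previously unseen files to the database; return how many were added."""
    added = 0
    for name, data in candidates:
        digest = hashlib.sha256(data).hexdigest()
        if digest in database:
            continue  # same content already collected, e.g. from a mirror
        if not is_clean(data):
            continue  # malware or unwanted application: never enters the clean set
        database[digest] = name
        added += 1
    return added


# Hypothetical daily batch: one new installer, a mirror copy of it, one bad file.
db = {}
batch = [
    ("setup.exe", b"installer bytes"),
    ("setup-mirror.exe", b"installer bytes"),
    ("bad.exe", b"SUSPICIOUS payload"),
]
added = ingest(batch, lambda data: b"SUSPICIOUS" not in data, db)
print(added, len(db))
```

Keying the database by content hash means that the same file found on several download portals is stored only once, which keeps the clean set from growing with duplicates.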

Incidentally: The collection naturally also includes all the new popular updates such as those from Microsoft Windows, Office, Adobe, Oracle, Mozilla, Google, Intel, IBM or SAP. If a file from these companies is wrongly classified by a security suite, this usually has grave consequences. That is why these files are always included in the AV-TEST test set encompassing 7.7 million files.