Gmail RETVec antispam: what it is and how it works

RETVec, or Resilient & Efficient Text Vectorizer, is a technology developed by Google in 2023 to strengthen the security of Gmail, using artificial intelligence to identify and block spam, phishing and other types of malicious emails through heuristic analysis.

Gmail is one of the most widely used email services in the world, with over 1.8 billion users. Considering that almost 50% of the more than 400 billion email messages sent every day in the world are spam, the task of automatically identifying and filtering these messages is enormous.

That’s why Google invests a lot of effort in improving its filters. RETVec is one of them: it plays the role of a heuristic filter, but it is built on artificial intelligence, so it can identify patterns and recognize new spam practices.

We’ll talk about this technology in this post.

What is RETVec?

RETVec is an anti-spam technology that applies artificial intelligence to heuristic analysis. It was developed by the Google team for Gmail to identify and block spam, phishing and other types of malicious emails.

In other words, RETVec aims to be able to identify malicious email messages even when senders try to bypass traditional filters with tactics such as intentional typos, special characters or language mixing.

The impact of implementing RETVec technology is significant: it has increased Gmail’s spam detection rate by 38% and reduced false positives (cases in which legitimate emails are mistakenly marked as spam) by 19.4%.

What’s more, it is computationally efficient: according to Google, it reduced the model’s use of TPUs (Tensor Processing Units, Google’s specialized hardware) by 83%. Lighter models of this kind are suited not only to robust servers but also to devices with limited resources, such as smartphones, which helps bring real-time protection to Gmail users.

RETVec Gmail spam filter performance. Source: Google

What is a heuristic filter?

A heuristic email filter is a system used to identify and classify email messages based on a set of predefined rules or patterns, called empirical rules. These rules are created based on characteristics normally associated with unwanted emails, such as spam and phishing.

The heuristic filter analyzes the content and characteristics of an email, applying a set of rules that evaluate aspects such as:

Suspicious keywords: expressions such as “free”, “special offer” or “click here”.
Formatting: excessive use of capital letters or punctuation, e.g. “WIN NOW!!!”.
Structure: email messages with multiple hyperlinks, especially to unknown or shortened domains.
Sender: unknown origin or low reputation.
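
To make the idea concrete, here is a minimal sketch of a rule-based scorer covering the four aspects above. The phrase list, the shortener domains, the thresholds and the weights are all assumptions chosen for illustration; no real provider's rules are reproduced here.

```python
import re

# Hypothetical phrase list and shortener domains, for illustration only.
SUSPICIOUS_PHRASES = ("free", "special offer", "click here")
SHORTENER_DOMAINS = ("bit.ly", "tinyurl.com")


def heuristic_score(subject: str, body: str, sender_known: bool) -> int:
    """Return a spam score; higher means more suspicious."""
    text = f"{subject} {body}"
    lowered = text.lower()
    score = 0

    # Suspicious keywords: one point per phrase that appears.
    score += sum(1 for phrase in SUSPICIOUS_PHRASES if phrase in lowered)

    # Formatting: mostly capital letters, or runs of "!" / "?".
    letters = [c for c in text if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.6:
        score += 1
    if re.search(r"[!?]{3,}", text):
        score += 1

    # Structure: many links, or any link to a URL shortener.
    hosts = re.findall(r"https?://([^/\s]+)", lowered)
    if len(hosts) >= 3:
        score += 1
    if any(host in SHORTENER_DOMAINS for host in hosts):
        score += 2

    # Sender: unknown origin.
    if not sender_known:
        score += 1
    return score


print(heuristic_score("WIN NOW!!!", "Click here for your FREE prize http://bit.ly/x", False))
```

Every rule here is fixed text matching, which is exactly why small changes in spelling are enough to slip past a filter like this.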

Weaknesses of heuristic filters

Although heuristic filters are a powerful ally for email providers in the fight against spam and malicious emails, they can have some disadvantages, such as:

False positives: legitimate emails can be mistakenly classified as spam if they match certain rules.
Constant maintenance: filtering rules need to be updated regularly to maintain the tool’s effectiveness, as spammers are always innovating and looking for new ways to bypass filters.

How does RETVec work in practice?

According to Google, RETVec transforms text into numerical vectors: mathematical representations that capture the underlying meaning of words, regardless of how they are written or manipulated. In other words, a classifier built on these vectors can interpret the intention of a text even when its spelling has been altered.

This approach allows it to ignore “noise” such as visual substitutions or typographical tricks, focusing on the purpose of the content.
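
As a rough illustration of what "turning text into numbers" can mean at the character level, the sketch below encodes each character's Unicode code point as a fixed-width row of bits. This is a simplified toy, not Google's implementation: the 24-bit width and 16-character word size are assumptions of the sketch, and the learned model that turns such raw input into typo-resilient vectors is not shown.

```python
def encode_char(ch: str, bits: int = 24) -> list[int]:
    """Binary digits of the character's Unicode code point, least significant bit first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> list[list[int]]:
    """Fixed-size matrix: one row of bits per character, truncated or zero-padded."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    padding = [[0] * bits for _ in range(max_chars - len(rows))]
    return rows + padding


def hamming(a: list[list[int]], b: list[list[int]]) -> int:
    """Number of differing bits between two encoded words."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


print(hamming(encode_word("winner"), encode_word("w1nner")))  # 3
```

Because every character in every language has a code point, the same encoding applies to any text without language-specific preprocessing. Here “winner” and “w1nner” differ in only 3 of 384 bits, but raw bit distance alone is not meaning; the resilience described above comes from the model trained on top of such input.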

Here are practical examples of how RETVec works in real situations:

Intentional typing errors

Example: “W1NN3R! Cl41m Y0ur Pr1z3 N0w!”

Spammers often replace letters with numbers or symbols (e.g. “0” instead of “o”, “1” instead of “i”) to fool simple filters. This technique is known as leetspeak (also written 1337 or eleet, and sometimes called hacker speech): a form of writing that uses numbers and symbols in place of letters.

In the example given, RETVec recognizes that “W1NN3R” is a variant of “winner” and “Pr1z3” of “prize”, identifying the pattern of a suspicious commercial offer, especially if accompanied by a dubious link.
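
RETVec learns this kind of robustness rather than relying on a lookup table. Purely to illustrate what seeing through leetspeak involves, here is a hand-written substitution table; the table is an assumption of this sketch and not part of RETVec.

```python
# Hypothetical digit-to-letter table covering a few common leetspeak swaps.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def deleet(text: str) -> str:
    """Lower-case the text and undo the digit substitutions listed in LEET_MAP."""
    return text.lower().translate(LEET_MAP)


print(deleet("W1NN3R! Cl41m Y0ur Pr1z3 N0w!"))  # winner! claim your prize now!
print(deleet("Win 1 million"))                  # win i million
```

The second call shows why fixed tables are brittle: a legitimate “1” is rewritten too. A learned representation avoids having to choose between the two readings by rule.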

Homoglyphs and similar characters

Consider a phishing email saying “Update your Gmаil password at [fake link]”.
Here, the “а” is a Cyrillic character (Unicode U+0430) that mimics the Latin “a” (U+0061) – see the image below. Old filters could be fooled, but RETVec normalizes these homoglyphs, detecting that “Gmаil” is trying to impersonate “Gmail” and flagging the attempted scam.

Homoglyph characters are a tactic used by spammers
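
One simple way to spot this trick outside of RETVec is to look for words that mix writing systems. The sketch below uses Python's standard unicodedata module and approximates a character's script by the first word of its Unicode name, which is a shortcut assumed for this example rather than a full script-detection method.

```python
import unicodedata


def scripts_in_word(word: str) -> set[str]:
    """Approximate the script of each letter by the first word of its Unicode name."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])  # e.g. "LATIN", "CYRILLIC", "CJK"
    return scripts


def mixed_script_words(text: str) -> list[str]:
    """Words whose letters come from more than one script."""
    return [word for word in text.split() if len(scripts_in_word(word)) > 1]


# "\u0430" is the Cyrillic letter that looks like a Latin "a".
print(mixed_script_words("Update your Gm\u0430il password"))
```

A check like this flags the fake “Gmаil” but not a message that legitimately contains whole words in different languages.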

Symbols and spaces as a disguise

Example: “W.i.n 1 m.i.l.l.i.o.n today” or “W i n 1 m i l l i o n today”

This technique uses dots or spaces to separate letters and make analysis more difficult. RETVec sees through this noise, interpreting “W.i.n” as “Win” and “m.i.l.l.i.o.n” as “million”, and, when combined with the context of an exaggerated promise, marks the email as spam.

Gappy text used in spam messages as a tactic to avoid spam filters
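
For comparison, a rule-based filter would have to undo the gaps explicitly. The regular expression below is a hypothetical rule written for this example: it joins runs of three or more single letters separated by single dots or spaces.

```python
import re

# Three or more single ASCII letters separated by one dot or one space each,
# e.g. "m.i.l.l.i.o.n" or "W i n".
GAPPED = re.compile(r"\b(?:[A-Za-z][. ]){2,}[A-Za-z]\b")


def collapse_gaps(text: str) -> str:
    """Remove the separators inside each gapped run, leaving other text untouched."""
    return GAPPED.sub(lambda m: re.sub(r"[. ]", "", m.group()), text)


print(collapse_gaps("W.i.n 1 m.i.l.l.i.o.n today"))  # Win 1 million today
print(collapse_gaps("W i n 1 m i l l i o n today"))  # Win 1 million today
```

A rule like this would also merge a legitimate sequence of single letters such as “a b c”, and it ignores other separators, which is the kind of maintenance burden that a learned representation is meant to avoid.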

Multilingual text

A sender sends a message with a subject line like “Hello friend, 你好! Check out this special offer at [link]”. The combination of English (“Hello”) and Chinese (“你好”, which means “hello”) could confuse monolingual systems. RETVec, however, processes both languages effortlessly, focusing on the intent of the “special offer” and the suspicious link to classify it as potentially malicious.

What sets RETVec apart is its ability to process text in all languages and UTF-8 characters without the need for pre-processing. Unlike traditional filters that rely on specific, predefined rules or patterns, RETVec is adaptable to spammers’ strategies.

But how exactly does it achieve this?

Technically, RETVec vectorizes text into a unified representation. The spam classifier then analyzes patterns in that representation, together with context such as the sender, sending history and associated links, to make its decisions.

Which parts of the email does RETVec work on?

RETVec does not limit itself to a specific section. It analyzes all the textual parts of an email to ensure a comprehensive evaluation. It works on:

Subject: RETVec examines the subject text for manipulations or suspicious phrases. An example such as “Win £1 Million Today!” would be normalized and identified as an unrealistic promise.
Email body: in the main text, it detects malicious intent in paragraphs or call-to-actions. For example, “Click here to get your prize!” is unmasked as a scam attempt.
Links and URLs: RETVec collaborates with systems that check the real destination of a hyperlink, finding discrepancies between the displayed domain and the real one.
Sender: RETVec processes the name displayed in the “From” field to identify spoofing. A sender such as “Gmaіl Support” would be flagged as a fake.
Attachments with text: if the email includes PDFs or images with text extracted via OCR, RETVec can process them. For example, an attachment with “Send your data here!” would be vectorized and flagged as phishing.

RETVec also integrates with other Gmail systems, such as sender reputation checks and link analysis, for complete protection.
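
The link check mentioned above can be pictured with a small sketch that compares the host shown in a link's visible text with the host in its href. This is a hypothetical stand-in built on Python's standard library; the class and function names are invented for the example, and Gmail's real link analysis is not public.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url: str) -> str:
    """Lower-cased host name; bare text like 'example.com/path' is treated as a URL."""
    if "//" not in url:
        url = "//" + url
    return (urlparse(url).hostname or "").lower()


def mismatched_links(html: str) -> list[tuple[str, str]]:
    """Links whose visible text looks like a URL on a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    suspicious = []
    for href, text in parser.links:
        shown = host_of(text) if "." in text and " " not in text else ""
        if shown and shown != host_of(href):
            suspicious.append((text, href))
    return suspicious


sample = '<a href="http://evil.example/login">www.gmail.com</a>'
print(mismatched_links(sample))
```

Links whose visible text is ordinary wording, such as “Click here”, are skipped by this sketch because there is no displayed domain to compare.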

Cautions for digital marketers

RETVec does not distinguish between malicious intent and poorly executed legitimate campaigns. Digital marketers need to plan their email marketing actions carefully to avoid their emails being mistaken for spam.

Follow these guidelines:

Avoid typographical tricks: texts like “S3LL N0W” or “C.L.I.C.K N.O.W” may be creative, but RETVec normalizes them and associates them with spam tactics. Write correctly.
Moderate promises: topics such as “Win 1 million today!” or “100% free!” can seem exaggerated and trigger filters, even if they are true. Be careful.
Reliable links: avoid hyperlink shorteners. Use recognizable URLs to build trust.
Multilingual consistency: in global campaigns, mixing languages without context (e.g. “Hello 你好” at random) can raise suspicions. Make sure the content is cohesive. Ideally, send separate email campaigns for different languages.
Solid reputation: send email campaigns from authenticated addresses (DKIM/SPF) and avoid purchased lists or new senders, as RETVec works with systems that analyze history.
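
To check the authentication point on a test send, you can inspect the Authentication-Results header that the receiving server adds to the message. The sketch below parses that header with Python's standard email module; the sample message and its domains are made up for the example.

```python
import re
from email import message_from_string


def auth_results(raw_message: str) -> dict[str, str]:
    """Map each of spf/dkim/dmarc to the first verdict found in Authentication-Results."""
    msg = message_from_string(raw_message)
    results = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, verdict in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header, re.I):
            results.setdefault(method.lower(), verdict.lower())
    return results


# Made-up message for illustration; paste the raw source of your own test email instead.
RAW = (
    "Authentication-Results: mx.example.net;\n"
    " dkim=pass header.i=@news.example.com;\n"
    " spf=fail smtp.mailfrom=news.example.com\n"
    "From: News <news@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "Body\n"
)
print(auth_results(RAW))
```

A campaign whose test message shows anything other than “pass” for SPF and DKIM is worth fixing before the real send.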

Legitimate and well-crafted campaigns pass through RETVec without any problems, but exaggeration or dubious tactics can send your email straight to the spam folder.

Conclusion

RETVec is a feature that demonstrates the advance of artificial intelligence in digital security, offering Gmail more reliable protection against spam and phishing.

Acting on all textual parts of the email, it gives users a safer inbox. Google has also released RETVec as open source, so its approach can inform email filtering beyond Gmail.

FAQ

What is RETVec and what does it do?

RETVec is a Google technology that uses artificial intelligence to identify and block spam, phishing and malicious emails in Gmail, detecting threats even when the text is disguised with typos, symbols or a mixture of languages.

How does RETVec recognize disguised spam?

RETVec converts texts into numerical vectors that interpret the meaning of the words, ignoring visual tricks such as “Fr33” or “L o w P r i c e s”. Thus, it understands the intent of the message, even with manipulations.

Which parts of the email does RETVec analyze?

It examines all textual content: subject, body, links, sender and attachments with text. This leads to a more comprehensive analysis and greater accuracy in identifying threats.

What is the difference between RETVec and heuristic filters?

Heuristic filters generally follow fixed, predetermined rules, while RETVec adapts and learns from message patterns. This makes it more efficient and less susceptible to spammers’ deliberate misspellings and other tricks.

How should digital marketing adapt to RETVec?

Professionals should avoid exaggeration, typographical tricks and suspicious links. Well-planned campaigns, with clear content and a reliable sender, pass through the filters without any problems.
