Does revealing semantic similarity scores between AES-encrypted data create an exploitable side channel?

Background: My expertise is in machine learning/AI, not cryptography, so I apologize if I'm missing fundamental security concepts. I'm trying to build a privacy-preserving AI agent system and want to understand potential vulnerabilities.

Use Case: I'm building an AI agent that can interact with personal message/contact databases while preserving privacy. For example, the agent would be able to:

  • Search message history for relevant conversations
  • Interact with contact lists and group chats
  • Send messages to specific people/groups
  • Maintain conversation context about who it's talking about/to

System Architecture:

  1. Database contains sensitive data (messages, contacts, etc.)
  2. Each sensitive piece of information has an associated AES-128 key
  3. When user queries the AI:
    • Query is processed to identify personally identifiable information (PII) such as names and contacts
    • PII is replaced with its corresponding AES-encrypted value
    • The modified query goes to the LLM
  4. When LLM needs to search the database:
    • Makes function call with encrypted parameters
    • In a secure environment (a black box to the LLM):
      • Data is decrypted
      • Semantic similarity search performed on plaintext
      • Results re-encrypted
    • LLM receives encrypted results + similarity scores
  5. LLM constructs response using encrypted values
  6. Final output is decrypted for user presentation
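
A minimal sketch of that flow, assuming a single deterministic token map in place of the per-item AES-128 keys (the names `TokenMap`, `secure_search`, and the toy `embed` are my own placeholders, not part of the actual system, and the embedding is a character histogram standing in for a real sentence-embedding model):

```python
# Sketch of the query flow above. TokenMap stands in for the per-item AES
# encryption: it deterministically maps each PII string to a stable token,
# so repeated queries reuse the same token. Similarity is computed only on
# plaintext inside the trusted environment; the LLM sees tokens and scores.
import math
import secrets

class TokenMap:
    """Consistent pseudonymization: same plaintext -> same token."""
    def __init__(self):
        self._fwd: dict[str, str] = {}
        self._rev: dict[str, str] = {}

    def encrypt(self, plaintext: str) -> str:
        if plaintext not in self._fwd:
            token = secrets.token_urlsafe(12)
            self._fwd[plaintext] = token
            self._rev[token] = plaintext
        return self._fwd[plaintext]

    def decrypt(self, token: str) -> str:
        return self._rev[token]

def embed(text: str) -> list[float]:
    """Toy embedding (character histogram); a real system would use a
    sentence-embedding model here."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def secure_search(tm: TokenMap, enc_query: str, enc_corpus: list[str]) -> list[tuple[str, float]]:
    """Runs inside the trusted black box: decrypt, compare plaintexts,
    return encrypted items plus similarity scores to the LLM."""
    q = embed(tm.decrypt(enc_query))
    scored = [(tok, cosine(q, embed(tm.decrypt(tok)))) for tok in enc_corpus]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# The LLM only ever sees tokens and scores:
tm = TokenMap()
corpus = [tm.encrypt(m) for m in ["lunch with John", "quarterly budget", "John's group chat"]]
print(secure_search(tm, tm.encrypt("message John's group"), corpus))
```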

Important Constraint: The system must maintain consistent encryption: if "John's group chat" is encrypted to "X7Yp9..." in one query, it must encrypt to the same value in every later query. The LLM maintains conversation memory, and without stable ciphertexts it cannot keep track of which people or groups it is discussing.
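
For that consistency requirement specifically, a deterministic mode such as AES-SIV used without a nonce behaves exactly this way. The snippet below (using the `cryptography` package, my choice of primitive rather than anything the post specifies) shows that equal plaintexts yield equal ciphertexts, which is what enables stable context, but also what reduces the scheme to deterministic encryption, where equality of plaintexts is visible to the LLM side.

```python
# Deterministic encryption demo with AES-SIV. Omitting the nonce makes
# encryption deterministic: the same plaintext always yields the same
# ciphertext, satisfying the consistency constraint -- but it also means
# plaintext equality leaks, unlike a randomized (semantically secure) mode.
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

key = AESSIV.generate_key(256)   # one key for illustration; the post
siv = AESSIV(key)                # describes per-item AES-128 keys

t1 = siv.encrypt(b"John's group chat", None)
t2 = siv.encrypt(b"John's group chat", None)
assert t1 == t2  # stable token across queries -> LLM keeps context
```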

Potential Vulnerability: Could an attacker reconstruct the original plaintext through iterative probing?

  • Submit encrypted probe queries
  • Observe similarity scores with target encrypted text
  • Use scores as feedback to guide further probes
  • Iteratively refine probes to converge on plaintext content
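
To make that loop concrete, here is a toy version of it (the bigram-based `score_oracle` and the `hill_climb` routine are my own stand-ins for an embedding-based similarity score; how well it converges depends on how smooth the real score is). An attacker who only ever sees similarity scores can still use them as a gradient-like feedback signal to steer guesses toward the hidden plaintext.

```python
# Toy model of the probing attack: the attacker submits probe strings and
# reads back only a similarity score against the hidden target, then uses
# that score as feedback for a greedy coordinate search over characters.
import math
import string
from collections import Counter

SECRET = "john's group chat"   # plaintext hidden behind the encrypted token

def score_oracle(probe: str) -> float:
    """What the system leaks: a similarity score between the probe and the
    decrypted target. Character-bigram cosine stands in for embeddings."""
    def bigrams(s: str) -> Counter:
        return Counter(s[i:i + 2] for i in range(len(s) - 1))
    a, b = bigrams(probe), bigrams(SECRET)
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hill_climb(length: int, rounds: int = 20) -> str:
    """Greedy coordinate search: the only feedback used is the score."""
    alphabet = string.ascii_lowercase + " '"
    guess = ["a"] * length
    for _ in range(rounds):
        for i in range(length):
            guess[i] = max(
                alphabet,
                key=lambda c: score_oracle("".join(guess[:i] + [c] + guess[i + 1:])),
            )
    return "".join(guess)

# The attacker needs (or guesses) a rough length, then refines probes.
recovered = hill_climb(len(SECRET))
print(recovered, round(score_oracle(recovered), 3))
```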

Questions:

  1. Is this a valid way to attack the system?
  2. If yes, are there ways to preserve functionality while preventing such attacks?
  3. Adding noise to the encrypted data seems tricky, since encryption must stay consistent to preserve conversation context. Is there a way around this?
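
On question 3: one possible direction (my suggestion, not something the post specifies) is to leave the ciphertexts deterministic and instead degrade the score channel, e.g. quantize similarity scores into coarse buckets and add calibrated noise before they leave the secure environment. A rough sketch:

```python
# Sketch of score-channel hardening (an assumption, not part of the
# original design): keep tokens deterministic, but add Laplace noise and
# quantize the similarity scores before returning them to the LLM.
import numpy as np

def harden_score(score: float, bucket: float = 0.25, noise_scale: float = 0.05) -> float:
    """Add calibrated Laplace noise, quantize to coarse buckets, clamp to [0, 1].
    The LLM still gets a usable ranking signal, but each probe reveals far
    less fine-grained feedback to an attacker."""
    noisy = score + np.random.laplace(0.0, noise_scale)
    quantized = round(noisy / bucket) * bucket
    return float(min(1.0, max(0.0, quantized)))

for s in (0.91, 0.52, 0.13):
    print(s, "->", harden_score(s))
```

This keeps tokens stable for the LLM's conversation memory while limiting how much signal each probe gives an attacker, though it narrows the side channel rather than eliminating it.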

Would appreciate any insights on whether this architecture is fundamentally flawed or if there are better approaches to achieving these privacy goals.

Note: I understand the most straightforward and secure solution would be to run the LLM locally, eliminating the need to encrypt data sent to external models. However, that is impractical for my use case: handling these tasks well (natural conversation, context understanding, function calling, etc.) requires an LLM larger than most people can run locally, which is why I'm seeking a solution that works with cloud-based LLMs while preserving privacy.