
Your AI Agent Could Be Lying. These 15 Can't.

Mustafa Hourani

May 26, 2026 • 15 min read

Today, when you use an AI agent, you have no way to verify it's telling you the truth. The company running it could be reading your data while it operates. Its output could be silently swapped before it reaches you. Whatever receipt it hands you back doesn't have to match what actually happened on the company’s server. The only thing stopping any of this is the developer’s word.

What if you didn't have to take their word for it?

There's a different model emerging for how to build AI agents. The agent runs inside hardware that even the developer can't see into (called a Trusted Execution Environment or TEE). Every decision it makes is signed by a key that only the specific deployed version of the agent can hold. The signing key, the code, and the receipts are all bound together cryptographically. So when the agent tells you what it did, you don't take its word for it. You can verify all these components.

Last month, 10 builders shipped 15 different demo projects that work this way on EigenCompute. They are a preview of what the next few months of trusted AI agents might look like.

B³ (Bug Bounty Broker)

Repo by @Sanjith020

The bug bounty world has a trust asymmetry that costs researchers real money. If you find a vulnerability in code, you have to disclose it to the company that listed the bounty to get paid, and then hope that they pay you. If they decide not to (or low-ball you), there's no real recourse.

B³ flips this asymmetry. A security researcher submits an encrypted proof-of-concept to a neutral broker, the broker actually executes the exploit inside an enclave to confirm severity, signs an attestation describing how bad the bug is, and only unseals the full details to the company after USDC payment is made on-chain. With this model, security researchers no longer have to give away the answer first.
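The ordering guarantee in that flow (attest first, pay second, unseal last) can be sketched as a small state holder. This is a minimal illustration and not B³'s actual code: the `BrokeredReport` class and its method names are hypothetical, and the enclave execution, encryption, and on-chain payment check are replaced by plain Python fields.

```python
class BrokeredReport:
    """Tracks one report through submit -> attest -> pay -> unseal."""

    def __init__(self, sealed_details: str):
        self._sealed_details = sealed_details  # stands in for the encrypted proof-of-concept
        self.severity = None
        self.paid = False

    def attest(self, severity: str) -> dict:
        # In the real flow the severity would come from running the exploit in the enclave.
        self.severity = severity
        return {"severity": severity}

    def record_payment(self) -> None:
        # Payment only makes sense once the company has seen a severity attestation.
        if self.severity is None:
            raise RuntimeError("cannot pay before severity is attested")
        self.paid = True

    def unseal(self) -> str:
        # The full details are released only after payment has been recorded.
        if not self.paid:
            raise PermissionError("details stay sealed until payment is recorded")
        return self._sealed_details


report = BrokeredReport("full exploit write-up")
report.attest("critical")
report.record_payment()
print(report.unseal())  # full exploit write-up
```

Calling `unseal()` before `record_payment()` raises `PermissionError`, which is the property the researcher cares about.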

What EigenCompute uniquely enables: A neutral broker that doesn't require either side to take the other's word for it. Security researchers get cryptographic proof the exploit won't leak before payment. Companies get the same kind of proof that the severity score came from code they can audit. Both proofs come from a single mechanism: the exploit runs in a box only the deployed code can see into.

Dokimos

Repo by Pranavi Rohit

Every time you rent an apartment, open a bank account, or sign up for a service that asks "are you over 18?", you need to hand over your full government ID and a picture of yourself. The verifier typically never needs the document itself. Instead, they need a single attribute from it. However, existing identity providers (like Jumio and Persona) all work the same way: they ingest your raw document, and you trust them not to keep a copy.

Dokimos (a Greek word for "tested and proven", like a coin checked against a touchstone) is a verify-once identity primitive that flips the model. You upload a government ID and a picture into a secure enclave. The backend then runs OCR, converting text inside images into machine-readable text. However, it does this without anyone (including Pranavi as the developer) being able to see the document in plaintext. From then on, every relying party that asks who you are gets a signed cryptographic attestation of only the attributes they need (18+, nationality, etc.), not the document itself. The system has both a consumer vault for managing your verified attributes and a verifier dashboard for businesses on the other side of the transaction.
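The attribute-only disclosure described above can be sketched in a few lines. This is a minimal illustration, not Dokimos's actual code: the `derive_attributes` and `disclose` helpers and the attribute names are hypothetical, and the enclave, OCR, and signing steps are left out.

```python
from datetime import date


def derive_attributes(date_of_birth: date, nationality: str,
                      id_expiry: date, today: date) -> dict:
    """Reduce parsed ID fields to the coarse attributes a verifier might ask for."""
    # Age in whole years: subtract one if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    age = today.year - date_of_birth.year - (0 if had_birthday else 1)
    return {
        "over_18": age >= 18,
        "nationality": nationality,
        "id_not_expired": id_expiry >= today,
    }


def disclose(attributes: dict, requested: list[str]) -> dict:
    """Return only the attributes the relying party asked for, nothing else."""
    return {name: attributes[name] for name in requested if name in attributes}


attrs = derive_attributes(date(2000, 5, 17), "CA", date(2030, 1, 1), today=date(2026, 5, 26))
print(disclose(attrs, ["over_18"]))  # {'over_18': True}
```

The relying party in this sketch never receives the date of birth, only the boolean derived from it.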

What EigenCompute uniquely enables: Identity verification shifting from "trust X company" to "verify X attestation." A verifier can follow a chain of independent checks and confirm that the attestation came from exactly the code they audited, running on hardware nobody can see into. For builders shipping anything that needs KYC features without holding personally identifiable information, this is a potential solution.

Nostos

Repo by Pranavi Rohit

Nostos is an example of what you build on top of Dokimos. Renting an apartment requires sending your driver's license and a picture to a landlord (or to whatever third-party service the landlord plugged into their website), and hoping nobody keeps it. As we discussed earlier, the landlord doesn't actually need your document. They need to know you're over 18, that the ID hasn't expired, and that the face matches the person signing the lease.

Nostos is a proof-of-concept of that. The renter goes through a Dokimos verification once, then talks to a rental agent chat that walks them through mock NYC listings. The landlord sees verified attributes (without personally identifiable information) on their dashboard and can independently verify the attestation came from the deployed identity service. There's also an integration page for developers who want to embed the same flow into their own rental platforms.

What EigenCompute uniquely enables: The entire agent-mediated workflow stays trust-minimized: the rental agent that talks to the renter, the verification process that issues the attestation, and the landlord-side check, with signing keys bound to specific deployments. The renter doesn't have to trust the rental agent not to leak their ID because the rental agent never sees it. The landlord doesn't have to trust the rental agent's attestation, because they can independently check the signature, the on-chain image record, and the source code. Renting is one example, but this strategy can be applied anywhere an agent stands between a user and a service that wants their documents.

Sealed

Repo by @Victorchenca

When you're trying to figure out if you're underpaid in your job, you need to know what the four other engineers at your level on your team actually make. Levels.fyi and Glassdoor give you a band (say $180k to $250k), but the band doesn't tell you whether you should be at $190k or $230k, or whether the new hire who started last month got a signing bonus you didn't. That conversation never happens because whoever shares first commits to a number, and the other person then gets to decide whether to reciprocate or deflect.

Sealed allows coworkers to "share simultaneously or not at all". A small group of peers at the same company, role, and level privately submit their compensation. Inside the enclave, a classifier validates each submission against a public rubric. Once enough valid submissions accumulate, every submission is revealed simultaneously, with medians and percentiles surfaced alongside. Until then, nobody (including the app developer) can read any individual submission.
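The "reveal everything at once or nothing at all" rule is easy to sketch. The example below is an assumption-laden illustration, not Sealed's code: `SealedPool`, its threshold parameter, and the trivial validity check are hypothetical, and the enclave and the rubric classifier are not modeled.

```python
import statistics


class SealedPool:
    """Holds submissions privately and reveals them only once a threshold is met."""

    def __init__(self, threshold: int):
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        self._submissions: dict[str, int] = {}

    def submit(self, member_id: str, total_comp: int) -> None:
        # A real system would validate against the public rubric here;
        # this sketch only rejects non-positive numbers.
        if total_comp <= 0:
            raise ValueError("compensation must be positive")
        self._submissions[member_id] = total_comp

    def reveal(self):
        """Return None until enough submissions exist, then everything at once."""
        if len(self._submissions) < self.threshold:
            return None
        values = sorted(self._submissions.values())
        return {
            "submissions": dict(self._submissions),
            "median": statistics.median(values),
            "quartiles": statistics.quantiles(values, n=4),
        }


pool = SealedPool(threshold=3)
pool.submit("a", 190_000)
pool.submit("b", 230_000)
print(pool.reveal())  # None
pool.submit("c", 210_000)
print(pool.reveal()["median"])  # 210000
```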

What EigenCompute uniquely enables: There is no privacy policy to trust. The running code is provably the public code on GitHub, the classifier weights are verified at build time, and the wallet that signs the final reveal is bound to the specific deployed image. Salary disclosure is the demo in this instance, but the ability to collect input where everyone wants to share and nobody wants to be first covers other scenarios like performance reviews or even whistleblower cases.

Vienna

Repo by @Victorchenca

Diplomacy is a strategy board game where players control pre-WWI European countries and write secret orders for their armies each turn. Nobody can see what anyone else wrote until all orders are revealed simultaneously. That means someone has to receive every player's secret orders, read them, and adjudicate the whole turn at once. Historically, that someone has had to be a human you trust. Online Diplomacy servers today (webDiplomacy, Backstabbr, PlayDiplomacy) solve this by asking you to trust the company hosting them. Vienna moves the game master inside a hardware enclave on EigenCompute, so the entity reading the secret orders is code nobody can see into.

Player orders are encrypted to a key that only exists inside the enclave. Every turn's resolution is signed by the same enclave. AI seats fill empty slots when humans drop, and they pay for their own Claude inference through agent-to-agent payments. The signed winner attestation at game end is structured to trigger an on-chain USDC payout, so the game can settle itself without a human in the loop.

What EigenCompute uniquely enables: This game architecture only works if you can prove that the box reading the secret orders isn't quietly leaking them or rigging the resolution. Without that, Diplomacy can't escape the trusted human in the loop. With it, the entire category of hidden-information games (like poker or even sealed-bid auctions) becomes possible to run as autonomous software.

ProofJudge

Repo | Writeup by @SethGammon

When an AI agent judges someone else's work (a pull request, a research deliverable, a vote recommendation), you have no way to confirm the judgment came from the agent that was deployed and not from a human who reached in and rewrote the verdict afterward. This gap kills any flow where work needs to get paid out automatically on acceptance, because there's no impartial witness to whether the work was actually accepted. It also blocks something bigger. Agent marketplaces only work if agents can maintain reputations the way humans do on LinkedIn or through other professional track records, with verifiable records of what each agent has actually done.

ProofJudge fixes that. It's a verifiable acceptance layer for autonomous work. You submit code, research output, or a governance proposal; ProofJudge evaluates it against an explicit rubric and hands back a signed decision receipt that anyone can verify was not altered after the fact. Seth shipped four flavors of the same product, each running as its own deployment: Code (bounty and PR acceptance), Research (sourced deliverables), Negotiation (deal-term compliance), and Governance (pre-vote risk receipt).

What EigenCompute uniquely enables: Every verdict is bound to a specific deployed version of the agent. The decision carries the hash of the work submitted, the hash of the verdict itself, the signature, and the deployment identity. If anybody alters any of those after the fact, verification breaks. ProofJudge doesn't necessarily claim its verdicts are objectively correct, but it does prove which deployed agent produced which verdict, and that the verdict you're looking at hasn't been rewritten after the fact.
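A receipt with those four parts can be sketched with the standard library. Treat this as an illustration under stated assumptions rather than ProofJudge's format: the field names and helper functions are hypothetical, and an HMAC with a demo key stands in for the enclave-bound signature (a real deployment would presumably use an asymmetric key so verifiers never hold the signing secret).

```python
import hashlib
import hmac
import json

# Stand-in for the enclave-held key. HMAC keeps this sketch stdlib-only;
# it is NOT how an enclave-bound signing key would actually work.
DEMO_KEY = b"demo-key-not-a-real-enclave-secret"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _payload(receipt: dict) -> bytes:
    # Canonical encoding of the signed fields, so key order cannot change the signature.
    fields = {k: receipt[k] for k in ("work_hash", "verdict_hash", "deployment_id")}
    return json.dumps(fields, sort_keys=True).encode()


def issue_receipt(work: bytes, verdict: str, deployment_id: str) -> dict:
    receipt = {
        "work_hash": sha256_hex(work),
        "verdict_hash": sha256_hex(verdict.encode()),
        "deployment_id": deployment_id,
    }
    receipt["signature"] = hmac.new(DEMO_KEY, _payload(receipt), hashlib.sha256).hexdigest()
    return receipt


def verify_receipt(receipt: dict, work: bytes, verdict: str) -> bool:
    """Check the signature over the receipt fields, then check both content hashes."""
    expected = hmac.new(DEMO_KEY, _payload(receipt), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, receipt["signature"])
        and receipt["work_hash"] == sha256_hex(work)
        and receipt["verdict_hash"] == sha256_hex(verdict.encode())
    )


receipt = issue_receipt(b"patch contents", "ACCEPT", "deployment-v1")
print(verify_receipt(receipt, b"patch contents", "ACCEPT"))  # True
print(verify_receipt(receipt, b"patch contents", "REJECT"))  # False
```

Changing the work, the verdict, or the deployment identity after issuance makes `verify_receipt` return `False`, which mirrors the "verification breaks" behavior described above.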

GuardX

Repo | Writeup by @Avinashnayak27

When your AI agent is about to call another agent and pay it, you'd like to know that the other agent isn't going to drain your wallet or leak your secrets.

Avinash built a pre-transaction safety check for agent-to-agent payments. You give it an app address, and GuardX reconstructs the exact code that was deployed and runs an AI audit covering the scenarios that can actually go wrong. These include endpoint dishonesty (does the service do what it claims?), mnemonic and wallet exposure, admin exploit paths, and risky third-party dependencies. The output is a verdict your agent can act on before it pays: ALLOW, LIMIT, REVIEW, or BLOCK.

He pointed GuardX at a deployment of a different app he built (Mnemonic Hunt) and GuardX returned BLOCK with a 5/100 safety score. It revealed that the paid clue endpoint ignored the indices the user requested and returned random words instead, that clue generation was leaking actual mnemonic material to OpenAI, and that the admin sweep endpoint had no authentication.
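How a calling agent might act on one of the four verdicts can be sketched as a small policy function. The verdict names come from GuardX; everything else here is an assumption for illustration, including the `approved_payment` helper, the idea that LIMIT means "pay up to a cap", and the cap value itself.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    LIMIT = "LIMIT"
    REVIEW = "REVIEW"
    BLOCK = "BLOCK"


def approved_payment(verdict: Verdict, requested_usdc: float,
                     limit_cap_usdc: float = 1.0) -> float:
    """Return how much the calling agent should pay right now, given the audit verdict."""
    if verdict is Verdict.ALLOW:
        return requested_usdc
    if verdict is Verdict.LIMIT:
        # Pay, but never more than the configured cap (hypothetical interpretation).
        return min(requested_usdc, limit_cap_usdc)
    # REVIEW waits for a human decision and BLOCK refuses outright; neither pays now.
    return 0.0


print(approved_payment(Verdict.BLOCK, 5.0))  # 0.0
print(approved_payment(Verdict.LIMIT, 5.0))  # 1.0
```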

What EigenCompute uniquely enables: GuardX runs in its own enclave with its own enclave-bound signing key, so the verdict it gives you is verifiable as having come from the deployed GuardX, not from someone who intercepted the call. It also reads from the on-chain record of every other deployed app to reconstruct the exact code running. The entire loop of auditing a verifiable app with another verifiable app before paying it becomes possible with EigenCompute.

Heirloom

Repo by @Sanjith020

Roughly 3.8 million BTC (about 20% of the supply) is estimated to be permanently inaccessible due to factors like lost keys with no recovery plan. At current prices that's more than $400B locked forever. Existing solutions come with tradeoffs. Custodial services ask you to trust a single company to still be around and honor its terms decades from now. Meanwhile, multisigs with family or lawyers require every signer to stay crypto-literate and keep their key safe for the rest of their life.

Heirloom is a crypto inheritance agent that checks in with you on a schedule you set, and distributes your assets to your beneficiaries if you stop responding. The check-in is a selfie hash with a wallet signature. If the check-ins stop, the agent escalates through reminders, then emails to your emergency contacts. It then runs an LLM-driven analysis of your on-chain activity to confirm you really are unreachable. If that analysis finds no sign that you are still active, it executes the distribution across every chain you specified, split by the percentages you pre-configured.
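The escalation ladder and the percentage split can be sketched as two small functions. This is a hedged illustration, not Heirloom's implementation: the stage names, the missed check-in thresholds, and the rule for assigning leftover units are all assumptions made for the example.

```python
def escalation_stage(missed_checkins: int) -> str:
    """Map consecutive missed check-ins to a stage (thresholds are illustrative)."""
    if missed_checkins == 0:
        return "ok"
    if missed_checkins <= 2:
        return "remind_owner"
    if missed_checkins <= 4:
        return "email_emergency_contacts"
    return "analyze_onchain_activity"


def split_balance(balance_units: int, percentages: dict[str, int]) -> dict[str, int]:
    """Split an integer balance (in smallest units) by whole-number percentages."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    payouts = {who: balance_units * pct // 100 for who, pct in percentages.items()}
    # Integer division can leave a few units over; give them to the largest share.
    remainder = balance_units - sum(payouts.values())
    largest = max(percentages, key=percentages.get)
    payouts[largest] += remainder
    return payouts


print(escalation_stage(3))  # email_emergency_contacts
print(split_balance(1_000_001, {"alice": 60, "bob": 40}))
# {'alice': 600001, 'bob': 400000}
```

Working in integer units avoids floating-point rounding, so the payouts always add up to exactly the original balance.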

What EigenCompute uniquely enables: The agent holding your seed has to be an entity nobody can compromise. On a normal cloud, the operator could read your seed at any time, and you'd have no way to know. On EigenCompute, the seed is sealed with a key derived inside the enclave from a secret that only the platform injects at runtime. Every escalation step is signed by an agent wallet that only the deployed enclave can produce, so when your beneficiaries get paid, they can prove it was the agent acting.

Health Agent

Repo | Writeup by @AkloukAudeh

In January 2026, OpenAI launched ChatGPT Health. To use it, you upload your heart rate, sleep, workouts, conditions, every metric your Apple Watch tracks, to OpenAI's servers. You have to trust that OpenAI doesn't train on it, sell it to insurers, share it with advertisers, or hand it over on a subpoena. The problem is that there’s no way to verify any of that.

Health Agent is the same product built so the operator physically cannot read your data. You upload your Apple Health export, and the agent parses it inside a hardware enclave on EigenCompute. It stores the data encrypted at rest with keys nobody outside the enclave can access, and answers questions ("how does my recovery compare to last month?") grounded in your real numbers. Every response comes with a receipt that contains a hash of the question, a hash of the answer, the model used, and the list of cohort contributors whose patterns shaped the answer.

A caveat is that in v1 of this proof of concept, the LLM call itself still routes out to Anthropic so the model provider sees the prompt. However, the v2 roadmap aims to move inference inside the enclave too. Everything else (parsing, storage, cohort matching, receipt structure) is already in the trust boundary.

What EigenCompute uniquely enables: A two-sided marketplace where neither side has to trust the other. The user uploads data the operator can't read. A buyer (an insurance actuary or a sports performance team) pays per query to get answers. The receipt cryptographically records which contributors' patterns shaped each answer, and the query fee splits to them programmatically. Importantly, contributors get paid for influence on a buyer's answer.

Eigenized Gazette

Repo | Writeup by @Megabyte0x

AI news summaries typically hide their work. If you ask for a summary of an article, you get a confident paragraph with no way to see what the model picked up, what it ignored, how it framed the piece, or whether it ever considered the other side.

Eigenized Gazette addresses this by taking one article URL and running two AI agents on the same source material at the same time. One agent builds the strongest pro case, while the other builds the strongest contra case. A third agent compares them and surfaces where they agree and where they don't. This workflow is exposed as a paid endpoint that other agents can call and pay for autonomously in USDC. The calling agent pays Eigenized Gazette and gets back a signed two-sided brief it can pass to whatever it's working on.

What EigenCompute uniquely enables: Every agent output is bound into a signed manifest pointing back to the exact deployed version that produced it. A standalone replay tool can take any saved manifest and re-run the workflow to confirm the output matches.
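The replay check can be sketched as "hash the output, store the hash in a manifest, re-run, compare". The sketch below is an illustration under assumptions, not the project's replay tool: the manifest fields and function names are hypothetical, the signature is omitted, and `toy_workflow` is a deterministic placeholder. How the real tool deals with non-deterministic model output is not covered by the source and is not modeled here.

```python
import hashlib
import json
from typing import Callable

Workflow = Callable[[str], dict]


def output_hash(output: dict) -> str:
    """Hash a canonical JSON encoding so equal outputs always hash equally."""
    return hashlib.sha256(json.dumps(output, sort_keys=True).encode()).hexdigest()


def build_manifest(article_url: str, deployment_id: str, workflow: Workflow) -> dict:
    return {
        "article_url": article_url,
        "deployment_id": deployment_id,
        "output_hash": output_hash(workflow(article_url)),
    }


def replay_matches(manifest: dict, workflow: Workflow) -> bool:
    """Re-run the workflow on the recorded input and compare hashes."""
    return output_hash(workflow(manifest["article_url"])) == manifest["output_hash"]


def toy_workflow(url: str) -> dict:
    # Deterministic placeholder for the pro, contra, and comparison agents.
    return {"pro": f"case for {url}", "contra": f"case against {url}"}


manifest = build_manifest("https://example.com/article", "deployment-v1", toy_workflow)
print(replay_matches(manifest, toy_workflow))  # True
```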

Hecate

Repo | Writeup by @EureCory

When an autonomous agent wants to swap crypto tokens, it faces the same problem a human trader does. Submitting an intent publicly means bots watching the chain see it and trade against you before your swap executes, leaving you with a worse price. Meanwhile, submitting to a private system means trusting the operator not to do the same thing themselves.

Hecate is a fresh attempt at this design space. It is a TEE-mediated engine for agent intents with richer constraints than humans usually express (limit, partial-fill, min fill, deadlines, fallbacks, max impact). Agents submit encrypted intents. The engine matches them in a confidential batch and emits two kinds of receipts: a public batch receipt that proves the batch happened, and a private fill receipt only the owning agent can fetch.
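To make the constraint vocabulary concrete, here is a minimal sketch of checking whether one buy intent and one sell intent can cross. It is not Hecate's matching engine: the `Intent` fields, the batch-number deadline, and the midpoint pricing rule are assumptions chosen for the example, and encryption and batching are left out.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    agent: str
    side: str           # "buy" or "sell"
    amount: float       # maximum base-token amount to trade
    limit_price: float  # worst acceptable price (max for a buy, min for a sell)
    min_fill: float     # smallest fill the agent will accept
    deadline: int       # last batch number in which the intent is valid


def match_pair(buy: Intent, sell: Intent, batch: int):
    """Return (fill_amount, price) if the two intents cross, else None."""
    if buy.side != "buy" or sell.side != "sell":
        raise ValueError("expected a buy intent and a sell intent")
    if batch > buy.deadline or batch > sell.deadline:
        return None  # at least one intent has expired
    if buy.limit_price < sell.limit_price:
        return None  # prices do not cross
    fill = min(buy.amount, sell.amount)  # the larger side is partially filled
    if fill < buy.min_fill or fill < sell.min_fill:
        return None  # fill too small for one of the agents
    price = (buy.limit_price + sell.limit_price) / 2  # midpoint, an arbitrary choice
    return fill, price


buy = Intent("agent-a", "buy", amount=100.0, limit_price=2.5, min_fill=50.0, deadline=7)
sell = Intent("agent-b", "sell", amount=60.0, limit_price=2.0, min_fill=60.0, deadline=7)
print(match_pair(buy, sell, batch=5))  # (60.0, 2.25)
```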

What EigenCompute uniquely enables: The matcher has to see every agent's private constraints to find the best cross, which means it has to be an entity nobody needs to trust. On a normal cloud, the operator can front-run or selectively match. On EigenCompute, the running engine is provably the code published, and the server can refuse to start if its attestation metadata doesn't match the deployment record.

Vanta

Repo | Writeup by @Oxwizzdom

Polymarket crossed billions in monthly trading volume earlier this year and holds hundreds of millions of dollars in open positions. If you hold a position, your capital is stuck until the market resolves or you sell early at a discount. There's no clean way to borrow against your bet because prediction market positions are volatile and hard to price.

Vanta is an autonomous lender for those positions. The core is a council of three lending agents (one Claude-backed, one GPT-backed, one Gemini-backed), each with a different style. Claude is conservative and long-horizon, GPT is fast and tactical, and Gemini focuses on politics and policy markets. You shop between them the way you'd shop between three banks. The UI is a fun walkable fantasy world with three kingdoms (one per agent persona) where NPCs are each kingdom's underwriting council. When you submit a loan request, the NPCs give their signed opinions and synthesize a verdict.

What EigenCompute uniquely enables: Agents that can own their wallets, pay their own bills out of fees they earn, and run code that's provably the public source.

Mnemonic Hunt

Repo | Writeup by @Avinashnayak27

Mnemonic Hunt is a paid puzzle game with the prize sitting inside its own trap. A 24-word seed phrase controls a wallet holding USDC. Players pay per request for a single AI-generated image that combines visual clues for the word positions they asked about. The goal is to reconstruct the seed phrase and claim the prize wallet before a time-locked admin sweep.

What EigenCompute uniquely enables: The seed phrase that controls the prize wallet is generated and injected at runtime by the platform. The agent runs the game on top of a secret only the enclave can see, which is why nobody, including the admin, knows the answer. Although Mnemonic Hunt is a puzzle game, this pattern also works for other implementations like auctions and sealed-bid markets.

Governance Agent

Repo | Writeup by @EureCory

DAO governance has a turnout problem. Most token holders don't vote because reading every proposal is basically a part-time job. The few delegates who do vote are a small centralization risk. There is a new wave of AI governance agents that propose to fix that gap, but almost all of them collapse to the same shape. You chat with a model, get an opaque recommendation, and trust the model.

Cory's governance agent works on a different premise. You write your governance values in plain language. The AI reads them and turns them into a checklist of rules. From there, it's the rules that decide each vote. The AI is effectively a translator at the start, while the rules are the judge after that. You can edit the rules and save versions of them, and prove later which version produced any given vote. There's also a preview screen that lets you edit a rule and see how your vote would have changed across dozens of past DAO proposals.
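The "rules are the judge" idea can be sketched as a versioned rule list plus a deterministic evaluator. This is a minimal illustration and not the project's schema: the rule format, the field names, the first-match-wins behavior, and the truncated version hash are all assumptions.

```python
import hashlib
import json

# Each rule names a proposal field, a comparison, a threshold, and the vote it forces.
POLICY_V1 = [
    {"field": "treasury_spend_usd", "op": "gt", "value": 1_000_000, "vote": "AGAINST"},
    {"field": "changes_admin", "op": "eq", "value": True, "vote": "AGAINST"},
]


def policy_version(rules: list) -> str:
    """Hash of the canonical rule list, so a vote can cite the exact policy version."""
    return hashlib.sha256(json.dumps(rules, sort_keys=True).encode()).hexdigest()[:12]


def decide(rules: list, proposal: dict, default: str = "FOR") -> dict:
    """First matching rule decides the vote; no model is consulted at this stage."""
    for rule in rules:
        actual = proposal.get(rule["field"])
        if rule["op"] == "gt":
            hit = actual is not None and actual > rule["value"]
        else:  # "eq"
            hit = actual == rule["value"]
        if hit:
            return {"vote": rule["vote"], "policy_version": policy_version(rules)}
    return {"vote": default, "policy_version": policy_version(rules)}


proposal = {"treasury_spend_usd": 2_500_000, "changes_admin": False}
print(decide(POLICY_V1, proposal)["vote"])  # AGAINST

# Preview: raise the spend threshold and see how the same proposal would be voted.
policy_v2 = [dict(POLICY_V1[0], value=5_000_000), POLICY_V1[1]]
print(decide(policy_v2, proposal)["vote"])  # FOR
```

Because the version string is derived from the rules themselves, editing any rule produces a different version, which is what lets a later audit tie a vote to the policy that produced it.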

What EigenCompute uniquely enables: Each user gets their own voting wallet that lives inside the hardware enclave, locked to a specific build of the agent. Months later, you can prove which version of your policy produced any vote and that it could only have come from the agent that was actually running. This same pattern fits trading agents, procurement agents, or anything where you need to prove the agent acted on the policy it claimed.

Verifiable DAO Proposal Risk Agent

Repo | Writeup by @Megabyte0x

DAO voters are often asked to approve code they rarely have time to read and analyze. A forum post might say "raise a supply cap," but the actual code being approved could also contain exploits like handing over admin control of the protocol. Existing simulation dashboards help, but the voter still has to trust the dashboard.

This agent turns proposal review into a verifiable check the voter can run before deciding. You give it a link to a governance proposal, and the agent pulls the actual code the vote would execute to run it in a safe sandbox. It investigates what would change, compares the protocol state before and after, and flags the proposal as SAFE, WARNING, or CRITICAL.
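The before-and-after comparison can be sketched as a state diff followed by a simple classification. This is an illustrative sketch, not the agent's rule set: the `CRITICAL_KEYS` list, the notion of "expected" keys taken from the proposal's stated intent, and both function names are assumptions.

```python
# Hypothetical: fields whose change implies a transfer of control.
CRITICAL_KEYS = {"admin", "owner"}


def diff_state(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for every key whose value changed."""
    keys = set(before) | set(after)
    return {k: (before.get(k), after.get(k))
            for k in sorted(keys) if before.get(k) != after.get(k)}


def classify(changes: dict, expected_keys: set) -> str:
    """CRITICAL if control changed, WARNING if anything unannounced changed, else SAFE."""
    if CRITICAL_KEYS & set(changes):
        return "CRITICAL"
    if set(changes) - expected_keys:
        return "WARNING"
    return "SAFE"


before = {"supply_cap": 100, "admin": "0xDAO", "fee_bps": 30}
after = {"supply_cap": 200, "admin": "0xEVE", "fee_bps": 30}
changes = diff_state(before, after)
print(changes)  # {'admin': ('0xDAO', '0xEVE'), 'supply_cap': (100, 200)}
print(classify(changes, expected_keys={"supply_cap"}))  # CRITICAL
```

In this example the forum post only announced a supply cap change, so the unannounced admin change is what pushes the result to CRITICAL.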

What EigenCompute uniquely enables: DAO voters can now verify reports themselves instead of trusting whoever served them. The signature ties the analysis to a specific version of the agent and a specific set of rules, so anyone reviewing a vote can confirm the report came from the code they expect.

Final Thoughts

A lot of the demos in AI right now are about what models can do. These 15 projects are about what you can prove the models did, and what that unlocks. If you're building an AI agent that takes payments, holds sensitive data, or works alongside other agents, "does it work" is not enough. The agents that earn trust from here on are the ones that can show and prove their work.

To explore some of the other agent projects built with EigenCompute, check out our ecosystem site. And if you're ready and want to build, check out our quickstart here.
