Litepaper

Whitepaper link: Read Whitepaper

What is Cinna and Why Cinna

Cinna is a decentralized GPU cloud built for AI compute. At its core, Cinna runs on a DePIN protocol where GPU owners can supply compute resources without permissions, and users can run AI or other GPU-intensive workloads through simple API or SDK access. With a scalable and cost-efficient infrastructure, we are creating an open ecosystem that enables developers to build, deploy, and monetize innovative AI solutions.
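To make this concrete, here is a minimal sketch of what a serverless inference request could look like over HTTP. The base URL, route, authentication scheme, and payload fields are hypothetical placeholders, not a confirmed Cinna interface:

```python
# A minimal sketch of a serverless inference call. The endpoint URL,
# route, and payload fields are hypothetical; consult the Cinna API
# reference for the actual interface.
import requests

CINNA_API = "https://api.cinna.ai/v1"  # hypothetical base URL

resp = requests.post(
    f"{CINNA_API}/inference",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "pod": "stable-diffusion-xl",  # hypothetical pod identifier
        "input": {"prompt": "a watercolor fox in a forest"},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```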

Current GPU clouds have introduced models similar to an “Airbnb for GPUs” where users rent GPU access directly. However, these setups often require technical know-how and incur additional labor costs for managing GPU machines. Scaling resources dynamically is also a challenge. These limitations have slowed the adoption of GPU-based DePIN models.

Cinna integrates serverless compute into its core protocol. Serverless means the infrastructure is abstracted away from the application layer. There are no virtual machines to manage and no complex software frameworks to configure. Cinna follows a Platform-as-a-Service model, comparable to services like AWS SageMaker or HuggingFace, offering dynamic scaling of resources based on real-time demand. The infrastructure burden is hidden behind clean, intuitive APIs and user-friendly frontends.

The main focus of our infrastructure is AI inference. Around ninety percent of compute in an AI model’s lifecycle is consumed during inference. While training AI models demands clusters of interconnected GPUs in optimized data centers, inference can run effectively on a single GPU or a machine with a few GPUs. This makes inference a perfect workload for a decentralized GPU network like Cinna.

Cinna supports a variety of GPU-based tasks beyond AI inference. These include training small language models, model fine-tuning, and ZK proof generation. The common trait among these tasks is low communication demand across nodes, which allows us to efficiently schedule and allocate GPU resources globally.

Figure: Global generative AI market growth. Source: https://www.statista.com/outlook/tmo/artificial-intelligence/generative-ai/worldwide

To demonstrate the power of Cinna’s serverless GPU DePIN, the Cinna team has built several free-to-use AI apps and services, including:

Cinna Imagine

Image generator supporting a range of fine-tuned Stable Diffusion and Flux models.

AI-powered search engine optimized for fast, relevant results.

LLM Gateway

OpenAI-compatible API endpoint for seamless LLM integration.
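Because the gateway is OpenAI-compatible, the standard openai Python client can point at it simply by overriding base_url. The gateway URL and model name below are placeholders rather than confirmed values:

```python
# Using the standard openai client against an OpenAI-compatible gateway.
# Only base_url changes; the URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.cinna.ai/v1",  # hypothetical gateway URL
    api_key="<YOUR_CINNA_API_KEY>",
)

completion = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Explain DePIN in one sentence."}],
)
print(completion.choices[0].message.content)
```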

All tools are released under open-source licenses.

Explore the full Cinna ecosystem and integration partners at https://cinna.ai/ecosystem

Protocol Overview

Cinna is built to enable a permissionless, efficient, and user-centric decentralized GPU cloud. Below is a detailed breakdown of its core components:

Compute Layer

Compute Nodes: These are the basic units of computing power, representing full GPUs or fractional GPUs (leveraging techniques like NVIDIA’s Time Slicing and Multi-Instance GPU). Owners of both consumer and datacenter-grade GPUs can join as compute providers. The system is fully permissionless, with no lock-in periods, making it ideal for monetizing idle hardware.

Pods: These are self-contained, deployable units that run GPU workloads. Each pod defines its specific hardware and software needs, enabling precise matching with the right compute nodes. Pods can represent AI models, workflows that connect multiple models, fine-tuning services, or even ZK proof generators. Cinna automatically scales the number of nodes hosting each pod based on demand to maintain performance and cost-effectiveness.
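As a rough sketch of the idea, a pod definition pairs hardware and software requirements with a scaling policy. The field names below are illustrative assumptions, not the protocol's actual manifest schema:

```python
# An illustrative pod definition. Field names are hypothetical; the actual
# manifest schema is defined by the Cinna protocol.
pod_spec = {
    "name": "sdxl-image-gen",
    "image": "ghcr.io/example/sdxl-server:latest",  # container with the workload
    "hardware": {
        "gpu": {"min_vram_gb": 24, "allow_fractional": False},
    },
    "autoscaling": {
        "min_replicas": 0,        # scale to zero when idle
        "max_replicas": 32,
        "target_queue_length": 4, # jobs waiting per node before scaling up
    },
}
```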

Validation System: To ensure result integrity, Cinna uses a Solana-based crypto-economic validation mechanism. This includes staking models and tailored verification methods adapted to the nature of each workload, preserving trust and efficiency across the network.

Autoscaling Support

Cinna natively supports autoscaling for pod deployment. The system automatically scales up when workload demand increases and scales down during idle periods. The number of active compute nodes assigned to a pod is continuously adjusted using metrics like GPU utilization and job queue length, as well as input from token-based governance.

When no demand is detected, the system can fully scale to zero, eliminating both cost and resource consumption. This approach ensures high efficiency for workloads with fluctuating or sporadic processing needs.
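A minimal sketch of this scaling rule, with illustrative thresholds rather than the protocol's actual policy:

```python
# Simplified version of the scaling rule described above: replicas track
# queue length and GPU utilization, and drop to zero with no demand.
# Thresholds and the exact policy are illustrative, not the protocol's.
def desired_replicas(queue_length: int, avg_gpu_util: float,
                     current: int, max_replicas: int,
                     target_queue_per_node: int = 4) -> int:
    if queue_length == 0 and current == 0:
        return 0                           # stay fully scaled to zero
    if queue_length == 0 and avg_gpu_util < 0.05:
        return 0                           # idle: release all nodes
    # Enough nodes to keep per-node queues near the target, capped.
    needed = -(-queue_length // target_queue_per_node)  # ceiling division
    if avg_gpu_util > 0.9:
        needed = max(needed, current + 1)  # relieve saturated nodes
    return min(max(needed, 1), max_replicas)
```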

Orchestration Layer

Cinna is developing a sovereign Layer 2 chain to coordinate its decentralized GPU cloud and ecosystem operations. This chain is built as an Elastic Chain using the ZK Stack, unlocking several powerful advantages:

High throughput with minimal cost: In a pay-per-use cloud environment, micro-transactions are frequent. Cinna’s ZK-based architecture aggregates multiple state updates into single slot changes using a state-diff model, drastically lowering transaction costs and optimizing network efficiency (a simplified sketch of this batching appears after these three points).

Interoperability: As part of the Elastic Chain framework, Cinna can seamlessly interact with external services and partner protocols without relying on fragile or complex bridges, enabling smoother integration across the Solana ecosystem and beyond.

Sovereignty: Cinna’s operations are fully isolated from external Layer 2 or Layer 1 activity. This guarantees that core systems such as compute orchestration, resource allocation, and payment settlement remain unaffected by congestion or volatility elsewhere.
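To illustrate the state-diff batching described above, the toy model below applies many micro-transactions to the same storage slot and settles only the final difference. It is a simplification, not the actual ZK Stack encoding:

```python
# Toy state-diff model: many transactions touching the same storage slot
# settle as a single final diff, so on-chain cost scales with the number
# of slots changed, not with transaction count.
def state_diff(initial: dict, txs: list) -> dict:
    state = dict(initial)
    for slot, new_value in txs:          # apply every micro-transaction
        state[slot] = new_value
    # Only slots whose final value differs from the initial state are posted.
    return {k: v for k, v in state.items() if initial.get(k) != v}

# 1,000 payments crediting one provider's balance slot...
txs = [("provider_balance", i) for i in range(1, 1001)]
diff = state_diff({"provider_balance": 0}, txs)
print(diff)  # {'provider_balance': 1000} -- one slot change, not 1,000
```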

The Layer 2 sequencer also functions as a router for serverless compute requests. It intelligently directs tasks to optimal nodes based on hardware profiles, software requirements, uptime, and performance metrics. Once a compute job is successfully completed, the transaction is finalized on-chain, triggering payments to contributors. Failed or timed-out tasks can be retried or discarded without incurring on-chain costs, preventing wasteful spending and enhancing system responsiveness.
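A toy version of such a routing heuristic, filtering on hardware and software requirements and ranking by uptime and recent latency (the real sequencer logic is more involved):

```python
# Toy routing heuristic in the spirit described above: filter nodes that
# satisfy the pod's hardware/software profile, then rank by uptime and
# recent performance. All fields are illustrative.
from dataclasses import dataclass

@dataclass
class Node:
    id: str
    vram_gb: int
    software: set        # e.g. {"cuda-12", "pytorch-2.4"}
    uptime: float        # fraction of time online, 0..1
    avg_latency_ms: float

def route(nodes: list, min_vram: int, required_software: set) -> Node:
    eligible = [n for n in nodes
                if n.vram_gb >= min_vram and required_software <= n.software]
    if not eligible:
        raise RuntimeError("no eligible node; task can be retried off-chain")
    # Prefer reliable nodes, break ties by speed.
    return max(eligible, key=lambda n: (n.uptime, -n.avg_latency_ms))
```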

Application Layer

Frontends

Anyone can build frontends to interface with the Cinna protocol. In Cinna’s ecosystem, a “frontend” goes beyond traditional web interfaces — it includes any entry point through which users can access the decentralized AI cloud. This can be a web app, mobile interface, API gateway, or a custom integration tailored to specific industries.

Developers who build frontends receive a share of the revenue generated by their traffic, creating clear incentives to innovate and maintain user-facing layers. By not having a single, privileged interface, Cinna avoids central points of control and promotes open participation.


Payment Gateways

Cinna supports flexible payment options designed for both Web2 and Web3 environments. Two primary models are supported:

Pay-by-Developer

Developers can pre-purchase compute quotas as Compute Credits, similar to SaaS or PaaS structures. They absorb the infrastructure cost and profit from the difference between user revenue and backend compute expenses.

Pay-by-User

End users interact with applications directly through crypto wallets, paying compute fees per request via smart contracts. This approach removes upfront costs for developers and supports usage-based billing. It also unlocks new models, such as autonomous onchain agents that pay for their own compute dynamically.
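As a minimal sketch of the pay-by-developer flow, prepaid Compute Credits can be modeled as a balance debited per unit of GPU time. The class, prices, and units below are illustrative assumptions:

```python
# Minimal sketch of the pay-by-developer flow: prepaid Compute Credits
# are debited per request. Names, prices, and units are illustrative only.
class CreditAccount:
    def __init__(self, credits: float):
        self.credits = credits

    def charge(self, gpu_seconds: float, price_per_gpu_second: float) -> None:
        cost = gpu_seconds * price_per_gpu_second
        if cost > self.credits:
            raise RuntimeError("insufficient Compute Credits; top up required")
        self.credits -= cost

dev = CreditAccount(credits=100.0)
dev.charge(gpu_seconds=12.5, price_per_gpu_second=0.002)  # one inference job
print(f"remaining credits: {dev.credits:.3f}")
```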

Alignment-Centric Economics

Cinna’s economic structure is built around the principle of alignment. Every participant is incentivized to contribute toward shared goals such as expanding compute capacity, improving resource efficiency, enabling innovative applications, and upholding the network’s long-term commitments to scalability, transparency, and open-source development.

Figure: Flow of value in the economy.

Economic Primitives

1. Protocol Emission

Cinna distributes tokens to compute providers based on the workloads they handle. The emission model includes a base reward that guarantees a minimum payout regardless of overall network activity, and a dynamic reward that adjusts according to real demand for GPU compute. This structure supports scalable growth while keeping the emission rate predictable and aligned with real usage.

2. Voting for Compute Nodes

Cinna allows token holders to assign voting power to specific compute providers. This voting directly impacts the reward multiplier applied to each provider’s protocol emissions. In return, compute providers can optionally share a portion of their mining rewards with voters as a tip, offering an added incentive for active participation in network governance. A simplified reward sketch combining these first two primitives appears after this list.

3. Revenue Sharing Between Compute Nodes and Pods

When users pay for compute resources that host a pod, part of that payment is shared with the person who deployed the pod. This rewards developers for creating and maintaining valuable pods, such as widely used open-source AI models that drive consistent demand.

4. Voting for Pods

Token holders can assign voting power to specific pods, which influences how much they can scale within the network. A portion of the revenue each pod generates is shared with its voters, providing a clear financial incentive for supporting valuable and active pods.

5. Revenue Sharing Between Compute Nodes and Frontends

When a user makes an onchain payment through a frontend, the smart contract defines how revenue is split between the compute provider and the frontend operator. Frontends can also direct compute requests to specific nodes, enabling curated performance and deeper integration.
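A simplified model of primitives 1 and 2, combining the base reward, a demand-scaled dynamic reward, and a voting multiplier. All names and coefficients are illustrative, not protocol parameters:

```python
# Simplified model of primitives 1 and 2: a provider's emission combines a
# guaranteed base reward with a demand-scaled dynamic reward, then applies
# a multiplier from token-holder votes. Coefficients are illustrative.
def provider_emission(base_reward: float,
                      dynamic_pool: float,
                      provider_work: float,
                      total_work: float,
                      vote_multiplier: float = 1.0) -> float:
    work_share = provider_work / total_work if total_work else 0.0
    return (base_reward + dynamic_pool * work_share) * vote_multiplier

# A provider handling 10% of network workload, boosted 1.2x by voters:
print(provider_emission(base_reward=5.0, dynamic_pool=1_000.0,
                        provider_work=10.0, total_work=100.0,
                        vote_multiplier=1.2))  # 126.0
```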

These core primitives shape Cinna’s alignment-driven architecture. The ecosystem is designed to consistently reward those who contribute meaningful value, whether by supplying compute, building impactful applications, or curating high-quality resources. All generated value is preserved and reinvested within the network, reinforcing long-term sustainability and collective growth.

Real World Interactions

These economic primitives form a dynamic and interconnected network where participants can take on multiple roles to maximize their benefits. Here is how these components function in real world scenarios:

Data center operators can monetize their existing infrastructure by contributing GPUs as compute providers and also hosting user interfaces as frontend operators. By playing both roles, they earn from protocol emissions and user payments. Prioritizing their own compute nodes for frontend traffic helps them keep their hardware fully utilized, while still being able to scale beyond local limits by tapping into the broader network.

Individual GPU owners can engage in multiple ways to increase rewards. As compute providers, they receive protocol emissions. They can also stake Cinna tokens to vote for their own nodes, high-performing compute operators, or valuable workflows. This staking and voting system adds another layer of income and helps guide the allocation of network demand.

AI developers can deploy and monetize their AI models or workflows and receive a portion of the revenue from their use. They also have the option to become frontend operators, building interfaces for their applications and gaining direct control over the user experience. This role allows them to fully integrate into Cinna’s monetization and scalability model.

The economic structure within the Cinna ecosystem encourages a cycle of efficient resource use and reliable service. The CINNA token is embedded in every action within the network, creating a positive feedback loop where ecosystem growth directly supports every participant.
