Free Tiers for Popular LLM Providers (2026) and How to Get Them
Executive Summary
This report compares the free tiers of several widely used large language model (LLM) and inference platforms as of early 2026 and provides step‑by‑step guidance on how to activate and use each one for development and experimentation.
It focuses on Google AI Studio (Gemini API), OpenRouter, DeepSeek, NVIDIA Build (NIM), GitHub Models, Hugging Face Inference, SambaNova Cloud, Cloudflare Workers AI, and Together AI.[1][2][3][4][5][6][7][8][9]
The offerings vary from permanently free, rate‑limited access to specific models through to one‑time or recurring credit pools that can be consumed across many models.
Most platforms allow you to start without a credit card and are designed primarily for prototyping rather than high‑volume production workloads.[4][5][6][7][1]
High‑Level Comparison
| Provider | What you get for free (typical) | Key limits (indicative) | Card required to start? | Primary activation path |
|---|---|---|---|---|
| Google AI Studio (Gemini API) | Free tier for Gemini models, including fast models like Gemini 2.5 Flash | Per‑project RPM/TPM/RPD caps; guides cite ~60 RPM and ~1M TPM for Flash; example 1.5k RPD[1][10][11][12] | No for free tier; billing only needed to upgrade[1][12] | Create AI Studio project, generate API key, stay on Free usage tier[1][10] |
| OpenRouter | 20+ free models via Free Models Router and model‑specific :free variants | Availability and limits vary by model; free router chooses from current free pool[2][13] | No[2] | Sign up, get API key, select models with “(free)” or add :free suffix[2][13] |
| DeepSeek API | One‑time 5M free tokens for new accounts (approx. a few USD of usage) | Aggregator reports no hard rate limits; free pool can change over time[3][14] | No credit card required for free tokens[3] | Sign up at DeepSeek platform, generate API key, use until credits consumed[3][14] |
| NVIDIA Build / NIM | Around 1,000 free inference credits for NIM APIs at signup; can request more | Guide reports up to 5,000 total credits and ~40 RPM cap[15][16] | Usually no card for initial credits; corporate email needed for larger grants[15] | Join NVIDIA Build / NIM program, create project, generate API key[15][16] |
| GitHub Models | Rate‑limited free API usage for all GitHub accounts; access to GPT‑4‑class, Llama 3.1, Mistral, etc. via GitHub Models catalog[8][17][18] | Limits vary by model and Copilot plan; examples: 10–15 RPM and 50–150 RPD depending on tier[18] | GitHub account only; no separate card for free usage[8][18] | Enable GitHub Models from Marketplace, use playground or API token with free limits[8][18] |
| Hugging Face Inference (Inference Providers / hf‑inference) | Small recurring free credit pool (around 0.10 USD worth of serverless inference per month for Free tier) | Hard stop when credit exhausted on Free tier; higher tiers get larger pools[4][19] | HF account; additional pay‑as‑you‑go requires credits/card[4] | Create account, call Inference Providers or hf-inference endpoints until credits are exhausted[4][19] |
| SambaNova Cloud | Developer Tier with about 5 USD of free credit for new and existing developers (tens of millions of tokens on small models) | Free credit expires after 3 months; then paid usage only[9][20] | No card needed for initial developer credits per aggregator; payment needed beyond that[5][9] | Sign up to SambaCloud, join Developer Tier, use API until credits run out[5][9] |
| Cloudflare Workers AI | Ongoing free tier: 10,000 "neurons" per day (roughly 100–500 requests depending on model) | Daily neuron cap; no explicit monetary credits; no card required[6] | No credit card required to start[6] | Create Cloudflare account, enable Workers AI, use account‑scoped API token[6] |
| Together AI | One‑time 25 USD of free credits for new accounts, usable on 200+ open‑source models; additional credits via startup programs[7] | Build Tier rate limits around 60 RPM and 100k tokens/min; some premium models gated to higher spend tiers[7] | No card for initial 25 USD credits; card required for further usage[7] | Sign up, verify email, use default free credits via Together API key[7] |
Exact limits and credit amounts change frequently; always confirm current values in each provider’s dashboard or official docs before architecting anything that depends on the free tier.[2][3][5][6][7][8][9][1][4]
Google AI Studio (Gemini API)
What the Free Tier Provides
Google AI Studio exposes Gemini models via the Gemini API and organizes usage into a Free tier and multiple paid tiers.[10][1]
The Free tier is available to eligible users (region and age gate) without requiring billing setup and is controlled via per‑project quotas on requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD).[1][10]
Guides and promotional material for Gemini Flash indicate free API access to Gemini 1.5 or 2.x Flash with rates on the order of 15–60 RPM, around 1 million tokens per minute, and about 1,500 requests per day for some configurations.[11][12]
Google’s rate‑limit documentation shows a dedicated Free usage tier that caps overall billing but does not charge, while higher tiers are automatically unlocked once a billing account is linked and certain spend thresholds are met.[1]
These limits are model‑specific and are surfaced in the AI Studio UI, so the most reliable values for your project appear directly in the console.
How to Activate and Use the Free Tier
- Check eligibility and sign in: Go to https://aistudio.google.com and sign in with a Google account from a supported region (and confirm age requirements, usually 18+).[10]
- Create or select a project: AI Studio organizes usage by project; either create a new project or select an existing one from the Projects page.[10][1]
- Stay on the Free tier: As long as you do not link a billing account, your project remains on the Free usage tier, which enforces quotas but does not charge you.[1]
- Generate an API key: In the AI Studio console, navigate to the API keys section and create an API key for the project; this key is used for Gemini API calls.
- Choose a Gemini model suitable for free use: In playgrounds or code samples, select one of the supported Flash or Pro models that show Free‑tier quotas in the limits panel.[11][1]
- Monitor quotas: Use the usage dashboard to track RPM, TPM, and daily request consumption; if you hit limits, wait for the quota window to reset or upgrade to a paid tier by adding billing.[10][1]
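As a concrete illustration of the key-based flow above, the request shape can be sketched as follows. This is a minimal sketch, assuming the `v1beta` REST path and the `gemini-2.5-flash` model name; confirm both against the current Gemini API reference before use.

```python
# Sketch of a Gemini API generateContent request on the free tier.
# ASSUMPTIONS: the v1beta path and "gemini-2.5-flash" model name below
# should be verified in the AI Studio console / Gemini API docs.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"

def build_generate_request(api_key: str, model: str, prompt: str):
    """Return (url, payload) for a generateContent call; send with any HTTP client."""
    url = f"{API_BASE}/models/{model}:generateContent?key={api_key}"
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, payload

url, payload = build_generate_request("YOUR_API_KEY", "gemini-2.5-flash", "Hello")
# e.g. requests.post(url, json=payload) with a real key
```

The same payload shape works for any Gemini model; only the model segment of the URL changes.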
OpenRouter
What the Free Offering Includes
OpenRouter is an API gateway that exposes many third‑party and open‑weight models and designates a subset as free.[2]
It offers a Free Models Router, which automatically chooses from all currently available free models (for example, free DeepSeek R1 and various Llama or Qwen variants), and also exposes individual models whose names end with (free) or a :free suffix.[13][21][2]
The set of free models is dynamic and can change over time, so OpenRouter recommends checking the models page or searching for "free" inside the model selector.[13][2]
How to Get and Use OpenRouter Free Models
- Create an OpenRouter account: Visit https://openrouter.ai, sign up with email or a supported OAuth option, and verify your account.[22][2]
- Obtain an API key: After logging in, use the Get API key (or similar) button in the dashboard, create a key, and copy it; keys are shown in full only once.[22]
- Explore free models in the UI: In the Chat Playground or model selector, type "free" to filter the list and see all currently free options, including the Free Models Router.[13]
- Use the Free Models Router: Select the Free Models Router in the UI, or in API calls set the `model` field to the special router name; OpenRouter will dispatch your request to one of the free models and report which one handled the response.[2][13]
- Use specific `:free` variants: For models that have both paid and free variants, the free one is usually named like `meta-llama/llama-3.2-3b-instruct:free`; adding `:free` to your API request ensures you hit the free endpoint.[2]
- Be aware of capabilities and limitations: Free variants generally keep the same capabilities (including tool use for many models), but availability and throttling may differ, and some client libraries must preserve the `:free` suffix when constructing requests.[21][2]
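The `:free` routing described above can be exercised through OpenRouter's OpenAI-compatible chat completions endpoint. A minimal sketch, assuming the standard endpoint and an illustrative model slug that may rotate out of the free pool:

```python
# Sketch of an OpenRouter chat request targeting a :free model variant.
# The model slug is illustrative; check the models page for current free options.
def build_openrouter_request(api_key: str, model: str, user_msg: str):
    """Return (url, headers, body) for an OpenAI-compatible chat completion."""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    body = {"model": model,
            "messages": [{"role": "user", "content": user_msg}]}
    return url, headers, body

url, headers, body = build_openrouter_request(
    "sk-or-YOUR_KEY", "meta-llama/llama-3.2-3b-instruct:free", "Hi")
```

The `:free` suffix on the model name is what selects the free endpoint; dropping it routes the same request to the paid variant.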
DeepSeek API
What the Free Tier Provides
DeepSeek offers both chat and API access to its models (such as DeepSeek V3 and R1) and promotes a significant free token allocation for new API users.
Aggregator documentation and developer reports indicate that new accounts receive around 5 million free tokens as a starting pool, with no credit card required.[3][14]
One commonly cited breakdown equates this to roughly 8–9 USD worth of usage based on prevailing per‑million token prices, though this is an estimate rather than an official marketing number.[3]
The same sources state that the API aims to process as many requests as possible and does not impose strict user‑visible rate limits, instead relying on backend throughput; however, free‑tier behavior can change and may differ from the marketing description.[3]
How to Get and Use DeepSeek’s Free Tokens
- Sign up on the DeepSeek platform: Go to https://platform.deepseek.com (or the current console URL) and create an account with email and basic profile details.[14][3]
- Verify your email and log in: Confirm the verification email, then log in to access the dashboard.
- Locate the free token pool: In the dashboard, look for a usage or credits panel; developers report seeing a 5M‑token pool displayed there on new accounts.[14]
- Generate an API key: Use the API Keys section to generate a new key; copy and store it securely.
- Call DeepSeek models: Use the official REST endpoints or compatible SDKs, specifying models such as `deepseek-chat` or `deepseek-reasoner` as documented.
- Monitor consumption: Track remaining tokens in the dashboard and set your own soft alerts (for example, at 60% and 90% usage) so you can decide whether to move to paid usage or another provider before the free pool is exhausted.[14][3]
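The soft-alert idea in the last step can be sketched as a small client-side check against the reported 5M-token pool; the pool size and thresholds are the figures cited above, not official API behavior.

```python
# Sketch: fire soft alerts at 60% and 90% of the reported 5M free-token pool.
FREE_POOL = 5_000_000  # reported starting pool for new DeepSeek accounts

def usage_alert(tokens_used: int, pool: int = FREE_POOL):
    """Return an alert string once a threshold is crossed, else None."""
    frac = tokens_used / pool
    if frac >= 0.9:
        return "90% of free tokens consumed - plan migration now"
    if frac >= 0.6:
        return "60% of free tokens consumed - review usage"
    return None
```

Wire this into whatever usage counter you maintain locally, since the pool balance itself is only visible in the dashboard.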
NVIDIA Build / NIM (build.nvidia.com)
What the Free Credits Cover
NVIDIA’s Build program (and NIM APIs exposed via build.nvidia.com) provides free inference credits intended for developers to test NVIDIA‑hosted models such as GLM‑5 or Kimi‑2.5 and various NIM‑packaged LLMs.[23]
Community guides and documentation summaries indicate that new Build users typically see around 1,000 free credits in their account, and that they can request additional credits—up to a total of about 5,000—subject to approval.[15][16]
A recent technical explainer describes the free tier as providing 1,000 inference credits at signup, a rate limit of about 40 requests per minute, and the option to request more credits for a limited time window.[16]
Some walkthroughs note that requesting larger credit bundles (for example, 4,000 extra credits) may require a corporate email address and may be tied to a 90‑day evaluation period.[15]
How to Access and Use NVIDIA Build’s Free Tier
- Register for an NVIDIA account: If you do not already have one, create an NVIDIA Developer account.
- Join the Build / NIM program: Navigate to https://build.nvidia.com and sign in; you may need to accept program terms or apply to specific early‑access offerings depending on region and timing.[23][15]
- Create a project and obtain an API key: In the Build dashboard, create a project or workspace and generate an API key for NIM model access.[15]
- Confirm your free credits: Check the credits or billing panel to confirm that the 1,000‑credit allocation is present, and note any expiration date or usage policies.[16][15]
- Call NIM endpoints: Use NVIDIA’s documented REST endpoints or language‑specific SDKs to call supported models, passing your API key and project identifiers.
- Request more credits if needed: If available, use the “request more credits” flow in the dashboard; tutorials indicate that approved requests can increase your credits (for example, by 4,000) but may require additional verification such as a corporate email.[15]
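Given the ~40 RPM cap reported for the free tier, a simple client-side pacing calculation helps avoid throttling. A minimal sketch, assuming the cited 40 RPM figure (which is a community-reported number, not an official guarantee):

```python
# Sketch: compute how long to wait before the next request to stay under
# an RPM cap (40 RPM => at least 1.5 s between requests).
def next_delay(last_call: float, now: float, rpm: int = 40) -> float:
    """Seconds to wait before sending the next request, given timestamps in seconds."""
    min_interval = 60.0 / rpm
    return max(0.0, last_call + min_interval - now)
```

Call this with `time.monotonic()` timestamps before each request and `time.sleep()` for the returned delay.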
GitHub Models
Nature of the Free Usage
GitHub Models is a catalog of hosted AI models (including variants of GPT‑4o, Llama 3.x, Mistral, DeepSeek, and others) that can be accessed via GitHub’s playground and API.[17][18]
GitHub’s billing documentation states that all GitHub accounts get rate‑limited access to GitHub Models at no cost, with limits varying by model and by whether the user is on Copilot Free, Copilot Pro, or higher tiers.[8][18]
This free usage is explicitly positioned as suitable for prototyping and experimentation, not for production workloads.[18][8]
Rate‑limit tables published by GitHub show example caps such as 15 requests per minute and 150 requests per day for "low" tier models on Copilot Free, and 10 RPM and 50 RPD for "high" tier models, with token limits like 8,000 input and 4,000 output tokens per request for many models.[24][18]
These numbers differ by model family and plan and are subject to change, so they should be treated as indicative rather than guaranteed.
How to Access GitHub Models for Free
- Ensure you have a GitHub account: Any standard GitHub account is eligible for rate‑limited free usage of GitHub Models.[8]
- Open the GitHub Models catalog: Visit the GitHub Marketplace page for models or the dedicated GitHub Models documentation, which lists supported models, including GPT‑4o, Llama 3.1, and Mistral.[17][18]
- Use the browser playground: From the catalog, open a model’s playground to send test prompts directly in the browser; free usage limits apply automatically.[18][8]
- Enable API prototyping: Follow the "Prototyping with AI models" guide to get an access token and call the GitHub Models endpoints from your own application; the same free limits apply until you opt into paid usage.[18]
- Monitor rate limits: Watch for usage errors indicating you have hit RPM, RPD, or concurrent‑request caps; these require waiting for the window to reset unless you upgrade to a higher‑tier Copilot plan or configure paid usage.[18]
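Because many models cap output at around 4,000 tokens per request, it is worth clamping `max_tokens` client-side. A minimal sketch, assuming an illustrative endpoint URL and model id (both should be taken from the "Prototyping with AI models" guide, not from here):

```python
# Sketch of a GitHub Models chat request that clamps max_tokens to the
# example 4,000-token output cap from GitHub's rate-limit tables.
# ASSUMPTIONS: the endpoint URL and model id are illustrative placeholders.
OUTPUT_CAP = 4000  # example per-request output cap for many models

def build_github_models_request(token: str, model: str, prompt: str,
                                max_tokens: int = OUTPUT_CAP) -> dict:
    """Return a request description clamped to the example output cap."""
    return {
        "url": "https://models.github.ai/inference/chat/completions",
        "headers": {"Authorization": f"Bearer {token}"},
        "body": {"model": model,
                 "messages": [{"role": "user", "content": prompt}],
                 "max_tokens": min(max_tokens, OUTPUT_CAP)},
    }

req = build_github_models_request("ghp_TOKEN", "openai/gpt-4o", "Hi", max_tokens=9999)
```

Requests above the cap are simply clamped rather than rejected, which keeps prototypes from tripping per-request token errors.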
Hugging Face Inference (Inference Providers and hf‑inference)
Free Credit Model
Hugging Face’s Inference Providers and hf-inference serverless API let you run inference on a curated set of models from multiple underlying providers (such as Cerebras, Together, and others) with Hugging Face handling routing and billing.[19][4]
Pricing and billing docs describe a freemium, usage‑based model: each tier includes a small monthly pool of credits that offset serverless inference costs, with Free users receiving around 0.10 USD worth of credits per month and PRO or Team tiers getting larger pools (for example, 2 USD of credits).[4][19]
Once the free credits are exhausted, Free‑tier users hit a hard stop, while paid tiers can continue in pay‑as‑you‑go mode at underlying provider prices.[19][4]
This credit pool can be consumed across any supported model that is eligible for Inference Provider billing; first‑party "Inference Endpoints" and other services may have separate pricing.
How to Use Hugging Face’s Free Tier
- Create a Hugging Face account: Sign up at https://huggingface.co and verify your email.
- Review Inference pricing and providers: Visit the Inference Providers pricing page to understand which models are eligible for the credit pool and how billing works.[4][19]
- Obtain an access token: From your account settings, create a personal access token with appropriate scopes for the Inference APIs.
- Call Inference Providers or `hf-inference`: Use Hugging Face’s Python or JavaScript SDKs, or direct HTTP calls, specifying either the `hf-inference` provider or a specific Inference Provider endpoint.
- Track credit usage: Monitor your account’s credit balance; once the monthly free credit is consumed, Free‑tier users cannot make further eligible requests until the next month unless they upgrade to a paid tier and add billing information.[19][4]
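A direct HTTP call to the Inference Providers router can be sketched as below; the router URL and the model id are assumptions that may differ from what your account and the current HF docs expose, so treat this purely as the shape of the request.

```python
# Sketch of a raw OpenAI-compatible call to HF's Inference Providers router.
# ASSUMPTIONS: router URL and model id are placeholders to verify in HF docs.
def build_hf_router_request(hf_token: str, model: str, prompt: str):
    """Return (url, headers, body) for a chat completion via the HF router."""
    url = "https://router.huggingface.co/v1/chat/completions"
    headers = {"Authorization": f"Bearer {hf_token}"}
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}]}
    return url, headers, body

url, headers, body = build_hf_router_request(
    "hf_TOKEN", "meta-llama/Llama-3.1-8B-Instruct", "Hello")
```

Each successful call draws down the monthly credit pool; on the Free tier, requests start failing once it is exhausted.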
SambaNova Cloud (SambaCloud)
Developer Tier and Free Credits
SambaNova Cloud (often surfaced as SambaCloud) offers a hosted inference platform for a range of open‑weight models, including Llama and DeepSeek variants.
According to SambaNova’s own announcement, a Developer Tier gives new and existing developers 5 USD of free credit at launch, which translates to over 30 million free tokens on a Llama 8B model given the listed prices.[9]
This free credit expires after three months and is intended to enable continued experimentation without immediate payment.
An independent pricing overview and an API aggregator both emphasize that SambaNova does provide free access with rate limits, but that the free credit is relatively small in practice and can be consumed after only a few non‑trivial sessions on larger models.[5][20]
After the credit is exhausted or expires, usage is billed per token according to the published pricing table.
How to Access SambaNova’s Free Tier
- Sign up for SambaCloud: Go to https://cloud.sambanova.ai (or the current portal), create an account, and log in.[5][9]
- Confirm Developer Tier enrollment: Ensure your account is on the Developer Tier; at launch this is the default for new developers, but it can change over time.[9]
- Check your free credit balance: In the billing or usage section, confirm the presence of approximately 5 USD of credits and note the expiration date.[20][9]
- Generate API credentials: Create an API key in the console for use with SambaNova’s HTTP endpoints.
- Choose models and start experimenting: Begin by targeting smaller, cheaper models (such as Llama 3.1 8B Instruct) to stretch the free credit further, per the cost table.[5]
- Plan for the transition to paid usage: Because the free credit is limited, plan either to supply billing details for continued use or to migrate workloads to another provider once it is consumed.[20]
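The "over 30 million tokens" claim follows from simple arithmetic, which is useful for budgeting any small model. A sketch, where the 0.15 USD-per-million price is an illustrative assumption rather than SambaNova's published rate:

```python
# Sketch: how many tokens a fixed credit buys at a given per-million price.
# The 0.15 USD/M figure below is an illustrative assumption, not a quote.
def tokens_for_credit(credit_usd: float, price_per_million_usd: float) -> int:
    """Total tokens purchasable with the given credit at the given price."""
    return int(credit_usd / price_per_million_usd * 1_000_000)

llama8b_tokens = tokens_for_credit(5.0, 0.15)  # ~33M tokens at the assumed price
```

At an assumed 0.15 USD per million tokens, 5 USD covers roughly 33M tokens, which is consistent with the "over 30 million tokens on a Llama 8B model" claim above; larger models consume the credit many times faster.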
Cloudflare Workers AI
Free Tier Structure
Cloudflare Workers AI exposes a catalog of open‑source models for text, vision, and audio tasks, running at the edge on Cloudflare’s global network.
Recent documentation and developer write‑ups describe a permanent free tier that allocates 10,000 "neurons" per day, which is roughly enough for 100–500 requests depending on the model and workload.[6]
This free tier does not require a credit card and is available to any Cloudflare account, making it attractive for small side projects and experimentation.[6]
The platform offers more than 50 models across text generation, image generation, speech‑to‑text, translation, and embeddings, all accessible via Cloudflare’s Workers or REST APIs.[6]
How to Start Using Workers AI for Free
- Create a Cloudflare account: Sign up for a free Cloudflare account if you do not already have one.[6]
- Enable Workers & Pages and Workers AI: In the Cloudflare dashboard, navigate to Workers & Pages, then to Workers AI, and enable it if necessary.[6]
- Get your Account ID and API token: Follow the setup guide to retrieve your account identifier and create an API token with Workers AI permissions.[6]
- Use the REST API or Workers: Call the Workers AI endpoints directly or from Cloudflare Workers scripts, specifying the desired model and passing your account ID and API token.
- Stay within the daily neuron budget: Monitor your usage in the dashboard; if you exceed 10,000 neurons in a day, you must either wait for the daily reset or upgrade to a paid Workers AI plan.[6]
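The "roughly 100–500 requests per day" range falls directly out of the neuron budget once you know a model's per-request cost. A sketch, with per-request neuron costs treated as illustrative assumptions (actual costs vary by model and are listed in Cloudflare's pricing docs):

```python
# Sketch: requests per day that fit in the 10,000-neuron free allocation.
# Per-request neuron costs vary by model; the values used in tests are
# illustrative assumptions only.
DAILY_NEURONS = 10_000  # Workers AI free-tier daily allocation

def daily_request_budget(neurons_per_request: float,
                         daily_neurons: int = DAILY_NEURONS) -> int:
    """How many requests at the given neuron cost fit in one day's budget."""
    return int(daily_neurons // neurons_per_request)
```

A heavy model costing ~100 neurons per request yields ~100 requests/day, while a light one at ~20 neurons yields ~500, matching the range quoted above.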
Together AI
Free Credits and Free Models
Together AI is an inference provider focused on open‑source models with an OpenAI‑compatible API.
Documentation and pricing overviews describe a 25 USD free credit allocation for new accounts, usable across more than 200 open‑source models, including Llama, DeepSeek, Qwen, Mixtral, and others.[7]
The same sources note that rate limits for the baseline "Build" tier are around 60 requests per minute and 100,000 tokens per minute, which is generally sufficient for small applications and agent experiments.[7]
In addition to the generic credit, Together maintains a set of models explicitly marked as free (zero‑price endpoints) so that even after credits are consumed, some low‑throughput workloads can continue without direct cost.[7]
Premium features and some models (for example, certain Flux or dedicated endpoints) are gated behind higher tiers that unlock only after a minimum spend.[7]
How to Use Together AI’s Free Tier
- Sign up at Together: Visit https://together.ai, create an account, and verify your email address.
- Confirm promotional credits: Open the billing or credits section to verify that the initial 25 USD free credit has been applied, and note any expiration or promotional terms.[7]
- Create an API key: Generate an API key from the dashboard and store it securely.
- Select low‑cost or free models: Use Together’s model catalog to choose among free or low‑cost models; these are explicitly flagged as free or priced at zero in per‑million‑token terms.[7]
- Respect rate limits: Design your application to stay within the Build‑tier defaults (for example, 60 RPM and 100k TPM) to avoid throttling.[7]
- Apply to startup or partner programs if relevant: Together periodically offers startup accelerators or partner deals that grant additional credits (e.g., 15k–50k USD) beyond the standard free tier; these typically require an application and vetting.[7]
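Because the Build-tier RPM and TPM caps bind independently, it is worth checking a planned workload against both before deploying. A minimal sketch using the ~60 RPM / 100k TPM figures cited above:

```python
# Sketch: does a planned workload fit the Build-tier defaults?
# Both caps bind independently, so the tighter one is the effective limit.
def fits_build_tier(requests_per_min: float, tokens_per_request: float,
                    rpm_cap: int = 60, tpm_cap: int = 100_000) -> bool:
    """True if the workload stays under both the RPM and TPM caps."""
    return (requests_per_min <= rpm_cap
            and requests_per_min * tokens_per_request <= tpm_cap)
```

For example, 60 requests/min at 1,000 tokens each fits, but the same rate at 2,000 tokens each exceeds the TPM cap even though the RPM cap is satisfied.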
Practical Tips for Using Free Tiers Effectively
- Always verify limits just before building: All of these offerings are subject to change, and some (DeepSeek, Together, Hugging Face, SambaNova) adjust credit amounts and rate limits over time.[9][3][4][5][1][7]
- Design for graceful degradation: Because free tiers are rate‑limited, implement retry with backoff and consider multi‑provider routing so your system can fall back to another free or low‑cost model when one provider throttles.[2][6][7]
- Favor smaller, efficient models: To stretch credits (DeepSeek’s token pool, SambaNova’s 5 USD credit, Together’s 25 USD, Hugging Face’s small monthly pool), choose smaller context sizes and compact models for routine tasks.[3][5][9][7]
- Separate evaluation from production: Use free tiers for benchmarking, prototyping agent logic, and integration testing; once a workload is stable and important, migrate it to a paid, SLA‑backed tier on the same or another provider.
- Track usage centrally: Maintain your own per‑provider usage dashboard so you can see when a free tier is close to exhaustion and preemptively switch providers or provision paid capacity.
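The graceful-degradation tip above can be sketched as a small fallback loop: retry transient rate-limit failures with exponential backoff, then move to the next provider. The provider callables and the `RateLimited` exception are placeholders you would adapt to your SDKs' actual error types.

```python
# Sketch of retry-with-backoff plus multi-provider fallback.
# RateLimited and the provider callables are hypothetical placeholders;
# map them onto your SDKs' real error types and client calls.
import time

class RateLimited(Exception):
    """Placeholder for a provider's 429-style throttling error."""

def call_with_fallback(providers, prompt, retries=3, base_delay=1.0):
    """providers: list of (name, callable) pairs; callable(prompt) -> str.
    Tries each provider in order, backing off exponentially on throttling."""
    for name, call in providers:
        for attempt in range(retries):
            try:
                return name, call(prompt)
            except RateLimited:
                time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("all providers exhausted")
```

Ordering the list from cheapest (free) to paid providers makes the free tiers absorb normal load while paid capacity only backstops throttled bursts.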