This feature is in early access and not yet available to all users. To request access, contact support.
Key concepts
When you create an index with dedicated read nodes, Pinecone allocates dedicated storage and compute resources based on your choice of node type, number of shards, and number of replicas.
- Dedicated storage ensures that index data is always cached in memory and on disk for warm, low-latency queries. In contrast, for on-demand indexes, caching is best-effort; new and infrequently-accessed data may need to be fetched from object storage, resulting in cold, higher-latency queries.
- Dedicated compute ensures that an index always has the capacity to handle high query rates. In contrast, on-demand indexes share compute resources and are subject to rate limits and throttling.
Dedicated read nodes affect only read performance. Write performance is the same as for on-demand indexes.
Node types
There are two node types: b1 and t1. Both are suitable for large-scale and demanding workloads, but t1 nodes provide increased processing power and memory. Additionally, t1 nodes cache more data in memory, enabling lower query latency.
Shards
Shards determine the storage capacity of an index. Each shard provides 250 GB of storage, making it straightforward to calculate the number of shards necessary for your index size, including room for growth. For example:

| Index size | Shards | Capacity |
|---|---|---|
| 100 GB | 1 | 250 GB |
| 500 GB | 3 | 750 GB |
| 1 TB | 5 | 1.25 TB |
| 1.6 TB | 7 | 1.75 TB |
Adding shards has two effects:
- Relieves storage (disk) fullness. Data is spread across shards, so adding shards reduces the amount of data on each one.
- Relieves memory fullness. With less data stored on each shard, there’s also less data to cache in memory.
You are responsible for allocating enough shards for your index size. If your index exceeds its storage capacity, write operations (upsert, update, delete) are rejected.
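The shard counts in the table above can be reproduced with a short calculation. The following is a minimal sketch, not a Pinecone utility: the function name and the 5% growth headroom are hypothetical choices, and 1 TB is treated as 1000 GB.

```python
import math

SHARD_CAPACITY_GB = 250  # per-shard storage, as stated above


def shards_needed(index_size_gb: float, growth_headroom: float = 0.05) -> int:
    """Return the number of shards for an index of the given size.

    growth_headroom is a hypothetical safety margin (5% by default),
    not a value prescribed by Pinecone.
    """
    if index_size_gb < 0 or growth_headroom < 0:
        raise ValueError("size and headroom must be non-negative")
    padded_size_gb = index_size_gb * (1 + growth_headroom)
    # An index always needs at least one shard.
    return max(1, math.ceil(padded_size_gb / SHARD_CAPACITY_GB))


for size_gb in (100, 500, 1000, 1600):
    shards = shards_needed(size_gb)
    capacity_gb = shards * SHARD_CAPACITY_GB
    print(f"{size_gb} GB -> {shards} shard(s), {capacity_gb} GB capacity")
```

With no headroom, a 500 GB index would fit exactly in 2 shards and leave no room for writes, which is why the sketch pads the size before rounding up.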
Replicas
Replicas multiply the compute resources and data of an index, allowing higher query throughput and availability.
- Query throughput: Each replica duplicates the compute resources available to the index, allowing increased parallel processing and higher queries per second.
  - In general, throughput scales linearly with the number of replicas, but performance varies based on the shape of the workload and the complexity of metadata filters.
  - To determine the right number of replicas, test your query patterns or contact support.
- High availability: Replicas ensure your index remains available even if an availability zone experiences an outage.
  - When you add a replica, Pinecone places it in a different zone within the same region, up to a maximum of three zones. If you add more than three replicas, additional replicas are placed in zones that already have a replica. This multizone approach allows your index to continue serving queries even if one zone becomes unavailable.
  - To achieve high availability, allocate at least n+1 replicas, where n is the minimum number of replicas required to meet your throughput needs. This ensures that, even if a zone (and its replica) fails, your index still has enough capacity to handle your workload without interruption.
As your query throughput and availability requirements change, you can increase or decrease replicas. Adding or removing replicas can be done through the API and does not require downtime, but it can take up to 30 minutes.
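The n+1 guideline above can be turned into a quick estimate. This is a rough sketch that assumes throughput scales linearly with replicas, which the text notes is only generally true. The function name and the per-replica throughput figure are hypothetical; the per-replica number should come from testing your own query patterns.

```python
import math


def replicas_for_high_availability(target_qps: float, qps_per_replica: float) -> int:
    """Return n + 1, where n is the minimum replica count for target_qps.

    qps_per_replica is a measured, workload-specific value (an assumption
    here), not a figure published for any node type.
    """
    if target_qps < 0 or qps_per_replica <= 0:
        raise ValueError("target_qps must be >= 0 and qps_per_replica > 0")
    # n: smallest number of replicas that covers the target throughput.
    n = max(1, math.ceil(target_qps / qps_per_replica))
    # One extra replica keeps capacity sufficient if a zone (and its replica) fails.
    return n + 1


print(replicas_for_high_availability(target_qps=900, qps_per_replica=400))
```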
Index fullness
Dedicated read nodes store a search index in memory and record data on disk. There are three measures of index fullness:
- memory_fullness: How much of the index’s memory capacity is currently in use (0 to 1).
- storage_fullness: How much of the index’s storage capacity is currently in use (0 to 1).
- indexFullness: The greater of memory_fullness and storage_fullness.
In most cases, storage_fullness is the limiting factor. However, memory can fill up first in the following scenarios:
- b1 nodes, a large namespace (hundreds of millions of records), low-dimension vectors (128 or 256 dimensions), and minimal metadata.
- t1 nodes, high-dimension vectors (1024 or 1536 dimensions), and lots of metadata.
Using dedicated read nodes
The operations below use version 2025-10 of the Pinecone API.
Calculate the size of your index
To decide how many shards to allocate for your index, calculate the total index size and then add some room for growth. Each shard provides 250 GB of storage. To calculate the total size of an index, find the aggregate size of all its records. The size of an individual record is the sum of the following components:
- ID size (in bytes)
- Dense vector size (4 bytes * dense dimensions)
- Sparse vector size (9 bytes * number of non-zero sparse values). To estimate the sparse vector component of your index size, multiply 9 bytes by the average number of non-zero values per vector.
- Total metadata size (total size of all metadata fields, in bytes)
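The components above add up to a per-record size, which can then be multiplied by the record count. The following sketch illustrates the arithmetic; the function names and the sample workload (100 million records, 16-byte IDs, 1536 dimensions, 500 bytes of metadata) are hypothetical, and 1 GB is assumed to be 10^9 bytes.

```python
def record_size_bytes(
    id_bytes: int,
    dense_dimensions: int,
    sparse_nonzero_values: int = 0,
    metadata_bytes: int = 0,
) -> int:
    """Sum the per-record components listed above."""
    dense_bytes = 4 * dense_dimensions        # 4 bytes per dense dimension
    sparse_bytes = 9 * sparse_nonzero_values  # 9 bytes per non-zero sparse value
    return id_bytes + dense_bytes + sparse_bytes + metadata_bytes


def index_size_gb(record_count: int, avg_record_bytes: float) -> float:
    """Estimate total index size, assuming 1 GB = 10**9 bytes."""
    return record_count * avg_record_bytes / 1e9


per_record = record_size_bytes(id_bytes=16, dense_dimensions=1536, metadata_bytes=500)
total_gb = index_size_gb(record_count=100_000_000, avg_record_bytes=per_record)
print(f"{per_record} bytes per record, about {total_gb:.0f} GB in total")
```

The result is an estimate of current size only; room for growth still needs to be added before choosing a shard count.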
Create an index
To create a dedicated index, call create an index. In the spec.serverless.read_capacity object:
- Set mode to Dedicated.
- Set dedicated.node_type to either b1 or t1, depending on the node type you want to use.
- Set dedicated.scaling to Manual (currently, Manual is the only option, and it must be included in the request).
- Set dedicated.manual.shards to the number of shards required to accommodate at least the current size of your index, with a minimum of 1 shard. Each shard provides 250 GB of storage.
- Set dedicated.manual.replicas to the number of replicas for the index, with a minimum of 0 replicas (an index with 0 replicas is paused).
To determine the number of shards required by your index, see calculate the size of your index.
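The read_capacity settings listed above can be assembled into a request body as follows. This is a sketch rather than an official client call: it only builds and prints the JSON. The read_capacity fields come from the list above, while the top-level fields (name, dimension, metric, cloud, region), their default values, and the index name are assumptions for illustration.

```python
import json


def dedicated_index_body(
    name: str,
    dimension: int,
    node_type: str,
    shards: int,
    replicas: int,
    metric: str = "cosine",      # assumed default, for illustration
    cloud: str = "aws",          # assumed value
    region: str = "us-east-1",   # assumed value
) -> dict:
    """Build a create-index body with dedicated read capacity."""
    if node_type not in ("b1", "t1"):
        raise ValueError("node_type must be 'b1' or 't1'")
    if shards < 1:
        raise ValueError("at least 1 shard is required")
    if replicas < 0:
        raise ValueError("replicas cannot be negative (0 pauses the index)")
    return {
        "name": name,
        "dimension": dimension,
        "metric": metric,
        "spec": {
            "serverless": {
                "cloud": cloud,
                "region": region,
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {
                        "node_type": node_type,
                        "scaling": "Manual",  # currently the only option
                        "manual": {"shards": shards, "replicas": replicas},
                    },
                },
            }
        },
    }


body = dedicated_index_body("example-dedicated-index", 1536, "b1", shards=1, replicas=1)
print(json.dumps(body, indent=2))
```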
Add a hosted embedding model (optional)
If you’d like Pinecone to host the model that generates embeddings for your data, so that you use Pinecone’s API to insert and search by text (rather than vectors generated by an external model), configure your index to use a hosted embedding model. To do this, call configure an index, and specify the embed object in the request body.
Remember:
- Replace chunk_test with the name of the field in your data that contains the text to be embedded.
- Be sure to use a model whose dimension requirements match the dimensions of your index.
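As an illustration of the configure request described above, the sketch below builds a body containing an embed object. The text confirms only that an embed object goes in the request body; the inner keys (model, field_map), the model name, and the helper function are assumptions and should be checked against the API reference before use.

```python
import json


def embed_config_body(model: str, text_field: str) -> dict:
    """Build a configure-index body that attaches a hosted embedding model.

    The "model" and "field_map" keys are assumed, not taken from this page.
    """
    if not model or not text_field:
        raise ValueError("model and text_field are required")
    return {
        "embed": {
            "model": model,
            # Maps the model's text input to the field that holds your text.
            "field_map": {"text": text_field},
        }
    }


# "llama-text-embed-v2" is used as an example model name (an assumption);
# chunk_test is the placeholder field name mentioned above.
body = embed_config_body("llama-text-embed-v2", "chunk_test")
print(json.dumps(body, indent=2))
```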
It’s also possible to specify a hosted embedding model when creating a dedicated read nodes index. To do this, call create an index with integrated embedding. In the request body, use the read_capacity object to configure node type, shards, and replicas.
Check index fullness
To check index fullness, call get index stats. In the response, indexFullness describes how full the index is, on a scale of 0 to 1. It’s set to the greater of memory_fullness and storage_fullness.
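The relationship between the three fullness measures can be expressed in a few lines. This sketch takes the two component values as plain numbers rather than parsing an API response, since the response layout is not shown here. The function names are hypothetical.

```python
def index_fullness(memory_fullness: float, storage_fullness: float) -> float:
    """Return the overall fullness: the greater of the two measures."""
    for value in (memory_fullness, storage_fullness):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fullness values must be between 0 and 1")
    return max(memory_fullness, storage_fullness)


def limiting_resource(memory_fullness: float, storage_fullness: float) -> str:
    """Name the resource that is closer to capacity (storage wins ties)."""
    return "memory" if memory_fullness > storage_fullness else "storage"


print(index_fullness(0.42, 0.71), limiting_resource(0.42, 0.71))
```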
Add or remove shards
To add or remove shards, contact support. This cannot be done with the API.
Add or remove replicas
You can add or remove replicas no more than once per hour, starting one hour after index creation. Each change can take up to 30 minutes to complete.
To change the number of replicas, call configure an index and set spec.serverless.read_capacity.dedicated.manual.replicas to the desired number of replicas.
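A replica change touches a single nested field, and the once-per-hour rule is easy to check on the client side before sending a request. The sketch below shows both. The helper names are hypothetical, and the body contains only the replicas path named above; the API may require additional read_capacity fields in the same request.

```python
from datetime import datetime, timedelta
from typing import Optional

CHANGE_INTERVAL = timedelta(hours=1)  # replica changes: at most once per hour


def replicas_patch_body(replicas: int) -> dict:
    """Build a configure-index body that sets only the replica count."""
    if replicas < 0:
        raise ValueError("replicas cannot be negative (0 pauses the index)")
    return {
        "spec": {
            "serverless": {
                "read_capacity": {"dedicated": {"manual": {"replicas": replicas}}}
            }
        }
    }


def next_allowed_replica_change(
    created_at: datetime, last_change_at: Optional[datetime] = None
) -> datetime:
    """Earliest time a replica change is allowed under the hourly limit."""
    latest = created_at if last_change_at is None else max(created_at, last_change_at)
    return latest + CHANGE_INTERVAL


print(replicas_patch_body(3))
print(next_allowed_replica_change(datetime(2025, 1, 1, 10, 0)))
```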
Pause a dedicated index
To pause an index, set the number of replicas to 0. This change should take less than a minute to complete. While an index is paused, you cannot write to it or read from it.
Change node types
To change the type of node used for a dedicated index, contact support. This cannot be done with the API.
Migrate from on-demand to dedicated
You can change the read capacity mode of your index no more than once every 24 hours. The change can take up to 30 minutes to complete.
- Determine the current size of your index.
- Call configure an index. In the request body, in the spec.serverless.read_capacity object, set the following fields:
  - Set mode to Dedicated.
  - Set node_type to the node type you want to use (b1 or t1).
  - Set shards to the number of shards required for your index. Each shard provides 250 GB of storage.
  - Set replicas to the number of replicas required for your query throughput needs.

  For example, you could migrate an index named index-to-migrate to a dedicated index with b1 nodes, 1 shard, and 1 replica.
- Monitor the status of the migration. When the migration is complete, the value of spec.serverless.read_capacity.status.state is Ready. An Error state means that you didn’t allocate enough shards for the size of your index. Migrate to dedicated again, using a sufficient number of shards.
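Because an undersized migration ends in the Error state, it can help to validate the shard count before sending the request. The sketch below builds a migration body for the index-to-migrate example and rejects a shard count that cannot hold the current index size. The nesting under dedicated and manual is an assumption borrowed from the create-index fields, and the function name is hypothetical.

```python
SHARD_CAPACITY_GB = 250  # per-shard storage, as stated above


def migration_body(node_type: str, shards: int, replicas: int, current_size_gb: float) -> dict:
    """Build a configure-index body that switches an index to dedicated mode."""
    if node_type not in ("b1", "t1"):
        raise ValueError("node_type must be 'b1' or 't1'")
    if shards < 1 or replicas < 0:
        raise ValueError("need at least 1 shard and a non-negative replica count")
    if shards * SHARD_CAPACITY_GB < current_size_gb:
        # Too few shards for the index size is what leads to the Error state.
        raise ValueError("not enough shards for the current index size")
    return {
        "spec": {
            "serverless": {
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {  # nesting assumed to match index creation
                        "node_type": node_type,
                        "scaling": "Manual",
                        "manual": {"shards": shards, "replicas": replicas},
                    },
                }
            }
        }
    }


# Body for migrating index-to-migrate, assuming it currently holds 100 GB.
body = migration_body("b1", shards=1, replicas=1, current_size_gb=100)
print(body)
```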
Migrate from dedicated to on-demand
To change a dedicated index to on-demand, contact support. This can’t be done with the API.
Check the status of a change
After changing a dedicated index, check the status of the change by calling describe an index. In the response, check the spec.serverless.read_capacity.status.state field. Possible values include:
- Ready: The dedicated index is ready to serve queries.
- Scaling: A change to the node type, number of shards, or number of replicas is in progress.
- Migrating: A change to the read capacity mode is in progress.
- Error: You did not allocate enough shards for the size of your index. Migrate to dedicated again, using a sufficient number of shards.
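Reading the state out of a describe-index response is a matter of walking a nested path. The sketch below does this defensively on an already-parsed JSON dictionary. The sample response is hypothetical and contains only the path named above; a real response includes other fields.

```python
from collections.abc import Mapping
from typing import Any, Optional

STATE_PATH = ("spec", "serverless", "read_capacity", "status", "state")


def read_capacity_state(index_description: Mapping[str, Any]) -> Optional[str]:
    """Return the read capacity state, or None if the path is missing."""
    node: Any = index_description
    for key in STATE_PATH:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else None


def is_change_complete(index_description: Mapping[str, Any]) -> bool:
    """True once the index reports Ready; raise if the change failed."""
    state = read_capacity_state(index_description)
    if state == "Error":
        raise RuntimeError("change failed: not enough shards for the index size")
    return state == "Ready"


# Hypothetical, trimmed-down response for illustration only.
sample = {"spec": {"serverless": {"read_capacity": {"status": {"state": "Scaling"}}}}}
print(read_capacity_state(sample), is_change_complete(sample))
```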
Limits
Read limits
On dedicated indexes, read operations (query, list, fetch) have no rate limits. However, if your query rate exceeds the compute capacity of your index, you may observe decreased query throughput. In such cases, consider adding replicas to increase the compute resources of the index.
Write limits
- On dedicated indexes, write operations (upsert, update, delete) have the same rate limits as on-demand indexes.
- Writes that would cause your index to exceed its storage capacity are rejected. In such cases, consider adding shards to increase available storage. To determine how close to the limit you are, check index fullness.
Operational limits
| Metric | Limit |
|---|---|
| Min shards per index | 1 |
| Max namespaces per index | 1 |
| Node type or read capacity mode changes | 1 per 24 hours |
| Max shard or replica changes | 1 per hour |
Other limits
- To increase or decrease shards, contact support.
- To change node types, contact support.
- Dedicated indexes do not support backups or bulk imports.
- memory_fullness is an approximation and doesn’t yet account for metadata.
Cost
The cost of an index that uses dedicated read nodes is calculated by this formula:
(Dedicated read nodes costs) + (storage costs) + (write costs)
- (Dedicated read nodes costs) are calculated from the node type rate, the number of shards, and the number of replicas. Node type rates vary based on pricing plan and cloud region. For exact rates, contact Pinecone.
- (Storage costs) are the same as for on-demand indexes.
- (Write costs) are the same as for on-demand indexes.
Example cost calculations
b1 nodes, 2 shards, 2 replicas - Standard plan
This example assumes a Standard plan rate of $548.96/month for b1 nodes.
t1 nodes, 2 shards, 2 replicas - Standard plan
This example assumes a Standard plan rate of $1,758.53/month for t1 nodes.