
Before you begin

This test requires a Pinecone account on the Standard or Enterprise plan.
  • New users can sign up for a 21-day Standard trial with $300 in credits, more than enough to cover the costs of this test.
  • Existing users on the Starter plan can upgrade.

1. Understand the test

This test is designed to simulate a production-scale dataset and workload, measuring import time, query throughput, query latency, and associated costs.

Dataset

  • Records: 10 million records from the Amazon Reviews 2023 dataset
  • Embedding model: llama-text-embed-v2 (1024 dimensions)
  • Similarity metric: cosine
  • Total size: 48.8 GB

Workload

  • Query load: 10 queries per second (QPS)
  • Concurrent users: 10 users querying simultaneously
  • Test duration: 1,000 queries (about 100 seconds at 10 QPS)

Success criteria

This test aims to verify the following success criteria:
  • Import time: < 30 minutes
  • Query latency: p90 latency of less than 100ms

2. Get an API key

Create a new API key in the Pinecone console. The code examples below use the placeholder {{YOUR_API_KEY}}; replace it with your actual key.

3. Install an SDK

Install the Python SDK:
pip install pinecone

4. Create an index

Create a serverless index that matches the dimensions and similarity metric of the dataset. Choose a cloud provider you have access to, because you’ll need to provision a VM in the same region as your index to run the benchmark.
Python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="{{YOUR_API_KEY}}")

index_name = "search-10m"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        vector_type="dense",
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

5. Import the dataset

Pinecone’s import feature enables you to load millions of vectors from object storage in parallel. Use the import feature to load 10 million product records into a single namespace within your index.

Start bulk import

Python
from pinecone import Pinecone, ImportErrorMode

pc = Pinecone(api_key="{{YOUR_API_KEY}}")
index = pc.Index("search-10m")

# 1 namespace, 10M records
root = "s3://fe-customer-pocs/at-scale-pocs/review_100000000_20250620_203728/search_10M/dense"
import_response = index.start_import(
    uri=root,
    error_mode=ImportErrorMode.CONTINUE  # or ImportErrorMode.ABORT
)
print(f"Import started: {import_response['id']}")

Monitor import progress

To track progress, check the status bar in the Pinecone console or use the describe import operation with the import ID:
Python
from pinecone import Pinecone
import time 

pc = Pinecone(api_key="{{YOUR_API_KEY}}")
index = pc.Index("search-10m")

while True:
    status = index.describe_import(id="<IMPORT_ID>")
    print(f"Status: {status['status']}, Progress: {status['percent_complete']:.1f}%")
    if status['status'] == "Completed":
        print("Import completed successfully!")
        break
    elif status['status'] == "Failed":
        print("Import failed. Check error details.")
        break
    time.sleep(300) # Check every 5 minutes
The amount of time required for an import depends on various factors, including dimensionality and metadata complexity.
For this dataset, the import should take less than 30 minutes.
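As a rough sanity check on the 30-minute target, you can work out the sustained ingest rate the import needs, using the 48.8 GB and 10 million record figures from the dataset description above:

```python
# Required sustained ingest rate to finish the import within the target window.
dataset_gb = 48.8          # total dataset size
records = 10_000_000       # number of records
target_minutes = 30        # import-time success criterion

gb_per_minute = dataset_gb / target_minutes
records_per_second = records / (target_minutes * 60)

print(f"{gb_per_minute:.2f} GB/min, {records_per_second:,.0f} records/sec")
# → 1.63 GB/min, 5,556 records/sec
```

If the progress percentage reported by describe import implies a rate well below this, the import is unlikely to meet the 30-minute criterion.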

6. Run the benchmark

You’ll use the Vector Search Bench (VSB) tool to simulate realistic query patterns and measure latency and throughput.

Provision a VM

The VSB tool reports latency as the time from when it issues a query to when Pinecone returns the response. To minimize client-side latency between the tool and Pinecone, run the test on a dedicated VM on the same cloud provider and in the same region as your Pinecone index. This keeps the client-side latency in the sub-millisecond range. See your cloud provider’s documentation for instructions on provisioning a VM instance.
Be sure to create the VM instance in the same region as your Pinecone index. Otherwise, the client-side latency will be higher and you won’t get an accurate sense of Pinecone’s performance.

Connect to the VM

Connect to the VM using the cloud provider’s console.

Install Vector Search Bench (VSB)

Clone the VSB repository:
Terminal
git clone https://github.com/pinecone-io/VSB.git
cd VSB

Install dependencies

Use Poetry to install the dependencies:
Terminal
sudo apt update
sudo apt install python3-poetry
poetry install 
poetry shell

Benchmark Pinecone

Use VSB to simulate 10 concurrent users issuing a total of 1000 queries at 10 queries per second (QPS):
Terminal
vsb \
    --database="pinecone" \
    --workload=synthetic-proportional \
    --pinecone_api_key="{{YOUR_API_KEY}}" \
    --pinecone_index_name="search-10m" \
    --pinecone_namespace_name="ns_2" \
    --synthetic_dimensions=1024 \
    --synthetic_metric=cosine \
    --synthetic_top_k=10 \
    --synthetic_requests=1000 \
    --users=10 \
    --requests_per_sec=10 \
    --synthetic_query_distribution=uniform \
    --synthetic_query_ratio=1 \
    --synthetic_insert_ratio=0 \
    --synthetic_delete_ratio=0 \
    --synthetic_update_ratio=0 \
    --skip_populate
By default, VSB populates the target index with a dataset. In this case, you’ve already imported the data, so --skip_populate tells VSB to skip the population phase.

7. Analyze performance

At the end of the run, VSB prints an operation summary including the requests per second achieved and latencies at different percentiles. Here’s an example output:
Terminal
                      Operation Summary                      
                                                             
  Operation  Requests  Failures  Requests/sec  Failures/sec  
 ─────────────────────────────────────────────────────────── 
  Search         1000     0(0%)             9           0.0  
                                                             
                                                    Metrics Summary                                                     
                                                                                                                        
  Operation  Metric         Min  0.1%    1%    5%   10%   25%   50%   75%   90%   95%   99%  99.9%  99.99%   Max  Mean  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  Search     Latency (ms)    27    27    28    29    30    31    33    39    50    64   200    220     220   220    40  
Confirm that the achieved throughput is around 10 QPS and that the p90 latency is less than 100ms. For more detailed statistics, analyze the stats.json file identified in the output.
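If you want to check the success criteria programmatically, one approach is to compute percentiles from the recorded latencies yourself. This is a minimal sketch; the exact schema of stats.json depends on the VSB version, so the sample latencies below are stand-ins for values you would parse from that file:

```python
def percentile(values, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Example latencies in ms (stand-ins for values parsed from stats.json).
latencies = [27, 28, 29, 30, 31, 33, 39, 50, 64, 90]

p90 = percentile(latencies, 90)
print(f"p90 latency: {p90} ms, meets target: {p90 < 100}")
# → p90 latency: 64 ms, meets target: True
```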

8. Check costs

You can check the costs for the import, queries, and storage in the Pinecone console at Settings > Usage. Cost data is delayed up to three days, but once it’s available, compare the actual costs to the estimated costs below.
For the latest pricing details, see Pricing.

Import costs

The current price for import is $1/GB. The dataset size for this test is 48.8 GB, so the import cost should be $48.80.

Query costs

A query uses 1 read unit (RU) for every 1 GB of namespace size, and the current price for queries in the us-east-1 region of AWS is $16 per 1 million read units. This test ran 1,000 queries against a 48.8 GB namespace, so each query used 48.8 RUs, for a total of 48,800 RUs. The query cost should be 48,800 RUs / 1,000,000 * $16 = $0.78.

Storage costs

The current price for storage is $0.33 per GB per month. The dataset size for this test is 48.8 GB. To estimate the storage cost for one hour: $0.33/GB/month * 48.8 GB / 730 hours = $0.022/hour.

Total costs

The total cost for the test is the sum of the import cost, query cost, and one hour of storage cost: $48.80 + $0.78 + $0.022 ≈ $49.60.
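The cost arithmetic above can be reproduced in a few lines, using the prices quoted in this section (check the Pricing page for current rates):

```python
dataset_gb = 48.8
queries = 1_000

import_cost = dataset_gb * 1.00              # $1 per GB imported
read_units = queries * dataset_gb            # 1 RU per GB of namespace, per query
query_cost = read_units / 1_000_000 * 16.00  # $16 per 1M RUs
storage_cost = 0.33 * dataset_gb / 730       # $/hour at $0.33/GB/month

total = import_cost + query_cost + storage_cost
print(f"import=${import_cost:.2f} query=${query_cost:.2f} "
      f"storage=${storage_cost:.3f}/hr total≈${total:.2f}")
# → import=$48.80 query=$0.78 storage=$0.022/hr total≈$49.60
```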

9. Clean up

When you no longer need your test index, delete it to avoid incurring unnecessary costs.
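For example, using the Python SDK with the index name created earlier:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")

index_name = "search-10m"
if pc.has_index(index_name):
    pc.delete_index(index_name)
```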