- Gather requirements and define success criteria for your use case
- Set up and configure a production-scale test environment
- Import and index millions of vectors efficiently using bulk import
- Run comprehensive benchmarks with Vector Search Bench (VSB)
- Analyze performance results and validate against success criteria
- Test production readiness with monitoring and backup procedures
If you’re new to Pinecone Vector Database, complete the quickstart to learn basic concepts and operations.
1. Gather requirements
Start by clearly defining your use case, success criteria, data characteristics, and performance expectations. This ensures your test accurately reflects your production needs and provides meaningful results for decision-making.
Identify use case
Clearly define the search problem you need to solve. Your use case influences how you define success criteria and how you configure Pinecone.

Use case for this test:
- A product search application using semantic search to find similar products based on descriptions and features.
- Users search for products using natural language queries, expecting relevant results ranked by similarity.
Determine data requirements
Analyze your dataset characteristics, including volume, dimensions, embedding strategy, and expected costs. Understanding your data helps you choose appropriate index configurations and plan for storage and compute requirements.

Dataset for this test:
- Volume: 110 million product records
- Vector dimensions: 1024-dimensional dense embeddings
- Embedding model: llama-text-embed-v2 with cosine similarity
- Architecture pattern: “head and tail” search pattern with two namespaces:
  - “head” namespace: 10 million vectors for frequently accessed recent products
  - “tail” namespace: 100 million vectors for the complete historical dataset
Define workload requirements
Specify the expected query load, traffic patterns, and concurrent usage that your system must handle. Performance requirements guide your testing methodology and help validate whether your configuration can handle production workloads.

Workload requirements for this test:
- Concurrent users: Test with 1-20 concurrent clients
- Query patterns: 80% queries to head namespace, 20% to tail namespace
- Peak load: Sustained high query volume during testing periods
Define success criteria
Establish measurable performance targets that define what constitutes a successful test. Success criteria should align with your production requirements and user expectations, covering latency, throughput, accuracy, and data freshness.

Success criteria for this test:
- Query latency: < 100ms (p95) for frequently accessed data, < 200ms (p95) for comprehensive searches
- Throughput: > 1000 QPS for high-frequency queries, > 500 QPS for comprehensive dataset queries
- Search accuracy: > 95% recall for frequent queries, > 90% recall for comprehensive searches
- Data freshness: New products searchable within 5 minutes for frequent data, 15 minutes for comprehensive dataset
2. Set up your environment
Proper environment setup is critical for large-scale testing success. You’ll configure your Pinecone account with appropriate plan limits, install the necessary tools for benchmarking, and establish secure authentication. This foundation ensures you have the infrastructure needed to conduct comprehensive performance tests.
Set up your Pinecone account
This test requires a Pinecone account on the Standard or Enterprise plan. The scope of the test exceeds the limits of the Starter plan.
- New users: Sign up at app.pinecone.io and choose the Standard plan trial, which includes a 21-day trial period and $300 in credits.
- Existing users: If you’re currently on the Starter plan, upgrade to Standard or Enterprise.
Install the Python SDK
This test uses the Python SDK. Install the SDK:
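The SDK is published on PyPI as pinecone (formerly pinecone-client):

```bash
pip install pinecone
```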
3
Create and export your API key
You need an API key to authenticate your requests to Pinecone. Create a new API key in the Pinecone console and then export it as an environment variable in your terminal:
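For example, in a Bash-compatible shell:

```bash
# Replace with the API key you created in the Pinecone console.
export PINECONE_API_KEY="YOUR_API_KEY"
```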
3. Create your index
Index configuration is foundational to achieving your performance targets. The choices you make for embedding models, dimensions, similarity metrics, and deployment regions directly impact search accuracy, query latency, and operational costs. For this test, create a dense vector index optimized for the product search use case:
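A minimal sketch using the Python SDK; the index name and cloud region are illustrative choices for this test (bulk import requires a serverless index):

```python
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# "product-search" and the AWS region are example choices for this test.
pc.create_index(
    name="product-search",
    dimension=1024,   # matches llama-text-embed-v2 embeddings
    metric="cosine",  # similarity metric chosen in step 1
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```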
4. Import the dataset
Large-scale data import requires efficient bulk-loading strategies to minimize time and cost while ensuring data integrity. Pinecone’s import feature enables you to load millions of vectors from object storage in parallel, which is significantly faster and more cost-effective than individual upserts. This phase tests your data pipeline’s ability to handle production volumes. For this test, use the import feature to load 110 million product records into two distinct namespaces within your index.
Start bulk import
Start the import process for each namespace:
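A sketch assuming the records have already been staged in object storage as Parquet files laid out per Pinecone’s import conventions; the bucket paths are placeholders:

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("product-search")  # the index created above

# Placeholder URIs: point these at your staged Parquet files. How files map
# to namespaces follows the directory layout expected by the import feature.
head_import = index.start_import(uri="s3://example-bucket/imports/head/")
tail_import = index.start_import(uri="s3://example-bucket/imports/tail/")

print("head import id:", head_import.id)
print("tail import id:", tail_import.id)
```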
Monitor import progress
The amount of time required for an import depends on various factors, including dimensionality and metadata complexity. For this test, the import should take around 3-4 hours. To track progress, check the status bar in the Pinecone console or use the describe import operation with the import ID:
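Continuing from the calls above, a sketch that polls each import; the status and percent_complete fields follow the import API’s documented response, but verify them against your SDK version:

```python
# Poll each import by the ID returned from start_import().
for import_id in (head_import.id, tail_import.id):
    description = index.describe_import(id=import_id)
    print(import_id, description.status, description.percent_complete)

# Alternatively, list all imports for the index.
for imp in index.list_imports():
    print(imp.id, imp.status)
```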
5. Run benchmarks
Systematic benchmarking measures your index’s performance under realistic conditions and generates the quantitative data needed to validate your system’s readiness. You’ll use the Vector Search Bench (VSB) tool to simulate production workloads, measure key performance metrics, and test different load scenarios. VSB provides standardized tools to simulate realistic query patterns while measuring latency, throughput, and recall metrics. You’ll configure synthetic workloads that match your data characteristics and expected usage patterns.

Note: VSB does not currently support targeting a specific namespace; it runs against an index’s default namespace. The tests below therefore approximate the head and tail workloads at the index level.
Install Vector Search Bench (VSB)
Clone the VSB repository and use Poetry to install the dependencies:
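Assuming the repository lives at the pinecone-io/VSB GitHub project and follows a standard Poetry workflow:

```bash
# Requires a recent Python and Poetry.
git clone https://github.com/pinecone-io/VSB.git
cd VSB
poetry install

# Enter the virtual environment so the `vsb` command is on your PATH.
poetry shell
```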
Test high-frequency queries
Simulate high-frequency queries on your head namespace using a synthetic workload that matches your 1024-dimensional vectors. This test simulates 10 concurrent users querying your head namespace at 100 requests per second, measuring how well your index handles frequent queries:
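A sketch of the invocation; the synthetic-workload flag names are assumptions, so confirm the exact spellings with vsb --help:

```bash
# Flags below are illustrative -- verify names with `vsb --help`.
vsb --database=pinecone \
    --workload=synthetic \
    --synthetic_dimensions=1024 \
    --users=10 \
    --requests_per_sec=100 \
    --pinecone_api_key="$PINECONE_API_KEY" \
    --pinecone_index_name=product-search
```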
Test comprehensive dataset queries
Test performance against your full tail namespace dataset with lower concurrency but higher query volume. This test evaluates performance against your complete dataset with realistic concurrent load:
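Again a sketch with illustrative flags, at the 5 concurrent users and 50 requests per second referenced in step 6:

```bash
vsb --database=pinecone \
    --workload=synthetic \
    --synthetic_dimensions=1024 \
    --users=5 \
    --requests_per_sec=50 \
    --pinecone_api_key="$PINECONE_API_KEY" \
    --pinecone_index_name=product-search
```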
Run mixed workload simulation
Simulate realistic production traffic patterns that query both namespaces. Since VSB doesn’t natively support cross-namespace testing, run separate tests and combine the results. This approach simulates your expected traffic distribution across both namespaces:
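One way to approximate the 80/20 distribution from step 1 is two back-to-back runs with proportional request rates; the user counts and flag names here are assumptions:

```bash
# 80% of traffic: head-style queries at 80 RPS.
vsb --database=pinecone \
    --workload=synthetic \
    --synthetic_dimensions=1024 \
    --users=8 \
    --requests_per_sec=80 \
    --pinecone_api_key="$PINECONE_API_KEY" \
    --pinecone_index_name=product-search

# 20% of traffic: tail-style queries at 20 RPS.
vsb --database=pinecone \
    --workload=synthetic \
    --synthetic_dimensions=1024 \
    --users=2 \
    --requests_per_sec=20 \
    --pinecone_api_key="$PINECONE_API_KEY" \
    --pinecone_index_name=product-search
```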
Test peak load scenarios
Validate performance under peak load conditions by increasing concurrent users and request rates. This stress test validates whether your configuration can handle peak traffic scenarios:
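A sketch at the peak numbers referenced in step 6 (20 concurrent users at 200 requests per second); flags illustrative, as above:

```bash
vsb --database=pinecone \
    --workload=synthetic \
    --synthetic_dimensions=1024 \
    --users=20 \
    --requests_per_sec=200 \
    --pinecone_api_key="$PINECONE_API_KEY" \
    --pinecone_index_name=product-search
```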
6. Analyze performance
Now analyze VSB results, understand performance characteristics, and validate whether your system meets the success criteria defined in step 1.
Review benchmark results
VSB provides two types of output: real-time metrics during test execution and detailed results saved to stats.json files. This allows you to monitor progress and perform detailed analysis.

Real-time output during testing: VSB displays live metrics during execution, updating every 10 seconds with current performance data.

Post-test analysis: After each test completes, VSB saves detailed results to stats.json. Analyze these results to compare performance across the different scenarios, as in the sketch after the list below.

Expected performance characteristics:
- Head namespace (10M vectors, high-frequency queries):
  - Latency: p95 latency around 50-80ms for 10 concurrent users at 100 RPS
  - Throughput: Should sustain 100+ RPS with 10 concurrent users
  - Recall: High recall (>0.95) due to smaller dataset and frequent access patterns
- Tail namespace (100M vectors, comprehensive queries):
  - Latency: p95 latency around 100-150ms for 5 concurrent users at 50 RPS
  - Throughput: Should sustain 50+ RPS with lower concurrency
  - Recall: Slightly lower recall (>0.90) due to larger dataset
- Peak load scenarios:
  - Latency degradation: Expect 2-3x higher latencies under peak load (20 users, 200 RPS)
  - Rate limiting: May hit plan limits at sustained high throughput
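A minimal analysis sketch in Python; it assumes only that each run leaves a stats.json on disk. The schema can vary by VSB version, so the code prints the raw structure first, and the key lookups are shown as hypothetical comments:

```python
import json
from pathlib import Path

# Collect every stats.json produced by your benchmark runs. The directory
# layout is an assumption; adjust the glob to wherever VSB wrote its output.
for path in sorted(Path(".").glob("**/stats.json")):
    stats = json.loads(path.read_text())
    print(f"\n=== {path} ===")
    # Print the raw structure first to learn the schema for your VSB version.
    print(json.dumps(stats, indent=2)[:800])
    # Once the schema is known, pull out headline numbers, e.g. (hypothetical keys):
    # p95_ms = stats["query"]["latency"]["p95"]
    # qps = stats["query"]["throughput"]
    # recall = stats["query"]["recall"]["mean"]
```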
Validate against success criteria
Compare your benchmark results against the success criteria defined in step 1:
Success evaluation:
| Test Scenario | Target p95 Latency | Target Throughput | Target Recall | Your Results |
|---|---|---|---|---|
| Head namespace queries | < 100ms | > 100 RPS | > 0.95 | Enter results |
| Tail namespace queries | < 200ms | > 50 RPS | > 0.90 | Enter results |
| Mixed workload | < 150ms | > 75 RPS | > 0.92 | Enter results |
| Peak load | < 300ms | > 150 RPS | > 0.90 | Enter results |
- ✅ Pass: Results meet or exceed targets
- ⚠️ Marginal: Results within 20% of targets - consider optimizations
- ❌ Fail: Results significantly below targets - requires optimization or architecture changes
7. Test production readiness
Production readiness testing goes beyond basic performance metrics to evaluate operational aspects like monitoring, backup procedures, and system reliability. This phase ensures your system can handle real-world operational demands and provides the observability needed for production deployment.
Set up monitoring and observability
Implement comprehensive monitoring to track index performance, query patterns, and system health. Pinecone provides multiple monitoring options for production deployments.

Pinecone console monitoring: Monitor basic metrics through the Pinecone console, including index statistics, query volume, and performance trends.

Prometheus integration: For production environments, integrate with Prometheus for advanced monitoring and alerting.

Application-level monitoring: Implement query-level monitoring to track latency, error rates, and throughput in your application.
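A minimal application-level sketch that wraps queries with a timer; the index name is the illustrative one from step 3, and the metrics hand-off is left as comments since it depends on your stack:

```python
import os
import time

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("product-search")  # illustrative index name from step 3

def timed_query(vector, namespace, top_k=10):
    """Query the index and record latency and error counts."""
    start = time.perf_counter()
    try:
        result = index.query(vector=vector, namespace=namespace, top_k=top_k)
    except Exception:
        # Increment an error counter in your metrics backend here.
        raise
    latency_ms = (time.perf_counter() - start) * 1000
    # Emit latency_ms to your metrics backend (Prometheus client, StatsD, etc.).
    print(f"{namespace}: {latency_ms:.1f} ms, {len(result.matches)} matches")
    return result
```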
Test backup and recovery procedures
Validate backup and restore functionality to ensure data protection and disaster recovery capabilities, as in the sketch below.
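A hedged sketch of a backup-and-restore drill; the create_backup and create_index_from_backup method names follow recent Python SDK releases for serverless indexes, so verify them (and the response fields) against your installed SDK version:

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Method and field names here are assumptions based on recent SDK releases;
# check the Pinecone SDK reference for your installed version.
backup = pc.create_backup(
    index_name="product-search",
    backup_name="product-search-pre-launch",
)

# Restore into a separate index to validate the backup end to end without
# touching the index under test.
pc.create_index_from_backup(
    name="product-search-restore-test",
    backup_id=backup.backup_id,
)
```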
Important considerations:
- Test backup and restore procedures in a non-production environment
- Document recovery time objectives (RTO) and recovery point objectives (RPO)
- Validate that restored data maintains integrity and search functionality