To help organizations scale their AI usage without overextending their budgets, we've added two new ways to lower costs on consistent and asynchronous workloads.
- Discounted usage on committed throughput: Customers with a sustained level of tokens-per-minute (TPM) usage on GPT-4 or GPT-4 Turbo can request access to provisioned throughput and receive a 10–50% discount based on the size of their commitment.
- Reduced costs for asynchronous workloads: Customers can use our new Batch API to run non-urgent workloads asynchronously. Batch API requests are priced at 50% off shared prices, offer much higher rate limits, and return results within 24 hours. This is ideal for use cases like model evaluation, offline classification, summarization, and synthetic data generation.
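As a rough sketch of the asynchronous workflow: a Batch API job takes a JSONL file in which each line is one request with a `custom_id` for matching results back to inputs. The helper below builds such a file locally; the model name and prompts are illustrative placeholders, not a recommendation.

```python
import json

def build_batch_lines(prompts, model="gpt-4-turbo"):
    """Build one JSONL line per prompt in the Batch API input format.

    Each line is a JSON object with a custom_id (used to match results
    back to inputs), the HTTP method, the target endpoint, and the
    request body that would normally be sent synchronously.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,  # placeholder model name
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return lines

# Example: write a small batch input file for offline summarization.
prompts = ["Summarize the following text: ...", "Classify this review: ..."]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(build_batch_lines(prompts)))
```

The resulting file is uploaded via the Files API with `purpose="batch"`, and a batch job is then created referencing that file with a 24-hour completion window; results are returned as an output file keyed by `custom_id`.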
We plan to keep adding new features with a focus on enterprise-grade security, administrative controls, and cost management. To learn more about these launches, see our API documentation or contact our team to discuss a custom solution for your enterprise.