Alphabet (GOOGL.US) has updated the pricing tiers for its Gemini API, structuring plans and prices around actual inference usage. The newly introduced service tiers are Standard, Flex, Priority, Batch, and Caching. According to the company, the Gemini API now offers multiple optimization mechanisms that let users balance speed, cost, and service stability against specific business workloads: whether building a real-time chatbot or running a large offline data-processing pipeline, choosing the right operating mode can significantly cut costs or improve efficiency.

The Flex tier runs on idle computing capacity during off-peak hours and is discounted 50% from the standard price, with a target latency of 1 to 15 minutes but no latency guarantee. The Batch API tier also offers a 50% discount on the standard rate, with latency of up to 24 hours. The Caching tier is billed by the number of cached tokens and their storage duration; it is recommended for use cases such as chatbots with complex system instructions, repeated analysis of long video files, and queries over large document sets.

The Priority tier is priced 75% to 100% above the standard rate, with latency controllable from milliseconds to seconds. Alphabet recommends it for scenarios such as real-time customer-service chatbots, real-time fraud detection, and business-critical intelligent assistants.
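The relative pricing described above can be sketched as a small cost calculator. Note that the standard rate below is a hypothetical placeholder chosen for illustration, not an actual Gemini API price; only the tier multipliers (50% discount for Flex and Batch, 75% to 100% premium for Priority) come from the article.

```python
# Hypothetical cost sketch based on the multipliers reported in the article.
# STANDARD_RATE is a made-up placeholder, not an actual Gemini API price.
STANDARD_RATE = 1.00  # cost per 1M tokens, hypothetical unit price

# Multipliers relative to the Standard tier, as described above:
TIER_MULTIPLIER = {
    "standard": 1.0,
    "flex": 0.5,            # 50% discount; target latency 1-15 min, no guarantee
    "batch": 0.5,           # 50% discount; latency up to 24 hours
    "priority_low": 1.75,   # lower bound of the 75-100% premium
    "priority_high": 2.00,  # upper bound of the 75-100% premium
}

def tier_cost(tokens_millions: float, tier: str) -> float:
    """Estimated cost of processing the given token volume under a tier."""
    return tokens_millions * STANDARD_RATE * TIER_MULTIPLIER[tier]

if __name__ == "__main__":
    # Processing 100M tokens offline via Batch costs half of Standard,
    # while Priority can cost up to double.
    print(tier_cost(100, "standard"))       # 100.0
    print(tier_cost(100, "batch"))          # 50.0
    print(tier_cost(100, "priority_high"))  # 200.0
```

Under these assumptions, a workload that tolerates delays of hours (e.g. offline document analysis) halves its bill by moving from Standard to Batch, while latency-critical workloads pay up to twice the standard rate for Priority.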