Databricks
Overview
Databricks is a data and AI company that pioneered the "lakehouse" architecture, which unifies data warehousing and data lake capabilities. Founded by the original creators of Apache Spark at UC Berkeley, the company has grown from commercializing an open-source data processing engine into an AI infrastructure platform valued at $43B (2023 funding round) that competes directly with Snowflake.
Timeline
- 2013: Founded by Matei Zaharia (Spark creator) and six other UC Berkeley researchers in a San Francisco loft
- 2014: Apache Spark, open source since 2010, becomes a top-level Apache Software Foundation project and grows into one of the most widely used distributed processing engines
- 2017: Databricks Delta announced, adding ACID transactions to data lakes; open-sourced as Delta Lake in 2019
- 2019: Raises $400M at a $6.2B valuation
- 2020: Popularizes the "lakehouse" concept, challenging the traditional data warehouse model
- 2021: Acquires 8080 Labs (creators of bamboolib, a low-code data exploration tool for pandas)
- 2023: Lakehouse AI features announced, embedding ML directly into data workflows
- 2023: Raises more than $500M at a $43B valuation, ahead of an anticipated IPO
- 2024: DBRX open model released; Databricks reported that it outperformed GPT-3.5 on standard benchmarks
- 2024: Revenue surpasses $2B ARR
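Business Model
Consumption-based pricing plus enterprise subscriptions:
- Platform usage: billed by processing consumed, measured in DBUs (Databricks Units), similar to the AWS consumption model
- Enterprise tier: security management, collaboration features, and SLA guarantees, with annual contracts from $50K to $10M+
- AI features: MosaicML and Lakehouse AI, with dedicated pricing tiers for AI training and inference
- Marketplace ecosystem: Databricks Marketplace, which takes a commission on data product transactions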
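To make the DBU consumption model concrete, here is a minimal arithmetic sketch in Python. It assumes a bill is simply DBUs consumed multiplied by a per-DBU price, summed over workloads. The workload names, DBU burn rates, hours, and prices are all hypothetical values chosen for illustration, not Databricks list prices, and the sketch leaves out any charges from the underlying cloud provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One hypothetical workload on a consumption-priced platform."""
    name: str
    dbu_per_hour: float  # assumed DBU burn rate of the cluster
    hours: float         # hours the cluster ran in the billing period
    usd_per_dbu: float   # assumed price per DBU for this workload type


def workload_cost(w: Workload) -> float:
    """Cost of one workload: DBUs consumed times price per DBU."""
    return w.dbu_per_hour * w.hours * w.usd_per_dbu


def monthly_bill(workloads: list[Workload]) -> float:
    """Total bill for the period, rounded to cents."""
    return round(sum(workload_cost(w) for w in workloads), 2)


# Illustrative numbers only.
workloads = [
    Workload("nightly_etl", dbu_per_hour=8.0, hours=60.0, usd_per_dbu=0.25),
    Workload("bi_dashboards", dbu_per_hour=4.0, hours=200.0, usd_per_dbu=0.5),
]
print(monthly_bill(workloads))  # prints 520.0
```

With these made-up inputs the ETL job costs $120 and the dashboards cost $400. The bill scales with usage rather than with seat count, which is the defining property of consumption pricing.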
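Moat Analysis
- Open-source roots: Spark is the de facto standard and the successor to the Hadoop ecosystem, with very strong developer mindshare
- Lakehouse innovation: combines the best of data lakes (low-cost storage) and data warehouses (high-performance queries)
- AI-native: an end-to-end platform from data to ML training to inference, more AI-focused than Snowflake
- Cloud-neutral: runs on AWS, Azure, and GCP, so customers are not locked into a single cloud vendor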
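Key Metrics
- ARR: $2B+ (2024)
- Valuation: $43B (2023 funding round)
- Customers: 10,000+ enterprises
- Fortune 500 adoption: 50%+
- Developer community: Spark has one of the world's largest open-source data communities
- Total funding: $3.9B+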
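Fun Fact
Founder Matei Zaharia started Apache Spark while he was a PhD student at UC Berkeley. The first Spark paper appeared in 2010, a few years before he completed his doctorate. The Spark research went on to become some of the most cited work in distributed computing, and Databricks grew directly out of it.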