Lead Data Engineer @ PepsiCo | Databricks | PySpark | SQL | Azure | ADF | ADLS Gen2 | Unity Catalog | CI/CD | ETL/ELT | Performance Tuning | GenAI (LangChain ...
In the first part of this series I showed how the Concurrency setting in a Fabric Dataflow Gen2 can affect refresh performance when there are multiple queries inside the dataflow. In this post I will ...
Hands-on projects are one of the best ways to learn cloud computing. This guide highlights 10 practical AWS and Azure projects that help build real-world skills in serverless computing, DevOps, data ...
On the first day of Microsoft Build 2026 in San Francisco, Microsoft announced the public preview of Azure HorizonDB, a fully managed PostgreSQL-compatible database rebuilt from the ground up for ...
pgEdge, a leading open source enterprise Postgres company, is launching pgEdge ColdFront, a transparent data tiering solution for PostgreSQL. Unlike other alternatives, ColdFront's cold tier is fully ...
When is one more cost-effective than the other? 10. How do you manage infrastructure as code for data pipelines using Azure Bicep or Terraform? Data Modeling & SQL: 11. How do you ...
To finish off my series of posts on concurrent evaluation in Fabric Dataflows Gen2 (see part 1 and part 2) I decided to do some more realistic tests to see how much parallelism I could get. To do this ...
Martin Kleppmann, an associate professor at ...
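Shortly after migrating several Spark batch pipelines from on-premises infrastructure to Azure Kubernetes Service (AKS), we found that one of the larger jobs was repeatedly hitting executor out-of-memory (OOM) failures. These failures occurred during shuffle ...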