#ApacheSpark
Come and join me for more hacking on @apachespark today -- www.youtube.com/watch?v=u5F_... I'm going to be working on github.com/apache/spark...
Spark PR adventures: Determinism Issues (maybe?)
YouTube video by Holden Karau
www.youtube.com
November 13, 2025 at 6:59 PM
The Stream Processing Showdown: Kafka Streams vs. Flink vs. Spark

#apachekafka
#apacheflink
#apachespark

medium.com/@balaji.raja...
The Stream Processing Showdown: Kafka Streams vs. Flink vs. Spark
medium.com
November 6, 2025 at 9:09 AM
Come and join me for some not-so-spooky zombie PR adventures: bringing back some "stale"/"dead" PRs around TTLs and other fun adventures in @ApacheSpark -- www.twitch.tv/holdenkarau / www.youtube.com/watch?v=u3Qo...
holdenkarau - Twitch
Zombie Apache Spark PR dev time :p
www.twitch.tv
November 3, 2025 at 6:28 PM
Apache Kafka (Kafka Connect) vs. Apache Flink vs. Apache Spark: Choosing the Right Ingestion Framework

#apachekafka
#apacheflink
#apachespark

www.onehouse.ai/blog/kafka-c...
Apache Kafka® (Kafka Connect) vs. Apache Flink® vs. Apache Spark™: Choosing the Right Ingestion Framework
This article compares three data ingestion frameworks—Kafka, Flink, and Spark—highlighting their unique strengths, use cases, and performance capabilities.
www.onehouse.ai
November 3, 2025 at 9:11 AM
Everything you need to know about Spark Structured Streaming

#apachespark

Covers its architecture, event-time processing, stateful processing, and how it achieves fault tolerance.

vutr.substack.com/p/everything...
vutr.substack.com
October 24, 2025 at 8:05 PM
IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data.

#apachespark

github.com/indextables/...
GitHub - indextables/indextables_spark
IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data. It integrates seamlessly with Spark SQL, allowing you to ...
github.com
October 16, 2025 at 8:34 PM
Meet Spark Analyzer – a free tool to unearth Apache Spark bottlenecks

#apachespark

www.onehouse.ai/blog/meet-sp...
Meet Spark Analyzer – a free tool to unearth Apache Spark™ bottlenecks
www.onehouse.ai
September 28, 2025 at 5:06 PM
What makes #ApacheSpark Delta Tables a game-changer?

It's all about features like time travel, data skipping, & auto-optimization. This blog shows how they make #datamanagement simpler and more reliable.

Read the blog here 👉 antt.me/XXbnTnut

#DataEngineering #AntStack
What makes Apache Spark + Delta Tables Nifty?
Enhance your data lake with Apache Spark + Delta Tables. Explore powerful features like ACID transactions, time travel, and data skipping in this insightful blog.
antt.me
September 18, 2025 at 7:00 AM
New podcast #AWSlatam 🎤 - EP289: Amazon Athena Best Practices

#AmazonAthena #DataArchitecture #CostOptimization #ApacheSpark #BestPractices
EP289: Amazon Athena Best Practices
Podcast AWS LATAM · Episode
ift.tt
September 7, 2025 at 8:43 AM
ICYMI: Abe Sharp looks at Volcano, a @cncf.io project that optimizes high-performance workloads on Kubernetes to avoid deadlocks
www.admin-magazine.com/Archive/2025...
#Kubernetes #scheduler #Volcano #CNCF #Queue #PodGroup #ApacheSpark #PyTorch #MachineLearning
August 28, 2025 at 3:37 PM
Boost your data lake performance with #ApacheSpark Delta Tables.

The latest blog post breaks down key features like time travel, data skipping, and more for better efficiency.

Read the full blog to learn more!👇
antt.me/XXbnTnut

#DeltaLake #DataEngineering #AntStack
What makes Apache Spark + Delta Tables Nifty?
Enhance your data lake with Apache Spark + Delta Tables. Explore powerful features like ACID transactions, time travel, and data skipping in this insightful blog.
www.antstack.com
August 14, 2025 at 6:00 AM
🚀 Choosing the right data analytics platform in 2024? Ataira breaks down the top contenders📊

🔗 Comparing Popular Data Analytics Products in 2024 #DataAnalytics #PowerBI #Tableau #ApacheSpark #BusinessIntelligence #CloudComputing #TechTrends #DigitalTransformation
Comparing Popular Data Analytics Products in 2024 - Ataira
Comparing Popular Data Analytics Products in 2024 - Choosing the right data analytics product depends on the organization's needs, including budget, technical expertise, and data scale
www.ataira.com
July 30, 2025 at 12:20 PM
What is the default engine used in Fabric Notebooks?
Fabric notebooks run on the Apache Spark engine, and the default language is PySpark (Python).
#MicrosoftFabric #FabricNotebooks #PySpark #ApacheSpark #BigData #DataEngineering #PowerBI #DataPlatform #OneLake #FabricCommunity #DP700 #SparkEngine #DataProcessing
July 29, 2025 at 4:50 AM
➡️ Practical guide to integrating #Spark with #Prometheus

🔷 Prerequisites
🔷 PySpark
🔷 JMX Exporter: what it is and how to configure it
🔷 Running Spark
🔷 Configuring Prometheus

➡️ blog.damavis.com/integracion-...

#BigData #ApacheSpark #DataEngineering
How to integrate Spark and Prometheus: a practical guide
Learn how to integrate Apache Spark with Prometheus with this practical guide and set up your own metrics monitoring environment step by step
blog.damavis.com
July 25, 2025 at 9:35 AM
📈 Monitor your metrics with #Spark and #Prometheus

1️⃣ Prerequisites
2️⃣ #Pyspark
3️⃣ JMX Exporter: what it is and how to configure it
4️⃣ Running Spark
5️⃣ Configuring Prometheus

➡️ blog.damavis.com/integracion-...

#ApacheSpark #BigData #DataEngineering
July 17, 2025 at 11:58 AM
⚡ #Spark + #Prometheus integration

🔍 Improve the observability of jobs launched on Spark
📊 Analyze and monitor metrics in real time
⚙️ Ensure the stability of the environment and understand everything that is happening

📌 Practical guide ➡️ blog.damavis.com/integracion-...

#ApacheSpark #BigData
How to integrate Spark and Prometheus: a practical guide
Learn how to integrate Apache Spark with Prometheus with this practical guide and set up your own metrics monitoring environment step by step
blog.damavis.com
July 11, 2025 at 12:03 PM
Databricks is contributing the tech behind Delta Live Tables (DLT) to the #ApacheSpark project!

It will now be known as Spark Declarative Pipelines, making it easier to develop & maintain streaming pipelines for all Spark users.

🔗Learn more: bit.ly/44DfiRs

#InfoQ #SoftwareArchitecture #opensource
July 9, 2025 at 8:10 AM
🚀 Working with #PySpark in the cloud — juggling multiple #DataFrames in parallel.

🔍 Combining filter(), select(), and join() efficiently is teaching me how to optimize both loading and exploration on large datasets.

#BigData #Databricks #DataEngineering #ApacheSpark
July 5, 2025 at 1:41 PM
Today is DBA Appreciation Day!

Bring your DBAs a cake and a coffee, please. And don't drop any tables in production, pretty please. It's the weekend ...

#PostgreSQL #SQLServer #Oracle #DB2 #MySQL #MariaDB #Snowflake #SQLite #Teradata #Aerospike #ApacheSpark #Clickhouse #WarehousePG #Greenplum
July 4, 2025 at 5:51 PM
Spark Connect transforms PySpark development by enabling remote, lightweight integration with notebooks, IDEs like VSCode, and modern web apps. #apachespark
Spark Connect Makes PySpark Play Nice with Notebooks, IDEs, and Web Apps
hackernoon.com
July 2, 2025 at 4:25 AM
🚀 Unlocking Big Data Potential with PySpark!
Key Features:
🔹 Spark SQL
🔹 Spark MLlib
🔹 Spark Streaming
🔹 DataFrame API

#PySpark #BigData #DataScience #ApacheSpark #MachineLearning #DataEngineering #XavierDataTech
July 1, 2025 at 8:14 AM
🚀 Working with PySpark SQL? Here's a quick and powerful example!

You can query DataFrames using SQL syntax in Spark — great for teams coming from SQL backgrounds.

#PySpark #BigData #SparkSQL #DataEngineering #ETL #ApacheSpark #SQL #DataScience #XavierDataTech
June 28, 2025 at 8:57 PM