AWS Glue, a serverless data integration service offered by Amazon Web Services, showcases upgraded Python and Apache Spark capabilities in version 4.0, released this week.
The upgrade adds engines for Python 3.10 and Apache Spark 3.3.0. Both engines include performance enhancements and bug fixes, with Spark providing capabilities such as row-level runtime filtering and improved error messages.
New engine plugins in Glue 4.0 support the Ray compute framework, the Cloud Shuffle Service for Spark, and Adaptive Query Execution. Support for Pandas, the data analysis and manipulation tool built on top of Python, is also featured. New data format support covers Apache Hudi, Apache Iceberg, and Delta Lake. Glue 4.0 also includes the Parquet vectorized reader, with support for additional encodings and data types.
AWS Glue provides data discovery, data preparation, data transformation, and data integration capabilities, with autoscaling based on workload size. AWS said Glue also now offers visual transforms that let customers use and share business-specific ETL logic among teams.
AWS also announced a preview of AWS Glue for Ray as a new engine option. Data engineers can use AWS Glue for Ray to process large data sets with Python and popular Python libraries. Python code is processed in distributed fashion across multi-node clusters.
Glue 4.0 is available now in parts of the US, including Ohio, Northern Virginia, and Northern California.
Copyright © 2022 IDG Communications, Inc.