Talk:Apache Pig

SQL does indeed allow you to specify how the join is to be performed: http://technet.microsoft.com/en-us/library/ms173815.aspx Any advice on rewording the article to reflect that? — Preceding unsigned comment added by 82.32.246.27 (talk) 21:03, 6 January 2014 (UTC)

References

I just changed some references that were dead links because the pages had moved. Is it common practice to also update the access date? If not, this should be reverted.

For a computer science course I'm currently attending, I had to read the paper "Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience". The paper was written in August 2009, so it is six years old by now. Some things have changed since then; although this is more of a version history, it could be added to the article somehow. I'll list the changes here for further reference; my sources are mostly the release notes linked here.

  • Pig 0.4.0 (2009-09-22)
    • Skew handling has been improved; support for a rule-based optimizer and for outer joins has been added.
  • Pig 0.6.0 (2009-10-25)
    • Improvements for UDFs (a way for UDFs to pass information from the front end to the back end), left outer join added for fragment-replicate join, and a reworked memory manager.
  • Pig 0.7.0 (2010-05-05)
    • Redesign for future growth of Pig, more information provided for launched Hadoop jobs, and more aggressive use of the Hadoop distributed cache.
  • Pig 0.8.0 (2010-12-13)
    • Support for casting to a scalar, Python UDFs, custom partitioners, integration of MapReduce code into the pipeline, nested describe, and more. Map-side join and cogroup were also added, as well as memory improvements.
  • Pig 0.9.0 (2011-07-21)
    • The main focus of this release is the addition of control structures, semantic cleanup, and the foundation for better usability through the replacement of the JavaCC parser with ANTLR.
  • Pig 0.10.0 (2012-04-25)
    • This release includes several new features such as a boolean data type, nested cross/foreach, JRuby UDFs, limit by expression, a default destination for split, tuple/bag/map syntax support, map-side aggregation, and more.
  • Pig 0.11.0 (2013-02-14)
    • New RANK, CUBE and ROLLUP operators; new DateTime data type; support for Groovy UDFs; support for loading macros from jars; support for schema-based tuples for a reduced memory footprint; support for passing environment variables to streaming jobs; improved support for working with maps in Pig scripts; Grunt improvements (history and clear); UDF lifecycle improvements; and performance improvements to merge join, in-memory aggregation, and Spillable management.
  • Pig 0.12.0 (2013-10-14)
    • This release includes several new features such as the ASSERT operator, streaming UDFs, a new AvroStorage, IN/CASE operators, BigInteger/BigDecimal data types, support for Windows, and more.
  • Pig 0.13.0 (2014-07-03)
    • This release includes several new features such as pluggable execution engines (to allow Pig to run on non-MapReduce engines in the future), auto-local mode (to let jobs with small input data sizes run in-process), fetch optimization (to improve the interactiveness of Grunt), fixed counters for local mode, support for a user-level jar cache, and support for blacklisting and whitelisting Pig commands.
  • Pig 0.14.0 (2014-11-20)
    • The highlights of this release include Pig on Tez, OrcStorage, loader predicate pushdown, constant calculation optimization, and an interface to ship jars.

Bennie91 (talk) 13:42, 23 May 2015 (UTC)