Wednesday, February 17, 2010

Stop writing foreach() cycles

At least in php. How many for() or foreach() have you written today? I bet a lot. Php 5.3 has a solution that will reduce the average number of iteration structures you need to write: closures applied by plain old array functions.
There are some array functions which have already been supported at least from Php 4, and that take as an argument a callback whose formal parameters have to be one or two elements of the array. I'm talking about array_map(), array_reduce(), array_filter() and uasort() (or similar custom sorting function.) These functions abstract away a foreach() cycle by applying a particular computation to all the elements of an array.
Back in Php 4 and Php 5.2, specifying a callback was cumbersome: you had to define an external function and then passing its name as a string; or passing an array containing an object or the class name plus the method name in case of a public (possibly static) method.
In Php 5.3, callbacks may also be specified as anonymous functions, defined in the middle of other code. These closures are first class citizens, and are treated as you would treat a variable, by passing it around as a method parameter. While I am not a fan of mixing up object-oriented and functional programming, closures can be a time saver which capture very well the intent of low-level processing code, avoiding the foreach() noise.
If you bear with me for a moment, I will show you with working code how you can avoid writing most of your foreach() cycles.
<?php
// obviously only the prime numbers less than 20
$primeNumbers = array(2, 3, 5, 7, 11, 13, 17, 19);

// array_map() applies a function to every element of an array,
// returning the result
$square = function($number) {
    return $number * $number;
};
$squared = array_map($square, $primeNumbers);
echo "The squares of those prime numbers are: ",
     implode(', ', $squared), "\n";

// array_reduce() applies a function recursively to pair
// of elements, reducing the array to a single value.
// there is the native array_sum(), but the application of
// a custom function is the interesting part
$sum = function($a, $b) {
    return $a + $b;
};
$total = array_reduce($primeNumbers, $sum);
echo "The sum of those prime numbers is ", $total, ".\n";

// array_filter() produces an array containing
// the elements that satisfy the given predicate
$even = function($number) {
    return $number % 2 == 0;
};
$evenPrimes = array_filter($primeNumbers, $even);
echo "The even prime numbers are: ",
     implode(', ', $evenPrimes), ".\n";

// uasort() customize the sorting by value,
// maintaining the association with keys
// there is the native asort(), but again the customization
// of the function is more interesting
$compare = function($a, $b) {
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1;
};
uasort($primeNumbers, $compare);
echo "The given numbers in descending order are: ",
     implode(', ', $primeNumbers), ".\n";

26 comments:

Unknown said...

I went in to this article thinking, "what do you mean?" but left smiling and thinking, "this is great!"

Thanks, not only is it probably faster, but it just seems more elegant of a way to deal with foreach() loops.

I need to start looking at the documentation of built in functions more.

Thanks :)

Julius Beckmann said...

This functions are NOT faster than a simple foreach loop!
Their usage might be easier, but they have the overhead of a function call for *each* array value!

Here is some simple code to prove it:
http://www.pastie.org/829318

I got this result:
array_map = 0.0362298488617
foreach = 0.0109920501709

Even if this might be micro-optimization, i would say:
Just use stuff you have proven to be better, not just newer.

regards, Julius

Giorgio said...

Julius,
Map and Reduce are classics in functional programming.
http://en.wikipedia.org/wiki/Map_%28higher-order_function%29
My goal is not faster code, but succinct code. :) When calling a function inside a foreach, refactoring to map/reduce/filter may be cleaner.

Anonymous said...

@Julius Beckmann

You results actually embarrass me in a positive way. I would have expected a much worse ratio.

The functional style of programming is so much more elegant than the procedural style. I think it would be possible to write a program transformer that compiles functional constructs like these into old skool loops.

I think a nice alternative name for php would be Array, since that is the core construct. It's the primary data structure due to it's extensive support and flexibility.
Even the slightest piece of innovation in this area would be welcomed, since we use arrays so much!

Unknown said...

PHP closures are almost first class citizens. One thing you can't do is alter their properties. You might, for example, like to create a function in one place and later add methods or properties in another.

Whether this restriction is a good or bad thing depends entirely on you own perspective.

The execution speed is usually not an issue. A big advantage is that you abstract out the thing inside the loop. Take a query result set, with functional style, you can give it any variety of formatting functions, html, json, text, etc. If you can abstract out the formatting, you can pass the formatter to any variety of result sets. Your code grows linearly, but the potential outputs as the product -- a huge savings.

Also foreach does have its place, so don't abandon it without reason.

Anonymous said...

While I think the new closure are cool, I don't think it's meant to be the final piece of the puzzle for array_* type functions to replace foreach loops altogether.

Let's not all of us start going around saying for and foreach loops are now evil...
or noisy...
because they're not.

They're good iterators...

...and they're my friends... T.T Leave iterators ALoOoOoOoOOoOoNE!

Unknown said...
This comment has been removed by the author.
Anonymous said...

closures != anonymous functions

Anonymous said...

Try the same test as Julie (http://giorgiosironi.blogspot.com/2010/02/stop-writing-foreach-cycles.html?showComment=1266430020132#c2840388803912847494)
with annonymous function, and function call in foreach

simple foreach still faster than everyother

Giorgio said...

Closures are a subset of anonymous function, at least in php.
I do not intend to replace iterator, because they are a basic structure which can perform all kinds of tasks. But when their power is not exploited, and an array is mapped to another homogeneous array or to a value, an array function may be a clean solution. It's just using the simplest tool for the job. :)

Unknown said...

Because the closure is the first argument, there is no elegant way to use it without first assigning it to a variable. What we need is a way that allows us to use closures directly. For example:

<?php
$primeNumbers = array(2, 3, 5, 7, 11, 13, 17, 19);

$squared = $primeNumbers.map(
  function($number) {
    return $number * $number;
  }
);

// Alternate compact syntax
$squared = $primeNumbers.map(function($number) {
  return $number * $number;
});

$total = $primeNumbers.reduce(
  function($a, $b) {
    return $a + $b;
  }
);

$evenPrimes = $primeNumbers.filter(
  function($number) {
    return $number % 2 == 0;
  }
);

$primeNumbers = $primeNumbers.sort(
function($a, $b) {
  if ($a == $b) return 0;
    return ($a ^gt; $b) ? -1 : 1;
  }
);

?>

Mamsaac said...

@Mazen:
Except for the "." as operator (PHP uses ->) for the method, you can easily make a wrapper class that allows for such operations.
Not sure if it's worth the time... that works beautifully for jQuery, though.

Unknown said...

@Mamsaac

Oops (on the -> operator), using several languages will do that to you.

The wrapper class idea is good, except there is no nice way to support maps: $a = myArray('a'=>3, 'b'=>4, 7);

What I would really like to see, and it would take PHP a huge step forward, is a simple parameter passing feature that allows developers to pass to a function anything they can pass to an array constructor. Why can't func_get_args() return a map?

The syntax can be:
  function myFunc() { $args = func_get_args(); /* code */ }

or, better yet:
  function myFunc($args[]) { /* code */ }

With a relatively simple change to the language we would enable PHP to implement many of the "cool" features of Ruby and Python.

The main objection to this is compatibility with the existing parameter passing mechanism, but if we think about it a little we will easily see that there is no ambiguity. We may even implement a mix in the future that allows:
  function($a, $b, $rest[]) { /* code */ }

Now, what's the procedure to include this in the language?

Giorgio said...

Syntactic sugar is good but it's not what I would use to choose a programming language. :) Php is not very verbose, and you can easily pass an array or an object if there are many arguments (which is usually a smell.)

Unknown said...

@giorgio:

Yes, you can pass an array, but the resulting syntax is ugly and inconvenient enough that almost no one uses it. You can also simulate all loops with while() and all conditions with if(), it doesn't mean that we should do away with for loops and switch statements.

This is a lot like the argument against foreach(){}. You don't need it as long as you have the traditional C for() loop, but foreach is more natural and less error-prone than for() in many cases. Now that we have foreach in many languages, people use it and love it. It makes code easier to write, easier to debug and more natural.

I would argue that the same applies to map arguments. Why should I do something best done by the PHP compiler?

Giorgio said...

Python has named arguments, but they are not a tool for passing maps. If I wanted to pass an arbitrary number of arguments, I would pass an object which captures a domain concept, since passing maps around does not result in much type safety or validation. :) If I wanted to pass a fixed number of arguments that is more than 3, probably I would need some refactoring of the method.

Unknown said...

@Giorgio
I am not sure that your observations are applicable in this case.

Your first point is about passing an object. An object doesn't capture the idea of an arbitrary number of arguments the way a map does. An object makes for a good parameter, but not for a good parameter list.

Your second point is that "If I wanted to pass a fixed number of arguments that is more than 3, probably I would need some refactoring of the method." I have trouble with this on three counts. (1) Why are we assuming "fixed number of parameters"? (2)Why is 3 a magical upper bound? (3)
Why do we preempt the developer without knowing the code being developed?

Here are two legitimate cases for the map argument set:

//1- Our original case, encapsulating arrays
$keyCodes = myArray('a' => 36, '['=>49); // Creates a special map
$keyCodes.each(function($key, $value) { /* Do Something */ }

//2- As optional attributes
name = "Save";
htmlButton($name, 'id' => 'saveButton', 'class' => 'lightButton', 'type' => 'submit', 'disabled'=> true);

Let the developer be free! :-)

Maghiel Dijksman said...

Interesting insight, and very useful for some situations.

BUT:
- Don't forget that foreach is a language construct and will always be faster and create less overhead than anything else you write in php.

- IMHO if you find yourself writing too much foreach iterations, there's something wrong in your design.

- The example you give is bad. Let's just forget about build in libraries and extensions here; if you find yourself having to do these operations, wouldn't by example a class with static methods be better and more readable?

[code]

[/code]

Maghiel Dijksman said...

Blogger removed the code between the tags:

Maghiel said...

Ok, Blogger just removes anything that smells HTML, like php tags :p

class MyTools_Math
{
public static function square(array $numbers);
public static function sum(array $numbers);
public static function sort(array $numbers, $order = SORT_DESC);
}

class PrimeNumber extends MyTools_Math
{
public static function getEvenNumbers(array $numbers);
}

$primeNumbers = array(2, 3, 5, 7, 11, 13, 17, 19);

$squared = MyTools_Math::square($primeNumbers);
$total = MyTools_Math::sum($primeNumbers);
$evenPrimes = PrimeNumber::getEvenNumbers($primeNumbers);
$sorted = MyTools_Math::sort($primeNumbers, SORT_DESC);

Unknown said...

@maghiel
I agree that the language construct will always be more efficient than user created functions, but built-in functions can be optimized to equal the performance of language constructs. I would love to see an optimized built-in set of array() functions such as array().map().

While performance is important, sometimes it should be compromised in favor of fast development. That's the whole reason for using PHP instead of C.

The code you posted is good when I'm developing a library, but should I really have to create a class just to filter and sort a list of prime numbers?

Maghiel Dijksman said...

@Mazen Mokhtar:
I totally agree that performance should not be put above everything. Not only to favor fast development but also for other reasons like abstraction, readability, sanity, etc. I use ZF for almost everything myself, which obviously creates an overhead.

On your question:
It totally depends on many factors. Where in your application would these operations happen? How many times do you need these operations? On what do these operations happen? etc

My opinion:
If you only need to do this once somewhere in your business logic, the closures might be an option.
If you need to do this somewhere else or more often than just once: yes, you have to create a class.

I think that many PHP developers are developing OOP-like applications, but still trapped in a half-procedural thinking. (I am definatly not an expert, guru, or anything. It's just my 2 cents).

Giorgio said...

Maghiel,
I chose a set of homogeneous operations (array of numbers to array of numbers) because is the case where map, reduce, and similar higher order functions apply. When keys are involved, for example, foreach() is invaluable.
Using a *static* class is half-procedural. :) In fact, if you try to write those methods, you'll find that they contain exactly the same foreach(), as long as the operations are homogeneous like the ones described in the post. Since we should not duplicate code, abstracting away the loop prevents you from writing the same foreach over and over again.

For the map/arguments discussion, I will publish my thoughts later as they are a bit long for this commenting space.

trond said...

@Mazen Mokhtar
"(2)Why is 3 a magical upper bound? "

Felt a need to point out that Giorgio is not alone in proposing to stick to a minimum number of arguments or that a larger number (> 2/3) usually is a sign of a need for refactoring.

Some of the most profiled coders/design experts out there (who know this from years of experience with multiple languages and (broken) designs) recommend the exact same thing. See for example the "Clean Code" book (by Robert C. Martin).

Giorgio said...

I wrapped up my thoughts in a new post:
http://giorgiosironi.blogspot.com/2010/02/number-one-rule-of-design.html

Anonymous said...

it is all about keeping it real =]

ShareThis