Thursday, May 29, 2008

Select and then Sort


When you use both select-object and sort-object in a pipeline, what's the proper order? Let's check how fast they execute.
Each speed test consists of two similar commands that differ only in the order of sort and select in the pipeline. Each command is executed 10 times, and the total execution time is measured in milliseconds.
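The measurement can be wrapped in a small helper. This is only a sketch: the function name Compare-PipelineSpeed and its parameters are made up for illustration, it assumes PowerShell 3.0 or later (for [pscustomobject]), and "percent faster" is taken to mean (first time / second time - 1) * 100.

```powershell
# Hypothetical helper: times two script blocks and reports how much faster the second one is.
function Compare-PipelineSpeed {
    param(
        [scriptblock]$First,
        [scriptblock]$Second,
        [int]$Iterations = 10
    )

    # Run each script block $Iterations times and take the total elapsed time.
    $firstMs  = (Measure-Command { 1..$Iterations | ForEach-Object { & $First } }).TotalMilliseconds
    $secondMs = (Measure-Command { 1..$Iterations | ForEach-Object { & $Second } }).TotalMilliseconds

    [pscustomobject]@{
        FirstMs       = $firstMs
        SecondMs      = $secondMs
        Ratio         = $firstMs / $secondMs               # 1.15 means "1.15 times"
        PercentFaster = ($firstMs / $secondMs - 1) * 100   # 1.15 times is 15% faster
    }
}

# Example call; the timings vary from machine to machine and from run to run.
$result = Compare-PipelineSpeed `
    -First  { Get-Process | Sort-Object Name | Select-Object Name, Id } `
    -Second { Get-Process | Select-Object Name, Id | Sort-Object Name }
$result
```

A single run is noisy, so it is worth repeating the comparison a few times before trusting a small difference.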


- Updated: 06/11/2008 (see comment below by Lee Holmes) -


Test #1

PS > (measure-Command { 1..10 | foreach { gsv | sort name | select name,status }}).TotalMilliseconds


PS > (measure-Command { 1..10 | foreach { gsv | select name,status | sort name}}).TotalMilliseconds


Result: Second command is 15% faster.


Test #2  

# this command is the third example of select-object command from the help files.

PS > (measure-command { 1..10 | foreach { gps | sort ws | select -last 5 }}).TotalMilliseconds


PS > (measure-command { 1..10 | foreach { gps | select -last 5 | sort WS }}).TotalMilliseconds

Result: Second command is 3.44 times faster.

# this command is the sixth example of sort-object command from the CTP help files. I changed the extension to ps1.

PS > (measure-Command { 1..10 | foreach { dir *.ps1 | sort @{Expression={$_.LastWriteTime-$_.CreationTime}; Ascending=$false} | select LastWriteTime, CreationTime}}).TotalMilliseconds

PS > (measure-Command { 1..10 | foreach { dir *.ps1 | select LastWriteTime, CreationTime | sort @{Expression={$_.LastWriteTime-$_.CreationTime}; Ascending=$false}}}).TotalMilliseconds


Result: Second command is 8.2% faster.


Test #3  

PS > (measure-Command { 1..10 | foreach { dir | sort -unique | select name}}).TotalMilliseconds


PS > (measure-Command { 1..10 | foreach { dir | select name | sort -unique}}).TotalMilliseconds


Result: Second command is 753% faster!


The reason 'select then sort' executes faster is that sort-object has far fewer properties to work on. When you select certain properties from a collection, select-object creates a new object containing only the specified properties of the incoming object, which results in a smaller object to process.
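A quick way to see the difference in object size is to count the properties before and after select-object. This is a minimal sketch that uses the current PowerShell process as sample data; the variable names are only for illustration, and the full property count depends on the PowerShell version and platform.

```powershell
# Use the current PowerShell process as a sample object.
$proc = Get-Process -Id $PID

# Count the properties on the original object and on the selected one.
$full = @($proc.PSObject.Properties).Count
$slim = @(($proc | Select-Object Name, WS).PSObject.Properties).Count

"Full object: $full properties; selected object: $slim properties"
```

The selected object carries only Name and WS, so anything downstream in the pipeline receives a much smaller object.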

One thing is for sure: in most cases, select objects before sorting them, and ALWAYS make sure both orders produce the SAME output!
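Checking that both orders return the same output can be done by comparing the results directly. The sketch below uses a plain array of numbers (made-up sample data, not process objects) and shows a case where the two orders do not agree, because select -last picks items by position before anything has been sorted.

```powershell
# Made-up sample data, deliberately unsorted.
$data = 5, 1, 9, 3, 7, 2, 8

# Sort first, then take the last three: the three largest values.
$sortFirst = $data | Sort-Object | Select-Object -Last 3

# Take the last three first, then sort: whatever happened to be at the end of the input.
$selectFirst = $data | Select-Object -Last 3 | Sort-Object

# Compare the two outputs, order included.
$sameOutput = ($sortFirst -join ',') -eq ($selectFirst -join ',')

"sort | select : $($sortFirst -join ', ')"
"select | sort : $($selectFirst -join ', ')"
"Same output   : $sameOutput"
```

Here the first pipeline returns 7, 8, 9 and the second returns 2, 7, 8, so the faster command is not a valid replacement.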


Anonymous said...

Be aware that the order of the commands makes a big difference in functionality, though.

Test #1 -- that is a valid performance trick. I would note that it is 15% faster, not 115% (1.15 times) faster.

Test #2 -- This introduces a bug, as you are only getting the working set from 5 random processes, rather than actually getting the 5 processes with the largest working set.

Test #3 -- This has a good chance of introducing a bug on objects where "Name" isn't a valid key to sort by. For example, imagine sorting DateTime objects. They have a built-in comparison function that PowerShell uses (which sorts by Ticks). If you instead sorted by a property (such as the string representation), the two tests would produce radically different results.
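The DateTime case can be illustrated with a small sketch. The three dates and the MM/dd/yyyy text format are arbitrary choices made for this example; the invariant culture is used so the result does not depend on regional settings.

```powershell
$inv = [cultureinfo]::InvariantCulture

# Arbitrary sample dates.
$dates = [datetime]'2008-05-29', [datetime]'2008-06-11', [datetime]'2007-12-01'

# Default sort: uses the DateTime comparison, so the result is chronological.
$byValue = $dates | Sort-Object

# Sort by a string representation instead: the month digits are compared first.
$byText = $dates | Sort-Object { $_.ToString('MM/dd/yyyy', $inv) }

$byValueText = ($byValue | ForEach-Object { $_.ToString('yyyy-MM-dd', $inv) }) -join ', '
$byTextText  = ($byText  | ForEach-Object { $_.ToString('yyyy-MM-dd', $inv) }) -join ', '

"By value: $byValueText"
"By text : $byTextText"
```

Sorting by value puts the 2007 date first, while sorting by the text form puts it last, because "12/01/2007" sorts after "05/29/2008" and "06/11/2008" as a string.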

Shay Levy said...


Thank you for the comprehensive comment. I've updated the post.
Test #2 example slipped under the radar :)

I can safely say that in most cases, selecting objects before sorting them is much faster, provided both orders return the same output.