Monday, February 11, 2008

Tokenizing PowerShell scripts

 

In Windows PowerShell Virtual User Group Meeting #3, Lee Holmes presented a script that performs syntax highlighting of PowerShell scripts, written in PowerShell v2 (CTP).

Reading through Lee's script, I remembered an older post in which I tried to take a script and resolve every alias it contained to its full cmdlet name. Back then, Richard Siddaway commented that there is one major drawback: if the script uses the "%" sign as an alias for the ForEach-Object cmdlet, things may go wrong and behave unexpectedly, since "%" is also the modulo operator.

Well, not in the case of Tokenize. The Tokenize method of the PSParser class takes the script content and breaks it apart into tokens, each tagged with a type (PSTokenType). In the case of "%", it knows when the sign is used as an alias and when it is used as the modulo operator. Amazing stuff!
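To see the distinction directly, here is a minimal sketch that tokenizes a one-liner using "%" in both roles and prints how each occurrence was classified. The sample line and the variable names are my own illustration, not taken from Lee's script.

```powershell
# Hypothetical sample line: the first "%" is the ForEach-Object alias, the second is modulo.
$line = '1..10 | % { $_ % 3 }'

# Tokenize reports parse errors through the [ref] argument.
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize($line, [ref]$errors)

# Keep only the "%" tokens and show how the parser classified each one.
$percent = @($tokens | Where-Object { $_.Content -eq '%' })
$percent | Select-Object Content, Type, StartColumn
```

The first "%" should come back as a Command token and the second as an Operator token, which is exactly the information the alias resolver needs.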

 

To get a full list of all token types (Get-EnumValues is a helper function, not a built-in cmdlet):

PS > Get-EnumValues System.Management.Automation.PSTokenType

              Name Value
              ---- -----
           Unknown     0
           Command     1
  CommandParameter     2
   CommandArgument     3
            Number     4
            String     5
          Variable     6
            Member     7
         LoopLabel     8
         Attribute     9
              Type    10
          Operator    11
        GroupStart    12
          GroupEnd    13
           Keyword    14
           Comment    15
StatementSeparator    16
           NewLine    17
  LineContinuation    18
          Position    19
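If you don't have a Get-EnumValues helper at hand, roughly the same list can be produced with the .NET Enum class. This is a sketch of one way to do it, not necessarily how the helper above is implemented.

```powershell
# Enumerate every member of PSTokenType along with its numeric value.
$values = [Enum]::GetValues([System.Management.Automation.PSTokenType]) |
    Select-Object @{ Name = 'Name';  Expression = { "$_" } },
                  @{ Name = 'Value'; Expression = { [int]$_ } }

$values | Format-Table -AutoSize
```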

 

So, here we go... a sample script that contains aliases as well as the modulo sign:  

### demo.ps1 ###
dir | ? { ($_.length % 2) -eq 0 } | % { $_.name }

And here's a modified Convert-AliasToCmdlet script:
### Convert-AliasToCmdlet ###

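param($file)

# Returns the text to emit for a token, expanding aliases to their cmdlet names.
function Get-TokenType($token){
    switch($token.type){
        # Token content omits the "$" of a variable and the brackets of a type, so put them back.
        "Variable" {'${0}' -f $token.content}
        "Type" {"[{0}]" -f $token.content}
        "Command" { 
                # Compare names exactly; Get-Alias -Name would treat "?" as a wildcard.
                $alias = (get-alias | where {$_.name -eq $token.content}).ResolvedCommandName
                if($alias) {$alias} else {$token.content}
        }
        default {$token.content}    
    }
}

$column=1
# Resolve the path first: .NET resolves relative paths against the process
# working directory, which can differ from the current PowerShell location.
$content = [IO.File]::ReadAllText((Resolve-Path $file).ProviderPath)
$tokens = [System.Management.Automation.PsParser]::Tokenize($content, [ref]$null)
$tokens | foreach {
    # Re-create the gap between the previous token and this one with spaces.
    $padding=(" " * ($_.StartColumn - $column))
    $column=$_.EndColumn
    write-host ($padding + (Get-TokenType $_)) -NoNewline
}

write-host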

 

The result shows that all aliases were resolved as expected, and that each "%" was handled according to its own usage context: the ForEach-Object alias was expanded, while the modulo operator was left untouched.

PS > Convert-AliasToCmdlet demo.ps1

Get-ChildItem | Where-Object {
 ($_.length % 2) -eq 0
} | ForEach-Object {
 $_.name
}

 

One thing I wasn't able to find is how to preserve TAB characters.
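One possible way around this, sketched below, is to stop rebuilding the line from columns and instead patch the original text in place, using each token's Start and Length offsets. Everything that is not an alias, including TAB characters, is then left exactly as it was. The sample string and the variable names are assumptions for illustration; I have not run this against larger scripts.

```powershell
# Hypothetical sample: the demo line with TABs after each pipe.
$tab = [char]9
$script = 'dir |' + $tab + '? { ($_.length % 2) -eq 0 } |' + $tab + '% { $_.name }'

$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize($script, [ref]$errors)

# Build an exact-match lookup table once, so "?" is never treated as a wildcard.
$aliases = @{}
Get-Alias | ForEach-Object {
    if ($_.ResolvedCommandName) { $aliases[$_.Name] = $_.ResolvedCommandName }
}

# Walk the tokens from last to first so earlier offsets stay valid after each replacement.
$result = $script
for ($i = $tokens.Count - 1; $i -ge 0; $i--) {
    $t = $tokens[$i]
    if ($t.Type -eq 'Command' -and $aliases[$t.Content]) {
        $result = $result.Remove($t.Start, $t.Length).Insert($t.Start, $aliases[$t.Content])
    }
}

$result
```

Because only Command tokens are replaced, the modulo "%" inside the parentheses is not touched, and string quotes and whitespace survive unchanged.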
