Tokenizing PowerShell scripts
In Windows PowerShell Virtual User Group Meeting #3, Lee Holmes presented a script that performs syntax highlighting of PowerShell code, written in PowerShell v2 (CTP).
Reading through Lee's script, I remembered an older post in which I tried to take a script and resolve any aliases it contained to their full cmdlet names. Back then, Richard Siddaway pointed out one major drawback: if the script uses "%" as an alias for the ForEach-Object cmdlet, things may go wrong, since "%" is also the modulo operator.
Well, not in the case of Tokenize. The Tokenize method takes the script content and breaks it into tokens, each tagged with a type (PSTokenType). In the case of "%", it knows when the character is used as an alias and when it is used as the modulo operator. Amazing stuff!
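A quick way to see this for yourself is to tokenize a one-liner that uses "%" in both roles and look only at the "%" tokens. This is a minimal sketch; the variable names ($script, $errors, $percent) are just illustrative.

```powershell
# A one-liner that uses "%" both as the modulo operator and as an alias
$script = 'dir | ? { ($_.length % 2) -eq 0 } | % { $_.name }'

# Tokenize reports parse errors through the [ref] argument
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize($script, [ref]$errors)

# Keep only the "%" tokens and show how each one was classified
$percent = @($tokens | Where-Object { $_.Content -eq '%' })
$percent | ForEach-Object { '{0,-10} column {1}' -f $_.Type, $_.StartColumn }
```

The first "%" should be reported as an Operator token and the second as a Command token, which is exactly the distinction an alias resolver needs.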
To get a full list of all token types:
PS > Get-EnumValues System.Management.Automation.PSTokenType

Name                 Value
----                 -----
Unknown                  0
Command                  1
CommandParameter         2
CommandArgument          3
Number                   4
String                   5
Variable                 6
Member                   7
LoopLabel                8
Attribute                9
Type                    10
Operator                11
GroupStart              12
GroupEnd                13
Keyword                 14
Comment                 15
StatementSeparator      16
NewLine                 17
LineContinuation        18
Position                19
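Get-EnumValues appears to be a helper function rather than a built-in cmdlet, so it may not exist in your session. Assuming that is the case, a similar listing can be sketched with the .NET Enum class alone ($type and $values are illustrative names):

```powershell
# The enum that describes every token type the tokenizer can return
$type = [System.Management.Automation.PSTokenType]

# Pair each enum member's name with its underlying integer value
$values = [Enum]::GetValues($type) | ForEach-Object {
    New-Object PSObject -Property @{ Name = $_.ToString(); Value = [int]$_ }
} | Select-Object Name, Value

$values | Format-Table -AutoSize
```

If only the names are needed, [Enum]::GetNames($type) returns them as plain strings.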
So, here we go... a sample script that contains aliases as well as the modulo sign:
### demo.ps1 ###
dir | ? { ($_.length % 2) -eq 0 } | % { $_.name }
And here's a modified Convert-AliasToCmdlet script:
### Convert-AliasToCmdlet ###
param($file)

# Return the text to emit for a token, resolving aliases for Command tokens
function Get-TokenType($token){
    switch($token.type){
        # Token content omits the "$" and "[ ]" decorations, so put them back
        "Variable" {'${0}' -f $token.content}
        "Type" {"[{0}]" -f $token.content}
        "Command" {
            $alias = (get-alias | where {$_.name -eq $token.content}).ResolvedCommandName
            if($alias) {$alias} else {$token.content}
        }
        default {$token.content}
    }
}

$column=1
$content = [IO.File]::ReadAllText($file)
$tokens = [System.Management.Automation.PsParser]::Tokenize($content, [ref]$null)

$tokens | foreach {
    # Rebuild the gap between tokens from their column positions
    $padding=(" " * ($_.StartColumn - $column))
    $column=$_.EndColumn
    write-host ($padding + (Get-TokenType $_)) -NoNewline
}
write-host
The result shows that every alias was resolved as expected, while the "%" used as the modulo operator was left untouched. Each "%" was handled according to its own usage context.
PS > Convert-AliasToCmdlet demo.ps1
Get-ChildItem | Where-Object { ($_.length % 2) -eq 0 } | ForEach-Object { $_.name }
One thing I wasn't able to find is how to preserve TAB characters.
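One possible way around this, offered as a sketch rather than a verified fix: each token also exposes Start and Length offsets into the original text, so instead of rebuilding the gaps from column numbers, the text between tokens can be copied verbatim. The function name Convert-AliasPreservingWhitespace is hypothetical, and the sketch assumes the offsets index directly into the string passed to Tokenize.

```powershell
# Hypothetical helper: resolves aliases but copies everything else verbatim
function Convert-AliasPreservingWhitespace {
    param([string]$Content)

    $errors = $null
    $tokens = [System.Management.Automation.PSParser]::Tokenize($Content, [ref]$errors)

    $sb  = New-Object System.Text.StringBuilder
    $pos = 0   # offset of the first character not yet copied

    foreach ($t in $tokens) {
        # Copy the gap before this token (spaces, TABs) exactly as written
        [void]$sb.Append($Content.Substring($pos, $t.Start - $pos))

        # Default to the token's original text, including quotes and sigils
        $text = $Content.Substring($t.Start, $t.Length)

        if ("$($t.Type)" -eq 'Command') {
            # Exact-match lookup; Get-Alias -Name would treat "?" as a wildcard
            $alias = Get-Alias | Where-Object { $_.Name -eq $t.Content } |
                Select-Object -First 1
            if ($alias) { $text = $alias.ResolvedCommandName }
        }

        [void]$sb.Append($text)
        $pos = $t.Start + $t.Length
    }

    # Copy whatever follows the last token
    [void]$sb.Append($Content.Substring($pos))
    $sb.ToString()
}
```

For example, Convert-AliasPreservingWhitespace ([IO.File]::ReadAllText($file)) would return the rewritten script as a single string. Because untouched tokens are taken from the original text, string quotes and variable sigils come along for free, so the Variable and Type special cases are no longer needed.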