en-US/SplitPipeline.dll-Help.xml

<?xml version="1.0" encoding="utf-8"?>
<helpItems xmlns="http://msh" schema="maml">
<command:command xmlns:maml="http://schemas.microsoft.com/maml/2004/10" xmlns:command="http://schemas.microsoft.com/maml/dev/command/2004/10" xmlns:dev="http://schemas.microsoft.com/maml/dev/2004/10">
<command:details>
<command:name>Split-Pipeline</command:name>
<maml:description>
<maml:para>Splits pipeline input and processes its parts by parallel pipelines.</maml:para>
</maml:description>
<command:verb>Split</command:verb>
<command:noun>Pipeline</command:noun>
</command:details>
<maml:description>
<maml:para>The cmdlet splits the input, processes its parts by parallel pipelines, and
outputs the results for further processing. It can work without collecting
the whole input, which may be large or infinite.
 
When Load is omitted, the whole input is collected and split evenly between
Count parallel pipelines. This method shows the best performance in simple
cases. In other cases, e.g. on large or slow input, Load should be used in
order to enable processing of partially collected input.
 
The cmdlet creates several pipelines. A new pipeline is created when input
parts are available, all existing pipelines are busy, and their number is
less than Count. Each pipeline processes one or more input parts.
 
Because each pipeline works in its own runspace, variables, functions, and
modules from the main script are not automatically available to pipeline
scripts. Such items should be specified by the Variable, Function, and
Module parameters in order to be available.
 
The Begin and End scripts are invoked once for each created pipeline, before
and after processing respectively. Each input part is piped to the script
block Script. The Finally script is invoked last, even on failure or stopping.
 
If the number of created pipelines equals Count and all pipelines are busy,
then incoming input items are enqueued for later processing. If the queue
size hits the limit, then the algorithm waits for any pipeline to complete.
 
Input parts are not necessarily processed in the order in which they arrive.
To order output parts according to input, use the switch Order.
 
In rare scenarios when synchronous code must be invoked in pipelines,
use the helper $Pipeline.Lock. See the repository tests for examples.
 
ERROR PREFERENCE
 
If the current error preference is Stop and the internal pipelines emit
errors (even non-terminating ones), then Split-Pipeline treats these errors
as terminating, per its current environment. To avoid this, consider using
-ErrorAction Continue.</maml:para>
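<maml:para>As a hedged illustration of the ERROR PREFERENCE note, the sketch below
assumes a session where the error preference is Stop. The error message
text is invented for this sketch. With -ErrorAction Continue the error
written for item 2 is expected to be reported without stopping the cmdlet:
 
    $ErrorActionPreference = &apos;Stop&apos;
    1..3 | Split-Pipeline -ErrorAction Continue {process{
        if ($_ -eq 2) { Write-Error &quot;Item $_ failed&quot; } else { $_ }
    }}</maml:para>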
</maml:description>
<command:syntax>
<command:syntaxItem>
<maml:name>Split-Pipeline</maml:name>
<command:parameter required="true" position="0" >
<maml:name>Script</maml:name>
<command:parameterValue required="true">ScriptBlock</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="1" >
<maml:name>InputObject</maml:name>
<command:parameterValue required="true">PSObject</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>ApartmentState</maml:name>
<command:parameterValue required="true">ApartmentState</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Begin</maml:name>
<command:parameterValue required="true">ScriptBlock</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Count</maml:name>
<command:parameterValue required="true">Int32[]</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>End</maml:name>
<command:parameterValue required="true">ScriptBlock</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Filter</maml:name>
<command:parameterValue required="true">PSObject</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Finally</maml:name>
<command:parameterValue required="true">ScriptBlock</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Function</maml:name>
<command:parameterValue required="true">String[]</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Load</maml:name>
<command:parameterValue required="true">Int32[]</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Module</maml:name>
<command:parameterValue required="true">String[]</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Variable</maml:name>
<command:parameterValue required="true">String[]</command:parameterValue>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Order</maml:name>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Refill</maml:name>
</command:parameter>
</command:syntaxItem>
</command:syntax>
<command:parameters>
<command:parameter required="true" position="0" >
<maml:name>Script</maml:name>
<maml:description>
<maml:para>The script invoked for each input part of each pipeline with an input
part piped to it. The script either processes the whole part ($input)
or each item ($_) separately in the &quot;process&quot; block. Examples:
 
    # Process the whole $input part:
    ... | Split-Pipeline { $input | %{ $_ } }
 
    # Process input items $_ separately:
    ... | Split-Pipeline { process { $_ } }
 
The script may have any of &quot;begin&quot;, &quot;process&quot;, and &quot;end&quot; blocks:
 
    ... | Split-Pipeline { begin {...} process { $_ } end {...} }
 
Note that the &quot;begin&quot; and &quot;end&quot; blocks are called for each input part, but
scripts defined by parameters Begin and End are called once per pipeline.</maml:para>
</maml:description>
</command:parameter>
<command:parameter required="false" pipelineInput="true (ByValue)" position="1" >
<maml:name>InputObject</maml:name>
<maml:description>
<maml:para>Input objects processed by parallel pipelines. Normally this parameter
is not used directly because objects are sent using the pipeline, but it
is fine to specify the input using this parameter.</maml:para>
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>ApartmentState</maml:name>
<maml:description>
<maml:para>Specify either &quot;MTA&quot; (multi-threaded) or &quot;STA&quot; (single-threaded) for
the apartment states of the threads used to run commands in pipelines.</maml:para>
<maml:para>Values : STA, MTA, Unknown</maml:para>
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Begin</maml:name>
<maml:description>
<maml:para>The script invoked for each pipeline on creation before processing. The
goal is to initialize the runspace to be used by the pipeline, normally
to set some variables, dot-source scripts, import modules, etc.</maml:para>
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Count</maml:name>
<maml:description>
<maml:para>Specifies the parallel pipeline count. The default value is the number
of processors. For intensive jobs use the default or a decreased value,
especially if there are other tasks working at the same time. But for
jobs not consuming much processor resources increasing the number may
improve performance.
 
The parameter accepts an array of one or two integers. A single value
specifies the recommended number of pipelines. Two arguments specify
the minimum and maximum numbers and the recommended value is set to
Max(Count[0], Min(Count[1], ProcessorCount)).</maml:para>
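<maml:para>As a worked illustration of the formula above, assume -Count 2, 8 (values
chosen only for this sketch). On a machine with 4 processors the
recommended value is Max(2, Min(8, 4)) = 4, with 16 processors it is
Max(2, Min(8, 16)) = 8, and with 1 processor it is Max(2, Min(8, 1)) = 2.
The same arithmetic for the current machine:
 
    # Recommended pipeline count for -Count 2, 8 per the formula above
    $n = [Environment]::ProcessorCount
    [Math]::Max(2, [Math]::Min(8, $n))</maml:para>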
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>End</maml:name>
<maml:description>
<maml:para>The script invoked for each pipeline once after processing. The goal
is, for example, to output some results accumulated during processing
of input parts by the pipeline. Consider using Finally for releasing
resources instead of End or in addition to it.</maml:para>
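<maml:para>A minimal sketch of how Begin, Script, and End may fit together: each
pipeline accumulates a partial sum and outputs it from End. The variable
name $global:sum is an assumption of this sketch; the global scope of the
pipeline runspace is used so that the sketch does not depend on how script
scopes are shared.
 
    1..100 | Split-Pipeline -Begin { $global:sum = 0 } -End { $global:sum } {
        process { $global:sum += $_ }
    } | Measure-Object -Sum
 
Each pipeline emits one partial sum, so the number of values reaching
Measure-Object is expected to equal the number of created pipelines.</maml:para>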
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Filter</maml:name>
<maml:description>
<maml:para>Either a hashtable for collecting unique input objects or a script used
in order to test an input object. Input includes extra objects added in
Refill mode. In fact, this filter is mostly needed for Refill.
 
A hashtable is used in order to collect and enqueue unique objects. In
Refill mode it may be useful for avoiding infinite loops.
 
A script is invoked in a child scope of the scope where the cmdlet is
invoked. The first argument is the object being tested. If the script
returns $true, the object is added to the input queue.</maml:para>
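<maml:para>For illustration, a sketch of both filter kinds in Refill mode, assuming
the directory walk from the help examples. The variable $seen and the
excluded name en-US are assumptions made for this sketch.
 
    # Hashtable filter: each directory path is enqueued at most once
    $seen = @{}
    $PSHOME | Split-Pipeline -Refill -Filter $seen {process{
        foreach($item in Get-ChildItem -LiteralPath $_ -Force) {
            if ($item.PSIsContainer) { [ref]$item.FullName } else { $item.Length }
        }
    }} | Measure-Object -Sum
 
    # Script filter: $args[0] is the tested object, $true enqueues it
    $PSHOME | Split-Pipeline -Refill -Filter { $args[0] -notlike &apos;*en-US&apos; } {process{
        foreach($item in Get-ChildItem -LiteralPath $_ -Force) {
            if ($item.PSIsContainer) { [ref]$item.FullName } else { $item.Length }
        }
    }} | Measure-Object -Sum</maml:para>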
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Finally</maml:name>
<maml:description>
<maml:para>The script invoked for each opened pipeline before its closing, even on
terminating errors or stopping (Ctrl-C). It is normally needed in order
to release resources created by Begin. Output is ignored. If Finally
fails then its errors are written as warnings because it has to be
called for remaining pipelines.</maml:para>
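<maml:para>A hedged sketch of pairing Begin with Finally: each pipeline creates a
disposable resource in Begin and releases it in Finally. The variable name
$global:buffer and the use of MemoryStream are illustrative assumptions.
 
    1..10 | Split-Pipeline -Count 2 -Begin {
        $global:buffer = New-Object System.IO.MemoryStream
    } -Finally {
        $global:buffer.Dispose()
    } { process { $global:buffer.WriteByte($_); $_ } }</maml:para>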
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Function</maml:name>
<maml:description>
<maml:para>Functions imported from the current runspace to parallel runspaces.</maml:para>
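<maml:para>For illustration, a function defined in the calling session can be passed
by name. Get-Square is a hypothetical function invented for this sketch.
 
    function Get-Square($x) { $x * $x }
    1..5 | Split-Pipeline -Function Get-Square {process{ Get-Square $_ }}</maml:para>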
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Load</maml:name>
<maml:description>
<maml:para>Enables processing of partially collected input and specifies input
part limits. If it is omitted then the whole input is collected and
split evenly between pipelines.
 
The parameter accepts an array of one or two integers. The first is the
minimum number of objects per pipeline. If it is less than 1 then Load
is treated as omitted. The second number is the optional maximum.
 
If processing is fast then it is important to specify a proper minimum.
Otherwise Split-Pipeline may work even slower than a standard pipeline.
 
Setting the maximum causes more frequent output. For example, this may
be important for feeding simultaneously working downstream pipelines.
 
Setting the maximum number is also needed for potentially large input
in order to limit the input queue size and avoid out of memory issues.
The maximum queue size is set internally to Load[1] * Count.
 
Use the switch Verbose in order to get some statistics which may help
to choose suitable load limits.
 
CAUTION: The queue limit may be ignored and exceeded if Refill is used.
Any number of objects written via [ref] goes straight to the input queue.
Thus, depending on the data, Refill scenarios may run out of memory.</maml:para>
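<maml:para>A sketch of using Load with both limits, assuming a hypothetical large
file big.log and an illustrative match on the word ERROR. With -Count 4
and -Load 100, 1000, each part holds from 100 to 1000 objects, and the
maximum queue size per the formula above is 1000 * 4 = 4000 objects.
Verbose is added in order to get statistics for tuning the limits.
 
    Get-Content big.log | Split-Pipeline -Count 4 -Load 100, 1000 -Verbose {process{
        if ($_ -match &apos;ERROR&apos;) { $_ }
    }}</maml:para>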
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Module</maml:name>
<maml:description>
<maml:para>Modules imported to parallel runspaces.</maml:para>
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Order</maml:name>
<maml:description>
<maml:para>Outputs part results in the same order as input parts arrive.
With this switch the algorithm may work slower.</maml:para>
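<maml:para>For illustration, a sketch where items take random time to process, so
parts would likely finish out of order without the switch. The delay and
the Count and Load values are assumptions chosen for this sketch. With
Order the output is expected to follow the input order:
 
    1..10 | Split-Pipeline -Count 5 -Load 1 -Order {process{
        Start-Sleep -Milliseconds (Get-Random -Maximum 200)
        $_
    }}</maml:para>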
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Refill</maml:name>
<maml:description>
<maml:para>Refills the input with [ref] objects from output. Other objects
go to output as usual. This convention is used for processing items of
hierarchical data structures: child container items come back to input,
leaf items or other data produced by processing go to output.
 
NOTE: Refilled input makes infinite loops possible for some data. Use
Filter in order to exclude already processed objects and avoid loops.</maml:para>
</maml:description>
</command:parameter>
<command:parameter required="false" position="named" >
<maml:name>Variable</maml:name>
<maml:description>
<maml:para>Variables imported from the current runspace to parallel runspaces.</maml:para>
</maml:description>
</command:parameter>
</command:parameters>
<command:inputTypes>
<command:inputType>
<dev:type>
<maml:name>Object</maml:name>
</dev:type>
<maml:description>
<maml:para>Input objects processed by parallel pipelines.</maml:para>
</maml:description>
</command:inputType>
</command:inputTypes>
<command:returnValues>
<command:returnValue>
<dev:type>
<maml:name>Object</maml:name>
</dev:type>
<maml:description>
<maml:para>Output of the Begin, Script, and End script blocks. The scripts Begin
and End are invoked once for each pipeline before and after processing.
The script Script is invoked repeatedly with input parts piped to it.</maml:para>
</maml:description>
</command:returnValue>
</command:returnValues>
<command:examples>
<command:example>
<maml:title>-------------------------- EXAMPLE 1 --------------------------</maml:title>
<dev:code>1..10 | . {process{ $_; sleep 1 }}
1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }}</dev:code>
<dev:remarks>
<maml:para>The two commands perform the same job, simulating long but not
processor-intensive operations on each item. The first command takes about
10 seconds. The second takes about 2 seconds due to Split-Pipeline.</maml:para>
<maml:para></maml:para>
</dev:remarks>
</command:example>
<command:example>
<maml:title>-------------------------- EXAMPLE 2 --------------------------</maml:title>
<dev:code>$PSHOME | Split-Pipeline -Refill {process{
    foreach($item in Get-ChildItem -LiteralPath $_ -Force) {
        if ($item.PSIsContainer) {
            [ref]$item.FullName
        }
        else {
            $item.Length
        }
    }
}} | Measure-Object -Sum</dev:code>
<dev:remarks>
<maml:para>This is an example of Split-Pipeline with refilled input. By convention,
output [ref] objects refill the input and other objects go to output as usual.
 
The code calculates the number and size of files in $PSHOME. It is a &quot;how
to&quot; sample; no performance gain is expected because the code is trivial
and works relatively fast.
 
See also another example with simulated slow data requests:
https://github.com/nightroman/SplitPipeline/blob/master/Tests/Test-Refill.ps1</maml:para>
<maml:para></maml:para>
</dev:remarks>
</command:example>
<command:example>
<maml:title>-------------------------- EXAMPLE 3 --------------------------</maml:title>
<dev:remarks>
<maml:para>Because each pipeline works in its own runspace, variables, functions, and
modules from the main script are not automatically available to pipeline
scripts. Such items should be specified by the Variable, Function, and
Module parameters in order to be available.
 
&gt; $arr = @(&apos;one&apos;, &apos;two&apos;, &apos;three&apos;); 0..2 | . {process{ $arr[$_] }}
one
two
three
 
&gt; $arr = @(&apos;one&apos;, &apos;two&apos;, &apos;three&apos;); 0..2 | Split-Pipeline {process{ $arr[$_] }}
Split-Pipeline : Cannot index into a null array.
...
 
&gt; $arr = @(&apos;one&apos;, &apos;two&apos;, &apos;three&apos;); 0..2 | Split-Pipeline -Variable arr {process{ $arr[$_] }}
one
two
three</maml:para>
</dev:remarks>
</command:example>
</command:examples>
<maml:relatedLinks>
<maml:navigationLink>
<maml:linkText>Project site:</maml:linkText>
<maml:uri>https://github.com/nightroman/SplitPipeline</maml:uri>
</maml:navigationLink>
</maml:relatedLinks>
</command:command>
</helpItems>