<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Posts on Fr1nge&#39;s Personal Blog</title>
<link>http://fr1nge.xyz/posts/</link>
<description>Recent content in Posts on Fr1nge&#39;s Personal Blog</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<copyright>Yigit Colakoglu</copyright>
<lastBuildDate>Wed, 05 May 2021 17:08:12 +0300</lastBuildDate><atom:link href="http://fr1nge.xyz/posts/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Supercharge Your Bash Scripts with Multiprocessing</title>
<link>http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/</link>
<pubDate>Wed, 05 May 2021 17:08:12 +0300</pubDate>
<guid>http://fr1nge.xyz/posts/supercharge-your-bash-scripts-with-multiprocessing/</guid>
<description>Bash is a great tool for automating tasks and improving your workflow. However, it is SLOW. Adding multiprocessing to the scripts you write can greatly improve their performance.
What is multiprocessing? In the simplest terms, multiprocessing is the principle of splitting the computations or jobs that a script has to do and running them in separate processes. In even simpler terms, multiprocessing is the computer science equivalent of hiring more than one worker when you are constructing a building.</description>
<content>&lt;p&gt;Bash is a great tool for automating tasks and improving your workflow. However,
it is &lt;em&gt;&lt;strong&gt;SLOW&lt;/strong&gt;&lt;/em&gt;. Adding multiprocessing to the scripts you write can greatly
improve their performance.&lt;/p&gt;
&lt;h2 id=&#34;what-is-multiprocessing&#34;&gt;What is multiprocessing?&lt;/h2&gt;
&lt;p&gt;In the simplest terms, multiprocessing is the principle of splitting the
computations or jobs that a script has to do and running them in separate
processes. In even simpler terms, multiprocessing is the computer
science equivalent of hiring more than one
worker when you are constructing a building.&lt;/p&gt;
&lt;h3 id=&#34;introducing-&#34;&gt;Introducing &amp;ldquo;&amp;amp;&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;When implementing multiprocessing, the &lt;code&gt;&amp;amp;&lt;/code&gt; operator is going to be our best
friend. It is essential if you are writing bash scripts and a very
useful tool in general when you are in the terminal. Adding &lt;code&gt;&amp;amp;&lt;/code&gt; to the end of a
command runs that command in the background, so the rest of the script keeps running
while the command executes. One thing to keep in mind is that the background command runs in a
forked copy (a subshell) of the current shell, so it works with the values the variables had when it
started: if you change a variable in the main script while the background command is running,
the background command will not see the change. Here is a simple
example:&lt;/p&gt;
&lt;div class=&#34;collapsable-code&#34;&gt;
&lt;input id=&#34;1&#34; type=&#34;checkbox&#34; /&gt;
&lt;label for=&#34;1&#34;&gt;
&lt;span class=&#34;collapsable-code__language&#34;&gt;bash&lt;/span&gt;
&lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
&lt;/label&gt;
&lt;pre class=&#34;language-bash&#34; &gt;&lt;code&gt;
foo=&amp;#34;yeet&amp;#34;
function run_in_background(){
    sleep 0.5
    echo &amp;#34;The value of foo in the function run_in_background is $foo&amp;#34;
}
run_in_background &amp;amp; # Spawn the function run_in_background in the background
foo=&amp;#34;YEET&amp;#34;
echo &amp;#34;The value of foo changed to $foo.&amp;#34;
wait # wait for the background process to finish
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This should output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;The value of foo changed to YEET.
The value of foo in the function run_in_background is yeet
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As you can see, the value of &lt;code&gt;foo&lt;/code&gt; did not change in the background process even though
we changed it in the main script.&lt;/p&gt;
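&lt;p&gt;The isolation also works in the other direction. Below is a small sketch (the names &lt;code&gt;counter&lt;/code&gt; and &lt;code&gt;increment&lt;/code&gt; are made up for this illustration) that changes a variable inside a backgrounded function and then reads it in the main script once the job has finished. It also uses &lt;code&gt;$!&lt;/code&gt;, which bash sets to the PID of the most recently started background job, so that we can &lt;code&gt;wait&lt;/code&gt; for that one job specifically.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical example: an assignment made inside a backgrounded function
# happens in a forked subshell, so the main script never sees it.
counter=0

increment() {
    counter=$((counter + 1))
    echo "inside the background job: counter=$counter"
}

increment &      # runs in a forked copy of the shell
pid=$!           # PID of the most recently started background job
wait "$pid"      # block until that particular job has finished
echo "in the main script after wait: counter=$counter"
```

&lt;p&gt;This should print &lt;code&gt;counter=1&lt;/code&gt; from the background job and &lt;code&gt;counter=0&lt;/code&gt; from the main script. If you need results back from a background job, they have to travel through something outside the shell&amp;rsquo;s memory, such as a file, a pipe, or the exit status that &lt;code&gt;wait&lt;/code&gt; returns.&lt;/p&gt;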
&lt;h2 id=&#34;baby-steps&#34;&gt;Baby steps&amp;hellip;&lt;/h2&gt;
&lt;p&gt;As with most things in computer science, there is more than one way of
achieving our goal. We are going to take the easier, less intimidating but less
efficient route first before moving on to the big boy implementation. Let&amp;rsquo;s open up vim and get to scripting!
First of all, let&amp;rsquo;s write a very simple function that lets us easily test
our implementation:&lt;/p&gt;
&lt;div class=&#34;collapsable-code&#34;&gt;
&lt;input id=&#34;1&#34; type=&#34;checkbox&#34; /&gt;
&lt;label for=&#34;1&#34;&gt;
&lt;span class=&#34;collapsable-code__language&#34;&gt;bash&lt;/span&gt;
&lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
&lt;/label&gt;
&lt;pre class=&#34;language-bash&#34; &gt;&lt;code&gt;
function tester(){
    # A function that takes an int as a parameter and sleeps
    echo &amp;#34;$1&amp;#34;
    sleep &amp;#34;$1&amp;#34;
    echo &amp;#34;ENDED $1&amp;#34;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Now that we have something to run in our processes, we need to spawn several
of them in a controlled manner. Controlled is the keyword here, because
each system has a maximum number of processes a user can spawn (you can find
it with the command &lt;code&gt;ulimit -u&lt;/code&gt;). In our case, we want to limit the number of processes
running at the same time to the value of &lt;code&gt;num_processes&lt;/code&gt;. Here is the implementation:&lt;/p&gt;
&lt;div class=&#34;collapsable-code&#34;&gt;
&lt;input id=&#34;1&#34; type=&#34;checkbox&#34; /&gt;
&lt;label for=&#34;1&#34;&gt;
&lt;span class=&#34;collapsable-code__language&#34;&gt;bash&lt;/span&gt;
&lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
&lt;/label&gt;
&lt;pre class=&#34;language-bash&#34; &gt;&lt;code&gt;
# Maximum number of concurrent processes (defaults to 4, which also avoids a division by zero)
num_processes=${1:-4}
pcount=0
for i in {1..10}; do
    ((pcount=pcount%num_processes))
    # Every num_processes jobs, wait for the whole previous batch to finish
    ((pcount&amp;#43;&amp;#43;==0)) &amp;amp;&amp;amp; wait
    tester &amp;#34;$i&amp;#34; &amp;amp;
done
wait # wait for the last batch before the script exits
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This loop takes the maximum number of processes you would like to run at once
as an argument and runs &lt;code&gt;tester&lt;/code&gt; ten times, never with more than that many processes alive. Go ahead and test it out!
You might notice, however, that the processes run in batches, and the size of
each batch is &lt;code&gt;num_processes&lt;/code&gt;. This happens because
every time we spawn &lt;code&gt;num_processes&lt;/code&gt; processes, we &lt;code&gt;wait&lt;/code&gt; for all of them
to end before starting the next one. This is not a problem in itself; there are many cases
where this implementation works perfectly fine. However, if
you don&amp;rsquo;t want one slow job to hold up its whole batch, we have to dump this naive approach altogether
and improve our tool belt.&lt;/p&gt;
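&lt;p&gt;To put a number on what batching costs, here is a rough bash sketch that starts no real processes and only adds up durations. It assumes job &lt;em&gt;i&lt;/em&gt; takes exactly &lt;em&gt;i&lt;/em&gt; seconds (as &lt;code&gt;tester&lt;/code&gt; does), ignores all overhead, and for the pool case assumes every idle worker immediately takes the next job in line. The function names &lt;code&gt;batched_time&lt;/code&gt; and &lt;code&gt;pool_time&lt;/code&gt; are made up for this example.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Back-of-the-envelope estimate only: no processes are started.

# Batched loop: each batch of BATCH_SIZE jobs lasts as long as its slowest job.
batched_time() {            # usage: batched_time BATCH_SIZE DURATION...
    local size=$1; shift
    local total=0 batch_max=0 count=0 d
    for d in "$@"; do
        if (( d > batch_max )); then batch_max=$d; fi
        count=$(( count + 1 ))
        if (( count == size )); then       # batch is full: wait for its slowest job
            total=$(( total + batch_max ))
            batch_max=0
            count=0
        fi
    done
    echo $(( total + batch_max ))          # add the last, possibly partial, batch
}

# Worker pool: each job goes to whichever worker becomes idle first.
pool_time() {               # usage: pool_time WORKERS DURATION...
    local workers=$1; shift
    local -a free_at=()     # free_at[w]: time at which worker w becomes idle
    local w d best max=0
    (( workers > 0 )) || return 1
    for d in "$@"; do
        best=0
        for (( w = 1; w != workers; w++ )); do
            if (( free_at[best] > free_at[w] )); then best=$w; fi
        done
        free_at[best]=$(( free_at[best] + d ))  # that worker runs the job
    done
    for w in "${!free_at[@]}"; do
        if (( free_at[w] > max )); then max=${free_at[w]}; fi
    done
    echo "$max"
}

echo "batched loop: $(batched_time 3 {1..10}) s"
echo "job pool:     $(pool_time 3 {1..10}) s"
```

&lt;p&gt;With three processes and the ten jobs from our loop, the batches are {1,2,3}, {4,5,6}, {7,8,9} and {10}, so the batched loop needs 3 + 6 + 9 + 10 = 28 seconds, while three workers that never wait for each other finish after 22 seconds.&lt;/p&gt;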
&lt;h2 id=&#34;real-chads-use-job-pools&#34;&gt;Real Chads use Job Pools&lt;/h2&gt;
&lt;p&gt;The solution to the bottleneck introduced by our previous approach lies
in using a job pool: a queue that the main process sends jobs to, where they wait
to be executed. This approach solves our problem because instead of
spawning a new process for every job and waiting for a whole batch to
finish, we create a fixed number of processes (workers) which
continuously pick up jobs from the pool without waiting for any other process to finish.
Here is the implementation that uses a job pool. Brace yourselves, because it is
kind of complicated.&lt;/p&gt;
&lt;div class=&#34;collapsable-code&#34;&gt;
&lt;input id=&#34;1&#34; type=&#34;checkbox&#34; /&gt;
&lt;label for=&#34;1&#34;&gt;
&lt;span class=&#34;collapsable-code__language&#34;&gt;bash&lt;/span&gt;
&lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
&lt;/label&gt;
&lt;pre class=&#34;language-bash&#34; &gt;&lt;code&gt;
# Marker that tells a worker there are no more jobs to run
job_pool_end_of_jobs=&amp;#34;NO_JOB_LEFT&amp;#34;
# Per-script paths ($$ is the PID of the main script)
job_pool_job_queue=/tmp/job_pool_job_queue_$$
job_pool_progress=/tmp/job_pool_progress_$$
job_pool_pool_size=-1
# job_pool_progress and job_pool_nerrors are set up but not otherwise used in this version
job_pool_nerrors=0

# Remove the files the pool created
function job_pool_cleanup()
{
    rm -f &amp;#34;${job_pool_job_queue}&amp;#34;
    rm -f &amp;#34;${job_pool_progress}&amp;#34;
}

# Same steps as job_pool_shutdown
function job_pool_exit_handler()
{
    job_pool_stop_workers
    job_pool_cleanup
}

# Body of each worker: repeatedly take one job from the queue and run it
function job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local cmd=
    local args=
    # Open the fifo read-write on fd 7: this keeps a writer attached, so reads
    # wait for new jobs instead of hitting end-of-file
    exec 7&amp;lt;&amp;gt; &amp;#34;${job_queue}&amp;#34;
    while [[ &amp;#34;${cmd}&amp;#34; != &amp;#34;${job_pool_end_of_jobs}&amp;#34; &amp;amp;&amp;amp; -e &amp;#34;${job_queue}&amp;#34; ]]; do
        # Only one worker at a time may read a job from the queue
        flock --exclusive 7
        # Fields of a job are separated by vertical tabs (see job_pool_run)
        IFS=$&amp;#39;\v&amp;#39;
        read -r cmd args &amp;lt;&amp;#34;${job_queue}&amp;#34;
        set -- ${args}
        unset IFS
        flock --unlock 7
        if [[ &amp;#34;${cmd}&amp;#34; == &amp;#34;${job_pool_end_of_jobs}&amp;#34; ]]; then
            # Put the marker back so the other workers see it too
            echo &amp;#34;${cmd}&amp;#34; &amp;gt;&amp;amp;7
        else
            { ${cmd} &amp;#34;$@&amp;#34; ; }
        fi
    done
    exec 7&amp;gt;&amp;amp;-
}

# Tell all workers to stop and wait for them to exit
function job_pool_stop_workers()
{
    echo &amp;#34;${job_pool_end_of_jobs}&amp;#34; &amp;gt;&amp;gt; &amp;#34;${job_pool_job_queue}&amp;#34;
    wait
}

# Start job_pool_pool_size workers in the background
function job_pool_start_workers()
{
    local job_queue=$1
    local i
    for ((i=0; i&amp;lt;${job_pool_pool_size}; i&amp;#43;&amp;#43;)); do
        job_pool_worker ${i} &amp;#34;${job_queue}&amp;#34; &amp;amp;
    done
}

# Create the queue and start the workers (pool size defaults to 1)
function job_pool_init()
{
    local pool_size=$1
    job_pool_pool_size=${pool_size:=1}
    rm -f &amp;#34;${job_pool_job_queue}&amp;#34;
    rm -f &amp;#34;${job_pool_progress}&amp;#34;
    touch &amp;#34;${job_pool_progress}&amp;#34;
    mkfifo &amp;#34;${job_pool_job_queue}&amp;#34;
    echo 0 &amp;gt;&amp;#34;${job_pool_progress}&amp;#34;
    job_pool_start_workers &amp;#34;${job_pool_job_queue}&amp;#34;
}

# Stop the workers and remove the pool files
function job_pool_shutdown()
{
    job_pool_stop_workers
    job_pool_cleanup
}

# Queue one job: a command followed by its arguments
function job_pool_run()
{
    if [[ &amp;#34;${job_pool_pool_size}&amp;#34; == &amp;#34;-1&amp;#34; ]]; then
        job_pool_init
    fi
    printf &amp;#34;%s\v&amp;#34; &amp;#34;$@&amp;#34; &amp;gt;&amp;gt; &amp;#34;${job_pool_job_queue}&amp;#34;
    echo &amp;gt;&amp;gt; &amp;#34;${job_pool_job_queue}&amp;#34;
}

# Wait until every queued job has finished, then start a fresh set of workers
function job_pool_wait()
{
    job_pool_stop_workers
    job_pool_start_workers &amp;#34;${job_pool_job_queue}&amp;#34;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Ok&amp;hellip; But what the actual fuck is going on here???&lt;/p&gt;
&lt;h3 id=&#34;fifo-and-flock&#34;&gt;fifo and flock&lt;/h3&gt;
&lt;p&gt;In order to understand what this code is doing, you first need to understand two
key tools that we are using: fifos (named pipes, created with &lt;code&gt;mkfifo&lt;/code&gt;) and &lt;code&gt;flock&lt;/code&gt;. Despite their complicated
names, they are actually quite simple. Let&amp;rsquo;s check their man pages to figure out
their purposes, shall we?&lt;/p&gt;
&lt;h4 id=&#34;man-fifo&#34;&gt;man fifo&lt;/h4&gt;
&lt;p&gt;fifo&amp;rsquo;s man page tells us that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;NAME
       fifo - first-in first-out special file, named pipe
DESCRIPTION
       A FIFO special file (a named pipe) is similar to a pipe, except that
       it is accessed as part of the filesystem. It can be opened by multiple
       processes for reading or writing. When processes are exchanging data
       via the FIFO, the kernel passes all data internally without writing it
       to the filesystem. Thus, the FIFO special file has no contents on the
       filesystem; the filesystem entry merely serves as a reference point so
       that processes can access the pipe using a name in the filesystem.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So put in &lt;strong&gt;very&lt;/strong&gt; simple terms, a fifo is a named pipe that allows
communication between processes. Using a fifo lets us loop through the jobs
in the pool without having to delete them manually, because once we read a job
with &lt;code&gt;read cmd args &amp;lt; ${job_queue}&lt;/code&gt;, it is out of the pipe and the next
read returns the next job in the pool. However, having multiple
processes introduces one caveat: what if two processes read from the pipe at the
same time? Their reads could interleave, leaving each worker with a mangled piece of a job, and we don&amp;rsquo;t want that. So we resort
to using &lt;code&gt;flock&lt;/code&gt;.&lt;/p&gt;
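&lt;p&gt;To see a fifo in action outside the job pool, here is a minimal sketch: one background process writes three lines into a named pipe and another process reads them back. The path &lt;code&gt;/tmp/fifo_demo_$$&lt;/code&gt; is made up for this example, following the same naming idea as the job pool.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical demo path; $$ makes it unique per script run
fifo=/tmp/fifo_demo_$$
rm -f "$fifo"
mkfifo "$fifo"

# A FIFO shows up in the filesystem with file type "p" but stores no data there
if [ -p "$fifo" ]; then echo "$fifo is a named pipe"; fi

# Writer: backgrounded, because opening a FIFO for writing blocks
# until some other process opens it for reading
printf 'job %s\n' 1 2 3 > "$fifo" &

# Reader: cat opens the FIFO (which unblocks the writer) and reads
# until the writer closes its end
received=$(cat "$fifo")
wait
rm -f "$fifo"

echo "$received"
```

&lt;p&gt;Every line that &lt;code&gt;cat&lt;/code&gt; reads is consumed from the pipe; the job pool relies on the same property so that each job is read only once.&lt;/p&gt;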
&lt;h4 id=&#34;man-flock&#34;&gt;man flock&lt;/h4&gt;
&lt;p&gt;flock&amp;rsquo;s man page defines it as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SYNOPSIS
       flock [options] file|directory command [arguments]
       flock [options] file|directory -c command
       flock [options] number
DESCRIPTION
       This utility manages flock(2) locks from within shell scripts or from
       the command line.
       The first and second of the above forms wrap the lock around the
       execution of a command, in a manner similar to su(1) or newgrp(1).
       They lock a specified file or directory, which is created (assuming
       appropriate permissions) if it does not already exist. By default, if
       the lock cannot be immediately acquired, flock waits until the lock is
       available.
       The third form uses an open file by its file descriptor number. See
       the examples below for how that can be used.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Cool. Translated to the modern English that we regular folks use, &lt;code&gt;flock&lt;/code&gt; is a thin
wrapper around the &lt;code&gt;flock&lt;/code&gt; system call (see &lt;code&gt;man 2 flock&lt;/code&gt; if you are
interested). It is used to manage locks and has several forms. The one we are
interested in is the third one, which, according to the man page, uses an open file
by its &lt;strong&gt;file descriptor number&lt;/strong&gt;. Aha! So that was the purpose of the &lt;code&gt;exec 7&amp;lt;&amp;gt; ${job_queue}&lt;/code&gt; call in the &lt;code&gt;job_pool_worker&lt;/code&gt; function. It opens
the fifo &lt;code&gt;job_queue&lt;/code&gt; for reading and writing on file descriptor 7, which the worker then locks with
&lt;code&gt;flock --exclusive 7&lt;/code&gt;. Cool. This way only one process at a time can read from
the fifo &lt;code&gt;job_queue&lt;/code&gt;. Note that the lock is advisory: it only keeps out processes that also call &lt;code&gt;flock&lt;/code&gt; before reading, which every worker does.&lt;/p&gt;
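&lt;p&gt;Here is a small sketch of why that lock matters, outside of the job pool. Twenty background processes each read a number from a file, add one and write it back; wrapping each read-modify-write in an exclusive &lt;code&gt;flock&lt;/code&gt; on file descriptor 8 keeps the updates from overwriting each other. The file paths and the &lt;code&gt;increment&lt;/code&gt; function are made up for this example, and it assumes the util-linux &lt;code&gt;flock&lt;/code&gt; command found on most Linux distributions. Instead of &lt;code&gt;exec&lt;/code&gt;, it opens the file descriptor with a redirection on a &lt;code&gt;{ ...; }&lt;/code&gt; group, so the descriptor (and with it the lock) is closed when the group ends.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical demo paths
counter_file=/tmp/flock_demo_counter_$$
lock_file=/tmp/flock_demo_lock_$$
echo 0 > "$counter_file"

increment() {
    {
        flock --exclusive 8                 # block until no other process holds the lock
        local n
        n=$(cat "$counter_file")            # read ...
        echo $((n + 1)) > "$counter_file"   # ... modify and write back
    } 8> "$lock_file"                       # fd 8 is only open inside the braces
}

for i in {1..20}; do
    increment &                             # 20 concurrent updates
done
wait

final=$(cat "$counter_file")
echo "counter: $final"
rm -f "$counter_file" "$lock_file"
```

&lt;p&gt;With the lock in place the final value should be 20. Remove the &lt;code&gt;flock&lt;/code&gt; line and concurrent updates can overwrite each other, so you may see a smaller number.&lt;/p&gt;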
&lt;h2 id=&#34;great-but-how-do-i-use-this&#34;&gt;Great! But how do I use this?&lt;/h2&gt;
&lt;p&gt;It depends on your preference: you can either save this code in a file (e.g.
job_pool.sh) and source it in your bash script, or simply paste it
into an existing bash script. Whatever tickles your fancy. Keep in mind that the workers are forked when
&lt;code&gt;job_pool_init&lt;/code&gt; runs, so any function you want them to run must already be defined at that point. I have also
provided an example that replicates our first implementation. Just paste the
code below under our &amp;ldquo;chad&amp;rdquo; job pool script.&lt;/p&gt;
&lt;div class=&#34;collapsable-code&#34;&gt;
&lt;input id=&#34;1&#34; type=&#34;checkbox&#34; /&gt;
&lt;label for=&#34;1&#34;&gt;
&lt;span class=&#34;collapsable-code__language&#34;&gt;bash&lt;/span&gt;
&lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
&lt;/label&gt;
&lt;pre class=&#34;language-bash&#34; &gt;&lt;code&gt;
function tester(){
    # A function that takes an int as a parameter and sleeps
    echo &amp;#34;$1&amp;#34;
    sleep &amp;#34;$1&amp;#34;
    echo &amp;#34;ENDED $1&amp;#34;
}
num_workers=$1
job_pool_init &amp;#34;$num_workers&amp;#34; # start the workers (tester is already defined)
for i in {1..10}; do
    job_pool_run tester &amp;#34;$i&amp;#34; # queue one job per number
done
job_pool_wait # block until every queued job has finished
job_pool_shutdown # stop the workers and remove the fifo
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
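&lt;p&gt;If you are wondering why &lt;code&gt;job_pool_run&lt;/code&gt; uses &lt;code&gt;printf &amp;#34;%s\v&amp;#34;&lt;/code&gt;: it ends every argument with a vertical tab so that arguments containing spaces survive the trip through the fifo. The standalone sketch below mimics that framing and the &lt;code&gt;IFS=$&amp;#39;\v&amp;#39;&lt;/code&gt; splitting done by &lt;code&gt;job_pool_worker&lt;/code&gt;, without any fifo or workers. The sample job and the extra &lt;code&gt;set -f&lt;/code&gt; (which turns off globbing during the unquoted expansion) are additions for this illustration, not part of the pool.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# What job_pool_run writes for one job (minus the trailing newline):
# each argument followed by a vertical tab. The job itself is made up.
job_line=$(printf '%s\v' tester "hello world" 3)

# What job_pool_worker does with it: split on vertical tabs only,
# so the space inside "hello world" is kept.
IFS=$'\v'
set -f            # no globbing during the unquoted expansion below
set -- $job_line
set +f
unset IFS

cmd=$1
shift
echo "command: $cmd"
printf 'argument: [%s]\n' "$@"
```

&lt;p&gt;This should print the command &lt;code&gt;tester&lt;/code&gt; followed by two arguments, &lt;code&gt;[hello world]&lt;/code&gt; and &lt;code&gt;[3]&lt;/code&gt;, which is what lets a queued job receive arguments that contain spaces.&lt;/p&gt;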
&lt;p&gt;Hopefully this article was (or will be) helpful to you. From now on, you don&amp;rsquo;t
ever have to write single-threaded bash scripts like normies :)&lt;/p&gt;
</content>
</item>
</channel>
</rss>