<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>I don't know anything about Spark, so I'm not sure how it starts
up its workers. My first suggestion would be to use the .pth
method of starting coverage in subprocesses, rather than the
sitecustomize technique, and see if that works better.</p>
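For reference, the .pth approach looks roughly like this. This is a hedged sketch: a one-line .pth file in site-packages is executed by the `site` module at interpreter startup, and here it is written into a temporary directory purely for illustration (in practice it goes into your interpreter's site-packages directory).

```python
import os
import tempfile

# A .pth file whose line starts with "import" is executed by the site
# module at interpreter startup -- this is how coverage can hook every
# Python subprocess without editing sitecustomize.py.
PTH_LINE = "import coverage; coverage.process_startup()\n"

# Illustration only: write into a temp dir.  For real use, the file
# belongs in the interpreter's site-packages directory.
target_dir = tempfile.mkdtemp()
pth_path = os.path.join(target_dir, "coverage.pth")
with open(pth_path, "w") as f:
    f.write(PTH_LINE)
```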
<p>--Ned.<br>
</p>
<br>
<div class="moz-cite-prefix">On 4/26/16 9:01 AM, Kun Chen wrote:<br>
</div>
<blockquote
cite="mid:CAPTVxySrrmtV7kqYap0JJUnrctR5ifspXCwjtPFL1TCodjDdcQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I tried to run a simple PySpark application on Spark in
local mode, hoping to get a coverage data file
generated somewhere for future use.</div>
<div><br>
</div>
<div>0. I put the following lines at the head of
/usr/lib/python2.7/sitecustomize.py</div>
<div>
<div>import coverage</div>
<div>coverage.process_startup()</div>
</div>
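One note on these two lines (a rough stand-in for documented behavior, not coverage's actual source): process_startup() only begins measurement when COVERAGE_PROCESS_START is set, so leaving them in sitecustomize.py is harmless for every other Python process.

```python
# process_startup() is effectively a no-op unless COVERAGE_PROCESS_START
# names a config file, so every Python process can safely execute it.
# Rough stand-in for the check it performs (illustrative only):
def coverage_would_start(environ):
    return bool(environ.get("COVERAGE_PROCESS_START"))

print(coverage_would_start({"COVERAGE_PROCESS_START": "/tmp/rc"}))  # True
print(coverage_would_start({}))                                     # False
```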
<div><br>
</div>
<div>1. I set the following environment variable in ~/.bashrc:</div>
<div>export
COVERAGE_PROCESS_START=/home/kunchen/git/es-signal/.coveragerc</div>
<div><br>
</div>
<div>2. The config file
'/home/kunchen/git/es-signal/.coveragerc' has the following
content:</div>
<div>
<div>[run]</div>
<div>parallel = True</div>
<div>concurrency = multiprocessing</div>
<div>omit =</div>
<div> *dist-packages*</div>
<div> *pyspark*</div>
<div> *spark-1.5.2*</div>
<div>cover_pylib = False</div>
<div>data_file = /home/kunchen/.coverage</div>
<div><br>
</div>
</div>
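A side note on this config: the [run] section uses standard ini syntax, so it can be sanity-checked with the stdlib parser (a sketch mirroring the file above, with the omit list elided for brevity; coverage itself does the real parsing, and "concurrency = multiprocessing" specifically covers processes started via the stdlib multiprocessing module).

```python
from configparser import ConfigParser

# Sanity-check the .coveragerc from step 2 with the stdlib ini parser.
# coverage reads the same ini format; this mirrors the file above.
cfg_text = """\
[run]
parallel = True
concurrency = multiprocessing
cover_pylib = False
data_file = /home/kunchen/.coverage
"""

config = ConfigParser()
config.read_string(cfg_text)

parallel = config.getboolean("run", "parallel")
data_file = config.get("run", "data_file")
```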
<div>3. I put both ci3.py and test.py
in /home/kunchen/Downloads/software/spark-1.5.2 (my Spark
home).</div>
<div><br>
</div>
<div>4. In my Spark home, I ran the following command to
submit and run the code:</div>
<div>spark-submit --master local --py-files=ci3.py test.py<br>
</div>
<div><br>
</div>
<div>5. After the application finished, I got two coverage
data files in /home/kunchen:</div>
<div>.coverage.kunchen-es-pc.31117.003485<br>
</div>
<div>.coverage.kunchen-es-pc.31176.826660<br>
</div>
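Because "parallel = True" produces one suffixed data file per measured process, those files eventually need to be merged (normally by running "coverage combine"). A stdlib-only sketch of how the per-process siblings of data_file are enumerated, recreating the two names above in a temporary directory for illustration:

```python
import glob
import os
import tempfile

# With parallel=True, each process writes "<data_file>.<host>.<pid>.<random>".
# "coverage combine" merges all such siblings of data_file into one file.
# Here we only enumerate them, using the two file names from above.
workdir = tempfile.mkdtemp()
for name in (".coverage.kunchen-es-pc.31117.003485",
             ".coverage.kunchen-es-pc.31176.826660"):
    open(os.path.join(workdir, name), "w").close()

parallel_files = sorted(glob.glob(os.path.join(workdir, ".coverage.*")))
```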
<div><br>
</div>
<div>But according to the process IDs in the file names and
the contents of those files, neither was generated by
the Spark worker process (or thread? I'm not sure which).</div>
<div><br>
</div>
<div>My question is: what do I have to do to get coverage
data for the code executed by the Spark workers?</div>
</div>
</div>
<br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
testing-in-python mailing list
<a class="moz-txt-link-abbreviated" href="mailto:testing-in-python@lists.idyll.org">testing-in-python@lists.idyll.org</a>
<a class="moz-txt-link-freetext" href="http://lists.idyll.org/listinfo/testing-in-python">http://lists.idyll.org/listinfo/testing-in-python</a>
</pre>
</blockquote>
<br>
</body>
</html>