<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>I don't know anything about Spark, so I'm not sure how it
      starts up its workers.  My first suggestion would be to use the
      .pth method of starting coverage in subprocesses rather than the
      sitecustomize technique, and see if that works better.</p>
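    <p>For example, something like this (a minimal sketch: the file
      name coverage-startup.pth is arbitrary, and the directory
      depends on where your Python looks for site-packages, e.g.
      dist-packages on Debian/Ubuntu).  Python executes lines
      beginning with "import" in .pth files at interpreter startup, so
      every process using that site-packages directory starts coverage
      automatically:</p>
    <pre># /usr/lib/python2.7/site-packages/coverage-startup.pth  (name is illustrative)
# Lines starting with '#' are skipped; lines starting with 'import' are
# executed at interpreter startup, which is what makes this trick work.
import coverage; coverage.process_startup()</pre>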
    <p>--Ned.<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 4/26/16 9:01 AM, Kun Chen wrote:<br>
    </div>
    <blockquote
cite="mid:CAPTVxySrrmtV7kqYap0JJUnrctR5ifspXCwjtPFL1TCodjDdcQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr">Hi, all
            <div><br>
            </div>
            <div>I tried to run a simple PySpark application on Spark
              in local mode, and was hoping to get the coverage data
              file generated somewhere for future use.</div>
            <div><br>
            </div>
            <div>0. I put the following lines at the top of
              /usr/lib/python2.7/sitecustomize.py:</div>
            <pre>import coverage
coverage.process_startup()  # start collecting coverage in this new process</pre>
            <div><br>
            </div>
            <div>1. I set the following env variable in ~/.bashrc:</div>
            <pre>export COVERAGE_PROCESS_START=/home/kunchen/git/es-signal/.coveragerc</pre>
            <div><br>
            </div>
            <div>2. the config file
              '/home/kunchen/git/es-signal/.coveragerc' has the
              following content:</div>
            <pre>[run]
parallel = True
concurrency = multiprocessing
omit =
    *dist-packages*
    *pyspark*
    *spark-1.5.2*
cover_pylib = False
data_file = /home/kunchen/.coverage</pre>
            <div><br>
            </div>
            <div>3. I put both ci3.py and test.py in
              /home/kunchen/Downloads/software/spark-1.5.2 (my Spark
              home).</div>
            <div><br>
            </div>
            <div>4. In my Spark home, I ran the following command to
              submit and run the code:</div>
            <pre>spark-submit --master local --py-files=ci3.py test.py</pre>
            <div><br>
            </div>
            <div>5. After the application finished, I got two coverage
              data files in /home/kunchen:</div>
            <pre>.coverage.kunchen-es-pc.31117.003485
.coverage.kunchen-es-pc.31176.826660</pre>
            <div><br>
            </div>
            <div>But according to the process IDs in the file names
              and the contents of those files, neither of them was
              generated by a Spark worker process (or thread? I'm not
              sure which).</div>
            <div><br>
            </div>
            <div>My question is: what do I have to do to get coverage
              data for the code executed by the Spark workers?</div>
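            <div><br>
            </div>
            <div>(One detail that might matter: variables exported in
              ~/.bashrc are only inherited by processes descended from
              a shell that read it, so depending on how Spark launches
              its workers they may never see COVERAGE_PROCESS_START.
              A sketch using Spark's documented spark.executorEnv.*
              settings to forward the variable to worker processes
              explicitly; whether that reaches the Python workers in
              local mode is an assumption worth testing:)</div>
            <pre>spark-submit --master local \
  --conf spark.executorEnv.COVERAGE_PROCESS_START=/home/kunchen/git/es-signal/.coveragerc \
  --py-files=ci3.py test.py</pre>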
          </div>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
testing-in-python mailing list
<a class="moz-txt-link-abbreviated" href="mailto:testing-in-python@lists.idyll.org">testing-in-python@lists.idyll.org</a>
<a class="moz-txt-link-freetext" href="http://lists.idyll.org/listinfo/testing-in-python">http://lists.idyll.org/listinfo/testing-in-python</a>
</pre>
    </blockquote>
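    <p>One follow-up note: with parallel = True, each measured process
      writes its own .coverage.* data file, so once the worker files
      do show up they still have to be merged before reporting.  A
      minimal sketch with coverage.py's own command line, run from the
      directory holding the data files:</p>
    <pre>coverage combine    # merge the .coverage.* files into a single .coverage
coverage report -m  # summarize coverage, listing the missing line numbers</pre>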
    <br>
  </body>
</html>