[TIP] how to generate coverage info for pyspark applications

Kun Chen kunchen at everstring.com
Tue Apr 26 06:01:02 PDT 2016


Hi, all

I ran a simple PySpark application on Spark in local mode, and was hoping
to get a coverage data file generated for later use. Here is what I did:

0. I put the following lines at the top of
/usr/lib/python2.7/sitecustomize.py:
import coverage
coverage.process_startup()

1. I set the following environment variable in ~/.bashrc:
export COVERAGE_PROCESS_START=/home/kunchen/git/es-signal/.coveragerc

2. The config file '/home/kunchen/git/es-signal/.coveragerc' has the
following content:
[run]
parallel = True
concurrency = multiprocessing
omit =
    *dist-packages*
    *pyspark*
    *spark-1.5.2*
cover_pylib = False
data_file = /home/kunchen/.coverage

3. I put both ci3.py and test.py
in /home/kunchen/Downloads/software/spark-1.5.2 (my Spark home).

4. In my Spark home, I ran the following command to submit and run the code:
spark-submit --master local --py-files=ci3.py test.py


5. After the application finished, I got two coverage files in /home/kunchen:
.coverage.kunchen-es-pc.31117.003485
.coverage.kunchen-es-pc.31176.826660

But judging by the process IDs in the file names and the content of those
files, neither of them was generated by a Spark worker process (or thread?
I am not sure here).
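
Reading the process ID out of the data file names can be sketched as
follows. With parallel = True, coverage.py appends the machine name, the
process ID, and a random number to the data file name. The helper name
parse_parallel_name is hypothetical, and the sketch assumes the suffix layout
shown in the two file names above; other coverage.py versions may format the
random part differently, in which case it returns None.

```python
def parse_parallel_name(file_name, base=".coverage"):
    """Return (host, pid) for a name like '<base>.<host>.<pid>.<random>'.

    Hypothetical helper: assumes the numeric suffix layout seen in this
    message. Returns None when the name does not fit that layout.
    """
    prefix = base + "."
    if not file_name.startswith(prefix):
        return None
    # Split from the right so a host name containing dots stays in one piece.
    parts = file_name[len(prefix):].rsplit(".", 2)
    if len(parts) != 3:
        return None
    host, pid, random_part = parts
    if not (pid.isdigit() and random_part.isdigit()):
        return None
    return host, int(pid)


if __name__ == "__main__":
    for name in (".coverage.kunchen-es-pc.31117.003485",
                 ".coverage.kunchen-es-pc.31176.826660"):
        print(name, "->", parse_parallel_name(name))
```

The resulting PIDs can then be compared with values of os.getpid() recorded
from inside the code that is expected to run on the workers.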

My question is: what do I have to do to get coverage data for the code
executed by the Spark workers?
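
For steps 0 to 2, one thing that may be worth ruling out is whether a given
process sees the coverage settings at all. coverage.process_startup() does
nothing unless COVERAGE_PROCESS_START is set in that process's environment,
and an export in ~/.bashrc only reaches processes started from a shell that
read it. The sketch below is a minimal check, not a fix. The function name
describe_coverage_setup is hypothetical, and it is written for Python 3 (on
Python 2.7 the module is named ConfigParser).

```python
import configparser
import os


def describe_coverage_setup(environ=None):
    """Summarise the coverage settings visible to the current process.

    Hypothetical helper: reads COVERAGE_PROCESS_START from the given
    environment mapping (os.environ by default) and parses the [run]
    section of the config file it points to, if that file is readable.
    """
    environ = os.environ if environ is None else environ
    rc_path = environ.get("COVERAGE_PROCESS_START")
    if not rc_path:
        return {"rc_path": None, "rc_exists": False, "run": {}}

    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    found = parser.read(rc_path)
    run = dict(parser["run"]) if parser.has_section("run") else {}
    return {"rc_path": rc_path, "rc_exists": bool(found), "run": run}


if __name__ == "__main__":
    print("pid", os.getpid(), describe_coverage_setup())
```

Calling it from the driver script and from inside the code that runs on the
workers would show whether both see the same settings.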
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ci3.py
Type: text/x-python
Size: 369 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/testing-in-python/attachments/20160426/a4ea1c92/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 231 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/testing-in-python/attachments/20160426/a4ea1c92/attachment-0001.py>

