[TIP] how to generate coverage info for pyspark applications
ned at nedbatchelder.com
Tue Apr 26 08:32:38 PDT 2016
I don't know anything about spark, so I'm not sure how it starts up its
workers. My first suggestion would be to use the .pth method of
starting coverage in subprocesses rather than the sitecustomize
technique, and see if that works better.
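For reference, a minimal sketch of the .pth method. Any line beginning with "import" in a .pth file inside site-packages is executed at every interpreter startup, so coverage.process_startup() runs in each worker subprocess too, and begins measuring if COVERAGE_PROCESS_START points at a config file. The file name coverage_start.pth is arbitrary; the write itself is commented out here so the sketch has no side effects:

```python
# Sketch of the .pth method for starting coverage in subprocesses.
# A one-line .pth file in site-packages runs at every interpreter
# startup; coverage.process_startup() is a no-op unless the
# COVERAGE_PROCESS_START environment variable is set.
import os
import sysconfig

pth_line = "import coverage; coverage.process_startup()\n"
site_packages = sysconfig.get_paths()["purelib"]
pth_path = os.path.join(site_packages, "coverage_start.pth")  # name is arbitrary

# To install it for real, write the file (needs write access to site-packages):
# with open(pth_path, "w") as f:
#     f.write(pth_line)
print(pth_path)
```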
On 4/26/16 9:01 AM, Kun Chen wrote:
> Hi, all
> I tried to run a simple pyspark application on spark in local mode,
> and was hoping to get the coverage data file generated somewhere for
> future use.
> 0. I put the following lines at the head of sitecustomize.py
> import coverage
> coverage.process_startup()
> 1. I set the following env variable in ~/.bashrc
> export COVERAGE_PROCESS_START=/home/kunchen/git/es-signal/.coveragerc
> 2. the config file '/home/kunchen/git/es-signal/.coveragerc' has the
> following content
> [run]
> parallel = True
> concurrency = multiprocessing
> omit =
> cover_pylib = False
> data_file = /home/kunchen/.coverage
> 3. I put ci3.py and test.py both
> in /home/kunchen/Downloads/software/spark-1.5.2 ( my spark home )
> 4. in my spark home, I ran the following command to submit and run the
> application
> spark-submit --master local --py-files=ci3.py test.py
> 5. after the application finished, I got two coverage data files,
> but according to the process id in the file names and the content of
> those files, none of them was generated by the spark worker process (or
> thread? not sure here).
> My question is: what do I have to do to get the coverage data for the
> code executed by the spark workers?
> testing-in-python mailing list
> testing-in-python at lists.idyll.org
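One more step regardless of how the workers are started: with parallel = True, each measured process writes its own data file (named like .coverage.hostname.1234.567890, next to the configured data_file), and those have to be combined before reporting. A hypothetical sketch of that step using the coverage API, equivalent to running "coverage combine" in the directory holding the data files (the sketch switches to an empty scratch directory so it has no side effects):

```python
# Hypothetical combine step: merge the per-process data files produced
# by parallel = True into a single .coverage file for reporting.
import os
import tempfile
import coverage

os.chdir(tempfile.mkdtemp())  # empty scratch directory for illustration

cov = coverage.Coverage(data_file=".coverage")
cov.combine()  # merges any .coverage.* files found next to data_file
cov.save()     # writes the single combined data file
```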