[MSNoise] [EXTERNAL] Re: MWCS sql crash

Thomas Lecocq Thomas.Lecocq at seismology.be
Fri Oct 19 19:18:30 UTC 2018


Ashton,

I'm pretty sure issue #1 is linked to your MySQL server not being able
to handle long-lived connections or too many simultaneous connections.
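
As an illustration only, here is a minimal client-side sketch of coping with a dropped connection: retry the work on a fresh connection when the old one has "gone away". This is not MSNoise code. The names `run_with_reconnect`, `ConnectionLost` and `FakeConn` are hypothetical, and `ConnectionLost` merely stands in for a driver error such as the pymysql OperationalError 2006 in the traceback below. In a real setup the server-side `wait_timeout` setting, or SQLAlchemy's `pool_recycle` / `pool_pre_ping` engine options, may well be the more appropriate fix.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a driver error like 'MySQL server has gone away'."""


def run_with_reconnect(connect, work, retries=3, delay=0.0, lost=(ConnectionLost,)):
    """Call work(conn); if the connection was lost, reconnect and try again.

    connect: zero-argument callable returning a fresh connection (assumption).
    work:    callable that takes the connection and returns a result.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return work(conn)
        except lost:
            if attempt == retries:
                raise  # give up after the last allowed retry
            time.sleep(delay)
            conn = connect()  # the old handle is dead, so open a new one


class FakeConn:
    """Fake connection for the demo: the first handle opened is already dead."""

    opened = 0

    def __init__(self):
        FakeConn.opened += 1
        self.alive = FakeConn.opened > 1

    def query(self, sql):
        if not self.alive:
            raise ConnectionLost("MySQL server has gone away")
        return "ok: " + sql


result = run_with_reconnect(FakeConn, lambda c: c.query("SELECT 1"))
print(result, FakeConn.opened)
```

The helper only retries on the exception types passed in `lost`, so genuine query errors still surface immediately.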

Thomas


On 19/10/2018 at 21:13, Flinders, Ashton wrote:
> It ran for less than a day. Approximately 12k MWCS jobs, but not all of
> those ran because of the REF file issue. I reran it last night, and it
> ended just in the last hour - and while it did compute all the jobs, it
> gave me the same error at the end. So it seems like an exit error once the
> compute_mwcs job is done (no exit code, and then SQL hangs?).
>
> Of course this could all be complicated by the fact that I am using a
> Slurm job scheduler to handle processor assignment.
>
> On Fri, Oct 19, 2018 at 11:01 AM Thomas Lecocq <Thomas.Lecocq at seismology.be>
> wrote:
>
>> Ashton,
>>
>> No, I don't think it's linked. If the REF file is not available, the
>> code should crash and not hang.
>>
>> How long did your MWCS run? How many MWCS jobs are there? How many
>> stations / station pairs?
>>
>> What is the content of your my.cnf (or other MySQL configuration file)?
>>
>> Thomas
>>
>>
>> On 19/10/2018 at 18:53, Flinders, Ashton wrote:
>>> Hi Thomas, I actually think this was related to the PR I submitted the
>>> other day. Since I have a mix of stations (some 3-comp, some only Z), when
>>> mwcs_compute tried to calculate RR for a station pair that only had ZZ
>>> and couldn't find the reference function, it crashed/hung. Then, after
>>> hanging for a while, it threw the SQL error.
>>>
>>> On Thu, Oct 18, 2018 at 11:15 PM Thomas Lecocq <Thomas.Lecocq at seismology.be>
>>> wrote:
>>>
>>>> Hi Ashton
>>>>
>>>> It seems your MWCS computation took a looooooong time and the MySQL
>>>> connection was killed during that time. Can you confirm?
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On 18/10/2018 at 18:54, Flinders, Ashton wrote:
>>>>> I get a strange crash partway through my MWCS step (see below), and
>>>>> compute_mwcs is not finishing. E.g. I have 5 frequency bands, but for
>>>>> bands 2-4 only 1 of 10 station-pair MWCS results gets calculated, even
>>>>> though all the data is there in the stacks. I have tried rerunning
>>>>> compute_mwcs by changing the flag back to 'T' for the station pairs
>>>>> where MWCS did not get calculated, but it still crashes. This crash is
>>>>> repeatable.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> (p.s. I also initially tried remaking the stacks, but it crashed at the
>>>>> same point. The data looks good in the stacks)
>>>>>
>>>>> -ashton
>>>>>
>>>>> During handling of the above exception, another exception occurred:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/home/ashton/.local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
>>>>>     context)
>>>>>   File "/home/ashton/.local/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
>>>>>     cursor.execute(statement, parameters)
>>>>>   File "/home/ashton/anaconda3/envs/msnoise/lib/python3.5/site-packages/pymysql/cursors.py", line 165, in execute
>>>>>     result = self._query(query)
>>>>>   File "/home/ashton/anaconda3/envs/msnoise/lib/python3.5/site-packages/pymysql/cursors.py", line 321, in _query
>>>>>     conn.query(q)
>>>>>   File "/home/ashton/anaconda3/envs/msnoise/lib/python3.5/site-packages/pymysql/connections.py", line 859, in query
>>>>>     self._execute_command(COMMAND.COM_QUERY, sql)
>>>>>   File "/home/ashton/anaconda3/envs/msnoise/lib/python3.5/site-packages/pymysql/connections.py", line 1096, in _execute_command
>>>>>     self._write_bytes(packet)
>>>>>   File "/home/ashton/anaconda3/envs/msnoise/lib/python3.5/site-packages/pymysql/connections.py", line 1048, in _write_bytes
>>>>>     "MySQL server has gone away (%r)" % (e,))
>>>>> pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
>>>>>
>>>>> The above exception was the direct cause of the following exception:
>>>>>
>>>> _______________________________________________
>>>> MSNoise mailing list
>>>> MSNoise at mailman-as.oma.be
>>>> http://mailman-as.oma.be/mailman/listinfo/msnoise
>>>>


