[MSNoise] advice on processing database subsets

Phil Cummins phil.cummins at anu.edu.au
Mon May 2 07:41:20 UTC 2016


Hi again,

Thanks for the comments. Here's what I did to set just a single day for 
processing, so that I can test the parameter settings. I looked into the 
API code and needed to import the Job table definition from 
msnoise_table_def.py, but it seems to work OK:

from msnoise.api import connect
from msnoise_table_def import Job

set_day = '2013-10-14'
jobtype = 'CC'
session = connect()

# Flag the chosen day 'T' (to do) ...
jobs_set = session.query(Job).filter(Job.jobtype == jobtype)\
                             .filter(Job.day == set_day)
jobs_set.update({Job.flag: 'T'})

# ... and flag every other day 'D' (done), so compute_cc skips them
jobs_unset = session.query(Job).filter(Job.jobtype == jobtype)\
                               .filter(Job.day != set_day)
jobs_unset.update({Job.flag: 'D'})

session.commit()
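
As a quick sanity check, I also counted the jobs per flag (this assumes 
the Job table's primary key is called 'ref', which is what I see in 
msnoise_table_def.py):

from sqlalchemy import func

counts = session.query(Job.flag, func.count(Job.ref))\
                .filter(Job.jobtype == jobtype)\
                .group_by(Job.flag).all()
print(counts)   # e.g. [('D', 364), ('T', 1)] if only one day is active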

So now I have a jobs table with just the day I want set to 'T'. I hoped 
I was then ready to try 'msnoise compute_cc', but it seems to want me to 
set Filters first. This appears to refer to the MWCS filter parameters? 
I am a little surprised, since I thought MWCS would come later, and I 
don't understand how the CC computation would depend on the MWCS filter 
parameters.
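
In case it is useful to others, it looks like a filter can also be 
created from a script via the API. A sketch, based on the update_filter 
signature I see in msnoise/api.py (untested; the numbers are just 
placeholders):

from msnoise.api import update_filter

# ref=1 creates/updates filter 1; low/high are the CC band corners in Hz;
# the mwcs_* values are only consumed by the later MWCS step
update_filter(session, ref=1, low=0.1, mwcs_low=0.12, mwcs_high=0.98,
              high=1.0, rms_threshold=0.0, mwcs_wlen=12, mwcs_step=4,
              used=True)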

To tell you the truth, at the moment I am more interested in using the 
msnoise cross-correlations as input to a tomography algorithm than in 
MWCS itself. In any case, I am keen to look at the CCs and check that 
they make sense before I move on to anything else.

Could you please advise whether there is a way to run compute_cc 
without having to worry about the MWCS parameters?

Thanks,

- Phil


Thomas Lecocq wrote:
> Hi guys,
>
> Yeah, I have been thinking about a "benchmark" mode for quite a number 
> of weeks, i.e. since I tested a first run of PWS in order to compare 
> the final dv/v; to compare properly I have to test quite a number of 
> parameters.
>
> My current idea is to run a set of possible parameters for different 
> steps. This would lead to a large number of branches in a large tree, 
> but it would definitely be quite interesting.
>
> I am really not in favor of duplicating the database; I would rather 
> create a "config" file with a caller script to set/change parameters... 
> Theoretically, the API should let you do all the actions. The only 
> thing that would be a little trickier is to store/reuse the results of 
> each step in order to compare them. For info, using the "shutil" module 
> you can move/copy files easily.
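>
> For example (a minimal sketch, assuming the stacks of one run live in 
> the default STACKS folder):
>
> import shutil
>
> # snapshot the output of one parameter set before re-running with another
> shutil.copytree("STACKS", "STACKS_run01")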
>
> Let's keep brainstorming on that and see how it goes !
>
> Cheers
>
> Thomas
>
> On 01/05/2016 16:52, Lukas Preiswerk wrote:
>> Hi all
>>
>> I was in a similar situation as Phil, and I used (1). It's not
>> straightforward to copy the database and make msnoise work again in a
>> new directory, but it's definitely possible.
>> I actually think it would be a nice addition to msnoise to offer an
>> option not only for multiple filters, but also for multiple other
>> parameters (window lengths, overlaps, windsorizing, etc.). This would
>> really help in the first "exploratory phase" to find out the best way
>> to process your dataset.
>> What do you think of this idea? Practically, I would implement it by
>> moving these parameters (window length etc.) to the filter parameters
>> and treating each parameter set in the same way as an additional
>> filter, as in the sketch below. As far as I understand the code, this
>> wouldn't require many adaptations…
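>>
>> Something like this, roughly (a sketch; as far as I can tell the
>> processing steps already loop over the active filters via the API):
>>
>> from msnoise.api import connect, get_filters
>>
>> session = connect()
>> for f in get_filters(session, all=False):   # active filters only
>>     print(f.ref, f.low, f.high)   # extra per-filter params would live here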
>>
>> Lukas
>>
>>
>>
>> 2016-05-01 11:35 GMT+02:00 Thomas Lecocq <Thomas.Lecocq at seismology.be>:
>>
>>> Hi Phil,
>>>
>>> I'd say (3) would be better indeed. You can script msnoise using the 
>>> api.
>>> If you need to change params in the config, you can alternatively 
>>> use the
>>> "msnoise config --set name=value" command.
>>>
>>> Please keep me updated on your progress & tests!
>>>
>>> Thomas
>>>
>>>
>>>
>>> On 01/05/2016 10:34, Phil Cummins wrote:
>>>
>>>> Hi again,
>>>>
>>>> As some of you may recall, I'm just getting started with msnoise. I
>>>> have a large database and have managed to get my station and data
>>>> availability tables populated.
>>>> At this point, rather than running through the whole database,
>>>> processing it with parameters I hope might work, I'd rather process
>>>> small subsets, e.g. 1 day at a time, to experiment with window
>>>> lengths, overlaps, etc., to find what seems optimal. My question is,
>>>> what's the best way to process subsets of my database?
>>>> It seems to me I have several options:
>>>>      (1) Make separate databases for each subset I want to test, and
>>>> run through the workflow on each
>>>>      (2) Set start and end times appropriate for my subset, re-scan,
>>>> and run through the workflow.
>>>>      (3) Populate the jobs table, and write a script to activate only
>>>> the jobs I want and not the others.
>>>> I want to do a fair bit of testing using different parameters before
>>>> I run through the whole thing, so I think (3) may be best. But any
>>>> advice would be appreciated.
>>>> Regards,
>>>>
>>>> - Phil