[MSNoise] advice on processing database subsets
Phil Cummins
phil.cummins at anu.edu.au
Mon May 2 07:41:20 UTC 2016
Hi again,
Thanks for the comments. Here's what I did to set just a single day for
processing, so that I can test the parameter settings. I looked into the
API code and needed to import Job from msnoise_table_def.py, but it
seems to work OK:
from msnoise.api import connect
from msnoise_table_def import Job

set_day = '2013-10-14'
jobtype = 'CC'

session = connect()
# Flag the test day 'T' (to do)...
jobs_set = session.query(Job).filter(Job.jobtype == jobtype,
                                     Job.day == set_day)
jobs_set.update({Job.flag: 'T'})
# ...and flag every other day 'D' (done) so it gets skipped
jobs_unset = session.query(Job).filter(Job.jobtype == jobtype,
                                       Job.day != set_day)
jobs_unset.update({Job.flag: 'D'})
session.commit()
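For anyone wanting to check the pattern without touching a real MSNoise
database, here is the same query-then-update logic run standalone against
an in-memory SQLite table. The toy table definition below is mine, not
msnoise's — it only mimics the Job columns used above:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# Toy stand-in for msnoise's Job table (only the columns used here)
class Job(Base):
    __tablename__ = 'jobs'
    ref = Column(Integer, primary_key=True)
    day = Column(String)
    jobtype = Column(String)
    flag = Column(String)

engine = create_engine('sqlite://')          # in-memory database
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Job(day='2013-10-13', jobtype='CC', flag='T'),
                 Job(day='2013-10-14', jobtype='CC', flag='T'),
                 Job(day='2013-10-15', jobtype='CC', flag='T')])
session.commit()

# Same pattern as above: keep only the test day flagged 'T'
set_day = '2013-10-14'
session.query(Job).filter(Job.jobtype == 'CC', Job.day == set_day) \
       .update({Job.flag: 'T'}, synchronize_session=False)
session.query(Job).filter(Job.jobtype == 'CC', Job.day != set_day) \
       .update({Job.flag: 'D'}, synchronize_session=False)
session.commit()

flags = {j.day: j.flag for j in session.query(Job)}
print(flags)  # {'2013-10-13': 'D', '2013-10-14': 'T', '2013-10-15': 'D'}
```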
So now I have a jobs table with just the day I want flagged 'T'. I hoped
I was ready to try 'msnoise compute_cc', but it seems to want me to set
filters first. This appears to refer to the MWCS filter parameters? I am
a little surprised, since I thought MWCS would come later, and I don't
understand how the CC computation would depend on the MWCS filter
parameters.
To tell you the truth, at the moment I am more interested in using the
msnoise cross-correlations as input to a tomography algorithm than in
MWCS itself. In any case, I am keen to look at the CCs to check that
they make sense before I move on to anything else.
Would it be possible to advise whether there is a way to run compute_cc
without having to worry about the MWCS parameters?
Thanks,
- Phil
Thomas Lecocq wrote:
> Hi guys,
>
> Yeah, I have been thinking about a "benchmark" mode for quite a few
> weeks, i.e. since I tested a first run of PWS in order to compare the
> final dv/v; to compare properly I have to test quite a number of
> parameters.
>
> My current idea is to run a set of possible parameters for different
> steps. This would lead to a large number of branches in a large tree,
> but it would definitely be quite interesting.
>
> I am really not in favor of duplicating the database; rather, I would
> create a "config" file with a caller script to set/change
> parameters... Theoretically, the API should let you do all the
> actions. The only thing that would be a little trickier is to
> store/reuse the results of each step in order to compare them. For
> info, using the "shutil" module you can move/copy files easily.
>
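[A minimal stand-alone sketch of the "shutil" idea above — the folder
names and parameter tag are invented for illustration, not MSNoise's
actual layout:]

```python
import os
import shutil
import tempfile

# Work in a scratch directory so this sketch is self-contained
base = tempfile.mkdtemp()
src = os.path.join(base, 'STACKS')       # stand-in for a results folder
os.makedirs(src)
open(os.path.join(src, 'example.mseed'), 'w').close()

# Archive this run's results under a tag describing the parameter set
run_tag = 'wlen60_overlap0.5'            # invented tag
dst = os.path.join(base, 'benchmark', run_tag, 'STACKS')
shutil.copytree(src, dst)                # copies the tree, creating parents

print(sorted(os.listdir(dst)))  # ['example.mseed']
```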
> Let's keep brainstorming on that and see how it goes !
>
> Cheers
>
> Thomas
>
> On 01/05/2016 16:52, Lukas Preiswerk wrote:
>> Hi all
>>
>> I was in a similar situation to Phil's, and I used (1). It’s not
>> straightforward to copy the database and make msnoise work again in a
>> new directory, but it’s definitely possible.
>> I actually think it would be a nice addition to msnoise to have an
>> option not only for multiple filters, but also for multiple other
>> parameters (window lengths, overlaps, windsorizing, etc.). This would
>> really help in the first “exploratory phase” of finding the best way
>> to process your dataset.
>> What do you think of this idea? Practically, I would implement it by
>> moving these parameters (window length etc.) to the filter parameters
>> and treating each set in the same way as an additional filter. As far
>> as I understand the code, this wouldn’t require many adaptations…
>>
>> Lukas
>>
>>
>>
>> 2016-05-01 11:35 GMT+02:00 Thomas Lecocq <Thomas.Lecocq at seismology.be>:
>>
>>> Hi Phil,
>>>
>>> I'd say (3) would be better indeed. You can script msnoise using the
>>> api.
>>> If you need to change params in the config, you can alternatively
>>> use the
>>> "msnoise config --set name=value" command.
>>>
>>> Please keep me updated on your progress & tests!
>>>
>>> Thomas
>>>
>>>
>>>
>>> On 01/05/2016 10:34, Phil Cummins wrote:
>>>
>>>> Hi again,
>>>>
>>>> As some of you may recall, I'm just getting started with msnoise. I
>>>> have
>>>> a large database and have managed to get my station and data
>>>> availability
>>>> tables populated.
>>>> At this point, rather than running through the whole database,
>>>> processing
>>>> it with parameters I hope might work, I'd rather process small
>>>> subsets,
>>>> e.g. 1 day at a time, to experiment with window lengths, overlaps,
>>>> etc., to
>>>> find what seems optimal. My question is, what's the best way to
>>>> process
>>>> subsets of my database?
>>>> It seems to me I have several options:
>>>>   (1) Make separate databases for each subset I want to test, and
>>>>       run through the workflow on each.
>>>>   (2) Set start and end times appropriate for my subset, re-scan,
>>>>       and run through the workflow.
>>>>   (3) Populate the jobs table, and write a script to activate only
>>>>       the jobs I want and not the others.
>>>> I want to do a fair bit of testing using different parameters
>>>> before I run through the whole thing, so I think (3) may be best.
>>>> But any advice would be appreciated.
>>>> Regards,
>>>>
>>>> - Phil
>>>> _______________________________________________
>>>> MSNoise mailing list
>>>> MSNoise at mailman-as.oma.be
>>>> http://mailman-as.oma.be/mailman/listinfo/msnoise
>>>>
>