This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

No pretraining exp #142

Closed
wants to merge 6 commits into from

Conversation

jdiaz4302
Collaborator

@jdiaz4302 jdiaz4302 commented Nov 3, 2021

⚠️ Definitely not merge worthy as-is; documenting experiment ⚠️

Regarding #38

What happens here:

  • Compare 5 model runs with process-based (PB) pretraining and 5 runs using PB outputs as inputs
  • Training hyperparameters:
    • Training partition is non-summer months between (1984, 2010) and (2015, 2020); this was an artifact of the existing config.yml
    • Likewise, validation partition is all months between (1979, 1984), (2010, 2015), and (2020, 2021); the last interval appears to only have 2 days of data.
  • 100 fine tuning epochs and either 0 or 200 pretraining epochs (manually changed in config.yml between runs if needed)
    • If 0 pretraining epochs, concatenate the PB outputs to the x variable arrays prior to training and rewrite that file (prepped.npz -> prepped2.npz)
    • After all of this, it occurred to me that the PB pretraining runs get 300 total training epochs (200 pretraining + 100 fine tuning), while the PB-input runs get 100, so I also generated one run with 300 fine tuning epochs and 0 pretraining epochs; I would've done more, but Tallgrass had long queues.

Once a run is completed, I copy the output directory to a separate location and rerun (possibly after changing the pretraining epochs); e.g., cp -r output_DRB_offsetTest/ no_pretrain_1_300/.

After all the runs were done, I made some plots of the learning/training curves, validation set RMSE by month (binning "months" by the 21st date of each month, which better aligns with defining summer by equinox dates), and time series, and I'm starting to look at the output vs. input plots (will upload soon). I will add the notebooks that generate those plots shortly (they need to be cleaned up).
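As a rough sketch of that month binning (my own illustration; labeling each bin by the month that contains its starting 21st is an assumption, and the actual notebook code may differ):

import pandas as pd

dates = pd.to_datetime(["2011-06-20", "2011-06-21", "2011-09-22"])
# shift back 20 days so that, e.g., Jun 21 - Jul 20 all land in the "June" bin
month_bins = (dates - pd.Timedelta(days=20)).month
print(list(month_bins))  # [5, 6, 9]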

Right now this looks very favorable for using PB pretraining (specifically validation set RMSE by month), but maybe too favorable? It would be nice to get some more eyes to spot any mistakes or oversights. One thing that is definitely strange is the all-over-the-place behavior of PB input models during validation set summers (see last two plots).

Plots:

TrainingCurves_PB_inputs_vs_pretraining

RMSE_by_month_PB_inputs_vs_pretraining (1)

PB_experiment_TimeSeries1566_300IDd

PB_experiment_TimeSeries1573_300IDd

@jdiaz4302
Collaborator Author

Also, the config.yml that I used was apparently in between commits. I did not make all the changes that the differences would suggest; I only changed the training partition start/end dates and the number of epochs.

@aappling-usgs
Member

aappling-usgs commented Nov 3, 2021

@jsadler2 any chance you have time to review, or willingness to punt quickly to @SimonTopp if not? Jeremy pointed out the big concern - it's weird that the model predictions are all right on top of one another from 2009-09 to 2010-06 and then suddenly all over the map in the summer (and even a bit in the winter) once the validation period starts. Could be a bunch of things, including but not limited to:

  • this is actually how it works, and pretraining just knocks fine tuning out of the park, and the validation data is enough out-of-sample that variability in both the pretraining and the PB-as-input predictions is wider during val
  • something funny about how training and val partitions are getting defined and munged? see Try model without pretraining #38 (comment)
  • something else we haven't thought of yet

@jsadler2
Collaborator

jsadler2 commented Nov 3, 2021

I can look at this this afternoon

@janetrbarclay
Collaborator

How has training on the winter data and testing on the summer been done previously? I wonder if shortening the training data like this is causing weird jumps in the data (such that the model thinks Sept 23 comes immediately after June 19). I have a 2 PM (eastern) meeting but can look a little more afterwards.

@SimonTopp
Contributor

I think Janet might be onto something. Here we're cutting up all our observations by start and end date.

import xarray as xr  # needed for xr.concat below


def sel_partition_data(dataset, time_idx_name, start_dates, end_dates):
    """
    select the data from a date range or a set of date ranges
    :param dataset: [xr dataset] input or output data with date dimension
    :param time_idx_name: [str] name of column that is used for temporal index
    (usually 'time')
    :param start_dates: [str or list] fmt: "YYYY-MM-DD"; date(s) to start period
    (can have multiple discontinuous periods)
    :param end_dates: [str or list] fmt: "YYYY-MM-DD"; date(s) to end period
    (can have multiple discontinuous periods)
    :return: dataset of just those dates
    """
    # if it's just one date range
    if isinstance(start_dates, str):
        if isinstance(end_dates, str):
            return dataset.sel({time_idx_name: slice(start_dates, end_dates)})
        else:
            raise ValueError("start_dates is str but not end_date")
    # if it's a list of date ranges
    elif isinstance(start_dates, list) or isinstance(start_dates, tuple):
        if len(start_dates) == len(end_dates):
            data_list = []
            for i in range(len(start_dates)):
                date_slice = slice(start_dates[i], end_dates[i])
                data_list.append(dataset.sel({time_idx_name: date_slice}))
            return xr.concat(data_list, dim=time_idx_name)
        else:
            raise ValueError("start_dates and end_dates must have same length")
    else:
        raise ValueError("start_dates must be either str, list, or tuple")

Then here we're taking the resulting sequences and slicing them into 365-day chunks, which assumes that continuous years are being passed in. I think I walked through all this when I made issue #127.

import numpy as np


def split_into_batches(data_array, seq_len=365, offset=1.0):
    """
    split training data into batches of length seq_len
    :param data_array: [numpy array] array of training data with dims [nseg,
    ndates, nfeat]
    :param seq_len: [int] length of sequences (e.g., 365)
    :param offset: [float] 0-1, how to offset the batches (e.g., 0.5 means that
    the first batch will be 0-365 and the second will be 182-547)
    :return: [numpy array] batched data with dims [nbatches, nseg, seq_len,
    nfeat]
    """
    combined = []
    for i in range(int(1 / offset)):
        start = int(i * offset * seq_len)
        idx = np.arange(start=start, stop=data_array.shape[1] + 1, step=seq_len)
        split = np.split(data_array, indices_or_sections=idx, axis=1)
        # add all but the first and last batch since they will be smaller
        combined.extend([s for s in split if s.shape[1] == seq_len])
    combined = np.asarray(combined)
    return combined

I think what we want to be doing here is masking out the summer months rather than excluding them in the start/end dates. Maybe using the exclude file (might need some work after the big update a couple months ago) or by using reduce_training_data_continuous on line 348.
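A minimal sketch of that masking idea (my own illustration, not the repo's exclude-file or reduce_training_data_continuous API; the summer month numbers are placeholders):

import xarray as xr

def mask_summer_obs(y_obs, time_idx_name="time", summer_months=(7, 8, 9)):
    """Set summer observations to NaN while keeping the date range continuous."""
    months = y_obs[time_idx_name].dt.month
    return y_obs.where(~months.isin(list(summer_months)))

This keeps the full, continuous date range so the 365-day slicing in split_into_batches stays aligned in time; only the training targets for summer get blanked out.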

Comment on lines +214 to +223
np.savez_compressed(updated_io_data, x_trn = io_data['x_trn'], x_val = io_data['x_val'], x_tst = io_data['x_tst'],
x_std = io_data['x_std'], x_mean = io_data['x_mean'], x_vars = io_data['x_vars'],
ids_trn = io_data['ids_trn'], times_trn = io_data['times_trn'],
ids_val = io_data['ids_val'], times_val = io_data['times_val'],
ids_tst = io_data['ids_tst'], times_tst = io_data['times_tst'], dist_matrix = io_data['dist_matrix'],
y_obs_trn = io_data['y_obs_trn'], y_obs_wgts = io_data['y_obs_wgts'],
y_obs_val = io_data['y_obs_val'], y_obs_tst = io_data['y_obs_tst'],
y_std = io_data['y_std'], y_mean = io_data['y_mean'], y_obs_vars = io_data['y_obs_vars'],
y_pre_trn = io_data['y_pre_trn'], y_pre_wgts = io_data['y_pre_wgts'],
y_pre_val = io_data['y_pre_val'], y_pre_tst = io_data['y_pre_tst'], y_pre_vars = io_data['y_pre_vars'])
Contributor

Did these all get updated somewhere that I'm not seeing, or are you basically just copying io_data here?

Collaborator Author

I'm just copying the io_data because that's what satisfied snakemake. Correct my interpretation below if it's wrong; I'm still very new to snakemake.

The output file, prepped2.npz, wasn't actually updating when I was using PB outputs as inputs until I specified it as an output in the Snakefile, and that required it to always be made. So, I rewrote prepped.npz to prepped2.npz when pretraining did occur - even though I didn't really need to - to make the pipeline work with how I set everything else up.

Collaborator

What exactly is prepped2.npz, and how does it differ from prepped.npz? I think that would help clarify how best to do this in Snakemake. It seems pretty unusual to me to have to make a straight copy of a file just to make the pipeline work.

Contributor

Ahh, gotcha. Yeah, Snakemake can be tricky when your experiments have different outputs. I think you could use touch here in the Snakefile, which basically creates a temporary phantom file to fool snakemake into thinking the output exists. You'd probably have to play around with it a bit though.
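A hedged sketch of what that could look like (the rule, paths, and script name are hypothetical, not the repo's actual Snakefile; touch() just creates or updates the output file after the job finishes, so Snakemake considers it produced even when the script decides not to rewrite it):

rule prep_pb_inputs:
    input:
        "output/prepped.npz"
    output:
        touch("output/prepped2.npz")
    script:
        "scripts/concat_pb_outputs.py"  # hypothetical script that only rewrites when pt_epochs == 0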

@jdiaz4302
Collaborator Author

jdiaz4302 commented Nov 3, 2021

Good catch!

Interestingly, this training set processing applies to both groups in the experiment, so I wonder what that implies.

That is, both groups are given a discontinuous sequence of input values, but only the PB-inputs models are seemingly affected while the PB-pretraining models seem to handle it. It could be that PB pretraining de-emphasizes long-term information that is more likely to come from a discontinuous interval, and/or it could be further evidence that the PB-inputs approach is overfitting to the training data rather than learning valuable relationships (i.e., pretraining may facilitate being "right for the right reasons").

@SimonTopp
Contributor

I was just thinking about that @jdiaz4302. I would expect the discontinuous sequences to decrease accuracy across the board, but we still see pretty decent results from the pre-training, which is surprising. Am I right that the pre-training here has the same breaks as the training dataset? If so, it's bonkers that it can still learn annual signals.

@jdiaz4302
Collaborator Author

Am I right that the pre-training here has the same breaks as the training dataset?

Yep!

I'm assuming with a discontinuous 365-day sequence, you could often still reliably use (e.g.) the last 2 weeks of data and learn certain variable relationships with less focus on long-term temporal dynamics.

@SimonTopp
Contributor

you could often still reliably use (e.g.) the last 2 weeks of data and learn certain variable relationships with less focus on long-term temporal dynamics

I've found similar things with the GraphWaveNet model I've been developing, but that would imply that there's relatively little worthwhile information beyond ~1-2 months in a sequence. Might be interesting to run some tests with different sequence lengths and see at what point (how short) the model sees a drop in performance from loss of temporal info. Also, should have said this off the bat, very cool work and great visualizations man!

Collaborator

@jsadler2 jsadler2 left a comment

Really interesting stuff, @jdiaz4302!

I'm having a hard time understanding exactly how you set up the two treatments. I understand what you wrote in the description, but I don't see it in the code anywhere. For example,

  • I don't see PB-input and PB-pretraining treatments in the Snakefile.
  • I also don't see where you are combining the PB outputs. Is that just in the x_vars in the config file?

Discontinuous sequences

I think that the non-continuous thing is really interesting. So does the model have chunks of time where it's doing well and then, when it gets to a break in the time series, it does poorly for a few days and then recovers? That's how I'm picturing it working, but it'd be nice to

  1. get a confirmation that the 365-day training sequences are indeed discontinuous in time,
  2. see what impact that is having on predictions.

As an aside, I think the prep_data function has sequence length as an argument that gets propagated to the other relevant functions so if we want, we could shorten the sequence length to reduce the discontinuities.
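For example (x_trn_array is a placeholder for the prepped training array; the argument values are just illustrative), shortening the sequences when batching would look like:

# 180-day sequences with 50% overlap instead of full 365-day years
batches = split_into_batches(x_trn_array, seq_len=180, offset=0.5)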

@aappling-usgs
Member

Am I right that the pre-training here has the same breaks as the training dataset?

Let's fix this if we can! We've hypothesized that a lot of the pretraining benefit is in getting to see predictions for conditions under which the model doesn't get to see any observations (in this case, for summertimes). So maybe we can get the pretraining results even better, justifiably, by adding those back in.

both groups are given a discontinuous sequence of input values, but only the PB-inputs models are seemingly affected while the PB-pretraining models seem to handle it. It could be that PB pretraining de-emphasizes long-term information that is more likely to come from a discontinuous interval

Any ideas on what the mechanism would be for this? I would think the PB inputs approach would have a better shot at learning this since it could learn to rely on the PB input more heavily (which does integrate memory across that missing period) whereas the pretraining approach has no such pseudo-memory to rely on.

I've seen a handful of (informal) HPO exercises looking at sequence length for such problems, and people generally settle on ~176 or 365 days. But I bet it varies by region, and I wonder if memory just isn't that important over the summer in these reaches b/c snow is long gone by June and drought is rarely severe. I wouldn't mind seeing this experiment done again for the DRB but also don't see it as a very high priority.

and/or it could be further evidence that the PB-inputs approach is overfitting to the training data rather than learning valuable relationships (i.e., pretraining may facilitate being "right for the right reasons").

This explanation seems more plausible to me.

@jdiaz4302
Collaborator Author

@jsadler2

PB-input or PB-pretraining is triggered by the number of pt_epochs in the config.yml file. I manually edit this when I want to get runs for a different group (0 = PB-input; not 0 = pretraining). Here I concatenate the pretraining Ys (i.e., PB outputs) to the x_{partition}_obs if there is no pretraining. The data are written to a different .npz file that is used at later points (prepped2.npz).
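A minimal numpy sketch of that concatenation (the io_data keys are the ones stored in prepped.npz; the function itself is just illustrative of the idea, not the PR's exact code):

import numpy as np

def concat_pb_outputs(io_data, partition):
    """Append the PB outputs (pretraining targets) as extra input features."""
    x = io_data[f"x_{partition}"]          # e.g., x_trn
    y_pre = io_data[f"y_pre_{partition}"]  # PB outputs for the same partition
    return np.concatenate([x, y_pre], axis=-1)

io_data = dict(np.load("prepped.npz", allow_pickle=True))
io_data["x_trn"] = concat_pb_outputs(io_data, "trn")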

@SimonTopp
Contributor

Let's fix this if we can! We've hypothesized that a lot of the pretraining benefit is in getting to see predictions for conditions under which the model doesn't get to see any observations (in this case, for summertimes). So maybe we can get the pretraining results even better, justifiably, by adding those back in.

This is updated in my most recent PR (granted in kind of a bulky way), but you could pull it from there if you wanted. It basically just creates an x_pre_train and y_pre_train that include everything (all partitions and process outputs).

@janetrbarclay
Collaborator

Since Jeff and Simon are deep into this review already, I'll follow the conversation and comment if I think of something, but mostly let them dig into the code.

@jsadler2
Collaborator

jsadler2 commented Nov 3, 2021

Mmk. I'm pretty sure I know what is going on here as far as why the predictions for PB-input are so wonky in the validation phase and not in training. The training Y values are normalized and the validation ones are not.

if normalize_y:

@aappling-usgs
Member

aappling-usgs commented Nov 3, 2021

Yikes, good catch! Is that only the case in this PR, or has it been that way in recent code as well?

@jsadler2
Collaborator

jsadler2 commented Nov 3, 2021

It's always been like that. There's been no need to normalize Y_tst and Y_val before. Do you think that we should normalize all Y? I think often no Y partition is normalized; it's only needed when doing the multi-variable predictions.

@jsadler2
Collaborator

jsadler2 commented Nov 3, 2021

@jdiaz4302 - I think if you scaled and centered y_val_pre and y_tst_pre in these lines, we'd see much more reasonable results.
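A hedged sketch of that fix (y_mean and y_std are the training-period statistics already saved in prepped.npz; I'm assuming those same y statistics are the right ones to apply to the PB outputs used as inputs):

import numpy as np

io_data = np.load("prepped.npz")
y_pre_val_scaled = (io_data["y_pre_val"] - io_data["y_mean"]) / io_data["y_std"]
y_pre_tst_scaled = (io_data["y_pre_tst"] - io_data["y_mean"]) / io_data["y_std"]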

@jsadler2
Collaborator

jsadler2 commented Nov 3, 2021

BTW - the only reason I thought of this as quickly as I did is because I did basically the same thing for an experiment for my multi-task paper 😄

I kept thinking "how are the training predictions okay and the val/test predictions terrible?!" and then I realized that the model was being trained on variables that were on a totally different scale than what I was giving it in the val/test conditions.

@jdiaz4302
Collaborator Author

😮 Haha, thanks @jsadler2. I agree there's generally no reason to scale the validation and testing set observations since you're usually going to just use those for evaluation at the scale-of-interest.

PB_outputs_scale

@aappling-usgs
Member

That's a great find, Jeff. Experience and team communication paying off big time here.

Combining the scaling fix (💯 Jeff!), discontinuity fix (💯 Janet and Simon!), and pretraining fix (💯 Simon), I feel a lot more optimistic that a next run of these models could give us a correct result. Sweet!

@SimonTopp
Contributor

It's always been like that. There's been no need to normalize Y_tst and Y_val before. Do you think that we should normalize all Y? I think often no Y partition is normalized; it's only needed when doing the multi-variable predictions.

If you do pull from the preproc_utils changes in the other PR, be aware that I changed it to scale y_pre_trn, y_trn, and y_val because I updated the training routine to validate at each epoch.

@aappling-usgs
Member

Sounds like we need somebody to review & merge Simon's PR soon!

@janetrbarclay
Collaborator

I can take a look at Simon's PR tomorrow.

@jdiaz4302
Collaborator Author

jdiaz4302 commented Nov 8, 2021

Results with scaling fix:

RMSE_by_month_PB_inputs_vs_pretraining_FIXEDSCALE
PB_experiment_TimeSeries1566_300IDd_SCALEDFIXED

PB_experiment_TimeSeries1573_300IDd_SCALEFIXED

Regarding the discontinuity fix, I tried using a modified version of reduce_training_data_continuous (to handle multiple intervals) in place of separate_trn_tst, but then there are NaN values in the model's input data (i.e., x_{partition}), which mess up training. I can easily set those to a fixed value, but that's not good. I may replace the NaNs with some sort of average for that day of the year (maybe by river segment). To clarify, none of these discontinuity-fixing ideas are implemented in these results.
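For the day-of-year idea, a rough xarray sketch (illustrative only, not implemented here; it assumes x has a "time" coordinate plus a segment dimension, so the fill is automatically per segment):

import xarray as xr

def fill_with_doy_mean(x):
    """Fill NaN inputs with the mean for that day of year."""
    doy_mean = x.groupby("time.dayofyear").mean("time", skipna=True)
    return x.groupby("time.dayofyear").fillna(doy_mean)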

@jzwart
Member

jzwart commented Nov 8, 2021

Much different than before. To clarify, the heatmap is from all segments modeled, correct? Not just the two segments with time series plots?

@jdiaz4302
Collaborator Author

Yes, the heatmap is from all segments. Definitely different, but that should be expected given that the previous results were generated with validation set variables that had the wrong scale.

@SimonTopp
Contributor

Super interesting Jeremy. So basically, even though the PB input runs saw nothing that resembles summer, they're able to generalize to summer conditions better than models that were at least pre-trained with summer months included? Also, did we confirm our discontinuous training sequences?

@jdiaz4302
Collaborator Author

There's no pretraining of summer months included here yet - didn't want to duplicate efforts.

I posted a graph at #127 showing that we do have discontinuous batches.

@jordansread
Member

Interesting. Note in our lake modeling paper, we also looked at the impact of skipping exposure to summer conditions in pre-training:
image

@jdiaz4302
Collaborator Author

jdiaz4302 commented Nov 12, 2021

I messed up some of the version control associated with this, so to clarify, since the last update, I didn’t make any changes to:

  • Docker_README.md
  • Snakefile
  • Snakefile_gw
  • config.yml
  • config_gw.yml
  • river_dl/loss_functions.py

What I did make meaningful changes to:

  • river_dl/preproc_utils.py
    • To get continuous batches (with NaNs), I took Simon's suggestion of using reduce_training_data_continuous, which I used in place of separate_trn_tst. This means that all the partitions are the same size (the full data range) but with varying numbers of NaNs (associated with the other partitions).
      • This definitely isn’t optimal; fairly certain it (expectedly) made training take longer.
    • I only applied this to the Y observations so that the other vectors (y_pre_{partition} and x_{partition}) would still have valid/real values. This means there are x_trn observations from summer, but no associated y observations from summer to learn from - if I set those x to NaN, there's a separate problem that the NaNs propagate to predictions and loss (then do we set them to 0, interpolate, etc.?). Likewise, y_pre_trn keeps all its values, meaning that when I did pretraining, this included pretraining on summer data as well as validation and test set dates.
    • I made some changes to filter_reduce_dates that allowed it to use a list of discontinuous dates and perform as I expected; I expected it to ultimately keep the dates it identified, but it was setting those to nan – hence the np.logical_not
  • river_dl/train.py
    • Mostly the same as before, just scaling y_pre_val/tst

Figures

Figure showing the continuous batch of y_obs_trn with nan and y_pre_trn with data:

continuous_batch

Figure showing the latest performance heatmap. I found it strange that performance took a strong hit from using the continuous batches with NaNs as opposed to discontinuous batches with real values, but yeah... it could be misleading to provide the summertime X values and not provide a learning target for them; it literally tells the optimization task, "You can do whatever here/in the summer as long as you get your act together by fall":

RMSE_by_month_PB_inputs_vs_pretraining (2)

Time series for reservoir impacted and not impacted stream:

PB_experiment_TimeSeries1566_NoDiscontAllPretrain

PB_experiment_TimeSeries1573_NoDiscontAllPretrain

I'll include these plots of input versus output as well (since I made them), but I didn't find them incredibly insightful (colors are the same; kinda interesting that it seems to taper the effect of higher PB values):

PB_inputs_response_plots (2)

I'm likely going to be helping more on the reservoir task starting next week, and like I said, this was not designed to merge with the existing codebase - more an exploratory tangent. Feel free to close out or maybe I will sometime next week when engagement is practically dead.

Also, thanks @jsadler2 for the better approach! I just didn't have time to learn it and get the results, but I will definitely be reviewing it before trying to take on a deeper snakemake-affiliated task.

@jzwart
Member

jzwart commented Nov 15, 2021

Interesting. I find it a bit surprising that both methods are overpredicting temperature by quite a bit during the summer periods even though they didn't see any forcing data in that range. Do you know if it's overpredicting at all segments?

@jdiaz4302
Collaborator Author

@jzwart here's a plot of all observations (x-axis) versus predictions (y-axis); these look approximately the same across models and runs. It does seem like that's the general trend. I made some low-effort quadrants via dashed lines to try to discern summer (upper left quadrant - above 25 Celsius). The solid line is 1:1.

preds_vs_obs_PB_experiment

Seeing data adjacent to summer (when temperatures are changing faster) may suggest that summer will peak higher than it does (i.e., a sharper rather than rounder parabola)?

@SimonTopp
Contributor

I used this function in place of separate_trn_tst.

This seems like a relevant conversation to have and maybe a good task to assign to someone for a new PR. We should probably make sure our pipeline is creating continuous sequences and has the flexibility to mask out certain observations within those sequences for experiments like this. I know @jsadler2 mentioned he had some ideas for an upcoming PR, maybe we should put this on the to-do list?

Also, at least in these reaches it looks like our high temp bias is in the training predictions as well. I feel like that might be an indication that it could be something wrong with our data prep rather than an issue generalizing to the unseen summer temps. What do you think @jdiaz4302?

image

@jdiaz4302
Collaborator Author

The red box annotations are a good point. It's possible, but I don't necessarily suspect that something is wrong with the data prep.

In my experience, it's not uncommon for there to be under/overestimating at the low ends and over/underestimating at the high ends (note the opposite order with respect to "/") because then performance at the central/median/mean values is still optimized. These plots do seem overly skewed toward not performing at the high ends, but a density view of the plot seems to show that the low end is far more heavily weighted (same plot as above, but with plt.hexbin)

preds_vs_obs_PB_experiment_DensityEst
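For reference, the density view is just matplotlib's hexbin on the same observed/predicted pairs (a minimal sketch with placeholder data; the real plotting code lives in the notebooks):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
obs = rng.uniform(0, 30, 5000)        # placeholder observed temperatures (°C)
preds = obs + rng.normal(0, 2, 5000)  # placeholder predictions

plt.hexbin(obs, preds, gridsize=50, bins="log")
plt.plot([0, 30], [0, 30], "k-")      # 1:1 line
plt.xlabel("observed (°C)")
plt.ylabel("predicted (°C)")
plt.show()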

It's possible that additional variables/missing context could help reel in those low and high ends to the 1:1 line though. RMSE definitely optimizes with respect to the central values, but I've never had luck fixing this problem by using a different generic loss function.

@jsadler2
Collaborator

jsadler2 commented Nov 15, 2021

Something that I find really interesting, and @jdiaz4302 brought this up when he first posted this, is the shapes of the inputs vs the outputs for the temp-related inputs... especially the seg_tave_gw. They all have this unusual pattern where it's a little like the "quiet coyote" shape :) - It goes up kind of linearly at the bottom of the range of inputs, but then at the top it kind of splits where some of the points keep going up and some level off and sometimes go down.

I'm scratching my head. Why would the model learn that? Shouldn't increasing air temps (for example) always lead to higher water temps? There is the factor of the reservoirs, but, if I understand these sites correctly, only some of them are influenced by the reservoir. And why would sometimes they go up and sometimes they go level and sometimes they go down?

The gw one is especially interesting because there is also this vertical line when the input is zero. And to me that just seems really weird and like there is some kind of mistake in the model. But again, I'm scratching my head .... no ideas so far as what it might be.
image

@jdiaz4302
Collaborator Author

jdiaz4302 commented Nov 15, 2021

This seems like a relevant conversation to be had and maybe a good task to assign to someone for a new PR. We should probably make sure our pipeline is creating continuous sequences and has the flexibility mask out certain observations within those sequences for experiments like this. I know @jsadler2 mentioned he had some ideas for an upcoming PR, maybe we should put this on the to-do list?

Yeah, I think the implicit assumption for a standard LSTM/RNN architecture is that values are evenly spaced/sampled in time. There are variants (e.g., Time-LSTM, Time-Aware LSTM) that explicitly require the time between values as an input and easily allow a discontinuous segment, but those are probably better suited for truly uneven time series rather than an evenly sampled time series with big chunks missing. Also, that would be effort put into new models, so a masking approach seems like the most applicable in these cases.

@jsadler2 I think those plots are really cool for the same reasons 😄 . It could only be possible because of some interacting effects, and with the vertical lines, I'm assuming some interacting effect that's coupling air temperature with a binary variable.

I have less confidence in that last speculation because the more I think about that vertical bar, the more my head kinda hurts too - "At (seemingly) exactly average air temperature, let's occasionally predict uncharacteristically low values"

@jdiaz4302
Collaborator Author

While finding a storage place for this work and testing the storage place, I found that the output versus input plots were specific to segment 1566 (reservoir-impacted); this is the same segment as the reservoir impacted time series throughout this PR (not labelled, but obvious by the spiky summer behavior in those time series plots).

Here are the corresponding output versus input plots for 1573 (the not-reservoir-impacted time series segment; I used 1566 and 1573 because they had tons of data). I think it's really interesting that these plots are a lot more straightforward - fewer of those "quiet coyote" shapes, as Jeff pointed out. Also, the relationship between prediction and PB output (last row) is a lot more monotonic but still noisy/spread out (I believe we expect the PB model to be more reliable away from reservoirs), which could be motivation for further refining the PB model for reservoirs.

PB_inputs_response_plots_1573

Here is the same plot for all segments. The overplotting doesn't really resolve even with very small (e.g., 0.01) alpha and marker sizes. Generally, the overall output vs. input plots seem to more closely resemble the corresponding 1573 plot, probably because most segments aren't as directly impacted by reservoirs as 1566. There's definitely a lot more spread added when considering the whole data set, though.

PB_inputs_response_plots_ALL

My plan is to close (and not merge) this PR by the end of the work day just to clean up; it will still be present in the "closed" tab for reference.

I've stored all the output directories generated by this experiment in the newly created pump project space that Alison announced under river-dl-PB_experiment. The first round of results are simply {no_}pretrain_{n}; the second round (only affecting PB inputs) is stored as no_pretrain_{n}_PretrainScaled; and this final round is stored as {no_}pretrain_{n}_NoDiscontAllPretrain

@jdiaz4302 jdiaz4302 closed this Nov 22, 2021