Jonathan Thornburg (remove animal to reply)
2019-04-01 04:51:45 UTC
I have a data-analysis problem, and I wonder if someone can suggest
software to solve it.
Summary:
I have a number of points (typically 300 to 1000) in $R^2$. I'm looking
for code (in any reasonable programming language) to find the smallest-area
ellipsoid which encloses 95% of the points. The code doesn't need to be
particularly efficient.
More details:
I have a bunch of time series each consisting of relatively-slowly varying
"background" plus some signals; in certain regions of the data the signals
are well-approximated by damped sinusoids. For each time series, I've
done a nonlinear least-squares fit (using the MINPACK routine LMDIF1) of
a model (signal = spline "background" + damped simusoid) to the data. I'd
like to estimate a confidence region for the fitted damped-sinusoid
frequency and decay rate so that I can assess whether these are
consistent with a theoretical prediction.
The residuals from my fits are highly non-random -- they are dominated
by low-level quasi-periodic oscillations with frequencies similar to
those of the fitted damped-sinusoids. (By "low-level" I mean that the
RMS residual is typically on the order of $10^{-3}$ times the amplitude
of the fitted damped-sinusoid.)
Because of the systematics in the residuals, I don't think a standard
$\chi^2$-square error estimates would be valid. Instead, I have estimated
a confidence region via a Monte-Carlo procedure in which I do trial fits
on randomly-chosen subintervals of the data, and use the scatter in the
trial fits' damped-sinusoid parameters as an estimate of the uncertainty
in those parameters for the original fits.
[The "randomly-chosen" subintervals are subject to
a minimum length of several oscillation periods so
that (a) the oscillation period & decay rate are
well-defined, and (b) the oscillation is not degenerate
with the "background" spline.]
I typically do a few hundred Monte-Carlo trials for each time series.
The result is that for each time series, I now have a set of (say) 300
(period,decay_rate) points in $R^2$. A scatterplot of these shows a cloud
of points (in some cases with "interesting" structure, but I don't think
that's particularly relevant here); I believe that this cloud is a reasonable
estimate of the actual confidence region of the fitted parameters.
The problem I now face is how to summarize such a scatterplot with a few
numbers. (I actually have several hundred time series, each with its own
scatterplot.) In particular, for each time series / scatterplot I'd like
to find the minimum-area ellipsoid enclosing 95% of the points.
Mount et al have published an elegant and efficient algorithm for solving
such problems.
[
D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman,
and A. Y. Wu,
"Quantile Approximation for Robust Statistical Estimation
and k-Enclosing Problems",
Int'l. J. of Computational Geometry and Applications,
volume 10 (2000), pages 593-608.
http://www.cs.umd.edu/users/mount/Papers/QuantApx.pdf
]
Alas, I have not been able to find any implementation of their algorithm
(and Mount himself does not know of one), and I don't want to reimplement
it myself (it's quite complicated, and I have many other things to finish
for this project!).
I strongly suspect that other people using Monte-Carlo methods have
encountered this or a similar problem before, which suggests that there
may already be software available for solving it. So... Can anyone in
this newsgroup point me to software for finding the minimum-area ellipsoid
which contains at least 95% of a specified set of points in $R^2$?
Given that I have only a few hundred such sets-of-points, I could tolerate
the minimum-area-ellipsoid computation being fairly inefficient -- say
up to an hour of CPU time and a few gigabytes of memory.
Thanks,
software to solve it.
Summary:
I have a number of points (typically 300 to 1000) in $R^2$. I'm looking
for code (in any reasonable programming language) to find the smallest-area
ellipsoid which encloses 95% of the points. The code doesn't need to be
particularly efficient.
More details:
I have a bunch of time series each consisting of relatively-slowly varying
"background" plus some signals; in certain regions of the data the signals
are well-approximated by damped sinusoids. For each time series, I've
done a nonlinear least-squares fit (using the MINPACK routine LMDIF1) of
a model (signal = spline "background" + damped simusoid) to the data. I'd
like to estimate a confidence region for the fitted damped-sinusoid
frequency and decay rate so that I can assess whether these are
consistent with a theoretical prediction.
The residuals from my fits are highly non-random -- they are dominated
by low-level quasi-periodic oscillations with frequencies similar to
those of the fitted damped-sinusoids. (By "low-level" I mean that the
RMS residual is typically on the order of $10^{-3}$ times the amplitude
of the fitted damped-sinusoid.)
Because of the systematics in the residuals, I don't think a standard
$\chi^2$-square error estimates would be valid. Instead, I have estimated
a confidence region via a Monte-Carlo procedure in which I do trial fits
on randomly-chosen subintervals of the data, and use the scatter in the
trial fits' damped-sinusoid parameters as an estimate of the uncertainty
in those parameters for the original fits.
[The "randomly-chosen" subintervals are subject to
a minimum length of several oscillation periods so
that (a) the oscillation period & decay rate are
well-defined, and (b) the oscillation is not degenerate
with the "background" spline.]
I typically do a few hundred Monte-Carlo trials for each time series.
The result is that for each time series, I now have a set of (say) 300
(period,decay_rate) points in $R^2$. A scatterplot of these shows a cloud
of points (in some cases with "interesting" structure, but I don't think
that's particularly relevant here); I believe that this cloud is a reasonable
estimate of the actual confidence region of the fitted parameters.
The problem I now face is how to summarize such a scatterplot with a few
numbers. (I actually have several hundred time series, each with its own
scatterplot.) In particular, for each time series / scatterplot I'd like
to find the minimum-area ellipsoid enclosing 95% of the points.
Mount et al have published an elegant and efficient algorithm for solving
such problems.
[
D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman,
and A. Y. Wu,
"Quantile Approximation for Robust Statistical Estimation
and k-Enclosing Problems",
Int'l. J. of Computational Geometry and Applications,
volume 10 (2000), pages 593-608.
http://www.cs.umd.edu/users/mount/Papers/QuantApx.pdf
]
Alas, I have not been able to find any implementation of their algorithm
(and Mount himself does not know of one), and I don't want to reimplement
it myself (it's quite complicated, and I have many other things to finish
for this project!).
I strongly suspect that other people using Monte-Carlo methods have
encountered this or a similar problem before, which suggests that there
may already be software available for solving it. So... Can anyone in
this newsgroup point me to software for finding the minimum-area ellipsoid
which contains at least 95% of a specified set of points in $R^2$?
Given that I have only a few hundred such sets-of-points, I could tolerate
the minimum-area-ellipsoid computation being fairly inefficient -- say
up to an hour of CPU time and a few gigabytes of memory.
Thanks,
--
-- "Jonathan Thornburg [remove -animal to reply]" <***@astro.indiana-zebra.edu>
Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA
"He wakes me up every morning meowing to death because he wants to
go out, and then when I open the door he stays put, undecided, and
then glares at me when I put him out"
-- Nathalie Loiseau (French minister for European Affairs,
explaining why she named her cat "Brexit")
-- "Jonathan Thornburg [remove -animal to reply]" <***@astro.indiana-zebra.edu>
Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA
"He wakes me up every morning meowing to death because he wants to
go out, and then when I open the door he stays put, undecided, and
then glares at me when I put him out"
-- Nathalie Loiseau (French minister for European Affairs,
explaining why she named her cat "Brexit")