Anomaly Detection Techniques

Statistical-based: fast for hierarchical models; achieve a balanced trade-off between speed and generality in both flat and hierarchical WSN structures.

Nearest Neighbour-based: the most commonly used approaches in the data-mining and machine-learning communities; analyse a data instance with respect to its nearest neighbours. Use several well-defined distance notions to compute the distance (similarity measure) between two data instances.
A data instance is declared an outlier if it is located far from its neighbours. Euclidean distance is a popular choice for univariate and multivariate continuous attributes.
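A minimal sketch of this idea (my own illustration, not from any cited paper): score each instance by the Euclidean distance to its k-th nearest neighbour; `knn_outlier_score` and the sample data are invented for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_outlier_score(data, point, k=3):
    """Distance from `point` to its k-th nearest neighbour in `data`:
    the larger the score, the more isolated (outlying) the point."""
    dists = sorted(euclidean(point, other)
                   for other in data if other is not point)
    return dists[k - 1]

# a tight cluster around the origin plus one far-away reading
readings = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (5.0, 5.0)]
scores = {p: knn_outlier_score(readings, p, k=2) for p in readings}
```

The far reading scores roughly 7 while cluster members score around 0.1-0.2, so a simple threshold on the score separates them.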

Classification-based: systematic approach. Learn a classification model from training data, then classify unseen data into a learned class (normal or outlier).

Spectral Decomposition-based: find normal modes of behaviour using principal components.

Information Theoretic-based

Clustering-based: popular within the data-mining community; group similar data instances into clusters with similar behaviour. Euclidean distance is often used as the dissimilarity measure between two instances.

Parametric-based: assume knowledge of the underlying distribution is available and estimate the distribution parameters from the given data. Based on the type of distribution assumed, two further categories arise.

Non-parametric-based: make no such assumption; usually define a distance measure between a new test instance and the statistical model and apply some kind of threshold on that distance to decide whether it is an anomaly.

Gaussian-based

Non-gaussian-based

Kernel-based

Histogram-based

Principal Component Analysis-based: used to reduce dimensionality before outlier detection; finds a new subset of dimensions which capture the behaviour of the data.
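A hedged sketch of the PCA idea (my own, stdlib only, 2-D case): project readings onto the first principal component and score each reading by its reconstruction error, i.e. its distance to the principal axis. All names here are illustrative.

```python
import math

def principal_axis(points):
    """First principal component of 2-D data, via the analytic
    eigendecomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n            # var(x)
    c = sum((p[1] - my) ** 2 for p in points) / n            # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n   # cov(x, y)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)  # largest eigenvalue
    if abs(b) > 1e-12:
        vx, vy = lam - c, b
    else:                      # covariance already axis-aligned
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm)

def residual(point, mean, axis):
    """Reconstruction error: distance from the point to the principal axis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    proj = dx * axis[0] + dy * axis[1]
    return math.hypot(dx - proj * axis[0], dy - proj * axis[1])

# readings roughly on the line y = x, plus one off-axis outlier
readings = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (3.0, 0.0)]
mean, axis = principal_axis(readings)
```

Points on the dominant axis reconstruct almost perfectly; the off-axis reading (3, 0) has a residual roughly four times larger than any in-line point.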

Support Vector Machine-based: separate data belonging to different classes by fitting a hyperplane (line) between them to maximise the separation.

Bayesian Network-based: use a probabilistic graphical model to represent a set of variables and their probabilistic (in)dependencies. Information is aggregated from the different variables to provide an estimate of the likelihood that an event belongs to a learned class.

Naive Bayesian: capture spatio-temporal correlations among sensor nodes

Bayesian Belief: (based on a degree of probabilistic independence among variables) consider correlations among attributes of the sensor data

Dynamic Bayesian: (based on a degree of probabilistic independence among variables) consider the dynamic network topology that evolves over time, adding new state variables to represent the system state at the current time instant.

W. Wu, X. Cheng, M. Ding, K. Xing, F. Liu, and P. Deng, Localised Outlying and Boundary Data Detection in Sensor Networks: use spatial correlation of readings in neighbouring sensor nodes to distinguish outlying sensors and event boundaries. Accuracy is not usually high, as these techniques (comparing a node's reading to its neighbours' median and flagging it when the difference exceeds a set threshold) ignore the temporal correlation of sensor readings.

Y. Hida, P. Huang, and R. Nishtala, Aggregation Query under Uncertainty: make max and avg more reliable under faulty sensor readings and failed nodes. Relies on spatial correlation of sensor data. Only one-dimensional data is handled and too much memory is required.

M.C. Jun, H. Jeong, and C.C.J. Kuo, Distributed Spatio-Temporal Outlier Detection in Sensor Networks: uses the symmetric α-stable (SαS) distribution to model outliers in the form of impulsive noise. Utilises spatio-temporal correlations of sensor data to detect outliers locally. Each node in a cluster detects and corrects temporal outliers by comparing predicted data with the sensed data. Reduces communication and computational cost, as cluster heads carry most of the computation. May not be suitable for real data, as the cluster-based structure cannot cope with typical dynamic changes.
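The temporal detect-and-correct step can be sketched roughly as follows. This is my own simplification: a moving-average predictor stands in for the paper's SαS model, and all names and thresholds are invented.

```python
def clean_series(readings, window=3, threshold=2.0):
    """Flag a reading as a temporal outlier when it deviates from a
    moving-average prediction by more than `threshold`, and correct it
    by substituting the prediction (crude stand-in for the SaS model)."""
    cleaned, flagged = [], []
    for i, value in enumerate(readings):
        history = cleaned[-window:]
        prediction = sum(history) / len(history) if history else value
        if abs(value - prediction) > threshold:
            flagged.append(i)
            cleaned.append(prediction)   # correct the outlier in place
        else:
            cleaned.append(value)
    return cleaned, flagged

# temperature-like series with one impulsive spike at index 3
series = [20.0, 20.5, 20.2, 35.0, 20.3, 20.1]
cleaned, flagged = clean_series(series)
```

The spike at index 3 is flagged and replaced by the predicted value (about 20.2), so downstream aggregation sees a clean series.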

T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos, Distributed Deviation Detection in Sensor Networks: online identification of outliers in streaming sensor data; requires no a priori data; uses a kernel estimator to approximate the underlying distribution of the sensor data. Nodes locally find outliers that deviate significantly from the model of the estimated distribution. Outliers = values whose count of neighbouring values falls below a user-specified threshold. High dependency on the threshold, not suitable for multidimensional data, and no model maintenance is considered.
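A small illustration of the kernel-estimator ingredient (my own sketch, not the paper's code): estimate the density with Gaussian kernels and treat low-density values as outlier candidates. Function names and the bandwidth are assumptions.

```python
import math

def gaussian_kde(data, bandwidth):
    """One-dimensional kernel density estimator with Gaussian kernels."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                   for xi in data) / norm
    return density

# readings around 20 degrees; a value far from them gets near-zero density
temps = [20.0, 20.2, 19.8, 20.1, 19.9]
density = gaussian_kde(temps, bandwidth=0.5)
```

A node would then flag a new reading whose estimated density falls below a user-specified threshold, which is exactly where the threshold-dependency criticism bites.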

B. Sheng, Q. Li, W. Mao, and W. Jin, Outlier Detection in Sensor Networks: identify global outliers in data applications of sensor networks. The sink uses histogram information to extract the data distribution from the network and filters non-outliers using a threshold distance or a rank among all outliers. Only one-dimensional data is considered.

S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos, Online Outlier Detection in Sensor Data using Non-parametric Models: solve the problems of Palpanas (single threshold for multidimensional data) and Sheng (maintaining the built model) by proposing two global outlier detection techniques: 1.) each node locally identifies outliers then transmits them to its parent to be checked, until the sink identifies all global outliers; 2.) each node employs LOCI to locally detect global outliers by holding a copy of the global estimator model from the sink. Both achieve high accuracy in terms of data distribution and detection rate using low memory and message transmission. Still cannot detect spatial outliers.

V. Chatzigiannakis, S. Papavassiliou, M. Grammatikou, and B.
Maglariset, Hierarchical Anomaly Detection in Distributed Large-Scale Sensor Networks, Proc. ISCC, 2006

todo - file:///C:/Users/Bronagh/Downloads/sensors-13-10087%20(1).pdf - see email

J. Branch, B. Szymanski, C. Giannella, and R. Wolff, In-Network Outlier Detection in Wireless Sensor Networks, Proc. IEEE ICDCS, 2006.
Distance similarity to identify global outliers. Locally identifies outliers then broadcasts them to neighbours for verification, with neighbours repeating the procedure until the global outliers are agreed. Causes too much communication overhead and does not scale well to large networks.

K. Zhang, S. Shi, H. Gao, and J. Li, Unsupervised Outlier Detection in Sensor Networks using Aggregation Tree, Proc. ADMA, 2007: distance-based technique to identify the n global outliers in snapshot and continuous query processing applications of sensor networks. Avoids broadcasting, reducing communication overhead. Each node transmits to its parent; the sink then determines the n global outliers and floods them to all nodes.

Y. Zhuang and L. Chen, In-Network Outlier Cleaning for Data Collection in Sensor Networks, Proc. VLDB, 2006: 1.) wavelet analysis for outliers (such as noise or occasionally appearing errors); 2.) dynamic-time-warping distance specifically for errors that last a certain period of time. Con: dependency on a suitable predefined threshold that is not obvious to define.

todo - Techniques for anomaly detection within sensor networks according to Chandola, Banerjee, Kumar 2009 (Anomaly Detection: A Survey):

Parametric statistical modelling - Phuong et al. [2006], Du et al. [2006]

nearest neighbour-based techniques - Subramaniam et al. [2006], Kejia Zhang and Li [2007], Idé et al. [2007]

rule-based systems - Branch et al. [2006]

spectral - Chatzigiannakis et al. [2006]

Bayesian networks - Janakiram et al. [2006]

Note to self: a small subset of seminal exemplars. Francis et al. (Francis et al., 1999) showed equivalent classification accuracy when the dataset of a neural-network-based novelty detection system was pre-processed with feature extraction, compared with that of the novelty detector trained using the full range of attributes. The authors used linear regression to combine multiple attributes into single dependent attributes.

  • Victoria J. Hodge, A survey of outlier detection methodologies, 2004

Pros and cons: do not make any assumptions about the data distribution and can generalise many notions from statistical-based approaches. Suffer from the choice of proper input parameters and a lack of scalability; computationally expensive for multivariate data.

Pros and cons: can be used in an incremental model. Suffer from the choice of an appropriate cluster-width parameter. Computationally expensive for multivariate data.

Supervised vs unsupervised vs semi-supervised

Rajasegarar et al. [42]: one-class quarter-sphere; any data outside the quarter-sphere is considered an outlier. Each node communicates summary data to its parent for global outlier classification. Outliers are detected from measurements collected over a long time window (not real-time), and spatial correlation of neighbouring nodes is ignored.

Elnahrawy and Nath [24]: detection of faulty sensors

Janakiram et al. [23]: identifying local outliers in streaming sensor data.

Hill et al. [43]: local outliers in environmental data streams.

RODAC - Requirements of Anomaly Detection in WSNs

Reduction of data

Online Detection

Distributed Detection

Adaptive Detection

Correlation Exploitation

Note: Nearest neighbour and spectral decomposition based models considered here in taxonomy reference by "sensors 13"??

Motivation for anomaly detection

Data reliability

Event reporting

Malicious attacks

Faulty sensors\readings

Use cases for anomaly detection in smart homes: this approach may help identify what we want from a smart home (Tran et al., 2010)

Detecting extended activity: for example, a shower taking 30 minutes and not the usual 10-20. Implications: how much time is too much? Other examples: too short a shower? Too long a nap?

Recognising acceptable variation in shower start time: the shower is usually taken at 8:20, but the person waits until 8:30 as it is cold (winter). Other examples: taking medication after midnight, being late for church. (These could possibly be handled by interacting with the human instead of notifying a carer\family member - human-in-the-loop.)

Reacting to abnormal behaviours immediately: e.g. getting confused (walking around in circles), lying on the kitchen floor, leaving home, etc.

Unsupervised: does not need any labelled data. Typically use scores as outputs.

Semi-supervised uses some labelled data. Typically use scores as outputs.

Supervised: needs labelled data; classification algorithms produce labels as outputs.

One-class supervised: typically learns a boundary around the normal class; anything outside the boundary is an anomaly.

Types of anomalies

Point anomalies - detecting single anomalous instances in large datasets. The simplest and most popular approach decides an anomaly through a predefined threshold. Popular in unsupervised anomaly detection.
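A tiny illustration of threshold-based point-anomaly detection (my own sketch; note that the outlier itself inflates the standard deviation, hence the modest z-score threshold here):

```python
import statistics

def point_anomalies(values, z_threshold=2.0):
    """Flag values whose z-score exceeds a predefined threshold.
    The outlier inflates the (population) standard deviation, which is
    why a robust estimator would be preferable on real data."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

temps = [21.0, 21.5, 20.8, 21.2, 20.9, 21.1, 35.0]
anomalies = point_anomalies(temps, z_threshold=2.0)
```

Only the 35.0 reading crosses the threshold; every other value sits well under one standard deviation from the mean.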

Collective anomaly - anomalies represented as a set of many instances (each of which may not be an anomaly outside the set).

Contextual anomalies - making a decision based on the contextual aspect; uses context to detect an anomaly, e.g. what may seem normal in isolation may actually be an anomaly once context is added.

Algorithms according to Goldstein and Uchida, 2016 (comparative eval)

K-Nearest-Neighbour global anomaly detection - the k-NN global unsupervised anomaly detection algorithm (NOT TO BE CONFUSED WITH k-NN CLASSIFICATION)

Clustering based

Statistical

Subspace techniques

Neural Networks

Local Outlier Factor (LOF): the most well-known local anomaly detection algorithm; basically a ratio of local densities. Low density = larger score.

1.) knn must be found for each record x

2.) Using these knn, the local density for a record is estimated using the local reachability density (LRD). (In highly dense clusters the plain Euclidean distance is used, which is very rare.)

3.) The LOF is computed by comparing the LRD of a record with the LRDs of its k neighbours.

Note: if local outliers are not of interest, a lot of false alarms are raised. The setting of k is crucial for the algorithm.
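The three steps above can be sketched in a compact (inefficient but readable) form. This is my own illustrative implementation, not the reference one; helper names are invented.

```python
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def _knn(data, p, k):
    """The k nearest neighbours of p (excluding p itself)."""
    return sorted((q for q in data if q is not p),
                  key=lambda q: _dist(p, q))[:k]

def lof(data, k=2):
    """LOF score per point: the ratio of the average LRD of a point's k
    neighbours to its own LRD; values well above 1 mark local outliers."""
    # step 1: k-distance of every point
    kdist = {p: _dist(p, _knn(data, p, k)[-1]) for p in data}

    # step 2: local reachability density
    def lrd(p):
        neigh = _knn(data, p, k)
        # reachability distance: max(k-distance of neighbour, true distance)
        reach = [max(kdist[o], _dist(p, o)) for o in neigh]
        return len(neigh) / sum(reach)

    # step 3: compare each point's LRD with its neighbours' LRDs
    return {p: sum(lrd(o) for o in _knn(data, p, k)) / (k * lrd(p))
            for p in data}

# a tight unit square plus one distant point
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof(points, k=2)
```

Cluster members score almost exactly 1 (their density matches their neighbours'), while the distant point scores around 6, illustrating how the ratio form makes the threshold data-independent.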

Connectivity-based Outlier Factor (COF): similar to LOF but the density estimation is calculated differently. Uses a shortest-path approach (chaining distance), a minimising sum of the distances connecting the k neighbours and the instance.

Influenced Outlierness (INFLO): LOF can fail when scoring instances at the borders of clusters. INFLO uses the knn set together with the reverse nearest-neighbour set (the records that count the current record among their own neighbours). The INFLO score combines both neighbourhood sets.

TODO - Local Outlier Probability (LoOP), Local Correlation Integral (LOCI), Approximate Local Correlation Integral (aLOCI), Cluster-based Local Outlier Factor (CBLOF\uCBLOF), Local Density Cluster-based Outlier Factor (LDCOF), Clustering-based Multivariate Gaussian Outlier Score (CMGOS), Histogram-based Outlier Score (HBOS), One-class Support Vector Machine (OCSVM), Robust Principal Component Analysis (rPCA)

The main research areas in smart-home anomaly detection are inactive periods of time, falls, and identifying a disease.

Ways to detect anomalies\behavioural changes: Mukhopadhyay, 2016 survey

Discriminating: learning anomaly data from historical data and searching for a similar pattern in new incoming data to flag anomalies.

Profiling (more popular): modelling normal behaviour and considering any new input data that deviates from the model as an anomaly.

Contextual aspects: temporal (time and duration), spatial (location), time's order, activity's order, health status.

Group clustering: a normal data profile is made; anything outside the clusters is an anomaly

Near Cluster Centroid: similar to group clustering but with a score to determine the distance between a normal group and an anomalous group

Small\sparse clusters: uses cluster size and a threshold to determine anomalies
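A rough sketch of the small\sparse-cluster idea (my own; the greedy one-pass "leader" clustering is an invented stand-in for whatever clustering the surveyed systems actually use):

```python
import math

def leader_cluster(points, width):
    """Greedy one-pass clustering: a point joins the first cluster whose
    leader lies within `width`; otherwise it starts a new cluster."""
    clusters = []                      # list of (leader, members)
    for p in points:
        for leader, members in clusters:
            if math.dist(leader, p) <= width:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters

def sparse_cluster_anomalies(points, width, min_size):
    """Members of clusters smaller than `min_size` are flagged."""
    return [p for _, members in leader_cluster(points, width)
            if len(members) < min_size for p in members]

pts = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (0.0, 1.0), (0.5, 0.0), (10.0, 10.0)]
anomalies = sparse_cluster_anomalies(pts, width=2.0, min_size=3)
```

The lone point at (10, 10) forms a singleton cluster, which falls below the size threshold; as the notes say, choosing `width` and `min_size` is the hard part.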

?? Rajasegarar et al. [41]: global outlier detection to identify anomalous measurements in sensor nodes. Nodes send cluster information via their parents to the sink. Anomalous clusters are identified at the sink if a cluster's inter-cluster distance is larger than a threshold value. Determining the cluster width and threshold is not easy.

DBN with new nodes related to time, which enables the detection of time-dependent behavioural anomalies. Four types of anomalies; Types 1, 2, and 3 are point anomalies, while Type 4 is a contextual anomaly: 1 - spatial anomaly, 2 - timing anomaly, 3 - duration anomaly, 4 - sequence anomaly. Uses maximum-likelihood estimation and Laplace smoothing to learn from historical data, then performs online anomaly detection (based on probabilities). Chun Zhu, Wearable Sensor-Based Behavioral Anomaly Detection in Smart Assisted Living Systems, 2015

For each hour of the day, the average proportion of time spent within a specific room is estimated. Large deviations from this proportion are considered abnormal - G. Virone, N. Noury, J. Demongeot, "A system for automatic measurement of circadian activity deviations in telemedicine", IEEE Transactions, 2002, and G. Virone, M. Alwan, S. Dalal, S. Kell, B. Turner, J.A. Stankovic, R. Felder, "Behavioral Patterns of Older Adults in Assisted Living", IEEE Transactions
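The hourly room-proportion scheme can be sketched as follows (my own illustration; the event format, function names, and tolerance are invented):

```python
def room_proportions(events):
    """events: (hour, room, minutes) triples. Returns the proportion of
    the observed time in each hour that was spent in each room."""
    totals, per_room = {}, {}
    for hour, room, minutes in events:
        totals[hour] = totals.get(hour, 0) + minutes
        per_room[(hour, room)] = per_room.get((hour, room), 0) + minutes
    return {key: t / totals[key[0]] for key, t in per_room.items()}

def is_abnormal(baseline, hour, room, observed_prop, tolerance=0.3):
    """Flag a large deviation from the baseline proportion for hour/room."""
    expected = baseline.get((hour, room), 0.0)
    return abs(observed_prop - expected) > tolerance

# baseline for hour 8: two thirds kitchen, one third bathroom
baseline = room_proportions([(8, "kitchen", 40), (8, "bathroom", 20)])
```

Spending only 10% of hour 8 in the kitchen would be flagged against the roughly 67% baseline, while 60% would not.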

The average time spent in each room is estimated for each day (not each hour) and they also propose to monitor other variables such as the daily distance moved by a person - S. Ohta, H. Nakamoto, Y. Shinagawa, T. Kishimoto, “Home Telehealth: Connecting Care Within the Community”, Medical telematics, 2006

Probabilistic Model of Behaviour (PMB) based: a set of attributes or features is associated with each occurrence of a type of activity, and a probabilistic model of that activity type is learned from historical data. Deviation from the model is then considered an anomaly. This approach uses a GMM with parameters chosen using the Expectation-Maximisation algorithm. Features: start time, duration, weekday\weekend, activity level (number of sensors triggered during the observation). Rule-based algorithms are used to detect the activities, e.g. sleeping: when the sensor is on and the bed is occupied for more than 30 minutes, the person is sleeping. This approach builds a model for each type of activity - sleeping and watching TV. Modelling experiment: the GMM parameters (means, variances and weights) are learned using the 26 weeks of training data. - Modelling of behavioural patterns for abnormality detection in the context of lifestyle reassurance, Fabien Cardinaux, 2008
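A hedged sketch of the GMM-with-EM ingredient of the PMB approach (my own 1-D, two-component toy, not the paper's multivariate model): low density under the learned mixture marks an anomalous feature value.

```python
import math

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (toy version)."""
    data = sorted(data)
    half = len(data) // 2
    # crude initialisation: split the sorted data in half
    mu = [sum(data[:half]) / half, sum(data[half:]) / (len(data) - half)]
    var, w = [1.0, 1.0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w[k] * norm_pdf(x, mu[k], var[k]) for k in (0, 1)]
            s = p[0] + p[1]
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate weights, means and (floored) variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-3)
            w[k] = nk / len(data)
    return w, mu, var

def density(x, w, mu, var):
    return sum(w[k] * norm_pdf(x, mu[k], var[k]) for k in (0, 1))

# shower start times clustered around 8:00 and 20:00
times = [7.9, 8.0, 8.0, 8.1, 19.9, 20.0, 20.0, 20.1]
w, mu, var = fit_gmm_1d(times)
```

A start time of 14:00 lands in the density trough between the two learned components, so it would be treated as deviating from the model.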

Detecting abnormal events on binary sensors in smart home environments, Juan Ye: statistics-driven method that uses CBLOF to cluster groups of activities based on a knowledge base. Uses the CASAS interleaved dataset along with the PlaceLab dataset to evaluate its model, "CLEAN".