Working with Databases
Presentation "Data Mining: Exploring Data" for a computer science lesson on "Working with Databases".

Author: Computations.

Data Mining: Exploring Data

Contents of the presentation "Data Mining: Exploring Data.ppt"
1. Data Mining: Exploring Data. Lecture Notes for Chapter 3 of Introduction to Data Mining by Tan, Steinbach, Kumar.

2. What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include helping to select the right tool for preprocessing or analysis, and making use of humans' abilities to recognize patterns: people can recognize patterns not captured by data analysis tools. Data exploration is related to the area of Exploratory Data Analysis (EDA), created by statistician John Tukey, whose seminal book is Exploratory Data Analysis. A nice online introduction can be found in Chapter 1 of the NIST Engineering Statistics Handbook: http://www.itl.nist.gov/div898/handbook/index.htm

3. Techniques Used in Data Exploration. In EDA, as originally defined by Tukey, the focus was on visualization, and clustering and anomaly detection were viewed as exploratory techniques. In data mining, clustering and anomaly detection are major areas of interest and are not thought of as just exploratory. In our discussion of data exploration, we focus on summary statistics, visualization, and Online Analytical Processing (OLAP).

4. Iris Sample Data Set. Many of the exploratory data techniques are illustrated with the Iris Plant data set, which can be obtained from the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html). It comes from the statistician Ronald Fisher. It has three flower types (classes): Setosa, Virginica, and Versicolour, and four (non-class) attributes: sepal width and length, and petal width and length. Image of Virginica: Robert H. Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.

5. Summary Statistics. Summary statistics are numbers that summarize properties of the data. Summarized properties include frequency, location, and spread; for example, location is summarized by the mean and spread by the standard deviation. Most summary statistics can be calculated in a single pass through the data.

6. Frequency and Mode. The frequency of an attribute value is the percentage of time the value occurs in the data set. For example, given the attribute 'gender' and a representative population of people, the gender 'female' occurs about 50% of the time. The mode of an attribute is the most frequent attribute value. The notions of frequency and mode are typically used with categorical data.

7. Percentiles. For continuous data, the notion of a percentile is more useful. Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is a value x_p of x such that p% of the observed values of x are less than x_p. For instance, the 50th percentile is the value x_50% such that 50% of all values of x are less than x_50%.

8. Measures of Location: Mean and Median. The mean is the most common measure of the location of a set of points. However, the mean is very sensitive to outliers, so the median or a trimmed mean is also commonly used.

9. Measures of Spread: Range and Variance. Range is the difference between the max and the min. The variance or standard deviation is the most common measure of the spread of a set of points. However, it is also sensitive to outliers, so other measures are often used.

10. Visualization. Visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported. Visualization of data is one of the most powerful and appealing techniques for data exploration. Humans have a well-developed ability to analyze large amounts of information that is presented visually: they can detect general patterns and trends, as well as outliers and unusual patterns.

11. Example: Sea Surface Temperature. The following figure shows the Sea Surface Temperature (SST) for July 1982. Tens of thousands of data points are summarized in a single figure.

12. Representation. Representation is the mapping of information to a visual format. Data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes, and colors. Example: objects are often represented as points, and their attribute values can be represented as the position of the points or as characteristics of the points, e.g., color, size, and shape. If position is used, then the relationships of points, i.e., whether they form groups or whether a point is an outlier, are easily perceived.

13. Arrangement. Arrangement is the placement of visual elements within a display. It can make a large difference in how easy it is to understand the data.

14. Selection. Selection is the elimination or de-emphasis of certain objects and attributes. Selection may involve choosing a subset of attributes: dimensionality reduction is often used to reduce the number of dimensions to two or three, or, alternatively, pairs of attributes can be considered. Selection may also involve choosing a subset of objects: a region of the screen can only show so many points, so one can sample, but should preserve points in sparse areas.

15. Visualization Techniques: Histograms. A histogram usually shows the distribution of values of a single variable. Divide the values into bins and show a bar plot of the number of objects in each bin; the height of each bar indicates the number of objects. The shape of the histogram depends on the number of bins. Example: petal width (10 and 20 bins, respectively).

16. Two-Dimensional Histograms. Two-dimensional histograms show the joint distribution of the values of two attributes. Example: petal width and petal length. What does this tell us?

17. Visualization Techniques: Box Plots. Box plots were invented by J. Tukey and are another way of displaying the distribution of data. The following figure shows the basic parts of a box plot.

18. Example of Box Plots. Box plots can be used to compare attributes.

19. Visualization Techniques: Scatter Plots. In scatter plots, attribute values determine the position. Two-dimensional scatter plots are most common, but there can also be three-dimensional scatter plots. Often, additional attributes can be displayed by using the size, shape, and color of the markers that represent the objects. Arrays of scatter plots are useful because they can compactly summarize the relationships of several pairs of attributes. See the example on the next slide.

20. Scatter Plot Array of Iris Attributes.

21. Visualization Techniques: Contour Plots. Contour plots are useful when a continuous attribute is measured on a spatial grid. They partition the plane into regions of similar values, and the contour lines that form the boundaries of these regions connect points with equal values. The most common example is contour maps of elevation, but contour plots can also display temperature, rainfall, air pressure, etc. An example for Sea Surface Temperature (SST) is provided on the next slide.

22. Contour Plot Example: SST Dec, 1998.

23. Visualization Techniques: Matrix Plots. Matrix plots can plot the data matrix. This can be useful when objects are sorted according to class. Typically, the attributes are normalized to prevent one attribute from dominating the plot. Plots of similarity or distance matrices can also be useful for visualizing the relationships between objects. Examples of matrix plots are presented on the next two slides.

24. Visualization of the Iris Data Matrix.

25. Visualization of the Iris Correlation Matrix.

26. Visualization Techniques: Parallel Coordinates. Parallel coordinates are used to plot the attribute values of high-dimensional data. Instead of using perpendicular axes, they use a set of parallel axes. The attribute values of each object are plotted as a point on each corresponding coordinate axis, and the points are connected by a line, so each object is represented as a line. Often, the lines representing a distinct class of objects group together, at least for some attributes. The ordering of attributes is important in seeing such groupings.

27. Parallel Coordinates Plots for Iris Data.

28. Other Visualization Techniques. Star plots take a similar approach to parallel coordinates, but the axes radiate from a central point, and the line connecting the values of an object is a polygon. Chernoff faces, an approach created by Herman Chernoff, associate each attribute with a characteristic of a face. The values of each attribute determine the appearance of the corresponding facial characteristic, and each object becomes a separate face. This relies on humans' ability to distinguish faces.

29. Star Plots for Iris Data. Setosa, Versicolour, Virginica.

30. Chernoff Faces for Iris Data. Setosa, Versicolour, Virginica.

31. OLAP. On-Line Analytical Processing (OLAP) was proposed by E. F. Codd, the father of the relational database. Relational databases put data into tables, while OLAP uses a multidimensional array representation. Such representations of data previously existed in statistics and other fields. A number of data analysis and data exploration operations are easier with such a data representation.

32. Creating a Multidimensional Array. There are two key steps in converting tabular data into a multidimensional array. First, identify which attributes are to be the dimensions and which attribute is to be the target attribute whose values appear as entries in the multidimensional array. The attributes used as dimensions must have discrete values. The target value is typically a count or a continuous value, e.g., the cost of an item. There can also be no target variable at all, in which case each entry is the count of objects that have the same set of attribute values. Second, find the value of each entry in the multidimensional array by summing the values (of the target attribute) or the count of all objects that have the attribute values corresponding to that entry.

33. Example: Iris data. We show how the attributes petal length, petal width, and species type can be converted to a multidimensional array. First, we discretize the petal width and length to have categorical values: low, medium, and high. We get the following table; note the count attribute.

34. Example: Iris data (continued). Each unique tuple of petal width, petal length, and species type identifies one element of the array. This element is assigned the corresponding count value. The figure illustrates the result. All non-specified tuples are 0.

35. Example: Iris data (continued). Slices of the multidimensional array are shown by the following cross-tabulations. What do these tables tell us?

36. OLAP Operations: Data Cube. The key operation of OLAP is the formation of a data cube. A data cube is a multidimensional representation of data, together with all possible aggregates. By all possible aggregates, we mean the aggregates that result from selecting a proper subset of the dimensions and summing over all remaining dimensions. For example, if we choose the species type dimension of the Iris data and sum over all other dimensions, the result will be a one-dimensional array with three entries, each of which gives the number of flowers of each type.

37. Data Cube Example. Consider a data set that records the sales of products at a number of company stores at various dates. This data can be represented as a three-dimensional array. There are 3 two-dimensional aggregates (3 choose 2), 3 one-dimensional aggregates, and 1 zero-dimensional aggregate (the overall total).

38. Data Cube Example (continued). The following table shows one of the two-dimensional aggregates, along with two of the one-dimensional aggregates and the overall total.

39. OLAP Operations: Slicing and Dicing. Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions. Dicing involves selecting a subset of cells by specifying a range of attribute values. This is equivalent to defining a subarray from the complete array. In practice, both operations can also be accompanied by aggregation over some dimensions.

40. OLAP Operations: Roll-up and Drill-down. Attribute values often have a hierarchical structure. Each date is associated with a year, month, and week. A location is associated with a continent, country, state (province, etc.), and city. Products can be divided into various categories, such as clothing, electronics, and furniture. Note that these categories often nest and form a tree or lattice: a year contains months, which contain days; a country contains states, which contain cities.

41. OLAP Operations: Roll-up and Drill-down. This hierarchical structure gives rise to the roll-up and drill-down operations. For sales data, we can aggregate (roll up) the sales across all the dates in a month. Conversely, given a view of the data where the time dimension is broken into months, we could split the monthly sales totals (drill down) into daily sales totals. Likewise, we can drill down or roll up on the location or product ID attributes.
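Slide 6 defines two terms. The frequency of a value is the percentage of the data set in which it occurs. The mode is the most frequent value.

Below is a minimal Python sketch using only the standard library. The function names and the small `gender` sample are hypothetical and are not taken from the slides.

```python
from collections import Counter

def frequencies(values):
    """Return each distinct value's frequency as a percentage of all values."""
    counts = Counter(values)
    n = len(values)
    return {value: 100.0 * count / n for value, count in counts.items()}

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    # Counter.most_common keeps first-encountered order for equal counts.
    return Counter(values).most_common(1)[0][0]

# Hypothetical categorical sample, not taken from any real data set.
gender = ["female", "male", "female", "male", "female"]
print(frequencies(gender))  # {'female': 60.0, 'male': 40.0}
print(mode(gender))         # female
```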
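Slide 7's definition can be implemented in several ways, and statistics packages disagree on small samples.

This sketch assumes the nearest-rank convention: the percentile is the smallest observed value with at least p% of the observations at or below it. The function name and sample values are illustrative only.

```python
import math

def percentile(values, p):
    """Nearest-rank pth percentile of a non-empty list, for 0 < p <= 100.

    Returns the smallest observed value v such that at least p% of the
    observations are less than or equal to v.
    """
    if not values or not 0 < p <= 100:
        raise ValueError("need a non-empty list and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in sorted order
    return ordered[rank - 1]

data = [15, 20, 35, 40, 50]  # made-up sample
print(percentile(data, 30))   # 20
print(percentile(data, 50))   # 35
print(percentile(data, 100))  # 50
```

Interpolating conventions would instead return values between observations, for example 27.5 rather than 20 or 35.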
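Slide 8 says the mean is sensitive to outliers, while the median and a trimmed mean are not. This sketch compares the three on a made-up sample with one extreme value.

`statistics.mean` and `statistics.median` are standard-library functions. `trimmed_mean` and its `proportion` parameter are hypothetical helpers written for this example.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    For example, proportion=0.1 discards the lowest 10% and highest 10%.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical sample with one extreme outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(statistics.mean(data))    # 104.5
print(statistics.median(data))  # 5.5
print(trimmed_mean(data, 0.1))  # 5.5
```

The single value of 1000 pulls the mean up to 104.5. The median and the 10% trimmed mean both stay at 5.5.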
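Slide 5 notes that most summary statistics can be computed in a single pass, and slide 9 defines range and variance. Welford's online algorithm is one way to get both in one pass; this is our choice, not a method given in the slides.

The sketch assumes sample variance (dividing by n - 1), and the function name is hypothetical. Order-based statistics such as the median are not covered by this approach.

```python
import math

def single_pass_summary(values):
    """Compute count, min, max, range, mean and sample variance in one pass.

    Uses Welford's online update, which avoids the numerical cancellation of
    the naive sum-of-squares formula.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    lo, hi = math.inf, -math.inf
    for x in values:
        n += 1
        lo, hi = min(lo, x), max(hi, x)
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two values")
    variance = m2 / (n - 1)
    return {"n": n, "min": lo, "max": hi, "range": hi - lo,
            "mean": mean, "variance": variance, "std": math.sqrt(variance)}

print(single_pass_summary([2, 4, 4, 4, 5, 5, 7, 9]))  # made-up sample
```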
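Slide 15 describes a histogram as the number of objects in each bin. This sketch assumes equal-width bins between the minimum and the maximum, which is one common rule among several, and uses made-up integer data.

Running it with 3 and then 9 bins shows the slide's point that the shape depends on the number of bins.

```python
def histogram(values, num_bins):
    """Count values in `num_bins` equal-width bins spanning [min, max].

    Returns (edges, counts). The last bin is closed on the right so that the
    maximum value is counted.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi], [len(values)]
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in values:
        # Index of the bin containing x; clamp the maximum into the last bin.
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    return edges, counts

data = [1, 2, 2, 3, 3, 3, 4, 9, 10]  # made-up values
print(histogram(data, 3))     # ([1.0, 4.0, 7.0, 10.0], [6, 1, 2])
print(histogram(data, 9)[1])  # [1, 2, 3, 1, 0, 0, 0, 0, 2]
```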
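Slide 23 mentions normalizing attributes before a matrix plot, and slide 25 visualizes the Iris correlation matrix. Here is a minimal pure-Python sketch of both steps.

It assumes z-score normalization (one common choice; the slides do not say which) and Pearson correlation. The column values are made up. A constant column has zero standard deviation and would raise `ZeroDivisionError`.

```python
import math

def zscore(column):
    """Standardize a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]

def correlation_matrix(columns):
    """Pearson correlation between every pair of equal-length columns."""
    z = [zscore(c) for c in columns]
    n = len(columns[0])
    # The average product of two z-scored columns is their Pearson correlation.
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

# Made-up columns: the second rises with the first, the third falls.
m = correlation_matrix([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
for row in m:
    print([round(r, 3) for r in row])
```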
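Slides 32 to 34 describe two steps. First, choose discrete dimension attributes, discretizing continuous ones first. Second, fill each cell with a sum or a count.

This sketch uses `collections.Counter` as a sparse array, so cells that no object maps to read as 0, matching slide 34. The low/medium/high cut points and the four flower rows are hypothetical: the slides do not give the thresholds, and the rows are illustrative rather than a faithful extract of the Iris data.

```python
from collections import Counter

def discretize(value, low_max, medium_max):
    """Map a continuous value to 'low', 'medium' or 'high' (hypothetical cut points)."""
    if value <= low_max:
        return "low"
    if value <= medium_max:
        return "medium"
    return "high"

def to_multidimensional(rows, dims, target=None):
    """Build a sparse multidimensional array from tabular rows.

    rows   : list of dicts, one per object
    dims   : attribute names used as dimensions (must be discrete)
    target : attribute summed into each cell; if None, objects are counted
    """
    cells = Counter()
    for row in rows:
        key = tuple(row[d] for d in dims)
        cells[key] += 1 if target is None else row[target]
    return cells

# Illustrative rows in the spirit of the Iris attributes (cm).
flowers = [
    {"petal_length": 1.4, "petal_width": 0.2, "species": "Setosa"},
    {"petal_length": 1.5, "petal_width": 0.2, "species": "Setosa"},
    {"petal_length": 4.5, "petal_width": 1.5, "species": "Versicolour"},
    {"petal_length": 5.8, "petal_width": 2.2, "species": "Virginica"},
]
# Step 1: discretize the continuous attributes.
table = [{"width": discretize(f["petal_width"], 0.75, 1.75),
          "length": discretize(f["petal_length"], 2.5, 5.0),
          "species": f["species"]} for f in flowers]
# Step 2: count objects per (width, length, species) cell.
cube = to_multidimensional(table, ["width", "length", "species"])
print(cube[("low", "low", "Setosa")])   # 2
print(cube[("high", "low", "Setosa")])  # 0 (unspecified tuple)
```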
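Slides 36 and 37 define a data cube as the array plus all aggregates obtained by keeping a proper subset of the dimensions and summing over the rest.

This sketch enumerates those subsets with `itertools.combinations`. For three dimensions it yields 3 + 3 + 1 = 7 aggregates, as counted on slide 37. The function name and the store, product, and date values are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def data_cube(cells, dim_names):
    """Compute every aggregate of a sparse multidimensional array.

    cells     : mapping from a full tuple of dimension values to a number
    dim_names : names of the dimensions, in tuple order
    Returns a dict keyed by the tuple of kept dimension names; each value is a
    Counter obtained by summing over all the other dimensions.
    """
    cube = {}
    k = len(dim_names)
    for size in range(k - 1, -1, -1):  # proper subsets only, largest first
        for kept in combinations(range(k), size):
            agg = Counter()
            for key, value in cells.items():
                agg[tuple(key[i] for i in kept)] += value
            cube[tuple(dim_names[i] for i in kept)] = agg
    return cube

# Hypothetical sales cells keyed by (location, product, date).
sales = {
    ("store1", "shirt", "jan"): 10,
    ("store1", "shoes", "jan"): 5,
    ("store2", "shirt", "feb"): 7,
}
cube = data_cube(sales, ("location", "product", "date"))
print(len(cube))      # 7
print(cube[()][()])   # 22 (the overall total)
```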
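Slide 39 describes two selections, both of which can be sketched as filters over a sparse array keyed by tuples:

- **Slicing:** fixing the values of one or more dimensions.
- **Dicing:** restricting dimensions to ranges of values.

The function names, the inclusive (lo, hi) range encoding, and the sales numbers are assumptions for illustration. Summing the selected cells afterwards is the "accompanied by aggregation" case the slide mentions.

```python
def slice_cells(cells, fixed):
    """Slice: keep cells whose dimensions in `fixed` equal the given values.

    fixed maps a dimension index to the single value it must have.
    """
    return {key: v for key, v in cells.items()
            if all(key[i] == value for i, value in fixed.items())}

def dice_cells(cells, ranges):
    """Dice: keep cells whose value in each constrained dimension lies in [lo, hi].

    ranges maps a dimension index to an inclusive (lo, hi) pair.
    """
    return {key: v for key, v in cells.items()
            if all(lo <= key[i] <= hi for i, (lo, hi) in ranges.items())}

# Hypothetical sales cells keyed by (store, product, month number).
sales = {
    ("store1", "shirt", 1): 10,
    ("store1", "shoes", 1): 5,
    ("store1", "shirt", 2): 8,
    ("store2", "shirt", 2): 7,
    ("store2", "shoes", 3): 4,
}
store1 = slice_cells(sales, {0: "store1"})
print(sum(store1.values()))  # 23
first_two_months = dice_cells(sales, {2: (1, 2)})
print(sum(first_two_months.values()))  # 30
```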
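Slides 40 and 41 describe rolling up along the date hierarchy and drilling back down. This sketch assumes daily totals are kept in a dict keyed by `datetime.date`.

Drilling down only works because the finer-grained daily data is still available. The function names and the amounts are hypothetical.

```python
from collections import defaultdict
from datetime import date

def roll_up_to_month(daily_sales):
    """Roll up: sum daily totals into (year, month) totals."""
    monthly = defaultdict(int)
    for day, amount in daily_sales.items():
        monthly[(day.year, day.month)] += amount
    return dict(monthly)

def drill_down(daily_sales, year, month):
    """Drill down: return the daily totals behind one monthly total."""
    return {d: a for d, a in daily_sales.items() if (d.year, d.month) == (year, month)}

# Hypothetical daily sales totals.
daily = {
    date(2024, 1, 5): 120,
    date(2024, 1, 20): 80,
    date(2024, 2, 3): 50,
}
print(roll_up_to_month(daily))  # {(2024, 1): 200, (2024, 2): 50}
print(drill_down(daily, 2024, 1))
```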