Images from the presentation "Data Mining: Exploring Data" for a computer science lesson on "Working with Databases"

Author: Computations.

## Data Mining: Exploring Data

Contents of the presentation "Data Mining: Exploring Data.ppt"
1. **Data Mining: Exploring Data.** Lecture Notes for Chapter 3 of *Introduction to Data Mining* by Tan, Steinbach, Kumar.
2. **What is data exploration?** A preliminary exploration of the data to better understand its characteristics.
    - Key motivations of data exploration include helping to select the right tool for preprocessing or analysis, and making use of humans' abilities to recognize patterns: people can recognize patterns not captured by data analysis tools.
    - It is related to the area of Exploratory Data Analysis (EDA), created by the statistician John Tukey. The seminal book is *Exploratory Data Analysis* by Tukey. A nice online introduction can be found in Chapter 1 of the NIST Engineering Statistics Handbook: http://www.itl.nist.gov/div898/handbook/index.htm
3. **Techniques Used in Data Exploration.** In EDA, as originally defined by Tukey, the focus was on visualization, and clustering and anomaly detection were viewed as exploratory techniques. In data mining, clustering and anomaly detection are major areas of interest, not thought of as just exploratory. In our discussion of data exploration, we focus on:
    - summary statistics;
    - visualization;
    - Online Analytical Processing (OLAP).
4. **Iris Sample Data Set.** Many of the exploratory data techniques are illustrated with the Iris Plant data set, which comes from the statistician Ronald Fisher and can be obtained from the UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
    - Three flower types (classes): Setosa, Virginica, Versicolour.
    - Four (non-class) attributes: sepal width and length, petal width and length.
    - Image credit (Virginica): Robert H. Mohlenbrock. USDA NRCS. 1995. *Northeast wetland flora: Field office guide to plant species*. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.
5. **Summary Statistics.** Summary statistics are numbers that summarize properties of the data. Summarized properties include frequency, location, and spread; for example, location is summarized by the mean and spread by the standard deviation. Most summary statistics can be calculated in a single pass through the data.
6. **Frequency and Mode.** The frequency of an attribute value is the percentage of time the value occurs in the data set. For example, given the attribute "gender" and a representative population of people, the gender "female" occurs about 50% of the time. The mode of an attribute is the most frequent attribute value. The notions of frequency and mode are typically used with categorical data.
7. **Percentiles.** For continuous data, the notion of a percentile is more useful. Given an ordinal or continuous attribute $x$ and a number $p$ between 0 and 100, the $p$th percentile is a value $x_p$ of $x$ such that $p$% of the observed values of $x$ are less than $x_p$. For instance, the 50th percentile is the value $x_{50\%}$ such that 50% of all values of $x$ are less than $x_{50\%}$.
8. **Measures of Location: Mean and Median.** The mean is the most common measure of the location of a set of points. However, the mean is very sensitive to outliers, so the median or a trimmed mean is also commonly used.
9. **Measures of Spread: Range and Variance.** The range is the difference between the maximum and the minimum. The variance or standard deviation is the most common measure of the spread of a set of points. However, it is also sensitive to outliers, so other measures are often used.
10. **Visualization.** Visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported. Visualization of data is one of the most powerful and appealing techniques for data exploration. Humans have a well-developed ability to analyze large amounts of information that is presented visually: they can detect general patterns and trends, as well as outliers and unusual patterns.
11. **Example: Sea Surface Temperature.** The figure shows the Sea Surface Temperature (SST) for July 1982. Tens of thousands of data points are summarized in a single figure.
12. **Representation.** Representation is the mapping of information to a visual format. Data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes, and colors. For example, objects are often represented as points, and their attribute values can be represented as the position of the points or as characteristics of the points, e.g., color, size, and shape. If position is used, then the relationships of points, i.e., whether they form groups or whether a point is an outlier, are easily perceived.
13. **Arrangement.** Arrangement is the placement of visual elements within a display. It can make a large difference in how easy it is to understand the data. Example: (figure).
14. **Selection.** Selection is the elimination or the de-emphasis of certain objects and attributes.
    - Selection may involve choosing a subset of attributes. Dimensionality reduction is often used to reduce the number of dimensions to two or three; alternatively, pairs of attributes can be considered.
    - Selection may also involve choosing a subset of objects. A region of the screen can only show so many points. We can sample, but we want to preserve points in sparse areas.
15. **Visualization Techniques: Histograms.** A histogram usually shows the distribution of values of a single variable. Divide the values into bins and show a bar plot of the number of objects in each bin; the height of each bar indicates the number of objects. The shape of the histogram depends on the number of bins. Example: petal width (10 and 20 bins, respectively).
16. **Two-Dimensional Histograms.** These show the joint distribution of the values of two attributes. Example: petal width and petal length. What does this tell us?
17. **Visualization Techniques: Box Plots.** Box plots, invented by J. Tukey, are another way of displaying the distribution of data. The figure shows the basic parts of a box plot.
18. **Example of Box Plots.** Box plots can be used to compare attributes.
19. **Visualization Techniques: Scatter Plots.** In scatter plots, attribute values determine the position. Two-dimensional scatter plots are most common, but three-dimensional scatter plots are also possible. Often additional attributes can be displayed by using the size, shape, and color of the markers that represent the objects. Arrays of scatter plots are useful because they can compactly summarize the relationships of several pairs of attributes (see the example on the next slide).
20. **Scatter Plot Array of Iris Attributes.**
21. **Visualization Techniques: Contour Plots.** Contour plots are useful when a continuous attribute is measured on a spatial grid. They partition the plane into regions of similar values; the contour lines that form the boundaries of these regions connect points with equal values. The most common example is contour maps of elevation, but they can also display temperature, rainfall, air pressure, etc. An example for Sea Surface Temperature (SST) is provided on the next slide.
22. **Contour Plot Example: SST Dec, 1998.**
23. **Visualization Techniques: Matrix Plots.** Matrix plots can plot the data matrix. This can be useful when objects are sorted according to class. Typically, the attributes are normalized to prevent one attribute from dominating the plot. Plots of similarity or distance matrices can also be useful for visualizing the relationships between objects. Examples of matrix plots are presented on the next two slides.
24. **Visualization of the Iris Data Matrix.**
25. **Visualization of the Iris Correlation Matrix.**
26. **Visualization Techniques: Parallel Coordinates.** Parallel coordinates are used to plot the attribute values of high-dimensional data. Instead of using perpendicular axes, they use a set of parallel axes. The attribute values of each object are plotted as a point on each corresponding coordinate axis, and the points are connected by a line, so each object is represented as a line. Often, the lines representing a distinct class of objects group together, at least for some attributes. The ordering of attributes is important in seeing such groupings.
27. **Parallel Coordinates Plots for Iris Data.**
28. **Other Visualization Techniques.**
    - Star plots: a similar approach to parallel coordinates, but the axes radiate from a central point. The line connecting the values of an object is a polygon.
    - Chernoff faces: an approach created by Herman Chernoff that associates each attribute with a characteristic of a face. The values of each attribute determine the appearance of the corresponding facial characteristic, so each object becomes a separate face. It relies on humans' ability to distinguish faces.
29. **Star Plots for Iris Data.** Panels: Setosa, Versicolour, Virginica.
30. **Chernoff Faces for Iris Data.** Panels: Setosa, Versicolour, Virginica.
31. **OLAP.** On-Line Analytical Processing (OLAP) was proposed by E. F. Codd, the father of the relational database. Relational databases put data into tables, while OLAP uses a multidimensional array representation. Such representations of data previously existed in statistics and other fields. There are a number of data analysis and data exploration operations that are easier with such a data representation.
32. **Creating a Multidimensional Array.** There are two key steps in converting tabular data into a multidimensional array.
    - First, identify which attributes are to be the dimensions and which attribute is to be the target attribute whose values appear as entries in the multidimensional array. The attributes used as dimensions must have discrete values. The target value is typically a count or a continuous value, e.g., the cost of an item. There can also be no target variable at all except the count of objects that have the same set of attribute values.
    - Second, find the value of each entry in the multidimensional array by summing the values (of the target attribute) or the count of all objects that have the attribute values corresponding to that entry.
33. **Example: Iris data.** We show how the attributes petal length, petal width, and species type can be converted to a multidimensional array. First, we discretized the petal width and length to have categorical values: low, medium, and high. We get the table shown on the slide; note the count attribute.
34. **Example: Iris data (continued).** Each unique tuple of petal width, petal length, and species type identifies one element of the array. This element is assigned the corresponding count value. The figure illustrates the result. All non-specified tuples are 0.
35. **Example: Iris data (continued).** Slices of the multidimensional array are shown by the cross-tabulations on the slide. What do these tables tell us?
36. **OLAP Operations: Data Cube.** The key operation of OLAP is the formation of a data cube. A data cube is a multidimensional representation of data, together with all possible aggregates. By all possible aggregates, we mean the aggregates that result from selecting a proper subset of the dimensions and summing over all remaining dimensions. For example, if we choose the species type dimension of the Iris data and sum over all other dimensions, the result will be a one-dimensional array with three entries, each of which gives the number of flowers of each type.
37. **Data Cube Example.** Consider a data set that records the sales of products at a number of company stores at various dates. This data can be represented as a three-dimensional array. There are three two-dimensional aggregates ($\binom{3}{2} = 3$), three one-dimensional aggregates, and one zero-dimensional aggregate (the overall total).
38. **Data Cube Example (continued).** The table on the slide shows one of the two-dimensional aggregates, along with two of the one-dimensional aggregates and the overall total.
39. **OLAP Operations: Slicing and Dicing.** Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions. Dicing involves selecting a subset of cells by specifying a range of attribute values; this is equivalent to defining a subarray from the complete array. In practice, both operations can also be accompanied by aggregation over some dimensions.
40. **OLAP Operations: Roll-up and Drill-down.** Attribute values often have a hierarchical structure. Each date is associated with a year, month, and week. A location is associated with a continent, country, state (province, etc.), and city. Products can be divided into various categories, such as clothing, electronics, and furniture. Note that these categories often nest and form a tree or lattice: a year contains months, which contain days; a country contains states, which contain cities.
41. **OLAP Operations: Roll-up and Drill-down.** This hierarchical structure gives rise to the roll-up and drill-down operations. For sales data, we can aggregate (roll up) the sales across all the dates in a month. Conversely, given a view of the data where the time dimension is broken into months, we could split the monthly sales totals (drill down) into daily sales totals. Likewise, we can drill down or roll up on the location or product ID attributes.
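
The frequency, mode, and percentile definitions above can be made concrete with a short Python sketch. The helper names (`frequencies`, `mode`, `percentile`) and all data are hypothetical, written for illustration only. `percentile` is one simple rank-based reading of the slide's definition; library functions such as `statistics.quantiles` interpolate and can return different values for the same data.

```python
import math
from collections import Counter


def frequencies(values):
    """Fraction of the data set in which each distinct value occurs."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}


def mode(values):
    """Most frequent value; ties go to the value encountered first."""
    return Counter(values).most_common(1)[0][0]


def percentile(values, p):
    """Rank-based reading of the p-th percentile definition (0 <= p < 100).

    With the data sorted, return the value at position ceil(p * n / 100), so that
    (when there are no ties) at least p% of the observations are less than it.
    """
    data = sorted(values)
    k = math.ceil(p * len(data) / 100)
    return data[min(k, len(data) - 1)]  # clamp so p close to 100 stays in range


# Made-up categorical and continuous data.
genders = ["female", "male", "male", "female", "female", "male"]
species = ["setosa", "virginica", "setosa", "versicolour", "setosa"]
measurements = list(range(1, 11))  # 1, 2, ..., 10

print(frequencies(genders))          # {'female': 0.5, 'male': 0.5}
print(mode(species))                 # setosa
print(percentile(measurements, 50))  # 6: five of the ten values (1..5) are below it
```

Note that the usual median of 1..10 is 5.5, while this rank-based 50th percentile returns an observed value, 6; both satisfy the "50% of the values are less than it" idea in different ways.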
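
The claim that the mean and the variance are sensitive to outliers, while the median and a trimmed mean are not, can be checked with a small Python sketch using the standard `statistics` module. The `trimmed_mean` helper and the data are hypothetical; the variance here is the sample variance (denominator $n - 1$), which is what `statistics.variance` computes.

```python
import statistics


def trimmed_mean(values, proportion):
    """Mean of the values left after cutting `proportion` of them from each end.

    Hypothetical helper written for this example.
    """
    data = sorted(values)
    k = int(len(data) * proportion)  # how many values to drop at each end
    return statistics.mean(data[k:len(data) - k])


def summarize(values):
    """Common measures of location and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "trimmed_mean_10": trimmed_mean(values, 0.10),  # drop 10% at each end
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "std": statistics.stdev(values),  # square root of the sample variance
    }


# Made-up numbers: the same data with and without one extreme value.
clean = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0]
with_outlier = clean[:-1] + [50.0]

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, summarize(data))
```

Replacing the last value with 50 moves the mean from 3.2 to 7.7 and inflates the variance by two orders of magnitude, while the median (3.0) and the 10% trimmed mean (3.25) do not change.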
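
The histogram procedure described above (divide the values into bins, count the objects in each bin) and its two-dimensional variant can be sketched in plain Python. This assumes equal-width bins between the minimum and maximum; the function names and the data are hypothetical, and real plotting libraries offer other binning rules.

```python
def bin_index(v, lo, width, num_bins):
    """Index of the equal-width bin holding v; the maximum goes into the last bin."""
    if width == 0:
        return 0
    return min(int((v - lo) / width), num_bins - 1)


def histogram(values, num_bins):
    """Count of values in each of `num_bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        counts[bin_index(v, lo, width, num_bins)] += 1
    return counts


def histogram_2d(xs, ys, num_bins):
    """Joint distribution of two attributes: counts[i][j] for x-bin i and y-bin j."""
    x_lo, y_lo = min(xs), min(ys)
    x_width = (max(xs) - x_lo) / num_bins
    y_width = (max(ys) - y_lo) / num_bins
    counts = [[0] * num_bins for _ in range(num_bins)]
    for x, y in zip(xs, ys):
        i = bin_index(x, x_lo, x_width, num_bins)
        j = bin_index(y, y_lo, y_width, num_bins)
        counts[i][j] += 1
    return counts


values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up data
for bins in (2, 5):
    print(bins, "bins:", histogram(values, bins))
# 2 bins: [5, 6]
# 5 bins: [2, 2, 2, 2, 3]
```

Even on this tiny data set, the picture changes with the number of bins, which is the point the histogram slide makes with 10 versus 20 bins for petal width.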
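
The two steps for creating a multidimensional array, and the data cube built from it, can be sketched in Python with a sparse dictionary keyed by tuples of dimension values (missing tuples count as 0). This is an illustration, not the book's code: `build_array`, `aggregate`, `data_cube`, and the sales records are hypothetical. Calling `build_array` without a target gives count entries, as in the Iris example; with `target="amount"` it sums sales, as in the sales example.

```python
from collections import defaultdict
from itertools import combinations


def build_array(rows, dims, target=None):
    """Sparse multidimensional array keyed by tuples of dimension values.

    Each entry sums the `target` attribute over matching rows, or counts the
    rows when no target is given. Combinations that never occur are implicitly 0.
    """
    array = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        array[key] += 1 if target is None else row[target]
    return dict(array)


def aggregate(array, dims, keep):
    """Sum the array over every dimension that is not listed in `keep`."""
    positions = [dims.index(d) for d in keep]
    result = defaultdict(int)
    for key, value in array.items():
        result[tuple(key[i] for i in positions)] += value
    return dict(result)


def data_cube(array, dims):
    """The array together with all its aggregates: one per subset of `dims`."""
    return {
        keep: aggregate(array, dims, keep)
        for k in range(len(dims) + 1)
        for keep in combinations(dims, k)
    }


# Made-up sales records: product, store, date, and the amount sold.
rows = [
    {"product": "A", "store": "S1", "date": "d1", "amount": 10},
    {"product": "A", "store": "S2", "date": "d1", "amount": 5},
    {"product": "B", "store": "S1", "date": "d2", "amount": 7},
    {"product": "A", "store": "S1", "date": "d2", "amount": 3},
]
dims = ("product", "store", "date")
sales = build_array(rows, dims, target="amount")
cube = data_cube(sales, dims)

print(cube[("product",)])  # {('A',): 18, ('B',): 7}
print(cube[()])            # {(): 25}, the overall total
```

For three dimensions the cube holds $2^3 = 8$ arrays: the full array, three two-dimensional aggregates, three one-dimensional aggregates, and the zero-dimensional overall total, matching the count in the data cube example.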
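
The slicing, dicing, and roll-up operations can be sketched in Python on the same kind of sparse array (a dictionary from dimension tuples to values). The function names and sales figures are hypothetical, and the dicing example assumes ISO `YYYY-MM-DD` date strings, which compare correctly as plain strings.

```python
from collections import defaultdict


def slice_array(array, dims, fixed):
    """Slicing: keep cells whose value equals fixed[dim] for each listed dimension."""
    pos = {d: i for i, d in enumerate(dims)}
    return {key: value for key, value in array.items()
            if all(key[pos[d]] == v for d, v in fixed.items())}


def dice_array(array, dims, ranges):
    """Dicing: keep cells whose values lie in an inclusive (low, high) range per dimension."""
    pos = {d: i for i, d in enumerate(dims)}
    return {key: value for key, value in array.items()
            if all(low <= key[pos[d]] <= high for d, (low, high) in ranges.items())}


def roll_up(array, dims, dim, parent):
    """Roll-up: map each value of `dim` to its parent level (e.g. day to month) and sum."""
    i = dims.index(dim)
    result = defaultdict(int)
    for key, value in array.items():
        result[key[:i] + (parent(key[i]),) + key[i + 1:]] += value
    return dict(result)


# Made-up daily sales keyed by (product, store, ISO date).
dims = ("product", "store", "date")
daily = {
    ("A", "S1", "2024-01-15"): 10,
    ("A", "S1", "2024-01-28"): 4,
    ("A", "S2", "2024-01-20"): 5,
    ("B", "S1", "2024-02-03"): 7,
    ("A", "S1", "2024-02-10"): 3,
}

store_s1 = slice_array(daily, dims, {"store": "S1"})
february = dice_array(daily, dims, {"date": ("2024-02-01", "2024-02-29")})
monthly = roll_up(daily, dims, "date", lambda day: day[:7])  # "2024-01-15" -> "2024-01"
print(monthly)
```

Drill-down is the inverse of `roll_up`, but it cannot be computed from `monthly` alone: splitting monthly totals into daily totals requires going back to the finer-grained `daily` array.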

