Big Data Development: From Search Engines To Artificial Intelligence
Nov 15, 2018

Our use of big data technology has gone through its own course of development. It began with Google applying big data technology to its search engine and has led to the artificial intelligence applications that are everywhere today. As the technology matured, big data applications went from a niche pursuit to something found in every industry.

When Google first published its epoch-making papers on big data, it may not have realized that it was opening a new era. Today's achievements in big data and artificial intelligence are inseparable from the efforts of millions of big data practitioners around the world, including you and me. History may be opened by geniuses, but it is ultimately made by ordinary people. As participants in the era of big data, we are making history.

The search engine era of big data applications

As the world's largest search engine company, Google is also widely regarded as the originator of big data. It stores almost every accessible web page in the world, possibly more than one trillion of them, and storing them all takes tens of thousands of disks. To store these files, Google developed GFS (Google File System), which manages the tens of thousands of disks spread across thousands of servers and presents them as a single file system.

You may feel that simply storing all those pages is nothing remarkable. That is true, but Google collects these pages in order to build a search engine, which requires word frequency statistics for the words in every file and then a ranking of the pages computed with the PageRank algorithm. In other words, Google needs to run computations over the files on those tens of thousands of disks, and that is a much harder problem. To meet this need, Google developed the MapReduce big data computing framework.
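
The word frequency step can be sketched in the MapReduce style on a single machine. This is only an illustrative sketch, not Google's implementation: the function names (map_phase, shuffle, reduce_phase, word_count) and the two sample pages are assumptions made for the example, and a real framework would run the map and reduce phases in parallel across many servers.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Sum all the partial counts collected for one word."""
    return word, sum(counts)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

# Hypothetical sample data standing in for crawled web pages.
pages = {"page1": "big data search", "page2": "big data big index"}
counts = word_count(pages)
print(counts)
```

Splitting the work into a map step that looks at one file at a time and a reduce step that looks at one word at a time is what lets the same computation be spread over many disks.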

In fact, before Google, the world's most famous search engine was Yahoo. With its own big data technology and the PageRank algorithm, however, Google made a qualitative leap in the search experience, and people abandoned Yahoo for Google. So when Google published its GFS and MapReduce papers, Yahoo was probably among the first companies to pay close attention.

Doug Cutting first built Hadoop based on Google's papers, and Yahoo then hired him to develop Hadoop full-time. The honeymoon between Yahoo and Doug Cutting did not last long, however. Uncomfortable with Yahoo's internal politics, Cutting moved to Cloudera, a company specializing in commercializing Hadoop, while Yahoo invested in Cloudera's competitor Hortonworks.

Top companies, like top players, have an elegance to the way they work. Look at Google's path: search, Gmail, Maps, Android, and driverless cars, each step pushing the boundaries of human technology a little further. Lesser companies may once have held a prominent position, but once they lose that sense of beauty and rhythm in how they do things, in this era of rapid change they fall faster than a meteor.

The data warehouse era of big data applications

When Google's papers were first published, they attracted only search engine companies like Yahoo and open source search engine developers like Doug Cutting. Other companies were merely onlookers. But when Facebook released Hive, technology companies with a keen nose could no longer sit still. They began to realize that the era of big data had truly begun.

In the past, data analysis and statistics were confined to the database: we analyzed the tables in a database using that database's own computing environment. Because of limits on data volume and computing power, we could only count and analyze the most important data, which in practice meant the data reported to the boss and the financial data.

Hive makes it possible to run SQL operations on Hadoop for data statistics and analysis. In other words, we can get far more data storage and computing power than before, and at a lower price. We can bring running logs, data collected by applications, and database data together for calculation and analysis, and obtain results that were not available before. As a result, the enterprise data warehouse grows dramatically.

Not only the boss but every ordinary employee in the company, such as product managers, operations staff, and engineers, can raise analysis requests and get the results they want from the big data warehouse, as long as they have data access rights.

You see, in the data warehouse era, wherever there is data there is almost always a need for statistical analysis. When the data is large, people turn to Hadoop, which is one reason Hadoop developed so quickly during this period. The development of the technology promoted its application, and that in turn paved the way for big data applications to enter the next era, the era of data mining.

The era of data mining for big data applications

As big data reaches more companies, we come to expect more from it. Beyond statistics, we also hope to uncover more of the value hidden in the data, and so big data enters the era of data mining.

Take a well-known case: merchants discovered from their data that people who buy diapers often buy beer as well, so smart merchants placed the two products together to boost sales. You can come up with all kinds of explanations for the relationship between beer and diapers, but without data mining you could rack your brains and still never think of connecting them. In a business setting, explaining the relationship is not what matters. What matters is that once an association exists, association analysis can be carried out on it. The ultimate goal is to show users the goods they want to buy as often as possible.
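
One common way to quantify such an association is with the support, confidence, and lift measures used in association rule mining. The sketch below is a minimal illustration under stated assumptions: the function name association_metrics and the five sample baskets are invented for the example and are not real sales data.

```python
def association_metrics(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a implies b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    # Support: share of all baskets that contain both items.
    support = count_ab / n
    # Confidence: share of baskets with a that also contain b.
    confidence = count_ab / count_a if count_a else 0.0
    # Lift: confidence relative to how common b is overall.
    lift = confidence / (count_b / n) if count_a and count_b else 0.0
    return support, confidence, lift

# Hypothetical shopping baskets, made up for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]
support, confidence, lift = association_metrics(baskets, "diapers", "beer")
print(support, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent, which is the kind of signal a merchant could act on without needing to explain it.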

Besides relationships between goods, you can also use relationships between people to recommend goods. If two people have bought many similar or even identical goods, then however far apart they live, they very likely have something in common, such as a similar educational background, income level, or hobbies. Based on this relationship, each can be recommended what the other has bought, so that both see products they are interested in.
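
A simple version of this idea can be sketched with Jaccard similarity between purchase sets. This is an illustrative sketch rather than any particular retailer's method: the function names, the 0.3 similarity threshold, and the sample customers are all assumptions made for the example.

```python
def jaccard(a, b):
    """Overlap between two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(purchases, user, min_similarity=0.3):
    """Suggest items bought by similar users that this user has not bought."""
    own = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = jaccard(own, items)
        if sim < min_similarity:
            continue  # ignore users whose purchases barely overlap
        for item in items - own:
            # Weight each candidate item by how similar its buyer is.
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical purchase histories, made up for illustration.
purchases = {
    "alice": {"tent", "stove", "boots"},
    "bob": {"tent", "stove", "boots", "lantern"},
    "carol": {"novel", "lamp"},
}
suggestions = recommend(purchases, "alice")
print(suggestions)
```

Here alice and bob share three of four distinct items, so bob's extra purchase becomes a suggestion for alice, while carol's unrelated purchases are ignored.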

Going further, big data can dig out the characteristics of each individual and attach labels to them: born in the 1990s, lives in a first-tier city, monthly income of 10,000 to 20,000, owns a home, and so on. Together these labels make up a user portrait. With enough labels, a person can be depicted in full, perhaps more completely and accurately than the people closest to them could manage.

Beyond recommending goods, data mining can also be used to mine relationships between people. Have you heard of the "six degrees of separation" theory? It holds that any two strangers in the world can be connected through only a few intermediaries. Tested in the United States, the theory suggested that two Americans who do not know each other can reach one another in about six steps. Building on this, Facebook studied the data of more than one billion users to find the average number of steps between two strangers, and the answer was a surprising 3.57. As you can see, all kinds of social software record our friend relationships, and by mining this relationship graph, almost the entire human network of the world can be mapped.
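
The number of steps between two people in a friendship graph can be found with a breadth-first search. The sketch below is a minimal illustration on a tiny made-up graph; the function name degrees_of_separation and the sample names are assumptions for the example, and a study at Facebook's scale would need distributed or approximate methods rather than this direct approach.

```python
from collections import deque

def degrees_of_separation(friends, start, goal):
    """Return the fewest friendship steps from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None  # the two people are not connected at all

# Hypothetical friendship graph, made up for illustration.
friends = {
    "ann": ["ben"],
    "ben": ["ann", "cai"],
    "cai": ["ben", "dee"],
    "dee": ["cai"],
    "eve": [],
}
print(degrees_of_separation(friends, "ann", "dee"))
```

Averaging this distance over many pairs of users gives a figure comparable in spirit to the one quoted above.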

Modern life is almost inseparable from the Internet, and applications of every kind collect data all the time. That data is constantly analyzed and mined by big data clusters in the background. Whether this analysis and mining brings us benefit or fear depends on the efforts of big data practitioners. What is certain is that, whatever the final result, this process will only accelerate and will not stop. All you and I can do is throw ourselves into it.

The machine learning era of big data applications

We discovered long ago that data follows patterns, and that these patterns hold for all of it. What happened in the past followed them, and what happens in the future will follow them too. Once a pattern is found, it can be used to predict what is about to happen.

In the past, limited by our ability to collect, store, and compute over data, we could obtain only a small amount of data through sampling, and we could not derive complete, global, and detailed patterns. Now, with big data, we can collect all the historical data, compute its patterns, and use them to predict what is about to happen.

This is machine learning.

Store the records of the Go games humans have played throughout history, and you can work out, for each board position, which move gives a higher chance of winning. With this statistical pattern in hand, you can play against people, calculating at every step which move offers the better chance of winning. The result is a robot that plays Go. This is AlphaGo, which caused a sensation a couple of years ago by defeating top human players with an overwhelming advantage.

Let me give an example closer to everyday life. Collect data from people's chats and record the context of each conversation. If the previous sentence asks how someone's day is going, machine learning can work out statistically how people tend to respond. Later, when someone asks how your day is going, the system can reply automatically, and so we get a robot that can chat. Voice chat robots such as Siri, Tmall Genie, and Xiao Ai are already everywhere in the era of machine learning.
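
The statistical idea described here can be reduced to a toy sketch: count which reply most often follows each sentence, then answer with the most frequent one. This is a deliberate simplification and not how Siri or any real assistant works; the function names, the fallback sentence, and the three sample exchanges are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(conversations):
    """Count how often each reply follows each prompt."""
    replies = defaultdict(Counter)
    for prompt, reply in conversations:
        replies[prompt.lower().strip()][reply] += 1
    return replies

def respond(replies, prompt, default="Sorry, I don't know what to say."):
    """Answer with the reply seen most often after this prompt."""
    counts = replies.get(prompt.lower().strip())
    if not counts:
        return default  # prompt never seen during training
    return counts.most_common(1)[0][0]

# Hypothetical chat log, made up for illustration.
chat_log = [
    ("How are you today?", "Pretty good, thanks."),
    ("How are you today?", "Pretty good, thanks."),
    ("How are you today?", "Tired."),
]
model = train(chat_log)
print(respond(model, "How are you today?"))
```

Real systems generalize to sentences they have never seen, which exact-match counting like this cannot do, but the principle of learning replies from recorded conversations is the same.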

The data generated by human activity can be learned statistically through machine learning, which lets a machine imitate human behavior and display intelligence that was once unique to humans. This is artificial intelligence, or AI.

Today there are some irrational attitudes toward artificial intelligence. Some people believe it will grow ever more powerful and eventually rule humanity. In fact, a little understanding of how artificial intelligence works shows that it is just statistical patterns computed from big data. However intelligent its behavior appears, it cannot understand the meaning of what it does, and meaning is the source of human intelligence. On its current path of development, artificial intelligence cannot surpass human intelligence, let alone rule humanity.

Copyright © Hangzhou Wintek Building Products Co.Ltd All Rights Reserved.