The Internet is full of flawed AI-generated content. Will AI trained on this content go off the rails?

1. Yes.

If the question has to be answered with a plain yes or no, the answer is obviously yes.

2. The logic of AI learning can be summarized in three steps:

2.1. Input data and feature extraction: The AI system first receives input data, which may be images, text, audio, or other types of data. It then extracts useful features from that data, which help it understand and process the data better.

2.2. Model training: The AI system uses the input data and the extracted features to train a model, which may be a neural network, a decision tree, a support vector machine, and so on. The model learns to map inputs to outputs based on the extracted features. The goal of training is a model that predicts the outputs accurately.

2.3. Model evaluation and optimization: The AI system evaluates the model's performance on test data and optimizes the model according to the results. If performance is not good enough, the model is adjusted to improve its accuracy and generalization ability.

In general, AI learning improves a system's performance and capability step by step through input data and feature extraction, model training, and model evaluation and optimization, so that it can make more accurate and intelligent predictions and decisions.
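The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how production systems are built: the word-overlap classifier, the function names (`extract_features`, `train`, `predict`, `evaluate`), and the tiny datasets are all assumptions made up for this example. It covers feature extraction, training, and evaluation; the optimization half of step 3 would mean changing the features or the model whenever the score is too low.

```python
from collections import Counter

def extract_features(text):
    """Step 1: turn raw text into bag-of-words counts."""
    return Counter(text.lower().split())

def train(examples):
    """Step 2: accumulate the word counts seen under each label."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(extract_features(text))
    return model

def predict(model, text):
    """Pick the label whose training words overlap most with the input."""
    features = extract_features(text)

    def overlap(label):
        # A Counter returns 0 for words it has never seen.
        return sum(min(count, model[label][word])
                   for word, count in features.items())

    return max(sorted(model), key=overlap)

def evaluate(model, examples):
    """Step 3: fraction of held-out examples predicted correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

# Hypothetical toy data, invented for this sketch.
train_set = [
    ("the cat sat on the mat", "animal"),
    ("a dog chased the cat", "animal"),
    ("the stock market fell today", "finance"),
    ("investors sold shares and bonds", "finance"),
]
test_set = [
    ("my dog likes the mat", "animal"),
    ("shares fell on the market", "finance"),
]

model = train(train_set)
accuracy = evaluate(model, test_set)
print(f"accuracy on held-out data: {accuracy:.2f}")  # accuracy on held-out data: 1.00
```

Real systems replace each function with something far more capable, but the loop of extracting features, fitting, and measuring on data the model has not seen stays the same.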

3. Garbage in, garbage out

The quality of AI-generated content depends on the quality of the data used to train the model. If the training set contains flawed content, the model will be flawed too. This is known as "garbage in, garbage out". Because an AI system tries to imitate and reproduce existing data as it learns, problems in the data may be repeated by the system and even amplified.
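A deliberately simplified sketch makes the point concrete. The "model" below just memorizes the most frequent answer for each prompt, which is an assumption chosen for illustration and far cruder than a real language model; the prompts and answers are invented. The same training code produces right or wrong answers depending only on the data it is given.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Learn, for each prompt, the answer that appears most often in the data."""
    counts = defaultdict(Counter)
    for prompt, answer in pairs:
        counts[prompt][answer] += 1
    return {prompt: c.most_common(1)[0][0] for prompt, c in counts.items()}

# Hypothetical clean corpus: every answer is correct.
clean_data = [("capital of France", "Paris")] * 3 + [("2 + 2", "4")] * 3

# Hypothetical flawed corpus: wrong answers outnumber the right ones.
flawed_data = (
    [("capital of France", "Paris")]
    + [("capital of France", "Lyon")] * 2
    + [("2 + 2", "5")] * 2
    + [("2 + 2", "4")]
)

clean_model = train(clean_data)
flawed_model = train(flawed_data)

print(clean_model["2 + 2"])   # 4
print(flawed_model["2 + 2"])  # 5
```

Nothing in `train` is broken in the second case; the model faithfully reproduces what the flawed data says.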

Building high-quality datasets is the key, with attention paid to the source, quality, scale, and diversity of the data. Making language models smaller is also an important research direction.
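As a rough sketch of what checking the quality and diversity of a dataset can look like, the snippet below drops very short texts and exact duplicates, then reports how the remaining records are spread across sources. The record layout (`text`, `source`), the `min_words` threshold, and the sample records are assumptions for illustration; real pipelines use much more elaborate filters.

```python
from collections import Counter

def clean_dataset(records, min_words=3):
    """Drop very short texts and duplicates (ignoring case and extra spaces)."""
    seen = set()
    kept = []
    for record in records:
        normalized = " ".join(record["text"].lower().split())
        if len(normalized.split()) < min_words:
            continue  # too short to be useful
        if normalized in seen:
            continue  # duplicate of an earlier record
        seen.add(normalized)
        kept.append(record)
    return kept

def source_share(records):
    """Fraction of records contributed by each source (a crude diversity check)."""
    counts = Counter(record["source"] for record in records)
    total = sum(counts.values())
    return {source: count / total for source, count in counts.items()}

# Hypothetical raw records, invented for this sketch.
raw = [
    {"text": "Transformers use attention layers.", "source": "textbook"},
    {"text": "transformers  use attention layers.", "source": "forum"},
    {"text": "ok", "source": "forum"},
    {"text": "Decision trees split data on feature thresholds.", "source": "forum"},
]

cleaned = clean_dataset(raw)
print(len(raw), "->", len(cleaned))  # 4 -> 2
print(source_share(cleaned))         # {'textbook': 0.5, 'forum': 0.5}
```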

4. But there is room for optimization and self-evolution.

4.1. The training of an AI system does not depend only on Internet data, which makes up less than 5% of human information.

4.2. Training also draws on purpose-built datasets and algorithms. If these are carefully designed and optimized, the AI system can be kept from being affected by flawed data on the Internet.

4.3. An AI system can also improve itself through self-learning and adaptation, becoming more accurate and reliable.

4.4. Although flawed content on the Internet may have some impact on the training and development of AI systems, that does not mean AI systems will go off the rails. On the contrary, as technology advances and datasets keep improving, AI systems will become more accurate and reliable.

5. Building an AIGC product or something like ChatGPT takes a lot of technology, not just input data.

Large-model expertise: you need to master the fundamentals of large models, such as the Transformer architecture, self-supervised learning, pre-training, and fine-tuning.

Natural language processing expertise: you need to know the basics of natural language processing, such as word segmentation, word vectors, semantic understanding, sentiment analysis, and entity recognition.

Dataset construction expertise: you need to build high-quality dialogue datasets to improve the quality and effectiveness of the model. Building a dataset means weighing many factors, such as data source, data quality, data scale, and data diversity.

Algorithms and computing power: you need to master algorithms such as reinforcement learning, generative models, and attention mechanisms, and have enough computing resources to train and optimize the model.

6. An imperfect analogy: manuscript washing.

Manuscript washing means copying, pasting, modifying, and deleting parts of other people’s original articles without authorization, so that the result looks different from the original while in essence copying its content and ideas. People who do this usually want to obtain content quickly and save time and effort so they can publish fast, but the practice seriously infringes the original author’s intellectual property rights and violates academic and professional ethics. Manuscript washing harms not only the interests of original authors but also the reputation and image of the whole industry, so it is regarded as unethical and illegal.

That said, manuscript washing also comes in different levels of skill and creativity. The most skilled manuscript washing often requires "reading" a great deal of material, which is itself a process of quantitative change turning into qualitative change.

7. Back to the nature of AI: is it a tool or a decision-making logic? What do you use AI for?

The process by which humans take in information can be divided into the following stages:

Perception: Perception means receiving external information through the sensory organs, including vision, hearing, touch, taste, and smell. It is based on the interaction between the sensory organs and external stimuli, which converts external information into neural signals and transmits them to the brain.

Attention: Attention refers to the process of selecting which perceived information to process. Because external information is so varied and complex, we can’t handle all of it at once, so attention lets us pick out the important information and deal with it.

Understanding: Understanding refers to the process of interpreting the information we receive. It relies on our knowledge, experience, and language ability to connect and integrate the perceived information with what we already know, forming new understanding.

Memory: Memory refers to the process of storing and retaining the received information. It can be divided into short-term memory, which holds information temporarily for processing, and long-term memory, which stores information in the brain over long periods.

Judgment: Judgment refers to the process of evaluating the information we receive. It relies on our values, beliefs, and cognitive abilities: we compare the information against our existing knowledge and value system, and so form our attitudes and views about it.

Action: Action refers to the process of making decisions and acting on the information we receive. It relies on our willpower, decision-making ability, and ability to act, and turns our understanding and judgment of information into actual decisions and actions.

What is your purpose in obtaining information with AI? Up to which stage can AI replace you?
