From if statements to machine learning

2017-09-26 BPStudy#121
NISHIO Hirokazu, Cybozu Labs / BeProud Technical Advisor, Doctor of Science / Master of Technology Management
ver.2 published 2017-09-29
ver.3 2018-06-22 diagram cleanup
ver.4 2023-06-10 clean up and add OCR results

Do you think machine learning is a world away from the programs you are writing now?

Purpose of this slide

We will bridge the gap between the if statements everyone is familiar with and machine learning, so that the two feel like connected ground and you can use machine learning in practice.
I will also explain, in four steps, how to proceed with implementing machine learning in a business setting.

if statement

if(condition){ …}

How an if statement works

The body is executed when the condition is True.

When there is more than one condition

What if I want to run when either condition x1 or condition x2 is True?

if(x1 or x2){ …}

When there is more than one condition

What if I want to run when both condition x1 and condition x2 are True?

if(x1 and x2){ …}

When there is more than one condition

What if I want to execute when two or more of the conditions x1, x2, x3 are True?

Complicated!

if((x1 and x2) or (x1 and x3) or (x2 and x3)){ …}

When there is more than one condition

What if I want to execute when 3 or more of the conditions x1, x2, …, x5 are True?

Complicated!

if((x1 and x2 and x3) or (x1 and x2 and x4) or (x1 and x2 and x5) or…){ …}

There is a better way!

Treat truth values as numbers.

Converting True to 1 and False to 0 turns “3 or more are True” into “the sum is 3 or more”.
if((x1 + x2 + x3 + x4 + x5) >= 3){ …}
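
As a quick check of this rewrite, the following Python sketch enumerates every combination of five conditions and compares the long and/or form with the sum form. The function names are made up for illustration.

```python
from itertools import combinations, product


def at_least_3_by_logic(xs):
    """The long form: OR over every way of choosing 3 conditions that are all True."""
    return any(all(trio) for trio in combinations(xs, 3))


def at_least_3_by_sum(xs):
    """The short form: True counts as 1 and False as 0, so just add them up."""
    return sum(xs) >= 3


# Compare the two forms on all 2**5 = 32 combinations of True/False.
mismatches = [
    xs for xs in product([False, True], repeat=5)
    if at_least_3_by_logic(xs) != at_least_3_by_sum(xs)
]
print(len(mismatches))
```

If the two forms agree everywhere, the list of mismatches is empty.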

Both and and or can have the same form.

x1 and x2 and x3 and x4 and x5 = all of the conditions x1, x2, …, x5 are True = the sum is 5 or more
x1 or x2 or x3 or x4 or x5 = at least one of the conditions x1, x2, …, x5 is True = the sum is 1 or more

Both and and or can have the same form.

All of the conditions x1, x2, …, x5 are True
- (x1 + x2 + x3 + x4 + x5) >= 5
At least 3 of the conditions x1, x2, …, x5 are True
- (x1 + x2 + x3 + x4 + x5) >= 3
At least one of the conditions x1, x2, …, x5 is True
- (x1 + x2 + x3 + x4 + x5) >= 1

Align the right-hand sides

All of the conditions x1, x2, …, x5 are True (each condition is weak)
- (1/5 * x1 + …) >= 1
At least 3 of the conditions x1, x2, …, x5 are True
- (1/3 * x1 + …) >= 1
At least one of the conditions x1, x2, …, x5 is True (each condition is strong)
- (1 * x1 + …) >= 1
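
The three forms above can be sketched as one function with a fixed threshold of 1 and a different weight. This is a minimal illustration; the names `fires`, `ALL`, `AT_LEAST_3` and `ANY` are assumptions, and exact fractions are used so the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction


def fires(xs, weight):
    """Weighted sum of the conditions compared against the fixed threshold 1."""
    return sum(weight * int(x) for x in xs) >= 1


ALL = Fraction(1, 5)           # weak: all five are needed to reach 1
AT_LEAST_3 = Fraction(1, 3)    # three are enough to reach 1
ANY = Fraction(1, 1)           # strong: one is enough

xs = (True, True, True, False, False)   # three of the five conditions are True
print(fires(xs, ALL), fires(xs, AT_LEAST_3), fires(xs, ANY))
```

With three of five conditions True, only the "all" form stays below the threshold.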

Strength = Weight

The “strength” of a condition is expressed by the magnitude of its coefficient (weight).

question

How can I express that all of the conditions x1, x2, …, x5 are True, or at least 3 of the conditions x6, x7, …, x10 are True, or any of the conditions x11, x12, …, x15 are True?

Weights may vary from condition to condition

1/5 * x1 + … + 1/3 * x6 + … + 1 * x11 + … >= 1
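
A minimal Python sketch of this per-condition weighting, assuming a weight of 1/5 for x1…x5 (all must hold), 1/3 for x6…x10 (three must hold) and 1 for x11…x15 (one is enough). The helper `conditions` is hypothetical and only builds test inputs.

```python
from fractions import Fraction

# Weights for x1..x15, following the three groups in the question.
WEIGHTS = [Fraction(1, 5)] * 5 + [Fraction(1, 3)] * 5 + [Fraction(1)] * 5


def weighted_sum_fires(xs):
    """xs is a sequence of 15 booleans (x1..x15)."""
    return sum(w * int(x) for w, x in zip(WEIGHTS, xs)) >= 1


def conditions(*true_indices):
    """Helper: build x1..x15 with only the given 1-based indices set to True."""
    return [i in true_indices for i in range(1, 16)]


print(weighted_sum_fires(conditions(1, 2, 3, 4, 5)))   # all of x1..x5
print(weighted_sum_fires(conditions(6, 8, 10)))        # three of x6..x10
print(weighted_sum_fires(conditions(12)))              # one of x11..x15
print(weighted_sum_fires(conditions(1, 7)))            # none of the three cases
```

The first three calls reach the threshold; the last one sums to 8/15 and does not.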

Weighted sum method

Set True to 1 and False to 0, multiply each by the strength of its condition as a weight, and add them together.
This makes it possible to describe complex conditions that would be difficult to write as combinations of and and or.
Two points are omitted here: negation can be realized with negative weights, and xor cannot be expressed.

How do you adjust the weight?

1: A person decides, as we just did.
2: A machine decides based on a large amount of data = machine learning

Concrete code

What we have covered so far can be learned with logistic regression.
If you do from sklearn.linear_model import LogisticRegression and call fit(X, y), the weights are adjusted automatically, and if you call predict(X), the weighted sum is calculated and the decision is made. Easy.
If you want to learn more about logistic regression, I recommend the series of articles by Hidehiro Nakatani of Cybozu Labs on the Gijutsu Hyoronsha website.
- Part 18: Logistic regression | gihyo.jp
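
A minimal sketch of those calls, assuming scikit-learn is installed. The training data here is invented for illustration: every combination of five conditions, labeled 1 when at least 3 of them are True.

```python
from itertools import product

from sklearn.linear_model import LogisticRegression

# Every combination of five conditions, encoded as 1 (True) / 0 (False).
X = [list(xs) for xs in product([0, 1], repeat=5)]
# Label: 1 when at least 3 of the 5 conditions are True.
y = [int(sum(xs) >= 3) for xs in X]

model = LogisticRegression()
model.fit(X, y)                 # the weights are adjusted automatically

print(model.coef_)              # one learned weight per condition
print(model.intercept_)         # plays the role of the threshold
print(model.predict([[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]))
```

The learned weights should come out roughly equal across the five conditions, which matches the hand-written rule where every condition has the same strength.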

summary

Complex conditions are difficult to write with and/or.
It is easier to replace them with weighted sums.
Weights can be determined by machine learning if there is enough data.
Everything described so far can be done in a few lines with scikit-learn.

So far: machine learning techniques
From here: 4 steps to bringing machine learning into business

Differences between Academia and Business

Academia
- Data is publicly available.
- Mature, well-established techniques were investigated long ago.
- Devise new methods and compete on accuracy.

Business
- Data is not publicly available.
- (Using mature, well-established techniques is fine.)

Business objective: customer value

(Not novelty or patentability)

Small exercises to get a feel for customer value

A certain rough stone has a 1/2 chance of containing a gem when broken, and the gem sells for 20,000 yen.
One rough stone can be bought for 9,500 yen. Rough stones are hard, and only 20 can be broken per day.
Q1: What is the expected profit you can earn per day?

A1 Expected profit

A1: The expected income per stone is 10,000 yen, because you get 20,000 yen with probability 1/2.
The purchase price per stone is 9,500 yen, so the expected profit per stone is 500 yen.
Since 20 stones can be processed per day, the expected profit per day is 10,000 yen.

Q2 A device that doubles the processing speed

Q2: Suppose you have a machine that can break rough stones twice as fast as a human.
In other words, suppose the number of rough stones that can be broken per day increases from 20 to 40.
How much does the expected profit increase?

A2 A device that doubles the processing speed

A2: The expected profit is still 500 yen per rough stone.
The number that can be processed per day increases from 20 to 40.
Since there are 20 more stones, the expected profit increases by 10,000 yen.

Q3 A 60% discriminator

Q3: Suppose you can buy a discriminator* that picks out rough stones containing a gem with 60% accuracy.
- *Assume that by using it before buying, you can get “a rough stone with a 60% chance of containing a gem” for 9,500 yen.
- Ignore for now the possibility that the store owner might object.
- Also assume that you do not have the machine from Q2, so the number of stones that can be broken per day remains 20.
How much does the expected profit increase?

A3 A 60% discriminator

A3: Since there is a 60% chance of getting 20,000 yen, the expected income per stone is 12,000 yen.
Since the purchase price per stone is 9,500 yen, the expected profit per stone is 2,500 yen.
Since 20 stones can be processed per day, the expected profit per day is 50,000 yen.
That is 40,000 yen more than before.

In other words, in this problem setting

The “device that doubles the processing speed” increases the customer’s profit by 10,000 yen.
The “discriminator with 60% accuracy” increases the customer’s profit by 40,000 yen.
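
The arithmetic behind Q1 to Q3 can be sketched in a few lines of Python. The function and constant names are made up for illustration; the prices, probabilities and daily counts are the ones given in the questions.

```python
from fractions import Fraction

GEM_PRICE = 20_000    # yen, selling price of a gem
STONE_COST = 9_500    # yen, purchase price of one rough stone


def expected_daily_profit(hit_rate, stones_per_day):
    """Expected profit per day in yen."""
    per_stone = hit_rate * GEM_PRICE - STONE_COST
    return per_stone * stones_per_day


baseline = expected_daily_profit(Fraction(1, 2), 20)
faster = expected_daily_profit(Fraction(1, 2), 40)        # Q2: twice the speed
pickier = expected_daily_profit(Fraction(6, 10), 20)      # Q3: 60% hit rate

print(baseline, faster - baseline, pickier - baseline)
```

Changing `hit_rate` or `stones_per_day` shows how sensitive the result is to each requirement.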

lesson

In this problem setting, a “discriminator with 60% accuracy” has four times the customer value of a “device that doubles the processing speed.”
How much customer value a given level of accuracy generates depends on the business requirements.
Accuracy is not always important.

Business Requirements

Before thinking about improving accuracy, it is necessary to first clarify what customers want and what kind of constraints they have.
Even if a method described in the latest paper is implemented, if the customer cannot provide the amount of data required by the method, then the implementation has no customer value.

Customers are not experts.

In many cases, customers are not experts in machine learning and cannot clearly verbalize either the required accuracy or the constraints that must be met.
So we need to learn these quickly.

Minimum Viable Product

A concept advocated in “Lean Startup”, a management methodology for IT ventures.
Ventures have limited funds and must quickly understand customer needs.
Therefore, by building rough products at minimal cost and actually showing them to customers, they find out where the customers’ needs lie.

Minimal implementation = no implementation

This is called a “concierge MVP.”

Step 1

Assume a human does the job, and answer the following questions.
- (1) Customer value: What makes the customer happy?
- (2) How a human does it: If a human did it, how would they do it? (Is someone already doing it, or not?)

(2) How humans do it

Someone is already doing it, or knows how to do it.
But it takes too much time and effort.
→ Opportunity! Reducing that time and effort through automation is customer value!

(2) How humans do it

Someone is doing it, for example on the customer side, but you do not know how they do it.
→ Some important information is not being communicated to you.

(2) How humans do it

Neither the customer nor you know how a human would do it.
→ You may be chasing a reckless fantasy in the first place.

Example: Spam Filter

Q1 What makes customers happy?
A1 Customers have a problem with a lot of spam in their mailboxes, and would be happy if the spam would go away.
Q2 How would a human being do that?
A2 Look at the body of the e-mail to determine if it is spam or not, and move spam to a different folder.

Step 2

Put the human in a box.
- Only electronic data can go in and out of this box.
- (You can think of it as remote work.)

Step 2

(3) Input data: What data must go into the box for the human to do (2)?
(4) Output data: What output data comes out of the box when the human does (2)?

Step 2

(5) How to obtain: How do we obtain the input data (3)? (Both for the first step and on an ongoing basis)
(6) How to deliver: How do we connect the output data (4) to customer value (1)?

The order in which these four questions are answered does not matter.

Example 1: To do ~ for the customer (6), we output ~ (4), so we need ~ as input (3); how do we get it? (5)
Example 2: We already have the data (5, 3); how do we generate customer value from it? (6) What outputs are needed to achieve this? (4)

Example: Spam Filter

Q2 How would a human being do that?
A2 A human would look at the body of the email, determine whether it is spam, and move spam to a different folder.
Q3 What data must go into the box for a human to do (2)?
A3 We need the body of the email. The subject and sender would also be useful if available.

Example: Spam Filter

Q4 What output data comes out of the box when a human does (2)?
A4 A “spam” or “not spam” label for each email.

Example: Spam Filter

Q5 How do we get the input data (3)?
A5 As a first step, ask the customer to export their email.
- To do this on an ongoing basis, we would need to build a way to get the data from the mail client or modify the mail server side.
PS: Which has higher priority: receiving the data for the first step and experimenting, or modifying the mail server so that data is continually available?
- Basically, whether spam can be identified by machine learning at all is uncertain, so the sooner that is verified, the better.
- It depends on the situation, so decide case by case which to do first, or whether to do them in parallel.

Example: Spam Filter

Q6 How do we connect the output data (4) to customer value (1)?
A6 Sort the mail according to the spam / not-spam label.
PS: Once we have verbalized that we need to “sort mail” in order to connect to customer value, the next question is “How can we do that sorting?”, and we can dig deeper.
- Can we modify the program that displays the email?
- Or, for example, do we put a specific string in the header indicating that the mail has been flagged as spam, and ask the user to configure sorting rules?
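
The header-based option could look roughly like the following sketch, which uses Python's standard `email` package. The header name `X-Spam-Flag` and the function `tag_message` are assumptions chosen for illustration, not something the slides specify.

```python
from email.message import EmailMessage

SPAM_HEADER = "X-Spam-Flag"   # assumed header name for this sketch


def tag_message(message, is_spam):
    """Record the classifier's label in a header so mail clients can filter on it."""
    del message[SPAM_HEADER]                 # drop any earlier value; no error if absent
    message[SPAM_HEADER] = "YES" if is_spam else "NO"
    return message


msg = EmailMessage()
msg["Subject"] = "You won a prize"
msg["From"] = "sender@example.com"
msg.set_content("Click here to claim it.")

tag_message(msg, is_spam=True)
print(msg[SPAM_HEADER])
```

The user would then configure a rule in their mail client that moves messages carrying this header value to a spam folder.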

Part-time worker

(7) If the person in the box were a part-time worker with no prior knowledge, what manual would you need to provide? Thinking about this will make the next step easier.

Step 3

Step 1: Human beings do it.
Step 2: The person in the box does it.
Step 3: The machine in the box does it.
Replace the person in the box with a computer.

First Program

(8) Write a program that lets the computer in the box do (2).
It doesn’t have to be perfect.
It doesn’t have to use a sophisticated algorithm.
Accuracy can be low.
“It works for now” is enough.

Run it and show it to your clients.

(9) Put actual data into the program (8) and observe the behavior. (How is the accuracy? How fast is it?)
(10) Will customers be satisfied with this?
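
Step (9) can be as simple as the following sketch, which runs a stand-in for program (8) over a handful of labeled mails and reports accuracy and time per mail. The function `first_program` and the sample data are hypothetical.

```python
import time


def first_program(body):
    """Hypothetical stand-in for program (8): flag mail containing a keyword."""
    return "free money" in body.lower()


# Hypothetical sample of (email body, is_spam) pairs collected from real mail.
samples = [
    ("Get FREE MONEY now", True),
    ("Meeting moved to 3pm", False),
    ("Cheap watches for you", True),
    ("Invoice for September attached", False),
]

start = time.perf_counter()
predictions = [first_program(body) for body, _ in samples]
elapsed = time.perf_counter() - start

correct = sum(pred == label for pred, (_, label) in zip(predictions, samples))
accuracy = correct / len(samples)
print(f"accuracy={accuracy:.2f}, seconds per mail={elapsed / len(samples):.6f}")
```

Numbers like these are what you bring to the customer when asking question (10).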

Afraid to show your clients?

Are you afraid to show it to clients because it is not accurate?
But only the customer knows what the customer values.
Even if a customer says it is low quality, that is an opportunity to learn what the customer values. See “Lean Startup.”

If the customer is not satisfied

(11) Collect the specifics of what the customer is dissatisfied with.
- (What kind of output do they want for what kind of input? This is called teacher data, i.e. labeled training data.)

Step 4

Finally, machine learning!

scientific methodology

When the teacher data is rich enough, the quality of the algorithm can be measured quantitatively.
- (A portion of the teacher data is held out for validation.)
Compare it properly with program (8), because switching to machine learning does not guarantee it will be better.
Run a cycle of hypothesis, experiment, verification, and revision to improve it.
- (PDCA cycle / scientific methodology)
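
A minimal sketch of that comparison: hold out part of the teacher data, then score both the first program and a candidate on the held-out part. All data and both classifiers here are hypothetical; `candidate` is only a stand-in for a model fitted on the training part.

```python
import random


def accuracy(classify, examples):
    """Fraction of (input, label) pairs that classify() gets right."""
    return sum(classify(x) == label for x, label in examples) / len(examples)


def split(examples, validation_ratio=0.25, seed=0):
    """Shuffle and hold out part of the teacher data for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * validation_ratio)
    return shuffled[cut:], shuffled[:cut]   # (training, validation)


# Hypothetical teacher data: (email body, is_spam).
teacher_data = [
    ("free money inside", True), ("lunch tomorrow?", False),
    ("claim your prize", True), ("quarterly report", False),
    ("win a prize today", True), ("free parking at the office", False),
    ("prize money waiting", True), ("see you at the meeting", False),
]
training, validation = split(teacher_data)


def program_8(body):              # the first, hand-written program
    return "free" in body


def candidate(body):              # stand-in for a model fitted on `training`
    return "prize" in body or "money" in body


print(accuracy(program_8, validation), accuracy(candidate, validation))
```

Keeping the validation part fixed across iterations makes the numbers from successive experiments comparable.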

Concrete ways to improve

Extract and look at the data that the current algorithm fails to classify correctly, and consider what could be done to classify it correctly (e.g., adding features).
If the algorithm returns a confidence for each decision, as logistic regression does, look at the low-confidence results and add them to the teacher data (active learning).
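
One simple way to pick the low-confidence results is to take the items whose predicted probability is closest to 0.5. The sketch below assumes the model outputs a spam probability per mail; the function name and the numbers are made up.

```python
def least_confident(probabilities, k):
    """Return the indices of the k items whose spam probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical model outputs: probability that each unlabeled mail is spam.
probs = [0.98, 0.52, 0.03, 0.45, 0.80]
print(least_confident(probs, 2))   # -> [1, 3]
```

The selected mails are the ones worth showing to a human for labeling first.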

I will keep repeating this so that nobody gets it wrong.

What matters in business is customer value. Not accuracy.
Even if the accuracy is 99%, if the 1% that makes a mistake is fatal to the customer, a program with 60% accuracy that does not make that mistake is more valuable to the customer.

summary

Customer value is important in business.
Neither the customer nor you understand exactly what customer value is.
Repeat experiments with minimal man-hours to understand it quickly (Minimum Viable Product).
Experiments gradually make the dissatisfaction and the customer value concrete.
Iterate improvements to increase customer value.

Supplemental information and questions/answers below

supplement

This is similar to “software development without clear customer requirements or specifications.”
The difference is that if a customer is dissatisfied, it could be solved by “adding that information to the training data” instead of “reworking the software”.
However, it is not always possible to solve the problem with data, so it is still important to experiment with minimal implementation.

Supplement: Bad Pattern

(1) Sloppy method
(2) Haphazard data
(3) Amazing machine learning
- All the time gets used up here.
(4) Highly accurate output data
(5) Connected to the customer sloppily
(6) Mismatch: the customer says “this is not it”.
(7) Complaints that leave no room for recovery
To avoid this, it is important to run the work as experiments that can be repeated.

Q&A: Writing the manual

Question about (7) and (8): “If you can write a manual for a part-time worker, shouldn’t you also be able to write a program?”
Yes, that’s right. What I wanted to say was: “If the specifications are so vague that you cannot write a manual for a part-time worker, you cannot write a program either.”
Asking “How would you write the manual?” is how we notice what has not yet been verbalized, and how we encourage verbalization. (See the example on the next page.)

Example: Spam Filter
What we want to achieve: (2) “Look at the body of the email to determine if it is spam or not, and move spam to a different folder.”
You: “Look at the text and determine whether it’s spam.”
Part-timer: “How do I judge?”
You: “For example, if the word ____ is in there, it’s spam.”
Realization: the first program (8) should “determine whether NG keywords are included”.
- Supplement to the supplement: the purpose of writing the manual for the part-time worker is to help in writing the first crude program, so you can skip it if you can write that program without much effort.
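
Such a first program can be only a few lines. In this sketch the keyword list is hypothetical; in practice it would come from the answers to “How do I judge?”.

```python
# Hypothetical NG keywords for illustration only.
NG_KEYWORDS = ["free money", "click here", "limited offer"]


def is_spam(body, keywords=NG_KEYWORDS):
    """Return True when the body contains any NG keyword (case-insensitive)."""
    text = body.lower()
    return any(keyword in text for keyword in keywords)


print(is_spam("Click HERE to claim your prize"))
print(is_spam("Minutes from today's meeting"))
```

It will misjudge plenty of mail, which is fine: its job is to get something in front of the customer.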

Man-hours

Q: How many man-hours should I expect it to take?
A: Case by case.
- For example, when data is fed into logistic regression, which is a mature technique, sometimes the accuracy is good enough that the client approves it in one shot, while other times it is completely inadequate. The only way to find out is to experiment.
- Since it is difficult to estimate man-hours, it is dangerous to enter into a contract that promises results in a fixed amount of time.

accuracy

Q: What if a customer asks for an accuracy guarantee?
A: We need a shared understanding with the customer that “we will not know how accurate it will be until we experiment.”
It is better to have a contract that pays for “trial and error that may produce a good product” than one that commits to an accuracy figure.
We need to experiment in units as small as possible and communicate closely with the customer.

Cost and accuracy of decisions
A knowledgeable “expert” makes the decision.
A “part-time worker” with no knowledge makes the decision.
A machine makes the decision.
Accuracy and cost are both higher toward the top of this list.
Machines do not get tired of working 24 hours a day, and the labor authorities do not get angry about it.
The machine’s accuracy is not necessarily high*, but its lower cost creates business value. (*Depends on your efforts)
2023 note: GPT-3.5 now outperforms humans on a wide range of tasks.
- ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
- So the “machine” is now more accurate than a “part-time worker with no knowledge.”
- The time has come to say, “It is better to use GPT-3.5 than to crowdsource teacher data to amateurs who have neither the knowledge nor the motivation.”

Deep Learning

Q: “Can’t Deep Learning do this automatically even if a human can’t define the conditions?”
A: Half right.
In image processing, for example, even a 100x100 monochrome image puts you in the painful situation of having 10,000 input values. Experts spent decades devising various conditions to handle such inputs well.
Then a special form of neural net that repeats convolution was built and trained with a large amount of data, and surprisingly it showed better accuracy than what the experts had devised and implemented.
This is the half that is right. The half that is wrong is on the next page.

Deep Learning

This success drew attention to Deep Learning, the context was stripped away, and as in a game of telephone the message was distorted into “Deep Learning can be used to create programs that make more accurate decisions than humans.”
As a result, more and more people think “Deep Learning can handle this too” about problems with totally different conditions.
Since the effectiveness of a method changes as the problem conditions change, the only basic rule is to “repeat small experiments, starting from the most mature methods.”

active learning

Q: Is it better to do active learning with logistic regression rather than naive Bayes?
A: Naive Bayes can also produce a confidence level, just like logistic regression, so you can do active learning with naive Bayes too.
Tip: Although I omitted it this time, both logistic regression and naive Bayes are probabilistic models, that is, models that produce outputs like “the probability that this input is class 1 is 0.8”. This is what was called “confidence in the decision” in this presentation.

Derivatives

A comment on Twitter: “No matrices or derivatives show up.”
When the machine adjusts the weights, it works like “let’s change the weights a little in the direction that makes the deviation from the correct answer smaller.”
In mathematical terms: “Differentiate the ‘deviation from the correct answer’ with respect to the weights to obtain the gradient, and change the weights slightly in the direction opposite to the gradient.”
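
The idea can be sketched without any library as plain gradient descent on logistic regression. This is an illustration under assumptions, not how scikit-learn implements it: the training data (label 1 when at least 3 of 5 conditions are True), the step size and the iteration count are all chosen for the example.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, xs):
    """Probability of class 1: sigmoid of the weighted sum."""
    return sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias)


# Training data: all combinations of 5 conditions, label 1 when at least 3 are True.
X = [xs for xs in product([0, 1], repeat=5)]
y = [int(sum(xs) >= 3) for xs in X]

weights, bias = [0.0] * 5, 0.0
learning_rate = 0.5             # assumed step size for this sketch

for _ in range(2000):
    # Gradient of the mean cross-entropy loss with respect to each weight and the bias.
    grad_w, grad_b = [0.0] * 5, 0.0
    for xs, label in zip(X, y):
        error = predict_proba(weights, bias, xs) - label
        for j, x in enumerate(xs):
            grad_w[j] += error * x / len(X)
        grad_b += error / len(X)
    # Move a little in the direction that reduces the loss (against the gradient).
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

predictions = [int(predict_proba(weights, bias, xs) >= 0.5) for xs in X]
print(predictions == y)
```

Each pass nudges the weights slightly; after enough passes the learned weights reproduce the “at least 3 of 5” rule on this data.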

matrix

In logistic regression, for example, if there are 15 conditions, there are 15 “weight” values.
The mathematical term for this is “vector”.
A neural net is a collection of multiple logistic-regression-like pieces, so if 10 pieces are collected, there are 15x10 values.
The mathematical term for this is “matrix”.
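
In code, the two shapes look like this. The sketch uses plain Python lists and made-up names; real libraries would use array types instead.

```python
N_CONDITIONS = 15
N_UNITS = 10   # number of logistic-regression-like pieces in the sketch

# Logistic regression: one weight per condition, i.e. a vector of 15 values.
weight_vector = [0.0] * N_CONDITIONS

# Neural net layer: one such vector per unit, i.e. a 15 x 10 matrix
# (stored here as 15 rows of 10 columns).
weight_matrix = [[0.0] * N_UNITS for _ in range(N_CONDITIONS)]


def layer_sums(xs, matrix):
    """Weighted sum for every unit at once: a vector-matrix product."""
    return [sum(x * row[j] for x, row in zip(xs, matrix))
            for j in range(len(matrix[0]))]


xs = [1] * N_CONDITIONS
print(len(weight_vector), len(weight_matrix), len(weight_matrix[0]))
print(len(layer_sums(xs, weight_matrix)))
```

One input of 15 values produces 10 weighted sums, one per unit.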

Ambiguous judgment

“On p. 21, if for example x1, x2, x3, x4, and x6 are True, the sum is 17/15, so the result is True. That is mathematically incorrect.”
Yes, and that is the important part (I forgot to emphasize it).
- The original expression and the weighted-sum expression are not equivalent.
However, when we try to use machine learning, we do not know in the first place “what kind of conditional expression will create customer value”.
- Therefore, “being equivalent to the original expression” has no customer value.
Let go of the need to describe things logically and precisely. Instead, express them as a rough, ambiguous conditional expression, and check against real data, not logic, whether it works properly. This is the basic stance of machine learning.
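
The non-equivalence can be checked by brute force. The sketch below assumes weights of 1/5 for x1…x5, 1/3 for x6…x10 and 1 for x11…x15, and compares the weighted sum with the original logical condition over all 2^15 inputs; the function names are made up.

```python
from fractions import Fraction
from itertools import product

WEIGHTS = [Fraction(1, 5)] * 5 + [Fraction(1, 3)] * 5 + [Fraction(1)] * 5


def original_logic(xs):
    """All of x1..x5, or at least 3 of x6..x10, or any of x11..x15."""
    return all(xs[0:5]) or sum(xs[5:10]) >= 3 or any(xs[10:15])


def weighted_sum(xs):
    return sum(w * int(x) for w, x in zip(WEIGHTS, xs)) >= 1


disagreements = [
    xs for xs in product([False, True], repeat=15)
    if original_logic(xs) != weighted_sum(xs)
]

# The case from the question: x1, x2, x3, x4 and x6 are True.
example = tuple(i in (1, 2, 3, 4, 6) for i in range(1, 16))
print(sum(w * int(x) for w, x in zip(WEIGHTS, example)))   # 17/15
print(original_logic(example), weighted_sum(example))
print(len(disagreements) > 0)
```

Every disagreement goes the same way: the weighted sum says True where the original logic says False, because partial evidence from different groups adds up.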

Supplement (advertisement)

I forgot to state and emphasize that “it is important to observe the data.”
PyQ’s “Machine Learning for Beginners,” which I supervised, explains everything from observing data and writing if statements to logistic regression, step by step, in small steps.
Let’s learn the basics of machine learning from if statements.
Let’s find the thresholds.
Visualize and find thresholds
How to handle data for which thresholds cannot be determined
Classification from 2D data
Plot 2D data
Classification using linear equations
Machine Learning for Beginners: Logistic Regression

This page is auto-translated from /nishio/if文から機械学習への道 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.
