PRE2017 4 Groep7
Revision as of 15:37, 13 May 2018
0LAUK0 - Group 7
- Bas Voermans | 0967153
- Julian Smits | 0995642
- Tijn Centen | 1006867
- Bart van Schooten | 0999971
- Jodi Grooteman | 1006743
- Emre Aydogan | 0902742
The planning can be found here:
A personal assistant (PA) works closely with a person to provide administrative support, usually on a one-to-one basis. A PA helps a person make the best use of their time by limiting the time they spend on secretarial and administrative tasks. Unfortunately, having a personal assistant is a luxury reserved for the rich and successful, because of the one-to-one nature of the work and the extensive knowledge usually required to perform PA tasks well. Less fortunate people have to take on these tasks themselves, costing them time that could be spent on their core business, which can lead to stress and discomfort. The goal of this study is to map current developments in this field, identify areas for improvement, and (hopefully) build a prototype incorporating these improvements that can handle almost all the tasks of a PA. This prototype should also improve itself by learning what each specific user expects from a PA.
Who are the users?
This research is meant for users who have to weed through countless notifications while deciding what is and is not important to them; users who deal with many such notifications are our main target group. The research focuses mainly on students, since we are familiar with this user group, which makes it easier to define its needs and requirements.
Requirements of the users
- The system should run on pre-owned devices
- The system should filter important information out of incoming messages.
- The system should tune its intrusiveness based on the user's feedback.
In this chapter we look at the potential impact of the product of our research. If our product fully works and solves the problem described in the problem description, it can have a great impact on its users and on society as a whole. Below we describe the impact our product can have on users, society, relevant enterprises and the economy.
As described above, the users of the product will primarily be students, but the product can be extended to anybody with a smartphone who receives more messages than desired but does not want to miss any potentially important ones. When people no longer have to read every seemingly unimportant message, or scan through them looking for important ones, they have more time for the things they want to spend their time on. This is a positive effect of our product, as it allows users to focus on their core business. However, our product might also affect users in other ways. Scanning text messages, or text in general, for relevant information is a valuable skill that also applies in other scenarios, such as scanning scientific articles or reports for important information. When an AI takes care of this task, users might lose this skill, which might hinder them in those other scenarios, where the AI possibly cannot help them find the important information. Another negative consequence may occur when the AI does not work perfectly but the user trusts it to. In that case the user might miss an important message, which can have serious consequences. In a work environment this can mean that the user is not informed about a (changed) deadline or meeting; in a social environment it can lead to irritation or even a quarrel.
When we talk about society, we mean all people, users and non-users of the product, combined, and everything that comes with that. To assess what impact our product might have on society, we look at how relations between individuals change, as well as how society as a whole behaves. The consequences described above can be extended to the level of society. If people become more productive as described above, society would certainly benefit, as more can be accomplished. The fact that people might lose the ability to quickly scan text for important information can also affect society: if an entire generation grows up like this, there will be nobody left to teach it to younger generations, so society as a whole will lose this skill. It can be questioned how relevant such a skill will still be in future society, but it is a loss nonetheless. Another thing that might happen when a large part of the public uses our product is that nobody reads the seemingly unimportant messages any longer. If nobody reads them anymore, those who write such messages will probably stop doing so, removing the purpose of our product.
Relevant enterprises are those that might be interested in buying our product. This could be a company like WhatsApp that wants to integrate it into its own application, or a third party that wants to publish it as a standalone application. These companies, especially a third party, would want to profit from such an application. Companies like WhatsApp could offer it as a free service to make sure users keep using their application and possibly to attract new users. Third-party companies cannot do this and would need to find another way to profit from the application; the simplest solution seems to be to charge for it.
Our product will reduce costs for users. Many people do not have the time, or do not want, to filter out the most important information themselves. They could hire a personal assistant for this task, but our software will be less expensive than a personal assistant, which saves money. A disadvantage is that personal assistants will have less work: if people use our software instead of a personal assistant for this particular task, personal assistants are no longer needed for it.
To start off, research into the state of the art will be done to acquire the knowledge needed to determine what the desired product should be. Next, an analysis will be made of the User, Society and Enterprise (USE) aspects and their advantages and disadvantages. At this point the description of the prototype will be worked out in detail and the prototype will start to be built. At the same time, research will be done to analyse the different approaches to filtering incoming messages and their impact. The results of this research will be implemented in the prototype. When the prototype is complete, the goal of the project will be reflected upon and further improvements to the prototype can be made.
State of the art research
Understanding adoption of intelligent personal assistants: A parasocial relationship perspective
The article is about intelligent personal assistants (IPAs). IPAs help, for example, with sending text messages, setting alarms, planning schedules and ordering food. The article gives a review of existing literature on intelligent home assistants. The authors state that they know of no study that analyzes the factors affecting intentions to use IPAs, and of only a few studies that have investigated user satisfaction with IPAs. Furthermore, the article presents the parasocial relationship (PSR) theory. According to the article, this theory says that a person responds to a character “similarly to how they feel, think and behave in real-life encounters”, even though the character appears only on TV. The remainder of the article describes the authors' study, whose hypotheses are:
- H1. Task attraction perceived by a user of an IPA will have a positive influence on his or her PSR with the IPA.
- H2. Task attraction perceived by a user of an IPA will have a positive influence on his or her satisfaction with the IPA.
- H3. Social attraction perceived by a user of an IPA will have a positive influence on his or her PSR with the IPA.
- H4. Physical attraction perceived by a user of an IPA will have a positive influence on his or her PSR with the IPA.
- H5. Security/privacy risk perceived by a user of an IPA will have a negative influence on his or her PSR with the IPA.
- H6. A person’s PSR with an IPA will have a positive influence on his or her satisfaction with the IPA.
- H7. A person’s satisfaction with an IPA will have a positive influence on his or her continuance intention toward the IPA.
Personal assistant for your emails streamlines your life
This article is about GmailValet, a personal assistant for emails. Normally, a personal assistant who turns an overflowing inbox into a to-do list is a luxury only the corporate elite can afford, but the developers of GmailValet wanted to make this affordable for less than $2 a day.
This article is about “Everyone’s Assistant”, a California-based company that provides personal assistant services in Los Angeles and surrounding areas. The company makes personal assistant services affordable and accessible for everyone. Its personal assistants cost $25 an hour and can be booked for the same day or for future services.
Experience With a Learning Personal Assistant
This article is about the potential of machine learning for personal software assistants, that is, the automatic creation and maintenance of customized knowledge. One particular learning assistant is a calendar manager called Calendar APprentice (CAP), which learns the user's scheduling preferences from experience.
SwiftFile: An Intelligent Assistant for Organizing E-Mail
This article is about SwiftFile, an intelligent assistant for organizing e-mail. It helps classify email by predicting the three folders that are most likely to be correct, and it provides shortcut buttons that make selecting between these folders faster.
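The idea of suggesting the three most likely folders can be illustrated with a small sketch. This is not SwiftFile's actual classifier: as an assumption, each folder is represented by the combined word counts of the messages already filed in it, and folders are ranked by cosine similarity to the new message. The function names and example folders are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts of a message."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_folders(message: str, filed: dict[str, list[str]], k: int = 3) -> list[str]:
    """Return the k folders whose already-filed messages look most like `message`."""
    query = bag_of_words(message)
    scores = {}
    for folder, messages in filed.items():
        centroid = Counter()
        for m in messages:
            centroid.update(bag_of_words(m))
        scores[folder] = cosine(query, centroid)
    # Highest similarity first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


filed = {
    "Courses": ["deadline for the assignment", "lecture slides uploaded"],
    "Sports": ["training moved to friday", "match on saturday"],
    "Bills": ["your invoice is ready", "payment reminder"],
    "Friends": ["party at my place", "dinner on friday"],
}
print(suggest_folders("assignment deadline moved to friday", filed))
# → ['Sports', 'Courses', 'Friends']
```

In this toy example common words such as "to" pull the message towards "Sports"; a practical classifier would down-weight such words, for example with TF-IDF weighting.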
An intelligent personal assistant robot: BoBi secretary
This article is about an intelligent robot called BoBi secretary. When closed, it is a box the size of a smartphone, but it can be transformed into a movable robot. The robot can entertain, but it can also do all the work a secretary does. Its three main functions are intelligent meeting recording, multilingual interpretation and reading papers.
RADAR: A Personal Assistant that Learns to Reduce Email Overload
This article discusses artificial learning agents that manage an email system. The problem described in the article is that email overload causes stress and discomfort. An open question is whether users will accept an agent managing their email system. Nevertheless, the agent improved quickly and increased the productivity of the user.
Intelligent Personal Assistant — Implementation
This article researches the best and most promising current agents used by major companies such as Apple and Microsoft. The paper concludes that Cortana is currently the best agent at assisting the user.
Intelligent Personal Assistant
This article is about current speech-driven agents that perform tasks for the user. In the paper, this communication becomes bi-directional, so that the agent also responds to the user. The agent also stores user preferences to improve its learning capacity.
Voice mail system with personal assistant provisioning
A patent that describes a PA that can be used to keep track of address books and to predict what the user wants to do. The patent also suggests text-to-speech, so that the user can listen to the response rather than read it. The PA should also remember previous commands and respond accordingly to related follow-up commands.
USER MODEL OF A PERSONAL ASSISTANT IN COLLABORATIVE DESIGN ENVIRONMENTS
The article is about creating models of the users of PAs and of the different domains associated with the user and the PA. The article suggests four different user models: a user interest model, a user behavior model, an inference component and a collaboration component. According to the article, the user should have the right to change the user model, since ‘the user model can be more accurate with the aid of the user.’ Two approaches are periodically prompted dialogs or giving the user the final word.
A Personal Email Assistant
The paper is about Personal Email Assistants (PEAs) that can process emails with the help of machine learning. The assistant can be used with multiple email systems. Key features of the PEA described in the paper include a smart vacation responder, a junk mail filter and prioritization. The authors of the paper found the PEA good enough for daily use.
Rapid development of virtual personal assistant applications
This patent is about creating a platform for the development of virtual personal assistants (VPAs). The platform consists of three ‘layers’: first, the user interface that interacts with the user; next, the VPA engine that analyses the user's intent and generates outputs; and last, the domain layer that contains domain-specific components such as grammar or language.
A Softbot-Based Interface to the Internet
The article describes an early version of a PA that is able to interact with files, search databases and interact with other programs. The interface for the Softbot is built on four ideas: Goal oriented, Charitable, Balanced and Integrated. Furthermore, different modules could be created to communicate with the Softbot in different ways, such as speech or writing.
Socially-Aware Animated Intelligent Personal Assistant Agent
The article describes a Socially-Aware Robot Assistant (SARA) that analyses the user through more than normal input, for example through their visual, vocal and verbal behaviours. By analysing these behaviours, SARA can produce visual, vocal and verbal behaviours of its own. The goal of SARA is to create a personalized PA that, in the case of the article, makes recommendations to the visitors of an event.
JarPi: A low-cost raspberry pi based personal assistant for small-scale fishermen
This article describes how fishermen can also have a form of personal assistant, one that keeps track of the weather and their current position at sea. Such systems are normally very expensive and not available to small-scale fishermen, but cheap technology such as the Raspberry Pi makes a good alternative possible.
Solution to abbreviated words in text messaging for personal assistant application
This article describes how a personal assistant that reads incoming text messages, such as SMS messages, can handle abbreviations, which are common in text-based messaging. The study was performed with abbreviations common in the Indonesian language, based on a survey.
A voice-controlled personal assistant robot
This article describes the design and testing of a voice-controlled physical personal assistant robot. Commands can be given to the robot via a smartphone, and the robot can perform various tasks.
Management Information Systems in Knowledge Economy
AI Personal Assistants: How will they change our lives?
How artificial intelligence will redefine management
How can AI transform public administration?
Extra Sources - Spam Filters/Machine Learning
Intellert: a novel approach for content-priority based message filtering
This article describes how filtering text based on its content and keywords greatly reduces the number of notifications that have to be sent, by only sending those messages that are marked urgent or important. The results look promising.
Content-based SMS spam filtering based on the Scaled Conjugate Gradient backpropagation algorithm
Classification of english phrases and SMS text messages using Bayes and Support Vector Machine classifiers
Generative and Discriminative Text Classification with Recurrent Neural Networks
This article analyses the difference between discriminative and generative recurrent neural networks (RNNs) for text classification. The authors find that the generative model reaches its best performance with less training data, although its asymptotic error rate is higher. The generative model is especially effective for zero-shot learning, which is about applying knowledge from other tasks to tasks the model has not seen before. The discriminative model is more effective on larger datasets. The tested datasets have between two and fourteen classes.
SMS spam filtering and thread identification using bi-level text classification and clustering techniques
The problem this article addresses is the large number of SMS messages being sent, which makes identifying spam and threads in these messages difficult. First the spam is classified, which can be done with one of four popular text classifiers: NB, SVM, LDA and NMF. These are all binary classification algorithms that use hyperplanes, matrices or probabilities to separate the classes. Next, clustering is applied to construct the SMS threads, using either the K-means algorithm or NMF. The article concludes that the choice of algorithms is very important; the combination used in the experiment, SVM classification with NMF clustering, gives good results.
Spam filtering using integrated distribution-based balancing approach and regularized deep neural networks
This article is about creating a spam filter with the help of a Recurrent Neural Network. The spam filter is intended for both SMS and email. The network is tested on four spam datasets: Enron, SpamAssassin, SMS and Social Networking. The experiment starts by pre-processing the datasets so that they contain only lowercase text without special characters or stop words, since these carry no semantic information. The results are compared with the following spam filters: Minimum description length, Factorial design analysis using SVM and NB, Incremental Learning, Random Forest, Voting and CNN. The model performs slightly better than these filters on three of the four datasets, with an accuracy of around 98% on those three and 92% on the fourth.
A Comparative Study on Feature Selection in Text Categorization
This article compares five different feature selection techniques for text categorization.
A Learning Personal Agent for Text Filtering and Notification
This article is about an agent that manages notifications and acts as a personal assistant. The agent learns a model of the user's preferences in order to notify the user when relevant information becomes available.
Combining Collaborative Filtering with Personal Agents for Better Recommendations
This article is about information filtering agents that identify which items a user finds worthwhile. The paper shows that collaborative filtering can be used to combine personal information filtering agents to produce better recommendations.
Spam filtering using integrated distribution-based balancing approach and regularized deep neural networks. 
This article is about anti-spam filters that use machine learning and word weights to categorize messages as spam or non-spam. This categorization is becoming more and more difficult because spammers increasingly use legitimate words.
Robust personalizable spam filtering via local and global discrimination modeling
There are two approaches to filtering: a single global filter for all users, or a personalized filter for each user. This article presents a personalized filter and its challenges. The authors also present a strategy for personalizing a global filter.
Mail server probability spam filter
This article is about a spam filter that uses a whitelist, a blacklist, a probability filter and a keyword filter. The probability filter uses a general mail corpus and a general spam corpus to calculate the probability that an email is spam.
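The filter chain described in this patent can be sketched as follows. The order of the checks follows the description (whitelist, blacklist, keyword filter, probability filter); the probability step is assumed here to be a naive Bayes estimate with Laplace smoothing and equal priors, which the summary does not specify. All names, the threshold and the tiny corpora are hypothetical.

```python
import math
from collections import Counter


def word_counts(corpus: list[str]) -> Counter:
    """Word frequencies over a corpus of messages."""
    counts = Counter()
    for text in corpus:
        counts.update(text.lower().split())
    return counts


def spam_probability(text: str, ham: Counter, spam: Counter) -> float:
    """Naive Bayes estimate of P(spam | words) with Laplace smoothing and equal priors."""
    vocab_size = len(set(ham) | set(spam))
    ham_total, spam_total = sum(ham.values()), sum(spam.values())
    log_odds = 0.0
    for word in text.lower().split():
        p_spam = (spam[word] + 1) / (spam_total + vocab_size)
        p_ham = (ham[word] + 1) / (ham_total + vocab_size)
        log_odds += math.log(p_spam / p_ham)
    # Numerically stable logistic function turning log-odds into a probability.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


def is_spam(sender, text, whitelist, blacklist, spam_keywords, ham, spam, threshold=0.9):
    """Apply the checks in order: whitelist, blacklist, keyword filter, probability filter."""
    if sender in whitelist:
        return False
    if sender in blacklist:
        return True
    if set(text.lower().split()) & spam_keywords:
        return True
    return spam_probability(text, ham, spam) > threshold


# Hypothetical "general mail" and "general spam" corpora.
ham = word_counts(["meeting moved to monday", "see you at the lecture", "can you send the report"])
spam = word_counts(["win a free prize now", "free money click now", "claim your free prize"])
print(is_spam("unknown@example.com", "free prize now", set(), set(), {"lottery"}, ham, spam))
# → True
```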
The Art and Science of how spam filters work
This article explains the principle of blacklists, which analyse the header of a message to determine whether it is spam. In addition, messages that contain statistically dangerous files, such as .exe files, are often automatically blocked by content filters. The article ends with a section on machine learning in spam filters; the algorithms used in these filters try to find characteristics similar to those found in spam.
The Effects of Different Bayesian Poison Methods on the Quality of the Bayesian Spam Filter ‘SpamBayes’
This article discusses how spammers try to evade spam filters. The principle works as follows: add a few words that are more likely to appear in non-spam messages in order to trick spam filters into believing the message is legitimate. The article illustrates that spam evolves, and that filters therefore have to evolve with it.
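The poisoning effect can be reproduced on a toy naive Bayes filter. This is an assumed stand-in, not SpamBayes itself, which combines word probabilities differently. Appending words that are frequent in legitimate messages adds negative terms to the spam log-odds and pushes a spam message towards the legitimate side. The corpora and messages are invented for illustration.

```python
import math
from collections import Counter


def word_counts(corpus: list[str]) -> Counter:
    """Word frequencies over a corpus of messages."""
    counts = Counter()
    for text in corpus:
        counts.update(text.lower().split())
    return counts


def spam_log_odds(text: str, ham: Counter, spam: Counter) -> float:
    """Sum of log(P(w | spam) / P(w | ham)) over the words; a positive value leans towards spam."""
    vocab_size = len(set(ham) | set(spam))
    ham_total, spam_total = sum(ham.values()), sum(spam.values())
    score = 0.0
    for word in text.lower().split():
        p_spam = (spam[word] + 1) / (spam_total + vocab_size)  # Laplace smoothing
        p_ham = (ham[word] + 1) / (ham_total + vocab_size)
        score += math.log(p_spam / p_ham)
    return score


ham = word_counts(["meeting moved to monday", "see you at the lecture", "can you send the report"])
spam = word_counts(["win a free prize now", "free money click now", "claim your free prize"])

original = "claim your free prize now"
# The spammer appends words that are typical of legitimate messages.
poisoned = original + " see you at the lecture meeting report monday"

print(spam_log_odds(original, ham, spam), spam_log_odds(poisoned, ham, spam))
```

With these toy corpora the original message has positive log-odds (spam) and the poisoned one negative log-odds (legitimate), which is why the filter has to keep learning from new spam.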
A review of machine learning approaches to Spam filtering
This paper reviews currently existing approaches to spam filtering and discusses how the researchers believe certain methods could be improved.
To get a good understanding of what kind of prototype is required for the described problem and the given user, a concrete goal needs to be described that will fulfill a good selection of the user requirements described in the section above. After a concrete goal is described a prototype design needs to be created to solve the problem described in the problem statement.
The goal that the prototype should fulfill depends on the user requirements described above. Since it is not possible to create a prototype that achieves all requirements within the current planning, a selection of important requirements will be implemented in the prototype. The remaining requirements will be analysed and researched in writing, so that we can still give insight into their importance to the user.
The requirements that are chosen for the prototype are the following:
- The system should run on pre-owned devices
- The system should filter important information out of incoming messages
- The system should tune its intrusiveness based on the user's feedback
The prototype will therefore be a software module that can be integrated into existing messaging applications such as WhatsApp, Telegram or other applications used to send and receive messages within large groups of people. The module will output a binary value indicating whether a message is important or unimportant. To determine importance, the system should base its reasoning on feedback that the user gives during setup or while using the application.
To achieve the goal described above, two prototype design variations will be created to be able to analyse their effectiveness. The first variation will be using keyword based filtering which has the advantage of having an understandable filtering process, since the keywords support the reasoning. The second variation will be using machine learning in the form of a recurrent neural network (RNN), which is often used for text based machine learning. These two subsystems will be integrated in a larger system that also involves the removal of clearly identifiable spam and the coupling of closely related messages in the form of threads.
Input Output interface
The required input for the filtering module should be as abstract as possible to support as many messaging applications as possible. However, the input format should be consistent. Not only the message itself is important, but also its metadata: the date and time, the sender, whether the message is a response to another message, and whether any media such as images are attached to it. The prototype will not be able to analyse attached media, but knowing that media is present can still be useful for filtering. Messages are passed to the module in batches, just as unread notifications are. All messages in a batch should come from the same group chat, since they could be related to each other. The module then processes each batch without taking other batches into account. The output of the filtering module is a boolean value for every individual message, indicating whether the message should be shown to the user or discarded.
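A minimal sketch of this input/output interface, assuming Python dataclasses. The field names, the `Classifier` type and `filter_batch` are hypothetical; the text above only fixes which information is passed in (text, date and time, sender, reply link, media flag), that a batch comes from one group chat, and that the output is one boolean per message.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class Message:
    """One incoming message together with the metadata the module may use."""
    message_id: str
    chat_id: str                    # group chat the message belongs to
    sender: str
    sent_at: datetime
    text: str
    reply_to: Optional[str] = None  # id of the message this one replies to, if any
    has_media: bool = False         # media itself is not analysed, only its presence


# A classifier receives the whole batch and returns one decision per message.
Classifier = Callable[[list[Message]], list[bool]]


def filter_batch(batch: list[Message], classify: Classifier) -> dict[str, bool]:
    """Return message_id -> True (show to the user) or False (discard)."""
    chats = {m.chat_id for m in batch}
    if len(chats) > 1:
        raise ValueError(f"a batch must come from one group chat, got {sorted(chats)}")
    decisions = classify(batch)
    if len(decisions) != len(batch):
        raise ValueError("the classifier must return exactly one decision per message")
    return {m.message_id: bool(d) for m, d in zip(batch, decisions)}


batch = [
    Message("1", "study-group", "Alice", datetime(2018, 5, 13, 15, 0), "Exam moved to Friday"),
    Message("2", "study-group", "Bob", datetime(2018, 5, 13, 15, 1), "ok", reply_to="1"),
]
# Placeholder classifier for the example: show messages longer than ten characters.
print(filter_batch(batch, lambda msgs: [len(m.text) > 10 for m in msgs]))
# → {'1': True, '2': False}
```

Passing the whole batch to the classifier, rather than one message at a time, leaves room for the spam removal and thread coupling steps that need to see related messages together.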
The first step in analyzing the messages is to filter out spam. The purpose is to remove messages that do not really contribute to the context. Smileys, for example, are mostly unimportant, so when a message contains only smileys, the program can categorize it as spam and filter it out. In this step the program looks only at the literal text of a message, not at its meaning; a message consisting of a strange combination of letters, for example, would be filtered out. Filtering out spam before the analysis is important because it avoids analyzing messages that have no influence in the first place.
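The two examples in this step, smiley-only messages and random letter strings, can be turned into simple rules. The sketch below is one possible interpretation, not a fixed design: a message counts as clear spam if it contains no letters or digits at all, or if every word is a run of at least four letters without a vowel. The word-length threshold and the vowel rule are assumptions.

```python
import re

VOWELS = set("aeiouy")


def is_symbols_only(text: str) -> bool:
    """True when the message has no letters or digits, e.g. only smileys or emoji."""
    return not any(ch.isalnum() for ch in text)


def looks_like_gibberish(text: str, min_length: int = 4) -> bool:
    """True when every word is a long run of letters without vowels, like 'sdfghjkl'."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return bool(words) and all(len(w) >= min_length and not set(w) & VOWELS for w in words)


def remove_clear_spam(messages: list[str]) -> list[str]:
    """Drop messages that are clearly spam based on their literal text only."""
    return [m for m in messages if not (is_symbols_only(m) or looks_like_gibberish(m))]


print(remove_clear_spam(["😂😂", "Meeting moved to 14:00", "sdfghjkl", ":)", "hmm ok"]))
# → ['Meeting moved to 14:00', 'hmm ok']
```

Both rules look only at characters, never at meaning, so they are cheap enough to run on every incoming message before the more expensive steps.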
After the clearly identifiable spam messages have been discarded, the remaining messages can be coupled together in so-called threads. This is done to retain important information that may be spread over multiple messages. One factor that can indicate a thread is the time at which messages are sent, since messages sent within a short timespan most likely involve the same subject. Another factor is the sender, since information usually comes from one person and is intended for all the others. The last factor is whether a message is a reply to another message. Some messaging applications support this feature, which links the message being replied to with the new message; these two linked messages most likely belong together.
These coupled messages are then combined so that the filtering in the next step takes the combined messages into account when determining their importance.
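The three thread factors can be combined into a simple grouping rule. This sketch rests on assumptions the text does not fix: messages are processed in time order; a reply always joins the thread of the message it answers; otherwise a message joins the previous message's thread if it was sent within two minutes, or within ten minutes when it comes from the same sender. `Msg`, `build_threads`, `combine` and both time windows are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Msg:
    id: str
    sender: str
    sent_at: datetime
    text: str
    reply_to: Optional[str] = None


def build_threads(messages, max_gap=timedelta(minutes=2), same_sender_gap=timedelta(minutes=10)):
    """Group messages into threads using reply links, sending time and sender."""
    thread_of = {}  # message id -> index of its thread in `threads`
    threads = []
    previous = None
    for msg in sorted(messages, key=lambda m: m.sent_at):
        target = None
        if msg.reply_to in thread_of:
            # Reply factor: join the thread of the message being replied to.
            target = thread_of[msg.reply_to]
        elif previous is not None:
            gap = msg.sent_at - previous.sent_at
            # Time and sender factors: close in time, or the same person a bit later.
            if gap <= max_gap or (msg.sender == previous.sender and gap <= same_sender_gap):
                target = thread_of[previous.id]
        if target is None:
            threads.append([])
            target = len(threads) - 1
        threads[target].append(msg)
        thread_of[msg.id] = target
        previous = msg
    return threads


def combine(thread):
    """Join the texts of a thread so the next step can judge them together."""
    return "\n".join(m.text for m in thread)


t = datetime(2018, 5, 13, 15, 0)
messages = [
    Msg("m1", "Alice", t, "Meeting tomorrow"),
    Msg("m2", "Alice", t + timedelta(minutes=1), "at 10 in the library"),
    Msg("m3", "Bob", t + timedelta(minutes=30), "lol"),
    Msg("m4", "Carol", t + timedelta(minutes=45), "Which floor?", reply_to="m1"),
]
for thread in build_threads(messages):
    print([m.id for m in thread])
# → ['m1', 'm2', 'm4']
# → ['m3']
```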
The program then categorizes the coupled messages into two groups: important messages and unimportant messages. There are two different ways of doing this, namely keyword based filtering and recurrent neural networks.
Keyword based filtering
The first method is keyword based filtering. This method makes use of a predefined list of important keywords. Every message is checked and given a score based on how many important keywords it contains. When a message scores higher than a certain threshold, it is placed in the group of important messages.
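A minimal sketch of the keyword score, assuming the score is simply the number of keyword occurrences and that a message is important when its score is strictly above the threshold. The keyword list and the threshold value are placeholders; the text does not specify them.

```python
import re

# Placeholder keyword list; the real list would be predefined for the prototype.
IMPORTANT_KEYWORDS = {"deadline", "exam", "meeting", "cancelled", "tomorrow", "room"}


def keyword_score(text: str, keywords: set[str] = IMPORTANT_KEYWORDS) -> int:
    """Count how many words in the message are important keywords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for word in words if word in keywords)


def is_important(text: str, threshold: int = 1, keywords: set[str] = IMPORTANT_KEYWORDS) -> bool:
    """Place the message in the important group when its score exceeds the threshold."""
    return keyword_score(text, keywords) > threshold


print(keyword_score("The exam deadline is tomorrow"), is_important("The exam deadline is tomorrow"))
# → 3 True
```

Because the decision is a plain count, the reason a message was marked important can be shown to the user directly, which is the transparency advantage mentioned for this variation.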
Messages are evaluated in a few steps. First, the program checks whether the message contains one of the following words: who, what, when, where. Checking for these words already gives the program a lot of information about the message. The next step is to analyze what kind of word follows the W-word and where in the sentence the W-word appears. For example, a sentence that ends with who might not be as important as a sentence that starts with who, because a sentence ending with who is usually not a question and thus might not carry as much meaning. A message ending with who is also often not grammatically correct, which indicates a low priority. In addition, the length of the message is taken into account: most of the time, the longer the message, the more important it is.
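The evaluation steps above can be sketched as a single heuristic score. All weights are assumptions chosen for illustration: a sentence starting with a W-word scores 2, one with a W-word in the middle 1, one ending with it 0.5, plus one point per ten words, capped at three. The text describes the ordering of these cases but not their values, and it does not say which following words should count, so that check is left out here.

```python
import re

W_WORDS = {"who", "what", "when", "where"}


def w_word_score(text: str) -> float:
    """Heuristic priority: W-word position per sentence plus a bonus for longer messages."""
    score = 0.0
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if not words:
            continue
        if words[0] in W_WORDS:
            score += 2.0  # starts with a W-word: most likely a real question
        elif words[-1] in W_WORDS:
            score += 0.5  # ends with a W-word: treated as low priority
        elif W_WORDS & set(words):
            score += 1.0  # W-word somewhere in the middle
    # Longer messages are usually more important: +1 per full 10 words, at most +3.
    n_words = len(re.findall(r"[a-z']+", text.lower()))
    return score + min(n_words // 10, 3)


print(w_word_score("When is the exam?"), w_word_score("guess who"), w_word_score("ok"))
# → 2.0 0.5 0.0
```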
Recurrent neural networks
The second method uses recurrent neural networks. This method learns to categorize messages and therefore needs training data. There are two ways of obtaining this data. The first is to label messages by hand and use these to train the neural network. The second is to give a set of messages to the user of our product and let the user categorize them. This creates personalized training data for every user, so the neural network trained on it is also personalized to that user. Combining these two sources of training data is the best option, because the neural network then gets more training data and is not fully personalized. Not being fully personalized is a good thing, because otherwise the program would rely entirely on the user's categorization; if the user were not able to categorize the messages well, the program would perform badly. Using both training data sources therefore optimizes the program. A neural network categorizes a certain percentage of messages correctly, so there is also the option of letting the user specify a target percentage, with the program continuing to learn until this percentage is reached.
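The text does not fix a network architecture, so the following is only a toy sketch under stated assumptions: a one-layer Elman RNN over whole words, written in plain Python so it runs without a deep-learning library, trained with backpropagation through time on the union of hand-labelled and user-labelled messages, and stopped once it reaches a user-chosen target accuracy. `TinyRNN`, `train_until` and the example data are hypothetical. A real prototype would use a framework such as PyTorch or TensorFlow and measure accuracy on held-out messages rather than on the training data, as done here for brevity.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


class TinyRNN:
    """One-layer Elman RNN over words with a sigmoid output (1 = important)."""

    def __init__(self, vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate(sorted(vocab))}
        self.h = hidden
        self.wx = [[rng.uniform(-0.1, 0.1) for _ in self.index] for _ in range(hidden)]
        self.wh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
        self.b = [0.0] * hidden
        self.v = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.c = 0.0

    def _tokens(self, text):
        return [self.index[w] for w in text.lower().split() if w in self.index]

    def _forward(self, tokens):
        states = [[0.0] * self.h]  # initial hidden state h_0 = 0
        for t in tokens:
            prev = states[-1]
            states.append([
                math.tanh(self.wx[i][t] + self.b[i] + sum(self.wh[i][j] * prev[j] for j in range(self.h)))
                for i in range(self.h)
            ])
        return states, sigmoid(sum(self.v[i] * states[-1][i] for i in range(self.h)) + self.c)

    def predict(self, text):
        return self._forward(self._tokens(text))[1] >= 0.5

    def train_step(self, text, label, lr=0.1):
        """One gradient step on binary cross-entropy via backpropagation through time."""
        tokens = self._tokens(text)
        states, y = self._forward(tokens)
        dz = y - label                                # dLoss/dlogit
        dh = [dz * self.v[i] for i in range(self.h)]  # gradient w.r.t. the last hidden state
        g_wx, g_b = {}, [0.0] * self.h
        g_wh = [[0.0] * self.h for _ in range(self.h)]
        for step in range(len(tokens), 0, -1):
            h_t, h_prev, t = states[step], states[step - 1], tokens[step - 1]
            da = [dh[i] * (1 - h_t[i] ** 2) for i in range(self.h)]  # back through tanh
            col = g_wx.setdefault(t, [0.0] * self.h)
            for i in range(self.h):
                col[i] += da[i]
                g_b[i] += da[i]
                for j in range(self.h):
                    g_wh[i][j] += da[i] * h_prev[j]
            dh = [sum(self.wh[i][j] * da[i] for i in range(self.h)) for j in range(self.h)]
        # Apply all gradients only after the full backward pass.
        for i in range(self.h):
            self.v[i] -= lr * dz * states[-1][i]
            self.b[i] -= lr * g_b[i]
            for j in range(self.h):
                self.wh[i][j] -= lr * g_wh[i][j]
        for t, col in g_wx.items():
            for i in range(self.h):
                self.wx[i][t] -= lr * col[i]
        self.c -= lr * dz


def accuracy(model, data):
    return sum(model.predict(text) == bool(label) for text, label in data) / len(data)


def train_until(hand_labelled, user_labelled, target_accuracy=0.9, max_epochs=200, seed=0):
    """Train on both label sources until the target accuracy or max_epochs is reached."""
    data = hand_labelled + user_labelled
    model = TinyRNN({w for text, _ in data for w in text.lower().split()}, seed=seed)
    rng = random.Random(seed)
    for epoch in range(1, max_epochs + 1):
        for text, label in rng.sample(data, len(data)):  # shuffled copy each epoch
            model.train_step(text, label)
        acc = accuracy(model, data)  # measured on the training data, for brevity only
        if acc >= target_accuracy:
            break
    return model, epoch, acc


hand_labelled = [("exam moved to friday", 1), ("deadline tomorrow at noon", 1),
                 ("haha nice", 0), ("good morning everyone", 0)]
user_labelled = [("meeting cancelled tomorrow", 1), ("look at this meme", 0)]
model, epochs, acc = train_until(hand_labelled, user_labelled)
print(epochs, acc, model.predict("exam tomorrow"))
```

The `target_accuracy` parameter plays the role of the percentage the user could give to the program; `max_epochs` guards against training forever when the target cannot be reached.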
Both options for categorizing the messages are possible. One may work better than the other, or they can be combined into an even better program.
- https://www.emeraldinsight.com/doi/full/10.1108/IMDS-05-2017-0214
- https://www.sciencedirect.com/science/article/pii/S0262407913600925
- https://search.proquest.com/docview/1704945627/50DE051B4E904379PQ/4?accountid=27128
- https://www.ri.cmu.edu/pub_files/pub1/mitchell_tom_1994_2/mitchell_tom_1994_2.pdf
- http://www.aaai.org/Papers/Symposia/Spring/2000/SS-00-01/SS00-01-023.pdf
- https://ieeexplore.ieee.org/document/8273196/
- http://www.aaai.org/Papers/Workshops/2008/WS-08-04/WS08-04-004.pdf
- https://link.springer.com/chapter/10.1007/978-1-85233-842-8_10
- http://ijifr.com/pdfsave/10-05-2017475IJIFR-V4-E8-060.pdf
- https://patents.google.com/patent/US6792082B1/en
- http://papers.cumincad.org/data/works/att/d5b5.content.06921.pdf
- https://s3.amazonaws.com/academia.edu.documents/39232090/0c9605226714eeedad000000.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1524910399&Signature=BjYwD6z0FRQMoazG%2BSJHjHc8nZQ%3D&response-content-disposition=inline%3B%20filename%3DA_personal_email_assistant.pdf
- https://patents.google.com/patent/US20140337814A1/en
- https://homes.cs.washington.edu/~weld/papers/cacm.pdf
- http://www.aclweb.org/anthology/W16-3628
- https://ieeexplore.ieee.org/document/8279618/
- https://ieeexplore.ieee.org/document/8251876/
- https://ieeexplore.ieee.org/document/7150798/
- https://books.google.nl/books?id=sRqlLLOboagC&lpg=PA246&dq=artificial%20intelligence%20personal%20assistant%20administrative%20tasks&hl=nl&pg=PR2#v=onepage&q=artificial%20intelligence%20personal%20assistant%20administrative%20tasks&f=false
- https://www.fungglobalretailtech.com/research/ai-personal-assistants-will-change-lives/
- https://hbr.org/2016/11/how-artificial-intelligence-will-redefine-management
- http://www.icdk.us/aai/public_administration
- https://ieeexplore.ieee.org/document/7940206/
- https://ieeexplore.ieee.org/document/7382023/
- https://ieeexplore.ieee.org/document/5090166/
- https://arxiv.org/abs/1703.01898
- http://journals.sagepub.com/doi/pdf/10.1177/0165551515616310
- https://link.springer.com/content/pdf/10.1007/s10489-018-1161-y.pdf
- http://www.surdeanu.info/mihai/teaching/ista555-spring15/readings/yang97comparative.pdf
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=B2FB1F51D66DC0D90FA785BB9D087172?doi=10.1.1.44.562&rep=rep1&type=pdf
- http://www.aaai.org/Papers/AAAI/1999/AAAI99-063.pdf
- https://search.proquest.com/docview/2017545987/B4CAA5405B794CA8PQ/1?accountid=27128
- https://search.proquest.com/docview/1270351132/B4CAA5405B794CA8PQ/3?accountid=27128
- https://patents.google.com/patent/US7320020B2/en
- https://securityintelligence.com/the-art-and-science-of-how-spam-filters-work/
- https://www.cs.ru.nl/bachelorscripties/2009/Martijn_Sprengers___0513288___The_Effects_of_Different_Bayesian_Poison_Methods_on_the_Quality_of_the_Bayesian_Spam_Filter_SpamBayes.pdf
- https://www.sciencedirect.com/science/article/pii/S095741740900181X