Computer scientist looks for ways to prevent fraud

Jeremy Wendt, a computer science researcher at Sandia National Laboratories, is working on a program to uncover likely targets of nefarious emails.

The weakest link in many computer networks is a gullible human. With that in mind, Sandia National Laboratories computer science researcher Jeremy Wendt wants to figure out how to recognize potential targets of nefarious emails and put them on their guard. – Phys. Org

He aims to reduce the number of visitors that cyber analysts have to check as possible bad guys among the thousands who search Sandia websites each day.

Ultimately, he wants to spot spear phishing. Phishing means sending an email to thousands of addresses in the hope that someone will fall for a scam and follow the link provided, for example to give out banking details. Spear phishing aims such an email at a specific address.

Wendt has developed algorithms that separate robotic web crawlers from people using browsers. He believes this will let analysts look at the two groups separately, which should improve security.

Even if an outsider gets into a Sandia machine that doesn’t have much information, Wendt said, that access makes it easier to get into another machine that may have something.

“Spear phishing is scary because as long as you have people using computers, they might be fooled into opening something they shouldn’t,” he said.

Identifying malicious intent

The ability to identify the possible intent to send malicious content might enable security experts to raise awareness in a potential target, said Sandia cyber security’s Roger Suppona.  “More importantly, we might be able to provide specifics that would be far more helpful in elevating awareness than would a generic admonition to be suspicious of incoming email or other messages,” he said.

Wendt presented his work at a Sandia poster session in the final stretch of a two-year Early Career Laboratory Directed Research and Development grant.

Wendt has examined the behavior of web crawlers vs. browsers to see whether it matches how computers identify themselves when asking for a webpage. Browsers generally say they can interpret a particular version of HTML (HyperText Markup Language, the main language for displaying webpages) and often provide browser and operating system information. Crawlers, on the other hand, identify themselves by program name and version number. A small number, which Wendt labels “nulls,” offer no identification. The reason is unknown: perhaps the programmer left that information out, or perhaps someone wants to hide.

Wendt wants to spot a computer that does not identify itself, or that says it is one thing but behaves like another, and that trawls parts of websites in which the average visitor shows little or no interest.

Whenever you visit an Internet site, the visit is recorded in a log. Sandia traffic is about evenly divided between web crawlers and browsers. Browsers concentrate on one place, such as jobs, while crawlers tend to go all over.
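As a rough illustration of the three self-identification groups described above, the following Python sketch sorts user agent strings into browsers, declared crawlers and nulls. The function name, the marker lists and the fallback rule are assumptions made for this example; the article does not describe the rules Wendt actually used.

```python
# Hypothetical markers; real traffic would need a far richer rule set.
BOT_MARKERS = ("bot", "crawler", "spider")
BROWSER_MARKERS = ("mozilla/", "opera/")


def classify_user_agent(user_agent):
    """Return 'null', 'bot' or 'browser' based only on what the client claims."""
    if user_agent is None or user_agent.strip() in ("", "-"):
        return "null"  # no identification offered
    lowered = user_agent.lower()
    # Check bot markers first: many crawlers also include "Mozilla/" in their string.
    if any(marker in lowered for marker in BOT_MARKERS):
        return "bot"
    if any(marker in lowered for marker in BROWSER_MARKERS):
        return "browser"
    # Assumption: a bare "program/version" string is treated as a declared crawler.
    return "bot"
```

This only records what a visitor claims to be. Whether the claim matches the visitor's behavior is a separate question.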

“When we get crawled by a Google bot, we aren’t being crawled by one visitor, we’re being crawled by several hundreds or thousands of different IP addresses,” Wendt said. Crawlers, also known as bots, are automated and follow links like Google or Bing do.  An IP or Internet Protocol address is a numerical label assigned to devices on a computer network.  It identifies the machine and its location.


Distinguishing bots and browsers

Wendt looked for ways to measure behavior, since he wants to tell bots from browsers without trusting who they say they are.

The first measurement deals with the fact that bots try to index a website. When you type in search words, a crawler looks for pages associated with those words, ignoring how they’re arranged on a page. That means a bot pulls down HTML files far more often than anything else.

Wendt first looked at the percentage of HTML downloads. Bots should have a high percentage, while browsers pull down smaller percentages.

More than 90 percent of the nulls pulled down nothing but HTML, which is typical bot behavior.

A single measurement wasn’t adequate, so Wendt devised a second based on another marker of bot behavior: politeness.
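The HTML percentage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log is taken to be a list of (visitor, URL) pairs, and a request counts as HTML when its path ends in .html or .htm or has no file extension. Neither the function names nor that rule come from Wendt's work.

```python
from collections import defaultdict
from urllib.parse import urlsplit

HTML_SUFFIXES = (".html", ".htm")


def is_html_request(url):
    """Treat .html/.htm paths and extensionless paths as HTML pages (an assumption)."""
    path = urlsplit(url).path
    last_segment = path.rsplit("/", 1)[-1]
    return path.lower().endswith(HTML_SUFFIXES) or "." not in last_segment


def html_ratio_by_visitor(requests):
    """requests: iterable of (visitor_id, url). Returns {visitor_id: HTML fraction}."""
    html_counts = defaultdict(int)
    totals = defaultdict(int)
    for visitor, url in requests:
        totals[visitor] += 1
        if is_html_request(url):
            html_counts[visitor] += 1
    return {visitor: html_counts[visitor] / totals[visitor] for visitor in totals}
```

A visitor that fetches only pages scores 1.0, while one that also fetches the images, stylesheets and scripts for each page scores much lower.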

Bots could suck down webpages from a server so fast it would shut down the server to anyone else, he said. That might prompt the site administrator to block them.

So bots take turns. “They say, ‘Hey, give me a page,’ then they may crawl a thousand other sites taking one page from each,” Wendt said. “Or they might just sit there spinning their wheels for a second, waiting, and then they’ll say, ‘Hey, give me another page.’”

Some behavior is ‘bursty’

A browser goes after only one page at a time, but it wants all the images, code and layout files for that page right away. “I call that a burst,” he said. “A browser is bursty; a crawler is not bursty.” A burst is a certain number of visits within a certain number of seconds.

Ninety percent of declared bots had no bursts and none had a high burst ratio, while sixty percent of nulls also had no bursts, lending credibility to Wendt’s classification of them as bots.

The other forty percent of nulls showed some bursty behavior, which makes them hard to separate from browsers. However, normal browser behavior also falls within set parameters, and when Wendt combined both metrics, most nulls fell outside those parameters.
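One way to turn “a certain number of visits within a certain number of seconds” into a number is sketched below in Python. The article gives neither threshold nor the exact definition of a burst ratio, so the function name, the defaults of three requests and two seconds, and the choice to report the fraction of requests that fall inside a burst are all assumptions.

```python
def burst_ratio(timestamps, min_requests=3, window_seconds=2.0):
    """Fraction of one visitor's requests that fall inside a burst.

    A burst here is any run of at least `min_requests` requests whose first
    and last timestamps are at most `window_seconds` apart. Both thresholds
    are placeholder assumptions.
    """
    times = sorted(timestamps)
    in_burst = [False] * len(times)
    start = 0
    for end in range(len(times)):
        # Shrink the window from the left until it spans at most window_seconds.
        while times[end] - times[start] > window_seconds:
            start += 1
        if end - start + 1 >= min_requests:
            for i in range(start, end + 1):
                in_burst[i] = True
    return sum(in_burst) / len(times) if times else 0.0
```

With these defaults, a visitor that requests one page every ten seconds scores 0.0, and a visitor that fetches a page together with its images and scripts in under a second scores close to 1.0.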
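Combining the two metrics might look like the following Python sketch. It takes the HTML percentage and the burst ratio as plain numbers between 0 and 1. The cut-off values and function names are assumptions chosen for illustration; the article says only that normal browser behavior falls within set parameters, without giving them.

```python
def behavior_label(html_ratio, burst_ratio, max_html_ratio=0.5, min_burst_ratio=0.5):
    """Label behavior using two thresholds (both are illustrative assumptions)."""
    if html_ratio <= max_html_ratio and burst_ratio >= min_burst_ratio:
        return "browser-like"
    return "bot-like"


def claim_mismatch(claimed_kind, html_ratio, burst_ratio):
    """True when a visitor claims to be a browser but its traffic looks bot-like."""
    return (
        claimed_kind == "browser"
        and behavior_label(html_ratio, burst_ratio) == "bot-like"
    )
```

A mismatch is a reason to look more closely, not proof of bad intent.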

That left browsers who behaved like bots. “Now, are all these people lying to me? No. There could be reasons somebody would fall into this category and still be a browser,” he said. “But it distinctly increases suspicions.”

Unlike physical addresses, IP addresses can change, so he accounted for that. For example, when you plug your laptop into the Internet at a coffee shop, you are assigned an IP address. When you are done, someone else will use that IP address after you, so an IP address doesn’t necessarily distinguish users.

Another identifier is the user agent string, which reflects a particular browser on a particular operating system; there are thousands of distinct strings.

IP addresses and user agent strings can collide, but Wendt said odds are dramatically lower that two people will collide on the same IP address and user agent string within a short period such as a day. That led to a conclusion that they’re probably different people.

He still needs to bridge the gap between splitting groups and identifying targets of malicious emails. He has submitted proposals to continue the research after the current funding ends this spring.
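A simple way to act on this is to key each log record by IP address, user agent string and calendar day, as in the Python sketch below. The record layout, the function names and the use of UTC days are assumptions for illustration. Requests that share a key are grouped as one visitor on the premise, taken from the paragraph above, that collisions within a day are unlikely.

```python
from datetime import datetime, timezone


def visitor_key(ip_address, user_agent, unix_time):
    """Combine IP address, user agent string and UTC calendar day into one key."""
    day = datetime.fromtimestamp(unix_time, tz=timezone.utc).date().isoformat()
    return (ip_address, user_agent, day)


def group_by_visitor(records):
    """records: iterable of (ip_address, user_agent, unix_time, url) tuples."""
    groups = {}
    for ip_address, user_agent, unix_time, url in records:
        key = visitor_key(ip_address, user_agent, unix_time)
        groups.setdefault(key, []).append(url)
    return groups
```

Under this scheme, the same IP address with a different user agent string, or seen on a different day, is treated as a separate visitor.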

“The problem is significant,” he said. “Humans are one of the best avenues for entering a secure network.”


