What Is Crawling & How Do Search Engine Bots Work Together?

We all know what search engines are, but very few of us know how search engine bots actually work. Understanding this is worthwhile, so this blog covers search engine bots in detail by discussing the following points:

  1. What is crawling, and why is it the prime function of a search engine?
  2. How do search engine bots work efficiently?

So, let’s address the first point:

What is crawling? Why is it the prime function of a search engine?

Crawling is where the acquisition of data about a website begins. It involves scanning the website and gathering details about each and every page: the images it contains, where its links point, its page layout, and so on.

Let’s look at how a website is crawled:

An automated bot, also known as a ‘spider’, visits pages one after another, using the links on each page to decide where to go next. The search engine giant Google’s spiders can read many thousands of pages every second.

Whenever the web crawler visits a page, it gathers all the links on that page and adds them to the list of pages to visit next. It then moves on to the next page on the list, gathers its links, and repeats the process. Crawlers also periodically revisit pages they have already crawled to check whether anything has changed.

In other words, any site that is linked from an already indexed site will eventually be crawled. Crawling also varies: some sites are crawled frequently, others less often, and some are crawled to greater depth.
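To make the idea concrete, here is a minimal, hypothetical sketch of that visit-gather-repeat loop in JavaScript. It only illustrates the logic described above, not any search engine’s real code, and fetchLinks() is an assumed helper standing in for whatever downloads a page and extracts its links:

// Hypothetical sketch of the crawl loop; fetchLinks() is an assumption,
// not a real API: it downloads a page and returns the links it contains.
var toVisit = ['https://example.com/'];        // start from a known page
var visited = {};

function crawl(fetchLinks) {
  while (toVisit.length > 0) {
    var url = toVisit.shift();                 // take the next page off the list
    if (visited[url]) continue;                // skip pages already crawled
    visited[url] = true;
    fetchLinks(url).forEach(function (link) {  // gather every link on the page
      if (!visited[link]) toVisit.push(link);  // queue new links for a later visit
    });
  }
}

Real crawlers also schedule the revisits mentioned above and decide how deep to go, but the list-driven loop has the same basic shape.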

Example: think of the World Wide Web as a chain of showrooms in a mall.

This example makes crawling easier to understand. Each showroom is a unique document: a web page, a JPG, a PDF, and so on. The search engine needs a way to “crawl” the entire mall and find every shop along the way, so it follows the links between them as the best possible paths.

  • Crawling and indexing – discovering and cataloguing the millions of pages, documents, files, videos, news items and other media on the World Wide Web.
  • Providing answers – answering users’ queries with a list of pages that have been searched and ranked for relevance.

How do search engine bots work efficiently?

The web’s link structure binds all of its pages together.

  • Millions of links enable the automated robots of search engines (usually called ‘crawlers’ or ‘spiders’) to reach the vast number of interconnected documents on the web.
  • Once the search engines find these pages, they read the code on each one and store selected pieces of it in huge databases, so that it can be recalled later whenever a search query needs it (see the sketch after this list).
  • This can be done in a fraction of a second because the search engines have built enormous data centres worldwide.
  • These storage facilities hold a huge number of machines that process vast amounts of information in fractions of a second.
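As a rough illustration of that “store selected pieces for quick recall” idea, here is a tiny, hypothetical inverted index in JavaScript. It is nothing like a real search engine’s storage, but it shows why a query can be answered by a lookup instead of re-reading every page:

// Map each word to the list of pages that contain it.
var invertedIndex = {};

function indexPage(url, text) {
  text.toLowerCase().split(/\W+/).forEach(function (word) {
    if (!word) return;                                        // ignore empty splits
    if (!invertedIndex[word]) invertedIndex[word] = [];
    if (invertedIndex[word].indexOf(url) === -1) invertedIndex[word].push(url);
  });
}

function search(word) {
  return invertedIndex[word.toLowerCase()] || [];             // instant lookup
}

indexPage('https://example.com/', 'Crawling is where data acquisition begins');
search('crawling');                                           // ["https://example.com/"]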

When someone searches on Google or any other search engine, the results appear almost instantly. Users feel dissatisfied if there is even a one- or two-second delay, so the engine has to respond to queries as fast as possible.

Finding answers in search engines

To respond instantly, a search engine must be unbelievably fast: it has to get through millions of web pages in a fraction of a second. To do this it uses software robots, the ‘spiders’ mentioned above, which build long lists of the words found on websites; this list-building process is what we call web crawling. A spider typically starts with the most popular websites, indexes the words on each page, follows every link it finds, and travels as far as it can in search of answers.

Spiders start their search from the most heavily used servers and the most popular pages; in other words, they begin with popular, relevant content.

How spiders initiate the search

Google runs a dedicated server that supplies URLs to its spiders. Rather than relying on an internet service provider for DNS (the system that translates a server’s name into a numeric address), Google operates its own DNS to minimise delays.

The Google spider focused on two things when it read an HTML page:

  • The words on the page, and
  • Where those words were located.

The Google spider was designed to index every word on a page except ‘a’, ‘an’ and ‘the’. Initially, each spider could keep around 300 connections to web pages open at the same time; this was later increased, and multiple spiders now run in parallel during a single crawl. Other search engines’ spiders take different approaches. AltaVista, for example, went in the opposite direction and indexed every word, including ‘a’, ‘an’ and ‘the’.
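As a hedged sketch of that early behaviour (an illustration only, not Google’s actual code), indexing every word except the stop words while remembering where each word appears might look like this:

var STOP_WORDS = ['a', 'an', 'the'];

// Index every word on a page except the stop words, recording the
// position at which each word appears.
function indexWords(text) {
  var index = {};
  text.toLowerCase().split(/\W+/).filter(Boolean).forEach(function (word, position) {
    if (STOP_WORDS.indexOf(word) !== -1) return;   // leave out 'a', 'an', 'the'
    if (!index[word]) index[word] = [];
    index[word].push(position);                    // remember where the word sits
  });
  return index;
}

indexWords('The spider indexes the words on a page');
// { spider: [1], indexes: [2], words: [4], on: [5], page: [7] }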

Over the years, engineers have found many better ways to return precise results for every kind of query. Search engines use a vast number of algorithms (mathematical formulas) to pick out the relevant results and then rank them according to their popularity and relevance.

Conclusion

Search engines are a blessing: they answer users’ queries in a fraction of a second. Knowing how their bots work is both useful and interesting. Behind the scenes it is a complex system, far less simple than the effortless experience of getting quick search results suggests.


Should I Open Source Captivate Customizations?

Hello Community,

In my previous blog I mentioned how I am using Captivate from a web developer’s point of view.  As I move more in this direction, focusing on mobile and measuring learning with Google Analytics, the tools are getting more complicated and exciting.

For example:

  • I am successfully getting PhoneGap plugins to work when you “publish for devices” to PhoneGap Build.
  • I am working on getting Firebase Authentication to function so a user can log in with Google, Facebook etc., so we can store custom variables on Google Firestore and retrieve them across multiple devices.
  • I will be offering these Captivate generated Apps via the app stores for download to mobile devices.
  • Updates to content will be automatic, because I am hosting mobile-friendly web pages on Google Cloud Platform. They are wrapped in Captivate with web objects.

Here are my questions: If I publish all these customizations on GitHub with an open source license, would you as community members be interested in contributing to the project? Can you help me maintain the GitHub repo? Do you see any drawbacks to sharing this information? Would you be interested in collaborating?

Thanks,

Brian

 

Reflections on Adobe eLearning Conference 2017

This was the first time I participated in the Adobe eLearning Conference held in Washington D.C., and am I glad that I did. I understand this was the second time this conference was organized. If I’m not wrong, close to 600 people attended this one, in contrast to about 200 last year (I’m told).

I’m someone with one foot in online learning space and the other firmly planted in technical writing. I believe that, no matter how you slice it, creating technical “documentation” is, in essence, teaching and learning technical information. So in that sense I see a great overlap between the eLearning and TechDoc spaces. That’s why I attended this great conference as a Captivate and Presenter Express user.

First off, I was very happy to meet in person some of the people that I’ve been following for a while on the Internet; truly exceptional elearning professionals like Paul Wilson, Joe Ganci, Mark Lassoff, Anita Horsley, Phil Cowcill, Nancy Reyes, Damien Bruyndonckx, Dr. Pooja Jaisingh, Dr. Allen Partridge, and others. It was my good luck to end up at the same breakfast table with most of them. BTW, talking about the breakfast – the organizers should be applauded for the super breakfast, lunch, snacks and beverages they’ve provided. Hosting (by Carahsoft) was first class and everything was free as well (including my lovely Adobe t-shirt). Thanks Adobe!

My only problem with this conference was this: there was just too much to learn and attend to, since there were five tracks of presentations lasting the whole day. So unfortunately I ended up attending only 20% of what was going on around me. I enjoyed some wonderful presentations about mobile elearning, principles of sound elearning design, how to use videos properly in elearning, how to design a great learning system in this “attention economy,” etc.

The presentation delivered during the closing session by Tridib Roy Chowdhury (G.M. & Sr. Director of Products at Adobe) was very thought-provoking as well.

In his presentation Chowdhury forwarded provocative themes like “How to Out-Google Google.” It was evident that he touched on a current concern (i.e., not enough people are using LMS for their learning) shared by most within the eLearning community but I had and still have my reservations on that score.

The difference between my point of view and Chowdhury’s was evident from the outset, when he opened his presentation by asking for a show of hands about the different venues and sources of learning that the conference participants had used recently. To me it was telling that books were not included in that list! I love things digital obviously and I’m on the Internet most of the day for all kinds of learning and information gathering. But the omission of a source as obvious as the good-old book made me wonder if we were relying on a somewhat lopsided paradigm, with too much emphasis on things digital.

I believe people use different platforms and sources for different kinds of learning. Yes, I do use Google to access quick info on specific inquiries. I do not go to my local library anymore for straight-forward fact finding. But guess what, when I’d like to learn the life of a great scientist, for example, I still search for a good book written by an established writer who did a lot of research on that topic. I do not expect to learn as much and at that depth from Google for that specific mission.

Another example: when I want to learn the best way to visit San Francisco or Paris, I’d check neither Google nor my local library. Instead, I’d call a friend or a relative who had been there. I’d try to learn from them all the things I should pay attention to once I’m there. They would be my first source of information for that specific case. After that I wouldn’t mind checking out Google as well.

Similarly, for me eLearning has a special and very valuable place. When I want to go through a new topic in a step-by-step fashion, with all sub-topics organized from easy to difficult, while interacting with the material and testing myself along the way, I want nothing but a good eLearning module with links, pictures, videos, quizzes, badges, scores, and all that good stuff. eLearning does not need to “out-Google Google” since the two are designed to accomplish different tasks and serve different benefits. I’m still meditating on this issue but so far my conclusions have not changed. I’d like to thank Chowdhury all the same for his seminal presentation even though I do not agree with him totally on all of his points.

I don’t know where the 3rd conference will be held next year, but I’ll try to be there. I guess it should not be that hard to find the date and location on the Google.

P.S. If you’d like to read a great book on the life of Enrico Fermi, I heartily recommend “The Pope of Physics: Enrico Fermi and the Birth of the Atomic Age” by Bettina Hoerlin and Gino Claudio Segre. When you finish it you will be automatically out-Googling the Google since you can’t read this stuff on Google.

Using custom fonts in Adobe Captivate

The visual impression of layout and typeface can be profound — something that impacts you long before the actual recognition of meaning. Like art, your impression is immediate and yet mostly unconscious.

The use of custom fonts is now commonplace on the web, as a critical element of that visceral first impression. This has been primarily driven by the ability in HTML5 to load fonts from external sources as part of the page, rather than from the operating system. Google provides a vast set of such “web fonts” that can be integrated onto a page with just a few lines of HTML, completely for free.

In HTML5-based Captivate modules, it has been common practice to limit ourselves to fonts guaranteed to be available in all browsers: Arial, Courier New, Georgia, Helvetica Neue, Times New Roman, Trebuchet MS, and Verdana. Captivate highlights those fonts as “web fonts” in the text styling menu. The more numerous “System Fonts”, which are listed next, are not recommended for web deployment, as they only display correctly on systems that have those fonts pre-loaded into the operating system (Windows/iOS). Further, until Captivate added the “responsive” design capability, fonts could look terrible when viewed at non-standard screen resolutions. We use the responsive project type exclusively to avoid this problem.

In this short article, we will see how to both display Google web fonts during the design process and ensure that they will be seen by the user. The strategy is remarkably simple. It does not require any “post-processing” or modification of the published Captivate modules, but it does assume that the user is viewing the module in a web browser with access to the internet, because the font data is pulled from Google’s servers when the module is loaded.

Loading Google fonts into Captivate

Google fonts can be found here. There are hundreds of fonts to choose from, and it is easy to get lost just poking around the different styles. We will use the beautiful Calligraffitti for this example. After selecting the font, you can open a small window that provides the information needed for linking to, and downloading, the font files. To add the font to Captivate, download the zipped font file and install it as a system font (Windows/iOS). In Windows, this simply means unzipping the file, selecting the .ttf files, and right-clicking to open each one in the font preview window, which has a button to install the font.

Google also provides the URL information needed to link to the cloud version of the font in the “import” section after selecting the font to download. Copy this for use later:

<link href="https://fonts.googleapis.com/css?family=Calligraffitti" rel="stylesheet">

The next time Captivate is opened, the Calligraffitti font will show up as a system font. Any text styled with the font will render correctly. Even “previewing” and “publishing” will display the font, as it is now installed on your system. This is misleading: if your users don’t have the Calligraffitti font loaded on their system, they will not see it. The next step is what makes the font work in general.

Loading Google fonts as part of the module

Ensuring that any user will see this font turns out to be remarkably simple. By adding the following lines of code to the first page of your presentation using “On Enter Execute JavaScript”, the browser will load the font directly from Google when the module is opened:

// Build the same <link> tag Google provides, then append it to the page's
// <head> so the browser fetches the font when the module opens.
var fontLink = '<link href="https://fonts.googleapis.com/css?family=Calligraffitti" rel="stylesheet">';
$(fontLink).appendTo("head");

Since browsers cache (hold on to) font files, the font is retrieved from Google’s servers only once. Note that this technique doesn’t require modifying any of the published files; all of the magic happens in the JavaScript.

As you can imagine, you can use this technique to load as many different fonts as you like. Any font styling that you see working in Captivate, such as bold or italic, will also work when published. Although we’ve shown how to load Google fonts, any other cloud-based font that has a downloadable system version can be used the same way.
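For example, the Google Fonts CSS API lets a single link request more than one family at once. A small sketch (the second family, Lobster, is just an arbitrary choice for illustration):

// One <link> requesting two font families from Google; Lobster is only an example.
var fontLink = '<link href="https://fonts.googleapis.com/css?family=Calligraffitti|Lobster" rel="stylesheet">';
$(fontLink).appendTo("head");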


There are a few other cases that can show up:

  • When users are not guaranteed to be connected to the internet when they open the module, or the connection to Google is very slow.
  • You wish to use a font that doesn’t have a corresponding system font that can be loaded into the OS.

In the first case, it is possible to download the web font files and either include them in the Captivate module after publication or place them in a separate directory on the module’s server. By changing the <link> href address, you point the browser at those files. Both of these options are more “fragile”, in that they depend on the directory structure where the module is loaded.
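As a hedged sketch of that self-hosted option (the fonts/ folder and calligraffitti.css file names here are hypothetical; use whatever location you actually choose), the “On Enter” code simply points at the local stylesheet instead of Google’s:

// Assumes the downloaded font files, plus a small CSS file containing the
// matching @font-face rule, sit in a hypothetical "fonts/" folder next to
// the published module.
var fontLink = '<link href="fonts/calligraffitti.css" rel="stylesheet">';
$(fontLink).appendTo("head");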

Using fonts that do not have a corresponding system font

In the second case, although you would not be able to see the font inside Captivate, it is possible to force a particular shape to display its text using a web font that Captivate doesn’t have built in. We do this using a “named” shape: shapes that have a unique ID or “shape name” can be forced to use a specific font. The following code shows how:

var shapeID = "callig";

var fontLink = '<link href="https://fonts.googleapis.com/css?family=Calligraffitti" rel="stylesheet">';
$(fontLink).appendTo("head");

domStyle = $('<style> .' + shapeID + ' .cp-actualText {font-family: "Calligraffitti regular", Calligraffitti !important ;} </style>');
domStyle.appendTo('head');

$('[id^=re-'+shapeID+']').addClass(shapeID);

Any shape whose shape name starts with the letters “callig” will get this font. This allows multiple shapes with names like callig_1, callig_2, and so on. Since Captivate tends to create shape IDs using this convention automatically, duplicating shapes will keep the font the same.

Note that this technique is ONLY needed when you want to use a web font for which there is no system font you can simply add to Captivate, or when you prefer not to install one.


Extra Credit

For folks curious about why the above code is needed: Captivate doesn’t actually create the text elements until they are “played”, and the elements themselves carry their font information, so the font has to be changed in a more indirect way.

This technique first finds the “wrapper” DIV where the text will be placed by looking at its ID. A custom class is added to the DIV to signal that text in this wrapper will have a different font. Next, a style is added that targets any text-carrying elements in that class; text elements in Captivate all have the class “cp-actualText”. In this style, the font is specified and overrides the default font using the “!important” flag.

Is LinkedIn still relevant?

I have a LinkedIn account and profile – here it is: https://uk.linkedin.com/in/davidmhopkins

I think it’s OK – nothing special, nothing outstanding. I’ve put a little effort into making it what it is, making sure it’s up to date, professional, and that I have appropriate and relevant connections. I am fully aware of how this ‘shop window’ into my work can work for or against me at any time, even when I’ve been ignoring it for months on end.

Those who know me will know that I moved from Bournemouth University to the University of Leicester in 2012, and again on to the University of Warwick in 2014. I am certain that my online professional persona was used as part of the interview/hiring process (let’s face it, they’d have missed a trick if they didn’t use it!), as well as my CV and application forms: my Twitter feed, my LinkedIn profile, my (under-used) Google+ stream, SlideShare presentations, published books, etc.

This is why it’s important to spend a little time keeping your profile up to date, trimming your connections (or not accepting those you don’t know in some way), posting updates and projects, etc.

This LinkedIn Snakes and Ladders from Sue Beckingham is just perfect for anyone who has a LinkedIn profile, student or staff. Sue makes important suggestions about what will help or hinder your profile, like adding projects, publications, and a professional photo (help) or sharing trivia and posting insensitive or unprofessional updates (hinder).


My question is, do we still need LinkedIn? Are those of us who are active elsewhere (Twitter, Facebook, Google, blogs, etc.) doing enough already, or do we need this ‘amalgamator’ that is LinkedIn to pull our work together? Do you use LinkedIn to find out about people you encounter?

Note: I don’t use LinkedIn Premium. Does anyone?

Image source: Patrick Feller (CC BY 2.0)

MOOCs and ‘facilitation’

What are your thoughts on this – moderation and/or facilitation of MOOCs?

Considering the time, effort, and cost of developing these free courses (more information is available here or here or here, among other sources), what are your thoughts on how we manage the course, the comments and discussion during the run, and the subsequent comments and discussion during re-runs?

Do you have support staff, from technical and/or academic backgrounds, monitoring the course to keep comments on track and answer pertinent questions? Are these paid positions or part of an existing role? Do you actively check the comments? If so, what for, why, and what do you do?

Do you design-in an element of real-time collaboration on the course (facilitation of discussion, round-up videos, Google Hangouts, etc.), and if so are these sustainable over multiple runs of the course? If you’ve done these before, but then designed them out of the course for re-runs, why?

All comments and feedback welcome – I’m trying to understand how we move MOOCs forward and maintain institutional ‘control’ where there is little (financial) reward.

Image source: Greg Johnston (CC BY-NC-ND 2.0)

Reading list: November 14th, 2015

Like many in my line of work (eLearning, Educational or Learning Technology) I read. I read a lot. I read books, articles, blog posts, journals, etc. I sometimes tweet them, I sometimes print them, I sometimes actually finish them. Sometimes I bin them (those are the ones I wished I’d not started).

I’m going to try and keep a diary (a web-log, or blog, if you like) of the more important, or rather more interesting things I read, at or for work. I’ve also decided to up my game regarding my general reading habits … here is my side project of books I’m reading outside of a work setting.

So far this month I’ve been reading, in no particular order: 

What about you, what have you read that has meant something or that you’ve identified with?

Image source: Bernal Saborio (CC BY-SA 2.0)

Every Classroom Matters: How Teachers Can Self-Publish Books #edtechchat

Earlier this year I was invited to share my experiences of self publishing my work as eBooks with Vicki Davis (@coolcatteacher) on the Every Classroom Matters podcast, broadcast through the BAM Radio Network.

David Hopkins is a leading and respected Learning Technologist from the UK. He earned the award of Highly Commended Learning Technologist of the Year from the Association for Learning Technology (ALT) in 2014, and is the author of several books on and around learning technology and the roles of Learning Technologists. His most recently self-published work is ‘The Really Useful #EdTechBook’, whose ‘mix of academic, practical and theoretical offerings is a useful recipe book for any Learning Technologist’, according to Steve Wheeler, Associate Professor of Learning Technology, University of Plymouth.

In the recording we discuss the process and purpose of writing a book, the details of getting from Word to MOBI or EPUB files, the value and difficulties of different publishing platforms, etc. Here are some links to support it:

I’d be happy to chat and answer any questions you have – leave a comment below, or contact me on Google+ or Twitter.

Ideas for Sketchnotes

I’ve not had many opportunities to sketchnote in the past few months, but I keep the practice up by trying different styles, different fonts, different ‘thinking’ and concepts.

But what I love is seeing what other people do and seeing if I can adapt and include it in my own ‘repertoire’. So, where can you find ideas … try Google Image search!

A quick and easy (and very basic) search using terms like ‘people sketchnote‘ or ‘font sketchnote‘ or ‘sketchnote writing‘ or ‘sketchnote faces‘ will give you some wonderful results.

Go on, try it and share your sketches:

Gearing up for #ALTC 2015

So, with only two weeks to go before this year’s ALT conference (ALTC), it’s time to start making sense of the programme and sessions, see what’s happening and when, and then try to work out how to be in several places at once.

So, after a first pass at the ALTC programme, here are my plans, subject to change once I’ve spent more time reading the abstracts and changed my mind. I think I may need to compare notes with someone who can get to some of the sessions I miss?

ALTC 2015 Programme

Another way I’m getting ready for ALTC is making sure I have the necessary ‘stuff’ around me, and working, now, so I won’t be rushing in the days beforehand. Perhaps the most important thing is to have enough power with me for my phone and tablet, so I’ll be taking a wall charger as well as an Anker Astro Mini battery.

For notes and sketchnotes I’ll be taking both my old, not-quite-full notebook I’ve used at previous events and my new ALT Moleskine notebook (thank you ALT!).

As always I’ll really enjoy the sessions as well as catching up with old friends, and making new ones .. and meeting ‘virtual’ friends for the first time. So please come and say hello, either in the sessions or in the down-time between (and at the evening events!)!

Big question .. how many sketchnotes can I get this year? Comments?

Image source: Mike Kniec (CC BY 2.0)