WR#10 – “Finally Surfing…”

August 12, 2010

Summer (of code) is almost over, but there are still a couple more weeks
before studies start. So, while others prefer safari[1], i've decided
to go to sea, have some fun surfing[2], experiment with wave[3],
and enjoy the rest of it. There are still some exceptions[4]
with floating operations[5] (basically, not everything that floats is a
point), but besides that everything looks fine, and feeding[6] is not
that bad. For those who like site-seeing[7], i've managed to take a
few pix[8]. They are probably not the best ones, but at least there are
no watermarks[9].

P.S. Here is a list of things already done. I plan to spend much
more time on Coral CDN[10], torrents[11], gnutella2[12] and other
things, so it all looks more like a beginning than an end.

IDFetch progress:
=================

1) Added WebUI[7]

2) Added RSS-feed[6]

3) Added Statistics[13,14]

4) Replaced tuiclient argument: --wait-distfile=df_name
by --wait-distfiles=df_name1,df_name2,...,df_nameN

Tuiclient waits until all distfiles are downloaded.
If at least one of the specified distfiles is not in the queue tuiclient
will exit immediately.

5) Added settings for colors in tuiclient:
[colors]
# Define color scheme for tuiclient.
# Available colors are:
# COLOR_BLACK
# COLOR_RED
# COLOR_GREEN
# COLOR_YELLOW
# COLOR_BLUE
# COLOR_MAGENTA
# COLOR_CYAN
# COLOR_WHITE
# Defaults:
# when tuiclient connected to seggetd
color_distfile_added_connected_fg=COLOR_WHITE
color_distfile_added_connected_bg=COLOR_BLACK
color_distfile_waiting_connected_fg=COLOR_BLUE
color_distfile_waiting_connected_bg=COLOR_BLACK
color_distfile_script_rejected_connected_fg=COLOR_YELLOW
color_distfile_script_rejected_connected_bg=COLOR_BLACK
color_distfile_downloading_connected_fg=COLOR_CYAN
color_distfile_downloading_connected_bg=COLOR_BLACK
color_distfile_downloaded_connected_fg=COLOR_GREEN
color_distfile_downloaded_connected_bg=COLOR_BLACK
color_distfile_failed_connected_fg=COLOR_RED
color_distfile_failed_connected_bg=COLOR_BLACK
color_distfiles_window_connected_fg=COLOR_WHITE
color_distfiles_window_connected_bg=COLOR_BLACK
color_scroll_window_connected_fg=COLOR_GREEN
color_scroll_window_connected_bg=COLOR_BLACK
color_downloads_connected_fg=COLOR_WHITE
color_downloads_connected_bg=COLOR_BLACK
color_info_connected_fg=COLOR_BLACK
color_info_connected_bg=COLOR_CYAN
color_status_connected_fg=COLOR_GREEN
color_status_connected_bg=COLOR_BLACK

# when tuiclient disconnected from seggetd
color_distfile_added_disconnected_fg=COLOR_WHITE
color_distfile_added_disconnected_bg=COLOR_BLACK
color_distfile_waiting_disconnected_fg=COLOR_WHITE
color_distfile_waiting_disconnected_bg=COLOR_BLACK
color_distfile_script_rejected_disconnected_fg=COLOR_WHITE
color_distfile_script_rejected_disconnected_bg=COLOR_BLACK
color_distfile_downloading_disconnected_fg=COLOR_WHITE
color_distfile_downloading_disconnected_bg=COLOR_BLACK
color_distfile_downloaded_disconnected_fg=COLOR_WHITE
color_distfile_downloaded_disconnected_bg=COLOR_BLACK
color_distfile_failed_disconnected_fg=COLOR_WHITE
color_distfile_failed_disconnected_bg=COLOR_BLACK
color_distfiles_window_disconnected_fg=COLOR_WHITE
color_distfiles_window_disconnected_bg=COLOR_BLACK
color_scroll_window_disconnected_fg=COLOR_WHITE
color_scroll_window_disconnected_bg=COLOR_BLACK
color_downloads_disconnected_fg=COLOR_BLACK
color_downloads_disconnected_bg=COLOR_WHITE
color_info_disconnected_fg=COLOR_WHITE
color_info_disconnected_bg=COLOR_BLACK
color_status_disconnected_fg=COLOR_BLACK
color_status_disconnected_bg=COLOR_RED
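
A minimal sketch of how a client could look up these settings, assuming the [colors] section is read with Python's configparser. The helper name color_pair and the shortened sample are hypothetical and are not taken from tuiclient's sources.

```python
import configparser

# Shortened sample of the [colors] section shown above.
SAMPLE = """
[colors]
color_distfile_failed_connected_fg=COLOR_RED
color_distfile_failed_connected_bg=COLOR_BLACK
color_distfile_failed_disconnected_fg=COLOR_WHITE
color_distfile_failed_disconnected_bg=COLOR_BLACK
"""

VALID_COLORS = {
    "COLOR_BLACK", "COLOR_RED", "COLOR_GREEN", "COLOR_YELLOW",
    "COLOR_BLUE", "COLOR_MAGENTA", "COLOR_CYAN", "COLOR_WHITE",
}


def color_pair(config, element, connected):
    """Return (foreground, background) color names for a UI element."""
    state = "connected" if connected else "disconnected"
    pair = []
    for layer in ("fg", "bg"):
        key = "color_%s_%s_%s" % (element, state, layer)
        value = config.get("colors", key)
        if value not in VALID_COLORS:
            raise ValueError("unknown color: %s" % value)
        pair.append(value)
    return tuple(pair)


config = configparser.ConfigParser()
config.read_string(SAMPLE)
print(color_pair(config, "distfile_failed", connected=True))
```

The same element name is looked up twice, once per connection state, which is why every option above exists in a connected and a disconnected variant.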

[1] Safari http://en.wikipedia.org/wiki/Safari_(web_browser)
[2] Surfing http://www.infoanarchy.org/en/Anonymous_Web_Surfing
[3] Wave http://wave.google.com/about.html
[4] Exceptions http://en.wikipedia.org/wiki/Exception_handling
[5] Floating operations http://en.wikipedia.org/wiki/Floating_point
[6] Feeding http://idfetch.isgreat.org/_content/webui_demo/rss.html
[7] WebUI demo http://idfetch.isgreat.org/_content/webui_demo/
[8] WebUI pix http://idfetch.isgreat.org/_content/webui_demo/distfiles.html
[9] Watermarks http://en.wikipedia.org/wiki/Digital_watermarking
[10] Coral CDN http://www.coralcdn.org/
[11] Torrents http://en.wikipedia.org/wiki/Torrent_file
[12] Gnutella2 http://en.wikipedia.org/wiki/Gnutella2
[13] Stats http://idfetch.isgreat.org/_content/webui_demo/stats.html
[14] Mirrors stats http://idfetch.isgreat.org/_content/webui_demo/mirrors.html

WR#8-9 – “Once upon a time…”

August 5, 2010

Once upon a time there were no computers, and nobody knew what a gnome[1]
and dEmons[2] look like. Today even kids know this, but i still bumped
into a problem: i can not see a dEmon. It all started when i was
trying to play the "Roshambo game"[3] with the segget dEmon.

Firstly, i was trying to fork[4] the curly[5] daemon twice and it kept
punching me in my nose, so i thought my TTL[6] would rapidly decrease.
I understood that it's not such an easy thing to win while fighting with
someone you can not see. And when the daemon obtained Python[7-8] support
and started to spawn[9] zombies[10], i got even more problems.
Conscience was telling me that i must play by the rules, but consciousness
was sure that the daemon doesn't always abide by the protocol[11]. I tried
to follow the thread[12-13], but the dEmon was running like a ghost[13], so
i almost got myself lost in the thicket of logs[14] and trees[15] :(

Anjuta[16] came to my rescue and helped me to improve my tools, so i could
see what the daemon does. Unfortunately, curses[17-18] usually don't work
on dEmons, and i really needed pure magic to win this game. So, i've
learned: "Mutex"[19], "Rainbow Colors"[20] and some other tricks.

In the meantime i found myself knowing more and more about the dEmon,
but this was not enough and i had to prepare good arguments if i was
going to talk to the dEmon. Here they are:

1. For segget daemon:
Command line arguments:
--no-daemon
--conf-dir=specify_conf_dir_here
Arguments are optional. If no arguments are provided, segget will run in
daemon mode and use the /etc/seggetd dir to read configuration files.

2. For request tool:
--pkglist-file
E.g.:
$request --pkglist-file=/home/user/mypkg.list

3. For tuiclient:
--wait-distfile=distfile_name
tuiclient checks distfile status, and returns when distfile is downloaded or not in the queue.
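
The daemon's two arguments and their documented defaults can be modeled with a short sketch. It uses Python's argparse purely for illustration; the function name build_parser and the program name are assumptions, and segget's real option parsing may differ.

```python
import argparse


def build_parser():
    """Model the documented seggetd arguments and defaults."""
    parser = argparse.ArgumentParser(prog="seggetd")
    # Without this flag the documented behavior is to run as a daemon.
    parser.add_argument("--no-daemon", action="store_true",
                        help="stay in the foreground")
    # Documented default configuration directory.
    parser.add_argument("--conf-dir", default="/etc/seggetd",
                        help="directory with configuration files")
    return parser


args = build_parser().parse_args(["--conf-dir=/home/user/segget-conf"])
print(args.no_daemon, args.conf_dir)
```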

Btw, here are the features added to the segget daemon and tuiclient during
this period of time:

1. DAEMON
=========
1.1. Options:
--------------
Add daemon mode to segget
Add /etc/init.d/seggetd script to start|stop|restart|status segget daemon
Check all set checksums, checksums are optional.
Consider distfile failed if one of its segments is failed.
Fixed: if only local mirrors are available and all of them failed to download
a distfile, distfile still had DWAITING status, because attempt_limit wasn't reached.

Add CoralCDN support as an option to network#.conf files (section [mode])

Add options FOLLOW_LOCATION and MAX_REDIRS to network#.conf files

SYNOPSIS: FOLLOW_LOCATION= 0 | 1
A parameter set to 1 tells segget to follow any Location: header that the server
sends as part of an HTTP header. This means that segget will re-send the
same request on the new location and follow new Location: headers all the way
until no more such headers are returned. MAX_REDIRS can be used to limit the
number of redirects segget will follow.
Default:
follow_location=1

MAX_REDIRS
The set number will be the redirection limit. If that many redirections have
been followed, the next redirect will cause an error. This option only makes
sense if the FOLLOW_LOCATION is used at the same time.
Setting the limit to 0 will make segget refuse any redirect.
Minimum value: 0
Maximum value: 100
Default:
max_redirs=5
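
How the two options interact can be shown with a small simulation. This is only a sketch: the redirects are a plain dictionary instead of real HTTP responses, and the names resolve and TooManyRedirects are made up for the example.

```python
class TooManyRedirects(Exception):
    """Raised when one more redirect would exceed max_redirs."""


def resolve(url, redirects, follow_location=1, max_redirs=5):
    """Follow Location: targets from `redirects` until none is left."""
    followed = 0
    while url in redirects:
        if not follow_location:
            return url  # redirects are not followed at all
        if followed >= max_redirs:
            raise TooManyRedirects(url)  # the next redirect is an error
        url = redirects[url]
        followed += 1
    return url


hops = {"http://a/": "http://b/", "http://b/": "http://c/"}
print(resolve("http://a/", hops))
```

With max_redirs=0 the very first redirect raises the error, which matches "Setting the limit to 0 will make segget refuse any redirect".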

Add BIND_LOCAL_PORT and BIND_LOCAL_PORT_RANGE options to network#.conf files

BIND_LOCAL_PORT
This sets the local port number of the socket used for connection. This option
can be used in combination with BIND_INTERFACE and you are recommended to
use BIND_LOCAL_PORT_RANGE as well when this is set. Set to 0 to disable
binding. Valid port numbers are 1 - 65535.
Minimum value: 0 (no binding)
Maximum value: 65535
Default:
bind_local_port=0

BIND_LOCAL_PORT_RANGE
If BIND_LOCAL_PORT=0 this option will be ignored.
This is the number of attempts segget should make to find a
working local port number. It starts with the given BIND_LOCAL_PORT and adds
one to the number for each retry. Setting this to 1 or below will make segget
do only one try for the exact port number. Port numbers by nature are scarce
resources that will be busy at times so setting this value to something too
low might cause unnecessary connection setup failures.
Minimum value: 1
Maximum value: 65535
Default:
bind_local_port_range=20
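
A sketch of which local ports would be tried for a given pair of settings. The function name candidate_ports is hypothetical, and stopping at port 65535 is an assumption; the text above only says that valid ports end there.

```python
def candidate_ports(bind_local_port, bind_local_port_range=20):
    """List the local ports that would be tried, in order."""
    if bind_local_port == 0:
        return []  # binding disabled, the range is ignored
    # A range of 1 or below means a single try for the exact port.
    attempts = max(1, bind_local_port_range)
    # Assumption: never go past the highest valid port number.
    last = min(bind_local_port + attempts - 1, 65535)
    return list(range(bind_local_port, last + 1))


print(candidate_ports(50000, 3))
```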

Add option proxy_type to network#.conf files

SYNOPSIS: PROXY_TYPE = 0 | 1 | 2 | 3 | 4 | 5
0 - HTTP
1 - HTTP_1_0
2 - SOCKS4
3 - SOCKS4a
4 - SOCKS5
5 - SOCKS5_HOSTNAME
Specify type of the proxy.
Default:
proxy_type=0

1.2. Proxy-fetcher
------------------
Implement checks for both (proxy_fetcher and request_server) queues.

There're 2 queues: proxy_fetcher queue and request_server queue.

Note: Segget processes request_server queue first and if no segment was
chosen switches to proxy_fetcher queue.

Before adding a distfile to any of the queues it's necessary to
check both queues, since distfile may already be in one of them.
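
The two rules above (check both queues before adding, serve request_server first) can be sketched as follows. The class and method names are hypothetical, and the sketch works on whole distfiles, while segget itself picks segments.

```python
from collections import deque


class DistfileQueues:
    # request_server is listed first because it is served first.
    ORDER = ("request_server", "proxy_fetcher")

    def __init__(self):
        self.queues = {name: deque() for name in self.ORDER}

    def add(self, queue_name, distfile):
        """Enqueue a distfile unless one of the two queues already holds it."""
        if any(distfile in queue for queue in self.queues.values()):
            return False
        self.queues[queue_name].append(distfile)
        return True

    def next_distfile(self):
        """Take from request_server first, then from proxy_fetcher."""
        for name in self.ORDER:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


queues = DistfileQueues()
queues.add("proxy_fetcher", "bash-4.1.tar.gz")
queues.add("request_server", "bash-4.1.tar.gz")  # duplicate, ignored
queues.add("request_server", "curl-7.21.0.tar.bz2")
print(queues.next_distfile())
```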

1.3. Python scripting
---------------------
Add [scripting_and_scheduling] section to segget.conf file.
[scripting_and_scheduling]
Segget provides Python scripting functionality to support scheduling.
Each time segget tries to start a new connection via a certain network, it
calls a python script (client.py) to accept or reject this connection and,
if necessary, adjust its settings.

PYTHON_PATH
Define path to python
Default:
python_path=/usr/bin/python

SCRIPTS_DIR
Define a path to the dir with python scripts. Before establishing a connection
for a particular segment via network#, segget checks SCRIPTS_DIR.
If SCRIPTS_DIR contains a net#.py file, segget will launch the schedule()
function from this file to apply settings for the connection and accept or
reject this segment for the moment. The net#.py file is a python script file
with a user-written schedule() function.
It's necessary to import functions before using get("variable"),
set("variable",value), accept_segment() and reject_segment() in schedule().
get() function can obtain values for the following variables:
connection.num, connection.url, connection.max_speed_limit,
network.num, network.mode, network.active_connections_count,
distfile.name, distfile.size, distfile.dld_segments_count,
distfile.segments_count, distfile.active_connections_count,
segment.num, segment.try_num, segment.size, segment.range
set() function can change connection.max_speed_limit, see example:
-----------------EXAMPLE STARTS-----------------
from functions import *
import time

def schedule():
    localtime = time.localtime(time.time())
    hour = localtime[3]
    # disable downloading distfiles that have size more than 5 000 000 bytes
    # from 8-00 to 22-00.
    if hour >= 8 and hour < 22 and get("distfile.size") > 5000000:
        print("reject because distfile is too big")
        reject_segment()
        # stop here so the segment is not accepted below
        return
    # set speed limit 50 000 cps for distfiles larger than 1 000 000 bytes
    if get("distfile.size") > 1000000:
        print("limit connection speed")
        set("connection.max_speed_limit", 50000)
    accept_segment()
-----------------EXAMPLE ENDS-----------------
In the example above, localtime holds the following tuple:
Index  Attribute  Values
0      tm_year    e.g. 2008
1      tm_mon     1 to 12
2      tm_mday    1 to 31
3      tm_hour    0 to 23
4      tm_min     0 to 59
5      tm_sec     0 to 61 (60 or 61 are leap-seconds)
6      tm_wday    0 to 6 (0 is Monday)
7      tm_yday    1 to 366 (Julian day)
8      tm_isdst   -1, 0, 1 (-1 means the library determines DST)
Therefore localtime[3] provides hours.
A segment will be accepted by default if it was neither accepted nor rejected
during the schedule() function.
Segget saves logs of the resulting stdout and stderr in the log folder
separately for each network. Hence, if there's an error in the net3.py file,
the python error message will be saved to net3_script_stderr.log. Results of
print will be saved in net3_script_stdout.log.
Default:
scripts_dir=./scripts

SCRIPT_SOCKET_PATH
Segget uses AF_UNIX domain sockets for communication with python.
Specify path for the socket on your filesystem.
Default:
script_socket_path=/tmp/segget_script_socket

1.4 Logs
--------
Add "none" as an option for log files.

Add explanations for CURL error codes to logs.

Add options: GENERAL_LOG_TIME_FORMAT, ERROR_LOG_TIME_FORMAT and DEBUG_LOG_TIME_FORMAT to segget.conf file

GENERAL_LOG_TIME_FORMAT
Set time format for general log as a string containing any combination of
regular characters and special format specifiers. These format specifiers are
replaced by the function to the corresponding values to represent the time
specified in timeptr. They all begin with a percentage (%) sign, and are:
%a Abbreviated weekday name             [For example: Thu]
%A Full weekday name                    [For example: Thursday]
%b Abbreviated month name               [For example: Aug]
%B Full month name                      [For example: August]
%c Date and time representation         [For example: Thu Aug 23 14:55:02 2001]
%d Day of the month (01-31)             [For example: 23]
%H Hour in 24h format (00-23)           [For example: 14]
%I Hour in 12h format (01-12)           [For example: 02]
%j Day of the year (001-366)            [For example: 235]
%m Month as a decimal number (01-12)    [For example: 08]
%M Minute (00-59)                       [For example: 55]
%p AM or PM designation                 [For example: PM]
%S Second (00-61)                       [For example: 02]
%U Week number with the first Sunday
   as the first day of week one (00-53) [For example: 33]
%w Weekday as a decimal number with
   Sunday as 0 (0-6)                    [For example: 4]
%W Week number with the first Monday as
   the first day of week one (00-53)    [For example: 34]
%x Date representation                  [For example: 08/23/01]
%X Time representation                  [For example: 14:55:02]
%y Year, last two digits (00-99)        [For example: 01]
%Y Year                                 [For example: 2001]
%Z Timezone name or abbreviation        [For example: CDT]
%% A % sign                             [For example: %]

For instance: general_log_time_format=Time: %m/%d %X

Default:
general_log_time_format=%m/%d %X
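
These specifiers are the usual strftime ones, so a format string can be tried out before it goes into segget.conf. A small sketch using Python's time.strftime, which accepts the same specifiers; the timestamp is the one from the table above.

```python
import time

# The sample moment used in the specifier table: Thu Aug 23 14:55:02 2001.
moment = time.strptime("2001-08-23 14:55:02", "%Y-%m-%d %H:%M:%S")

general_log_time_format = "Time: %m/%d %X"
stamp = time.strftime(general_log_time_format, moment)
print(stamp)
```

Note that %X (like %c and %x) depends on the locale, so the time part may look different from one system to another.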

ERROR_LOG_TIME_FORMAT
Set time format for error log as a string containing any combination of
regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT
for details on format specifiers.
Default:
error_log_time_format=%m/%d %X

DEBUG_LOG_TIME_FORMAT
Set time format for debug log as a string containing any combination of
regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT
for details on format specifiers.
Default:
debug_log_time_format=%m/%d %X

2. REQUEST TOOL
===============
Add request tool.

Request tool reads list of distfiles from ./pkg.list file and requests
seggetd daemon to download distfiles from the list.

3. TUICLIENT
============
Add network_type for each connection to tui.
Add ETA, AVG speed and active/total connections to tui.
Add segments counters to stats and tui.
Add connection num to totals.
Add log and error_log windows to tuiclient
Add distfiles window to tuiclient that shows progress on distfile downloads,
including its status: added/waiting/downloading/downloaded/failed/rejected by script etc.

[1]  Gnome http://www.gnome.org/
[2]  dEmon http://www.clker.com/cliparts/5/1/b/d/11954315391526924611beastie_freebsd_daemon_r_02.svg.med.png
[3]  Roshambo game http://www.erikandanna.com/Humor/FlashStuff/SouthPark/roshamboN.swf
[4]  fork http://en.wikipedia.org/wiki/Fork_%28software_development%29
[5]  curl http://curl.haxx.se/
[6]  TTL http://en.wikipedia.org/wiki/Time_to_live
[7]  Python http://loyalkng.com/wp-content/uploads/2010/03/adam-apple-bizarro-cartoon-comic-tampon-chandelier-pc-mac-snake-eve.jpg
[8]  Python http://www.python.org/
[9]  spawn http://en.wikipedia.org/wiki/Spawn_(computing)
[10] zombies http://en.wikipedia.org/wiki/Zombie_process
[11] protocol http://en.wikipedia.org/wiki/Communications_protocol
[12] thread http://en.wikipedia.org/wiki/Thread_(computer_science)
[13] ghost http://www.youtube.com/watch?v=9WrEDyIzdjY from the 3rd minute
[14] logs http://www.nawwal.org/~mrgoff/photojournal/2004/winspr/pictures/03-20nurselog.jpg
[15] pstrees http://en.wikipedia.org/wiki/Pstree
[16] Anjuta http://www.anjuta.org/
[17] curses http://en.wikipedia.org/wiki/Curse
[18] Ncurses http://en.wikipedia.org/wiki/Ncurses
[19] Mutex http://en.wikipedia.org/wiki/Mutual_exclusion
[20] Rainbow Colors http://idfetch.isgreat.org/_content2/tuiclient_rainbow_colors.jpg see "DISTFILES" window.

WR#7 – “Feeling the Beat”

July 16, 2010
I used to wonder what these fancy words mean: MC[1], Dj, Pj, cd[2-3],
RnB[4], PnP... According to the posters, to become an MC one has to look
cool. If that's what it takes - just the MC style[5] - probably i
can do it[6]. But to become a Dj one must "feel the beat" and understand
the difference between POP[7], Jit[8], Samba[9] and TCP[10]. It's
not my job to evaluate what i've done, but i've managed to make New
Prog[11] - tuiclient[6]. It uses TCP and casts some light on segget's
activity. Probably i'll make a PHP version later, so ppl will be able to
see it in Opera[12] or whatever they like.

Still, even a clicks-type[13] person needs options. So i thought, why not
use String[14] for this? Of course i had to Lowercase[15] it first,
but now it works. Btw, a few more options were added:

[ui_server]
# tuiclient monitors segget's activity by establishing tcp connection
# with segget daemon (ui_server part of it).

# UI_IP
# Define an ip address segget will use to provide access for tuiclients.
# The parameter should be a string holding your host dotted IP address.
# Default:
# ui_ip=127.0.0.1
ui_ip=127.0.0.1

# UI_PORT
# Define a port segget will use to provide access for tuiclients.
# The parameter should be an integer.
# Minimum value: 1
# Maximum value: 65535
# Default:
# ui_port=9999
ui_port=9999

[provide_proxy_fetcher_to_others]
# tuiclient monitors segget's activity by establishing tcp connection
# with segget daemon (ui_server part of it).

# PROVIDE_PROXY_FETCHER_IP
# Define an ip address segget will use to provide access for tuiclients.
# The parameter should be a string holding your host dotted IP address.
# Default:
# provide_proxy_fetcher_ip=127.0.0.1
provide_proxy_fetcher_ip=127.0.0.1

# PROVIDE_PROXY_FETCHER_PORT
# Define a port segget will use to provide access for tuiclients.
# The parameter should be an integer.
# Minimum value: 1
# Maximum value: 65535
# Default:
# provide_proxy_fetcher_port=9777
provide_proxy_fetcher_port=9777
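
Both port options state a range of 1 to 65535. A sketch of such a range check follows; the helper name read_port is hypothetical, and falling back to the default on a bad value is an assumption, since the listing does not say what segget does in that case.

```python
def read_port(raw, default, minimum=1, maximum=65535):
    """Parse a port setting; use the default when it is missing the range."""
    try:
        port = int(raw)
    except ValueError:
        return default  # not an integer at all
    if minimum <= port <= maximum:
        return port
    return default  # assumption: out-of-range values fall back to the default


print(read_port("8080", 9999), read_port("70000", 9999))
```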

[1]  MC http://en.wikipedia.org/wiki/Master_of_Ceremonies
[2]  CD http://en.wikipedia.org/wiki/Compact_Disc
[3]  cd http://www.manpagez.com/man/1/cd/
[4]  RnB http://en.wikipedia.org/wiki/Contemporary_R%26B
[5]  MC style http://168hours.wordpress.com/2008/08/18/10-total-commander-alternatives-for-linux/
[6]  tuiclient http://idfetch.isgreat.org/_content2/tuiclient.jpg
[7]  POP http://en.wikipedia.org/wiki/Pop_music
[8]  Jit http://en.wikipedia.org/wiki/Jit
[9]  Samba http://en.wikipedia.org/wiki/Samba
[10] TCP http://en.wikipedia.org/wiki/TCP_(music)
[11] New Prog http://en.wikipedia.org/wiki/New_prog
[12] Opera http://en.wikipedia.org/wiki/Opera
[13] Clicks http://en.wikipedia.org/wiki/Clicks_n_Cuts
[14] String http://en.wikipedia.org/wiki/String_(Thai_pop)
[15] Lowercase http://en.wikipedia.org/wiki/Lowercase_(music)

WR#6 – “Network/Connection Management”

July 7, 2010

Amway, Oriflame, Mary Kay... looks like the whole world is covered with
these networks. I don't mind network marketing [1], but in most cases
network brokers are so eager to cover all of your needs that you probably
don't even have a chance to buy from the others or even from a local
place [2]. Trying to impose limits on your connections with network
brokers doesn't really work - they already know how to get you via your
cell phone, LAN and in some cases even the address you are bound to [3].

Although sometimes i'd like to provide something myself [4],
mostly it's getting extremely annoyinggggggggg. Hopefully development of
the Network/Connection Management System [5] will mitigate at least some of
these problems.
We'll see...

From IDFetch project timeline [5]:
Fetchers-Connection-Manager will allow fetcher to manage multiple
connections, in order to increase throughput, and prevent downloads via
some network connections (e.g. to prevent connection via cell-phone,
when you pay for the traffic).

Progress on IDFetch project
===========================
Development of Network/Connection Management System resulted in adding
the following options to the segget's configuration files:

----------------- Added to segget.conf file -----------------

[networks]
# NETWORK0_PRIORITY
# Define priority as a value in range from lowest 0 to highest 10.
# Segget tries to use networks with higher priority levels first, and in
# case of failure will switch to networks with lower priority levels.
# Segget will NOT use network if its priority level set to 0. Therefore
# at least one network must have priority level higher than 0.
# Networks with local mirrors usually would have higher priority than
# that of networks with remote mirrors.
# Segget can have up to 10 networks (from network0 to network9).
# Settings for each network should be defined in network#.conf file,
# where instead of # should be a network number.
# For network0 it's network0.conf
# Default:
# network0_priority=10
# network1_priority=0
# network2_priority=0
# network3_priority=0
# network4_priority=0
# network5_priority=0
# network6_priority=0
# network7_priority=0
# network8_priority=0
# network9_priority=0
network0_priority=10

[provide_mirror_to_others]
# PROVIDE_MIRROR_DIR
# Define a dir for making symlinks to downloaded distfiles. This dir can
# be used to provide local mirror for other hosts (with help of Apache,
# vsftp, etc).
# If set to none, segget will not make symlinks.
# Default:
# provide_mirror_dir=none
provide_mirror_dir=./provide_mirror_dir

# SYNOPSIS: PROVIDE_MIRROR_FILES_RESTRICT_LIST_ON= 0 | 1
# If PROVIDE_MIRROR_DIR=none this option will be ignored.
# - If set to 1, segget will compare distfile name with the list of
# forbidden patterns from the restricted.conf file. If distfile name
# contains any of the patterns, no symlink will be provided to this
# distfile.
# Default:
# provide_mirror_files_restrict_list_on=0
provide_mirror_files_restrict_list_on=1
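
The restrict-list check can be sketched in a few lines. This assumes a plain substring match, as the words "contains any of the patterns" suggest; the function name may_symlink and the sample patterns are made up, and the layout of restricted.conf is not shown here, so the patterns are passed in as a list.

```python
def may_symlink(distfile_name, forbidden_patterns, restrict_list_on=1):
    """Return True when a symlink to the distfile may be provided."""
    if not restrict_list_on:
        return True  # the restrict list is switched off
    # Assumption: a pattern matches when it is a substring of the name.
    return not any(pattern in distfile_name for pattern in forbidden_patterns)


patterns = ["skype", "flash"]
print(may_symlink("skype-2.1.0.81.tar.bz2", patterns))
print(may_symlink("bash-4.1.tar.gz", patterns))
```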

----------------- Added to network#.conf files -----------------

[network_mirrors]
# SYNOPSIS: NETWORK_USES_OWN_MIRROR_LIST_ONLY_ON=0 | 1
# - If set to 1, segget will replace mirror list provided by portage
# system with the list from network0_mirrors.conf file
# - If set to 0, segget will use ONLY mirror list provided by portage
# system, and will NOT use the list from network0_mirrors.conf file
# In some cases it's necessary to make segget prefer local mirrors over
# the remote ones. For this purpose define settings for 2 networks:
# settings for network0 (to provide access to local mirrors),
# settings for network1 (to provide access to remote ones).
# 1) Set the following options in segget.conf file:
# [networks]
# network0_priority=10
# network1_priority=9
# As you can see network0 (with local mirrors) has higher priority than
# network1 (with mirrors provided by portage).
# 2) Set NETWORK_USES_OWN_MIRROR_LIST_ONLY_ON=1 in network0.conf file.
# 3) Create network0_mirrors.conf file with the list of your local
# mirrors.
# For example, network0_mirrors.conf may look like this:
# http://192.168.210.12/
# ftp://192.168.210.205/
# http://192.168.210.56/
# 4) Set NETWORK_USES_OWN_MIRROR_LIST_ONLY_ON=0 in network1.conf file,
# so segget will use remote mirrors working via this network.
# NOTE: Actually network0 and network1 can be the same LAN with only
# one ip address set on the host. The only difference is that in case of
# network1 segget will have to use a gateway to access remote mirrors.
# Default:
# use_own_mirror_list_only_on=0
use_own_mirror_list_only_on=0

# SYNOPSIS: ONLY_LOCAL_WHEN_POSSIBLE=0 | 1
# If NETWORK_USES_OWN_MIRROR_LIST_ONLY_ON=0 this option will be ignored.
# - If set to 1, segget will not use remote mirrors with equal or lower
# priority until all mirrors in network#_mirrors.conf file have failed.
# - If set to 0, segget will use remote mirrors with equal priority or
# mirrors with lower priority when this network has NO free connections
# (see option NETWORK_MAX_CONNECTIONS in [network_connections] section
# of this file).
# NOTE: Following the example for NETWORK_USES_OWN_MIRROR_LIST_ONLY_ON option,
# if network0.conf has option ONLY_LOCAL_WHEN_POSSIBLE=1, segget
# will NOT start to use network1 for a particular distfile until all
# mirrors specified in network0_mirrors.conf file have failed
# to provide this distfile.
# On the other hand if ONLY_LOCAL_WHEN_POSSIBLE=0 segget will start
# to use network1 as soon as NETWORK_MAX_CONNECTIONS limit, set
# in network0.conf file has been reached.
# Default:
# only_local_when_possible=1
only_local_when_possible=1
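
The priority and ONLY_LOCAL_WHEN_POSSIBLE rules can be put together in a short sketch. Both function names are hypothetical, and breaking priority ties by network number is an assumption that the comments above do not cover.

```python
def usable_networks(priorities):
    """Order network numbers by priority, highest first; 0 disables a network."""
    active = [(prio, num) for num, prio in priorities.items() if prio > 0]
    # Assumption: equal priorities are ordered by network number.
    ordered = sorted(active, key=lambda item: (-item[0], item[1]))
    return [num for prio, num in ordered]


def may_fall_back(only_local_when_possible, all_own_mirrors_failed,
                  active_connections, max_connections):
    """Decide whether the next network may be used for a distfile."""
    if only_local_when_possible:
        # Wait until every mirror from network#_mirrors.conf has failed.
        return all_own_mirrors_failed
    # Otherwise a network without free connections is reason enough.
    return all_own_mirrors_failed or active_connections >= max_connections


print(usable_networks({0: 10, 1: 9, 2: 0}))
print(may_fall_back(1, False, 10, 10))
print(may_fall_back(0, False, 10, 10))
```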

------------- Moved from segget.conf to network#.conf files ------------

[network_bind]
# BIND INTERFACE / IP
# Pass a string as parameter. This sets the interface name to use as
# outgoing network interface. The name can be an interface name, an IP
# address, or a host name. No binding is set by default.
# Default:
# bind_interface=none
bind_interface=none

# BIND LOCALPORT
# Pass a long. This sets the local port number of the socket used for
# connection.
# This can be used in combination with BIND_INTERFACE and you are
# recommended to use BIND_LOCALPORTRANGE as well when this is set.
# Valid port numbers are 1 - 65535.

# BIND_LOCALPORTRANGE
# Pass a long. This is the number of attempts segget should make to find
# a working local port number. It starts with the given BIND_LOCALPORT
# and adds one to the number for each retry. Setting this to 1 or below
# will make segget do only one try for the exact port number. Port
# numbers by nature are scarce resources that will be busy at times so
# setting this value to something too low might cause unnecessary
# connection setup failures.

[network_connections]
# NETWORK_MAX_CONNECTIONS
# Define maximum number of connections
# Minimum value: 1
# Maximum value: 20
# Default:
# max_connections=10
max_connections=2

# CONNECTION_TIMEOUT
# Set the number of seconds to wait while trying to connect. Use 0 to
# wait indefinitely. Pass a long. It should contain the maximum time in
# seconds that you allow the connection to the server to take. This
# only limits the connection phase, once it has connected, this option
# is of no more use. Set to zero to disable connection timeout (it will
# then only timeout on the system's internal timeouts). See also the
# TIMEOUT option.
# Minimum value: 1
# Maximum value: 1000
# Default:
# connection_timeout=15
connection_timeout=15

# FTP_RESPONSE_TIMEOUT
# Set a timeout period (in seconds) on the amount of time that the
# server is allowed to take in order to generate a response message for
# a command before the session is considered hung. While awaiting for a
# response, this value overrides TIMEOUT. It is recommended that if
# used in conjunction with TIMEOUT, you set FTP_RESPONSE_TIMEOUT to a
# value smaller than TIMEOUT.
# Minimum value: 1
# Maximum value: -1 (for no limit)
# Default:
# ftp_response_timeout=180
ftp_response_timeout=180

# TIMEOUT
# maximum amount of time to download segment in seconds
# Set the maximum number of seconds for a connection to execute.
# Pass a long as parameter containing the maximum time in seconds that
# you allow the transfer operation to take. Normally, name lookups can
# take a considerable time and limiting operations to less than a few
# minutes risk aborting perfectly normal operations.
# Minimum value: 100
# Maximum value: -1 (for no limit)
# Default:
# timeout=500
timeout=500

# LOW_CONNECTION_SPEED_LIMIT
# Define the low speed limit for connection. Pass a long as parameter.
# It contains the transfer speed in bytes per second that the transfer
# should be below during LOW_CONNECTION_SPEED_TIME seconds to consider
# it too slow and abort.
# Minimum value: 1
# Maximum value: -1 (-1 for no limit)
# Default:
# low_connection_speed_limit=1000
low_connection_speed_limit=1000

# LOW_CONNECTION_SPEED_TIME
# Pass a long as parameter. It contains the time in seconds that the
# transfer should be below the LOW_CONNECTION_SPEED_LIMIT to consider
# it too slow and abort.
# Minimum value: 1
# Maximum value: 600
# Default:
# low_connection_speed_time=10
low_connection_speed_time=10

# MAX_CONNECTION_SPEED
# If a download exceeds this speed (counted in bytes per second) on
# cumulative average during the transfer, the transfer will pause to
# keep the average rate less than or equal to the parameter value.
# Defaults to unlimited speed.
# Minimum value: 1
# Maximum value: -1 (-1 for no limit)
# Default:
# max_connection_speed=0
max_connection_speed=3000
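
LOW_CONNECTION_SPEED_LIMIT and LOW_CONNECTION_SPEED_TIME work as a pair, which a small sketch makes easier to see. It assumes one speed sample per second and that the slow seconds must be consecutive; the function name too_slow is made up for the example.

```python
def too_slow(speeds_per_second, low_connection_speed_limit=1000,
             low_connection_speed_time=10):
    """Return True once the speed stays below the limit for the whole time."""
    slow_seconds = 0
    for speed in speeds_per_second:
        if speed < low_connection_speed_limit:
            slow_seconds += 1
        else:
            slow_seconds = 0  # a fast second resets the counter
        if slow_seconds >= low_connection_speed_time:
            return True  # the transfer would be aborted here
    return False


print(too_slow([500] * 10))
print(too_slow([500] * 9 + [2000] + [500] * 9))
```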

[network_user_data]
# USER_AGENT
# Set the User-Agent: header in the http request sent to the remote
# server.
# This can be used to fool servers or scripts.
# Default:
# user_agent=segget
user_agent=segget

[network_proxy]
# PROXY_IP_OR_NAME
# Specify a proxy to use (address and port).
# Set HTTP proxy to use. The parameter should be a string holding the
# proxy host name or dotted IP address. To specify port number in this
# string, append :[port] to the end of the host name. The proxy string
# may be prefixed with [protocol]:// since any such prefix will be
# ignored. The proxy's port number may optionally be specified with the
# separate option. If not specified, by default port 1080 will be used
# for proxies.
# When you tell segget to use an HTTP proxy, segget will transparently
# convert operations to HTTP even if you specify an FTP URL etc.
# Segget respects the environment variables http_proxy, ftp_proxy,
# all_proxy etc, if any of those are set. The PROXY option does however
# override any possibly set environment variables.
# Default:
# proxy_ip_or_name=none
proxy_ip_or_name=none

# PROXY_PORT
# Set the proxy port to connect to unless it is specified in the PROXY
# option.
# Minimum value: 1
# Maximum value: 65535
# Default:
# proxy_port=3128
proxy_port=3128

# PROXY_USER
# Set user name to use for the transfer while connecting to Proxy.
# The PROXY_USER option should be used in same way as the PROXY_PASSWORD
# is used.
# In order to specify the password to be used in conjunction with the
# user name use the PROXY_PASSWORD option.
# Default:
# proxy_user=none
proxy_user=none

# PROXY_PASSWORD
# Set password to use for the transfer while connecting to Proxy.
# The PROXY_PASSWORD option should be used in conjunction with
# the PROXY_USER option.
# Default:
# proxy_password=none
proxy_password=none

# SYNOPSIS: proxy_off=0 | 1
# Setting the proxy_off=1 will explicitly disable the use of a proxy,
# even if there is an environment variable set for it.
# Default:
# proxy_off=1
proxy_off=1
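
How the proxy options combine can be sketched as one function. The name resolve_proxy is hypothetical, IPv6 addresses are not handled, and a result of None with proxy_off=0 only means that nothing is set in the config file (environment variables may still apply, as described above).

```python
def resolve_proxy(proxy_off, proxy_ip_or_name, proxy_port=3128):
    """Return (host, port) taken from the config, or None when nothing is set."""
    if proxy_off or proxy_ip_or_name == "none":
        return None
    # A [protocol]:// prefix is ignored.
    host = proxy_ip_or_name.split("://", 1)[-1]
    if ":" in host:
        # A port given inside the proxy string wins over PROXY_PORT.
        host, port = host.rsplit(":", 1)
        return host, int(port)
    return host, proxy_port


print(resolve_proxy(0, "http://proxy.lan:8080"))
print(resolve_proxy(0, "proxy.lan"))
print(resolve_proxy(1, "proxy.lan"))
```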

[1] http://en.wikipedia.org/wiki/Multi-level_marketing
[2] http://idfetch.isgreat.org/_content2/segget_doc.html#mirrors
[3] http://idfetch.isgreat.org/_content2/segget_doc.html#network_bind
[4] http://idfetch.isgreat.org/_content2/segget_doc.html#provide_mirror_to_others
[5] http://idfetch.isgreat.org/index.php/timeline

WR#5 – “Suffering From Worms, Fighting Bugs”

July 1, 2010
Hard to say what's worse worms eating cherries or Colorado bug [1]
eating potato. Though it's close to midterm and the wOrming up period
is far behind, can't say i got accustomed to the bugs, it's rather that
they started to bother me even more. Fighting them gets quite tedious
and frustrating sometimes***  - no new features, no new options, lot's
of time spent and only hope that there will be something for the
harvest still keeps me ticking.

***Especially when it should be a piece of cake, but web hosting is free...

IDFetch progress:
=================

1. Several bugs were fixed, including the checks for whether segget can open, read, and write files.

2. Try..catch blocks were added throughout segget to make its code fail-proof.

3. Range limits were introduced for settings, to prevent bizarre behavior on misconfiguration.

4. Even more errors are logged.

5. Some minor improvements:
   5.1. Time measurement is more precise now (in milliseconds).
   5.2. Added avg speed measurement.
   5.3. Inactive connections are now cleared from the screen.

6. Documentation was improved.

7. Improvement to the IDFetch site:
   1. Added blog [2].
   2. Added documentation [3].
   3. Links to repositories [4-5].
   4. Timeline page [6] enriched with links to the resulting implementations of the proposed ideas.

8. Just a few options were added to segget.conf:

[pkg_list]
# PKG_LIST_DIR
# Define a dir with pkg.list file
# Default:
# pkg_list_dir=./
pkg_list_dir=./

# SYNOPSIS: del_pkg_list_when_dld_finished=0 | 1
# - If del_pkg_list_when_dld_finished is set to 1:
# Segget deletes the pkg.list file after all distfiles have been successfully fetched.
# Default:
# del_pkg_list_when_dld_finished=1
del_pkg_list_when_dld_finished=1

# CURRENT_SPEED_TIME_INTERVAL_MSECS
# Segget transfers can be bursty. Therefore, when measuring current
# speed, segget actually calculates the average speed over the
# current_speed_time_interval_msecs time interval, defined in milliseconds.
# Minimum value: 100
# Maximum value: 60000
# Default:
# current_speed_time_interval_msecs=1000
current_speed_time_interval_msecs=1000

[logs]
# LOGS_DIR
# Define a dir to store log files.
# Default:
# logs_dir=./logs
logs_dir=./logs

# GENERAL_LOG_FILE
# Define a file name to store general log.
# Default:
# general_log_file=segget.log
general_log_file=segget.log

# ERROR_LOG_FILE
# Define a file name to store error log.
# Default:
# error_log_file=error.log
error_log_file=error.log

# DEBUG_LOG_FILE
# Define a file name to store debug log.
# Default:
# debug_log_file=debug.log
debug_log_file=debug.log
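
The averaging described for current_speed_time_interval_msecs above can be sketched roughly as follows. This is only an illustration in Python, not segget's actual code: the SpeedMeter class and its method names are made up for the example, and timestamps are passed in by hand so the result does not depend on a real clock.

```python
from collections import deque


class SpeedMeter:
    """Average transfer speed over a sliding time window (hypothetical helper)."""

    def __init__(self, interval_msecs=1000):
        self.interval = interval_msecs / 1000.0  # window length in seconds
        self.samples = deque()                   # (timestamp, bytes) pairs
        self.window_bytes = 0                    # bytes currently inside the window

    def _expire(self, now):
        # Drop samples that have fallen out of the window.
        while self.samples and now - self.samples[0][0] >= self.interval:
            _, old_bytes = self.samples.popleft()
            self.window_bytes -= old_bytes

    def add(self, now, nbytes):
        """Record that nbytes arrived at time `now` (in seconds)."""
        self.samples.append((now, nbytes))
        self.window_bytes += nbytes
        self._expire(now)

    def current_speed(self, now):
        """Bytes per second, averaged over the last interval."""
        self._expire(now)
        return self.window_bytes / self.interval


meter = SpeedMeter(interval_msecs=1000)
meter.add(0.0, 4000)              # a burst...
meter.add(0.5, 0)                 # ...then silence...
meter.add(0.9, 6000)              # ...then another burst
print(meter.current_speed(0.95))  # 10000.0
print(meter.current_speed(1.5))   # 6000.0 (the first burst has left the window)
```

A longer interval smooths the bursts more but reacts more slowly to real changes in speed.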

[1] http://idfetch.isgreat.org/_content2/colorado_bug.jpg
[2] http://idfetch.isgreat.org/index.php/blog
[3] http://idfetch.isgreat.org/index.php/documentation
[4] http://idfetch.isgreat.org/index.php/idfetchporgagemodifications
[5] http://idfetch.isgreat.org/index.php/twrappersegget
[6] http://idfetch.isgreat.org/index.php/timeline

WR#4 – “Growing Muscles on Skeleton”

June 24, 2010

Fresh milk and cottage cheese gradually reinforce the skeleton; at the same time, growing muscles on it can take a lot of effort. Nevertheless, pull-ups, mirror benchmarking [1], and other things were added to my program.

NOTE:
In the previous weeks i kept writing reports, but when sending them i used my default email account, which unfortunately happened not to be the one subscribed to [gentoo-soc]. Hence, i don't think anybody got them.

Yesterday i resent these reports (better late than never):
- Project IDFetch - Weekly report #2 ("Replacing Stuff")
- Project IDFetch - Weekly report #3 ("Strawberry Issues")

Since i've already tried writing reports without sending them, why not do the opposite - send a report without actually writing it ;o)
So please take a look at the list of the options implemented in segget during the 4th week of IDFetch project.

IDFetch: adding muscles to segget.conf skeleton:
================================================

[folders]
# DISTFILES_DIR
# Define a dir to store distfiles
# Default:
# distfiles_dir=./distfiles
distfiles_dir=./distfiles

# SEGMENTS_DIR
# Define a dir to store distfiles' segments
# Default:
# segments_dir=./tmp
segments_dir=./tmp

[distfiles]
# MAX_CONNECTION_NUM_PER_DISTFILE
# Each distfile can have up to max_connection_num_per_distfile
# simultaneous connections.
# Default:
# max_connection_num_per_distfile=3
max_connection_num_per_distfile=3

[segments]
# MAX_SEGMENT_SIZE
# Define maximum segment size in bytes.
# Default:
# max_segment_size=500000
max_segment_size=500000

# SYNOPSIS: resume_on=0 | 1
# - If resume_on is set to 1:
# Before downloading a segment, segget checks whether the segment is
# already on disk and checks its size. If the size matches, segget
# considers the segment to be downloaded and skips the downloading
# process.
# - If resume_on is set to 0:
# Segget always starts a new fetch for a segment, regardless of
# whether it is already downloaded or not.
# Default:
# resume_on=1
resume_on=1
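
The resume_on check above boils down to a size comparison. A minimal sketch in Python, assuming each segment is stored as its own file; the function names and the segment file name are hypothetical and not taken from segget:

```python
import os
import tempfile


def segment_is_complete(path, expected_size):
    """True if the segment file exists and has exactly the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size


def should_download(path, expected_size, resume_on=True):
    """Decide whether a segment has to be fetched."""
    if resume_on and segment_is_complete(path, expected_size):
        return False  # already on disk with the right size: skip it
    return True       # resume_on=0, or the segment is missing/incomplete


with tempfile.TemporaryDirectory() as tmp:
    seg = os.path.join(tmp, "example.seg0")  # hypothetical segment file name
    with open(seg, "wb") as f:
        f.write(b"x" * 500)
    print(should_download(seg, 500))                   # False
    print(should_download(seg, 500, resume_on=False))  # True
    print(should_download(seg, 800))                   # True
```

Note that a size match alone does not prove the content is intact, so a checksum of the assembled distfile is still needed.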

# MAX_TRIES
# If a segment download is unsuccessful, new attempts are made. When
# the number of attempts reaches max_tries, the segment gets FAILED
# status and an error is logged to error_log.
# Default:
# max_tries=10
max_tries=10

[connections]
# MAX_CONNECTIONS
# Define maximum number of connections
max_connections=10

# CONNECTION_TIMEOUT
# Set the number of seconds to wait while trying to connect. Use 0 to
# wait indefinitely. Pass a long. It should contain the maximum time in
# seconds that you allow the connection to the server to take. This
# only limits the connection phase, once it has connected, this option
# is of no more use. Set to zero to disable connection timeout (it will
# then only timeout on the system's internal timeouts). See also the
# TIMEOUT option.
# Default:
# connection_timeout=15
connection_timeout=15

# FTP_RESPONSE_TIMEOUT
# Set a timeout period (in seconds) on the amount of time that the
# server is allowed to take in order to generate a response message for
# a command before the session is considered hung. While waiting for a
# response, this value overrides TIMEOUT. It is recommended that if
# used in conjunction with TIMEOUT, you set FTP_RESPONSE_TIMEOUT to a
# value smaller than TIMEOUT.
# Default:
# ftp_response_timeout=180
ftp_response_timeout=180

# TIMEOUT
# Maximum amount of time, in seconds, to download a segment.
# Set the maximum number of seconds for a connection to execute.
# Pass a long as parameter containing the maximum time in seconds that
# you allow the transfer operation to take. Normally, name lookups can
# take a considerable time, and limiting operations to less than a few
# minutes risks aborting perfectly normal operations.
# Default:
# timeout=500
timeout=500

# LOW_CONNECTION_SPEED_LIMIT
# Define the low speed limit for connection. Pass a long as parameter.
# It contains the transfer speed in bytes per second that the transfer
# should be below during LOW_CONNECTION_SPEED_TIME seconds to consider
# it too slow and abort.
# Default:
# low_connection_speed_limit=1000
low_connection_speed_limit=1000

# LOW_CONNECTION_SPEED_TIME
# Pass a long as parameter. It contains the time in seconds that the
# transfer should be below the LOW_CONNECTION_SPEED_LIMIT to consider
# it too slow and abort.
# Default:
# low_connection_speed_time=10
low_connection_speed_time=10
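
How LOW_CONNECTION_SPEED_LIMIT and LOW_CONNECTION_SPEED_TIME work together can be illustrated with a few lines of Python. This is a simplified model under my own assumptions (one speed sample per second, a hypothetical too_slow() helper); the real check is done by the transfer library and may differ in details.

```python
def too_slow(speeds, limit=1000, low_time=10):
    """Return the second at which the transfer would be aborted, or None.

    speeds: bytes-per-second samples, one per second (hypothetical input).
    The transfer is aborted once the speed has stayed below `limit`
    for `low_time` consecutive seconds.
    """
    slow_run = 0
    for second, speed in enumerate(speeds):
        slow_run = slow_run + 1 if speed < limit else 0
        if slow_run >= low_time:
            return second
    return None


# 3 fast seconds, a 4-second dip, a recovery, then 10 slow seconds in a row.
speeds = [5000] * 3 + [200] * 4 + [3000] + [100] * 10
print(too_slow(speeds))  # 17: the short dip is forgiven, the long one is not
```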

# MAX_CONNECTION_SPEED
# If a download exceeds this speed (counted in bytes per second) on
# cumulative average during the transfer, the transfer will pause to
# keep the average rate less than or equal to the parameter value.
# Defaults to unlimited speed.
# Default:
# max_connection_speed=0
max_connection_speed=0

# BIND INTERFACE / IP
# Pass a string as parameter. This sets the interface name to use as
# outgoing network interface. The name can be an interface name, an IP
# address, or a host name. No binding is set by default.
# Default:
# bind_interface=none
bind_interface=none

[mirrors]
# MAX_CONNECTIONS_NUM_PER_MIRROR
# Define how many simultaneous downloads from one mirror segget is
# allowed to have. While choosing a mirror segget will skip mirrors
# with max_connections_num_per_mirror active downloads.
# Default:
# max_connections_num_per_mirror=1
max_connections_num_per_mirror=1

# SYNOPSIS: collect_benchmark_stats_on=0 | 1
# - If set to 1, stats on mirrors performance will be collected.
# Default:
# collect_benchmark_stats_on=1
# ***Note: at the moment collect_benchmark_stats_on can NOT be set to 0
collect_benchmark_stats_on=1

# SYNOPSIS: use_benchmark_stats=0 | 1
# If use_benchmark_stats=1, statistics on mirrors are used to rate them
# and therefore improve performance.
# Each time a connection from a particular mirror closes, mirror->dld_time
# and mirror->dld_size get increased (in case of an unsuccessful
# connection only the time gets increased), so the avg speed for a mirror
# can be calculated:
#
#           mirror->avg_speed=mirror->dld_size/mirror->dld_time.       (1)
#
# When a new segment is about to be started, segget goes through the list
# of mirrors the distfile/segment has, and asks each mirror for its
# self_rating:
#
#          "ulong self_rating=mirror->mirror_on_the_wall();".
#
# This way segget chooses the mirror with the best (that is, the lowest)
# self_rating. To calculate self_rating mirrors use the following formula:
#
#               self_rating=dld_time/dld_size*honesty.                 (2)
#
# So mirrors actually say how bad they are.
# Even mirrors can have critical times, so to give mirrors another chance
# honesty was added to the formula (2). honesty takes values in the
# interval (0,1]. Each time a connection from a mirror opens or closes, the
# mirror sets its honesty=1. If a mirror was asked for its self_rating with
# mirror->mirror_on_the_wall() but wasn't chosen, its honesty decreases
# somewhat (see the [mirrors].benchmark_oblivion option), so next time it
# will lie a little bit more about how bad it is.
# Default:
# use_benchmark_stats=1
# ***Note: at the moment use_benchmark_stats can NOT be set to 0
use_benchmark_stats=1

# BENCHMARK_OBLIVION
# The benchmark_oblivion option adjusts how fast segget "forgets"
# benchmarking statistics on mirror performance.
# Each time mirror->mirror_on_the_wall() is called, the mirror decreases
# its honesty (to have a better chance next time) using the following
# formula:
#
#       honesty=honesty*100/(100+settings.benchmark_oblivion)         (3)
#
# Therefore, setting benchmark_oblivion=100 will make the mirror look half
# as bad the next time mirror->mirror_on_the_wall() is called.
# Default:
# benchmark_oblivion=5
benchmark_oblivion=5
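
Formulas (2) and (3) are easier to follow with a toy model. The Python sketch below is not segget's code: the class layout, connection_opened()/connection_closed() and choose_mirror() are names i made up, ratings are floats rather than ulong, and clamping dld_size to 1 byte (to avoid dividing by zero for a mirror with no successful download yet) is purely my assumption.

```python
class Mirror:
    """Toy model of one mirror's benchmark state (field names follow the text)."""

    def __init__(self, url):
        self.url = url
        self.dld_time = 0.0  # total seconds spent on connections
        self.dld_size = 0    # total bytes downloaded
        self.honesty = 1.0   # stays within (0, 1]

    def connection_opened(self):
        self.honesty = 1.0

    def connection_closed(self, seconds, nbytes=0):
        # An unsuccessful connection passes nbytes=0: only the time grows.
        self.dld_time += seconds
        self.dld_size += nbytes
        self.honesty = 1.0

    def mirror_on_the_wall(self, benchmark_oblivion=5):
        # Formula (2). ASSUMPTION: dld_size is clamped to 1 byte so that an
        # untested mirror rates 0 instead of raising ZeroDivisionError.
        rating = self.dld_time / max(self.dld_size, 1) * self.honesty
        # Formula (3): lie a bit more the next time somebody asks.
        self.honesty = self.honesty * 100 / (100 + benchmark_oblivion)
        return rating


def choose_mirror(mirrors, benchmark_oblivion=5):
    """Ask every mirror for its self_rating and open the least bad one."""
    rated = [(m.mirror_on_the_wall(benchmark_oblivion), m) for m in mirrors]
    best = min(rated, key=lambda pair: pair[0])[1]
    best.connection_opened()  # the chosen mirror becomes honest again
    return best


fast = Mirror("fast")
slow = Mirror("slow")
fast.connection_closed(10, 1_000_000)  # 100000 bytes/s on average
slow.connection_closed(10, 100_000)    # 10000 bytes/s on average

picks = [choose_mirror([fast, slow], benchmark_oblivion=100).url
         for _ in range(5)]
print(picks)  # ['fast', 'fast', 'fast', 'fast', 'slow']
```

With benchmark_oblivion=100 the slow mirror halves its claimed badness every time it is passed over, so it gets another chance on the fifth round. With a smaller benchmark_oblivion the old statistics are forgotten more slowly.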

[user-data]
# USER_AGENT
# Set the User-Agent: header in the http request sent to the remote
# server.
# This can be used to fool servers or scripts.
# Default:
# user_agent=segget
user_agent=segget

[proxy]
# PROXY_IP_OR_NAME
# Specify a proxy to use (address and port).
# Set HTTP proxy to use. The parameter should be a string holding the
# proxy host name or dotted IP address. To specify port number in this
# string, append :[port] to the end of the host name. The proxy string
# may be prefixed with [protocol]:// since any such prefix will be
# ignored. The proxy's port number may optionally be specified with the
# separate option. If not specified, by default port 1080 will be used
# for proxies. When you tell segget to use an HTTP proxy, segget will
# transparently convert operations to HTTP even if you specify an FTP
# URL etc. Segget respects the environment variables http_proxy,
# ftp_proxy, all_proxy etc, if any of those are set. The PROXY option
# does however override any possibly set environment variables.
# Default:
# proxy_ip_or_name=none
proxy_ip_or_name=none

# PROXY_PORT
# Set the proxy port to connect to unless it is specified in the PROXY
# option.
# Default:
# proxy_port=3128
proxy_port=3128

# PROXY_USER
# Set user name to use for the transfer while connecting to Proxy.
# The PROXY_USER option should be used in the same way as the
# PROXY_PASSWORD option. To specify the password to be used
# in conjunction with the user name, use the PROXY_PASSWORD option.
# Default:
# proxy_user=none
proxy_user=none

# PROXY_PASSWORD
# Set password to use for the transfer while connecting to Proxy.
# The PROXY_PASSWORD option should be used in conjunction with the
# PROXY_USER option.
# Default:
# proxy_password=none
proxy_password=none

# SYNOPSIS: proxy_off=0 | 1
# Setting the proxy_off=1 will explicitly disable the use of a proxy,
# even if there is an environment variable set for it.
# Default:
# proxy_off=1
proxy_off=1

[logs]
# LOGS_DIR
# Define a dir to store log files.
# Default:
# logs_dir=./logs
logs_dir=./logs

# GENERAL_LOG_FILE
# Define a file name to store general log.
# Default:
# general_log_file=segget.log
general_log_file=segget.log

# ERROR_LOG_FILE
# Define a file name to store error log.
# Default:
# error_log_file=error.log
error_log_file=error.log

# DEBUG_LOG_FILE
# Define a file name to store debug log.
# Default:
# debug_log_file=debug.log
debug_log_file=debug.log

[1]http://www.twolia.com/blogs/daily-beauty-break/files/2010/04/mirror-on-the-wall.jpg

WR#3 – “Strawberry issues”

June 18, 2010

Switching development environments is usually not an easy thing, and it was not so easy this time when i decided to move to the village. Though you get clean air, lungs full of oxygen, and a mouthful of strawberries [1], there are some drawbacks: Internet connection choices are limited to GPRS. Without going into the details of my experiments on attaching my cell phone to the 3rd story of the 2-story house, let's just say that taking things for granted when you have them is simple, but fighting for things you used to have can be quite a deal. In conclusion, the results are:

I) No 3G yet, but DTR is up to 15 Kb/s (not so bad);

II) 2 broken battery chargers for my cellphone (one got a broken connector
when the phone fell from the 3rd floor; i still can't
understand how i managed to break the 2nd one);

III) My cat "MOOrcello Mastroianni" AKA "Moorchik" has gone missing ;-((((

IV) Few more results on idfetch project itself (read further).

IDFetch project progress
========================

1. [ Exporting duplicate distfiles ]

Sometimes exporting pkg.list with the #emerge -f pkg command produces
duplicate distfiles. Some fixes to fetch.py have been added to
detect and skip duplicate distfiles.

2. [ Skipping downloaded distfiles ]

Skipping distfiles is fun, so skipping of already downloaded distfiles
has been added to segget.

3. [ Skipping downloaded segments ]

More skipping. If the settings.resume_on option is set, fully downloaded
segments are also skipped.

4. [ Interface improvements ]
Ncurses [2] has been introduced into segget for a more intuitive view of
the downloading processes [3].

5. [ Gathering stats ]

5.1. Runtime visualization of progress for each connection has been
implemented [3].

5.2. Each connection is provided with speed measurement [3].
For now it's only the current speed, though it's actually the avg over 1 second.
TO-DO: avg speed measurement for the whole connection time period.

5.3. Total speed is also measured [3].
TO-DO: Avg speed measurement from the moment segget started.
Note: Later in daemon mode - times when daemon is idle won't be counted.

5.4. In case of unsuccessful segment fetch, segget increases
segment->try_num and restarts fetching.

6. [ Combining segments into distfiles ]
There are a few improvements/fixes to the process of combining segments
into distfiles. Segments get deleted after being joined.

7. [ RMD160, SHA1, SHA256, SHA512, WHIRLPOOL, MD5, CRC32 ]

7.1. libcrypto++[4] employed to check RMD160, SHA1 and SHA256 hashes of
downloaded distfiles. Errors are logged.

7.2. Support for SHA512, WHIRLPOOL, MD5, CRC32 has also been added
after a remark from Robin H. Johnson.

7.3. Later libcrypto++ might be replaced by one of: libmhash, NSS,
openssl.

8. [ Logs ]
Ncurses is a real jam but there's no life without logs: segget.log,
error.log and debug.log have been added.

9. [ Strawberries in virtual boxes ]
The cost of going for strawberries is a more expensive and slower Internet
connection. Hence, emulating mirrors became quite useful:
- no need to consume Internet traffic;
- no cable in my laptop (no wifi limitations either);
- more control on experimental environment (changing speed, connection
limits on the server side).
For this purpose VirtualBox with Host-only Ethernet adapters [5] is used
and some modifications to the pkg.list file have been made (see the JSON
fragment with the pkg dict):

{
  "CATEGORY": "dev-libs",
  "distfile_list": [
    {
      "RMD160": "82e5061ec76f23643ba5477ab28bbee6eebd393a",
      "SHA1": "b836783ebd72d5bc6a916620ab2b1ecec316fef1",
      "SHA256": "b522f0b5f850b50e9917823ea986f855295407380fafbe30f358875c41998bc5",
      "name": "cryptopp560.zip",
      "size": 1049029,
      "url_list": [
        "ftp://192.168.6.11/cryptopp560.zip",
        "ftp://192.168.6.12/cryptopp560.zip",
        "ftp://192.168.6.13/cryptopp560.zip",
        "ftp://192.168.6.14/cryptopp560.zip"
      ]
    }
  ],
  "pkg_name": "crypto++-5.6.0-r1"
}
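
To show how such a pkg dict can be consumed, here is a small Python sketch that reads the fragment, skips duplicate distfiles and checks size and hashes. It is an illustration only: segget itself uses libcrypto++ for the checks, while this sketch uses Python's hashlib, and unique_distfiles(), verify() and the demo.bin entry are hypothetical. RMD160 and WHIRLPOOL are left out because hashlib does not provide them everywhere.

```python
import hashlib
import json

PKG_FRAGMENT = """
{
  "CATEGORY": "dev-libs",
  "distfile_list": [
    {
      "RMD160": "82e5061ec76f23643ba5477ab28bbee6eebd393a",
      "SHA1": "b836783ebd72d5bc6a916620ab2b1ecec316fef1",
      "SHA256": "b522f0b5f850b50e9917823ea986f855295407380fafbe30f358875c41998bc5",
      "name": "cryptopp560.zip",
      "size": 1049029,
      "url_list": [
        "ftp://192.168.6.11/cryptopp560.zip",
        "ftp://192.168.6.12/cryptopp560.zip",
        "ftp://192.168.6.13/cryptopp560.zip",
        "ftp://192.168.6.14/cryptopp560.zip"
      ]
    }
  ],
  "pkg_name": "crypto++-5.6.0-r1"
}
"""

# pkg.list keys mapped to hashlib algorithm names (only the portable ones).
HASHLIB_NAMES = {"SHA1": "sha1", "SHA256": "sha256",
                 "SHA512": "sha512", "MD5": "md5"}


def unique_distfiles(pkgs):
    """Yield each distfile once, skipping names that were already seen."""
    seen = set()
    for pkg in pkgs:
        for distfile in pkg["distfile_list"]:
            if distfile["name"] not in seen:
                seen.add(distfile["name"])
                yield distfile


def verify(data, distfile):
    """Check the size and every hash we know how to compute."""
    if len(data) != distfile["size"]:
        return False
    for key, algo in HASHLIB_NAMES.items():
        if key in distfile and hashlib.new(algo, data).hexdigest() != distfile[key]:
            return False
    return True


pkg = json.loads(PKG_FRAGMENT)
pkgs = [pkg, pkg]  # the same package exported twice, to show the skipping
for df in unique_distfiles(pkgs):
    print(df["name"], df["size"], len(df["url_list"]), "mirrors")
# prints: cryptopp560.zip 1049029 4 mirrors

# A made-up distfile entry, since the real archive is not at hand here.
data = b"not a real distfile"
demo = {"name": "demo.bin", "size": len(data),
        "SHA1": hashlib.sha1(data).hexdigest()}
print(verify(data, demo))            # True
print(verify(data[:-1] + b"!", demo))  # False (same size, different hash)
```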

[1] http://soc.dev.gentoo.org/~simka/wr3/strawberry.jpg
[2] http://en.wikipedia.org/wiki/Ncurses
[3] http://soc.dev.gentoo.org/~simka/wr3/ui.jpg
[4] http://www.cryptopp.com/
[5] http://www.virtualbox.org/manual/ch06.html#network_hostonly